Linux

Spoof User Agent in http calls


Recently I was trying to download numerous files from a certain website using a shell script I wrote. With in the script, first I used wget to retrieve the files, but I kept on getting the following error message –

HTTP request sent, awaiting response... 403 Forbidden
2012-12-30 06:17:45 ERROR 403: Forbidden.

Then hoping that this was just a wget problem, I replaced wget with curl. It turned out that Curl would actually create a file with the same name as the one being download, but to my surprise the file was not downloaded. Instead, it contained an html file with 403 Forbidden message.

403 Forbidden
Forbidden
You don't have permission to access /dir/names.txt on this server.

What was surprising is that I could download the files using Firefox, Internet Explorer, elinks and even text based browser ‘lynx’. It seems that the website was blocking access from client browsers with certain ‘User-Agent’ header field. So the trick was to simply modify the User-Agent to a ‘legitimate’ one. Both curl and wget support the altering of User-Agent header field. You can use below commands to change the User-Agent parameter –

USER_AGENT="User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12"

wget --user-agent="$USER_AGENT" -c http://linuxfreelancer.com/status.html

curl -A "$USER_AGENT" -O http://linuxfreelancer.com/status.html

In addition to wget or curl, a much easier to use CLI HTTP client httpie can be used. Passing custom HTTP headers is intuitive using httpie, installation and usage details can be accessed here. Modifying the User-Agent header using httpie is shown below –

USER_AGENT="User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12"

http http://linuxfreelancer.com/ "$USER_AGENT"

All commands in one –

USER_AGENT="User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12"
wget --user-agent="$USER_AGENT" -c http://linuxfreelancer.com/status.html

curl -A "$USER_AGENT" -O http://linuxfreelancer.com/status.html

http http://linuxfreelancer.com/ "$USER_AGENT"

References

Curl – https://cheat.sh/curl

Wget – https://www.gnu.org/software/wget/manual/wget.html

Httpie – https://httpie.org/

daniel

Share
Published by
daniel

Recent Posts

GCP for Linux System administrators

Linux System Admins Journey to Google Cloud Platform As a Linux system administrator, you have…

2 months ago

Top 5 Troubleshooting Tools for Network Professionals in Linux

As a network professional, troubleshooting is a crucial part of your daily routine. To streamline…

2 months ago

netstat equivalent tool

The net-tools set of packages had been deprecated years back, although the commands are still…

2 years ago

GCP GKE – run kubectl through bastion host

Re-posting my answer to a Google cloud platform's Google Kubernetes Engine (GKE) related question in…

4 years ago

Terraform – show logging

Enabling logging in terraform for debugging

4 years ago

gcloud formatting output

GCP : output in table format using gcloud sdk tool The Google Cloud Platform(GCP) provides…

4 years ago