Recently I was trying to download numerous files from a certain website using a shell script I wrote. Within the script, I first used wget to retrieve the files, but I kept getting the following error message:
HTTP request sent, awaiting response... 403 Forbidden
2012-12-30 06:17:45 ERROR 403: Forbidden.
Hoping that this was just a wget problem, I replaced wget with curl. It turned out that curl would actually create a file with the same name as the one being downloaded, but to my surprise the file did not hold the real content. Instead, it contained an HTML page with a 403 Forbidden message:
403 Forbidden
Forbidden
You don't have permission to access /dir/names.txt on this server.
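A script that downloads many files can easily miss this failure, because curl still leaves a file on disk. The following is a minimal sketch of two ways to catch it, not the code from my original script: the function names `is_forbidden_page` and `fetch` are made up for illustration. The first checks an already saved file for the error text, the second relies on curl's `--fail` option so that an HTTP error makes curl exit with a non-zero status instead of saving the error page.

```shell
# Hypothetical helper: succeeds (exit status 0) if the given file looks like
# a saved "403 Forbidden" error page rather than the real download.
is_forbidden_page() {
    grep -qi '403 Forbidden' "$1"
}

# Hypothetical wrapper: --fail makes curl exit non-zero on HTTP errors
# (such as 403) instead of writing the error page to disk.
# --silent hides the progress meter, --show-error still prints failures.
fetch() {
    curl --fail --silent --show-error -O "$1"
}
```

In a script, something like `fetch "$url" || echo "could not download $url" >&2` would then report the failure instead of silently keeping a bogus file.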
What was surprising is that I could download the files using Firefox, Internet Explorer, and even the text-based browsers elinks and lynx. It seems that the website was blocking clients based on their ‘User-Agent’ header field, and the default values sent by wget and curl were among those rejected. So the trick was simply to change the User-Agent to a ‘legitimate’ browser one. Both curl and wget support altering the User-Agent header field. You can use the commands below to change the User-Agent:
# Value only: wget and curl add the "User-Agent:" header name themselves.
USER_AGENT="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12"
wget --user-agent="$USER_AGENT" -c http://linuxfreelancer.com/status.html
curl -A "$USER_AGENT" -O http://linuxfreelancer.com/status.html
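Since the original goal was to download many files from a script, the wget command can be wrapped in a loop. This is only a sketch under a few assumptions: the helper name `download_all` and the idea of keeping the URLs in a text file, one per line, are mine, and the User-Agent string is the same browser string used in this post.

```shell
# Browser-like User-Agent value (header value only, no "User-Agent:" prefix).
USER_AGENT="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12"

# Hypothetical helper: download every URL listed in a text file,
# one URL per line, skipping blank lines and lines starting with '#'.
download_all() {
    url_list="$1"
    failed=0
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r url || [ -n "$url" ]; do
        case "$url" in
            ''|'#'*) continue ;;
        esac
        # -c resumes partial downloads; count failures (e.g. 403) and carry on.
        if ! wget --user-agent="$USER_AGENT" -c "$url"; then
            echo "failed: $url" >&2
            failed=$((failed + 1))
        fi
    done < "$url_list"
    # Succeed only if every download succeeded.
    [ "$failed" -eq 0 ]
}
```

Calling `download_all urls.txt` (with `urls.txt` being whatever list file you maintain) fetches each entry in turn and returns a non-zero status if any of them failed.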
In addition to wget and curl, you can use httpie, a CLI HTTP client that is much easier to use. Passing custom HTTP headers is intuitive with httpie: each header is given as a "Name:value" argument after the URL. Installation and usage details are available on the httpie website listed in the references. Modifying the User-Agent header using httpie is shown below:
USER_AGENT="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12"
http http://linuxfreelancer.com/ "User-Agent:$USER_AGENT"
All commands in one:
# Header value only; httpie takes the header name as part of its argument.
USER_AGENT="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12"
wget --user-agent="$USER_AGENT" -c http://linuxfreelancer.com/status.html
curl -A "$USER_AGENT" -O http://linuxfreelancer.com/status.html
http http://linuxfreelancer.com/ "User-Agent:$USER_AGENT"
References –
Curl – https://cheat.sh/curl
Wget – https://www.gnu.org/software/wget/manual/wget.html
Httpie – https://httpie.org/