Python modify user-agent

How to generate user-agent header for web requests in Python


In a previous post, we saw how to modify the user-agent header in the wget, curl and httpie programs. In this post, I will show you how to modify the user-agent header in Python’s popular requests module. There are several reasons to modify the user agent, one of which is to trigger a different response from a website. Many websites offer different content based on the user-agent header. Details on the user-agent header are in the MDN page listed under References.

In Python, one of the most popular libraries to query web servers is the requests module. The requests module allows you to pass header information using the headers option –

1. Simplest use case, without a header

import requests
requests.get('http://linuxfreelancer.com/status')

And this is how it is logged on the web server side, Apache in this case –

76.1.2.3 [20/May/2018:01:08:37 -0400] "GET /status HTTP/1.1" 200 359 "-" "python-requests/2.18.4" 1798
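To check the user-agent in a log line like this without reading it by eye, a small parser is enough. The sketch below assumes the three double-quoted fields are always request, referer and user-agent, in that order, which depends on the server's LogFormat. The function name extract_user_agent is made up for this example.

```python
import re

# Access-log line copied from the post. Assumption: the quoted fields are
# request, referer and user-agent, in that order (depends on LogFormat).
LOG_LINE = (
    '76.1.2.3 [20/May/2018:01:08:37 -0400] "GET /status HTTP/1.1" '
    '200 359 "-" "python-requests/2.18.4" 1798'
)


def extract_user_agent(line):
    """Return the third double-quoted field of a log line, or None."""
    quoted = re.findall(r'"([^"]*)"', line)
    return quoted[2] if len(quoted) >= 3 else None


print(extract_user_agent(LOG_LINE))  # python-requests/2.18.4
```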

The user-agent simply shows up as “python-requests/2.18.4”, and some websites might even block it to keep web crawlers out. So the next step is to modify it.

 

2. Modify user-agent header

headers = {'User-Agent': 'Mozilla/5.0 (Android 5.1; Tablet; rv:50.0) Gecko/50.0 Firefox/50.0'}
requests.get('http://linuxfreelancer.com/status', headers=headers)

And this is what the access log entry looks like on the web server side –

76.1.2.3  [20/May/2018:01:11:29 -0400] "GET /status HTTP/1.1" 200 359 "-" "Mozilla/5.0 (Android 5.1; Tablet; rv:50.0) Gecko/50.0 Firefox/50.0" 1289
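If the same user-agent should go out with every request, one option is to set it once on a requests Session instead of passing headers each time. The sketch below also builds the request without sending it, so the final header can be inspected locally rather than in the server log. The URL and user-agent string are the ones used in this post.

```python
import requests

# URL and user-agent string taken from the examples in this post.
URL = 'http://linuxfreelancer.com/status'
USER_AGENT = 'Mozilla/5.0 (Android 5.1; Tablet; rv:50.0) Gecko/50.0 Firefox/50.0'

session = requests.Session()
# Headers set on the session are merged into every request made through it.
session.headers.update({'User-Agent': USER_AGENT})

# Build the request without sending it, to inspect the final headers.
prepared = session.prepare_request(requests.Request('GET', URL))
print(prepared.headers['User-Agent'])

# To actually send it: response = session.get(URL)
```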

As you can see above, the user-agent string has several identifiers, which makes it hard to remember. A better way is to programmatically generate valid user-agents for different platforms.

 

3. Generate valid user-agents

The user_agent module generates random yet valid web user agents. You can install it with ‘pip install user_agent’.

This module generates user-agent strings for different device types such as desktop, smartphone and tablet, as well as OS types (Windows, Linux, Mac, Android …). Let us try it in a virtual environment –

virtualenv /tmp/venv
source /tmp/venv/bin/activate
pip install user_agent

Now run Python in interactive mode (the prompts below are from IPython) –

import requests
from user_agent import generate_user_agent 

In [8]: generate_user_agent()
Out[8]: 'Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2803.5 Safari/537.36'

In [9]: generate_user_agent()
Out[9]: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:49.0) Gecko/20100101 Firefox/49.0'

In [11]: generate_user_agent(os='linux')
Out[11]: 'Mozilla/5.0 (X11; Ubuntu; Linux i686 on x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2942.3 Safari/537.36'

In [13]: generate_user_agent(device_type='tablet')
Out[13]: 'Mozilla/5.0 (Linux; Android 4.4; HTC Desire 616 dual sim Build/JDQ39) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2938.21 Safari/537.36'

In [14]: generate_user_agent(device_type='desktop', os='win')
Out[14]: 'Mozilla/5.0 (Windows NT 6.1; rv:45.0) Gecko/20100101 Firefox/45.0'

In [20]: generate_user_agent(navigator='chrome', os='linux', device_type='desktop')
Out[20]: 'Mozilla/5.0 (X11; Ubuntu; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2894.26 Safari/537.36'

This lets us generate anything from a fully random to a narrowly specified, yet still valid, user-agent string. We can then pass the generated string to the requests module through the headers option and check the web server logs to validate –

from user_agent import generate_user_agent
import requests
requests.get('http://linuxfreelancer.com/status', headers={'User-Agent': generate_user_agent(navigator='firefox', os='linux')})

Log entry –

76.1.2.3 - - [20/May/2018:01:28:24 -0400] "GET /status HTTP/1.1" 200 359 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0" 1486
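When a script makes many requests, it can be handy to wrap the generator in a helper that returns a fresh headers dict on each call. This is only a sketch: random_headers and FALLBACK_USER_AGENTS are names made up for this example. The fallback pool, copied from the sample outputs earlier in this post, is used only when the user_agent module is not installed, and in that case the keyword arguments are ignored.

```python
import random

try:
    # Third-party module from this post: pip install user_agent
    from user_agent import generate_user_agent
except ImportError:
    generate_user_agent = None

# Fallback pool copied from the sample outputs shown earlier in this post.
FALLBACK_USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:49.0) Gecko/20100101 Firefox/49.0',
    'Mozilla/5.0 (X11; Ubuntu; Linux i686) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/55.0.2894.26 Safari/537.36',
]


def random_headers(**kwargs):
    """Return a headers dict with a newly chosen User-Agent.

    kwargs (os, navigator, device_type) are forwarded to
    generate_user_agent when the module is available; the fallback
    pool ignores them.
    """
    if generate_user_agent is not None:
        user_agent = generate_user_agent(**kwargs)
    else:
        user_agent = random.choice(FALLBACK_USER_AGENTS)
    return {'User-Agent': user_agent}


headers = random_headers()
print(headers['User-Agent'])
```

With requests, this would be used as requests.get(url, headers=random_headers(os='linux')), so each call goes out with a newly picked user-agent.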

 

References –


http://docs.python-requests.org/en/master/

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent

Published by
daniel
