This post was originally published here by Chris Sanders.
An attacker will use the minimal amount of effort required to compromise your network. That means when it’s possible to reuse applications, tools, and protocols, they’ll do it! This is one reason why attackers often use HTTP to facilitate communication to and from infected hosts. In this post, I’ll discuss the HTTP user agent field and demonstrate how you can use Sqrrl to hunt for HTTP-based malware.
HTTP User Agents in Practice
Like many client-server protocols, HTTP requires that the client and server negotiate how content will be delivered prior to exchanging information. This allows interoperability between different web server and browser variants, ultimately ensuring that web browsing is a consistent experience across a diverse user base.
Content negotiation relies on several fields, but perhaps the most easily recognized is the user agent string. The user agent is used by the server to identify the HTTP client connecting to it. We most often think of an HTTP client as a browser like Internet Explorer, Chrome, or Firefox.
Table 1: Common Web Browser User Agents
However, it can be anything that connects to a web server using HTTP. This means that an HTTP client can also be a command line tool like cURL, a search engine crawler, or a python script.
Table 2: Non-Browser User Agents
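To make the point about non-browser clients concrete, here is a minimal Python sketch using only the standard library. It shows the default user agent that urllib announces and how a script can set any value it likes. The URL and the custom user agent value are placeholders I made up for illustration, and no network traffic is sent.

```python
import urllib.request

# A default opener announces itself as "Python-urllib/<version>".
default_opener = urllib.request.build_opener()
default_ua = dict(default_opener.addheaders)["User-agent"]

# A request object carrying a caller-chosen User-Agent header.
# The URL and the UA value are placeholders; nothing is sent over the network.
custom_request = urllib.request.Request(
    "http://example.com/",
    headers={"User-Agent": "my-inventory-script/1.0"},
)
# urllib stores header names capitalized as "User-agent".
custom_ua = custom_request.get_header("User-agent")

print(default_ua)
print(custom_ua)
```

The takeaway is that the user agent is entirely under the client’s control, which is why both default and hand-picked values show up in proxy data.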
When a server reads the user agent of a client, it can decide how it delivers content, and even what content it delivers, based on this information. For example, two differing browsers might receive different copies of the same page to account for differences in how they process cascading style sheets and render the page design. This is one method sites use to deliver mobile versions of their pages to mobile devices.
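As a rough illustration of that server-side decision, the sketch below picks a page variant from the user agent value. The function name and the rule it uses (looking for the token "Mobi") are assumptions for the example; real sites typically use more elaborate detection.

```python
def choose_variant(user_agent: str) -> str:
    """Return which page variant to serve for a given User-Agent value."""
    # Assumption: the token "Mobi" marks a mobile browser. This is a
    # simplification that only illustrates the idea.
    if "mobi" in user_agent.lower():
        return "mobile"
    return "desktop"


# Sample values for illustration.
iphone_ua = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 "
    "Mobile/15E148 Safari/604.1"
)
print(choose_variant(iphone_ua))
print(choose_variant("curl/7.88.1"))
```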
Attackers frequently use HTTP to facilitate malicious network communication. After all, there’s no need to design a custom protocol when you can ride on one that already provides most of the feature set you need. HTTP is typically allowed out of most networks without restriction, and it’s convenient for malware traffic to hide amongst already diverse and sporadic user-initiated web browsing. When an attacker chooses to use HTTP, they gain access to the wide array of functionality it provides out of the box, including the ability to control content negotiation based on the use of a user agent string.
Hunting for Suspicious User Agents
If we know that attackers often use custom HTTP user agents to achieve their goals, how can we go about detecting their existence on our networks?
Unless your HTTP traffic is encrypted, the user agent string is easily visible just like any other component of an HTTP transaction. That means you can intercept it and read it. Most organizations receive user agent data via a network proxy. The data is easy to capture, so even less mature security departments generally have it early on.
The investigation of user agents usually begins with the question: “Did any system on my network communicate over HTTP using a suspicious or unknown user agent?”
This question can be answered with a simple aggregation that analyzes the user agent field in all HTTP traffic over a set time window. I’ve done this using Sqrrl Query Language here:
SELECT COUNT(*),user_agent FROM HTTPProxy GROUP BY user_agent ORDER BY COUNT(*) ASC LIMIT 20
This query selects the user_agent field from the HTTPProxy data source, then groups and counts all unique entries for that field. The results are sorted by the count, with the least frequent occurrences at the top, and limited to the first 20 rows.
I chose to sort by least frequent occurrence because that is where I’m most likely to find evidence of malicious activity. These are user agents that might have only been seen a couple of times or on a single host. Sorting by most frequent occurrence probably wouldn’t yield anything interesting other than a list of normal browsers in use on the network.
Figure 1: Aggregating by User Agent Strings
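If your proxy data lives outside Sqrrl, the same least-frequent-first aggregation can be sketched in plain Python. This is not the Sqrrl implementation, only an approximation of the idea. The record layout (the `src_ip` and `user_agent` keys) and the sample records are made up for illustration. The sketch also counts distinct hosts per user agent, since a string seen on a single host is often the most interesting.

```python
from collections import defaultdict


def rarest_user_agents(records, limit=20):
    """Return (user_agent, request_count, host_count) tuples, rarest first."""
    requests = defaultdict(int)
    hosts = defaultdict(set)
    for rec in records:
        ua = rec["user_agent"]
        requests[ua] += 1
        hosts[ua].add(rec["src_ip"])
    rows = [(ua, requests[ua], len(hosts[ua])) for ua in requests]
    # Ascending by request count, then by string so ties sort predictably.
    rows.sort(key=lambda row: (row[1], row[0]))
    return rows[:limit]


# Made-up proxy records, for illustration only.
FIREFOX = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) "
    "Gecko/20100101 Firefox/115.0"
)
sample_records = [
    {"src_ip": "10.0.0.5", "user_agent": FIREFOX},
    {"src_ip": "10.0.0.6", "user_agent": FIREFOX},
    {"src_ip": "10.0.0.5", "user_agent": FIREFOX},
    {"src_ip": "10.0.0.9", "user_agent": "Python-urllib/2.7"},
]

for ua, request_count, host_count in rarest_user_agents(sample_records):
    print(request_count, host_count, ua)
```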
Tips for Investigating Suspicious User Agents
For the most part, you should be able to easily identify the browsers that are being used in your network and that will account for most of the user agents observed. By removing those from the equation, you’re left with a list of anomalies you can start digging through. User agent strings can get long and confusing, and often one might look malicious and be completely normal. When you encounter a suspicious user agent string, you can usually identify some information about it by pasting it into http://useragentstring.com/, a favorite site of mine when performing this hunt.
Figure 2: A Suspicious UA String appears to be associated with Adware
As with most anomaly-based hunting, the more you go through this data the better you’ll get at spotting oddities. Here are a few things you can look for right off the bat:
UAs that appear to be legitimate browsers but are off by a character or two. These are usually trying to impersonate real applications to go undetected.
UAs indicating older browser versions that don’t match up with what exists elsewhere on your network. At best, you’ve found a system that needs to be updated. At worst, you’ve found malware.
Default UAs associated with scripting languages, such as Python-urllib/2.7. These could represent scripts written and deployed by an attacker.
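Two of the checks above, near-miss browser strings and scripting-language defaults, lend themselves to simple automation. The sketch below is one possible approach under stated assumptions: the function names, the list of tool prefixes, and the 0.95 similarity threshold are my own choices for illustration, and the threshold in particular would need tuning against real data.

```python
import difflib
import re

# Default UA prefixes of some common tools and libraries (not exhaustive).
SCRIPT_UA = re.compile(
    r"^(python-urllib|python-requests|curl|wget|go-http-client|java)/",
    re.IGNORECASE,
)


def looks_like_script(user_agent):
    """True if the UA starts with a known scripting or tool default."""
    return bool(SCRIPT_UA.match(user_agent))


def near_miss(user_agent, known_good, threshold=0.95):
    """Return the known UA this one closely resembles, or None.

    An exact match is not a near miss and returns None.
    """
    if user_agent in known_good:
        return None
    for good in known_good:
        matcher = difflib.SequenceMatcher(None, user_agent, good, autojunk=False)
        if matcher.ratio() >= threshold:
            return good
    return None


# Illustration: a known browser UA and a copy with one character dropped.
KNOWN = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) "
    "Gecko/20100101 Firefox/115.0"
]
impostor = KNOWN[0].replace("Mozilla", "Mozila")

print(looks_like_script("Python-urllib/2.7"))
print(near_miss(impostor, KNOWN) is not None)
```

A similarity ratio is a blunt instrument: it will also flag legitimate strings that differ only by a minor version number, so treat hits as leads to review rather than verdicts.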
You can decrease the clutter in your hunting query results by filtering out the following:
Known legitimate user agents used by your clients
User agents used by default operating system processes
Common UAs used by application updates such as antivirus definition updates
Web crawlers from popular search engines (particularly if you are looking at your own web server logs)
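The filtering described above can be sketched as an allowlist applied before you review results. Every entry below is illustrative and would need to be replaced with values observed in your own environment. Exact matching is used for browser strings so that a near-miss impersonator is not filtered out along with the real thing.

```python
import re

# Illustrative entries only; build the real lists from your own environment.
KNOWN_EXACT = {
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) "
    "Gecko/20100101 Firefox/115.0",
}
KNOWN_PATTERNS = [
    re.compile(r"^Microsoft-CryptoAPI/"),    # operating system process
    re.compile(r"^Windows-Update-Agent/"),   # operating system updates
    re.compile(r"Googlebot/"),               # search engine crawler
]


def remove_known(user_agents):
    """Drop user agents that match the allowlist, keeping the rest in order."""
    return [
        ua
        for ua in user_agents
        if ua not in KNOWN_EXACT
        and not any(pattern.search(ua) for pattern in KNOWN_PATTERNS)
    ]


observed = [
    next(iter(KNOWN_EXACT)),
    "Microsoft-CryptoAPI/10.0",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Python-urllib/2.7",
]
print(remove_known(observed))
```

Keep in mind that anything on the allowlist can be spoofed by a client, so it is worth occasionally reviewing the filtered entries as well.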
Once you’ve found something interesting, start by performing a simple Google search on the user agent string. If it has been discovered by someone else then you might be lucky enough to find a blog post or malware report listing the characteristics of the malware associated with it. Absent any findings there, you should pivot off the computer name or IP address and look for evidence of other malicious activity. This might include suspicious running process names, suspicious user accounts, or connections to suspicious URLs/IPs on the internet. You can also dive into the HTTP data itself to see if any useful information exists that might help you classify the disposition of the communication. The more robust your data, the more pivots you’ll have available.
Figure 3: An odd UA string leads to other malicious indicators
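Outside of Sqrrl, the pivot from a user agent to the hosts that sent it, and then to everything else those hosts contacted, can be approximated with a few lines of Python. The record keys (`src_ip`, `user_agent`, `dest_host`) and the sample data are hypothetical and stand in for whatever your proxy logs provide.

```python
def pivot_on_user_agent(records, suspicious_ua):
    """Map each host that sent `suspicious_ua` to every destination it contacted."""
    hosts = {r["src_ip"] for r in records if r["user_agent"] == suspicious_ua}
    activity = {host: set() for host in hosts}
    for r in records:
        if r["src_ip"] in hosts:
            # Include traffic sent with any user agent, not just the odd one.
            activity[r["src_ip"]].add(r["dest_host"])
    return {host: sorted(dests) for host, dests in activity.items()}


# Made-up records, for illustration only.
proxy_records = [
    {"src_ip": "10.0.0.9", "user_agent": "Python-urllib/2.7", "dest_host": "odd.example.net"},
    {"src_ip": "10.0.0.9", "user_agent": "Mozilla/5.0", "dest_host": "intranet.example.com"},
    {"src_ip": "10.0.0.5", "user_agent": "Mozilla/5.0", "dest_host": "intranet.example.com"},
]

print(pivot_on_user_agent(proxy_records, "Python-urllib/2.7"))
```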
Attackers commonly use HTTP because it makes network communication into and out of a compromised network simple. By leveraging HTTP proxy data and examining the frequency of user agent strings on your network, you put yourself in a better position to spot malicious activity. In this article, I showed a simple Sqrrl aggregation that allows you to find suspicious user agents. Periodically reviewing this data and leveraging Sqrrl’s pivoting capability can reveal malicious activity that other mechanisms missed.