Hunting for Needles in Haystacks


This post was originally published here by Sqrrl Team.

Cyber threat hunting involves proactively and iteratively searching through networks and datasets to detect threats that evade existing automated tools. Yet determining the Tactics, Techniques and Procedures (TTPs) used by adversaries is challenging precisely because there is often no roadmap to guide the hunt. Instead, the role of the hunter requires, in addition to an understanding of data science and some programming experience, a fundamental curiosity about the data and an understanding of how the bad guys work. It is with these skills that cyber threat hunters set about trying to find a needle in a haystack.

Our recent video, Hunting for Needles in Haystacks, examines how Sqrrl approaches the data science problem of building software that enables cyber threat hunters to find the proverbial needle. In the video, Sqrrl data scientists Chris McCubbin and Ruslan Vaulin delve into how hunters can use Sqrrl to hunt down threats. They look at how Sqrrl helps hunters seamlessly sift through and analyze massive quantities of security data in order to discover lateral movement, exfiltration and DNS tunneling, and then to neutralize those threats.

This post summarizes and expands on the points that McCubbin and Vaulin highlight in their discussion.

Challenges of the Data Scientist

Chris McCubbin points out that the first challenge of creating threat detection software is adopting the analysts’ mindset and understanding their pain points. How do the analysts think about problems? What is their workflow? How do they distill huge amounts of data into concrete conclusions about threat vectors? What takes large amounts of the analyst’s time and needs to be simplified? According to McCubbin, this challenge is, at its core, a data science problem.

It is fundamentally impossible for one individual to look through trillions of rows of data. Instead, by bringing data science techniques to the Sqrrl software, programmers create algorithms that mirror hunters’ actions and ensure that the software closely follows the hunters’ actual workflow. Sqrrl optimizes those actions and makes it possible for hunters to work smarter.

In the image below (Figure 1), the Sqrrl software has highlighted an exfiltration with a threat factor of 81, which is high:

By flagging EXFIL-2 in this example, Sqrrl has helped identify where data is being downloaded in a way that could be of concern and warrants further consideration. In the end, EXFIL-2 might prove to be simply George from down the hall, who is making copies of specific files for staff training. But the software has given hunters a place to start looking for issues.
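To make the idea of flagging unusual data movement concrete, here is a minimal Python sketch. It is not Sqrrl's algorithm and does not reproduce the threat score shown in the figure; it only illustrates one common approach, comparing a host's latest outbound volume with its own history using a robust z-score (median and median absolute deviation). The host names, byte counts and the 3.5 threshold are all assumptions made for illustration.

```python
from statistics import median

def robust_z(history, today):
    """Robust z-score of today's value against a host's own history (median/MAD)."""
    med = median(history)
    mad = median([abs(x - med) for x in history])
    if mad == 0:
        # No variation in the baseline: any change at all is maximally unusual.
        return 0.0 if today == med else float("inf")
    # 0.6745 scales the MAD so the score is comparable to a standard z-score.
    return 0.6745 * (today - med) / mad

def flag_exfil_candidates(outbound_mb, threshold=3.5):
    """Return {host: score} for hosts whose latest outbound volume is far above baseline."""
    flagged = {}
    for host, series in outbound_mb.items():
        *history, today = series
        if len(history) < 3:
            continue  # too little history to form a baseline
        score = robust_z(history, today)
        if score > threshold:
            flagged[host] = score
    return flagged

# Hypothetical daily outbound megabytes per host; the last value is "today".
sample = {
    "ws-george": [120, 130, 110, 125, 118, 2400],
    "ws-alice": [300, 310, 290, 305, 295, 315],
}
candidates = flag_exfil_candidates(sample)
print(sorted(candidates))
```

A flagged host is only a starting point, as with George above: the score says the transfer is unusual for that host, not that it is malicious.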

Additionally, Sqrrl provides menus and options which mimic the behavior of threat hunters, enabling them to further investigate EXFIL-2 and other issues. The image below (Figure 2) shows what an expanded search of endpoints and beacons might look like upon further investigation, with endpoints, files and IP addresses highlighted.

These graphical presentations of inputs and outputs from a network were created by Sqrrl based on extensive study of threat hunters’ workflow and a deep understanding of cyber-attacks.

How do you find a needle in a haystack?

Ruslan Vaulin reiterates the challenge of finding these threats by likening it to finding a needle in a haystack, except that there are many different types of needles and these needles are always changing. At the same time, there is an abundance of “hay” (data sets) which hides the needles and makes them very difficult to find. The haystack metaphor is an apt one because it highlights the fundamental challenge threat hunters embrace when they search for the proverbial “bad guys”. Yet search they must, and Sqrrl’s role in this challenge is to help make the needle a hundred times larger and the haystack very small.

To achieve this end, Vaulin notes that Sqrrl’s platform processes tremendous quantities of data in order to find the nuggets of data that are important to the threat hunter. This streamlining of the search process is what is at the heart of Sqrrl’s approach to the problem and enables hunters to find the persistent threats.

Vaulin describes the Sqrrl software as a digital representation of the needle-in-a-haystack aphorism. Using the Pyramid of Pain (Figure 3 below) defined by security expert (and Sqrrl advisor) David Bianco, Vaulin highlights the challenge of using data science to find the various levels of indicators left by the adversary so that you can better understand their behavior. As the pyramid shows, it is relatively easy to find the hash values, IP addresses or domain names used by adversaries, but these are also the indicators an adversary can change most easily. Although TTPs are the most difficult indicators to find and verify, they’re also the most useful for finding threat actors.
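As a small illustration, the levels of the Pyramid of Pain can be written down as an ordered type and used to prioritize findings. The level names follow Bianco's published pyramid; the numeric values, the rank_by_pain helper and the sample findings are assumptions for this sketch and are not part of the Sqrrl product.

```python
from enum import IntEnum

class PyramidLevel(IntEnum):
    """Pyramid of Pain levels, bottom (easiest to find and change) to top."""
    HASH_VALUES = 1
    IP_ADDRESSES = 2
    DOMAIN_NAMES = 3
    NETWORK_HOST_ARTIFACTS = 4
    TOOLS = 5
    TTPS = 6

def rank_by_pain(findings):
    """Sort (description, level) pairs so the highest pyramid level comes first."""
    return sorted(findings, key=lambda item: item[1], reverse=True)

# Hypothetical findings from a hunt.
findings = [
    ("known-bad file hash", PyramidLevel.HASH_VALUES),
    ("lateral movement via remote service creation", PyramidLevel.TTPS),
    ("suspicious domain lookup", PyramidLevel.DOMAIN_NAMES),
]
ranked = rank_by_pain(findings)
for description, level in ranked:
    print(level.name, "-", description)
```

Ordering findings this way reflects the point of the pyramid: a behavioral finding at the top is harder to obtain but tells the hunter more than a single hash or address.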

Sqrrl’s software focuses on algorithms that discover the TTPs used by adversaries. These algorithms are designed to be self-learning, which means that they are meant to recognize repeated patterns and bring TTPs to the forefront.

To achieve these ends, Vaulin notes that Sqrrl uses real data rather than simulated data, since real data has enough hiccups and deviations to make it useful. This realistic data also makes it much easier for Sqrrl to tune its software to maximize the signal-to-noise ratio and minimize false positives.
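The trade-off behind this kind of tuning can be sketched in a few lines of Python. The example below is an illustration only, not Sqrrl's tuning procedure: it takes events that have already been scored and labeled, and reports how precision, recall and the false positive rate change as the alert threshold moves. The scores, labels and thresholds are made up for the example.

```python
def alert_quality(scored_events, threshold):
    """Compute precision, recall and false positive rate at a given alert threshold.

    scored_events is an iterable of (score, is_threat) pairs.
    """
    tp = fp = fn = tn = 0
    for score, is_threat in scored_events:
        alerted = score >= threshold
        if alerted and is_threat:
            tp += 1
        elif alerted and not is_threat:
            fp += 1
        elif not alerted and is_threat:
            fn += 1
        else:
            tn += 1
    return {
        "false_positives": fp,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }

# Hypothetical (score, is_threat) pairs from a labeled validation set.
events = [
    (81, True), (75, False), (40, False), (90, True),
    (30, False), (65, True), (20, False), (55, False),
]
loose = alert_quality(events, threshold=50)
strict = alert_quality(events, threshold=70)
print(loose)
print(strict)
```

On this sample, raising the threshold removes one false positive but also misses one real threat, which is why tuning against realistic data matters: the right balance depends on how the scores of benign and malicious events actually overlap.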

Sqrrl’s Value Proposition

In addition to facilitating the job of the threat hunter, Sqrrl also aims to make the hunter’s job more enjoyable. The software automates the tedious parts of hunting work. This automation enables hunters to focus on what really matters: finding real threats, isolating them and controlling them. Sqrrl tries to distill the expertise of data scientists and other hunters into the product. Consequently, when analysts use the product in pursuit of a potential threat, they have the expertise of hundreds of data scientists and other hunters working with them. This expertise not only provides a basic level of confidence but also indicates where hunters can start looking at problems and where they should dig deeper.

McCubbin and Vaulin conclude that finding threats on your network is often just a matter of knowing where and how to look for them. The Sqrrl platform enables threat hunters to do just that. With its robust algorithms, it provides the set of tools threat hunters need, because it was designed to help them find a needle in a haystack.


