Polygraphs: Behavior Baselining to Reveal the Elephant
July 27, 2017
You are probably familiar with the parable of the blind men describing an elephant. Because they experience only what they can touch, each of them has a very different concept of what the animal is. One touches the trunk and concludes it’s a snake. Another explores a leg and concludes it’s a tree. They are, of course, all wrong: an elephant can only be understood if you can see all of it.
If you want to understand a security incident, you need to see the whole thing. In my last blog entry, Introduction to polygraphs, I discussed what a polygraph is. Now I’d like to use some real data to show how polygraphs reveal “the entire elephant”.
The following table shows a small sample of 16 connections from an anonymized Lacework account. This account had more than 1 million similar connections that day.
We can observe the following from the above set:
- Source ports are assigned more or less randomly and change constantly, as expected. The destination port is always 443 (HTTPS).
- There is a smaller set of unique source IP addresses – but we don’t know much else. Are they from the same machine? Are they related somehow?
- There are a number of unique destination IP addresses but we see some duplicates. What’s going on?
- Traffic levels are all over the map, with no clear pattern or correlation.
There is much we don’t know: are these connections from the same application or many different ones? Is traffic normal or anomalous? Machine learning can help cluster connections but we are still missing critical insights.
Network security tools see one part of a security incident and conclude the elephant is a snake. They need more context to really understand what’s going on. That’s where Lacework excels. Let’s see the power of polygraph at work here!
If we can associate network connections with machines we can better understand what’s happening. In this case, as commonly happens, every IP is a different machine. These connections are coming from 4 VMs.
What if we could see processes? After putting processes in the table we see a pattern: there are only 8 sources. Much better than 16 random ports! The outline of our elephant is starting to emerge.
Now let’s dig into destinations. Let’s add the DNS hostname to our table. Instead of just doing a reverse-DNS lookup on the IP address, let’s capture the hostname that the VM actually resolved before making the connection. After adding hostnames to the above table, we find there are really only 3 destinations, not 16. Great!
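To make the collapsing step concrete, here is a minimal sketch in Python. It assumes each connection record has already been enriched with a machine name, a process name, and the hostname the client resolved. The record layout, the `collapse` helper, and every value in it are made up for illustration; they are not Lacework's data model or the data from the table.

```python
from collections import namedtuple

# Hypothetical record layout: the raw network fields plus the context added above.
Connection = namedtuple(
    "Connection",
    "src_ip src_port dst_ip dst_port machine process hostname",
)

# Made-up sample data, not taken from the account discussed in this post.
connections = [
    Connection("10.0.0.1", 40001, "52.0.0.10", 443, "vm-1", "uploader", "bucket1.example.com"),
    Connection("10.0.0.1", 40002, "52.0.0.10", 443, "vm-1", "uploader", "bucket1.example.com"),
    Connection("10.0.0.2", 51515, "52.0.0.10", 443, "vm-2", "reporter", "bucket2.example.com"),
    Connection("10.0.0.2", 51516, "52.0.0.11", 443, "vm-2", "reporter", "api.example.com"),
]


def collapse(conns):
    """Return the distinct (machine, process, hostname) triples.

    Source ports and IP addresses are deliberately left out of the key,
    so connections that differ only in those fields merge into one entry.
    """
    return {(c.machine, c.process, c.hostname) for c in conns}


edges = collapse(connections)
for edge in sorted(edges):
    print(edge)
```

In this toy data the four raw connections share two destination IPs but three hostnames, so keying on the hostname keeps apart destinations that an IP-only view would merge.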
Now let’s dig into the processes and group them by application. After analyzing properties of the processes, such as their command lines, their past behavior, and the users that run them, we find there are only 2 applications. We’ve gone from 16 client ports to 2 sources. Now we’re seeing the whole pachyderm! The new table is as follows:
We can baseline this simply as 2 different applications talking to 3 different DNS hostnames. Moreover, this baseline tells us that App1 talking to bucket2 would be unusual, even though bucket2 shares the exact same IP addresses as bucket1.
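One way to picture such a baseline is as a set of (application, hostname) edges. The sketch below is an assumption-laden illustration, not Lacework's implementation: the hostnames and the `is_unusual` helper are hypothetical, and which application talks to which destination is invented for the example.

```python
# Hypothetical baseline: 2 applications, 3 destination hostnames.
baseline = {
    ("App1", "bucket1.example.com"),
    ("App2", "bucket2.example.com"),
    ("App2", "api.example.com"),
}


def is_unusual(app, hostname, baseline_edges):
    """A connection is unusual if its (app, hostname) edge is not in the baseline."""
    return (app, hostname) not in baseline_edges


# The IP address is not part of the key, so App1 -> bucket2 is flagged
# even if bucket2 resolves to the same addresses as bucket1.
print(is_unusual("App1", "bucket1.example.com", baseline))
print(is_unusual("App1", "bucket2.example.com", baseline))
```

Because the key is logical (application and hostname) rather than physical (IP and port), the check keeps working when addresses are shared or reassigned.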
Going from 16 connections to the graph below may not sound like much, but this works equally well for millions of connections. We typically see a reduction of close to 6 orders of magnitude: 100 million connections typically end up as about 150-200 edges.
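The scale of that reduction is easy to check with quick arithmetic. The short sketch below uses only the figures quoted above (100 million connections, 150 to 200 edges); the function name `reduction_orders` is just a label chosen for this example.

```python
import math


def reduction_orders(connections, edges):
    """Return the reduction factor expressed in orders of magnitude (log base 10)."""
    return math.log10(connections / edges)


for edges in (150, 200):
    factor = 100_000_000 / edges
    orders = reduction_orders(100_000_000, edges)
    print(f"{edges} edges: factor of about {factor:,.0f}, or 10^{orders:.1f}")
```

Both ends of the range land a little under six orders of magnitude, a factor of roughly half a million or more.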
Also, changes to IPs, ports, processes, or VMs, and other normal data center operations, make no difference to this baseline. We have achieved our logical baseline with zero touch.
Moreover, looking at bytes in/out we now start seeing clearer patterns. Redoing the table based on Apps and buckets, we find that there is indeed a pattern in bytes in/out that can be used for anomaly detection over time.
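As a rough illustration of how per-edge traffic could feed anomaly detection, the sketch below compares a new byte count against the history for the same (application, hostname) edge using a simple z-score. The z-score is a stand-in chosen for this example, not a description of Lacework's detection method, and the byte counts, the `is_anomalous` helper, and the threshold are all made up.

```python
from statistics import mean, stdev


def is_anomalous(history, observed, threshold=3.0):
    """Flag `observed` if it lies more than `threshold` sample standard
    deviations from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return observed != mu
    return abs(observed - mu) / sigma > threshold


# Made-up bytes-out history for one hypothetical edge.
bytes_out = {("App1", "bucket1.example.com"): [1000, 1100, 950, 1050, 1020]}

edge = ("App1", "bucket1.example.com")
print(is_anomalous(bytes_out[edge], 1080))   # close to the usual volume
print(is_anomalous(bytes_out[edge], 50000))  # far above the usual volume
```

The point of the sketch is the grouping: the same byte counts spread across 16 unrelated source ports would give no stable history to compare against, whereas a per-edge history does.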
With Lacework, your security team is no longer a group of blind men analyzing an elephant. Now, they can see the entire problem and gain true understanding. It’s nothing short of a revelation.
Want to see Lacework at work in your environment? Give it a try with our free trial and see for yourself. The first Polygraph is automatically created for you in two hours.