An incredible milestone in a journey that started two years ago with a simple question: Why does it take enterprises 6+ months to find intruders in their data centers?
It’s one of the most important IT questions of our day. The answer, of course, isn’t a simple one. So we followed Toyoda-san’s 5-why method to get to the real root cause:
1. Why is it hard to find intruders in a data center?
Sophisticated hackers easily hide their activities by lying low and making as few waves as possible.
2. Why is it easy to hide?
Most enterprises monitor their data centers by sampling network activity between VMs and/or logging a limited set of important events. A seasoned intruder simply avoids doing anything that creates a log entry or a security event. Or they erase incriminating log entries to cover their tracks. Typical security tools are on the outside looking in. From that vantage point the VM is a black box with just a handful of observable characteristics. The activities of an intruder inside the box are impossible to distinguish from those of an authorized application.
3. Why not collect all network/process activity from inside the VM?
The volume of data collected would be staggering. And since typical log files weren’t really designed for security, that big dataset might not have the information you need. But even if you could get all this data, and even if it had everything you need, there are no tools to work with it. Analyzing this much information to spot an intruder would be all but impossible.
4. Why would it be so hard to analyze the data to spot an intruder?
In theory, spotting intruders is a straightforward process: you simply look for something amiss. A broken window or an open door at your house, for example, is a pretty reliable indication that you’ve been robbed. But virtual data centers are way more complex than physical buildings. Even in the smallest data centers, there are thousands of operations happening every second! Machines die, workloads auto-balance, services fail over, processes start and end within milliseconds, connections appear and disappear, and on and on. You need a baseline of normal data center operations to spot broken windows and open doors, but establishing that baseline is a huge challenge.
5. Why is it hard to define a baseline for a data center?
To define a baseline, you’d need tens of thousands of rules and policies for the millions of things in a data center (like IP addresses, processes, users, binaries, containers, etc.). Even if you could write them, upgrades, new services, or just the normal evolution of the data center would constantly break existing rules and require new ones.
Therefore we reasoned that an effective breach detection solution must:
- Leave nowhere for intruders to hide in the data center. This implies our solution has to understand everything going on inside the VM — all the network activity, all the process activity, and all the relationships — in real time and at a reasonable cost. Oh, and it would be nice if the solution were scalable, immutable, and elastic. We’re all cloud people, aren’t we?
- Automatically learn and maintain an architectural baseline for each of the logical entities in the data center, and then use that baseline to spot any “broken windows.” We can easily understand millions of processes and their relationships if we can create a lacework of their behavior. A user should not have to enter rules or policies, or figure out which logs to collect, in order to define what is expected in a data center.
Thus we arrived at our mission. In the physical world, your eye can easily spot even the smallest deviation in an intricate lacework pattern, regardless of its complexity or thread count. We do the same for data centers: we spot deviations in data center patterns to reliably find intruders far more quickly than ever before.
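To make the baseline idea a little more concrete, here is a deliberately tiny Python sketch. It is only an illustration under simplifying assumptions, not a description of the actual solution: the `BehaviorBaseline` class, the entity names, and the addresses are all hypothetical, and a real system would have to cope with noise, scale, and a baseline that changes over time. The point is simply that the "rules" are learned from observed behavior rather than typed in by a user.

```python
class BehaviorBaseline:
    """Toy baseline: remembers the (process, destination) pairs seen per entity.

    Hypothetical illustration only; names and structure are assumptions.
    """

    def __init__(self):
        # entity name -> set of (process, destination) pairs observed so far
        self._seen = {}

    def learn(self, entity, process, destination):
        """Record one observed behavior as part of the entity's normal pattern."""
        self._seen.setdefault(entity, set()).add((process, destination))

    def is_deviation(self, entity, process, destination):
        """Return True if this behavior was never observed for the entity."""
        return (process, destination) not in self._seen.get(entity, set())


# Learn from a few observed (entity, process, destination) events.
baseline = BehaviorBaseline()
for event in [
    ("web-tier", "nginx", "app-tier:8080"),
    ("app-tier", "java", "db-tier:5432"),
]:
    baseline.learn(*event)

# A behavior already in the pattern is not flagged...
known = baseline.is_deviation("web-tier", "nginx", "app-tier:8080")
# ...while a never-before-seen process and destination stands out.
novel = baseline.is_deviation("web-tier", "bash", "203.0.113.7:4444")
print(known, novel)
```

Nobody wrote a policy saying that the web tier may talk to the app tier on port 8080; the pattern came from observation, and the unfamiliar shell connection is the "broken window."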
A tall order? More to follow on our solution…