I have learned a lot in the past few years about running and securing public cloud infrastructure and thought I would share some areas that I believe are important. This SlideShare presentation is meant to be a self-read narrative of 13 things to think about when securing AWS and moving towards least-privilege systems. Enjoy, and please comment with your opinions and suggestions. I have also posted a long-form version of this SlideShare below if you prefer to read rather than click through slides.
A Baker’s Dozen to Securing AWS
This blog is a high-level, broad overview of some key aspects to think about when deploying in AWS. This is not meant to be a technical document with step-by-step instructions on how to perform security in AWS, but rather some guiding principles that I believe are important. In the future, however, we do plan on diving into some of these in technical detail.
The main reason I decided to post this was that I continue to see security personnel thinking about AWS security in traditional ways: firewalls, AV, IDS/IPS, and so on. Believe me, this is very different.
The great thing about public cloud utilities is that they run all the infrastructure to allow you to focus on what really matters, and that is the applications you are running. With that you need to understand the applications and services at a level that allows you to design your security.
Modern security architectures are less about the castle and moat and more about: automation, scale, visibility, and context. Most important is shrinking your attack surface, minimizing mistakes, and fitting security INTO your infrastructure not IN FRONT of it.
So, where do you start?
The starting point for all of this is building least-privilege systems and architectures. Even if you are not there today, as you may be migrating some traditional on-premises systems, it should be your goal to get there and start the journey. It is certainly easier said than done, but it’s a destination you want to drive towards. And if you happen to have the luxury of starting over or starting from zero, then least-privilege is where you should begin.
This includes items such as:
- Templatized workload configuration tools: Terraform, AWS CloudFormation
- Orchestration systems: Kubernetes, Swarm, Mesos
- Container technologies: Docker, Linux Containers, ECS
- Your operating system: CoreOS, Ubuntu, Red Hat, CentOS
Within AWS there are many things you need to think about differently. But in particular, here is a list of items we are going to dive into:
- Security for your AWS accounts
- Security for all the AWS services you are using
- Security for all your AWS and non-AWS APIs
- Security for your applications
- Security for your users
- Compliance for all of the above
Item 1: Security for your AWS Accounts
Designing your AWS accounts properly is very important. All too often we see poorly designed accounts, sprawl, and over-provisioning on accounts. It can become difficult to unwind these poorly designed AWS accounts – matching accounts and responsibilities is a good place to start.
Although there may be a reason to have a lot of accounts, make sure that it’s a good reason. Having to manage lots of accounts adds to the complexity and increases the attack surface. The counter to this is that some organizations use many accounts to minimize the attack surface, assuming that if one account gets compromised then the surface may be smaller. While this can be true, it rapidly becomes a management overhead that you should be aware of.
AWS Organizations should be used for your accounts, and all console logins should be protected by multi-factor authentication. Furthermore, instance roles are a good fit for services: roles manage ephemeral keys internally, so a lost key is only good until its expiry date.
Item 2: AWS CloudTrail
AWS CloudTrail is a must for all organizations that are deploying in AWS. It should not only be on for all accounts but should also be checked continually to make sure it has not been turned off. CloudTrail can be quite noisy, as it’s essentially a ledger of all activity within your account. All account logins and API calls are recorded in CloudTrail, and it’s important that you have a way to query the data. Simply storing it in an S3 bucket is not good enough: it needs to be queryable and have some added context, and you need the ability to find the needles in the haystack that are important to you from a threat perspective.
Understanding relevant change in your infrastructure is important for security: change in configurations, services, API usage, and user patterns.
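To make the "needles in the haystack" idea concrete, here is a minimal sketch in Python. It assumes you have already loaded CloudTrail records as dictionaries (for example, from the JSON files CloudTrail delivers to S3). The field names and the `StopLogging`, `DeleteTrail`, `UpdateTrail`, and `ConsoleLogin` event names come from CloudTrail itself; the sample records, the function name `find_needles`, and the choice of which events matter are illustrative assumptions, not a complete detection rule set.

```python
# Event names that disable or alter CloudTrail itself.
TRAIL_TAMPERING_EVENTS = {"StopLogging", "DeleteTrail", "UpdateTrail"}

# Hypothetical sample records, trimmed to the fields this sketch reads.
SAMPLE_RECORDS = [
    {
        "eventTime": "2018-03-01T23:55:00Z",
        "eventSource": "cloudtrail.amazonaws.com",
        "eventName": "StopLogging",
        "sourceIPAddress": "203.0.113.10",
    },
    {
        "eventTime": "2018-03-01T23:56:00Z",
        "eventSource": "signin.amazonaws.com",
        "eventName": "ConsoleLogin",
        "responseElements": {"ConsoleLogin": "Success"},
        "additionalEventData": {"MFAUsed": "No"},
        "sourceIPAddress": "203.0.113.10",
    },
    {
        "eventTime": "2018-03-01T23:57:00Z",
        "eventSource": "ec2.amazonaws.com",
        "eventName": "DescribeInstances",
        "responseElements": None,
        "sourceIPAddress": "198.51.100.4",
    },
]


def find_needles(records):
    """Return (reason, record) pairs for events that deserve a closer look."""
    findings = []
    for rec in records:
        name = rec.get("eventName")
        if (rec.get("eventSource") == "cloudtrail.amazonaws.com"
                and name in TRAIL_TAMPERING_EVENTS):
            # Someone stopped, deleted, or reconfigured the audit trail.
            findings.append(("trail-tampering", rec))
        elif name == "ConsoleLogin":
            # responseElements can be null in CloudTrail, hence the "or {}".
            succeeded = (rec.get("responseElements") or {}).get("ConsoleLogin") == "Success"
            mfa_used = (rec.get("additionalEventData") or {}).get("MFAUsed") == "Yes"
            if succeeded and not mfa_used:
                findings.append(("console-login-without-mfa", rec))
    return findings


if __name__ == "__main__":
    for reason, rec in find_needles(SAMPLE_RECORDS):
        print(reason, rec["eventTime"], rec["sourceIPAddress"])
```

In practice you would feed this kind of filter from a queryable store rather than raw files, and add context such as who normally makes these calls.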
Item 3: Secure your Services
AWS provides a lot of services. Well over a hundred of them. Some of the more popular ones are EC2, S3, RDS, KMS, Redshift, and ECS. When designing your security for these, you need to understand what each service is being used for and why. However, I recommend you don’t boil the ocean and try to understand ALL the AWS services. For now, just focus on the ones you are deploying and keep an eye on new ones developers may be deploying or testing.
Every service in AWS has its own attack surface. This blog could not possibly cover all the services and their risks, but we will be highlighting some in more detail in future blogs.
Although there are attack scenarios happening today in AWS against services such as EC2 and S3, it’s also important to think about the risks associated with other services such as Lambda and RDS.
Item 4: Compliance
Later in the blog there is an item around testing, and one could think of compliance as a test. That said, I thought it was important enough to have its own section. Compliance is a key building block of AWS security and a great place to start once you have CloudTrail set up. All of your accounts and service configurations need to be checked.
This is NOT your annual compliance audit you have done in the past. This is continual checking all the time, every time.
We recommend starting with the CIS Benchmark for AWS and then expanding into your relevant areas, such as PCI, SOC 2, HIPAA, and ISO.
You should not only get daily reports of your audit checks but also be alerted in near real-time when something goes from compliant to non-compliant.
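The "compliant to non-compliant" alert is essentially a diff between two snapshots of check results. The sketch below assumes each snapshot is a dictionary mapping a check ID to a pass/fail boolean. The check IDs are hypothetical placeholders, not real CIS Benchmark identifiers, and how you produce the snapshots is left to whatever tool you use.

```python
def newly_non_compliant(previous, current):
    """Return check IDs that passed in `previous` and fail in `current`.

    A check that is brand new and failing is not treated as a transition
    here; the assumption is that the daily report covers it.
    """
    return sorted(
        check_id
        for check_id, passed in current.items()
        if not passed and previous.get(check_id, False)
    )


# Hypothetical check IDs and results from two consecutive runs.
PREVIOUS_RUN = {
    "root-mfa-enabled": True,
    "cloudtrail-enabled": True,
    "s3-no-public-read": False,
}
CURRENT_RUN = {
    "root-mfa-enabled": True,
    "cloudtrail-enabled": False,
    "s3-no-public-read": False,
}

if __name__ == "__main__":
    for check_id in newly_non_compliant(PREVIOUS_RUN, CURRENT_RUN):
        print("ALERT: now non-compliant:", check_id)
```

Note that `s3-no-public-read` does not alert because it was already failing; only state changes trigger the near real-time path.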
Item 5: Secure the Network
Of all the items in security that differ between traditional data centers and public cloud, the network is probably the most different.
Like the bare metal and host operating system you are running your applications on, the network is not owned or controlled by you. It is 100% virtual.
That said, you can, and should, still control what enters and exits your virtual network, both to and from the perimeter. You should default to no packets coming in or out of the network and only allow what is necessary, with justification. Additionally, you should have an understanding of the internal traffic between your own machines. AWS VPC Flow Logs are one way to do this. But remember, some traditional security items that you may be used to don’t exist. Things like securing console access to your router and firmware upgrades are not your issue.
One big difference here is that the network is not nearly as relevant as it is in traditional data centers. Even though it may be static in your environment, it’s likely that the workloads and the system itself are very dynamic. Autoscaling, containers, orchestration systems, and ephemeral workloads limit the relevance of IP addresses.
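As a rough illustration of what VPC Flow Logs give you, the following Python sketch parses records in the default (version 2) space-separated format and totals bytes per source, destination, destination port, and action. The sample lines and function names are illustrative assumptions; custom flow log formats would need a different field list.

```python
from collections import Counter

# Field order of the default (version 2) VPC Flow Logs format.
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()

# Illustrative sample lines in the default format.
SAMPLE_LINES = [
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK",
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 20642 22 6 5 1000 1418530070 1418530130 ACCEPT OK",
    "2 123456789010 eni-1235b8ca123456789 172.31.9.69 172.31.9.12 49761 3389 6 20 4249 1418530010 1418530070 REJECT OK",
    "2 123456789010 eni-1235b8ca123456789 - - - - - - - 1431280876 1431280934 - NODATA",
]


def parse_flow_line(line):
    """Parse one flow log line; return None for malformed or no-data records."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    record = dict(zip(FIELDS, parts))
    # NODATA and SKIPDATA records carry "-" placeholders instead of traffic.
    if record["log_status"] != "OK":
        return None
    return record


def bytes_by_flow(lines):
    """Total bytes per (srcaddr, dstaddr, dstport, action)."""
    totals = Counter()
    for line in lines:
        record = parse_flow_line(line)
        if record is None:
            continue
        key = (record["srcaddr"], record["dstaddr"],
               int(record["dstport"]), record["action"])
        totals[key] += int(record["bytes"])
    return totals


if __name__ == "__main__":
    for flow, total in bytes_by_flow(SAMPLE_LINES).most_common():
        print(flow, total)
```

A summary like this makes both unexpected REJECT traffic and unusually large ACCEPT flows easy to spot.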
All that stated, your VPC config changes should be monitored just like everything else.
Item 6: Secure the Applications
The applications that you have written and deployed are really the center of everything in public cloud, and it’s here where you need to really focus. Focus not only on the applications that you have written but also on applications that someone else may have installed (read: malware).
It’s important that security understands at a deep level the intended application topology and that all application and process behavior is logged. The ephemeral nature of workloads makes this critical as time is of the essence.
Items that are relevant to be aware of are:
- Which applications and services are my applications talking to?
- What is the purpose of the application’s communication?
- What is the typical behavior of my applications?
- Gaining insight into anomalies or behavior changes of applications.
- Understanding outliers in behaviors.
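One simple way to think about "typical behavior" is as a baseline of which application talks to which destination, with anything outside that baseline flagged for review. The Python sketch below assumes you already collect (application, destination) pairs from your logging; the application names, destinations, and function name are hypothetical, and a real system would need a richer model than a set lookup.

```python
from collections import Counter

# Hypothetical baseline of expected (application, destination) pairs.
BASELINE = {
    ("web", "api:8080"),
    ("api", "postgres:5432"),
}

# Hypothetical observations from the last collection window.
OBSERVED = [
    ("web", "api:8080"),
    ("api", "postgres:5432"),
    ("monerominer", "pool.example.net:3333"),
    ("api", "postgres:5432"),
    ("monerominer", "pool.example.net:3333"),
]


def unexpected_connections(baseline, observed):
    """Count observed (application, destination) pairs missing from the baseline."""
    return Counter(edge for edge in observed if edge not in baseline)


if __name__ == "__main__":
    for (app, dest), count in unexpected_connections(BASELINE, OBSERVED).items():
        print(f"{app} -> {dest} seen {count} time(s), not in baseline")
```

The value is less in the code than in having an explicit, reviewable statement of the intended application topology to compare against.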
When you are logging all this data it can be overwhelming, so you need a way to make meaning of all the data.
Data only turns into information when it has value and can answer key questions.
Questions that need to be answered:
- Who ran this application and where did they come from?
- When was the application run?
- What did the application do?
- Where did it connect to and from?
Additionally, a key attribute of meaningful information is that it lets you ask more pointed questions with context. This is very important in DevOps, as it shrinks the mean time to response considerably, and you can either gain useful insight into the application or system or uncover a potential security risk.
“Hey Dan, I noticed at 11:55PM last Thursday you added US-East as a region, added 20 new GPU instances, and appear to have installed a new application called monerominer. I see more than 15GB of traffic going out of there. Did you mean to do this?”
Item 7: Secure the Users
One of the premises of least-privilege systems in AWS is to limit user logins. Here we are referring to user accounts on the actual machines, not AWS accounts: system users who usually connect via SSH. Orchestration systems and templatized workload configurations allow you to limit user logins, but sometimes they are needed.
Wherever you can, limit user logins to machines, especially high-privilege logins such as root.
Some guidance on users, if you are allowing logins, includes:
- No shared accounts. Each user should have their own login
- Log all successful and unsuccessful logins
- Use multi-factor authentication
- Set up a bastion host and a VPN server/service so you can limit source IP addresses
- Use IAM for authentication with OAuth or SAML with a central directory
- Avoid service account logins
- Change or replace standard OS accounts (ec2-user, ubuntu, core, etc.)
- Where possible limit users installing apps
- Use the orchestration and drive towards immutable images
Within your login systems you should be monitoring user behavior in terms of logging in: where, how often, lateral movement, and privilege escalation (sudo).
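As a small illustration of logging successful and unsuccessful logins, this Python sketch scans OpenSSH server log lines and tallies failures by source address. It assumes the common "Accepted ... for USER from IP" and "Failed ... for USER from IP" message shapes written by sshd; exact wording varies between versions and distributions, and the sample lines, hostnames, and function name are made up for the example.

```python
import re
from collections import Counter

# Matches typical sshd authentication messages; treat the pattern as an
# assumption to verify against your own logs.
SSHD_AUTH = re.compile(
    r"sshd\[\d+\]: (Accepted|Failed) (\S+) for (?:invalid user )?(\S+) "
    r"from (\S+) port \d+"
)

# Hypothetical sample log lines.
SAMPLE_LOG = [
    "Mar  3 23:55:01 ip-10-0-0-5 sshd[1234]: Failed password for invalid user admin from 203.0.113.7 port 40022 ssh2",
    "Mar  3 23:55:09 ip-10-0-0-5 sshd[1234]: Failed password for root from 203.0.113.7 port 40023 ssh2",
    "Mar  3 23:56:00 ip-10-0-0-5 sshd[1300]: Accepted publickey for alice from 198.51.100.4 port 51234 ssh2",
    "Mar  3 23:56:05 ip-10-0-0-5 CRON[1400]: pam_unix(cron:session): session opened for user root",
]


def summarize_logins(lines):
    """Return (failures per source IP, list of accepted (user, ip, method))."""
    failed_by_ip = Counter()
    accepted = []
    for line in lines:
        match = SSHD_AUTH.search(line)
        if match is None:
            continue  # not an sshd authentication message
        outcome, method, user, ip = match.groups()
        if outcome == "Failed":
            failed_by_ip[ip] += 1
        else:
            accepted.append((user, ip, method))
    return failed_by_ip, accepted


if __name__ == "__main__":
    failed, accepted = summarize_logins(SAMPLE_LOG)
    print("failed logins by source:", dict(failed))
    print("accepted logins:", accepted)
```

The same approach extends to sudo entries if you also want to track privilege escalation.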
Item 8: Secure the Data
Unfortunately all too often I hear from prospects that their data is not worth anything and they have nothing to worry about when it comes to data theft. The reality is that your data is almost always worth something to someone other than you and attackers will find unique ways to use it.
Fortunately, encryption technologies have come a long way in recent years, and there is little reason not to encrypt everything you can in your data stores. Your keys, of course, are critical here and should be rotated. Additionally, you should investigate key vault technologies and services to automate this.
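If you are not yet using a managed service for rotation, even a simple age check helps. The Python sketch below flags keys older than a maximum age. The 90-day threshold, the key inventory structure, and the key IDs are all assumptions for illustration; choose a rotation window that matches your own policy.

```python
from datetime import datetime, timedelta, timezone


def keys_due_for_rotation(keys, max_age_days=90, now=None):
    """Return the IDs of keys created more than `max_age_days` ago.

    `keys` is a list of dicts with "id" and a timezone-aware "created" datetime.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [key["id"] for key in keys if key["created"] < cutoff]


# Hypothetical key inventory.
KEY_INVENTORY = [
    {"id": "key-old", "created": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": "key-new", "created": datetime(2024, 5, 1, tzinfo=timezone.utc)},
]

if __name__ == "__main__":
    fixed_now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    print(keys_due_for_rotation(KEY_INVENTORY, now=fixed_now))
```

Passing `now` explicitly keeps the check deterministic, which makes it easy to run as part of the continual compliance checks described earlier.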
Item 9: The People
People, in this item, are not your end-users but your developers and security personnel. A new term has been created for people who handle security in modern cloud architectures: “DevSecOps”. While I am not sure we need a new word for this, at least it’s descriptive! For me, it’s really just about strong communication and a strong relationship between developers and security engineering. This needs to be two-way, and automation and strong technology can certainly help bridge the two. Tools such as Slack are excellent for this. Real-time alerting and sharing of information over secure channels can really help move things forward and provide a medium for both asking questions and getting answers. I am a fan of logging all critical events via another technology such as PagerDuty, and all alerts in Slack, then having a discussion channel if questions arise.
When triaging security events, having a good relationship between security and development along with strong signals and high fidelity context makes a big difference.
Email still works as well. It’s a fine medium for sending alerts that are not as time-sensitive.
Item 10: Some Best Practices
One best practice I have found is not so much a practice as a philosophy that needs embedding in the culture: security is not a point-in-time technology or process. There is no time continuum; it should just always be happening. It does not stop or start; it’s continual and is part of the system.
And as with any system, it needs testing. This starts with penetration and vulnerability testing. You often hear the horror stories of this, but they are not nearly as scary as they are made out to be. While you can build or buy here, most regulations require third-party validation. I suggest that you also do this on your own in addition to the third party. Running war-gaming exercises, thinking evil, performing hack-a-thons, and bug hunting are all good ways to do this.
Things to think about:
- What you would do if you had privileged unauthorized access
- Data exfiltration and data destruction
- Public disclosures
- Inadvertent configuration mistakes and compliance failures
- Low-level bugs out of your control and vendor vulnerabilities
With that, you need to be prepared for recovery and for the time when a customer, prospect, or regulator asks for specifics about how you secure your infrastructure. It’s best to be prepared with a good answer before they ask, as it’s not if, it’s when.
Item 11: Application Security Bugs
Nobody can write perfect code all of the time, but the good companies get very good at testing their code before, during, and after release. Also, if someone else finds a bug in your application or infrastructure, I recommend running a complete responsible disclosure process.
You should have a page on your website that outlines how you handle disclosures and the process involved. This is often /security on your main website. Additionally, you should add security@yourdomain for reporting parties.
I have found that being friendly, courteous, and responding to bug hunters in a timely manner is critical. You do not need to be held hostage to them, however, in many ways, they are doing you a service. They can save you time, reputational damage, and money. A bug bounty program is not mandatory but it’s a great step. It can also be run internally or through a service. You should also investigate rewarding developers who find security bugs, write consistently secure code, and help educate the culture on secure coding practices.
Item 12: Have Fun and Don’t be a Curmudgeon
Information security is hard in so many ways. You are almost always defending, behind the curve, and often called into bad scenarios. That said, I feel like times are changing as developers and security personnel work closer together. Helping systems get built, pushing code out faster, and building resilient systems have changed the equation, so that great security can be seen as a key competitive differentiator.
With this shift in modern architectures we should feel privileged to be building the new security stack. We can learn from the past mistakes to determine the new future.
Item 13: The Baker’s Dozen
A baker would be criminal to only deliver you 12 items, so there is one more item, and that is something we want from you. The developer community pushing applications into public cloud at scale is small. The security community is also small. For those of you who are building and securing these modern architectures, share your experiences. Where can we do better? What gotchas are there? What items did I miss in this post?
Pay forward your experiences in securing the future.