Current Mindset
It is hardly news that Amazon’s folks ship new AWS features so fast that nobody else in the cloud community can keep up, but the one called “Cloud HSM” is different in my view.
It’s different because integrating a traditional SafeNet Luna HSM with EC2 instances can add a level of assurance that will eventually change the traditional mindset.
As of today, the traditional mindset can be summed up as “we do not trust them”. We do not trust them not because we think they can’t implement security controls at the level we have in our internal data centers, but because we don’t really know much about their security policies, the processes around those policies, their access controls, what they could potentially do with our highly sensitive data, or who will be responsible if that data is leaked.
The other source of mistrust is control. No matter how good or bad our internal data center’s security controls are, we’re in charge: we know exactly how they work and we can make a change quickly if we need to. In a cloud we do not control anything, and that causes a lot of fear, uncertainty and doubt (FUD), which sometimes culminates in executives’ statements like “I don’t want to lose my job”.
Possible Mitigation
I personally would not blame anyone for this mindset. It’s normal, and it’s very common for humans to try to avoid FUD by all possible means.
On the other hand, the business requires the agility that the cloud definitely provides, and that requirement pushes us toward taking a bigger risk.
The biggest question here is how to make the risk manageable. The first obvious way to mitigate the “we do not trust them” paradigm is to encrypt everything in transit and at rest with cryptographic keys that are not accessible by “them”.
That is easier said than done, though, because it’s not clear how to make the keys inaccessible by “them”. You can use an in-house HSM of course, but you’ll need credentials of some sort to access it from a cloud, and those credentials will have to be stored somewhere in an untrusted environment.
Besides, doing encryption remotely on a network appliance hosted in your data center might not be such a good idea, considering the possible performance implications.
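To make the performance concern a bit more concrete, here is a back-of-the-envelope sketch. It assumes a single connection that waits for each reply before sending the next request; the function name and all the millisecond figures are made up for illustration and are not measurements of any real HSM or network.

```python
def max_sequential_ops_per_second(round_trip_ms: float, hsm_op_ms: float) -> float:
    """Upper bound on operations per second for one connection that
    waits for each reply before sending the next request."""
    return 1000.0 / (round_trip_ms + hsm_op_ms)


# Hypothetical figures, for illustration only (not measurements).
same_vpc = max_sequential_ops_per_second(round_trip_ms=1.0, hsm_op_ms=1.0)
remote_dc = max_sequential_ops_per_second(round_trip_ms=49.0, hsm_op_ms=1.0)

print(f"HSM next to the instance: up to {same_vpc:.0f} ops/s")
print(f"HSM in a remote data center: up to {remote_dc:.0f} ops/s")
```

With these assumed numbers the network round trip, not the HSM itself, dominates the cost of every call. Batching or parallel connections would soften the effect, but the extra latency per operation stays.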
Why Cloud HSM can be a Game Changer
The beauty of Cloud HSM is that “they” (the cloud folks) don’t have access to the objects stored in the HSM partitions; you’re the only one who creates them and has access to them. The only thing the cloud folks can do is delete your partition or re-initialize the appliance in some cases (e.g. you don’t pay the fees and don’t reply to the multiple warnings that have been sent to you).
That fact alone puts Cloud HSM into the category of trusted infrastructure that you can rely on to securely store the cryptographic keys and certificates that EC2 instances use to encrypt/decrypt data at rest and to protect it in transit.
High Availability
It’s clear that when a system has to be highly available, you can’t rely on a single HSM device to support mission-critical cryptography. Fortunately, the Cloud HSM solution provides a way to create a cluster of several HSM devices that looks like a single device (slot) from a client’s point of view.
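The idea of several devices hiding behind one slot can be sketched roughly as follows. This is only a toy model: in practice the grouping is handled by the HSM client software, and FakeHsm, HaSlot and HsmUnavailable are hypothetical names invented for this illustration, not part of any real API.

```python
class HsmUnavailable(Exception):
    """Raised when a member (or the whole group) cannot serve a request."""


class FakeHsm:
    """Stand-in for one HSM appliance; purely hypothetical."""

    def __init__(self, name, online=True):
        self.name = name
        self.online = online

    def sign(self, data: bytes) -> str:
        if not self.online:
            raise HsmUnavailable(self.name)
        # A real device would return a signature; we return a marker string.
        return f"{self.name}:signed:{len(data)}"


class HaSlot:
    """Presents several members to the caller as one logical slot."""

    def __init__(self, members):
        self.members = list(members)

    def sign(self, data: bytes) -> str:
        for member in self.members:
            try:
                return member.sign(data)
            except HsmUnavailable:
                continue  # this member is down, try the next one
        raise HsmUnavailable("no HSM in the group is reachable")


slot = HaSlot([FakeHsm("hsm-a", online=False), FakeHsm("hsm-b")])
result = slot.sign(b"payload")
print(result)
```

The caller only ever talks to the slot, so losing one member does not change the client code. The sketch ignores key replication between members, which a real cluster has to take care of.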
Possible Usage Scenario and Architecture
It’s important to have good security controls around network-level access to a Cloud HSM: you don’t want to make the device accessible to the whole world. In this regard, using AWS security groups looks like a good idea.
Another way to achieve better security and HSM availability would be to create a proxy, which will be the only compute instance (or instances) that has access to the HSM, while all other clients would need to go through the proxy to reach the device.
The diagram below shows a possible Cloud HSM architecture that addresses security and high availability of the device(s) across different AWS zones.
Security zone red (SG-RED on the diagram) will implement networking rules (ACLs) that authorize access from the proxies only.
Security zone yellow (SG-YELLOW on the diagram) will allow connections from the business layer only.
Rules defined for security zone green (SG-GREEN on the diagram) will depend on the business logic that you build.
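The zone rules above can be written down as a small lookup table. This is a plain Python model of the rules, not AWS configuration, and it rests on an assumption about the diagram: the HSMs sit in SG-RED, the proxies in SG-YELLOW and the business layer in SG-GREEN. The names ALLOWED_SOURCES and is_allowed are hypothetical.

```python
# Which source zones may open connections to each destination zone.
# Assumption: HSMs live in SG-RED, proxies in SG-YELLOW,
# business-layer servers in SG-GREEN.
ALLOWED_SOURCES = {
    "SG-RED": {"SG-YELLOW"},    # HSMs accept traffic from the proxies only
    "SG-YELLOW": {"SG-GREEN"},  # proxies accept traffic from the business layer only
    # SG-GREEN is left out on purpose: its rules depend on your business logic.
}


def is_allowed(source: str, destination: str) -> bool:
    """Return True if the model permits source to connect to destination."""
    return source in ALLOWED_SOURCES.get(destination, set())


print(is_allowed("SG-YELLOW", "SG-RED"))  # proxy to HSM
print(is_allowed("SG-GREEN", "SG-RED"))   # business layer straight to HSM
```

The point of the layering is visible in the second call: a business-layer server has no direct path to the HSM and has to go through a proxy.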
All proxies on this diagram should be stateless, and they can implement an additional layer of authentication for servers running in the “business logic” layer.
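One possible shape for that additional authentication layer, which keeps the proxy stateless, is to have every request carry an HMAC tag that the proxy can verify on its own. The sketch below uses only Python’s standard hmac and hashlib modules; the server id, the secret, and the functions sign_request and proxy_accepts are hypothetical and only illustrate the idea.

```python
import hashlib
import hmac

# Hypothetical shared secrets, one per business-layer server.
SERVER_KEYS = {"app-1": b"example-secret-for-app-1"}


def sign_request(server_id: str, key: bytes, body: bytes) -> str:
    """Business-layer side: compute a tag over the server id and the body."""
    message = server_id.encode() + b"\n" + body
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def proxy_accepts(server_id: str, body: bytes, tag: str) -> bool:
    """Proxy side: verify the tag using only data carried by the request."""
    key = SERVER_KEYS.get(server_id)
    if key is None:
        return False  # unknown server
    expected = sign_request(server_id, key, body)
    return hmac.compare_digest(expected, tag)


body = b"encrypt:customer-record-42"
tag = sign_request("app-1", SERVER_KEYS["app-1"], body)
print(proxy_accepts("app-1", body, tag))
print(proxy_accepts("app-1", b"tampered", tag))
```

Because the proxy keeps no per-session data, any proxy instance in any zone can verify any request. The sketch has no replay protection; a timestamp or nonce inside the signed message would be a natural next step.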
Limitations, Challenges
As of today, Cloud HSM is available in VPC environments only, which is a good thing from a security point of view, but might not be very practical for those who want to access the device from a public cloud.
Not all AWS regions can be used to deploy Cloud HSM (I believe it is available only in N. Virginia and Ireland today).
The process of Cloud HSM provisioning is manual, so you can’t really script it with CloudFormation or other common deployment tools.
The process of device initialization, partition creation and mutual client/server registration is rather cumbersome and includes many manual steps, which in general contradicts the core cloud concept of automating everything related to the “infrastructure on demand” service.
All the limitations described above are resolvable, though, and I’ll try to describe my own experience with that in the next “Cloud HSM How To” blog.
Stay tuned!
See also Cloud HSM – Part2
One of the most significant considerations for architecting in the cloud is state. Some parts of your app will be inherently stateful. For example, your persistence tier is designed to be stateful, to permanently store your data. Your caching tier is also stateful, but is meant to hold ephemeral data such as information about a user’s session.