

Cloud HSM – Part 3: Passing Secrets

As mentioned in “Cloud HSM – Part 2”, to start using an HSM deployed to a cloud, you’ll need to find a way of passing credentials such as a partition-level password (or PIN) and a client-side certificate to all of the HSM’s clients.

Passing these kinds of secrets using common AWS mechanisms like S3 buckets, user data, or baking them into AMIs might not be such a good idea if you consider the common “we do not trust them” mentality described in “Cloud HSM – Part 1”.

General Approach

A VM in any environment, including AWS, has a number of attributes (or metadata), some of which are unique to that VM. These attributes can be used to verify an instance that makes a claim by sending an assertion to a verifying party (VP).

In static VM environments, the attributes can be used to create a VM fingerprint and then compare that fingerprint against a whitelist of all trusted fingerprints stored on the VP.

Creating such a whitelist for all instances running in an auto-scaling group (ASG) doesn’t seem possible though, due to the volatile nature of some attributes, such as internal/external host names and instance IDs. A way around this is to make the instance verification dynamic. Fortunately, the AWS API has all the functions necessary to implement dynamic, credential-less EC2 instance validation, and the next section provides the details.

Credential-less Dynamic EC2 Instance Verification

In this approach, an EC2 client collects instance parameters that are available both through the metadata interface on the client, which can be considered an asserting party (AP), and through the AWS API on the server, which can be considered a relying party (RP). One more parameter that can be accessed on both the AP and the RP sides is the internal or external IP address; it is not exactly metadata, but it can be an essential part of the instance verification process.

I’m intentionally using SAML terminology here, since the authentication and authorization scenarios described below look very similar to those of SAML.

Here are the parameters that an asserting party can fetch through the metadata interface, e.g. through http://169.254.169.254/latest/meta-data (see the sketch after the list):

  • instance-id
  • public-hostname
  • local-hostname
  • reservation-id
  • instance-type
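
As an illustration, here is a minimal sketch of how an AP could collect these values; the class and method names are mine, and any HTTP client would do:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class MetadataReader {

    private static final String BASE = "http://169.254.169.254/latest/meta-data/";

    // Fetches a single metadata value, e.g. get("instance-id").
    public static String get(String key) throws Exception {
        URL url = new URL(BASE + key);
        try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
            return in.readLine();
        }
    }
}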

Their counterparts can also be retrieved through the AWS API (e.g. see the ec2.instance.Instance object in Python’s boto library) on the RP side; a sketch that compares the two sets follows the list:

  • id
  • public_dns_name
  • private_dns_name
  • instance_type
  • reservation_id
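
Putting the two lists together, a hedged sketch of the RP-side check might look like this (the Assertion holder and its field names are mine; the SDK calls are from the AWS SDK for Java v1):

import java.util.List;

import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.ec2.AmazonEC2Client;
import com.amazonaws.services.ec2.model.DescribeInstancesRequest;
import com.amazonaws.services.ec2.model.DescribeInstancesResult;
import com.amazonaws.services.ec2.model.Instance;
import com.amazonaws.services.ec2.model.Reservation;

public class InstanceVerifier {

    // A simple holder for the values the AP sends in its assertion.
    public static class Assertion {
        public String instanceId, publicHostname, localHostname,
                      reservationId, instanceType;
    }

    // Compares the claimed values against what the AWS API reports
    // for the instance id named in the assertion.
    public static boolean verify(Assertion a, String appId, String appSecret) {
        AmazonEC2Client ec2 = new AmazonEC2Client(new BasicAWSCredentials(appId, appSecret));
        DescribeInstancesResult res = ec2.describeInstances(
                new DescribeInstancesRequest().withInstanceIds(a.instanceId));
        List<Reservation> rl = res.getReservations();
        if (rl.isEmpty() || rl.get(0).getInstances().isEmpty())
            return false;

        Reservation r = rl.get(0);
        Instance ins = r.getInstances().get(0);
        return ins.getPublicDnsName().equals(a.publicHostname)
            && ins.getPrivateDnsName().equals(a.localHostname)
            && ins.getInstanceType().equals(a.instanceType)
            && r.getReservationId().equals(a.reservationId);
    }
}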

The IP address can be obtained through the socket API on the AP side; on the RP side it can be obtained through APIs like getRemoteAddr in the JEE API, or retrieved from the ‘X-Forwarded-For’ HTTP header where the RP’s servers are running behind proxies or load balancers.
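
For completeness, a small sketch of that RP-side IP lookup (JEE servlet API; the helper name is mine):

import javax.servlet.http.HttpServletRequest;

public class IpResolver {

    // Behind a proxy or load balancer, X-Forwarded-For carries the original
    // client IP; its first entry is the originating client.
    public static String clientIp(HttpServletRequest request) {
        String xff = request.getHeader("X-Forwarded-For");
        if (xff != null && !xff.isEmpty())
            return xff.split(",")[0].trim();
        return request.getRemoteAddr();
    }
}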


Authorization

If your RP serves secrets to many different APs, you might need to consider implementing additional authorization controls that would not allow a single AP to get access to all possible secrets; e.g. you might want a database server to have access to database credentials only, while your web server might need access only to the private key used to terminate SSL.

Using AWS roles in these cases looks like a good way to implement a traditional RBAC approach. You’ll basically need to assign the same role to all ASG instances using the IAM console, CloudFormation, or other similar tools provided by AWS. The role name would then be submitted in the assertion created by the AP and verified by the RP using code like the following; basically, the RP needs to check whether the instance that submitted the request belongs to the role provided in the assertion:

import java.util.List;

import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.ec2.model.IamInstanceProfile;
import com.amazonaws.services.ec2.model.Instance;
import com.amazonaws.services.identitymanagement.AmazonIdentityManagementClient;
import com.amazonaws.services.identitymanagement.model.InstanceProfile;
import com.amazonaws.services.identitymanagement.model.ListInstanceProfilesForRoleRequest;
import com.amazonaws.services.identitymanagement.model.ListInstanceProfilesForRoleResult;

public boolean inRole(Instance ins, String role) {
    // An instance with no IAM instance profile cannot belong to any role.
    IamInstanceProfile prof = ins.getIamInstanceProfile();
    if (prof == null)
        return false;
    String arn = prof.getArn();

    // List all instance profiles associated with the given role...
    ListInstanceProfilesForRoleRequest req = new ListInstanceProfilesForRoleRequest();
    req.setRoleName(role);

    // appId/appSecret are the RP's own AWS credentials (defined elsewhere).
    AmazonIdentityManagementClient iam =
            new AmazonIdentityManagementClient(new BasicAWSCredentials(appId, appSecret));
    ListInstanceProfilesForRoleResult res = iam.listInstanceProfilesForRole(req);

    // ...and check whether the instance's profile ARN is among them.
    for (InstanceProfile ip : res.getInstanceProfiles()) {
        if (ip.getArn().equals(arn))
            return true;
    }
    return false;
}


Security Considerations and Additional Controls

There is always a question of how reliable and secure the suggested authentication and authorization methods are. Parameters like ‘instance-type’ and ‘reservation-id’ can be considered semi-static and don’t add much entropy to the verified parameters. IP ranges can be known in advance, which inevitably decreases the entropy as well.

To mitigate these risks, RP implementers can consider reducing the time window during which communication between the AP and the RP is allowed; e.g. an RP can check an instance’s launch time through the AWS API and deny access if the instance has been running for too long:
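
A minimal sketch of such a check; the method name and threshold are mine, and Instance comes from the AWS SDK for Java:

import com.amazonaws.services.ec2.model.Instance;

public class LaunchTimeCheck {

    // Deny access if the instance has been up longer than maxAgeMs.
    public static boolean freshEnough(Instance ins, long maxAgeMs) {
        long age = System.currentTimeMillis() - ins.getLaunchTime().getTime();
        return age <= maxAgeMs;
    }
}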


Another good mitigating control could be implementing a “one-time-use” policy, meaning that an AP can get secrets from the RP only once. This can be implemented by maintaining a list of hashes of AP parameters and denying access to those instances whose parameter hashes are already in the list:
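
A sketch of that policy, assuming a single RP process and an in-memory list (a real deployment would persist the hashes; all names are mine):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class OneTimeUsePolicy {

    // Hashes of the AP parameter sets that have already been served.
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    // Returns true only the first time a given parameter set is presented.
    public boolean firstUse(String apParams) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] h = md.digest(apParams.getBytes(StandardCharsets.UTF_8));
        return seen.add(Base64.getEncoder().encodeToString(h));
    }
}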

Cloud HSM Automation – Part 2

Why would you need HA

While the previous Cloud HSM article mostly covered “Why Cloud HSM is Important” topics, this one describes the technical details of how a Luna HA (High Availability) cluster can be built in a cloud and provides links to scripts that can help automate and codify Luna’s rather complicated setup process.

It’s obvious that when you build an HA system whose subsystems rely heavily on cryptographic services built around Luna, you need to put the latter into the same HA category. That’s why a single Luna appliance is usually not sufficient, and you need an array (or cluster) of HSMs that looks and behaves as a single one from a client’s point of view.

The first few chapters of this document cover manual Luna and Luna array configuration topics, while the “Setup Automation” section describes a command line tool that allows automating the whole process by creating a JSON configuration file and running a Python script. You do need to go through the manual setup topics if you don’t have prior experience with configuring Luna; otherwise it could be very difficult to understand how to create the JSON file and to troubleshoot possible issues.

Provisioning

Unfortunately, there is no way to deploy a Luna device to a cloud in the same automated manner (e.g. through CloudFormation) as other pieces of AWS infrastructure. There is a well documented manual process for that, which is not very difficult to follow, but it’s still manual. The process is described here.

Two important things are worth mentioning: you’ll need a VPC to deploy a Luna appliance, and you’ll need at least two devices to create an HA array.

After the appliances are provisioned into a VPC, you’ll be given the manager passwords that can be used to connect to the devices remotely and perform all the necessary configuration jobs.

Configuring Luna Servers

You’ll need to log in to a Luna device over SSH from a client machine to perform the configuration. I would strongly recommend enabling key-based authentication on the Luna device, because you’ll most likely need to SSH to the device many times before you’re done with configuration and verification:

scp <public-cert-file-name> manager@<luna-ip>:.

ssh manager@<luna-ip>

sysc ssh pu e

sysc ssh pu a -f <public-cert-file-name>

where <public-cert-file-name> is the public key file generated by the ‘ssh-keygen’ command on the client machine.

The following high-level manual steps are required to configure a Luna server:

  1. ‘hsm init’ command to initialize the device
  2. ‘sysconf re’ command to regenerate server side certificates
  3. ‘ntls bind’ command to restart Luna’s network interfaces
  4. ‘hsm login’ to log in to Luna as admin
  5. ‘par cr …’ to create a partition
  6. ‘c reg …’ to register a client
  7. ‘c a …’ to assign a partition to a client

The Luna configuration process is described in detail here.

Configuring Luna Clients

This one is interesting and requires some rethinking because of the differences introduced by AWS auto-scaling groups (ASG). In a traditional ‘static’ environment, each client would require a unique client certificate and an IP (or client host name) to be registered on a Luna server. Since EC2 instances can go up and down at random in an ASG, that approach would be difficult to implement. Fortunately, there is a workaround that allows sharing a single client certificate across the whole ASG.

The first step in configuring clients is to download and install Luna’s client tools and libraries, which are available for free.

After these components are installed, the ‘vtl’ command line tool used for client setup can be found at the following location on Linux: /usr/lunasa/bin/vtl. A new client certificate and private key can be generated by running the following command:

vtl createCert -n <cert_name>

The newly generated certificate will be stored at ‘/usr/lunasa/cert/client/<cert_name>.pem’ and will need to be transferred to the Luna server for registration:

scp /usr/lunasa/cert/client/<cert_name>.pem manager@<luna_server>:.

A trick that allows registering the whole ASG without binding the registration to an IP is to use <cert-name> as the parameter of the ‘-hostname’ option and not to use the ‘-ip’ option at all. It’s not obvious, but it definitely works. The command on the server will look like this:

c reg -c <client-name> -h <cert-name>

where <client-name> is a logical name that the server will use to refer to the new client, and <cert-name> is the same cert that we’ve just created on the client using the ‘vtl’ command.

To replicate the generated certificate and private key to other ASG members, you’ll need to place the generated files in /usr/lunasa/cert/client/ and make sure that you have the two following entries in the ‘LunaSA Client’ section of the ‘/etc/Chrystoki.conf’ file:

ClientPrivKeyFile = /usr/lunasa/cert/client/<cert-name>Key.pem;

ClientCertFile = /usr/lunasa/cert/client/<cert-name>.pem;

You’ll also need to register a Luna server on the client to be able to connect to that server:

scp manager@<luna-server-host>:server.pem .

vtl addServer -n <luna-server-host> -c server.pem

Configuring Luna HA Cluster

You’ll need at least two Luna servers configured as described in the “Configuring Luna Servers” section, with the following constraints:

  1. Admin and partition passwords should be the same.
  2. Partition names for the partitions participating in the HA cluster should be the same.
  3. Cloning domain names should be the same.

If the conditions above are met, registering the cluster should be easy:

vtl haAdmin -newGroup -serialNum <par-ser-nbr1> -label <group-name> -password <par-pwd>

vtl haAdmin -addMember -serialNum <par-ser-nbr2> -group <group-ser-num> -password <par-pwd>

where

<par-ser-nbr1> and <par-ser-nbr2> – serial numbers of the partitions included in the cluster

<group-name> – a logical name for the newly created cluster

<par-pwd> – the partition password

<group-ser-num> – the serial number of the newly created group (it will be displayed after the group is created by the ‘newGroup’ command).

To figure out what the <par-ser-nbr1> and <par-ser-nbr2> parameters are, you can run the following command on a client:

vtl verify

The output should look like the one below:

The following Luna SA Slots/Partitions were found:

Slot    Serial #    Label

====    ========    =====

1    <par-ser-nbr1>     <par name>

2    <par-ser-nbr2>     <par name>

Configuring Java Client

Luna’s client includes Java classes that implement the traditional JCA architecture. You’ll need to add LunaProvider.jar to the classpath and modify the java.security file, which can normally be found at the following location: <JRE-DIR>/lib/security/java.security. The section that needs to be updated looks as follows:

security.provider.1=sun.security.provider.Sun

security.provider.2=sun.security.rsa.SunRsaSign

security.provider.3=sun.security.ec.SunEC

security.provider.4=com.sun.net.ssl.internal.ssl.Provider

security.provider.5=com.sun.crypto.provider.SunJCE

security.provider.6=sun.security.jgss.SunProvider

security.provider.7=com.sun.security.sasl.Provider

security.provider.8=org.jcp.xml.dsig.internal.dom.XMLDSigRI

security.provider.9=sun.security.smartcardio.SunPCSC

To enable Luna Provider, add the following line to the list:

security.provider.10=com.safenetinc.luna.provider.LunaProvider
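
Alternatively, assuming LunaProvider.jar is already on the classpath, the provider can be registered programmatically at runtime instead of editing java.security (a minimal sketch):

import java.security.Security;
import com.safenetinc.luna.provider.LunaProvider;

// Appends the Luna JCA/JCE provider to the runtime provider list.
Security.addProvider(new LunaProvider());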

Testing HA Luna from a Java application

You can find many Java sample applications under the following location: /usr/lunasa/jsp/samples/com/safenetinc/luna/sample. Connecting to an HA cluster is no different from connecting to a single Luna device – you just need to know the correct slot number that represents the Luna array. The ‘vtl haAdmin -show’ command can be used to find out what the HA slot number is:

[ec2-user@ip-10-0-1-225 lunasa]$ /usr/lunasa/bin/vtl haAdmin -show

================ HA Group and Member Information ================

HA Group Label:  ha_group

HA Group Number:  <grp-ser-nbr>

HA Group Slot #:  <grp-slot-nbr>

Synchronization:  enabled

Group Members:  <par-ser-nbr1>, <par-ser-nbr2>

Standby members:  <none>

Slot #    Member S/N                      Member Label    Status

======    ==========                      ============    ======

1     <par-ser-nbr1>                          <par_name>     alive

2     <par-ser-nbr2>                           <par_name>     alive

The slot number that you’re looking for is <grp-slot-nbr>.

If you look at the Java sample found in the KeyStoreLunaDemo.java file, you’ll find the following lines:

ByteArrayInputStream is1 = new ByteArrayInputStream(("slot:1")
        .getBytes());

This is the place where you would need to use the slot number displayed by the ‘vtl haAdmin -show’ command.
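
For example, assuming the HA group slot number is 5, a sketch of pointing a Luna keystore at the HA slot could look like this (the ‘Luna’ keystore type is the one registered by LunaProvider, as in the KeyStoreLunaDemo sample; the password is the partition password):

import java.io.ByteArrayInputStream;
import java.security.KeyStore;

// Point the keystore at the HA group's virtual slot instead of a physical one.
ByteArrayInputStream is1 = new ByteArrayInputStream(("slot:5").getBytes());
KeyStore ks = KeyStore.getInstance("Luna");
ks.load(is1, "<par-pwd>".toCharArray());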

Setup Automation

As you’ve probably noticed already, the Luna HA setup process is rather cumbersome and doesn’t fit well with the major cloud concept that assumes a great deal of automation. I’ve tried to address the issue by creating a Python package that allows setting up a Luna HA cluster by running a single command:

luna_mech -a -g -r  <luna-array-config-file>

or, if you want to configure a single Luna appliance, you can run:

luna_mech -l -g -r  <luna-config-file>

The idea here is to put all the Luna parameters into a JSON file and have the Luna Mechanizer parse and interpret it. The next step could be to integrate the mechanizer with the CloudFormation framework.

The code can be found in a git repo @ sf.net and can be downloaded by usual means:

git clone http://git.code.sf.net/p/lunamech/code luna_mech

or

git clone git://git.code.sf.net/p/lunamech/code luna_mech

Check the README file for further instructions.

Security Considerations

Since a cloud environment is not commonly considered trusted, there is still the problem of passing secrets, such as the HA partition password and the client-side private key, to ASG members. You definitely don’t want to “bake” secrets like these into an AMI or even store them in an encrypted S3 bucket, let alone put them into unencrypted EC2 “user data”.

I’ll try to explore other AWS-specific ways of passing secrets from an internal DC to a cloud in the next blog.

Stay tuned!


Why Cloud HSM is Important – Part 1

Current Mindset

There is nothing new in the fact that Amazon’s folks bake new AWS features at such a speed that nobody else in the various cloud communities can catch up, but the one called “Cloud HSM” is different in my view.

The reason it’s different is that integrating a traditional SafeNet HSM, Luna, with EC2 instances can add a level of assurance that will eventually change the traditional mindset.

As of today, the traditional mindset can be summed up as “we do not trust them”. We do not trust them, not because we think they can’t implement security controls at the level we have in our internal data centers, but because we don’t really know much about their security policies, the processes around those policies, or their access controls, nor about what they can potentially do with our highly sensitive data and who will be responsible if that sensitive data is leaked.

The other factor of mistrust is that, no matter how good or bad our internal data center’s security controls are, we’re in charge: we know exactly how they work, and we can make a change quickly if we need to. In a cloud we don’t control anything, and that causes a lot of fear, uncertainty and doubt (FUD), which sometimes culminates in executive statements like “I don’t want to lose my job”.

Possible Mitigation 

I personally would not blame anyone for the described mindset; it’s normal, and it’s very common for humans to try to avoid FUD by all possible means.

On the other hand, business requires the agility that the cloud definitely provides, and that requirement pushes the envelope toward taking bigger risks.

The biggest question here is how to make the risk manageable. The first obvious suggestion for mitigating the “we do not trust them” paradigm is to encrypt everything in transit and at rest, with cryptographic keys that are not accessible to “them”.

That approach is easier said than implemented, though, because it’s not clear how to make the keys inaccessible to “them”. You can use an in-house HSM of course, but you’ll need credentials of some sort to access it from the cloud, and that will require storing them somewhere in an untrusted environment.

Besides, doing encryption remotely on a network appliance hosted in your data center might not be such a good idea, considering the possible performance implications.

Why Cloud HSM can be a Game Changer

The beauty of Cloud HSM is that “they” (the cloud folks) don’t have access to the objects stored in the HSM partitions; you’re the only one who creates them and has access to them. The only thing the cloud folks can do is delete your partition or, in some cases, re-initialize the appliance (e.g. if you don’t pay the fees and don’t reply to the multiple warnings that have been sent to you).

That fact alone puts Cloud HSM into the category of trusted infrastructure that you can rely on to securely store the cryptographic keys and certificates used by EC2 instances to encrypt/decrypt data at rest and to protect it in transit.

High Availability

It’s clear that in cases where a system’s high availability is required, you can’t rely on a single HSM device to support mission-critical cryptography. Fortunately, the Cloud HSM solution does provide a method of creating a cluster of several HSM devices that looks like a single device (slot) from a client’s point of view.

Possible Usage Scenario and Architecture

It’s important to have good security controls around network-level access to a cloud HSM – you don’t want to make the device accessible to the whole world. In this regard, using AWS security groups looks like a good idea.

Another approach to achieving better security and HSM availability would be to create a proxy, which would be the only compute instance (or instances) with access to the HSM, while all other clients would need to use the proxy to access the device.

The diagram below demonstrates a possible Cloud HSM architecture that solves for security and high availability of the device(s) across different AWS zones.

Security zone red (SG-RED on the diagram) will implement networking rules (ACLs) that authorize access from the proxies only.

Security zone yellow (SG-YELLOW on the diagram) will allow connections from the business layer only.

Rules defined for security zone green (SG-GREEN on the diagram) will depend on the business logic that you build.

All proxies on this diagram should be stateless and can implement an additional layer of authentication for the servers running in the “business logic” layer. A sketch of how the SG-RED rule could be codified follows below.
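
As an illustration, here is a sketch of codifying the SG-RED rule with the AWS SDK for Java; the group ids are hypothetical, and TCP 1792 is the NTLS port that Luna clients connect to:

import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.ec2.AmazonEC2Client;
import com.amazonaws.services.ec2.model.AuthorizeSecurityGroupIngressRequest;
import com.amazonaws.services.ec2.model.IpPermission;
import com.amazonaws.services.ec2.model.UserIdGroupPair;

public class SgRedSetup {

    // Allows NTLS traffic into SG-RED only from members of the proxies' group.
    public static void allowProxiesOnly(String appId, String appSecret) {
        AmazonEC2Client ec2 = new AmazonEC2Client(new BasicAWSCredentials(appId, appSecret));
        IpPermission ntls = new IpPermission()
                .withIpProtocol("tcp").withFromPort(1792).withToPort(1792)
                .withUserIdGroupPairs(new UserIdGroupPair().withGroupId("<sg-proxies-id>"));
        ec2.authorizeSecurityGroupIngress(new AuthorizeSecurityGroupIngressRequest()
                .withGroupId("<sg-red-id>").withIpPermissions(ntls));
    }
}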


Limitations, Challenges

As of today, Cloud HSM is available in VPC environments only, which is a good thing from a security point of view, but might not be very practical for those who want to access the device from the public cloud.

Not all AWS regions can be used to deploy Cloud HSM (I believe it is available in N. Virginia and Ireland only today).

The process of Cloud HSM provisioning is manual, so you can’t really script it using CloudFormation or other common deployment tools.

The process of device initialization, partition creation and mutual client/server registration is rather cumbersome and includes many manual steps, which in general contradicts the major cloud concept of automating everything related to the “infrastructure on demand” service.


All the limitations described above are resolvable, though, and I’ll try to describe my own experience with them in the next “Cloud HSM How To” blog.


Stay tuned!


See also Cloud HSM – Part 2