Encryption And Hashing Fundamentals
Having worked in numerous organisations where PII is vitally important, especially in the era of GDPR, I’ve seen first-hand many implementations of Encryption and Hashing.
Yet in this time, I’ve very often seen people use the two terms interchangeably, or incorrectly.
Therefore I aim here to demystify some of the terminology and share the essentials.
Personally Identifiable Information (PII)
Personally identifiable information (PII) is any data that can be used to identify a specific individual. National Insurance (NI) numbers, email addresses and phone numbers are the traditional examples. More recently, other technological elements such as IP addresses, reward card IDs, location and behavioural data can also be considered PII.
Encryption is a process that encodes a message or file so that it can only be read by certain people. An example is WhatsApp, where your data is encrypted on your phone, sent encoded over the internet, and decrypted by the phone receiving your message.
The idea is that at some point you will want to read that data in the clear again, but throughout its journey that should not be possible.
There are two typical flavours of Encryption – Symmetric and Asymmetric.
Commonly, companies use Asymmetric Encryption. This is where there are two separate Keys:
- There are the Public Keys, which are used to encrypt data; these can be shared fairly widely as there is little to no security risk in someone obtaining them.
- There are the Private Keys, which are used to decrypt data; these are a security risk. Anyone with these keys is able to decrypt data, which is why they must remain very secure.
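To make the split between the two keys concrete, here is a minimal sketch of textbook RSA in Python. The primes, exponent and function names are all hypothetical and far too small for real use; real systems rely on a vetted library and add random padding, which is also why their ciphertexts differ between runs. This toy version is deterministic and only shows that what the public key encrypts, only the private key can decrypt.

```python
# Toy RSA with tiny primes: for illustration only, never for real data.
p, q = 61, 53                  # hypothetical secret primes (far too small)
n = p * q                      # modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (needs Python 3.8+)

public_key = (e, n)            # can be shared widely
private_key = (d, n)           # must remain secret


def encrypt(message: str, key: tuple[int, int]) -> list[int]:
    """Encrypt each byte of the message with the public key."""
    exponent, modulus = key
    return [pow(byte, exponent, modulus) for byte in message.encode("utf-8")]


def decrypt(ciphertext: list[int], key: tuple[int, int]) -> str:
    """Recover the original message with the private key."""
    exponent, modulus = key
    return bytes(pow(c, exponent, modulus) for c in ciphertext).decode("utf-8")


ciphertext = encrypt("ABC", public_key)
recovered = decrypt(ciphertext, private_key)
print(ciphertext)   # unreadable numbers
print(recovered)    # the original text again
```

Anyone holding only `public_key` can produce `ciphertext`, but reading it back requires `private_key`.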
An important thing to note is that when you encrypt a value such as ‘ABC’, the encrypted value will typically be different each time, because most modern encryption schemes mix a random element into every encryption.
With Encryption, you aim to encrypt PII data where you may want to see it again in the clear, through a decryption service.
Hashing is the process of converting a message or file into a fixed-size string of text, meaningless to someone reading it, using a mathematical function.
The purpose of Hashing is very different to Encrypting. Hashing should be one way; there is no expectation that the resulting value will be converted back into its original form.
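As a small illustration of the fixed-size property, the sketch below hashes a short and a very long input with SHA-256 from Python's standard `hashlib` module. The choice of SHA-256 and the helper name `sha256_hex` are my own assumptions for the example; other algorithms behave the same way with different lengths.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


short_digest = sha256_hex("ABC")
long_digest = sha256_hex("A much longer message. " * 1000)

# Both digests are 64 hexadecimal characters, whatever the input size.
print(len(short_digest), len(long_digest))
```

There is no matching "unhash" function to call afterwards, which is the practical meaning of one way.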
An example may be where you want to join or compare datasets together to find out if more people purchase Cornish Pasties in Devon or London, however you have no need to see who those people are.
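The pasty comparison could be sketched roughly as follows. The datasets, field names and the `hash_id` helper are all made up for illustration, and plain SHA-256 is used only to keep the example short.

```python
import hashlib
from collections import Counter


def hash_id(email: str) -> str:
    """Replace an email address with a hash that can still be joined on."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


# Hypothetical source data, held by two different teams.
customers = [
    {"email": "amy@example.com", "region": "Devon"},
    {"email": "ben@example.com", "region": "London"},
    {"email": "cat@example.com", "region": "Devon"},
]
purchases = [
    {"email": "amy@example.com", "product": "Cornish Pasty"},
    {"email": "cat@example.com", "product": "Cornish Pasty"},
    {"email": "ben@example.com", "product": "Cornish Pasty"},
    {"email": "ben@example.com", "product": "Sausage Roll"},
    {"email": "amy@example.com", "product": "Cornish Pasty"},
]

# Each dataset is hashed before it is shared with the analyst.
region_by_customer = {hash_id(c["email"]): c["region"] for c in customers}
hashed_purchases = [
    {"customer": hash_id(p["email"]), "product": p["product"]} for p in purchases
]

# The analyst joins on the hash and never sees an email address.
pasty_sales = Counter(
    region_by_customer[p["customer"]]
    for p in hashed_purchases
    if p["product"] == "Cornish Pasty" and p["customer"] in region_by_customer
)
print(pasty_sales)
```

The join only works because the same email always produces the same hash in both datasets.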
There are various ways of Hashing, using various technologies, formats and algorithms.
Broadly, these fall into two options:
- Hashing without a salt
- Salted Hashing / Keyed Hashing
Even within these two options, there are many ways of doing so. One common way I’ve experienced is a type of keyed hashing called HMAC (Hash-based Message Authentication Code) Hashing.
An important thing to note is that when you hash a value such as ‘ABC’ with the same algorithm (and the same salt or key, if one is used), the hashed value will always be the same.
Keyed hashing such as HMAC is similar to salted hashing, in that when a message is passed through a mathematical function to generate the hash, an extra value is passed with it. In salted hashing this is a salt; in keyed hashing this is a key.
This gives the security benefit that for a value to be reverse engineered by an attacker, they would not only need to guess the input and mathematical algorithm, but also have the salt or key as input. (see FAQ Below)
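A minimal sketch of keyed hashing with Python's standard `hmac` module is below. The key values and the `keyed_hash` helper are hypothetical; in practice the key would come from a secrets manager rather than sit in the code.

```python
import hashlib
import hmac

# Hypothetical key for illustration; never hard-code a real one.
SECRET_KEY = b"hypothetical-key-from-a-secrets-manager"


def keyed_hash(value: str, key: bytes = SECRET_KEY) -> str:
    """Return the HMAC-SHA256 of value under the given key, as hex."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


plain = hashlib.sha256(b"John").hexdigest()          # no key involved
keyed = keyed_hash("John")                           # depends on SECRET_KEY
other = keyed_hash("John", b"a-different-key")       # different key, different hash

print(plain == keyed)               # False
print(keyed == other)               # False
print(keyed == keyed_hash("John"))  # True: same key and input, same hash
```

Without the key, knowing both the input and the algorithm is not enough to reproduce `keyed`.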
The main difference between salted and keyed hashing is that the salt is not assumed to be unknown to the attacker, but the key is. An additional difference is that salts often vary; if you hash three passwords within the same system, then you should use three distinct salt values, whereas keys are reused. Furthermore, keyed hashing is usually faster than salted hashing, since salted password hashing is often made deliberately slow to hinder guessing.
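To show what a distinct salt per password looks like, here is a sketch using PBKDF2 from Python's standard `hashlib`. The iteration count, salt length and function names are assumptions chosen for the example, not a recommendation for any particular system.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # illustrative work factor; deliberately slow


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Hash a password with a fresh random salt unless one is supplied."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


# Two users pick the same password but get different salts and digests.
salt_a, digest_a = hash_password("hunter2")
salt_b, digest_b = hash_password("hunter2")

print(digest_a == digest_b)                           # False
print(verify_password("hunter2", salt_a, digest_a))   # True
print(verify_password("wrong", salt_a, digest_a))     # False
```

The salt is stored next to the digest and is not secret, which is the contrast with a key.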
HMAC Hashing is just a form of Keyed Hashing.
With Hashing you aim to hash PII data where you want to perform analytics on datasets without the need to see individual personal data.
What Is Tokenisation?
Tokenisation is another technique used in organisations, in my experience more rarely. In this case PII is replaced by a randomly generated value that has no intrinsic value.
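A rough sketch of the idea is below. The `TokenVault` class, its in-memory lookup tables and the decision to reuse one token per value are all my own assumptions for illustration; a real tokenisation service would keep the mapping in a tightly secured store.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory vault mapping PII values to random tokens."""

    def __init__(self) -> None:
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenise(self, value: str) -> str:
        """Return the token for value, creating a random one if needed."""
        if value not in self._token_by_value:
            token = secrets.token_hex(16)  # random, not derived from the value
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def detokenise(self, token: str) -> str:
        """Look the original value up again; only the vault can do this."""
        return self._value_by_token[token]


vault = TokenVault()
token = vault.tokenise("amy@example.com")
print(token)                    # random hex, different on every run
print(vault.detokenise(token))  # amy@example.com
```

Unlike a hash, the token has no mathematical link to the original value, so the only way back is through the vault's lookup.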
How Is Our Data Classified?
Each company generally has its own agreed documentation on which data needs to be in the clear, encrypted or hashed.
Are Hashes really one way? / Why would we use a Salted / HMAC Hash?
They should be, but it is possible to recover the original value by cracking the hash. The reason is that hashing a value without using a salt returns the same value each time.
Therefore if someone has hashed ‘John’ and it produced ‘X74HZ’, all the attacker has to do is use the same hashing algorithm on a bunch of names like ‘Steve’, ‘Amy’, ‘Ben’, etc. until they hash ‘John’ and find the result is ‘X74HZ’. They then realise that the input must have been ‘John’.
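The guessing attack described above can be sketched in a few lines of Python. The candidate list, the key and the helper names are hypothetical, and SHA-256 stands in for whichever algorithm was actually used.

```python
import hashlib
import hmac
from typing import Callable, Iterable, Optional


def unsalted_hash(value: str) -> str:
    """Plain SHA-256 with no salt or key."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def crack(target: str, candidates: Iterable[str],
          hash_fn: Callable[[str], str]) -> Optional[str]:
    """Hash each guess and return the one that matches the target, if any."""
    for guess in candidates:
        if hash_fn(guess) == target:
            return guess
    return None


candidates = ["Steve", "Amy", "Ben", "John"]

# Unsalted: the attacker only needs the algorithm and a list of guesses.
leaked_hash = unsalted_hash("John")
cracked = crack(leaked_hash, candidates, unsalted_hash)
print(cracked)  # John

# Keyed (HMAC): the same guesses fail without the secret key.
secret_key = b"hypothetical-secret-key"
leaked_keyed = hmac.new(secret_key, b"John", hashlib.sha256).hexdigest()
still_hidden = crack(leaked_keyed, candidates, unsalted_hash)
print(still_hidden)  # None
```

This is why a salted or HMAC hash is preferred: the attacker would need the salt or key as well as a good list of guesses.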