Encryption vs. Hashing
- Encryption brings data into a state which cannot be interpreted by anyone who does not have access to the decryption key, password, or certificates.
- Though encryption does not restrict the access to the data, it ensures if data loss happens, then in that case data is useless for the person who does not have access to the decryption key\password\certificates.
- On the other hand, Hashing brings a string of characters of arbitrary size into a usually shorter fixed-length value or key that represents the original string and acts as a shortened reference to the original data.
- A slight change in the input string of characters produces a completely different hashed output.
- To meet the demands of regulatory compliance and corporate data security standards, SQL Server allows you to enable encryption at the column\cell level or on the entire database level whereas hashing can be used for several purposes for example:
- Identifying incremental data or changed data - Hash values generated for an entire row (by concatenating values of all the columns of the row and then generating hash key on it) are useful for efficiently searching for differences between rows in tables and identifying if the rows have changed, in case there is no mechanism in place to identify incremental data on the source table.
- Hashing is used to index and retrieve items in a database because it is faster to find the item using the shorter hashed key than to find it using the original value.
- If you have a composite key on several lengthy columns, you can concatenate them and generate hash key on the concatenated value and then you can use this hash key as a joining key.
- Encryption is bidirectional, which means data encrypted can be decrypted back to the original string if you have access to the correct decryption key.
- Hashing is unidirectional, which means hashed data cannot be reversed back to the original string.
Getting Started with Hashing
- SQL Server has the HASHBYTES inbuilt function to hash the string of characters using different hashing algorithms.
- The supported algorithms are MD2, MD4, MD5, SHA, SHA1, or SHA2.
- The hashed data conforms to the algorithm standard in terms of storage size i.e. 128 bits (16 bytes) for MD2, MD4, and MD5; 160 bits (20 bytes) for SHA and SHA1; 256 bits (32 bytes) for SHA2_256, and 512 bits (64 bytes) for SHA2_512. SHA2_256 and SHA2_512 algorithms are available in SQL Server 2012 and later versions.
- The stronger hash function you use, the more storage space it takes, and performance are slower but it provides a stronger hash value with minimal chance of hash collision (generating the same hashed output for two different input string of characters).
- Hence, it’s recommended to use hashing algorithms depending on your workload and data to hash by making it an appropriate trade-off.
- The example below demonstrates the use of the HASHBYTES function to do hashing using MD5 algorithm.
- As mentioned before, a slight change in the input string of characters produces a completely different hashed output and this is what you could see in the second column.
- The only difference between input for the first column and input for the second column is an extra space at the end of the input string in the second input string:
SELECT HASHBYTES('MD5', 'Sample String to hash') AS Col1,
HASHBYTES('MD5', 'Sample String to hash ') AS Col2
- No matter how many times you do the hashing, the hashed output will remain the same for the same set of input strings and the same hashing algorithm.
- Based on the hashing algorithm you choose, your hashed output will change both in terms of value and size.
- For example, hashed output with MD5 algorithm produces a 16 bytes long value whereas the SHA1 algorithm produces a 20 bytes long value.