Tech Kaizen: Are Hash Codes Unique

Hash codes are often described as generating a unique value for a given input. In fact, while difficult to accomplish, it is technically feasible to find two different data inputs that hash to the same value. The true determining factors regarding the effectiveness of a hash algorithm lie in the length of the generated hash code and the complexity of the data being hashed.

Let's first talk about the hash algorithms themselves. Hashing algorithms generate a fixed-length hash code regardless of the length of the input. For example, the MD5 hash function always generates hash codes that are 16 bytes (128 bits) in length, usually displayed as 32 hexadecimal characters; the SHA1 hash function generates 20-byte hash codes, SHA256 generates 256-bit (32-byte) hash codes, and so on. Since there is a limited range of possible values for a given hash code and an unlimited range of values to hash, some inputs must share a hash code. It stands to reason that the length of the hash code generated by a given hash algorithm is directly related to the difficulty of finding two inputs that will generate the same hash code.
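To make the relationship between hash length and collision difficulty concrete, here is a minimal Python sketch using the standard `hashlib` module. The function names `digest_sizes` and `find_truncated_collision` are made up for this illustration. The second function deliberately cuts a SHA256 hash code down to 2 bytes, so that a collision turns up after a few hundred attempts; it says nothing about how hard it is to collide the full 32-byte value.

```python
import hashlib
from itertools import count


def digest_sizes(data: bytes) -> dict:
    """Return the hash code length in bytes for MD5, SHA1 and SHA256."""
    return {
        name: len(hashlib.new(name, data).digest())
        for name in ("md5", "sha1", "sha256")
    }


def find_truncated_collision(n_bytes: int = 2):
    """Find two different inputs whose SHA256 hash codes share the
    first n_bytes bytes. With only 256**n_bytes possible prefixes,
    a repeat is guaranteed once more inputs than that have been tried."""
    seen = {}
    for i in count():
        msg = str(i).encode()
        prefix = hashlib.sha256(msg).digest()[:n_bytes]
        if prefix in seen:
            return seen[prefix], msg
        seen[prefix] = msg


if __name__ == "__main__":
    # Same lengths for a 1-byte input and a 1,000,000-byte input.
    print(digest_sizes(b"a"))
    print(digest_sizes(b"a" * 1_000_000))
    first, second = find_truncated_collision(2)
    print("same 2-byte prefix:", first, second)
```

Raising `n_bytes` by one multiplies the number of possible prefixes by 256, which is why the search becomes impractical long before the full hash length is reached.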

In practice, the fact that each possible hash code doesn't uniquely match an input value only comes into play when you are dealing with random (nonstructured) data. An example of this would be spyware-removal applications that determine if a file is spyware by hashing each file in a specified folder or drive and then comparing that hash value to a list of known spyware-file hash codes. In this case, relying solely on the hash code would be a mistake, as the files being hashed are of varying lengths and many of them have no semantic meaning to the application (it is not determining the meaning of the data, only comparing hash code values). As a result, it would be highly recommended that these applications either match on file name before hashing the file's contents (which dramatically reduces the possibility of "false positive" results) or use a hash algorithm with a very long output value, such as SHA256.
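The scanner described above could be sketched roughly as follows in Python. This combines both recommendations: it matches on file name first and only then compares a SHA256 hash code. The function names and the `known_bad` dictionary format (file name mapped to a hexadecimal SHA256 value) are assumptions made for the example, not the format of any real spyware-removal product.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files are not loaded into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_suspects(folder: Path, known_bad: dict) -> list:
    """Return files whose name AND SHA256 hash code both match an entry.

    known_bad is a hypothetical mapping: file name -> hex SHA256 value.
    """
    suspects = []
    for path in sorted(folder.rglob("*")):
        if not path.is_file():
            continue
        expected = known_bad.get(path.name)
        # Only hash the contents when the file name is already on the list.
        if expected is not None and sha256_of_file(path) == expected:
            suspects.append(path)
    return suspects
```

Checking the name first also saves work, since files that are not on the list are never read.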

This brings us to the concept of password hashing. Instead of storing their users' passwords, some applications will store only the hash code for the password. When the user attempts to log into the system, the application hashes the user's input password and compares that hash value to what has been stored for the valid password. This is a very good technique because it means that the users' passwords are not stored and therefore, in theory, cannot be stolen, as hash codes cannot be reverse-engineered to their pre-hashed values. However, assuming that a hacker got his hands on the password-hash file, he could employ a brute-force method, generating hash codes for one candidate value after another until one matches a hash code on the password list. The value the hacker used to generate the matching hash code could then be used to gain unauthorized access.
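Both sides of this can be sketched in a few lines of Python. The example below assumes the application stores an unsalted SHA256 hash code of the password; the function names are invented for the illustration, and the brute-force search is limited to short lowercase values so that it finishes instantly.

```python
import hashlib
import hmac
import string
from itertools import product


def hash_password(password: str) -> str:
    """Unsalted SHA256 hash code of a password, as a hex string."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def check_login(stored_hash: str, attempt: str) -> bool:
    """Hash the attempted password and compare it to the stored value."""
    return hmac.compare_digest(hash_password(attempt), stored_hash)


def brute_force(stored_hash: str, alphabet: str = string.ascii_lowercase,
                max_len: int = 3):
    """Try every value up to max_len characters until a hash code matches."""
    for length in range(1, max_len + 1):
        for chars in product(alphabet, repeat=length):
            guess = "".join(chars)
            if hash_password(guess) == stored_hash:
                return guess
    return None


if __name__ == "__main__":
    stored = hash_password("cab")        # what the password file would hold
    print(check_login(stored, "cab"))    # legitimate login
    print(brute_force(stored))           # attacker recovers a working value
```

The attacker never reverses the hash; he only needs the stolen hash code and enough guesses.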

This is a very real risk, as passwords are generally short (such as 6-10 characters). The hacker therefore needs to generate far fewer hash codes than he would if attempting to reproduce the hash code of a much longer value. As a result, anyone using this technique to protect user passwords should go a step further by adding a salt, or static piece of data, to the input value before it is hashed. This makes a stolen list far less useful: even if someone were to obtain the hashed passwords and find an input that hashes to a code in the list, that input would not work as a password, because when he attempts to log in, the system would apply the salt again and the resulting hash code would differ from the one stored for the user. In other words, the hacker's input value would already include the salt, but the system is expecting a non-salted value that it salts itself.

As a result, the addition of the salt greatly increases the difficulty in cracking the passwords, because now the hacker would need to:

1) steal the hash-code file/value

2) know that a salt has been added

3) know the value of the salt

4) know exactly how the salt was added (where and which position ...)
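A minimal Python sketch of the salted scheme follows. The salt value, its position (prepended to the password) and the function names are all assumptions chosen for the example. Note that many real systems go further than the static salt described here and use a different random salt per user together with a deliberately slow function such as PBKDF2 (available in Python as `hashlib.pbkdf2_hmac`).

```python
import hashlib
import hmac

# Hypothetical static salt; a real one would be kept out of source code.
SALT = b"example-static-salt"


def salted_hash(password: str, salt: bytes = SALT) -> str:
    """SHA256 hash code of salt + password, as a hex string."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


def check_login(stored_hash: str, attempt: str, salt: bytes = SALT) -> bool:
    """The system always salts the attempt itself before hashing."""
    return hmac.compare_digest(salted_hash(attempt, salt), stored_hash)


if __name__ == "__main__":
    stored = salted_hash("cab")

    # A list of unsalted hash codes for short passwords no longer matches.
    print(hashlib.sha256(b"cab").hexdigest() == stored)

    # An input that already contains the salt is salted a second time
    # by the system, so it does not match either.
    print(check_login(stored, SALT.decode() + "cab"))

    # Only the real password, salted by the system, matches.
    print(check_login(stored, "cab"))
```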

Therefore, while there isn't a one-to-one correlation between every possible hash code and every possible input value such that all combinations are guaranteed to be unique, hash codes are an extremely reliable means of protecting data integrity.

ref:

http://blogs.msdn.com/tomarcher/archive/2006/05/10/are-hash-codes-unique.aspx

http://blogs.msdn.com/tomarcher/

http://infoscience.epfl.ch/record/111550/files/LPG07.pdf

http://en.wikipedia.org/wiki/Cryptographic_hash_function

Handbook of Applied Cryptography, by Alfred J. Menezes, Paul C. van Oorschot, Scott A. Vanstone -

http://books.google.ca/books?id=nSzoG72E93MC&lpg=PA381&ots=MuClx9pH8L&dq=mash-2%20algorithm&pg=PA352#v=onepage&q=&f=false

https://www.cosic.esat.kuleuven.be/publications/article-63.pdf

Recursive Constructions of Secure Codes and Hash Families using difference Function Families -

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.94.4240&rep=rep1&type=pdf
