Version 1 (modified by dkg, 10 years ago) (diff)


a Review of Hash Functions

Thanks to the git focus at the hackday, our discussion of how cryptographic weakness is related to bitlengths, and a couple of offline conversations with wiki:jrollins and enw, i've thought a lot more about SHA-1 over the last few days.

What is a hash function?

We all use SHA-1 all the time in modern computer systems, and we're basically dependent on it. Starting to understand the internals of git (which is extremely SHA-1-dependent) makes me realize just how much crazy stuff is possible if you're able to accept the underlying idea of a cryptographically-good hash function.

What is a hash function? What makes a hash function cryptographically-good? [ Collision-resistance and one-wayness are the two key criteria].

How can a hash function fail?

However, hash functions don't stay "good" forever. Most attacks i've seen concentrate on the math involved in the hash function to weaken the collision-resistance property. That is, if we can understand the underlying math, it might be possible to make reasonable deductions that could collapse the search space for a collision to a tiny fraction of the search space that a brute-force search for a collision would require.

This was quite famously done for MD5 (a once-widely-used hash function) about 4 years ago, and was cleverly demonstrated this year in a so-called "Nostradamus Attack" titled [ Predicting the winner of the 2008 US Presidential Elections using a Sony PlayStation 3].

Because of these attacks, MD5 is no longer taken seriously as a useful hash function for strong encryption. If you are given the choice between MD5 and SHA-1 in a cryptographic context, you should always choose SHA-1.

where does SHA-1 stand?

However, SHA-1 was found to be "broken" 3 years ago by the same cryptographer who broke MD5. But "brokenness" is on sort of a sliding scale. Where you'd expect to have to search through 280 items to find a single matching pair in a fully collision-resistant 160-bit hash, Dr. Wang showed that that number is actually only 269 for SHA-1. Bruce Schneier's blog has a good record of this event.

269 is still a very large number, and no actual collision has yet been found. But there is [ ongoing work to find a single collision in SHA-1].

What's next?

In ponderous bureaucratic response to these announced weaknesses, NIST (the national Institute of Standards and Technology, responsible for codifying SHA-1 after it was invented by the NSA) has finally set in motion a public call for a new cryptographic hash family

There are also "longer versions" of SHA-1, up to SHA-512, which produces a 512-bit hash (which means that a brute-force search for a single collision should have to examine 2256 items, which is a lot). /usr/bin/sha512sum is included in DebianPackage:coreutils. And there are other algorithms proposed, such as whirlpool (also a 512-bit hash, available in DebianPackage:whirlpool).

So what does this mean for git? Very little for the underlying system, i expect: as i understand the arrangement, any hash function could work. Linus likely chose SHA-1 because of its ubiquity more than anything else. There was some discussion early in the development of git that appears to question that choice, but i couldn't find any serious followup to it. The discussion i found actually seems to imply that there was a SHA-1 collision pair found, but i can't find any other reference to it online. Perhaps Ted T'so was confused by the collision found in SHA-0?

additional reading