Security and lax syntax

Imagine you’re sitting in the pub trying to remember a film title and you just can’t, for the life of you, summon up more than the fact that it begins with an ‘N’ (or maybe a ‘D’) and it definitely has something to do with cars… and then your friend says “Casablanca” and you say “YES!  That’s it!”  Sound familiar?  Yeah, me too.  Despite being totally unsure of any details a moment before, you know beyond any doubt that you have it, the instant somebody suggests the correct answer.  It’s the difference between recall and recognition that divides frustrated vagueness from confident certainty.  But can it be useful?

The answer is yes.  There is a way to store information so that a computer can recognise the same information if it sees it again without actually knowing the information – which means that what you stored stays hidden even from the system storing it.  It’s often called one-way encryption (strictly speaking it’s hashing, since nothing ever gets decrypted) and it’s very simple.  You put in some information (a file, a name, a password etc) and the system creates a ‘hash’.  You always get the same hash for the same input but there’s no practical way to take a hash and work backwards to the original input.  For the 32-character format we use, there are 340,000,000,000,000,000,000,000,000,000,000,000,000 different possible hashes so it’s very unlikely (1 in 3 x 10^38) that two different inputs will give you the same one.  So if you give me your password when you sign up, let’s say waterberry, I would generate the hash 4fa29303bf3155789ae1e8969bb029ae.  Now I store that hash instead of your password.  When you log in, I take a hash of whatever you put in and compare it to the one I have in the database – if it matches, there’s a very good chance you put in the right password.  However, if the database is stolen, it’s full of meaningless hashes and your password is still safe.  So, rather like me after a few beers, the server cannot for the life of it recall what your password is but it can recognise if you put the right one in or not.
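If you'd like to see the recall-versus-recognition trick in code, here is a minimal sketch in Python.  I'm assuming MD5 because it produces the 32-character format mentioned above; the names make_hash, sign_up and log_in and the dictionary standing in for a database are all made up for illustration, not taken from any real system.

```python
import hashlib

def make_hash(text: str) -> str:
    """Return the 32-character hex MD5 digest of text (UTF-8 encoded)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Hypothetical stand-in for the users table: username -> stored hash.
database = {}

def sign_up(username: str, password: str) -> None:
    # Only the hash is kept; the password itself is never stored.
    database[username] = make_hash(password)

def log_in(username: str, attempt: str) -> bool:
    # Hash whatever was typed and compare it with the stored hash.
    stored = database.get(username)
    return stored is not None and make_hash(attempt) == stored

sign_up("alice", "waterberry")
print(log_in("alice", "waterberry"))  # True
print(log_in("alice", "watermelon"))  # False
print(len(database["alice"]))         # 32
```

Notice that nothing in the database dictionary lets you read the password back out; all the server can do is recognise it.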

Of course it isn’t foolproof.  (Scroll down to the Identity Crisis story here.)  If you run every word in a dictionary through the same hash function you can look for matches and hopefully crack somebody’s account (this is precisely why websites will often tell you not to use dictionary words in your password).  You can also defend against dictionary attacks by adding something to the input before you hash it (e.g. you take the hash of 1234waterberry instead of just waterberry) – which is called ‘salting’ the hash – but of course if somebody finds out what your ‘salt’ is then they can use it in their dictionary attack and you’re back to square one.
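Here is a rough sketch of a dictionary attack and of what a salt does to it, again in Python and again assuming MD5.  The four-word list is a toy stand-in for a real dictionary, and md5_hex and dictionary_attack are names I've invented for the example.

```python
import hashlib

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def dictionary_attack(stolen_hash, words, salt=""):
    """Return the first word whose (optionally salted) hash matches, else None."""
    for word in words:
        if md5_hex(salt + word) == stolen_hash:
            return word
    return None

wordlist = ["apple", "password", "waterberry", "zebra"]  # toy dictionary

unsalted = md5_hex("waterberry")
salted = md5_hex("1234" + "waterberry")  # salt prepended, as in the text

print(dictionary_attack(unsalted, wordlist))               # waterberry
print(dictionary_attack(salted, wordlist))                 # None
print(dictionary_attack(salted, wordlist, salt="1234"))    # waterberry
```

The middle line is the salt doing its job; the last line is what happens once the attacker knows the salt.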

Some people use these hashes to make URLs unique.  You’ve probably seen these hashes in URLs all over the web.  So they should be unique, right?  I mean with 3 x 10^38 of the things going there should be hardly any collisions at all.

Yeah, well try Googling da39a3ee5e6b4b0d3255bfef95601890afd80709

Forty thousand results isn’t bad for a seemingly random bunch of letters, eh?  Well, it’s the SHA-1 hash of an empty string.  So every time somebody screws up on a website and creates a link or a security key with blank input, they get this.  And it happens a lot.  Try d41d8cd98f00b204e9800998ecf8427e – it’s the equivalent for the more popular MD5 hashing method.  Weird, eh?
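You don't have to take my word for it.  A couple of lines of Python (using the standard hashlib module) will hash an empty input for you:

```python
import hashlib

# Hash zero bytes of input with each algorithm.
empty_sha1 = hashlib.sha1(b"").hexdigest()
empty_md5 = hashlib.md5(b"").hexdigest()

print(empty_sha1)  # da39a3ee5e6b4b0d3255bfef95601890afd80709
print(empty_md5)   # d41d8cd98f00b204e9800998ecf8427e
```

Any code that hashes a blank value, whatever language it's written in, ends up with exactly these strings.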

But that’s not the most popular.  Why are there 272,000 results for 5f4dcc3b5aa765d61d8327deb882cf99?  Because it’s the MD5 of the word ‘password’.  Let’s think about that for a moment.  A well-known quirk of PHP (at least in versions before PHP 8, which treats it as an error) is that if you miss the $ off a variable name, it will treat the bare name as a literal word.  So while

echo $password;

may give the output waterberry,

echo password;

will give you the output password. You can see where this is going.  Best part of all?  Every user gets the same hash.
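To make the consequence concrete, here is a small Python imitation of the mistake.  Python doesn't share the PHP quirk, so I'm faking it by hashing the literal word instead of the variable; buggy_store, correct_store and the two example users are invented for the illustration.

```python
import hashlib

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def buggy_store(password: str) -> str:
    # Imitates the missing "$": hashes the literal word, ignoring the variable.
    return md5_hex("password")

def correct_store(password: str) -> str:
    return md5_hex(password)

users = {"alice": "waterberry", "bob": "hunter2"}  # made-up accounts

buggy = {name: buggy_store(pw) for name, pw in users.items()}
correct = {name: correct_store(pw) for name, pw in users.items()}

print(set(buggy.values()))        # {'5f4dcc3b5aa765d61d8327deb882cf99'}
print(len(set(correct.values()))) # 2
```

With the buggy version, a whole table of users collapses into one hash, and anything typed at the login box would "match" it if the login code makes the same slip.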

I think it’s possible somebody shouldn’t have had that last pint before coding up their website!