Thursday, January 8, 2015

Real World Crypto 2015: Password Hashing according to Facebook

I am going to start in the middle of the talk by describing the part that most people will find most interesting, before looping back around to discuss the rest of the presentation. Given below, and received with a flurry of excitement during the presentation itself (lots of camera phones appeared for this slide), is the way Facebook hash their passwords:

1) $cur  = 'plaintext'
2) $cur  = md5($cur)
3) $salt = randbytes(20)
4) $cur  = hmac_sha1($cur, $salt)
5) $cur  = cryptoservice::hmac($cur)
6)         [= hmac_sha256($cur, $secret)]
7) $cur  = scrypt($cur, $salt)
8) $cur  = hmac_sha256($cur, $salt)

Ok, so why do it like this? Well, while Facebook have the usual security considerations that we all have, they also have one that probably only they can claim - having to deal efficiently with over a billion users! I will now try to explain why each of the lines is in the source.

1) This is just taking in the plaintext (the password) and is clearly required.

2) md5 hash - this is a pretty standard thing to do, or at least it was about 10 years ago. So why is it still here? The standard way to move away from it would be to keep two tables side by side, one with the md5 hashes for each user and one for whatever the new scheme is. When a user logs in for the first time since the change, you check the password against the md5 hash and then store the new hash for future use. When all users have done this you can delete the md5 hashes and you are done. With a small number of users this seems feasible, but with a billion users that is a lot of data to store and it could take an (extremely) long time to reach the point where everyone has transferred to the new system. That is why this line is still here, with the remaining lines layered on top to make the system more secure. This solution makes more sense at this scale because the whole table can be updated in place, within a single table, without having to wait for each user to log in first.
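The in-place upgrade described above can be sketched as follows. This is a minimal illustration and not Facebook's code: the table layout, the function names (`legacy_hash`, `wrap`, `upgrade_table`, `verify`) and the choice to use the salt as the HMAC key on raw digests are all assumptions made for the example.

```python
import hashlib
import hmac
import os


def legacy_hash(password: str) -> bytes:
    """The old scheme: a bare, unsalted md5 of the password."""
    return hashlib.md5(password.encode("utf-8")).digest()


def wrap(legacy: bytes, salt: bytes) -> bytes:
    """New outer layer, computed from the stored md5 hash alone.
    Assumption: the salt is used as the HMAC key."""
    return hmac.new(salt, legacy, hashlib.sha1).digest()


def upgrade_table(table: dict) -> dict:
    """Upgrade every row without any user logging in: only the old
    hash is needed, never the plaintext password."""
    upgraded = {}
    for user, legacy in table.items():
        salt = os.urandom(20)
        upgraded[user] = (salt, wrap(legacy, salt))
    return upgraded


def verify(password: str, record: tuple) -> bool:
    """At login, recompute every layer from the typed password."""
    salt, stored = record
    candidate = wrap(legacy_hash(password), salt)
    return hmac.compare_digest(candidate, stored)


# Hypothetical legacy table holding only md5 hashes.
old_table = {
    "alice": legacy_hash("correct horse"),
    "bob": legacy_hash("hunter2"),
}
new_table = upgrade_table(old_table)
```

Because each new layer takes the previous layer's output as its input, the md5 step can never be dropped, but the old weak hashes do not need to be kept anywhere once the table has been rewritten.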

Interestingly, in the talk afterwards, "Life of a password", we learn that LinkedIn do something different here (almost the exact opposite, in fact). LinkedIn do change what is being used instead of adding layers, but they manage this by making some of the layers encryption, which (unlike hashing) is invertible, so the table can be updated by inverting the encryption step and adding the new layers without the user having to log in. One advantage of this is that it gives them the ability to timestamp the password database. If the encryption key is changed (say) every day, then if there is a database leak LinkedIn can tell which day the leak occurred, making it easier to work out what the cause was.

To me the interesting question is: can md5 collisions be used to log into other users' Facebook accounts? Collisions are known for md5, so if I can get a user to set their password to one element of a colliding pair, I should be able to log in with the other element. However, if I can make a user set their password to something, I may as well just log in with what I made them set it to! The more interesting observation is that only the collision resistance of md5 has been broken; if (second) preimage resistance were broken as well then there may be more trouble...

3-4) This is the standard salt-and-hash step. The interesting point here is that 160 bits of salt are used, which seems like a lot. It was explained that for all the Facebook users, from the beginning of time (or Feb 2004 to be precise) to now, to each have a unique salt, the salt would need to be about 32 bits long. However, since salts are assigned randomly (as they should be) you need to consider the birthday bound on the probability of collisions, so you need 64 bits of salt. The remaining 96 bits (which do seem a bit on the large side) allow for future proofing against things like new users and multiple password changes (people tend to forget their passwords...).
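To put rough numbers on the birthday argument, here is a small sketch using the standard approximation p ≈ 1 - exp(-n(n-1) / 2^(b+1)) for n random b-bit salts. The figure of 2^32 salts is taken from the "about 32 bits" estimate above; the function name is made up for the example.

```python
import math


def collision_probability(num_salts: int, salt_bits: int) -> float:
    """Birthday approximation for the chance that at least two of
    num_salts uniformly random salts of salt_bits bits coincide."""
    exponent = num_salts * (num_salts - 1) / 2 ** (salt_bits + 1)
    # -expm1(-x) computes 1 - exp(-x) without losing tiny values.
    return -math.expm1(-exponent)


n = 2 ** 32  # assumed number of salts ever issued
for bits in (32, 64, 160):
    print(bits, collision_probability(n, bits))
```

Under these assumptions a 32-bit salt collides almost certainly, a 64-bit salt still collides with probability of roughly 0.39, and a 160-bit salt brings the probability down to around 2^-97. So 64 bits is better read as the point where uniqueness starts to become plausible than as a comfortable margin, which is one way to justify the extra bits.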

5-8) As you all probably know, hash functions are (by design) fast, so the goal here is to slow down a brute force attack on a user's password. The interesting part is on lines 5-6, which call this cryptoservice. What this does is send the value over to a Facebook service which hashes in a secret. This has two advantages: firstly, it means that passwords cannot be brute forced in offline attacks, and secondly, it allows Facebook to monitor password hashing attempts and to block any suspicious looking activity. The scrypt on line 7 is used to slow down the local computation, while the hmac_sha256 on line 8 is used to shrink the size of the output so that the password database is manageable. After all, even if each entry in the table is tiny, with a billion users it will still be a very large table: for example, if each entry has to grow by a single bit, the whole table will grow by a gigabit (Gb)!
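Putting the eight lines together, the whole pipeline might look something like the following. This is only a sketch under several assumptions: the remote cryptoservice is mocked by a local function holding `SERVICE_SECRET`, the salt is used as the HMAC key in steps 4 and 8, raw digests (not hex strings) are passed between layers, and the scrypt parameters (`n`, `r`, `p`, `dklen`) are illustrative values, since none of these details were on the slide.

```python
import hashlib
import hmac
import os

# Hypothetical stand-in for the secret held by the remote service.
SERVICE_SECRET = os.urandom(32)


def crypto_service_hmac(data: bytes) -> bytes:
    """Steps 5-6: in the talk this is a call to a separate service,
    so the secret never sits next to the password table."""
    return hmac.new(SERVICE_SECRET, data, hashlib.sha256).digest()


def onion_hash(password: str, salt: bytes) -> bytes:
    cur = password.encode("utf-8")                    # step 1
    cur = hashlib.md5(cur).digest()                   # step 2
    cur = hmac.new(salt, cur, hashlib.sha1).digest()  # step 4 (salt from step 3)
    cur = crypto_service_hmac(cur)                    # steps 5-6
    cur = hashlib.scrypt(                             # step 7: deliberately slow
        cur, salt=salt, n=2 ** 14, r=8, p=1,
        maxmem=2 ** 26, dklen=64,
    )
    # Step 8: compress the 64-byte scrypt output to 32 bytes.
    return hmac.new(salt, cur, hashlib.sha256).digest()


salt = os.urandom(20)  # step 3: 160 bits of random salt
stored = onion_hash("plaintext", salt)
```

Note that an attacker holding only the table (salts and final hashes) cannot test guesses offline, because step 5 cannot be evaluated without the service's secret.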

Various points from the rest of the talk:
Authentication for standard websites tends to be "something you know" (your password), and if you are security conscious you can turn on two-factor authentication to add "something you own" (usually your phone). Facebook have started including other factors as well when you log in. One thing they now consider is where you are: if I always log on from Bristol but five minutes later I log on from Hawaii, then there is probably something wrong and further authentication checks should be made. Of course, now that Tor is becoming more widespread this could just be Tor doing its thing, and I imagine a conversation between Tor and Facebook will be on the cards. The other check they do (which again can be seen as "something you own") is "have they logged on from this browser before?" If they have, then it is (more) likely to be the person who logged in last time, but if it is a new device then further authentication should take place, since it is less likely to be the intended user.

We have all had issues with a touch screen phone before, and especially with the caps lock on the device (auto-capitalisation has been the bane of my existence when travelling with my phone). Facebook have considered this: they will not only check your password as typed, but will also try the password with the case of the first letter changed (because phones like to auto-capitalise the first letter) and the password with the case of every letter switched (to counter the caps lock being toggled). For example, if my password was passWORD123, they would check this as well as PassWORD123 and PASSword123. In a follow-up discussion I learnt that (combined) these two issues tend to appear on 3% of smart phones, so it is worth doing. I asked if this was hinting that Facebook will start checking for "common mistypes" when you type your password (to me this would be a very bad idea, as it would reduce the password entropy significantly) but was assured this will not be done.
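The two case corrections are easy to sketch. The function names here are made up for illustration, and `check` stands in for whatever routine compares a candidate against the stored hash.

```python
def candidate_passwords(typed: str) -> list:
    """Return the typed password plus the two case-corrected variants,
    without duplicates and with the typed form first."""
    variants = [typed]
    if typed:
        # Undo auto-capitalisation: flip the case of the first character.
        variants.append(typed[0].swapcase() + typed[1:])
    # Undo an accidental caps lock: flip the case of every letter.
    variants.append(typed.swapcase())
    unique = []
    for v in variants:
        if v not in unique:
            unique.append(v)
    return unique


def accepts(typed: str, check) -> bool:
    """Accept the login if any candidate passes the real check."""
    return any(check(c) for c in candidate_passwords(typed))


print(candidate_passwords("passWORD123"))
# ['passWORD123', 'PassWORD123', 'PASSword123']
```

At most three candidates are ever tried, so an attacker gains at most a factor of three per guess, which is far less than a general "common mistypes" check would give away.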

The final thing I want to mention (which came as a surprise to me) is about password dumps. We hear on the news several times a year about a website being hacked and the usernames and password hashes being published online, but realistically small dumps happen multiple times a month. Facebook keep an eye on these dumps for you, and if they spot your Facebook username and password in a dump they automatically notify you at your next log in and ask you to change your password. I feel this is a particularly nice feature. They also have the ability to manually notify users if their username appeared in a big data leak from a different site (even when the password isn't the same as the Facebook one), but this is more discretionary than the automated check for a matching Facebook password.


To conclude:
I went into this talk not knowing what to expect (it was still TBA on the schedule). I thoroughly enjoyed it, learnt a lot, and would recommend listening to it if you are ever lucky enough to be given the opportunity.
