Storing Passwords the Right Way

This post is a bit of an experiment in writing about what I consider “beginner” material. Not that it is necessarily simple or easy stuff anyone should know, but it is not a “new discovery” as far as I am concerned. Also, I usually try not to write about security-related material, as I do not consider myself a security expert. However, since I’m starting to teach a “PHP 101” course soon (maybe I’ll post more about it in the next few weeks), and since I was asked about this topic a few times recently, I’ve decided to write up my experience and test the reactions.

So, the topic in question is “what is the right way to store user passwords in my DB”. To be clear, I am talking specifically about the passwords users will use to log in to your application, not some third-party password you need to store for whatever reason. This is something almost any application out there requires – unless you interface with some external authentication mechanism (OAuth, OpenID, your office LDAP or Kerberos server), there’s a very high chance you’ll need to authenticate users against a self-stored user name and password.

To figure out what the best solution is, let’s start by going over the problems we might face if we take the naive approach and store passwords in their original, clear-text form:

  • If our database gets hacked (for example if we are exposed to an SQL injection attack through some 3rd party app we have installed on our server), passwords could get stolen. Clear-text passwords could easily be used to hijack our users’ accounts. In many cases, even if we ignore the risk of a hacked database – a clear-text password can be stolen by a disgruntled worker with access to the DB.
  • Moreover, users tend to use the same passwords for all sorts of different services. If I know someone’s password to one service, chances are I can use the same password or a similar one to impersonate that user on other sites as well. Most users do not consider the fact (but you should!) that when they type in a password for your silly site with funny kitten pictures, there’s a good chance they are placing the password to their bank or PayPal account in your hands.

Given the points above, it is our responsibility as web developers to ensure that nobody, not even us, could get a clear-text version of the user’s password. So what’s the right way to do that?

Best solution: avoid the problem altogether

The best way is, of course, not to store passwords at all. If functional and business terms allow, you should consider using some other authentication mechanism like OAuth or OpenID – basically let someone else worry about storing sensitive data. Many users have Google or Facebook profiles, and you can find several examples of pretty big websites that simply let users authenticate through these providers instead of through their own identity management service. This is an elegant solution, but clearly it does not work in all cases. If you still need to store passwords, read on.

Password Hashing

One aspect of passwords is that really, you have no use for their actual value. A password can be anything, and as long as the user repeats the same initial password when asked to sign in, you do not care what the password’s textual value is. This is important, because it means we don’t need to store the original value of the password: it is safe to store a product of the password, and as long as we know how to reproduce this product from a typed-in password, we can compare the products and not the password itself.

Sounds confusing? Here is a simple example: assume that instead of passwords, our site uses numbers to authenticate users. Each user has to provide a user name and a random number as a password. Before we store this “password” in our DB, we pass it through this simple mathematical function:

    f(x) = x * 5

So if a user types in “4709” as a password, we multiply that number by 5 and store the value “23545” in our DB. When the user attempts to sign in again, we pass the value typed in as password through the same function. If we get “23545” as the product of the typed-in password passed through our function, we know the user typed in the right password.

This is good, because now the value stored in our DB is not the actual password, but an obscure value (well, sort of). If someone steals our DB, they can type “23545” into the sign-in form all day – but that’s not the right password (remember, we multiply whatever is typed in by 5 before comparing it!).

Unfortunately, things are not that easy. First, it’s enough for someone to know that we multiply numbers by 5 to obfuscate them, and they can easily reverse engineer passwords by dividing the data stored in the DB by 5 – our function is reversible. Second, a smart enough hacker looking at our list of stored password values would probably notice that all of them are multiples of 5 – so even without knowing what we do in advance, reverse engineering our “security” method is quite easy.
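To make the toy scheme concrete, here is a minimal sketch of it in PHP. The function names obscure() and checkToyPassword() are made up for this illustration; the point is only to show the store-and-compare flow and how trivially the function can be inverted.

```php
<?php
// Toy "obfuscation" function from the text: f(x) = x * 5.
function obscure(int $password): int
{
    return $password * 5;
}

// Hypothetical helper: check a typed-in number against the stored value.
function checkToyPassword(int $typed, int $stored): bool
{
    return obscure($typed) === $stored;
}

$stored = obscure(4709); // what we would keep in the DB

var_dump($stored);                          // int(23545)
var_dump(checkToyPassword(4709, $stored));  // bool(true)
var_dump(checkToyPassword(23545, $stored)); // bool(false): the stored value is not the password

// The weakness: anyone who knows (or guesses) f can simply invert it.
var_dump(intdiv($stored, 5));               // int(4709)
```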

As it turns out, the approach is right but the function we are using is too simple. What we need is a function that:

  • Is irreversible, or at least is impossible to reverse in the real world. In other words, even if one knows both the result and the function used, computing the input value will be impossible or impractical.
  • Is indeed a function in the mathematical sense – that is, given the same input value, only a single outcome is possible, and the same outcome is always produced for the same input. For example, a computer function which multiplies the input value by a random number is no good for us.
  • Produces a distinct, single value for each input value – or at least provides a very low risk of producing the same result for two input values. This is important because we want to make sure that only one typed-in password matches the obscured value stored in the DB.

Of course, most of us are not mathematicians – so you’ll be happy to know that such functions exist, and are usually referred to as “hash functions”. Hash functions are used extensively by programmers for all sorts of purposes, and cryptography is most definitely one of them. A few popular and useful cryptographic-grade hash functions are MD5, SHA-1 and SHA-2. We will not go into the mathematical definitions of these functions (I have very little knowledge of how these functions actually work!) – it’s enough to say pretty much every popular programming language out there has at least one implementation of these functions.

As an example, the MD5 function produces a 128-bit “hash value” for an input value provided to it. The hash is always 128 bits long, regardless of the input size. It will always produce the same hash value for the same input. The most successful known collision attack on MD5 (that is, an attack producing the same hash value for a modified input value) took about 2 million execution attempts, which makes MD5 quite bad for validating SSL certificates (which it used to be used for), but still sort of OK (although not great) for password hashing.

To compute an MD5 hash value in PHP, you can do the following:

    php > echo md5("my name is Inigo Montoya");
    d9937edae7d26a399d41dda16f137e42

As you can see, the MD5 value of the string “my name is Inigo Montoya” is “d9937edae7d26a399d41dda16f137e42” (this is in fact a hexadecimal representation of the MD5 value, which is a 128-bit number – this is the standard way to present hash values of various functions). On the other hand:

    php > echo md5("my name is Indigo Montoya");
    ae7cd5e68c73f9f44df66030cc9d1c06

Even a slight change in the input text produces a completely different MD5 hash.

It’s time to stop using MD5 for cryptographic purposes

While MD5 was the de facto standard for storing hashed passwords for some time, it is now becoming clear that it may not be suitable for cryptographic purposes (it is definitely suitable for other things). In 2009 it was shown that producing collisions for MD5 can be done within seconds or minutes on commodity hardware – this does not mean it is easy to reverse engineer password values stored in your DB as MD5 hash values, but it does mean that if a highly skilled hacker wants to specifically target your site, they have a better chance of succeeding. In addition, it is safe to assume additional vulnerabilities will be detected in the future.

Unless you are somehow limited (not if you’re a PHP developer!), switching to stronger hash functions such as SHA-1 or SHA-256 is highly recommended.

Throughout the rest of this article I will use SHA-1 in examples. SHA-1 is not collision-free, but so far the best known theoretical collision attack on SHA-1 requires 2 to the power of 51 attempts (that’s a number with 16 digits!), and at the time of writing nobody has been able to show an actual successful attempt to do so. SHA-1 produces a 160-bit digest value, and can be computed in PHP like so:

    php > echo sha1("my name is Inigo Montoya");
    b208946a9c3c4b26a4d6bb87c3f630f996146ee
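As a small sketch of the properties discussed so far, the snippet below uses PHP’s generic hash() function to compare digest sizes across MD5, SHA-1 and SHA-256. Treat it as an illustration only; the input string is the one used above, and no particular digest value is assumed.

```php
<?php
$input = "my name is Inigo Montoya";

// Each algorithm yields a fixed-length digest regardless of input size.
// Every hexadecimal character encodes 4 bits.
foreach (['md5', 'sha1', 'sha256'] as $algo) {
    $hex = hash($algo, $input);
    printf("%-6s %3d bits  %s\n", $algo, strlen($hex) * 4, $hex);
}

// md5() and sha1() are shorthands for the matching hash() calls.
var_dump(hash('sha1', $input) === sha1($input));              // bool(true)

// A one-letter change in the input gives a different digest.
var_dump(sha1("my name is Indigo Montoya") === sha1($input)); // bool(false)
```

Switching the examples in this article from SHA-1 to SHA-256 would mostly be a matter of replacing sha1($x) with hash('sha256', $x) and widening the column that stores the result to 64 characters.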

So, I should just store the password hash in the DB?

Well, yes and no.

Yes – because that’s the first step. By storing a SHA-1 hashed version of the password in your DB you ensure nobody can compute the password by simply stealing your users DB table data. When a user types in their password, you compare the stored hash to the SHA-1 hash of the typed in string, and if they match, you grant access. Simple and effective.

But wait… that’s still not good enough.

Stupid but Effective: Dictionary Attacks

All hash functions are vulnerable to a type of attack sometimes referred to as a rainbow table attack or a dictionary attack.

These attacks take advantage of the fact that in most cases, humans are humans – and the passwords they use are of limited size (how many people can you think of that use 12 or even 10 character long passwords?) and are composed of a limited set of characters (remember that even power users that use punctuation, numbers and mixed-case characters in their passwords are still confined to the ~75 characters or so on their keyboards).

Dictionary attacks are stupid but effective: the idea is to create a dictionary (basically a big key -> value table) of predictable passwords (dictionary words, expected combinations of key strokes, all permutations of what’s on your keyboard up to 8 characters long) and their MD5 or SHA-1 values. Once such a table exists (creating it may take several hours on commodity hardware, but this is a one-time effort), you can search for an original password using its hash value.

A dictionary attack allows me to reverse-engineer the original password from its hash value not by smart computation (which, given a good hash function, is impossible or impractical), but through a simple query to a ready-made “dictionary” mapping hash values to original strings.

But it’s even easier than that: nowadays there are services that offer such dictionary lookups in their existing databases. It’s not even required to do the work of building the dictionary.
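Here is a rough sketch of the idea in PHP, using a deliberately tiny and made-up candidate list. A real attacker’s table would hold many millions of entries, but the lookup works the same way.

```php
<?php
// A tiny, hypothetical list of "predictable" passwords.
$candidates = ['123456', 'password', 'letmein', 'inigo2001', 'qwerty'];

// One-time effort: precompute hash => password for every candidate.
$dictionary = [];
foreach ($candidates as $candidate) {
    $dictionary[sha1($candidate)] = $candidate;
}

// Later: a hash value taken from a stolen users table (unsalted SHA-1).
$stolenHash = sha1('inigo2001');

// "Cracking" is a plain array lookup; nothing is mathematically reversed.
$recovered = $dictionary[$stolenHash] ?? null;
var_dump($recovered); // string(9) "inigo2001"

// A password that is not in the dictionary is simply not found.
var_dump($dictionary[sha1('x7#Qp!9zLm2$')] ?? null); // NULL
```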

One good solution to dictionary attacks is forcing your users to mix punctuation, upper- and lower-case characters and numbers in passwords that are at least 12 characters long. However, we all know that in many cases this means expecting too much from your users.

The practical solution to dictionary attacks is quite simple, and is called salting.

Just Add Salt

Salting is a simple yet effective method to improve the security of stored passwords and prevent dictionary attacks. The idea is that instead of expecting a long, random password from the user, you take whatever password the user provides and add additional random noise (referred to as “salt”) to it yourself. You store that random noise next to the password, and use it to compute the hash when checking passwords.

Once a long enough and random enough salt is added, comparing hash values stored in your DB to a dictionary becomes very hard: an attacker will need to build an entire database of hash values for each different salt + password combination, effectively requiring the creation of a table with hundreds of billions of records to crack a single password.

Make sure a different random salt is added to each password: otherwise a single DB of salted hash values can be created for your application – it won’t be useful for other apps, but if someone wants to target your app they can definitely achieve their goals.

As an example, let’s assume a user whose password is ‘inigo2001’. Here is how this user’s password will be stored in the DB without salting:

 +-------------------+------------------------------------------+
 | user              | password_hash                            |
 +-------------------+------------------------------------------+
 | inigo@montoya.com | e40900c950cc6011297b2b392b42c29688b33ac7 |
 +-------------------+------------------------------------------+

An attacker with a good dictionary can figure out that the password_hash value is in fact the SHA-1 digest of “inigo2001”. However, if we add salt:

 +-------------------+------------------------------------------+------------------------------+
 | user              | password_hash                            | password_salt                |
 +-------------------+------------------------------------------+------------------------------+
 | inigo@montoya.com | fe66f3eb9c0afc8c935dc9f3f26dbea68d48ccc1 | 9ljYI+xMaVOSloDwt9ahzTpqMHA= |
 +-------------------+------------------------------------------+------------------------------+

Guessing that password_hash is the SHA-1 digest of “inigo20019ljYI+xMaVOSloDwt9ahzTpqMHA=” is quite hard – one would need to build a huge dictionary just to figure out this one password, assuming they also have insight into our code and have figured out that we have concatenated the password_salt value after the original password value and passed that through SHA-1.

Note that the password_salt value in this case is a base-64 encoded string of 20 random bytes – using a random enough and long enough salt is important, otherwise there’s a good chance your password + salt value happens to already exist in the attacker’s DB.
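The following is a minimal sketch of that scheme: a base-64 encoded salt of 20 random bytes, appended to the password before hashing. The helper name generateSalt() is made up for this illustration, and random_bytes() assumes PHP 7 or later (openssl_random_pseudo_bytes(20) is one alternative on older versions).

```php
<?php
// Hypothetical helper: 20 random bytes, base-64 encoded.
function generateSalt(): string
{
    return base64_encode(random_bytes(20));
}

$password = 'inigo2001';

// Two users (or two sign-ups) with the same password get different salts.
$saltA = generateSalt();
$saltB = generateSalt();

// Password followed by salt, then SHA-1.
$hashA = sha1($password . $saltA);
$hashB = sha1($password . $saltB);

var_dump(strlen($saltA));             // int(28): 20 bytes in base-64
var_dump($hashA === $hashB);          // bool(false): same password, different stored values
var_dump($hashA === sha1($password)); // bool(false): an unsalted dictionary entry does not match

// Verification recomputes the hash with the salt stored next to it.
var_dump(sha1('inigo2001' . $saltA) === $hashA); // bool(true)
```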

Example Time

To summarize things, here is an actual example of a few PHP functions that store user information in the DB in a secure manner and verify passwords against that stored information.

Our users table in the database is assumed to look something like:

mysql> DESCRIBE users;
+---------------+---------------------+------+-----+---------+----------------+
| Field         | Type                | Null | Key | Default | Extra          |
+---------------+---------------------+------+-----+---------+----------------+
| id            | int(10) unsigned    | NO   | PRI | NULL    | auto_increment |
| email         | varchar(50)         | NO   | UNI | NULL    |                |
| password      | char(40)            | NO   |     | NULL    |                |
| password_salt | binary(16)          | YES  |     | NULL    |                |
+---------------+---------------------+------+-----+---------+----------------+

The password field is a 40-byte CHAR (SHA-1 hashes in hexadecimal representation are always 40 bytes long). The password_salt field is a 16-byte BINARY field – it will contain some random bytes with no particular encoding, so it shouldn’t be a CHAR or VARCHAR field.

As new users register, the following functions are used to set the user’s password in the DB:

class User
{
  /**
   * This will contain a hashed version of the user's password
   *
   * @var string
   */
  protected $password = null;

  /**
   * This will contain the salt value used to add noise to the password hash
   *
   * @var string
   */
  protected $password_salt = null;

  /**
   * Set the user's password
   *
   * @param string $password
   */
  public function setPassword($password)
  {
    // Test that password is at least 6 characters long and mixes
    // lower-case letters, upper-case letters and digits
    if (! preg_match('/^.*(?=.{6,})(?=.*[a-z])(?=.*[A-Z])(?=.*\d).*$/', $password)) {
        throw new \ErrorException("Password is not strong enough");
    }

    $this->password_salt = $this->generateRandomSalt();
    $this->password = sha1($password . $this->password_salt);
  }

  /**
   * Generate a random salt value, 16 bytes long
   *
   * This relies on OpenSSL being available. If it is not available, any
   * cryptographic-grade random string generation function would work. On
   * UNIX machines, you can just read 16 bytes from /dev/urandom.
   *
   * @return string
   */
  protected function generateRandomSalt()
  {
    return openssl_random_pseudo_bytes(16);
  }
}

To check a given password, we add the following function to the same class (assume that the protected values are populated from values fetched from the DB):

  /**
   * Check if password is correct
   *
   * @param  string $password
   * @return boolean
   */
  public function checkPassword($password)
  {
    $hashed = sha1($password . $this->password_salt);
    return ($hashed === $this->password);
  }

As you can see, this class (assuming a working database access layer) will do the work of properly salting and hashing passwords before saving them, and of comparing given clear-text passwords to a salted, hashed value stored in the DB. This makes it very hard to recover passwords from a stolen copy of your DB.
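Several comments below argue for a deliberately slow algorithm such as bcrypt instead of SHA-1. As a hedged sketch of what that could look like (this is not the code used in this article), PHP 5.5 and later ship password_hash() and password_verify(), which generate the salt and store it inside the resulting string. The cost value of 10 is just an example.

```php
<?php
// password_hash() generates a random salt, runs bcrypt, and packs the
// algorithm, cost, salt and hash into a single string.
$stored = password_hash('Inigo2001', PASSWORD_BCRYPT, ['cost' => 10]);

var_dump(strlen($stored));                       // int(60)
var_dump(password_verify('Inigo2001', $stored)); // bool(true)
var_dump(password_verify('inigo2001', $stored)); // bool(false)

// Hashing the same password again yields a different string (new salt).
var_dump(password_hash('Inigo2001', PASSWORD_BCRYPT) === $stored); // bool(false)
```

With this approach there is no separate password_salt column to manage, but the password column would need to hold at least 60 characters rather than 40.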

What’s next?

I hope that this article pointed out some good practices in storing passwords in the DB. It is important to remember that while your site may not be very interesting to hack into, hacking into your users’ accounts could be a first step towards identity theft or the hijacking of accounts on another site holding much more sensitive data. Implementing the measures described here would mean you are at least treating your users’ passwords with the right care.

There are additional aspects to password security which you should look into: using proper security on the transport channel when asking for passwords (HTTPS with a valid certificate), requiring strong enough passwords from your users, avoiding session fixation attacks (session_regenerate_id at login) and more. I also did not touch on procedures for replacing lost passwords, which are also a common weak point vulnerable to phishing attacks. There is quite a lot of material to read out there on these topics, and if there is enough interest I might cover some of them myself in the future.

13 thoughts on “Storing Passwords the Right Way”

  1. Great article for the right audience, Shahar.

    A few comments/questions:

    1. What’s the benefit of storing a salt-per-user in the same DB vs. having one salt string set in the code that computes the hashed password with it?

    2. Cracking the site might gain the cracker an option to see the salt we’re using, since it’s usually in the code itself. How would you tackle that? (Is there something that can be done for this scenario?)

    3. I’d be happy to hear the “what’s next” section – how do we further secure the site beyond strong hash + good salt combination. Aside from taking out authentication to an external source, as mentioned in the article itself, is there something else we could do?

    thanks again! :)

    • Hi Boaz,

      Thanks! I’ll try to answer your questions (I hope I have the right answers):

      1. If you use a single salt, someone gaining access to your code could build a single dictionary table for your site using that salt within a few hours, and reverse engineer all or most of your stored passwords. Using a separate random salt for each password means the same effort will be required for every single password. Also see the next answer.

      2. The idea is that the salt value itself is meaningless – it is just a means of adding random noise to the resulting hash – which dramatically reduces the chances of any of your hashes matching any pre-computed dictionary hash values. With a long enough salt (it just needs to be longer than what an average+ password would be) producing a table for dictionary attacks on hashes is very difficult – even if you gain access to the original salt value.

      BTW while the salt value is not important, we do tend to store sensitive data in or next to our code – DB credentials, private encryption keys etc. It is important to ensure such data is kept in configuration files and out of your code repository (many more people usually have access to that…), and is stored in files on the server with the right permissions (readable to no one but the web server user) and with no chance of being accessible by Web users – preferably outside of the document root.

      3. I will try to write more about security topics – usually there is a checklist of things to verify when it comes to password and authentication security. One interesting bit I see a lot of people forgetting is to call session_regenerate_id() after logging users in to mitigate session fixation attacks – such a simple solution to a relatively common attack. I’m mentioning this as an example to one of the checklist items I was talking about.

  2. Great article.

    I usually do one or two more steps to really mix things up.

    1) Add in an application salt also. It’s just an extra layer, so even with the database a dictionary attack gets a bit more difficult without the app codebase to look at.

    2) Use a fancy combination algorithm. I use a slightly more complicated algorithm to apply the salt to the password. Again, this is effective only if the attacker does not see the code, but it’s a nice little extra step to make the attacker’s dictionary script run slower.

  3. Some of this advice is good, but I would recommend against using SHA-256. You are correct it is better than MD5 but it is still a general purpose hashing function. It is designed to produce a digest in the shortest amount of time possible. This makes it ideal for generating hashes for large amounts of data, but means it sucks for securing passwords.

    Use bcrypt. Bcrypt is perfect for hashing passwords because it is slow. How slow? Hashing a 4-character password with bcrypt takes about 0.3 seconds; with MD5 it takes less than a microsecond. So instead of cracking a password every 40 seconds (MD5), it’ll take about 12 years (bcrypt).

    • Hey Aaron,

      This is very interesting – I did not know about bcrypt. I will definitely read more about it. BTW as the bcrypt page explains obviously salting does not eliminate the risk from dictionary attacks completely, but a strong enough hash function with a good enough salt can make it very hard to reverse-engineer hashed passwords – so unless you are storing very valuable data or are a high value target, it is probably good enough.

      • False – an Amazon GPU instance that costs $2 per hour can compute SHA1 of all words between 1 and 6 characters in 49 minutes:
        http://stacksmashing.net/2010/11/15/cracking-in-the-cloud-amazons-new-ec2-gpu-instances/

        SHA1+salt is totally obsolete for passwords. I strongly suggest against it. You must use a hashing algorithm that is designed to be really slow, not SHA1 that was designed to be as fast as possible.

        Good alternatives: PBKDF2, BCrypt, SCrypt.

        • I have only now learned about bcrypt and others and I find it very interesting, and I should say I stand corrected about this issue.

          However, the article you linked to does not mention salting at all.

          From what I understand cracking a single hashed password (again, salt should be long enough) would still take a considerable amount of time.

          That said, I now understand the advantages of slower hashing algorithms and once I learn more about using them, I might update this article.

          Thanks!

  4. Hi, very interesting article. I will follow your suggestions in my next projects.
    I think it’s a good idea to store a hashed user together with its corresponding hashed password. In that way a hijacker will have a double task: to steal not only our password but our access.

    • Interestingly, when learning to use PHPAss, I searched and looked for the salt – where do I pass it to the class?
      Apparently, and for the sake of anyone looking here, PHPAss stores the salt *with* the resulted hash, thus returning a slightly stuffed hash which is composed of the salt and the hash that resulted from bcrypt hashing the salt+passwd. Kind of like: “”.
      This results in a simpler life: no need to handle the salt at all – PHPAss does it for you when hashing a password and when checking a password for a match. Yes, such a design requires a “check password” method, since PHPAss encapsulates the information on how the salt is glued to the hash etc.

  6. Pingback: Password hashing revisited | Pseudo Random Bytes