Let’s talk about your password model

This entire article is obviated by the password_hash family of functions.

Please check out password_hash() and friends for information on the up-to-date and correct way to handle passwords. Generally speaking, if you are using another method, it is wrong. More specifically, you’re doing it wrong if your method is not based on crypt(), or if it is but cannot automatically recreate hashes when you change algorithms or cost factors. People who understand the math behind cryptography are, of course, exempt. The remainder of this article is preserved for historical reasons, but please, switch to password_hash() and friends.
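For reference, here is a minimal sketch of that family in use. The helper names (make_hash, check_password) and the sample password are my own inventions for illustration; only password_hash(), password_verify() and password_needs_rehash() come from PHP itself.

```php
<?php
// Hypothetical helper: hash a new password for storage.
function make_hash(string $password): string {
    return password_hash($password, PASSWORD_DEFAULT);
}

// Hypothetical helper: check a login attempt against the stored hash.
// Returns [bool $ok, ?string $replacementHash]; the second element is
// non-null only when the stored hash should be upgraded.
function check_password(string $password, string $storedHash, array $options = []): array {
    if (!password_verify($password, $storedHash)) {
        return [false, null];
    }
    $replacement = null;
    if (password_needs_rehash($storedHash, PASSWORD_DEFAULT, $options)) {
        // Algorithm or cost changed since this hash was made: recreate it.
        $replacement = password_hash($password, PASSWORD_DEFAULT, $options);
    }
    return [true, $replacement];
}

$stored = make_hash('correct horse battery staple');
[$ok, $upgraded] = check_password('correct horse battery staple', $stored);
var_dump($ok);
```

The point of the second return value is that a successful login is the only moment you hold the plain-text password, so it is the natural time to recreate the hash if your settings have changed.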

Historical Content

First off, let me just say that I am by no means an expert cryptographer; there are all sorts of wonderful, terrible things about hashes and block ciphers that I just don’t understand (I’d like to believe that it’s because I’ve not been exposed to them, whoever’s fault that is, and that if given a chance I would get it), but that’s also why I’m writing this – to give the opinion of someone who recognizes his own weakness, and how that translates to another’s strength. Furthermore, this explanation gives a very simplistic view of web security that only examines one aspect of a secure system. For loads more information about securing your web application, take a look at “Dos and Don’ts of Client Authentication on the Web” [PDF] written by some very smart folks at M.I.T.

So, let’s start with a beginner’s introduction. In the beginning, there were users, and users wanted to be able to log in because otherwise being a user was rather pointless indeed. Thus, the password is born, and forevermore it becomes the goal of clever crackers and security experts alike. The first problem someone encounters with passwords is how to store them, and that depends very much on a few key factors: Audience, Exposure, and Uniqueness. If you are running a “homegrown” application (shout out to MecTracker) for use only inside the company, containing (in general) zero sensitive data, and you intend to pick users’ passwords for them (preventing the loss of a life password, itself a bad-yet-unavoidable practice), then why not just store them in plain text? Certainly makes it easy to retrieve a password for someone without having to reset it (useful for someone away from their work machine with saved password who needs to log in).

Conversely, if you’re a bank, and you’re storing any of this in plain text, you will be razed to the ground by angry tech-savvy customers and auditors alike, hopefully BEFORE you get grandma and grandpa Jones to type in the password they use for everything else, too. Hopefully, if you’re a bank, you’re using some crazy method I’m not about to describe here.

Then, there’s the middle ground. I, for example, am not a bank (who would’ve guessed? Can someone please notify my ex-girlfriend?), so my needs are much more middle-of-the-road, which is why I’ve settled for hashing. When I started using PHP, I generally stuck to simple MD5 hashes; it was 10 years ago, and breaking MD5 seemed reasonably difficult. Then I was told not to use MD5 because, at 128 bits, it was too weak, and I should be using SHA-1, which was 160 bits. Then came the recommendation for SHA-256 (guess how many bits that one is!), and then whirlpool, and so on. If you’re using a proper password strategy then you’ve been salting all along (I’ll admit I wasn’t in the old days, but you’ve got to be a beginner sometime), but if you haven’t, allow me to give you a word on salt.

“Salting” a password hash is the practice of taking a piece of input data, adding in an extra piece of information (called “salt”; see where this is going?), and hashing that, instead of just hashing the raw input. In fact, with sites that act like a search engine for MD5 and SHA-1 hashes, not salting your input is, for general purpose storage, only one degree of separation away from just storing the data in plain text. Furthermore, good salt will be ever-changing (in this practice, the salt is also known as a ‘nonce’), and can safely be stored without obfuscation, as having included it means that a table not accounting for the nonce is useless, and a table that accounts for the nonce is only good against one of the passwords in your database. Now you’ve just made an attack much more expensive, but that may not be as useful in reality as we’d like to believe.
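To make the idea concrete, here is a small sketch of per-user salting. It is an illustration only: the function name salted_hash and the sample password are made up, and SHA-256 is used purely because it is short to write, not because a fast hash is a good choice for passwords.

```php
<?php
// Illustration only: prepend a per-user salt before hashing.
function salted_hash(string $password, string $salt): string {
    return hash('sha256', $salt . $password);
}

$password = 'hunter2';

// Two users pick the same password but get different random salts.
$saltA = bin2hex(random_bytes(16));
$saltB = bin2hex(random_bytes(16));

$hashA = salted_hash($password, $saltA);
$hashB = salted_hash($password, $saltB);

// The stored values differ, so one precomputed table cannot cover both.
var_dump($hashA === $hashB);

// Verification re-hashes the attempt with the salt stored next to the hash.
var_dump(hash_equals($hashA, salted_hash($password, $saltA)));
```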

MD5 and SHA-1 hashes can be calculated very, very quickly. In fact, it’s generally more expensive to include some data about the current time (for use in salting/as a nonce) than it is to calculate the actual hash. Here is some experimental code to prove my point:

define('ITERATIONS',5);
 
$tt = $th = 0;
for ($j = 0; $j < ITERATIONS; ++$j) {
	$start = microtime(true);
	for ($i = 0; microtime(true) - $start < 1; ++$i) {
		$k = md5($i);
	}
	$tt += (microtime(true) - $start);
	$th += $i;
}
 
var_dump($tt / ITERATIONS, $th / ITERATIONS);

Simply hashing the value of the counter averaged 320,000 hashes per second on my work machine, which is not very powerful, and is certainly not running this in a very optimized way. By changing what is being hashed to the current time to the microsecond, the number of hashes per second is reduced to an average of about 150,000 – in short, the hash is NOT the expensive part of what’s going on here. So, let’s say that, given a more optimized environment but a more expensive dictionary list to be hashed, the average is 200,000 hashes per second, and the dictionary is about 50,000,000 common passwords. Simple math tells you that generating a hash list for this will take about 250 seconds, or less than 5 minutes. If it takes under 5 minutes to generate a table, and only a few seconds from there to query it, then even a salted database of 150,000 users (one table per user) can be fully cracked in about 434 days, or a little over 14 months.

So how can this be combated? Well, strong password guidelines are a good start, but if you’re relying on users to implement password security for you, you’re probably doing it very, very wrong. I’d like to challenge one of the assumptions you’ve probably made that I’ve had to challenge recently, and that is the value of speed; speed is bad. Think about it: using a hash method that can generate a table of fifty million values in under 5 minutes sounds great from a performance perspective, but who are you really helping? Is your user going to notice that your hash method took under 1ms to calculate, or is this performance more likely to benefit someone trying to crack your passwords? Who would be more hurt if your passwords took closer to 12ms to generate and verify, your users or your would-be attacker?

If you haven’t heard of it yet, may I introduce you to Blowfish hashing (the bcrypt scheme, which is built on the Blowfish cipher). Blowfish is designed to scale with Moore’s Law by allowing you, the programmer, to decide how long it takes to generate a hash. This is done by allowing you to specify a number which will be interpreted as a log-base-2 of how many iterations the hashing sequence should take; this metadata is then stored as part of the salt, prepended to the hash, and can be verified by the same function that created it since hashes are of fixed length and will be truncated or padded accordingly. By using a log-base-2 scale, every increment of that number (n) literally doubles the time required to calculate the hash, as it will have to undertake 2^n iterations to generate the password. From what I can gather, a number like 7 or 8 is a fair industry standard at this time, and on my work machine limits the hashes-per-second to around 86.6 and 43.3, respectively.

Now, performance is a factor in real world applications, so let’s pick a cost of 7 (that is, 2^7 = 128 iterations), which as I said allows about 87 hashes per second. At that rate, a single dictionary table (useful for only one user, since we are salting these passwords) takes about six and a half days to generate. For that same database of 150,000 users, it would take over 2,733 years to crack. Of course, computational power will get less expensive as time goes on, and the same number of operations can and will get faster, but with the blowfish algorithm you need only increment the log to double the computational cost, keeping the cracking of your database safely outside the realm of technical feasibility.
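If you want to check the arithmetic yourself, here is a rough sketch. The dictionary size, user count and hash rates are the figures from the text; the function name crack_estimate and the 365-day year are my own simplifications.

```php
<?php
const DICTIONARY_SIZE = 50000000; // candidate passwords per table
const USERS = 150000;             // salted hashes, so one table per user

// Hypothetical helper: how long do the tables take at a given hash rate?
function crack_estimate(float $hashesPerSecond): array {
    $secondsPerTable = DICTIONARY_SIZE / $hashesPerSecond;
    $daysPerTable = $secondsPerTable / 86400;
    return [
        'seconds_per_table' => $secondsPerTable,
        'days_per_table'    => $daysPerTable,
        'years_total'       => $daysPerTable * USERS / 365,
    ];
}

foreach (['fast hash' => 200000, 'blowfish, cost 7' => 87] as $label => $rate) {
    $e = crack_estimate($rate);
    printf(
        "%s: %.0f s per table, %.2f days per table, %.1f years for every user\n",
        $label,
        $e['seconds_per_table'],
        $e['days_per_table'],
        $e['years_total']
    );
}
```

Bumping the cost by one halves the hash rate, so every number in the last row doubles.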

So how does one use the blowfish algorithm in PHP? The crypt() function is your friend! However, the manual is not entirely clear on the implementation details of blowfish, as it does not include one key part (which caused me to tear my hair out a little bit, since, as a Windows user, I was unable to check the man pages for crypt(3)) in any great detail, and that is the log base. When you generate the salt, you will need to prepend it with an instruction string that tells it what kind of hash to generate, and what parameters to use. Furthermore, the salt is not sixteen characters, but sixteen BYTES, and the characters in your salt will be read as a BASE64-style encoded string (crypt’s own alphabet of ./0-9A-Za-z), which means that using characters not allowed in that alphabet will cause the function to revert back to whatever the default is on your system, probably STD_DES or MD5.

All of that information might have seemed a bit hazy, so I’ll include the timing example I used before modified to suit crypt/blowfish. Note also that I am storing the microtime result on every iteration of the for-loop, as in order to give you worst-case scenarios on the cracker’s timetable, I had to give best-case timings on the hashing, and that means as few calls to microtime as possible.

define('ITERATIONS',5);
 
$tt = $th = 0;
for ($j = 0; $j < ITERATIONS; ++$j) {
	$start = microtime(true);
	for ($i = 0; ($z = microtime(true)) - $start < 1; ++$i) {
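		// Pad the salt to the 22 characters blowfish expects instead of relying on crypt() to do it.
		$k = crypt((string)$i, '$2a$07$' . str_pad((string)$z, 22, '0'));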
	}
	$tt += ($z - $start);
	$th += $i;
}
 
var_dump($tt / ITERATIONS, $th / ITERATIONS);

Of paramount importance is the literal string prepended to the stored value. The first four characters, $2a$, simply instruct crypt to use the blowfish algorithm. The next three, 07$, pass the number 7 as our log-base-2 argument, meaning the computation will run for 2^7 iterations. After that, we concatenate our salt (values shorter than 22 characters will be padded in a predictable fashion, and longer than 22 will be truncated) to the argument string and send it off on its merry, 12ms way.
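To see that layout in a real stored value, here is a short sketch that pulls a blowfish hash apart. It uses password_hash() to generate the hash, which in current PHP emits the $2y$ prefix rather than the $2a$ shown above; the sample password and variable names are arbitrary.

```php
<?php
// Generate a blowfish hash with a log-base-2 cost of 7.
$hash = password_hash('s3cret', PASSWORD_BCRYPT, ['cost' => 7]);

// Layout: 4-character algorithm prefix, 2-digit cost plus '$',
// 22-character salt, then the digest, for 60 characters in total.
$algorithm = substr($hash, 0, 4);
$cost      = (int) substr($hash, 4, 2);
$salt      = substr($hash, 7, 22);

var_dump($algorithm, $cost, strlen($salt), strlen($hash));

// crypt() reads the prefix, cost and salt back out of the stored hash,
// so the stored value doubles as the "salt" argument when verifying.
var_dump(hash_equals($hash, crypt('s3cret', $hash)));
```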

Do I think I’ve defeated all the clever crackers out there? Certainly not. However, I’m definitely in a better boat for having stood on the shoulders of giants and listened to people smarter than I am about security. In fact, don’t listen to me, check out these links for more info:

(Victor) Xi Wang talks about salt, nonces and rainbow tables

Matasano Security, LLC, talks about blowfish and why you shouldn’t design your own password protection scheme.

Linked earlier, explains blowfish encryption – very math/pseudocode heavy.

Also linked earlier, the PHP Manual Entry for Crypt()

Happy Hashing!

February 9th, 2010 by Dereleased

Pour Some Syntactic Sugar On Me: ‘Unless’ Keyword

Let’s face it, syntactic sugar can be a very attractive feature for a language (I consider Perl to be an extremely powerful language composed almost entirely of syntactic sugar), and I think it’s about time we all started demanding the “Unless” Keyword as a counterpart to the “If” Keyword. Let me give you a pretty common example:

if (    !ctype_digit($_POST['quantity']) 
    ||  !preg_match('/one of the billions of email address validators/',$_POST['email']) 
    ||  $_POST['password1'] != $_POST['password2'] 
    ||  !my_validation_routine($_POST['Im_Running_Out_Of_Examples'])
) {
    die('Bad Data');
}

Quick-And-Dirty form validation. Of course, one would normally want to check these separately so that meaningful error messages could be dumped, but let’s assume for a moment that this isn’t the case for this project. I’m not saying this is a life or death matter. I’m not saying the above code doesn’t work or anything like that. What I am saying is that, from a readability/logical perspective, it would make sense to have an unless keyword to transform the above sequence from a bunch of or-not ideas into and-must-be ideas. Example!

unless (    ctype_digit($_POST['quantity']) 
        &&  preg_match('/one of the billions of email address validators/',$_POST['email']) 
        &&  $_POST['password1'] == $_POST['password2'] 
        &&  my_validation_routine($_POST['Im_Running_Out_Of_Examples'])
) {
    die('Bad Data');
}

I realize some people don’t or won’t care about this, and that’s fine. It’s not for them. I like to think in code, and it would make code-thought to normal-thought a bit more one-to-one if I could think in terms of “Unless a is valid and b is valid and c is valid return false” instead of “If a is not valid or b is not valid or c is not valid return false”. C++ Programmers are lucky in this regard, as they are 1 macro away from having this keyword:

#define unless(cond) if(!(cond))
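PHP has no preprocessor, so the nearest approximation I can sketch is to group the and-must-be conditions in one place and negate them once. The function name is_valid_order and the sample data are hypothetical, and filter_var() stands in for the email regex.

```php
<?php
// Hypothetical validator: every condition is stated positively.
function is_valid_order(array $post): bool {
    return ctype_digit($post['quantity'])
        && filter_var($post['email'], FILTER_VALIDATE_EMAIL) !== false
        && $post['password1'] === $post['password2'];
}

$post = [
    'quantity'  => '3',
    'email'     => 'user@example.com',
    'password1' => 'abc',
    'password2' => 'abc',
];

// Reads as "unless the order is valid": a single negation up front.
if (!is_valid_order($post)) {
    die('Bad Data');
}
echo "OK\n";
```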

In conclusion, I know it doesn’t really change anything, I know some people don’t care, but I want it. I guess it’s time to start maintaining my own PHP distro…

January 27th, 2010 by Dereleased

Arrays of Objects and __get: Friends Forever

In PHP, an object is always passed around as a reference, which allows one to deal with objects in a very transparent manner, since the only way to deal with a by-value copy instead of the real deal is to explicitly use the clone operator. Recently, I came upon a situation in which it was very useful for me to have an array of objects inside an object; the scenario was somewhat simple, a parent object can contain an indefinite number of children, and in order to have easy access to them I created a lazy loading property to contain them all as an array, indexed by their unique IDs. Of course, setting the stage for that is a bit more complicated than is needed for this example, so here is an extremely minimal example:

class foo {
	private $bar = array();
 
	public function __construct() {
		$this->bar[0] = new stdClass;
	}
 
	public function __get($n) {
		return $this->bar;
	}
}

So now we have a simple object with an array whose single element is an instance of PHP’s default object, stdClass. In reality you’d likely have more than just one element to the array, but it’s not necessary here to prove the point. Now, since objects are always returned by reference, accessing the first index of the array returned by __get when you try to access any member will allow you unfettered access to the contents of the object, to do with what you will (or rather, what the object will allow you to do).

With that in mind, let’s examine this:

$foo = new foo;
$foo->bar[0]->baz = 'I am a test';

This code is pretty easy to follow, and in fact does exactly what you’d expect: the stdClass object sitting in the first element of the “bar” array has a new member, “baz”, defined and assigned. Viewing the contents of the object will show that this is exactly what happened:

object(foo)#1 (1) {
  ["bar":"foo":private]=>
  array(1) {
    [0]=>
    object(stdClass)#2 (1) {
      ["baz"]=>
      string(11) "I am a test"
    }
  }
}

However, there’s a problem. Somewhere along the line, we generated a notice:

Notice: Indirect modification of overloaded property foo::$bar has no effect in …

While the notice certainly won’t halt the script’s execution, and the expected (and desired) action has taken place with no other apparent side effects, we are left with the conundrum of what to do with this notice (Note: While this issue has been brought to the attention of the PHP team, no word of a fix has yet surfaced). Since I am a firm believer that Notices and Warnings are potentially more dangerous than Fatal Errors, I won’t simply turn off error reporting; indeed, since the errors are still raised that doesn’t completely fix the small performance hit of generating the error, either.

In order to address this issue, it is important to understand what the notice is trying to tell us.  Once upon a time, __get was a return-by-reference function by default.  Of course, this doesn’t really help with wanting to prevent the modification of an object’s internal data, so __get was corrected to always return by value; in fact, even objects are “returned by value” in this case, since the value of the member variable is being returned (which just happens to also be a reference to an object), whereas the old __get would have returned a reference to the member variable itself; while the difference may seem subtle, it is monumental.  Since this change occurred, it was important to notify coders that if they attempted to modify the contents of an array element which came from an overloaded array, this action would have no effect, as the modified element would only exist in the copy returned from __get.
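The copy semantics are easy to see without triggering the notice at all. In this sketch (the class name Holder and its contents are made up for the demonstration), the array that comes back from __get is modified and the original is left untouched:

```php
<?php
class Holder {
    private $items = array('a');

    public function __get($name) {
        // Arrays are returned by value, so the caller gets a copy.
        return $this->items;
    }
}

$holder = new Holder;

$copy = $holder->items; // goes through __get because $items is private
$copy[] = 'b';          // modifies only the copy

var_dump(count($copy));          // int(2)
var_dump(count($holder->items)); // int(1)
```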

Armed with this knowledge of history, we have a few obvious options for solving this problem

  1. public function & __get($n).  This will technically prevent the warning from coming up, but if you’re going to go this route you might as well just declare all your member variables as public anyway, as this is what it will effectively cause __get to do.  It opens the door to such dangerous situations as:
    $foo->bar = 3;

    That’s right, if you return by reference explicitly in __get, then you will circumvent any rules you’ve set for assignment via __set. Even objects are not immune to this, as a reference to the member variable (itself containing a reference) will be returned. This option removes the efficacy of even having visibility operators for anything you intend to provide overloaded access to.

  2. Assign a variable to the contents of the array element. Again, technically, this works, but it is messy, inelegant, and is nowhere near the ideal. Here are two examples:
    $bar = $foo->bar;
    $bar[0]->baz = 'This works';
    ###
    $bar = $foo->bar[0];
    $bar->baz = 'This also works';

    Again, though, this is not the clean, simple approach we were looking for to begin with.

  3. Just turn off notices. Nah, we ain’t doin’ that.

So what’s left to consider? After thinking about the problem for a little while, I realized that this problem wouldn’t even exist if I could just store the array as an object instead, but objects don’t allow numerical indices, so it would take a little jury-rigging to get it to work. Here was the first version:

class arrayReference {
	private $_ = array();
 
	public function __set($n, $v) {
		$this->_[$n] = $v;
	}
 
	public function __get($n) {
		if (array_key_exists($n, $this->_)) {
			return $this->_[$n];
		}
		$this->_[$n] = null;
		return $this->_[$n];
	}
 
	public function __call($n, $a) {
		if ($n == 'array') {
			return $this->_();
		}
	}
 
	public function _() {
		return $this->_;
	}
}
 
class foo {
	private $bar = null;
 
	public function __construct() {
		$this->bar = new arrayReference;
		$this->bar->{0} = new stdClass;
	}
 
	public function __get($n) {
		return $this->bar;
	}
}
 
$foo = new foo;
$foo->bar->{1} = $foo;
$foo->bar->{99} = new stdClass;
$foo->bar->{99}->baz = 33;

Which, for the adjusted syntax, actually worked out pretty well. It might take more than an instant glance from your average PHP coder for what’s going on to make sense, or even seem syntactically correct, but it certainly worked; it even allowed for loop-based iteration by doing something like so:

foreach($foo->bar->array() as $k => $v)

While that isn’t ideal, it’s fairly transparent about what it’s doing.

I wish there were a more climactic way to put this, but there isn’t: The next step involved me trying to combine the SPL’s ArrayObject built in class to allow natural array access to my wrapper class, and after a few minutes playing with my new hideous child-beast amalgamate and its Reflection, I finally settled on this for the final version of the class:

class foo {
	private $bar = null;
 
	public function __construct() {
		$this->bar = new ArrayObject;
		$this->bar[0] = new stdClass; // same seed element as the first example
	}
 
	public function __get($n) {
		return $this->bar;
	}
}

No more messy syntax, no compromises, no hideous amalgamate beasts, and no figuring out how to mangle my behemoth class this lesson actually needed to be applied to in order to extend ArrayObject for the purposes of accessing just one property, as I saw advocated elsewhere during the googling portion of my problem solving routine. The example I first gave? Works just fine, and no error since the property being returned is an object, not an array. Sometimes the best solution is fiendishly simple; the only real consideration I had to make here was that, in its actual application, the array in question was declared null so it could be lazy loaded, and since you can’t use the “new” keyword or even type-hinting in class member declarations, I had to be careful to make sure the lazy loading mechanism would still work, but I was never declaring a traditional array either: all in all, a 5 minute job to implement and test.
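For the lazy-loaded case, a minimal sketch might look like this. The class name ParentRecord, the property name children and the sample ID are all hypothetical; a real version would fill the ArrayObject from storage where the comment indicates.

```php
<?php
class ParentRecord {
    private $children = null; // stays null until first access

    public function __get($name) {
        if ($name === 'children') {
            if ($this->children === null) {
                // A real implementation would load the children here.
                $this->children = new ArrayObject;
            }
            return $this->children;
        }
        return null;
    }
}

$parent = new ParentRecord;

// Array syntax works because ArrayObject implements ArrayAccess, and no
// notice is raised because __get hands back an object, not an array.
$parent->children[42] = new stdClass;
$parent->children[42]->name = 'first child';

foreach ($parent->children as $id => $child) {
    echo $id, ' => ', $child->name, "\n";
}
```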

Five minutes that made the past day or so of work seem rather silly indeed.

January 11th, 2010 by Dereleased

Did You Know? Class Visibility in PHP

While it remains an imperative-style language, since version 5 PHP’s object model has gotten significantly more sophisticated. While in PHP 4 objects were little more than arrays with functions, the newer versions have most of the trimmings of modern OOP. Among those, probably considered a basic triviality at best, is member visibility. In fact, since PHP will assume that, if no visibility modifier is supplied, a method is public, it’s possible to write a class without ever thinking about its visibility.

Of course, if you’ve done anything much with OOP you can understand exactly why visibility is important, and why you’d want public-facing getters and setters to do your bidding (in most cases) instead of directly modifying members; heck, it’s about the closest we’re going to get to strong-typing in PHP. However, there is one very important “Gotcha!” in PHP’s visibility model: it is enforced at the class-level, not at the instance level.

This means, quite simply, that instances of the same class can call each other’s private/protected methods. I’m not here to decry this and tell you how to avoid it, because it’s here to stay so far as we can tell. I am here to embrace and, yes, evangelize it. Consider the case of the “User” object — that is, an object to represent the properties of a user. In most sufficiently complicated systems, a user will have to have permissions, which may even apply to different tiers of the application, or smaller object subdivisions, groups, etc – such a discussion is beyond the scope of this article – and I happen to already have just such an object in mind, as it is what is currently occupying most of my days at work.

Now, we have a user object, and somehow this object represents permissions. In my case, the user object will not always need to load its permissions to perform a task, and because of this the permissions (as well as anything else that doesn’t reside in the main table) are lazy-loaded the first time they are accessed via the getter. The block of code to do this, of course, is abstracted away, so any time I might need user permissions, I just call this:

$this->lazyLoadPermissions();

And this function, declared as private, takes care of checking to see if permissions are loaded yet, and if not, it loads them. When the user object is stored in the session at the end of the script’s execution, the permissions are not saved with the rest of the data (thank you __sleep()), and the cycle begins anew the next time I try to access this data.
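As a rough sketch of how those pieces fit together (the class name SessionUser, its properties and the hard-coded permission list are stand-ins, not my production code):

```php
<?php
class SessionUser {
    private $id;
    private $name;
    private $permissions = null; // lazy-loaded, never persisted

    public function __construct($id, $name) {
        $this->id = $id;
        $this->name = $name;
    }

    private function lazyLoadPermissions() {
        if ($this->permissions === null) {
            // Stand-in for the real database lookup.
            $this->permissions = array('read', 'write');
        }
    }

    public function getPermissions() {
        $this->lazyLoadPermissions();
        return $this->permissions;
    }

    public function permissionsLoaded() {
        return $this->permissions !== null;
    }

    // Only the listed properties are serialized, e.g. into the session.
    public function __sleep() {
        return array('id', 'name');
    }
}

$user = new SessionUser(7, 'alice');
$user->getPermissions(); // triggers the lazy load

// Round trip, as the session handler would do between requests.
$restored = unserialize(serialize($user));
var_dump($restored->permissionsLoaded()); // bool(false)
```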

So where does this neat visibility functionality come into play? One of the tasks I need to perform now and again is determining whether or not a particular user can claim “authority” over another user. Which is to say, just because I have user management privileges doesn’t mean I should be able to disable a user who is flagged as the owner of an object equal-to or higher-than my own permissions extend. So, to solve this problem, I wrote a simple function called hasAuthorityOver which accepts an instance of the user object as a parameter, and then tries to compare their permissions. Here are the first few lines:

	public function hasAuthorityOver(User_Object $subject) {
		$this->lazyLoadPermissions();
		$subject->lazyLoadPermissions();

As you can see, since the visibility is enforced at class level, and both of these objects are instances of User_Object, it is possible to call private methods on other objects, allowing you some small amount of flexibility in maintaining object integrity; of course, if you aren’t expecting this functionality, it can be jarring, but since it’s not going to change it’s worth embracing for the sake of making life easier.
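Here is a self-contained sketch of the same idea you can run yourself. The class name Account and the numeric level are simplifications I made up; the point is only that the private method is reachable from another instance of the same class and from nowhere else.

```php
<?php
class Account {
    private $level;

    public function __construct($level) {
        $this->level = $level;
    }

    // Private: not callable from outside the class...
    private function getLevel() {
        return $this->level;
    }

    // ...but callable on ANY instance from code inside the class.
    public function hasAuthorityOver(Account $subject) {
        return $this->getLevel() > $subject->getLevel();
    }
}

$admin = new Account(10);
$clerk = new Account(2);

var_dump($admin->hasAuthorityOver($clerk)); // bool(true)
var_dump($clerk->hasAuthorityOver($admin)); // bool(false)
```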

Update! A new example enters the arena!

Another application of this technique might work as follows. Suppose you have an object Foo, which represents some arbitrary resource in a hierarchy, and Foo can have children which are also of type Foo. You might have something resembling this:

class Foo {
    private $children = array();
 
    public function setup(Foo $parent /* , ... */) {
        // ...
        $parent->addChild($this);
    }
 
    protected function addChild(Foo $child) {
        // ...
    }
}

In this case, while the addChild method is hidden from public access, potential children may call their parent’s addChild method in order to add themselves in one easy step. This solves a problem of granting visibility between certain classes while still restricting it from the public scope, as you may elect to provide a “common ancestor” in order to allow objects to cross class boundaries. Food for thought!
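A sketch of the “common ancestor” variant, with hypothetical class names (Node, Folder, File): the protected method is declared once in the ancestor, and a sibling class can call it on an instance of a different subclass.

```php
<?php
abstract class Node {
    private $children = array();

    // Declared in the common ancestor, hidden from public scope.
    protected function addChild(Node $child) {
        $this->children[] = $child;
    }

    public function countChildren() {
        return count($this->children);
    }
}

class Folder extends Node {
}

class File extends Node {
    public function attachTo(Node $parent) {
        // Allowed: addChild() is declared in Node, which File also extends.
        $parent->addChild($this);
    }
}

$folder = new Folder;
$file = new File;
$file->attachTo($folder);

var_dump($folder->countChildren()); // int(1)
```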

December 3rd, 2009 by Dereleased