The importance of ZVals and Circular References

Just a quick post for now. Do you know how PHP’s symbol table works? To put it in a nutshell, symbols are stored in one place and values (also called ZVals) are stored in another. Normally, this abstraction will mean nothing to you, but take the following sample code:

$foo = &$bar;
$bar = &$foo;

Pretty basic circular reference, and one that might be pretty difficult to assign in a few other languages. Now what? Well, let’s take a look at another reference construct for a moment.

$a = 'foo';
$b = 'bar';
$x = &$a;
$y = &$x;
$z = &$y;
 
var_dump($x, $y, $z);
/*
string(3) "foo"
string(3) "foo"
string(3) "foo"
*/

Pretty much what we expected. Now, let’s throw a wrench into the mix and reassign $y by reference to &$b, and then examine the results:

$y = &$b;
 
var_dump($x, $y, $z);
/*
string(3) "foo"
string(3) "bar"
string(3) "foo"
*/

Only the value of $y changed! That is because assigning by reference from a name that is itself a reference binds the new name directly to the same ZVal instead of creating a reference chain, so $x and $z kept pointing at the value of $a when $y was rebound. This is one significant way in which PHP references are NOT pointers: they’re never more than one layer deep. Let’s go back to our original example and assign a value to one of those variables:

$foo = 3;
 
var_dump($foo, $bar);
/*
int(3)
int(3)
*/

Works like a charm! This is because both references pointed at the same location in the ZVal table. But what if we start over again, and reassign $foo by reference to something else?

$foo = &$bar;
$bar = &$foo;
$baz = 'baz';
 
$foo = &$baz;
 
var_dump($foo, $bar);
/*
string(3) "baz"
NULL
*/

If you’ve been following along, this should make perfect sense. The first statement creates both $foo and $bar and points them at a single ZVal, which holds null because neither name had a value yet. The second statement changes nothing, since both names already share that ZVal. When $foo is reassigned by reference, the only thing that changes is which ZVal $foo points to, so $bar is left holding null. If we had assigned a different value to $foo first, then $bar would still retain that value.
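To check that last claim, here is a small sketch that repeats the setup but writes a value through $foo before rebinding it. It uses only the variable names from the example above.

```php
<?php
// Bind both names to one shared value (initially NULL).
$foo = &$bar;
$bar = &$foo;

// Write through one name; both names see the new value.
$foo = 3;

// Rebind $foo to a different value; $bar keeps the old one.
$baz = 'baz';
$foo = &$baz;

var_dump($foo, $bar);
/*
string(3) "baz"
int(3)
*/
```

Rebinding $foo only moves that one name, so $bar still holds the 3 that was written earlier.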

While we’re on the topic of ZVals, I’ll mention just one more thing. PHP uses a lazy-copying (or, copy-on-write) mechanism, thanks to the ZVal table. Consider the following code:

$foo = str_repeat('x',100000);
$mem1 = memory_get_usage();
$bar1 = $bar2 = $bar3 = $bar4 = $bar5 = $bar6 = $foo;
$mem2 = memory_get_usage();
$bar1 .= "...";
$mem3 = memory_get_usage();

I leave the calls to memory_get_usage() in place so that their effects will be more obvious. If we dump those three values, we get 426040, 426408 and 526536, respectively. In the second phase, as you can see, memory usage only increased by 368 bytes (and that includes the memory needed to store the first measurement in $mem1). During the third phase, when one variable was altered, memory usage increased by 100128 bytes. PHP uses about 24 bytes of memory to make an entry into the symbol table, and 80 more to create a null entry in the ZVal table.
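If you want to reproduce the measurement, the sketch below is the same script with the two differences printed directly. The names $assignCost and $writeCost are mine, added for illustration. The exact byte counts depend on your PHP version and platform, so expect the same pattern rather than the same numbers as above.

```php
<?php
$foo = str_repeat('x', 100000);
$mem1 = memory_get_usage();

// Six by-value assignments: no string data is copied yet.
$bar1 = $bar2 = $bar3 = $bar4 = $bar5 = $bar6 = $foo;
$mem2 = memory_get_usage();

// The first write to $bar1 forces it to get its own copy.
$bar1 .= "...";
$mem3 = memory_get_usage();

$assignCost = $mem2 - $mem1; // cost of the six assignments
$writeCost  = $mem3 - $mem2; // cost of the single write

printf("six assignments: %d bytes\n", $assignCost);
printf("one write:       %d bytes\n", $writeCost);
```

With the numbers quoted above, these work out to 426408 - 426040 = 368 bytes and 526536 - 426408 = 100128 bytes.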

So, the next time you think about passing a parameter you don’t intend to modify to a function by reference in order to save memory, or returning one for the same reason, don’t worry about it so much; it’s only 24 bytes.
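The same check can be made for a function parameter. This is a rough sketch, and usageInsideCall() is a hypothetical helper I made up for the purpose: it takes a large string by value, only reads it, and reports memory usage from inside the call.

```php
<?php
// Hypothetical helper: receives $text by value and never writes to it.
function usageInsideCall($text)
{
    $length = strlen($text); // read-only use of the argument
    return memory_get_usage();
}

$big = str_repeat('x', 100000);

$before = memory_get_usage();
$inside = usageInsideCall($big);

// If passing by value copied the string, this would be 100000 or more.
printf("extra memory during the call: %d bytes\n", $inside - $before);
```

Because the function never modifies its argument, no second copy of the 100000-byte string should be needed while the call is running.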

Update!

In my travels, I have learned much since writing this. There are actually a lot of reasons not to use references in most cases, because they can increase your memory usage. Because PHP uses copy-on-write, it must track which symbols point at a value by reference and which do not. Essentially, any number of symbols of one kind (all by-value, or all by-reference) can share a value without a copy being made until one of them is written to, but as soon as a symbol of the other kind is added, a copy is made immediately.

$a = "123456";
$b = $a; // this initializes the symbol $b, but doesn't create a second value
$c = &$b; // this immediately splits off a new value for $b and $c to reference

References actually cost us memory in this case, because of the split required, and the same is true in reverse.

$a = "123456";
$b = &$a;
$c = $b;

Since $c was assigned by value, not by reference, a new value had to be created for that symbol to reference.
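Both cases can be measured with a larger string so that any split becomes visible. This is only a sketch for checking your own setup: the variable names beyond $a, $b and $c are mine, and whether a full copy shows up depends on the PHP version you run. Newer engines may be able to share the string data in both cases, so I give no expected output.

```php
<?php
// Case 1: share by value first, then take a reference to the copy.
$a = str_repeat('x', 100000);
$m0 = memory_get_usage();
$b = $a;   // by value: shares the existing value
$m1 = memory_get_usage();
$c = &$b;  // by reference: may force $b off the shared value
$m2 = memory_get_usage();

// Case 2: take a reference first, then assign by value.
$d = str_repeat('y', 100000);
$m3 = memory_get_usage();
$e = &$d;  // by reference: $d and $e share one value
$m4 = memory_get_usage();
$f = $e;   // by value: may need a value of its own
$m5 = memory_get_usage();

printf("case 1: by value %d bytes, then by reference %d bytes\n", $m1 - $m0, $m2 - $m1);
printf("case 2: by reference %d bytes, then by value %d bytes\n", $m4 - $m3, $m5 - $m4);
```

A jump of roughly 100000 bytes in the second figure of either line means a split happened at that step.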

In summary, you should probably be avoiding references unless they are actually the solution to a programming problem you are having. They are not useful for saving memory, and will end up costing you more in the long run.
