PHP Quirks – String manipulation by offset

Just a quick note on a mild PHP quirk/annoyance I noticed recently while manipulating strings by character offset.

Say you have a string, such as 'abcde', and suppose you want to check the value of the third character (at index 2). You might do something like this:

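$str = 'abcde';
// Square brackets are the portable spelling; $str{2} was deprecated in PHP 7.4 and removed in 8.0
if ($str[2] === 'c') {
  // do something...
}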

And, of course, that’s all well and good: it does what you expect and you can move on with your life. In fact, if you’re into micro-optimizations, wrapping that construct in isset() is a handy way to check a string for a minimum length, and is, on average, 44% faster than using strlen(). However, you can also use the same construct to change the character at a given offset in the string you’re working with. It works roughly as expected, but with a few gotchas:

/**
 * Gotcha #0 - Assigning multiple characters to a single offset; shouldn't really be a gotcha
 */
$str = 'abc123';

$str[1] = 'a'; // aac123
$str[4] = '123'; // aac113 (only the first character, '1', is used)

/**
 * Gotcha #1 - Adding characters past the end of the string
 */
$str[7] = 'c'; // aac113 c
echo ord($str[6]); // prints '32', the space character

/**
 * Gotcha #2 - Adding characters to an empty string
 */
$str = '';
$str[0] = 'a'; // array( 0 => 'a' ) before PHP 7.1; the string 'a' from 7.1 on

In Gotcha #1, we see that, rather than leaving the “uninitialized” gap between the old end of the string and the newly written character as null characters, PHP silently fills it with spaces. Arguably, this is so that an isset($str[6]) check would not return false, but it is important to know if you expected those bytes to remain at zero.
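To make the padding behaviour concrete, here is a minimal sketch that combines it with the isset() minimum-length idea mentioned earlier. The helper name has_min_length is hypothetical, chosen only for illustration, and the scalar type declarations assume PHP 7.0 or later.

```php
<?php
// Hypothetical helper (not a PHP built-in): true when $s holds at least $min bytes.
// It relies on isset() reporting whether a given string offset exists.
function has_min_length(string $s, int $min): bool
{
    return $min <= 0 || isset($s[$min - 1]);
}

$str = 'aac113';   // 6 bytes, offsets 0..5
$str[7] = 'c';     // write two bytes past the end

var_dump(has_min_length($str, 8)); // bool(true): offset 7 now exists
var_dump(isset($str[6]));          // bool(true): the gap counts as set
var_dump($str[6] === ' ');         // bool(true): filled with a space, not "\0"
```

Note that isset() only reports whether the offset exists; it says nothing about what the byte contains.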

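In Gotcha #2, we see PHP’s weak typing at work: since an empty string has no offsets to begin with, an attempt to add a character results in silent conversion to an array. (This applies to PHP versions before 7.1; from 7.1 onward, an empty string is treated like any other string here.)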
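One way to sidestep both gotchas is to grow the string explicitly before writing to it, so the offset assignment never touches an empty string or an out-of-range offset. The sketch below is only one possible approach: set_byte is a hypothetical helper name, the NUL default for the pad byte is an assumption about what the caller wants, and the scalar type declarations assume PHP 7.0 or later.

```php
<?php
// Hypothetical helper: write one byte at $offset, padding any gap with $pad
// (NUL by default) instead of relying on PHP's implicit space padding.
function set_byte(string $s, int $offset, string $char, string $pad = "\0"): string
{
    if ($offset < 0 || $char === '' || strlen($pad) !== 1) {
        throw new InvalidArgumentException('need offset >= 0, a non-empty $char and a 1-byte $pad');
    }
    if ($offset >= strlen($s)) {
        // Grow the string up front; this also covers the empty-string case.
        $s = str_pad($s, $offset + 1, $pad);
    }
    $s[$offset] = $char[0]; // only the first byte of $char is written
    return $s;
}

var_dump(set_byte('', 0, 'a'));               // string(1) "a"
var_dump(bin2hex(set_byte('aac113', 7, 'c'))); // string(16) "6161633131330063"
```

Because the string is passed by value and returned, the caller's original variable is left untouched unless it is reassigned.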
