Generators in PHP 5.5

Now that PHP 5.5 alpha versions are being released, I decided to grab the latest PHP source from GitHub, build it and give the new Generators feature a spin. I have used generators in the past in Python, and was excited to hear they are coming to PHP. While they are mostly useful in advanced use cases, they can make a lot of simple use cases much more efficient, and I think they are a handy addition to the advanced PHP programmer’s toolbox.

What are Generators?

I like to describe Generators as special functions which are iterable and maintain state. Think of a function that, instead of returning once and destroying its state (local variables) after returning, can return multiple times while keeping its local variables intact, thus allowing iteration over an instance of that function’s state. In fact, a call to a generator function creates a special Generator object which can be iterated. The object maintains the internal state of the generator, and on each iteration generates a new value. The same result can be achieved by writing a class that implements the Iterator interface, but a generator gets there with much less code.

This is very different from the way we are used to thinking of functions, so maybe an example is the best way to demonstrate this. I will use a simplified example based on the one given in the documentation:


function xrange($start, $end, $step = 1)
{
  for ($i = $start; $i <= $end; $i += $step) {
    yield $i;
  }
}

$start = microtime(true);
foreach (xrange(0, 1000000) as $i) {
  // do nothing
}
$end = microtime(true);

echo "Total time: " . ($end - $start) . " sec\n";
echo "Peak memory usage: " . memory_get_peak_usage() . " bytes\n";

In the example above, the xrange function is a Generator which works as a simplified version of PHP’s range() function (just like in Python!). The main thing to notice is the yield keyword – this tells the function to yield a value, which means a value is “returned” but the state of the generator is maintained.

When iterating over a generator function, as you can see in the foreach loop, iteration continues as long as a value is yielded. Once the function returns without yielding (as xrange in our example would do once the inner for loop is done), iteration stops. We get a behaviour which is (almost) equivalent to range in the sense that it allows us to iterate over numbers – but, without allocating the entire array of numbers in advance. In our example, we save a lot of memory and in fact execution is faster when a generator is used.

To demonstrate, here is the output of the script above (ok, I added some formatting to the output, but the results are real!):

$ /usr/local/bin/php /tmp/with-generators.php
Total time: 0.20149302482605 sec
Peak memory usage: 234,256 bytes

This is on a one-million integers “array” (unlike range, no real array is allocated so we can’t do random access on members, but during iteration it behaves just like an array).

By comparison, executing the same code with range() instead of xrange(), results in the following:

$ /usr/local/bin/php /tmp/without-generators.php
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 32 bytes) in /private/tmp/generators.php on line 12

Ok, we reach our memory limit. Let’s try to go crazy (not a good idea in production):

$ /usr/local/bin/php -d memory_limit=200M /tmp/without-generators.php
Total time: 0.31754398345947 sec
Peak memory usage: 144,617,256 bytes

After increasing the memory limit to 200 MB, the script runs, but it takes longer (honestly, to my surprise) and consumes over 600 times more memory (about 144 MB versus about 234 KB).

Pretty cool, huh?

Just to demonstrate, calling var_dump on a generator would result in this:


var_dump(xrange(0, 100));
// Output:
// object(Generator)#2 (0) {
// }
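
Since a generator is an ordinary object, it can also be stepped through by hand instead of with foreach. The following is a minimal sketch, assuming a hypothetical countdown() generator that is not part of the original example; it only uses the valid(), current() and next() methods of the Generator class, and is meant to show that the local variable survives between yields.

```php
<?php
// Hypothetical generator used only for illustration: counts down from $from to 1.
function countdown($from)
{
    for ($i = $from; $i >= 1; $i--) {
        yield $i; // execution pauses here; $i is kept until the next step
    }
}

$gen = countdown(3);

$seen = array();
// Roughly what foreach does for us: check valid(), read current(), call next().
while ($gen->valid()) {
    $seen[] = $gen->current();
    $gen->next();
}

echo implode(', ', $seen) . "\n"; // prints "3, 2, 1"
```

Once the for loop inside countdown() finishes, valid() returns false and the while loop ends, which is the same condition that stops a foreach.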

But I can do the same thing with Iterator interfaces, no?

Yes! Pretty much anything you can do with Generators can be done by creating a class which implements either the Iterator or IteratorAggregate interface. But in many cases, a lot of boilerplate code can be removed if a Generator is used instead. For example, a class equivalent to the xrange generator above would look like this:


class XrangeObject implements Iterator
{
  private $value = 0;
  private $start = 0;
  private $end   = 0;
  private $step  = 1;

  public function __construct($start, $end, $step = 1)
  {
    $this->value = (int) $start;
    $this->start = (int) $start;
    $this->end   = (int) $end;
    $this->step  = (int) $step;
  }

  public function rewind()
  {
    $this->value = $this->start;
  }

  public function current()
  {
    return $this->value;
  }

  public function key()
  {
    return $this->value;
  }

  public function next()
  {
    return ($this->value += $this->step);
  }

  public function valid()
  {
    return $this->value <= $this->end;
  }
}

$start = microtime(true);
$xrange = new XrangeObject(0, 1000000);
foreach ($xrange as $i) {
  // do nothing
}
$end = microtime(true);

echo "Total time: " . ($end - $start) . " sec\n";
echo "Peak memory usage: " . memory_get_peak_usage() . " bytes\n";

Wow, that’s much more code for something we achieved very simply with a generator. BTW, the results are:


$ /usr/local/bin/php /tmp/with-iterator.php
Total time: 0.61971187591553 sec
Peak memory usage: 240,968 bytes

As you can see, memory usage is comparable to a Generator. Run time is about 3 times slower, but in most realistic use cases this difference is negligible – short of an order-of-magnitude difference, performance is not a major issue here. The interesting thing really is the amount of boilerplate code we had to write when creating an iterator – most of this code is just generic boring stuff and not what we really care about. With Generators, the implementation is much shorter.
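
The two approaches can also be mixed. The sketch below assumes a hypothetical NumberRange class (the name and layout are illustrative, not from the documentation) that implements IteratorAggregate and writes getIterator() as a generator. This works because a Generator object is a Traversable, so most of the Iterator boilerplate disappears while the range is still a regular class.

```php
<?php
// Hypothetical class for illustration: a range object backed by a generator.
class NumberRange implements IteratorAggregate
{
    private $start;
    private $end;
    private $step;

    public function __construct($start, $end, $step = 1)
    {
        $this->start = (int) $start;
        $this->end   = (int) $end;
        $this->step  = (int) $step;
    }

    // The line below is a plain comment on older PHP versions; on PHP 8.1+
    // it is an attribute that silences the missing-return-type deprecation.
    #[\ReturnTypeWillChange]
    public function getIterator()
    {
        // Because this method contains yield, calling it returns a Generator.
        for ($i = $this->start; $i <= $this->end; $i += $this->step) {
            yield $i;
        }
    }
}

$range  = new NumberRange(0, 10, 5);
$values = array();

// Each foreach calls getIterator() again, so the object can be looped over twice.
foreach ($range as $n) {
    $values[] = $n;
}
foreach ($range as $n) {
    $values[] = $n;
}

echo implode(' ', $values) . "\n"; // prints "0 5 10 0 5 10"
```

Unlike a bare generator, which can only be consumed once, this object hands out a fresh generator for every loop.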

How about a realistic use case?

Ok, so we have used a generator to iterate over numbers. Woopti-doo. We can just drop the generator and use the for loop inside it to achieve the same thing. How about a more realistic use case?

Take a look at the following example, which I believe can be pretty useful and still has fairly straightforward code: a generator which combines the efficiency of XMLReader with the simple API of SimpleXML to bring you an efficient yet easy to use XML reader function for possibly large XML streams with a repeating structure – for example, RSS or Atom feeds.


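function xml_stream_reader($url, $element)
{
  $reader = new XMLReader();
  // open() returns false on failure; fail loudly instead of silently yielding nothing
  if (! $reader->open($url)) {
    throw new RuntimeException("Unable to open XML stream: $url");
  }

  while (true) {
    // Skip to the next element with the requested name; leave both loops at end of stream
    while (! ($reader->nodeType == XMLReader::ELEMENT && $reader->name == $element)) {
      if (! $reader->read()) break 2;
    }

    // Load just this element into memory and hand it over as a SimpleXMLElement
    yield simplexml_load_string($reader->readOuterXml());
    // Jump past the element's subtree to the next node
    $reader->next();
  }

  $reader->close();
}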

The xml_stream_reader() generator defined above will use XMLReader to open and read from an XML stream. Unlike PHP’s SimpleXML or DOM extensions, it will not read an entire XML document into memory, thus avoiding potential blowups on very large XML files. To keep things simple for the user however, whenever it encounters the XML element searched by the user (e.g. the item element in RSS feeds), it will read the entire element into memory (assume each item is small but there are potentially thousands of items) and return it as a SimpleXMLElement object – thus still providing the ease of use of SimpleXML for the consumer.

Here is how it can be used:


$feed = xml_stream_reader('http://news.google.com/?output=rss&num=100', 'item');
foreach($feed as $itemXml) {
  echo $itemXml->title . "\n";
}

I couldn’t find a large enough XML file to test this on, but even with 2 MB files this approach can be much more efficient than DOM or SimpleXML, without much extra code.
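
The same pattern is not limited to XML. As a rough sketch, assuming a hypothetical read_lines() helper and a throwaway temporary file (neither comes from the example above), a generator can hand out the lines of a plain text file one at a time without loading the whole file into memory:

```php
<?php
// Hypothetical helper for illustration: yields one line at a time, without line endings.
function read_lines($path)
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    try {
        // Only the current line is held in memory, however large the file is.
        while (($line = fgets($handle)) !== false) {
            yield rtrim($line, "\r\n");
        }
    } finally {
        // Runs when the loop finishes or the generator is destroyed early.
        fclose($handle);
    }
}

// Build a small sample file so the sketch can run on its own.
$path = tempnam(sys_get_temp_dir(), 'gen');
file_put_contents($path, "first\nsecond\nthird\n");

$lines = array();
foreach (read_lines($path) as $number => $line) {
    // Keys are assigned automatically, starting at 0.
    $lines[] = ($number + 1) . ': ' . $line;
}
unlink($path);

echo implode("\n", $lines) . "\n";
```

The try/finally block is a design choice rather than a requirement: it makes sure the file handle is closed even if the consumer stops iterating before the end of the file.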

So I’m really happy about the addition of generators – it’s a cool feature. Not one you’d use every day, but in some places where complex Iterators had to be implemented (and where OO features such as polymorphism are not required), generators can be a really neat, concise and maintainable solution.

My PHP Streams API article was published by php|architect

php|architect, one of the most prominent professional PHP magazines in the world, has published an article I wrote about PHP’s user-space Streams API in its December 2011 issue:

Go with the Flow: PHP’s Userspace Streams API

Almost every PHP application out there needs to read data from files or write data to files – or things that look like files but are not quite files – these unstructured blobs of data are commonly referred to as “streams”. Stream functions allow a scalable, portable and memory efficient way to handle data, and pretty much any PHP developer out there knows how to read data from or write data to a stream. The best part is that you don’t have to be an extension author in order to provide access to any data source as if it was just a regular file. PHP’s userspace streams API allows you to do exactly that, and this article will show you how.
If you’re a subscriber, feel free to read the article and send me your feedback. If not, go ahead and buy the issue :)

Imagick: Maintain (fake) transparency when saving as JPEG

I haven’t blogged in a while (have been busy you know), so I’ve decided to share this small piece of knowledge I’ve obtained by experimenting. I wrote a small test app (it’s for a feature of the next version of Zend Server – maybe I’ll share it one day when the API is stable), which does some image manipulation with the ImageMagick extension.

For those of you who don’t know, ImageMagick allows one to perform pretty cool stuff on images – besides the usual drawing, conversion, rotation, rescaling etc., it also exposes some API to easily perform neat effects, like drop shadow, round corners and my newest favorite (apparently only available in the very latest builds of the extension) – the Polaroid effect.

In his blog, Mikko Koppanen, the author of the ImageMagick PHP extension, shows how to create drop shadows (as well as other neat things – you should check out his blog!), but in his examples Mikko always saves as PNG, which I dare say most web users will not do, preferring to save as JPEG.

The problem with many of those effects is that they leave parts of the image transparent. When saving the picture as JPEG (as I do, since saving as PNG produces files that are too big), these transparent areas appear as black.

So after some experimenting, I’ve found out that the way to work around this is to composite another opaque layer as your background layer, filled with your background color of choice (white in my case). You will of course lose the ability to place the picture on other background colors and still have a nice “transparency” look – but as long as you stick to the background color you’ve set, it will look great.

Here is a code sample producing the same thumbnail + drop shadow as in Mikko’s example, but saving it with white matte color as JPEG:

<?php

$bgColor = '#ffffff'; // End result will have a white background

/* This was taken from Mikko's example */
$im = new Imagick( 'strawberry.png' );
$im->thumbnailImage( 200, null );
$im->roundCorners( 5, 5 );

$shadow = $im->clone();
$shadow->setImageBackgroundColor( new ImagickPixel( 'black' ) );
$shadow->shadowImage( 80, 3, 5, 5 );
$shadow->compositeImage( $im, Imagick::COMPOSITE_OVER, 0, 0 );

/* My addition: clone the entire image again to create the background layer */
$bg = $shadow->clone();

/* I'm using colorFloodFillImage with high tolerance to paint it all white - maybe there are 'cleaner' ways to do it though */
$bg->colorFloodFillImage($bgColor, 100, '#777777', 0, 0);
$bg->compositeImage($shadow, Imagick::COMPOSITE_OVER, 0, 0);
$bg->setImageFormat('jpeg');
$bg->flattenImages();

/* Display the image */
header( "Content-Type: image/jpeg" );
echo $bg;

While there’s an extra step along the way, and the image will only look good on white backgrounds, you can now save it as a JPEG file with good compression and acceptable file size.

How much is listening to your customers worth?

I normally don’t write about work. The reason is that I feel that the slight chance that someone might feel I’m being biased towards a product that comes from the company I work for and dismiss my thoughts as “guerilla marketing” is not worth it.

However, I’m going to make an exception – and that’s because I prefer selling Zend here rather than doing it on Lukas Smith’s blog :)

Lukas raises the question of which commercial PHP distribution should be used as an alternative to RHEL’s outdated packages. My answer to that would be, surprisingly – use Zend Server! (well, …once it’s out of beta, of course).

Let’s put the features and SLA you get from Zend Server aside for a moment.

The real reason I think you should use Zend Server is because the Zend Server product manager (hey, that’s me!) reads your blog. I’m serious about this.

I’m not sure I can quantify this, but I think that a vendor that listens so closely to what potential users (and the community) have to say is worth quite a lot in the long run. And yes, Zend has not been perfect in listening to the community – but I can honestly and whole-heartedly say that we are trying harder. The recent feedback on Zend Server gives me the feeling that we are doing ok too.

Finally, it’s out: Zend Server

I normally try not to write about work related stuff… but this is a special occasion.

Zend Server is finally out for public beta. o/

I have been working so hard on this for the last year, it kind of feels like I’ve just crapped an Elephpant ;)

Seriously now, I really like this product. I think it has great potential. I know a bunch of very good people who worked very hard on it, and deserve every bit of gratitude. We went over some rough times at Zend and we still were able to release this wonderful product! I’m so proud… :)

Attending Adobe Max next week

As part of my job at Zend, I was invited by Adobe to Adobe Max in San Francisco – how cool is that? It’s a huge conference (thousands of participants – nothing like any PHP conference I know!) with so many presentations to sit in on that it’s just hard to choose.

Of course, I am no designer and tend to stick to the server side – so for me choosing was easier, but still confusing.

In any case, if you are there, or in downtown San Francisco, come and say hi!

ZendCon 2008 Slides

Well, ZendCon is over and it was much fun! I got home today (well, does 4:30 am count as “today”?) and am still very tired – I had to stay around after the conference for some meetings (yes, the title “manager” causes some PITA even if you do not really manage anyone), which was a bit exhausting but fruitful nevertheless.

I gave this presentation about Zend Platform:

It was the first time I have ever given a presentation about proprietary Zend technology at an open-source conference, so I was a bit nervous – but to my surprise I got a full room (I estimate some ~100 people were there, and only a few Zenders) and there seemed to be a lot of interest. In general this ZendCon felt a bit more “business-oriented” than usual, but still had a good mix of community and hacker-spirit to it.

Another thing is that Siddhartha, our VP of Sales for North America, actually HUGGED me after the talk. He was sitting in the room, and the guy knows a lot about selling Zend Platform – but I suppose that hearing the value of the different features and some good example use cases for Zend Platform from a technical perspective gave him and the rest of the sales team some good insight into what customers are looking for in such a product.

Anyway, enjoy the slides, and if you have questions just post a comment. I will probably post some more about the previous week – if only I can get my hands off my new iPod Touch ;)