Fixing logrotate errors and other MySQL issues on Ubuntu / Debian

For our MySQL databases on EC2, we back up the data by taking hourly snapshots of the volume that stores the MySQL data folder. This is a common practice, and in fact we do it using the popular ec2-consistent-snapshot script by Alestic. For backups this works great; in addition, having volume snapshots of MySQL data is very useful when we want to quickly launch a new MySQL server with a fairly recent state of our data – whether as a way to speed up synchronization of a new DB node or for testing purposes.

When launching a new EC2 instance we simply mount a new volume created from the latest DB snapshot on /var/lib/mysql and then start MySQL, and everything “just works” from there.

However, there are some quirks to this method: recently, I’ve encountered some errors from daily logrotate tasks which fail to flush the logs on a post-rotate script (this is from the logs of cron that runs logrotate):

Continue reading

PHP Memory Usage & Unnecessary String Concatenation

As PHP developers, especially if like me you don’t come from a hard-core Comp Sci background, we are initially trained not to worry about memory. We do not allocate it, do not release it – in fact we rarely even worry about closing files and DB connections, and we hardly ever care, as many C programmers would, about the real-memory size of the different variable types we use. That’s a good thing too – in Web environments, these things tend to be negligible and putting time into optimizing them would be a wasteful micro-optimization.

However, it is also important to be mindful of the fact that most PHP servers are memory-bound. That is, a Web server running Apache or nginx and a pool of PHP processes (whether these are `php-fpm` processes or mod_php Apache forks) is most likely limited by how much memory is available to spawn more PHP processes. The number of concurrent PHP processes directly determines the number of concurrent requests a server can handle. On its own, each PHP process takes a few (I would guess 5-15) MBs of private memory, but very often the application code requires it to allocate many, many more MBs – just try to run your app with a low memory_limit setting (the default is 128 MB) and see what happens. Memory allocations have a lot of impact on PHP speed (as recent phpng benchmarks show), but speed aside, it’s important to remember that memory hogging directly impacts your app’s hardware requirements.
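A rough way to see what your own code costs is to ask PHP how much memory a script is using. The sketch below relies only on built-in functions; the 100,000-element array is an arbitrary stand-in for real application data, and the numbers it prints will differ between PHP versions.

```php
<?php
// Show the configured limit, then measure what building a large array costs.
echo "memory_limit: " . ini_get('memory_limit') . "\n";

$before = memory_get_usage();

// Arbitrary workload for illustration: 100,000 short strings.
$rows = array();
for ($i = 0; $i < 100000; $i++) {
    $rows[] = "row number " . $i;
}

$after = memory_get_usage();

echo "Array cost: " . ($after - $before) . " bytes\n";
echo "Peak usage: " . memory_get_peak_usage() . " bytes\n";
```

Running the same script with `-d memory_limit=2M` on the command line is a quick way to see how close a piece of code gets to a given limit.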

For that reason, I think it’s a good idea to come up with a list of good practices for memory utilization in PHP – we already have such “checklists” for security and for speed optimization, and I think that while micro-optimizations are usually worthless in the real world, following good practices when writing new code can save you the occasional meltdown. One good practice I’m going to suggest today is being mindful about where it is appropriate to use the oh-so-common operation of string concatenation.

Continue reading

Monitoring EC2 instance memory usage with CloudWatch

At Shoppimon we’ve been relying a lot on Amazon infrastructure – it may not be the most cost-effective option for larger, more stable companies, but for small start-ups that need to be very dynamic, can’t have high up-front costs and don’t have a large IT department, it’s a great choice. We do try to keep our code clean from any vendor-specific APIs, but when it comes to infrastructure & operations management, AWS (with help from tools like Chef) has been great for us.

One of the AWS tools we use is CloudWatch – it allows us to monitor our infrastructure and get alerted when things go wrong. While it’s not the most flexible monitoring tool out there, it takes care of most of what we need right now and has the advantage of not needing to run an additional server and configure tools such as Nagios or Cacti. With its custom metrics feature, we can even send some app-level data and monitor it using the common Amazon set of tools.

However, there’s one big missing feature in CloudWatch: it doesn’t monitor your instance memory utilization. I suppose Amazon has all sorts of technical reasons not to provide this very important metric out of the box (probably related to the fact that their monitoring is done from outside the instance VM), but really if you need to monitor servers, in addition to CPU load and IO, memory utilization is one of the most important metrics to be aware of.
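Since the number has to come from inside the instance, one possible starting point is reading /proc/meminfo on Linux. The following is only a sketch of that idea and not necessarily the approach this post goes on to describe: the function name and the sample figures are made up for illustration, and publishing the result as a CloudWatch custom metric is left out.

```php
<?php
/**
 * Parse the text of /proc/meminfo and return memory utilization in percent.
 * "Used" is computed the traditional way: total minus free, buffers and cache.
 */
function memory_utilization_percent($meminfo)
{
    $values = array();
    foreach (explode("\n", $meminfo) as $line) {
        // Lines look like "MemTotal:        1017812 kB"
        if (preg_match('/^(\w+):\s+(\d+)/', $line, $m)) {
            $values[$m[1]] = (int) $m[2];
        }
    }

    $used = $values['MemTotal'] - $values['MemFree']
          - $values['Buffers'] - $values['Cached'];

    return 100 * $used / $values['MemTotal'];
}

// Sample input with made-up numbers; on a real Linux instance you would pass
// file_get_contents('/proc/meminfo') instead.
$sample = "MemTotal: 1000000 kB\nMemFree: 200000 kB\n"
        . "Buffers: 50000 kB\nCached: 250000 kB\n";

printf("Memory utilization: %.1f%%\n", memory_utilization_percent($sample)); // prints "Memory utilization: 50.0%"
```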

Continue reading

Generators in PHP 5.5

Now that PHP 5.5 alpha versions are being released, I decided to grab the latest PHP source from GitHub, build it and give the new Generators feature a spin. I have used generators in the past in Python, and was excited to hear they are coming to PHP. While they are useful mostly in advanced use cases, they can make a lot of simple use cases much more efficient, and I think they are a handy addition to the advanced PHP programmer’s toolbox.

What are Generators?

I like to describe Generators as special functions which are iterable and maintain state. Think of a function that instead of returning once and destroying its state (local variables) after returning, can return multiple times, while maintaining the state of local variables, thus allowing iteration over an instance of that function state. In fact, a call to a generator function creates a special Generator object which can be iterated. The object maintains the internal state of the generator, and on each iteration generates a new value. The same result can be achieved by writing a class that implements the Iterator interface, but a generator needs much less code.

This is very different from the way we are used to thinking about functions, so maybe an example is the best way to demonstrate this. I will use a simplified example based on the one given in the documentation:


function xrange($start, $end, $step = 1)
{
  for ($i = $start; $i <= $end; $i += $step) {
    yield $i;
  }
}

$start = microtime(true);
foreach (xrange(0, 1000000) as $i) {
  // do nothing
}
$end = microtime(true);

echo "Total time: " . ($end - $start) . " sec\n";
echo "Peak memory usage: " . memory_get_peak_usage() . " bytes\n";

In the example above, the xrange function is a Generator which works as a simplified version of PHP’s range() function (just like in Python!). The main thing to notice is the yield keyword – this tells the function to yield a value, which means a value is “returned” but the state of the generator is maintained.

When iterating over a generator function, as you can see in the foreach loop, iteration continues as long as a value is yielded. Once the function returns without yielding (as xrange in our example would do once the inner for loop is done), iteration stops. We get a behaviour which is (almost) equivalent to range in the sense that it allows us to iterate over numbers – but, without allocating the entire array of numbers in advance. In our example, we save a lot of memory and in fact execution is faster when a generator is used.

To demonstrate, here is the output of the script above (ok, I added some formatting to the output, but the results are real!):

$ /usr/local/bin/php /tmp/with-generators.php
Total time: 0.20149302482605 sec
Peak memory usage: 234,256 bytes

This is on a one-million integers “array” (unlike range, no real array is allocated so we can’t do random access on members, but during iteration it behaves just like an array).

By comparison, executing the same code with range() instead of xrange(), results in the following:

$ /usr/local/bin/php /tmp/without-generators.php
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 32 bytes) in /private/tmp/generators.php on line 12

Ok, we reach our memory limit. Let’s try to go crazy (not a good idea in production):

$ /usr/local/bin/php -d memory_limit=200M /tmp/without-generators.php
Total time: 0.31754398345947 sec
Peak memory usage: 144,617,256 bytes

After increasing the memory limit to 200 MB, the script runs, but it takes longer (honestly, to my surprise) and consumes roughly 600 times more memory.

Pretty cool, huh?

Just to demonstrate, calling var_dump on a generator would result in this:


var_dump(xrange(0, 100));
// Output:
// object(Generator)#2 (0) {
// }
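Since a generator is just an object, it can also be driven by hand through the Iterator methods it exposes, and because values are only produced on demand it does not even need to have an end. A minimal sketch (the naturals() function is made up for this illustration):

```php
<?php
// An endless generator: it never returns on its own.
function naturals()
{
    $i = 1;
    while (true) {
        yield $i++; // $i survives between one yield and the next
    }
}

$gen = naturals();

// Pull values one at a time instead of using foreach.
$firstFive = array();
while (count($firstFive) < 5) {
    $firstFive[] = $gen->current(); // value of the most recent yield
    $gen->next();                   // resume the function until it yields again
}

echo implode(" ", $firstFive) . "\n"; // prints "1 2 3 4 5"
```

Doing the same with range() is impossible, since an endless array cannot be built up front.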

But I can do the same thing with Iterator interfaces, no?

Yes! Pretty much anything you can do with Generators can be done by creating a class which implements either the Iterator or IteratorAggregate interfaces. But in many cases, a lot of boilerplate code can be removed if a Generator is used instead. For example, a class equivalent to the xrange generator above would look like this:


class XrangeObject implements Iterator
{
  private $value = 0;
  private $start = 0;
  private $end   = 0;
  private $step  = 1;

  public function __construct($start, $end, $step = 1)
  {
    $this->value = (int) $start;
    $this->start = (int) $start;
    $this->end   = (int) $end;
    $this->step  = (int) $step;
  }

  public function rewind()
  {
    $this->value = $this->start;
  }

  public function current()
  {
    return $this->value;
  }

  public function key()
  {
    return $this->value;
  }

  public function next()
  {
    return ($this->value += $this->step);
  }

  public function valid()
  {
    return $this->value <= $this->end;
  }
}

$start = microtime(true);
$xrange = new XrangeObject(0, 1000000);
foreach ($xrange as $i) {
  // do nothing
}
$end = microtime(true);

echo "Total time: " . ($end - $start) . " sec\n";
echo "Peak memory usage: " . memory_get_peak_usage() . " bytes\n";

Wow, that’s much more code for something we achieved very simply with a generator. BTW, the results are:


$ /usr/local/bin/php /tmp/with-iterator.php
Total time: 0.61971187591553 sec
Peak memory usage: 240,968 bytes

As you can see, memory usage is comparable to a Generator. Run time is about 3 times longer, but in most realistic use cases this difference is negligible – unless we saw an order of magnitude of difference, performance is not a major issue here. The interesting thing really is the amount of boilerplate code we had to write when creating an iterator – most of this code is just generic boring stuff and not what we really care about. With Generators, the implementation is much shorter.
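The two approaches can also be combined. As a sketch (the NumberRange class is made up for this illustration), a class can implement IteratorAggregate and write getIterator() as a generator, which keeps the object-oriented wrapper but drops most of the boilerplate. The return type declaration assumes PHP 7 or later; on PHP 5.5 it would simply be left out.

```php
<?php
class NumberRange implements IteratorAggregate
{
    private $start;
    private $end;
    private $step;

    public function __construct($start, $end, $step = 1)
    {
        $this->start = (int) $start;
        $this->end   = (int) $end;
        $this->step  = (int) $step;
    }

    // Because this method contains yield, calling it returns a Generator,
    // which is a Traversable, so foreach can work with it directly.
    public function getIterator(): Traversable
    {
        for ($i = $this->start; $i <= $this->end; $i += $this->step) {
            yield $i;
        }
    }
}

$range = new NumberRange(1, 4);

$sum = 0;
foreach ($range as $n) {
    $sum += $n;
}
// A second pass works too: foreach calls getIterator() again and gets a
// fresh generator each time.
foreach ($range as $n) {
    $sum += $n;
}

echo $sum . "\n"; // prints "20"
```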

How about a realistic use case?

Ok, so we have used a generator to iterate over numbers. Woopti-doo. We can just drop the generator and use the for loop inside it to achieve the same thing. How about a more realistic use case?

Take a look at the following example, which I believe can be pretty useful and still has fairly straightforward code: a generator which combines the efficiency of XMLReader with the simple API of SimpleXML to bring you an efficient yet easy-to-use XML reader function for possibly large XML streams with repeating structure – for example, RSS or Atom feeds.


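function xml_stream_reader($url, $element)
{
  $reader = new XMLReader();
  $reader->open($url);

  while (true) {
    // Skip ahead until the cursor sits on an opening <$element> tag
    while (! ($reader->nodeType == XMLReader::ELEMENT && $reader->name == $element)) {
      // End of document: leave both loops, which ends the generator
      if (! $reader->read()) break 2;
    }

    if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == $element) {
      // Hand the whole element to the consumer as a SimpleXMLElement
      yield simplexml_load_string($reader->readOuterXml());
      // Jump over the element's subtree to the next sibling node
      $reader->next();
    }
  }

  $reader->close();
}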

The xml_stream_reader() generator defined above will use XMLReader to open and read from an XML stream. Unlike PHP’s SimpleXML or DOM extensions, it will not read an entire XML document into memory, thus avoiding potential blowups on very large XML files. To keep things simple for the user however, whenever it encounters the XML element searched by the user (e.g. the item element in RSS feeds), it will read the entire element into memory (assume each item is small but there are potentially thousands of items) and return it as a SimpleXMLElement object – thus still providing the ease of use of SimpleXML for the consumer.

Here is how it can be used:


$feed = xml_stream_reader('http://news.google.com/?output=rss&num=100', 'item');
foreach($feed as $itemXml) {
  echo $itemXml->title . "\n";
}

While I couldn’t find a large-enough XML file to test this on, even with 2 MB files this can be much more efficient than DOM or SimpleXML, and without too much more coding.
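To try the reader without depending on a remote feed, it can be pointed at a local file instead. The sketch below repeats the logic of xml_stream_reader() so that it runs on its own; the two-item feed and the temporary file are made up for illustration.

```php
<?php
// Same logic as xml_stream_reader() above, repeated so this script is
// self-contained.
function xml_stream_reader($url, $element)
{
    $reader = new XMLReader();
    $reader->open($url);

    while (true) {
        // Skip ahead until the cursor sits on an opening <$element> tag.
        while (! ($reader->nodeType == XMLReader::ELEMENT && $reader->name == $element)) {
            if (! $reader->read()) {
                break 2; // end of document
            }
        }

        yield simplexml_load_string($reader->readOuterXml());
        $reader->next(); // jump over the element we just yielded
    }

    $reader->close();
}

// A tiny made-up feed written to a temporary file.
$xml = '<rss><channel>'
     . '<item><title>First</title></item>'
     . '<item><title>Second</title></item>'
     . '</channel></rss>';
$file = tempnam(sys_get_temp_dir(), 'feed');
file_put_contents($file, $xml);

$titles = array();
foreach (xml_stream_reader($file, 'item') as $itemXml) {
    $titles[] = (string) $itemXml->title;
}
unlink($file);

echo implode(", ", $titles) . "\n"; // prints "First, Second"
```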

So I’m really happy about the addition of generators – it’s a cool feature. Not one you’d use every day, but in some places where complex Iterators had to be implemented (and where OO features such as polymorphism are not required), generators can be a really neat, concise and maintainable solution.

Serving ZF apps with the PHP 5.4 built-in Web Server

When teaching PHP to newcomers, I have found that (honestly to my surprise) one of the biggest barriers you have to cross is setting the stack up to serve PHP files properly, especially when it comes to Zend Framework apps and other rewrite-rule-based MVC applications. Even to people with a strong development background, the idea of setting up Web server configuration to get things working seems foreign.

Even as an experienced developer with good knowledge of the LAMP stack setup, setting up new vhosts and other configuration for each new project is sometimes a pain in the ass.

There is, of course, good news – starting from PHP 5.4, the Command Line Interface (CLI) version of PHP comes with a built-in Web server that can be used to serve PHP apps in development. This Web server is very easy to use – you just fire it up in the right place and it works, serving your PHP files. While it is by no means a viable production solution (it is a sequential, no-concurrency server, meaning it will only serve one request at a time), it is very convenient for development purposes.

While it “just works” for simple “1-to-1 URL <-> File” apps, it can work almost as easily for rewrite based MVC apps, including Zend Framework 1.x and 2.x apps and probably for other frameworks as well.
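The usual way to get rewrite-style behaviour is a small router script passed to the built-in server. The following is only a minimal sketch of that idea, not necessarily the setup this post goes on to describe: the router.php file name, the public/ document root and the index.php front controller are assumptions based on a typical ZF project layout.

```php
<?php
// router.php - start the server with:
//   php -S localhost:8080 -t public router.php

/**
 * Should the requested path be served as a plain file (true), or handed to
 * the application's front controller (false)?
 */
function is_static_file($requestUri, $docRoot)
{
    $path = parse_url($requestUri, PHP_URL_PATH); // drops the query string
    return $path !== '/' && is_file($docRoot . $path);
}

if (PHP_SAPI === 'cli-server') {
    $docRoot = __DIR__ . '/public';   // assumed document root

    if (is_static_file($_SERVER['REQUEST_URI'], $docRoot)) {
        return false;                 // tell the server to send the file as-is
    }

    require $docRoot . '/index.php';  // assumed front controller
}
```

Returning false from the router script is how the built-in server is told to serve the requested resource itself, which takes care of CSS, JavaScript and image files.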

Continue reading

On PHP Extensions

While teaching PHP I mention the term “extension” quite a lot – but I have realized that this may very well be a confusing term for non-PHPers. While most PHP training courses focus on code, I believe getting to know the PHP engine and environment is almost as important as learning to use the language. Extensions are a big and important part of PHP, but there seems to be a big knowledge gap about them, and Googling for articles that explain what extensions are, and how they are used and installed, hardly returns good results. So, as part of my attempt to blog about more basic PHP topics, I’ve decided to try and come up with an overview of PHP extensions.

So what are PHP extensions?

An extension in PHP is in fact a module providing some functionality to the PHP Engine. While the term makes it sound like extensions provide some kind of special functionality, in reality many of the language’s most basic functions and classes are provided in extensions. Many extensions are shipped as part of the default PHP distribution, and some are in fact compiled into PHP in such a way that they cannot even be unloaded. Come to think of it, perhaps it is best to think of PHP extensions as “language modules”.
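PHP can report on its own extensions, which makes the point easy to check. A small sketch using only built-in functions (the exact list you get depends on how your PHP was built):

```php
<?php
// Every extension loaded into this PHP process.
$loaded = get_loaded_extensions();
echo count($loaded) . " extensions loaded\n";

// Even everyday functions come from an extension: str_replace() and most
// string and array functions live in "standard", which is always compiled in.
var_dump(extension_loaded('standard'));         // bool(true)

// The functions a given extension provides.
$functions = get_extension_funcs('standard');
var_dump(in_array('str_replace', $functions));  // bool(true)
```

From the command line, `php -m` prints the same list of loaded modules.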

Continue reading

Quickly Creating a New Admin User on Ubuntu

Working quite a lot on the Ubuntu Server EC2 images, I am often faced with a need to create one or more additional admin users, which have the same permissions as the first user (“ubuntu” in these images). I did a little bit of searching but didn’t find any way to easily add a new user with the same groups as the Ubuntu user, so I crafted a little command. I’m pasting it here mostly for future self reference, and also in hope this helps someone:

sudo useradd -m -G "$(groups ubuntu | cut -d' ' -f4- | sed 's/ /,/g')" -s /bin/bash newuser

Of course ‘ubuntu’ is the user you want to copy, and ‘newuser’ is the name of the new user.

Note that the new user will be in the admin group but will still require a password when using sudo (that’s because in the EC2 images ‘ubuntu’ is the only user with NOPASSWD privileges). I personally believe this is a good thing, but if you want you can always add NOPASSWD on the admin group in /etc/sudoers.

 

Password hashing revisited

From user comments on my recent password hashing post, I’ve learned about a better solution for password hashing – rather than using hashing algorithms designed to be fast such as SHA-1 and SHA-256, use slower and, more importantly, future-adaptable algorithms such as bcrypt. I have to say this is one of the reasons I love this community – you always learn new things.

I won’t repeat the reasons why methods such as bcrypt are preferred (read the comments on the previous post to learn why). However, I will note that starting from PHP 5.3 bcrypt is in fact built into PHP – so if you do not require portability to older versions of PHP, bcrypt hashing can be done very easily, using the useful but somewhat enigmatic crypt function:
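As a minimal sketch of what bcrypt through crypt() can look like (the function names are made up for this illustration, and random_bytes() and hash_equals() assume a newer PHP than 5.3, so this is not the post's own code):

```php
<?php
// Hash a password with bcrypt through crypt(). $cost is the work factor:
// each increment doubles the time needed to compute a hash.
function bcrypt_hash($password, $cost = 10)
{
    // bcrypt expects a 22-character salt from the alphabet ./A-Za-z0-9.
    // random_bytes() needs PHP 7 and is an assumption of this sketch.
    $salt = substr(strtr(base64_encode(random_bytes(16)), '+', '.'), 0, 22);

    // "$2y$" selects bcrypt, followed by a two-digit cost and the salt.
    return crypt($password, sprintf('$2y$%02d$%s', $cost, $salt));
}

// Re-hash the candidate using the stored hash as the salt, then compare
// the two in constant time (hash_equals() needs PHP 5.6).
function bcrypt_verify($password, $hash)
{
    return hash_equals($hash, crypt($password, $hash));
}

$hash = bcrypt_hash('correct horse');
var_dump(bcrypt_verify('correct horse', $hash)); // bool(true)
var_dump(bcrypt_verify('wrong guess', $hash));   // bool(false)
```

The cost parameter is what makes the scheme future-adaptable: it can be raised as hardware gets faster. On PHP 5.5 and later, the built-in password_hash() and password_verify() functions wrap the same algorithm and take care of the salt for you.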

Continue reading

Generating ZF Autoloader Classmaps with Phing

One of the things I’ve quickly discovered when working on Shoppimon is that we need a build process for our PHP frontend app. While the PHP files themselves do not require any traditional “build” step such as processing or compilation, there are a lot of other tasks that need to happen when taking a version from the development environment to staging and to production: among other things, our build process picks up and packages only the files needed by the app (leaving out things like documentation, unit tests and local configuration overrides), minifies and pre-compresses CSS and JavaScript files, and performs other helpful optimizations on the app, making it ready for production.

Since Shoppimon is based on Zend Framework 2.0, it also heavily relies on the ZF2.0 autoloader stack. Class autoloading is convenient, and was shown to greatly improve performance over using require_once calls. However, different autoloading strategies have pros and cons: while PSR-0 based autoloading (the so-called Standard Autoloader from ZF1 days) works automatically and doesn’t require updating any mapping code for each new class added or renamed, it has a significant performance cost compared to classmap based autoloading.

Fortunately, using ZF2’s autoloader stack and Phing, we can enjoy the best of both worlds: in development, standard PSR-0 autoloading is used and the developer can work smoothly without worrying about updating class maps. As we push code towards production, our build system takes care of updating class map files, ensuring super-fast autoloading in production using the ClassMapAutoloader. How is this done? Read on to learn.
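To make the difference between the two strategies concrete, here is a simplified sketch in plain PHP. It is not ZF2’s StandardAutoloader or ClassMapAutoloader code, and the class names and paths are hypothetical: PSR-0 computes a file path from the class name on every lookup, while a class map is a pre-generated array lookup.

```php
<?php
// PSR-0 style: derive the file path from the class name at lookup time.
function psr0_path($class, $baseDir)
{
    $class = ltrim($class, '\\');
    $dir   = '';
    $pos   = strrpos($class, '\\');
    if ($pos !== false) {
        // Namespace separators become directory separators.
        $dir   = str_replace('\\', '/', substr($class, 0, $pos)) . '/';
        $class = substr($class, $pos + 1);
    }
    // Underscores in the class name itself also map to directories.
    return $baseDir . '/' . $dir . str_replace('_', '/', $class) . '.php';
}

echo psr0_path('App\Model\User', '/srv/app/src') . "\n"; // prints "/srv/app/src/App/Model/User.php"

// Classmap style: the build step emits an array like this one (hypothetical
// content), so loading a class is a single array lookup.
$classMap = array(
    'App\Model\User' => '/srv/app/src/App/Model/User.php',
);

spl_autoload_register(function ($class) use ($classMap) {
    if (isset($classMap[$class])) {
        require $classMap[$class];
    }
});
```

The trade-off is the one described above: the map has to be regenerated whenever a class is added or renamed, which is why generating it belongs in the build.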

Continue reading

Say Hi to Shoppimon – Magento Monitoring for “Normal” People

For a while now I have been telling people I am “working on a small project” – and now is the time to unveil the mystery and introduce Shoppimon – a new start-up which I founded together with a small group of friends, and am currently spending most of my time around.

The idea of Shoppimon is simple – we want to provide Web monitoring and availability analysis which will be usable by, and useful to, “normal” people – not only the tech guy, the programmer or the IT specialist, but the site owner, the business owner or even the marketing guy – in other words the real stakeholder.

Continue reading