On PHP Extensions

While teaching PHP I mention the term “extension” quite a lot – but I have realized that this may very well be a confusing term for non-PHPers. While most PHP training courses focus on code, I believe getting to know the PHP engine and environment is almost as important as learning to use the language. Extensions are a big and important part of PHP, but they seem to be a big knowledge gap about them and Googling for articles that explain what extensions are, and how they are used and installed, hardly returns good results. So, as part of my attempt to blog about more basic PHP topics, I’ve decided to try and come up with an overview of PHP extensions.

So what are PHP extensions?

An extension in PHP is in fact a module providing some functionality to the PHP Engine. While the term makes it sound like extensions provide some kind of special functionality, in reality many of the language’s most basic functions and classes are provided in extensions. Many extensions are shipped as part of the default PHP distribution, and some are in fact compiled into PHP in such way that they cannot even be unloaded. Come to think of it, perhaps it is best to think of PHP extensions as “language modules”.

Continue reading

Quickly Creating a New Admin User on Ubuntu

Working quite a lot on the Ubuntu Server EC2 images, I am often faced with a need to create one or more additional admin users, which have the same permissions as the first user (“ubuntu” in these images). I did a little bit of searching but didn’t find any way to easily add a new user with the same groups as the Ubuntu user, so I crafted a little command. I’m pasting it here mostly for future self reference, and also in hope this helps someone:

sudo useradd -m -G `groups ubuntu | cut -d" " -f4- | sed 's/ /,/g'` -s/bin/bash newuser

Of course ‘ubuntu’ is the user you want to copy, and ‘newuser’ is the name of the new user.

Note that the new user will be in the admin group but will still require a password when using sudo (that’s because in the EC2 images ‘ubuntu’ is the only user with NOPASSWD privileges. I personally believe this is a good thing, but if you want you can always add NOPASSWD on the admin group in /etc/sudoers.

 

Password hashing revisited

From user comments on my recent password hashing post, I’ve learned about a better solution for password hashing – rather than using hashing algorithms designed to be fast such as SHA-1 and SHA-256, use slower, and more important future-adaptable algorithms such as bcrypt. I have to say this is one of the reasons I love this community – you always learn new things.

I won’t repeat the reasons why methods such as bcrypt are preferred (read the comments on the previous post to learn why). However, I will note that starting from PHP 5.3 bcrypt is in fact built-in to PHP – so if you do not require portability to older versions of PHP, bcrypt-hasing could be done very easily, using the useful but a bit enygmatic crypt function:

Continue reading

Generating ZF Autoloader Classmaps with Phing

One of the things I’ve quickly discovered when working on Shoppimon is that we need a build process for our PHP frontend app. While the PHP files themselves do not require any traditional “build” step such as processing or compilation, there are a lot of other tasks that need to happen when taking a version from the development environment to staging and to production: among other things, our build process picks up and packages only the files needed by the app (leaving out things like documentation, unit tests and local configuration overrides), minifies and pre-compresses CSS and JavaScript files,  and performs other helpful optimizations on the app, making it ready for production.

Since Shoppimon is based on Zend Framework 2.0, it also heavily relies on the ZF2.0 autoloader stack. Class autoloading is convenient, and was shown to greatly improve performance over using require_once calls. However, different autoloading strategies have pros and cons: while PSR-0 based autoloading (the so called Standard Autoloader from ZF1 days) works automatically and doesn’t require updating any mapping code for each new class added or renamed, it has a significant performance impact compared to classmap based autoloading.

Fortunately, using ZF2′s autoloader stack and Phing, we can enjoy both worlds: while in development, standard PSR-0 autoloading is used and the developer can work smoothly without worrying about updating class maps. As we push code towards production, our build system takes care of updating class map files, ensuring super-fast autoloading in production using the ClassMapAutoloader. How is this done? Read on to learn.

Continue reading

Replacing a lost SSH key on an Amazon EC2 machine

Due to an unfortunate shmelting accident (read: poor backup practices), I lost the SSH private key granting me the only way to access one of my EC2 hosted servers. Being unable to access the server, and unable to easily set a new public key through Amazon’s interfaces, I panicked for a few seconds. Then I started trying to hack my way in, and eventually found a way to set a new public key to my user. Here is what I did.

First, know that I was lucky: for this method to properly work, you need a few things:

  • The machine must be EBS based
  • You need to be able to afford a couple of minutes of downtime
  • You need to be able to withstand the effects of restarting the machine – for example, if you do not have an Elastic IP address associated with the machine, its public address will change. In some situations this is not acceptable.

After trying some different approaches, what worked for me was to do the following:

  1. Generate a new keypair for yourself, and import the public key to your EC2 account
  2. Start a new, clean, cheap machine (this will only be needed to do very simple things, so I recommend using a tiny machine) in the same availability zone as the affected machine
  3. Stop the affected machine (do not terminate, STOP it – this is only possible with EBS machines)
  4. Detach the root device from the affected machine (by default attached as /dev/sda1)
  5. Attach the detached device to the new clean machine
  6. SSH into the clean machine and mount the affected machine’s root filesystem somewhere (e.g. in /mnt/fs)
  7. Now you can edit /mnt/fs/root/.ssh/authorized_keys (or on official Ubuntu machines /home/ubuntu/.ssh/authorized_keys) and add your new public key to it
  8. Unmount the volume and terminate the clean machine – you no longer need it
  9. Re-attach the root device to the affected machine (which should be stopped) – ensure to attach it as the same device it was before (e.g. /dev/sda1)
  10. Re-start your old machine – you should now be able to use your new key!

Another approach which could work but I gave up on after a couple of attempts (I think it really depends on the init scripts in the machine you are using), is to stop the machine and change the User Data of it to a shell script that sets a new public key in the right place, then start it again.

And really, you should backup your keys!

Why I don’t like the term “NoSQL”

This is a rant post, but just to clarify things, it’s not a rant against the use of non-relational databases. I think that the shift in recent years from a world in which relational databases are used almost exclusively regardless of what the need is, to today’s situation where it is possible and even considered a good idea to choose the best fitting solution from any number of data storage paradigms, is a truly blessed change. I am a big fan of some non-relational database solutions, and to be honest as a programmer I enjoy using some of them more than I enjoy MySQL or any other relational database.

This is a rant against the too-common term “NoSQL”. In my opinion, “NoSQL” is an example of layman terminology which does not properly describe the concepts which in most cases it aims to describe, and should not be used by professionals which are technical enough to understand the true meaning of these concepts.

“NoSQL” databases are all about the data model – in most cases, the term is used to describe any kind of storage engine (or database) in which data is stored in non-relational manner: object storage, document storage, key-value storage etc. Indeed, the term is more about what the database is not that about what it is.

Relational data is data that can be described as a table – in contrast to what some think, the term “relational database” has nothing to do with the ability to define and enforce relationships between data in different tables. If this was the case, MySQL using the MyISAM storage engine would not be a relational database. The term “relation” is a mathematical term, which existed before the creation of relational databases and is used to describe a relationship between two finite data sets, which can be described in a tabular manner (and I am not a mathematician, not even close – so I apologize in advance for this likely inaccurate description).

But, SQL has nothing to do with this – SQL is the language used to send commands to the database, and nothing more. It is true that there is an almost 1-to-1 correlation between database engines that store data in a relational manner and database engines that use SQL as a query language, but saying that relational databases are SQL databases is like saying that  (and assume it’s 1984 again) the Russian language should be abolished when in fact we want to say that communism is an unfitting economic system. It’s a poor way to describe your intentions, and it makes you sound like an ignorant moron.

There are many client libraries and wrappers that allow you to query a relational database such as MySQL and Oracle without writing any SQL code yourself. This doesn’t make them NoSQL databases. Some popular non-relational databases, such as Amazon SimpleDB and the Google App Engine Data Store provide query languages that are quite similar to SQL. This doesn’t make them SQL databases.SQL is just a language, and it is a good one for what it’s supposed to do (putting aside all sorts of discrepancies between vendor-specific SQL implementations). SQL is not what NoSQL databases are NOT about.

So, next time when you want to use a term that describes all databases that do not store data in a tabular manner, use the term “non-relational” or if you really like acronyms, “NonRDBMS”, and not “NoSQL”. Or even better – use a term that describes what your preferred solution is, not what it is not. After all, when you say “non-relational storage engine”, you are probably not referring to your file system, right?

HTML 5 Canvas Game of Life

I recently started looking into different HTML 5.0 related technologies, one of the most exciting ones being the new Canvas tag and API.

As a little test, I’ve implemented a little Game of Life thing using HTML 5 Canvas, which you can see in action here: http://arr.gr/playground/life/ (view source to see the code behind it).

Game of Life in HTML5 Canvas

The algorithm is not very smart so it’s kind of slow and CPU intensive, but still fun to watch. It works nicely on Firefox 4.0, and latest Chrome and Safari versions, and a bit slow on Firefox 3.6. I did not test with any IE version but I do not expect it to work in IE 6 or 7, maybe 8 and probably 9.

I think Game of Life by itself is worth at least an entire post regardless of this HTML5 implementation, especially because I’m a big fan of things that bring CS and philosophy together, so I may write about it at a later point, but for now I suggest you let it run for a while (a few hundreds of generations) and see what you get :)

ZendCon 10 talk: Amazon Services in Zend Framework

Wow I haven’t posted in a while… I’m still at ZendCon in Santa Clara, and have just finished my last talk, which was about the different Zend Framework components that can be used to work with the Amazon Cloud Services of S3 and EC2.

Presentation was pretty good, although I had to hurry up in the end and skip some of the last slides.

The slides are now up in Slideshare, and can be downloaded or viewed on-line.

XPath regular expression matching in PHP 5.3

Recently I needed to do some text pattern matching in an XML XPath query, and XPath’s built-in sub-string matching capabilities were not good enough.

While XPath 2.0 defines regular expression matching capabilities, it is still not widely implemented and in most available tools there is no easy way to do complex pattern matching on XML nodes.

Or is there?

In his blog Thomas Weinert recently gave an intro to using DOM and its XPath capabilities in PHP, but one of the cool features of DOM’s XPath, available starting from PHP 5.3.0 (have you upgraded yet?), is that the DOM extension supports registering pretty much any PHP function with the XPath engine, and using it inside XPath queries.

Here is a quick example showing usage of PHP’s own preg_match() in an XPath query, to find all the external links in Wikipedia’s PHP article:

// Supress XML parsing errors (this is needed to parse Wikipedia's XHTML)
libxml_use_internal_errors(true);

// Load the PHP Wikipedia article
$domDoc = new DOMDocument();
$domDoc->load('http://en.wikipedia.org/wiki/PHP');

// Create XPath object and register the XHTML namespace
$xPath = new DOMXPath($domDoc);
$xPath->registerNamespace('html', 'http://www.w3.org/1999/xhtml');

// Register the PHP namespace if you want to call PHP functions
$xPath->registerNamespace('php', 'http://php.net/xpath');

// Register preg_match to be available in XPath queries 
//
// You can also pass an array to register multiple functions, or call 
// registerPhpFunctions() with no parameters to register all PHP functions
$xPath->registerPhpFunctions('preg_match');

// Find all external links in the article  
$regex = '@^http://[^/]+(?<!wikipedia.org)/@';
$links = $xPath->query("//html:a[ php:functionString('preg_match', '$regex', @href) > 0 ]");

// Print out matched entries
echo "Found " . (int) $links->length . " external linksnn";
foreach($links as $linkDom) { /* @var $entry DOMElement */
    $link = simplexml_import_dom($linkDom);
    $desc = (string) $link;
    $href = (string) $link['href'];
    
    echo " - ";
    if ($desc && $desc != $href) {
        echo "$desc: ";
    } 
    echo "$href\n";
}

Note the use of php:functionString() as an XPath function, calling preg_match(). functionString() will pass XML entities such as @href as a string into the function, which is different from calling php:function() which, as far as I have seen, will pass parameters without casting them to a string first (however I am not sure what exactly they are passed as… maybe someone who knows can elaborate?).

Pretty useful huh?

Utopia in the header file

This is from the top of sqlite3.h, the header file for the SQLite3 library – most source file would have a copyright notice here referring people to read their license, but since SQLite is public domain, the author decided to put this instead:

** The author disclaims copyright to this source code. In place of
** a legal notice, here is a blessing:
**
** May you do good and not evil.
** May you find forgiveness for yourself and forgive others.
** May you share freely, never taking more than you give.

I have to admit I find this inspiring. For me, it is a strong reminder that dealing with legal limitations (on software and any other form of “intellectual property”) is at best no more than a necessary evil. That goes for free software licensing as well.