ZendCon 10 talk: Amazon Services in Zend Framework

Wow I haven’t posted in a while… I’m still at ZendCon in Santa Clara, and have just finished my last talk, which was about the different Zend Framework components that can be used to work with the Amazon Cloud Services of S3 and EC2.

The presentation went pretty well, although I had to hurry at the end and skip some of the last slides.

The slides are now up on SlideShare, and can be downloaded or viewed online.

XPath regular expression matching in PHP 5.3

Recently I needed to do some text pattern matching in an XML XPath query, and XPath’s built-in sub-string matching capabilities were not good enough.

While XPath 2.0 defines regular expression matching capabilities, it is still not widely implemented and in most available tools there is no easy way to do complex pattern matching on XML nodes.

Or is there?

In his blog, Thomas Weinert recently gave an intro to using DOM and its XPath capabilities in PHP. One of the cool features of DOM’s XPath, available starting from PHP 5.3.0 (have you upgraded yet?), is that the DOM extension supports registering pretty much any PHP function with the XPath engine and using it inside XPath queries.

Here is a quick example showing usage of PHP’s own preg_match() in an XPath query, to find all the external links in Wikipedia’s PHP article:

// Suppress XML parsing errors (this is needed to parse Wikipedia's XHTML)
libxml_use_internal_errors(true);

// Load the PHP Wikipedia article
$domDoc = new DOMDocument();
$domDoc->load('http://en.wikipedia.org/wiki/PHP'); // article URL assumed here

// Create XPath object and register the XHTML namespace
$xPath = new DOMXPath($domDoc);
$xPath->registerNamespace('html', 'http://www.w3.org/1999/xhtml');

// Register the PHP namespace if you want to call PHP functions
$xPath->registerNamespace('php', 'http://php.net/xpath');

// Register preg_match to be available in XPath queries
// You can also pass an array to register multiple functions, or call
// registerPhpFunctions() with no parameters to register all PHP functions
$xPath->registerPhpFunctions('preg_match');

// Find all external links in the article  
$regex = '@^http://[^/]+(?<!wikipedia.org)/@';
$links = $xPath->query("//html:a[ php:functionString('preg_match', '$regex', @href) > 0 ]");

// Print out matched entries
echo "Found " . (int) $links->length . " external links\n\n";
foreach ($links as $linkDom) { /* @var $linkDom DOMElement */
    $link = simplexml_import_dom($linkDom);
    $desc = (string) $link;
    $href = (string) $link['href'];
    echo " - ";
    if ($desc && $desc != $href) {
        echo "$desc: ";
    }
    echo "$href\n";
}

Note the use of php:functionString() as an XPath function, calling preg_match(). functionString() will pass XML nodes such as @href into the function as strings, which is different from calling php:function() which, as far as I have seen, will pass parameters without casting them to a string first (however I am not sure what exactly they are passed as… maybe someone who knows can elaborate?).

Pretty useful huh?

Utopia in the header file

This is from the top of sqlite3.h, the header file for the SQLite3 library – most source files would have a copyright notice here referring people to read their license, but since SQLite is public domain, the author decided to put this instead:

** The author disclaims copyright to this source code. In place of
** a legal notice, here is a blessing:
** May you do good and not evil.
** May you find forgiveness for yourself and forgive others.
** May you share freely, never taking more than you give.

I have to admit I find this inspiring. For me, it is a strong reminder that dealing with legal limitations (on software and any other form of “intellectual property”) is at best no more than a necessary evil. That goes for free software licensing as well.

Experimenting with Glista on OS X

I haven’t blogged in a while, probably because I was too busy. I’ve been working, started to take some university classes (Philosophy & Computer Science), and… I’m doing most of my work on Mac OS X now. Don’t worry, I’m still a Linux guy – but mostly for work purposes (and out of curiosity) I decided to ask Zend for a Macbook when my Thinkpad was starting to die.

Unfortunately the negative side effect of this is that I had to put Glista on hold – since I didn’t have a Gtk+ based desktop anymore there wasn’t much point in actively working on it.

However, in the last couple of days (following some patches that came in from ananasik, to whom I immediately gave commit access) my fingers started itching, and I decided to play with porting Glista to OS X – and found this project.

After some hours of tinkering, crashing, building, rebuilding and breaking things again, I now have a somewhat working (albeit ugly, and not so OS X friendly) Glista.app application bundle running on my own 32 bit OS X 10.6:

Glista running on native OS X for the first time!


Glista in the Dock!

If you’re really up for it, you can get a Disk Image here.

You can also build it from source by checking out http://glista.googlecode.com/svn/branches/osx-support and doing the following:

  1. Make sure you have the necessary build environment (Xcode is usually a good start!)
  2. Install all the gtk-osx tools and libraries including ige-mac-builder and gtk-quartz
  3. cd into the source directory and run (in a jhbuild shell after installing osx-gtk) ./configure --prefix=$PREFIX
     Note that some things do not work on OS X yet (or will never work) like libunique integration, gtk-spell, libnotify integration etc. – that’s normal for now
  4. Run ‘make’, don’t (!!) run ‘make install’ (well you can, but there’s no need, you’ll just pollute your system)
  5. cd into dist/mac/ and run ‘make dist-mac’. If everything is ok this should create Glista.app in that directory.
  6. Move that .app into /Applications (or anywhere else) and enjoy!

So far, it looks like it’s going to be a long time before Glista will work smoothly on Mac – and most of it is because Gtk+ is not really that portable, and making it use OS-native widgets and rendering seems to be quite a challenge. I also don’t feel I know enough about the internals of Gtk+, Quartz or OS X in general in order to help with that effort – but who knows, maybe I’ll be able to help somehow?

BTW I’m not sure if that binary will work on anything but OS X 10.6 on Intel 32 bit. If you try, let me know!

NetworkManager: Auto-HTTP login to a Wifi network

One of the cafés in my area where I frequently drink / work requires you to pass through an annoying web page forcing you to agree to some terms before allowing you to access the Internet through their Wifi network. It’s free – but they still annoy you with this silly HTTP gateway. This is actually a frequent thing in Israel – most cafés offer free Wifi access, but some will require you to log in nevertheless.

So today I figured out how to get NetworkManager to automatically work around this HTTP gateway for me whenever I connect to Arcaffe’s Wifi network. Since it’s super cool, and since I bet lots of people are annoyed by this sort of Wifi gateway, here’s how to do it:

Apparently, NetworkManager allows you to create special post-connect or post-disconnect scripts that are executed when a network interface is brought up or down. Here is what I did:

I created the following script and saved it at /etc/NetworkManager/dispatcher.d/100httpgateway.sh:



#!/bin/bash

# Name of the wireless interface (example value, replace with your own)
IFACE="wlan0"

if [ "x$1" = "x$IFACE" ] && [ "$2" = "up" ]; then
    # Figure out the wifi SSID
    SSID=$(/sbin/iwconfig $IFACE | grep ESSID | cut -d: -f2 | sed 's/^\s*"\(.*\)"\s*$/\1/')

    case "$SSID" in
        "012-ArCaffe")
           # URL and DATA (the gateway's login URL and POST body) are also set here
           URL='...'
           DATA='...'
           COOKIE='Cookie: JSESSIONID=uc54121j305s; cookies=true'
           REFERER='Referer: http://captive.012.net.il/home?confirmed=true&submitButton=+OK+&CPURL=http%3A%2F%2Fwww.arcaffe.co.il%2F&t=fsm3j5oe'

           curl -d "$DATA" -H "$COOKIE" -H "$REFERER" "$URL" > /tmp/arcaffe.last 2> /tmp/arcaffe.last.err
           ;;
    esac
fi

Don’t forget to make the file executable – I did it by running chmod +x /etc/NetworkManager/dispatcher.d/100httpgateway.sh.

Some things you should note:

  • “012-ArCaffe” is the ESSID of the network I’m logging in to. This of course works for ArCaffe in Israel, but you should replace it with your network’s ESSID.
  • Replace the value of IFACE with the name of your wireless interface
  • $1, the first parameter passed by NetworkManager to the script, is the network interface that was just connected or disconnected
  • $2, the second parameter, is “up” or “down” – the status of the interface.
  • The code I have inside the case block is where the magic happens. In this case, I send an HTTP POST with the correct parameters, Cookie and Referer headers and URL. This causes ArCaffe’s gateway to log me in
  • I use curl – but I could have also used wget or any other tool to do the job
  • The -d flag sends the POST data, the -H flags set a header
  • I figured out exactly what request to send using LiveHttpHeaders – but you can also use tcpflow or any other packet sniffing or HTTP sniffing tool
  • You can add more options to the ‘case’ statement for more networks that need that sort of treatment. With a little bash-fu that should be no problem.
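
As a minimal sketch of that last point, here is how a case statement can cover several networks. Only "012-ArCaffe" comes from the script above; the function name login_handler_for and the "ExampleCafe" SSIDs are made up for illustration, and a real branch would run its own curl command instead of echoing a label.

```shell
#!/bin/bash
# Map an SSID to the name of a login routine. Only "012-ArCaffe" is a real
# SSID from the post; the other SSIDs and all labels are hypothetical.
login_handler_for() {
    case "$1" in
        "012-ArCaffe")
            echo "arcaffe"
            ;;
        "ExampleCafe"|"ExampleCafe-"*)
            # One branch can cover several SSIDs, including glob patterns
            echo "examplecafe"
            ;;
        *)
            # Unknown network: nothing to do
            echo "none"
            ;;
    esac
}

login_handler_for "012-ArCaffe"        # prints: arcaffe
login_handler_for "ExampleCafe-Guest"  # prints: examplecafe
login_handler_for "HomeWifi"           # prints: none
```

The catch-all branch keeps the script quiet on networks that need no login at all.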

That’s it! Man I love Linux today :D

Priceless: “The Issuer Certificate Is Unknown”

Firefox: "mossad.gov.il uses an invalid security certificate"

Another example of the oh-so-frightening invalid HTTPS certificate warning in Firefox 3.0. I just found this one to be a bit ironic :)

BTW The Mossad website is mostly for recruiting purposes, they don’t really let you search their archives online or anything… too bad, that could have been interesting :P

(and one more thing: yes, it’s “The Mossad” and not just “Mossad” as it’s frequently mis-translated in foreign media. “Mossad” literally means “Institute” or maybe in a less literal translation, “Agency”. There are many institutes and agencies, but there is only one “The Institute”)

Subversion: Finding the “base” revision of a branch

I use Subversion a lot – but today I’ve learned something new:

You can easily find the “base” revision of a branch or a tag (i.e. the revision in which the branch or tag was created) by issuing the following command:

svn log -v --stop-on-copy 


The last revision you see in the log (in this case from one of my own Glista project’s branches) is the revision in which the svn copy command was issued, i.e. the one in which the branch was created.

This can then be used when merging the same branch back into trunk.
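
A rough sketch of how this could be scripted, assuming the usual svn log layout where each entry starts with a line like "r123 | author | date". The helper name base_revision and the BRANCH_URL placeholder are not from the post, and the merge line in the comment shows the classic revision-range form as an illustration, not a recommendation for every setup.

```shell
#!/bin/bash
# Read "svn log" output on stdin and print the number of the last (oldest)
# revision listed. base_revision is a name made up for this sketch.
base_revision() {
    grep -E '^r[0-9]+ \|' | tail -n 1 | sed -E 's/^r([0-9]+) .*/\1/'
}

# Intended usage (BRANCH_URL is a placeholder for your branch's URL):
#   BASE=$(svn log -q --stop-on-copy "$BRANCH_URL" | base_revision)
#   svn merge -r "$BASE":HEAD "$BRANCH_URL"   # run inside a trunk working copy
```

Using -q instead of -v keeps the log to the header lines only, which is all the helper needs.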


Notify me when emerge is done

As a Gentoo user, I frequently install new software or update existing packages using emerge. Unlike binary package managers, building packages from source using emerge takes time, and I prefer running it inside a detached screen session, because (a) if I close the session it continues in the background and (b) it actually runs faster when it doesn’t need to show all the compilation output in an X terminal.

This has an annoying side effect: I sometimes start a long update process and forget I did it. If things fail (or even if they succeed) I don’t know about it.

Today, I solved this problem using notify-send, the libnotify binary client. I’ve created a short shell script wrapper to emerge which sends me a notification once emerge finishes, along with the original command line arguments and exit status code (0 = ok, failure otherwise):



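#!/bin/bash
ARGS="$*"

# Run emerge with the original arguments and remember its exit status
/usr/bin/emerge "$@"
STATUS=$?
# Pick a notification urgency (the values here are an assumption)
if [ "$STATUS" = "0" ]
then LEVEL="normal"
else LEVEL="critical"
fi

/usr/bin/notify-send --expire-time=0 --urgency=$LEVEL "emerge finished" "Exit Code: $STATUS
Emerge args: $ARGS"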

If you save this code in a file named emerge-notify (and of course remember to chmod +x this file) you could then do this (from a GNOME terminal or any other X terminal):

$ ./emerge-notify -puD world

And you’ll get a nice notification when it’s done:

"emerge finished" notification

Of course, you need libnotify, the notify-send binary and D-Bus for this.
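
The same pattern is not specific to emerge. Here is a minimal sketch of a generic version, assuming you only want to see how the exit status turns into an urgency and a message. The function name run_with_status is hypothetical, and the echo stands in for the notify-send call so the snippet runs without a desktop session.

```shell
#!/bin/bash
# Run any command, then print the urgency and message that would be handed
# to notify-send. run_with_status is a made-up name for this sketch.
run_with_status() {
    "$@"
    local status=$?
    local level="normal"
    if [ "$status" -ne 0 ]; then
        level="critical"
    fi
    echo "$level: '$*' finished with exit code $status"
    # Pass the original exit status on to the caller
    return "$status"
}

run_with_status true            # prints: normal: 'true' finished with exit code 0
run_with_status false || true   # prints: critical: 'false' finished with exit code 1
```

Returning the original status means the wrapper can still be chained with && or used in scripts that check for failure.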

PHP Error Reporting Levels – WTF is 6134?

In PHP, the error reporting level (which kinds of errors get reported at all, whether to the log, to the screen or wherever) is determined by the error_reporting INI directive (or at runtime using the error_reporting() function). Both take an integer as their value – and usually this integer is expressed using error level constants like E_ALL, E_STRICT or E_USER_WARNING.

So in order to report everything except notices and strict errors, you would set something like this in php.ini:

error_reporting = E_ALL & ~E_NOTICE & ~E_STRICT

(actually, E_ALL does not really include E_STRICT but I put it here just to be more explicit)

This is actually great – one of the easier to use and understand APIs in my opinion (yeah, I really like bitwise operations).

However, what I really hate is that sometimes I need to work with the integer value of the error reporting level (like 1 for E_ERROR or 84 for E_PARSE | E_CORE_ERROR | E_COMPILE_ERROR) and it’s very hard for me to remember what arbitrary numbers like 6134 mean.
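
To make the arithmetic concrete, here is a small sketch in shell arithmetic rather than PHP. The numeric values are those of the PHP constants named in the variables; the helper has_level is made up for this example.

```shell
#!/bin/bash
# Numeric values of the PHP error level constants used below
E_PARSE=4; E_NOTICE=8; E_CORE_ERROR=16; E_COMPILE_ERROR=64

# Combining levels is a bitwise OR
combined=$(( E_PARSE | E_CORE_ERROR | E_COMPILE_ERROR ))
echo "$combined"   # prints: 84

# A level is included in a mask when ANDing the two gives a non-zero result
has_level() { [ $(( $1 & $2 )) -ne 0 ]; }

has_level 6134 "$E_PARSE"  && echo "6134 includes E_PARSE"
has_level 6134 "$E_NOTICE" || echo "6134 does not include E_NOTICE"
```

Each constant is a distinct power of two, which is why a single integer can carry any combination of them.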

So, today I wrote this tiny CLI script that helps me understand what an arbitrary error_reporting level integer might mean:


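<?php

// Map of error level names to their integer values
$errorLevels = array(
        'E_COMPILE_ERROR'     => E_COMPILE_ERROR,
        'E_COMPILE_WARNING'   => E_COMPILE_WARNING,
        'E_CORE_ERROR'        => E_CORE_ERROR,
        'E_CORE_WARNING'      => E_CORE_WARNING,
        'E_ERROR'             => E_ERROR,
        'E_NOTICE'            => E_NOTICE,
        'E_PARSE'             => E_PARSE,
        'E_RECOVERABLE_ERROR' => E_RECOVERABLE_ERROR,
        'E_STRICT'            => E_STRICT,
        'E_USER_ERROR'        => E_USER_ERROR,
        'E_USER_NOTICE'       => E_USER_NOTICE,
        'E_USER_WARNING'      => E_USER_WARNING,
        'E_WARNING'           => E_WARNING,
);
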
if (! isset($_SERVER['argv'][1])) {
        fprintf(STDERR, "Usage: {$_SERVER['argv'][0]} <error_level>\n" .
                        "  Where <error_level> is a PHP error reporting level\n");
        exit(1);
}

$level = (int) $_SERVER['argv'][1];

// Print the name of every level whose bit is set in $level
echo "Error level $level includes:\n";
foreach ($errorLevels as $k => $v) {
        if ($level & $v) echo "\t $k \n";
}

echo "\n";

To use, just run the script with a single value parameter, like so:

shahar.e@wintergreen ~ $ php error_level.php 6134
Error level 6134 includes:


Glista 0.3 Released

Thanks to the very good holiday layout this year*, I finally got to release the next preview release of Glista – my super simple Gtk+ based to-do list manager for your desktop.

The major improvement in this release is category support. For a while, I didn’t want to add any features that would make the UI more complex than it is. Then I noticed that I tend to add ad-hoc categories to my tasks in order to sort them – that is, instead of typing “Fix ZF bug #1234” I type “ZF: Fix bug #1234”. For me this is a very natural way to organize tasks, and I assume that it is for most people. So I decided to add category support by recognizing this colon-separated syntax and breaking any item typed in this way into a “category: item” structure.