MySQL Named Locks in Python Context Managers

I’ve been using MySQL (and recently MariaDB) for many years – it must be something like 14 by now – but every now and then I learn something new about it. Recently, I’ve learned about named locks and how you can use your existing MySQL server as a means of creating distributed locks that are not tied to a specific DB transaction.

Here is an example of a Python function whose body will never execute concurrently, even in a multi-process, multi-machine distributed environment, as long as all processes talk to the same MySQL database:

NOTE: this code uses SQLAlchemy-like session semantics, but can easily be applied to any Python MySQL client.
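A minimal sketch of what such a function might look like. The names `LockTimeout`, `critical_section` and `my_lock` are illustrative assumptions, and `session` is assumed to be any object whose `execute()` returns a result with a `scalar()` method. With SQLAlchemy 1.4 or later, the SQL strings would need to be wrapped in `sqlalchemy.text()`.

```python
class LockTimeout(Exception):
    """Raised when the named lock cannot be obtained within the timeout."""


def critical_section():
    # Placeholder for arbitrary application logic (hypothetical);
    # it does not need to touch the database at all.
    return "done"


def do_something_exclusively(session):
    # GET_LOCK() returns 1 if the lock was obtained, 0 on timeout
    # and NULL on error. Here we wait for up to 5 seconds.
    got_lock = session.execute(
        "SELECT GET_LOCK(:name, :timeout)",
        {"name": "my_lock", "timeout": 5},
    ).scalar()
    if got_lock != 1:
        raise LockTimeout("could not obtain lock 'my_lock' within 5 seconds")

    result = critical_section()

    # Note: if critical_section() raises, this line is never reached.
    session.execute("SELECT RELEASE_LOCK(:name)", {"name": "my_lock"})
    return result
```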

That’s very nice! This code will wait up to 5 seconds to obtain the lock. If the lock is obtained (meaning no other MySQL client currently holds this specific lock), execution will continue, and the lock will be released when the code finishes. If the lock cannot be obtained within the given timeout, meaning some other client is currently running this code, an exception will be raised. In any case, the code will never run more than once at any given time. Oh, and MySQL named locks are connection-bound, meaning they are released if the connection dies or is explicitly closed. This should not happen while the code is executing, but it keeps us safe if the entire program crashes, for example.

Before I knew about this feature, we used to do some custom logical locking in our code (which never feels like a solid solution) or use transaction-level row / table locking, which coupled our application’s logic with DB operations too much. MySQL named locks are decoupled from any actual data in your tables – they are just a means of getting centralized app-level locking. And while there might be other, leaner mechanisms to achieve that, if you already use MySQL I believe this is a very good solution. To clarify, the Python code between GET_LOCK and RELEASE_LOCK can be anything and does not need to tie in to the database.

However, the code example above is not very clean and has a few disadvantages:

  • It does not handle exceptions properly. If an exception is thrown after the lock was acquired and before it was released, we are most likely going to end up with the lock not being released until the MySQL connection is closed, and we don’t know when that’s going to happen. Not good.
  • No clear separation of concerns – we have a single function that handles both application logic (the part between the locks) and provides the locking implementation. This can be solved in several ways, but I believe the way I’ll demonstrate below to be most elegant.
  • No code reusability, which is somewhat tied to the previous point. We cannot reuse the locking mechanism in other code paths very easily, and need to retype it. We also cannot reuse the application logic between the locks in a non-locked context – or even in unit tests for that matter.

We can solve all these issues very elegantly using the with statement and context managers. These are among my favorite idioms, more or less unique to the Python programming language, and it’s this sort of feature that I believe really helps make Python code very clean and elegant without being too verbose.

We’ll start by creating a context manager for a MySQL named lock:

And then proceed to use it in our function:
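The two pieces might be sketched as follows. This is an illustration under assumptions rather than the original listing: `named_lock` would live in a module such as `locking_context`, `LockTimeout` and `do_something` are hypothetical names, and `session` is any object whose `execute()` returns a result with a `scalar()` method (with SQLAlchemy 1.4 or later, wrap the SQL strings in `sqlalchemy.text()`).

```python
from contextlib import contextmanager


class LockTimeout(Exception):
    """Raised when the named lock cannot be obtained within the timeout."""


@contextmanager
def named_lock(session, name, timeout=5):
    """Hold the MySQL named lock `name` for the duration of the with block."""
    # GET_LOCK() returns 1 on success, 0 on timeout and NULL on error.
    got_lock = session.execute(
        "SELECT GET_LOCK(:name, :timeout)",
        {"name": name, "timeout": timeout},
    ).scalar()
    if got_lock != 1:
        raise LockTimeout(f"could not obtain lock {name!r} within {timeout}s")
    try:
        yield
    finally:
        # Runs whether the block finished normally or raised an exception.
        session.execute("SELECT RELEASE_LOCK(:name)", {"name": name})


def do_something():
    # Application logic only (hypothetical); knows nothing about locking,
    # so it can be reused or unit-tested without a lock.
    return "done"


def do_something_exclusively(session):
    with named_lock(session, "my_lock", timeout=5):
        return do_something()
```

Keeping the session as the first argument is what makes it straightforward to pass in a mock object when testing.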

So what does this do? The @contextmanager decorator helps us easily create a context-managed resource using generator-like semantics. The wrapper ensures that no matter what happens, the lock is released as we leave the managed context, whether it is because the code executed successfully or because an exception was thrown.

The semantics of using the locking_context.named_lock context manager are extremely simple and readable, and reusing the locking context manager is a matter of an import statement and a single line of code. By injecting a mock or monkey-patched object as the first argument of named_lock(), we can also easily test the context manager itself and any code using it. In addition, if we ever need to switch from MySQL-based locking to some other implementation, it can be done more easily.

While the same flow can be achieved in many other languages supporting, for example, try / finally semantics, in most cases I’m aware of one will need to use more complex and less readable flow control structures such as callables to accomplish this (yes, if you’re a JavaScript programmer this might make sense to you, but remember that pyramids, while they look impressive from the outside, are really tombs with mummies and traps on the inside). I believe it is features like context managers that make Python a language that encourages writing clean code.

Fixing logrotate errors and other MySQL issues on Ubuntu / Debian

For our MySQL databases on EC2, we back up the data by taking hourly snapshots of the volume that stores the MySQL data folder. This is a common practice, and in fact we do it using the popular ec2-consistent-snapshot script by Alestic. For backups this works great; in addition, having volume snapshots of MySQL data is very useful when we want to quickly launch a new MySQL server with a fairly recent state of our data – whether as a way to speed up synchronization of a new DB node or for testing purposes.

When launching a new EC2 instance we simply mount a new volume created from the latest DB snapshot on /var/lib/mysql and then start MySQL, and everything “just works” from there.

However, there are some quirks to this method: recently, I’ve encountered some errors from daily logrotate tasks which fail to flush the logs in a postrotate script (this is from the logs of cron that runs logrotate):

Monitoring EC2 instance memory usage with CloudWatch

At Shoppimon we’ve been relying a lot on Amazon infrastructure – it may not be the most cost-effective option for larger, more stable companies, but for small start-ups that need to be very dynamic, can’t have high up-front costs and don’t have a large IT department, it’s a great choice. We do try to keep our code clean from any vendor-specific APIs, but when it comes to infrastructure & operations management, AWS (with help from tools like Chef) has been great for us.

One of the AWS tools we use is CloudWatch – it allows us to monitor our infrastructure and get alerted when things go wrong. While it’s not the most flexible monitoring tool out there, it takes care of most of what we need right now and has the advantage of not needing to run an additional server and configure tools such as Nagios or Cacti. With its custom metrics feature, we can even send some app-level data and monitor it using the common Amazon set of tools.

However, there’s one big missing feature in CloudWatch: it doesn’t monitor your instance memory utilization. I suppose Amazon has all sorts of technical reasons not to provide this very important metric out of the box (probably related to the fact that their monitoring is done from outside the instance VM), but really if you need to monitor servers, in addition to CPU load and IO, memory utilization is one of the most important metrics to be aware of.
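One way to fill that gap is to measure memory from inside the instance and push it as a custom metric. The following is only a rough sketch under assumptions: it reads Linux’s /proc/meminfo, the namespace `Custom/System` and metric name `MemoryUtilization` are made up for illustration, and publishing relies on the third-party boto3 library with AWS credentials already configured.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def memory_utilization_percent(meminfo):
    """Return used memory as a percentage of total memory."""
    total = meminfo["MemTotal"]
    if "MemAvailable" in meminfo:
        # Reported by newer Linux kernels.
        available = meminfo["MemAvailable"]
    else:
        # Rough fallback: count buffers and page cache as reclaimable.
        available = (
            meminfo["MemFree"]
            + meminfo.get("Buffers", 0)
            + meminfo.get("Cached", 0)
        )
    return 100.0 * (total - available) / total


def publish(percent, namespace="Custom/System"):
    # Imported lazily so the functions above work without boto3 installed.
    import boto3

    boto3.client("cloudwatch").put_metric_data(
        Namespace=namespace,  # hypothetical namespace
        MetricData=[
            {"MetricName": "MemoryUtilization", "Value": percent, "Unit": "Percent"}
        ],
    )
```

Run periodically (for example from cron), something like `publish(memory_utilization_percent(parse_meminfo(open("/proc/meminfo").read())))` would give CloudWatch a metric to alert on.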

On PHP Extensions

While teaching PHP I mention the term “extension” quite a lot – but I have realized that this may very well be a confusing term for non-PHPers. While most PHP training courses focus on code, I believe getting to know the PHP engine and environment is almost as important as learning to use the language. Extensions are a big and important part of PHP, but there seems to be a big knowledge gap about them, and Googling for articles that explain what extensions are, and how they are used and installed, hardly returns good results. So, as part of my attempt to blog about more basic PHP topics, I’ve decided to try and come up with an overview of PHP extensions.

So what are PHP extensions?

An extension in PHP is in fact a module providing some functionality to the PHP Engine. While the term makes it sound like extensions provide some kind of special functionality, in reality many of the language’s most basic functions and classes are provided in extensions. Many extensions are shipped as part of the default PHP distribution, and some are in fact compiled into PHP in such a way that they cannot even be unloaded. Come to think of it, perhaps it is best to think of PHP extensions as “language modules”.

Quickly Creating a New Admin User on Ubuntu

Working quite a lot on the Ubuntu Server EC2 images, I am often faced with a need to create one or more additional admin users, which have the same permissions as the first user (“ubuntu” in these images). I did a little bit of searching but didn’t find any way to easily add a new user with the same groups as the Ubuntu user, so I crafted a little command. I’m pasting it here mostly for future self reference, and also in hope this helps someone:

sudo useradd -m -G "$(groups ubuntu | cut -d" " -f4- | sed 's/ /,/g')" -s /bin/bash newuser

Of course ‘ubuntu’ is the user you want to copy, and ‘newuser’ is the name of the new user.

Note that the new user will be in the admin group but will still require a password when using sudo (that’s because in the EC2 images ‘ubuntu’ is the only user with NOPASSWD privileges). I personally believe this is a good thing, but if you want you can always add NOPASSWD on the admin group in /etc/sudoers.

Password hashing revisited

From user comments on my recent password hashing post, I’ve learned about a better solution for password hashing – rather than using hashing algorithms designed to be fast such as SHA-1 and SHA-256, use slower and, more importantly, future-adaptable algorithms such as bcrypt. I have to say this is one of the reasons I love this community – you always learn new things.

I won’t repeat the reasons why methods such as bcrypt are preferred (read the comments on the previous post to learn why). However, I will note that starting from PHP 5.3 bcrypt is in fact built into PHP – so if you do not require portability to older versions of PHP, bcrypt hashing can be done very easily, using the useful but somewhat enigmatic crypt function:

Generating ZF Autoloader Classmaps with Phing

One of the things I’ve quickly discovered when working on Shoppimon is that we need a build process for our PHP frontend app. While the PHP files themselves do not require any traditional “build” step such as processing or compilation, there are a lot of other tasks that need to happen when taking a version from the development environment to staging and to production: among other things, our build process picks up and packages only the files needed by the app (leaving out things like documentation, unit tests and local configuration overrides), minifies and pre-compresses CSS and JavaScript files, and performs other helpful optimizations on the app, making it ready for production.

Since Shoppimon is based on Zend Framework 2.0, it also heavily relies on the ZF 2.0 autoloader stack. Class autoloading is convenient, and was shown to greatly improve performance over using require_once calls. However, different autoloading strategies have pros and cons: while PSR-0 based autoloading (the so-called Standard Autoloader from ZF1 days) works automatically and doesn’t require updating any mapping code for each new class added or renamed, it has a significant performance impact compared to classmap based autoloading.

Fortunately, using ZF2’s autoloader stack and Phing, we can enjoy the best of both worlds: while in development, standard PSR-0 autoloading is used and the developer can work smoothly without worrying about updating class maps. As we push code towards production, our build system takes care of updating class map files, ensuring super-fast autoloading in production using the ClassMapAutoloader. How is this done? Read on to learn.

Replacing a lost SSH key on an Amazon EC2 machine

Due to an unfortunate shmelting accident (read: poor backup practices), I lost the SSH private key granting me the only way to access one of my EC2 hosted servers. Being unable to access the server, and unable to easily set a new public key through Amazon’s interfaces, I panicked for a few seconds. Then I started trying to hack my way in, and eventually found a way to set a new public key for my user. Here is what I did.

First, know that I was lucky: for this method to properly work, you need a few things:

  • The machine must be EBS based
  • You need to be able to afford a couple of minutes of downtime
  • You need to be able to withstand the effects of restarting the machine – for example, if you do not have an Elastic IP address associated with the machine, its public address will change. In some situations this is not acceptable.

After trying some different approaches, what worked for me was to do the following:

  1. Generate a new keypair for yourself, and import the public key to your EC2 account
  2. Start a new, clean, cheap machine (this will only be needed to do very simple things, so I recommend using a tiny machine) in the same availability zone as the affected machine
  3. Stop the affected machine (do not terminate, STOP it – this is only possible with EBS machines)
  4. Detach the root device from the affected machine (by default attached as /dev/sda1)
  5. Attach the detached device to the new clean machine
  6. SSH into the clean machine and mount the affected machine’s root filesystem somewhere (e.g. in /mnt/fs)
  7. Now you can edit /mnt/fs/root/.ssh/authorized_keys (or on official Ubuntu machines /home/ubuntu/.ssh/authorized_keys) and add your new public key to it
  8. Unmount the volume and terminate the clean machine – you no longer need it
  9. Re-attach the root device to the affected machine (which should be stopped) – make sure to attach it as the same device it was before (e.g. /dev/sda1)
  10. Re-start your old machine – you should now be able to use your new key!
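The stop, detach, attach and start steps above (3 to 5 and 9 to 10) can also be scripted. The following is a hedged sketch only, not what I actually ran: it assumes `ec2` is a boto3 EC2 client (`boto3.client("ec2")`), the function names and the `/dev/sdf` rescue device are illustrative, and all IDs must be supplied by you. Editing authorized_keys and unmounting the volume (steps 6 to 8) still happen by hand in between.

```python
def move_volume(ec2, volume_id, from_instance, to_instance, device):
    """Detach an EBS volume from one instance and attach it to another."""
    ec2.detach_volume(VolumeId=volume_id, InstanceId=from_instance)
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(VolumeId=volume_id, InstanceId=to_instance, Device=device)
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])


def lend_root_volume(ec2, affected_id, rescue_id, volume_id,
                     rescue_device="/dev/sdf"):
    """Steps 3-5: stop the affected machine and hand its root volume over."""
    # Stop, never terminate: terminating would destroy the machine.
    ec2.stop_instances(InstanceIds=[affected_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[affected_id])
    move_volume(ec2, volume_id, affected_id, rescue_id, rescue_device)


def return_root_volume(ec2, affected_id, rescue_id, volume_id,
                       root_device="/dev/sda1"):
    """Steps 9-10: give the root volume back and start the machine again."""
    # Unmount the filesystem on the rescue machine before calling this.
    move_volume(ec2, volume_id, rescue_id, affected_id, root_device)
    ec2.start_instances(InstanceIds=[affected_id])
```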

Another approach, which could work but which I gave up on after a couple of attempts (I think it really depends on the init scripts in the machine you are using), is to stop the machine, change its User Data to a shell script that sets a new public key in the right place, then start it again.

And really, you should back up your keys!

Why I don’t like the term “NoSQL”

This is a rant post, but just to clarify things, it’s not a rant against the use of non-relational databases. I think that the shift in recent years from a world in which relational databases are used almost exclusively regardless of what the need is, to today’s situation where it is possible and even considered a good idea to choose the best fitting solution from any number of data storage paradigms, is a truly blessed change. I am a big fan of some non-relational database solutions, and to be honest as a programmer I enjoy using some of them more than I enjoy MySQL or any other relational database.

This is a rant against the too-common term “NoSQL”. In my opinion, “NoSQL” is an example of layman terminology which does not properly describe the concepts which in most cases it aims to describe, and should not be used by professionals who are technical enough to understand the true meaning of these concepts.

“NoSQL” databases are all about the data model – in most cases, the term is used to describe any kind of storage engine (or database) in which data is stored in a non-relational manner: object storage, document storage, key-value storage etc. Indeed, the term is more about what the database is not than about what it is.

Relational data is data that can be described as a table – in contrast to what some think, the term “relational database” has nothing to do with the ability to define and enforce relationships between data in different tables. If this were the case, MySQL using the MyISAM storage engine would not be a relational database. The term “relation” is a mathematical term, which existed before the creation of relational databases and is used to describe a relationship between two finite data sets, which can be described in a tabular manner (and I am not a mathematician, not even close – so I apologize in advance for this likely inaccurate description).

But SQL has nothing to do with this – SQL is the language used to send commands to the database, and nothing more. It is true that there is an almost 1-to-1 correlation between database engines that store data in a relational manner and database engines that use SQL as a query language, but saying that relational databases are SQL databases is like saying that (and assume it’s 1984 again) the Russian language should be abolished when in fact we want to say that communism is an unfitting economic system. It’s a poor way to describe your intentions, and it makes you sound like an ignorant moron.

There are many client libraries and wrappers that allow you to query a relational database such as MySQL or Oracle without writing any SQL code yourself. This doesn’t make them NoSQL databases. Some popular non-relational databases, such as Amazon SimpleDB and the Google App Engine Data Store, provide query languages that are quite similar to SQL. This doesn’t make them SQL databases. SQL is just a language, and it is a good one for what it’s supposed to do (putting aside all sorts of discrepancies between vendor-specific SQL implementations). SQL is not what NoSQL databases are NOT about.

So, next time you want to use a term that describes all databases that do not store data in a tabular manner, use the term “non-relational” or, if you really like acronyms, “NonRDBMS”, and not “NoSQL”. Or even better – use a term that describes what your preferred solution is, not what it is not. After all, when you say “non-relational storage engine”, you are probably not referring to your file system, right?

HTML 5 Canvas Game of Life

I recently started looking into different HTML 5.0 related technologies, one of the most exciting ones being the new Canvas tag and API.

As a little test, I’ve implemented a little Game of Life thing using HTML 5 Canvas, which you can see in action here: http://arr.gr/playground/life/ (view source to see the code behind it).

Game of Life in HTML5 Canvas

The algorithm is not very smart so it’s kind of slow and CPU intensive, but still fun to watch. It works nicely on Firefox 4.0 and the latest Chrome and Safari versions, and is a bit slow on Firefox 3.6. I did not test with any IE version, but I do not expect it to work in IE 6 or 7, maybe 8 and probably 9.
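To give an idea of what a "not very smart" algorithm means here, this is a naive generation step sketched in Python. It is not the JavaScript code behind the page, just an illustration of the same brute-force idea: every cell is visited and all of its neighbours are counted on every generation. It assumes a bounded grid where anything outside the edges is dead.

```python
def next_generation(grid):
    """Compute one Game of Life step on a bounded grid of 0/1 cells."""
    rows, cols = len(grid), len(grid[0])
    new_grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Sum the 3x3 block around the cell (clipped at the edges),
            # then subtract the cell itself to get the neighbour count.
            neighbours = sum(
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ) - grid[r][c]
            # A cell is alive next turn with exactly 3 neighbours,
            # or with 2 neighbours if it is already alive.
            alive = neighbours == 3 or (grid[r][c] == 1 and neighbours == 2)
            new_grid[r][c] = 1 if alive else 0
    return new_grid


# A "blinker" flips between a horizontal and a vertical bar.
blinker = [
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
flipped = next_generation(blinker)
```

The cost is proportional to the number of cells on every generation, regardless of how many of them are alive, which is why this approach gets slow on larger boards.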

I think Game of Life by itself is worth at least an entire post regardless of this HTML5 implementation, especially because I’m a big fan of things that bring CS and philosophy together, so I may write about it at a later point, but for now I suggest you let it run for a while (a few hundred generations) and see what you get :)