MySQL Named Locks in Python Context Managers

I’ve been using MySQL (and recently MariaDB) for many years – it must be something like 14 by now – but every now and then I learn something new about it. Recently, I’ve learned about named locks and how you can use to use your already-there MySQL server as a mean to create distributed locks which are not related to a specific DB transaction.

Here is an example of a Python function who’s internal code will never execute concurrently, even in a multi-process, multi-machine distributed environment, as long as all processes talk to the same MySQL database:

NOTE: this code uses SQLAlchemy-like session semantics, but can easily be applied to any Python MySQL client.

That’s very nice! This code will try to obtain a lock for 5 seconds before resuming execution. If the lock is obtained (meaning no other MySQL client has requested to lock this specific lock), execution will resume and when finished the lock will be released. If the lock cannot be obtained within the given timeout, meaning some other client is currently running this code, an exception will be thrown. In any case the code will never run more than once at any given time. Oh, and MySQL named locks are connection-bound, meaning they are released if the connection dies or is explicitly closed – but again this should not happen while the code is executing, but will keep us safe if the entire program crashes, for example.

Before I knew about this feature, we used to do some custom logical locking in our code (which never feels like a solid solution) or use transaction-level row / table locking which coupled our application’s logic with DB operations too much; MySQL named locks are decoupled of any actual data in your tables – its just a mean to get centralized app-level locking. And while there might be other, more lean mechanisms to achieve that, if you already use MySQL I believe this is a very good solution. To clarify, the Python code between GET_LOCK and RELEASE_LOCK can be anything and does not need to tie in to the database.

However, the code example above is not very clean and has a few disadvantages:

  • It does not handle exceptions properly. If an exception is thrown after the lock was acquired and before it was released, we are most likely going to end up with the lock not being released until the MySQL connection is closed, and we don’t know when that’s going to happen. Not good.
  • No clear separation of concerns – we have a single function that handles both application logic (the part between the locks) and provides the locking implementation. This can be solved in several ways, but I believe the way I’ll demonstrate below to be most elegant.
  • No code reusability, which is somewhat tied to the previous point. We cannot reuse the locking mechanism in other code paths very easily, and need to retype it. We also cannot reuse the application logic between the locks in a non-locked context – or even in unit tests for that matter.

We can solve all these issues very elegantly using the with statement and context managers. These features are one of my favorites idioms more or less unique to the Python programming language, and it’s these sort of features that I believe really help make Python code very clean and elegant without being too verbose.

We’ll start by creating a context manager for MySQL named lock:

And then proceed to use it in our function:

So what does this do? The @contextmanager annotation help us easily create a context-managed resource using generator-like semantics; The wrapper ensures that no matter what happens, the lock is released as we leave the managed context whether it is because the code executed successfully or because an exception was thrown.

The semantics of using the locking_context.named_lock context manager are extremely simple and readable, and reusing the locking context manager is a matter of an import statement and a single line of code. By injecting a mock or monkey-patched object as the first argument of named_lock(), we can also easily test the context manager itself and any code using it. In addition, if we ever need to switch from MySQL-based locking to some other implementation, it can be done more easily.

While the same flow can be achieved in many other languages supporting, for example, try / finally semantics, in most cases I’m aware of one will need to use more complex and less readable flow control structures such as callables to accomplish (yes, if you’re a JavaScript programmer this might make sense to you, but remember that pyramids, while look impressive from the outside, are really tombs with mummies and traps on the inside). I believe it is features like context managers that make Python a language that encourage writing clean code.

Bitbucket: Converting Hg repositories to Git

Recently I started using Bitbucket for private repository hosting for a project I’m working on. While I had no experience with Mercurial, I figured it can’t be that tricky – and Bitbucket offers free private hosting which is what this project needed (couldn’t go public, couldn’t pay, didn’t have the time to set up self-hosted SCM hosting).

All in all I like Bitbucket (although I have to admit on most aspects they seem to fall behind GitHub), but not so much using Mercurial – for all sorts of reasons it felt quirky and less polished than Git, which honestly I have much more experience with.

So following Bitbucket’s big announcement on Git support, I’ve decided to migrate my repositories from Hg to Git, while keeping them on Bitbucket and maintaining repository history. I’m happy to say it was relatively a piece of cake to successfully achieve. Here is what I did:

Step 1: Set up your repositories

First, I renamed my old Hg repository through Mercurial’s web interface, to something like “MyProject” to “MyProject Hg”. This changes the repository URL, but since I wasn’t planning on using it anymore that doesn’t really matters – plus you can always rename back if things go bad.

Then, I created a new Git repository with the name of the previous repository, e.g. “MyProject”. Again, that can be easily done from Bitbucket’s Web interface.

Step 2: Install the Mercurial hggit plugin

The hggit plugin allows your Mercurial command line tool hg to talk to git repositories – that is push and pull from Git. Installing it is easy, as it is probably available from your package manager. On Mac, if you use Macports, you can run:

  $ sudo port install py26-hggit

While on Ubuntu, run:

  $ sudo aptitude install mercurial-git

Then, make sure to load the plugin by adding the following lines to your ~/.hgrc file:

  [extensions]
  hggit=

Congratulations: Your hg command now speaks Git!

Step 3: Push your code into your Git repository

To push your code into your new Git repository, you basically need to run two commands:

First, create a Mercurial bookmark that references master to your default branch. This will help Git create the right refs later on:

  $ cd ~/myproject-hg-repo/
  $ hg bookmark -r default master

Next, simply push your code into the newly created Git repository:

  $ hg push git+ssh://git@bitbucket.org:shaharevron/myproject.git

Of course, make sure to change the repository URL to the URL of your new Git repository. To make sure hg understands you’re referring to a Git repository, if using SSH add the git+ssh:// prefix to the URL. This should push your entire repository to the new Git repository, and within a few seconds up to a few minutes (depending on how big your repository is), you should be able to see all your old commits in the new Git repo.

Step 4: Switch your local repository to use Git

Now that your new Git repo is up at Bitbucket, you’ll need to switch to using Git locally. There are two paths you can take here: the safe one, is to simply git clone your code to a new working directory and work from there. It’s safe, and will work well. However, if you’re a cowboy like me, and are too lazy to create a new IDE project on a different directory, you can in fact simply switch to working with Git on the same directory (but I still seriously recommend you ensure Bitbucket really does have your code as backup…).

Here is how to do it. From the local repository directory, run:

  $ git init
  $ git remote add origin git@bitbucket.org:shaharevron/myproject.git
  $ git pull origin master
  $ git reset --hard HEAD

Again, replace the repository URL with your own. This will “merge” everything in your Git repo into the local working directory. You will need to create a new .gitignore file if needed – and can now simply delete the .hg directory, as it is no longer needed. You can now happily use Git with your Bitbucket code.

While there shouldn’t be any problems, I also recommend keeping your old Hg repository around on Bitbucket for a few days, just to make sure nothing blows up – and delete it from Bitbucket’s web interface once you’re sure everything works well.