MySQL Named Locks in Python Context Managers

I’ve been using MySQL (and recently MariaDB) for many years – it must be something like 14 by now – but every now and then I learn something new about it. Recently, I’ve learned about named locks and how you can use to use your already-there MySQL server as a mean to create distributed locks which are not related to a specific DB transaction.

Here is an example of a Python function who’s internal code will never execute concurrently, even in a multi-process, multi-machine distributed environment, as long as all processes talk to the same MySQL database:

NOTE: this code uses SQLAlchemy-like session semantics, but can easily be applied to any Python MySQL client.

That’s very nice! This code will try to obtain a lock for 5 seconds before resuming execution. If the lock is obtained (meaning no other MySQL client has requested to lock this specific lock), execution will resume and when finished the lock will be released. If the lock cannot be obtained within the given timeout, meaning some other client is currently running this code, an exception will be thrown. In any case the code will never run more than once at any given time. Oh, and MySQL named locks are connection-bound, meaning they are released if the connection dies or is explicitly closed – but again this should not happen while the code is executing, but will keep us safe if the entire program crashes, for example.

Before I knew about this feature, we used to do some custom logical locking in our code (which never feels like a solid solution) or use transaction-level row / table locking which coupled our application’s logic with DB operations too much; MySQL named locks are decoupled of any actual data in your tables – its just a mean to get centralized app-level locking. And while there might be other, more lean mechanisms to achieve that, if you already use MySQL I believe this is a very good solution. To clarify, the Python code between GET_LOCK and RELEASE_LOCK can be anything and does not need to tie in to the database.

However, the code example above is not very clean and has a few disadvantages:

  • It does not handle exceptions properly. If an exception is thrown after the lock was acquired and before it was released, we are most likely going to end up with the lock not being released until the MySQL connection is closed, and we don’t know when that’s going to happen. Not good.
  • No clear separation of concerns – we have a single function that handles both application logic (the part between the locks) and provides the locking implementation. This can be solved in several ways, but I believe the way I’ll demonstrate below to be most elegant.
  • No code reusability, which is somewhat tied to the previous point. We cannot reuse the locking mechanism in other code paths very easily, and need to retype it. We also cannot reuse the application logic between the locks in a non-locked context – or even in unit tests for that matter.

We can solve all these issues very elegantly using the with statement and context managers. These features are one of my favorites idioms more or less unique to the Python programming language, and it’s these sort of features that I believe really help make Python code very clean and elegant without being too verbose.

We’ll start by creating a context manager for MySQL named lock:

And then proceed to use it in our function:

So what does this do? The @contextmanager annotation help us easily create a context-managed resource using generator-like semantics; The wrapper ensures that no matter what happens, the lock is released as we leave the managed context whether it is because the code executed successfully or because an exception was thrown.

The semantics of using the locking_context.named_lock context manager are extremely simple and readable, and reusing the locking context manager is a matter of an import statement and a single line of code. By injecting a mock or monkey-patched object as the first argument of named_lock(), we can also easily test the context manager itself and any code using it. In addition, if we ever need to switch from MySQL-based locking to some other implementation, it can be done more easily.

While the same flow can be achieved in many other languages supporting, for example, try / finally semantics, in most cases I’m aware of one will need to use more complex and less readable flow control structures such as callables to accomplish (yes, if you’re a JavaScript programmer this might make sense to you, but remember that pyramids, while look impressive from the outside, are really tombs with mummies and traps on the inside). I believe it is features like context managers that make Python a language that encourage writing clean code.

Fixing logrotate errors and other MySQL issues on Ubuntu / Debian

For our MySQL databases on EC2, we back up the data by taking hourly snapshots of the volume that stores the MySQL data folder. This is a common practice, and in fact we do it using the popular ec2-consistent-snapshot script by Alestic. For backups this works great; in addition, having volume snapshots of MySQL data is very useful when we want to quickly launch a new MySQL server with a fairly recent state of our data – whether as a way to speed up synchronization of a new DB node or for testing purposes.

When launching a new EC2 instance we simply mount a new volume created from the latest DB snapshot on /var/lib/mysql and then start MySQL, and everything “just works” from there.

However, there are some quirks to this method: recently, I’ve encountered some errors from daily logrotate tasks which fail to flush the logs on a post-rotate script (this is from the logs of cron that runs logrotate):

Continue reading

Why I don’t like the term “NoSQL”

This is a rant post, but just to clarify things, it’s not a rant against the use of non-relational databases. I think that the shift in recent years from a world in which relational databases are used almost exclusively regardless of what the need is, to today’s situation where it is possible and even considered a good idea to choose the best fitting solution from any number of data storage paradigms, is a truly blessed change. I am a big fan of some non-relational database solutions, and to be honest as a programmer I enjoy using some of them more than I enjoy MySQL or any other relational database.

This is a rant against the too-common term “NoSQL”. In my opinion, “NoSQL” is an example of layman terminology which does not properly describe the concepts which in most cases it aims to describe, and should not be used by professionals which are technical enough to understand the true meaning of these concepts.

“NoSQL” databases are all about the data model – in most cases, the term is used to describe any kind of storage engine (or database) in which data is stored in non-relational manner: object storage, document storage, key-value storage etc. Indeed, the term is more about what the database is not that about what it is.

Relational data is data that can be described as a table – in contrast to what some think, the term “relational database” has nothing to do with the ability to define and enforce relationships between data in different tables. If this was the case, MySQL using the MyISAM storage engine would not be a relational database. The term “relation” is a mathematical term, which existed before the creation of relational databases and is used to describe a relationship between two finite data sets, which can be described in a tabular manner (and I am not a mathematician, not even close – so I apologize in advance for this likely inaccurate description).

But, SQL has nothing to do with this – SQL is the language used to send commands to the database, and nothing more. It is true that there is an almost 1-to-1 correlation between database engines that store data in a relational manner and database engines that use SQL as a query language, but saying that relational databases are SQL databases is like saying that  (and assume it’s 1984 again) the Russian language should be abolished when in fact we want to say that communism is an unfitting economic system. It’s a poor way to describe your intentions, and it makes you sound like an ignorant moron.

There are many client libraries and wrappers that allow you to query a relational database such as MySQL and Oracle without writing any SQL code yourself. This doesn’t make them NoSQL databases. Some popular non-relational databases, such as Amazon SimpleDB and the Google App Engine Data Store provide query languages that are quite similar to SQL. This doesn’t make them SQL databases.SQL is just a language, and it is a good one for what it’s supposed to do (putting aside all sorts of discrepancies between vendor-specific SQL implementations). SQL is not what NoSQL databases are NOT about.

So, next time when you want to use a term that describes all databases that do not store data in a tabular manner, use the term “non-relational” or if you really like acronyms, “NonRDBMS”, and not “NoSQL”. Or even better – use a term that describes what your preferred solution is, not what it is not. After all, when you say “non-relational storage engine”, you are probably not referring to your file system, right?