Monitoring EC2 instance memory usage with CloudWatch

At Shoppimon we’ve been relying a lot on Amazon infrastructure – it may not be the most cost-effective option for larger, more stable companies, but for small start-ups that need to be very dynamic, can’t have high up-front costs and don’t have a large IT department, it’s a great choice. We do try to keep our code clean of any vendor-specific APIs, but when it comes to infrastructure & operations management, AWS (with help from tools like Chef) has been great for us.

One of the AWS tools we use is CloudWatch – it allows us to monitor our infrastructure and get alerted when things go wrong. While it’s not the most flexible monitoring tool out there, it takes care of most of what we need right now and has the advantage of not needing to run an additional server and configure tools such as Nagios or Cacti. With its custom metrics feature, we can even send some app-level data and monitor it using the common Amazon set of tools.

However, there’s one big missing feature in CloudWatch: it doesn’t monitor your instance memory utilization. I suppose Amazon has all sorts of technical reasons not to provide this very important metric out of the box (probably related to the fact that their monitoring is done from outside the instance VM), but really if you need to monitor servers, in addition to CPU load and IO, memory utilization is one of the most important metrics to be aware of.

So, with a little bit of research I found some scripts that use the CloudWatch API to send memory utilization info to AWS as a custom metric. However, most of these scripts require that you provide some kind of credentials (API keys), and I feel really uncomfortable storing and managing API keys on all sorts of different machines, even with automation tools like Chef. The less I have to do it, the better. Amazon has a pretty nice answer for that – IAM Roles, which let you authorize access to specific AWS services (including S3 and CloudWatch) on a per-instance basis. Since we want all instances to be able to do certain things (like send their own metrics to CloudWatch or access our EC2-hosted private DEB repo), all our EC2 servers get some permissions via IAM roles. But I couldn’t find any solution that supports IAM roles and does the job right.
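To give an idea of what such a role needs, here is a minimal sketch of a policy document that only allows publishing custom metrics. This is an assumption about a reasonable setup, not the exact policy we use; it is built as a Python dict purely so it can be printed as JSON and pasted into the IAM console.

```python
import json

# Hypothetical minimal policy for an instance role: it only allows
# publishing custom metrics to CloudWatch. "Resource" is "*" here
# because the sketch does not try to scope the action any further.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["cloudwatch:PutMetricData"],
            "Resource": "*",
        }
    ],
}

# Print the policy as JSON, ready to attach to a role.
print(json.dumps(policy, indent=2))
```

A role used for other things as well (such as reading a private repository on S3) would need additional statements alongside this one.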

So, after a little bit of Python hacking using the wonderful boto library, I came up with this tiny utility that grabs the memory and swap utilization percentages and sends them to CloudWatch as custom metrics. It relies on the machine having an IAM role set up, but I’m pretty sure that if you don’t want to use IAM Roles, you can simply create a boto config file with your AWS credentials instead.
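The calculation itself is simple. The following is a rough sketch of the idea rather than the gist itself: the function names are made up for illustration, and the step that sends the values to CloudWatch through boto is left out. It assumes the usual /proc/meminfo format, counts Buffers and Cached as free memory, and reports 0 for swap when swap is disabled.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of float values (kB)."""
    values = {}
    for line in text.splitlines():
        if ':' not in line:
            continue
        key, rest = line.split(':', 1)
        fields = rest.split()
        if fields:
            values[key.strip()] = float(fields[0])
    return values


def usage_percentages(mem):
    """Return memory and swap utilization as percentages."""
    # Buffers and page cache are counted as free here, since the kernel
    # gives that memory back as soon as applications need it.
    mem_free = mem['MemFree'] + mem.get('Buffers', 0.0) + mem.get('Cached', 0.0)
    mem_used = mem['MemTotal'] - mem_free

    swap_total = mem.get('SwapTotal', 0.0)
    swap_used = swap_total - mem.get('SwapFree', 0.0)
    # Avoid dividing by zero on machines that run without swap.
    swap_percent = swap_used / swap_total * 100 if swap_total else 0.0

    return {'MemUsage': mem_used / mem['MemTotal'] * 100,
            'SwapUsage': swap_percent}


def read_local_usage(path='/proc/meminfo'):
    """Read the local meminfo file and return the two percentages."""
    with open(path) as f:
        return usage_percentages(parse_meminfo(f.read()))
```

On a Linux machine, calling read_local_usage() returns a dict with the two values, which can then be published as custom metrics.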

You can install it like so if you want it running via cron every minute (note that we run it as the ‘nobody’ user; if you rely on a ~/.boto config file, you may want to change that):

$ curl https://gist.githubusercontent.com/shevron/6204349/raw/cw-monitor-memusage.py | sudo tee /usr/local/bin/cw-monitor-memusage.py
$ sudo chmod +x /usr/local/bin/cw-monitor-memusage.py
$ echo "* * * * * nobody /usr/local/bin/cw-monitor-memusage.py" | sudo tee /etc/cron.d/cw-monitor-memusage

And that’s it – memory usage stats should now appear in your CloudWatch console and you can create alarms based on them. Note that you may need to enable detailed monitoring on instances for this to work – this comes at a small additional cost. I’m not sure if this is required or not; you can try and see.

Feel free to use this little script for any purpose. If you improve it, please let us know!

EDIT: fixed the gist URL

4 thoughts on “Monitoring EC2 instance memory usage with CloudWatch”

  1. I was in the same situation needing memory statistics in Cloudwatch so thanks for this post, it saved me a lot of time. :)

    Small point though, your memory percentage calculation is off since you are adding Buffers and Cached to the MemFree value which gives a much lower utilisation figure than the real value.
    This could be an OS difference in the values returned from collect_memory_usage() – I am testing this on Ubuntu and if I just use the MemFree value the percentage is correct.

    • @Stacy, what are you comparing the usage percentage against?

      I’m testing on Ubuntu as well. Buffers and Cache are usually considered free memory in the sense that it’s memory that the OS will free as soon as an application needs it. This is consistent with what `free`, for example, will show. Check out the explanation here: http://www.linuxatemyram.com/ (I have just found this site and must say its blink-tag style brings back memories :)

      I have noticed that Ubuntu’s motd typically shows an even lower utilization percentage than what I calculate, but was unable to figure out exactly what they measure.

  2. One recommendation I have is that you may want to put an if around turning the swap into a percentage. I’ve got some instances where I turn off swap (for Cassandra, for instance, it’s recommended not to have swap), so I dropped this if statement into the mix to avoid a division by zero error. Thanks for taking the time to document this, it worked like a charm.

    if mem_usage['SwapTotal'] != 0:
        swap_percent = swap_used / mem_usage['SwapTotal'] * 100
    else:
        swap_percent = 0

    metrics = {'MemUsage': mem_used / mem_usage['MemTotal'] * 100,
               'SwapUsage': swap_percent}