At Shoppimon we’ve been relying a lot on Amazon infrastructure – it may not be the most cost-effective option for larger, more stable companies, but for small start-ups that need to be very dynamic, can’t have high up-front costs and don’t have a large IT department, it’s a great choice. We do try to keep our code clean of any vendor-specific APIs, but when it comes to infrastructure & operations management, AWS (with help from tools like Chef) has been great for us.
One of the AWS tools we use is CloudWatch – it allows us to monitor our infrastructure and get alerted when things go wrong. While it’s not the most flexible monitoring tool out there, it covers most of what we need right now and has the advantage of not requiring us to run an additional server and configure tools such as Nagios or Cacti. With its custom metrics feature, we can even send some app-level data and monitor it using the same set of Amazon tools.
However, there’s one big feature missing from CloudWatch: it doesn’t monitor your instance’s memory utilization. I suppose Amazon has all sorts of technical reasons not to provide this very important metric out of the box (probably related to the fact that their monitoring is done from outside the instance VM), but if you need to monitor servers, memory utilization is one of the most important metrics to watch, alongside CPU load and I/O.
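One common workaround is to push memory utilization as a custom metric from inside the instance itself. Here is a minimal sketch, assuming the AWS CLI is installed and the instance has credentials allowed to call `cloudwatch:PutMetricData`; the namespace and metric name are placeholders of my own choosing:

```shell
#!/bin/bash
# Compute memory utilization (%) from /proc/meminfo and push it to
# CloudWatch as a custom metric. Namespace and metric name below are
# arbitrary placeholders - pick whatever fits your own naming scheme.
MEM_TOTAL=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
MEM_FREE=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)
MEM_USED_PCT=$(( (MEM_TOTAL - MEM_FREE) * 100 / MEM_TOTAL ))

aws cloudwatch put-metric-data \
    --namespace "Custom/System" \
    --metric-name "MemoryUtilization" \
    --unit Percent \
    --value "$MEM_USED_PCT"
```

Run it from cron every few minutes and the metric shows up in the CloudWatch console alongside the built-in ones, where you can alarm on it like any other metric.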
Due to an unfortunate shmelting accident (read: poor backup practices), I lost the SSH private key granting me the only way to access one of my EC2 hosted servers. Being unable to access the server, and unable to easily set a new public key through Amazon’s interfaces, I panicked for a few seconds. Then I started trying to hack my way in, and eventually found a way to set a new public key to my user. Here is what I did.
First, know that I was lucky: for this method to properly work, you need a few things:
The machine must be EBS-based
You need to be able to afford a couple of minutes of downtime
You need to be able to withstand the effects of restarting the machine – for example, if you do not have an Elastic IP address associated with the machine, its public address will change. In some situations this is not acceptable.
After trying some different approaches, what worked for me was to do the following:
Generate a new keypair for yourself, and import the public key to your EC2 account
Start a new, clean, cheap machine (this will only be needed to do very simple things, so I recommend using a tiny machine) in the same availability zone as the affected machine
Stop the affected machine (do not terminate, STOP it – this is only possible with EBS machines)
Detach the root device from the affected machine (by default attached as /dev/sda1)
Attach the detached device to the new clean machine
SSH into the clean machine and mount the affected machine’s root filesystem somewhere (e.g. in /mnt/fs)
Now you can edit /mnt/fs/root/.ssh/authorized_keys (or on official Ubuntu machines /home/ubuntu/.ssh/authorized_keys) and add your new public key to it
Unmount the volume and terminate the clean machine – you no longer need it
Re-attach the root device to the affected machine (which should still be stopped) – make sure to attach it as the same device it was before (e.g. /dev/sda1)
Re-start your old machine – you should now be able to use your new key!
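The stop/detach/attach steps above can also be scripted with the AWS CLI. This is just a sketch of the same procedure – the instance and volume IDs are placeholders, and the device names your instance actually exposes may differ (modern Linux kernels often show an attached /dev/sdf as /dev/xvdf):

```shell
# Stop the affected instance (EBS-backed only - do NOT terminate it)
aws ec2 stop-instances --instance-ids i-affected

# Detach its root volume and attach it to the clean rescue instance
aws ec2 detach-volume --volume-id vol-rootvol
aws ec2 attach-volume --volume-id vol-rootvol \
    --instance-id i-rescue --device /dev/sdf

# On the rescue instance: mount the volume and add the new public key
sudo mkdir -p /mnt/fs
sudo mount /dev/xvdf1 /mnt/fs    # device/partition name may differ
cat new_key.pub | sudo tee -a /mnt/fs/home/ubuntu/.ssh/authorized_keys
sudo umount /mnt/fs

# Move the volume back and restart the original instance
aws ec2 detach-volume --volume-id vol-rootvol
aws ec2 attach-volume --volume-id vol-rootvol \
    --instance-id i-affected --device /dev/sda1
aws ec2 start-instances --instance-ids i-affected
```

After the final start, SSH in with the new private key as usual.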
Another approach, which could work but which I gave up on after a couple of attempts (I think it really depends on the init scripts of the machine you are using), is to stop the machine, change its User Data to a shell script that sets a new public key in the right place, and then start it again.
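If your AMI’s init scripts do execute User Data as a script on boot, that variant would look roughly like the following – note that the key string, username and path are placeholders that depend on the image, and on many AMIs cloud-init only runs User Data on the very first boot, which is likely why this didn’t work reliably for me:

```shell
#!/bin/bash
# Append a replacement public key to the default user's authorized_keys.
# Username and home directory depend on the AMI (e.g. 'ubuntu' on
# official Ubuntu images); the key below is a placeholder.
echo "ssh-rsa AAAA... replacement-key" >> /home/ubuntu/.ssh/authorized_keys
chown ubuntu:ubuntu /home/ubuntu/.ssh/authorized_keys
chmod 600 /home/ubuntu/.ssh/authorized_keys
```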
Wow, I haven’t posted in a while… I’m still at ZendCon in Santa Clara, and have just finished my last talk, which was about the different Zend Framework components that can be used to work with the Amazon cloud services S3 and EC2.
The presentation went pretty well, although I had to hurry at the end and skip some of the last slides.