At Shoppimon we’ve been relying a lot on Amazon infrastructure – it may not be the most cost-effective option for larger, more stable companies, but for small start-ups that need to be very dynamic, can’t afford high up-front costs and don’t have a large IT department, it’s a great choice. We do try to keep our code free of vendor-specific APIs, but when it comes to infrastructure and operations management, AWS (with help from tools like Chef) has been great for us.
One of the AWS tools we use is CloudWatch – it allows us to monitor our infrastructure and get alerted when things go wrong. While it’s not the most flexible monitoring tool out there, it takes care of most of what we need right now, and it has the advantage of not requiring us to run an additional server and configure tools such as Nagios or Cacti. With its custom metrics feature, we can even send some app-level data and monitor it using the same set of Amazon tools.
However, there’s one big missing feature in CloudWatch: it doesn’t monitor your instance’s memory utilization. I suppose Amazon has all sorts of technical reasons not to provide this very important metric out of the box (probably related to the fact that their monitoring is done from outside the instance VM), but really, if you need to monitor servers, memory utilization is one of the most important metrics to be aware of, alongside CPU load and IO.
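Because CloudWatch accepts custom metrics, one way to fill this gap is to measure memory usage from inside the instance and push the value yourself. The sketch below makes several assumptions. The namespace "Custom/System" and the metric name "MemoryUtilization" are hypothetical choices. "Used" memory is defined here as MemTotal minus MemFree, Buffers and Cached from /proc/meminfo. The AWS CLI must also be installed, with credentials that allow cloudwatch:PutMetricData.

```shell
#!/usr/bin/env bash
# Sketch: report memory utilization to CloudWatch as a custom metric.
# Namespace and metric name below are hypothetical choices.

# Print used memory as an integer percentage, read from a file in
# /proc/meminfo format. "Used" excludes free memory, buffers and page cache.
mem_used_percent() {
  local meminfo="${1:-/proc/meminfo}"
  awk '
    /^MemTotal:/ { total = $2 }
    /^MemFree:/  { free = $2 }
    /^Buffers:/  { buffers = $2 }
    /^Cached:/   { cached = $2 }
    END {
      if (total == 0) exit 1
      printf "%d\n", (total - free - buffers - cached) * 100 / total
    }' "$meminfo"
}

# Push the current value for one instance, tagged with its instance ID.
# The optional second argument overrides the meminfo file (handy for testing).
push_memory_metric() {
  local instance_id="$1" meminfo="${2:-/proc/meminfo}" value
  value="$(mem_used_percent "$meminfo")" || return 1
  aws cloudwatch put-metric-data \
    --namespace "Custom/System" \
    --metric-name MemoryUtilization \
    --dimensions "InstanceId=${instance_id}" \
    --value "$value" \
    --unit Percent
}
```

Running push_memory_metric with the instance's ID from cron every few minutes gives you a memory graph and an alarm target next to the built-in CPU and IO metrics. Newer kernels also expose MemAvailable in /proc/meminfo, which may be a better basis for "used" memory if it is present.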
Working quite a lot with the Ubuntu Server EC2 images, I often need to create one or more additional admin users with the same permissions as the first user (“ubuntu” in these images). I did a little searching but didn’t find an easy way to add a new user with the same groups as the “ubuntu” user, so I crafted a little command. I’m pasting it here mostly for my own future reference, and also in the hope that it helps someone:
sudo useradd -m -G "$(groups ubuntu | cut -d" " -f4- | sed 's/ /,/g')" -s /bin/bash newuser
Of course ‘ubuntu’ is the user you want to copy, and ‘newuser’ is the name of the new user.
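If you do this often, the same pipeline can be wrapped in a couple of shell functions. This is a sketch, and the function names groups_output_to_csv and clone_user_groups are hypothetical. It assumes that `groups <user>` prints "user : primary-group other-groups ...". That format is why the command keeps fields 4 onwards: field 1 is the user name, field 2 is the colon and field 3 is the primary group.

```shell
#!/usr/bin/env bash
# Sketch: the one-liner above as reusable functions (names are hypothetical).

# Turn "ubuntu : ubuntu adm dialout ..." into "adm,dialout,...":
# drop the user name, the colon and the primary group (fields 1-3),
# then join the remaining supplementary groups with commas.
groups_output_to_csv() {
  cut -d' ' -f4- | sed 's/ /,/g'
}

# Create NEW_USER with a home directory, bash as its shell, and the same
# supplementary groups as TEMPLATE_USER.
clone_user_groups() {
  local template_user="$1" new_user="$2" group_list
  group_list="$(groups "$template_user" | groups_output_to_csv)"
  if [ -z "$group_list" ]; then
    echo "no supplementary groups found for $template_user" >&2
    return 1
  fi
  sudo useradd -m -s /bin/bash -G "$group_list" "$new_user"
}
```

For example, clone_user_groups ubuntu newuser behaves like the one-liner. The difference is that it stops with an error when no supplementary groups are found, instead of handing useradd an empty -G list.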
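Note that the new user will be in the admin group but will still require a password when using sudo (that’s because in the EC2 images ‘ubuntu’ is the only user with NOPASSWD privileges). I personally believe this is a good thing, but if you want, you can always add a NOPASSWD rule for the admin group in /etc/sudoers. If you keep the password requirement, remember to give the new user a password (e.g. with sudo passwd newuser), since useradd does not set one.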
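If you do decide to drop the password requirement for the admin group, a safer route than editing /etc/sudoers by hand is to put the rule in a drop-in file and check it with visudo first. This is a sketch under some assumptions. It assumes your /etc/sudoers includes /etc/sudoers.d, which recent Ubuntu releases do by default. The helper names and the drop-in file name "admin-nopasswd" are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: passwordless sudo for a group via a validated drop-in file.
# Function names and the drop-in file name are hypothetical.

# Print a sudoers rule granting passwordless sudo to a group ("%" marks a group).
nopasswd_rule() {
  printf '%%%s ALL=(ALL) NOPASSWD:ALL\n' "$1"
}

# Write the rule to a temp file, check its syntax with visudo, and only then
# install it with the 0440 root-owned permissions sudo expects.
install_nopasswd_rule() {
  local group="$1" tmp status=0
  tmp="$(mktemp)" || return 1
  nopasswd_rule "$group" > "$tmp"
  if sudo visudo -c -f "$tmp"; then
    sudo install -m 0440 -o root -g root "$tmp" "/etc/sudoers.d/${group}-nopasswd" || status=1
  else
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}
```

Running install_nopasswd_rule admin once has the same effect as adding "%admin ALL=(ALL) NOPASSWD:ALL" to /etc/sudoers. The difference is that a syntax error is caught before it can lock everyone out of sudo.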
Due to an unfortunate shmelting accident (read: poor backup practices), I lost the SSH private key that was my only way to access one of my EC2-hosted servers. Unable to access the server, and unable to easily set a new public key through Amazon’s interfaces, I panicked for a few seconds. Then I started trying to hack my way in, and eventually found a way to set a new public key for my user. Here is what I did.
First, know that I was lucky: for this method to properly work, you need a few things:
- The machine must be EBS based
- You need to be able to afford a couple of minutes of downtime
- You need to be able to withstand the effects of restarting the machine – for example, if you do not have an Elastic IP address associated with the machine, its public address will change. In some situations this is not acceptable.
After trying some different approaches, what worked for me was to do the following:
- Generate a new keypair for yourself, and import the public key to your EC2 account
- Start a new, clean, cheap machine (it will only be used for very simple things, so I recommend a tiny machine) in the same availability zone as the affected machine, since an EBS volume can only be attached to instances in its own availability zone
- Stop the affected machine (do not terminate, STOP it – this is only possible with EBS machines)
- Detach the root device from the affected machine (by default attached as /dev/sda1)
- Attach the detached device to the new clean machine
- SSH into the clean machine and mount the affected machine’s root filesystem somewhere (e.g. in /mnt/fs)
- Now you can edit /mnt/fs/root/.ssh/authorized_keys (or, on official Ubuntu machines, /mnt/fs/home/ubuntu/.ssh/authorized_keys) and add your new public key to it
- Unmount the volume and terminate the clean machine – you no longer need it
- Re-attach the root device to the affected machine (which should still be stopped) – make sure to attach it as the same device it was attached as before (e.g. /dev/sda1)
- Re-start your old machine – you should now be able to use your new key!
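The AWS side of the steps above could also be scripted with the AWS CLI. The following is a sketch under stated assumptions, not the method used in this post. The function names, the key file path and the helper-side device name /dev/sdf are hypothetical. The root device name /dev/sda1 follows the post. The AWS CLI must be configured with permission to import keys, stop and start instances, and move volumes. Mounting the volume and editing authorized_keys still happen by hand on the helper machine, and inside the helper a volume attached as /dev/sdf may show up as /dev/xvdf.

```shell
#!/usr/bin/env bash
# Sketch of the AWS-side rescue steps. Function names, key paths and the
# helper device name are hypothetical; /dev/sda1 follows the post.

# Create a new keypair locally and import its public half into EC2.
import_new_key() {
  local key_name="$1" key_file="$2"
  ssh-keygen -t rsa -b 4096 -N '' -f "$key_file" || return 1
  aws ec2 import-key-pair --key-name "$key_name" \
    --public-key-material "fileb://${key_file}.pub"
}

# Look up the EBS volume attached to an instance as /dev/sda1.
root_volume_id() {
  aws ec2 describe-instances --instance-ids "$1" \
    --query "Reservations[0].Instances[0].BlockDeviceMappings[?DeviceName=='/dev/sda1'].Ebs.VolumeId | [0]" \
    --output text
}

# Stop the locked-out instance and move its root volume to the helper.
# Prints the volume ID so it can be passed to restore_root afterwards.
move_root_to_helper() {
  local target="$1" helper="$2" vol
  aws ec2 stop-instances --instance-ids "$target" >/dev/null || return 1
  aws ec2 wait instance-stopped --instance-ids "$target" || return 1
  vol="$(root_volume_id "$target")" || return 1
  [ -n "$vol" ] && [ "$vol" != "None" ] || return 1
  aws ec2 detach-volume --volume-id "$vol" >/dev/null || return 1
  aws ec2 wait volume-available --volume-ids "$vol" || return 1
  aws ec2 attach-volume --volume-id "$vol" --instance-id "$helper" \
    --device /dev/sdf >/dev/null || return 1
  echo "$vol"
}

# Detach the volume from the helper, re-attach it to the original instance
# as /dev/sda1, then start that instance again.
restore_root() {
  local target="$1" vol="$2"
  aws ec2 detach-volume --volume-id "$vol" >/dev/null || return 1
  aws ec2 wait volume-available --volume-ids "$vol" || return 1
  aws ec2 attach-volume --volume-id "$vol" --instance-id "$target" \
    --device /dev/sda1 >/dev/null || return 1
  aws ec2 wait volume-in-use --volume-ids "$vol" || return 1
  aws ec2 start-instances --instance-ids "$target" >/dev/null
}
```

A run would look roughly like this. First run vol=$(move_root_to_helper i-OLD i-HELPER), where both instance IDs are placeholders. Then mount the volume on the helper, add the key and unmount it. Finally, run restore_root i-OLD "$vol". Unlike the manual steps, this sketch detaches the volume from the helper explicitly, so terminate the helper only after restore_root has finished.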
Another approach, which might work but which I gave up on after a couple of attempts (I think it really depends on the init scripts on the machine you are using), is to stop the machine, change its User Data to a shell script that writes a new public key to the right place, and then start it again.
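For reference, a User Data script for that approach might look like the sketch below. It assumes the "ubuntu" user, and NEW_KEY is a placeholder for your new public key. The add_authorized_key helper is hypothetical. A likely reason this approach is unreliable is that cloud-init, by default, runs user-data shell scripts only on an instance's first boot. A script added to an existing instance may therefore never run at all.

```shell
#!/bin/bash
# Sketch of a User Data script adding a new key for the "ubuntu" user.
# NEW_KEY is a placeholder: replace it with your own public key.
NEW_KEY='ssh-rsa AAAA...your-new-public-key... rescue'

# Append a key to HOME_DIR/.ssh/authorized_keys unless it is already there,
# creating the directory and file with the permissions sshd expects.
add_authorized_key() {
  local home_dir="$1" key="$2" auth
  auth="$home_dir/.ssh/authorized_keys"
  mkdir -p "$home_dir/.ssh" || return 1
  chmod 700 "$home_dir/.ssh"
  touch "$auth" || return 1
  grep -qxF -e "$key" "$auth" || printf '%s\n' "$key" >> "$auth"
  chmod 600 "$auth"
}

# Only act when running as root on a machine that has /home/ubuntu.
if [ "$(id -u)" -eq 0 ] && [ -d /home/ubuntu ]; then
  add_authorized_key /home/ubuntu "$NEW_KEY"
  chown -R ubuntu:ubuntu /home/ubuntu/.ssh
fi
```

The duplicate check makes the script safe to run on every boot, which matters if you configure cloud-init to run user scripts "always" rather than once per instance.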
And really, you should back up your keys!