The Hyperpessimist

The grandest failure.

Debian 7 Update Experiences

A couple of days ago Debian 7.0, “wheezy”, came out, so I decided to update my servers. Updating servers is not something I usually look forward to: they are remote, so if I break their network connection I’ll have a hard time fixing it. That said, it is necessary because support does not last forever, and the sooner I update the better.

So I have the following setup: one host system running KVM managed by libvirt and running two VMs, basically one for PHP/MySQL and one “safe” VM for the rest, all of them running Debian 6.

I started with the PHP VM, since it was the least critical one and a good place to gain some initial experience. If you follow the release notes you’ll end up… perfectly fine. The nginx server got updated to a much newer version, and hopefully the update scripts will no longer forget to start it. PHP was updated from 5.3 to 5.4, which doesn’t seem to have caused any regressions as far as I know, and MySQL was updated from 5.1 to 5.5, restarted and continued to work just fine. That part went quite well.

The second system took quite a bit longer, since I had to update PostgreSQL from 8.4 to 9.1. This is a royal pain in the ass because Postgres can’t migrate old databases on its own; it needs a script called pg_upgradecluster, which dumps the data from a running 8.4 server and imports it into a running 9.1 server. Now this is the part where it failed.

Between 8.4 and 9.1 the word new ceased to be a keyword, so the 9.1 version of pg_dump does not quote it. But for an 8.4 server it still is a keyword, so when pg_dump runs against that server, the server returns an error because of invalid SQL. I am not quite happy with how the Postgres developers handle the behaviour of pg_dump, because always quoting identifiers would avoid this problem once and for all, but that’s how it is. Fortunately, the 9.1 pg_dump got an option, --quote-all-identifiers, so I copied pg_upgradecluster and added this option myself. This worked… fine, well, except that upgrading a database of a modest 1 GB took about 5 hours and was thrashing the server really hard. I suppose it managed to hit both a worst-case algorithmic complexity in Postgres and maybe also an ext3 failure, since kjournald was running like crazy. Anyway, in the end I am happy it worked after all. I also filed a bug, and it seems to be fixed in PostgreSQL now; we’ll see whether the fix makes its way back to Debian Wheezy.
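To make the quoting problem a bit more concrete, here is a minimal Python sketch of the idea. It is not how pg_dump is actually implemented: the function quote_ident and the two tiny keyword sets are made up for illustration, and the real keyword lists are far longer. It only shows why a dump tool that quotes according to its own version's keyword list can produce a name the older server refuses, and why quoting everything sidesteps that.

```python
import re

# Illustrative subsets only, NOT the real PostgreSQL keyword lists.
# The one difference that matters here: "new" is reserved in the older set.
RESERVED_8_4 = {"select", "table", "new"}
RESERVED_9_1 = {"select", "table"}

_PLAIN_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def quote_ident(name, reserved, quote_all=False):
    """Return `name` as a dump tool might write it (hypothetical helper).

    The name is double-quoted if quote_all is set, if it is in `reserved`,
    or if it is not a plain lower-case identifier.
    """
    needs_quotes = (
        quote_all
        or name in reserved
        or _PLAIN_IDENT.fullmatch(name) is None
    )
    if not needs_quotes:
        return name
    # Embedded double quotes are escaped by doubling them.
    return '"' + name.replace('"', '""') + '"'


# A 9.1-style tool consults its own list and leaves the name bare ...
print(quote_ident("new", RESERVED_9_1))                  # new
# ... although the 8.4 rules would have required quotes.
print(quote_ident("new", RESERVED_8_4))                  # "new"
# Quoting everything is safe under either set of rules.
print(quote_ident("new", RESERVED_9_1, quote_all=True))  # "new"
```

The last call corresponds to what --quote-all-identifiers buys you: the output no longer depends on which version's keyword list the tool happens to know.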

That was my second machine. The only other failures involved virtualenv: mkvirtualenv copies the python binary, so my virtualenvs contained the old Python 2.6 from squeeze, which was built against OpenSSL 0.9.8. Since I had deleted this library (Wheezy comes with 1.0.1), all the copied Python binaries failed to start. Recreating the virtualenvs was not a big deal, and this way they could also be made to use Python 2.7 directly. I don’t blame Debian for this, and maybe there is a reason why virtualenvwrapper does this, but it is something to keep in mind.
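If you have many virtualenvs, a small script can tell you which ones need recreating. The following is only a sketch under assumptions: it expects every environment to be a directory containing bin/python below one common root, and the helper names interpreter_starts and find_broken_virtualenvs are my own invention, not part of virtualenv or virtualenvwrapper. It simply tries to start each copied interpreter and reports the ones that fail.

```python
import os
import subprocess


def interpreter_starts(python_path, timeout=10):
    """Return True if the interpreter at python_path runs a trivial command."""
    try:
        result = subprocess.run(
            [python_path, "-c", "pass"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing file, dangling symlink, not executable, or a hang.
        return False
    # A binary that cannot load a shared library exits with a non-zero status.
    return result.returncode == 0


def find_broken_virtualenvs(envs_root):
    """Return the sorted names of environments whose bin/python fails to start."""
    broken = []
    for name in sorted(os.listdir(envs_root)):
        python_path = os.path.join(envs_root, name, "bin", "python")
        if not os.path.lexists(python_path):
            continue  # Not a virtualenv (assumed layout), skip it.
        if not interpreter_starts(python_path):
            broken.append(name)
    return broken
```

Calling find_broken_virtualenvs(os.path.expanduser("~/.virtualenvs")) would then list the environments to recreate, assuming that directory is where yours live; adjust the path if you have set WORKON_HOME to something else.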

The last machine was the host system. I was worried about this one, since I didn’t feel like fixing something that might have broken inside libvirt. libvirt is a huge beast and I am frankly happy to leave it alone as long as it runs. So first I tested saving and restoring VMs, which worked fine. Then I ran the update, which went smoothly until it came to the point where it was unable to write GRUB2 to /dev/md0, the RAID1 that holds my /boot partition. I was getting really worried about this, because debugging a failing bootloader over the internet is something that takes forever, and I could not be sure that the previous GRUB-legacy would still be able to boot the system. So I backed up the partition, thought about replacing it with a non-RAID version, tried resizing the partition to fit a small GRUB2 partition, and finally decided to recreate it and restore it as a RAID1 again. In the process I realized that there seem to be multiple RAID metadata formats nowadays: 0.9 and 1.1 as well as 1.2. mdadm notified me that only 0.9 is supported for /boot partitions, but it seems current versions of GRUB2 also have modules for the newer formats. Anyway, I chose to use the 0.9 format, ran grub-install /dev/sda and grub-install /dev/sdb, hoped for the best and rebooted.

Lo and behold, the system came up again without problems. The only issue is that despite my disabling auto-start for the VMs, libvirt seems to have started them anyway, so they lost their uptime and I couldn’t just restore them from file. So they went through a forceful poweroff, but besides this, everything is fine. Overall, I’d say this update went rather smoothly, and I’ve bought myself some years until Debian 8.0, “Jessie”, comes out.