Outage Log

5th May 2014

The running guests were shut down (cleanly) at approximately 8AM to allow the replacement of a failed hard drive.

At this point the system had an uptime of over 1000 days, so the timing was unfortunate.

7th October 2011

Joey got in touch to say his guest was "acting weird". I saw from the serial console that the host machine was logging the following message several times a second:

ICMPv6 ND: ndisc_build_skb() failed to allocate an skb, err=-11

Stopping all guests and restarting them "cured" it. Will keep an eye out.

12th January 2010

System accidentally rebooted by Steve, who was not paying attention to which window he typed "halt" into. I've installed molly-guard now.

19th December 2009

The same OOM problem occurred again. Even with the 2.6.27.41 kernel and the newer kvm we hit OOM. I've now moved forward to the 2.6.32 kernel release and disabled the use of VNC.

10th December 2009

The same OOM condition occurred again, before the reboot I had planned for tomorrow.

All guests were shut down cleanly and the host system was rebooted into a more recent kernel and kvm pair. The packages I installed are publicly available.

9th December 2009

The host machine started reporting OOM errors and killing guests.

This was "solved" by rebooting the host machine. I've also upgraded the version of kvm installed on the host, in case the OOM was caused by memory leaks in the virtio layer.

As not all guests have been migrated yet, I've suspended the migration until we can see whether the problem recurs. All the memory has already passed memtest, and it isn't yet obvious what the cause was. Perhaps a leak somewhere.