The move of server NC018 to the new data centre has been completed. Due to two failures, the downtime for some websites was longer than the planned 12 hours. These two failures were as follows:
- For some reason the data centre did not actually configure the server to use the new IP address, even though this was expressly a part of — and indeed a requirement of — the move. This resulted in most websites being down when the server came back online because most websites on the server use the server’s primary IP address. (Websites that have their own or share a secondary IP address had no problems, initially.) We have made a submission to the data centre to have this issue reviewed. However, given that such physical moves are so rare, it’s unlikely we’ll be in a position to test whether or not lessons have been learnt. For ourselves, we’ve learnt that a large part of the problem could have been avoided if we actually hosted most domains on a secondary IP address, rather than the primary. We’ll consider following through on this, but given other plans that will come to pass long before the chance of another physical move comes about, we may not do this at this time.
- Secondly, a script that the provider of the control panel assured us would quickly assign domains to the new IP address as soon as the server came back online had no effect. The lesson here is that nothing can take the place of exhaustive testing.
I mentioned above that websites on their own IP address experienced no problems “initially”. Once trouble tickets were opened with the data centre, we and their technicians were at one point working at cross purposes: they essentially redid the work we had done to bring websites on the primary IP address online, and in the process took down the websites (including the NinerNet website) that were on their own IP addresses. When this was discovered it was quickly fixed.
We had some reports from clients that email was arriving out of order. This is to be expected when a server has been offline for a while. This is what happens: Let’s say an email is sent 5 minutes after the server goes offline. It can’t be delivered, so the sending mail server holds onto it and tries again in 5 minutes. It still can’t be delivered, so it tries again in 10 minutes, then half an hour, then every hour, and so on. So if the server comes back online part way through the hour wait, but a different email is sent a minute after the server comes back online, that newer email will be delivered immediately, as usual, but the older email won’t be delivered until the hour wait has expired.
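To make the out-of-order effect concrete, here is a minimal sketch in Python that simulates the retry schedule described above. The wait times (5, 10 and 30 minutes, then hourly) are simply the illustrative figures from this post; real mail servers use their own schedules, and the function names are made up for this example.

```python
from itertools import accumulate, chain, islice, repeat

# Waits between delivery attempts, in minutes, following the example in
# this post: 5, 10, 30, then every 60. Illustrative values only; actual
# mail servers choose their own retry schedules.
RETRY_WAITS = [5, 10, 30]
HOURLY = 60


def attempt_times(sent_at, limit=50):
    """Return the minutes (counted from the start of the outage) at which
    the sending server attempts delivery: the first attempt at sent_at,
    followed by `limit` retries."""
    waits = islice(chain(RETRY_WAITS, repeat(HOURLY)), limit)
    return list(accumulate(waits, initial=sent_at))


def delivery_time(sent_at, server_back_at):
    """Return the first attempt made at or after the server's return."""
    for t in attempt_times(sent_at):
        if t >= server_back_at:
            return t
    raise ValueError("server did not return within the simulated attempts")


if __name__ == "__main__":
    server_back_at = 130  # assumed: server returns 130 minutes into the outage
    older = delivery_time(5, server_back_at)    # sent 5 minutes into the outage
    newer = delivery_time(131, server_back_at)  # sent 1 minute after the return
    print(f"older email delivered at minute {older}")
    print(f"newer email delivered at minute {newer}")
```

With these assumed numbers, the older email is attempted at minutes 5, 10, 20, 50, 110 and 170, so it is delivered at minute 170, while the newer email goes straight through at minute 131.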
Clients who host some or all of their services on a server other than NC018, and who use the third nameserver we provided, stayed online for the duration of the server move.
There was a minor issue with some outbound email that was on the server before the move. We’re still investigating that. However, there were no issues with inbound email that we’re aware of.
Unrelated to the move itself was the fact that posts to our Identi.ca and Twitter accounts did not appear. Of course, these services are independent of NinerNet — which is part of the point, actually — so this was beyond our control. Our status website remained online at status.ninernet.net. It will revert to status.niner.net, but will still be available at the former address, now and in the future.
Again, we appreciate your patience and understanding during this necessary move. If you have any questions or concerns, please do not hesitate to let us know.
Craig