In the last post, I talked about the overhauls made to the infrastructure and code to bring everything up to date. This last post in the series covers the efforts made to minimise downtime, the mistakes we made and where RTT is heading next. I’ve also included the migration timeline at the bottom.
As mentioned previously, Realtime Trains has run for some time in a master and hot standby configuration. We wrote a substantial migration plan covering March to August, which accounted for a number of change freezes, including the Swanage Railway Diesel Gala, and pre-planned leave, including charity work in Zambia.
During the migrations, this was maintained continuously, with at least two functioning copies of the service in operation. We initially installed our servers in London in April and carried out substantial testing from May through July. We had initially allowed six weeks for this testing, and its length meant the migration plan had to be extended. The plan showed us completely moved out of the York DC by early August, but the extended testing period stymied that, and I’m grateful to the DC for allowing us to push back the cancellation by a month.
London’s copy of RTT was eventually turned on in late July and allowed to soak in for several weeks. We made Gatwick RTT’s primary in mid-June[1]. London was made primary for the website on 19 August, the commercial API on 11 September and the public API five days later. By contrast, RailMiles was used as a network soak test and migrated to Gatwick on 8 April and London on 5 June. York was fully withdrawn on 23 August. Barring the migration of the public API, all changes to RTT took place during the day and not during overnight maintenance windows.
There was no recorded downtime of any web services directly, but there was an outage affecting our DNS servers on 8 June. As part of the migrations, a new set of three DNS servers was planned for London, Manchester and Ingenio IT’s facility in Bournemouth. The previous three were based in York, Gatwick and DigitalOcean. The process involved copying the zones between DNS servers and effectively running both sets in duplicate. I’ll hold my hands up: I screwed up here. The plan was to change the NS glue records over and leave both sets running for a few days, but I accidentally stopped two of the three DNS server services. This led to outages for some ISPs for most of the day, until we proxied traffic through the old IPs to the new set. A new operating policy to prevent this happening again is now in place.
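To make the "run both sets in duplicate" idea concrete, here is a minimal sketch of the kind of pre-flight check that could sit in front of a glue change. It is not the operating policy mentioned above, which isn't described in this post. The function name `cutover_blockers`, the `Probe` type and the server names in the usage note are all hypothetical, and the probe that actually queries a nameserver is deliberately left to the caller.

```python
from typing import Callable, Dict, List, Optional

# A probe asks one nameserver for a zone's SOA serial; None means no answer.
# How the probe talks to the server (dig, a DNS library, ...) is left open,
# so this sketch makes no assumptions about any particular DNS tooling.
Probe = Callable[[str, str], Optional[int]]


def cutover_blockers(
    zone: str, old_set: List[str], new_set: List[str], probe: Probe
) -> List[str]:
    """List reasons why it is not yet safe to switch the glue records."""
    blockers: List[str] = []
    serials: Dict[str, int] = {}
    # Both sets are checked, because both are meant to stay up during the overlap.
    for server in old_set + new_set:
        serial = probe(server, zone)
        if serial is None:
            blockers.append(f"{server} is not answering for {zone}")
        else:
            serials[server] = serial
    # Every server that answered should be serving the same copy of the zone.
    if len(set(serials.values())) > 1:
        blockers.append(f"SOA serials disagree for {zone}: {serials}")
    return blockers
```

With a stand-in probe such as `lambda server, zone: answers[server]`, where `answers` maps made-up server names to serials, an empty list back means every server in both sets is answering with the same zone serial. Anything else is a reason to stop and look before touching the glue or shutting a service down.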
The Future of Realtime Trains
At the risk of sounding like a broken record, not a lot has been outwardly happening in the last six years. However, lurking in the shadows there is a reasonable amount brewing. There is hopefully an announcement about Realtime Preserved Trains coming soon and there is a new website on the way which will be available before the year is out.
In the UK data sphere, we’ve been working on a detailed data warehouse called Offsetstore that ingests all of the Train Describer feed and provides statistical analysis and a graph of the UK rail network. This graph is traversable by TIPLOC, berth, etc., and can output, from a schedule, a list of berths that a train is scheduled to pass through. We’re using this to inform development of better ‘off route’ handling and will hopefully have something to show on that next year.
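As a rough illustration of turning a schedule into a list of berths, here is a toy sketch. It is not how Offsetstore works internally, which this post doesn't go into. The berth IDs, location codes, the one-berth-per-location mapping and the function names are all made up for the example, and breadth-first search is simply one plausible way to walk such a graph.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical toy data: directed berth-to-berth steps, and one berth per location.
BERTH_GRAPH: Dict[str, List[str]] = {
    "A001": ["A003"],
    "A003": ["A005", "A007"],
    "A005": ["A009"],
    "A007": ["A009"],
    "A009": ["A011"],
    "A011": [],
}
LOCATION_BERTH: Dict[str, str] = {"ALPHA": "A001", "BRAVO": "A005", "CHARLIE": "A011"}


def berth_path(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest berth sequence from start to goal."""
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        berth: Optional[str] = queue.popleft()
        if berth == goal:
            # Walk back through the predecessors to rebuild the path.
            path: List[str] = []
            while berth is not None:
                path.append(berth)
                berth = previous[berth]
            return path[::-1]
        for nxt in graph.get(berth, []):
            if nxt not in previous:
                previous[nxt] = berth
                queue.append(nxt)
    return None


def berths_for_schedule(schedule: List[str]) -> List[str]:
    """Join the berth paths between consecutive schedule locations."""
    berths = [LOCATION_BERTH[schedule[0]]]
    for origin, destination in zip(schedule, schedule[1:]):
        leg = berth_path(BERTH_GRAPH, LOCATION_BERTH[origin], LOCATION_BERTH[destination])
        if leg is None:
            raise ValueError(f"no berth path from {origin} to {destination}")
        berths.extend(leg[1:])  # the leg's first berth is already in the list
    return berths
```

Calling `berths_for_schedule(["ALPHA", "BRAVO", "CHARLIE"])` on this toy network gives `["A001", "A003", "A005", "A009", "A011"]`. A train reported in `A007` would be off that list, which is the sort of signal ‘off route’ handling could build on.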
We’re looking at bringing RTT to other parts of the world as well, but more on that in the coming months. If you have any suggestions for places that should be high up on the list, then email us.
Overall Migration Timeline
- January: investigations commenced into finding a new DC
- February - April: visits to London & South East DC candidates
- April: initial moves into London
- April - May: visits to northerly DC candidates
- April - July: network testing in London
- June: move into Manchester, migrate RailMiles into London as production soak test
- July - August: continuing setup and migration of software into London
- August: set up RTT in London, decommissioning of York

[1] This was across a series of days.