Up until the new processor was rolled out last weekend, Realtime Trains primarily connected directly to the open data feeds provided by Network Rail. You can find some more basic technical information about them on the Open Data Wiki. Over the years, I’ve seen various issues with them, including networking glitches resulting in firewall blocks at their end, as well as general loss of messaging. Most of the faults aren’t within our control, but networking glitches are one of those things it’s possible to go a long way towards mitigating.
As part of the infrastructure works over the last year, we built a better mechanism to ingest these feeds, and some others, to avoid multiple connections being necessary and to improve high availability operations. Following the migration of the processor, we now only have one connection per datacentre to each feed. The layout of the systems that complete this task is a little complex (and some have said over-engineered), so I thought it would be an interesting one to go over.
I’ve included a process diagram of the lifecycle of a message above. A square represents an MQ server, a hexagon an instance performing message processing (Camel), and the text in the middle is the queue (blue) or topic (red) name. Our local message queue servers, running ActiveMQ Artemis, operate as a clustered pair in each DC, and each DC is independent of the other. All of the interaction with message queues, both upstream and local, is done using Apache Camel.
There is one virtual machine dedicated to subscribing to all feeds in each DC. The Camel instance on this system also handles data transformation to mitigate known issues with any feed. Essentially, the upstream message queues are subscribed to here, and the output of the Camel instance is a transformed JSON message. A fingerprint for deduplication is also added at this stage. These messages are sent into an incoming queue for each type of message. There is deliberately no time-to-live set on these messages.
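The fingerprinting step can be sketched roughly as below. The post doesn’t specify the hash or field names, so these are assumptions: a content hash over the canonicalised message body means both DCs derive the same fingerprint from the same upstream message, while the ingesting server is recorded separately for use as a secondary key later.

```python
import hashlib
import json


def add_fingerprint(message: dict, ingest_server: str) -> dict:
    """Attach a deduplication fingerprint to a transformed message.

    The fingerprint is a SHA-256 hash over the canonicalised JSON body,
    so the same upstream message produces the same fingerprint
    regardless of which DC ingested it. (Field names and hash choice
    are illustrative, not taken from the real system.)
    """
    # Canonicalise: sorted keys so key ordering can't change the hash.
    body = json.dumps(message, sort_keys=True, separators=(",", ":"))
    return {
        "body": message,
        "fingerprint": hashlib.sha256(body.encode("utf-8")).hexdigest(),
        "ingest_server": ingest_server,
    }
```

The key property is that the two ingesters never need to coordinate: identical input yields identical fingerprints in both DCs.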
Artemis, at this stage, is configured to move each message into an outgoing queue and a merged queue. The outgoing queue is used by the forwarder in each DC (poor terminology, as it runs in the other DC from the one it pulls the queue from), which takes those messages and puts them into its local merged queue. Once the forwarder has operated, the merged queue should have two copies of each message, with an ordering guarantee on a per-DC basis.
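One way to wire this up is with Artemis diverts in broker.xml; the sketch below is my assumption of the mechanism, and the address names are placeholders. A non-exclusive divert copies the message to the merged queue, then an exclusive divert moves the original onto the outgoing queue for the other DC’s forwarder to collect (diverts are applied in the order they are defined):

```xml
<diverts>
   <!-- Copy each incoming message to the local merged queue. -->
   <divert name="incoming-to-merged">
      <address>feeds.incoming.td</address>
      <forwarding-address>feeds.merged.td</forwarding-address>
      <exclusive>false</exclusive>
   </divert>
   <!-- Then move the original to the outgoing queue, where the
        forwarder in the other DC pulls it from. -->
   <divert name="incoming-to-outgoing">
      <address>feeds.incoming.td</address>
      <forwarding-address>feeds.outgoing.td</forwarding-address>
      <exclusive>true</exclusive>
   </divert>
</diverts>
```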
Once we have a merged queue with both copies, a third Camel instance, called the merger, runs and does essentially what it says on the tin. Using the fingerprint created in the initial stage together with the identity of the ingesting server, the messages are deduplicated and sent on to the final topics. Artemis does support deduplication at the server level, rather than in a client, but in some Train Describer areas it is possible to receive valid duplicate messages with the same timestamp. Using the ingesting server as a secondary key lets the deduplicator pass through duplicate messages of this nature. There is a low risk of a server dropping out and losing messages, but I did not feel there was a need to work around this unlikely issue.
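The post doesn’t give the merger’s exact algorithm, but one implementation consistent with the description is to track, per fingerprint, how many copies each ingesting server has delivered versus how many have been emitted. A copy is forwarded only when its server has delivered more than we have already emitted, so each upstream message goes out once, while a genuine upstream duplicate (same fingerprint, re-sent by the feed and so seen again by the same server) goes out again:

```python
from collections import defaultdict


class Deduplicator:
    """Sketch of a merger dedup pass (reconstructed, not the real code).

    Every message arrives once per DC ingester, so a plain fingerprint
    check would also swallow valid TD duplicates. Counting per-server
    deliveries against emissions keeps those intact.
    """

    def __init__(self):
        # fingerprint -> server -> copies seen from that server
        self._seen = defaultdict(lambda: defaultdict(int))
        # fingerprint -> copies emitted to the final topic
        self._emitted = defaultdict(int)

    def accept(self, fingerprint: str, server: str) -> bool:
        """Return True if this copy should be forwarded to the topic."""
        self._seen[fingerprint][server] += 1
        if self._seen[fingerprint][server] > self._emitted[fingerprint]:
            # This server has delivered more copies than we've emitted:
            # a new upstream message, or a valid upstream duplicate.
            self._emitted[fingerprint] += 1
            return True
        # Another server already accounted for this copy: a DC duplicate.
        return False
```

A real implementation would also need to expire old fingerprint state so the cache doesn’t grow without bound. Note this still works when one ingester is down: the surviving server’s first copy is always emitted.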
There is, in all this, an element of single point of failure. A single ingestion server, with what appears to be a single merger and ingester in each DC, does not bode well for reliability or high availability. Each virtual machine in London is intended to run on a different hypervisor to spread out the risk somewhat.
To that end, I’m also using Corosync across the forwarder (or puller) and merger to ensure that they are constantly running. Each VM is weighted to prefer its own Camel instance, but if either of the VMs goes down it will fail over within about 15 seconds. For reasons that I have yet to entirely understand, the use of multicast addressing between the nodes in the Corosync cluster led to several occasions of split brain. This only occurred in the London DC, despite both DCs running identical hardware. I’ve now got them communicating on a unicast basis, with configuration of the IPs done through Puppet rather than hardcoded, and it has been working perfectly for the last few months.
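For illustration, the unicast setup looks something like the corosync.conf fragment below; the node names and addresses are placeholders (in practice those values are what Puppet templates out rather than hardcoding). Switching the totem transport from multicast to `udpu` with an explicit nodelist is what addresses the split-brain behaviour:

```
totem {
    version: 2
    cluster_name: feeds
    transport: udpu
}

nodelist {
    node {
        ring0_addr: 10.0.0.11
        nodeid: 1
    }
    node {
        ring0_addr: 10.0.0.12
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    # Special-case quorum for a two-node cluster.
    two_node: 1
}
```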
All of this new data processing adds around a quarter of a second to the processing time if the message stays captive within a DC, or slightly longer if the local ingester is unavailable and it is using the remote DC message queue. That means the train data now takes about a second to be processed and become visible on Realtime Trains - it used to be about half a second.
As a side note, during testing of the infrastructure I did some benchmarking of the new RTT processor… and it’s capable of handling approximately 35,000 TD messages a minute. In the rush hour, it peaks at no more than about 8,000 per minute, so there’s plenty of capacity for future rail upgrades. It turned out that disconnecting it from Artemis for 15 minutes and then reconnecting was a very useful experiment.