A few days ago I had a bit of a rant on Twitter after a couple of occasions over the last week where train operating companies commented that alterations and cancellations were wrong on Realtime Trains. Incorrect information is a constant source of aggravation for me, but I have long taken a particular stance on the matter. I don’t particularly like this situation but I think this is the right position for now.
Some context before we begin: the railway opening up its data in 2012 gave me an opportunity to do something different from my then peers at university. Realtime Trains is effectively my full-time job and I rely on it to live. In the nearly eight years since, RTT has operated on the periphery of the railway with barely any engagement from operators; only in the last few months have I started to have relatively active discussions with a few. Revenue has recently dropped through the floor despite traffic continuing to increase, so more consultancy and project opportunities are always helpful to bring in extra money.
Against this backdrop of decreasing revenue, I remain fairly steady in the stance that I’ve taken over the years, and I’m using this post to explain it publicly for the first time. I’ve commented on it in open data user groups (online and in person) and on railway forums over the years, but it seemed wise to properly explain to both you, the users of Realtime Trains, and train operators, whose employees also use it in their droves, why we are in the current position.
There are two primary sources of information in the industry and they target different user groups. Network Rail Open Data provides information from operational systems such as TRUST and the Train Describer. The National Rail Data Portal provides information from Darwin and disruption messaging from National Rail Enquiries.
At present, Realtime Trains uses data only from the Network Rail Open Data platform. The operational systems which underpin this platform have various issues but, in summary, during disruption the data can be wrong, particularly when trains are altered rather than completely cancelled. The “solution” to this problem is using information from Darwin.
Darwin is used as a centralised source of data for multiple systems, including customer information on stations, on-board passenger information and many mobile phone apps. It takes data from a multitude of sources, including TRUST and the Train Describer, but also accepts alterations, such as adjusted calling patterns, diversions and partial cancellations, either from systems such as Tyrell or through direct input. This isn’t a complete list but gives you an idea of what it does.
Using alteration data from Darwin is a fantastic idea on paper, as it means we would all agree on what is supposed to be happening. I would love to use this data and integrate it into Realtime Trains… I wrote the code to do it about three years ago. There is, however, a critical problem: using any data from Darwin means that you must also use its forecasts and its time-bound data. I was informed last year, by someone at NRE/RDG, that they allow some people to ignore that requirement; so far I have been unable to get the same permission.
I don’t really trust Darwin, and that is one of the reasons RTT exists. For many years I commuted for my education and found numerous instances where it was wrong back then. More recently, I’ve started travelling weekly to London for some work I’m personally doing to diversify my interests away from rail: I still find instances where the forecasts are wrong, and some where screens and apps disagree with each other!
I do not understand why, as a condition of consuming and providing alteration information, I am required to show data that I know to be wrong in some circumstances. A few years ago, it was suggested to me that I should operate RTT using just Darwin data and provide evidence of when it is wrong; I’d end up doing this automatically, generating an email probably every hour or so.
Official information in disruption isn’t necessarily the truth either. A few weeks ago, I was travelling back in the evening peak from London Waterloo in the aftermath of the Eastleigh derailment…
- At 1700, there were no trains running home until the 1935 departure.
- At 1800 when I reached the station, that hadn’t changed.
- I boarded the 1820 to Salisbury, intending either to get a bus from there or to circulate via Southampton.
- Just after the 1820 departed, the 1848 departure was reinstated. Quickly followed by the 1835… and the 1905.
I got off the 1820 at Woking[1] to wait for the 1848 and pick it up there. I hadn’t realised that the 1835 had been reinstated until I noticed on one of the many mapping apps that it was running. That train normally runs non-stop to Winchester, but the official data showed it running non-stop to Southampton Central. It called at Woking(!), but no-one on the platform was any the wiser because it was showing as ‘not stopping’; the guard told me that they had been told to stop there, so someone had made a conscious decision about it. The train was also very quiet, as it had been advertised at short notice at Waterloo and not at all at Woking. People at Woking waiting for a train towards the Dorset coast didn’t board it: lacking any information, they carried on waiting for the 1848.
Passenger information should be better, and I think Darwin is a reasonable source of alteration data to achieve that. I realise that I’ve given an isolated example from a bad day above, but it is evidence that official information can’t be taken as gospel either. I firmly believe that, until it can be guaranteed to be as close to the ground truth[2] as is reasonably possible, onerous licensing demands shouldn’t be placed on the data. A lot of this is data input error, but that is not an excuse. The market will win out: if your output bears little resemblance to reality when people need it most, people will stop using whatever you provide.
The laughable thing is that there is a continued public perception, which the industry itself somewhat encourages, that a train operator’s mobile app gives more accurate information than a random third-party app. They should be using the same data source and, indeed, frequently are. If someone receives incorrect information in one app driven by Darwin, they should receive the same from another… but people hop and skip between apps as if that weren’t the case; you only need to read Twitter messages to operators during severe disruption to see how often this happens.
Some apps based on Network Rail Open Data, such as Realtime Trains, will differ for the above reasons, but this is a problem created by the industry and within its capability to resolve. Some people even come to RTT because it has different data and, in most of those instances, use the detailed mode to better understand the true picture from multiple sources. As passengers, we shouldn’t have to do this; as a developer, I can see ways of reducing the need for it, but it’s not something I can do in the current environment.
On a side note, there are further difficulties in RTT using Darwin forecasts. RTT has a richer data environment, with far more intermediate passing points than are made available within Darwin. Using external forecasts would mean either showing blanks at locations Darwin doesn’t cover, or infilling them with my own logic, which could create non-linear timings. To my eye, that only serves to stifle innovation, or features that don’t fit within the current industry purview.
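To make the infilling problem concrete, here is a minimal sketch of the kind of logic that would be needed. It is not how RTT or Darwin actually work: the `Location` record, the `infill` function and the minutes-after-midnight times are all hypothetical, and it assumes that locations with an external forecast act as anchors, with the delay at passing points in between interpolated linearly against scheduled running time. Locations without an anchor on both sides are left blank (`None`), which is the other option described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Location:
    # Hypothetical record for illustration: times are minutes after midnight.
    name: str
    scheduled: int
    forecast: Optional[int] = None  # external forecast, present only at some locations


def infill(locations: List[Location]) -> List[Optional[int]]:
    """Return a time for every location, interpolating where no forecast exists.

    Assumption (illustrative only): between two forecast anchors, the delay
    changes linearly with scheduled running time.
    """
    anchors = [i for i, loc in enumerate(locations) if loc.forecast is not None]
    result: List[Optional[int]] = []
    for i, loc in enumerate(locations):
        if loc.forecast is not None:
            # Use the external forecast as-is.
            result.append(loc.forecast)
            continue
        before = max((a for a in anchors if a < i), default=None)
        after = min((a for a in anchors if a > i), default=None)
        if before is None or after is None:
            # No anchor on one side: nothing to interpolate from, so leave a blank.
            result.append(None)
            continue
        b, a = locations[before], locations[after]
        delay_before = b.forecast - b.scheduled
        delay_after = a.forecast - a.scheduled
        span = a.scheduled - b.scheduled
        frac = (loc.scheduled - b.scheduled) / span if span else 0.0
        # Scheduled time plus a delay blended between the two anchors.
        result.append(round(loc.scheduled + delay_before + frac * (delay_after - delay_before)))
    return result


route = [
    Location("Station A", 600, forecast=610),  # 10 minutes late
    Location("Junction J", 605),
    Location("Passing point P", 612),
    Location("Station B", 620, forecast=626),  # 6 minutes late
    Location("Passing point Q", 625),
]
print(infill(route))  # [610, 614, 620, 626, None]
```

The infilled times at J and P are only as good as the linearity assumption: a train held at the junction loses its time in one place rather than spread evenly, so the interpolated values can look plausible while disagreeing with what actually happened, and Q stays blank because there is no forecast beyond it.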
The industry shouldn’t want to be seen to be inhibiting the availability of information but, to me and others, that is precisely what it is doing. If the people making half a million requests for train information per day[3] believe that RTT is more accurate in most cases, and I receive frequent messages saying they do, then why does the industry want to stifle that? Removing the restriction, or arranging permissions appropriately, would ultimately be a win for everyone but, right now, we are where the industry currently appears to want to be[4].
1. I probably shouldn’t have done that, as it was a pick-up-only stop, but it was one of those days where no-one seemed to care.
2. Some operators are actually really good at getting the alteration information right. Most are not.
3. That’s just on the website. Through all channels it’s now over a million per day. There are over 250,000 unique users per month on the site, and over a million in the ecosystem per month.
4. I’ll also ignore the branding empire-building that Rail Delivery Group try to push with their daft ‘Powered by National Rail Enquiries’ image requirement, which appears to exist only to defend their existence. They don’t need to: they are currently an essential component of the ecosystem, and I don’t see that changing given the unique and privileged position they maintain inside the industry. Equally, that is a reason why their licences should be Fair, Reasonable and Non-Discriminatory: in my view, licensing restrictions do not meet those criteria.