
We’re on a mission to create the largest public transport database in the world. In this post we’ll explain how we’re doing it and the challenges we face along the way. If you want to explore our transport models in depth, check out our data studio.
GTFS stands for General Transit Feed Specification. It’s a format that makes it easy for public transport data providers to share their timetables with others, who can then use the data to build useful apps or APIs.
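A GTFS feed is published as a ZIP archive of comma-separated text files such as `routes.txt`, `trips.txt`, `stop_times.txt`, `stops.txt`, `agency.txt` and `calendar.txt`. Here's a minimal Python sketch of loading one of those files with the standard library, assuming the feed has already been downloaded and the file sits at the root of the archive. The function name `read_gtfs_table` is hypothetical, not part of any GTFS tooling.

```python
import csv
import io
import zipfile


def read_gtfs_table(feed, table_name):
    """Return the rows of one GTFS file (e.g. "routes.txt") as a list of dicts.

    Hypothetical helper. `feed` is a path or a binary file object for the ZIP;
    the file is assumed to sit at the root of the archive.
    """
    with zipfile.ZipFile(feed) as archive:
        with archive.open(table_name) as raw:
            # GTFS files are UTF-8 CSV; "utf-8-sig" also drops a leading byte-order mark if present.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))
```

For example, `read_gtfs_table("feed.zip", "routes.txt")` would return one dictionary per route, keyed by the column names in the file's header row.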
The route is the name of the full end-to-end public transport journey, from its start to its terminus. One example is the Walthamstow Central > Brixton Tube route in London.
A trip is a single scheduled run of a route, defined by its start time. In the example image below, a train route starts in Bedford, UK: one trip begins in Bedford at 8.02am, another at 8.10am, another at 8.18am, and so on.
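In the GTFS files, `trips.txt` links each trip to a route, and a trip's start time is normally the departure time at its first stop in `stop_times.txt` (the row with the lowest `stop_sequence`). Here's a minimal sketch, assuming the rows have already been read into dictionaries; `trip_start_times` and `gtfs_time_to_seconds` are hypothetical names. GTFS times are written as HH:MM:SS and can go past 24:00:00 for trips that run after midnight, so they are converted to seconds rather than parsed as clock times.

```python
def gtfs_time_to_seconds(value):
    """Convert a GTFS "HH:MM:SS" time (hours may exceed 23) to seconds after midnight."""
    hours, minutes, seconds = (int(part) for part in value.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


def trip_start_times(stop_times):
    """Map each trip_id to the departure time, in seconds, at its first stop.

    `stop_times` is a list of dicts with the stop_times.txt columns
    trip_id, stop_sequence and departure_time.
    """
    first_stop = {}  # trip_id -> (lowest stop_sequence seen, its departure_time)
    for row in stop_times:
        trip_id = row["trip_id"]
        sequence = int(row["stop_sequence"])
        if trip_id not in first_stop or sequence < first_stop[trip_id][0]:
            first_stop[trip_id] = (sequence, row["departure_time"])
    return {trip_id: gtfs_time_to_seconds(time) for trip_id, (_, time) in first_stop.items()}
```

Sorting the result by value, e.g. `sorted(starts.items(), key=lambda item: item[1])`, lists the trips in departure order, as in the 8.02am, 8.10am, 8.18am sequence above.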
A route is made up of multiple stops, and the stop times specify exactly when the vehicle is scheduled to call at each one. In the Bedford example above, the trip that begins at 8.18am then stops at Flitwick at 8.29am, Harlington at 8.33am and Leagrave at 8.52am. Other stop information includes the stop’s official name and its map coordinates.
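In GTFS, the times themselves live in `stop_times.txt`, while each stop's name and coordinates (`stop_name`, `stop_lat`, `stop_lon`) live in `stops.txt`, and the two are joined on `stop_id`. A minimal sketch of building one trip's timetable from rows already loaded as dictionaries; the function name `trip_timetable` is hypothetical.

```python
def trip_timetable(trip_id, stop_times, stops):
    """Return [(stop_name, departure_time), ...] for one trip, in stop order.

    `stop_times` rows need trip_id, stop_id, stop_sequence and departure_time;
    `stops` rows need stop_id and stop_name.
    """
    names = {stop["stop_id"]: stop["stop_name"] for stop in stops}
    calls = [row for row in stop_times if row["trip_id"] == trip_id]
    # stop_sequence only has to increase along the trip (gaps are allowed),
    # so sort numerically rather than as text.
    calls.sort(key=lambda row: int(row["stop_sequence"]))
    # Fall back to the raw stop_id if a stop is missing from stops.txt.
    return [(names.get(row["stop_id"], row["stop_id"]), row["departure_time"]) for row in calls]
```

Note that GTFS only requires times at the first and last stops of a trip, so intermediate entries may come back with an empty departure time.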
The agency is the public transport provider that runs the service. In Edinburgh, for example, the agency responsible for local buses is Lothian Buses, while in Liverpool buses are run by operators such as Arriva.
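In GTFS, operators are listed in `agency.txt`, and each route in `routes.txt` points to its operator through `agency_id`. The spec only requires `agency_id` on routes when a feed defines more than one agency. Here's a minimal sketch of resolving each route's operator name under that rule; `route_agency_names` is a hypothetical name.

```python
def route_agency_names(routes, agencies):
    """Map each route_id to the agency_name of the operator that runs it."""
    by_id = {agency.get("agency_id", ""): agency["agency_name"] for agency in agencies}
    result = {}
    for route in routes:
        agency_id = route.get("agency_id", "")
        if agency_id in by_id:
            result[route["route_id"]] = by_id[agency_id]
        elif len(agencies) == 1:
            # With a single agency in the feed, routes may omit agency_id.
            result[route["route_id"]] = agencies[0]["agency_name"]
        else:
            raise ValueError(f"route {route['route_id']} has unknown agency_id {agency_id!r}")
    return result
```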
The calendar sets the period during which a timetable is valid. Public transport agencies may change their timetables for the seasons and for other reasons, so each timetable only applies to a defined date range.
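In a GTFS feed this is described in `calendar.txt`, which gives each `service_id` a `start_date` and `end_date` (both YYYYMMDD) plus a 0/1 flag for each day of the week, and in `calendar_dates.txt`, whose `exception_type` adds (1) or removes (2) service on individual dates such as bank holidays. A minimal sketch of checking whether a service runs on a given day; the function names are hypothetical.

```python
from datetime import date, datetime

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def parse_gtfs_date(value):
    """Parse a GTFS YYYYMMDD date string into a datetime.date."""
    return datetime.strptime(value, "%Y%m%d").date()


def service_runs_on(service_id, day: date, calendar, calendar_dates=()):
    """Return True if `service_id` operates on `day`."""
    # Exceptions in calendar_dates.txt override the regular weekly pattern.
    for row in calendar_dates:
        if row["service_id"] == service_id and parse_gtfs_date(row["date"]) == day:
            return row["exception_type"] == "1"
    for row in calendar:
        if row["service_id"] == service_id:
            in_range = parse_gtfs_date(row["start_date"]) <= day <= parse_gtfs_date(row["end_date"])
            # date.weekday() is 0 for Monday, matching the order of WEEKDAYS.
            return in_range and row[WEEKDAYS[day.weekday()]] == "1"
    return False
```

For example, `service_runs_on("WK", date(2024, 6, 3), calendar)` checks a weekday service on a Monday.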
Our data team found it easy to collect data in Norway, Germany, Denmark, Estonia, Iceland, Lithuania and Sweden. That’s because:
Some countries publish the feeds for all their public transport agencies at a single GTFS link, so one data source covers the timetables for the whole country. We found that smaller countries are much better equipped to do this. Larger countries such as the US, India and France take a much more decentralised approach to GTFS data, which means our data team sometimes has to collect hundreds of GTFS links from individual agencies to get coverage in a single country.
GTFS data can go wrong in many ways; we’ve listed the most common below. When the data is messy, our data team has to fix a lot of it by hand to make sure it works correctly and doesn’t produce erroneous results.
Any incorrect data point in a GTFS file creates work for our data team. First, the team uses a tool to identify issues in the files automatically. Here are some of the common issues:
Each stop has a latitude and a longitude. If these coordinates are the wrong way round, the stop appears very far from where it should be, often in the ocean!
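Here's one way a check like this could work, as a hypothetical sketch rather than a description of our team's tool. A latitude outside ±90° is impossible, and a stop that lands far from the rest of the feed but close to it once its coordinates are swapped was probably entered the wrong way round. The reference point (for example the median position of all the feed's stops) and the 500 km threshold are illustrative assumptions.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def looks_swapped(stop_lat, stop_lon, ref_lat, ref_lon, max_km=500.0):
    """Heuristic: True if a stop's latitude and longitude are probably the wrong way round.

    ref_lat/ref_lon is an assumed reference point for the feed's area;
    max_km is an illustrative threshold, not a value from any specification.
    """
    if abs(stop_lat) > 90:
        # Impossible latitude: flag it if the swapped pair would be valid.
        return abs(stop_lon) <= 90
    as_given = haversine_km(stop_lat, stop_lon, ref_lat, ref_lon)
    swapped = haversine_km(stop_lon, stop_lat, ref_lat, ref_lon)
    return as_given > max_km and swapped <= max_km
```

With a reference point in London, for instance, a Bedford stop entered as `(-0.467, 52.136)` sits in the Indian Ocean as given but about 74 km away once swapped, so it is flagged.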
The dates in different sources don’t always line up, so we have to make sure the data we hold overlaps correctly for the whole country, not just for one city.
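One way to make this concrete, under the assumption that each feed's validity period is taken as the earliest `start_date` and latest `end_date` in its `calendar.txt`: the window in which every feed for a country is valid is the intersection of those periods. A minimal sketch; `common_validity_window` is a hypothetical name.

```python
from datetime import date
from typing import List, Optional, Tuple


def common_validity_window(periods: List[Tuple[date, date]]) -> Optional[Tuple[date, date]]:
    """Given one (start, end) pair per feed, return the date range covered by
    all of them, or None if the feeds share no common dates."""
    if not periods:
        return None
    start = max(period_start for period_start, _ in periods)  # latest start
    end = min(period_end for _, period_end in periods)  # earliest end
    return (start, end) if start <= end else None
```

A `None` result means at least one feed has to be refreshed before the country's timetables can be combined for a single date range.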
Irregularities can also show up in stop times and trip times. For example, if the first stop on a route is at 7am, it’s very unlikely that the next stop is at 3pm.
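A rough version of this check could walk each trip's stops in `stop_sequence` order and flag times that go backwards or jump by more than some threshold between consecutive timed stops. The sketch below assumes a two-hour threshold purely for illustration, and the function names are hypothetical. Times are converted to seconds because GTFS hours can exceed 23 for trips running past midnight.

```python
def gtfs_time_to_seconds(value):
    """Convert a GTFS "HH:MM:SS" time (hours may exceed 23) to seconds."""
    hours, minutes, seconds = (int(part) for part in value.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


def suspicious_gaps(trip_stop_times, max_gap_s=2 * 3600):
    """Return (from_stop_id, to_stop_id, gap_seconds) for implausible consecutive times.

    `trip_stop_times` holds one trip's stop_times.txt rows. Stops without a
    departure_time (allowed for intermediate stops) are skipped.
    """
    timed = [row for row in trip_stop_times if row.get("departure_time")]
    timed.sort(key=lambda row: int(row["stop_sequence"]))
    problems = []
    for prev, curr in zip(timed, timed[1:]):
        gap = gtfs_time_to_seconds(curr["departure_time"]) - gtfs_time_to_seconds(prev["departure_time"])
        if gap < 0 or gap > max_gap_s:  # time runs backwards, or an implausibly long hop
            problems.append((prev["stop_id"], curr["stop_id"], gap))
    return problems
```

A 7am stop followed by a 3pm stop would be reported as an eight-hour gap, while a trip crossing midnight (for example 23:50:00 to 24:05:00) passes.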
Some public transport agencies don’t publish GTFS links at all, and others don’t keep their data up to date.
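One simple sign that a feed hasn't been updated is that its service calendar has already ended. Here's a minimal sketch of that check, assuming the feed's `calendar.txt` rows are loaded as dictionaries; feeds that rely only on `calendar_dates.txt` would need the same test applied to those dates instead. The function name is hypothetical.

```python
from datetime import date, datetime


def feed_has_expired(calendar, today: date):
    """True if every service in calendar.txt ended before `today`."""
    if not calendar:
        return False  # No calendar.txt rows to judge by.
    last_day = max(datetime.strptime(row["end_date"], "%Y%m%d").date() for row in calendar)
    return last_day < today
```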
In some countries buses only leave when they’re full, so there’s no fixed departure time. In others, the only way to see a timetable is to go to a bus depot in person, which makes the data much harder to collect and keep up to date.
We update our GTFS data twice a week so that timetable changes are picked up quickly.
Not all of our data comes from GTFS, because GTFS feeds aren’t always available. In those cases our data team generates the data from other public sources.
Becoming the world’s largest public transport database doesn’t just mean covering cities; we need data for entire countries. Our database includes information for:
We are expanding our coverage in Central and South America, including Costa Rica, Peru and Uruguay, as well as in smaller territories such as Gibraltar, the Faroe Islands, Guernsey and the Isle of Man.
To explore our data coverage, driving model and public transport model in more depth, visit our data studio.