We’re on a mission to create the largest public transport database in the world. In this blog we’ll explain how we’re doing it and the challenges we face along the way. If you want to explore our transport models in depth, check out our data studio.
What is GTFS?
GTFS stands for General Transit Feed Specification. It’s a format that makes it easy for public transport data providers to share their timetables with others. Other companies can then take this data and build useful apps or APIs.
How GTFS is used for mobile apps
- Mobile mapping apps helping consumers plan a public transport route (Google Maps, CityMapper etc.)
- Real-time public transport apps (Trainline)
How GTFS is used by businesses
- Calculating public transport travel times from one location to many locations simultaneously. Great for businesses that build location-based search engine results (TravelTime API)
- Visualising where’s reachable within a travel time catchment area. Great for businesses performing location data analysis, especially through our plugins for QGIS and ArcGIS. (QGIS catchment area plugin)
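To make the one-to-many travel time idea above more concrete, here is a minimal sketch of an earliest-arrival scan over timetable connections. It is an illustration only, not a description of how the TravelTime API works: the function names are hypothetical, transfers are treated as instantaneous, and walking is ignored. The sample times reuse the Bedford trip described later in this post.

```python
from math import inf


def earliest_arrivals(connections, origin, depart_at):
    """Return the earliest arrival time (minutes after midnight) per reachable stop.

    connections: iterable of (from_stop, to_stop, departure, arrival) tuples.
    Assumption: transfers take no time and there is no walking between stops.
    """
    best = {origin: depart_at}
    # Scanning in departure order means each connection is looked at once.
    for from_stop, to_stop, dep, arr in sorted(connections, key=lambda c: c[2]):
        can_board = best.get(from_stop, inf) <= dep
        if can_board and arr < best.get(to_stop, inf):
            best[to_stop] = arr
    return best


def reachable_within(connections, origin, depart_at, budget_minutes):
    """Return {stop: travel time in minutes} for stops within the time budget."""
    arrivals = earliest_arrivals(connections, origin, depart_at)
    return {
        stop: arrival - depart_at
        for stop, arrival in arrivals.items()
        if arrival - depart_at <= budget_minutes
    }


# The 8.18am trip from Bedford, with times as minutes after midnight.
CONNECTIONS = [
    ("Bedford", "Flitwick", 8 * 60 + 18, 8 * 60 + 29),
    ("Flitwick", "Harlington", 8 * 60 + 29, 8 * 60 + 33),
    ("Harlington", "Leagrave", 8 * 60 + 33, 8 * 60 + 52),
]

# Leaving Bedford at 8.10am with a 30 minute budget.
catchment = reachable_within(CONNECTIONS, "Bedford", 8 * 60 + 10, 30)
print(catchment)
```

With a 30 minute budget Leagrave drops out, because the traveller waits 8 minutes for the 8.18am departure and arrives 42 minutes after setting off.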
Which data points are included in GTFS data?
A route is the full end-to-end public transport service from start to terminus, identified by its name. Here’s an example of the Walthamstow Central > Brixton Tube route in London.
A trip is a single timed run of a route, defined by its start time. The example image below shows a train route starting in Bedford, UK: one trip begins in Bedford at 8.10am, another at 8.02am, another at 8.18am, and so on.
Stops & stop time
A route is made up of multiple stops, and the stop time specifies the exact time the vehicle is scheduled to stop at each one. In the Bedford example above, the trip that begins at 8.18am has stop times of 8.29am at Flitwick, 8.33am at Harlington and 8.52am at Leagrave. Other stop information includes the official name of the stop and its map coordinates.
The agency is the name of the public transport provider. In Edinburgh the agency responsible for running local buses is Lothian Buses, whereas in Liverpool the bus operator is Arriva.
The validity period is the date range over which the public transport timetable is accurate. Agencies may update their timetables because of seasonal changes and other factors.
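In practice a GTFS feed is a zip archive of comma-separated text files, and the data points above map onto files such as routes.txt, trips.txt, stops.txt, stop_times.txt, agency.txt and calendar.txt. The sketch below reads such an archive with the Python standard library. The helper names and every value in the sample feed (including the example.com URL) are invented for illustration, and only a few of the columns the specification defines are shown.

```python
import csv
import io
import zipfile


def read_gtfs_tables(zip_bytes):
    """Read every .txt table in a GTFS zip into a list of row dictionaries."""
    tables = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as feed:
        for name in feed.namelist():
            if not name.endswith(".txt"):
                continue
            with feed.open(name) as raw:
                # utf-8-sig tolerates the byte order mark some feeds include.
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                tables[name[:-4]] = list(csv.DictReader(text))
    return tables


def build_sample_feed():
    """Build a tiny in-memory feed; all values are invented for illustration."""
    files = {
        "agency.txt": (
            "agency_id,agency_name,agency_url,agency_timezone\n"
            "LB,Lothian Buses,https://example.com,Europe/London\n"
        ),
        "routes.txt": (
            "route_id,agency_id,route_short_name,route_long_name,route_type\n"
            "R1,LB,1,Example route,3\n"
        ),
        "trips.txt": "route_id,service_id,trip_id\nR1,WEEKDAY,T1\n",
        "stops.txt": (
            "stop_id,stop_name,stop_lat,stop_lon\n"
            "S1,Example stop,55.95,-3.19\n"
        ),
        "stop_times.txt": (
            "trip_id,arrival_time,departure_time,stop_id,stop_sequence\n"
            "T1,08:18:00,08:18:00,S1,1\n"
        ),
        "calendar.txt": (
            "service_id,monday,tuesday,wednesday,thursday,friday,saturday,"
            "sunday,start_date,end_date\n"
            "WEEKDAY,1,1,1,1,1,0,0,20240101,20241231\n"
        ),
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as feed:
        for name, content in files.items():
            feed.writestr(name, content)
    return buffer.getvalue()


tables = read_gtfs_tables(build_sample_feed())
for name, rows in tables.items():
    print(name, rows[0])
```

Every value comes back as a string, so coordinates, dates and times still need parsing before they can be checked or used.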
Which countries have great GTFS data?
Our data team found it easy to collect data in Norway, Germany, Denmark, Estonia, Iceland, Lithuania and Sweden. That’s because:
They require a low number of GTFS links for full country coverage
Some countries consolidate all their public transport agencies into one GTFS link location. This means you only need one GTFS data source to understand the timetables across the country. We found that smaller countries are much better equipped to do this. Larger countries such as the US, India and France have a much more decentralised approach to GTFS data. This means our data team sometimes needs to collect hundreds of public transport agency GTFS links to get coverage in a country.
Clean GTFS data
GTFS data can go wrong in many ways, and we’ve listed the common ones below. If the data is messy, it requires a lot of manual fixing from our data team to ensure that the data works effectively and doesn’t show erroneous results.
GTFS data collection challenges
If any of the data points are incorrect within the GTFS file, it can cause a challenge for our data team. First the team must use a tool to automatically identify issues with the files. Here are some of the common issues:
Stop coordinate mix up
Each stop has a latitude and longitude. If these coordinates are the wrong way round, the stop appears to be very far away, often in the ocean!
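One way to catch this kind of mix-up automatically is to compare each stop against a bounding box for the region the feed is supposed to cover. The sketch below flags stops that fall outside the box as given but inside it once latitude and longitude are swapped. The function name, the sample stops and the rough Great Britain bounding box are all assumptions made for illustration.

```python
def find_swapped_stops(stops, bbox):
    """Return stop_ids whose coordinates only fit the bounding box when swapped.

    stops: rows with stop_id, stop_lat and stop_lon, as in stops.txt.
    bbox: (min_lat, min_lon, max_lat, max_lon) for the expected coverage area.
    """
    min_lat, min_lon, max_lat, max_lon = bbox

    def inside(lat, lon):
        return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

    swapped = []
    for stop in stops:
        lat, lon = float(stop["stop_lat"]), float(stop["stop_lon"])
        if not inside(lat, lon) and inside(lon, lat):
            swapped.append(stop["stop_id"])
    return swapped


# Very rough bounding box for Great Britain (an approximation for this example).
GB_BBOX = (49.8, -8.7, 60.9, 1.8)

STOPS = [
    {"stop_id": "A", "stop_lat": "55.95", "stop_lon": "-3.19"},  # plausible
    {"stop_id": "B", "stop_lat": "-3.19", "stop_lon": "55.95"},  # swapped
    {"stop_id": "C", "stop_lat": "0.0", "stop_lon": "0.0"},      # wrong, not swapped
]

swapped_ids = find_swapped_stops(STOPS, GB_BBOX)
print(swapped_ids)
```

Stop C is left out on purpose: it is outside the box either way, so it needs a different fix from a simple swap.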
The validity dates do not always align across different sources, so we have to make sure we have the right overlapping data for the whole country, not just the city.
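A simple way to reason about overlapping dates is to take each feed’s validity window from the start_date and end_date columns of calendar.txt (both written as YYYYMMDD) and intersect the windows. The sketch below does that for two hypothetical feeds. The feed names and dates are invented, and it assumes every feed has calendar.txt rows, which is a simplification because some feeds define their service days only in calendar_dates.txt.

```python
from datetime import date, datetime


def parse_gtfs_date(value):
    """Parse a GTFS date string such as 20240401."""
    return datetime.strptime(value, "%Y%m%d").date()


def feed_window(calendar_rows):
    """Return (earliest start_date, latest end_date) across a feed's services."""
    starts = [parse_gtfs_date(row["start_date"]) for row in calendar_rows]
    ends = [parse_gtfs_date(row["end_date"]) for row in calendar_rows]
    return min(starts), max(ends)


def common_window(feeds):
    """Return the date range covered by every feed, or None if there is none."""
    windows = [feed_window(rows) for rows in feeds.values()]
    start = max(window[0] for window in windows)
    end = min(window[1] for window in windows)
    return (start, end) if start <= end else None


# Hypothetical calendar.txt rows for a city feed and a regional feed.
FEEDS = {
    "city": [{"start_date": "20240101", "end_date": "20240630"}],
    "region": [{"start_date": "20240401", "end_date": "20241231"}],
}

overlap = common_window(FEEDS)
print(overlap)
```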
Irregularities can be spotted with stop times and trip times. For example, if the first stop on a route is at 7am, it’s very unlikely that the next stop time is 3pm.
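A check for this kind of irregularity could look like the sketch below, which walks each trip in stop_sequence order and flags consecutive stops whose gap is negative or longer than a threshold. The function names and the two hour default are assumptions for illustration, and the sketch assumes every row has both times filled in. GTFS times are written as HH:MM:SS and may go past 24:00:00 for trips that run after midnight, so they are converted to seconds rather than parsed as clock times.

```python
from collections import defaultdict


def to_seconds(gtfs_time):
    """Convert a GTFS HH:MM:SS string (hours may exceed 23) to seconds."""
    hours, minutes, seconds = (int(part) for part in gtfs_time.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def suspicious_gaps(stop_times, max_gap_minutes=120):
    """Return (trip_id, from_stop, to_stop, gap_minutes) for unlikely gaps."""
    by_trip = defaultdict(list)
    for row in stop_times:
        by_trip[row["trip_id"]].append(row)

    issues = []
    for trip_id, rows in by_trip.items():
        rows.sort(key=lambda row: int(row["stop_sequence"]))
        for prev, curr in zip(rows, rows[1:]):
            gap = to_seconds(curr["arrival_time"]) - to_seconds(prev["departure_time"])
            if gap < 0 or gap > max_gap_minutes * 60:
                issues.append((trip_id, prev["stop_id"], curr["stop_id"], gap // 60))
    return issues


# Hypothetical stop_times.txt rows: T1 jumps from 7am to 3pm, T2 looks normal.
STOP_TIMES = [
    {"trip_id": "T1", "stop_id": "A", "stop_sequence": "1",
     "arrival_time": "07:00:00", "departure_time": "07:00:00"},
    {"trip_id": "T1", "stop_id": "B", "stop_sequence": "2",
     "arrival_time": "15:00:00", "departure_time": "15:00:00"},
    {"trip_id": "T2", "stop_id": "Bedford", "stop_sequence": "1",
     "arrival_time": "08:18:00", "departure_time": "08:18:00"},
    {"trip_id": "T2", "stop_id": "Flitwick", "stop_sequence": "2",
     "arrival_time": "08:29:00", "departure_time": "08:29:00"},
]

issues = suspicious_gaps(STOP_TIMES)
print(issues)
```

The right threshold depends on the transport mode: two hours between stops would be suspicious for a city bus but normal for a long-distance train or ferry.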
GTFS not provided or outdated
Some public transport agencies don’t provide any GTFS links. Others provide links but do not keep the data up to date.
In some countries the buses only leave when they’re full. This means that there’s no specific departure time. In other countries, the only way to access timetables is to go inside a bus depot, which makes the data a lot harder to collect and maintain.
Data maintenance: how often does GTFS data need to be updated?
We update our GTFS data twice a week to ensure that any timetable changes are picked up.
Other public transport data sources
Not all our data comes from GTFS, because GTFS is not always available. In those cases our data team has to generate the data from other public sources.
Country-wide public transport coverage
Becoming the world’s largest public transport database doesn’t just mean focusing on cities. We need to have data for the full country. Our database includes information for:
- 4 million transport stops
- 400,000 routes
- 10,000+ public transport agency providers
See our public transport data in action
Create your own travel time catchment area and see how far you can reach within a time period using our map app.
Which countries are next on the agenda?
We are expanding our coverage in Central and South America, including Costa Rica, Peru and Uruguay, as well as island territories including Gibraltar, Faroe Islands, Guernsey and the Isle of Man.
- Countries that embrace open data provide the best opportunities for collecting GTFS
- Even if the country is known for reliable public transport, it doesn’t always mean the GTFS is easy to collect – a great example of this is Japan which has great services, but a decentralised GTFS system
- Localisation is key when collecting GTFS – it means understanding local transport usage and engaging with agencies in many languages
Explore our data coverage, driving model and public transport model in more depth by visiting our data studio.