GTFS
The General Transit Feed Specification (GTFS) is a data format created for sharing public transportation information such as bus stops, bus routes and timetables.
It is useful for users of OSM data who want to provide routing over public transport, because in many cases timetables change so often that representing them in OSM is de facto impossible. In some cities timetables can be expected to change daily due to road or track closures and renovations. In some areas timetables change massively several times a year, for example when holidays and school terms start or end.
In such cases data consumers can use OSM data for roads, stop positions and footways, and take the available public transport trips directly from the transport organization.
It was originally called the Google Transit Feed Specification, and was developed by Google. It is now maintained by the MobilityData organization which also maintains tools for using GTFS data, a database of GTFS data, and the General Bikeshare Feed Specification.
Structure of GTFS
A GTFS feed is a (stable) URL which publishes a ZIP file containing multiple CSV files.
It is often updated regularly, in which case a particular zip file is referred to as a version of the GTFS feed.
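As a minimal sketch of what "a ZIP file containing multiple CSV files" means in practice, the following Python reads one table from a feed held in memory. The helper names `read_gtfs_table` and `build_sample_feed` are hypothetical, and the sample feed is made up so that the example runs without downloading anything; a real consumer would fetch the bytes from the feed URL instead.

```python
import csv
import io
import zipfile


def read_gtfs_table(feed_zip: bytes, filename: str) -> list[dict[str, str]]:
    """Return the rows of one CSV file inside a GTFS ZIP as dictionaries."""
    with zipfile.ZipFile(io.BytesIO(feed_zip)) as archive:
        with archive.open(filename) as raw:
            # utf-8-sig drops a byte order mark if the publisher included one.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))


def build_sample_feed() -> bytes:
    """Build a tiny made-up feed in memory (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "stops.txt",
            "stop_id,stop_name,stop_lat,stop_lon\n"
            "S1,Central Station,52.0,4.0\n",
        )
    return buffer.getvalue()


stops = read_gtfs_table(build_sample_feed(), "stops.txt")
print(stops[0]["stop_name"])  # Central Station
```

All values are returned as strings, which is convenient here because linking to OSM relies on exact string matches.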
The file stops.txt contains information about physical locations.
- stop_id - the identifier for this location
- stop_code, stop_name - public-facing identifier and name for the location
- stop_lat, stop_lon - GPS coordinates, for stops on the pole
- platform_code - the identifier for a specific platform
While the file is called stops.txt, it also contains information about stations, entrances, boarding areas.
These are distinguished using location_type, and linked to a larger structure using parent_station.
location_type | Description | Parent | GTFS Description | PTv2 concept |
---|---|---|---|---|
0 | GTFS stop | station | Place where passengers board/disembark | public_transport=platform (preferred), public_transport=stop_position |
1 | GTFS station | - | Physical structure or area with one or more platforms | public_transport=station, public_transport=stop_area, public_transport=stop_area_group |
2 | entrance/exit | station | A location where passengers enter/exit a station | railway=subway_entrance, railway=train_station_entrance |
3 | generic node | station | Location in a station, used to define pathways | |
4 | boarding area | platform | Location on a platform | |
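To illustrate how location_type and parent_station work together, the sketch below groups the rows of stops.txt under their parent. The function name `children_by_parent` and the sample rows are hypothetical; the only GTFS rule relied on is that an empty location_type means 0.

```python
from collections import defaultdict

# Labels taken from the table above.
LOCATION_TYPES = {
    "0": "stop",
    "1": "station",
    "2": "entrance/exit",
    "3": "generic node",
    "4": "boarding area",
}


def children_by_parent(stops):
    """Map each parent_station value to the (stop_id, kind) pairs below it."""
    groups = defaultdict(list)
    for row in stops:
        parent = row.get("parent_station", "")
        if parent:
            # An empty location_type is treated as 0 (a stop or platform).
            kind = LOCATION_TYPES[row.get("location_type") or "0"]
            groups[parent].append((row["stop_id"], kind))
    return dict(groups)


# Made-up rows: one station with a platform, an entrance and a boarding area.
sample_stops = [
    {"stop_id": "ST", "location_type": "1", "parent_station": ""},
    {"stop_id": "P1", "location_type": "", "parent_station": "ST"},
    {"stop_id": "E1", "location_type": "2", "parent_station": "ST"},
    {"stop_id": "B1", "location_type": "4", "parent_station": "P1"},
]
hierarchy = children_by_parent(sample_stops)
print(hierarchy["ST"])  # [('P1', 'stop'), ('E1', 'entrance/exit')]
```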
The file routes.txt contains description of a route.
A GTFS route corresponds with a type=route_master.
- route_id - the identifier for this route
- agency_id - identifier for the agency running the route, which can be looked up in agency.txt
- route_short_name - short identifier of the route, like a bus number
- route_long_name - full name of route, often with destinations
- route_type - what sort of public transport is used; bus/train/metro/...
The file trips.txt contains descriptions of trips - a sequence of stops visited at a particular time.
The actual sequence of stops is found in stop_times.txt, which references trip_id.
A GTFS trip roughly corresponds with a type=route, except for the inclusion of timing information.
- trip_id - the identifier for this trip
- route_id - route the trip belongs to
- service_id - days of operation (calendar.txt)
- trip_headsign - displayed destination of trip
- trip_short_name - public-facing text to identify the trip
- direction_id - direction of trip, whether it is inbound or outbound / clockwise or counterclockwise / ...
- shape_id - if present, the shape of the trip between stops (shapes.txt)
The file stop_times.txt describes the time that a trip stops at a stop.
A GTFS stop time is defined by the combination of a trip_id and a stop_sequence.
It contains the stop_id of the stop it visits, along with an arrival_time and departure_time.
Additionally it can contain information on whether you can board/exit at that stop or along the route to the next stop.
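As a rough sketch, the ordered stop list of a trip can be rebuilt by filtering stop_times.txt on trip_id and sorting on stop_sequence. The function names and the sample rows below are hypothetical. Note that GTFS times may exceed 24:00:00 for trips that run past midnight, so they are converted to seconds rather than parsed as clock times.

```python
from typing import Optional


def parse_gtfs_time(value: str) -> Optional[int]:
    """Convert HH:MM:SS to seconds since the start of the service day."""
    if not value:
        return None  # times may be left empty in the feed
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def stops_of_trip(stop_times, trip_id):
    """Return (stop_id, arrival, departure) tuples in visiting order."""
    rows = [row for row in stop_times if row["trip_id"] == trip_id]
    # stop_sequence is stored as text, so sort on its integer value.
    rows.sort(key=lambda row: int(row["stop_sequence"]))
    return [
        (
            row["stop_id"],
            parse_gtfs_time(row["arrival_time"]),
            parse_gtfs_time(row["departure_time"]),
        )
        for row in rows
    ]


# Made-up rows, deliberately out of order and crossing midnight.
sample_stop_times = [
    {"trip_id": "T1", "stop_sequence": "2", "stop_id": "B",
     "arrival_time": "24:05:00", "departure_time": "24:06:00"},
    {"trip_id": "T1", "stop_sequence": "1", "stop_id": "A",
     "arrival_time": "23:55:00", "departure_time": "23:55:30"},
    {"trip_id": "T2", "stop_sequence": "1", "stop_id": "C",
     "arrival_time": "", "departure_time": ""},
]
print(stops_of_trip(sample_stop_times, "T1"))
# [('A', 86100, 86130), ('B', 86700, 86760)]
```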
The file shapes.txt contains the paths that vehicles travel, like a GPX route.
A GTFS shape roughly corresponds with a type=route, except for the exclusion of the sequence of stops.
Shapes are often referenced by multiple trips that follow the same path.
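The relation between trips and shapes can be sketched as follows: one helper orders the points of a shape, the other lists which trips share a shape. The function names and sample rows are hypothetical; the column names shape_pt_lat, shape_pt_lon and shape_pt_sequence are those of shapes.txt.

```python
from collections import defaultdict


def shape_points(shapes, shape_id):
    """Return the (lat, lon) points of one shape in travel order."""
    rows = [row for row in shapes if row["shape_id"] == shape_id]
    rows.sort(key=lambda row: int(row["shape_pt_sequence"]))
    return [(float(row["shape_pt_lat"]), float(row["shape_pt_lon"])) for row in rows]


def trips_by_shape(trips):
    """Group trip_ids by the shape_id they reference."""
    groups = defaultdict(list)
    for row in trips:
        shape_id = row.get("shape_id", "")
        if shape_id:  # shape_id is optional, so skip trips without one
            groups[shape_id].append(row["trip_id"])
    return dict(groups)


# Made-up rows for illustration.
sample_shapes = [
    {"shape_id": "SH1", "shape_pt_lat": "52.1", "shape_pt_lon": "4.1", "shape_pt_sequence": "2"},
    {"shape_id": "SH1", "shape_pt_lat": "52.0", "shape_pt_lon": "4.0", "shape_pt_sequence": "1"},
]
sample_trips = [
    {"trip_id": "T1", "shape_id": "SH1"},
    {"trip_id": "T2", "shape_id": "SH1"},
    {"trip_id": "T3", "shape_id": ""},
]
print(shape_points(sample_shapes, "SH1"))  # [(52.0, 4.0), (52.1, 4.1)]
print(trips_by_shape(sample_trips))        # {'SH1': ['T1', 'T2']}
```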
Tags
General rules
There are currently two namespaces (gtfs_* and gtfs:*) in use for GTFS-related tags.
In Proposal:GTFS Tagging Standard it has been decided to use the gtfs:* namespace.
Therefore, using tags in the gtfs_* namespace is discouraged.
Any existing tags are interpreted as the same tag in the gtfs:* namespace.
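A data consumer could apply this interpretation with a small normalisation step, sketched below. The function names are hypothetical, and letting the gtfs:* value win when both spellings are present is an assumption of this sketch, not something the proposal states.

```python
def normalise_gtfs_key(key: str) -> str:
    """Rewrite a key in the gtfs_* namespace to the gtfs:* namespace."""
    if key.startswith("gtfs_"):
        return "gtfs:" + key[len("gtfs_"):]
    return key


def normalise_tags(tags):
    """Return a copy of the tags with all GTFS keys in the gtfs:* namespace."""
    normalised = {}
    for key, value in tags.items():
        new_key = normalise_gtfs_key(key)
        # Assumption: if both spellings exist, keep the gtfs:* value.
        if new_key in normalised and key != new_key:
            continue
        normalised[new_key] = value
    return normalised


print(normalise_tags({"gtfs_stop_id": "1", "name": "X"}))
# {'gtfs:stop_id': '1', 'name': 'X'}
```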
When a tag references a column of a GTFS file, it should use the full name of that column.
For example, use gtfs:route_long_name=* instead of gtfs:name=*, and gtfs:stop_id=* instead of gtfs:id=*.
This makes it easier for a data consumer to find the right file and column.
Tags for linking to a GTFS object should use the gtfs:* namespace instead of standard tags like name=* and ref=*/ref:IFOPT=*.
To find the matching GTFS object we need an exact match with the value in the feed.
This differs from the requirements of standard tags, which are aimed at humans.
Using standard tags for matching means the link breaks whenever capitalisation is changed or abbreviations are added or removed.
Instead use both standard tags and GTFS tags, even if they are exactly the same.
Linking to a GTFS object
In order to look up timetables for a particular stop or route, we want a way to find the corresponding GTFS object.
To establish this link, follow these steps:
Step 1: Analyse the feed
Different versions of the feed can have different IDs for the same object.
To ensure that the link to the object does not break for a new feed version, look through historic versions.
Determine which columns have a stable value, and which do not.
Use of a value outside of the GTFS feed is also an indication that the value is unlikely to change.
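One crude way to compare historic versions is to measure, per column, which share of the values in an older version still occurs in a newer one. The sketch below assumes both versions of a file have already been read into lists of dictionaries; the function name `surviving_share` and the sample rows are hypothetical. A low share suggests an unstable column, although stops that were legitimately removed lower it as well.

```python
def surviving_share(old_rows, new_rows, column):
    """Share of the old version's values for `column` still present in the new one."""
    old_values = {row[column] for row in old_rows if row.get(column)}
    new_values = {row[column] for row in new_rows if row.get(column)}
    if not old_values:
        return 0.0
    return len(old_values & new_values) / len(old_values)


# Made-up versions of stops.txt: stop_id was renumbered, stop_code was kept.
old_version = [
    {"stop_id": "1001", "stop_code": "A"},
    {"stop_id": "1002", "stop_code": "B"},
]
new_version = [
    {"stop_id": "2001", "stop_code": "A"},
    {"stop_id": "2002", "stop_code": "B"},
]
for column in ("stop_id", "stop_code"):
    print(column, surviving_share(old_version, new_version, column))
# stop_id 0.0
# stop_code 1.0
```

In this made-up case stop_code would be the better column to tag.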
Step 2: Document the feed
Look at List of GTFS feeds.
If the feed the object belongs to is not listed there, add a new entry using the GTFS feed template.
The feed code can be anything, but you are encouraged to adhere to the following two rules:
- Start the feed code with the ISO 3166-2 region code for the region the service operates in.
- Do not include colons (:) in the feed code, as they are used as a separator in keys.
Remember the feed code for the feed.
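The two rules can be checked mechanically, as in the sketch below. The function name `feed_code_problems` is hypothetical, and the region check is only a loose heuristic (two uppercase letters followed by a hyphen or the end of the code); it does not validate the prefix against the actual ISO 3166 lists.

```python
import re


def feed_code_problems(code: str) -> list[str]:
    """Return a list of rule violations for a proposed feed code."""
    problems = []
    if ":" in code:
        problems.append("contains a colon")
    # Heuristic only: accepts any two uppercase letters as the region prefix.
    if not re.match(r"[A-Z]{2}(-|$)", code):
        problems.append("does not start with a region prefix")
    return problems


print(feed_code_problems("NL-OVApi"))  # []
print(feed_code_problems("nl:ovapi"))
# ['contains a colon', 'does not start with a region prefix']
```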
Step 3: Reference the object
Find a combination of the following tags to reference a GTFS object:
Type | Tags |
---|---|
stop | gtfs:stop_id:*=*, gtfs:stop_code:*=*, gtfs:stop_name:*=*, gtfs:location_type:*=0, gtfs:platform_code:*=* |
station | gtfs:stop_id:*=*, gtfs:stop_code:*=*, gtfs:stop_name:*=*, gtfs:location_type:*=1 |
entrance | gtfs:stop_id:*=*, gtfs:stop_code:*=*, gtfs:stop_name:*=*, gtfs:location_type:*=2 |
route | gtfs:trip_id:*=*, gtfs:trip_id:sample:*=*, gtfs:shape_id:*=* |
route master | gtfs:route_id:*=*, gtfs:route_long_name:*=*, gtfs:route_short_name:*=* |
The combination of tags should match exactly one object in the feed.
Additionally, try to avoid tagging the columns that you found to be unstable.
Note: tagging other columns is possible as well, but they are not considered for matching.
Consider putting these in standard tags so that they can be read by applications that don't process the GTFS tags.
Example: colour=#008080 instead of (or in addition to) gtfs:route_color:*=#008080.
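The requirement that the tags match exactly one object can be checked with a lookup like the sketch below. It assumes the keys carry a feed code suffix (see Step 4) and that the relevant GTFS file has been read into a list of dictionaries. The function name `find_gtfs_objects` and the sample rows are hypothetical; the feed code NL-OVApi and stop code nm are borrowed from the example on this page.

```python
def find_gtfs_objects(tags, feed_code, rows):
    """Return the feed rows whose columns equal all gtfs:<column>:<feed_code> tags."""
    prefix, suffix = "gtfs:", ":" + feed_code
    wanted = {}
    for key, value in tags.items():
        if key.startswith(prefix) and key.endswith(suffix):
            column = key[len(prefix):-len(suffix)]
            if column:
                wanted[column] = value
    if not wanted:
        return []

    def value_of(row, column):
        value = row.get(column, "")
        # An empty location_type means 0 in GTFS.
        if column == "location_type" and value == "":
            return "0"
        return value

    return [
        row for row in rows
        if all(value_of(row, column) == value for column, value in wanted.items())
    ]


osm_tags = {
    "name": "Central Station",
    "gtfs:stop_code:NL-OVApi": "nm",
    "gtfs:location_type:NL-OVApi": "0",
}
feed_rows = [
    {"stop_id": "1", "stop_code": "nm", "location_type": ""},
    {"stop_id": "2", "stop_code": "nm", "location_type": "1"},
]
matches = find_gtfs_objects(osm_tags, "NL-OVApi", feed_rows)
print([row["stop_id"] for row in matches])  # ['1']
```

If the result has more than one row, the combination of tags is not specific enough; an empty result means the link is broken.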
Default value for location_type
We sometimes need location_type to distinguish platforms and stations.
Instead of tagging it directly, it can be inferred from the type of PTv2 object using the table under "Structure of GTFS".
Because of these default values location_type is always used in matching of locations.
In the rare case that no value can be inferred (no or multiple matches), gtfs:location_type:*=* should be added.
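The inference can be sketched as a lookup over the PTv2 tags from the table under "Structure of GTFS". The names `DEFAULT_LOCATION_TYPES` and `infer_location_type` are hypothetical; an explicit gtfs:location_type:<feed code> tag takes precedence, and `None` is returned when nothing or more than one value can be inferred.

```python
# PTv2 tag -> default location_type, following the table under "Structure of GTFS".
DEFAULT_LOCATION_TYPES = {
    ("public_transport", "platform"): "0",
    ("public_transport", "stop_position"): "0",
    ("public_transport", "station"): "1",
    ("public_transport", "stop_area"): "1",
    ("public_transport", "stop_area_group"): "1",
    ("railway", "subway_entrance"): "2",
    ("railway", "train_station_entrance"): "2",
}


def infer_location_type(tags, feed_code):
    """Return the location_type to use for matching, or None if it is ambiguous."""
    explicit = tags.get(f"gtfs:location_type:{feed_code}")
    if explicit is not None:
        return explicit
    candidates = {
        location_type
        for (key, value), location_type in DEFAULT_LOCATION_TYPES.items()
        if tags.get(key) == value
    }
    if len(candidates) == 1:
        return candidates.pop()
    return None  # no match or multiple matches: tag it explicitly


print(infer_location_type({"public_transport": "platform"}, "NL-OVApi"))  # 0
print(infer_location_type({"highway": "bus_stop"}, "NL-OVApi"))           # None
```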
Interpretation of deprecated gtfs:id:*=*
Use of gtfs:id:*=* is discouraged because it is unclear which table of the GTFS feed it refers to.
Instead use gtfs:stop_id:*=*, gtfs:trip_id:*=*, gtfs:trip_id:sample:*=*, gtfs:shape_id:*=* or gtfs:route_id:*=*.
However, if the tag is present it should be interpreted as follows:
- If any of the tags listed in the table in "Structure of GTFS" is present, it has the same meaning as gtfs:stop_id:*=*.
- If tagged on a type=route, it has the same meaning as gtfs:trip_id:*=*.
- If tagged on a type=route_master, it has the same meaning as gtfs:route_id:*=*.
Interpretation of deprecated gtfs:name=*, gtfs:short_name:*=*, gtfs:long_name:*=*
Use of gtfs:name:*=* is discouraged because it is unclear which table of the GTFS feed it refers to.
Instead use gtfs:stop_name:*=*, gtfs:route_short_name:*=*, gtfs:route_long_name:*=*, gtfs:trip_short_name:*=*. However, if the tag is present it should be interpreted as follows:
- If any of the tags listed in the table in "Structure of GTFS" is present, it has the same meaning as gtfs:stop_name:*=*.
- If tagged on a type=route, it has the same meaning as gtfs:trip_short_name:*=*.
- If tagged on a type=route_master, it has the same meaning as gtfs:route_short_name:*=*.
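The interpretation rules for the deprecated id and name tags can be sketched as one lookup that first determines what kind of object the tags describe. All names below (`STOP_LIKE_TAGS`, `REPLACEMENTS`, `replacement_column`) are hypothetical, and checking the stop-like tags before the relation type is an assumption of this sketch.

```python
# PTv2 tags from the table under "Structure of GTFS".
STOP_LIKE_TAGS = [
    ("public_transport", "platform"),
    ("public_transport", "stop_position"),
    ("public_transport", "station"),
    ("public_transport", "stop_area"),
    ("public_transport", "stop_area_group"),
    ("railway", "subway_entrance"),
    ("railway", "train_station_entrance"),
]

# Which GTFS column the deprecated "id" and "name" stand for, per object kind.
REPLACEMENTS = {
    "stop": {"id": "stop_id", "name": "stop_name"},
    "route": {"id": "trip_id", "name": "trip_short_name"},
    "route_master": {"id": "route_id", "name": "route_short_name"},
}


def replacement_column(deprecated, tags):
    """Return the column that gtfs:id or gtfs:name refers to, or None if unknown."""
    if any(tags.get(key) == value for key, value in STOP_LIKE_TAGS):
        kind = "stop"
    elif tags.get("type") in ("route", "route_master"):
        kind = tags["type"]
    else:
        return None
    return REPLACEMENTS[kind].get(deprecated)


print(replacement_column("id", {"type": "route_master"}))             # route_id
print(replacement_column("name", {"public_transport": "platform"}))   # stop_name
```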
Step 4: Group the tags with the feed code
Add the feed code as a suffix to the keys.
This ensures that a data consumer can find the feed and knows which columns it should look at to find the right object.
Using the code as a suffix also means that we can reference objects in multiple feeds, for example for stations near borders.
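A consumer has to split such keys back into a column name and a feed code. Because gtfs:trip_id:sample itself contains a colon, the sketch below checks against a list of known column names before splitting on the last colon. `KNOWN_COLUMNS` and `split_gtfs_key` are hypothetical names, the list only covers the columns from the table in Step 3, and the feed code DE-Example is made up.

```python
KNOWN_COLUMNS = {
    "stop_id", "stop_code", "stop_name", "location_type", "platform_code",
    "trip_id", "trip_id:sample", "shape_id",
    "route_id", "route_long_name", "route_short_name",
}


def split_gtfs_key(key):
    """Return (column, feed_code) for a GTFS key, or None if it is not recognised.

    feed_code is None when the key has no feed code suffix.
    """
    if not key.startswith("gtfs:"):
        return None
    rest = key[len("gtfs:"):]
    if rest in KNOWN_COLUMNS:
        return rest, None
    # Feed codes contain no colons, so the suffix is everything after the last one.
    column, separator, feed_code = rest.rpartition(":")
    if separator and feed_code and column in KNOWN_COLUMNS:
        return column, feed_code
    return None


# A station near a border, referenced in two feeds.
for key in ("gtfs:stop_id:NL-OVApi", "gtfs:stop_id:DE-Example", "gtfs:trip_id:sample:NL-OVApi"):
    print(split_gtfs_key(key))
# ('stop_id', 'NL-OVApi')
# ('stop_id', 'DE-Example')
# ('trip_id:sample', 'NL-OVApi')
```

One limitation of this sketch: a feed whose code is literally "sample" would be indistinguishable from the gtfs:trip_id:sample key.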
Interpretation of deprecated gtfs:feed=*
Use of gtfs:feed=* is discouraged, instead use the feed code suffixes.
However, if the tag is present it should be interpreted as if its value has been added as a feed code suffix to all GTFS keys that do not already have such a suffix.
As such, the combination gtfs:feed=NL-OVApi + gtfs:stop_code=nm is interpreted as gtfs:stop_code:NL-OVApi=nm.
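This interpretation can be sketched as a rewrite of the tag dictionary. The names `UNSUFFIXED_COLUMNS` and `apply_deprecated_feed_tag` are hypothetical, the column list is limited to the columns mentioned on this page, and dropping gtfs:feed from the result as well as preferring an already suffixed key on conflict are assumptions of this sketch.

```python
UNSUFFIXED_COLUMNS = {
    "stop_id", "stop_code", "stop_name", "location_type", "platform_code",
    "trip_id", "trip_id:sample", "shape_id",
    "route_id", "route_long_name", "route_short_name",
}


def apply_deprecated_feed_tag(tags):
    """Move the value of gtfs:feed into a suffix on all unsuffixed GTFS keys."""
    feed_code = tags.get("gtfs:feed")
    if not feed_code:
        return dict(tags)
    result = {}
    to_suffix = {}
    for key, value in tags.items():
        if key == "gtfs:feed":
            continue  # assumption: the deprecated tag itself is dropped
        if key.startswith("gtfs:") and key[len("gtfs:"):] in UNSUFFIXED_COLUMNS:
            to_suffix[f"{key}:{feed_code}"] = value
        else:
            result[key] = value
    for key, value in to_suffix.items():
        # Assumption: a key that already carries the suffix wins on conflict.
        result.setdefault(key, value)
    return result


print(apply_deprecated_feed_tag({"gtfs:feed": "NL-OVApi", "gtfs:stop_code": "nm"}))
# {'gtfs:stop_code:NL-OVApi': 'nm'}
```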
Overview of used tags
Keys with wiki pages
- gtfs:feed=* - deprecated - describe which feed the object belongs to
- gtfs:name=* - deprecated - the name of the object according to the GTFS feed
- gtfs:release_date=* - the version of the feed used for the
- gtfs:route_id=* - identifier to associate type=route_master relations with routes
- gtfs:shape_id=* - preferred identifier for a route variant -- but not always present, does not provide information about stop positions
- gtfs:stop_id=* - identifier to associate stops with their GTFS counterpart
- gtfs:trip_id=* - alternative for route variants with only one trip
- gtfs:trip_id:sample=* - fallback for identifying a route variant -- but more likely to change, provides information on stop positions and their sequence only
- gtfs_id=* - deprecated - the id of the object in a GTFS feed
Keys by use (over 100 uses as of when this was updated)
Currently unused tags previously mentioned on this page
- Mapping to OSM tags (draft)
Alternative for stops
In Europe, the IFOPT standard is defined for public transport stops, and in some GTFS data the stop_code is identical to the IFOPT reference. In these situations, tagging both gtfs:stop_code:*=* and ref:IFOPT=* is encouraged.
Data sources
- PTNA - Public Transport Network Analysis aggregates open and correctly licensed GTFS data from some countries. More countries can easily be supported if demanded and links to sources are provided.
- GTFS Data Exchange - Data available for 1000 transit agencies (as of 9 Dec 2016), though licensing varies. Soon to be shutting down.
- Mobility Database (formerly TransitFeeds/OpenMobilityData) - open source aggregation project of GTFS data.
- Transitland at transit.land - commercially funded aggregation of GTFS data.
- transport.data.gouv.fr - French open data GTFS (ODbL)
- European Union NAPs - links to `.pdf` with EU National Access Points (see also unofficial list)
Visualizing of GTFS
- PTNA - nice online visualization of aggregated and correctly licensed GTFS data with tag recommendations for route relations and map overlay for shapes.
Conversion of OpenStreetMap and GTFS
OSM → GTFS
- osm2gtfs - An extendable Python script that queries OpenStreetMap data about public transport, combines it with time information provided from a different source and converts it into the GTFS format.
GTFS → OSM
- GO-Sync (aka gtfs-osm-sync) - a desktop tool to synchronize GTFS feeds with OSM
- GTFS-OSM-Validator - console tool that will read GTFS and output exact problems it finds in OSM
- gtfs-sql-importer - This tool can convert GTFS to SQL postgis schema where GTFS can be further manipulated. More examples of this tool can be found in GTFS SQL examples.
- GTFS-OSM-Import - Open-source tool to automate and simplify as much as possible imports of GTFS data into OSM.
- GTFS Janitor - Web-based tool to conflate GTFS feeds with OSM data.
Editor support
- The external JOSM preset Public Transport GTFS and rule Public Transport GTFS support some of the tags.
Software using tags
- PTNA evaluates gtfs:feed=*, gtfs:release_date=*, gtfs:route_id=*, gtfs:shape_id=*, gtfs:trip_id:sample=* and gtfs:trip_id=* to provide a link from the relation to the GTFS data.
- When comparing GTFS with OSM, PTNA also evaluates gtfs:stop_id=* and gtfs:stop_id:<feed> (example: Lannion, Gare Clémencau as of 2024-06-02) on the comparison page (as of 2024-06-02).
Discussions
- GO-Sync - a GTFS and OpenStreetMap data synchronization tool - a Google Groups thread announcing gtfs-osm-sync, and difficulties of multiple operators for bus stops
- GO-Sync - a GTFS and OpenStreetMap data synchronization tool - gtfs-osm-sync announcement on Talk-transit
- GTFS compatibility (and [1] and [2]) - discussion on Talk-transit
- Bus stops in North America from GTFS data - thread on Talk-transit
- Proposal:GTFS Tagging Standard