Import/Catalogue/Queensland GTFS Data
Goal
Complete an import of GTFS stop data maintained by the Queensland Government Department of Transport and Main Roads (TMR). This import will also conflate the existing South East Queensland (SEQ) manually surveyed bus stop nodes imported in 2009 (see the QROTI page for details; there are currently 6,600 nodes imported from this privately owned dataset).
Schedule
The project is currently focused on importing bus stop data; extending it to include routes is a future consideration. Should routes be included in the scope of this import, they will only be created after the stop node import is complete. Route and route master creation may or may not use the GTFS data from this dataset. A Wiki page dedicated specifically to route creation is being considered (the reference here is the Bus Routes in London Wiki page).
Import Data
Background
Data source site: http://data.qld.gov.au
Data license: Creative Commons Attribution
Link to permission/attribution: Queensland Government data (Department of Transport and Main Roads - TMR)
Data description
There are 2 recognised regions in Queensland for the datasets concerned. Both are in GTFS format.
South East Queensland (SEQ)
South East Queensland (SEQ) contains data for the Capital City area and surrounding regions (Sunshine Coast, Gold Coast & Ipswich). This Mapcraft pie shows the extent of the single GTFS file for the SEQ region. It will be used for importing and QA of this region given the high number of bus stop nodes already existing within this geography.
Regional Queensland
In the context of this import, the second dataset covers the remaining regional centres throughout Queensland and contains one file per town, named after the town. Note: Mapcraft will not be used for this town-based dataset. Each town is geographically separate, so each importer will work on and complete a town in isolation, without risk of duplicated effort. The table below shows the status of each town import along with any notes the mapper chooses to provide:
Name | Import Status | Notes (Mappers to include a link to the changeset(s) of completed imports in this section) |
---|---|---|
airlie (view, Github) | Incomplete | This data, sourced from a prior version of this dataset, was manually imported and reviewed prior to this exercise (see changeset 19628395). This was done by the author of this Wiki page before understanding the correct import process. If necessary this can be reverted; @imports will be asked to advise on this point. If deemed okay, the tagging will need to be updated to reflect the information in this Wiki page. |
bowen (view, Github) | Incomplete | |
bundaberg (view, Github) | Incomplete | |
cairns (view, Github) | Incomplete | #139700396 |
gladstone (view, Github) | Incomplete | |
gympie (view, Github) | Incomplete | |
innisfail (view, Github) | Incomplete | |
kilcoy (view, Github) | Incomplete | |
mackay (view, Github) | Incomplete | |
magnetic-island (view, Github) | Incomplete | |
maleny-landsborough (view, Github) | Incomplete | |
maryborough-herveybay (view, Github) | Incomplete | |
rockhampton (view, Github) | Incomplete | |
sealink (view, Github) | Incomplete | This is a Ferry Terminal import. There are only 2 nodes. |
toowoomba (view, Github) | Incomplete | #139699966 |
townsville (view, Github) | Incomplete | #139699017 |
yeppoon (view, Github) | Incomplete | |
Import Type
This is a one-time import, but it will require periodic reviews to keep the information up to date. JOSM is the editor being used for this import. JOSM will be used for checking, validating and uploading changesets, and for reverting them if needed.
GO-Sync
A modified version of GO-Sync will be used to correlate the agency data with the existing OSM data. From GO-Sync, a changeset will be exported using the dummy upload option. This will be verified and uploaded using JOSM.
Data Preparation
Data Reduction & Simplification
The GitHub repo [1] that holds the source files includes the following file variants in each folder:
- stops.txt (The raw, unedited source file)
- stops.csv (The file extension is renamed and the file header is changed to allow for a JOSM import; see the tables below for the header transformation)
- stops.osm (The saved file after importing into JOSM using the opendata plugin)
Tagging Plans
Public transport schema 2 will be used for tagging. The tables below map source attributes to OSM tags.
Bus stops (qconnect dataset - stops.txt > stops.csv > stops.osm)
File | GTFS attribute | CSV Header | OSM tag |
---|---|---|---|
stops.txt | stop_id | gtfs_id | gtfs_id=* |
stops.txt | stop_name | name | name=* |
stops.txt | stop_url | url | url=* |
stops.txt | stop_lat | lat | |
stops.txt | stop_lon | lon | |
(fixed tag) | | | highway=bus_stop |
(fixed tag) | | | public_transport=platform |
(fixed tag) | | | bus=yes |
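The header change from stops.txt to stops.csv described in the table above could be scripted along the following lines. This is only a minimal sketch: the function name `convert_stops` is hypothetical, dropping the unmapped GTFS columns is an assumption, and the sample stop values are made up for illustration.

```python
import csv
import io

# Header mapping taken from the table above. Dropping every other GTFS
# column is an assumption of this sketch, not something the page states.
HEADER_MAP = {
    "stop_id": "gtfs_id",
    "stop_name": "name",
    "stop_url": "url",
    "stop_lat": "lat",
    "stop_lon": "lon",
}


def convert_stops(src, dst):
    """Copy GTFS stops from src to dst, renaming headers per HEADER_MAP."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(
        dst, fieldnames=list(HEADER_MAP.values()), lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow({new: row.get(old, "") for old, new in HEADER_MAP.items()})


# Usage with in-memory data (the stop below is invented for illustration).
sample = io.StringIO(
    "stop_id,stop_name,stop_lat,stop_lon,stop_url\n"
    "1001,Example St,-16.92,145.77,http://example.org/1001\n"
)
out = io.StringIO()
convert_stops(sample, out)
print(out.getvalue())
```

For real files, the same function can be passed file objects opened with `newline=""`; the fixed tags in the last three table rows are not added here and would still be applied in JOSM.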
Bus stops (SEQ dataset - stops.txt > stops.csv > stops.osm)
The station data will be removed from this file and kept in a new file, stations.csv, which is not part of the dataset (see the Bus & Rail Stations table below for how that content will be handled). Nodes that have parent_station=* or platform_code=* will not have highway=bus_stop, public_transport=platform or bus=yes applied. During manual verification, each of these nodes will be classified as a bus station or rail station node and the appropriate tags will be applied. These nodes have been imported and saved to the stops.osm file for the SEQ dataset.
File | GTFS attribute | CSV Header | OSM tag |
---|---|---|---|
stops.txt | stop_id | gtfs_id | gtfs_id=* |
stops.txt | stop_name | name | name=* |
stops.txt | stop_url | url | url=* |
stops.txt | stop_lat | lat | |
stops.txt | stop_lon | lon | |
(fixed tag) | | | highway=bus_stop |
(fixed tag) | | | public_transport=platform |
(fixed tag) | | | bus=yes |
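The tagging rule for the SEQ stops (no bus stop tags on nodes that carry parent_station or platform_code) might be expressed as follows. This is a sketch under assumptions: the function name `tags_for_seq_stop` is hypothetical, and it assumes the CSV row keeps `parent_station` and `platform_code` columns under their GTFS names.

```python
# Fixed tags from the table above.
BUS_STOP_TAGS = {
    "highway": "bus_stop",
    "public_transport": "platform",
    "bus": "yes",
}


def tags_for_seq_stop(row):
    """Build OSM tags for one stops.csv row (a dict keyed by CSV header).

    Returns (tags, needs_review). Rows with a parent_station or
    platform_code value get no bus stop tags and are flagged for the
    manual verification step described above.
    """
    tags = {key: row[key] for key in ("gtfs_id", "name", "url") if row.get(key)}
    needs_review = bool(row.get("parent_station") or row.get("platform_code"))
    if not needs_review:
        tags.update(BUS_STOP_TAGS)
    return tags, needs_review
```

Returning the review flag separately keeps the decision about bus station versus rail station tagging with the mapper, as the page describes.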
Bus & Rail Stations (SEQ dataset - stops.txt > stations.csv > stations.csv)
The stations.csv file is a subset of the original stops.txt source file. The location_type is the field used to delineate between stops and stations. (See table below for how this content will be handled)
File | GTFS attribute | CSV Header | OSM tag |
---|---|---|---|
stops.txt | stop_id | gtfs_id | gtfs_id=* |
stops.txt | stop_name | name | name=* |
stops.txt | stop_lat | lat | |
stops.txt | stop_lon | lon | |
(fixed tag) | | | public_transport=station |
(fixed tag) | | | railway=station OR amenity=bus_station ^ |
agency.txt | agency_name | | network=TransLink SEQ |
^ railway=station will be applied, as a node, on the track way it is associated with, as part of the manual verification of each station. The SEQ geography has both a rail network and a Busway network, so the two will be distinguished manually rather than programmatically. Where no track way exists but Bing aerial imagery suggests a rail station, the node will be placed at the coordinates provided and tagged railway=station.
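Separating stations from stops on the location_type field could look roughly like this. It is a sketch only: both function names are hypothetical, rows are assumed to be dicts keyed by the original GTFS headers, and railway=station or amenity=bus_station is intentionally not set because the page assigns it during manual verification.

```python
def split_by_location_type(rows):
    """Partition GTFS stop rows into (stops, stations)."""
    stops, stations = [], []
    for row in rows:
        # In GTFS, location_type 1 marks a station; 0 or empty marks a stop
        # or platform. Any other value is kept with the stops in this sketch.
        if (row.get("location_type") or "").strip() == "1":
            stations.append(row)
        else:
            stops.append(row)
    return stops, stations


def station_tags(row):
    """Tags that are known before manual review of a station row."""
    return {
        "gtfs_id": row["stop_id"],
        "name": row["stop_name"],
        "public_transport": "station",
        "network": "TransLink SEQ",
    }
```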
Changeset Tags
We will use the following changeset tags.
- comment=*
- created_by=JOSM/version
- source=data.qld.gov.au
- type=import
- url=http://wiki.openstreetmap.org/wiki/Import/Catalogue/Queensland_GTFS_Data
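For anyone scripting checks around the upload, the changeset tags listed above can be held in one place as below. This is an illustrative sketch: the helper name `changeset_tags` and the example comment text are hypothetical, and JOSM normally fills in created_by itself. The 255-character check reflects the OSM limit on tag values.

```python
WIKI_URL = "http://wiki.openstreetmap.org/wiki/Import/Catalogue/Queensland_GTFS_Data"


def changeset_tags(comment, josm_version):
    """Return the changeset tags listed above for one upload."""
    tags = {
        "comment": comment,
        "created_by": f"JOSM/{josm_version}",
        "source": "data.qld.gov.au",
        "type": "import",
        "url": WIKI_URL,
    }
    # OSM limits tag values to 255 characters.
    for key, value in tags.items():
        if len(value) > 255:
            raise ValueError(f"value for {key!r} exceeds 255 characters")
    return tags


# Hypothetical usage; the comment wording is not prescribed by this page.
print(changeset_tags("Queensland GTFS bus stop import: cairns", "1.5"))
```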
Data Transformation
TBA
Data Transformation Results
TBA
Data Merge Workflow
There is a significant amount of existing bus stop data already in OSM for the SEQ region of the state (>10,000 Nodes with "highway"="bus_stop" tagging). The merging and/or deprecation of these nodes will be considered as part of this workflow.
Within this existing dataset there are nodes that were created from an import in 2009 (approximately 7,000). The sourcing and accuracy of this dataset have been verified, and the decision is to retain the existing stops and conflate the new tags into them. The action taken for each existing QROTI tag is documented in the table below. The existing QROTI Wiki page will be updated with a link back to this section so that future maintainers are aware of the veracity of this dataset and the changes made as a result of this import.
QROTI Tag Handling
Counts are sourced from taginfo and are current as of 2014-03-19 23:58 UTC.
Count @ 2014-03-19 | Current Count | Key | Action |
---|---|---|---|
6,622 | | qroti:place_id | Remove - Marker was for a site that no longer exists |
6,529 | | qroti:mode | Remove - No documentation exists |
6,526 | | qroti:mode_name | Remove - No documentation exists |
6,366 | | qroti:surveyed | Retain - This is when the bus stop was last surveyed in the field |
6,331 | | qroti:url | Remove - These URLs no longer exist |
4,646 | | qroti:stop_num | Remove - Imported gtfs_id deprecates this value |
2,014 | | qroti:name | Remove - Imported name deprecates this value |
1,245 | | qroti:name_onsite | Remove - Imported name deprecates this value |
182 | | qroti:part | Remove - Individual platforms are included in the import dataset and will be mapped |
41 | | qroti:major_location | Remove - Stations are now being mapped (See Bus and Rail Stations) |
1 | | qroti:fare_zone | Remove |
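The remove and retain actions in the table above amount to a simple filter over a node's tags. The sketch below assumes the tags are available as a plain dict; the function name `clean_qroti_tags` is hypothetical. Only the keys the table marks for removal are dropped, so any qroti:* key not listed in the table is left untouched for a mapper to review.

```python
# Keys the table above marks as "Remove". qroti:surveyed is retained.
QROTI_KEYS_TO_REMOVE = {
    "qroti:place_id",
    "qroti:mode",
    "qroti:mode_name",
    "qroti:url",
    "qroti:stop_num",
    "qroti:name",
    "qroti:name_onsite",
    "qroti:part",
    "qroti:major_location",
    "qroti:fare_zone",
}


def clean_qroti_tags(tags):
    """Return a copy of tags without the QROTI keys marked for removal."""
    return {k: v for k, v in tags.items() if k not in QROTI_KEYS_TO_REMOVE}
```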
Team Approach
TBA
References
This import was discussed in talk-au.
Workflow
- For the initial SEQ import, a separate OSC file will be created for each of the applicable GO-Sync "Stops to view" categories ("New GTFS stops with Potential Matches in OSM", "New GTFS stops with No OSM Matches", "Existing stops with Updates") using the dummy upload option. For the "New GTFS stops with Potential Matches in OSM" category, the GTFS stops will be matched to the existing OSM nodes. In GO-Sync, stops can be added for export individually or a whole category at once ("Upload All"). During the matching process, notes will be taken of stops that will need to be created after the existing stops have been matched.
- The OSC files will be converted to OSM using osmconvert.
osmconvert DUMMY_OSM_CHANGE.txt > category.osm
- These will be imported into JOSM and filtered using the JOSM search function with the following queries:
  - gtfs_location_type=1 (results: Bus & Rail Stations)
  - parent_station|platform_code (results: platforms and sublocations)
- The set will then be exported and uploaded to the GitHub repository.
- "New GTFS stops with No OSM Matches" should be run again after importing "New GTFS stops with Potential Matches in OSM" into OpenStreetMap, as there may be new stops that could not be matched (they may have been stops within the 400m inclusion zone).
Bus stops
Other more detailed workflow is TBA.
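To make the 400m inclusion zone mentioned in the workflow concrete, the sketch below lists the existing OSM nodes that fall within that distance of a GTFS stop. It is not GO-Sync's matching code, which may differ in its details; the function names, the dict layout of the stops and nodes, and the use of the haversine formula are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6371000.0   # mean Earth radius in metres
INCLUSION_RADIUS_M = 400.0   # the 400m inclusion zone from the workflow


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def potential_matches(gtfs_stop, osm_nodes, radius_m=INCLUSION_RADIUS_M):
    """Return OSM nodes within radius_m of the GTFS stop, nearest first."""
    scored = [
        (haversine_m(gtfs_stop["lat"], gtfs_stop["lon"], n["lat"], n["lon"]), n)
        for n in osm_nodes
    ]
    scored.sort(key=lambda pair: pair[0])
    return [node for dist, node in scored if dist <= radius_m]
```

A GTFS stop for which this returns an empty list would correspond to the "no OSM matches" case; a non-empty list still needs a mapper to confirm which node, if any, is the same stop.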
Quality Assurance
TBA