Digitransit/Data validation
This page contains information related to OSM data validation that has been found useful specifically for the Digitransit project. It serves as documentation of the validation both for the organisations that use Digitransit and for the whole OSM community.
This documentation is now under construction (by osm-user HSL_HRT). Any ideas are very welcome.
Introduction
When you use any data, e.g. OSM, you naturally want it to be up to date and free of locational and logical errors. The idea of OSM data validation, or QA (Quality Assurance), is precisely this: to ensure that the OSM data is as accurate as possible. One reason to use crowdsourced data is to harness the power of the community in data updating processes. This should also include the data validation processes, so it's highly recommended that all validation tools be made available to the whole OSM community, as the organisations that use OSM often have very limited resources themselves.
Available tools
There are a number of global error detection tools available that perform different analyses and report the results to the user in one way or another. Here's a list of tools that we've found useful for ensuring the quality of the OSM data that Digitransit uses.
- Keep Right (updated weekly)
- Osmose (updated daily)
- OSM Inspector (updated daily)
The most common OSM editors (iD and JOSM) also contain some data validation functionality that is run before the data is saved. Taginfo can also be used to find unconventional tagging and to standardize tagging. Note that there are nowadays localized versions of Taginfo provided by Geofabrik, e.g. for Finland.
Overpass Turbo can also be a handy tool when you know exactly what you're looking for, e.g. when searching for a specific tagging combination error.
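As a rough illustration of searching for a specific tagging combination, the sketch below assembles an Overpass QL query string that could be pasted into Overpass Turbo. The function name, its parameters and the example bounding box are made up for this illustration; only the query syntax itself (tag filters, a south/west/north/east bounding box and `out geom;`) is standard Overpass QL.

```python
def build_overpass_query(tags, bbox, element="way", timeout=60):
    """Return an Overpass QL query for elements carrying all the given tags.

    tags: dict of key -> value; a value of None means "key present, any value".
    bbox: (south, west, north, east) in decimal degrees, the order Overpass QL expects.
    """
    filters = "".join(
        f'["{key}"]' if value is None else f'["{key}"="{value}"]'
        for key, value in tags.items()
    )
    south, west, north, east = bbox
    return (
        f"[out:json][timeout:{timeout}];\n"
        f"{element}{filters}({south},{west},{north},{east});\n"
        "out geom;"
    )


# Hypothetical example: cycleways that also carry bicycle=no,
# inside an arbitrary box roughly around Helsinki.
query = build_overpass_query(
    {"highway": "cycleway", "bicycle": "no"},
    (60.1, 24.8, 60.3, 25.1),
)
print(query)
```

The printed text can be run in Overpass Turbo as it is; in the Overpass Turbo editor the explicit coordinates can also be swapped for the `{{bbox}}` shortcut to search the current map view.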
OSM Map notes also offer an environment for reporting and correcting map errors.
OSMCha is a tool that offers monitoring capabilities when you want to follow all the changes made in a specific area.
Digitransit specific validation
There are different aspects to OSM data validation. Below we've listed the aspects we've found relevant for Digitransit.
Geometry validation
The geometry of objects can be analysed in many ways to find errors, either by investigating the OSM data itself or by comparing it to a reference dataset. The first thing you want to ensure is that there are no missing objects. Then you'd probably also like everything to be mapped only once, in accordance with OSM good practice, to avoid duplicates in your geocoding service.
- finding missing objects: To perform this analysis a local reference dataset is needed (e.g. governmental and municipal data). This is a use case that is being tested at HSL at the moment. Here are some examples:
- finding duplicate objects ... Keepright has a function named doubled places
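The comparison against a reference dataset could be sketched roughly as follows. This is not the method being tested at HSL; it is a minimal illustration that assumes both datasets have already been reduced to points with `lat` and `lon` keys, and that a reference object counts as "found" when any OSM object lies within a distance threshold. The 50 m threshold, the function names and the sample coordinates are all assumptions.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    earth_radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def find_missing(reference, osm, max_distance_m=50.0):
    """Return the reference objects that have no OSM object within max_distance_m."""
    missing = []
    for ref in reference:
        has_counterpart = any(
            haversine_m(ref["lat"], ref["lon"], obj["lat"], obj["lon"]) <= max_distance_m
            for obj in osm
        )
        if not has_counterpart:
            missing.append(ref)
    return missing


# Made-up sample data: the first reference point has a nearby OSM object,
# the second one does not.
reference_points = [
    {"id": "ref-1", "lat": 60.1699, "lon": 24.9384},
    {"id": "ref-2", "lat": 60.2000, "lon": 24.9000},
]
osm_points = [{"id": "osm-1", "lat": 60.1700, "lon": 24.9385}]

missing_objects = find_missing(reference_points, osm_points)
print([obj["id"] for obj in missing_objects])
```

The pairwise loop is fine for small areas; for a whole municipality a spatial index would be the more sensible choice.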
Topology validation
Topology describes the relations between different map objects. There are a number of examples of topological errors. Topological errors with polygon features can include unclosed polygons, gaps between polygon outlines or overlapping polygons. The most interesting topological errors for Digitransit are, however, the connectivity errors of the road objects that form the routing network. Some separate ways may not be connected to each other, which prevents routing between the two. Some ways can even be totally isolated from the rest of the network and form so-called routing islands.
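The idea of a routing island can be illustrated with a small connected-components search. The sketch below is a simplification and not how OSM Inspector or OpenTripPlanner detect islands: it assumes ways are given as lists of node ids, treats any shared node as a connection, and ignores one-way streets and access restrictions. The function name and sample data are invented for the example.

```python
from collections import defaultdict


def find_routing_islands(ways):
    """Group ways into connected components.

    ways: dict of way id -> list of node ids. Two ways are considered
    connected when they share at least one node id.
    Returns a list of sets of way ids, largest component first.
    """
    node_to_ways = defaultdict(set)
    for way_id, nodes in ways.items():
        for node in nodes:
            node_to_ways[node].add(way_id)

    seen = set()
    components = []
    for start in ways:
        if start in seen:
            continue
        component = set()
        stack = [start]
        seen.add(start)
        while stack:
            way_id = stack.pop()
            component.add(way_id)
            # Visit every way that shares a node with the current way.
            for node in ways[way_id]:
                for neighbour in node_to_ways[node]:
                    if neighbour not in seen:
                        seen.add(neighbour)
                        stack.append(neighbour)
        components.append(component)

    components.sort(key=len, reverse=True)
    return components


# Made-up network: a, b and c are chained together; d and e only touch each other.
sample_ways = {"a": [1, 2, 3], "b": [3, 4], "c": [4, 5], "d": [10, 11], "e": [11, 12]}
components = find_routing_islands(sample_ways)
main_network, islands = components[0], components[1:]
print(main_network, islands)
```

Everything outside the largest component is a candidate island; whether it is a real error or a legitimately separate network (e.g. on an actual island) still needs a human look.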
- finding overlapping buildings: Osmose's JOSM validator finds overlapping buildings.
- finding unconnected ways: The routing view in OSM Inspector contains an overlay with unconnected nodes.
- finding routing islands: The routing view in OSM Inspector contains an overlay with routing islands.
- visualization of Digitransit routing network: One way to analyze the network topology is to look at the routing network over areas you know well to spot errors. The quickest way to see the routing network, also called the graph, that Digitransit uses for walking and cycling legs is to append the text ?debugTiles to the journey planner URL in the web browser, e.g. reittiopas.fi/etusivu?debugTiles. This will show the route graph as the background layer in your journey planner. This works for all the different Digitransit instances (HSL, matka.fi etc.). If you would like to see the OTP route graph as a base layer in your OSM editor (iD, JOSM, ...), you can add it as a TMS layer for HSL (the Helsinki area), the Waltti regions or Finland:
- HSL: tms[21]:http://api.digitransit.fi/routing/v1/routers/hsl/inspector/tile/traversal/{z}/{x}/{y}.png
- Waltti regions: tms[21]:http://api.digitransit.fi/routing/v1/routers/waltti/inspector/tile/traversal/{z}/{x}/{y}.png
- Finland: tms[21]:http://api.digitransit.fi/routing/v1/routers/finland/inspector/tile/traversal/{z}/{x}/{y}.png
The colour coding of this route graph visualisation is explained here.
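To see how the {z}/{x}/{y} placeholders in the TMS layer addresses above are filled in, the sketch below converts a coordinate to tile numbers with the standard slippy map formula and inserts them into the URL template quoted in the text. The function names and the example coordinate are assumptions, and the sketch only builds the address; whether the service still answers at that address has not been checked here.

```python
import math

# URL template as given in the text, with the router name made a parameter.
TILE_URL = (
    "http://api.digitransit.fi/routing/v1/routers/{router}"
    "/inspector/tile/traversal/{z}/{x}/{y}.png"
)


def lonlat_to_tile(lat, lon, zoom):
    """Return the (x, y) slippy map tile numbers containing a WGS84 point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def tile_url(router, lat, lon, zoom):
    """Build the traversal tile address for the tile covering the given point."""
    x, y = lonlat_to_tile(lat, lon, zoom)
    return TILE_URL.format(router=router, z=zoom, x=x, y=y)


# Example coordinate in central Helsinki (chosen only for illustration).
example_url = tile_url("hsl", 60.1699, 24.9384, 14)
print(example_url)
```

An editor such as JOSM or iD does this substitution itself for every visible tile, so the calculation is only needed when fetching individual tiles by hand.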
Tagging validation
Map objects can be assigned a number of different tags to describe the object. How you tag objects can affect routing, geocoding as well as map rendering and therefore changes to the tags are very interesting.
- deprecated tagging: Deprecated tags are tags that the community has agreed should no longer be used. These can be found in Keep Right with the function named deprecated tags. Individual deprecated tags can be searched for with Overpass Turbo. Below are some examples.
- tagging errors: There are many ways to go wrong with tagging. Here are some of the errors most relevant to Digitransit.
- CONSTRUCTION=YES: construction=yes is not the way to tag roads that are under construction. It will not remove the road in question from the routing graph nor from the background map. The right way to do this is presented here. Here's an Overpass query to detect this kind of tagging error.
- POI: Points of interest without a name will not appear in Digitransit's name search. Keep Right's point of interest without name check can help you locate these.
- ADDRESS: Errors in address tags can prevent some addresses from being found via Digitransit. Osmose's JOSM validator finds existing logical address problems. Conflicts between the street name in a building address and the surrounding street object may generate unnecessary address names in Digitransit. Osmose has a tool to locate these.
- ROAD NAMES: Some major roads can be missing a name tag. Here's an Overpass query to locate these. Some of the lower road classes (e.g. unclassified and residential) have for now been left out of this to avoid large numbers of false positives caused by inconsistent road class tagging in OSM. There can also be conflicts between the different language versions in OSM, e.g. between the road name in name:fi (or name:sv) and the "main" name tag. Here's an Overpass query to locate these in the HSL area. You should also be aware that some municipalities (outside the HSL area) have Swedish as their majority language. Here's a Swedish version of the same script for Inkoo municipality.
- PATH NAMES: Paths shouldn't be given a name tag unless the name is an official road name, and even in these cases the highway=path tagging should perhaps be upgraded to footway or cycleway etc. If the name is somewhat unofficial or of a descriptive nature, it's preferable to use the description tag. If the name refers to a larger route that runs along the line object, this route should be created as a route relation. According to the wiki, the name=* tag should only reflect official road names. NB: there are a lot of other name tag variants to consider for unofficial names. This Overpass query finds the paths with name tagging.
- LOGICAL ERRORS: As you're able to add any tag to any object, the added tags may sometimes contradict each other. You may, for example, accidentally add a bicycle=no tag to a cycleway. This will cause the routing engines to ignore the cycleway in cycle routing. Here's an Overpass query to detect this kind of contradiction within the cycling and walking networks.
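Several of the tagging errors listed above can also be expressed as simple checks on an object's tag dictionary, for example when post-processing the result of an Overpass query. The sketch below is an illustration only, not the queries linked from this page. The function name, the choice of "major" road classes and the foot=no rule for footways are assumptions, and bilingual or otherwise composite name values are not handled.

```python
# Assumed selection of "major" classes; the text only says that lower classes
# such as unclassified and residential are left out.
MAJOR_ROADS = {"primary", "secondary", "tertiary"}


def check_tags(tags, primary_lang="fi"):
    """Return a list of problem descriptions for one OSM way's tags."""
    problems = []
    highway = tags.get("highway")
    name = tags.get("name")

    # construction=yes does not take the road out of the routing graph.
    if highway and tags.get("construction") == "yes":
        problems.append("construction=yes on a road")

    # Contradictory access tags on dedicated cycling and walking ways.
    if highway == "cycleway" and tags.get("bicycle") == "no":
        problems.append("bicycle=no on a cycleway")
    if highway == "footway" and tags.get("foot") == "no":
        problems.append("foot=no on a footway")

    # Names on paths and missing names on major roads.
    if highway == "path" and name:
        problems.append("name on a path")
    if highway in MAJOR_ROADS and not name:
        problems.append("major road without name")

    # The main name should agree with the local majority language version.
    localized = tags.get(f"name:{primary_lang}")
    if name and localized and name != localized:
        problems.append(f"name differs from name:{primary_lang}")

    return problems


print(check_tags({"highway": "cycleway", "bicycle": "no"}))
print(check_tags({"highway": "primary", "construction": "yes"}))
```

For a Swedish-majority municipality the same check can be run with `primary_lang="sv"`, mirroring the separate Swedish version of the name query mentioned above.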
Monitoring changes
If you are very familiar with a smaller area and have thoroughly mapped it, you might want to follow all changes and verify that no unintended damage or vandalism has happened in the area. Another use case could be when a new public transport terminal is opened: you have made edits to make routing possible through the terminal and want to monitor the activity in the area for a limited time period. OSMCha is a tool that offers this kind of monitoring capability. Here's an example of a saved OSMCha filter that monitors changes in the Pasila area of Helsinki starting from January 1st 2020. You're able to share filters with any other OSMCha user, but to use the tool you need to sign in with your OSM account.
Other specific validations
- stop matching at HSL: To make the interchange between GTFS-based public transport routing and OSM-based pedestrian routing as smooth as possible, Digitransit connects OSM stops with their counterparts in the public transport GTFS dataset. This connection has three requirements.
- The ref-tag value in the OSM-data must match the stop_code-value in the GTFS-data
- These two stop representations can't be more than 250 meters apart
- The stop representation in OSM must be connected to a way-object (or a bus_stop relation) in OSM
The stops that meet these matching requirements can be found here and the stops that don't here. To ensure the matching you could also investigate the stops in OSM and check that there are no ref tag duplicates. Here's an Overpass query that lists all the different ref tags at bus stops and their frequency in the HSL area.
- park & ride parkings: For intermodal routing using the park & ride mode, Digitransit (more specifically the backend routing engine OpenTripPlanner) requires a parking to be tagged with a park_ride tag (either with the value yes or a mode such as bus, rail etc.). Parkings that are close to a station should be explicitly tagged with park_ride (yes/no/bus etc.). Currently, park_ride parkings need to be connected to the street network. These requirements aren't of importance to the Digitransit instances in Finland at the moment, as the HSL version is the only one using the "park & ride" mode and HSL has its own data source for park & ride sites. Other Finnish instances use the "kiss & ride" mode, i.e. the "drop off" mode, and hence don't need routing to car parks.
- missing house numbers: Some regions or municipalities publish house number lists as open data, which can be compared with OSM to identify missing addresses. regio-osm is a tool for cross-comparing such address lists. This validation option is also not available in Finland (at least at the moment).
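Returning to the three stop matching requirements listed above, they can be written down as a small check, together with the ref duplicate count. This is a sketch and not Digitransit's actual matching code. It assumes each OSM stop is a dictionary with `ref`, `lat`, `lon` and a precomputed `connected_to_way` flag (a made-up field standing in for the third requirement), and that each GTFS stop carries the standard `stop_code`, `stop_lat` and `stop_lon` fields from stops.txt. The sample values are invented.

```python
import math
from collections import Counter

MAX_DISTANCE_M = 250.0  # distance limit stated in the requirements


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    earth_radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def stop_match_problems(osm_stop, gtfs_stop):
    """Return the matching requirements that this OSM/GTFS stop pair fails."""
    problems = []
    if osm_stop.get("ref") != gtfs_stop.get("stop_code"):
        problems.append("ref does not match stop_code")
    distance = haversine_m(
        osm_stop["lat"], osm_stop["lon"], gtfs_stop["stop_lat"], gtfs_stop["stop_lon"]
    )
    if distance > MAX_DISTANCE_M:
        problems.append("stops are more than 250 m apart")
    if not osm_stop.get("connected_to_way", False):
        problems.append("OSM stop is not connected to a way")
    return problems


def duplicate_refs(osm_stops):
    """Return ref values that occur on more than one OSM stop, with their counts."""
    counts = Counter(stop["ref"] for stop in osm_stops if stop.get("ref"))
    return {ref: count for ref, count in counts.items() if count > 1}


# Invented sample data.
osm_stop = {"ref": "H1234", "lat": 60.1699, "lon": 24.9384, "connected_to_way": True}
gtfs_near = {"stop_code": "H1234", "stop_lat": 60.1700, "stop_lon": 24.9385}
gtfs_far = {"stop_code": "H1234", "stop_lat": 60.1800, "stop_lon": 24.9384}

print(stop_match_problems(osm_stop, gtfs_near))
print(stop_match_problems(osm_stop, gtfs_far))
print(duplicate_refs([{"ref": "H1"}, {"ref": "H1"}, {"ref": "H2"}]))
```

A duplicated ref does not fail any single pair check, which is why the duplicate count is kept as a separate step.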
Checklist for validation
Here's a shortlist of things (discussed above) to check out in order to ensure that the OSM data in your area is OK to be used with digitransit. This list is very much under construction and will contain more links for the different digitransit instances when ready...