Completeness

Road network completeness levels calculated in a 2017 research article by Barrington-Leigh and Millard-Ball, which also finds OpenStreetMap overall to be ~83% complete

Measuring completeness has been an ongoing challenge for the OpenStreetMap community.

Completeness Metrics

The various completeness studies outlined below have used a range of techniques to estimate completeness:

Comparison with External Data

  • A straight numeric comparison (counts, summation) of elements within a given geographic area compared with equivalent counts from an external source. This is usually the simplest metric to calculate, merely requiring that consistent areas are used for the estimates. This approach can also make use of simple summary statistics.
  • One-to-one comparison of undifferentiated features. Most often used for highway networks, in conjunction with buffering of features from one data source. This enables both a numeric comparison and an estimate of accuracy (assuming the external source is more accurate than OSM). At a minimum this requires an external geographical dataset of known accuracy. Such data are often not open, but may be made available to academic researchers.
  • One-to-one comparison of matched features. This requires conflation of the two datasets. It is technically harder, and is also vulnerable to false positives and false negatives (for instance when the two datasets disagree on whether a feature is a bar or a restaurant).
  • Population-based estimation. For many feature types it is possible to make crude estimates based on the number of features per 100,000 head of population. If population data are available (usually via Wikidata items on administrative units), such estimates can be derived wherever the administrative units are mapped.
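The buffered comparison of undifferentiated features described above can be sketched roughly as follows. This is a minimal illustration, not the method of any particular study: the function names, the 10 m sampling step and the tuple-based line format are all assumptions, and coordinates are assumed to be already projected to a planar system in metres. It estimates the share of reference road length that lies within a buffer distance of any OSM road.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to segment a-b in planar coordinates."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: fall back to point-to-point distance.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def buffered_match_fraction(reference_lines, osm_lines, buffer_m, step_m=10.0):
    """Share of reference length within buffer_m of any OSM segment.

    Each line is a list of (x, y) tuples in metres (hypothetical format).
    Reference lines are cut into pieces of at most step_m; a piece counts
    as matched when its midpoint is within the buffer.
    """
    osm_segments = [(a, b) for line in osm_lines for a, b in zip(line, line[1:])]
    matched_length = 0.0
    total_length = 0.0
    for line in reference_lines:
        for a, b in zip(line, line[1:]):
            length = math.dist(a, b)
            pieces = max(1, math.ceil(length / step_m))
            piece_length = length / pieces
            for i in range(pieces):
                f = (i + 0.5) / pieces
                mid = (a[0] + f * (b[0] - a[0]), a[1] + f * (b[1] - a[1]))
                total_length += piece_length
                if any(point_segment_distance(mid, s, e) <= buffer_m
                       for s, e in osm_segments):
                    matched_length += piece_length
    return matched_length / total_length if total_length else None


# Two 100 m reference roads; OSM has only the first one, offset by 2 m.
reference = [[(0, 0), (100, 0)], [(0, 50), (100, 50)]]
osm = [[(0, 2), (100, 2)]]
print(buffered_match_fraction(reference, osm, buffer_m=5.0))  # 0.5
```

The brute-force search over all OSM segments is only practical for small areas; a real analysis would normally use a spatial index and a geometry library. The result is also only as good as the reference data, which is why the accuracy of the external source matters.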

Internal comparisons of OSM data

The ability to use OSM data itself to bootstrap completeness estimates becomes more important for areas where large volumes of current statistical information are lacking, or for features which are not important in official data.

  • Cross-locational comparison. Once locations are known to be well-mapped it may be possible to derive metrics for the likely number of a given feature at a given level of economic development, which can then be used as comparative estimators for other places. To be of most use, it would help to have groupings of similar locations (for instance, European cities with populations between 250,000 and 500,000 are likely to be largely comparable; US and Canadian cities of similar size may or may not be, and so on).
  • Feature accumulation. In well-mapped places it is predicted that the addition of new features in a given class will tail off, trending towards an asymptote representing the likely total number of features in the location. Asymptotes do need to be checked against sensible bounds (a poorly mapped city may show an apparently saturated curve simply because its active mappers moved away).
  • Feature density. A crude metric, used in the past, is the number of nodes per unit area. This is probably too sensitive to the vagaries of individual mapping styles and of imported data (often over-noded), but it can be valuable for identifying under-mapped locations within an area subject to the same mapping influences.
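As a rough illustration of the feature accumulation idea, the sketch below extrapolates an asymptote from cumulative feature counts. It uses Aitken's delta-squared extrapolation, which is a technique chosen here for simplicity and not one named by the studies on this page. It assumes the counts are taken at equally spaced times and that the number of new features per interval shrinks roughly geometrically. The function names and the `max_growth_factor` bound are hypothetical.

```python
def estimate_asymptote(counts, max_growth_factor=5.0):
    """Extrapolate the likely total number of features.

    counts: cumulative feature counts at equally spaced times (oldest first).
    Returns None when the last three counts show no sign of tailing off,
    or when the estimate exceeds max_growth_factor times the current count
    (an assumed sanity bound, not a value from any study).
    """
    if len(counts) < 3:
        return None
    x0, x1, x2 = counts[-3:]
    d1, d2 = x1 - x0, x2 - x1  # features added in the last two intervals
    if d1 < 0 or d2 < 0:
        return None  # net deletions: the model does not apply
    if d2 == 0:
        # Flat curve. This may mean saturation, or simply that the
        # active mappers moved away, so treat the result with caution.
        return float(x2)
    if d2 >= d1:
        return None  # still growing steadily or accelerating
    estimate = x2 + d2 * d2 / (d1 - d2)
    if estimate > max_growth_factor * x2:
        return None  # implausibly far above the current count
    return estimate


def estimated_completeness(counts):
    """Current count divided by the extrapolated asymptote, or None."""
    total = estimate_asymptote(counts)
    if not total:
        return None
    return counts[-1] / total


# Yearly cumulative counts of one feature class in a hypothetical town.
yearly_counts = [0, 500, 750]
print(estimate_asymptote(yearly_counts))      # 1000.0
print(estimated_completeness(yearly_counts))  # 0.75
```

Using only the last three counts makes the estimate sensitive to noise and to one-off imports. Fitting a saturating curve to the full history would be more robust, at the cost of more machinery.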

List of completeness resources

List maps / displays / articles / studies here:

UK

Switzerland

Brazil

The map is never complete

Is the map complete? In true wiki style, the answer is that the map can never be complete. There will always be more details we can add to the map in any particular area. However, levels of completeness do vary widely across different areas of the map, at all scales. Some countries and provinces might have complete landcover, waterways, etc. mapped, while others might lack such features. In one town mappers may have collected all the fine details of every shop, with addresses, opening hours, wheelchair accessibility information, etc. (see Completeness/example). Meanwhile, neighbouring towns may just have a basic road network. One school might have all its rooms, doors and paths mapped, while a nearby school might only be mapped as a node. In some parts of the world the map is still a blank slate.

OpenStreetMap is a project involving mass collaboration. We need large numbers of people to get involved to help us create a free map of the world. If an area is not complete, it's because we haven't yet managed to get anybody involved in mapping that area. We'd love to attract local people who will enjoy mapping their neighbourhood and taking ownership of their map. Existing OSMers can also fill in the blanks by travelling to those areas. An incomplete map is just an opportunity waiting for somebody to go and do some mapping!

Measuring completeness

Many people are curious about how the project is progressing. Other people are interested in using OpenStreetMap, and need to assess how complete the map is. Within the project we're also keen on figuring out where the juiciest mapping opportunities remain. For these reasons it's important to try to measure completeness, establishing metrics to enable us to track progress and perhaps plot a map of how complete areas are.

License issues

OpenStreetMap has to maintain a strict clean-room approach to the input of data, and there may even be issues around the use of copyrighted or licensed map data sets for completeness analysis. This makes sense if you consider that any completeness map could be used by the mapping community, steering them towards areas where OSM data is missing. If we can see that an individual road is missing or wrong, then this starts to feel like copying. Zoom out a bit, and maybe it feels more acceptable, but this is exactly the kind of shades-of-grey legal issue which we like to steer clear of by avoiding proprietary data sources entirely.

So we shouldn't really use existing maps to test completeness of our maps. That leaves us with a rather tricky challenge. How can we test completeness? In the above links you'll see that some people have turned a blind eye to some extent, whilst others have come up with rather clever calculations of completeness metrics using only OSM itself, or other open data sources.


See also