Import/LINZ Topo50 Continuation
LINZ Data Import | |
---|---|
Author: | Kylenz |
License: | MIT License |
Platform: | Web |
Status: | Active |
Version: | 1.0.0 (2021-03-09) |
Language: | multiple languages |
Website: | https://osm-nz.github.io/RapiD |
Source code: | osm-nz/linz-address-import GitHub |
Programming language: | TypeScript |

Modification of RapiD to compare and update OSM data based on LINZ data.
Background
Data from LINZ's Topo50 maps was imported into OSM between 2009 and 2016. Not all the datasets were imported during this time. This page documents the process used to import data since 2021. The main wiki page contains details about the tagging and the current status.
Source data & source code
- Source code for the backend (data processing part): osm-nz/linz-address-import GitHub and osm-nz/place-name-conflation GitHub.
- Source code for the frontend (fork of RapiD) is available from osm-nz/RapiD GitHub.
This project is just a small modification of the LINZ Address Import system. Most of the code is the same.
How it works
Every imported feature has the tag ref:linz:topo50_id=* or ref:linz:place_id=*, which is the unique UFID used by LINZ. This allows the data to be easily conflated, just like NZ Street Addresses. This works as follows:
- A script runs daily during the import. It downloads the OSM Planet file (Geofabrik lets you download just Oceania + Antarctica).
- Every occurrence of the tags ref:linz:topo50_id=* and ref:linz:place_id=* is extracted from the OSM Planet file.
- The list of topo50_ids to ignore is downloaded from the Google Sheet.
- Each incomplete LINZ layer is processed: the features that are neither in OSM nor in the Google Sheet are converted to GeoJSON.
- Each GeoJSON file is split up into geographic regions depending on its size:
  - Datasets with very few features are crudely split into 8 large areas (roughly equivalent to NZ Regions).
  - Datasets with a moderate number of features are split into 33 areas according to this map (roughly the size of NZ Districts).
  - Datasets with a large number of features are split into 'sectors', such as K15. The mainland is divided into 26 columns (A-Z) and 26 rows (1-26). Sectors span roughly 0.5 degrees of latitude and 0.5 degrees of longitude.
- The segmented geojson files are uploaded to the CDN, along with the geojson files from the LINZ Address Import.
- The list of datasets that were uploaded can be seen in the fork of RapiD and the JOSM download page.
- When you select a dataset, it becomes 'locked' for an hour.
- If you upload some or all of that dataset, it becomes 'locked' until the next daily conflation (step 1).
- If you use RapiD and click 'Ignore this feature', the feature gets added to the Google Sheet from step 3. Otherwise you would be prompted to add that feature forever.
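The core of the daily conflation (extract the IDs already in OSM, subtract the ignore list, and export what remains) can be sketched as a set difference. This is only a minimal illustration: the type names, the `ufid` field, and the function `findFeaturesToImport` are hypothetical and are not taken from the real linz-address-import code.

```typescript
// Hypothetical shapes for illustration; the real pipeline's types may differ.
interface LinzFeature {
  ufid: string; // LINZ's unique ID, stored in OSM as ref:linz:topo50_id
  geometry: { type: string; coordinates: unknown };
  tags: Record<string, string>;
}

interface GeoJsonFeature {
  type: "Feature";
  id: string;
  geometry: { type: string; coordinates: unknown };
  properties: Record<string, string>;
}

interface GeoJsonFeatureCollection {
  type: "FeatureCollection";
  features: GeoJsonFeature[];
}

// Keep only the LINZ features that are neither in OSM nor on the ignore list,
// and convert them to GeoJSON features carrying the ref tag.
function findFeaturesToImport(
  layer: LinzFeature[],
  idsInOsm: ReadonlySet<string>,
  idsToIgnore: ReadonlySet<string>,
): GeoJsonFeatureCollection {
  const features = layer
    .filter((f) => !idsInOsm.has(f.ufid) && !idsToIgnore.has(f.ufid))
    .map((f): GeoJsonFeature => ({
      type: "Feature",
      id: f.ufid,
      geometry: f.geometry,
      properties: { ...f.tags, "ref:linz:topo50_id": f.ufid },
    }));
  return { type: "FeatureCollection", features };
}
```

Using sets for both ID lists keeps each lookup constant-time, which matters when the list of IDs comes from a whole planet extract.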
This process is a small part of the pipeline for LINZ Addresses. This page has a flowchart which shows the entire system.
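The sector naming used for large datasets (for example K15) could be computed along the following lines. This is a sketch under stated assumptions: the grid origin (`ORIGIN_LON`, `ORIGIN_LAT`), the exact cell size, and the direction in which rows are numbered are guesses for illustration; only the 26 x 26 layout and the approximate 0.5 degree cell size come from the description above.

```typescript
// ASSUMPTIONS: the origin below (north-west corner of the grid) and the
// north-to-south row numbering are hypothetical, not taken from the real code.
const ORIGIN_LON = 166;
const ORIGIN_LAT = -34;
const CELL_DEGREES = 0.5;
const GRID_SIZE = 26; // 26 columns (A-Z) and 26 rows (1-26)

// Returns a sector name such as "K15", or undefined outside the grid.
function sectorFor(lat: number, lon: number): string | undefined {
  const column = Math.floor((lon - ORIGIN_LON) / CELL_DEGREES);
  const row = Math.floor((ORIGIN_LAT - lat) / CELL_DEGREES);
  if (column < 0 || column >= GRID_SIZE || row < 0 || row >= GRID_SIZE) {
    return undefined;
  }
  const letter = String.fromCharCode("A".charCodeAt(0) + column);
  return `${letter}${row + 1}`;
}
```

Features that fall outside the grid (offshore islands, for instance) return `undefined` here and would need separate handling.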
Continuing partially completed layers
Some layers were partially imported between 2009 and 2016, but without the ref:linz:topo50_id=* tag. This makes conflation more difficult.
However, it is possible to continue these layers. This is still a work in progress.
Method A:
- Use overpass-turbo to identify which parts of the country are already imported, and define these areas as bounding boxes (bboxes).
- Update the code for that layer to skip features within those bboxes.
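Method A amounts to a simple bounding-box filter. A minimal sketch, assuming point features and hypothetical `BBox` and `PointFeature` types (the real layer code may represent both differently):

```typescript
// Hypothetical types; the bbox values themselves would be chosen by hand
// after inspecting the existing data in overpass-turbo.
interface BBox {
  minLat: number;
  minLon: number;
  maxLat: number;
  maxLon: number;
}

interface PointFeature {
  id: string;
  lat: number;
  lon: number;
}

function isInsideBBox(feature: PointFeature, bbox: BBox): boolean {
  return (
    feature.lat >= bbox.minLat &&
    feature.lat <= bbox.maxLat &&
    feature.lon >= bbox.minLon &&
    feature.lon <= bbox.maxLon
  );
}

// Drop every feature that lies inside an area that was already imported.
function skipAlreadyImported(
  features: PointFeature[],
  importedAreas: BBox[],
): PointFeature[] {
  return features.filter(
    (feature) => !importedAreas.some((bbox) => isInsideBBox(feature, bbox)),
  );
}
```

This sketch does not handle bounding boxes that cross the antimeridian.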
Method B:
- The data that is already in OSM is extracted using overpass-turbo, and downloaded as geojson.
- We loop through every feature in OSM, and find the closest feature in the LINZ data.
- If the nearest LINZ feature is within 3 metres of the OSM feature, we add the tag ref:linz:topo50_id=* to the OSM feature. This is done in bulk using Level0 (exact steps TBC).
- The rule above is more complicated for ways, areas, and multipolygons: we check whether 80% of the nodes in OSM are within 3 metres of a node in that LINZ feature.
- The next day, the conflation process will pick up the existing features in OSM, since they now have the ref:linz:topo50_id=* tag.
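The matching rules in Method B can be sketched as follows. The 3 metre and 80% thresholds come from the steps above; everything else (the `LatLon` type, the function names, and the use of the haversine formula for distance) is an assumption for illustration, since the exact implementation is still a work in progress.

```typescript
interface LatLon {
  lat: number;
  lon: number;
}

const EARTH_RADIUS_METRES = 6371000; // mean Earth radius

// Great-circle distance between two points, using the haversine formula.
function distanceMetres(a: LatLon, b: LatLon): number {
  const toRad = (degrees: number): number => (degrees * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_METRES * Math.asin(Math.sqrt(h));
}

// Nodes: the OSM node matches if the LINZ node is within maxMetres.
function pointsMatch(osm: LatLon, linz: LatLon, maxMetres = 3): boolean {
  return distanceMetres(osm, linz) <= maxMetres;
}

// Ways and areas: match if at least minShare of the OSM nodes have
// some LINZ node within maxMetres.
function waysMatch(
  osmNodes: LatLon[],
  linzNodes: LatLon[],
  maxMetres = 3,
  minShare = 0.8,
): boolean {
  if (osmNodes.length === 0) return false;
  const closeNodes = osmNodes.filter((osmNode) =>
    linzNodes.some((linzNode) => distanceMetres(osmNode, linzNode) <= maxMetres),
  ).length;
  return closeNodes / osmNodes.length >= minShare;
}
```

Comparing every OSM node against every LINZ node is quadratic, so a real run over a whole layer would probably need a spatial index to find candidate features first.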
How do I contribute?
The tool is available here, and anyone can import data. If you prefer using JOSM, you can download osmChange files from here (however, this is not the recommended option).
Potential issues
This table will be updated as the project progresses.
Issue | Mitigation |
---|---|
Duplicate data being imported | The fork of RapiD has an added feature to prevent duplicate features being imported, based on the ref:linz:topo50_id=* tag. This conflation happens in real time, in the browser. |
Multiple people editing the same dataset at the same time | Users will be presented with a warning if someone else is/was editing that dataset in the last hour |
Duplicate nodes when importing data that abuts existing features | RapiD will intelligently re-use nodes that are already in OSM. If this is not good enough, iD#8671 will make it easy to join abutting ways. |
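Imported rivers/roads are disconnected from existing features | Same as the row above: RapiD re-uses nodes that are already in OSM, and iD#8671 will make it easy to join abutting ways. |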
Imported rivers cross roads | RapiD's validator will warn you about this |
LINZ's data uses far too many nodes at corners | We use the Douglas-Peucker algorithm to simplify the geometry during processing. The original import did not do this, so if a way imported after 2020 abuts a way imported before 2020, there may be a gap where the ways don't abut. |
Ways with over 2000 nodes break OSM | If an Area has >2000 nodes, it gets split into a MultiPolygon with multiple outer ways, each with at most 495 nodes. If a MultiPolygon has >2000 nodes in one of its rings, that ring gets split into segments with up to 495 nodes each. |
LINZ's data is out of date | This hasn't been an issue yet, but mappers can press Ctrl+B to cycle through LINZ Aerial Imagery (2017), Maxar (2021), and the LINZ Topo50 map. |
No aerial imagery available for parts of the Ross Dependency | You need to use the standard OSM-Carto tileserver as your background imagery, and reference a separate map like LINZ's Ant50 series. |
Merging in new features destroys the OSM object's history | Fixed in iD#8708 |
Working with hex colours in our custom iD presets is confusing | Fixed in iD#8782 |
Duplicate hydrographic data due to overlapping charts | We only consider data from the most detailed chart available for that area. Features that cross multiple charts will be flagged and manually merged in RapiD. |
Hydrographic data crosses the antimeridian | We download the OSM planet extract in two chunks, west and east of the antimeridian, and we split all datasets the same way. |
Some lines and areas cross the antimeridian | For lines, we will use type=multilinestring. For areas, we will use a type=multipolygon with closure_segment=yes on the virtual boundaries |
A small number of hydrographic features reference the legend of the nautical chart. These legends are not available from LDS. | We will still import these features, with the tag description=see XXXXXX.txt. If these descriptions are made available, we can easily add them to the features. |
The seamark tagging schema is very complicated for mappers | We have created our own iD presets and rendering styles for the most common seamark tags |
Some obvious data is missing (e.g. fairways, ski access lanes, coast guard stations, surf-life-saving bases, patrolled beaches) | This data is managed by the local harbourmaster, and isn't included on nautical charts. We will create iD presets to make it easy to map these features. |
LINZ's Topo50 data generally does not associate topographic features with names. | Names are downloaded as a separate layer from the NZGB dataset. This means there will be two layers in the tool (e.g. 'Peaks' and 'Named Peaks') |
type=multilinestring is not a first-class data type in OSM and is not supported by any known software. | No solution; there is no other way to represent a discontiguous linear feature. |
MultiPoints (site relations) are not a first-class data type in OSM and aren't supported by the planet-extraction software we use. | No current solution; these features are skipped by the conflation tool (e.g. Redwood Station). |
Hydrographic data in some areas is completely missing | This issue is unresolved and still under investigation. |
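
The 2000-node mitigation in the table above can be illustrated with a small splitting routine. This is a sketch, not the project's actual code: the function name `splitRing` is hypothetical, and the choice to make consecutive segments share their end node (so that the segments still join up into a ring) is an assumption. The limits of 2000 and 495 nodes come from the table.

```typescript
const MAX_NODES_PER_WAY = 2000; // a single OSM way may not exceed this
const MAX_NODES_PER_SEGMENT = 495; // segment size quoted in the table above

// Splits a ring (or any long node list) into segments of at most
// segmentSize nodes. Each segment starts with the last node of the
// previous segment, so the segments remain connected.
function splitRing<T>(ring: T[], segmentSize = MAX_NODES_PER_SEGMENT): T[][] {
  if (segmentSize < 2) {
    throw new Error("segmentSize must be at least 2");
  }
  if (ring.length <= MAX_NODES_PER_WAY) {
    return [ring]; // small enough to stay as a single way
  }
  const segments: T[][] = [];
  // Advance by segmentSize - 1 so the boundary node is repeated.
  for (let start = 0; start < ring.length - 1; start += segmentSize - 1) {
    segments.push(ring.slice(start, start + segmentSize));
  }
  return segments;
}
```

Each resulting segment would become one outer way of the multipolygon relation.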