User:B1tw153/Missing Alaskan Peaks Import Plan
Missing Alaskan Peaks is an import of records in the Summit class of the GNIS dataset, a database covering natural and other features in the United States. As of 2023-12-12, the import is at the planning stage.
Goals
The goal of this import is to add the more than 1700 GNIS Summit records in Alaska that are not currently present in OSM.
Schedule
There is no urgency to this import. It will proceed only if and when there is community approval for the import.
Import Data
Background
Data source site: https://www.usgs.gov/us-board-on-geographic-names/download-gnis-data
Data license: Public Domain (as described in the data source site)
Type of license (if applicable): The SPDX License List does not include a generic identifier for US Government works in the public domain.
Link to permission (if required): N/A
OSM attribution (if required): Not required. However, all entities will include a gnis:feature_id=* tag to identify that the data was sourced from GNIS.
ODbL Compliance verified: yes
OSM Data Files
The original source data file is the Domestic Names text file from GNIS released on 2023-12-01. This file contains names for natural and other features extracted from the Geographic Names Information System (GNIS), the US Federal Government's repository of official geographic names.
From the original source data file, I have extracted the records with "Summit" in the "feature_class" column and "Alaska" in the "state_name" column into a file named Alaska_Summit_20231201.txt.
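The extraction step can be sketched in Python as below. This is an illustration only, not the script actually used: it assumes the Domestic Names file is pipe-delimited text with a header row and no field quoting, and the function names extract_summits and write_extract are hypothetical. The column names feature_class, state_name, feature_id, and feature_name come from the GNIS file as described on this page.

```python
from __future__ import annotations

import csv
import io
from typing import Iterable, TextIO


def extract_summits(source: TextIO, state: str = "Alaska") -> list[dict[str, str]]:
    """Keep only rows whose feature_class is "Summit" and state_name is `state`.

    Assumes a pipe-delimited file whose first line is a header row. QUOTE_NONE
    makes quote characters inside names be read literally.
    """
    reader = csv.DictReader(source, delimiter="|", quoting=csv.QUOTE_NONE)
    return [
        row
        for row in reader
        if row.get("feature_class") == "Summit" and row.get("state_name") == state
    ]


def write_extract(rows: list[dict[str, str]], dest: TextIO, fieldnames: Iterable[str]) -> None:
    """Write the selected columns back out in the same pipe-delimited layout."""
    fields = list(fieldnames)
    dest.write("|".join(fields) + "\n")
    for row in rows:
        dest.write("|".join(row.get(field, "") for field in fields) + "\n")


if __name__ == "__main__":
    # Made-up rows; a real run would open the Domestic Names file instead.
    sample = io.StringIO(
        "feature_id|feature_name|feature_class|state_name\n"
        "1|Example Peak|Summit|Alaska\n"
        "2|Example Lake|Lake|Alaska\n"
        "3|Example Hill|Summit|Montana\n"
    )
    print([r["feature_name"] for r in extract_summits(sample)])  # ['Example Peak']
```

Filtering on exact column values keeps the extract reproducible from the dated source file.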
Import Type
This is a one-time import. The data will be imported into OSM in a single changeset using JOSM.
Data Preparation
Data Reduction & Simplification
The source data will be run through an automated conflation process to remove records that match an existing feature in OSM, so that only Summits missing from OSM remain. This automated conflation process has already been used for several other tasks, including the manual addition of GNIS Summit records to the remaining 49 States and US Territories.
Tagging Plans
The source GNIS records are all in the Summit Feature Class. This Feature Class in GNIS includes the summits of mountains and hills, and a small number of mountain ranges and groups of hills.
By default, all nodes to be imported will be tagged with natural=peak unless the name of the feature suggests otherwise. If the name of the feature ends in "Hill", the node will be tagged with natural=hill. If the name ends in "Mountains", the node will be tagged with natural=mountain_range. If the name ends in "Hills", the node will be tagged with natural=hills.
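The name-based rules can be expressed as a small lookup. This is a sketch, not the actual JOSM procedure; it assumes that "ends in" means the last whole word of the name, so a name such as "Churchill" stays natural=peak. The names SUFFIX_TAGS and natural_tag are hypothetical.

```python
# Last word of the GNIS feature name -> natural=* value, per the tagging rules.
SUFFIX_TAGS = {
    "Hill": "hill",
    "Hills": "hills",
    "Mountains": "mountain_range",
}


def natural_tag(feature_name: str) -> str:
    """Return the natural=* value for a GNIS Summit name (default: peak).

    Compares whole last words (an assumption of this sketch), so "Foothill"
    or "Churchill" do not match "Hill" and fall back to peak.
    """
    words = feature_name.split()
    last_word = words[-1] if words else ""
    return SUFFIX_TAGS.get(last_word, "peak")
```

Because the comparison is on exact words, "Hill" and "Hills" never collide, and a singular "Mountain" keeps the natural=peak default.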
All nodes will be tagged with name=* with the value of the "feature_name" field supplied by GNIS. GNIS is the authoritative source for names for these features in the US.
All nodes will be tagged with gnis:feature_id=* with the value of the "feature_id" field supplied by GNIS. This will allow us to refer back to the source GNIS record to verify or update information in OSM in the future.
All nodes will be tagged with gnis:reviewed=no to indicate that they were imported without a manual review. This tag may be removed from individual nodes by mappers in the future after they have reviewed them.
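A minimal sketch of the full tag set for one imported node, assuming each GNIS record is available as a dictionary keyed by the GNIS column names. The helpers build_tags and mark_reviewed are hypothetical illustrations of the tagging plan, not part of any existing tool.

```python
from __future__ import annotations


def build_tags(record: dict[str, str], natural: str = "peak") -> dict[str, str]:
    """Tags for one imported node, taken from a GNIS Summit record.

    `record` is assumed to carry the GNIS "feature_name" and "feature_id"
    columns; `natural` defaults to peak and may be overridden by name rules.
    """
    return {
        "natural": natural,
        "name": record["feature_name"].strip(),
        "gnis:feature_id": record["feature_id"].strip(),
        "gnis:reviewed": "no",  # marks the node as imported without manual review
    }


def mark_reviewed(tags: dict[str, str]) -> dict[str, str]:
    """Copy of `tags` without gnis:reviewed, as a mapper would leave it after review."""
    return {key: value for key, value in tags.items() if key != "gnis:reviewed"}
```

Keeping gnis:feature_id verbatim is what allows a node to be traced back to its source record later.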
Changeset Tags
Key | Value |
---|---|
comment | GNIS Summit records imported from the 2023-12-01 GNIS data file, processed for automated conflation by gnis-matching-bot/0.1, and transformed from NAD83(1986) to WGS84. |
import | yes |
source | USGS GNIS Domestic Names |
source:url | https://www.usgs.gov/us-board-on-geographic-names/download-gnis-data |
source:date | 2023-12-01 |
import:page | link to this wiki page |
source:license | Public Domain |
Data Transformation
The Alaska_Summit_20231201.txt file is used as input to the gnis-matching-bot/0.1 with parameters to identify only non-matching records. The output of this process is an OsmChange XML file containing one node tagged with natural=peak, name=*, gnis:feature_id=*, and gnis:reviewed=no for each Summit record where the tool did not find a matching feature in OSM with a similar name, tags, or location.
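For readers unfamiliar with the OsmChange format, the sketch below shows the general shape of a file that only creates new tagged nodes. It does not reproduce the output of gnis-matching-bot/0.1; the function to_osmchange and the default generator string are hypothetical. New objects get negative placeholder ids, and coordinates are written to 7 decimal places, the precision OSM stores.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from typing import Iterable


def to_osmchange(nodes: Iterable[tuple[float, float, dict[str, str]]],
                 generator: str = "osmchange-sketch") -> str:
    """Serialize new nodes as an OsmChange <create> block.

    Each node is (lat, lon, tags). New objects get negative placeholder ids
    (-1, -2, ...), marking them as not yet uploaded to OSM.
    """
    root = ET.Element("osmChange", version="0.6", generator=generator)
    create = ET.SubElement(root, "create")
    for index, (lat, lon, tags) in enumerate(nodes, start=1):
        node = ET.SubElement(create, "node", id=str(-index),
                             lat=f"{lat:.7f}", lon=f"{lon:.7f}")
        for key, value in sorted(tags.items()):
            ET.SubElement(node, "tag", k=key, v=value)
    return ET.tostring(root, encoding="unicode")
```

Only a <create> section is emitted, which mirrors the plan's statement that no existing OSM features are modified.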
The OsmChange XML file will be transformed in QGIS, using the NADCON 5 datum transformation, from the NAD83(1986) coordinate system used by GNIS to the WGS84 coordinate system used by OSM.
Data Transformation Results
The Alaska_Summit.osc file as processed by the gnis-matching-bot/0.1.
The Alaska_Summit_WGS84.geojson file as transformed by QGIS.
Data Merge Workflow
Team Approach
I will be doing the preprocessing and import solo.
References
I have already used this same process, including the automated conflation, to manually add the missing Summit records to all of the other 49 States and US Territories. That work involved reviewing each Summit node individually against aerial imagery, USGS Topo maps, and USGS 3DEP Hillshades, and through it I have become familiar with the accuracy of the automated conflation and the limitations of the source data set.
Workflow
- The latest (2023-12-01) Domestic Names file will be downloaded from the USGS GNIS web site.
- An extract file containing only Summit records in Alaska will be produced.
- The extract file containing Alaskan Summits will be processed for automated conflation using the gnis-matching-bot/0.1.
- The output file from conflation will be transformed from the NAD83(1986) coordinate system to the WGS84 coordinate system using QGIS.
- The transformed file will be opened in JOSM to adjust the tags based on the names of the features.
- The transformed file will be manually reviewed to identify possible duplicate records from the GNIS source data.
- Twenty-five geographically dispersed nodes will be selected at random, and the import data will be compared to OSM data, aerial imagery, USGS Topo maps, and USGS 3DEP Hillshades to QA a sample of the data. The gnis:reviewed=no tags will be removed from the QA'd nodes.
- If the QA sample does not identify any substantial issues with the data, the data will be imported into OSM in a single changeset. This changeset will cover the entire state of Alaska and will add approximately 1700 nodes. The changeset will not modify any existing features in OSM.
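The manual review for possible duplicate GNIS records in the workflow could be assisted by flagging pairs of records that share a name and lie close together. This is a hypothetical helper, not part of the described process; the 2 km threshold and the names haversine_m and possible_duplicates are assumptions, and distances use the haversine formula on a spherical Earth.

```python
from __future__ import annotations

import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius for the haversine formula


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points (spherical model)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def possible_duplicates(nodes, max_distance_m=2000.0):
    """Pairs of feature_ids that share a name (case-insensitively) and lie
    within max_distance_m of each other. nodes: (feature_id, name, lat, lon)."""
    by_name: dict[str, list[tuple[str, str, float, float]]] = {}
    for node in nodes:
        by_name.setdefault(node[1].casefold(), []).append(node)
    pairs = []
    for group in by_name.values():
        for a, b in combinations(group, 2):
            if haversine_m(a[2], a[3], b[2], b[3]) <= max_distance_m:
                pairs.append((a[0], b[0]))
    return pairs
```

Grouping by name first keeps the pairwise distance checks small; flagged pairs would still be judged by hand.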
Since the changes do not modify existing features, the changeset may be reverted in whole or in part if it is determined that the imported data is not satisfactory.
Conflation
This import will use an automated conflation process based on the gnis-matching-bot/0.1 tool, which I developed with help from Matt Whilden (watmildon). The gnis-matching-bot uses the information in a GNIS record to query Overpass for OSM data in the area of the feature with attributes similar to those of the feature. The tool then parses this OSM data to find OSM features that match the GNIS Feature ID, name, tags common to the GNIS Feature Class, and geometry. It examines all data from one or more Overpass queries to find every potential match for a GNIS record.
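The kind of matching described above can be illustrated with the sketch below. It is not the gnis-matching-bot implementation: the Overpass query shape, the set of summit-like natural=* values, the 1000 m radius, the 0.85 name-similarity cutoff (via difflib), and the names overpass_query and is_match are all assumptions for illustration.

```python
from __future__ import annotations

from difflib import SequenceMatcher

# natural=* values treated as "tags common to the Summit class" (assumption).
SUMMIT_TAGS = {"peak", "hill", "hills", "mountain_range", "volcano"}


def overpass_query(lat: float, lon: float, radius_m: int = 1000) -> str:
    """Overpass QL for summit-like objects near a GNIS point, plus any nearby
    object that already carries a gnis:feature_id tag."""
    around = f"(around:{radius_m},{lat},{lon})"
    values = "|".join(sorted(SUMMIT_TAGS))
    return (
        "[out:json][timeout:60];\n"
        "(\n"
        f'  nwr["natural"~"^({values})$"]{around};\n'
        f'  nwr["gnis:feature_id"]{around};\n'
        ");\n"
        "out tags center;"
    )


def is_match(gnis: dict[str, str], osm_tags: dict[str, str], distance_m: float,
             max_distance_m: float = 1000.0, min_name_ratio: float = 0.85) -> bool:
    """Decide whether one OSM object returned by the query matches a GNIS record."""
    # 1. Explicit GNIS ID; OSM values may hold several IDs separated by ";".
    ids = {part.strip() for part in osm_tags.get("gnis:feature_id", "").split(";")}
    if gnis["feature_id"] in ids:
        return True
    # 2. Otherwise require summit-like tags, proximity, and a similar name.
    if osm_tags.get("natural") not in SUMMIT_TAGS or distance_m > max_distance_m:
        return False
    similarity = SequenceMatcher(
        None, osm_tags.get("name", "").casefold(), gnis["feature_name"].casefold()
    ).ratio()
    return similarity >= min_name_ratio
```

For this import, a GNIS record would be kept only if no returned object satisfies is_match, which is the "non-matching records" filter in spirit.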
QA
Twenty-five geographically dispersed nodes will be selected at random, and the import data will be compared to OSM data, aerial imagery, USGS Topo maps, and USGS 3DEP Hillshades to QA a sample of the data.
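One way to make a random sample "geographically dispersed" is to bin nodes into a latitude/longitude grid and draw at most one node per occupied cell. This sketch is an assumption about how the selection could be done, not the stated method; the 1-degree cell size, the seed parameter, and the name dispersed_sample are hypothetical.

```python
from __future__ import annotations

import math
import random
from collections import defaultdict


def dispersed_sample(nodes, k=25, cell_deg=1.0, seed=None):
    """Pick up to k nodes, at most one per lat/lon grid cell while cells last.

    nodes: (feature_id, lat, lon) tuples. cell_deg sets the grid size in degrees.
    """
    rng = random.Random(seed)
    cells = defaultdict(list)
    for node in nodes:
        _, lat, lon = node
        cells[(math.floor(lat / cell_deg), math.floor(lon / cell_deg))].append(node)
    keys = list(cells)
    rng.shuffle(keys)  # random order of occupied cells
    sample = [rng.choice(cells[key]) for key in keys[:k]]
    if len(sample) < k:
        # Fewer occupied cells than k: top up with random nodes not yet chosen.
        chosen = {node[0] for node in sample}
        rest = [node for node in nodes if node[0] not in chosen]
        sample += rng.sample(rest, min(k - len(sample), len(rest)))
    return sample
```

Passing a fixed seed makes the QA sample reproducible, so the same 25 nodes can be re-checked later.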
See also
The post to the community forum was sent on 2023-12-13 and can be found here