Durham County North Carolina Address Import
Goals
To add every missing address in Durham County to OpenStreetMap without creating duplicates, and to merge the addresses with existing features.
Schedule
The data has been converted to OSM XML format and duplicates of existing addresses in the OSM database have been removed. The data can be found here. The import is currently under way, using the OSM US Tasking Manager. Please see the project for more information.
Import Data
Background
Data source site: Durham Open Data
Data license: Open Database License
Type of license: Open Database License (ODbL)
Link to permission: N/A
ODbL Compliance verified: Yes.
OSM Data Files
Import Type
A one-time import that will be completed in many small uploads via the OSM US Tasking Manager.
Data Preparation
Data Reduction & Simplification
Tagging Plans
For the source tagging, "source:addr"="Durham Open Data" should be used on each address. "source:addr" is used rather than "source" because, once an address has been merged with a building, a plain "source" tag would suggest that the building itself came from Durham Open Data, which is not true. I will also use addr:housenumber, addr:street, addr:city, addr:state, and addr:postcode. addr:unit will be used on addresses that include a unit. addr:country will not be used. Some people argue that adding source tags to each address is unnecessary, but I believe that they help new mappers see where an address came from when they are editing it, so they can make better decisions about merging, editing, or deleting the address.
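As an illustration of the tagging plan above, here is a minimal Python sketch that builds the tag set for one address. The function name `address_tags`, the default state value "NC", and the sample values in the usage note are assumptions made for this example; they are not taken from the import data or from the Java program used for the import.

```python
def address_tags(housenumber, street, city, postcode, unit=None, state="NC"):
    """Build the tags planned for one imported address.

    The default state value "NC" is an assumption for this sketch.
    """
    tags = {
        "addr:housenumber": housenumber,
        "addr:street": street,
        "addr:city": city,
        "addr:state": state,
        "addr:postcode": postcode,
        # "source:addr" instead of "source", so that a merged building
        # is not attributed to Durham Open Data.
        "source:addr": "Durham Open Data",
    }
    # addr:unit is only added when the address actually has a unit.
    if unit:
        tags["addr:unit"] = unit
    # addr:country is intentionally never added.
    return tags
```

For example, `address_tags("101", "North Columbia Street", "Durham", "27701", unit="B")` (hypothetical values) returns a dictionary with the five addr:* tags, addr:unit, and source:addr, and no addr:country or plain source key.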
Changeset Tags
"source"="Durham Open Data", "source:website"="https://opendurham.nc.gov/explore/dataset/addresses/export/", "source:date"="July, 2018", and adequate comments such as "Imported addresses in Durham County. (upload 3/15)" will be used on each of the 15 changsets.
Data Transformation
All needed data transformation has been completed using a combination of JOSM, the opendata plugin, and a custom-made XML editing Java program.
- First, the data was downloaded in KML format from Durham Open Data.
- Second, the data was opened with JOSM using the opendata plugin.
- Then, the file was saved in OSM XML format without changing the tags.
- Every object (node, way, or relation) in Durham County tagged with both "addr:housenumber" and "addr:street" was then downloaded using the Overpass API and saved as well.
- The address editing program created by Leif Rasmussen was run with the new data and the existing data as input. The program corrected casing (CHAPEL HILL -> Chapel Hill), created addr:street ("streetDirection"="N", "streetName"="COLUMBIA", & "streetType"="ST" -> "addr:street"="North Columbia Street"), and removed duplicates of existing addresses by comparing the dataset to the existing addresses in the OSM database.
- The file was then opened and the data was cleaned up.
- The file was finally split into 15 manageable chunks and saved on Google Drive.
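The casing correction and addr:street construction described in the steps above can be sketched roughly as follows. This is a Python illustration, not the Java program that was actually used; the lookup tables `DIRECTIONS` and `STREET_TYPES` are deliberately small, assumed examples, and the function name `build_street` is hypothetical.

```python
# Assumed, incomplete lookup tables for this sketch only.
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}
STREET_TYPES = {
    "ST": "Street",
    "AVE": "Avenue",
    "RD": "Road",
    "DR": "Drive",
    "LN": "Lane",
}


def build_street(direction, name, street_type):
    """Combine the three source fields into one addr:street value."""
    direction = direction.strip()
    street_type = street_type.strip()
    parts = [
        # Expand known abbreviations, otherwise fall back to title case.
        DIRECTIONS.get(direction.upper(), direction.title()),
        name.strip().title(),
        STREET_TYPES.get(street_type.upper(), street_type.title()),
    ]
    # Skip empty fields, e.g. streets without a direction.
    return " ".join(part for part in parts if part)
```

With the example from the list, `build_street("N", "COLUMBIA", "ST")` gives "North Columbia Street". Plain title-casing is a simplification: names such as "MCDONALD" or "1ST" would need special handling, which is one reason a manual cleanup step is still needed.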
Data Transformation Results
Data Merge Workflow
Team Approach
There has been some discussion on the imports mailing list, which highlighted the opportunity to grow the local mapping community. The import is now using the tasking manager from OSM US (project 46). Anyone with enough experience is welcome to start mapping addresses!
Workflow
- Open a square from the OSM US Tasking Manager.
- Download the addresses in OSM format with duplicates of existing addresses already removed.
- Merge layers.
- Remove data not in square.
- Merge addresses with amenities and buildings manually.
- Upload to OSM server.
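In the workflow above, the "Remove data not in square" step is done by hand in the editor. Purely to illustrate what that step amounts to, here is a small Python sketch that keeps only the address nodes inside a task square. The node representation (dictionaries with "lat" and "lon" keys), the bounding-box order (south, west, north, east), and the function names are assumptions made for this example.

```python
def in_square(lat, lon, south, west, north, east):
    """Return True if the point lies inside the bounding box (edges included)."""
    return south <= lat <= north and west <= lon <= east


def keep_in_square(nodes, bbox):
    """Keep only the nodes that fall inside the task square.

    nodes: list of dicts with "lat" and "lon" keys (assumed format).
    bbox:  (south, west, north, east) in decimal degrees.
    """
    south, west, north, east = bbox
    return [
        node
        for node in nodes
        if in_square(node["lat"], node["lon"], south, west, north, east)
    ]
```

Nodes outside the square are left for whoever works on the neighbouring task, so that each address is uploaded only once.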
Conflation
My data transformation Java program automatically removes duplicates of existing addresses from the dataset so that only missing addresses are added. It accounts for casing, abbreviations, and other issues with existing data to provide the most accurate duplicate removal possible. Conflation should not be a major problem, except in cases where existing addresses contain incorrect information.
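The general idea of duplicate removal by normalized comparison can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the Java program itself: the abbreviation table is a small assumed sample, addresses are represented as plain tag dictionaries, and the names `normalize_street`, `address_key`, and `remove_duplicates` are hypothetical.

```python
import re

# Assumed, incomplete abbreviation table for this sketch only.
ABBREVIATIONS = {
    "n": "north", "s": "south", "e": "east", "w": "west",
    "st": "street", "ave": "avenue", "rd": "road", "dr": "drive",
}


def normalize_street(street):
    """Lowercase, drop periods and commas, and expand known abbreviations."""
    words = re.sub(r"[.,]", "", street).lower().split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def address_key(tags):
    """Comparison key: house number, normalized street, and unit."""
    return (
        tags.get("addr:housenumber", "").strip().lower(),
        normalize_street(tags.get("addr:street", "")),
        tags.get("addr:unit", "").strip().lower(),
    )


def remove_duplicates(new_addresses, existing_addresses):
    """Return only the new addresses whose key is not already in OSM."""
    existing_keys = {address_key(tags) for tags in existing_addresses}
    return [
        tags for tags in new_addresses
        if address_key(tags) not in existing_keys
    ]
```

Under this scheme, an existing "101" on "N Columbia St" matches an incoming "101" on "North Columbia Street", so the incoming address is dropped. A table like this one cannot tell "St" meaning "Street" from "St" meaning "Saint", and it cannot catch existing addresses whose information is simply wrong, which is where manual conflation is still needed.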