LINZ/Address Import
- About
- This page documents the original LINZ Address Import from 2017. The more recent import is documented at Import/New Zealand Street Addresses (2021)
- Captured time
- 2017
Initial import complete
The import described below was completed in June 2018, with nearly two million addresses imported. (Note that "complete" doesn't mean that every address was imported, just that the process has ended.)
The import is summarised in this spreadsheet
The source data contained some sets of addresses with many addresses at exactly the same coordinate, e.g. apartments or flats. Coincident points mustn't be added to OSM, so either the points were manually distributed in space, or they were deleted and perhaps replaced with a single address, e.g. "1/3 A Street" to "100/3 A Street" replaced with "100 A Street".
The source data was split using the name of city/suburb or hamlet. During import, it was found that there are some hamlets with identical names in different parts of the country: Aramoana, Awatuna, Belmont, Blue Mountains, Clifton, Dalefield, Karamu, Kinloch, Longbush, Matahiwi, Muriwai, Ngahape, Otara, Owhiro, Purangi, Tokanui, Waikawau, Woodside. Any future import/update should consider this.
Source data
Data sourced from LINZ simple street address layer NZ Street Address
Structure of data is documented in the Data dictionary. Details are shown in the #Notes section at the bottom of this page.
Conversion and Tagging
The following subsets of addresses are to be processed differently:
address_type: Water | Road (initially will ignore water)
town_city + suburb_locality: Suburb+City | Locality | Town
We won't use the _ascii fields.
Node keys
in generated osmchange file
LINZ | OSM | Comment |
---|---|---|
address_id | LINZ:address_id or ref:linz:address_id=* | Explicit connection to source data, to be used for maintenance |
full_road_name | addr:street=* | E.g. "Open Map Street" |
full_address_number | addr:housenumber=* | E.g. "2" or "3A" or "4/56B". This follows NZ addressing conventions for unit number prefix. |
<coordinates> | <location> | Datum converted from NZGD2000 (EPSG:4167) to WGS84 (EPSG:4326) |
suburb_locality | addr:suburb=* or addr:hamlet=* | suburb when town_city also present, otherwise hamlet. |
town_city | addr:city=* | When present in source, suburb_locality is always present |
NOTE 1: the inclusion of addr:city, addr:hamlet, addr:suburb keys is mainly for the mapper to verify against the underlying map.
When this information is already present in e.g. place=* on an area (very likely in urban areas), the redundant information will be removed by the mapper doing the upload. The mapper may also choose to create or adjust a suburb or place boundary or POI manually.
NOTE 2: About 5k addresses are "ranged", i.e. they have an "address_number_high" in the source database, and a few of these also have unit numbers. The full address in the source data then looks like "4B/22-26 High Street". This import proposal uses only "address_number" and ignores "address_number_high" as redundant.
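The mapping in the table and notes above can be sketched in Python. This is only an illustration, not the script used for the import: the function names are hypothetical, and building the house number from unit_value, address_number and address_number_suffix (rather than taking full_address_number directly) is an assumption made here so that address_number_high is dropped as NOTE 2 describes.

```python
def housenumber(row):
    """Build addr:housenumber as [unit/]number[suffix], ignoring address_number_high.

    Assumption: the component fields are combined in the NZ "unit/number" form.
    """
    number = f"{row['address_number']}{row.get('address_number_suffix') or ''}"
    unit = row.get("unit_value")
    return f"{unit}/{number}" if unit else number


def linz_to_osm_tags(row):
    """Map one LINZ address record (a dict keyed by source column) to OSM tags."""
    tags = {
        "ref:linz:address_id": str(row["address_id"]),
        "addr:housenumber": housenumber(row),
        "addr:street": row["full_road_name"],
    }
    locality = row.get("suburb_locality")
    town = row.get("town_city")
    if town:
        # town_city present: the locality is tagged as a suburb of that city.
        tags["addr:city"] = town
        if locality:
            tags["addr:suburb"] = locality
    elif locality:
        # No town_city: the locality is tagged as a hamlet.
        tags["addr:hamlet"] = locality
    return tags
```

For a ranged record with unit "4B", number 22 and high number 26 on High Street, this sketch would produce addr:housenumber "4B/22".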
Changeset keys
Keys on each changeset:
- source=* https://data.linz.govt.nz/layer/3353-nz-street-address
- source:revision=* The linz dataset revision number, e.g. "43"
- attribution=* https://wiki.openstreetmap.org/wiki/Contributors#LINZ
- comment=* e.g. "LINZ addresses for <suburb> <city>"
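As an illustration, the changeset keys listed above could be assembled like this. The function names are hypothetical, and the one "key=value" per line output is an assumption about how a changeset tags file might be laid out, not a description of the files actually generated.

```python
LINZ_SOURCE_URL = "https://data.linz.govt.nz/layer/3353-nz-street-address"
LINZ_ATTRIBUTION = "https://wiki.openstreetmap.org/wiki/Contributors#LINZ"


def changeset_tags(revision, suburb, city=None):
    """Return the changeset keys for one batch as a dict."""
    place = " ".join(part for part in (suburb, city) if part)
    return {
        "source": LINZ_SOURCE_URL,
        "source:revision": str(revision),
        "attribution": LINZ_ATTRIBUTION,
        "comment": f"LINZ addresses for {place}",
    }


def format_tags(tags):
    """Serialise tags as one key=value pair per line (assumed file layout)."""
    return "\n".join(f"{key}={value}" for key, value in tags.items())
```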
Filtering
Identifying Duplicates
- Obtain all OSM items that contain addr:housenumber
- Find centroid of ways (buildings) to use as position for proximity testing.
- Generate table of node/way id, obj type, position, addr:street, addr:housenumber
- Convert LINZ positions to WGS84 SRID=4326 for comparison with OSM positions
- Match with LINZ data on proximity + housenumber (+ street if present).
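The matching steps above can be sketched as follows. This is a rough illustration only: the record layout (dicts with housenumber, street, lat, lon), the function names and the 50 m threshold are all assumptions, and the vertex average is a crude stand-in for a true polygon centroid.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 positions."""
    radius = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def way_centroid(coords):
    """Average the vertices of a way, dropping the repeated closing vertex."""
    pts = coords[:-1] if len(coords) > 1 and coords[0] == coords[-1] else coords
    lat = sum(p[0] for p in pts) / len(pts)
    lon = sum(p[1] for p in pts) / len(pts)
    return lat, lon


def find_duplicates(linz_rows, osm_items, max_dist_m=50.0):
    """Match on housenumber, then street (if OSM has one), then proximity."""
    by_number = {}
    for item in osm_items:
        by_number.setdefault(item["housenumber"].upper(), []).append(item)
    matches = []
    for row in linz_rows:
        for item in by_number.get(row["housenumber"].upper(), []):
            street = item.get("street")
            if street and street.lower() != row["street"].lower():
                continue  # street present in OSM but different
            dist = haversine_m(row["lat"], row["lon"], item["lat"], item["lon"])
            if dist <= max_dist_m:
                matches.append((row["address_id"], item["osm_type"], item["osm_id"], dist))
    return matches
```

Candidates with a matching number but a distance above the threshold are simply dropped here; in practice they would be kept aside for manual review.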
Other unusual cases involve relations, interpolations etc. Initially, relations can be ignored. Members that are points or polygons with addr:housenumber can be identified as duplicates by number and proximity to a LINZ address with the same number.
What to do with the duplicates?
Eventually, all items that are real duplicates would have LINZ id attributes added.
Nodes with only addr:housenumber that aren't part of a relation - add addr:street
Addresses that seem to be duplicates, but are far away from the LINZ point location would be reviewed by a person. E.g. sometimes houses get tagged with the correct number, but the wrong street name.
Polygons (i.e. buildings) still need to be discussed. EliotB's opinion is to add the address as a node, which applies to the location independent of what is built on it: the building can be demolished, but the address remains valid.
An idea of how many duplicates there might be:
# Get NZ items that contain addr:housenumber
wget -O nz_addr.osm "http://www.overpass-api.de/api/xapi_meta?*[addr:housenumber=*][bbox=157.5,-59.0,179.9,-25.5]"
spatialite_osm_raw -d nz_address-osm_raw.spatialite -o nz_addr.osm
Analysis: nodes 12727, ways 33009, relations 22
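A count of this kind can also be taken straight from the downloaded OSM XML file, without loading it into SpatiaLite. The sketch below uses only the Python standard library; the function name is hypothetical and it assumes the file is plain, uncompressed OSM XML.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_addressed(source):
    """Count nodes, ways and relations that carry an addr:housenumber tag.

    `source` is a file name or a binary file object containing OSM XML.
    """
    counts = Counter()
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in ("node", "way", "relation"):
            if any(tag.get("k") == "addr:housenumber" for tag in elem.findall("tag")):
                counts[elem.tag] += 1
            elem.clear()  # free the children of elements already counted
    return counts
```

Calling count_addressed("nz_addr.osm") returns a Counter keyed by "node", "way" and "relation".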
Non Duplicates
Approximately 2% of NZ addresses are already in OSM in some form (roughly 46K of 1.9 million, going by the counts above). The remaining 98% or so will be new.
In the first pass, potential duplicates will be identified, and saved as a separate dataset for later processing. The remaining non-duplicates will be uploaded in batches.
Batch membership will be determined by town_city & suburb_locality in common.
A quick delve into the data gives
- Water addresses 160, Road 1.9 million
- Localities (town_city = NULL) 1930 distinct. 1 to 2800 items per locality. 25 localities with > 1000 addresses
- town + suburb 1182 distinct, 258 of which have town=suburb. 645 have <1K addresses, 907 < 2K, 6 > 10K
Using the above distinction would give about 3100 separate batches.
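The batching rule can be sketched as a simple grouping on the (town_city, suburb_locality) pair. The function names are hypothetical, and skipping Water addresses by default reflects the "initially will ignore water" note rather than any confirmed behaviour of the import scripts.

```python
from collections import defaultdict


def batch_key(row):
    """Key a record by (town_city, suburb_locality); town_city is None for localities."""
    return (row.get("town_city") or None, row["suburb_locality"])


def make_batches(rows, include_water=False):
    """Group address records into one batch per distinct place."""
    batches = defaultdict(list)
    for row in rows:
        if row.get("address_type") == "Water" and not include_water:
            continue  # water addresses are left out of the first pass
        batches[batch_key(row)].append(row)
    return dict(batches)
```

Note that two distant hamlets sharing a name and having no town_city would land in the same batch under this key, which is the problem with identically named hamlets described at the top of this page.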
An osmchange file will be generated for each place. After review, each will be uploaded as a separate changeset.
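For illustration, a minimal osmChange document that creates new address nodes could be produced as below. This is a sketch under assumptions: the function name and generator string are made up, new nodes are given negative placeholder ids, and whether JOSM or the API accepts the result without further attributes (such as version or changeset) has not been verified here.

```python
import xml.etree.ElementTree as ET


def build_osmchange(addresses, generator="linz-import-sketch"):
    """Build an osmChange XML string from (lat, lon, tags) tuples."""
    root = ET.Element("osmChange", {"version": "0.6", "generator": generator})
    create = ET.SubElement(root, "create")
    for index, (lat, lon, tags) in enumerate(addresses, start=1):
        node = ET.SubElement(create, "node", {
            "id": str(-index),        # negative id marks an object not yet in OSM
            "lat": f"{lat:.7f}",
            "lon": f"{lon:.7f}",
        })
        for key, value in sorted(tags.items()):
            ET.SubElement(node, "tag", {"k": key, "v": value})
    return ET.tostring(root, encoding="unicode")
```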
Using JOSM to import a dataset
- Ensure that your import-specific OSM user ID is active. (Edit/Preferences..)
- Load the generated changeset (File/Open...) e.g. The_Place.osm
- Download the existing OSM map data (File/Download data...) and select 'Download as new layer'. *Don't use (File/Download in current view)*, because it downloads into the same layer as the local data. Alternatively, download into an existing OSM map layer. You may also want to enable an imagery layer, e.g. LINZ NZ Aerial Imagery.
- Check that the roads implied by the address data are present.
- For small villages or localities, consider adding (for instance) place=hamlet at the centre.
- (removed this step which was deletion of addr:suburb where suburb boundary exists)
- Check for any conflict e.g. Run the conflation plugin with OSM data as Reference, new layer as Subject.
- Get the contents of The_Place.changeset_tags into the clipboard.
- Upload the data (File/Upload Data; make sure the changeset layer is selected first). Go to [Tags of New Changeset] and paste the changeset tags by clicking the button with 3 plus signs.
Common problems/solutions
- Sometimes the LINZ data has multiple addresses at exactly the same location. This will result in a warning when you try to upload. To solve this, move one of the points a short distance away, then select all the points and use (Tools/Distribute Nodes) to spread them evenly along a line OR delete all but one of the points before upload.
- There are a few place names where there are two localities with the same name. This can cause confusion... Aramoana, Awatuna, Blue Mountains, Clifton, Dalefield, Karamu, Kinloch, Longbush, Muriwai, Ngahape, Otara, Owhiro, Purangi, Tokanui, Waikawau, Woodside
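The coincident-point problem in the first item above can also be detected, and roughly fixed, before the file is opened in JOSM. The sketch below is an assumption-laden illustration, not part of the original workflow: the names are hypothetical, points are dicts with lat and lon, and the step of 0.00002 degrees of longitude (somewhere around 1.5 to 2 m at New Zealand latitudes) is an arbitrary choice.

```python
from collections import defaultdict


def coincident_groups(points):
    """Return lists of points that share exactly the same (lat, lon)."""
    groups = defaultdict(list)
    for point in points:
        groups[(point["lat"], point["lon"])].append(point)
    return [group for group in groups.values() if len(group) > 1]


def distribute(group, step_deg=0.00002):
    """Spread a group of coincident points eastwards along a line.

    The first point keeps its position; each later one is shifted one more step.
    """
    return [dict(point, lon=point["lon"] + i * step_deg) for i, point in enumerate(group)]
```

This mimics the effect of JOSM's Distribute Nodes only loosely; the manual route lets the mapper place the line sensibly against the imagery.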
Maintenance
Record the version of the LINZ database used to generate each import.
Periodically retrieve a new version of the LINZ dataset, and obtain a list of additions, deletions and changes with respect to the previous set. The list would be checked manually against OSM (potentially a bot could do the checking; this can be investigated after the manual process has been trialled).
How to find new things entered on OSM side? Addressed items lacking ref:linz:address_id=* would be candidates.
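A minimal sketch of both maintenance checks follows. It assumes that each dataset revision is available as a list of dicts with address_id and change_id, and that a differing change_id signals a changed address (the data dictionary describes change_id as the identifier of the address version). The function names and the OSM item layout are hypothetical.

```python
def diff_revisions(old_rows, new_rows):
    """Return (added, deleted, changed) address_ids between two LINZ revisions."""
    old = {row["address_id"]: row for row in old_rows}
    new = {row["address_id"]: row for row in new_rows}
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    changed = sorted(
        address_id
        for address_id in old.keys() & new.keys()
        if old[address_id]["change_id"] != new[address_id]["change_id"]
    )
    return added, deleted, changed


def osm_only_candidates(osm_items):
    """Addressed OSM items lacking ref:linz:address_id, i.e. likely entered on the OSM side."""
    return [
        item for item in osm_items
        if "addr:housenumber" in item["tags"] and "ref:linz:address_id" not in item["tags"]
    ]
```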
Software Tools
Related Discussions
- Thread on imports list and maybe others.
- Thread in NZ OpenGIS group and maybe others.
- Real time chat
Notes
- See Addresses#Denmark and Addresses#Norway for examples of other countries with full scale address 'import'.
Details of source database
Name | Data Type | Length | Precision | Scale | Example | Description |
---|---|---|---|---|---|---|
address_id | integer | | 32 | 0 | 505588 | AIMS unique identifier for an address. |
change_id | integer | | 32 | 0 | 1304726 | AIMS unique identifier for the address version. |
address_type | varchar | 20 | | | Road | The type of address. Includes: Road and Water. |
unit_value | varchar | 70 | | | | Alpha numeric value for a unit |
address_number | integer | | 32 | 0 | 1 | Address number |
address_number_suffix | varchar | | | | A | Alpha numeric characters that may follow the address number. |
address_number_high | integer | | 32 | 0 | | High address number of a ranged address. |
water_route_name | varchar | 100 | | | | Name of the beach the water address relates to. Currently this contains the captured segment of coastline. This will be blank for ROAD addresses. |
water_name | varchar | 100 | | | | Water body the address relates to. This will be blank for ROAD addresses. |
suburb_locality | varchar | 80 | | | Dannemora | Suburb/Locality from the NZ Localities (NZ Fire Service owned dataset). |
town_city | varchar | 80 | | | Auckland | Town/City from the NZ Localities (NZ Fire Service owned dataset). |
full_address_number | varchar | 100 | | | 1A | All number components concatenated for an address. |
full_road_name | varchar | 250 | | | Joe Bloggs Road | All road name components concatenated for an address. This has been derived from the ‘Landonline: Roads’ Data and will move to using the new ‘Roads’ data tables when they become available. |
full_address | varchar | 400 | | | 1A Joe Bloggs Road, Dannemora, Auckland | All address components concatenated for an address. |
road_section_id | integer | | 32 | 0 | 199943 | Landonline Road Centreline ID (RCL_ID). |
gd2000_xcoord | numeric | | 12 | 8 | 174.9255518167 | NZGD2000 X-coordinate for the address in metres. |
gd2000_ycoord | numeric | | 12 | 8 | -36.9246773 | NZGD2000 Y-coordinate for the address in metres. |
shape | geometry | | | | <geometry> | Spatial geometry for the point in long/lat GD2000 EPSG 4167. |
ascii variants | | | | | | not going to use |