Import/Catalogue/Milan addresses import
About
This page describes the import of addresses into OpenStreetMap from data provided by the Municipality of Milan (Italy).
The Municipality of Milan released its complete address data for State of the Map 2018. More information is in the article "Arriva State of the Map a Milano, il Comune rilascia oltre 60.000 numeri civici come open data" (in Italian).
The import has been discussed in this Italian OSM mailing list thread. This wiki page is the result of consensus there.
Import Data
Background
Address format
House numbering follows the European scheme. An address is determined by its street name and housenumber; a housenumber is unique within its street.
Housenumbers can include:
- subordinates, noted with suffix letters (e.g. in "7a", subordinate "a" ); subordinates usually arise when a new house is built between existing houses with subsequent housenumbers
- extensions, noted with a slash "/" followed by an integer; most cases occur when a single entrance is shared by different buildings.
Legal
Data source site: https://dati.comune.milano.it/dataset/ds634-numeri-civici-coordinate
Data license: https://geoportale.comune.milano.it/sit/toponomastica/
Type of license: CC-BY-2.5-IT
Waiver: https://geoportale.comune.milano.it/sit/toponomastica/
Addendum to CC BY 2.5 IT Licence with respect to following datasets: “Numeri Civici”, “Toponimi (Viario)”, “Centroidi toponimi”.
In case of reuse of “Numeri Civici”, “Toponimi (Viario)”, “Centroidi toponimi” datasets, the attribution by OpenStreetMap and its users through http://wiki.openstreetmap.org/wiki/Contributors is sufficient to provide attribution to Comune di Milano (City of Milano) in a manner that is “reasonable to the medium or means” in accordance with Section 4(b) of the CC BY 2.5 IT license.
In case of reuse of “Numeri Civici”, “Toponimi (Viario)”, “Centroidi toponimi” datasets, OpenStreetMap’s method of providing references to the original dataset and original license terms through http://wiki.openstreetmap.org/wiki/Contributors satisfies the requirements of Section 4(b) of the CC BY 2.5 IT license. OpenStreetMap users satisfy the requirements of Section 4(b) of the CC BY 2.5 IT license by referencing http://wiki.openstreetmap.org/wiki/Contributors in accordance with OpenStreetMap’s attribution requirements. Comune di Milano (City of Milano) waives any limitation in Section 4(a) of the CC BY 2.5 IT license on OpenStreetMap and its users using effective technological measures on OpenStreetMap data with the understanding that the Open Database License OdBL 1.0 requires open access or parallel distribution of OpenStreetMap data. In every case, this waiver has no impact on Comune di Milano (City of Milano)’s right or ability to distribute or license the above-mentioned datasets on any terms it wishes.
ODbL Compliance verified: yes
Attribution in the Contributors page is fine for data owner as stated above. It will be enough to add the following statement in the Contributors page: “Contains data provided by Comune di Milano released under CC-BY-2.5 IT license.”
Source data
The dataset is identified by the string "ds634" and was downloaded from the "Numeri civici con coordinate geografiche" page of the Municipality of Milan website.
Import Type
The dataset will be cleaned and converted to OSM format with OpenRefine; it will then be conflated with OSM Conflator and published in a shared audit map prior to upload.
Data Preparation
The operations applied to the original dataset are listed in this operations file. Because the dataset is large (60k nodes), the import will be split on the "MUNICIPIO" dataset field, which matches the OSM admin_level=10 boundaries.
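The split on the "MUNICIPIO" field can be illustrated with a minimal Python sketch. It is not part of the OpenRefine operations file; the function name, the comma delimiter and the sample rows are assumptions made for illustration only.

```python
import csv
import io
from collections import defaultdict


def split_by_municipio(csv_file):
    """Group CSV rows by the value of their MUNICIPIO field."""
    groups = defaultdict(list)
    # Assumption: comma-separated file with a header row.
    for row in csv.DictReader(csv_file):
        groups[row["MUNICIPIO"].strip()].append(row)
    return dict(groups)


# Made-up sample; the real file has more columns and about 60k rows.
sample = io.StringIO(
    "IDMASTER,NUMEROCOMPLETO,MUNICIPIO\n"
    "1,94,5\n"
    "2,73A,5\n"
    "3,11/1,9\n"
)
batches = split_by_municipio(sample)
for municipio, rows in sorted(batches.items()):
    print(municipio, len(rows))
```

Each resulting batch can then be prepared, audited and uploaded independently.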
Tagging
The CSV file is a collection of point features, one for each housenumber.
The following fields will be evaluated:
- NUMERO: housenumber (e.g. 11 in 11A)
- LETTERA: subordinate (e.g. A in 11A)
- BARRA: subordinate for numbers (e.g. 1 in 11/1)
- BARRA2: subordinate for numbers (e.g. 1 in 11/1)
- NUMEROCOMPLETO: complete housenumber assembled with previous fields
- STATOCIVICO: for pruning rows (2=present; 4=only in the database; 99=suppressed)
- DATA_SOPPRESSIONE: for pruning rows (address suppression date)
- LONG_WGS84
- LAT_WGS84
- CAP: postal code, used for addr:postcode
- MUNICIPIO: used for splitting the import and, optionally, for setting the addr:district tag
- IDMASTER: official housenumber id, used for conflation and optionally for OSM loc_ref tag
Housenumber
addr:housenumber is built by lowercasing the NUMEROCOMPLETO field.
Sample: "94n01", "93p01", "93p02", "90/10", "88p01", "94", "73a", "90/15".
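As a rough illustration of the pruning and tagging rules above, the following Python sketch turns one CSV row into OSM tags. The function name and the sample rows are hypothetical. The page does not state which STATOCIVICO values survive pruning, so keeping only "2" (present) is an assumption exposed as a parameter.

```python
def row_to_tags(row, keep_states=("2",)):
    """Return OSM tags for one dataset row, or None if the row is pruned.

    keep_states is an assumption: only STATOCIVICO "2" (present) is kept
    by default, because the page does not specify the pruning rule.
    """
    if row.get("STATOCIVICO", "").strip() not in keep_states:
        return None
    if row.get("DATA_SOPPRESSIONE", "").strip():
        return None  # a suppression date is set
    tags = {"addr:housenumber": row["NUMEROCOMPLETO"].strip().lower()}
    postcode = row.get("CAP", "").strip()
    if postcode:
        tags["addr:postcode"] = postcode
    return tags


# Made-up rows for illustration.
present = {"NUMEROCOMPLETO": "94N01", "STATOCIVICO": "2",
           "DATA_SOPPRESSIONE": "", "CAP": "20141"}
suppressed = {"NUMEROCOMPLETO": "73A", "STATOCIVICO": "99",
              "DATA_SOPPRESSIONE": "2015-03-01", "CAP": "20141"}
print(row_to_tags(present))
print(row_to_tags(suppressed))
```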
Changeset Tags
Changeset will be tagged with:
- source=Comune di Milano
- source:license=CC-BY-2.5
- type=import
- url=https://wiki.openstreetmap.org/w/index.php?title=Import/Catalogue/Milan_addresses_import
This way, people will know that the data was imported following the guidelines and can find this page for details.
Data Transformation
After data preparation, the following workflow was performed on a subset (MUNICIPIO=5):
- the pruned dataset records were converted to a JSON file;
- the JSON file was processed through OSM Conflator, using this profile;
- a preview of the conflated data was uploaded to an audit map for shared review.
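The first step above, converting pruned records to JSON, could look like the following Python sketch. The layout (a list of objects with id, lat, lon and tags keys) reflects my understanding of the conflator's JSON input and should be checked against the OSM Conflator documentation. The function name and the sample row are hypothetical, and coordinates are assumed to use a dot as decimal separator.

```python
import json


def to_conflator_items(rows):
    """Convert already-pruned dataset rows into a list of point dicts."""
    items = []
    for row in rows:
        items.append({
            "id": row["IDMASTER"],            # official housenumber id
            "lat": float(row["LAT_WGS84"]),
            "lon": float(row["LONG_WGS84"]),
            "tags": {"addr:housenumber": row["NUMEROCOMPLETO"].lower()},
        })
    return items


# Made-up row for illustration.
rows = [{"IDMASTER": "1001", "NUMEROCOMPLETO": "73A",
         "LAT_WGS84": "45.4400", "LONG_WGS84": "9.1900"}]
items = to_conflator_items(rows)
print(json.dumps(items, indent=2))
```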
Data Transformation Results
After completion of the audit process, the OSM XML upload candidate file will be available here TODO
Data Merge Workflow
Non-node objects
Address data in Italy must be placed exclusively on nodes, because the housenumber identifies the external access that leads from the street to the housing units (houses, stores, offices, etc.). Please read https://wiki.openstreetmap.org/wiki/IT:Addresses#Regole_specifiche_per_l.27Italia (in Italian) for more details. At present, a query for housenumbers applied to polygons or multipolygons returns 1134 matches. The distance between dataset nodes and polygon centroids is often greater than the 10-metre conflation radius, producing several cases (tagged with fixme "suppressed or wrong position: please check") that will need post-import QA inspection.
Conflation
Conflation is performed by OSM Conflator. Objects tagged with addr:housenumber will be extracted from OSM for the area covered by the source dataset. The conflator output will be used to generate a public audit map for visual review.
OSM objects to be conflated
The following query gathers OSM objects for "Municipio 5" Milan district:
[out:xml][timeout:25];
area[name="Municipio 5"]["old_name"="Zona 5"]["admin_level"=10]->.searchArea;
(
  nwr["addr:housenumber"](area.searchArea);
);
out meta qt center;
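Since the import is split by district, the same query could be generated for each batch. The Python sketch below only builds the query text. It assumes every district is named "Municipio N" with old_name "Zona N", as in the Municipio 5 example, which has not been verified for the other districts. The function name is hypothetical.

```python
def overpass_query(municipio, timeout=25):
    """Return the Overpass QL text for one Municipio number.

    Assumption: districts follow the "Municipio N" / "Zona N" naming
    seen in the Municipio 5 example.
    """
    return (
        f"[out:xml][timeout:{timeout}];\n"
        f'area[name="Municipio {municipio}"]["old_name"="Zona {municipio}"]'
        '["admin_level"=10]->.searchArea;\n'
        "(\n"
        '  nwr["addr:housenumber"](area.searchArea);\n'
        ");\n"
        "out meta qt center;"
    )


print(overpass_query(5))
```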
At present (March 2020) there are about 24k addresses already in OpenStreetMap. The Municipio 5 subset contains about 1k of them; the data exported by the query above (export.osm) will be passed to the conflator.
Addresses and tags already present are merged by the conflator, treating the dataset's addr:housenumber and addr:street as authoritative. Existing OSM addresses that remain unmatched will be kept, so that other useful tags (amenities, shops, etc.) are not removed.
Matching addrs
Any match between an input dataset address and an OSM element within a given range (defined in profile.py) will be considered, and the proposed change will be displayed in the audit map as a blue pin.
New addrs
Any input dataset address that has no OSM match within the above range will generate a proposal for a new OSM address, displayed in the audit map as a green pin.
Not in dataset
Existing OSM elements which don't have an input dataset match will generate a proposal for a fixme tag; text shall be 'this addr is missing from source dataset: please check'. They will be displayed in an audit map as a blue pin.
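The three outcomes described above (matching, new, not in dataset) can be illustrated with a simplified Python sketch. This is not OSM Conflator's actual algorithm: it is a greedy nearest-neighbour pass that matches on distance only, with a 10-metre default radius taken from the value mentioned earlier on this page. All function names and sample coordinates are hypothetical.

```python
import math


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (haversine formula)."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def classify(dataset, osm, max_distance=10.0):
    """Split points into matched pairs, new addresses and OSM-only objects."""
    matched, new, used = [], [], set()
    for d in dataset:
        best = None  # (index of OSM object, distance)
        for i, o in enumerate(osm):
            if i in used:
                continue
            dist = distance_m(d["lat"], d["lon"], o["lat"], o["lon"])
            if dist <= max_distance and (best is None or dist < best[1]):
                best = (i, dist)
        if best is None:
            new.append(d)
        else:
            used.add(best[0])
            matched.append((d, osm[best[0]]))
    not_in_dataset = [o for i, o in enumerate(osm) if i not in used]
    return matched, new, not_in_dataset


# Made-up points for illustration.
dataset = [{"id": "A", "lat": 45.4400, "lon": 9.1900},
           {"id": "B", "lat": 45.4500, "lon": 9.2000}]
osm = [{"id": "X", "lat": 45.44003, "lon": 9.19002},
       {"id": "Y", "lat": 45.4600, "lon": 9.2100}]
matched, new, not_in_dataset = classify(dataset, osm)
print(len(matched), len(new), len(not_in_dataset))
```

In this sample, A and X are a few metres apart and form a match, B becomes a new address proposal, and Y would receive the fixme tag.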
Conflator output example
pi@raspberrypi:~/OSM $ conflate -i municipio5.json --osm export.osm -v -c previewM5.json profile.py
08:37:53 Found 421 duplicates in the dataset
08:37:53 Read 4876 items from the dataset
08:37:53 Downloaded 1085 objects from OSM
08:38:13 Matched 790 points
08:38:13 Removed 401 unmatched duplicates
08:38:13 Adding 3685 unmatched dataset points
08:38:14 Deleted 0 and retagged 295 unmatched objects from OSM
pi@raspberrypi:~/OSM $
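The figures in this log can be cross-checked with simple arithmetic. The Python sketch below assumes that every dataset item is either matched, removed as an unmatched duplicate, or added, and that every OSM object is either matched, retagged or deleted. That reading is an interpretation of the log, not something the conflator documents here, but the numbers are consistent with it.

```python
# Figures copied from the conflator log above.
items_read = 4876
matched = 790
removed_duplicates = 401
added = 3685
osm_objects = 1085
retagged = 295
deleted = 0

# Dataset side: 790 + 401 + 3685 = 4876
assert matched + removed_duplicates + added == items_read
# OSM side: 790 + 295 + 0 = 1085
assert matched + retagged + deleted == osm_objects

print("dataset items accounted for:", matched + removed_duplicates + added)
print("OSM objects accounted for:", matched + retagged + deleted)
```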
Conflator re-run
Once the audit is completed, the audit data is downloaded from the conflator project page (example) and reprocessed.
pi@raspberrypi:~/OSM $ conflate -i municipio5.json -a audit_MI-M5.json -o M5.osm profile.py
[some echoes...]
pi@raspberrypi:~/OSM $
Candidates
Municipio | Audit published | Post audit conflator run | File |
---|---|---|---|
9 | 2020-06-01 | 2021-05-18 | M9.osm |
Team Approach
This import is managed and supervised by:
- Cascafico (import account: attilaimport)
During the upload process, the subset import will be evaluated; the batching criterion will possibly be the municipal district (Municipio, in Italian).
Reverting
In case of import anomalies, changeset(s) will be reverted using OSM reverter scripts or, if possible, the JOSM Reverter Plugin.
Post-import QA
Street names
After the import, addr:street values may differ slightly from the names of the existing streets.
These differences should be caught using OSM Inspector (map already centered on Milan).
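As an offline complement to OSM Inspector, near-miss street names could be listed with a small Python sketch based on the standard difflib module. The function name, the 0.8 similarity cutoff and the sample names are assumptions for illustration. The inputs are expected to be plain lists of addr:street values and of highway names extracted beforehand.

```python
import difflib


def suggest_street_names(addr_streets, highway_names, cutoff=0.8):
    """Map each addr:street value without an exact highway match
    to the closest highway name, or to None if nothing is close."""
    names = sorted(set(highway_names))
    suggestions = {}
    for street in sorted(set(addr_streets)):
        if street in names:
            continue  # exact match, nothing to review
        close = difflib.get_close_matches(street, names, n=1, cutoff=cutoff)
        suggestions[street] = close[0] if close else None
    return suggestions


# Made-up sample values.
addr_streets = ["Via Giuseppe Meda", "via Giuseppe Meda", "Piazza Duomo"]
highway_names = ["Via Giuseppe Meda", "Corso San Gottardo"]
suggestions = suggest_street_names(addr_streets, highway_names)
print(suggestions)
```

Values mapped to None have no similar street name nearby in the list and may point to a missing or unnamed street.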
Unmarked streets
The result can be used to locate areas where streets are missing.
Missing roads will be created in JOSM using PCN 2012 aerial images.
Unnamed streets
The result can be used to derive street names for unnamed streets when all the nodes along the street have the same addr:street value.
Missing road names will be identified using the OpenStreetMap NoName map overlay (tms:http://tile3.poole.ch/noname/{zoom}/{x}/{y}.png).
OSM Inspector can also be used to find these streets.
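The rule for deriving a name (all address nodes along the street share the same addr:street value) can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes the address nodes belonging to one unnamed street have already been grouped by some other means.

```python
def infer_street_name(addr_street_values):
    """Return the shared addr:street value, or None if the values
    are missing or disagree."""
    values = {v.strip() for v in addr_street_values if v and v.strip()}
    return values.pop() if len(values) == 1 else None


# Made-up sample values.
print(infer_street_name(["Via Meda", "Via Meda", "Via Meda"]))
print(infer_street_name(["Via Meda", "Via Tabacchi"]))
```

Any proposed name should still be reviewed manually before it is applied to a way.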
Non-node objects
Since several polygon and multipolygon OSM address objects will be tagged as wrongly placed, they will have to be adapted or deleted manually.
See also
The email to the Imports mailing list was sent on 2020-04-04 and can be found in the imports mailing list archives.