Import CDAU
CDAU Import describes a proposal under development to import the CDAU, an address dataset for Andalusia provided by the Institute of Statistics and Cartography of Andalusia (Instituto de Estadística y Cartografía de Andalucía).
Goals
The goal is to merge addresses from CDAU into the Spanish Cadastre/Buildings Import.
Schedule
- Preparation of the import proposal in the wiki in March 2018.
- Submission to talk-es.
- Submission to import list.
- Attribution.
- April 2018. Start of the first project in Málaga.
Import Data
CDAU is the set of geographic data on roads and portals (building entrances) of Andalusia. It has a topological structure, which makes it possible to locate any geographic object that has a postal address (and its associated variables), with portal-level precision.
The basic entities maintained and updated in CDAU are roads, road sections and the portals where people reside (housing) or where an activity is carried out (establishments or premises), covering all population centers as well as scattered settlements.
Background
Data source site: http://www.callejerodeandalucia.es/portal/web/cdau/inf_alfa
Online sources: http://www.callejerodeandalucia.es/
Data license: CC BY 4.0
Link to permission: Explicit permission.
Attribution in OSM: In the Contributors page and in the changesets.
ODbL Compliance verified: Yes, with explicit permission.
OSM Data Files
The source CSV files can be opened directly in JOSM with the OpenData plugin, using the ETRS89 UTM 30 reference system (EPSG:25830). No OSM files have been prepared, since the data are incorporated into the buildings as part of the Spanish Cadastre/Buildings Import workflow.
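As an illustration of the reference system mentioned above, the following is a minimal sketch of converting ETRS89 / UTM zone 30N coordinates (EPSG:25830) to longitude and latitude in degrees. It uses the standard Krüger series for the inverse transverse Mercator projection on the GRS80 ellipsoid and assumes that ETRS89 and WGS84 can be treated as equivalent at address precision. The function name is hypothetical and this is not necessarily how CatAtom2Osm does it; in practice a projection library would normally be used.

```python
import math

# GRS80 ellipsoid (used by ETRS89) and UTM constants.
SEMI_MAJOR_AXIS = 6378137.0
FLATTENING = 1 / 298.257222101
SCALE_FACTOR = 0.9996
FALSE_EASTING = 500000.0
CENTRAL_MERIDIAN = -3.0  # degrees, UTM zone 30


def utm30n_to_lonlat(x, y):
    """Convert ETRS89 / UTM 30N metres to (lon, lat) in degrees.

    Assumption: northern hemisphere, so the false northing is zero.
    """
    n = FLATTENING / (2 - FLATTENING)
    radius = SEMI_MAJOR_AXIS / (1 + n) * (1 + n**2 / 4 + n**4 / 64)
    beta = (
        n / 2 - 2 * n**2 / 3 + 37 * n**3 / 96,
        n**2 / 48 + n**3 / 15,
        17 * n**3 / 480,
    )
    delta = (
        2 * n - 2 * n**2 / 3 - 2 * n**3,
        7 * n**2 / 3 - 8 * n**3 / 5,
        56 * n**3 / 15,
    )
    xi = y / (SCALE_FACTOR * radius)
    eta = (x - FALSE_EASTING) / (SCALE_FACTOR * radius)
    xi_p = xi - sum(
        b * math.sin(2 * j * xi) * math.cosh(2 * j * eta)
        for j, b in enumerate(beta, start=1)
    )
    eta_p = eta - sum(
        b * math.cos(2 * j * xi) * math.sinh(2 * j * eta)
        for j, b in enumerate(beta, start=1)
    )
    chi = math.asin(math.sin(xi_p) / math.cosh(eta_p))
    lat = chi + sum(
        d * math.sin(2 * j * chi) for j, d in enumerate(delta, start=1)
    )
    lon = math.radians(CENTRAL_MERIDIAN) + math.atan2(
        math.sinh(eta_p), math.cos(xi_p)
    )
    return math.degrees(lon), math.degrees(lat)
```

A point on the central meridian (x = 500000) maps to a longitude of -3 degrees, which is a quick sanity check for the zone.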
Import Type
Community import with manual review.
Data Preparation
Data Reduction & Simplification
Only addresses with the types "PORTAL" (house number) and "ACCESORIO" (accessory) are imported. Excluded are "DISEMINADOS" (disseminated) and "PUNTO KILOMÉTRICO" (kilometre point).
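The filter described above could look roughly like the sketch below. The field name 'tipo_portal_pk' and the type values come from this page; the semicolon delimiter, the sample rows and the function name are assumptions made for illustration only.

```python
import csv
import io

# Address types kept for the import, as listed above.
IMPORTED_TYPES = {"PORTAL", "ACCESORIO"}


def filter_importable(rows):
    """Keep only the rows whose 'tipo_portal_pk' is an imported type."""
    return [
        row
        for row in rows
        if row.get("tipo_portal_pk", "").strip().upper() in IMPORTED_TYPES
    ]


# Hypothetical sample; the real files have many more columns.
SAMPLE = """id_por_pk;tipo_portal_pk;num_por_desde
1;PORTAL;15
2;DISEMINADOS;3
3;ACCESORIO;7
4;PUNTO KILOMÉTRICO;12
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter=";"))
kept = filter_importable(rows)
print([row["id_por_pk"] for row in kept])
```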
Tagging Plans
These are the fields of the address source file, which contains one node per address.
Source field | Description | OSM tag | Notes |
---|---|---|---|
id_vial | Road identifier | N/A | |
ine_via | Road identifier in INE (National Institute for Statistics) | N/A | |
dgc_via | Road identifier in Cadastre | N/A | Used to link with Cadastre addresses |
tvian | Type of road (short) | N/A | Five-character abbreviations. |
nom_tip_via | Type of road (long) | addr:street | addr:street = nom_tip_via + ' ' + nom_via. See #Places tagging |
nom_via | Road name | addr:street | addr:street = nom_tip_via + ' ' + nom_via |
sobrenombre | Nickname | N/A | |
id_por_pk | House number code | N/A | |
tipo_portal_pk | Type of house number | N/A | Only addresses with the types "PORTAL" (house number) and "ACCESORIO" (accessory) are imported. Excluded are "DISEMINADOS" (disseminated) and "PUNTO KILOMÉTRICO" (kilometre point). |
num_por_desde | Start house number | addr:housenumber | |
ext_desde | Start house letter | addr:housenumber | If present, addr:housenumber = num_por_desde + ext_desde, e.g. 15A |
num_por_hasta | End house number | addr:housenumber | If present, addr:housenumber = num_por_desde + '-' + num_por_hasta |
ext_hasta | End house letter | addr:housenumber | If present, addr:housenumber = num_por_desde + ext_desde + '-' + num_por_hasta + ext_hasta. e.g.: 15A-15B |
bloque | Block | N/A | |
portal | Entrance | N/A | |
escalera | Stairs | N/A | |
refcatparc | Cadastral Parcel Reference | N/A | Used to link with Cadastre addresses |
txt_app | Additional location data | N/A | |
nom_tipo_agrupación | Type of grouping | N/A | |
nom_agrup | Grouping name | N/A | |
ine_nucleo | Settlement code | N/A | |
nom_nucleo | Settlement / disseminated name | N/A | |
ine_mun | Code of the municipality | N/A | Used for query purposes. Matches the codes in Cadastre according to these tables: [1] or [2]. |
nom_municipio | Name of the municipality | N/A | |
cod_postal | Postal code | addr:postcode | |
x | X coordinate | <node lat=* lon=*> | Transformed from EPSG:25830 to EPSG:4326 |
y | Y coordinate | <node lat=* lon=*> | Transformed from EPSG:25830 to EPSG:4326 |
This document and glossary have been used for reference.
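The tag composition rules in the table can be sketched as follows. The field names and concatenation rules are taken from the table; the function names and the handling of empty values are assumptions, and the sample values are invented.

```python
def build_housenumber(row):
    """Compose addr:housenumber from the start/end number and letter fields."""
    start = row.get("num_por_desde", "").strip() + row.get("ext_desde", "").strip()
    end_number = row.get("num_por_hasta", "").strip()
    if not end_number:
        # No end number: single house number such as "15" or "15A".
        return start
    return f"{start}-{end_number}{row.get('ext_hasta', '').strip()}"


def build_tags(row):
    """Build the OSM address tags for one CDAU row (a dict of strings)."""
    tags = {
        "addr:street": f"{row['nom_tip_via']} {row['nom_via']}".strip(),
        "addr:housenumber": build_housenumber(row),
    }
    if row.get("cod_postal"):
        tags["addr:postcode"] = row["cod_postal"]
    return tags


# Invented sample row for illustration.
example = build_tags(
    {
        "nom_tip_via": "CALLE",
        "nom_via": "EJEMPLO",
        "num_por_desde": "15",
        "ext_desde": "A",
        "num_por_hasta": "15",
        "ext_hasta": "B",
        "cod_postal": "29001",
    }
)
print(example)
```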
Places tagging
The addresses with certain values in the 'nom_tip_via' field are assigned to the addr:place=* tag instead of addr:street=*. The list of values is configured in the variable 'place_types_es' within the file setup.py.
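A minimal sketch of this choice is shown below. The actual list lives in the 'place_types_es' variable in setup.py and is not reproduced here; the values in PLACE_TYPES are placeholders, the function name is hypothetical, and using the same "type + name" composition for addr:place is an assumption.

```python
# Placeholder values only; the real list is 'place_types_es' in setup.py.
PLACE_TYPES = {"URBANIZACIÓN", "BARRIO"}


def street_or_place(nom_tip_via, nom_via):
    """Return the (key, value) pair for the street or place tag."""
    is_place = nom_tip_via.strip().upper() in PLACE_TYPES
    key = "addr:place" if is_place else "addr:street"
    return key, f"{nom_tip_via} {nom_via}".strip()


print(street_or_place("CALLE", "EJEMPLO"))
print(street_or_place("BARRIO", "EJEMPLO"))
```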
Changeset Tags
Key | Value |
---|---|
comment | #CDAU_Import |
source | Instituto de Estadística y Cartografía de Andalucía |
source:date | 2018-02-01 |
type | import |
url | https://wiki.openstreetmap.org/wiki/Import CDAU |
Data Transformation
The CatAtom2Osm tool will be used. The modification to download, read and merge the CDAU data was developed here.
Conflation with Cadastre
The two data sets contain different numbers of addresses, and the house number for the same address may differ between them. Each address in Cadastre is uniquely identified by the field 'localId', a string with the format PP.MMM.VVV.N.CCCCC, as in 29.900.845.5.3350109UF7635S, where each part has the following meaning:
Part | Meaning | Example | CDAU field* |
---|---|---|---|
PP | Two-digit code for province | 29 | Fixed for each municipality |
MMM | Three-digit code for municipality | 900 | Fixed for each municipality |
VVV | Road identifier | 845 | dgc_via |
N | House number | 5 | Values could differ |
CCCCC | Cadastral parcel reference | 3350109UF7635S | refcatparc |
* The values to link both sets are not present in all the data.
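The structure of 'localId' can be illustrated with a small parser. The part names follow the table above; the class and function names are hypothetical, and the parser simply splits on the first four dots.

```python
from typing import NamedTuple


class LocalId(NamedTuple):
    province: str      # PP
    municipality: str  # MMM
    road: str          # VVV, comparable to CDAU 'dgc_via'
    housenumber: str   # N, may differ from CDAU
    parcel: str        # CCCCC, comparable to CDAU 'refcatparc'


def parse_local_id(local_id):
    """Split a Cadastre localId into its five dot-separated parts."""
    parts = local_id.split(".", 4)
    if len(parts) != 5:
        raise ValueError(f"unexpected localId format: {local_id!r}")
    return LocalId(*parts)


parsed = parse_local_id("29.900.845.5.3350109UF7635S")
print(parsed.road, parsed.parcel)
```

The 'road' and 'parcel' parts are the ones that can be compared with 'dgc_via' and 'refcatparc' when those values are present.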
The CDAU addresses are considered more up to date and take precedence over those of Cadastre. For each municipality, the two sets will be combined in the following way:
- For each CDAU address we take the group of Cadastre addresses with matching values for 'dgc_via' and 'refcatparc'.
- If this group is empty and there are no nearby Cadastre addresses, we take the CDAU address.
- If the group has exactly one element, it is replaced by the CDAU address.
- If the group has several elements, the nearest Cadastre address is replaced by the CDAU address.
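The rules above can be sketched as a small decision function. This is only an illustration under stated assumptions: addresses are plain dicts with hypothetical keys ('road', 'parcel', 'x', 'y' in EPSG:25830 metres), the 50 m "nearby" threshold is invented, and the case of an empty group with nearby Cadastre addresses, which the list does not cover, is returned as "undecided".

```python
import math

NEARBY_METRES = 50.0  # hypothetical threshold for "nearby"


def distance(a, b):
    """Planar distance in metres between two projected points."""
    return math.hypot(a["x"] - b["x"], a["y"] - b["y"])


def conflate(cdau, cadastre):
    """Return (action, cadastre_address_or_None) for one CDAU address."""
    group = []
    if cdau.get("road") and cdau.get("parcel"):
        # Linking values are not always present, so guard against blanks.
        group = [
            c for c in cadastre
            if c["road"] == cdau["road"] and c["parcel"] == cdau["parcel"]
        ]
    if not group:
        nearby = any(distance(cdau, c) <= NEARBY_METRES for c in cadastre)
        return ("undecided", None) if nearby else ("add", None)
    # With one element min() returns it; with several, the nearest one.
    return ("replace", min(group, key=lambda c: distance(cdau, c)))


cadastre = [
    {"id": "a", "road": "845", "parcel": "3350109UF7635S", "x": 0.0, "y": 0.0},
    {"id": "b", "road": "845", "parcel": "3350109UF7635S", "x": 10.0, "y": 0.0},
]
action, target = conflate(
    {"road": "845", "parcel": "3350109UF7635S", "x": 8.0, "y": 0.0}, cadastre
)
print(action, target["id"])
```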
Conflation with OSM
The names of the streets are combined with those existing in OSM in a two-phase process through the software and manual review. In a first phase, for each set of addresses with the same road name, the program locates the street in OSM with the closest name in the vicinity. The software generates a conversion table between the source names and their match in OSM that must be reviewed manually. In this phase:
- We detect the streets that have no name in OSM, those whose name needs to be corrected and those whose name is in doubt. They will be checked with an on-the-ground survey or by looking for street name plaques in the Cadastral facade photos. Corrections and new names are edited manually in OSM.
- The incorrect pairings made by the program are detected and corrected.
- For streets whose addresses should not be imported, the conversion is left blank.
In a second phase, the software incorporates the corrected names to the addresses and merges them with the buildings to be reviewed and imported. For the buildings that do not have an address in the CDAU, the Cadastre address is used.
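The first phase described above can be pictured with the following sketch, which builds a conversion table from source names to OSM names. It is not the CatAtom2Osm implementation: it uses Python's difflib for fuzzy matching, ignores the "in the vicinity" spatial constraint, and the function name, cutoff and sample names are all assumptions. Unmatched names are stored as None so a reviewer can spot them.

```python
import difflib


def build_conversion_table(source_names, osm_names, cutoff=0.8):
    """Map each source street name to the closest OSM name, or None."""
    # Compare case-insensitively but return the original OSM spelling.
    lookup = {name.casefold(): name for name in osm_names}
    table = {}
    for name in source_names:
        matches = difflib.get_close_matches(
            name.casefold(), list(lookup), n=1, cutoff=cutoff
        )
        table[name] = lookup[matches[0]] if matches else None
    return table


table = build_conversion_table(
    ["CALLE MARQUES DE LARIOS", "ZZZ"],
    ["Calle Marqués de Larios", "Avenida de Andalucía"],
)
for source, osm in table.items():
    print(source, "->", osm)
```

The resulting table is only a starting point: as the list above explains, every pairing still needs manual review.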
Data Transformation Results
You can review some samples of the results for the city of Málaga in this repository. The 'address.osm' file contains the results for the entire city. This data is not imported; it is used only as a reference and to access the facade photos through the link contained in the image tag (the Tag2Link JOSM plugin is required). The addresses are combined with the buildings by the CatAtom2Osm program, which generates files by block. The 'u????.osm.gz' files are examples of some of them.
Data Merge Workflow
We will follow the workflow described in the Spanish Cadastre/Buildings Import.
Team Approach
One manager per area will be responsible for the transformation, the preliminary review of the data and the unification of street names.
Workflow
The manager will publish projects in the Task Manager open to participation.
Conflation
The program excludes those addresses that are already present in OSM. The addresses collected in the field have priority over the data to be imported.
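How the program decides that an address is already present is not described here. As a rough sketch under that caveat, one simple approach is to compare a normalised (street or place, house number) key; the function names and the normalisation are assumptions.

```python
def address_key(tags):
    """Normalised (street-or-place, housenumber) key for comparison."""
    street = tags.get("addr:street") or tags.get("addr:place") or ""
    number = tags.get("addr:housenumber", "")
    return street.strip().casefold(), number.strip().casefold()


def exclude_existing(candidates, osm_addresses):
    """Drop candidate addresses whose key already exists in OSM."""
    existing = {address_key(tags) for tags in osm_addresses}
    return [tags for tags in candidates if address_key(tags) not in existing]


osm = [{"addr:street": "Calle Ejemplo", "addr:housenumber": "15A"}]
new = [
    {"addr:street": "calle ejemplo", "addr:housenumber": "15a"},
    {"addr:street": "Calle Ejemplo", "addr:housenumber": "17"},
]
remaining = exclude_existing(new, osm)
print(remaining)
```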
QA
During the manual incorporation of the data into OSM, participants resolve conflicts between the two data sets and check each portal number against the facade photos.
Updates
When new data is published, the differences will be filtered and incorporated manually.
References
- Thread about the license in the Spanish OpenStreetMap community.