Madrid Drinking Water Import
Goals
The goal is to import and update drinking fountains from the Madrid City Council database, see: ES:Import_Ayuntamiento_Madrid (in spanish)
In Madrid there were 562 drinking fountains already in OSM. As the nodes to import have no useful information other than the coordinates, it made no sense to merge them. So we did not import those.
Schedule
- July 4 2016: We started the preparation and the discussion. See the discussion in the talk-es list (in spanish)
- July 29: Sent the proposal to the imports list.
- August 1: Import started
- September 25: QA finished
- September 28: Report sent to talk-es and to Madrid City Council open data site.
- November 28: The City Council published the first update
- January 16 2017: Created a proposal for the first update
Import Data
Data description
The original dataset is csv format. It includes a total of 1,553 spots.
We'll use the csv file for processing. You can download the data here.
We verified the data quality using our city knowledge, Mapilliary and orthophotos (PNOA and Bing).
Background
ODbL Compliance verified: YES
The license can be consulted here (in Spanish). It requires attribution of the source, so all nodes and the changeset will include source=Ayuntamiento de Madrid. The Madrid City Council (Ayuntamiento de Madrid) is already on the contributors list.
After sending the report with the problems and errors found during the drinking water import, including a section about the license, got a response from the "Subdirección General de Transparencia del Ayuntamiento de Madrid" (the data owner), and they confirmed me that we comply with their attribution clause using their data in OSM if we clearly specify the source and the source:date, the way it was done on the Madrid_Drinking_Water_Import. And they do like the work we are doing :) Kresp0 (talk) 13:27, 16 October 2016 (UTC)
Compatibility with the ODbL was already discussed in the imports and talk-es mailing lists.
Import Type
The import will be done manually, using the Tasks Manager, so it can be assigned to different volunteers.
First import data preparation
Data Reduction & Simplification
As mentioned before, the data file we have chosen is in csv format. Some corrections are part of the data conversion script. Some of the fields aren't relevant and will be ignored.
Tagging plans
Here are the original fields, their meaning and how they will be converted to the resulting OSM file:
Original Fields | Meaning | OSM tag | Comments |
---|---|---|---|
DISTRITO | City district | ||
CODIGO | Internal reference number | ||
ZONA VERDE/VIA PUBLICA | If the fountain is on the street or a park. | We need to know if it is "PARQUE HISTORICO SINGULAR O FORESTAL" because then a different coordinate system is used. | |
DIRECCION | Street, number. | Sometimes contains only the street. Would be addr:street=* and addr:housenumber=* or add:full=* | |
COMPLEMENTO DE DIRECCIÓN | Street, number. | Sometimes contains only the street. Would be addr:street=* and addr:housenumber=*. or add:full=* | |
COORD. X | X coordinates (EPSG:25830 and EPSG:23030) | X | those with the tag "PARQUE HISTORICO SINGULAR O FORESTAL" uses EPSG:25830 while the rest uses EPSG:23030. conversion to EPSG:4326 needed |
COORD. Y | Y coordinates (EPSG:25830 and EPSG:23030) | Y | those with the tag "PARQUE HISTORICO SINGULAR O FORESTAL" uses EPSG:25830 while the rest uses EPSG:23030. conversion to EPSG:4326 needed |
To all the nodes, we will add the following tags:
- amenity=drinking_water
- source:date=2016-06-29
- source=Ayuntamiento de Madrid
- operator=Ayuntamiento de Madrid
Changeset tags
We will use the following changeset tags:
- comment=#Madrid_drinking_water_import and any notes that the mapper may add.
- source=Ayuntamiento de Madrid
- source:date=2016-06-29
- import=yes
- url=https://wiki.openstreetmap.org/wiki/Madrid_Drinking_Water_Import (this page)
- created_by=JOSM/version (This tag is created automatically by JOSM)
Data Transformation
Data is in csv format. We already:
- Downloaded the data
- Splitted the original csv file into 2, each with its own coordinate system, using grep:
grep "PARQUE HISTORICO" fuentes.csv > fuentes-25830.csv ; grep -v "PARQUE HISTORICO" fuentes.csv > fuentes-23030.csv
- Reprojected from the 2 different coordinate systems (EPSG:25830 and EPSG:23030) to EPSG:4326 using ogr2ogr:
ogr2ogr -s_srs "+init=epsg:23030 +nadgrids=./peninsula.gsb +wktext" -t_srs EPSG:4326 destino.shp origen.shp
- Converted to OSM format with the opendata plugin in JOSM
- Removed all the original tags, added the OSM tags (see Madrid_Drinking_Water_Import#Tagging Plans) using JOSM
- Generated a polygon from the nodes using the convex hull tool in QGIS and used it to generate the working area on the TM
- Filtered the nodes to import that are already near a mapped node in OSM using the osmsplitdup script
- Manually confirmed that all nodes marked as duplicated nodes are indeed duplicated using JOSM with the ToDo plugin.
- Splitted the file with the possibly non-duplicated nodes to have one file for each task on the TM using the osmboxes script
Here is the code used to download, split and reproject the data:
# Descarga el archivo csv con los datos de las fuentes de agua potable
wget http://datos.madrid.es/egob/catalogo/300051-0-fuentes.csv -O "Inventario Fuentes.csv"
# Separa las fuentes según el sistema de coordenadas usado:
echo "DISTRITO;CODIGO;ZONA VERDE/VIA PUBLICA;name;COMPLEMENTO DE DIRECCION;x;y" > fuentes-25830.csv
cp fuentes-25830.csv fuentes-23030.csv
grep "PARQUE HISTORICO" "Inventario Fuentes.csv" >> fuentes-25830.csv
grep -v "PARQUE HISTORICO" "Inventario Fuentes.csv" | grep -v "DIRECCI" >> fuentes-23030.csv
# Convierte la codificación de caracteres a UTF-8:
iconv -f ISO-8859-15 -t UTF-8 fuentes-25830.csv > f ; mv f fuentes-25830.csv
iconv -f ISO-8859-15 -t UTF-8 fuentes-23030.csv > f ; mv f fuentes-23030.csv
# Descarga y descomprime la rejilla para cambio de Datum entre ED50 y ETRS89 para la península del Instituto Geográfico Nacional:
wget https://www.ign.es/ign/resources/herramientas/PENR2009.zip
unzip PENR2009.zip
# Prepara la reproyección de EPSG:25830 a EPSG:4326:
echo '<OGRVRTDataSource>
<OGRVRTLayer name="fuentes-25830">
<SrcDataSource>fuentes-25830.csv</SrcDataSource>
<GeometryType>wkbPoint</GeometryType>
<LayerSRS>+init=epsg:25830 +wktext</LayerSRS>
<GeometryField encoding="PointFromColumns" x="x" y="y"/>
<Field name="name" src="name" />
</OGRVRTLayer>
</OGRVRTDataSource>' > fuentes-25830.vrt
# Prepara la reproyección de EPSG:23030 a EPSG:4326 con la rejilla:
echo '<OGRVRTDataSource>
<OGRVRTLayer name="fuentes-23030">
<SrcDataSource>fuentes-23030.csv</SrcDataSource>
<GeometryType>wkbPoint</GeometryType>
<LayerSRS>+init=epsg:23030 +nadgrids=./PENR2009.gsb +wktext</LayerSRS>
<GeometryField encoding="PointFromColumns" x="x" y="y"/>
<Field name="name" src="name" />
</OGRVRTLayer>
</OGRVRTDataSource>' > fuentes-23030.vrt
# Reproyecta usando ogr2ogr
ogr2ogr -lco GEOMETRY=AS_XY -overwrite -f CSV -t_srs EPSG:4326 fuentes-23030-reproyectado.csv fuentes-23030.vrt
ogr2ogr -lco GEOMETRY=AS_XY -overwrite -f CSV -t_srs EPSG:4326 fuentes-25830-reproyectado.csv fuentes-25830.vrt
cp fuentes-23030-reproyectado.csv fuentes.csv
grep -v 'X,Y,name,' fuentes-25830-reproyectado.csv >> fuentes.csv
echo '<OGRVRTDataSource>
<OGRVRTLayer name="Fuentes de Madrid">
<SrcDataSource relativeToVRT="1">.</SrcDataSource>
<SrcLayer>fuentes</SrcLayer>
<GeometryType>wkbPoint</GeometryType>
<LayerSRS>WGS84</LayerSRS>
<GeometryField encoding="PointFromColumns" x="x" y="y"/>
</OGRVRTLayer>
</OGRVRTDataSource>' > fuentes.vrt
ogr2ogr -f "ESRI Shapefile" . fuentes.csv && ogr2ogr -f "ESRI Shapefile" . fuentes.vrt
Data Import Workflow
Team Approach
Import will be undertaken by experienced OSM volunteers, following a strict workflow.
References
The import will be discussed in the Talk-Es list and in the Imports list.
Workflow
As most (if not all) of the volunteers will be Spanish speaking, they can follow the detailed workflow in Spanish language on the TM.
The workflow will be as follows:
- 1. Install the TODO list JOSM plugin if the volunteer don't have it.
- 2. Create, if we don't have one yet, an import specific user account, like username_madrid_drinking_water_import (you will need a different email account from the account you used for your regular OSM account). Change the OSM username to that specific account in the JOSM preferences.
- 3. Configure JOSM to allow remote access.
- 4. Login into the Tasks Manager.
- 5 Select one square to work on.
- 6 Open the OSM data and the drinking_water data by clicking on the JOSM button and the link on the extra instructions.
- 7 Select all nodes on the drinking_water layer and add them to the ToDo list.For each node we first check its correctness, correcting any errors or typos we may still encounter. If the node is clearly wrong or suspicious of being wrong, it won't be imported in the first place, and it will be added to the comments, so it can be checked afterwards by other mappers to take a decision about it. The mapper will delete that node to be sure it won't be imported.
- 6. If the fountain is already on OSM, we'll not import it. We already manually reviewed and filtered 467 duplicates.
- 7. Once we have finished with all the nodes, we proceed to upload the new data to OSM with our specific import OSM account, using the Madrid_Drinking_Water_Import#Changeset_Tags already explained.
Reverse plan
In case of any trouble, JOSM reverter will be used.
After the import
Updates
Each 3 months I'll download the dataset, do a diff and update the data to OSM manually. Until I get bored. Kresp0 (talk) 07:53, 24 July 2016 (UTC)
Report with problems and errors
I made a report with all the problems and errors detected and sent to the Madrid city council Open Data administrators. Kresp0 (talk) 09:20, 28 September 2016 (UTC)
Read the full report in spanish: Problemas y errores en el conjunto de datos "Fuentes de agua potable" del Ayto. de Madrid
It includes:
- Introduction: explaining the import to OSM
- Coordinate systems: 2 of them mixed with no documentation
- Incorrect coordinates
- Drinking water tagged as non potable in OSM (drinking_water=no) and ornamental only (amenity=fountain)
- Nonexistent fountain
- Drinking water points mapped on OSM but missing on the city council dataset
- License: would be nice to have a signed letter with explicit permission
- Conclusion: thanks to the Transparency Project (city council) and to OSM volunteers. Some ideas to use OSM in the city council.