OpenData Puglia Import
OpenData Puglia Import is an import of several CSV datasets, published by the Apulia region (Regione Puglia) in Italy, covering places of cultural and tourist interest. As of 12 February 2018, the import has been planned but not yet executed.
The import was discussed on the talk-it@openstreetmap.org mailing list from November to December 2017.
Goals
The goal of this project is to import data of interest into OSM. The data relate only to the Apulia region in Italy. The datasets come from two important websites, Apulia Digital Library (DL) and viaggiareinpuglia.it (VIP), and mainly describe churches, manor farms, tourist attractions, paintings, etc.
Schedule
Jan 2018 - Ongoing
The import will be performed by a dedicated account: User: InnoPuglia_Import
Import Data
Background
Data source site: Dataset
Data license: CC0 v1.0
Type of license (if applicable): CC0 v1.0.
ODbL Compliance verified: -
OSM Data Files
This list will be updated as the conversion of each dataset is completed.
Dataset 8: masserie_clean_csv2osm.osm
Import Type
Bulk import of data missing from OSM. Duplicate detection against existing OSM data is performed in JOSM.
Data Preparation
Data Reduction & Simplification
After each dataset has been manually cleaned of incomplete and incorrect data, the result will be processed by a Python script.
The script will read the dataset row by row and extract the data.
Not all columns will be extracted, only the relevant ones that contain data, such as name, description, category, latitude, longitude and website (a reference to the original DL or VIP web portal).
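The row-by-row extraction described above can be sketched as follows. This is a minimal illustration, not the project's actual script: it assumes Dataset 1's headers, a semicolon delimiter, and that a row without a name or coordinates cannot become an OSM node; the column lists are hypothetical choices.

```python
import csv
import io
from typing import Iterable, Iterator

# Hypothetical selection of relevant Dataset 1 columns.
RELEVANT_COLUMNS = ["nomeAttrattore", "risorsaTerritoriale",
                    "latitudine", "longitudine", "sitoWeb"]
# Assumption: without these a row cannot become an OSM node.
REQUIRED_COLUMNS = ["nomeAttrattore", "latitudine", "longitudine"]


def extract_rows(lines: Iterable[str], delimiter: str = ";") -> Iterator[dict]:
    """Read the CSV row by row and keep only the relevant columns."""
    for row in csv.DictReader(lines, delimiter=delimiter):
        # Missing cells come back as None from DictReader; normalise to "".
        kept = {col: (row.get(col) or "").strip() for col in RELEVANT_COLUMNS}
        # Skip rows that lack a name or coordinates.
        if all(kept[col] for col in REQUIRED_COLUMNS):
            yield kept


if __name__ == "__main__":
    # Made-up sample input; the second row has no name and is skipped.
    sample = (
        "nomeAttrattore;risorsaTerritoriale;latitudine;longitudine;sitoWeb;altro\n"
        "Esempio A;Torri;41.1;16.2;http://example.org/a;x\n"
        ";Torri;40.0;17.0;;y\n"
    )
    for r in extract_rows(io.StringIO(sample)):
        print(r)
```

Columns not listed in `RELEVANT_COLUMNS` (here `altro`) are simply dropped, which mirrors the "only relevant columns" rule above.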
Tagging Plans
This section lists, for each dataset, the subset of columns processed for the OSM import and the OSM keys they map to.
Dataset 1: Luoghi di interesse turistico, culturale, naturalistico
Common Tags are:
Headers in dataset -----> OSM matching key(s)
nomeAttrattore --> name
risorsaTerritoriale --> *
latitudine --> latitude
longitudine --> longitude
sitoWeb --> website
Dataset 2: Uffici Informazione e Accoglienza Turistica
Common Tags are:
Headers in dataset -----> OSM matching key(s)
nome --> name
latitudine --> latitude
longitudine --> longitude
sitoWeb --> website
email --> email
Additional tags: information=office, tourism=information
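A possible way to apply the Dataset 2 mapping to one row is sketched below. It assumes the row is a dict keyed by the CSV headers and that coordinates may use a decimal comma; the `Node` class and `row_to_node` function are hypothetical. Note that in OSM, latitude and longitude are node coordinates rather than tags, so they are kept separate from the tag dict.

```python
from dataclasses import dataclass, field

# Header -> OSM key mapping for Dataset 2, from the table above.
DATASET2_KEYS = {"nome": "name", "sitoWeb": "website", "email": "email"}
# Fixed tags added to every Dataset 2 feature.
DATASET2_EXTRA = {"tourism": "information", "information": "office"}


@dataclass
class Node:
    lat: float
    lon: float
    tags: dict[str, str] = field(default_factory=dict)


def row_to_node(row: dict[str, str]) -> Node:
    """Coordinates become node lat/lon; the mapped columns become tags."""
    tags: dict[str, str] = {}
    for col, key in DATASET2_KEYS.items():
        value = (row.get(col) or "").strip()
        if value:  # empty cells produce no tag
            tags[key] = value
    tags.update(DATASET2_EXTRA)
    # Assumption: values may use a decimal comma, so normalise it first.
    lat = float(row["latitudine"].replace(",", "."))
    lon = float(row["longitudine"].replace(",", "."))
    return Node(lat, lon, tags)
```

For example, a row with an empty `sitoWeb` yields a node without a `website` tag, but still with `tourism=information` and `information=office`.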
Dataset 3: Strutture Ricettive 2016
Common Tags are:
Headers in dataset -----> OSM matching key(s)
denominazione --> name
tipologia --> *
latitudine --> latitude
longitudine --> longitude
sitoWeb --> website
email --> email
Dataset 4: Digital Library - Collezione "Opuscoli Biblioteca comunale Barletta"
Common Tags are:
Headers in dataset -----> OSM matching key(s)
Titolo del bene rappresentato --> name
Categoria --> *
Latitudine --> latitude
Longitudine --> longitude
Scheda Puglia Digital Library --> website
Dataset 5: Collezione "Cinema 150 anni"
Common Tags are:
Headers in dataset -----> OSM matching key(s)
Titolo del bene rappresentato --> name
Categoria_1 --> *
Categoria_2 --> *
Latitudine --> latitude
Longitudine --> longitude
Scheda Puglia Digital Library --> website
Dataset 6: Digital Library - Collezione "Fondo manoscritti biblioteca comunale Barletta"
Common Tags are:
Headers in dataset -----> OSM matching key(s)
Titolo del bene rappresentato --> name
Categoria_1 --> *
Categoria_2 --> *
Latitudine --> latitude
Longitudine --> longitude
Scheda Puglia Digital Library --> website
Dataset 7: Digital Library - Collezione "Habitus percorsi tra costume e architettura"
Common Tags are:
Headers in dataset -----> OSM matching key(s)
Titolo del bene rappresentato --> name
Categoria_1 --> *
Categoria_2 --> *
Latitudine --> latitude
Longitudine --> longitude
Scheda Puglia Digital Library --> website
Dataset 8: Digital Library - Collezione "Masserie di Puglia"
Common Tags are:
Headers in dataset -----> OSM matching key(s)
Titolo del bene rappresentato --> name
Categoria_1 --> *
Categoria_2 --> *
Latitudine --> latitude
Longitudine --> longitude
Scheda Puglia Digital Library --> website
. . .
More datasets coming soon (when available).
(*) This column cannot be mapped to a single OSM key. For each value, an appropriate tag is chosen using a lookup dictionary (explained in the Data Transformation section).
Changeset Tags
source = Apulia Open Data
source:website = http://www.dataset.puglia.it/
source:date = *
comment = Semi-automatic import of points of interest** related to the Apulia region, Italy
website = https://wiki.openstreetmap.org/wiki/OpenData_Puglia_Import
(*) Date of the dataset's last update.
(**) Replaced with a short description of the dataset's main content.
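Filling in the two placeholders for each upload could look like the sketch below. The function name is hypothetical, and the ISO `YYYY-MM-DD` format for `source:date` is an assumption; the fixed values are copied from the list above.

```python
from datetime import date


def changeset_tags(dataset_summary: str, last_update: date) -> dict[str, str]:
    """Build the changeset tags for one dataset upload.

    dataset_summary replaces the (**) placeholder in the comment;
    last_update fills source:date (*), formatted as YYYY-MM-DD (assumption).
    """
    return {
        "source": "Apulia Open Data",
        "source:website": "http://www.dataset.puglia.it/",
        "source:date": last_update.isoformat(),
        "comment": (f"Semi-automatic import of {dataset_summary} "
                    "related to the Apulia region, Italy"),
        "website": "https://wiki.openstreetmap.org/wiki/OpenData_Puglia_Import",
    }
```

For instance, `changeset_tags("manor farms (masserie)", date(2018, 1, 15))` (example values) gives `source:date = 2018-01-15`.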
Data Transformation
A Python script will process the dataset (a CSV refined with OpenRefine) and export it in several output formats, such as OSM XML or a well-formatted CSV for the csv2osm script.
Some examples of data transformation for the risorsaTerritoriale, Categoria* and tipologia columns:
Masserie ---> place=hamlet
Torri ---> man_made=tower
Chiese e cattedrali ---> building=church or building=cathedral
musei ---> tourism=museum
Basiliche e santuari ---> amenity=place_of_worship
Bed & breakfast ---> guest_house=bed_and_breakfast
Affittacamere ---> guest_house=bed_and_breakfast
Case e appartamenti vacanza ---> tourism=apartment
Alloggi agrituristici ---> guest_house=agritourism
Alberghi ---> tourism=hotel
Campeggi ---> tourism=camp_site
Case per ferie ---> tourism=apartment
Residenze tur. alberghiere ---> tourism=hotel
Villaggi turistici ---> place=village, tourism=*
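The lookup dictionary mentioned in the (*) note of the Tagging Plans section could be sketched like this. It covers only the unambiguous rows of the table (values with two alternatives, such as "Chiese e cattedrali", or a wildcard, such as "Villaggi turistici", would need a manual decision) and uses the OSM value `tourism=camp_site`. Case-insensitive matching and the `category_tags` helper are assumptions; it accepts several columns at once, as for Categoria_1 and Categoria_2.

```python
# Subset of the value -> tag table above (unambiguous entries only).
CATEGORY_TAGS: dict[str, dict[str, str]] = {
    "Masserie": {"place": "hamlet"},
    "Torri": {"man_made": "tower"},
    "musei": {"tourism": "museum"},
    "Basiliche e santuari": {"amenity": "place_of_worship"},
    "Bed & breakfast": {"guest_house": "bed_and_breakfast"},
    "Affittacamere": {"guest_house": "bed_and_breakfast"},
    "Case e appartamenti vacanza": {"tourism": "apartment"},
    "Alloggi agrituristici": {"guest_house": "agritourism"},
    "Alberghi": {"tourism": "hotel"},
    "Campeggi": {"tourism": "camp_site"},
    "Case per ferie": {"tourism": "apartment"},
    "Residenze tur. alberghiere": {"tourism": "hotel"},
}

# Case-insensitive view of the table (assumption about the input data).
_LOOKUP = {k.casefold(): v for k, v in CATEGORY_TAGS.items()}


def category_tags(*values: str) -> tuple[dict[str, str], list[str]]:
    """Merge the tags for one or more category cells.

    Returns the merged tags and the values not found in the table, so
    they can be reviewed by hand. If two values set the same key, the
    later one wins.
    """
    tags: dict[str, str] = {}
    unknown: list[str] = []
    for value in values:
        value = value.strip()
        if not value:
            continue  # empty cell, e.g. an unused Categoria_2
        found = _LOOKUP.get(value.casefold())
        if found is None:
            unknown.append(value)
        else:
            tags.update(found)
    return tags, unknown
```

Returning unknown values instead of guessing keeps unmapped categories out of the upload until someone picks a tag for them.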
As a worked example, consider dataset 8, Digital Library - Collezione "Masserie di Puglia".
This dataset contains a collection of pictures, each related to a physical place or attraction.
After refining it with OpenRefine, we obtain a "clean" dataset: Dataset_masserie_clean.csv
The refined dataset contains just the physical places ("masseria - name of the attraction") while keeping the references to all the related "wiki-sheets" (HTML links separated by -|-).
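The refinement step, collapsing one row per picture into one row per place while joining the links with `-|-`, can be sketched in plain Python. This is an illustration of the idea, not the OpenRefine operation actually used; it assumes that rows for the same place share the same "Titolo del bene rappresentato" and that the link lives in "Scheda Puglia Digital Library".

```python
from typing import Iterable

LINK_SEPARATOR = "-|-"  # separator used in the refined dataset


def group_by_place(rows: Iterable[dict[str, str]],
                   name_col: str = "Titolo del bene rappresentato",
                   link_col: str = "Scheda Puglia Digital Library") -> list[dict]:
    """Collapse one-row-per-picture into one-row-per-place."""
    places: dict[str, dict] = {}
    for row in rows:
        name = row[name_col].strip()
        link = row[link_col].strip()
        # The first row seen for a place supplies its other columns.
        place = places.setdefault(name, {**row, link_col: []})
        if link and link not in place[link_col]:
            place[link_col].append(link)
    # Join the collected links into a single cell.
    for place in places.values():
        place[link_col] = LINK_SEPARATOR.join(place[link_col])
    return list(places.values())
```

Places keep the order of their first appearance, since Python dicts preserve insertion order.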
Next, some data in the dataset must be transformed so that the values match OSM tags.
For this purpose a custom script is used: jcsv2osm*
It produces either a CSV ready to be processed with csv2osm or an OSM file directly.
(*) For details on using the script, see the readme.md in its GitHub repository.
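To show what the "OSM file directly" output involves, here is a minimal writer using only the standard library. It is not jcsv2osm; the generator string is a placeholder, and features are assumed to arrive as `(lat, lon, tags)` tuples. New objects get negative ids, which JOSM treats as not yet uploaded.

```python
import xml.etree.ElementTree as ET


def to_osm_xml(features) -> str:
    """Serialise (lat, lon, tags) tuples as an OSM XML 0.6 document."""
    root = ET.Element("osm", version="0.6", generator="opendata-puglia-sketch")
    for i, (lat, lon, tags) in enumerate(features, start=1):
        # Negative ids mark new, not-yet-uploaded objects.
        node = ET.SubElement(root, "node", id=str(-i),
                             lat=f"{lat:.7f}", lon=f"{lon:.7f}")
        for key, value in sorted(tags.items()):
            ET.SubElement(node, "tag", k=key, v=value)
    return ET.tostring(root, encoding="unicode")
```

The resulting string can be saved with an `.osm` extension and opened in JOSM for review before conflation.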
Data Transformation Results
CSV ready to be processed with csv2osm: Dataset-masserie_csv2osm.csv
which, converted with csv2osm, produces this OSM file: masserie_clean_csv2osm.osm
Alternatively, the OSM file obtained directly from the script: masserie_clean.csv.osm
Note: in this case a customized version of csv2osm was used: csv2osm_custom
Data Merge Workflow
Team Approach
The import is carried out by a single person.
Workflow
* Clean each dataset and convert it to an OSM file as described above.
* Use JOSM to merge the POIs with existing OSM data (conflation).
* Update this wiki page with each new dataset and its planned tags, and report the progress of previous uploads.
* Inform the community when the upload is done.
Conflation
This step is done with the JOSM Conflation plugin. If a feature already exists in OSM, only the website information from the dataset is taken into account.
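The matching rule above can be illustrated with a simplified sketch. It does not reproduce how the JOSM Conflation plugin works internally; it assumes a match means the same name (case-insensitive) within a hypothetical 50 m radius, measured with the haversine formula, and that for a match only a missing `website` tag is copied.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Poi:
    lat: float
    lon: float
    tags: dict[str, str] = field(default_factory=dict)


def distance_m(a: Poi, b: Poi) -> float:
    """Great-circle distance in metres (haversine formula)."""
    r = 6_371_000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(h))


def conflate(imported: list[Poi], existing: list[Poi],
             max_distance_m: float = 50.0) -> tuple[list[Poi], list[Poi]]:
    """Split imported POIs into new features and website-only updates.

    Matched existing POIs are modified in place (website added) and
    returned in the updates list.
    """
    new: list[Poi] = []
    updates: list[Poi] = []
    for poi in imported:
        name = poi.tags.get("name", "").casefold()
        match = next((e for e in existing
                      if e.tags.get("name", "").casefold() == name
                      and distance_m(poi, e) <= max_distance_m), None)
        if match is None:
            new.append(poi)  # not in OSM yet: candidate for upload
        elif "website" in poi.tags and "website" not in match.tags:
            match.tags["website"] = poi.tags["website"]
            updates.append(match)
    return new, updates
```

The radius and the name comparison are the main tuning points; in practice each match should still be reviewed in JOSM before upload.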