GNS Import for Yemen and Others
This wiki page explains the import of 'GEOnet Names Server (GNS)' data by user حبيشان for several countries, including Yemen, Saudi Arabia, Chad, Djibouti, Sudan and Mauritania.
Goals
The main goals of importing 'GEOnet Names Server (GNS)' data are:
- Adding new names.
- Correcting previous imports (in the case of Yemen and Saudi Arabia).
Import Data
Background
Data source site: https://geonames.nga.mil/gns/html/
Data license: Public
As stated at https://geonames.nga.mil/gns/html/, there are no licensing requirements or restrictions in place for the use of the GNS data.
ODbL Compliance verified: yes
OSM Data Files
The data files for the imported countries are available at https://geonames.nga.mil/gns/html/namefiles.html. Each compressed country file contains several CSV data files; their structure is explained at https://geonames.nga.mil/gns/html/gis_countryfiles.html.
Import Type
The data was imported by automatically generating a JOSM file, opening it in JOSM and uploading it via JOSM.
Data Preparation
Data Reduction & Simplification
- First, I prepared a local PostgreSQL server with the PostGIS, fuzzystrmatch and hstore extensions.
- I downloaded the OSM country data from http://download.geofabrik.de.
- I downloaded the GNS country data from https://geonames.nga.mil/gns/html/namefiles.html.
- I imported the OSM data into a separate table in the local PostgreSQL database.
- I imported the GNS data into another separate table in the same database.
- GNS data has a separate record for every name, and many features have more than one name, so I split the GNS table into two tables: one for the features (called 'gns_f') and one for the names (called 'gns_n'). In GNS data a feature is identified by the 'UFI' column and a name by the 'UNI' column.
- I created another table for the selected names of each feature (called 'names'). In this table I used the OSM name tags ('name', 'name:ar', 'name:en', 'alt_name', ...) to organize the names and prepare them for import into OSM.
- All names in the GNS data are imported into the 'names' table. I put the highest-ranked name (by the 'NAME_RANK' column in the GNS data) in the 'name' tag and the others in the 'alt_name' tag, separated by semicolons.
- If a name is Arabic but written in Latin script, I try to rewrite it in Arabic script with a simple function. In Arabic-speaking countries, if the function returns a complete Arabic name, I put it in the main 'name' tag.
- After that, I try to link the existing OSM data with the GNS data to avoid duplicating features.
- In the first stage, I link by location and name. Features linked in one stage do not go on to the next stage.
- In the following stages, I link by name and by location within ±0.0001 degrees of latitude and longitude, then ±0.0002, and so on up to ±0.0004.
- In some cases I use fuzzystrmatch to match names while linking.
- Next, I created a JOSM XML file that modifies the existing OSM data, linking each element to its GNS id and adding new alt_names, and I uploaded the modified elements.
- Finally, I created a new JOSM XML file with the new elements that were not present in the current OSM data, and uploaded it.
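The transliteration step above is described only as "a simple function". One possible reading, shown here as a hypothetical sketch, is a word-by-word lookup that gives up unless every word is known, so that only complete Arabic names are ever returned. The `LATIN_TO_ARABIC` table and the `to_arabic` function are illustrative assumptions, not the code used in this import.

```python
# Hypothetical word list for illustration; the import's own function is not published.
LATIN_TO_ARABIC = {
    "wadi": "وادي",
    "jabal": "جبل",
    "bir": "بئر",
    "dahr": "ظهر",
}


def to_arabic(latin_name: str):
    """Return the name in Arabic script, or None if any word is unknown.

    Only a complete conversion is returned, so a partial result can never
    replace the main 'name' tag.
    """
    words = latin_name.replace("-", " ").split()
    arabic = []
    for word in words:
        mapped = LATIN_TO_ARABIC.get(word.lower())
        if mapped is None:
            return None  # unknown word: keep the Latin name instead
        arabic.append(mapped)
    return " ".join(arabic) if arabic else None


print(to_arabic("Wadi Dahr"))    # every word is in the table
print(to_arabic("Jabal Sabir"))  # "Sabir" is unknown, so None
```

Returning None instead of a half-converted string keeps the "full Arabic name only" rule explicit at the call site.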
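The staged linking described in the steps above can be sketched in Python. This is a simplified, in-memory stand-in for what the import did inside PostgreSQL: the `Feature` class, the `link_stages` function and the `max_edits` threshold are hypothetical, and the pure-Python `levenshtein` only mirrors the `levenshtein()` function that fuzzystrmatch provides. The first stage requires identical coordinates; each later stage widens the window by 0.0001 degrees up to 0.0004, and anything linked earlier is skipped.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    id: str                                  # OSM element id or GNS UFI
    lat: float
    lon: float
    names: set = field(default_factory=set)  # every known name of the feature


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (stand-in for fuzzystrmatch's levenshtein())."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def names_match(a: set, b: set, max_edits: int) -> bool:
    """True if any pair of names is within max_edits edits (case-insensitive)."""
    return any(levenshtein(x.lower(), y.lower()) <= max_edits for x in a for y in b)


def link_stages(gns, osm, tolerances=(0.0, 0.0001, 0.0002, 0.0003, 0.0004), max_edits=0):
    """Link GNS features to OSM features, widening the location window per stage.

    Returns {gns_id: osm_id}. Features linked in one stage are not
    considered again in later stages.
    """
    links = {}
    taken = set()
    for tol in tolerances:
        for g in gns:
            if g.id in links:
                continue
            for o in osm:
                if o.id in taken:
                    continue
                close = abs(g.lat - o.lat) <= tol and abs(g.lon - o.lon) <= tol
                if close and names_match(g.names, o.names, max_edits):
                    links[g.id] = o.id
                    taken.add(o.id)
                    break
    return links


gns = [Feature("g1", 15.0, 44.0, {"Sanaa"}),
       Feature("g2", 15.00015, 44.0, {"Wadi Dahr"}),
       Feature("g3", 16.0, 45.0, {"Marib"})]
osm = [Feature("n1", 15.0, 44.0, {"Sanaa"}),
       Feature("n2", 15.0, 44.0, {"Wadi Dhar"}),
       Feature("n3", 16.01, 45.0, {"Marib"})]
print(link_stages(gns, osm, max_edits=2))
```

With `max_edits=2`, "g1" links in the first stage, "g2" links at ±0.0002 through the fuzzy name match, and "g3" stays unlinked because it is about 0.01 degrees away, so it would be imported as a new feature. The nested loops are quadratic; inside PostGIS the same idea would use a spatial index instead.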
Tagging Plans
Feature Tags
These are the main Feature Designation Codes ('DSG') and the equivalent OSM tags I used in the import:
GNS DSG code | GNS DSG name | OSM tags | Notes |
---|---|---|---|
ADM1 | first-order administrative division | place=state, or another tag depending on the country's wiki page | Most of these features were already present in OSM before the import |
ADM2 | second-order administrative division | place=district in Yemen | Not imported in other countries |
ADM3 | third-order administrative division | place=locality in Yemen | Not imported in other countries |
ADMF | administrative facility | office=government; if the English name contains "ministry", also government=ministry | Only imported in Yemen |
AIRF, AIRB | airfield, airbase | aeroway=airfield + military=airfield | |
AIRP | airport | aeroway=aerodrome | |
AIRQ | abandoned airfield | aeroway=airfield + military=airfield + disused=yes | |
AREA | area | place=locality | |
BAY | bay | natural=bay | |
BCH | beach | natural=beach | |
BUTE | butte(s) | natural=peak | |
CAPE | cape | natural=cape | |
CAVE | cave(s) | natural=cave_entrance | |
CLF | cliff(s) | natural=cliff | |
CMP | camp(s) | place=camp | |
CMTY | cemetery | landuse=cemetery | |
CTHSE | courthouse | amenity=courthouse | |
DAM | dam | waterway=dam | |
DPR | depression(s) | place=locality | |
DUNE | dune(s) | natural=dune | |
FRM | farm | place=farm | |
GAP | gap | natural=saddle | |
GATE | gate | historic=city_gate | |
GRGE | gorge(s) | natural=gorge | |
HDLD | headland | natural=peninsula | |
HLL | hill | natural=peak | |
HLLS | hills | natural=peak | |
HSE | house(s) | building=house | |
HSP | hospital | amenity=hospital + healthcare=hospital | |
HSPC | clinic | amenity=clinic + healthcare=clinic | |
HSTS | historical site | historic=castle if the name contains "hisn", otherwise historic=archaeological_site | |
HTL | hotel | tourism=hotel | |
INLT | inlet | natural=bay | |
ISL | island | place=island | |
ISLS | islands | place=archipelago | |
LCTY | locality | place=locality | |
LK | lake | natural=water + water=lake | |
LKI | intermittent lake(s) | natural=water + water=lake + intermittent=yes | |
LTHSE | lighthouse | man_made=lighthouse | |
MKT | market | amenity=marketplace | |
MRSH | marsh(es) | natural=wetland + wetland=marsh | |
MSQE | mosque | amenity=place_of_worship + religion=muslim | |
MT | mountain | natural=peak | |
MTS | mountains | natural=mountain_range | |
MUS | museum | tourism=museum | |
OAS | oasis(-es) | place=hamlet | |
PASS | pass | mountain_pass=yes | |
PEN | peninsula | natural=peninsula | |
PK | peak | natural=peak | |
PLAT, PLN | plateau, plain(s) | place=locality | |
PND | pond(s) | natural=water + water=pond | |
PNDI | intermittent pond(s) | natural=water + water=pond + intermittent=yes | |
POOL | pool(s) | natural=water + water=pond | |
PP | police post | amenity=police | |
PPL | populated place | place=hamlet | In Yemen, places for which I have population data are classified as described on the Yemen wiki page. In other countries I have no population data; I assume that cities, towns and villages were already present in OSM, so I tag these as hamlets |
PPLF | farm village | place=hamlet | |
PPLA | capital of a first-order administrative division | place=town | I do not tag these as cities because I assume all cities were already present in OSM |
PPLA2 | capital of a second-order administrative division | place=town | |
PPLL | populated locality | place=hamlet | |
PPLQ | abandoned populated place | place=locality | |
PPLX | section of populated place | place=suburb | |
PRK | park | leisure=park | |
PROM | promontory(-ies) | natural=peak | |
PRN | prison | amenity=prison | |
PT | point | natural=cape | |
RESF | forest reserve | leisure=nature_reserve | |
RF | reef(s) | natural=reef | |
RGN | region | place=region | |
RK, RKS | rock, rocks | natural=rock | |
RSVT | water tank | man_made=storage_tank + content=water | |
RUIN | ruin(s) | historic=ruins | |
RVN | ravine(s) | natural=gorge | |
SALT | salt area | landuse=salt_pond | |
SAND | sand area | natural=sand | |
SBKH | sabkha(s) | landuse=salt_pond | |
SCH | school | amenity=school | |
SCHC | college | amenity=college | |
SCRP | escarpment | natural=cliff | |
SHOL | shoal(s) | natural=shoal | |
SPNG | spring(s) | natural=spring | |
SPNT | hot spring(s) | natural=hot_spring | |
SQR | square | place=square | |
STDM | stadium | leisure=stadium | |
STM, STMX | stream | waterway=stream | |
STMI | intermittent stream | waterway=stream + intermittent=yes | |
STRT | strait | natural=strait | |
TMB | tomb(s) | historic=tomb | |
TRB | tribal area | place=locality | |
TRGD | interdune trough(s) | place=locality | |
UPLD | upland | place=locality | |
VAL | valley | natural=valley | |
VLC | volcano | natural=volcano | |
WAD | wadi | natural=valley + fixme="It is better to draw the Wadi as way (Note that the node is in mouth of wadi), copy tags from node, and tag it as waterway=river and intermittent=yes clean natural=valley tag" | |
WADX | section of wadi | natural=valley + fixme="It is better to draw the Wadi as way (Note that the node is in mouth of wadi), copy tags from node, and tag it as waterway=river and intermittent=yes clean natural=valley tag" | |
WLL | well | man_made=water_well | |
WLLQ | abandoned well | man_made=water_well + disused=yes | |
WTLD | wetland | natural=wetland | |
WTRH | waterhole(s) | natural=water + water=pond | |
Some tags were updated after discussion on the imports mailing list.
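The mapping in the tagging table can be expressed as a lookup plus the two conditional rules (ADMF and HSTS). This sketch covers only a few rows; `DSG_TAGS` and `tags_for` are hypothetical names, and the case-insensitive substring test for "ministry" and "hisn" is an assumption about how those rules were applied. Country-specific rules (Yemen-only codes, the Yemen population classes for PPL) are not modelled.

```python
# A subset of the tagging table; the remaining rows follow the same pattern.
DSG_TAGS = {
    "AIRP": {"aeroway": "aerodrome"},
    "HSP": {"amenity": "hospital", "healthcare": "hospital"},
    "LKI": {"natural": "water", "water": "lake", "intermittent": "yes"},
    "MSQE": {"amenity": "place_of_worship", "religion": "muslim"},
    "PPL": {"place": "hamlet"},
    "WLL": {"man_made": "water_well"},
    "WLLQ": {"man_made": "water_well", "disused": "yes"},
}


def tags_for(dsg: str, latin_name: str = ""):
    """Return OSM tags for a GNS DSG code, or None if the code is not mapped.

    latin_name is the romanized/English name used by the conditional rules.
    """
    lowered = latin_name.lower()
    if dsg == "ADMF":
        tags = {"office": "government"}
        if "ministry" in lowered:
            tags["government"] = "ministry"
        return tags
    if dsg == "HSTS":
        if "hisn" in lowered:
            return {"historic": "castle"}
        return {"historic": "archaeological_site"}
    tags = DSG_TAGS.get(dsg)
    # Return a copy so callers can add name tags without touching the table.
    return dict(tags) if tags is not None else None


print(tags_for("ADMF", "Ministry of Health"))
print(tags_for("HSTS", "Hisn al Ghurab"))
```

Unmapped codes return None, so features with designations outside the table can be left out of the import instead of receiving a guessed tag.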
Referencing Tags
I added the following tags to all imported and linked nodes:
GNS:id
This key is set to the 'UFI' column of the GNS table. It is important to include this tag for possible future revision and to avoid reimporting the same feature into OSM.
GNS:dsg_code
This key is set to the 'DSG' column of the GNS CSV table. It is important to include this tag for possible future revision, and so that other users can easily correct any tagging mistakes. The 'DSG' column contains a 2-5 character code abbreviating the feature type, so I also included:
GNS:dsg_name
This key is set to the designation name of the feature (for example 'well' for WLL), which is easier to understand than the 'DSG' code. I took the names from https://geonames.nga.mil/gns/html/rest/lookuptables.html#Designation%20Codes
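As a sketch of how a new node carrying these referencing tags could be written into a JOSM-readable .osm file, the example below uses Python's standard `xml.etree.ElementTree`. The function `add_gns_node`, the generator string, the UFI and the coordinates are made up for illustration. New objects get negative ids, which JOSM treats as not yet uploaded; elements in the file that modifies existing OSM data would instead keep their real id and version and carry `action="modify"`.

```python
import xml.etree.ElementTree as ET


def add_gns_node(osm_root, node_id, lat, lon, tags, ufi, dsg, dsg_name):
    """Append a new node with its feature tags plus the GNS referencing tags."""
    node = ET.SubElement(osm_root, "node", id=str(node_id),
                         lat=f"{lat:.7f}", lon=f"{lon:.7f}")
    all_tags = dict(tags)
    all_tags["GNS:id"] = str(ufi)        # 'UFI' column
    all_tags["GNS:dsg_code"] = dsg       # 'DSG' column
    all_tags["GNS:dsg_name"] = dsg_name  # designation name from the lookup table
    for key, value in all_tags.items():
        ET.SubElement(node, "tag", k=key, v=value)
    return node


root = ET.Element("osm", version="0.6", generator="gns-import-sketch")
# Hypothetical values: negative id for a new node, made-up UFI and position.
add_gns_node(root, -1, 15.3547, 44.2066,
             {"man_made": "water_well", "name": "Bir Example"},
             123456, "WLL", "well")
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Saving it with `ET.ElementTree(root).write("gns_new.osm", encoding="utf-8", xml_declaration=True)` gives a file that can be opened and reviewed in JOSM before uploading.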
Changeset Tags
Changesets are tagged with GNS as the source.
Data Transformation
Data Transformation Results
Working with GNS Caveats
How I handled the GNS caveats listed on the GEOnet_Names_Server page:
Once a name is put into the database, it is never removed unless it is an obvious duplicate. This means that there are many, many names that have no modern significance.
GNS data has two columns that help select the best name among the candidates: 'NAME_RANK' and 'NM_MODIFY_DATE'. I selected the highest-ranked and most recent name for the 'name' key, and put all the others in 'alt_name'. For Yemen, I also used the 2004 census data published by the Central Statistical Organisation (cso-yemen.com) to pick the most commonly used names. For Saudi Arabia, I used the list of towns from the 2010 census published at stats.gov.sa to pick the correct names.
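The rank-then-date selection above can be sketched as a sort over one feature's name rows. This assumes a lower 'NAME_RANK' value means a higher-ranked name, that 'NM_MODIFY_DATE' is in YYYY-MM-DD form, and that 'FULL_NAME_RO' holds the name to use; `select_names` is a hypothetical helper, and the census-based corrections for Yemen and Saudi Arabia are not modelled.

```python
from datetime import date


def select_names(name_rows):
    """Pick 'name' and 'alt_name' for one feature from its GNS name rows.

    Assumes a lower NAME_RANK is better; ties go to the most recent
    NM_MODIFY_DATE (YYYY-MM-DD).
    """
    ordered = sorted(
        name_rows,
        key=lambda r: (int(r["NAME_RANK"]),
                       -date.fromisoformat(r["NM_MODIFY_DATE"]).toordinal()),
    )
    main = ordered[0]["FULL_NAME_RO"]
    alts = []
    for row in ordered[1:]:
        candidate = row["FULL_NAME_RO"]
        if candidate != main and candidate not in alts:  # drop duplicates
            alts.append(candidate)
    tags = {"name": main}
    if alts:
        tags["alt_name"] = ";".join(alts)  # OSM uses ';' between values
    return tags


rows = [
    {"FULL_NAME_RO": "Old Name", "NAME_RANK": "1", "NM_MODIFY_DATE": "1993-01-01"},
    {"FULL_NAME_RO": "New Name", "NAME_RANK": "1", "NM_MODIFY_DATE": "2012-05-01"},
    {"FULL_NAME_RO": "Variant", "NAME_RANK": "2", "NM_MODIFY_DATE": "2015-01-01"},
]
print(select_names(rows))
```

Here "New Name" wins because it shares the best rank with "Old Name" but was modified more recently; "Variant" is newer still but lower-ranked, so it goes to 'alt_name'.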
The geographical resolution is often very coarse, from experience features can often be two to three kilometres from their actual locations.
This applies to some older data. I imported it as-is so that other users can fix it later.
Places where people live are generally just classified as "PPL", Populated Place. This can be anything from a city to what is now just a farm house.
See "PPL" in the tagging table above.
There is a small but significant amount of entries that are inaccurate or plain wrong:
mistranscriptions, places that are listed as in one country but actually somewhere else completely.
There are two reasons for this:
- For natural features (seas, mountains, deserts, ...), GNS places the location at the center of the feature, and the feature is listed in every country that contains part of it. For example, the "Rub` al Khali Desert" (https://www.openstreetmap.org/relation/12037787) lies in Saudi Arabia, Yemen, Oman and the UAE, but its center is in Saudi Arabia.
- Inaccurate country boundaries.
* If you use not only the approved data, but also spelling variants and so on, you have often more than one node with the same coordinates, containing each of them another spelling variant. You might then want to download a perl file (and edit it to your needs) which the Russian community used to merge the gns info into one node.
Whoever wrote this caveat did not understand the GNS data structure (see https://geonames.nga.mil/gns/html/gis_countryfiles.html). There is an identifier for the feature ("UFI") and an identifier for the name ("UNI"). All columns relate to the feature except UNI, NT, LC, SHORT_FORM, GENERIC, SORT_NAME_RO, FULL_NAME_RO, FULL_NAME_ND_RO, SORT_NAME_RG, FULL_NAME_RG, FULL_NAME_ND_RG, NAME_RANK, NAME_LINK, TRANSL_CD and NM_MODIFY_DATE, which relate to the name. Grouping the rows by UFI therefore gives one node per feature carrying all of its names.
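Because every row carries both a UFI and a UNI, the rows can be split into one feature record per UFI and one name record per row, which is the 'gns_f'/'gns_n' split used in this import. A minimal in-memory sketch, assuming the rows have already been read as dictionaries keyed by the GNS column names; `split_rows` is a hypothetical helper, and the name columns are the ones listed above.

```python
# Columns that describe a name (UNI); every other column describes the feature (UFI).
NAME_COLUMNS = {
    "UNI", "NT", "LC", "SHORT_FORM", "GENERIC", "SORT_NAME_RO", "FULL_NAME_RO",
    "FULL_NAME_ND_RO", "SORT_NAME_RG", "FULL_NAME_RG", "FULL_NAME_ND_RG",
    "NAME_RANK", "NAME_LINK", "TRANSL_CD", "NM_MODIFY_DATE",
}


def split_rows(rows):
    """Split GNS rows into features keyed by UFI and a list of name records."""
    features = {}  # UFI -> feature columns (the 'gns_f' table)
    names = []     # one record per row/UNI (the 'gns_n' table)
    for row in rows:
        ufi = row["UFI"]
        # Feature columns repeat on every name row; keep the first copy.
        if ufi not in features:
            features[ufi] = {k: v for k, v in row.items() if k not in NAME_COLUMNS}
        name = {k: v for k, v in row.items() if k in NAME_COLUMNS}
        name["UFI"] = ufi  # link the name back to its feature
        names.append(name)
    return features, names


rows = [
    {"UFI": "-1", "UNI": "10", "FULL_NAME_RO": "A", "DSG": "WLL", "LAT": "15.0"},
    {"UFI": "-1", "UNI": "11", "FULL_NAME_RO": "B", "DSG": "WLL", "LAT": "15.0"},
    {"UFI": "-2", "UNI": "12", "FULL_NAME_RO": "C", "DSG": "PPL", "LAT": "16.0"},
]
features, names = split_rows(rows)
print(len(features), len(names))
```

Two rows sharing a UFI collapse into one feature with two names, which is why no separate merge script is needed to avoid stacked nodes with spelling variants.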