Import/Catalogue/Australian Bureau Of Meteorology/Import
In case upon reading this you have a feeling you have seen it all somewhere else, the bulk of this page was taken from correspondence on the talk-au mailing list.
Background
Back on 21st May, 2010 user John Smith (a.k.a. User:Delta foxtrot2) imported a raft of weather monitoring stations with details derived from the NOAA in changeset 4762601.
Subsequently several mappers noticed the location coordinates of many sites described by that data appear to have been randomly offset from their true location. There appears to be little in common with these offsets other than the fairly frequent occurrence of the true latitude or longitude being apparently truncated to two decimal places.
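The truncation pattern is easy to test for mechanically. The sketch below is illustrative only: the function names and sample coordinates are made up, and it assumes coordinates are held as decimal strings so that binary floating point does not disturb the comparison.

```python
from decimal import Decimal, ROUND_DOWN

def truncate_2dp(value):
    """Cut a decimal-degree string to two decimal places, without rounding."""
    return Decimal(value).quantize(Decimal("0.01"), rounding=ROUND_DOWN)

def looks_truncated(imported, reference):
    """True if the imported value equals the reference cut to two places."""
    return Decimal(imported) == truncate_2dp(reference)

# Made-up values for illustration, not taken from either data set.
print(looks_truncated("-28.83", "-28.8353"))  # True
print(looks_truncated("-28.84", "-28.8353"))  # False
```

For scale, an error of up to 0.01 of a degree of latitude corresponds to roughly 1.1 km on the ground.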
Proposal
After consulting John, the two of us located apparently far more reliable positional data, sourced from the Australian Bureau of Meteorology, for a near-superset of the sites described. At his prompting I contacted BoM personnel regarding copyright issues, and subsequently received this response from one Ian Muirhead:
> Thanks for the email asking permission to generate maps for public access. It is acceptable to display this information on a map provided that you abide by the conditions specified on the Bureau's copyright page: http://www.bom.gov.au/other/copyright.shtml
>
> There are a few caveats associated with these station locations which you should be aware of before using them. For example, station moves over history are not displayed, there is no listing of the reference datum associated with geographic location, and the 4 decimal point precision does not necessary imply an equivalent certainty in the metadata. Following your request, we are going to include this information in the data file, but I will email you a copy once it has been produced (hopefully within the next couple of weeks).
With some clarification and refinement from the talk-au list (see below), I take this to mean the Bureau grants permission to use data not directly related to their core affairs (i.e. numerical and interpretive meteorological reporting) such as ftp://ftp.bom.gov.au/anon2/home/ncc/metadata/sitelists/stations.zip, provided that:
- Appropriate attribution is given to the Australian Bureau of Meteorology,
- It is stated that permission has been granted to use said data, and
- A reference is included to the Bureau's copyright page: http://www.bom.gov.au/other/copyright.shtml.
Possible Automation
Whilst awaiting the Bureau's response to my request for permission to use their data, I created an early form of the script reproduced below, with a view to providing some degree of automation. The key opportunity for initially merging nodes already entered into OpenStreetMap revolved around the existence within many of the records of the "wmo:id" tag. Of course there was a catch: elsewhere in the Bureau website lie buried several warnings similar to:
Some stations also possess a World Meteorological Organization (WMO) station number. The WMO number is different to the Bureau of Meteorology number. It also uniquely specifies a station at any given time but can be reassigned to another station if the new station takes priority in the global reporting network. Only selected stations will have a WMO number. Significant stations may maintain their WMO number for many decades.
- This example taken from "Notes on these metadata - Station Number", bottom of fourth page.
This means that matching records based on WMO station identifier alone may not lead to expected results.
Testing The Water
Upon request from the talk-au list I prepared a sample bag of changes, including at least one example from each category I intended to handle.
Feedback and Recommendations
Having a few other eyes view the samples described below certainly helped:
- Change/improve attribution style.
  - The former "source:note" attribution is now performed by a pair of tags: a descriptive "attribution" tag (which varies slightly according to merge case), and a fixed "attribution:url" tag, which always directs to the Bureau of Meteorology copyright web page.
- Make use of bounding boxes.
  - The XAPI extract of potentially matching pre-existing OSM monitoring_station nodes is now performed within a bounding box determined dynamically, based upon the geographical spread of the stations listed by the Bureau. This mainly entailed rearranging the former internal order of operations within the script and adding a scan/sort/extract of the extrema on each of the latitude and longitude columns of the Bureau's station list (which covers continental Australia, Antarctica, and neighbouring island chains as far north as, e.g., the Marshall Islands).
- Exclude obsolete stations.
  - One of the sample sites I chose for demonstration turned out to have closed in 1973. The Bureau data showed this, but the original script had omitted to check for it. It does now.
- Add tagging to each station to:
  - tag instruments present at each site
    - Having already investigated this, I consider non-copyright-invasive methods of performing this update automatically not reliable, with the sole exception of whether a given site possesses a barometer. (Script modified to insert a "weather:barometer=yes" tag whenever the Bureau station list includes a "Barometer Height" entry and there is no former "weather:barometer" tag present.)
  - warn mappers they need to add details of site instrumentation.
    - This suggestion was rejected on the basis that OpenStreetMap culture encourages mappers to update details based upon observation in any case. Why preach to the converted?
- Concern that "<site-name>" and "<site-name> AWS" entries may be duplicates.
  - Short answer is some are definitely not, and some may in fact be so. In view of the "fixme=not_reviewed" regimen established in the original NOAA load, I feel the inclusion worthwhile, even if subsequent observations result in one entry being deleted again later.
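Several of the changes above (dropping closed stations, deriving the bounding box, and the revised tagging) can be sketched in a few lines. This is an illustration in Python, not the helper script itself: the record fields ("closed", "barometer_height", "lat", "lon"), the function names, the sample records and the attribution wording are all assumptions made for the example. Only the tag keys and the copyright URL come from the discussion above.

```python
COPYRIGHT_URL = "http://www.bom.gov.au/other/copyright.shtml"

def open_stations(stations):
    """Drop stations the list marks as closed (hypothetical 'closed' field)."""
    return [s for s in stations if not s.get("closed")]

def bounding_box(stations):
    """Return (min_lon, min_lat, max_lon, max_lat) over all stations."""
    lats = [s["lat"] for s in stations]
    lons = [s["lon"] for s in stations]
    return (min(lons), min(lats), max(lons), max(lats))

def merged_tags(station, existing_tags, attribution_text):
    """Add the attribution pair, plus a barometer tag where justified."""
    tags = dict(existing_tags)
    tags["attribution"] = attribution_text
    tags["attribution:url"] = COPYRIGHT_URL
    # Only assert a barometer when the station list gives a barometer
    # height and the node carries no earlier "weather:barometer" tag.
    has_height = station.get("barometer_height") is not None
    if has_height and "weather:barometer" not in tags:
        tags["weather:barometer"] = "yes"
    return tags

# Made-up records for illustration only.
stations = [
    {"name": "A", "lat": -28.8, "lon": 153.5, "barometer_height": 2.0},
    {"name": "B", "lat": -33.3, "lon": 151.6, "barometer_height": None},
    {"name": "C", "lat": -10.0, "lon": 142.0, "closed": 1973},
]
current = open_stations(stations)
print(bounding_box(current))  # (151.6, -33.3, 153.5, -28.8)
print(merged_tags(current[0], {}, "placeholder attribution text"))
```

The box is returned in west, south, east, north order, which is the order OSM bounding-box queries conventionally expect.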
The Helper Script
This script was used to assist in the classifying and merging of the Bureau station list with data already present in OpenStreetMap. In so far as possible, it produces output acceptable for import into JOSM (which is used as the actual conduit for importation of the results). It also attempts a cruder pre-formatting of records considered "too unusual" to be automatically handled, with a view to assisting manual importation at a later date.
The script was written with this requirement foremost: it does not perform any update automatically whatsoever, but tries to indicate the degree of safety with which each category of import could be applied.
It is structured to return several XML-format (well, good enough for JOSM to accept anyway!) output files, each containing a category of merged or formatted data, reflecting a different degree of confidence in the results of the particular merge method attempted. Any given BoM station appears in only one place after splitting; but there is a catch-all (case 6, below) which may associate multiple OSM nodes with a potential matching BoM station.
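For readers unfamiliar with what JOSM will accept, the following is a minimal sketch of one such output file for newly inserted stations. It is written in Python for illustration and is not the helper script; the function name, the generator string and the sample station are made up. It relies on the JOSM file convention, as I understand it, that objects not yet on the server carry negative ids.

```python
import xml.etree.ElementTree as ET

def josm_document(new_stations):
    """Build an OSM XML string holding one new node per station."""
    root = ET.Element("osm", version="0.6", generator="bom-helper-sketch")
    for index, station in enumerate(new_stations, start=1):
        # Negative ids mark objects that do not yet exist on the server.
        node = ET.SubElement(
            root, "node",
            id=str(-index),
            lat=str(station["lat"]),
            lon=str(station["lon"]),
        )
        for key, value in sorted(station["tags"].items()):
            ET.SubElement(node, "tag", k=key, v=value)
    return ET.tostring(root, encoding="unicode")

# Made-up station for illustration only.
sample = [{"lat": -27.99, "lon": 153.0, "tags": {"name": "Example Station"}}]
print(josm_document(sample))
```

Writing one such document per case keeps each confidence level in its own file, so each can be reviewed and uploaded separately.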
There are five cases fully handled, and of course the sixth alluded to above which requires human inspection and assembly if possible. These are:
- Case 1: Safest, most reliable. BoM station carries WMO id and name both matching OSM node, which has been untouched since the original NOAA bulk load. Example: Ballina Airport AWS.
- Case 2: Less reliable. Similar to case 1, but names do not match. Example: Evans Head Bombing Range.
- Case 3: Fair reliability. Lack of matching WMO id in OSM suggests completely new station, sourced solely from BoM data. Example: Beaudesert Drumley Street.
- Case 4: Care required. Similar to case 1, however last update to node is from a different changeset to the NOAA load. Example: Hillston Airport.
- Case 5: Weak. Similar to case 3, but missing WMO information denies ability to claim this entry is uniquely new. Example: Alstonville Post Office.
- Case 6: All bets are off. All stations remaining after the other cases have been eliminated. However, fairly simple manual inspection reveals a valid-looking example: Norah Head Lighthouse.
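One possible reading of these six cases as a decision procedure is sketched below. It is in Python for illustration and is not the helper script: the record layout, the name comparison and the treatment of ambiguous matches are assumptions. Only the NOAA changeset number is taken from this page.

```python
NOAA_CHANGESET = 4762601  # the original NOAA bulk load

def same_name(a, b):
    """Loose name comparison (an assumption; the real rule is not stated)."""
    return a.strip().lower() == b.strip().lower()

def classify(station, osm_by_wmo):
    """Return the case number (1 to 6) for one BoM station.

    osm_by_wmo maps a WMO id to the list of OSM nodes carrying it.
    """
    wmo = station.get("wmo_id")
    if not wmo:
        return 5              # no WMO id: cannot prove the station is new
    candidates = osm_by_wmo.get(wmo, [])
    if not candidates:
        return 3              # WMO id unknown to OSM: treat as new
    if len(candidates) > 1:
        return 6              # ambiguous match: needs human inspection
    node = candidates[0]
    untouched = node["changeset"] == NOAA_CHANGESET
    names_match = same_name(node["name"], station["name"])
    if untouched and names_match:
        return 1
    if untouched:
        return 2
    if names_match:
        return 4
    return 6                  # everything else falls into the catch-all

# Made-up records for illustration only.
osm = {"11111": [{"name": "Example AWS", "changeset": NOAA_CHANGESET}]}
print(classify({"wmo_id": "11111", "name": "Example AWS"}, osm))  # 1
print(classify({"wmo_id": "22222", "name": "Elsewhere"}, osm))    # 3
print(classify({"wmo_id": None, "name": "Post Office"}, osm))     # 5
```

Because WMO numbers can be reassigned, a match on the id alone is never treated here as more than case 2.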
To anticipate complaint: yes, I could have written the script in Perl or something else. But I didn't. It works. It is not wrong, it is just differently right. I could have done it in TCL or CLIST just to be a complete pain, so be nice.
Progress
Planned Implementation Date
In the absence of further objection I intended to proceed with this scheme on Monday, 21st June, 2010.
Last-Minute Glitches
As you may deduce, the import did not go quite as smoothly as I might have hoped! Nothing really more awkward than:
- JOSM learning curve - multiple new node insertions have to be performed in "upload each object individually" mode.
- The BoM uses the quaint colonial throwback convention of allocating locations East of longitude 180°, but West of the International Date Line, a longitude value greater than 180 (e.g. Papeete: S17.5333° E210.4000°). XAPI import responds to such insertion attempts with an enigmatic "The node is outside this world" error.
- I had to be lucky enough to have somebody correct a station name between the time of OSM extraction and return of updated data.
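The longitude problem in the second item can be avoided by wrapping values into the -180 to +180 range before upload. The sketch below is a Python illustration with a made-up function name, not the fix actually applied during the import.

```python
def to_osm_longitude(bom_longitude):
    """Map a longitude expressed as 0..360 degrees East into -180..180."""
    if not -180.0 <= bom_longitude <= 360.0:
        raise ValueError(f"longitude out of range: {bom_longitude}")
    if bom_longitude > 180.0:
        return bom_longitude - 360.0
    return bom_longitude

# Papeete, as quoted above: E210.4000 becomes about W149.6.
print(round(to_osm_longitude(210.4), 4))  # -149.6
print(to_osm_longitude(153.0))            # 153.0
```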
Log of Changeset(s)
- Case 1: Changeset 5029673. Updated 582 nodes.
- Case 2: Changeset 5029686. Updated 185 nodes.
- Case 3:
  - Changeset 5029763. Inserted 31/107 nodes.
  - Changeset 5029888. Inserted 8/107 nodes.
  - Changeset 5029953. Inserted 18/107 nodes.
  - Changeset 5030066. Inserted 1/107 nodes.
  - Changeset 5030094. Inserted 49/107 nodes.
- Case 4: Changeset 5030124. Updated 42 nodes.
- Case 5: Changeset 5030372. Inserted 7,160 nodes.
- Case 6: Changeset 5032184. Inserted 1 node, and updated 13 nodes.