Proposal:Multivalued Keys
Multivalued Keys | |
---|---|
Proposal status: | Draft (under way) |
Proposed by: | csmale |
Draft started: | 2016-01-27 |
Proposal
There appears to be support for defining a standardised way of supporting multi-valued keys in OSM. There are frequent discussions about the so-called semicolon syntax, and possible alternatives, which always seem to get stuck in the mud.
Please keep this debate objective. It's about improving the whole of OSM, not just fixing your personal short-term challenge. This page is not about whether or not we need multi-valued keys; it is about how to reflect real-world constructions in OSM. Anyone who wants to dispute whether it is worth having this discussion is kindly and respectfully requested to take that discussion to a different forum - not on this page please!
Ordering
In this proposal, I would suggest we assume that the lists are unordered. The syntax for representing a list of values confers no special status on the members of the list depending on their position. Of course in the real world, one may suggest that one of the multiple values is "leading" or "primary", but that will vary according to the particular attribute under discussion.
Missing Values
Is there a requirement for the list-of-values to accommodate missing/empty values? Is it conceptually possible to have a situation where value A and value C exist, and we need to indicate that value B "goes here" but doesn't exist right now although it might exist in the future?
If the list is intrinsically unordered, we can reorder the elements to put the "missing" elements at the end of the list anyway. So my current position is that it is not necessary to accommodate missing values explicitly.
Necessity of multivalued keys and tradeoffs
In many cases it is impossible, or extremely hard, for a machine to select a single value, so the mapper is expected to select one. For example, for http://www.openstreetmap.org/way/297445670#map=18/49.41306/21.98244, surface=compacted is far better than the multivalue surface=asphalt;earth;sand;gravel;grass, which would be about as hard to process as the descriptive surface=very_old_asphalt_mostly_completely_crushed_with_grass_and_earth_peaking_through_with_some_potholes_filled_with_earth.
In some cases, like ref=*, values are explicitly multivalued and there is no reasonable method to avoid multivalued keys.
There are also more complicated cases of combined amenities - for example, a shop selling both bicycles and flowers. It may be represented either as two nodes (one with shop=bicycle, a second with shop=florist) or as one node with a multivalued key (like shop=florist;bicycle). Both approaches have serious negative effects and both have their own supporters.
Rationale
OSM uses attributes as key-value pairs, and that is unlikely to change. Broadly speaking it is possible to address this issue in two domains: the Value domain and the Key domain. In this diagram I have attempted to illustrate the approaches that are currently in use and/or under discussion.
Subscript Syntax
This type of syntax will be instantly recognisable to many people. The instance of a key is differentiated from another instance on the same object by the "subscript", which indicates its position within the list of values. In order to be able to locate the subscript within the key, it is written between pairs of characters, usually some kind of bracket like [] or ().
It is less likely to be clear to people unfamiliar with programming.
E.g. shop[1]=supermarket, shop[2]=bakery
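As a rough illustration of how a data consumer might read this syntax, here is a minimal Python sketch. It assumes square brackets and a positive integer subscript; the names `SUBSCRIPT_KEY` and `collect_subscripted` are made up for this example and are not part of any existing tool.

```python
import re

# Assumed form: a base key followed by an integer in square brackets.
SUBSCRIPT_KEY = re.compile(r"^(?P<base>[^\[\]]+)\[(?P<index>\d+)\]$")

def collect_subscripted(tags, base):
    """Return the values of base[1], base[2], ... sorted by subscript."""
    found = []
    for key, value in tags.items():
        match = SUBSCRIPT_KEY.match(key)
        if match and match.group("base") == base:
            found.append((int(match.group("index")), value))
    # Sorting by subscript only gives a stable result; it implies no priority.
    return [value for _, value in sorted(found)]

tags = {"shop[2]": "bakery", "shop[1]": "supermarket", "name": "Corner Store"}
print(collect_subscripted(tags, "shop"))  # ['supermarket', 'bakery']
```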
Suffix Syntax
Subject of some recent discussion, this model appends a number to the end of the key to indicate the specific instance.
E.g. shop_1=supermarket, shop_2=bakery
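A parser for this style could look something like the sketch below. It assumes the instance number is always introduced by an underscore at the very end of the key; `SUFFIX_KEY` and `split_suffix` are names invented for the example.

```python
import re

# Assumed form: base key, an underscore, then trailing digits.
SUFFIX_KEY = re.compile(r"^(?P<base>.+)_(?P<index>\d+)$")

def split_suffix(key):
    """Split 'shop_2' into ('shop', 2); keys without a suffix give (key, None)."""
    match = SUFFIX_KEY.match(key)
    if match is None:
        return key, None
    return match.group("base"), int(match.group("index"))

print(split_suffix("shop_2"))  # ('shop', 2)
print(split_suffix("shop"))    # ('shop', None)
```

One consequence of this simple rule is that a key whose ordinary name already ends in an underscore and digits (a hypothetical route_66, say) would be misread as instance 66 of route.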
Hierarchy Syntax
This builds on the current usage of the colon (":") as a delimiter between parts of a key. Sometimes this is called "namespace" usage, whereby a:name refers to a fundamentally different object to b:name, although they are both "names".
E.g. shop:1=supermarket, shop:2=bakery
The most comprehensive example of the use of the colon as a hierarchical delimiter can be found in the seamark tagging.
Note that this is not the same as the "suffix syntax" only with a different character. The delimiter here (currently a colon, but it could be a different character) separates parts of the key (and the list index may not be at the end of the key!), whereas in the "suffix syntax" a more simplistic approach is taken by simply extracting the numeric characters at the end of the key name.
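To make the difference concrete, here is a small Python sketch that splits a key on the delimiter and picks out numeric parts wherever they occur. The function name `parse_hierarchical_key` is an assumption for this example, and treating every all-digit part as a list index is a simplification that a real scheme might not want.

```python
def parse_hierarchical_key(key, delimiter=":"):
    """Return (key without index parts, list of integer indexes found)."""
    names, indexes = [], []
    for part in key.split(delimiter):
        if part.isdecimal():
            indexes.append(int(part))  # a list index, at any position
        else:
            names.append(part)         # an ordinary part of the key
    return delimiter.join(names), indexes

print(parse_hierarchical_key("shop:2"))                  # ('shop', [2])
print(parse_hierarchical_key("seamark:light:1:colour"))  # ('seamark:light:colour', [1])
print(parse_hierarchical_key("name"))                    # ('name', [])
```

The second call uses a key of the shape found in seamark tagging, where the index sits in the middle of the key rather than at the end.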
Delimiter Syntax
The only current possibility in the Value domain. Currently this style is in use with a semicolon as a delimiter. It is often the subject of heated debate, and it has a number of limitations:
- It is not apparent from the key that there are multiple values
- The entire value string is limited to 255 characters, which may prove restrictive
- Special treatment is required when the delimiter character is contained within the data
E.g. ref=A4;7
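The following Python sketch shows one way a program might split and rebuild such values. It is only an illustration: the helper names are invented, empty items are silently dropped (in line with the position on missing values above), and since no escaping rule is defined here, `join_values` simply refuses items that contain the delimiter.

```python
MAX_VALUE_LENGTH = 255  # limit on the entire value string, as noted above

def split_values(value, delimiter=";"):
    """Split a delimited value into its items, dropping empty ones."""
    return [item.strip() for item in value.split(delimiter) if item.strip()]

def join_values(items, delimiter=";"):
    """Join items into one value, rejecting results that cannot be stored."""
    for item in items:
        if delimiter in item:
            raise ValueError(f"item contains the delimiter: {item!r}")
    joined = delimiter.join(items)
    if len(joined) > MAX_VALUE_LENGTH:
        raise ValueError("joined value exceeds 255 characters")
    return joined

print(split_values("A4;7"))        # ['A4', '7']
print(join_values(["A4", "7"]))    # A4;7
```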
A variation is used in the lanes tag family, where a second delimiter ("|") is used in conjunction with the semicolon to provide a sort of two-dimensional array, i.e. a list of lists. The semantics are a bit more complex than that, as the concepts of "missing values" and "ordering" are significant in the context of lane information.
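A list of lists of this kind could be read roughly as follows. This is a simplified Python sketch with an invented function name; unlike a plain multi-valued key, it keeps the position of every lane and keeps empty lanes, since both carry meaning here.

```python
def parse_lanes_value(value):
    """Split on '|' into lanes, then on ';' into the values of each lane."""
    lanes = []
    for lane in value.split("|"):
        # An empty lane stays in place as an empty list.
        lanes.append([part for part in lane.split(";") if part])
    return lanes

print(parse_lanes_value("left||through;right"))
# [['left'], [], ['through', 'right']]
```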
Considerations
Human comprehensibility
It should be clear to a human that there is a list of values, and what the individual values are.
Machine parseability
It must be possible for a machine (i.e. a computer program) to parse the syntax unambiguously.
Extensibility to handle data structures (grouped attributes)
In the future it might be nice to allow data structures to be defined, in which related attributes are grouped together in a form suitable for reuse in multiple contexts. An example is an address, which can be defined as being composed of various fields (house number, street etc). An object may have multiple addresses - postal address, visiting address, head office address etc. The syntax for multiple values should allow for extension in this direction, so that it is syntactically possible to have a list of addresses, for example, or for a data structure to contain a multi-valued item.
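Purely as a thought experiment, the Python sketch below shows how a list of grouped attributes might be read if the hierarchy syntax were extended in this direction. The key shape `addr:1:street` and the function `group_structures` are hypothetical and are not proposed tagging.

```python
from collections import defaultdict

def group_structures(tags, base, delimiter=":"):
    """Collect keys like base:<n>:<field> into one dict per index n."""
    groups = defaultdict(dict)
    prefix = base + delimiter
    for key, value in tags.items():
        if not key.startswith(prefix):
            continue
        index, sep, field = key[len(prefix):].partition(delimiter)
        if sep and index.isdecimal():
            groups[int(index)][field] = value
    return [groups[i] for i in sorted(groups)]

tags = {
    "addr:1:street": "Main Street",
    "addr:1:housenumber": "12",
    "addr:2:street": "Station Road",
    "name": "Example Ltd",
}
print(group_structures(tags, "addr"))
# [{'street': 'Main Street', 'housenumber': '12'}, {'street': 'Station Road'}]
```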
Ease of implementation in data producer programs (e.g. editors)
Ease of interoperability between single and multiple values
It should be easy for data consumers to accommodate both single and multiple values for a given attribute. Where a key has a single value in 99% of the cases it may be thought unreasonable to impose the overhead of the multiple-value syntax on every occurrence.
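One way a data consumer could hide the difference is to always ask for a list. The Python sketch below assumes the hierarchy syntax (key:1, key:2, ...) for the multi-valued case and a plain key for the single-valued case; `values_of` is a name invented for this example.

```python
def values_of(tags, key, delimiter=":"):
    """Return every value of key, from key=... and from key:1, key:2, ..."""
    values = []
    if key in tags:
        values.append(tags[key])  # the ordinary single-valued form
    prefix = key + delimiter
    indexed = []
    for tag_key, value in tags.items():
        suffix = tag_key[len(prefix):]
        if tag_key.startswith(prefix) and suffix.isdecimal():
            indexed.append((int(suffix), value))
    values.extend(value for _, value in sorted(indexed))
    return values

print(values_of({"shop": "supermarket"}, "shop"))
# ['supermarket']
print(values_of({"shop:1": "supermarket", "shop:2": "bakery"}, "shop"))
# ['supermarket', 'bakery']
```

With a helper like this, objects tagged with a single plain value need no special syntax, and code that only cares whether a shop is a bakery can test membership in the returned list in both cases.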
Examples
Please add some real-world examples of things that would benefit from multi-valued keys.
To start the ball rolling, I would like to suggest the shop key. The boundaries between the different types of shop are not as clear as they once were, and shops frequently sell goods from multiple categories. It is currently rather subjective, or at least open to debate, whether "shop X" should be tagged as shop=A or shop=B because they are both at the same time. We should embrace this by providing a framework for this hypothetical shop to peacefully exist in both categories at once.
The following tags have been nominated:
- ref on roads (; as delimiter is widely used and supported by at least some data consumers like OSM Carto and MapQuest Open)
- shop
- cuisine
- destination
- amenity
- constituent parts of a long text (e.g. an inscription on a monument)
Tagging
Applies to
Rendering
Features/Pages affected
Comments
Please comment on the discussion page.