Talk:Taginfo/Parsing the Wiki
Please don't use this page to report bugs or other issues with the taginfo software; they will just get lost. Please report them on the bug tracker at https://github.com/taginfo/taginfo/issues .
Wrong analysis
The following are generally NOT errors, and you should not instruct users to alter the content of the wiki when these are just limitations of the taginfo website, which are not even technically justified! — Verdy_p (talk) 09:06, 26 November 2016 (UTC)
description parameter should only contain plain text
"The description parameter containing the short description of this key, tag, or relation type should only contain plain text, not wiki syntax. This is important so that taginfo, but also other software outside the wiki, can use this text properly."
- This analysis is COMPLETELY wrong.
- A description DOES need to contain basic markup for various languages, and semantic markup such as "code", "br", "sup", "sub", or sometimes even small images/icons/diagrams.
- Drop this. Taginfo should not have any problem with this markup, as the description is really intended to be displayed in HTML (including in the Wiki pane of taginfo).
- If you need plain text in some summary table showing only one line, use HTML code filtering (but be aware that this will break some descriptions, or even some languages: not all text can be represented in HTML as plain text only).
- Nobody wants to drop this basic markup, except the TagList site itself (even though it really does not need this "requirement" for its "wiki" information pane)!!! At the very least you should allow inline markup (including coloring, bold, italic, sub, sup, external links and wikilinks, and line breaks); some descriptions also need numbered and bulleted lists, or symbols not encoded in Unicode, such as small road signs.
- I've seen people dropping markup on the wiki and then creating meaningless descriptions. — Verdy_p (talk) 04:54, 26 November 2016 (UTC)
- One of the longstanding problems with OSM is that there is no "one" description of tags that everybody can use in their software. Most software dealing with OSM tags has its own description for each tag (and also needs translations of it into every language). This has often been seen as a problem, and many people have asked for a single source they can use. The only source for this that makes sense to me is the wiki. But using the descriptions from the wiki is very difficult if we don't constrain the format a bit. First, it is difficult to get the description out, and even if we could, all the markup, links to images, etc. will not work in every context. So it makes total sense to restrict the description (and we are only talking about the one-sentence description in the infobox) to plain text. I can see no reason why this description should have markup, and I see many benefits, as described. Taginfo is only the "intermediate" goal here: if taginfo can parse more of these descriptions, more programs can use them easily through the taginfo API. Again, this concerns only the one-line description in the infobox. For anything more, we have to link to the full text in the wiki anyway. Joto (talk) 10:19, 27 November 2016 (UTC)
- I disagree: basic inline markup is also useful in a single-line description (and frequently needed for some languages).
- If you just want to format data tables with only plain text (which may become meaningless, as this is destructive), it is trivial to parse inline HTML or wiki markup: only a few elements are permitted on the wiki (br, b, i, em, var, sub, sup, code, tt, span, all supported on all websites and wikis, plus only three MediaWiki markups: italics, bold, and links).
- Notably, italics and code/tt are frequently needed for critical semantic and linguistic distinctions (they are essential in description lines and should not be deleted), as are internal wikilinks or external links with URLs.
- Note also that some languages need certain HTML character entities (such as &nbsp;, or to make input and editing easier) and some <!--comments-->. This too is basic HTML markup that no website should have a problem parsing correctly. Only data forms may look "polluted" if these are rendered as-is instead of being parsed. These markups are safe (no security problem), except possibly external links (you may want to check and restrict the URLs, or show a warning before going to arbitrary external sites, but this wiki has a policy on usable URLs to prevent spammers from posting links to rogue sites). — Verdy_p (talk) 16:06, 27 November 2016 (UTC)
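To illustrate how small the parsing job is, here is a minimal Python sketch that reduces a description to plain text. It assumes only the inline tags listed above, '' / ''' emphasis, [[...]] wikilinks, [url label] external links, HTML comments and character entities; it is a hypothetical helper, not taginfo's code, and stripping is lossy (code formatting and emphasis are lost).

```python
import html
import re

# Inline HTML elements listed in this discussion as permitted on the wiki.
ALLOWED_TAGS = ("br", "b", "i", "em", "var", "sub", "sup", "code", "tt", "span")

_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
_BREAK = re.compile(r"<br\s*/?>", re.IGNORECASE)
_TAG = re.compile(r"</?(?:%s)(?:\s[^<>]*)?/?>" % "|".join(ALLOWED_TAGS), re.IGNORECASE)
_WIKILINK = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]]*)\]\]")  # [[Target|label]] or [[Target]]
_EXTLINK = re.compile(r"\[(https?://[^\s\]]+)(?:\s+([^\]]*))?\]")  # [url label] or [url]
_EMPHASIS = re.compile(r"'{2,5}")  # '' italics, ''' bold, ''''' both


def to_plain_text(description: str) -> str:
    """Hypothetical helper: strip the small set of allowed markup, keep the text."""
    text = _COMMENT.sub("", description)
    text = _BREAK.sub(" ", text)                 # line breaks become spaces
    text = _TAG.sub("", text)                    # drop allowed tags, keep their content
    text = _WIKILINK.sub(r"\1", text)            # keep the label (or the target)
    text = _EXTLINK.sub(lambda m: m.group(2) or m.group(1), text)
    text = _EMPHASIS.sub("", text)
    text = html.unescape(text)                   # &nbsp; etc. become real characters
    return re.sub(r"[ \t\n]+", " ", text).strip()
```

For example, "A ''road'' for <code>motor_vehicle</code>, see [[Key:highway|highway]]&nbsp;tag" becomes "A road for motor_vehicle, see highway tag" (with a non-breaking space kept before "tag").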
has positional parameter
"In general, wiki templates can have positional parameters and named parameters. The description templates only use named parameters. When you see this error, it usually means that the taginfo parser got confused. Try to clean up the template parameters."
- Here too the analysis is almost always broken: it only detects pipe characters inside wikilinks in descriptions, or inside the braces used to call a formatting template (e.g. links to Wikipedia or Wiktionary).
- For description fields, allow broad freedom of markup: they were never meant to be one-line plain text only, even if they are intended to be a short summary.
- In other words: fix your wiki code parser instead of convincing random users to change the wiki and break a lot of content. — Verdy_p (talk) 05:31, 26 November 2016 (UTC)
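The confusion comes from splitting a template call on every "|". A depth-aware split only treats "|" as a parameter separator when it is outside nested [[...]] and {{...}}. The following Python sketch shows the idea; the helper names are hypothetical, it is not taginfo's actual parser, and the named/positional test is only an approximation of MediaWiki's rules.

```python
import re

# A parameter name: letters, digits, '_', '-' or spaces (approximation).
_PLAIN_NAME = re.compile(r"[\w\- ]+")


def split_top_level(body: str) -> list[str]:
    """Split on '|' only when it is outside nested [[...]] and {{...}}."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("[[", "{{"):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("]]", "}}") and depth > 0:
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))   # top-level separator
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_template(body: str) -> tuple[str, dict[str, str], list[str]]:
    """Return (name, named parameters, positional parameters) for the text inside {{ }}."""
    name, *params = split_top_level(body)
    named, positional = {}, []
    for param in params:
        key, sep, value = param.partition("=")
        if sep and _PLAIN_NAME.fullmatch(key.strip()):
            named[key.strip()] = value.strip()
        else:
            positional.append(param.strip())
    return name.strip(), named, positional
```

With this split, a description such as "A [[Road|way]] for {{wikipedia|en|Road|roads}}" stays one named parameter, so no spurious positional parameters are reported.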
invalid lang parameter
"The lang parameter should have the format xx (for example de for the German language) or xx_XX (for example pt_BR for Brazilian Portuguese)."
- Wrong. The format should use hyphens (the OSM and BCP47 standard). Some implementations accept underscores, but only for compatibility with legacy Java locale codes. So "zh-Hans" is the correct standard form, just like "fr-CA"! You don't need to force the old broken Java locale codes (still used in its old ResourceBundle) on everyone: even Java now supports the BCP47 standard! Note that on the OSM wiki, all locale codes are BCP47-conforming: only "DE", "FR", "ES", "IT", "JA", "NL" and "RU" are written in uppercase, for legacy reasons (these are the 7 wiki namespaces), while all other codes start with a single uppercase letter followed by lowercase letters only, including "De-ch", which is not a wiki namespace even though it is still German. — Verdy_p (talk) 05:02, 26 November 2016 (UTC)
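A minimal Python sketch of converting legacy underscore-style codes to the hyphenated BCP47 form, applying the usual BCP47 case conventions (language lowercase, script titlecase, region uppercase). BCP47 tags are case-insensitive, so the casing is cosmetic. The function name is hypothetical.

```python
import re


def to_bcp47(code: str) -> str:
    """Normalize a locale code such as 'pt_BR' or 'zh_hans' to hyphenated BCP47 form."""
    subtags = [s for s in re.split(r"[-_]", code.strip()) if s]
    if not subtags:
        raise ValueError("empty language code")
    out = [subtags[0].lower()]            # primary language: lowercase
    in_extension = False
    for sub in subtags[1:]:
        if len(sub) == 1:
            in_extension = True           # singleton ('u', 'x', ...) starts an extension
        if in_extension:
            out.append(sub.lower())
        elif len(sub) == 4 and sub.isalpha():
            out.append(sub.title())       # script, e.g. Hans, Latn
        elif (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit()):
            out.append(sub.upper())       # region, e.g. BR, CH, 419
        else:
            out.append(sub.lower())       # extended language, variants, anything else
    return "-".join(out)
```

For example, "pt_BR" becomes "pt-BR", "zh_hans" becomes "zh-Hans", and the wiki prefix "De-ch" becomes "de-CH".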
wrong lang format
"The language in the wiki page name should be of the format xx (for instance de for the German language), or xx_XX (for instance pt_BR for Brazilian Portuguese). Capitalization doesn't matter."
- Wrong! Language codes on the wiki can already take six forms: ll (e.g. "FR" or "Ca"), lll (e.g. "Vec"), ll-cc (e.g. "De-ch", "Ro-md" or "Pt-br"; not recommended and in fact deprecated), lll-cc (e.g. "Tzm-ma"; not recommended either), ll-ssss (e.g. "Zh-hans"), or lll-ssss (e.g. "Shi-latn").
- More details are in Template:Langcode, which parses the language codes currently accepted in wiki page names.
- BCP47 (and OSM data as well) allows longer codes, but they are not yet used for naming translated wiki pages; however, all valid BCP47 codes are accepted in the various language parameters used in pages that are partially multilingual (in their listed examples or citations).
- However, legacy non-standard language codes still used by Wikipedia are not accepted in OSM data or on the wiki: for example "roa-rup"; "nrm", which is completely wrong and conflicting in Wikipedia and Wikidata, where it should be "nrf"; and "en-simple" or "de-formal", which are also invalid. Likewise, "sr-ec" and "sr-el", still used in Wikipedia, are syntactically conforming but semantically wrong, as they should be "sr-cyrl" and "sr-latn". Note that "zh-yue" is both conforming and valid in BCP47, but deprecated and should be replaced by the preferred value "yue"; and "zh-classical" is both invalid and non-conforming and MUST be replaced by "lzh".
- Wikipedia also supports a single "zh" language code for naming its wiki (merging "zh-hans" and "zh-hant" into a single Wikipedia edition), but only because it locally supports an automatic Hans/Hant transliterator, rarely supported elsewhere in applications and not supported on the OSM wiki; so "zh-hans" and "zh-hant" are distinguished on the OSM wiki and in OSM data. — Verdy_p (talk) 05:19, 26 November 2016 (UTC)
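The replacements listed above for legacy Wikipedia codes can be kept as a simple lookup table. This Python sketch only encodes the mappings stated in this discussion; codes called invalid without a single replacement are returned as None so that a person can decide. The names are hypothetical.

```python
from typing import Optional

# Replacements stated in this discussion for legacy Wikipedia codes.
LEGACY_REPLACEMENTS = {
    "nrm": "nrf",
    "sr-ec": "sr-Cyrl",
    "sr-el": "sr-Latn",
    "zh-yue": "yue",
    "zh-classical": "lzh",
}
# Codes called invalid here, with no single replacement given.
INVALID_WITHOUT_REPLACEMENT = {"roa-rup", "en-simple", "de-formal"}


def osm_lang_code(wikipedia_code: str) -> Optional[str]:
    """Map a Wikipedia language code to the code to use in OSM data or on the wiki.

    Returns None when a person has to choose the correct code.
    """
    code = wikipedia_code.strip().lower()
    if code in INVALID_WITHOUT_REPLACEMENT:
        return None
    return LEGACY_REPLACEMENTS.get(code, code)   # unknown codes pass through unchanged
```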
Tracking the count of errors
Over the past few days, I've tried to address a bunch of these, and I've got the count down to just over 500 as of today. I thought it'd be interesting to keep a running total here, so I'll do so below. JesseFW (talk) 17:08, 14 May 2023 (UTC)
2023-05-14
137 wrong lang format
128 has positional parameter
67 slash in key
65 parsing failed
36 slash in value
32 description parameter should only contain plain text
31 invalid lang parameter
14 value in key page
9 no value for tag page
6 invalid osmcarto-rendering parameter
1 lang is en
----
112 en
526 total
2023-05-15
Same.
2023-05-16
137 wrong lang format
128 has positional parameter
67 slash in key
59 parsing failed
36 slash in value
34 description parameter should only contain plain text
31 invalid lang parameter
14 value in key page
9 no value for tag page
6 invalid osmcarto-rendering parameter
1 invalid image parameter
1 lang is en
----
115 en
523 total
2023-05-17
137 wrong lang format
124 has positional parameter
67 slash in key
42 parsing failed
37 slash in value
34 description parameter should only contain plain text
29 invalid lang parameter
14 value in key page
9 no value for tag page
6 invalid osmcarto-rendering parameter
1 invalid image parameter
1 lang is en
----
113 en
501 total
2023-05-18
137 wrong lang format
124 has positional parameter
67 slash in key
39 parsing failed
37 slash in value
34 description parameter should only contain plain text
29 invalid lang parameter
14 value in key page
9 no value for tag page
6 invalid osmcarto-rendering parameter
1 lang is en
----
112 en
497 total
2023-05-19
137 wrong lang format
124 has positional parameter
67 slash in key
40 parsing failed
37 slash in value
34 description parameter should only contain plain text
29 invalid lang parameter
14 value in key page
9 no value for tag page
6 invalid osmcarto-rendering parameter
1 lang is en
----
112 en
498 total
2023-05-20
137 wrong lang format
124 has positional parameter
67 slash in key
37 slash in value
36 parsing failed
34 description parameter should only contain plain text
29 invalid lang parameter
14 value in key page
9 no value for tag page
6 invalid osmcarto-rendering parameter
1 lang is en
----
110 en
494 total
2023-05-21
137 wrong lang format
122 has positional parameter
67 slash in key
37 slash in value
34 description parameter should only contain plain text
32 parsing failed
29 invalid lang parameter
14 value in key page
9 no value for tag page
3 invalid osmcarto-rendering parameter
1 lang is en
----
103 en
485 total
2023-05-22
137 wrong lang format
122 has positional parameter
67 slash in key
37 slash in value
34 description parameter should only contain plain text
33 parsing failed
29 invalid lang parameter
14 value in key page
9 no value for tag page
1 lang is en
----
105 en
483 total
2023-05-23
137 wrong lang format
106 has positional parameter
67 slash in key
37 slash in value
34 description parameter should only contain plain text
33 parsing failed
29 invalid lang parameter
14 value in key page
9 invalid osmcarto-rendering parameter
9 no value for tag page
1 lang is en
----
91 en
476 total
Missed a couple of days...
2023-05-26
137 wrong lang format
96 has positional parameter
67 slash in key
37 slash in value
33 parsing failed
31 description parameter should only contain plain text
29 invalid lang parameter
14 value in key page
9 invalid osmcarto-rendering parameter
9 no value for tag page
1 lang is en
----
91 en
463 total
Excluding the unsolvable ones, that just leaves the following:
96 has positional parameter
33 parsing failed
31 description parameter should only contain plain text
29 invalid lang parameter
----
189 total
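A small Python sketch for recomputing the "excluding the unsolvable ones" subtotal from a tally pasted as one line of text, for example the 2023-05-26 counts. The set of fixable categories is taken from the subtotals in this thread; the helper names are hypothetical.

```python
import re

# The four categories counted in the "excluding the unsolvable ones" subtotals.
FIXABLE = {
    "has positional parameter",
    "parsing failed",
    "description parameter should only contain plain text",
    "invalid lang parameter",
}


def parse_tally(line: str) -> dict[str, int]:
    """Parse '137 wrong lang format 96 has ...', ignoring everything after '----'."""
    counts_part = line.split("----")[0]
    # Each entry is a number followed by a category name that contains no digits.
    return {name.strip(): int(n) for n, name in re.findall(r"(\d+) (\D+)", counts_part)}


def fixable_subtotal(counts: dict[str, int]) -> int:
    return sum(n for name, n in counts.items() if name in FIXABLE)


MAY_26 = ("137 wrong lang format 96 has positional parameter 67 slash in key "
          "37 slash in value 33 parsing failed 31 description parameter should only "
          "contain plain text 29 invalid lang parameter 14 value in key page "
          "9 invalid osmcarto-rendering parameter 9 no value for tag page 1 lang is en "
          "---- 91 en 463 total")
```

Summing all parsed counts reproduces the 463 total, and the fixable subtotal reproduces 96 + 33 + 31 + 29 = 189.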
2023-05-27
137 wrong lang format
96 has positional parameter
67 slash in key
37 slash in value
33 parsing failed
31 description parameter should only contain plain text
15 invalid lang parameter
14 value in key page
9 invalid osmcarto-rendering parameter
9 no value for tag page
1 lang is en
----
90 en
449 total
Excluding the unsolvable ones, that just leaves the following:
96 has positional parameter
33 parsing failed
31 description parameter should only contain plain text
15 invalid lang parameter
----
175 total
2023-05-28
137 wrong lang format
88 has positional parameter
67 slash in key
37 slash in value
32 parsing failed
31 description parameter should only contain plain text
15 invalid lang parameter
14 value in key page
9 invalid osmcarto-rendering parameter
9 no value for tag page
1 lang is en
----
89 en
440 total
Excluding the unsolvable ones, that just leaves the following:
88 has positional parameter
32 parsing failed
31 description parameter should only contain plain text
15 invalid lang parameter
----
166 total
2023-05-29
137 wrong lang format
88 has positional parameter
67 slash in key
37 slash in value
33 parsing failed
31 description parameter should only contain plain text
14 value in key page
9 invalid osmcarto-rendering parameter
9 no value for tag page
5 invalid lang parameter
1 lang is en
----
89 en
431 total
Excluding the unsolvable ones, that just leaves the following:
88 has positional parameter
33 parsing failed
31 description parameter should only contain plain text
5 invalid lang parameter
----
157 total
2023-06-02
137 wrong lang format
67 slash in key
64 has positional parameter
37 slash in value
33 parsing failed
31 description parameter should only contain plain text
14 value in key page
9 invalid osmcarto-rendering parameter
9 no value for tag page
4 invalid lang parameter
1 lang is en
----
89 en
406 total
Excluding the unsolvable ones, that just leaves the following:
64 has positional parameter
33 parsing failed
31 description parameter should only contain plain text
4 invalid lang parameter
----
132 total
2023-06-30
Haven't updated these in a while, but made a LOT of progress, including taginfo fixes and corrections on the wiki.
65 slash in key
37 slash in value
20 has positional parameter
12 description parameter should only contain plain text
10 wrong lang format
9 non-file osmcarto-rendering parameter
6 parsing failed
3 invalid lang parameter
1 lang is en
----
74 en
163 total
Excluding the unsolvable ones, that just leaves the following:
20 has positional parameter
6 parsing failed
12 description parameter should only contain plain text
3 invalid lang parameter
----
41 total
2023-07-27
This is pretty much the minimum (until/unless the slash issues are addressed). If you notice counts higher than these, the extra ones can probably be fixed easily.
65 slash in key
38 slash in value
14 has positional parameter
12 description parameter should only contain plain text
10 wrong lang format
9 non-file osmcarto-rendering parameter
6 parsing failed
2 invalid lang parameter
1 lang is en
----
77 en
157 total
2023-08-04
65 slash in key
38 slash in value
12 description parameter should only contain plain text
10 has positional parameter
10 wrong lang format
9 non-file osmcarto-rendering parameter
6 parsing failed
1 invalid value for onRelation parameter
1 lang is en
----
74 en
152 total
2023-08-09
65 slash in key
38 slash in value
11 description parameter should only contain plain text
10 wrong lang format
9 non-file osmcarto-rendering parameter
7 has positional parameter
6 parsing failed
1 lang is en
----
74 en
147 total
2023-09-04
65 slash in key
39 slash in value
11 description parameter should only contain plain text
10 wrong lang format
9 non-file osmcarto-rendering parameter
7 has positional parameter
6 parsing failed
1 lang is en
----
74 en
148 total
Sadly, we've got one more "slash in value" case, which is unfixable because taginfo doesn't distinguish between redirects and regular pages.
Solving slash in key / slash in value issues?
How should "slash in key" and "slash in value" issues be addressed? If taginfo ignored redirects, that would be one way to fix them, but it doesn't, which leaves only deleting the pages; that seems excessive and unhelpful. @Joto:, any thoughts? JesseFW (talk) 20:53, 14 May 2023 (UTC)
The "value in key page" and "no value for tag page" problems are similar: it would be useful if these checks only counted non-redirects, and until they do, these problems are pretty much unsolvable by non-admins. JesseFW (talk) 20:00, 26 May 2023 (UTC)
Added to the issue tracker, here: https://github.com/taginfo/taginfo/issues/416 -- JesseFW (talk) 22:38, 17 June 2023 (UTC)
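A Python sketch of the proposal: skip redirect pages before checking Key:/Tag: titles for slashes. It assumes the page's wikitext is available and that redirects use the English "#REDIRECT [[...]]" magic word; the function names are hypothetical, and this is not how taginfo currently works.

```python
import re

_REDIRECT = re.compile(r"^\s*#REDIRECT\s*\[\[", re.IGNORECASE)
# Optional language prefix (e.g. "DE:"), then the Key: or Tag: namespace.
_TITLE = re.compile(r"^(?:[A-Za-z-]+:)?(Key|Tag):(.+)$")


def is_redirect(wikitext: str) -> bool:
    return _REDIRECT.match(wikitext) is not None


def slash_problems(pages: dict[str, str]) -> list[tuple[str, str]]:
    """Report slash problems only for real pages, skipping redirects."""
    problems = []
    for title, wikitext in pages.items():
        if is_redirect(wikitext):
            continue  # a redirect from a slash title does no harm
        m = _TITLE.match(title)
        if m is None:
            continue
        kind, rest = m.groups()
        key, _, value = rest.partition("=") if kind == "Tag" else (rest, "", "")
        if "/" in key:
            problems.append((title, "slash in key"))
        elif "/" in value:
            problems.append((title, "slash in value"))
    return problems
```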
"wrong lang format" -- bug in taginfo
Most of the "wrong lang format" issues are due to a bug in taginfo. It assumes all language codes are exactly two letters, while the standard also allows some three-letter codes. (Described here among other places.) This is the source code line that needs to be updated. I can make a PR if desired. JesseFW (talk) 02:16, 15 May 2023 (UTC)
Added to issue tracker: https://github.com/taginfo/taginfo/issues/417 -- JesseFW (talk) 22:40, 17 June 2023 (UTC)
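A Python sketch of a page-name language check that accepts both two- and three-letter codes, using the six forms listed under "wrong lang format" (ll, lll, ll-cc, lll-cc, ll-ssss, lll-ssss). The pattern and helper names are hypothetical; taginfo's actual source line is not reproduced here. Unprefixed pages are treated as English.

```python
import re
from typing import Optional

# ll, lll, ll-cc, lll-cc, ll-ssss, lll-ssss (case-insensitive, as wiki titles are capitalized)
PAGE_LANG = re.compile(r"[a-z]{2,3}(?:-(?:[a-z]{2}|[a-z]{4}))?", re.IGNORECASE)
# Optional "<prefix>:" followed by one of the description namespaces.
_TITLE = re.compile(r"^(?:([^:]+):)?(?:Key|Tag|Relation):")


def page_language(title: str) -> Optional[str]:
    """Return the language prefix of a Key:/Tag:/Relation: page ('en' if unprefixed)."""
    m = _TITLE.match(title)
    if m is None:
        return None  # not a description page
    return "en" if m.group(1) is None else m.group(1).lower()


def is_valid_page_lang(code: str) -> bool:
    return PAGE_LANG.fullmatch(code) is not None
```

Matching the namespace explicitly matters: a bare three-letter check would otherwise mistake the "Key" in "Key:highway" for a language code.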
invalid image/osmcarto-rendering parameter -- needs to support non-images
As has been documented on the wiki template page since 2016, the osmcarto-rendering and osmcarto-rendering-size parameters (and the type-specific versions) can include non-image links to more detailed descriptions. taginfo should be modified to accept this, and not warn about it. @Joto: I'll see about making a PR if desired. JesseFW (talk) 23:39, 22 May 2023 (UTC)
Added to issue tracker: https://github.com/taginfo/taginfo/issues/418 -- JesseFW (talk) 22:41, 17 June 2023 (UTC)
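A Python sketch of how a checker could accept non-image values instead of warning. It assumes image values are file names (optionally with a File: or Image: prefix) ending in a common image extension, and that non-image values are wikilinks or external links; the categories and names are hypothetical.

```python
import re

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")
_FILE_PREFIX = re.compile(r"^(?:file|image):", re.IGNORECASE)


def classify_rendering_value(value: str) -> str:
    """Return 'image', 'link', 'empty' or 'unknown' for an osmcarto-rendering value."""
    v = value.strip()
    if not v:
        return "empty"
    name = _FILE_PREFIX.sub("", v)
    if name.lower().endswith(IMAGE_EXTENSIONS):
        return "image"
    if v.startswith("[[") or v.startswith("[http"):
        return "link"  # documented non-image use: link to a more detailed description
    return "unknown"   # only this case would still deserve a warning
```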