Talk:California/Using CPAD data
Area sizes from shapefiles
user:Tinder says in a recent edit: "Generally area sizes from shapefiles should not be incorporated into OSM." #1, I do not know whether Tinder is simply asserting this on his own, or whether he has some authority or consensus from OSM to posit this here; if the latter, I ask him for a reference to that source (a wiki, a Talk page, perhaps consensus reached at a national conference I might not have heard about, etc.). #2, I have offered very good reasons (not simply my opinion; others in OSM have agreed with me here) why including a datum like this from curated data such as these is perfectly reasonable in OSM. To wit, it allows comparison of differing versions of the data over time using a robust and sensible algorithm, now documented here. I am not sure whether Tinder's reasoning has to do with "data purity" in OSM, or a dislike of imported data (CPAD is not being systematically imported; it is more like "individually curated, one polygon at a time"), or some other reason, but I do ask him to explain the particulars of why he made that statement here. And what is meant by "generally"? Does it mean "most of the time, but not in the case of CPAD data..." (to which I might agree), or "here, in the case of CPAD data, we should not be doing this"? Please offer your reasoning(s) on this newly created Talk page, including whether or not you believe there is justification for including this datum from CPAD in OSM; that point in particular remains unclear to me. Thank you in advance for good discussion. Stevea (talk) 23:38, 9 January 2020 (UTC)
"Foreign" tags
Keys with ALL CAPITAL LETTERS do find their way into OSM data from CPAD (and other public sources, like SCCGIS). Some decry these as "foreign keys" which have no right or reason to exist in OSM. I disagree, and here is why: these tags fall into two small, sensible classes. One is objectid and its ilk; the other is the Shape* and ACRES keys. OSM's tagging is flexible, and using these two tag flavors allows an efficient method to update versions of data like these. The Shape* and ACRES (a very exact size of the polygon) values act as a checksum: if existing and new data differ between versions, take a closer, visual look; if they match, the polygon can likely be skipped. There are thousands and thousands of polygons undergoing nearly constant review, with updates happening about as quickly as humans and smarter processing pipelines can sensibly and efficiently work through the chain of activities. This process is also documented in the wiki in roughly real time, or not lagging by much, anyway. In summary: the "cost" of storing two sensible and documented key:value pairs buys an efficient method of processing data which update frequently. (OSM and CPAD approximately match now.) The data and process have largely stabilized, and only a modest amount of effort remains to complete the updates. I welcome discussion. Stevea (talk) 23:17, 26 April 2020 (UTC)
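The checksum idea above can be sketched in a few lines. This is only an illustration, not the actual pipeline: the feature IDs, the dictionary shape, and the 0.01-acre tolerance are all assumptions, standing in for whatever stable CPAD identifier and ACRES values a real comparison would use.

```python
def changed_polygons(old, new, tolerance=0.01):
    """Flag features whose ACRES value differs between two releases.

    `old` and `new` map a stable feature ID to its ACRES value
    (hypothetical structure). A difference beyond `tolerance` acres
    flags the polygon for a closer, visual look; matching values mean
    the polygon very likely did not change and can be skipped.
    """
    flagged = []
    for fid, old_acres in old.items():
        new_acres = new.get(fid)
        if new_acres is None:
            flagged.append((fid, "removed"))       # gone from the new release
        elif abs(new_acres - old_acres) > tolerance:
            flagged.append((fid, "size changed"))  # geometry likely edited
    for fid in new:
        if fid not in old:
            flagged.append((fid, "added"))         # new in this release
    return flagged

# Illustrative data only; real CPAD IDs and acreages would differ.
old_release = {"unit_1": 120.53, "unit_2": 48.07, "unit_3": 5.10}
new_release = {"unit_1": 120.53, "unit_2": 49.11, "unit_4": 12.00}
print(changed_polygons(new=new_release, old=old_release))
```

The point of the sketch is the asymmetry: the stored ACRES tag lets a comparison like this discard the unchanged majority of thousands of polygons cheaply, so human review time is spent only where the checksum disagrees.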