ProjectIDs on individual records, rather than a dataset as a whole · Issue #836 · gbif/pipelines

Idea/wish captured from feedback of the regional support contractors (BID) to GBIFS:

"It is being defined with SiB Colombia how to identify in each record of a dataset its link with the BID project, within the framework of the publication of data from partner organizations/collections in the Colombian BID-CA2020 projects. The use of DwC fields such as datasetID or datasetName has been proposed by the Regional Support, but in some cases that could create conflict when the field was filled with previous data. GBIF is encouraged in building its new data model to look for a more effective mechanism to accomplish this and clarify it for the BID projects (and project partners)."

There are two main reasons for this request:

Unfortunately, this is not easy: individual records would have to carry the project ID from the point at which they are captured, and our transfer schema does not really allow for that. At present we work around the delivery-reporting requirement, e.g. where records are published through eBird or iNaturalist, by requesting an explicit report on the data published in the project context. This is only for internal evaluation, though. The second part is not easily possible, since there is no “project ID” field at record level.

Open question: do the benefits outweigh the added requirements, including internal data management and UI needs for surfacing this information?