Data publishing
Most institutions publishing data to GBIF need to convert their data into a format GBIF can process, typically a Darwin Core Archive.
Tools including the GBIF IPT and BioCASe can convert data stored in spreadsheets and databases to the appropriate formats. The IPT is the most common way to publish data to GBIF.
Some institutional collections management systems, such as Symbiota or EarthCape, can export all or part of their data to GBIF.
Users or institutions with custom software that can generate Darwin Core Archives and serve them from a webserver have two options:
- For occasional datasets (one or two per year), contact the GBIF helpdesk, who will register the dataset on your behalf.
- If new datasets will be registered more frequently, you may register them directly using the API.
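Direct registration is a two-step call against the GBIF registry API: create the dataset, then attach an endpoint pointing at your archive so GBIF's crawler can fetch it. The sketch below only builds the JSON bodies; the UUIDs are placeholders, and the exact required fields should be confirmed against the registry API documentation before use.

```python
import json

# Hedged sketch of the request bodies for registering a dataset via the
# GBIF registry API. All keys/UUIDs below are placeholders; real values
# come from your endorsed organization and installation records.
API = "https://api.gbif.org/v1"

def dataset_payload(title, organization_key, installation_key):
    """Body for POST {API}/dataset — creates the dataset record."""
    return {
        "title": title,
        "type": "OCCURRENCE",
        "publishingOrganizationKey": organization_key,
        "installationKey": installation_key,
    }

def endpoint_payload(archive_url):
    """Body for POST {API}/dataset/{key}/endpoint — tells GBIF's
    crawler where the Darwin Core Archive lives on your webserver."""
    return {"type": "DWC_ARCHIVE", "url": archive_url}

payload = dataset_payload(
    "My herbarium specimens",
    "11111111-2222-3333-4444-555555555555",  # placeholder organization UUID
    "66666666-7777-8888-9999-000000000000",  # placeholder installation UUID
)
print(json.dumps(payload, indent=2))
# A real registration would POST this with HTTP basic auth, e.g.:
#   requests.post(f"{API}/dataset", json=payload, auth=(user, password))
```

Both calls require the credentials of a GBIF user with publishing rights for the organization.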
Further discussion of the options can be found in this blogpost.
Dataset classes
Datasets can be published in four different formats:
- Metadata-only
- Checklist
- Occurrence
- Sampling event
Generally, the data quality increases from metadata-only to sampling event datasets.
Data quality recommendations
You can familiarize yourself with the requirements for the various types of datasets here.
Tools to quality check your publication
Dataset validator
The dataset validator can be used to validate zipped Darwin Core Archive datasets.
Species matching
The species matching tool can be used to normalize species names from a CSV file against the GBIF backbone.
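The web tool is backed by the species match API, which can also be called per name. A minimal sketch of building such a request (the `verbose` and `kingdom` parameters are optional refinements; fetching the URL returns JSON with the matched backbone `usageKey`, name, rank, and a confidence score):

```python
from urllib.parse import urlencode

# Hedged sketch: normalizing a single name against the GBIF backbone
# using the species match API (the service behind the web tool).
BASE = "https://api.gbif.org/v1/species/match"

def match_url(name, kingdom=None):
    """Build a species-match request URL; supplying a kingdom helps
    disambiguate homonyms across kingdoms."""
    params = {"name": name, "verbose": "true"}
    if kingdom:
        params["kingdom"] = kingdom
    return f"{BASE}?{urlencode(params)}"

print(match_url("Puma concolor", kingdom="Animalia"))
```

For a whole CSV, looping over the name column with a function like this reproduces what the web tool does in bulk.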
Species API (link to API topic)
Name usage lookup, search, and name parsing can all be carried out with the species API.
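These three operations map onto three endpoints. The helpers below only assemble the request URLs; response fields should be checked against the species API documentation:

```python
from urllib.parse import urlencode

# Hedged sketch of URL builders for the three species API operations.
BASE = "https://api.gbif.org/v1"

def search_url(query, limit=5):
    """Full-text search across name usages: GET /species/search."""
    return f"{BASE}/species/search?{urlencode({'q': query, 'limit': limit})}"

def usage_url(usage_key):
    """Look up one name usage by its key: GET /species/{key}."""
    return f"{BASE}/species/{usage_key}"

def parser_url(name):
    """Parse a scientific name string into its parts: GET /parser/name."""
    return f"{BASE}/parser/name?{urlencode({'name': name})}"

print(search_url("Quercus"))
print(usage_url(2877951))
print(parser_url("Abies alba Mill."))
```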
Flags and issues
When records are published to GBIF, they may receive various data quality flags and issues. The meaning of each issue, and how to address it, is documented separately for occurrence and checklist datasets.
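Each occurrence record returned by the occurrence API carries its flags in an `issues` array, so a publisher can triage their own records programmatically. The issue names below are real GBIF flags, but the harmless-versus-review grouping is this sketch's own assumption, not a GBIF classification:

```python
# Hedged sketch: triaging the "issues" array on occurrence API records.
# The severity grouping is an illustrative assumption.
HARMLESS = {"COORDINATE_ROUNDED", "GEODETIC_DATUM_ASSUMED_WGS84"}
NEEDS_REVIEW = {"TAXON_MATCH_NONE", "RECORDED_DATE_INVALID", "ZERO_COORDINATE"}

def triage(record):
    """Split one record's quality flags into harmless vs. review-worthy."""
    issues = set(record.get("issues", []))
    return {
        "harmless": sorted(issues & HARMLESS),
        "review": sorted(issues & NEEDS_REVIEW),
        "other": sorted(issues - HARMLESS - NEEDS_REVIEW),
    }

example = {"key": 123, "issues": ["COORDINATE_ROUNDED", "TAXON_MATCH_NONE"]}
print(triage(example))
# → {'harmless': ['COORDINATE_ROUNDED'], 'review': ['TAXON_MATCH_NONE'], 'other': []}
```

Running this over a dataset's records gives a quick picture of which flags need attention before the next publication cycle.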