Metadata recommendations and requirements

🚧 This page is not published on techdocs.gbif.org. It is only visible on the test and development sites (this is techdocs.gbif-uat.org). 🚧

Background to the GBIF Metadata Profile

The GBIF EML Profile was developed to standardize how resources get described at the dataset level in GBIF.org. This profile can be transformed to other common metadata formats.

In the GBIF EML Profile, there is a minimum set of mandatory elements required for identification, but it is recommended that as many elements be used as possible to ensure the metadata are as descriptive and complete as possible.

Things to consider

Metadata describes and explains the context of a dataset, and can vary in completeness. In general, metadata should allow a prospective end user of data to:

  1. Identify/discover its existence,

  2. Learn how to access or acquire the data,

  3. Understand its fitness-for-use, and

  4. Learn how to transfer (obtain a copy of) the data.

Metadata Elements

The GBIF EML Profile is primarily based on the Ecological Metadata Language (EML). The GBIF profile utilizes a subset of EML and extends it to include additional requirements that are not accommodated in the EML specification. The following tables provide short descriptions of the profile elements, and where relevant, links to more complete EML descriptions. The elements are categorized as follows:

  • Dataset (Resource)

  • Project

  • People and Organizations

  • Keyword Set (General Keywords)

  • Coverage

    • Taxonomic Coverage

    • Geographic Coverage

    • Temporal Coverage

  • Methods

  • Intellectual Property Rights

  • Additional Metadata + NCD (Natural Collections Descriptions Data) Related

Dataset (Resource)

The dataset field has elements relating to a single dataset (resource).

Term name Description Example

alternateIdentifier

It is a Universally Unique Identifier (UUID) for the EML document and not for the dataset. This term is optional. A list of different identifiers can be supplied.

619a4b95-1a82-4006-be6a-7dbe3c9b33c5

title

A description of the resource that is being documented that is long enough to differentiate it from other similar resources. Multiple titles may be provided, particularly when trying to express the title in more than one language (use the "xml:lang" attribute to indicate the language if not English/en).

Vernal pool amphibian density data, Isla Vista, 1990-1996

creator

The resource creator is the person or organization responsible for creating the resource itself. See section “People and Organizations” for more details.

metadataProvider

The metadataProvider is the person or organization responsible for providing documentation for the resource. See section “People and Organizations” for more details.

associatedParty

An associatedParty is another person or organization that is associated with the resource. These parties might play various roles in the creation or maintenance of the resource, and these roles should be indicated in the "role" element. See section “People and Organizations” for more details.

contact

The contact field contains contact information for this dataset. This is the person or institution to contact with questions about the use, interpretation of a data set. See section “People and Organizations” for more details.

pubDate

The date that the resource was published. The format should be represented as: CCYY, which represents a 4 digit year, or as CCYY-MM-DD, which denotes the full year, month, and day. Note that month and day are optional components. Formats must conform to ISO 8601. E.g. 2010-09-20.

language

The language in which the resource (not the metadata document) is written. This can be a well-known language name, or one of the ISO language codes to be more precise. GBIF recommendation is to use the ISO language code (https://api.gbif.org/v1/enumeration/language).

English

additionalInfo

Information regarding omissions, instructions or other annotations that resource managers may wish to include with a dataset. Basically, any information that is not characterized by the other resource metadata fields.

url

The URL of the resource that is available online.

abstract

A brief overview of the resource that is being documented.

Project

The project field contains information on the project in which this dataset was collected. It includes information such as project personnel, funding, study area, project design and related projects.

Term Definition Example

title

A descriptive title for the research project.

Species diversity in Tennessee riparian habitats

personnel

The personnel field is used to document people involved in a research project by providing contact information and their role in the project.

funding

The funding field is used to provide information about funding sources for the project such as: grant and contract numbers; names and addresses of funding sources.

studyAreaDescription

The studyAreaDescription field documents the physical area associated with the research project. It can include descriptions of the geographic, temporal, and taxonomic coverage of the research location and descriptions of domains (themes) of interest such as climate, geology, soils or disturbances.

designDescription

The field designDescription contains general textual descriptions of research design. It can include detailed accounts of goals, motivations, theory, hypotheses, strategy, statistical design, and actual work. Literature citations may also be used to describe the research design.

projectTitle

The title of the funded project as listed in the contract document, but not containing the projectID and other administrative information.

Examples

projectID

A unique identifier for the project from which a dataset is derived The record type is a GUID or other identifier that is near globally unique. This field is REQUIRED for a dataset that is funded through programmes operated by GBIF. In this case, the projectID is the ID of the funded project as listed in the contract document.

BID-AF2016-0001-REG

People and Organizations

Several fields could represent either a person or an organization. Below is a list of the various fields used to describe a person or organization.

Term Definition Example

givenName

Subfield of individualName field. The given name field can be used for the first name of the individual associated with the resource, or for any other names that are not intended to be alphabetized (as appropriate).

Jonny

surName

Subfield of individualName field. The surname field is used for the last name of the individual associated with the resource. This is typically the family name of an individual, for example, the name by which they are referred to in citations.

Carson

organizationName

The full name of the organization that is associated with the resource. This field is intended to describe which institution or overall organization is associated with the resource being described.

National Center for Ecological Analysis and Synthesis

positionName

This field is intended to be used instead of a particular person or full organization name. If the associated person that holds the role changes frequently, then Position Name would be used for consistency. Note that this field, used in conjunction with 'organizationName' and 'individualName' make up a single logical originator. Because of this, an originator with only the individualName of 'Joe Smith' is NOT the same as an originator with the name of 'Joe Smith' and the organizationName of 'NSF'. Also, the positionName should not be used in conjunction with individualName unless only that individual at that position would be considered an originator for the data package. If a positionName is used in conjunction with an organizationName, then that implies that any person who currently occupies said positionName at organizationName is the originator of the data package.

HAST herbarium data manager

electronicMailAddress

The electronic mail address is the email address for the party. It is intended to be an Internet SMTP email address, which should consist of a username followed by the @ symbol, followed by the email server domain name address.

jcuadra@gbif.org

deliveryPoint

Subfield of the address field that describes the physical or electronic address of the responsible party for a resource. The delivery point field is used for the physical address for postal communication.

GBIF Secretariat, Universitetsparken 15

RoleType

Use this field to describe the role the party played with respect to the resource. E.g. technician, reviewer, principal investigator, etc. NB: should this instead be role? Neither have a matching description to the original entry

phone

The phone field describes information about the responsible party’s telephone, be it a voice phone, fax.

+4530102040

postalCode

Subfield of the address field that describes the physical or electronic address of the responsible party for a resource. The postal code is equivalent to a U.S. zip code, or the number used for routing to an international address.

52000

city

Subfield of the address field that describes the physical or electronic address of the responsible party for a resource. The city field is used for the city name of the contact associated with a particular resource.

San Diego

administrativeArea

Subfield of the address field that describes the physical or electronic address of the responsible party for a resource. The administrative area field is the equivalent of a 'state' in the U.S., or Province in Canada. This field is intended to accommodate the many types of international administrative areas.

Colorado

country

Subfield of the address field that describes the physical or electronic address of the responsible party for a resource. The country field is used for the name of the contact’s country. The country name is most often derived from the ISO 3166 country code list.

Japan

onlineUrl

A link to associated online information, usually a web site. When the party represents an organization, this is the URL to a website or other online information about the organization. If the party is an individual, it might be their personal web site or other related online information about the party.

https://www.example.edu/botany.

KeywordSet (General Keywords)

The keywordSet field is a wrapper for the keyword and keywordThesaurus elements, both of which are required together.

Term Definition Example

keyword

A keyword or key phrase that concisely describes the resource or is related to the resource. Each keyword field should contain one and only one keyword (i.e., keywords should not be separated by commas or other delimiters).

biodiversity

keywordThesaurus

The name of the official keyword thesaurus from which keyword was derived. If an official thesaurus name does not exist, please keep a placeholder value such as “N/A” instead of removing this element as it is required together with the keyword element to constitute a keywordSet.

IRIS keyword thesaurus

Coverage

Describes the extent of the coverage of the resource in terms of its spatial extent, temporal extent, and taxonomic extent.

Taxonomic Coverage

A container for taxonomic information about a resource. It includes a list of species names (or higher level ranks) from one or more classification systems. Please note the taxonomic classifications should not be nested, just listed one after the other.

Term Definition Example

generalTaxonomicCoverage

Taxonomic Coverage is a container for taxonomic information about a resource. It includes a list of species names (or higher level ranks) from one or more classification systems. A description of the range of taxa addressed in the data set or collection. Use a simple comma separated list of taxa.

"All vascular plants were identified to family or species, mosses and lichens were identified as moss or lichen."

taxonomicClassification

Information about the range of taxa addressed in the dataset or collection.

taxonRankName

The name of the taxonomic rank for which the Taxon rank value is provided.

phylum, class, genus, species

taxonRankValue

The name representing the taxonomic rank of the taxon being described. It is recommended to start with Kingdom and include ranks down to the most detailed level possible.

Acer would be an example of a genus rank value, and rubrum would be an example of a species rank value, together indicating the common name of red maple

commonName

Applicable common names; these common names may be general descriptions of a group of organisms if appropriate.

invertebrates, waterfowl

Geographic Coverage

A container for spatial information about a resource; allows a bounding box for the overall coverage (in lat long), and also allows description of arbitrary polygons with exclusions.

Term Definition Example

geographicDescription

A short text description of a dataset’s geographic areal domain. A text description is especially important to provide a geographic setting when the extent of the dataset cannot be well described by the "boundingCoordinates".

"Manistee River watershed", "extent of 7 1/2 minute quads containing any property belonging to Yellowstone National Park"

westBoundingCoordinate

Subfield of boundingCoordinates field covering the W margin of a bounding box. The longitude in decimal degrees of the western-most point of the bounding box that is being described.

-18.25, +25, 45.24755

eastBoundingCoordinate

Subfield of boundingCoordinates field covering the E margin of a bounding box. The longitude in decimal degrees of the eastern-most point of the bounding box that is being described.

-18.25, +25, 45.24755

northBoundingCoordinate

Subfield of boundingCoordinates field covering the N margin of a bounding box. The longitude in decimal degrees of the northern-most point of the bounding box that is being described.

-18.25, +25, 65.24755.

southBoundingCoordinate

Subfield of boundingCoordinates field covering the S margin of a bounding box. The longitude in decimal degrees of the southern-most point of the bounding box that is being described.

-118.25, +25, 84.24755

Temporal Coverage

This container allows coverage to be a single point in time, multiple points in time, or a range of dates.

Term Definition Example

beginDate

Subfield of rangeOfDates field: It may be used multiple times with a endDate field to document multiple date ranges. A single time stamp signifying the beginning of some time period. The calendar date field is used to express a date, giving the year, month, and day. The format should be one that complies with the International Standards Organization’s standard 8601. The recommended format for EML is YYYY-MM-DD, where Y is the four digit year, M is the two digit month code (01 - 12, where January = 01), and D is the two digit day of the month (01 - 31). This field can also be used to enter just the year portion of a date.

2010-09-20

endDate

Subfield of rangeOfDates field: It may be used multiple times with a beginDate field to document multiple date ranges. A single time stamp signifying the end of some time period. The calendar date field is used to express a date, giving the year, month, and day. The format should be one that complies with the International Standards Organization’s standard 8601. The recommended format for EML is YYYY-MM-DD, where Y is the four digit year, M is the two digit month code (01 - 12, where January = 01), and D is the two digit day of the month (01 - 31). This field can also be used to enter just the year portion of a date.

2010-09-20

singleDateTime

The SingleDateTime field is intended to describe a single date and time for an event.

Methods

This field documents scientific methods used in the collection of the resource. It includes information on items such as tools, instrument calibration and software.

Term Definition Example

methodStep

The methodStep field allows for repeated sets of elements that document a series of procedures followed to produce a data object. These include text descriptions of the procedures, relevant literature, software, instrumentation, source data and any quality control measures taken.

qualityControl

The qualityControl field provides a location for the description of actions taken to either control or assess the quality of data resulting from the associated method step.

sampling

Description of sampling procedures including the geographic, temporal and taxonomic coverage of the study.

studyExtent

Subfield of the sampling field. The coverage field allows for a textual description of the specific sampling area, the sampling frequency (temporal boundaries, frequency of occurrence), and groups of living organisms sampled (taxonomic coverage). The field studyExtent represents both a specific sampling area and the sampling frequency (temporal boundaries, frequency of occurrence). The geographic studyExtent is usually a surrogate (representative area of) for the larger area documented in the "studyAreaDescription".

samplingDescription

Subfield of the sampling field. The samplingDescription field allows for a text-based/human readable description of the sampling procedures used in the research project. The content of this element would be similar to a description of sampling procedures found in the methods section of a journal article.

Intellectual Property Rights

Contain a rights management statement for the resource, or a reference to a service providing such information.

Term Definition

purpose

A description of the purpose of this dataset.

intellectualRights

A rights management statement for the resource, or reference a service providing such information. Rights information encompasses Intellectual Property Rights (IPR), Copyright, and various Property Rights. In the case of a data set, rights might include requirements for use, requirements for attribution, or other requirements the owner would like to impose. E.g., © 2001 Regents of the University of California Santa Barbara. Free for use by all individuals provided that the owners are acknowledged in any use or publication.

The additionalMetadata field is a container for any other relevant metadata that pertains to the resource being described. This field allows EML to be extensible in that any XML-based metadata can be included in this element. The elements provided here in the GMP include those required for conformance with ISO 19139 and a subset of NCD (Natural Collections Descriptions) elements.

Term Definition Example

dateStamp

The dateTime the metadata document was created or modified.

2002-10-23T18:13:51.235+01:00

metadataLanguage

The language in which the metadata document (as opposed to the resource being described by the metadata) is written. Composed of an ISO639-2/T three-letter language code and an ISO3166-1 three-letter country code.

en_GB

hierarchyLevel

Dataset level to which the metadata applies; default value is “dataset”

dataset

citation

The citation for the work itself.

bibliography

A list of citations (see below) that form a bibliography on literature related / used in the dataset

physical

A container element for all of the elements that let you describe the internal/external characteristics and distribution of a data object (e.g., dataObject, dataFormat, distribution). Can repeat.

resourceLogoUrl

URL of the logo associated with a resource.

http://www.gbif.org/logo.jpg

parentCollectionIdentifier

Subfield of collection field. Is an optional field. Identifier for the parent collection for this sub-collection. Enables a hierarchy of collections and sub-collections to be built.

collectionName

Subfield of collection field. Is an optional field. Official name of the Collection in the local language.

collectionIdentifier

Subfield of collection field. Is an optional field. The URI (LSID or URL) of the collection. In RDF, used as URI of the collection resource.

formationPeriod

Text description of the time period during which the collection was assembled.

"Victorian", or "1922 - 1932", or "c. 1750".

livingTimePeriod

Time period during which biological material was alive (for palaeontological collections).

specimenPreservationMethod

Picklist keyword indicating the process or technique used to prevent physical deterioration of non-living collections. Expected to contain an instance from the Specimen Preservation Method Type Term vocabulary.

formaldehyde.

jgtiCuratorialUnit

A quantitative descriptor (number of specimens, samples or batches). The actual quantification could be covered by

  1. an exact number of “JGI-units” in the collection plus a measure of uncertainty (± x);

  2. a range of numbers (x to x), with the lower value representing an exact number, when the higher value is omitted.

The discussion concluded that the quantification should encompass all specimens, not only those that have not yet been digitized. This is to avoid having to update the numbers too often. The number of non-public data (not digitized or not accessible) can be calculated from the GBIF numbers as opposed to the JGTI-data.

Required metadata