API SQL Downloads

This is an experimental feature, and the implementation may change throughout 2025.

The experimental Occurrence SQL Download API allows users to query GBIF occurrences using SQL, structured query language, a programming language commonly used to access database systems. In contrast to the Predicate Download API, the SQL API allows selection of the columns of interest and generation of summary views of GBIF data. Functions to generate species occurrence cubes are available as part of the Biodiversity Building Blocks for Policy project.

Occurrence SQL downloads are created asynchronously — the user requests a download and, once the download is prepared, is sent an email with a link to the resulting file.

Users must have registered on GBIF.org to create an SQL download request, and must authenticate using their username (not email) and password.

Using this API requires some knowledge of SQL. Many general tutorials and guides to writing SQL queries are online, see for example the section on aggregating and grouping data within Data management with SQL for Ecologists by Data Carpentry.

The GBIF website allows users to create species occurrence cubes by using the "Cube" download option. Users with knowledge of SQL can customize the query before submitting it.

Requesting an SQL occurrence download

A download request includes your username, email address, the download format, and the filter for the required data in the form of an SQL statement.

A basic example using the curl command to make the request:

Put this in a file called (for example) query.json:

{
  "sendNotification": true,
  "notificationAddresses": [
    "userEmail@example.org" (1)
  ],
  "format": "SQL_TSV_ZIP", (2)
  "sql": "SELECT datasetKey, countryCode, COUNT(*) FROM occurrence WHERE continent = 'EUROPE' GROUP BY datasetKey, countryCode" (3)
}

1	Replace with your email address, or remove the `notificationAddresses` and `sendNotification` values
2	`format` must be `SQL_TSV_ZIP`. More formats will be added later.
3	This query will count occurrences from Europe per dataset and country.

Then use this curl command:

curl --include --user YOUR_GBIF_USERNAME:YOUR_PASSWORD --header "Content-Type: application/json" --data @query.json https://api.gbif.org/v1/occurrence/download/request

The first line of the output shows the result of the download request:

If the query is accepted the first line of output should be either HTTP/1.1 201 Created or HTTP/2 201, and a download key will be returned on the last line.
If the SQL is not valid, the first line of the output will be HTTP/… 400 …. There may be useful information on the error at the end of the output.
HTTP/… 401 … means your username or password is incorrect
HTTP/… 403 … means your username does not have permission to use this prototype feature — contact helpdesk@gbif.org.

SQL queries can also be submitted through the GBIF website, avoiding the use of curl and allowing the website to handle authentication. Start from this SQL download query, which can be modified as you wish. Running the query produced this download in February 2024.

Querying the download key from an accepted request shows the download information, including the download link and DOI once the download is ready. Run this repeatedly, until you see SUCCEEDED — you will need to use your download key rather than the example:

curl -Ss https://api.gbif.org/v1/occurrence/download/0000379-240229165702484

Luego puedes descargar el archivo resultante:

curl --location --remote-name https://api.gbif.org/v1/occurrence/download/request/0000379-240229165702484.zip

And see the result:

datasetkey	countrycode	COUNT(*)
005eb8d8-ed94-41be-89cf-e3115a9058e4	AD	1
009a76f6-0960-4a56-a116-63991e6bb037	GR	3
0214a6a7-898f-4ee8-b888-0be60ecde81f	PT	84
03f2256a-e548-43d7-a731-253302f4aa34	LV	27
040c5662-da76-4782-a48e-cdea1892d14c	IS	1306
06aeea22-c116-4df2-93af-2d8321482801	FR	24
07fd0d79-4883-435f-bba1-58fef110cd13	PL	357

datasetkey

countrycode

COUNT(*)

005eb8d8-ed94-41be-89cf-e3115a9058e4

009a76f6-0960-4a56-a116-63991e6bb037

0214a6a7-898f-4ee8-b888-0be60ecde81f

03f2256a-e548-43d7-a731-253302f4aa34

040c5662-da76-4782-a48e-cdea1892d14c

1306

06aeea22-c116-4df2-93af-2d8321482801

07fd0d79-4883-435f-bba1-58fef110cd13

357

Supported SQL

Only SELECT queries are supported, and only queries against a single table named occurrence. JOIN queries and sub-queries are not allowed. Selecting * is also not allowed, specify the columns you need.

GROUP BY queries are supported, as are basic SQL window functions (OVER and PARTITION BY). The group and partition filters (HAVING and QUALIFY) are not yet supported.

Most common SQL operators and functions are supported, such as AND, OR, NOT, IS NULL, RAND(), ROUND(…), LOWER(…), etc. Note BETWEEN is not supported, use >, >=, < and ⇐ instead.

Aggregate functions are also supported, for example COUNT(…), MIN(…), MAX(…), AVERAGE(…).

These custom filtering functions are available:

And these functions for handling data types:

As well as custom gridding functions:

See the separate page on the available SQL functions.

Available columns

The occurrence table contains the interpreted data, the verbatim (as provided to GBIF) data, and some calculated columns useful for searching. The full list of available columns is show below, and is also available through this API call — the name field is the column name. Note the v_ prefix for the verbatim data columns.

Full list of available SQL column names (click to expand)

The definitions marked with are from the Darwin Core standard.

The definitions marked with are from GBIF, and may reflect the result of interpretation and data quality procedures applied by GBIF, or they may not be part of Darwin Core.

Unresolved include directive in modules/ROOT/pages/api-sql-downloads.adoc - include::partial$download-sql-terms-table.adoc[]

Quoting column names

The SQL parser is strict, and column names that are also SQL keywords must be quoted. This affects the columns "year", "month", "day", "order", "group", "language", "references" and "member". They must be quoted even if qualified with the table name as in occurrence."year" or "occurrence"."year".

Query validation

You can validate your queries using the validation API call.

curl --include --header "Content-Type: application/json" --data @query.json https://api.gbif.org/v1/occurrence/download/request/validate

Your query will be returned if it is valid, and an error message with an explanation will be returned if it is invalid.

Troubleshooting

Many GBIF APIs are tolerant of small differences in how they are queried — you can make an occurrence download query using the search parameter taxonKey, taxon_key, TAXONKEY and so on, and using the value 1, '1', "1", 1, " 01" etc.

The SQL API is less forgiving, so some common parts of queries must be specified exactly as documented.

Within the SQL query itself:

If column names are quoted they must use double quotes and lowercase: "year", "basisofrecord". It’s only necessary to quote column names that are also SQL functions: "year", "month", "day", "order", "group", "language", "references" and "member".
When column and function names aren’t quoted, the letter case doesn’t matter: basisOfRecord, basisofrecord, gbif_eeargcode(…), GBIF_EEARGCode(…).
String values use single quotes, like basisOfRecord = 'HUMAN_OBSERVATION'.
SQL comments (lines beginning -- ) cannot be included in queries.
Columns with numeric types must be filtered using numbers, not quoted string values — use taxonkey IN (1234, 5687) or "year" >= 2020.
Array values (including vocabulary lineages) should be converted to delimited strings, for example CONCAT_WS(' | ', recordedByID) and CONCAT_WS(' | ', occurrence.lifestage.lineage).

When converting the SQL query to JSON for submission to the API:

JSON uses double quotes for attributes and values, so when the SQL is written values containing double quotes must have these escaped using \"
Newlines, if used, should be replaced with either spaces ` ` or \n

{
  "sendNotification": true,
  "notificationAddresses": [
    "userEmail@example.org"
  ],
  "format": "SQL_TSV_ZIP",
  "sql": "SELECT DISTINCT \"year\" \n FROM occurrence \n WHERE \"year\" > 2000";
}

It can be convenient to work with neatly formatted SQL files, and convert them to JSON format automatically:

# Using jq
grep -v -- '--' neat-query.sql | jq -R -s .

# Using Python
grep -v -- '--' neat-query.sql | python3 -c 'import sys, json; print(json.dumps(sys.stdin.read()))'

Further examples

For a complete example of generating a data cube see the Species occurrence cubes page.

Known issues

Known faults with the system are tracked on GitHub.