Species Occurrence Cube SQL Functions

These custom functions have been developed by the B-Cubed project for producing species occurrence cubes.

Grid functions

These functions are usually used for generating species occurrence cubes. They calculate a grid cell for a point.

EEA Reference Grid cell code

Suggested function declaration:

CREATE OR REPLACE TEMPORARY FUNCTION eeaCellCode AS 'org.gbif.occurrence.hive.udf.EeaCellCodeUDF';

Function definition:

STRING eeaCellCode(INTEGER gridSize,
                   DOUBLE latitude,
                   DOUBLE longitude,
                   DOUBLE coordinateUncertaintyInMeters)

Parameters:

gridSize

The grid size in metres of the EEA Reference Grid. Must be 25, 100, 250, 1000, 10000 or 100000.

latitude

The latitude of the point to grid (usually the decimalLatitude field)

longitude

The longitude of the point to grid (usually the decimalLongitude field)

coordinateUncertaintyInMeters

The uncertainty radius to apply to the point (usually used with a default value, for example COALESCE(coordinateUncertaintyInMeters, 1000))

The coordinateUncertaintyInMeters value is used to randomize the point within the circle.

Return value:

The code for the cell of the EEA reference grid in which the randomized point falls.

Example arguments and returned values:

Result

gridSize

latitude

longitude

coordinateUncertaintyInMeters

100kmE43N32

100 000

52.0

10.0

1000.0

10kmE432N321

10 000

52.0

10.0

1000.0

1kmE4321N3210

1 000

52.0

10.0

1000.0

250mE432100N321000

250

52.0

10.0

1000.0

100mE43210N32100

100

52.0

10.0

1000.0

25mE4321000N3210000

25

52.0

10.0

1000.0

Extended Quarter-Degree Grid cell code

Suggested function declaration:

CREATE OR REPLACE TEMPORARY FUNCTION eqdgcCode AS 'org.gbif.occurrence.hive.udf.ExtendedQuarterDegreeGridCellCodeUDF';

Function definition:

STRING eqdgcCode(INTEGER level,
                 DOUBLE latitude,
                 DOUBLE longitude,
                 DOUBLE coordinateUncertaintyInMeters)

Parameters:

level

The level of the grid; the number of additional divisions applied to a one-degree cell.

latitude

The latitude of the point to grid (usually the decimalLatitude field)

longitude

The longitude of the point to grid (usually the decimalLongitude field)

coordinateUncertaintyInMeters

The uncertainty radius to apply to the point (usually used with a default value, for example COALESCE(coordinateUncertaintyInMeters, 1000))

The coordinateUncertaintyInMeters value is used to randomize the point within the circle.

A level of 0 will give the 1° cell, e.g. S01E010. For quarter-degrees, use level 2, e.g. S01E010AD.

Return value:

The code for the cell of the Extended Quarter-Degree Grid in which the randomized point falls.

Example arguments and returned values:

Result

level

latitude

longitude

coordinateUncertaintyInMeters

E010N52

0

52.3

10.3

1000.0

E010N52C

1

52.3

10.3

1000.0

E010N52CB

2

52.3

10.3

1000.0

E010N52CBC

3

52.3

10.3

1000.0

E010N52CBCC

4

52.3

10.3

1000.0

E010N52CBCCB

5

52.3

10.3

1000.0

E010N52CBCCBB

6

52.3

10.3

1000.0

See also: Wikipedia: QDGC.

ISEA3H Grid cell code

This function implements the Inverse Snyder Equal-Area Projection (ISEA) Aperture 3 Hexagonal (3H) Discrete Global Grid System (DGGS), ISEA3H. Grid cell codes/identifiers are as specified in A novel identifier scheme for the ISEA Aperture 3 Hexagon Discrete Global Grid System.

Suggested function declaration:

CREATE OR REPLACE TEMPORARY FUNCTION isea3hCode AS 'org.gbif.occurrence.hive.udf.Isea3hCellCodeUDF';

Function definition:

STRING isea3hCode(INTEGER resolution,
                  DOUBLE latitude,
                  DOUBLE longitude,
                  DOUBLE coordinateUncertaintyInMeters)

Parameters:

resolution

The resolution of the grid; the number of division steps applied to the initial dodecahedron. Valid values are 1 to 22.

latitude

The latitude of the point to grid (usually the decimalLatitude field)

longitude

The longitude of the point to grid (usually the decimalLongitude field)

coordinateUncertaintyInMeters

The uncertainty radius to apply to the point (usually used with a default value, for example COALESCE(coordinateUncertaintyInMeters, 1000))

The coordinateUncertaintyInMeters value is used to randomize the point within the circle.

See ISEA3H details for the number and size of cells at each resolution.

Return value:

The code for the cell of the ISEA3H grid cell in which the randomized point falls.

Example arguments and returned values:

Result

resolution

latitude

longitude

coordinateUncertaintyInMeters

-358282526011250000

3

52.3

10.3

1000.0

652180731009071912

6

52.3

10.3

1000.0

952458899010519815

9

52.3

10.3

1000.0

References:

See also:

  • geogrid, the Java library used by GBIF to calculate the grid.

  • dggridR, an R package to calculate the grid (though not using the same identifiers).

Military Grid Reference System cell code

Suggested function declaration:

CREATE OR REPLACE TEMPORARY FUNCTION mgrsCode AS 'org.gbif.occurrence.hive.udf.MilitaryGridReferenceSystemCellCodeUDF';

Function definition:

STRING mgrsCode(INTEGER gridSize,
                DOUBLE latitude,
                DOUBLE longitude,
                DOUBLE coordinateUncertaintyInMeters)

Parameters:

gridSize

The grid size in metres. Must be 1, 10, 100, 1000, 10000, 100000 or 0.

latitude

The latitude of the point to grid (usually the decimalLatitude field)

longitude

The longitude of the point to grid (usually the decimalLongitude field)

coordinateUncertaintyInMeters

The uncertainty radius to apply to the point (usually used with a default value, for example COALESCE(coordinateUncertaintyInMeters, 1000))

The coordinateUncertaintyInMeters value is used to randomize the point within the circle.

A gridSize of 0 will give the Grid Zone Junction (GZJ) only, e.g. 32. Other values increase the accuracy of the grid, e.g. 100 (metres) 32UNC686615.

Return value:

The code for the cell of the Military Grid Reference System in which the randomized point falls.

Example arguments and returned values:

Result

gridSize

latitude

longitude

coordinateUncertaintyInMeters

32U

0

52.0

10.0

1000.0

32UNC

100 000

52.0

10.0

1000.0

32UNC66

10 000

52.0

10.0

1000.0

32UNC6861

1 000

52.0

10.0

1000.0

32UNC686615

100

52.0

10.0

1000.0

32UNC68646151

10

52.0

10.0

1000.0

32UNC6864961510

1

52.0

10.0

1000.0

Temporal Uncertainty

Suggested function declaration:

CREATE OR REPLACE TEMPORARY FUNCTION temporalUncertainty AS 'org.gbif.occurrence.hive.udf.TemporalUncertaintyUDF';

Function definition:

STRING temporalUncertainty(String dateInterval)

Parameters:

dateInterval

A string containing an ISO-8106 date or date interval.

Return value:

This function returns the uncertainty in seconds of the date.

Example arguments and returned value

dateInterval

Result

Remarks

2021-03-21T15:01:32.456Z

1

Milliseconds are rounded to seconds.

2021-03-21T15:01:32Z

1

2021-03-21T15:01Z

60

2021-03-21T15Z

60×60

2021-03-21

60×60×24

2021-03-01

60×60×24

2021-01-01

60×60×24

2021-03

60×60×24×31

2021

60×60×24×365

2021-03-21/2021-03-23

60×60×24×3

Software repository

Source code for the functions is available on GitHub.

Compiled code is available from GBIF’s repository; choose the latest version and download cube-functions-<VERSION>.jar.