This repository contains a Dockerfile which generates TDB2 datasets for Fuseki.
It can:
- Create TDB2 datasets
- Create a spatial index for the dataset.
- Create a text index for the dataset.
Create a TDB2 dataset in the current directory from the RDF files in ./data:

    docker run \
        -v "./data:/rdf" \
        -v "$(pwd):/fuseki/databases" \
        --rm \
        ghcr.io/kurrawong/tdb2-generation:latest

Note
To persist the generated dataset files, you need to mount a volume to the location
where the dataset will be created.
Typically, this is the location of the tdb2 dataset as specified in the mounted
assembler description (/config.ttl).
If no assembler description is given then the dataset will be created at
/fuseki/databases/ds
This can be overridden with the $DATASET environment variable.
See the Environment Variables section below for more information.
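
As a minimal sketch of the note above, the wrapper below overrides the dataset location while keeping it under the mounted volume, so the files survive after the container is removed. The function name `generate_dataset` and the dataset name `mydata` are hypothetical and only for illustration; the mounts and image name are taken from the example above.

```shell
# Hypothetical helper: build a dataset at a custom path inside the container.
# Usage: generate_dataset <dataset-name> [rdf-dir]
generate_dataset() {
    name="$1"
    rdf_dir="${2:-./data}"
    # DATASET points below /fuseki/databases, which is the mounted volume,
    # so the generated files end up in the current directory on the host.
    docker run \
        -e "DATASET=/fuseki/databases/${name}" \
        -v "${rdf_dir}:/rdf" \
        -v "$(pwd):/fuseki/databases" \
        --rm \
        ghcr.io/kurrawong/tdb2-generation:latest
}
```

Calling `generate_dataset mydata` should then leave the dataset in ./mydata on the host, assuming the container writes to the path given in DATASET as described above.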
The loading process can be configured by passing environment variables to the container. See the table below for all available options.
Text and spatial index creation is opt-in; neither index is generated by default.
To create a TDB2 dataset with a text and spatial index:

    docker run \
        -e "SPATIAL=true" \
        -e "TEXT=true" \
        -v "./data:/rdf" \
        -v "$(pwd):/fuseki/databases" \
        -v "./config.ttl:/config.ttl" \
        --rm \
        ghcr.io/kurrawong/tdb2-generation:latest

| Variable | Purpose | Default | Usage Example |
|---|---|---|---|
| JENA_VERSION | Which version of Jena/Fuseki to use for building the database. | 6.0.0 (options: [ 6.0.0, ... ]) | JENA_VERSION=6.0.0 |
| SPATIAL | If set, do spatial indexing. | unset (false) | SPATIAL=true |
| TEXT | If set, do text indexing. Requires an assembler description mounted at /config.ttl. | unset (false) | TEXT=true |
| THREADS | Sets the number of threads to use for processing (only applies to tdb2.xloader). | Number of available processors minus 1 | THREADS=4 |
| USE_XLOADER | If set, use tdb2.xloader instead of tdb2.tdbloader. See tdb.xloader. | unset (false) | USE_XLOADER=true |
| TDB2_MODE | Specifies the loader mode for tdb2.tdbloader. See tdbloader options. | phased if not set | TDB2_MODE=sequential |
| DATASET | Specifies the path where the TDB2 dataset should be created. | If no assembler description is mounted at /config.ttl, defaults to /fuseki/databases/ds. Otherwise it is derived from the tdb2:location "..." ; statement in /config.ttl. | DATASET=/fuseki/databases/ds |
| SKIP_VALIDATION | If set, skip the validation check. By default, riot is used to check for invalid RDF files, which are then not processed. | unset (false) | SKIP_VALIDATION=true |
| SKIP_LOAD | If set, skip the TDB2 generation. Allows indexing an already built dataset or applying validation only. | unset (false) | SKIP_LOAD=true |
| GRAPH | Optional named graph for triples (only used for tdb2.tdbloader, not tdb2.xloader). | unset | GRAPH=https://graphs/example |
| JVM_ARGS | General Java args. | unset | JVM_ARGS=-Xmx4G |
| SPATIAL_INDEX_FILE | Location for the spatial index file. | Object of the geosparql:spatialIndexFile predicate on the declared geosparql:Dataset in /config.ttl or, if not given there, "$DATASET/spatial.index". | SPATIAL_INDEX_FILE=/fuseki/databases/ds/spatial.index |
| SRS_URI | URI of the SRS to use for the spatial index. | Object of the geosparql:srsUri predicate on the declared geosparql:Dataset in /config.ttl or, if not given, unset. | SRS_URI=http://www.opengis.net/def/crs/OGC/1.3/CRS84 |
| STATS | Generate dataset statistics using tdb2.tdbstats. | unset (false) | STATS=true |
| COMPRESS | Use gzip to compress the generated dataset / index files. | unset (false) | COMPRESS=true |
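
The variables in the table are passed as separate `-e` flags and can be combined. As a rough sketch, the wrapper below loads ./data with tdb2.xloader and sets THREADS and JVM_ARGS. The function name `generate_with_xloader` is hypothetical, and the defaults of 4 threads and -Xmx4G are simply the usage examples from the table, not tuning advice.

```shell
# Hypothetical wrapper: load ./data with tdb2.xloader.
# Usage: generate_with_xloader [threads] [jvm-args]
generate_with_xloader() {
    threads="${1:-4}"
    jvm_args="${2:--Xmx4G}"
    # THREADS only applies to tdb2.xloader, so it is paired with USE_XLOADER.
    docker run \
        -e "USE_XLOADER=true" \
        -e "THREADS=${threads}" \
        -e "JVM_ARGS=${jvm_args}" \
        -v "./data:/rdf" \
        -v "$(pwd):/fuseki/databases" \
        --rm \
        ghcr.io/kurrawong/tdb2-generation:latest
}
```

For example, `generate_with_xloader 8 -Xmx8G` would request 8 threads and an 8 GB heap; suitable values depend on the host and the size of the data.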
Warning
Setting variables to false is not supported; leave them unset instead.
For example, USE_XLOADER=false is treated as enabling the USE_XLOADER flag.
In other words, USE_XLOADER=false is equivalent to USE_XLOADER=true.
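
The behaviour described in this warning is typical of shell scripts that only test whether a variable has a value. The sketch below assumes a non-empty check; it is not copied from the image's entrypoint, and the function name `choose_loader` is hypothetical.

```shell
# Sketch of a "set vs unset" flag check (an assumption about how the
# flags are evaluated, not the image's actual entrypoint script).
choose_loader() {
    # Any non-empty value counts as "set", including the string "false".
    if [ -n "${USE_XLOADER:-}" ]; then
        echo "tdb2.xloader"
    else
        echo "tdb2.tdbloader"
    fi
}

USE_XLOADER=true;  choose_loader   # prints tdb2.xloader
USE_XLOADER=false; choose_loader   # prints tdb2.xloader (the value is ignored)
unset USE_XLOADER; choose_loader   # prints tdb2.tdbloader
```

To disable a flag, omit its `-e` option from the docker run command rather than passing a false value.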
To build the image locally:

    docker build . -t tdb2-generation:dev

Some test scenarios are pre-written in the compose file and can be executed easily with task, e.g.:

    task tests:basic