Overview

This repository contains a Dockerfile for an image that generates TDB2 datasets for Fuseki.

It can:

Create TDB2 datasets.
Create a spatial index for the dataset.
Create a text index for the dataset.

Usage

Create a TDB2 dataset in the current directory from the RDF files in ./data:

docker run \
  -v "./data:/rdf" \
  -v "$(pwd):/fuseki/databases" \
  --rm \
  ghcr.io/kurrawong/tdb2-generation:latest

Note

To persist the generated dataset files, mount a volume at the location
where the dataset will be created.

Typically, this is the location of the TDB2 dataset as specified in the mounted
assembler description (/config.ttl).

If no assembler description is given, the dataset is created at
/fuseki/databases/ds.

This can be overridden with the $DATASET environment variable.
See the Environment Variables section below for more information.
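
To illustrate how the dataset location and the volume mount relate, here is a minimal sketch that writes an assembler description for a plain TDB2 dataset. The resource name <#dataset> and the location value are illustrative assumptions, and a text or spatial index would need further definitions that are not shown here.

```shell
# Write a minimal assembler description to ./config.ttl (sketch only).
# The location below is an assumed example; it matches the image's default.
cat > config.ttl <<'EOF'
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX tdb2: <http://jena.apache.org/2016/tdb#>

<#dataset> rdf:type tdb2:DatasetTDB2 ;
    tdb2:location "/fuseki/databases/ds" ;
    .
EOF

# Print the statement that names where the dataset files will be written.
grep 'tdb2:location' config.ttl
```

With this location and a mount of -v "$(pwd):/fuseki/databases", the dataset files would end up in ./ds on the host.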

The loading process can be configured by passing environment variables to the container. See the table below for all available options.

Text and spatial index creation is opt-in; neither index is generated by default.

To create a TDB2 dataset with a text and spatial index:

docker run \
  -e "SPATIAL=true" \
  -e "TEXT=true" \
  -v "./data:/rdf" \
  -v "$(pwd):/fuseki/databases" \
  -v "./config.ttl:/config.ttl" \
  --rm \
  ghcr.io/kurrawong/tdb2-generation:latest

Environment Variables

Variable	Purpose	Default	Usage Example
`JENA_VERSION`	Version of Jena/Fuseki used to build the database.	`6.0.0` (options: [ 6.0.0, ... ])	`JENA_VERSION=6.0.0`
`SPATIAL`	If set, create a spatial index.	unset (false)	`SPATIAL=true`
`TEXT`	If set, create a text index. Requires an assembler description mounted at `/config.ttl`.	unset (false)	`TEXT=true`
`THREADS`	Sets the number of threads to use for processing (only applies to tdb2.xloader)	Number of available processors minus 1	`THREADS=4`
`USE_XLOADER`	If set, use tdb2.xloader instead of tdb2.tdbloader. See tdb.xloader	unset (false)	`USE_XLOADER=true`
`TDB2_MODE`	Specifies the loader mode for tdb2.tdbloader. See tdbloader options	`phased` if not set	`TDB2_MODE=sequential`
`DATASET`	Specifies the path where the TDB2 dataset should be created.	If no assembler description is mounted at `/config.ttl`, it defaults to `/fuseki/databases/ds`. Otherwise it is derived from the `tdb2:location "..." ;` statement in `/config.ttl`.	`DATASET=/fuseki/databases/ds`
`SKIP_VALIDATION`	If set, skip the validation check. By default, riot is used to detect invalid RDF files, which are then excluded from processing.	unset (false)	`SKIP_VALIDATION=true`
`SKIP_LOAD`	If set, skip the TDB2 generation. Allows indexing an already built dataset or applying validation only.	unset (false)	`SKIP_LOAD=true`
`GRAPH`	Optional named graph for triples (only used for tdb2.tdbloader, not tdb2.xloader)	unset	`GRAPH=https://graphs/example`
`JVM_ARGS`	General Java args	unset	`JVM_ARGS=-Xmx4G`
`SPATIAL_INDEX_FILE`	Location of the spatial index file.	The object of the `geosparql:spatialIndexFile` predicate on the declared `geosparql:Dataset` in `/config.ttl`; if not given there, `$DATASET/spatial.index`.	`SPATIAL_INDEX_FILE=/fuseki/databases/ds/spatial.index`
`SRS_URI`	URI of the SRS to use for the spatial index.	The object of the `geosparql:srsUri` predicate on the declared `geosparql:Dataset` in `/config.ttl`; if not given there, unset.	`SRS_URI=http://www.opengis.net/def/crs/OGC/1.3/CRS84`
`STATS`	Generate dataset statistics using tdb2.tdbstats	unset (false)	`STATS=true`
`COMPRESS`	Use gzip to compress the generated dataset / index files.	unset (false)	`COMPRESS=true`
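
The variables in the table can be combined in a single run. The following sketch shows two possible combinations, using only variables and mounts documented above. The function names and the chosen values are illustrative assumptions; wrapping the commands in functions is simply a convenience for reuse.

```shell
# Sketch: bulk-load with tdb2.xloader using 4 threads and a 4 GB heap.
# Function name and values are illustrative, not part of the image.
load_with_xloader() {
  docker run \
    -e "USE_XLOADER=true" \
    -e "THREADS=4" \
    -e "JVM_ARGS=-Xmx4G" \
    -v "./data:/rdf" \
    -v "$(pwd):/fuseki/databases" \
    --rm \
    ghcr.io/kurrawong/tdb2-generation:latest
}

# Sketch: add a spatial index to a dataset built by an earlier run,
# skipping the load step. The same mounts as in the earlier run are kept.
index_existing() {
  docker run \
    -e "SKIP_LOAD=true" \
    -e "SPATIAL=true" \
    -v "./data:/rdf" \
    -v "$(pwd):/fuseki/databases" \
    --rm \
    ghcr.io/kurrawong/tdb2-generation:latest
}
```

Call load_with_xloader or index_existing from the directory that contains ./data. Flags that are not wanted are left out entirely rather than set to false.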

Warning

Setting a variable to false is not supported; leave it unset instead. For example, USE_XLOADER=false is treated as enabling the USE_XLOADER flag, so USE_XLOADER=false is equivalent to USE_XLOADER=true.
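
This behaviour is what one would expect if flags are tested for being set rather than for their value. The sketch below illustrates that idea in plain shell. It is an assumption about how such a check might look, not the image's actual entrypoint code, and is_enabled is a hypothetical helper.

```shell
# Assumption: a flag counts as enabled whenever its variable is non-empty.
# is_enabled is a hypothetical helper, not taken from the image.
is_enabled() {
  [ -n "$1" ]
}

# The string "false" is non-empty, so the flag is treated as enabled.
USE_XLOADER=false
if is_enabled "$USE_XLOADER"; then
  echo "USE_XLOADER=false -> enabled"
fi

# Leaving the variable unset is the way to keep the flag off.
unset USE_XLOADER
if ! is_enabled "${USE_XLOADER:-}"; then
  echo "USE_XLOADER unset -> disabled"
fi
```

In practice this means omitting the corresponding -e option from docker run when a feature is not wanted.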

Development

To build the image locally:

docker build . -t tdb2-generation:dev

Some test scenarios are pre-written in the compose file and can be run with task, e.g.:

task tests:basic
