A Flask-based server that serves ML models for Named Entity Recognition (NER) and reference part extraction for Sefaria clients. Supports both Hebrew and English text, with GPU acceleration via CUDA.
The server loads and serves four models, configured via MODEL_PATHS:
| Type | Language | Architecture | Purpose |
|---|---|---|---|
| named_entity | he | huggingface | Detects citations/sources in Hebrew text |
| named_entity | en | spacy | Detects named entities in English text |
| ref_part | he | spacy | Breaks Hebrew citations into constituent parts |
| ref_part | en | spacy | Breaks English citations into constituent parts |
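Internally, each request must be routed to one of these four models by task type and language. The server's actual lookup code is not shown in this README; the following is a minimal sketch of such a registry, with all names (`ModelRegistry`, `register`, `get`) being illustrative assumptions:

```python
# Hypothetical sketch: a registry keyed by (task type, language),
# matching the four-model table above. Not the server's actual API.
from typing import Any, Dict, Tuple


class ModelRegistry:
    """Holds loaded models keyed by (task type, language)."""

    def __init__(self) -> None:
        self._models: Dict[Tuple[str, str], Any] = {}

    def register(self, model_type: str, lang: str, model: Any) -> None:
        self._models[(model_type, lang)] = model

    def get(self, model_type: str, lang: str) -> Any:
        try:
            return self._models[(model_type, lang)]
        except KeyError:
            raise ValueError(
                f"no model loaded for type={model_type!r}, lang={lang!r}"
            )
```

A request for `("named_entity", "he")` would then dispatch to the Hebrew HuggingFace model, while an unknown combination fails fast with a clear error.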
`POST /recognize-entities` — single text:

```json
{"text": "...", "lang": "en"}
```

`POST /bulk-recognize-entities` — batch processing:

```json
{"texts": ["...", "..."], "lang": "he"}
```

Add `?with_span_text=1` to include the original span text in entity results.
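A quick way to exercise the single-text endpoint from Python, using only the standard library. The payload shape follows the examples above; the response schema is not documented in this README, so the sketch returns the decoded JSON as-is, and the example text is illustrative:

```python
# Sketch of calling POST /recognize-entities on a locally running server.
import json
import urllib.request


def build_payload(text: str, lang: str) -> dict:
    """Build the request body shown in the API examples above."""
    return {"text": text, "lang": lang}


def recognize_entities(text: str, lang: str,
                       base_url: str = "http://localhost:5000") -> dict:
    """POST a single text; with_span_text=1 includes the original span text."""
    url = f"{base_url}/recognize-entities?with_span_text=1"
    data = json.dumps(build_payload(text, lang)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Example (requires a running server):
#   recognize_entities("Rashi on Genesis 1:1", "en")
```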
```bash
cd app
pip install -r requirements.txt
```

For GPU support (CUDA 12.x), install additional packages:

```bash
pip install cupy-cuda12x "spacy[cuda122]~=3.7.0" gunicorn
```

The server selects its config file via the `APP_CONFIG` environment variable. For local development, create `app/local_config.py` (git-ignored) with a `MODEL_PATHS` list pointing to your local model directories:
```python
MODEL_PATHS = [
    {
        'arch': 'huggingface',  # 'huggingface' or 'spacy'
        'lang': 'he',           # 'he' or 'en'
        'path': '/path/to/he_ner_model',
        'type': 'named_entity'  # 'named_entity' or 'ref_part'
    },
    {
        'arch': 'spacy',
        'lang': 'en',
        'path': '/path/to/en_ner_model',
        'type': 'named_entity'
    },
    {
        'arch': 'spacy',
        'lang': 'he',
        'path': '/path/to/subref_he',
        'type': 'ref_part'
    },
    {
        'arch': 'spacy',
        'lang': 'en',
        'path': '/path/to/subref_en',
        'type': 'ref_part'
    },
]
```

Each entry in `MODEL_PATHS` requires:
- `arch`: model architecture, `'huggingface'` or `'spacy'`
- `lang`: language code, `'he'` (Hebrew) or `'en'` (English)
- `path`: absolute path to the model directory, or a `gs://` GCS URI (downloaded automatically at startup)
- `type`: task type, `'named_entity'` or `'ref_part'`
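A malformed entry otherwise only surfaces as an error at model-load time, so it can help to check the config up front. A small validation sketch against the fields above (this helper is not part of the server):

```python
# Hypothetical sketch: validate MODEL_PATHS entries before loading models.
VALID_ARCHS = {'huggingface', 'spacy'}
VALID_LANGS = {'he', 'en'}
VALID_TYPES = {'named_entity', 'ref_part'}


def validate_model_entry(entry: dict) -> list:
    """Return a list of problems with one MODEL_PATHS entry (empty if valid)."""
    errors = []
    for key in ('arch', 'lang', 'path', 'type'):
        if key not in entry:
            errors.append(f"missing key: {key!r}")
    if 'arch' in entry and entry['arch'] not in VALID_ARCHS:
        errors.append(f"arch must be one of {sorted(VALID_ARCHS)}")
    if 'lang' in entry and entry['lang'] not in VALID_LANGS:
        errors.append(f"lang must be one of {sorted(VALID_LANGS)}")
    if 'type' in entry and entry['type'] not in VALID_TYPES:
        errors.append(f"type must be one of {sorted(VALID_TYPES)}")
    path = entry.get('path', '')
    if path and not (path.startswith('/') or path.startswith('gs://')):
        errors.append("path must be absolute or a gs:// URI")
    return errors
```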
```bash
cd app
APP_CONFIG=local_config.py python app.py
```

The Flask dev server starts on http://localhost:5000.
```bash
cd app
docker build -t sefaria-gpu-server .
docker run -p 8000:8000 sefaria-gpu-server
```

The container runs Gunicorn on port 8000 with a single worker (required for shared GPU memory).
The chart/ directory contains a Helm chart. Environment-specific HelmRelease manifests live in deploy/dev/ and deploy/prod/. Model paths and resource limits are configured in chart/values.yaml.
In production, MODEL_PATHS entries use gs:// URIs; the server downloads and extracts each model from GCS at startup using workload identity for authentication.
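The download step needs the bucket and object name split out of each `gs://` URI. A minimal parsing helper, assuming `tar.gz` archives only for the usage comment (the server's actual download code may differ):

```python
# Sketch: split a gs:// URI into (bucket, blob path) for a GCS download.
from typing import Tuple


def parse_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path/to/object' into (bucket, blob path)."""
    if not uri.startswith("gs://"):
        raise ValueError(f"not a GCS URI: {uri!r}")
    bucket, _, blob = uri[len("gs://"):].partition("/")
    if not bucket or not blob:
        raise ValueError(f"GCS URI must include bucket and object: {uri!r}")
    return bucket, blob


# The download itself would then use the google-cloud-storage client,
# e.g. (an assumption, not shown in this README):
#   from google.cloud import storage
#   bucket, blob = parse_gcs_uri(entry['path'])
#   storage.Client().bucket(bucket).blob(blob).download_to_filename(dest)
```

With workload identity, `storage.Client()` picks up credentials from the pod's environment, so no key file is needed in the container.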