Summarizer

This is the link preview API used by ChatMUD. Given a URL, it returns structured metadata including title, description, images, author, and optionally an AI-generated summary.

What it does

Extracts Open Graph, JSON-LD, and standard meta tags from web pages
Handles YouTube and Spotify URLs via their oEmbed APIs
Renders JavaScript-heavy sites (Twitter, TikTok, Instagram, Reddit, etc.) with a headless browser
Detects paywalls, attempts to bypass them, and falls back to Wayback Machine archives when that fails
Identifies non-HTML content (PDFs, images, documents) and returns file metadata
Generates short summaries using an OpenAI-compatible API
Caches results in Redis

Quick start

cp .env.example .env
# Edit .env to set REDIS_PASSWORD
docker compose up -d

The API runs on port 8005.

Endpoints

`GET /health`

Returns the service status, including a Redis connectivity check.

{"status": "healthy", "message": "All services operational"}

`GET /preview`

Generates a link preview.

Query parameters:

url (required) — The URL to preview. Max 2048 characters.
force_refresh (optional) — Bypass cache and fetch fresh data. Default: false
summarizer (optional) — Include an AI-generated summary. Default: false

Example:

GET /preview?url=https://example.com/article&summarizer=true

Response:

{
  "status": "success",
  "url": "https://example.com/article",
  "title": "Article Title",
  "description": "The article description from meta tags.",
  "image": "https://example.com/image.jpg",
  "favicon": "https://example.com/favicon.ico",
  "author": "Jane Smith",
  "keywords": ["news", "tech"],
  "language": "en",
  "metadata": {
    "opengraph": {"og:type": "article", "og:title": "Article Title"},
    "json_ld": {"@type": "Article", "headline": "Article Title"},
    "oembed": null,
    "spotify": null
  },
  "summary": "Two to three sentence summary of the content."
}

The summary field only appears when summarizer=true.
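
As a rough illustration of calling this endpoint from Python, the sketch below builds the query string from the documented parameters and decodes the JSON body. It assumes a local instance at http://localhost:8005; the helper names build_preview_url and fetch_preview are hypothetical and are not part of this repository.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a local instance listening on the documented port.
BASE_URL = "http://localhost:8005"


def build_preview_url(target, summarizer=False, force_refresh=False, base_url=BASE_URL):
    """Build a /preview request URL from the documented query parameters."""
    if len(target) > 2048:
        raise ValueError("url exceeds the documented 2048-character limit")
    params = {"url": target}
    # Optional flags default to false, so only send them when enabled.
    if summarizer:
        params["summarizer"] = "true"
    if force_refresh:
        params["force_refresh"] = "true"
    # urlencode percent-encodes the target so its own "?" and "&" stay intact.
    return f"{base_url}/preview?{urlencode(params)}"


def fetch_preview(target, **flags):
    """Call the API and return the decoded JSON body (hypothetical helper)."""
    with urlopen(build_preview_url(target, **flags), timeout=30) as response:
        return json.load(response)
```

Calling fetch_preview("https://example.com/article", summarizer=True) against a running instance should return a dictionary shaped like the response above. Percent-encoding the target URL matters when it carries its own query string.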

Configuration

Set these in .env:

Variable	Default	Description
`REDIS_HOST`	`localhost`	Redis hostname
`REDIS_PORT`	`6379`	Redis port
`REDIS_DB`	`0`	Redis database number
`REDIS_PASSWORD`	—	Redis password
`CACHE_TTL`	`300`	Cache lifetime in seconds
`RATE_LIMIT_PER_MINUTE`	`20`	Requests per minute per IP
`OPENAI_API_KEY`	—	API key for summarization
`OPENAI_BASE_URL`	—	OpenAI-compatible endpoint
`GPT_MODEL`	—	Model name for summaries
`DEFAULT_TIMEOUT`	`30`	Request timeout in seconds

Configure OPENAI_API_KEY, OPENAI_BASE_URL, and GPT_MODEL to enable the summarization feature. Any OpenAI-compatible API will work.
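
To make the defaults concrete, here is a minimal sketch of reading these variables from the environment. The Settings class and load_settings function are hypothetical and may not match the project's own config.py. Treating summarization as enabled only when all three OpenAI variables are set is an assumption drawn from the paragraph above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    redis_host: str
    redis_port: int
    redis_db: int
    redis_password: Optional[str]
    cache_ttl: int
    rate_limit_per_minute: int
    openai_api_key: Optional[str]
    openai_base_url: Optional[str]
    gpt_model: Optional[str]
    default_timeout: int

    @property
    def summarizer_enabled(self) -> bool:
        # Assumption: summaries need all three OpenAI-compatible settings.
        return all((self.openai_api_key, self.openai_base_url, self.gpt_model))


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    return Settings(
        redis_host=env.get("REDIS_HOST", "localhost"),
        redis_port=int(env.get("REDIS_PORT", "6379")),
        redis_db=int(env.get("REDIS_DB", "0")),
        # Variables without a default become None when unset or empty.
        redis_password=env.get("REDIS_PASSWORD") or None,
        cache_ttl=int(env.get("CACHE_TTL", "300")),
        rate_limit_per_minute=int(env.get("RATE_LIMIT_PER_MINUTE", "20")),
        openai_api_key=env.get("OPENAI_API_KEY") or None,
        openai_base_url=env.get("OPENAI_BASE_URL") or None,
        gpt_model=env.get("GPT_MODEL") or None,
        default_timeout=int(env.get("DEFAULT_TIMEOUT", "30")),
    )
```

Passing a plain dictionary instead of os.environ makes the loader easy to exercise in tests.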

URL handling

Standard websites — Fetches HTML via HTTP client, extracts metadata with BeautifulSoup and extruct.

JavaScript-required sites — Vimeo, TikTok, Twitter/X, Instagram, Facebook, LinkedIn, Reddit, Medium, and Substack get rendered with Playwright's Chromium browser.

YouTube — Calls the YouTube oEmbed API directly. Falls back to browser rendering if that fails.

Spotify — Calls the Spotify oEmbed API for tracks, albums, playlists, artists, shows, and episodes. Extracts additional metadata (artists, duration, release date) from the embed HTML.

Non-HTML content — PDFs, images, and other files return the filename, MIME type, file size, and last-modified date.

Paywalled content — Detects common paywall patterns, attempts to remove overlay elements, and falls back to Wayback Machine if content remains inaccessible.
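
The dispatch described above can be pictured as a lookup on the URL's hostname. The sketch below is illustrative only: the domain lists are assumptions inferred from the site names in this section, and choose_strategy and its return labels are hypothetical names rather than the project's actual code.

```python
from urllib.parse import urlparse

# Assumed domains for the sites named above; the real lists may differ.
YOUTUBE_HOSTS = {"youtube.com", "youtu.be"}
SPOTIFY_HOSTS = {"open.spotify.com"}
BROWSER_HOSTS = {
    "vimeo.com", "tiktok.com", "twitter.com", "x.com", "instagram.com",
    "facebook.com", "linkedin.com", "reddit.com", "medium.com", "substack.com",
}


def _matches(host, domains):
    """True when host is one of the domains or a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def choose_strategy(url):
    """Pick a fetch strategy label for a URL based on its hostname."""
    host = (urlparse(url).hostname or "").lower()
    if _matches(host, YOUTUBE_HOSTS):
        return "youtube_oembed"
    if _matches(host, SPOTIFY_HOSTS):
        return "spotify_oembed"
    if _matches(host, BROWSER_HOSTS):
        return "browser"
    # Everything else goes through the plain HTTP client first.
    return "http"
```

Non-HTML and paywalled content cannot be identified from the hostname, so those cases would be decided only after a response has been fetched.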

Rate limiting

Requests are limited per IP address, up to RATE_LIMIT_PER_MINUTE (default 20) per minute. Each request is weighted: a cached response costs 0.2 against the limit and a fresh fetch costs 1.0. When the limit is exceeded, the API returns:

{
  "status": "failure",
  "message": "Rate limit exceeded. Please try again later.",
  "url": "https://example.com"
}
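
The weighted costs can be illustrated with a small in-memory limiter. This is a sketch under stated assumptions: it uses a fixed one-minute window and a local dictionary, whereas the service may track usage differently (for example in Redis or with a sliding window). WeightedRateLimiter is a hypothetical name.

```python
import time

CACHED_COST = 0.2  # documented cost of a cached response
FRESH_COST = 1.0   # documented cost of a fresh fetch


class WeightedRateLimiter:
    """Fixed-window limiter that charges each request a fractional cost."""

    def __init__(self, limit=20.0, window_seconds=60.0):
        self.limit = limit
        self.window_seconds = window_seconds
        self._windows = {}  # ip -> (window_start, cost_spent)

    def allow(self, ip, cached, now=None):
        """Return True and record the cost if the request fits the budget."""
        now = time.monotonic() if now is None else now
        start, spent = self._windows.get(ip, (now, 0.0))
        if now - start >= self.window_seconds:
            # The previous window has expired, so start a fresh one.
            start, spent = now, 0.0
        cost = CACHED_COST if cached else FRESH_COST
        # Small tolerance so repeated 0.2 additions do not trip on rounding.
        allowed = spent + cost <= self.limit + 1e-9
        self._windows[ip] = (start, spent + cost if allowed else spent)
        return allowed
```

With the default limit of 20, a client in this sketch could make 20 fresh fetches or 100 cached requests per window, or any mix whose costs add up to 20.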

Security

Only http:// and https:// schemes allowed
Localhost and private IP ranges blocked
Security headers set on all responses (HSTS, CSP, X-Frame-Options, etc.)
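
A minimal sketch of the first two checks, using only the Python standard library, might look like the following. The function name is_url_allowed is hypothetical, and the sketch inspects only literal IP addresses in the URL; it is an assumption that the service also validates the addresses a hostname resolves to, which a full defence against server-side request forgery would need.

```python
import ipaddress
from urllib.parse import urlparse

ALLOWED_SCHEMES = ("http", "https")


def is_url_allowed(url):
    """Reject non-HTTP schemes, localhost, and private or loopback IP literals."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES:
        return False
    host = parsed.hostname  # lowercased, with IPv6 brackets removed
    if not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. This sketch does not resolve DNS names.
        return True
    return not (address.is_private or address.is_loopback or address.is_link_local)
```

Hostnames pass through unchecked here, so a name that resolves to a private address would need a separate check after DNS resolution.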

Running without Docker

Requires Python 3.11+ and a running Redis instance.

# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies
uv sync

# Install browser
uv run playwright install chromium

# Run the server
uv run uvicorn api.main:app --host 0.0.0.0 --port 8005

License

MIT
