This is the link preview API used by ChatMUD. Given a URL, it returns structured metadata including title, description, images, author, and optionally an AI-generated summary.
- Extracts Open Graph, JSON-LD, and standard meta tags from web pages
- Handles YouTube and Spotify URLs via their oEmbed APIs
- Renders JavaScript-heavy sites (Twitter, TikTok, Instagram, Reddit, etc.) with a headless browser
- Detects paywalls and attempts bypass or falls back to Wayback Machine archives
- Identifies non-HTML content (PDFs, images, documents) and returns file metadata
- Generates short summaries using an OpenAI-compatible API
- Caches results in Redis
cp .env.example .env
# Edit .env to set REDIS_PASSWORD
docker compose up -d

The API runs on port 8005.
Returns service status. Checks Redis connectivity.
{"status": "healthy", "message": "All services operational"}

Generates a link preview.
Query parameters:
- url (required): The URL to preview. Max 2048 characters.
- force_refresh (optional): Bypass cache and fetch fresh data. Default: false
- summarizer (optional): Include an AI-generated summary. Default: false
Example:
GET /preview?url=https://example.com/article&summarizer=true
Response:
{
"status": "success",
"url": "https://example.com/article",
"title": "Article Title",
"description": "The article description from meta tags.",
"image": "https://example.com/image.jpg",
"favicon": "https://example.com/favicon.ico",
"author": "Jane Smith",
"keywords": ["news", "tech"],
"language": "en",
"metadata": {
"opengraph": {"og:type": "article", "og:title": "Article Title"},
"json_ld": {"@type": "Article", "headline": "Article Title"},
"oembed": null,
"spotify": null
},
"summary": "Two to three sentence summary of the content."
}

The summary field only appears when summarizer=true.
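A minimal client sketch for the request shown above, using only the Python standard library. The base URL `http://localhost:8005` assumes a local deployment on the documented port, and the helper names `build_preview_url` and `fetch_preview` are hypothetical, not part of the service. The length check simply mirrors the documented 2048-character limit on the client side.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_preview_url(base_url, target_url, summarizer=False, force_refresh=False):
    """Build a /preview request URL with the target URL percent-encoded."""
    if len(target_url) > 2048:
        raise ValueError("url exceeds the documented 2048-character limit")
    params = {"url": target_url}
    # Only send optional flags when they differ from the documented default (false).
    if summarizer:
        params["summarizer"] = "true"
    if force_refresh:
        params["force_refresh"] = "true"
    return f"{base_url.rstrip('/')}/preview?{urlencode(params)}"


def fetch_preview(base_url, target_url, **options):
    """Call the API and return the decoded JSON body as a dict."""
    request_url = build_preview_url(base_url, target_url, **options)
    with urlopen(request_url, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Assumes the service is running locally on port 8005.
    preview = fetch_preview(
        "http://localhost:8005", "https://example.com/article", summarizer=True
    )
    # "summary" is only present when summarizer=true, so read it defensively.
    print(preview.get("title"), preview.get("summary"))
```

Encoding the target with `urlencode` matters when the previewed URL carries its own query string; otherwise its `&` and `=` characters would be parsed as parameters of the preview request.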
Set these in .env:
| Variable | Default | Description |
|---|---|---|
| REDIS_HOST | localhost | Redis hostname |
| REDIS_PORT | 6379 | Redis port |
| REDIS_DB | 0 | Redis database number |
| REDIS_PASSWORD | (none) | Redis password |
| CACHE_TTL | 300 | Cache lifetime in seconds |
| RATE_LIMIT_PER_MINUTE | 20 | Requests per minute per IP |
| OPENAI_API_KEY | (none) | API key for summarization |
| OPENAI_BASE_URL | (none) | OpenAI-compatible endpoint |
| GPT_MODEL | (none) | Model name for summaries |
| DEFAULT_TIMEOUT | 30 | Request timeout in seconds |
Configure OPENAI_API_KEY, OPENAI_BASE_URL, and GPT_MODEL to enable the summarization feature. Any OpenAI-compatible API will work.
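As an illustration of how these variables and their defaults fit together, here is a sketch of a settings loader. The `Settings` class and `load_settings` function are hypothetical names, and the rule that summarization needs all three OpenAI-related values is an assumption drawn from the sentence above, not a confirmed detail of the service's code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    redis_host: str
    redis_port: int
    redis_db: int
    redis_password: Optional[str]
    cache_ttl: int
    rate_limit_per_minute: int
    openai_api_key: Optional[str]
    openai_base_url: Optional[str]
    gpt_model: Optional[str]
    default_timeout: int

    @property
    def summarizer_enabled(self) -> bool:
        # Assumption: summaries require all three OpenAI-related values.
        return all((self.openai_api_key, self.openai_base_url, self.gpt_model))


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from a mapping, applying the documented defaults."""
    return Settings(
        redis_host=env.get("REDIS_HOST", "localhost"),
        redis_port=int(env.get("REDIS_PORT", "6379")),
        redis_db=int(env.get("REDIS_DB", "0")),
        # Variables without a default become None when unset or empty.
        redis_password=env.get("REDIS_PASSWORD") or None,
        cache_ttl=int(env.get("CACHE_TTL", "300")),
        rate_limit_per_minute=int(env.get("RATE_LIMIT_PER_MINUTE", "20")),
        openai_api_key=env.get("OPENAI_API_KEY") or None,
        openai_base_url=env.get("OPENAI_BASE_URL") or None,
        gpt_model=env.get("GPT_MODEL") or None,
        default_timeout=int(env.get("DEFAULT_TIMEOUT", "30")),
    )
```

Passing the environment as a plain mapping keeps the loader easy to exercise with a dictionary instead of the real process environment.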
Standard websites — Fetches HTML via HTTP client, extracts metadata with BeautifulSoup and extruct.
JavaScript-required sites — Vimeo, TikTok, Twitter/X, Instagram, Facebook, LinkedIn, Reddit, Medium, and Substack get rendered with Playwright's Chromium browser.
YouTube — Calls the YouTube oEmbed API directly. Falls back to browser rendering if that fails.
Spotify — Calls the Spotify oEmbed API for tracks, albums, playlists, artists, shows, and episodes. Extracts additional metadata (artists, duration, release date) from the embed HTML.
Non-HTML content — PDFs, images, and other files return the filename, MIME type, file size, and last-modified date.
Paywalled content — Detects common paywall patterns, attempts to remove overlay elements, and falls back to Wayback Machine if content remains inaccessible.
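The routing described above can be sketched as a host-based dispatch. This is only an illustration: the host lists, the strategy names, and the `choose_strategy` function are assumptions, and the service's actual matching rules are not documented here. Non-HTML and paywall handling are left out because they depend on the fetched response rather than on the URL alone.

```python
from urllib.parse import urlparse

# Assumed host lists, derived from the site names mentioned in the text.
YOUTUBE_HOSTS = {"youtube.com", "youtu.be"}
SPOTIFY_HOSTS = {"open.spotify.com"}
BROWSER_HOSTS = {
    "vimeo.com", "tiktok.com", "twitter.com", "x.com", "instagram.com",
    "facebook.com", "linkedin.com", "reddit.com", "medium.com", "substack.com",
}


def _matches(host, domains):
    """True if host equals one of the domains or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def choose_strategy(url):
    """Pick a (hypothetical) fetch strategy name for a URL based on its host."""
    host = (urlparse(url).hostname or "").lower()
    if _matches(host, YOUTUBE_HOSTS):
        return "youtube_oembed"   # falls back to browser rendering on failure
    if _matches(host, SPOTIFY_HOSTS):
        return "spotify_oembed"
    if _matches(host, BROWSER_HOSTS):
        return "browser"          # rendered with a headless browser
    return "http"                 # plain HTTP fetch plus HTML parsing
```

Matching on the exact domain or a dot-prefixed suffix avoids false positives such as treating `notx.com` as `x.com`.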
Requests are limited per IP address. Cached responses cost 0.2 against the limit; fresh fetches cost 1.0. When the limit is exceeded, the API returns:
{
"status": "failure",
"message": "Rate limit exceeded. Please try again later.",
"url": "https://example.com"
}

- Only http:// and https:// schemes allowed
- Localhost and private IP ranges blocked
- Security headers set on all responses (HSTS, CSP, X-Frame-Options, etc.)
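The first two rules above could be checked along these lines. This is a sketch under stated assumptions, not the service's actual validator: `is_url_allowed` is a hypothetical name, and it only inspects the URL text. A production check would also need to resolve hostnames and test every resolved address, since a public-looking name can point at a private IP.

```python
import ipaddress
from urllib.parse import urlparse


def is_url_allowed(url):
    """Return True if the URL uses http(s) and does not target a local address."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = parsed.hostname  # lowercased, with IPv6 brackets removed
    if not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. DNS resolution is deliberately out of scope here.
        return True
    return not (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_unspecified
    )
```

Using the `ipaddress` module covers both IPv4 and IPv6 literals, so `http://[::1]/` is rejected the same way as `http://127.0.0.1/`.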
Requires Python 3.11+ and a running Redis instance.
# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install dependencies
uv sync
# Install browser
uv run playwright install chromium
# Run the server
uv run uvicorn api.main:app --host 0.0.0.0 --port 8005

License: MIT