A comprehensive Goodreads data scraper and analytics dashboard that automatically tracks reading activity, provides detailed statistics, and visualizes reading habits over time.
Katalog consists of two main components:
- Python scraper - A robust web scraper that extracts Goodreads data (books, shelves, feed activity, reading challenges) and stores it in Supabase
- Next.js dashboard - A clean web dashboard that visualizes reading statistics and social feed activity
The scraper runs on a schedule (via cron job) to keep data fresh, while the dashboard provides insights into my reading patterns.
You'll need:

- Python 3.9+
- Node.js 18+
- Docker (optional, for containerized dev/deployment)
- Supabase account (for data storage)
- Goodreads account with valid session cookie
Create a `.env` file in the root directory:

```env
# Goodreads Authentication
GOODREADS_COOKIE=your_cookie_string_here
GOODREADS_USER_ID=your_user_id_here

# Database
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your_supabase_anon_key

# Environment
ENVIRONMENT=dev  # or 'production'

# Monitoring (if you'd like that in production)
SENTRY_DSN=https://xxxxx@sentry.io/yyyy
```

To get your Goodreads session cookie:

- Log into Goodreads in your browser
- Open Developer Tools (F12)
- Go to the Network tab
- Refresh the page
- Make any request to goodreads.com
- Copy the entire `Cookie` header value from the Request Headers
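For illustration, here's a minimal sketch of the kind of session check the scraper performs before scraping. The endpoint, headers, and logged-out detection are assumptions, not the actual implementation:

```python
import os

import requests

# Hypothetical sketch: the endpoint and the signed-out detection below are
# assumptions, not the scraper's actual check.
def session_is_valid(cookie: str) -> bool:
    headers = {"Cookie": cookie, "User-Agent": "Mozilla/5.0"}
    # An auth-only page; Goodreads redirects signed-out users to sign-in.
    resp = requests.get(
        "https://www.goodreads.com/notifications",
        headers=headers, timeout=30, allow_redirects=True,
    )
    return resp.ok and "sign_in" not in resp.url

if __name__ == "__main__":
    print("valid" if session_is_valid(os.environ["GOODREADS_COOKIE"]) else "expired")
```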
Create a `.env` file in the client/ directory:

```env
NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=your_supabase_anon_key
```

The scraper expects the following Supabase tables: `books`, `feed`, `metadata`, `reading_challenge`.
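Before the first run, you can sanity-check that the tables are reachable with a few lines of supabase-py. This is illustrative only and not part of the scraper itself:

```python
import os

from supabase import create_client

# Quick one-off check: confirm each expected table exists before the
# first scheduled run. A missing table raises an API error here.
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

for table in ("books", "feed", "metadata", "reading_challenge"):
    supabase.table(table).select("*").limit(1).execute()
    print(f"ok: {table}")
```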
- Clone the repository

  ```bash
  git clone <repository-url>
  cd katalog
  ```

- Set up the Python environment

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  chmod +x scripts/install.sh
  ./scripts/install.sh
  ```

- Configure environment variables

  ```bash
  cp .example.env .env  # Edit .env with credentials
  ```

- Set up the client

  ```bash
  cd client
  npm install
  cp .example.env .env  # Edit .env with credentials
  ```

- Build with Docker

  ```bash
  docker build -t katalog .
  ```
Local execution

```bash
source venv/bin/activate
python src/index.py
```

Docker execution

```bash
docker run --rm --env-file .env -v "$(pwd)/output":/app/output katalog

###
# Alternatively, you could just run ./scripts/build.sh
###
```

To run the dashboard locally:

```bash
cd client
npm run dev
```

Visit http://localhost:3000 to view the dashboard.
The pipeline works as follows:

- Scheduled Trigger: A cron job triggers the scraper
- Session Verification: Validates that the Goodreads cookie is still valid
- Data Extraction:
  - Feed activity scraped via Playwright (JavaScript-rendered content)
  - Books data scraped via the requests library (static HTML)
  - Reading challenge fetched from the Goodreads API
- Data Validation: All data is validated against Pydantic schemas
- Health Checks: Ensures data was actually scraped (not empty due to selector changes)
- Database Sync (see the sketch after this list):
  - Books: Upserted (updates existing, inserts new)
  - Feed: Only new items inserted (based on a high-water mark)
  - Challenge: Upserted with latest progress
- Metadata Update: Updates the `last_refreshed` and `next_scrape` timestamps
- Dashboard: Reads from Supabase at build time and renders visualizations
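The feed sync is the interesting part: only items newer than the stored high-water mark get inserted. Here's a minimal sketch of how that could look with supabase-py and Pydantic v2. The model fields, column names, and helper name are illustrative assumptions, not the project's actual schema:

```python
from datetime import datetime

from pydantic import BaseModel, ValidationError
from supabase import create_client  # noqa: F401  (client created elsewhere)

# Illustrative only: field names and columns are assumptions.
class FeedItem(BaseModel):
    item_id: str
    actor: str
    action: str
    timestamp: datetime

def sync_feed(supabase, scraped: list[dict]) -> int:
    # High-water mark: the newest timestamp already stored in `feed`.
    rows = (supabase.table("feed")
            .select("timestamp")
            .order("timestamp", desc=True)
            .limit(1)
            .execute().data)
    high_water = rows[0]["timestamp"] if rows else None

    inserted = 0
    for raw in scraped:
        try:
            item = FeedItem(**raw)  # Pydantic validation
        except ValidationError:
            continue  # partial failure: skip the bad item, keep going
        # Assumes timestamps are stored as ISO-8601 strings, so string
        # comparison matches chronological order.
        if high_water is None or item.timestamp.isoformat() > high_water:
            supabase.table("feed").insert(item.model_dump(mode="json")).execute()
            inserted += 1
    return inserted
```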
Logging behavior depends on the environment.

In development:

- Logs written to `kata.log` in the project root
- Console output also displayed
- Debug-level verbosity

In production:

- Logs sent to stderr (captured by Render in my case)
- Sentry integration for error tracking
- Info-level verbosity
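A sketch of what that environment switch could look like; the handler details and sentry_sdk usage here are assumptions rather than the project's actual setup:

```python
import logging
import os
import sys

# Hypothetical sketch of the environment-based logging described above.
def configure_logging() -> None:
    if os.getenv("ENVIRONMENT") == "production":
        import sentry_sdk
        sentry_sdk.init(dsn=os.environ["SENTRY_DSN"])  # error tracking
        handlers = [logging.StreamHandler(sys.stderr)]  # Render captures stderr
        level = logging.INFO
    else:
        handlers = [
            logging.FileHandler("kata.log"),    # log file in the project root
            logging.StreamHandler(sys.stdout),  # console output too
        ]
        level = logging.DEBUG
    logging.basicConfig(
        level=level,
        handlers=handlers,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
```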
The scraper includes robust error handling:
- Session Validation: Fails fast if the Goodreads cookie is invalid
- Health Checks: Exits with an error if no books are scraped (indicates selector breakage)
- Partial Failures: Continues scraping even if individual items fail validation
- Retry Logic: Uses exponential backoff for network requests (sketched below)
- Graceful Degradation: An empty feed is logged as a warning, not a fatal error
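For reference, a minimal version of exponential backoff over plain requests. The retry parameters and helper name are assumptions; the real scraper's implementation may differ:

```python
import logging
import random
import time

import requests

# Sketch of exponential backoff with jitter for flaky network requests.
def fetch_with_backoff(url: str, headers: dict, max_attempts: int = 5) -> requests.Response:
    for attempt in range(max_attempts):
        try:
            resp = requests.get(url, headers=headers, timeout=30)
            resp.raise_for_status()
            return resp
        except requests.RequestException as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            delay = (2 ** attempt) + random.uniform(0, 1)  # 1s, 2s, 4s, ... plus jitter
            logging.warning("request failed (%s), retrying in %.1fs", exc, delay)
            time.sleep(delay)
    raise RuntimeError("unreachable")
```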
I chose to put everything on Render.

For the scraper:

- Create a new cron job and select Docker as the source
- Set the environment variables as required
- Schedule the job to run every 3 days (e.g., `0 0 */3 * *`)

For the dashboard:

- Create a new static site and choose `client` as the base folder
- For the build command, use `npm ci && npm run build`
- The output directory should be `dist`
Note
This scraper is built for my personal use only. Please get in touch if you think it could be useful for you as well.
