A fast CLI + Python tool to scrape Goodreads book details — title, author, rating, ratings/reviews counts, description, ISBN, page count, format, cover image — to CSV or JSON. No API key, no JS rendering.
- Clean book metadata — title, author(s), average rating, ratings count, reviews count, description, ISBN, number of pages, format, publisher, language, and cover image URL
- Two structured sources, no browser — reads Goodreads' own schema.org JSON-LD and Next.js `__NEXT_DATA__` blobs that ship in the page HTML, so there's no Selenium/Playwright and no fragile CSS selectors
- Takes an id or a URL — pass `3735293` or `https://www.goodreads.com/book/show/3735293-clean-code`
- Multiple books at once — list several ids/URLs on one command
- CSV / JSON / JSONL output — ready for Excel, pandas, or a database
- Python API — use it as a library
- Minimal dependencies — just `requests` + `beautifulsoup4`
pip install goodreads-scraper

Requires Python 3.10+.
# Scrape one book by id (prints JSON)
goodreads 3735293
# By full URL, to CSV
goodreads "https://www.goodreads.com/book/show/3735293-clean-code" -f csv -o clean-code.csv
# Several books at once, to JSONL
goodreads 3735293 5907 11870085 -f jsonl -o books.jsonl

Example JSON record (`goodreads 3735293`):
{
"title": "Clean Code: A Handbook of Agile Software Craftsmanship",
"author": "Robert C. Martin",
"authors": ["Robert C. Martin"],
"rating": 4.19,
"ratings_count": 28173,
"reviews_count": 1502,
"description": "Even good software developers leave a trail of...",
"isbn": "9780132350884",
"num_pages": 464,
"format": "Paperback",
"publisher": "Prentice Hall",
"language": "English",
"cover_url": "https://i.gr-assets.com/images/S/.../3735293.jpg",
"book_id": "3735293",
"url": "https://www.goodreads.com/book/show/3735293"
}

goodreads [OPTIONS] BOOK [BOOK ...]
| Argument / Flag | Default | Description |
|---|---|---|
| `BOOK` | — | One or more Goodreads book ids (`3735293`) or book-page URLs |
| `--format`, `-f` | `json` | Output format: `csv`, `json`, `jsonl` |
| `--output`, `-o FILE` | stdout | Write to file |
| `--count` | off | Print only how many books were scraped |
A book that fails to scrape logs an error to stderr and is skipped; the rest still come through.
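The same skip-on-failure behaviour can be reproduced when calling the library from Python. The following is a minimal sketch, not the CLI's actual code: it assumes a failed scrape surfaces as an exception (this README does not say which type), and `scrape_all` and `scrape_fn` are hypothetical names introduced for illustration.

```python
import sys
from typing import Callable, Iterable


def scrape_all(books: Iterable, scrape_fn: Callable[[object], dict]) -> list[dict]:
    """Scrape each book with scrape_fn, skipping any that fail (hypothetical helper)."""
    records = []
    for book in books:
        try:
            records.append(scrape_fn(book))
        except Exception as exc:  # assumption: failures are raised as exceptions
            # Report the failure on stderr and carry on with the remaining books.
            print(f"error: could not scrape {book}: {exc}", file=sys.stderr)
    return records
```

Passing the scraping function in as an argument keeps the helper testable offline; in real use it would be called as `scrape_all([3735293, 5907], scrape)` after `from goodreads_scraper import scrape`.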
from goodreads_scraper import scrape, parse_book_id, parse_html
# Scrape by id or URL
book = scrape(3735293)
print(book["title"], book["rating"], book["ratings_count"])
book = scrape("https://www.goodreads.com/book/show/3735293-clean-code")
# Resolve an id from a URL without hitting the network
parse_book_id("https://www.goodreads.com/book/show/3735293-clean-code") # -> "3735293"
# Parse a page you already have in hand (no network)
with open("book.html", encoding="utf-8") as f:
    record = parse_html(f.read())

Goodreads renders each book detail page server-side and embeds two machine-readable data sources directly in the HTML — neither needs JavaScript:
- JSON-LD (`<script type="application/ld+json">`, a schema.org `Book`) — the most stable source for title, author, average rating, and ratings/reviews counts. Used first.
- `__NEXT_DATA__` (`<script id="__NEXT_DATA__">`, a Next.js/Apollo cache) — fills in the description (HTML stripped), ISBN-13, page count, format, publisher, and cover image.
The scraper fetches https://www.goodreads.com/book/show/{book_id} with a full
Chrome request fingerprint, merges both sources, and returns clean fields.
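To make the two-source idea concrete, here is a rough sketch of pulling both blobs out of a saved page. It is an illustration, not the package's implementation: it uses only the standard library instead of `beautifulsoup4`, the names `ScriptCollector`, `extract_sources`, and `fields_from_json_ld` are hypothetical, and the JSON-LD keys are the generic schema.org `Book` properties, which Goodreads may or may not populate in exactly this shape. The internal layout of `__NEXT_DATA__` is not described here, so the sketch returns it unparsed beyond JSON decoding.

```python
import json
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the raw text of the first JSON-LD and __NEXT_DATA__ script tags."""

    def __init__(self):
        super().__init__()
        self.blobs = {}       # "json_ld" / "next_data" -> raw JSON text
        self._current = None  # key of the script tag currently being read

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        attrs = dict(attrs)
        key = None
        if attrs.get("type") == "application/ld+json":
            key = "json_ld"
        elif attrs.get("id") == "__NEXT_DATA__":
            key = "next_data"
        # Keep only the first occurrence of each source.
        if key and key not in self.blobs:
            self._current = key
            self.blobs[key] = ""

    def handle_data(self, data):
        if self._current:
            self.blobs[self._current] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._current = None


def extract_sources(html: str) -> tuple[dict, dict]:
    """Return (json_ld, next_data) as dicts; a missing source becomes {}."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    json_ld = json.loads(collector.blobs.get("json_ld") or "{}")
    next_data = json.loads(collector.blobs.get("next_data") or "{}")
    return json_ld, next_data


def fields_from_json_ld(json_ld: dict) -> dict:
    """Map schema.org Book properties to flat field names (assumed mapping)."""
    rating = json_ld.get("aggregateRating") or {}
    author = json_ld.get("author") or []
    if isinstance(author, dict):  # schema.org allows a single object or a list
        author = [author]
    return {
        "title": json_ld.get("name"),
        "authors": [a.get("name") for a in author],
        "rating": rating.get("ratingValue"),
        "ratings_count": rating.get("ratingCount"),
        "reviews_count": rating.get("reviewCount"),
    }
```

A merge step would then overlay fields found in `__NEXT_DATA__` onto this dictionary, leaving any field that neither source provides as `None`.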
This is an honest, lightweight tool. Read this before relying on it:
- Detail pages only. It scrapes a book page when you already know its id or URL. It does not search or discover books — Goodreads' search and listing endpoints sit behind AWS WAF and reject scripted requests, so you need to get book ids from elsewhere (a browser, your existing library export, the Goodreads URL bar, etc.).
- No official API. Goodreads shut down its public API in late 2020, so this reads the public page HTML instead.
- Page-structure dependent. If Goodreads changes its JSON-LD or `__NEXT_DATA__` layout, parsing may need an update. Some older/sparse book pages omit fields (e.g. publisher or cover); those come back as `null`.
- Be polite. Throttle your requests; don't hammer the site.
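One simple way to throttle from Python is to pause between consecutive calls. This is a minimal sketch under stated assumptions: `throttled` is a hypothetical helper, not part of the package, and the 2-second default is an arbitrary choice rather than a documented safe rate.

```python
import time
from typing import Callable, Iterable, Iterator


def throttled(items: Iterable, fn: Callable, delay_seconds: float = 2.0) -> Iterator:
    """Yield fn(item) for each item, sleeping between consecutive calls."""
    for index, item in enumerate(items):
        if index:  # no pause before the first call
            time.sleep(delay_seconds)
        yield fn(item)
```

With the library installed, usage would look like `for book in throttled([3735293, 5907], scrape): ...`, where `scrape` comes from `goodreads_scraper`.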
💡 Don't want to write code or hunt for book IDs? Thunderbit is an AI web scraper Chrome extension that scrapes Goodreads (and any site) in 2 clicks, no code.
git clone https://github.com/thunderbit-operations/goodreads-scraper.git
cd goodreads-scraper
pip install -e ".[dev]"
pytest

Tests run fully offline against a saved book-page fixture.
- redfin-scraper — Scrape Redfin real-estate listings
- ebay-scraper — Scrape eBay product listings
- sitejabber-scraper — Scrape Sitejabber business reviews
- craigslist-scraper — Scrape Craigslist listings
Scrape responsibly and at a polite rate. Only collect publicly available data, and review Goodreads' Terms of Service and your local regulations before use.
MIT — Built by Thunderbit, AI-powered web scraper & data extraction tools.