Goodreads Scraper

A fast CLI + Python tool to scrape Goodreads book details — title, author, rating, ratings/reviews counts, description, ISBN, page count, format, cover image — to CSV or JSON. No API key, no JS rendering.

Features

Clean book metadata — title, author(s), average rating, ratings count, reviews count, description, ISBN, number of pages, format, publisher, language, and cover image URL
Two structured sources, no browser — reads Goodreads' own schema.org JSON-LD and Next.js __NEXT_DATA__ blobs that ship in the page HTML, so there's no Selenium/Playwright and no fragile CSS selectors
Takes an id or a URL — pass 3735293 or https://www.goodreads.com/book/show/3735293-clean-code
Multiple books at once — list several ids/URLs on one command
CSV / JSON / JSONL output — ready for Excel, pandas, or a database
Python API — use it as a library
Minimal dependencies — just requests + beautifulsoup4

Installation

pip install goodreads-scraper

Requires Python 3.10+.

Quick Start

# Scrape one book by id (prints JSON)
goodreads 3735293

# By full URL, to CSV
goodreads "https://www.goodreads.com/book/show/3735293-clean-code" -f csv -o clean-code.csv

# Several books at once, to JSONL
goodreads 3735293 5907 11870085 -f jsonl -o books.jsonl

Example JSON record (goodreads 3735293):

{
  "title": "Clean Code: A Handbook of Agile Software Craftsmanship",
  "author": "Robert C. Martin",
  "authors": ["Robert C. Martin"],
  "rating": 4.19,
  "ratings_count": 28173,
  "reviews_count": 1502,
  "description": "Even good software developers leave a trail of...",
  "isbn": "9780132350884",
  "num_pages": 464,
  "format": "Paperback",
  "publisher": "Prentice Hall",
  "language": "English",
  "cover_url": "https://i.gr-assets.com/images/S/.../3735293.jpg",
  "book_id": "3735293",
  "url": "https://www.goodreads.com/book/show/3735293"
}
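
A JSONL file such as the books.jsonl written above holds one record per line, so it can be read back with the standard library alone. The sketch below assumes the field names from the example record; `load_jsonl` and `top_rated` are hypothetical helpers, not part of the package.

```python
import json


def load_jsonl(path):
    """Read one JSON record per non-empty line (hypothetical helper)."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def top_rated(records, minimum=4.0):
    """Keep records whose rating is present (not null) and at least `minimum`."""
    return [r for r in records if (r.get("rating") or 0) >= minimum]
```

Treating a missing or null rating as 0 is a design choice for this sketch, since sparse book pages can return null fields.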

CLI Reference

goodreads [OPTIONS] BOOK [BOOK ...]

Argument / Flag	Default	Description
`BOOK`	—	One or more Goodreads book ids (`3735293`) or book-page URLs
`--format, -f`	`json`	Output format: `csv`, `json`, `jsonl`
`--output, -o FILE`	stdout	Write to file
`--count`	off	Print only how many books were scraped

A book that fails to scrape logs an error to stderr and is skipped; the rest still come through.
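
The same skip-on-failure behaviour can be reproduced when calling the library in a loop. This is a minimal sketch, not the CLI's actual code: `scrape_all` is a hypothetical helper, the fetch function is passed in (in practice `goodreads_scraper.scrape`), and catching a broad `Exception` is an assumption because the exception types the package raises are not documented here.

```python
import sys


def scrape_all(books, fetch):
    """Call fetch(book) for each item; log failures to stderr and continue."""
    records = []
    for book in books:
        try:
            records.append(fetch(book))
        except Exception as exc:  # assumption: exact exception types are unknown
            print(f"error: {book}: {exc}", file=sys.stderr)
    return records
```

Usage would look like `records = scrape_all([3735293, 5907], scrape)` after `from goodreads_scraper import scrape`.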

Python API

from goodreads_scraper import scrape, parse_book_id, parse_html

# Scrape by id or URL
book = scrape(3735293)
print(book["title"], book["rating"], book["ratings_count"])

book = scrape("https://www.goodreads.com/book/show/3735293-clean-code")

# Resolve an id from a URL without hitting the network
parse_book_id("https://www.goodreads.com/book/show/3735293-clean-code")  # -> "3735293"

# Parse a page you already have in hand (no network)
with open("book.html", encoding="utf-8") as f:
    record = parse_html(f.read())
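
For illustration, resolving an id from a book URL can be done with one regular expression. The function below is a hypothetical stand-in, not the source of `parse_book_id`; it assumes the id is the run of digits right after `/book/show/`.

```python
import re

# Assumption: book-page URLs look like .../book/show/<digits>[-slug]
_BOOK_ID = re.compile(r"/book/show/(\d+)")


def extract_book_id(value):
    """Return the numeric book id as a string, given an id or a book-page URL."""
    text = str(value).strip()
    if text.isdigit():
        return text
    match = _BOOK_ID.search(text)
    if match is None:
        raise ValueError(f"not a Goodreads book id or URL: {value!r}")
    return match.group(1)
```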

How it works

Goodreads renders each book detail page server-side and embeds two machine-readable data sources directly in the HTML — neither needs JavaScript:

JSON-LD (<script type="application/ld+json">, a schema.org Book) — the most stable source for title, author, average rating, and ratings/reviews counts. Used first.
__NEXT_DATA__ (<script id="__NEXT_DATA__">, a Next.js/Apollo cache) — fills in the description (HTML stripped), ISBN-13, page count, format, publisher, and cover image.

The scraper fetches https://www.goodreads.com/book/show/{book_id} with a full Chrome request fingerprint, merges both sources, and returns clean fields.
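
The two embedded blobs can be pulled out of raw HTML in a few lines. The sketch below uses only the standard library's `html.parser` for illustration; the package itself depends on beautifulsoup4, and `EmbeddedJSON` and `extract_embedded_json` are hypothetical names, not its API.

```python
import json
from html.parser import HTMLParser


class EmbeddedJSON(HTMLParser):
    """Collect the JSON-LD and __NEXT_DATA__ script bodies from a page."""

    def __init__(self):
        super().__init__()
        self.json_ld = []      # parsed application/ld+json blocks
        self.next_data = None  # parsed __NEXT_DATA__ payload, if present
        self._kind = None      # which script of interest we are inside
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        attrs = dict(attrs)
        if attrs.get("type") == "application/ld+json":
            self._kind = "ld"
        elif attrs.get("id") == "__NEXT_DATA__":
            self._kind = "next"
        self._buf = []

    def handle_data(self, data):
        if self._kind:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != "script" or not self._kind:
            return
        payload = json.loads("".join(self._buf))
        if self._kind == "ld":
            self.json_ld.append(payload)
        else:
            self.next_data = payload
        self._kind = None


def extract_embedded_json(html):
    """Return (list of JSON-LD objects, __NEXT_DATA__ object or None)."""
    parser = EmbeddedJSON()
    parser.feed(html)
    parser.close()
    return parser.json_ld, parser.next_data
```

Where each field lives inside the returned objects depends on Goodreads' current page layout, so the sketch stops at extraction and does not attempt the merge.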

Limitations

This is a lightweight tool with real limits. Read them before relying on it:

Detail pages only. It scrapes a book page when you already know its id or URL. It does not search or discover books — Goodreads' search and listing endpoints sit behind AWS WAF and reject scripted requests, so you need to get book ids from elsewhere (a browser, your existing library export, the Goodreads URL bar, etc.).
No official API. Goodreads shut down its public API in late 2020, so this reads the public page HTML instead.
Page-structure dependent. If Goodreads changes its JSON-LD or __NEXT_DATA__ layout, parsing may need an update. Some older/sparse book pages omit fields (e.g. publisher or cover); those come back as null.
Be polite. Throttle your requests; don't hammer the site.
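
One way to throttle is to enforce a minimum gap between consecutive requests. A minimal sketch: `Throttle` is a hypothetical helper, and the 2-second default is an arbitrary assumption, not a limit published by Goodreads.

```python
import time


class Throttle:
    """Block so successive wait() calls are at least min_interval seconds apart."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock   # injectable for testing
        self._sleep = sleep   # injectable for testing
        self._last = None     # time of the previous wait(), if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Call `throttle.wait()` immediately before each `scrape(...)` call; the first call returns at once and later calls sleep only for the time still owed.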

💡 Don't want to write code or hunt for book IDs? Thunderbit is an AI web scraper Chrome extension that scrapes Goodreads (and any site) in 2 clicks, no code.

Development

git clone https://github.com/thunderbit-operations/goodreads-scraper.git
cd goodreads-scraper
pip install -e ".[dev]"
pytest

Tests run fully offline against a saved book-page fixture.

Related tools

redfin-scraper — Scrape Redfin real-estate listings
ebay-scraper — Scrape eBay product listings
sitejabber-scraper — Scrape Sitejabber business reviews
craigslist-scraper — Scrape Craigslist listings

Legal

Scrape responsibly and at a polite rate. Only collect publicly available data, and review Goodreads' Terms of Service and your local regulations before use.

License

MIT — Built by Thunderbit, AI-powered web scraper & data extraction tools.
