Web Data Collection Tools

Open-source CLI tools for collecting data from web pages — by Thunderbit

Tools

Tool	Description	CLI Command
email-harvest	Extract email addresses from text, files, and web pages	`emx`
phone-harvest	Extract and identify phone numbers from text, files, and web pages	`phonex`
image-harvest	Discover and batch download images from web pages	`imgx`
product-harvest	Extract structured product data from Schema.org markup	`productx`
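
To illustrate what Schema.org-based product extraction can look like, here is a minimal sketch using only the Python standard library. It assumes the page embeds its product data as JSON-LD inside `<script type="application/ld+json">` tags, which is one common way to publish Schema.org markup. It is not the implementation of `product-harvest`; the names `JsonLdParser`, `extract_products`, and `SAMPLE` are hypothetical, and nested structures such as `@graph` are deliberately not handled.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_products(html):
    """Return every top-level JSON-LD object whose @type is or includes "Product"."""
    parser = JsonLdParser()
    parser.feed(html)
    products = []
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole page
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict):
                continue
            types = item.get("@type")
            types = types if isinstance(types, list) else [types]
            if "Product" in types:
                products.append(item)
    return products


# Hypothetical page content used only for demonstration.
SAMPLE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Example Widget",
 "offers": {"@type": "Offer", "price": "19.99", "priceCurrency": "USD"}}
</script>
</head><body></body></html>
"""

if __name__ == "__main__":
    for product in extract_products(SAMPLE):
        print(product["name"], product["offers"]["price"])
```

Parsing the embedded JSON rather than scraping visible HTML keeps the sketch independent of page layout, at the cost of returning nothing for pages that publish no structured data.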

Common Features

Zero config — works out of the box, no API keys needed
CLI-first — pipe-friendly commands with Python API support
Recursive crawling — follow links within a domain up to N levels deep
Multiple output formats — plain text, CSV, JSON, JSONL
Polite crawling — rate limiting and robots.txt compliance built-in
Proxy support — route requests through HTTP/HTTPS proxies
Minimal dependencies — Python 3.8+, mostly just requests + beautifulsoup4
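
The recursive and polite crawling features listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the tools' own code: `crawl`, `LinkParser`, the `fetch` callback, and the in-memory `PAGES` dictionary are hypothetical names, the rate limit is a plain fixed delay, and the robots.txt rules are parsed from a list of lines so the example runs without network access.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.robotparser import RobotFileParser


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch, robots, max_depth=1, delay=1.0, user_agent="*"):
    """Breadth-first crawl limited to start_url's host and max_depth link hops.

    fetch(url) must return the page HTML as a string (caller-supplied hook).
    robots is a RobotFileParser already loaded with the site's rules.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        if not robots.can_fetch(user_agent, url):
            continue  # disallowed by robots.txt
        if visited:
            time.sleep(delay)  # fixed pause between consecutive requests
        html = fetch(url)
        visited.append(url)
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urldefrag(urljoin(url, href))[0]  # absolute URL, no #fragment
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Hypothetical in-memory site standing in for real HTTP responses.
PAGES = {
    "https://example.com/": '<a href="/a">A</a> <a href="/private/x">P</a> '
                            '<a href="https://other.example/">O</a>',
    "https://example.com/a": '<a href="/b">B</a>',
    "https://example.com/b": "",
}

if __name__ == "__main__":
    robots = RobotFileParser()
    robots.parse(["User-agent: *", "Disallow: /private/"])
    print(crawl("https://example.com/", PAGES.get, robots, max_depth=1, delay=0))
```

Passing `fetch` in as a parameter keeps the traversal logic separate from the HTTP client, so the same loop could be driven by `requests` (optionally with a proxy configured) without changing the crawl itself.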

Quick Start

# Install any tool
pip install email-harvest
pip install phone-harvest
pip install image-harvest
pip install product-harvest

# Extract emails from a web page
emx https://example.com/contact

# Extract phone numbers
phonex https://example.com/contact

# Download all images
imgx https://example.com

# Extract product data
productx https://example.com/product
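
For readers curious about what happens after a page is fetched, the following is a rough sketch of the extraction and output step for email addresses. It is not how `emx` is implemented: the regular expression is intentionally simple, and `extract_emails`, `to_jsonl`, and `to_csv` are hypothetical helper names chosen for this example.

```python
import csv
import io
import json
import re

# Deliberately simple pattern; valid address syntax is broader than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return unique addresses in first-seen order, lowercased."""
    seen = {}
    for match in EMAIL_RE.findall(text):
        seen.setdefault(match.lower(), None)
    return list(seen)


def to_jsonl(emails):
    """One JSON object per line, convenient for piping into other tools."""
    return "\n".join(json.dumps({"email": e}) for e in emails)


def to_csv(emails):
    """CSV with a single 'email' column and a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["email"])
    writer.writerows([e] for e in emails)
    return buf.getvalue()


if __name__ == "__main__":
    text = "Contact sales@example.com or Support@Example.com, again sales@example.com."
    emails = extract_emails(text)
    print(to_jsonl(emails))
    print(to_csv(emails), end="")
```

Lowercasing before deduplication treats `Support@Example.com` and `support@example.com` as the same address, which is a simplification that suits contact lists but may not suit every use.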

License

MIT

Built by Thunderbit — AI-powered web scraper and data extraction tools.

