GitHub - piotrekbednus/compliants_wiki_WebExtractor

WebExtractor

WebExtractor is a lightweight Python tool for extracting email addresses from websites. It works well for simple lead collection and business contact discovery when you already have a list of company pages to check.

Features

Extracts email addresses from a target page
Saves the extracted information for further analysis
Clean and organized output
Works on Linux, Termux, and macOS
Simple CLI interface
Lightweight and fast
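
To make the extraction step concrete, here is a minimal sketch of pulling email addresses out of page text with a regular expression. It is not taken from webextractor.py: the `EMAIL_RE` pattern and the `extract_emails` helper are assumptions made for illustration, and the real tool may match, normalize, or deduplicate addresses differently.

```python
import re

# Deliberately simple pattern, assumed for illustration only.
# It is not the regex used by webextractor.py.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html: str) -> list[str]:
    """Return unique, lowercased email addresses in order of first appearance."""
    seen = set()
    result = []
    for match in EMAIL_RE.findall(html):
        email = match.lower()
        if email not in seen:
            seen.add(email)
            result.append(email)
    return result


sample = '<a href="mailto:Info@example.com">info@example.com</a> or sales@example.org.'
print(extract_emails(sample))
```

Lowercasing before deduplication means `Info@example.com` and `info@example.com` count as one address, which is usually what you want for a contact list.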

Compatibility

Linux (Debian, RHEL, Arch, etc.)
Termux (Android)
macOS

The installer (install.py) automatically detects the environment and installs the tool accordingly.
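
As a rough idea of what environment detection can look like, the sketch below tells the three supported platforms apart. The `detect_environment` function is hypothetical and is not copied from install.py. It relies on two facts: `platform.system()` returns `Linux` or `Darwin`, and Termux sets the `PREFIX` environment variable to a path under `com.termux`.

```python
import os
import platform


def detect_environment(system=None, env=None):
    """Classify the host as 'termux', 'linux', 'macos', or 'unsupported'.

    Hypothetical helper for illustration; install.py may use other checks.
    """
    system = platform.system() if system is None else system
    env = os.environ if env is None else env
    # Termux reports "Linux" too, so check its PREFIX variable first.
    if "com.termux" in env.get("PREFIX", ""):
        return "termux"
    if system == "Linux":
        return "linux"
    if system == "Darwin":
        return "macos"
    return "unsupported"


print(detect_environment())
```

The optional `system` and `env` parameters exist only so the function can be exercised without running on each platform.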

Disclaimer

This tool is intended for educational and ethical OSINT purposes only. Use it only on websites you own or have explicit permission to analyze. The developer is not responsible for any misuse of this tool.

Installation

Step 1: Clone the Repository

git clone https://github.com/s-r-e-e-r-a-j/WebExtractor.git

Step 2: Navigate to the WebExtractor Directory

cd WebExtractor

Step 3: Install Dependencies

pip3 install -r requirements.txt

Note for Kali, Parrot, Ubuntu 23.04+ users:

If you see an error like:

error: externally-managed-environment

then use:

pip3 install -r requirements.txt --break-system-packages

Step 4: Run Installer (Linux or Termux)

python3 install.py

When prompted, type y to install.

Step 5: Run the Tool

webextractor

macOS Quick Start

On macOS, the simplest way is to run the script directly inside a virtual environment:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python3 webextractor.py

You do not need to use install.py on macOS.

Usage

Just run the tool:

webextractor

Enter a valid URL when prompted.
The tool displays the email addresses it finds.
You can optionally save the extracted data to a folder.

Batch Mode and CSV Output

You can also run the scraper on a list of URLs and save everything to one CSV file.

Create a text file with one URL per line, for example urls.txt:

https://example.com
https://openai.com

Then run:

python3 webextractor.py --urls-file urls.txt --output-csv results.csv

Examples:

python3 webextractor.py --url https://example.com --output-csv result.csv
python3 webextractor.py --urls-file urls.txt --output-csv emails.csv
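
If you build urls.txt from another source, it can help to clean it first. The sketch below is one possible pre-check, not part of the tool: `parse_url_lines` is a hypothetical helper that keeps only well-formed http(s) URLs and drops blank lines and duplicates. How webextractor.py itself treats such lines is not documented here.

```python
from urllib.parse import urlparse


def parse_url_lines(lines):
    """Return valid http(s) URLs, skipping blanks, duplicates, and malformed lines."""
    urls = []
    for line in lines:
        candidate = line.strip()
        if not candidate:
            continue
        parsed = urlparse(candidate)
        is_web_url = parsed.scheme in ("http", "https") and bool(parsed.netloc)
        if is_web_url and candidate not in urls:
            urls.append(candidate)
    return urls


lines = ["https://example.com\n", "\n", "not a url\n", "https://openai.com\n"]
print(parse_url_lines(lines))
```

To run it against a real file, pass an open file handle instead of the list, for example `parse_url_lines(open("urls.txt", encoding="utf-8"))`.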

You can also use an input CSV file with company and url columns, for example input.csv:

company,url
Acme Vets,https://example.com
North Clinic,https://example.org

Then run:

python3 webextractor.py --input-csv input.csv --output-csv outreach_emails.csv --workers 2
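
The sketch below shows one way an input CSV like this could be read and processed by a small pool of workers. It assumes that `--workers` sets the number of pages checked concurrently, which this README does not state explicitly. `read_companies`, `check_site`, and `process_all` are hypothetical names, and `check_site` is a stand-in that performs no network request.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def read_companies(handle):
    """Read rows that have both a company and a url value."""
    return [
        {"company": row["company"].strip(), "url": row["url"].strip()}
        for row in csv.DictReader(handle)
        if row.get("company") and row.get("url")
    ]


def check_site(row):
    """Hypothetical stand-in for fetching a page and extracting emails."""
    return {**row, "status": "pending"}


def process_all(rows, workers=2):
    # executor.map returns results in the same order as the input rows.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(check_site, rows))


sample_csv = io.StringIO(
    "company,url\nAcme Vets,https://example.com\nNorth Clinic,https://example.org\n"
)
for result in process_all(read_companies(sample_csv), workers=2):
    print(result)
```

A low worker count such as 2 keeps the request rate gentle on the sites being checked.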

The output CSV will contain:

company,url,status,email_1,email_2,email_3,error
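
For readers who post-process the results, the following sketch shows how one result could be mapped onto these columns. Only the column names come from this README. The `build_row` helper, the `ok` and `failed` status values, and the choice to keep the first three addresses are assumptions made for illustration.

```python
import csv
import io

# Column names as documented above.
FIELDS = ["company", "url", "status", "email_1", "email_2", "email_3", "error"]


def build_row(company, url, emails, error=""):
    """Map one result onto the documented columns, keeping at most three emails."""
    row = {"company": company, "url": url, "error": error}
    # "ok" and "failed" are placeholder values, not taken from the tool.
    row["status"] = "failed" if error else "ok"
    # Pad with empty strings so every email column is always present.
    padded = (list(emails) + ["", "", ""])[:3]
    for index, email in enumerate(padded, start=1):
        row[f"email_{index}"] = email
    return row


buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerow(build_row("Acme Vets", "https://example.com", ["info@example.com"]))
print(buffer.getvalue())
```

Fixed `email_1` to `email_3` columns keep the file easy to open in a spreadsheet, at the cost of dropping any addresses beyond the third.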

Uninstallation

Run the install.py script again:

python3 install.py

When prompted, type n to uninstall.

License

This project is licensed under the MIT License.

