piotrekbednus/compliants_wiki_WebExtractor

WebExtractor

WebExtractor is a lightweight Python tool for extracting email addresses from websites. It works well for simple lead collection and business contact discovery when you already have a list of company pages to check.

Features

  • Extracts email addresses from a target page
  • Saves the extracted email addresses for further analysis
  • Clean and organized output
  • Works on Linux, Termux, and macOS
  • Simple CLI interface
  • Lightweight and fast
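This README does not show how extraction is implemented. Purely as an illustration, a regular-expression pass over page text might look like the sketch below. The pattern and the `extract_emails` helper are assumptions made for this example, not WebExtractor's actual code.

```python
import re

# Simplified pattern: covers common addresses, not the full RFC 5322 grammar.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text: str) -> list:
    """Return unique, lower-cased addresses in first-seen order (hypothetical helper)."""
    seen = set()
    result = []
    for match in EMAIL_RE.findall(text):
        email = match.lower()
        if email not in seen:
            seen.add(email)
            result.append(email)
    return result


if __name__ == "__main__":
    html = '<a href="mailto:Info@example.com">info@example.com</a> sales@example.org'
    print(extract_emails(html))
```

Lower-casing before de-duplication keeps `Info@example.com` and `info@example.com` from being reported twice.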

Compatibility

  • Linux (Debian, RHEL, Arch, etc.)
  • Termux (Android)
  • macOS

On Linux and Termux, the installer (install.py) detects the environment and installs the tool accordingly. On macOS, run the script directly; install.py is not needed.
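How the environment is detected is not described here. One plausible approach, shown only as a sketch, is to check `platform.system()` and the `PREFIX` variable that Termux sets. The `detect_environment` function and its return values are hypothetical and may differ from what install.py does.

```python
import os
import platform


def detect_environment(system=None, environ=None):
    """Classify the host as 'termux', 'linux', 'macos', or 'unknown'."""
    system = platform.system() if system is None else system
    environ = os.environ if environ is None else environ
    # Assumption: Termux is recognised by PREFIX pointing into its app directory.
    if "com.termux" in environ.get("PREFIX", ""):
        return "termux"
    if system == "Linux":
        return "linux"
    if system == "Darwin":
        return "macos"
    return "unknown"


if __name__ == "__main__":
    print(detect_environment())
```

The Termux check runs first because Termux also reports `Linux` as its system name.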

Disclaimer

This tool is intended for educational and ethical OSINT purposes only. Use it only on websites you own or have explicit permission to analyze. The developer is not responsible for any misuse of this tool.

Installation

Step 1: Clone the Repository

git clone https://github.com/s-r-e-e-r-a-j/WebExtractor.git

Step 2: Navigate to the WebExtractor directory

cd WebExtractor

Step 3: Install Dependencies

pip3 install -r requirements.txt

Note for Kali, Parrot, Ubuntu 23.04+ users:

If you see an error like:

error: externally-managed-environment

then use:

pip3 install -r requirements.txt --break-system-packages

Step 4: Run the Installer (Linux or Termux)

python3 install.py

When prompted, type y to install.

Step 5: Run the Tool

webextractor

macOS Quick Start

On macOS, the simplest way is to run the script directly inside a virtual environment:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python3 webextractor.py

You do not need to use install.py on macOS.

Usage

Run the tool:

webextractor

  1. Enter a valid URL when prompted.
  2. The tool displays the email addresses it finds.
  3. Optionally, save the extracted data to a folder.

Batch Mode and CSV Output

You can also run the scraper on a list of URLs and save everything to one CSV file.

Create a text file with one URL per line, for example urls.txt:

https://example.com
https://openai.com

Then run:

python3 webextractor.py --urls-file urls.txt --output-csv results.csv
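For readers curious how a one-URL-per-line file might be read, here is a minimal sketch. The helper names are hypothetical, and skipping blank lines and `#` comments is an assumed convention that the tool may not follow.

```python
from pathlib import Path


def parse_url_lines(text):
    """Return the non-empty, non-comment lines of text, stripped of whitespace."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and lines starting with '#' are ignored.
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def load_urls(path):
    """Read a URL list such as urls.txt from disk."""
    return parse_url_lines(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    print(parse_url_lines("https://example.com\n\nhttps://openai.com\n"))
```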

Examples:

python3 webextractor.py --url https://example.com --output-csv result.csv
python3 webextractor.py --urls-file urls.txt --output-csv emails.csv
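The flags used in this README could be declared with `argparse` roughly as follows. This is a sketch, not the tool's real parser: the flag names come from the commands in this document, while the mutually exclusive grouping, the help texts, and the default of one worker are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the flags that appear in this README (illustrative only)."""
    parser = argparse.ArgumentParser(prog="webextractor.py")
    # Assumption: only one input source is accepted per run.
    source = parser.add_mutually_exclusive_group()
    source.add_argument("--url", help="single page to scan")
    source.add_argument("--urls-file", help="text file with one URL per line")
    source.add_argument("--input-csv", help="CSV file with company and url columns")
    parser.add_argument("--output-csv", help="path of the CSV file to write")
    # Assumption: the default worker count is 1.
    parser.add_argument("--workers", type=int, default=1, help="number of parallel workers")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--url", "https://example.com", "--output-csv", "result.csv"])
    print(args.url, args.output_csv, args.workers)
```

The input group is left optional because the tool also runs interactively with no arguments.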

You can also use an input CSV file with company and url columns, for example input.csv:

company,url
Acme Vets,https://example.com
North Clinic,https://example.org

Then run:

python3 webextractor.py --input-csv input.csv --output-csv outreach_emails.csv --workers 2
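The `--workers` option suggests that pages are processed in parallel. A thread pool is one common way to do that, so the sketch below assumes that approach. `read_companies`, `process_all`, and the stand-in scanner are hypothetical names, and the real tool may work differently.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def read_companies(csv_text):
    """Parse CSV text with 'company' and 'url' columns into a list of dicts."""
    return [
        {"company": row["company"].strip(), "url": row["url"].strip()}
        for row in csv.DictReader(io.StringIO(csv_text))
    ]


def process_all(rows, scan, workers=2):
    """Call scan(url) for every row on a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        emails_per_row = list(pool.map(scan, [row["url"] for row in rows]))
    return [dict(row, emails=emails) for row, emails in zip(rows, emails_per_row)]


if __name__ == "__main__":
    sample = "company,url\nAcme Vets,https://example.com\nNorth Clinic,https://example.org\n"

    def fake_scan(url):
        # Stand-in scanner so the sketch runs offline; a real one would fetch the page.
        return ["info@" + url.split("//")[1]]

    for result in process_all(read_companies(sample), fake_scan):
        print(result)
```

Threads suit this kind of work because most of the time is spent waiting on network responses rather than computing.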

The output CSV will contain:

company,url,status,email_1,email_2,email_3,error
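To make the column layout concrete, the sketch below flattens one result into those seven columns with `csv.DictWriter`. The column names are taken from the header above. The `ok` and `error` status values, the three-address cap, and the helper names are assumptions for illustration.

```python
import csv
import io

FIELDS = ["company", "url", "status", "email_1", "email_2", "email_3", "error"]


def to_row(company, url, emails, error=""):
    """Flatten one result into the seven output columns."""
    # Assumption: status is "ok" or "error", and only the first three emails are kept.
    row = {"company": company, "url": url, "status": "error" if error else "ok", "error": error}
    for i in range(3):
        row[f"email_{i + 1}"] = emails[i] if i < len(emails) else ""
    return row


def write_csv(rows):
    """Return CSV text for the given rows, header line first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(write_csv([to_row("Acme Vets", "https://example.com", ["a@example.com"])]))
```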

Uninstallation

Run the install.py script again:

python3 install.py

When prompted, type n to uninstall.

License

This project is licensed under the MIT License.
