Web Scraping Utility

A modern JavaScript utility for fetching, parsing, and exporting data from web pages — built for the khushalkks/web_scraping assignment.

Features

Fetch HTML with the built-in fetch API (Node.js 18+) or node-fetch on older Node versions
Parse HTML with cheerio (jQuery-like selectors) or vanilla DOM methods
Export scraped data to CSV and JSON formats
Configurable URL list and CSS selectors via a simple config.js file
Rate limiting via a configurable delay between requests (rateLimitMs) to reduce the risk of being blocked, plus error handling for failed requests
Modular design — easy to extend with new parsers or output formats
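
The rate limiting and error handling listed above could look roughly like the following sketch. It is not the repository's code: the function name fetchAllWithDelay and the shape of its result objects are assumptions. It requests URLs one at a time, waits rateLimitMs between requests, and records a failure per URL instead of stopping the run. The fetch function is passed in (defaulting to the global fetch of Node.js 18+), so node-fetch or a test stub can be substituted.

```javascript
// Hypothetical helper, not taken from this repository.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Fetch each URL in order, pausing rateLimitMs between requests.
// fetchImpl defaults to the global fetch (Node.js 18+); pass node-fetch
// or a stub on older versions or in tests.
async function fetchAllWithDelay(urls, rateLimitMs, fetchImpl = globalThis.fetch) {
  const results = [];
  for (let i = 0; i < urls.length; i++) {
    // Wait before every request except the first one.
    if (i > 0) await sleep(rateLimitMs);
    const url = urls[i];
    try {
      const res = await fetchImpl(url);
      // fetch only rejects on network failures, so HTTP errors are checked here.
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      results.push({ url, html: await res.text(), error: null });
    } catch (err) {
      // Record the failure and keep going with the remaining URLs.
      results.push({ url, html: null, error: err instanceof Error ? err.message : String(err) });
    }
  }
  return results;
}
```

With the values from config.js this would be called as fetchAllWithDelay(config.targets.map((t) => t.url), config.rateLimitMs). Sequential requests with a fixed delay are the simplest way to stay gentle on a site; a concurrency-limited queue would be faster but harder to keep under a site's limits.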

Prerequisites

Node.js v14 or higher
npm v6.14 or higher
Internet access for fetching target pages

Installation

git clone https://github.com/khushalkks/web_scraping.git
cd web_scraping
npm install

Configuration

Edit config.js to define target pages and the CSS selectors for the data you need:

module.exports = {
  targets: [
    {
      url: "https://example.com/articles",
      selectors: {
        title: "h1.article-title",
        author: ".author-name",
        date: ".publish-date",
      },
    },
    // Add more targets as needed
  ],
  outputDir: "output",
  rateLimitMs: 2000, // Delay between requests (ms)
};
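
Because a typo in a URL or selector usually shows up only as empty output, it can help to check the config before scraping. Below is a minimal sketch of such a check, assuming the config shape shown above; validateConfig is a hypothetical name, not something this repository is known to export. It returns a list of problems, empty when the config looks usable.

```javascript
// Hypothetical validator for the config.js shape shown above.
function validateConfig(config) {
  const problems = [];
  if (!Array.isArray(config.targets) || config.targets.length === 0) {
    problems.push("targets must be a non-empty array");
  } else {
    config.targets.forEach((target, i) => {
      // Each target needs an absolute http(s) URL.
      try {
        const { protocol } = new URL(target.url);
        if (protocol !== "http:" && protocol !== "https:") {
          problems.push(`targets[${i}].url must use http or https`);
        }
      } catch {
        problems.push(`targets[${i}].url is not a valid URL`);
      }
      // Each target needs at least one selector, and each selector must be a non-empty string.
      const selectors = (target && target.selectors) || {};
      if (Object.keys(selectors).length === 0) {
        problems.push(`targets[${i}].selectors is empty`);
      }
      for (const [field, selector] of Object.entries(selectors)) {
        if (typeof selector !== "string" || selector.trim() === "") {
          problems.push(`targets[${i}].selectors.${field} must be a non-empty string`);
        }
      }
    });
  }
  if (typeof config.outputDir !== "string" || config.outputDir === "") {
    problems.push("outputDir must be a non-empty string");
  }
  if (!Number.isFinite(config.rateLimitMs) || config.rateLimitMs < 0) {
    problems.push("rateLimitMs must be a non-negative number");
  }
  return problems;
}
```

A runner could call validateConfig(require("./config")) and exit with the returned messages when the list is non-empty.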

Usage

npm run scrape

Results are written to the directory set by outputDir (output/ in the example above) as results.json and results.csv.
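
One possible way to produce the two output files, sketched with only Node.js built-ins (fs and path). writeResults and toCsv are hypothetical names, and the records are assumed to be flat objects keyed by the selector names from config.js (for example title, author, date). CSV fields containing commas, double quotes, or line breaks are quoted as described in RFC 4180.

```javascript
const fs = require("fs");
const path = require("path");

// Quote a value (RFC 4180) when it contains a comma, double quote, or line break;
// embedded double quotes are doubled.
function csvField(value) {
  const s = value == null ? "" : String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Build CSV text from flat objects; the header row is the union of all keys,
// and rows are separated by CRLF.
function toCsv(records) {
  const headers = [...new Set(records.flatMap((r) => Object.keys(r)))];
  const lines = [headers.map(csvField).join(",")];
  for (const record of records) {
    lines.push(headers.map((h) => csvField(record[h])).join(","));
  }
  return lines.join("\r\n") + "\r\n";
}

// Hypothetical writer: creates outputDir if needed and writes results.json and results.csv.
function writeResults(records, outputDir) {
  fs.mkdirSync(outputDir, { recursive: true });
  fs.writeFileSync(path.join(outputDir, "results.json"), JSON.stringify(records, null, 2));
  fs.writeFileSync(path.join(outputDir, "results.csv"), toCsv(records));
}
```

Since the header is the union of all record keys, a page that lacks one of the fields simply leaves that cell empty instead of shifting the remaining columns.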

Testing

A minimal Jest test suite is included. Network requests are mocked, so no external calls are made during tests.

npm test
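
The suite uses Jest, where jest.fn() or jest.spyOn() would normally provide the mock. The underlying idea can be sketched without Jest: replace the global fetch with a stub that serves canned HTML and throws on any unknown URL, then restore the original. withMockFetch is a hypothetical helper, not part of this repository, and the sketch assumes the scraper calls the global fetch (Node.js 18+) rather than a node-fetch import.

```javascript
// Hypothetical test helper: swaps globalThis.fetch for a stub that serves
// canned HTML from `pages` and throws for any URL it does not know, so no
// real request can be made. Returns the list of URLs that were requested.
async function withMockFetch(pages, testFn) {
  const originalFetch = globalThis.fetch;
  const calls = [];
  globalThis.fetch = async (url) => {
    const key = String(url);
    calls.push(key);
    if (!Object.prototype.hasOwnProperty.call(pages, key)) {
      throw new Error(`Unexpected network call to ${key}`);
    }
    return { ok: true, status: 200, text: async () => pages[key] };
  };
  try {
    await testFn();
  } finally {
    // Restore whatever was there before (nothing, on Node.js versions without fetch).
    if (originalFetch === undefined) delete globalThis.fetch;
    else globalThis.fetch = originalFetch;
  }
  return calls;
}
```

Because unknown URLs throw instead of falling through to the real network, an accidental live request fails the test loudly rather than silently depending on internet access.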

Contributing

Contributions are welcome! To get started:

Fork the repository
Create a feature branch: git checkout -b feature/your-feature
Commit your changes with clear messages
Ensure all tests pass and linting is clean: npm test and npm run lint
Open a Pull Request describing your changes

License

This project is licensed under the MIT License — see the LICENSE file for details.

Contact

Author: Khushal K.
GitHub: @khushalkks
Email: khushal@example.com

