Web Scraping Utility

A modern JavaScript utility for fetching, parsing, and exporting data from web pages — built for the khushalkks/web_scraping assignment.

Features

  • Fetch HTML using the native fetch API (available globally in Node.js 18 and later) or node-fetch on older Node versions
  • Parse DOM with cheerio (jQuery-like selectors) or vanilla DOM methods
  • Export scraped data to CSV and JSON formats
  • Configurable URL list and CSS selectors via a simple config.js file
  • Rate limiting between requests to reduce the risk of bans, plus error handling for failed requests
  • Modular design — easy to extend with new parsers or output formats
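As an illustration of the CSV export feature listed above, the following is a minimal sketch of how flat records could be serialized. The names `toCsv` and `escapeCsvField` are hypothetical and are not taken from the repository, and the sketch assumes every record has the same keys as the first one.

```javascript
// Hypothetical helpers (not from the repository): serialize flat records to CSV.

// Quote a field if it contains a comma, a double quote, or a line break,
// doubling any embedded quotes (the convention described in RFC 4180).
function escapeCsvField(value) {
  const text = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build CSV text from an array of objects.
// Assumption: the keys of the first record define the columns.
function toCsv(records) {
  if (records.length === 0) return "";
  const headers = Object.keys(records[0]);
  const rows = records.map((record) =>
    headers.map((key) => escapeCsvField(record[key])).join(",")
  );
  return [headers.map(escapeCsvField).join(","), ...rows].join("\n");
}

// Example record shaped like the selectors used in config.js.
const sampleRecords = [
  { title: "Hello, world", author: "A. Writer", date: "2024-01-01" },
];
console.log(toCsv(sampleRecords));
```

The title contains a comma, so it is the only field that gets quoted in the output. JSON export needs no such helper, since `JSON.stringify(records, null, 2)` already produces valid output.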

Prerequisites

  • Node.js v14 or higher
  • npm v6.14 or higher
  • Internet access for fetching target pages

Installation

git clone https://github.com/khushalkks/web_scraping.git
cd web_scraping
npm install

Configuration

Edit config.js to define target pages and the CSS selectors for the data you need:

module.exports = {
  targets: [
    {
      url: "https://example.com/articles",
      selectors: {
        title: "h1.article-title",
        author: ".author-name",
        date: ".publish-date",
      },
    },
    // Add more targets as needed
  ],
  outputDir: "output",
  rateLimitMs: 2000, // Delay between requests (ms)
};
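The sketch below shows one way the `targets` and `rateLimitMs` settings above could drive a scraping loop. It is an assumption about the approach, not the repository's actual code: `scrapeAll`, `fetchHtml`, and `sleep` are hypothetical names, and the default fetcher relies on the global `fetch` available in Node.js 18 and later.

```javascript
// Hypothetical sketch (not the repository's code): visit each configured
// target in order, waiting rateLimitMs between consecutive requests.

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Default fetcher built on the global fetch API (Node.js 18+).
async function fetchHtml(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`HTTP ${response.status} for ${url}`);
  return response.text();
}

// fetchPage is injectable so the loop can run without network access.
async function scrapeAll(config, fetchPage = fetchHtml) {
  const results = [];
  for (const [index, target] of config.targets.entries()) {
    // Pause before every request except the first one.
    if (index > 0) await sleep(config.rateLimitMs);
    try {
      const html = await fetchPage(target.url);
      results.push({ url: target.url, ok: true, html });
    } catch (error) {
      // Record the failure and carry on with the remaining targets.
      results.push({ url: target.url, ok: false, error: error.message });
    }
  }
  return results;
}
```

Under these assumptions, `scrapeAll(require("./config"))` would resolve to one entry per target, with a failed request marked `ok: false` rather than aborting the whole run.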

Usage

npm run scrape

Results are written to the output/ directory as results.json and results.csv.
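For illustration, the two output files could be written with a helper along these lines. `writeResults` is a hypothetical name, and the sketch assumes the JSON and CSV text have already been produced elsewhere. Only the file names come from this README.

```javascript
const fs = require("node:fs/promises");
const path = require("node:path");

// Hypothetical helper (not from the repository): write pre-serialized
// results into outputDir as results.json and results.csv.
async function writeResults(outputDir, { json, csv }) {
  // Create the directory if it does not exist yet.
  await fs.mkdir(outputDir, { recursive: true });
  await fs.writeFile(path.join(outputDir, "results.json"), json, "utf8");
  await fs.writeFile(path.join(outputDir, "results.csv"), csv, "utf8");
}
```

With the configuration shown earlier, a call such as `writeResults("output", { json: JSON.stringify(records, null, 2), csv: csvText })` would produce the two files described above, where `records` and `csvText` stand for whatever the scraper collected.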


Testing

A minimal test suite is provided using Jest. Network requests are mocked — no external calls are made during tests.

npm test
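The repository's Jest tests are not reproduced here. As a rough sketch of the general idea behind mocking network requests, the plain Node.js example below temporarily replaces the global `fetch` with a stub that returns canned HTML. `withMockedFetch`, `createHtmlStub`, and `pageLength` are hypothetical names. In a Jest suite, a mock function such as `jest.fn()` would usually play the role of the stub.

```javascript
// Hypothetical helper: run `fn` while globalThis.fetch is replaced by `stub`.
async function withMockedFetch(stub, fn) {
  const original = globalThis.fetch;
  globalThis.fetch = stub;
  try {
    return await fn();
  } finally {
    // Always restore the previous fetch, even if fn throws.
    globalThis.fetch = original;
  }
}

// Build a stub that records requested URLs and answers with canned HTML.
// The returned object mimics only the parts of a Response used below.
function createHtmlStub(html) {
  const requested = [];
  const stub = async (url) => {
    requested.push(url);
    return { ok: true, status: 200, text: async () => html };
  };
  return { stub, requested };
}

// Example code under test: download a page and return its length.
async function pageLength(url) {
  const response = await fetch(url);
  return (await response.text()).length;
}
```

A test would call `withMockedFetch(stub, () => pageLength(url))` and then assert on both the returned value and the `requested` list, so no external call is ever made.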

Contributing

Contributions are welcome! To get started:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/your-feature
  3. Commit your changes with clear messages
  4. Ensure all tests pass and linting is clean: npm test and npm run lint
  5. Open a Pull Request describing your changes

License

This project is licensed under the MIT License — see the LICENSE file for details.


Contact

Author: Khushal K.
GitHub: @khushalkks
Email: khushal@example.com
