A modern JavaScript utility for fetching, parsing, and exporting data from web pages — built for the khushalkks/web_scraping assignment.
- Fetch HTML using the native fetch API (or node-fetch for older Node versions)
- Parse DOM with cheerio (jQuery-like selectors) or vanilla DOM methods
- Export scraped data to CSV and JSON formats
- Configurable URL list and CSS selectors via a simple config.js file
- Rate limiting and robust error handling to avoid bans
- Modular design: easy to extend with new parsers or output formats
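The rate limiting and error handling mentioned above could look roughly like the following. This is a minimal sketch, not the repository's actual code: `fetchAll`, `sleep`, and the `fetchImpl` option are hypothetical names, and it assumes a Node version where the global `fetch` is available (otherwise a compatible function such as node-fetch can be passed in).

```javascript
// Sketch only: fetchAll, sleep, and fetchImpl are illustrative names,
// not taken from the repository.

// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

/**
 * Fetch each URL in order, pausing `rateLimitMs` between requests.
 * A failed request is recorded in the result list and does not stop
 * the remaining requests.
 */
async function fetchAll(urls, { rateLimitMs = 2000, fetchImpl = globalThis.fetch } = {}) {
  const results = [];
  for (let i = 0; i < urls.length; i++) {
    const url = urls[i];
    try {
      const response = await fetchImpl(url);
      if (!response.ok) {
        // Treat non-2xx responses as errors so they are reported, not parsed.
        throw new Error(`HTTP ${response.status}`);
      }
      results.push({ url, html: await response.text(), error: null });
    } catch (err) {
      results.push({ url, html: null, error: err.message });
    }
    // Wait before the next request, but not after the last one.
    if (i < urls.length - 1) {
      await sleep(rateLimitMs);
    }
  }
  return results;
}
```

Requests run one at a time on purpose: a fixed delay between sequential requests is the simplest way to stay under a site's rate limits, at the cost of slower runs on long URL lists.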
git clone https://github.com/khushalkks/web_scraping.git
cd web_scraping
npm install
Edit config.js to define target pages and the CSS selectors for the data you need:
module.exports = {
  targets: [
    {
      url: "https://example.com/articles",
      selectors: {
        title: "h1.article-title",
        author: ".author-name",
        date: ".publish-date",
      },
    },
    // Add more targets as needed
  ],
  outputDir: "output",
  rateLimitMs: 2000, // Delay between requests (ms)
};
npm run scrape
Results are written to the output/ directory as results.json and results.csv.
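To make the path from config to output files concrete, here is one way a target's selectors map could be turned into a record and then written as results.json and results.csv. It is a sketch under stated assumptions: `extractRecord`, `toCsv`, and `writeResults` are hypothetical helpers, not the repository's functions, and `$` is assumed to behave like the function returned by cheerio's `cheerio.load(html)`.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: build one record from a target's `selectors` map.
// `$` is expected to behave like the result of cheerio.load(html).
function extractRecord($, selectors) {
  const record = {};
  for (const [field, selector] of Object.entries(selectors)) {
    // Use the first match only; a missing element yields an empty string.
    record[field] = $(selector).first().text().trim();
  }
  return record;
}

// Quote a value if it contains a comma, a double quote, or a line break.
function escapeCsv(value) {
  const text = value == null ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Serialize records to CSV with a header row, in the given column order.
function toCsv(records, columns) {
  const header = columns.map(escapeCsv).join(",");
  const rows = records.map((record) =>
    columns.map((column) => escapeCsv(record[column])).join(",")
  );
  return [header, ...rows].join("\n") + "\n";
}

// Write both output files, creating the output directory if needed.
function writeResults(records, columns, outputDir) {
  fs.mkdirSync(outputDir, { recursive: true });
  fs.writeFileSync(
    path.join(outputDir, "results.json"),
    JSON.stringify(records, null, 2)
  );
  fs.writeFileSync(path.join(outputDir, "results.csv"), toCsv(records, columns));
}
```

With the example config, the columns would be `Object.keys(target.selectors)`, giving title, author, and date. Taking only the first match per selector is an assumption; a page that lists many articles would need one record per matched element instead.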
A minimal test suite is provided using Jest. Network requests are mocked — no external calls are made during tests.
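One way to keep tests off the network is to swap the real fetch for a stand-in that serves canned HTML. The helper below is an illustrative sketch, not the repository's test code: `makeFakeFetch` is a hypothetical name, and the Jest usage in the trailing comment is only an outline of how it might be wired in.

```javascript
// Build a stand-in for fetch that serves canned HTML from `pages`
// (a map of URL to HTML string) and records every requested URL.
function makeFakeFetch(pages) {
  const calls = [];
  const fakeFetch = async (url) => {
    calls.push(url);
    const found = Object.prototype.hasOwnProperty.call(pages, url);
    return {
      ok: found,
      status: found ? 200 : 404,
      text: async () => (found ? pages[url] : ""),
    };
  };
  // Expose the call log so a test can assert on which URLs were requested.
  fakeFetch.calls = calls;
  return fakeFetch;
}

// Outline of possible use in a Jest test file:
//   beforeEach(() => {
//     global.fetch = makeFakeFetch({
//       "https://example.com/articles": '<h1 class="article-title">Hi</h1>',
//     });
//   });
//   test("requests only the configured URL", async () => {
//     /* run the scraper here */
//     expect(global.fetch.calls).toEqual(["https://example.com/articles"]);
//   });
```

Unknown URLs return a 404-style response rather than throwing, which lets the same fake exercise the error-handling path.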
npm test
Contributions are welcome! To get started:
- Fork the repository
- Create a feature branch: git checkout -b feature/your-feature
- Commit your changes with clear messages
- Open a Pull Request describing your changes
- Ensure all tests pass and linting is clean: npm run lint
This project is licensed under the MIT License — see the LICENSE file for details.
Author: Khushal K.
GitHub: @khushalkks
Email: khushal@example.com