
OjasPunje/ExxonWebsiteScraper


ExxonMobil Mozambique Newsroom Scraper

This scraper pulls article data from:

https://corporate.exxonmobil.com/locations/mozambique/mozambique-newsroom

It is built for the date range:

  • Start: 2017-03-01
  • End: 2026-12-31
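The date range above could be applied with a check along these lines. This is a minimal sketch, not the script's actual code: the names `START_DATE`, `END_DATE`, and `in_date_range` are hypothetical, and treating both endpoints as inclusive is an assumption.

```python
from datetime import date

# Range taken from the README; the constant names are hypothetical.
START_DATE = date(2017, 3, 1)
END_DATE = date(2026, 12, 31)


def in_date_range(published: date, start: date = START_DATE, end: date = END_DATE) -> bool:
    """Return True when the publish date falls inside the range.

    Both endpoints are treated as inclusive, which is an assumption.
    """
    return start <= published <= end
```

Articles whose publish date cannot be parsed would need a separate decision (keep or drop); this sketch does not cover that case.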

What it saves

  • output/exxon_mozambique_news_2017_2026.json
  • output/exxon_mozambique_news_2017_2026.csv
  • output/exxon_mozambique_keyword_hits.json
  • output/exxon_mozambique_keyword_paragraph_hits.json
  • output/exxon_mozambique_keyword_paragraph_hits.csv
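One plausible way to produce a matching JSON and CSV pair from the same list of records is sketched below. The function name `write_outputs`, its parameters, and the choice to join list values with "; " in the CSV are illustrative assumptions, not the script's confirmed behavior.

```python
import csv
import json
from pathlib import Path


def write_outputs(records, out_dir, stem):
    """Write records (a list of dicts) to <out_dir>/<stem>.json and <stem>.csv."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    json_path = out / f"{stem}.json"
    csv_path = out / f"{stem}.csv"

    # JSON keeps nested values (lists, long text) unchanged.
    json_path.write_text(
        json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8"
    )

    # CSV columns: union of all keys, in first-seen order.
    fieldnames = list(dict.fromkeys(key for record in records for key in record))
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        for record in records:
            # Flatten list values so each one fits in a single CSV cell (assumed format).
            row = {
                key: "; ".join(map(str, value)) if isinstance(value, list) else value
                for key, value in record.items()
            }
            writer.writerow(row)
    return json_path, csv_path
```

For example, `write_outputs(records, "output", "exxon_mozambique_news_2017_2026")` would create the first two files listed above.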

Each record includes:

  • article title
  • article URL
  • published date
  • article type
  • read time
  • location tag
  • summary bullets
  • matched keywords
  • keyword hit count
  • keyword snippets
  • paragraph-level keyword hits with article link
  • extracted body text
  • full raw page text in the JSON output
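The paragraph-level keyword hits in the list above could be produced roughly as follows. This is a sketch under assumptions: the function name `find_paragraph_hits`, the field names in each hit, splitting paragraphs on blank lines, and case-insensitive whole-word matching are all illustrative choices rather than the script's confirmed logic.

```python
import re


def find_paragraph_hits(body_text, keywords, url):
    """Return one dict per (paragraph, keyword) match, tagged with the article URL."""
    # Assumption: paragraphs in the extracted body text are separated by blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", body_text) if p.strip()]
    hits = []
    for index, paragraph in enumerate(paragraphs):
        for keyword in keywords:
            # Whole-word, case-insensitive match; multi-word phrases are escaped as-is.
            pattern = r"\b" + re.escape(keyword) + r"\b"
            if re.search(pattern, paragraph, flags=re.IGNORECASE):
                hits.append(
                    {
                        "url": url,
                        "keyword": keyword,
                        "paragraph_index": index,
                        "paragraph": paragraph,
                    }
                )
    return hits
```

Whole-word matching means "crisis" does not match "crises"; a looser substring match would catch more variants at the cost of more false positives.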

Setup

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
playwright install chromium

Run

python scrape_exxon_mozambique.py

To override the default scan terms:

python scrape_exxon_mozambique.py --keywords conflict "force majeure" crisis
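A flag that accepts several space-separated terms, including quoted phrases, is typically declared with `argparse` and `nargs="+"`. The sketch below assumes that approach; `DEFAULT_KEYWORDS` holds a placeholder value because the README does not list the script's actual default scan terms.

```python
import argparse

# Placeholder only: the real default scan terms are not stated in the README.
DEFAULT_KEYWORDS = ["example term"]


def parse_args(argv=None):
    """Parse command-line options; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(
        description="Scrape the ExxonMobil Mozambique newsroom and scan for keywords."
    )
    parser.add_argument(
        "--keywords",
        nargs="+",  # one or more terms; quote multi-word phrases in the shell
        default=DEFAULT_KEYWORDS,
        help="Scan terms that replace the defaults.",
    )
    return parser.parse_args(argv)
```

The shell passes the quoted "force majeure" as a single argument, so it reaches the script as one keyword rather than two.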

Notes

  • The newsroom uses a Load More interface, so the script uses Playwright instead of plain requests for URL discovery.
  • Article extraction is heuristic-based. If ExxonMobil changes the HTML structure, selectors may need a small update.
  • The script filters by article publish date after fetching each page.
  • A separate output/exxon_mozambique_keyword_hits.json file is written with only the articles that matched your scan terms.
  • Paragraph-level matches are also exported so you can review the exact paragraph containing each keyword alongside the article URL.
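URL discovery behind a Load More control, as described in the first note, might look like the sketch below. It is illustrative only: the `text=Load More` selector, the one-second wait, the click cap, and the function names are assumptions, and the real page may need different selectors. The Playwright import is deferred so the small helper can be used without a browser installed.

```python
NEWSROOM_URL = "https://corporate.exxonmobil.com/locations/mozambique/mozambique-newsroom"


def dedupe_preserving_order(links):
    """Drop repeated links while keeping the first occurrence of each."""
    return list(dict.fromkeys(links))


def discover_links(url=NEWSROOM_URL, max_clicks=50):
    """Click the Load More control until it disappears, then collect all hrefs."""
    # Deferred import: only needed when a browser is actually launched.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)

        # Assumed selector; adjust it if the button text or markup differs.
        button = page.locator("text=Load More")
        for _ in range(max_clicks):
            if button.count() == 0 or not button.first.is_visible():
                break
            button.first.click()
            page.wait_for_timeout(1000)  # give newly loaded items time to render

        # Resolve every anchor to its absolute URL inside the page.
        hrefs = page.eval_on_selector_all("a[href]", "els => els.map(e => e.href)")
        browser.close()
    return dedupe_preserving_order(hrefs)
```

The returned list contains every link on the page, so a later step would still need to keep only article URLs before fetching and date-filtering them.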

About

Code to scrape ExxonMobil's Mozambique newsroom website for keywords
