ExxonMobil Mozambique Newsroom Scraper

This scraper pulls article data from:

https://corporate.exxonmobil.com/locations/mozambique/mozambique-newsroom

It is built for the date range:

Start: 2017-03-01
End: 2026-12-31
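
As a rough illustration of how the date window above could be applied, here is a minimal sketch. The names `START_DATE`, `END_DATE`, `parse_published`, and `in_range` are hypothetical, and the accepted date layouts are assumptions; the actual script may parse and compare dates differently.

```python
from datetime import date, datetime
from typing import Optional

# Window taken from this README; the constant names are illustrative only.
START_DATE = date(2017, 3, 1)
END_DATE = date(2026, 12, 31)

# Assumed date layouts; the newsroom pages may use a different one.
DATE_FORMATS = ("%Y-%m-%d", "%B %d, %Y", "%b %d, %Y")


def parse_published(raw: str) -> Optional[date]:
    """Try each assumed layout in turn; return None when none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    return None


def in_range(published: Optional[date]) -> bool:
    """Keep only articles dated inside the inclusive window."""
    return published is not None and START_DATE <= published <= END_DATE
```

Articles whose date cannot be parsed are treated as out of range here; a real run might prefer to log them for manual review instead.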

What it saves

output/exxon_mozambique_news_2017_2026.json
output/exxon_mozambique_news_2017_2026.csv
output/exxon_mozambique_keyword_hits.json
output/exxon_mozambique_keyword_paragraph_hits.json
output/exxon_mozambique_keyword_paragraph_hits.csv

Each record includes:

article title
article URL
published date
article type
read time
location tag
summary bullets
matched keywords
keyword hit count
keyword snippets
paragraph-level keyword hits with article link
extracted body text
full raw page text (JSON output only)
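
The paragraph-level keyword hits listed above could be produced along these lines. This is a sketch under stated assumptions, not the script's actual logic: the function name `find_paragraph_hits`, the dictionary keys, and the rule that paragraphs are separated by blank lines are all hypothetical.

```python
import re


def find_paragraph_hits(url, body_text, keywords):
    """Return one hit per (paragraph, keyword) pair, with the article URL attached."""
    hits = []
    # Assumption: paragraphs in the extracted body text are separated by blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", body_text) if p.strip()]
    for index, paragraph in enumerate(paragraphs):
        for keyword in keywords:
            # Case-insensitive whole-word match; re.escape keeps the keyword literal,
            # so multi-word phrases such as "force majeure" are matched as written.
            pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
            count = len(pattern.findall(paragraph))
            if count:
                hits.append(
                    {
                        "article_url": url,
                        "paragraph_index": index,
                        "keyword": keyword,
                        "count": count,
                        "paragraph": paragraph,
                    }
                )
    return hits
```

Each returned dictionary maps naturally onto one row of a paragraph-hits CSV, with the article link carried on every row.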

Setup

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
playwright install chromium

Run

python scrape_exxon_mozambique.py

To override the default scan terms:

python scrape_exxon_mozambique.py --keywords conflict "force majeure" crisis
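
A flag that accepts several space-separated terms, including quoted phrases, is commonly built with `argparse` and `nargs="+"`. The sketch below shows that pattern; `build_parser` and `DEFAULT_KEYWORDS` are hypothetical names, and the default list is a placeholder because this README does not state the script's real default scan terms.

```python
import argparse

# Placeholder default; the script's real default scan terms are not listed in this README.
DEFAULT_KEYWORDS = ["example term"]


def build_parser():
    """Build a parser with a --keywords flag that takes one or more terms."""
    parser = argparse.ArgumentParser(
        description="Scrape the ExxonMobil Mozambique newsroom."
    )
    parser.add_argument(
        "--keywords",
        nargs="+",  # one or more terms; a quoted phrase arrives as a single item
        default=DEFAULT_KEYWORDS,
        metavar="TERM",
        help="scan terms to look for in each article",
    )
    return parser
```

With this pattern the shell handles the quoting, so `"force majeure"` reaches the script as one keyword rather than two.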

Notes

The newsroom loads additional articles through a Load More interface, so the script drives a browser with Playwright instead of using plain requests to discover article URLs.
Article extraction is heuristic-based. If ExxonMobil changes the HTML structure, selectors may need a small update.
The script filters by publish date only after fetching each article page, so out-of-range articles are still downloaded before being discarded.
A separate output/exxon_mozambique_keyword_hits.json file is written with only the articles that matched your scan terms.
Paragraph-level matches are also exported so you can review the exact paragraph containing each keyword alongside the article URL.
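
For readers unfamiliar with the Load More approach described in these notes, URL discovery with Playwright might look roughly like the sketch below. It is not the script's actual code: `normalize_links`, `discover_links`, `max_clicks`, the fixed pause, and finding the control by its visible "Load More" text are all assumptions, and the result is every same-host link on the page, which would still need filtering down to article URLs.

```python
from urllib.parse import urldefrag, urljoin, urlparse

NEWSROOM_URL = "https://corporate.exxonmobil.com/locations/mozambique/mozambique-newsroom"


def normalize_links(hrefs, base_url=NEWSROOM_URL):
    """Resolve relative links, drop fragments, keep same-host URLs, de-duplicate in order."""
    host = urlparse(base_url).netloc
    seen, result = set(), []
    for href in hrefs:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).netloc == host and absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


def discover_links(max_clicks=50):
    """Click Load More until it disappears, then collect the links on the page."""
    # Imported lazily so normalize_links stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(NEWSROOM_URL, wait_until="domcontentloaded")
        for _ in range(max_clicks):
            # Assumption: the control can be located by its visible text.
            button = page.get_by_text("Load More").first
            if button.count() == 0 or not button.is_visible():
                break
            button.click()
            page.wait_for_timeout(1500)  # crude pause while new items render
        hrefs = page.eval_on_selector_all(
            "a[href]", "els => els.map(e => e.getAttribute('href'))"
        )
        browser.close()
    return normalize_links(hrefs)
```

The `max_clicks` cap is there so a button that never disappears cannot loop forever; a selector tuned to the real page markup would be more robust than text matching.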
