A simple GitHub Actions RSS/HTML feed aggregator.
It runs on a schedule, collects new items, writes them to aggregated_feed.xml, tracks already used links in processed_links.txt, and publishes the result with GitHub Pages.
Use this when you want to turn one or more RSS/HTML sources into a single feed that can be used by:
- RSS readers
- Microsoft Teams RSS connectors
- Power Automate flows
- Feedly or similar feed tools
- Any tool that can poll an RSS/XML feed
Recommended: Use rss_aggregator.py as the default option.
html_aggregator.py is a fallback parser for sources that do not provide a usable RSS feed.
The generated feed URL will look like this:
https://<username>.github.io/<repo-name>/aggregated_feed.xml
Example:
https://hitem.github.io/rss-aggregator/aggregated_feed.xml
| File | Purpose |
|---|---|
| rss_aggregator.py | Aggregates RSS feeds |
| html_aggregator.py | Aggregates HTML sources |
| aggregated_feed.xml | The generated RSS/XML feed |
| processed_links.txt | Tracks links that have already been processed |
| .github/workflows/rss_aggregator.yml | Runs the aggregator and deploys GitHub Pages |
| requirements.txt | Python dependencies |
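
As a rough illustration of how a file like processed_links.txt can prevent duplicates, here is a minimal sketch. It assumes the file holds one link per line; the helper names `load_processed`, `filter_new`, and `record_processed` are hypothetical and are not taken from the project's scripts.

```python
from pathlib import Path


def load_processed(path: Path) -> set[str]:
    """Return the links already recorded, or an empty set if the file is missing."""
    if not path.exists():
        return set()
    lines = path.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def filter_new(links: list[str], processed: set[str]) -> list[str]:
    """Keep only unseen links, preserving order and dropping repeats within the batch."""
    seen = set(processed)
    fresh = []
    for link in links:
        if link not in seen:
            seen.add(link)
            fresh.append(link)
    return fresh


def record_processed(path: Path, new_links: list[str]) -> None:
    """Append newly processed links to the tracking file, one per line."""
    with path.open("a", encoding="utf-8") as handle:
        for link in new_links:
            handle.write(link + "\n")
```

A run would typically call `load_processed`, pass the collected links through `filter_new`, write the feed, and then call `record_processed` with whatever was new.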
Create a new public GitHub repository, for example:
rss-aggregator
Upload the project files to the repository.
Go to:
Repository -> Settings -> Pages
Set:
Build and deployment -> Source -> GitHub Actions
This lets the workflow deploy the generated feed to GitHub Pages.
Your Pages site will be:
https://<username>.github.io/<repo-name>/
Your feed URL will be:
https://<username>.github.io/<repo-name>/aggregated_feed.xml
Open either rss_aggregator.py or html_aggregator.py.
Find the link field:
etree.SubElement(channel, "link").text = "https://<username>.github.io/<repo-name>/aggregated_feed.xml"

Replace it with your real feed URL.
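
To show where that line sits, here is a minimal sketch of an RSS channel skeleton. It uses the standard library's `xml.etree.ElementTree` as a stand-in (the real scripts may import `etree` from lxml instead), and the title and description strings are made up for illustration.

```python
import xml.etree.ElementTree as etree  # stdlib stand-in; the real script may use lxml

# Placeholder URL: substitute your own username and repository name.
FEED_URL = "https://<username>.github.io/<repo-name>/aggregated_feed.xml"


def build_channel(feed_url: str) -> etree.Element:
    """Build a bare RSS 2.0 skeleton whose channel link points at the published feed."""
    rss = etree.Element("rss", version="2.0")
    channel = etree.SubElement(rss, "channel")
    etree.SubElement(channel, "title").text = "Aggregated feed"  # hypothetical title
    etree.SubElement(channel, "link").text = feed_url
    etree.SubElement(channel, "description").text = "Collected items"  # hypothetical text
    return rss


rss = build_channel(FEED_URL)
print(etree.tostring(rss, encoding="unicode"))
```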
Example:
etree.SubElement(channel, "link").text = "https://hitem.github.io/rss-aggregator/aggregated_feed.xml"

Open:
.github/workflows/rss_aggregator.yml
For RSS feeds, use:
- name: Run RSS aggregator script
  run: python rss_aggregator.py

For HTML sources, use:
- name: Run HTML aggregator script
  run: python html_aggregator.py

Open rss_aggregator.py or html_aggregator.py.
Use the following setting when your external tool reads the whole feed each time:
append_mode = False
Good for tools where the ingestion is triggered elsewhere and reads the full aggregated_feed.xml.

Use the following settings when your external tool checks for newly added RSS items:
append_mode = True
max_age_days = 365
Good for Feedly, Teams, Power Automate, or other tools that look for new RSS entries.
max_age_days controls how long items stay in aggregated_feed.xml.
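
The retention rule can be sketched as a simple cutoff filter. This is an illustration under assumptions, not the script's actual code: items are modeled as dicts with a timezone-aware `published` datetime, and `prune_old_items` is a hypothetical helper name.

```python
import datetime


def prune_old_items(items, max_age_days, now=None):
    """Return the items whose 'published' timestamp is within max_age_days of now."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    cutoff = now - datetime.timedelta(days=max_age_days)
    # Anything published before the cutoff is dropped from the feed.
    return [item for item in items if item["published"] >= cutoff]
```

With max_age_days = 365, an item published 200 days ago stays in the feed and one published 400 days ago is removed on the next run.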
Go to:
Repository -> Actions -> RSS & HTML Aggregator -> Run workflow
After the run completes, open:
https://<username>.github.io/<repo-name>/aggregated_feed.xml
You should see the generated XML feed.
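
If you would rather check the feed programmatically than by eye, a small parser is enough. The sketch below assumes a plain RSS 2.0 layout (an `rss` root with one `channel` containing `item` elements); `summarize_feed` is a hypothetical helper, not part of the project.

```python
import xml.etree.ElementTree as ET


def summarize_feed(xml_text):
    """Parse RSS text (str or bytes) and report the channel link and the item count."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("no <channel> element found; is this an RSS 2.0 feed?")
    return {
        "link": channel.findtext("link"),
        "items": len(channel.findall("item")),
    }
```

To check the published feed, download it first (for example with `urllib.request.urlopen(url).read()`) and pass the result to `summarize_feed`.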
To add the feed to your RSS reader or other tool, use this URL:
https://<username>.github.io/<repo-name>/aggregated_feed.xml
The workflow needs permission to commit updated files and deploy GitHub Pages.
The workflow should include:
permissions:
  contents: write
  pages: write
  id-token: write

If commits or deployment fail, check:
Repository -> Settings -> Actions -> General -> Workflow permissions
Make sure GitHub Actions has write access.
More info:
https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/enabling-features-for-your-repository/managing-github-actions-settings-for-a-repository#setting-the-permissions-of-the-github_token-for-your-repository
Use as few permissions as possible for your project.
Timing has three parts:
| Setting | Where | Purpose |
|---|---|---|
| time_threshold | rss_aggregator.py or html_aggregator.py | How far back the script looks for items |
| Cron schedule | .github/workflows/rss_aggregator.yml | How often GitHub Actions runs |
| Ingestion frequency | Your RSS reader / Teams / Power Automate flow | How often the external tool checks the feed |
In the Python script:
time_threshold = datetime.datetime.utcnow() - datetime.timedelta(hours=3)

In the workflow:
schedule:
  - cron: "37 * * * *"

Recommended external ingestion frequency (if append_mode = False):
1 hour
The collection window should overlap the workflow and ingestion interval so you do not miss items.
Example:
time_threshold: 60 days
Cron job interval: 30 days
Ingestion frequency: 30 days
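
One conservative way to sanity-check these three values is to require that the lookback window is at least the cron interval plus the ingestion interval. That rule is an assumption made for this sketch, not something the scripts enforce; it happens to match the example above (60 = 30 + 30). The function name `window_is_safe` is hypothetical.

```python
import datetime


def window_is_safe(lookback, cron_interval, ingestion_interval):
    """Return True if the lookback window covers one cron gap plus one ingestion gap.

    Rule of thumb (an assumption): an item stays visible for about
    lookback minus one cron interval, and the reader needs at least
    one poll during that span.
    """
    return lookback >= cron_interval + ingestion_interval


hours = lambda n: datetime.timedelta(hours=n)
print(window_is_safe(hours(3), hours(1), hours(1)))  # 3h window, hourly cron, hourly reads
```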
Note: with append_mode = False, the first run can collect a large window of items.
With append_mode = True, the feed only appends items found inside the configured script window, so it will not dump a large historical backlog on the first run.
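
The difference between the two modes can be sketched as follows. This is one plausible reading of the behavior described above, not the project's actual implementation: items are modeled as dicts with a `link` key, and `merge_feed` is a hypothetical helper.

```python
def merge_feed(existing, collected, append_mode):
    """Return the item list to write, given the previous feed and this run's items."""
    if not append_mode:
        # Overwrite: the feed becomes exactly what this run collected.
        return list(collected)
    # Append: keep what is already in the feed and add only unseen links.
    known = {item["link"] for item in existing}
    merged = list(existing)
    for item in collected:
        if item["link"] not in known:
            known.add(item["link"])
            merged.append(item)
    return merged
```

In append mode the feed grows run by run, which is why max_age_days is needed to keep it bounded.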
If you fork or watch this repository and run the workflow often, GitHub notifications can become noisy.
For your own sanity, adjust your notification settings.
If you run the workflow every hour with append_mode = False, it can get very chatty.

