An automated daily content curation system that aggregates articles from Hacker News, Reddit, and RSS feeds, uses deterministic topic scoring to select the best picks across six interest categories, and serves them through a personal web UI with feedback-driven learning.
Zero external AI dependencies — runs entirely locally in ~1 second per curation pass.
- Fetch — Pulls articles from Hacker News, configured subreddits, and RSS feeds in parallel
- Deduplicate — Filters out URLs seen in the past 30 days and scores articles by topic relevance
- Curate — Assigns each article to its best-matching category via weighted keyword scoring, then ranks candidates by topic score + source quality + recency + engagement
- Learn — Aggregates feedback from all historical remote reports, extracts keywords from voted articles, nudges feedback-weights.json by ±0.1 per keyword. Weights decay gradually so old signals don't dominate forever.
- Publish — Renders a Markdown report, uploads it to the hosted web UI via SFTP
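The Learn step says that weights decay gradually, but this README does not describe the mechanism. One plausible approach, shown here purely as an assumption, is to multiply every weight by a constant factor on each run and drop entries that have shrunk to near zero. The names `decayWeights`, `DECAY_FACTOR`, and `PRUNE_BELOW` are hypothetical, and the numeric values are illustrative only.

```typescript
// Hypothetical sketch of gradual weight decay. The factor and threshold
// are illustrative assumptions, not values taken from the project.
type Weights = Record<string, number>;

const DECAY_FACTOR = 0.95; // assumed per-run multiplier
const PRUNE_BELOW = 0.01;  // assumed cutoff for dropping stale keywords

function decayWeights(weights: Weights): Weights {
  const next: Weights = {};
  for (const [keyword, weight] of Object.entries(weights)) {
    const decayed = weight * DECAY_FACTOR;
    // Keep only keywords that still carry a meaningful signal.
    if (Math.abs(decayed) >= PRUNE_BELOW) {
      next[keyword] = decayed;
    }
  }
  return next;
}
```

With a multiplicative factor, a keyword that stops receiving votes fades smoothly instead of being cut off at a fixed age, which matches the stated goal that old signals should not dominate forever.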
Runs automatically at 3 AM daily via macOS launchd.
- AI & LLMs
- Software Development
- Geopolitics & World News
- Robotics, Electronics & 3D Printing
- Science & Technology
- Wildcard (one article outside all categories — either unexpected or high-quality but uncategorised)
Each article is rated on a per-category basis:
| Signal | How it works |
|---|---|
| Keyword matches | Multi-word keywords use substring match (bonus +4); single-word tokens: title match = +3, snippet match = +1 |
| Feedback weights | Your past 👍/👎 votes add ±0.1 per keyword per vote, clamped to [-1.0, 1.0] |
| Source quality | BBC, NYT, Reuters +0.5; Nature +0.4; Hacker News +0.2. Configurable in curator.ts. |
| Recency | Today's articles get +0.3, yesterday +0.1, older articles +0.0 |
| Engagement | HN upvotes divided by 500 (capped at +1.0); Reddit score similarly scaled |
Final score = (topic_score × 2) + source_bonus + recency + engagement
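The table and formula above can be sketched in TypeScript as follows. Only the numeric bonuses come from the table. The `Article` shape, the tokenizer, the choice to add title and snippet matches independently, and matching multi-word keywords against title plus snippet are all assumptions made for illustration. Feedback weights and Reddit scaling are left out because their exact application is not specified here.

```typescript
// Sketch of the scoring table. Field names and the way matches are
// combined are assumptions; only the numeric bonuses come from the table.
interface Article {
  title: string;
  snippet: string;
  source: string;    // e.g. "BBC", "Hacker News"
  ageDays: number;   // 0 = today, 1 = yesterday
  hnPoints?: number; // HN upvotes, when the article came from HN
}

const SOURCE_BONUS: Record<string, number> = {
  BBC: 0.5, NYT: 0.5, Reuters: 0.5, Nature: 0.4, "Hacker News": 0.2,
};

const tokenize = (text: string): string[] =>
  text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);

function topicScore(article: Article, keywords: string[]): number {
  const titleTokens = new Set(tokenize(article.title));
  const snippetTokens = new Set(tokenize(article.snippet));
  const fullText = `${article.title} ${article.snippet}`.toLowerCase();
  let score = 0;
  for (const raw of keywords) {
    const keyword = raw.toLowerCase();
    if (keyword.includes(" ")) {
      // Multi-word keyword: substring match, bonus +4.
      if (fullText.includes(keyword)) score += 4;
    } else {
      // Single-word keyword: title match +3, snippet match +1.
      if (titleTokens.has(keyword)) score += 3;
      if (snippetTokens.has(keyword)) score += 1;
    }
  }
  return score;
}

function recencyBonus(ageDays: number): number {
  if (ageDays < 1) return 0.3; // today
  if (ageDays < 2) return 0.1; // yesterday
  return 0;
}

function engagementBonus(hnPoints: number = 0): number {
  return Math.min(hnPoints / 500, 1.0); // capped at +1.0
}

function finalScore(article: Article, keywords: string[]): number {
  const sourceBonus = SOURCE_BONUS[article.source] ?? 0;
  return (
    topicScore(article, keywords) * 2 +
    sourceBonus +
    recencyBonus(article.ageDays) +
    engagementBonus(article.hnPoints)
  );
}
```

Calling `finalScore(article, keywords)` once per topic gives the per-category scores that the ranking compares.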
The top 5 articles per category by total score are selected. The wildcard is the highest-scoring article that wasn't assigned to any category.
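A minimal sketch of this selection step, assuming each article already carries its total score and its assigned category. Representing "not assigned to any category" as `category: null` is an assumption for this example, as are the names `Scored`, `Picks`, and `selectPicks`.

```typescript
// Sketch of top-N selection plus wildcard. `category` is null when no
// topic matched; that representation is an assumption for this example.
interface Scored {
  url: string;
  category: string | null;
  score: number;
}

interface Picks {
  byCategory: Map<string, Scored[]>;
  wildcard: Scored | null;
}

function selectPicks(articles: Scored[], perCategory: number = 5): Picks {
  // Sort a copy, highest score first, so the caller's array is untouched.
  const ranked = articles.slice().sort((a, b) => b.score - a.score);
  const byCategory = new Map<string, Scored[]>();
  let wildcard: Scored | null = null;

  for (const article of ranked) {
    if (article.category === null) {
      // The first uncategorised article in ranked order scores highest.
      if (wildcard === null) wildcard = article;
      continue;
    }
    const bucket = byCategory.get(article.category) ?? [];
    if (bucket.length < perCategory) {
      bucket.push(article);
      byCategory.set(article.category, bucket);
    }
  }
  return { byCategory, wildcard };
}
```

Sorting once and filling buckets in ranked order keeps each category list already ordered by score.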
| Layer | Technology |
|---|---|
| Runtime | Bun |
| Language | TypeScript (strict) |
| Curation | Deterministic scoring (no AI/LLM) |
| Feed parsing | fast-xml-parser |
| Remote sync | ssh2-sftp-client (IONOS SFTP) |
| Web UI | PHP + Vanilla JS (hosted on IONOS) |
| Auth | Google OAuth 2.0 (only allows access from configured email address) |
| Scheduler | macOS launchd |
- Bun runtime (for TypeScript execution)
- IONOS SFTP account (for hosting and report sync)
- Google OAuth credentials (for web UI authentication — only allows access from configured email address)
- macOS (for launchd automation)
No AI model or external API dependency required.
├── src/
│ ├── index.ts # Pipeline orchestrator
│ ├── curator.ts # Deterministic curation: scoring, ranking, wildcard selection
│ ├── sftp.ts # SFTP client: download reports/blacklist, upload today, fetch historical reports
│ ├── feedback.ts # Parse feedback markers from all reports (not just yesterday), update weights
│ ├── prefilter.ts # Topic scoring, deduplication, best-topic assignment
│ ├── report.ts # Markdown report renderer
│ ├── seen.ts # 30-day rolling URL deduplication store
│ ├── deploy.ts # One-time deployment of web UI to IONOS
│ ├── server.ts # Local dev server (port 3001)
│ ├── log.ts # Colourised logging helper
│ └── fetchers/
│     ├── hackernews.ts # HN top stories API
│     ├── reddit.ts # Reddit hot posts (public JSON)
│     └── rss.ts # Generic RSS/Atom parser
│
├── config/
│ ├── interests.yaml # Topics, keywords, subreddits, RSS feeds
│ ├── feedback-weights.json # Learned keyword weights (auto-updated)
│ └── seen-urls.json # 30-day dedup store (auto-updated)
│
├── public/
│ ├── index.php # Report viewer SPA
│ ├── api.php # REST API (reports, voting, settings, deletion)
│ ├── auth.php # Google OAuth handler
│ └── .htaccess # IONOS routing config
│
├── scripts/
│ ├── install-launchd.sh # Install macOS launchd 3 AM job + persistent wake schedule
│ ├── run.sh # Wrapper: loads .env, sets PATH, runs pipeline
│ └── monitor.sh # Colourised live log viewer
│
├── reports/ # Generated Markdown reports (YYYY-MM-DD.md)
└── logs/ # Execution logs (dailyreport.log, dailyreport.err)
curl -fsSL https://bun.sh/install | bash
bun install
# IONOS SFTP
FTP_HOST=your-sftp-host
FTP_USER=your-username
FTP_PASS=your-password
FTP_REMOTE_REPORTS_DIR=/path/to/remote/reports
TARGET_DIR=/path/to/remote/web-root
# Google OAuth (for web UI login)
GOOGLE_CLIENT_ID=your-client-id
GOOGLE_CLIENT_SECRET=your-client-secret
ALLOWED_EMAIL=your@email.com
REDIRECT_URI=https://your-site.com/projects/dailyreport/auth.php
Edit config/interests.yaml to set your topics, keywords, subreddits, and RSS feeds.
bun run deploy
Uploads index.php, api.php, auth.php, .htaccess, and a generated config.php (with OAuth secrets) to IONOS.
bun run generate
Fetches articles, curates deterministically using weighted scoring, writes reports/YYYY-MM-DD.md, and uploads it to IONOS.
bun run dry-run
Fetches and pre-filters articles without running curation or uploading. Useful for testing source config.
bun run serve
Starts a dev server at http://localhost:3001 with the report viewer (no OAuth required).
bash scripts/install-launchd.sh
Installs a launchd job that runs the pipeline at 3:00 AM every day, and sets a persistent daily wake at 2:55 AM via pmset repeat so the machine is awake in time. The wake schedule survives reboots and macOS updates.
Note: install-launchd.sh requires sudo to set the wake schedule. If it can't prompt for a password (e.g. first run), set it manually:
sudo pmset repeat wake MTWRFSU 02:55:00
View logs (colourised):
bash scripts/monitor.sh
View full log history:
bash scripts/monitor.sh --all
Raw logs:
tail -f logs/dailyreport.log
Uninstall:
launchctl unload ~/Library/LaunchAgents/com.dailyreport.generate.plist
rm ~/Library/LaunchAgents/com.dailyreport.generate.plist
Vote on articles in the web UI with 👍 / 👎. The next morning, the pipeline:
- Downloads yesterday's report (which contains your votes as +1/-1 markers at end of article lines)
- Downloads all older reports from the server for additional signal
- Extracts keywords from all voted titles (only words longer than 3 characters)
- Nudges config/feedback-weights.json by ±0.1 per keyword, clamped to [-1.0, 1.0]
- Cleans up temporary downloaded reports after aggregation
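The update described in the steps above can be sketched as follows. The report line format is an assumption: this example expects a Markdown link followed by a trailing `+1` or `-1`, for instance `- [Title](url) +1`, and the real reports may differ. The names `parseVote`, `extractKeywords`, and `applyFeedback` are hypothetical.

```typescript
// Sketch of the feedback update. The line format "- [Title](url) +1" is
// an assumption; only the 3-character rule, the ±0.1 step, and the clamp
// range come from the description above.
type Weights = Record<string, number>;

interface Vote {
  title: string;
  vote: 1 | -1;
}

const STEP = 0.1;
const clamp = (value: number): number => Math.max(-1.0, Math.min(1.0, value));

// Returns the voted title and direction, or null when the line has no marker.
function parseVote(line: string): Vote | null {
  const match = line.match(/\[([^\]]+)\]\([^)]*\).*\s([+-]1)\s*$/);
  if (!match) return null;
  return { title: match[1] ?? "", vote: match[2] === "+1" ? 1 : -1 };
}

// Keeps only words longer than 3 characters, de-duplicated per title.
function extractKeywords(title: string): string[] {
  const words = title.toLowerCase().split(/[^a-z0-9]+/);
  return Array.from(new Set(words.filter((w) => w.length > 3)));
}

function applyFeedback(weights: Weights, reportLines: string[]): Weights {
  const next: Weights = { ...weights };
  for (const line of reportLines) {
    const parsed = parseVote(line);
    if (parsed === null) continue;
    for (const keyword of extractKeywords(parsed.title)) {
      next[keyword] = clamp((next[keyword] ?? 0) + parsed.vote * STEP);
    }
  }
  return next;
}
```

Because `applyFeedback` returns a new map instead of mutating its input, the same function can be run over yesterday's report and the older reports in sequence before the result is written back to disk.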
Over time, the report adapts to surface articles you actually want to read. The weight system is fully transparent — config/feedback-weights.json is plain readable JSON showing exactly what signals have been accumulated.
- Deterministic curation — Each article is scored against every topic using keyword matching, then assigned to its best-matching category. No LLM required.
- Multi-signal ranking — Final rank combines topic score (×2), source quality bonus, recency bonus, and engagement metrics.
- Historical feedback — Unlike systems that only use yesterday's report, this aggregates votes from all historical reports. The feedback system bootstraps immediately on first SFTP sync with real voting data.
- 30-day deduplication — A rolling URL window (seen-urls.json) prevents the same article appearing twice within a month.
- SFTP sync model — The pipeline pulls yesterday's report plus all older reports to read votes, generates today's report, then pushes it. Reports older than 48 hours are downloaded into a temporary directory and cleaned up after feedback parsing.
- Keyword weights — Simpler and more inspectable than embedding-based preference learning; the full weight map is readable JSON with each signal traceable back to a specific voted article.
- Deterministic, lightweight — No LLM API calls needed; curation runs locally in under 1 second.
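As an illustration of the 30-day deduplication decision, here is a minimal sketch of a rolling URL window. Storing a first-seen timestamp per URL is an assumption, since the actual layout of seen-urls.json is not documented here, and the names `SeenStore`, `prune`, and `filterUnseen` are hypothetical.

```typescript
// Sketch of a rolling dedup window. The URL -> timestamp layout is an
// assumption; the real seen-urls.json format may differ.
type SeenStore = Record<string, number>; // URL -> first-seen time (ms since epoch)

const WINDOW_MS = 30 * 24 * 60 * 60 * 1000; // 30 days

// Drops entries that have aged out of the window.
function prune(store: SeenStore, now: number): SeenStore {
  const kept: SeenStore = {};
  for (const [url, seenAt] of Object.entries(store)) {
    if (now - seenAt < WINDOW_MS) kept[url] = seenAt;
  }
  return kept;
}

// Returns the URLs not seen within the window, plus the updated store
// with those URLs recorded at `now`.
function filterUnseen(
  urls: string[],
  store: SeenStore,
  now: number,
): { fresh: string[]; store: SeenStore } {
  const next = prune(store, now);
  const fresh: string[] = [];
  for (const url of urls) {
    if (next[url] === undefined) {
      fresh.push(url);
      next[url] = now; // also suppresses duplicates within the same batch
    }
  }
  return { fresh, store: next };
}
```

Passing `now` as a parameter, rather than reading the clock inside the functions, keeps the window logic deterministic and easy to test.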