@bartdegoede bartdegoede commented Feb 9, 2026

The Wikimedia abstract XML dumps have been discontinued, so switch the data source to the wikimedia/wikipedia dataset on Hugging Face. This replaces the lxml XML parsing and the requests-based HTTP downloads with a single load_dataset() call to HF 🤗, which handles both downloading and caching.
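For reviewers who have not used the Hugging Face `datasets` library, here is a rough sketch of what the new loading path could look like. It is not the code from this PR: the `Document` class, the function names, and the `20231101.en` snapshot name are assumptions made for illustration, as is the injectable `loader` argument (added only so the sketch can be exercised without network access). The `title`, `url`, and `text` column names are what the wikimedia/wikipedia dataset card describes, as far as I know.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator, Mapping, Optional


@dataclass
class Document:
    """Hypothetical container for one Wikipedia article (not from the PR)."""
    doc_id: int
    title: str
    url: str
    text: str


def rows_to_documents(rows: Iterable[Mapping[str, Any]]) -> Iterator[Document]:
    """Turn dataset rows into Document objects, numbering them from 0."""
    for doc_id, row in enumerate(rows):
        yield Document(
            doc_id=doc_id,
            title=row["title"],
            url=row["url"],
            text=row["text"],
        )


def load_documents(
    config: str = "20231101.en",  # assumed snapshot name; check the dataset card
    loader: Optional[Callable[..., Iterable[Mapping[str, Any]]]] = None,
) -> Iterator[Document]:
    """Load articles from wikimedia/wikipedia and yield Document objects.

    When no loader is passed, datasets.load_dataset is used; it downloads
    the data on the first call and reads from the local cache afterwards.
    """
    if loader is None:
        # Imported lazily so the helpers above work without `datasets` installed.
        from datasets import load_dataset
        loader = load_dataset
    rows = loader("wikimedia/wikipedia", config, split="train")
    return rows_to_documents(rows)
```

Calling `load_documents()` with no arguments would fetch the full English snapshot, which is large, so passing a stub `loader` is the easier way to try the sketch locally.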

Also modernize the project tooling:

  • Replace requirements.txt with pyproject.toml and uv
  • Add ruff for linting (fixes if/elif bug in index.py)
  • Add pytest with unit tests for the core search logic
  • Add GitHub Actions CI (lint + test across Python 3.10-3.13)

Closes and fixes #10, #11 and #13

The Wikimedia abstract XML dumps have been discontinued, so switch the
data source to the wikimedia/wikipedia dataset on Hugging Face. This
replaces lxml XML parsing and requests HTTP downloads with a single
load_dataset() call that handles downloading and caching.

Also modernize the project tooling:
- Replace requirements.txt with pyproject.toml and uv
- Add ruff for linting (fixes if/elif bug in index.py)
- Add pytest with 17 unit tests for the core search logic
- Add GitHub Actions CI (lint + test across Python 3.10-3.13)
@bartdegoede bartdegoede merged commit aec9a25 into master Feb 9, 2026
5 checks passed
@bartdegoede bartdegoede deleted the modernize branch February 9, 2026 19:43