fix: Add PyInstaller hook to bundle newspaper resource files #719

Draft
Copilot wants to merge 2 commits into master from copilot/fix-newspaper4k-pyinstaller-issue

Copilot AI commented Mar 24, 2026

When frozen with PyInstaller, the newspaper/resources/ directory (stopwords, user-agents, source lists) is not automatically included, causing a FileNotFoundError at runtime when StopWords tries to open e.g. stopwords-en.txt.
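To confirm this failure mode in a given build, a small diagnostic along these lines could be dropped into the frozen script. It is only a sketch: `resource_report` is a hypothetical helper, not part of newspaper, and the `resources/text/stopwords-<lang>.txt` layout is assumed from the paths mentioned in this PR.

```python
import sys
from pathlib import Path


def resource_report(package_dir, language="en"):
    """Report whether the stopwords file for `language` exists under package_dir.

    Hypothetical helper for debugging; the resources/text layout is an
    assumption taken from the paths named in this PR.
    """
    package_dir = Path(package_dir)
    stopwords = package_dir / "resources" / "text" / f"stopwords-{language}.txt"
    return {
        # PyInstaller's bootloader sets sys.frozen in a bundled app.
        "frozen": bool(getattr(sys, "frozen", False)),
        # sys._MEIPASS is the bundle directory; it is absent in a normal interpreter.
        "bundle_dir": getattr(sys, "_MEIPASS", None),
        "stopwords_path": str(stopwords),
        "stopwords_present": stopwords.is_file(),
    }
```

Calling it with `Path(newspaper.__file__).parent` in both the plain and the frozen run should show `stopwords_present` flipping to `False` in the frozen one when the data files were not bundled.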

Proposed Changes:

  • newspaper/_pyinstaller/__init__.py — Exposes get_hook_dirs() so PyInstaller discovers the hook automatically via the pyinstaller40 entry point
  • newspaper/_pyinstaller/hook-newspaper.py — Calls collect_data_files("newspaper") to bundle all files under newspaper/resources/ into the frozen app
  • pyproject.toml — Registers the entry point:
    [project.entry-points."pyinstaller40"]
    hook-dirs = "newspaper._pyinstaller:get_hook_dirs"
    This is the standard mechanism (supported since PyInstaller 4.0); no user-side .spec changes required.

How did you test it?

Added tests/unit/test_pyinstaller_hook.py verifying:

  • get_hook_dirs() returns a list with one entry
  • The returned directory exists on disk
  • hook-newspaper.py is present in that directory
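The three checks can be expressed as one small helper. This is a sketch of the idea rather than the contents of tests/unit/test_pyinstaller_hook.py; `check_hook_dirs` is a hypothetical name, and it takes the result of `get_hook_dirs()` as an argument so it can be exercised without newspaper installed.

```python
from pathlib import Path


def check_hook_dirs(hook_dirs, hook_filename="hook-newspaper.py"):
    """Assert the three properties listed above and return the hook directory."""
    assert isinstance(hook_dirs, list), "get_hook_dirs() should return a list"
    assert len(hook_dirs) == 1, "expected exactly one hook directory"
    hook_dir = Path(hook_dirs[0])
    assert hook_dir.is_dir(), f"{hook_dir} does not exist on disk"
    assert (hook_dir / hook_filename).is_file(), f"{hook_filename} is missing"
    return hook_dir
```

In a real test this would presumably be called as `check_hook_dirs(get_hook_dirs())` after importing `get_hook_dirs` from `newspaper._pyinstaller`. Note that these checks only cover the hook's layout; they do not prove that a frozen binary contains the resource files.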

Notes for the reviewer

The hook uses collect_data_files("newspaper"), which recursively collects all non-.py files from the installed package; this covers both resources/text/stopwords-*.txt and resources/misc/*.txt. No changes to the runtime resource-loading logic (settings.py / text.py) are needed, because PyInstaller places bundled data files so that Path(__file__).resolve().parent still resolves correctly inside the frozen binary.
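To make the shape of the collected data concrete, here is a standard-library approximation of what such a collection yields: a list of (source file, destination directory) pairs whose destinations keep the package-relative layout. It is a simplified sketch, not PyInstaller's implementation; `sketch_collect_data_files` is a hypothetical name, and the real helper applies more rules than the ones shown.

```python
from pathlib import Path


def sketch_collect_data_files(package_dir):
    """Approximate the (source, dest_dir) pairs for a package's data files.

    Simplification: skips Python sources, bytecode and __pycache__ only.
    """
    package_dir = Path(package_dir).resolve()
    skipped_suffixes = {".py", ".pyc", ".pyo"}
    datas = []
    for path in sorted(package_dir.rglob("*")):
        rel = path.relative_to(package_dir)
        if not path.is_file():
            continue
        if path.suffix in skipped_suffixes or "__pycache__" in rel.parts:
            continue
        # The destination keeps the package name, so a file under
        # newspaper/resources/text lands in newspaper/resources/text in the bundle.
        dest_dir = Path(package_dir.name) / rel.parent
        datas.append((str(path), dest_dir.as_posix()))
    return datas
```

Because each destination mirrors the source layout under the package name, code that locates resources relative to its own module file finds them at the same relative path in the bundle, which is why no loader changes are needed.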

Checklist

  • I have updated the related issue with new insights and changes
  • I added unit tests and updated the docstrings
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.
  • I documented my code
  • I ran pre-commit hooks and fixed any issue

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • example.com
    • Triggering command: /usr/bin/python python -m pytest tests/unit/ -v -x -q (dns block)

Original prompt

This section details the original issue you should resolve

<issue_title>newspaper4k not working with PyInstaller</issue_title>
<issue_description>I have a sample script:

from newspaper import Article

url = 'https://www.root.cz/zpravicky/freebsd-ma-plne-reprodukovatelne-buildy/'
article = Article(url)

article.download()
article.parse()

print(article.text)

The script works without any problems. When I package it with PyInstaller and run the bundled binary, I get this error:


    ^
  File "newspaper\article.py", line 485, in parse
  File "newspaper\extractors\content_extractor.py", line 167, in calculate_best_node
  File "newspaper\extractors\articlebody_extractor.py", line 41, in parse
  File "newspaper\text.py", line 122, in __init__
FileNotFoundError: Stopwords file for language en not found! Make sure that the language is supported (see `newspaper.languages()`)

Do you, as newspaper4k, have any quick tips for dealing with this error? Many thanks.</issue_description>

<agent_instructions>Check out why the error occurs</agent_instructions>

Comments on the Issue (you are @copilot in this section)


Copilot AI changed the title from "[WIP] Fix newspaper4k not working with PyInstaller" to "fix: Add PyInstaller hook to bundle newspaper resource files" on Mar 24, 2026
Copilot AI requested a review from AndyTheFactory March 24, 2026 23:04
Development

Successfully merging this pull request may close these issues.

newspaper4k not working with PyInstaller