docs: Add NLP documentation page for keywords and summary extraction #716

Draft
Copilot wants to merge 2 commits into master from copilot/add-summary-and-keywords-docs

Conversation

Contributor

Copilot AI commented Mar 13, 2026

Users had no documentation explaining how article.keywords and article.summary are populated — what algorithms drive them, what the scores mean, or how to tune the output.

Proposed Changes

  • New page docs/user_guide/nlp_keywords_summary.rst covering:
    • Keyword extraction: tokenisation → stopword removal → frequency counting → score formula (count × 1.5 / N + 1) → ranking; title keywords are merged and averaged
    • Extractive summarisation: NLTK Punkt sentence splitting → top-10 keyword extraction → four-feature weighted scoring (title similarity ×1.5, keyword frequency via SBS+DBS ×2.0, sentence length ×1.0, position ×1.0) → top-N sentences re-ordered by original position
    • SBS / DBS score formulas explained with LaTeX math blocks
    • Configuration via Config: max_keywords, max_summary_sent, max_summary
    • Language support note with link to the languages page
  • docs/index.rst: page added to toctree between examples and advanced
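
As a rough illustration of the keyword steps listed above (this is not the code in nlp.py), here is a minimal sketch. It assumes N is the total token count before stopword removal, uses a hypothetical five-word stopword list, and the name `keyword_scores` is made up for this example:

```python
import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "of", "and", "is"}


def keyword_scores(text, max_keywords=10):
    """Score the most frequent non-stopword tokens as count * 1.5 / N + 1."""
    # Tokenise: lowercase and keep simple word characters (an assumed simplification).
    words = re.findall(r"[a-z0-9']+", text.lower())
    total = len(words)  # N: assumed to be the token count before stopword removal
    counts = Counter(w for w in words if w not in STOPWORDS)
    # Rank by frequency and keep at most max_keywords entries.
    top = counts.most_common(max_keywords)
    return {word: count * 1.5 / max(total, 1) + 1 for word, count in top}
```

Under these assumptions, `keyword_scores("the cat and the cat sat")` gives `{'cat': 1.5, 'sat': 1.25}`: six tokens in total, with "cat" counted twice and "sat" once.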

How did you test it?

Built the Sphinx HTML docs locally (make html) — build succeeds with no new warnings beyond the pre-existing :any: reference warnings present across the whole docs tree.

Notes for the reviewer

The scoring formula divides by 4 (number of features), matching the actual implementation in nlp.py line 104 — not by the sum of weights (5.5), which would be a true weighted average. The description is worded accordingly.
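
To make the distinction concrete, here is a small sketch with hypothetical function names. It takes the four feature values as already computed inputs and only contrasts the two divisors. It is not copied from nlp.py:

```python
# Weights as listed in the description: title, keyword frequency, length, position.
WEIGHTS = (1.5, 2.0, 1.0, 1.0)


def four_feature_score(title, keyword, length, position):
    """Weighted sum divided by the number of features (4), as described for nlp.py."""
    features = (title, keyword, length, position)
    weighted_sum = sum(w * f for w, f in zip(WEIGHTS, features))
    return weighted_sum / len(features)


def true_weighted_average(title, keyword, length, position):
    """Weighted sum divided by the sum of weights (5.5), shown only for contrast."""
    features = (title, keyword, length, position)
    weighted_sum = sum(w * f for w, f in zip(WEIGHTS, features))
    return weighted_sum / sum(WEIGHTS)
```

With every feature equal to 1.0, `four_feature_score` returns 1.375 while `true_weighted_average` returns 1.0, so a score produced this way can exceed the range of its input features.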

Checklist

  • I have updated the related issue with new insights and changes
  • I added unit tests and updated the docstrings
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.
  • I documented my code
  • I ran pre-commit hooks and fixed any issue

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • docs.python.org
    • Triggering command: /home/REDACTED/.local/bin/sphinx-build sphinx-build -M html . _build (dns block)
    • Triggering command: /home/REDACTED/.local/bin/sphinx-build sphinx-build -M html . _build t (dns block)
  • docs.readthedocs.io
    • Triggering command: /home/REDACTED/.local/bin/sphinx-build sphinx-build -M html . _build (dns block)
    • Triggering command: /home/REDACTED/.local/bin/sphinx-build sphinx-build -M html . _build t (dns block)
  • www.sphinx-doc.org
    • Triggering command: /home/REDACTED/.local/bin/sphinx-build sphinx-build -M html . _build (dns block)
    • Triggering command: /home/REDACTED/.local/bin/sphinx-build sphinx-build -M html . _build t (dns block)


Original prompt

This section details the original issue you should resolve.

<issue_title>what are the mechnisms of "keywords" and "summary"? any documents about them?</issue_title>
<issue_description> Issue by myrainbowandsky
Wed Aug 12 12:07:12 2020
Originally opened as codelucas/newspaper#833


</issue_description>

<agent_instructions>Create a documentation subpage describing how summary and keywords are extracted</agent_instructions>

Comments on the Issue (you are @copilot in this section)

@AndyTheFactory **Comment by [mbahmani](https://github.com/mbahmani)** _Tue Apr 20 16:17:44 2021_

This is my question too. Also, how can we evaluate the results for the summary and keywords?



Co-authored-by: AndyTheFactory <863810+AndyTheFactory@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add documentation for summary and keywords extraction" to "docs: Add NLP documentation page for keywords and summary extraction" on Mar 13, 2026
Copilot AI requested a review from AndyTheFactory March 13, 2026 20:42

Development

Successfully merging this pull request may close these issues.

what are the mechnisms of "keywords" and "summary"? any documents about them?
