Fix device state restoration in NamedEntityExtractor and add release …#11259

Open
ritikraj2425 wants to merge 5 commits into deepset-ai:main from ritikraj2425:fix-ner-device-state

@ritikraj2425
Contributor

Related Issues

Proposed Changes:

Previously, the NamedEntityExtractor (spaCy backend) unconditionally called spacy.require_cpu() after execution. Because spaCy and Thinc keep device configuration in global state, this overrode any pre-existing user configuration (for example, a specific GPU selected for other parts of the application).

This PR:

  • Captures the current Thinc Ops state at the start of the _select_device context manager.
  • Restores the original Ops state in the finally block instead of forcing a reset to CPU.
  • Removes the outdated TODO regarding device restoration.
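
The capture-and-restore pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `FakeOps`, `select_device`, and the module-level `get_current_ops` / `set_current_ops` are hypothetical stand-ins so the snippet runs without spaCy installed. In a real environment the analogous calls would be `thinc.api.get_current_ops` and `thinc.api.set_current_ops`.

```python
from contextlib import contextmanager


class FakeOps:
    """Hypothetical stand-in for a Thinc Ops object (not the real class)."""

    def __init__(self, name):
        self.name = name


# Stand-in for Thinc's process-wide "current ops" global.
_current_ops = FakeOps("cpu-default")


def get_current_ops():
    return _current_ops


def set_current_ops(ops):
    global _current_ops
    _current_ops = ops


@contextmanager
def select_device(device_name):
    """Temporarily switch the global Ops, then put back whatever was there."""
    previous_ops = get_current_ops()  # capture before touching global state
    set_current_ops(FakeOps(device_name))
    try:
        yield
    finally:
        # Restore the exact object the caller had, not a fresh CPU default.
        set_current_ops(previous_ops)


# The user configures their own Ops before the component runs.
user_ops = FakeOps("user-gpu-1")
set_current_ops(user_ops)

with select_device("gpu-0"):
    inside = get_current_ops().name

restored = get_current_ops()
```

After the `with` block exits, `restored` is the same object as `user_ops`, which is the property the old forced reset to CPU did not guarantee.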

How did you test it?

  • Regression Test: Added TestNamedEntityExtractorDeviceRestoration to test/components/extractors/test_named_entity_extractor.py. This test sets a custom attribute on the global Ops object and verifies that it is preserved after the component's internal device switching logic.
  • Unit Tests: Ran the existing NamedEntityExtractor test suite (11 tests passed).
  • Manual Verification: Verified that custom NumpyOps objects are not replaced by fresh instances after component execution.
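
For readers who want the shape of the regression check without opening the test file, here is a rough sketch under stated assumptions. It is not the test added in this PR: `_state`, `switch_device_old`, `switch_device_new`, and `marker_survives` are hypothetical names, and a plain dictionary stands in for the global Ops so the snippet runs without spaCy. It compares a forced reset against a restore, and also covers the case where the body raises.

```python
from contextlib import contextmanager
from types import SimpleNamespace

# Hypothetical stand-in for the global device state; this is not Thinc itself.
_state = {"ops": SimpleNamespace(device="cpu")}


@contextmanager
def switch_device_old(device):
    """Models the old behaviour: always reset to a fresh CPU object."""
    _state["ops"] = SimpleNamespace(device=device)
    try:
        yield
    finally:
        _state["ops"] = SimpleNamespace(device="cpu")


@contextmanager
def switch_device_new(device):
    """Models the new behaviour: restore the object that was active before."""
    saved = _state["ops"]
    _state["ops"] = SimpleNamespace(device=device)
    try:
        yield
    finally:
        _state["ops"] = saved


def marker_survives(switch):
    """Tag the global object, run a failing body, and check the tag is still there."""
    _state["ops"] = SimpleNamespace(device="gpu", marker="user-config")
    try:
        with switch("cpu"):
            raise RuntimeError("simulated failure during extraction")
    except RuntimeError:
        pass
    return getattr(_state["ops"], "marker", None) == "user-config"


old_result = marker_survives(switch_device_old)
new_result = marker_survives(switch_device_new)
```

Because the restore sits in `finally`, the marker should survive even when the wrapped code raises, which is why the sketch exercises the failing path rather than only the happy path.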

Checklist

@ritikraj2425 ritikraj2425 requested a review from a team as a code owner May 5, 2026 10:10
@ritikraj2425 ritikraj2425 requested review from sjrl and removed request for a team May 5, 2026 10:10
@vercel

vercel Bot commented May 5, 2026

@ritikraj2425 is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

@sjrl sjrl self-assigned this May 8, 2026
@sjrl
Contributor

sjrl commented May 8, 2026

Hey @ritikraj2425, is this an issue you ran into with scripts you were running yourself? Also, please look at the failing CI and fix the issues there.

@github-actions github-actions Bot added the type:documentation Improvements on the docs label May 8, 2026
@ritikraj2425
Contributor Author

Hi @sjrl!
Yes, I was running some custom test scripts while exploring how Haystack components manage hardware resources. I noticed that NamedEntityExtractor calls a global spacy.require_cpu() in its finally block, which could unintentionally wipe out device settings for other components sharing the same GPU. I implemented the Thinc Ops state restoration to prevent this global side-effect!

I also pushed a fix for the CI. I had mistakenly placed my new device restoration test in the standard unit tests (where spaCy isn't installed in CI), causing an ImportError during collection. I have now moved it to the e2e pipeline tests where the spaCy backend is properly tested.
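
For context on the collection failure: a module-level `import spacy` in a test file raises ImportError as soon as the test runner imports that file, before any test runs. The PR resolves this by moving the test to the e2e suite. Purely as an illustration of the failure mode, and not what this PR does, one alternative is to guard the test on spaCy's availability. The sketch below uses only the standard library; `TestDeviceRestorationSketch` and `SPACY_AVAILABLE` are hypothetical names.

```python
import importlib.util
import unittest

# True only when spaCy can be found in this environment; nothing is imported yet.
SPACY_AVAILABLE = importlib.util.find_spec("spacy") is not None


@unittest.skipUnless(SPACY_AVAILABLE, "spaCy is not installed")
class TestDeviceRestorationSketch(unittest.TestCase):
    def test_spacy_exposes_require_cpu(self):
        # Imported inside the test, so loading this file never fails on a
        # missing dependency.
        import spacy

        self.assertTrue(hasattr(spacy, "require_cpu"))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestDeviceRestorationSketch)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Without spaCy installed the test is reported as skipped rather than failing at import time. With it installed the test runs normally.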

Let me know if there's anything else I should adjust!

Labels

topic:tests type:documentation Improvements on the docs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] NamedEntityExtractor (spaCy) fails to restore device state after execution

2 participants