
Fix Docker container crash: pin transformers dependency and replace deprecated encode_plus #174

Merged
ajamous merged 1 commit into feature/opentextshield-platform-v2 from claude/fix-docker-dependencies-VlFFc
Apr 8, 2026

ajamous (Collaborator) commented Apr 8, 2026

The Docker builds using requirements-security.txt specified transformers>=4.53.0 with no upper
bound, so they pulled a newer, incompatible release in which BertTokenizer.encode_plus had been
removed. As a result, every classification request failed with:
"BertTokenizer has no attribute encode_plus"

Changes:

  • Pin transformers==4.53.0 in requirements-security.txt (matches requirements.txt)
  • Add upper bounds to torch, huggingface-hub, safetensors, numpy, peft to prevent
    similar untested major version upgrades from breaking Docker builds
  • Replace all tokenizer.encode_plus() calls with tokenizer() across the codebase
    (the __call__ method is the modern, forward-compatible API and accepts the same
    parameters)

Affected files: prediction_service.py, test_sms.py, stressTest_500.py,
stressTest_1000_mlx.py, train_ots.py, train_ots_improved.py, compare_models.py,
train_incremental.py, train_enhanced_multilingual.py
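Below is a minimal sketch of the call-site migration. The helper name `encode_for_classification` and the keyword arguments it passes are illustrative assumptions, not copied from prediction_service.py. The only point is that the same keyword arguments move from `encode_plus()` to a direct call on the tokenizer.

```python
from typing import Any


def encode_for_classification(tokenizer: Any, text: str, max_length: int = 128) -> Any:
    """Tokenize one message for a BERT classifier (hypothetical helper).

    Old call site, which fails once encode_plus is gone:
        tokenizer.encode_plus(text, add_special_tokens=True, max_length=max_length, ...)
    New call site: invoke the tokenizer itself (its __call__ method) with the
    same keyword arguments.
    """
    return tokenizer(
        text,
        add_special_tokens=True,
        max_length=max_length,
        padding="max_length",
        truncation=True,
        return_attention_mask=True,
        return_tensors="pt",
    )
```

Passing a real `BertTokenizer` requires PyTorch to be installed. The result is a `BatchEncoding` whose `input_ids` and `attention_mask` tensors have shape `(1, max_length)`.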

https://claude.ai/code/session_018ERyTcyaXRpheHinYgiypJ
ajamous merged commit e8c246c into feature/opentextshield-platform-v2 on Apr 8, 2026
3 checks passed

claude (bot) commented Apr 8, 2026

PR Review: Fix Docker container crash — transformers dependency & encode_plus deprecation

Overall: Solid, well-scoped fix for a real production breakage. The root cause is correctly identified and addressed consistently across the codebase.


What's done well

  • Correct API migration: Replacing tokenizer.encode_plus() with tokenizer() (the __call__ method) is the right move. The __call__ interface accepts the same parameters and is the canonical, forward-compatible HuggingFace API. All 9 files are updated consistently. stressTest_20k_mlx_api.py calls the service over HTTP and does not tokenize locally, so it is correctly excluded.
  • Consistent change: Every occurrence is updated with the same arguments, so there is no behavioural change — purely a call-site migration.
  • Upper bounds added: Proactively adding upper bounds to torch, huggingface-hub, safetensors, numpy, and peft is good practice to prevent the same class of breakage from happening again.

Minor concerns worth tracking

1. Hard-pinning transformers==4.53.0 in a security requirements file

transformers==4.53.0   # requirements-security.txt

This prevents picking up security patches from any future 4.53.x release. This file exists specifically for CVE mitigations. A range such as transformers>=4.53.0,<4.54 would still admit patch-level security fixes while blocking the upgrade that caused this crash. A looser <5.0 bound also allows minor releases, but it only blocks this crash if the breaking change first appeared in 5.0 or later. Worth reconsidering before the next release cycle.
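To make the trade-off concrete, here is a stdlib-only sketch that checks which versions each candidate constraint admits. It rests on several assumptions:

- It handles only plain numeric `X.Y` / `X.Y.Z` versions, with no pre-release or post-release tags.
- Under that assumption, it treats `==4.53.0` as `>=4.53.0,<4.53.1`.
- The candidate versions are illustrative, not a list of actual transformers releases.

Real resolution should be left to pip or `packaging.specifiers.SpecifierSet`.

```python
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, int, int]

# Candidate constraints for transformers: (inclusive lower bound, exclusive upper bound).
# Assumption: plain numeric versions only, so "==4.53.0" behaves like ">=4.53.0,<4.53.1".
POLICIES: Dict[str, Tuple[str, Optional[str]]] = {
    ">=4.53.0": ("4.53.0", None),          # before this PR: no upper bound
    "==4.53.0": ("4.53.0", "4.53.1"),      # this PR: hard pin
    ">=4.53.0,<4.54": ("4.53.0", "4.54"),  # review suggestion: patch releases only
    ">=4.53.0,<5.0": ("4.53.0", "5.0"),    # review alternative: minor releases too
}

# Illustrative versions, not a list of actual transformers releases.
CANDIDATES = ["4.53.0", "4.53.3", "4.54.0", "5.0.0"]


def parse(version: str) -> Version:
    """Parse 'X.Y' or 'X.Y.Z' into a 3-tuple, padding missing parts with 0."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def admits(version: str, lower: str, upper: Optional[str]) -> bool:
    """True if lower <= version and, when an upper bound is given, version < upper."""
    v = parse(version)
    return v >= parse(lower) and (upper is None or v < parse(upper))


def admitted(policy: str) -> List[str]:
    """Candidate versions that a given policy would let pip install."""
    lower, upper = POLICIES[policy]
    return [v for v in CANDIDATES if admits(v, lower, upper)]


for name in POLICIES:
    print(f"{name:<16} -> {admitted(name)}")
```

The output shows the trade-off directly. The hard pin admits only 4.53.0, while `<4.54` still lets 4.53.x patch releases through.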

2. huggingface-hub<0.36 is very tight

huggingface-hub>=0.35.3,<0.36 locks to the 0.35.x series only. If a security patch lands in 0.36.x, picking it up will require a manual update. Consider <1.0 if the API surface you rely on stays stable across 0.x minor releases. Under semantic versioning, 0.x minor bumps may include breaking changes.

3. Root cause: requirements.txt vs requirements-security.txt drift

The underlying issue was that requirements.txt already pinned transformers==4.53.0 but requirements-security.txt had >=4.53.0. Consider adding a CI lint step (e.g. a simple script diffing the pinned versions between the two files) to catch this drift early and prevent recurrence.
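Here is a minimal sketch of such a lint step, using only the standard library. The `main()` entry point is hypothetical, and so is the rule it enforces: every `==` pin in requirements.txt must appear verbatim in requirements-security.txt. The sketch skips lines it cannot parse, such as `-r` includes, URLs, and extras.

```python
import re
from pathlib import Path
from typing import Dict, List

# A bare "name<specifier>" requirement line, e.g. "transformers==4.53.0" or
# "torch>=2.2,<3". Lines with extras, URLs, "-r" includes or options don't match
# and are skipped (an assumption of this sketch).
REQ_LINE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*([<>=!~].*)?$")


def parse_requirements(text: str) -> Dict[str, str]:
    """Map a normalized package name to its specifier ('' when unconstrained)."""
    reqs: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        match = REQ_LINE.match(line)
        if match:
            name = re.sub(r"[-_.]+", "-", match.group(1)).lower()  # PEP 503-style name
            reqs[name] = (match.group(2) or "").replace(" ", "")
    return reqs


def find_drift(base: str, security: str) -> List[str]:
    """Flag '==' pins in the base file that the security file does not repeat exactly."""
    pinned, secured = parse_requirements(base), parse_requirements(security)
    return [
        f"{name}: requirements.txt has '{pinned[name]}', "
        f"requirements-security.txt has '{secured[name]}'"
        for name in sorted(pinned.keys() & secured.keys())
        if pinned[name].startswith("==") and pinned[name] != secured[name]
    ]


def main() -> int:
    """Hypothetical CI entry point: print any drift and return a non-zero exit code."""
    drift = find_drift(
        Path("requirements.txt").read_text(encoding="utf-8"),
        Path("requirements-security.txt").read_text(encoding="utf-8"),
    )
    for message in drift:
        print(message)
    return 1 if drift else 0
```

A CI job could run something like `python -c "import check_pins, sys; sys.exit(check_pins.main())"`, where `check_pins` is a hypothetical module name for this file. Only exact `==` pins are compared. Packages that are merely range-constrained in requirements.txt are left alone.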


Test coverage

Existing test files (test_sms.py, stressTest_500.py, stressTest_1000_mlx.py) are updated in sync, which is correct. No new tests are strictly needed here since this is an API compatibility migration with identical semantics — but a smoke test in CI that actually loads the tokenizer and calls it would catch this class of issue before it reaches Docker builds.
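Here is a sketch of such a smoke test, assuming pytest and the pinned transformers 4.53.0. It builds a `BertTokenizer` from a throwaway vocabulary, so it needs no network access or model download. The vocabulary, test name, and expected IDs are illustrative and not taken from the OpenTextShield models.

```python
import tempfile
from pathlib import Path


def test_bert_tokenizer_call_smoke() -> None:
    """Build a BertTokenizer from a tiny local vocab and call it directly."""
    # Imported here so a missing or broken transformers install surfaces as this test's failure.
    from transformers import BertTokenizer

    vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "free", "prize", "click"]
    with tempfile.TemporaryDirectory() as tmp:
        vocab_file = Path(tmp) / "vocab.txt"
        vocab_file.write_text("\n".join(vocab) + "\n", encoding="utf-8")
        tokenizer = BertTokenizer(vocab_file=str(vocab_file))

        # The same call path the services now use: tokenizer(...), not encode_plus(...).
        enc = tokenizer("free prize click", max_length=8, padding="max_length", truncation=True)

    # [CLS] free prize click [SEP] followed by three [PAD] tokens (IDs follow vocab order).
    assert enc["input_ids"] == [2, 5, 6, 7, 3, 0, 0, 0]
    assert enc["attention_mask"] == [1, 1, 1, 1, 1, 0, 0, 0]
```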


Summary

| Area | Assessment |
|---|---|
| Bug fix correctness | Correct |
| API migration | Correct and complete |
| Dependency pinning | Good, minor refinements recommended |
| Test coverage | Adequate for the change |
| Security posture | Minor: hard pin may delay future security patches |

Good fix overall. The three items above are suggestions for follow-up, not blockers.

Reviewed by Claude (claude-sonnet-4-6)
