Fix Docker container crash: pin transformers dependency and replace deprecated encode_plus by ajamous · Pull Request #174 · TelecomsXChangeAPi/OpenTextShield

ajamous · 2026-04-08T18:16:38Z

requirements-security.txt, which the Docker builds install from, specified
transformers>=4.53.0 with no upper bound, so builds pulled a newer, incompatible
release in which BertTokenizer.encode_plus was removed. Every classification request
then failed with:
"BertTokenizer has no attribute encode_plus"

Changes:

- Pin transformers==4.53.0 in requirements-security.txt (matches requirements.txt)
- Add upper bounds to torch, huggingface-hub, safetensors, numpy, peft to prevent
  similar untested major version upgrades from breaking Docker builds
- Replace all tokenizer.encode_plus() calls with tokenizer() across the codebase
  (the __call__ method is the modern, forward-compatible API that accepts identical
  parameters)

Affected files: prediction_service.py, test_sms.py, stressTest_500.py,
stressTest_1000_mlx.py, train_ots.py, train_ots_improved.py, compare_models.py,
train_incremental.py, train_enhanced_multilingual.py
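
For readers unfamiliar with the call-site change, here is a minimal sketch of what the migration looks like. The helper name encode_sms and the specific keyword arguments are illustrative assumptions, not copied from the affected files; the point is only that the same keywords move from encode_plus() to the tokenizer's __call__.

```python
def encode_sms(tokenizer, text, max_length=128):
    """Tokenize one message through the tokenizer's __call__ interface.

    Hypothetical helper: the name and the argument values are assumptions
    chosen for illustration, not taken from the repository.
    """
    # Before the fix, a call site of this shape would have read:
    #     tokenizer.encode_plus(text, add_special_tokens=True, ...)
    # After the fix, the same keyword arguments go to tokenizer(...) directly.
    return tokenizer(
        text,
        add_special_tokens=True,
        max_length=max_length,
        padding="max_length",
        truncation=True,
        return_attention_mask=True,
    )
```

With a real tokenizer this would be called as encode_sms(BertTokenizer.from_pretrained(name), "some message"), where name is whichever checkpoint the service loads. Adding return_tensors="pt" to the call would return PyTorch tensors instead of lists.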


claude · 2026-04-08T18:18:35Z

PR Review: Fix Docker container crash — transformers dependency & encode_plus deprecation

Overall: Solid, well-scoped fix for a real production breakage. The root cause is correctly identified and addressed consistently across the codebase.

What's done well

- Correct API migration: Replacing tokenizer.encode_plus() with tokenizer() (the __call__ method) is the right move. The __call__ interface accepts identical parameters and is the canonical, forward-compatible HuggingFace API. All 9 files are updated consistently; stressTest_20k_mlx_api.py is HTTP-based and doesn't tokenize locally, so it is correctly excluded.
- Consistent change: Every occurrence is updated with the same arguments, so there is no behavioural change; this is purely a call-site migration.
- Upper bounds added: Proactively adding upper bounds to torch, huggingface-hub, safetensors, numpy, and peft is good practice to prevent the same class of breakage from happening again.

Minor concerns worth tracking

1. Hard-pinning transformers==4.53.0 in a security requirements file

transformers==4.53.0   # requirements-security.txt

This prevents receiving security patches in any future 4.53.x release. Since this file is explicitly for security CVE mitigations, a range like transformers>=4.53.0,<4.54 (or <5.0 for more flexibility) would allow patch-level security fixes while still blocking the breaking major/minor upgrade that caused this crash. Worth reconsidering before the next release cycle.
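
To make the effect of the suggested range concrete, the sketch below checks a few version strings against >=4.53.0,<4.54. It is a simplified stand-in that only understands plain dotted numeric versions; the candidate versions are hypothetical examples, and real tooling would more likely rely on pip itself or on packaging.specifiers.SpecifierSet.

```python
def parse_version(text, width=3):
    """Turn a plain dotted release string such as '4.53' into a padded int tuple."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def in_range(version, lower, upper):
    """Return True when lower <= version < upper (pre-releases are not handled)."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# Hypothetical candidate versions, checked against >=4.53.0,<4.54
for candidate in ("4.53.0", "4.53.3", "4.54.0", "5.0.0"):
    print(candidate, in_range(candidate, "4.53.0", "4.54"))
# 4.53.0 True
# 4.53.3 True
# 4.54.0 False
# 5.0.0 False
```

The range still admits any 4.53.x patch release while excluding the next minor and major series, which is the trade-off described above.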

2. huggingface-hub<0.36 is very tight

huggingface-hub>=0.35.3,<0.36 locks to the 0.35.x series only. If a security patch lands in 0.36.x, you will need a manual update to pick it up. Consider <1.0 if the API surface you rely on is stable within the major version.

3. Root cause: requirements.txt vs requirements-security.txt drift

The underlying issue was that requirements.txt already pinned transformers==4.53.0 but requirements-security.txt had >=4.53.0. Consider adding a CI lint step (e.g. a simple script diffing the pinned versions between the two files) to catch this drift early and prevent recurrence.
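
One possible shape for such a lint step is sketched below. It is a deliberately simple parser, not a full requirements-file implementation: it ignores option lines, extras, and environment markers, and the function names and the torch specifier in the demo are assumptions made for illustration.

```python
import re

# Package name, optional [extras], then whatever version specifier follows.
REQUIREMENT_RE = re.compile(r"([A-Za-z0-9_.\-]+)(?:\[[^\]]*\])?\s*(.*)")


def parse_requirements(text):
    """Map each normalised package name to its version specifier string."""
    specs = {}
    for raw_line in text.splitlines():
        # Drop comments and environment markers, then surrounding whitespace.
        line = raw_line.split("#", 1)[0].split(";", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # blank line or an option such as "-r other.txt"
        match = REQUIREMENT_RE.match(line)
        if match:
            name = match.group(1).lower().replace("_", "-")
            specs[name] = match.group(2).replace(" ", "")
    return specs


def find_drift(base_text, security_text):
    """Return packages listed in both files whose specifiers differ."""
    base = parse_requirements(base_text)
    security = parse_requirements(security_text)
    return {
        name: (base[name], security[name])
        for name in sorted(base.keys() & security.keys())
        if base[name] != security[name]
    }


# Demo with the mismatch described in this PR (the torch line is hypothetical).
base_file = "transformers==4.53.0\ntorch>=2.0,<3.0\n"
security_file = "transformers>=4.53.0  # CVE fixes\ntorch>=2.0,<3.0\n"
print(find_drift(base_file, security_file))
# {'transformers': ('==4.53.0', '>=4.53.0')}
```

In CI, a wrapper could read both files from disk and exit non-zero whenever find_drift returns a non-empty result. If the two files are meant to differ for some packages, an allow-list would be needed on top of this.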

Test coverage

Existing test files (test_sms.py, stressTest_500.py, stressTest_1000_mlx.py) are updated in sync, which is correct. No new tests are strictly needed here since this is an API compatibility migration with identical semantics — but a smoke test in CI that actually loads the tokenizer and calls it would catch this class of issue before it reaches Docker builds.
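
A smoke test along those lines could be as small as the sketch below. The function name, the sample text, and the checks are assumptions; the loader is passed in as a callable so the same check can run against whatever tokenizer the service actually loads.

```python
def smoke_test_tokenizer(load_tokenizer, sample="Free prize! Reply WIN to claim.", max_length=32):
    """Load a tokenizer and call it once, failing loudly if the result is unusable.

    Hypothetical helper: load_tokenizer is any zero-argument callable that
    returns a tokenizer object.
    """
    tokenizer = load_tokenizer()
    # Raises TypeError here if the object is not callable, which is the kind
    # of API break this test is meant to surface before a Docker build ships.
    encoded = tokenizer(sample, truncation=True, max_length=max_length)
    for key in ("input_ids", "attention_mask"):
        assert key in encoded, f"tokenizer output is missing {key!r}"
    assert len(encoded["input_ids"]) == len(encoded["attention_mask"])
    assert 0 < len(encoded["input_ids"]) <= max_length
    return encoded
```

In a CI job with transformers installed, this might be invoked as smoke_test_tokenizer(lambda: BertTokenizer.from_pretrained(name)), with name set to the checkpoint the service uses. Running it inside the built image, rather than only on the developer machine, is what would exercise the versions resolved from requirements-security.txt.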

Summary

Area	Assessment
Bug fix correctness	Correct
API migration	Correct and complete
Dependency pinning	Good, minor refinements recommended
Test coverage	Adequate for the change
Security posture	Minor: hard pin may delay future security patches

Good fix overall. The three items above are suggestions for follow-up, not blockers.

Reviewed by Claude (claude-sonnet-4-6)

ajamous merged commit e8c246c into feature/opentextshield-platform-v2 Apr 8, 2026
3 checks passed

ajamous merged 1 commit into feature/opentextshield-platform-v2 from claude/fix-docker-dependencies-VlFFc
