GLM-OCR#19

Open
AkshitMaheshwari wants to merge 2 commits into main from akshit

Conversation

@AkshitMaheshwari
Collaborator

No description provided.

Copilot AI review requested due to automatic review settings April 3, 2026 18:42
@vercel

vercel Bot commented Apr 3, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| tax-ai | Ready | Preview, Comment | Apr 3, 2026 7:19pm |

Contributor

Copilot AI left a comment

Pull request overview

This PR switches the backend image OCR implementation from Tesseract/pytesseract to the Hugging Face zai-org/GLM-OCR model, and updates the frontend environment configuration to point at a backend running on port 8000.

Changes:

  • Replace Tesseract-based OCR with GLM-OCR (Transformers) inference and add model-loading caching.
  • Update backend dependencies to include transformers, torch, and accelerate.
  • Adjust frontend .env.example / .env API URL values and add myenv/ to backend .gitignore.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

Show a summary per file
| File | Description |
| --- | --- |
| Frontend/.env.example | Updates example API base URL to http://127.0.0.1:8000. |
| Frontend/.env | Changes committed Vite API base URL to localhost. |
| Backend/services/image_ocr.py | Replaces pytesseract OCR pipeline with GLM-OCR model inference. |
| Backend/requirements.txt | Adds Transformers + Torch + Accelerate dependencies for GLM-OCR. |
| Backend/.gitignore | Ignores myenv/ directory. |
Comments suppressed due to low confidence (1)

Backend/services/image_ocr.py:10

  • ImageOps is imported but no longer used after switching away from the pre-processing path. Please remove the unused import to keep the module clean (and avoid lint/test failures if the repo enforces them).

from PIL import Image, ImageOps


Comment thread Backend/services/image_ocr.py Outdated
Comment on lines +3 to +5
import os
import tempfile
import uuid

Copilot AI Apr 3, 2026

tempfile is imported but never used. Either remove the import or use tempfile.NamedTemporaryFile/mkstemp for the temporary PNG (which would also avoid writing into the app’s working directory).

Copilot uses AI. Check for mistakes.
Comment on lines 10 to 13

from runtime import get_runtime_config
from services.document_ingestion import DocumentValidationError, parse_document
from services.groq_ai import extract_csv_from_ocr_text, groq_status

Copilot AI Apr 3, 2026

get_runtime_config is now unused in this module (the prior Tesseract config path was removed). Please drop the unused import to avoid confusion and potential linting failures.

Comment thread Backend/services/image_ocr.py Outdated
Comment on lines +21 to +27
_processor = None
_model = None

def _load_glm_ocr():
    global _processor, _model
    if _processor is None or _model is None:
        from transformers import AutoProcessor, AutoModelForImageTextToText

Copilot AI Apr 3, 2026

Model initialization is guarded only by _processor is None or _model is None with module-level globals. Under concurrent requests, two threads can enter _load_glm_ocr() simultaneously and race while downloading/loading the model, causing excessive memory use or intermittent failures. Consider protecting initialization with a threading.Lock (or initializing once during app startup/lifespan).
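
One way to apply the lock suggestion is double-checked locking around the loader. The sketch below is illustrative only: `load_once`, `_model_lock`, and `_cached` are hypothetical names that do not appear in this PR, and the loader is assumed to return a non-None value such as a (processor, model) pair.

```python
import threading

_model_lock = threading.Lock()
_cached = None  # holds the loader's result once it has run


def load_once(loader):
    """Return the cached result of loader(), calling it at most once."""
    global _cached
    if _cached is None:              # fast path: skip the lock once loaded
        with _model_lock:
            if _cached is None:      # re-check: another thread may have loaded it
                _cached = loader()
    return _cached
```

Here `loader` would be whatever function builds the processor and model. Loading once during app startup, as the comment also suggests, would avoid the need for a lock altogether.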

Comment on lines 37 to 41
 def ocr_status() -> dict[str, Any]:
-    config = get_runtime_config()
-    tesseract_cmd = str(getattr(config, "TESSERACT_CMD", "")).strip()
     return {
-        "pytesseract_installed": pytesseract is not None,
-        "tesseract_cmd_configured": bool(tesseract_cmd),
+        "glm_ocr_configured": True,
         "groq": groq_status(),
     }

Copilot AI Apr 3, 2026

ocr_status() reports "glm_ocr_configured": True unconditionally, even if transformers/torch aren’t installed or the model can’t be loaded/downloaded. This can mislead the /options API and the frontend. Consider making this flag reflect reality (e.g., attempt a lightweight import and/or _load_glm_ocr() in a try/except and return configured: False plus an error detail when it fails).
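
A lightweight version of this check could look for the required packages without importing them. This is a sketch under assumptions: `glm_ocr_status` and the `glm_ocr_missing_packages` key are hypothetical names, and the check only confirms that the packages are installed, not that the model weights can be downloaded or loaded.

```python
from importlib.util import find_spec
from typing import Any


def glm_ocr_status(required: tuple[str, ...] = ("transformers", "torch")) -> dict[str, Any]:
    """Report whether the packages GLM-OCR needs are installed."""
    # find_spec locates a top-level package without executing it,
    # so this stays cheap even for heavy libraries such as torch.
    missing = [name for name in required if find_spec(name) is None]
    return {
        "glm_ocr_configured": not missing,
        "glm_ocr_missing_packages": missing,
    }
```

The returned dict could be merged into the existing `ocr_status()` result so the frontend sees `False` plus the list of missing packages instead of an unconditional `True`.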

Comment thread Backend/services/image_ocr.py Outdated
Comment on lines +52 to +69
+    temp_filename = f"{uuid.uuid4().hex}.png"
     try:
         image = Image.open(BytesIO(image_bytes))
+        image.save(temp_filename, format="PNG")
     except Exception as exc:
+        if os.path.exists(temp_filename):
+            os.remove(temp_filename)
         raise OcrConversionError(f"Uploaded file is not a readable image: {exc}") from exc
 
-    # Grayscale + auto-contrast usually improves OCR quality on scans.
-    processed = ImageOps.autocontrast(ImageOps.grayscale(image))
-
     try:
-        text = pytesseract.image_to_string(processed)
+        messages = [
+            {
+                "role": "user",
+                "content": [
+                    {
+                        "type": "image",
+                        "url": temp_filename
+                    },

Copilot AI Apr 3, 2026

The OCR path writes a UUID-named PNG into the current working directory and then passes a relative path (url: temp_filename) into the processor. In deployed environments the CWD may be read-only or different than expected, which can cause OCR to fail. Prefer creating the file in the OS temp directory (e.g., via tempfile.NamedTemporaryFile(dir=tempfile.gettempdir(), ...)) and pass an absolute path to the model/processor.
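
The temp-directory approach could be wrapped in a small context manager that also handles cleanup. This is a minimal sketch using only the standard library; `temp_image_file` is a hypothetical helper, and it writes the raw bytes as-is rather than re-encoding them to PNG with Pillow as the PR does.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temp_image_file(data: bytes, suffix: str = ".png"):
    """Write data to a file in the OS temp dir and yield its absolute path."""
    # mkstemp creates the file in tempfile.gettempdir() and returns an absolute path.
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        yield path
    finally:
        # Remove the file whether or not the caller raised.
        if os.path.exists(path):
            os.remove(path)
```

Inside the `with` block the yielded path could be passed as the `url` value in `messages`. If PNG re-encoding is still wanted, the caller could save the Pillow image to that path before running the model.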

Comment thread Backend/services/image_ocr.py Outdated
Comment on lines +86 to +92
inputs.pop("token_type_ids", None)
generated_ids = _model.generate(**inputs, max_new_tokens=8192)
output_text = _processor.decode(generated_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False)

normalized = "\n".join(line.rstrip() for line in output_text.splitlines())
if len(normalized.strip()) < 20:
    raise OcrConversionError(

Copilot AI Apr 3, 2026

_processor.decode(..., skip_special_tokens=False) will include the model’s special/control tokens in the OCR output. Since the downstream Groq extraction expects plain OCR text, these tokens can pollute the prompt and reduce extraction quality. Consider decoding with skip_special_tokens=True (or explicitly stripping known template tokens) before normalization.
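
Passing `skip_special_tokens=True` to `decode` is the simpler fix. If explicit stripping is preferred, it might look like the sketch below. The `<|name|>` token shape is an assumption made for illustration, not something confirmed for GLM-OCR, and `clean_ocr_text` is a hypothetical helper.

```python
import re

# Assumption: control tokens look like <|name|>. Adjust the pattern to the
# tokens the model actually emits.
_TEMPLATE_TOKEN = re.compile(r"<\|[^|<>]*\|>")


def clean_ocr_text(raw: str) -> str:
    """Drop <|...|>-style tokens and trailing whitespace on each line."""
    without_tokens = _TEMPLATE_TOKEN.sub("", raw)
    return "\n".join(line.rstrip() for line in without_tokens.splitlines()).strip()
```

The result could replace `normalized` in the snippet above, so the length check and the downstream Groq prompt both see plain text.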

Comment thread Frontend/.env
@@ -1,2 +1,2 @@
 # Backend API URL
-VITE_API_URL=https://taxai-77xc.onrender.com
+VITE_API_URL=http://127.0.0.1:8000

Copilot AI Apr 3, 2026

Committing Frontend/.env with VITE_API_URL=http://127.0.0.1:8000 will bake a localhost API URL into any production build that uses repository .env defaults, breaking deployments (and overriding the fallback onrender URL in src/utils/api.js). Typically .env should be untracked and .env.example used for documentation; consider removing Frontend/.env from version control (and adding it to Frontend/.gitignore) or restoring the deployed API URL here.

Suggested change
-VITE_API_URL=http://127.0.0.1:8000
+# Leave VITE_API_URL unset here so the app can use its built-in fallback URL.
+# For local development, define VITE_API_URL in an untracked local env file instead.

Comment thread Backend/requirements.txt Outdated
Comment on lines +12 to +14
transformers
torch
accelerate

Copilot AI Apr 3, 2026

Adding transformers, torch, and accelerate significantly increases build size and may fail to install in some deployment targets without extra system dependencies / correct wheels. Consider documenting the required runtime (CPU vs GPU) and deployment implications, or making these optional extras so environments that don’t use OCR don’t have to install them.

Suggested change
-transformers
-torch
-accelerate
+# Optional OCR/ML dependencies:
+# Install these only in deployments that need OCR features, as they
+# significantly increase build size and may require CPU/GPU-specific wheels.
+# Example:
+# pip install transformers torch accelerate
