feat: support Granite-Docling model #2109
Open
dhdaines wants to merge 5 commits into abetlen:main from dhdaines:granite-docling
Conversation
dhdaines force-pushed the granite-docling branch from cdcadc5 to 03cf904
dhdaines (Author): Ready for review!
dhdaines force-pushed the granite-docling branch from 3a04e21 to 8790ce6
- Update vendor/llama.cpp submodule to commit be47fb92 (2026-01-01)
- Bump version 0.3.16 -> 0.4.0

Critical fixes:
- Remove phantom flash_attn field from llama_context_params (caused segfaults)
- Add 3 missing params to llama_params_fit (margin, n_ctx_min, log_level)
- Migrate flash_attn bool -> flash_attn_type enum (BREAKING CHANGE)
- Add flash_attn_type to TYPE_CHECKING block
- Fix test: use flash_attn_type instead of removed flash_attn field
- FIX CRITICAL: kv_cache_seq_rm must preserve seq_id=-1 semantics (all sequences)
  * The wrapper was incorrectly converting -1 to 0, breaking context rewind
  * This caused 'discontinuity' errors on multi-turn conversations

API changes:
- flash_attn: bool field REMOVED from structs
- flash_attn_type: int enum ADDED (AUTO=-1, DISABLED=0, ENABLED=1)
- High-level API maintains backward compatibility via wrapper
- Server default changed: flash_attn=False -> flash_attn=None (AUTO mode)

New features:
- 20+ new functions (memory API, state management, samplers, vocab queries)
- 5 new enums (flash_attn_type, params_fit_status, model_meta_key, etc.)
- 6 new struct fields across llama_model_params, llama_context_params, mtmd_context_params

Deprecated removals:
- 11 llama_kv_self_* functions (replaced by llama_memory_*)
- llama_sampler_init_softmax
- verbosity field from mtmd_context_params
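A minimal sketch of the bool -> enum flash-attention migration described above, assuming the enum constants are exposed under the upstream llama.cpp names (the exact Python-level spellings in this branch are an assumption, as is the backward-compatible bool on the high-level wrapper):

```python
# Sketch only: illustrates the flash_attn bool -> flash_attn_type enum migration.
# Constant names follow upstream llama.cpp (LLAMA_FLASH_ATTN_TYPE_*); their exact
# Python spellings here are assumptions, not confirmed by this PR.
import llama_cpp

# Old style (<= 0.3.x): boolean toggle; the high-level wrapper is said to keep
# accepting it for backward compatibility.
llm = llama_cpp.Llama(model_path="model.gguf", flash_attn=True)

# New style (0.4.0): low-level context params carry an enum instead of a bool,
# with AUTO (-1) letting the backend decide, DISABLED (0), and ENABLED (1).
ctx_params = llama_cpp.llama_context_default_params()
ctx_params.flash_attn_type = llama_cpp.LLAMA_FLASH_ATTN_TYPE_AUTO  # assumed constant name
```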
dhdaines force-pushed the granite-docling branch from 8790ce6 to 9cd71ba
avion23 pushed a commit to avion23/llama-cpp-python that referenced this pull request on Jan 14, 2026:

After external code review (GPT-5.2), fixed 4 critical issues:
1. CRITICAL: Fixed tokens[:-1] bug in prefix matching
   - Was silently breaking prefix matching for ALL models
   - Caused false rewind detection and cache inefficiency
   - Impact: Transformers AND recurrent models
2. CRITICAL: Implement proper reset() for recurrent models
   - Now actually clears llama_memory backend state
   - Root cause fix for 'sequence positions not consecutive' crash
   - Without this, reset was a no-op for recurrent models
3. CRITICAL: Enforce strict append policy for recurrent models
   - Prevents KV cache rewinding that's impossible without state snapshots
   - Forces full reset on history edits instead of crashing
4. Performance: Cache _is_recurrent to avoid repeated FFI calls
5. Documentation: Simplified comments and updated docstring
6. Testing: All existing tests pass + Mistral-Small-3.2-24B validated

Resolves multi-turn crashes for Nemotron-A3B, Mamba, RWKV, Jamba models.

Reviewed-by: GPT-5.2 (OpenAI)
Tested-by: pytest + Mistral-Small-3.2-24B
Fixes: abetlen#2108 (recurrent model crashes)
Compatible-with: abetlen#2109 (Granite-Docling/SmolVLM special tokens)
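A minimal sketch of the prefix-matching pitfall in point 1 above: comparing the cached tokens against a prompt with its last token sliced off can undercount the shared prefix, so an unchanged prompt is misread as an edited history and triggers an unnecessary rewind or reset. Function and variable names are illustrative, not the library's actual code.

```python
# Illustrative sketch of the tokens[:-1] prefix-matching bug (names are hypothetical).
def common_prefix_len(cached: list[int], incoming: list[int]) -> int:
    """Count how many leading token ids the cached and incoming prompts share."""
    n = 0
    for a, b in zip(cached, incoming):
        if a != b:
            break
        n += 1
    return n

cached = [1, 15, 27, 42, 99]          # tokens already represented in the KV cache
incoming = [1, 15, 27, 42, 99, 104]   # same history plus one new token

# Correct comparison: the whole cached sequence is a prefix of the incoming
# prompt, so only the final token (104) needs to be evaluated.
assert common_prefix_len(cached, incoming) == 5

# Buggy comparison: if the incoming prompt equals the cached one (e.g. a retried
# request), slicing off its last token makes the match look partial, which reads
# as a history edit and forces a spurious rewind (or full reset on recurrent models).
assert common_prefix_len(cached, cached[:-1]) == 4
```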
PR description (dhdaines):
Applies on top of #2108, which has the necessary changes to MTMD.
This adds chat formats to support https://huggingface.co/ggml-org/granite-docling-258M-GGUF and its ancestor, https://huggingface.co/ggml-org/SmolVLM-256M-Instruct-GGUF (and various other SmolVLM models).
To use Granite-Docling effectively for table structure, equation, and layout extraction, special tokens must be enabled in the chat completion output, so this also adds a special flag to all of the chat completion functions, matching what --special does in llama-cli (it is enabled by default in llama-mtmd-cli).
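A hypothetical usage sketch based on the description above: the special parameter name comes from this PR, while the model filename, prompt text, and the omission of the image-handling setup from #2108 are illustrative assumptions.

```python
# Sketch only: special=True is the flag this PR describes; the model file name and
# prompt are placeholders, and image input (via the MTMD changes in #2108) is
# omitted for brevity.
from llama_cpp import Llama

llm = Llama(
    model_path="granite-docling-258M-Q8_0.gguf",  # assumed local GGUF filename
    n_ctx=8192,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Convert this document page to markup."}],
    special=True,  # keep special tokens (table/equation/layout markers) in the output
)
print(out["choices"][0]["message"]["content"])
```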