
WIP: Update outlines (and outlines prompts) and transformers versions#256

Draft
dmjoy wants to merge 19 commits into main from dev/update-outlines-and-transformers

Conversation

Contributor

@dmjoy dmjoy commented Nov 6, 2025

@eveenhuis @jadie1 @barry-ravichandran FYSA

This is a WIP, but wanted to draw attention to some of the changes that need to be made for the latest outlines version. Specifically:

  • The outlines.prompt decorator has been removed; the guidance is now to use the outlines.Template.from_string function.
    • I copied this decorator in from the older outlines version for near-term "compatibility" rather than require changes to all of the old templates (switching to outlines.Template.from_string would also mean changing how these templates are invoked wherever they're used, and that's scattered across several algorithms; not worth it). Future templates should strongly consider the new approach, though.
  • The notion of a sampler has been removed from outlines; inference control parameters are now passed straight through to whatever backend is loading / serving the model (so for deterministic sampling with transformers as the model provider, we would set do_sample=False in the generation_kwargs)
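For reference, a dependency-free sketch of what the copied-in compatibility decorator does conceptually (illustrative only: the real outlines.prompt rendered the docstring as a Jinja2 template through outlines' machinery, and scenario_prompt below is a made-up example, not one of our templates):

```python
import inspect

def prompt(fn):
    """Minimal stand-in for the removed outlines.prompt decorator:
    treat the function's docstring as a template and render it with
    the call arguments (plain str.format here instead of Jinja2)."""
    template = inspect.cleandoc(fn.__doc__ or "")
    sig = inspect.signature(fn)

    def render(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        return template.format(**bound.arguments)

    return render

@prompt
def scenario_prompt(scenario):
    """Given the scenario: {scenario}
    Choose the best action."""
```

The new-style equivalent builds the template explicitly with outlines.Template.from_string(template_text) and calls the resulting object where needed, which is why switching over touches every call site.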

@dmjoy dmjoy force-pushed the dev/update-outlines-and-transformers branch from bf92614 to 542d191 on November 6, 2025 21:21
@dmjoy
Contributor Author

dmjoy commented Nov 7, 2025

Alright, so functionally this all seems to work, but when running the integration tests there are several differences (mostly that makes sense, since enough pieces have changed here that I wouldn't expect the exact same output). However, there seems to be some non-determinism even with do_sample=False, and for the comparative regression ADMs I'm getting negative regression values in some cases, which I don't think we've seen before. Digging into the negative regression values, it seems like maybe this was never enforced with JSON schemas?
dottxt-ai/outlines#215
dottxt-ai/outlines-core#147

I need to spend some more time with this to understand what's really going on before feeling confident in merging. Anecdotally it does seem a bit faster now, but if it's not deterministic and doesn't enforce the entire schema, what's the point?

@barry-ravichandran
Contributor

@dmjoy Tagging this package update PR here in case it's ok with you to merge it into your PR to fix some issues with installing via poetry: #250

@dmjoy dmjoy force-pushed the dev/update-outlines-and-transformers branch from 6ffb3cf to e76e5a5 on January 29, 2026 19:59
@alright-code

The problem causing the non-determinism and invalid regression values was improper handling of vocabulary creation on the outlines side: dottxt-ai/outlines#1831. This was merged into outlines main recently, but we are blocked from using it directly due to vllm dependencies. I've implemented a monkey patch fix we can use for the time being.

There was a second bug in outlines here: dottxt-ai/outlines#1817. Same as above, we are blocked from the official fix because of vllm.
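For readers unfamiliar with the approach, the general shape of a monkey patch is just rebinding a module attribute to a corrected function before anything uses it. A self-contained illustration (the module and function names below are placeholders, not outlines internals; the real fix tracks dottxt-ai/outlines#1831):

```python
import types

# Fake module standing in for the library to be patched, so this
# sketch runs on its own; in practice you would import the real module.
buggy_mod = types.ModuleType("buggy_mod")

def _broken_build_vocabulary(tokens):
    # Stand-in for the buggy upstream behavior: loses first-seen order.
    return sorted(set(tokens))

buggy_mod.build_vocabulary = _broken_build_vocabulary

def _fixed_build_vocabulary(tokens):
    # Corrected behavior: de-duplicate while preserving token order.
    return list(dict.fromkeys(tokens))

# The monkey patch: rebind the attribute before any caller grabs it.
buggy_mod.build_vocabulary = _fixed_build_vocabulary
```

The main caveat is ordering: the patch has to run before any code captures a reference to the original function (e.g. via `from module import name`).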

Contributor Author

@dmjoy dmjoy left a comment


@alright-code I think these changes on the whole are fine; I just left a few comments / change requests. I believe the integration test differences are within reason, but before merging I will want to check out this branch and run one or two tests with the latest data and configs to make sure the scores we're getting back are still good.

from outlines_core import Vocabulary


# monkey patch to fix https://github.com/dottxt-ai/outlines/pull/1831
Contributor Author


If we know which version of outlines should have this fix (probably one version higher than the latest, assuming they haven't cut a release with the fix merged yet), we should note it here to make it easier on our future selves.
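One way to make that self-documenting is to guard the patch on the installed version. A sketch (the "1.2.0" cutoff below is a placeholder assumption, to be replaced with whichever outlines release actually ships the fix):

```python
def needs_vocab_patch(installed: str, fixed_in: str = "1.2.0") -> bool:
    """Return True if the installed outlines version still needs the
    vocabulary monkey patch, i.e. it predates the upstream fix.

    Naive X.Y.Z comparison; for real use, packaging.version is safer.
    """
    def parse(v):
        return tuple(int(part) for part in v.split("."))
    return parse(installed) < parse(fixed_in)
```

Called with e.g. importlib.metadata.version("outlines"), this lets the patch disable itself automatically once the fixed release is installed.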

encoded_dialog = tokenizer.apply_chat_template(dialog)

return tokenizer.decode(encoded_dialog)
# class SpectrumTunedInferenceEngine(OutlinesTransformersInferenceEngine):
Contributor Author


This inference engine is actually needed / used for our recent evaluation experiments, so we should leave it in (not commented out), though maybe it needs to be updated for the outlines changes? (FWIW, I'm not a fan of how it's implemented here, since it essentially just changes how prompts are computed from the dialog, which to me doesn't really justify an entirely new inference engine.)

bert-score = "^0.3.13"
rich = "^13.6.0"
rouge-score = "^0.1.2"
swagger-client = {git = "https://github.com/NextCenturyCorporation/itm-evaluation-client.git", rev = "0.5.2"}
Contributor Author


I think it's important to keep pinning the swagger-client version to 0.5.2; I'm not sure whether that's still pinned with the uv specification now.

'kdma_values': [{'kdma': 'Moral judgement', 'value': 0.2}]}
[bold]*CHANGED SCENE TO*: P1[/bold]
PyTorch version 2.3.1+cu118 available.
HTTP Request: HEAD https://huggingface.co/roberta-large/resolve/main/config.json "HTTP/1.1 200 OK"
Contributor Author


There's already some mechanism(s) for filtering out lines during the integration test diff. We probably want to exclude these "HTTP Request" lines too, since they can be non-deterministic.
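As a sketch of the kind of filter that could be added (the pattern list and function name are assumptions, not the project's existing mechanism):

```python
import re

# Lines matching any of these patterns are dropped before diffing;
# both examples come from the integration test output quoted above.
IGNORED_LINE_PATTERNS = [
    re.compile(r"^HTTP Request: "),
    re.compile(r"^PyTorch version "),
]

def filter_nondeterministic_lines(lines):
    """Drop known non-deterministic lines from integration test output."""
    return [
        line for line in lines
        if not any(p.match(line) for p in IGNORED_LINE_PATTERNS)
    ]
```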

"Moral judgement": 0.55
}
}
},
Contributor Author


Now I see your point about these timing lines and whether there should be an option to disable them. If there's no easy way to filter them out of the diffs (with the integration test script), then yes, it's probably a good idea to add an option in the pipeline_adm.py code for reporting these timing stats (on by default, but we can disable it for the integration test runs).

Contributor Author


This file looks like it's been hit by some auto-linting (and it's a similar story with one or two of the other files). This particular file was provided by a sub, and I would rather not change it at all, in case they deliver some updates to us, so that the diff stays much cleaner. We could either copy the version of this file from main and overwrite this version, or revert the particular commit that made these changes.

In general, I would prefer not to auto-lint existing files, since it tends to blow up the diffs with superficial changes.
