WIP: Update outlines (and outlines prompts) and transformers versions #256
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -0,0 +1,35 @@` |
| | 1 | `from outlines.backends.outlines_core import OutlinesCoreBackend` |
| | 2 | `from outlines.models.transformers import TransformerTokenizer` |
| | 3 | `from outlines_core import Vocabulary` |
| | 4 | |
| | 5 | |
| | 6 | `# monkey patch to fix https://github.com/dottxt-ai/outlines/pull/1831` |
| | 7 | `# fix was applied to outlines main, but we will probably be blocked from updating due to vllm dependency` |
| | 8 | `# assuming this will be in official release >1.1.12` |
| | 9 | `@staticmethod` |
| | 10 | `def deterministic_create_vocab(vocab, eos_token_id, eos_token, token_to_str):` |
| | 11 | `    formatted_vocab = {}` |
| | 12 | `    for token, token_id in vocab.items():` |
| | 13 | `        token_as_str = token_to_str(token)` |
| | 14 | `        formatted_vocab.setdefault(token_as_str, []).append(token_id)` |
| | 15 | `    formatted_vocab.pop(eos_token)` |
| | 16 | `    return Vocabulary(eos_token_id, formatted_vocab)` |
| | 17 | |
| | 18 | |
| | 19 | `OutlinesCoreBackend.create_outlines_core_vocabulary = deterministic_create_vocab` |
| | 20 | |
| | 21 | |
| | 22 | `# monkey patch to fix https://github.com/dottxt-ai/outlines/pull/1817` |
| | 23 | `# newer version of outlines fixes this issue (1.2.10), but we are blocked with the vllm dependency` |
| | 24 | `def convert_token_to_string(self, token: str) -> str:` |
| | 25 | `    from transformers.file_utils import SPIECE_UNDERLINE` |
| | 26 | |
| | 27 | `    string = self.tokenizer.convert_tokens_to_string([token])` |
| | 28 | |
| | 29 | `    if token.startswith(SPIECE_UNDERLINE) or token == "<0x20>":` |
| | 30 | `        return " " + string` |
| | 31 | |
| | 32 | `    return string` |
| | 33 | |
| | 34 | |
| | 35 | `TransformerTokenizer.convert_token_to_string = convert_token_to_string` |
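To make the behaviour of the two patches easier to review, here is a minimal, dependency-free sketch of the same logic. It is not the outlines implementation: `group_token_ids` mirrors only the grouping loop (without the `Vocabulary` wrapper), and `toy_token_to_str` is a hypothetical stand-in for the tokenizer call that assumes the decoder drops the SentencePiece marker and, with it, the leading space.

```python
from typing import Callable, Dict, List

# The "▁" marker SentencePiece uses for a leading space (assumed to match
# the SPIECE_UNDERLINE constant imported in the patch).
SPIECE_UNDERLINE = "\u2581"


def group_token_ids(
    vocab: Dict[str, int],
    eos_token: str,
    token_to_str: Callable[[str], str],
) -> Dict[str, List[int]]:
    """Group token ids by decoded string, in vocab iteration order."""
    grouped: Dict[str, List[int]] = {}
    for token, token_id in vocab.items():
        grouped.setdefault(token_to_str(token), []).append(token_id)
    # Like the patch, this raises KeyError if the EOS string is absent.
    grouped.pop(eos_token)
    return grouped


def toy_token_to_str(token: str) -> str:
    """Hypothetical decoder: strips the marker, then restores the space."""
    decoded = "" if token == "<0x20>" else token.replace(SPIECE_UNDERLINE, "")
    if token.startswith(SPIECE_UNDERLINE) or token == "<0x20>":
        return " " + decoded
    return decoded


# Made-up vocabulary for illustration only.
toy_vocab = {"\u2581the": 5, "the": 9, "<0x20>": 3, "\u2581": 7, "</s>": 2}
grouped = group_token_ids(toy_vocab, "</s>", toy_token_to_str)
```

With this toy vocabulary, `"▁the"` and `"the"` stay distinct because the leading space is restored, while `"<0x20>"` and `"▁"` both decode to a single space and therefore share one entry.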
Contributor
Author
This file looks like it's been hit by some auto-linting (and it's a similar story with one or two of the other files). This particular file was provided by a sub, and I would rather not change it at all, so that the diff stays much cleaner in case they deliver updates to us. We could either copy the version of this file from main and overwrite this version, or revert the particular commit that made these changes. In general, I would prefer not to auto-lint existing files, since it tends to blow up the diffs with superficial changes.
If we know which version of outlines will include this fix (probably one version higher than the latest, assuming they haven't yet cut a release with the fix merged), we should note it here to make things easier for our future selves.
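One possible way to act on this, as a rough sketch rather than a proposal for the exact code: record the fixed-in version as a constant and only apply the monkey patch when the installed outlines is older. `parse_release` and `needs_patch` are hypothetical helper names, and the fixed-in version would still need to be confirmed against an actual outlines release.

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Parse the leading dotted-integer part of a version string.

    Suffixes such as "rc1" or ".dev0" are ignored, so pre-releases compare
    equal to the corresponding release. That is a simplification.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def needs_patch(installed: str, fixed_in: str) -> bool:
    """Return True when `installed` is strictly older than `fixed_in`."""
    a, b = parse_release(installed), parse_release(fixed_in)
    width = max(len(a), len(b))
    # Pad with zeros so that "1.2" and "1.2.0" compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


# Hypothetical usage; FIXED_IN is a placeholder, not a confirmed release:
#
#   from importlib.metadata import version
#   if needs_patch(version("outlines"), FIXED_IN):
#       OutlinesCoreBackend.create_outlines_core_vocabulary = deterministic_create_vocab
```

A guard like this would let the patch become a no-op once the dependency is bumped, instead of silently overriding the upstream fix.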