From 0664dfbcca9d06ec0e2d0aab25fd145106c9edc3 Mon Sep 17 00:00:00 2001 From: albertodiazdurana <52709586+albertodiazdurana@users.noreply.github.com> Date: Wed, 6 May 2026 18:15:34 +0200 Subject: [PATCH 1/3] docs(ollama): add streaming-with-tools example to OllamaChatGenerator reference Closes deepset-ai/haystack-core-integrations#3263 (follow-up). The component reference page already covers Tool Support and Streaming in separate sections, but no example shows them combined. Adds a Streaming with Tools section between the two, with an executable example verified empirically against llama3.1:8b on Ollama. Notable behavior captured in the doc: when the model invokes a tool, streamed chunks carry tool_calls deltas and chunk.content is empty; the final ChatMessage has text=None and tool_calls populated. --- .../generators/ollamachatgenerator.mdx | 48 +++++++++++++++++++ 1 file changed, 48 insertions(+) diff --git a/docs-website/docs/pipeline-components/generators/ollamachatgenerator.mdx b/docs-website/docs/pipeline-components/generators/ollamachatgenerator.mdx index 7f6b3fad5f..2084037c0d 100644 --- a/docs-website/docs/pipeline-components/generators/ollamachatgenerator.mdx +++ b/docs-website/docs/pipeline-components/generators/ollamachatgenerator.mdx @@ -88,6 +88,54 @@ See our [Streaming Support](guides-to-generators/choosing-the-right-generator.md Give preference to `print_streaming_chunk` by default. Write a custom callback only if you need a specific transport (for example, SSE/WebSocket) or custom UI formatting. +### Streaming with Tools + +You can combine streaming with tool calling. Pass both `tools` and `streaming_callback`; when the model decides to invoke a tool, the streamed chunks carry tool-call deltas instead of text tokens, and the final reconstructed `ChatMessage` exposes the resolved `tool_calls` list on `replies[0]`. 
+ +```python +from haystack.dataclasses import ChatMessage +from haystack.dataclasses.streaming_chunk import StreamingChunk +from haystack.tools import create_tool_from_function +from haystack_integrations.components.generators.ollama import OllamaChatGenerator + + +def get_weather(city: str) -> str: + """Get current weather for a city.""" + return f"Sunny, 22°C in {city}" + + +def callback(chunk: StreamingChunk) -> None: + if chunk.tool_calls: + print(f"[tool delta] {chunk.tool_calls}") + elif chunk.content: + print(chunk.content, end="", flush=True) + + +generator = OllamaChatGenerator( + model="llama3.1:8b", + generation_kwargs={"temperature": 0.0}, + tools=[create_tool_from_function(get_weather)], + streaming_callback=callback, +) + +response = generator.run( + messages=[ChatMessage.from_user( + "What's the weather in Berlin? Use the get_weather tool." + )] +) + +# Final reconstructed message: tool_calls populated, text is None +assistant_message = response["replies"][0] +print(assistant_message.tool_calls) +# -> [ToolCall(tool_name='get_weather', arguments={'city': 'Berlin'}, ...)] +``` + +:::tip[What to expect when tools fire] +When the model emits a tool call rather than free-form text, streamed chunks carry `tool_calls` deltas and `chunk.content` is empty. The final `replies[0].text` will be `None`, and `replies[0].tool_calls` holds the reconstructed call list. Plain text streaming and tool calling are mutually exclusive within a single generation step. +::: + +You can use the built-in `print_streaming_chunk` callback (which handles both text tokens and tool events) instead of writing your own. + ## Usage 1. You need a running instance of Ollama. The installation instructions are [in the Ollama GitHub repository](https://github.com/jmorganca/ollama). 
From e22be72b4a027e451cbd5aaa5bc20b0b088bc1c3 Mon Sep 17 00:00:00 2001 From: albertodiazdurana <52709586+albertodiazdurana@users.noreply.github.com> Date: Wed, 6 May 2026 23:46:29 +0200 Subject: [PATCH 2/3] docs(ollama): add release note for streaming-with-tools doc Per CONTRIBUTING.md, every PR requires a release note under releasenotes/notes/. Categorized as `enhancements` to match the shape of prior docs-only release notes (e.g., docs-cleaner-markdown-ocr-examples-...yaml). --- ...-with-tools-ollamachatgenerator-docs-8e339d62f38ebd06.yaml | 4 ++++ 1 file changed, 4 insertions(+) create mode 100644 releasenotes/notes/streaming-with-tools-ollamachatgenerator-docs-8e339d62f38ebd06.yaml diff --git a/releasenotes/notes/streaming-with-tools-ollamachatgenerator-docs-8e339d62f38ebd06.yaml b/releasenotes/notes/streaming-with-tools-ollamachatgenerator-docs-8e339d62f38ebd06.yaml new file mode 100644 index 0000000000..cf1599ceef --- /dev/null +++ b/releasenotes/notes/streaming-with-tools-ollamachatgenerator-docs-8e339d62f38ebd06.yaml @@ -0,0 +1,4 @@ +--- +enhancements: + - | + Document the combined use of streaming and tool calling in ``OllamaChatGenerator``: pass both ``streaming_callback`` and ``tools`` to receive ``tool_calls`` deltas through the streaming callback, and read the reconstructed call list from ``replies[0].tool_calls`` once the run returns. The component reference page now includes an executable example and notes that, within a single generation step, streamed text tokens and tool-call deltas are mutually exclusive (when the model invokes a tool, ``chunk.content`` is empty and ``replies[0].text`` is ``None``). 
From de747a4bc5b865b64d1d42b28c0673b1728e9ef4 Mon Sep 17 00:00:00 2001 From: albertodiazdurana <52709586+albertodiazdurana@users.noreply.github.com> Date: Thu, 7 May 2026 14:50:57 +0200 Subject: [PATCH 3/3] docs(ollama): address review feedback on PR #11268 - Remove releasenotes/notes/streaming-with-tools-ollamachatgenerator-docs-8e339d62f38ebd06.yaml: docs-only change does not need a release note. - Remove the :::tip[What to expect when tools fire] admonition from docs-website/docs/pipeline-components/generators/ollamachatgenerator.mdx: the inline comments in the streaming-with-tools example already convey the same information. - Add the Streaming with Tools section to docs-website/versioned_docs/version-2.28/pipeline-components/generators/ollamachatgenerator.mdx (latest stable), byte-identical to the v3 docs section. --- .../generators/ollamachatgenerator.mdx | 4 -- .../generators/ollamachatgenerator.mdx | 44 +++++++++++++++++++ ...machatgenerator-docs-8e339d62f38ebd06.yaml | 4 -- 3 files changed, 44 insertions(+), 8 deletions(-) delete mode 100644 releasenotes/notes/streaming-with-tools-ollamachatgenerator-docs-8e339d62f38ebd06.yaml diff --git a/docs-website/docs/pipeline-components/generators/ollamachatgenerator.mdx b/docs-website/docs/pipeline-components/generators/ollamachatgenerator.mdx index 2084037c0d..b141d5889e 100644 --- a/docs-website/docs/pipeline-components/generators/ollamachatgenerator.mdx +++ b/docs-website/docs/pipeline-components/generators/ollamachatgenerator.mdx @@ -130,10 +130,6 @@ print(assistant_message.tool_calls) # -> [ToolCall(tool_name='get_weather', arguments={'city': 'Berlin'}, ...)] ``` -:::tip[What to expect when tools fire] -When the model emits a tool call rather than free-form text, streamed chunks carry `tool_calls` deltas and `chunk.content` is empty. The final `replies[0].text` will be `None`, and `replies[0].tool_calls` holds the reconstructed call list. 
Plain text streaming and tool calling are mutually exclusive within a single generation step. -::: - You can use the built-in `print_streaming_chunk` callback (which handles both text tokens and tool events) instead of writing your own. ## Usage diff --git a/docs-website/versioned_docs/version-2.28/pipeline-components/generators/ollamachatgenerator.mdx b/docs-website/versioned_docs/version-2.28/pipeline-components/generators/ollamachatgenerator.mdx index 816ed66ad1..d16e1964cf 100644 --- a/docs-website/versioned_docs/version-2.28/pipeline-components/generators/ollamachatgenerator.mdx +++ b/docs-website/versioned_docs/version-2.28/pipeline-components/generators/ollamachatgenerator.mdx @@ -87,6 +87,50 @@ See our [Streaming Support](guides-to-generators/choosing-the-right-generator.md Give preference to `print_streaming_chunk` by default. Write a custom callback only if you need a specific transport (for example, SSE/WebSocket) or custom UI formatting. +### Streaming with Tools + +You can combine streaming with tool calling. Pass both `tools` and `streaming_callback`; when the model decides to invoke a tool, the streamed chunks carry tool-call deltas instead of text tokens, and the final reconstructed `ChatMessage` exposes the resolved `tool_calls` list on `replies[0]`. 
+ +```python +from haystack.dataclasses import ChatMessage +from haystack.dataclasses.streaming_chunk import StreamingChunk +from haystack.tools import create_tool_from_function +from haystack_integrations.components.generators.ollama import OllamaChatGenerator + + +def get_weather(city: str) -> str: + """Get current weather for a city.""" + return f"Sunny, 22°C in {city}" + + +def callback(chunk: StreamingChunk) -> None: + if chunk.tool_calls: + print(f"[tool delta] {chunk.tool_calls}") + elif chunk.content: + print(chunk.content, end="", flush=True) + + +generator = OllamaChatGenerator( + model="llama3.1:8b", + generation_kwargs={"temperature": 0.0}, + tools=[create_tool_from_function(get_weather)], + streaming_callback=callback, +) + +response = generator.run( + messages=[ChatMessage.from_user( + "What's the weather in Berlin? Use the get_weather tool." + )] +) + +# Final reconstructed message: tool_calls populated, text is None +assistant_message = response["replies"][0] +print(assistant_message.tool_calls) +# -> [ToolCall(tool_name='get_weather', arguments={'city': 'Berlin'}, ...)] +``` + +You can use the built-in `print_streaming_chunk` callback (which handles both text tokens and tool events) instead of writing your own. + ## Usage 1. You need a running instance of Ollama. The installation instructions are [in the Ollama GitHub repository](https://github.com/jmorganca/ollama). 
diff --git a/releasenotes/notes/streaming-with-tools-ollamachatgenerator-docs-8e339d62f38ebd06.yaml b/releasenotes/notes/streaming-with-tools-ollamachatgenerator-docs-8e339d62f38ebd06.yaml deleted file mode 100644 index cf1599ceef..0000000000 --- a/releasenotes/notes/streaming-with-tools-ollamachatgenerator-docs-8e339d62f38ebd06.yaml +++ /dev/null @@ -1,4 +0,0 @@ ---- -enhancements: - - | - Document the combined use of streaming and tool calling in ``OllamaChatGenerator``: pass both ``streaming_callback`` and ``tools`` to receive ``tool_calls`` deltas through the streaming callback, and read the reconstructed call list from ``replies[0].tool_calls`` once the run returns. The component reference page now includes an executable example and notes that, within a single generation step, streamed text tokens and tool-call deltas are mutually exclusive (when the model invokes a tool, ``chunk.content`` is empty and ``replies[0].text`` is ``None``).
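
Illustrative follow-up (not part of the patches above): the reconstruction step the docs describe — per-chunk `tool_calls` deltas merged into the final call list on `replies[0]` — can be sketched in plain Python. The delta dict shape below (`index`, `name`, `arguments` fragments) is a hypothetical stand-in for demonstration, not the exact structure of Haystack's `StreamingChunk.tool_calls` entries.

```python
import json


def accumulate_tool_call_deltas(deltas):
    """Merge streamed per-chunk deltas into finished {name, arguments} tool calls.

    Assumes each delta carries an ``index`` identifying the call it belongs to,
    an optional ``name``, and a fragment of the JSON ``arguments`` string.
    """
    calls = {}  # index -> {"name": str, "arguments": str (accumulating JSON)}
    for delta in deltas:
        slot = calls.setdefault(delta["index"], {"name": "", "arguments": ""})
        if delta.get("name"):
            slot["name"] = delta["name"]
        slot["arguments"] += delta.get("arguments", "")
    # Only once streaming has finished is each accumulated argument string
    # valid JSON, so parsing happens here rather than per chunk.
    return [
        {"name": call["name"], "arguments": json.loads(call["arguments"])}
        for _, call in sorted(calls.items())
    ]


# Two chunks split one call's arguments mid-token, as streaming APIs may do.
deltas = [
    {"index": 0, "name": "get_weather", "arguments": '{"ci'},
    {"index": 0, "arguments": 'ty": "Berlin"}'},
]
print(accumulate_tool_call_deltas(deltas))
# -> [{'name': 'get_weather', 'arguments': {'city': 'Berlin'}}]
```

This mirrors why the documented callback sees only deltas while `replies[0].tool_calls` holds complete `ToolCall` objects: argument JSON is unparseable until the last fragment arrives.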