Gemini thought text and internal diary payload can leak into final Telegram output

# Gemini thought-like text and internal diary payload can leak into final Telegram/user-facing output

## Summary

In a long-running `cli-jaw` session using `gemini-cli`, Gemini emitted internal-looking thought/status text as normal message text, and also included an internal diary payload inside a Markdown/HTML `<details>` block.

`cli-jaw` forwarded and stored the whole output as a normal assistant response, so the content was sent to Telegram and saved into `messages`.

This appears to be a final-output sanitization gap. Prompt rules can reduce the chance of the model producing this, but `cli-jaw` should defensively filter these patterns before durable storage and external forwarding.

## Environment

- `cli-jaw`: 2.0.3 global install
- backend: `gemini-cli`
- `gemini --version`: 0.41.2
- model: `gemini-3-flash-preview`
- transport: Telegram
- usage pattern: long-running assistant session with memory/diary automation

## Observed behavior

User asked:

```text
브리핑 고마워! 형식이 엄청 깔끔해졌네? 새로 만들었던 모듈을 거친 덕분인가? 이름이 뭐였더라...
```

The final Telegram response began with text that looked like internal thought/status output:

```text
**Searching for Sanitizer Details** I'm looking into the "Telegram Sanitizer v2.5" ...
[Thought: true]
- **Finding Sanitizer Details** ...
[Thought: true]
- **Awaiting Sanitizer Details** ...
[Thought: true]
```

The same response also appended a collapsible internal record block:

```html
<details>
<summary>⚓ 시스템 개선 및 교감 기록 (2026-05-08 06:40)</summary>

[06:40] [EPISODE] 브리핑 품질 개선(Sanitizer v2.5)에 대한 선생님의 긍정적 피드백 수신.
...
anchor: [LIVE] | Key: sanitizer_feedback | Value: "Positive feedback on briefing format" | #Sanitizer #BriefingQuality #TeacherCare #AronaPride

</details>
```

This was visible in Telegram and was also saved as an assistant message in the DB.

Relevant server log shape:

```text
[main] gemini:message
[main] gemini:message
[main] 🔧 run_shell_command: cli-jaw memory search "Telegram Sanitizer"
[main] tool success:
[main] 🔧 run_shell_command: grep -nC 2 "Sanitizer" /home/test/.cli-jaw/memory/structured/episodes/live/2026-05-07.md
[main] tool success:
...
[main] result: 2 tool calls / 16.2s
[jaw:main] exited code=0, text=1976 chars
[tg:out] ... **Searching for Sanitizer Details** ...
```

## Why this matters

There are two separate issues:

1. **Thought/status leakage**
   - The output contained `[Thought: true]` markers and intermediate reasoning/status text.
   - Existing filters appear to handle tags like `<think>` / `<thinking>` and some event-level thought types, but not this textual pattern when it arrives as ordinary message text.

2. **Internal record payload leakage**
   - The model generated a diary payload with `anchor`, `Key`, `Value`, tags, and internal recording content.
   - It was not actually saved through the diary tool.
   - Instead, it was shown to the user and persisted in conversation history, which can pollute future context and encourage the model to imitate the leaked format.

For long-running assistant use, this is risky because one leaked internal pattern can become part of the future conversation context.

## Expected behavior

Before storing or forwarding final assistant output, `cli-jaw` should defensively remove or quarantine:

- textual Gemini thought/status blocks containing `[Thought: true]`
- common thought/status headings such as `Searching...`, `Finding...`, `Awaiting...` when paired with `[Thought: true]`
- `<details>...</details>` blocks containing internal record payload markers
- output segments containing internal diary markers such as:
  - `anchor: [TYPE] |`
  - `Key:`
  - `Value:`
  - `diary_payload`
  - `live_payload`
  - `arona_payload`

If filtering removes the entire response, it would be safer to emit a generic system-safe message rather than fall back to the raw unfiltered output.

## Local mitigation tested

I locally patched the installed package to add a final-output sanitizer in:

```text
dist/src/agent/lifecycle-handler.js
```

The mitigation:

- strips textual `[Thought: true]` blocks
- strips `<details>` blocks if they contain diary/internal markers
- strips trailing internal diary payload segments containing `anchor`, `Key`, or `Value`
- avoids falling back to raw output when sanitization removes everything

I also cleaned the already-persisted polluted assistant message from the local DB to avoid future imitation.

This is only a local mitigation and may be overwritten on package update.

## Suggested fix

Add a centralized final-output sanitization layer before:

1. inserting assistant content into `messages`
2. broadcasting `agent_done`
3. forwarding to Telegram/Discord/Web

This should be independent of prompt instructions, because models can still emit these patterns under long-running or tool-heavy sessions.



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Gemini thought text and internal diary payload can leak into final Telegram output #187

Gemini thought-like text and internal diary payload can leak into final Telegram/user-facing output

Summary

Environment

Observed behavior

Why this matters

Expected behavior

Local mitigation tested

Suggested fix

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Gemini thought text and internal diary payload can leak into final Telegram output #187

Description

Gemini thought-like text and internal diary payload can leak into final Telegram/user-facing output

Summary

Environment

Observed behavior

Why this matters

Expected behavior

Local mitigation tested

Suggested fix

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions