Stream chat responses from Gemini to the browser with SSE

The chat UI currently waits for the full Gemini response before displaying anything; for a multi-paragraph answer that means 3–8 seconds of spinner. Streaming tokens as they arrive is the norm in production chat apps, and doing it with SSE is a much smaller change than migrating to WebSockets.

**Current state:**
`/chat` blocks until Gemini returns, then returns the full answer.

**Proposed implementation:**
1. Switch Gemini call to `streamGenerateContent` endpoint.
2. Flask route returns `Response(generator, mimetype='text/event-stream')` emitting `data: {token}\n\n` chunks.
3. Frontend consumes the stream with `fetch` + `ReadableStream` in preference to `EventSource`: `EventSource` is GET-only and the query is in the POST body, so it is easier to POST and parse the chunks manually.
4. Source-attribution footer is sent as a final SSE event after the tokens.
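
A minimal sketch of steps 2 and 4, assuming each payload is JSON-encoded so a token containing a newline cannot break the `data:` framing. The names `sse_event` and `stream_answer` and the `sources` event name are hypothetical, not existing code in this repo; the generator is what would be passed to `Response(..., mimetype='text/event-stream')`.

```python
import json
from typing import Iterable, Iterator, List, Optional


def sse_event(data: dict, event: Optional[str] = None) -> str:
    """Serialize one SSE message: optional `event:` line, one `data:` line, blank line."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps escapes raw newlines, so the payload always fits on one data line.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def stream_answer(
    tokens: Iterable[str], sources: List[str], chunks_found: int
) -> Iterator[str]:
    """Yield one SSE message per token, then a final named event with the footer data."""
    for token in tokens:
        yield sse_event({"token": token})
    # Hypothetical event name; the frontend would switch on it to render the footer.
    yield sse_event({"sources": sources, "chunks_found": chunks_found}, event="sources")
```

Sending the footer as a named event lets the frontend tell it apart from token messages without inspecting the payload shape.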

**Files likely affected:**
- `app.py` (chat route)
- `backend/rag_utils.py` (`LLMIntegration.generate_answer` becomes a generator)
- `frontend/src/components/ChatInterface.js`
- `frontend/src/utils/api.js`

**Acceptance criteria:**
- First token visible in the browser within ~500 ms of sending the question.
- Sources and `chunks_found` still render correctly after the stream completes.
- Aborting the request (user navigates away) closes the upstream Gemini connection.
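
One possible way to meet the abort criterion, sketched under assumptions: WSGI servers are expected to call `close()` on the response iterable (PEP 3333), which raises `GeneratorExit` inside a generator and runs its `finally` block. How quickly a given server notices a disconnect should be verified for the deployment in use. `FakeUpstream` and `relay` are hypothetical stand-ins, not the Gemini client API.

```python
from typing import Iterable, Iterator


class FakeUpstream:
    """Hypothetical stand-in for a streaming Gemini response with a close() method."""

    def __init__(self, tokens: Iterable[str]) -> None:
        self._tokens = iter(tokens)
        self.closed = False

    def __iter__(self) -> Iterator[str]:
        return self._tokens

    def close(self) -> None:
        self.closed = True


def relay(upstream: FakeUpstream) -> Iterator[str]:
    """Forward upstream tokens as SSE messages and always release the upstream."""
    try:
        for token in upstream:
            yield f"data: {token}\n\n"
    finally:
        # Runs on normal completion and when the generator is closed early.
        upstream.close()


# Simulate a client that disconnects after the first message.
upstream = FakeUpstream(["a", "b", "c"])
stream = relay(upstream)
first = next(stream)
stream.close()
```

After `stream.close()`, `upstream.closed` is `True` even though two tokens were never read, which is the behaviour the criterion asks for.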

