Stream chat responses from Gemini to the browser with SSE #10

@devangb3

The chat UI currently waits for the full Gemini response before displaying anything; for a multi-paragraph answer that means 3–8 seconds of spinner. Streaming is the norm in production chat apps, and because tokens only flow from server to browser, SSE over plain HTTP is a much smaller change than migrating to WebSockets.

Current state:
/chat blocks until Gemini returns, then returns the full answer.

Proposed implementation:

  1. Switch the Gemini call to the streamGenerateContent endpoint.
  2. Flask route returns Response(generator, mimetype='text/event-stream'), emitting data: {token}\n\n chunks. Tokens that contain newlines need to be encoded (for example as JSON) so they do not break SSE framing.
  3. Frontend reads the stream with fetch + ReadableStream rather than EventSource: EventSource is GET-only and the query is in the POST body, so it is easier to POST and parse the chunks manually.
  4. Source-attribution footer is sent as a final SSE event after the tokens.
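
A minimal sketch of the event framing from steps 2 and 4, using only the standard library. The names sse_event and stream_answer, the JSON payload shape, and the "sources" event name are assumptions for illustration, not existing code in this repo. Payloads are JSON-encoded so that a token containing a newline cannot end the event early.

```python
import json
from typing import Iterable, Iterator, Optional


def sse_event(payload: dict, event: Optional[str] = None) -> str:
    """Serialize one SSE message (hypothetical helper).

    json.dumps escapes newlines inside the payload, so the blank line
    that terminates the event only ever appears at the very end.
    """
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(payload)}")
    return "\n".join(lines) + "\n\n"


def stream_answer(
    tokens: Iterable[str], sources: list, chunks_found: int
) -> Iterator[str]:
    """Yield one SSE message per token, then a final 'sources' event."""
    for token in tokens:
        yield sse_event({"token": token})
    # Final event carries the attribution footer (step 4).
    yield sse_event(
        {"sources": sources, "chunks_found": chunks_found}, event="sources"
    )
```

In the Flask route this generator would presumably be passed straight to Response(stream_answer(...), mimetype='text/event-stream'), with tokens coming from the streaming Gemini call. On the frontend, the parser can split the buffered text on blank lines and JSON-parse each data: line.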

Files likely affected:

  • app.py (chat route)
  • backend/rag_utils.py (LLMIntegration.generate_answer becomes a generator)
  • frontend/src/components/ChatInterface.js
  • frontend/src/utils/api.js

Acceptance criteria:

  • First token visible in the browser within ~500 ms of sending the question.
  • Sources and chunks_found still render correctly after the stream completes.
  • Aborting the request (user navigates away) closes the upstream Gemini connection.
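
One possible way to meet the abort criterion, sketched under assumptions: WSGI (PEP 3333) requires the server to call close() on the response iterable when the request ends, and closing a generator raises GeneratorExit at its paused yield, so a try/finally in the generator is a natural place to release the upstream stream. FakeUpstream and relay are hypothetical names; whether the real Gemini client object exposes close() has not been checked here, and the server may only notice the disconnect on its next write.

```python
from typing import Iterable, Iterator


class FakeUpstream:
    """Hypothetical stand-in for the streaming Gemini response object."""

    def __init__(self, tokens: Iterable[str]) -> None:
        self._tokens = iter(tokens)
        self.closed = False

    def __iter__(self) -> Iterator[str]:
        return self

    def __next__(self) -> str:
        if self.closed:
            raise StopIteration
        return next(self._tokens)

    def close(self) -> None:
        # A real client would release the HTTP connection here.
        self.closed = True


def relay(upstream: FakeUpstream) -> Iterator[str]:
    """Forward upstream tokens as SSE data lines.

    The finally block runs on normal completion and also when the
    generator is closed early (client disconnect), so the upstream
    connection is released in both cases.
    """
    try:
        for token in upstream:
            yield f"data: {token}\n\n"
    finally:
        upstream.close()
```

Simulating a disconnect is then just a matter of calling close() on the generator after reading the first chunk, which makes this behaviour straightforward to cover with a unit test.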
