19 changes: 19 additions & 0 deletions README.md
@@ -176,6 +176,8 @@ Binary audio messages contain timestamps in the server's time domain indicating
- When a client cannot maintain sync (e.g., buffer underrun), it should send `state: 'error'` via [`client/state`](#client--server-clientstate), mute its audio output, and continue buffering until it can resume synchronized playback, at which point it should send `state: 'synchronized'`
- The server is unaware of individual client synchronization accuracy - it simply broadcasts timestamped audio
- The server sends audio to late-joining clients with future timestamps only, allowing them to buffer and start playback in sync with existing clients
- After sending [`stream/start`](#server--client-streamstart) or [`stream/clear`](#server--client-streamclear) messages, servers should schedule the first audio timestamp far enough in the future so clients can receive and queue initial chunks without missing playback start (see [`required_lead_time_ms`](#client--server-clientstate-player-object))
- For live streams, servers may need to delay playback to build and maintain players' [`min_buffer_ms`](#client--server-clientstate-player-object) targets
- Audio chunks may arrive with timestamps in the past due to network delays or buffering; clients should drop these late chunks to maintain sync
- Clients subtract their [`static_delay_ms`](#client--server-clientstate-player-object) from server timestamps before scheduling playback
- Servers factor in each client's `static_delay_ms` when calculating how far ahead to send audio, keeping effective buffer headroom constant
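As a rough illustration of the client-side rules above (subtract `static_delay_ms`, drop late chunks), here is a minimal sketch. The `Chunk` type, the `schedule_chunk` helper, and the use of microseconds for timestamps are assumptions made for the example and are not defined by this diff. It also assumes "late" is judged after the static delay has been subtracted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    # Playback timestamp in the server's time domain.
    # The unit (microseconds) is an assumption for this sketch.
    timestamp_us: int
    payload: bytes


def schedule_chunk(chunk: Chunk, now_server_us: int, static_delay_ms: int) -> Optional[int]:
    """Return the server-domain time at which output should start, or None to drop.

    now_server_us is the client's current estimate of server time.
    """
    # Start output earlier by the static delay so sound leaves the
    # external speaker/amplifier at the intended timestamp.
    play_at_us = chunk.timestamp_us - static_delay_ms * 1000
    if play_at_us < now_server_us:
        # The chunk can no longer be played on time: drop it to stay in sync.
        return None
    return play_at_us
```

A client would call this once per received chunk and queue the payload only when a playback time is returned.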
@@ -455,6 +457,19 @@ The `player@v1_support` object in [`client/hello`](#client--server-clienthello)

**Note:** Servers must support all audio codecs: 'opus', 'flac', and 'pcm'.

**Note:** [`required_lead_time_ms`](#client--server-clientstate-player-object) and [`min_buffer_ms`](#client--server-clientstate-player-object) are reported via [`client/state`](#client--server-clientstate-player-object). Players should report the lowest values that reliably prevent buffer underruns and start-of-stream truncation under expected conditions, to ensure the lowest possible latency for real-time applications. Both should factor in expected network delay/jitter (small on LAN/Wi-Fi, larger for remote or high-latency clients). Do not include `static_delay_ms` in these values; the server applies `static_delay_ms` separately when calculating send-ahead.

**Server behavior:**
- For startup/restart timing, compute per-player send-ahead using `required_lead_time_ms + static_delay_ms`.
- For grouped startup/restart, use a common send-ahead of `max(required_lead_time_ms + static_delay_ms)` across grouped players.
- For ongoing playback timing, compute per-player send-ahead using `min_buffer_ms + static_delay_ms`.
- For live streams or other real-time content with grouped playback, use a common ongoing send-ahead of `max(min_buffer_ms + static_delay_ms)` across grouped players. Recompute when players join, leave, or update their timing parameters.
- When the max `min_buffer_ms` decreases mid-stream (a player leaves the group or updates its timing), the server may keep the current send-ahead unchanged or reduce it toward the new max. This choice is implementation-defined.
- Especially for live streams, servers must keep each player's ongoing buffer duration at or above its `min_buffer_ms`, capped by the maximum buffer size advertised in `buffer_capacity`. If `min_buffer_ms` worth of audio exceeds `buffer_capacity`, `buffer_capacity` takes precedence; players must size `buffer_capacity` to fit their own `min_buffer_ms`.
- For buffered streams, prefer filling each player's queue near `buffer_capacity` to maximize stability.
- `buffer_capacity` is a hard per-player byte limit; servers should not send data that would cause a player's queued compressed audio to exceed this limit.
- Servers may rate-limit, debounce, or coalesce a player's timing updates to prevent disruption from frequent or small changes.
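The send-ahead rules above reduce to two `max` computations over the group. The sketch below shows that arithmetic only. `PlayerTiming`, `startup_send_ahead_ms`, and `ongoing_send_ahead_ms` are hypothetical names, and a single player is treated as a group of one.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PlayerTiming:
    # Values as reported by a player in client/state (all in milliseconds).
    static_delay_ms: int
    required_lead_time_ms: int
    min_buffer_ms: int


def startup_send_ahead_ms(players: Iterable[PlayerTiming]) -> int:
    """Common send-ahead after stream/start or stream/clear."""
    return max(p.required_lead_time_ms + p.static_delay_ms for p in players)


def ongoing_send_ahead_ms(players: Iterable[PlayerTiming]) -> int:
    """Common send-ahead during ongoing (e.g. live) playback."""
    return max(p.min_buffer_ms + p.static_delay_ms for p in players)


group = [
    PlayerTiming(static_delay_ms=0, required_lead_time_ms=200, min_buffer_ms=500),
    PlayerTiming(static_delay_ms=300, required_lead_time_ms=100, min_buffer_ms=150),
]
print(startup_send_ahead_ms(group))  # 400
print(ongoing_send_ahead_ms(group))  # 500
```

Both helpers raise `ValueError` on an empty group. A server following the rules above would recompute these values when players join, leave, or update their timing parameters. The `buffer_capacity` cap is not modeled here.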

**PCM Encoding Convention:** For the `pcm` codec, samples are encoded as little-endian signed integers (two's complement). 24-bit samples are packed as 3 bytes per sample.
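To make the 24-bit packing concrete, here is a small sketch that packs and unpacks samples as 3-byte little-endian two's-complement integers. The helper names `pack_pcm24` and `unpack_pcm24` are illustrative only, and channel interleaving is out of scope.

```python
from typing import Iterable, List


def pack_pcm24(samples: Iterable[int]) -> bytes:
    """Pack signed 24-bit samples as 3 bytes each, little-endian."""
    out = bytearray()
    for s in samples:
        if not -(1 << 23) <= s < (1 << 23):
            raise ValueError(f"sample {s} does not fit in 24 bits")
        out += s.to_bytes(3, "little", signed=True)
    return bytes(out)


def unpack_pcm24(data: bytes) -> List[int]:
    """Inverse of pack_pcm24."""
    if len(data) % 3 != 0:
        raise ValueError("length must be a multiple of 3 bytes")
    return [
        int.from_bytes(data[i:i + 3], "little", signed=True)
        for i in range(0, len(data), 3)
    ]


packed = pack_pcm24([1, -1, -8388608])
print(packed.hex())  # 010000ffffff000080
```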

### Client → Server: `client/state` player object
@@ -469,10 +484,14 @@ State updates must be sent whenever any state changes, including when the volume
- `volume?`: integer - range 0-100, must be included if 'volume' is in `supported_commands` from [`player@v1_support`](#client--server-clienthello-playerv1-support-object)
- `muted?`: boolean - mute state, must be included if 'mute' is in `supported_commands` from [`player@v1_support`](#client--server-clienthello-playerv1-support-object)
- `static_delay_ms`: integer - static delay in milliseconds (0-5000), always required for players
- `required_lead_time_ms`: integer - minimum startup lead time in milliseconds (e.g., codec init, decode warmup, audio backend buffering, DAC latency), always required for players. It is measured from the server's transmit time of the start/restart trigger ([`stream/start`](#server--client-streamstart) or [`stream/clear`](#server--client-streamclear)) to the timestamp of the first subsequent audio chunk.
- `min_buffer_ms`: integer - requested minimum ongoing buffer duration in milliseconds during playback (primarily for live streams), used to absorb network jitter and ongoing decode/playback timing variance. Always required for players.
Review comment (Contributor):

I wonder if there is a way we can give guidance on how to calculate this (required_lead_time_ms gives very specific instructions). This is a harder one to do reliably I guess, but if we could describe a way to approach this then that would be good.

I believe we had previously discussed something like looking at the distribution of time message latencies and taking like the 95th percentile or something along those lines. I think we'll have to do some experiments to really see what is actually usable in practice.
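
One possible reading of the percentile idea floated in this comment, sketched in Python. This is not guidance from the spec: the nearest-rank percentile method, the helper names, and the optional `decode_margin_ms` are all assumptions, and as the comment notes, experiments would be needed to see what works in practice.

```python
import math
from typing import Sequence


def percentile_nearest_rank(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def suggest_min_buffer_ms(latencies_ms: Sequence[float],
                          pct: float = 95.0,
                          decode_margin_ms: int = 0) -> int:
    """Hypothetical min_buffer_ms estimate from observed time-message latencies."""
    return math.ceil(percentile_nearest_rank(latencies_ms, pct)) + decode_margin_ms
```

For example, with 100 latency samples of 1 ms through 100 ms, the 95th percentile is 95 ms, so `suggest_min_buffer_ms` with a 10 ms margin would give 105.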

- `supported_commands?`: string[] - subset of: 'set_static_delay'
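
A minimal sketch of assembling the player object fields listed in this hunk, with the two ranges the text states (0-100 for `volume`, 0-5000 for `static_delay_ms`). The `build_player_state` helper is hypothetical. It covers only the fields visible in this diff, and the surrounding `client/state` message envelope is not shown.

```python
from typing import Any, Dict, List, Optional


def build_player_state(static_delay_ms: int,
                       required_lead_time_ms: int,
                       min_buffer_ms: int,
                       volume: Optional[int] = None,
                       muted: Optional[bool] = None,
                       supported_commands: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build the fields of the player object shown in this hunk (other fields may exist)."""
    if not 0 <= static_delay_ms <= 5000:
        raise ValueError("static_delay_ms must be in 0-5000")
    if volume is not None and not 0 <= volume <= 100:
        raise ValueError("volume must be in 0-100")
    state: Dict[str, Any] = {
        "static_delay_ms": static_delay_ms,
        "required_lead_time_ms": required_lead_time_ms,
        "min_buffer_ms": min_buffer_ms,
    }
    # Optional fields are included only when the player supports them.
    if volume is not None:
        state["volume"] = volume
    if muted is not None:
        state["muted"] = muted
    if supported_commands is not None:
        state["supported_commands"] = supported_commands
    return state
```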

**Static delay:** The default is 0, meaning audio exits the device's audio port at the timestamp. `static_delay_ms` compensates for additional delay beyond the port (external speakers, amplifiers). Negative values are not supported and should never be required for any compliant implementation. Clients must persist `static_delay_ms` locally across reboots and server reconnections. Clients may update `static_delay_ms` and `supported_commands` when audio output changes (e.g., external speaker connected), persisting separate delays per output.

**Timing parameters:** Clients may update `required_lead_time_ms` and `min_buffer_ms` at any time (e.g., after empirically measuring lead time post-warmup, or on link-type change). Servers must factor in updated values for subsequent playback timing. Clients should debounce updates locally, reporting changes only after a shift in conditions appears sustained, not on transient fluctuations.
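
One way a client might debounce timing updates locally is to report a new value only after it has differed from the last reported value for a sustained period. The sketch below assumes this approach. The `TimingDebouncer` class and its `min_change_ms` and `sustain_s` thresholds are hypothetical choices, not values from the spec.

```python
from typing import Optional


class TimingDebouncer:
    """Report a timing parameter change only once the shift looks sustained."""

    def __init__(self, initial_ms: int, min_change_ms: int = 20, sustain_s: float = 10.0):
        self.reported_ms = initial_ms
        self.min_change_ms = min_change_ms  # ignore changes smaller than this
        self.sustain_s = sustain_s          # how long a shift must persist
        self._shift_since: Optional[float] = None

    def observe(self, measured_ms: int, now_s: float) -> Optional[int]:
        """Return a value to send in client/state, or None if nothing should be sent."""
        if abs(measured_ms - self.reported_ms) < self.min_change_ms:
            # Back near the reported value: treat the earlier shift as transient.
            self._shift_since = None
            return None
        if self._shift_since is None:
            self._shift_since = now_s
            return None
        if now_s - self._shift_since < self.sustain_s:
            return None
        self.reported_ms = measured_ms
        self._shift_since = None
        return measured_ms
```

The caller passes its own clock value as `now_s` (for example from `time.monotonic()`), which keeps the class easy to test.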

### Client → Server: `stream/request-format` player object

The `player` object in [`stream/request-format`](#client--server-streamrequest-format) has this structure: