From aec96397e77100d4dc2136ab56ad97fbf5cd313a Mon Sep 17 00:00:00 2001 From: Maxim Raznatovski Date: Thu, 26 Feb 2026 11:07:26 +0100 Subject: [PATCH 1/4] docs(spec): clarify player timing and buffer parameters --- README.md | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/README.md b/README.md index 4e2b1ca..d1aa837 100644 --- a/README.md +++ b/README.md @@ -176,6 +176,8 @@ Binary audio messages contain timestamps in the server's time domain indicating - When a client cannot maintain sync (e.g., buffer underrun), it should send `state: 'error'` via [`client/state`](#client--server-clientstate), mute its audio output, and continue buffering until it can resume synchronized playback, at which point it should send `state: 'synchronized'` - The server is unaware of individual client synchronization accuracy - it simply broadcasts timestamped audio - The server sends audio to late-joining clients with future timestamps only, allowing them to buffer and start playback in sync with existing clients +- After sending [`stream/start`](#server--client-streamstart) or [`stream/clear`](#server--client-streamclear) messages, servers should schedule the first audio timestamp far enough in the future so clients can receive and queue initial chunks without missing playback start (see [`required_lead_time_ms`](#client--server-clienthello-playerv1-support-object)) +- For live streams, servers may need to delay playback to build and maintain players' [`min_buffer_ms`](#client--server-clienthello-playerv1-support-object) targets - Audio chunks may arrive with timestamps in the past due to network delays or buffering; clients should drop these late chunks to maintain sync - Clients subtract their [`static_delay_ms`](#client--server-clientstate-player-object) from server timestamps before scheduling playback - Servers factor in each client's `static_delay_ms` when calculating how far ahead to send audio, keeping effective buffer headroom constant @@ -451,10 +453,23 @@ 
The `player@v1_support` object in [`client/hello`](#client--server-clienthello) - `sample_rate`: integer - sample rate in Hz (e.g., 44100) - `bit_depth`: integer - bit depth for this format (e.g., 16, 24) - `buffer_capacity`: integer - max size in bytes of compressed audio messages in the buffer that are yet to be played + - `required_lead_time_ms?`: integer - minimum startup lead time in milliseconds (e.g., codec init, decode warmup, audio backend buffering, DAC latency). Measure this from server transmit time of the start/restart trigger message ([`stream/start`](#server--client-streamstart) or [`stream/clear`](#server--client-streamclear)) to the timestamp of the first subsequent audio chunk. + - `min_buffer_ms?`: integer - requested minimum ongoing buffer duration in milliseconds during playback (primarily for live streams), used to absorb network jitter and continuous-playback pipeline delays. - `supported_commands`: string[] - subset of: 'volume', 'mute' **Note:** Servers must support all audio codecs: 'opus', 'flac', and 'pcm'. +**Note:** `required_lead_time_ms` and `min_buffer_ms` should factor in expected network delay/jitter. On LAN/Wi-Fi they can be small; for remote or high-latency clients they should be increased. Do not include `static_delay_ms` in these values; the server applies `static_delay_ms` separately when calculating send-ahead. + +**Server behavior:** +- For startup/restart timing, compute per-player send-ahead using `required_lead_time_ms + static_delay_ms`. +- For grouped startup/restart, use a common send-ahead of `max(required_lead_time_ms + static_delay_ms)` across grouped players, plus optional `sync_guard_ms`. +- For ongoing playback timing, compute per-player send-ahead using `min_buffer_ms + static_delay_ms`. +- For live streams or other real-time content with grouped playback, use a common ongoing send-ahead of `max(min_buffer_ms + static_delay_ms)` across grouped players, plus optional `sync_guard_ms`. 
+- For live streams, keep each player's minimum buffer duration at or above `min_buffer_ms` when possible, capped by the maximum buffer size advertised in `buffer_capacity`. +- For buffered streams, prefer filling each player's queue near `buffer_capacity` to maximize stability. +- `buffer_capacity` is a hard per-player byte limit; servers should not send data that would cause a player's queued compressed audio to exceed this limit. + **PCM Encoding Convention:** For the `pcm` codec, samples are encoded as little-endian signed integers (two's complement). 24-bit samples are packed as 3 bytes per sample. ### Client → Server: `client/state` player object From 4bf39cfb3abc5c675e9f29e4df7fbfaa7d4315bd Mon Sep 17 00:00:00 2001 From: Maxim Raznatovski Date: Fri, 27 Feb 2026 08:34:44 +0100 Subject: [PATCH 2/4] Make `required_lead_time_ms` and `min_buffer_ms` mandatory --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index d1aa837..95a053d 100644 --- a/README.md +++ b/README.md @@ -453,8 +453,8 @@ The `player@v1_support` object in [`client/hello`](#client--server-clienthello) - `sample_rate`: integer - sample rate in Hz (e.g., 44100) - `bit_depth`: integer - bit depth for this format (e.g., 16, 24) - `buffer_capacity`: integer - max size in bytes of compressed audio messages in the buffer that are yet to be played - - `required_lead_time_ms?`: integer - minimum startup lead time in milliseconds (e.g., codec init, decode warmup, audio backend buffering, DAC latency). Measure this from server transmit time of the start/restart trigger message ([`stream/start`](#server--client-streamstart) or [`stream/clear`](#server--client-streamclear)) to the timestamp of the first subsequent audio chunk. - - `min_buffer_ms?`: integer - requested minimum ongoing buffer duration in milliseconds during playback (primarily for live streams), used to absorb network jitter and continuous-playback pipeline delays. 
+ - `required_lead_time_ms`: integer - minimum startup lead time in milliseconds (e.g., codec init, decode warmup, audio backend buffering, DAC latency). Measure this from server transmit time of the start/restart trigger message ([`stream/start`](#server--client-streamstart) or [`stream/clear`](#server--client-streamclear)) to the timestamp of the first subsequent audio chunk. + - `min_buffer_ms`: integer - requested minimum ongoing buffer duration in milliseconds during playback (primarily for live streams), used to absorb network jitter and continuous-playback pipeline delays. - `supported_commands`: string[] - subset of: 'volume', 'mute' **Note:** Servers must support all audio codecs: 'opus', 'flac', and 'pcm'. From f5d7e32280a079b73960c0f055f02017b29fff0c Mon Sep 17 00:00:00 2001 From: Maxim Raznatovski Date: Mon, 16 Mar 2026 17:02:30 +0100 Subject: [PATCH 3/4] docs: strengthen live stream `min_buffer_ms` requirement --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 95a053d..30851f9 100644 --- a/README.md +++ b/README.md @@ -465,8 +465,8 @@ The `player@v1_support` object in [`client/hello`](#client--server-clienthello) - For startup/restart timing, compute per-player send-ahead using `required_lead_time_ms + static_delay_ms`. - For grouped startup/restart, use a common send-ahead of `max(required_lead_time_ms + static_delay_ms)` across grouped players, plus optional `sync_guard_ms`. - For ongoing playback timing, compute per-player send-ahead using `min_buffer_ms + static_delay_ms`. -- For live streams or other real-time content with grouped playback, use a common ongoing send-ahead of `max(min_buffer_ms + static_delay_ms)` across grouped players, plus optional `sync_guard_ms`. -- For live streams, keep each player's minimum buffer duration at or above `min_buffer_ms` when possible, capped by the maximum buffer size advertised in `buffer_capacity`. 
+- For live streams or other real-time content with grouped playback, use a common ongoing send-ahead of `max(min_buffer_ms + static_delay_ms)` across grouped players, plus optional `sync_guard_ms`. Recompute when players join, leave, or update their timing parameters. +- Especially for live streams, servers must keep each player's ongoing buffer duration at or above its `min_buffer_ms`, capped by the maximum buffer size advertised in `buffer_capacity`. - For buffered streams, prefer filling each player's queue near `buffer_capacity` to maximize stability. - `buffer_capacity` is a hard per-player byte limit; servers should not send data that would cause a player's queued compressed audio to exceed this limit. From 4802a2a1266ba4023c8cf655ccba530ed937e810 Mon Sep 17 00:00:00 2001 From: Maxim Raznatovski Date: Fri, 8 May 2026 10:50:06 +0200 Subject: [PATCH 4/4] docs: move timing capabilities to `client/state` and tighten wording --- README.md | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index 30851f9..cd4ffdf 100644 --- a/README.md +++ b/README.md @@ -176,8 +176,8 @@ Binary audio messages contain timestamps in the server's time domain indicating - When a client cannot maintain sync (e.g., buffer underrun), it should send `state: 'error'` via [`client/state`](#client--server-clientstate), mute its audio output, and continue buffering until it can resume synchronized playback, at which point it should send `state: 'synchronized'` - The server is unaware of individual client synchronization accuracy - it simply broadcasts timestamped audio - The server sends audio to late-joining clients with future timestamps only, allowing them to buffer and start playback in sync with existing clients -- After sending [`stream/start`](#server--client-streamstart) or [`stream/clear`](#server--client-streamclear) messages, servers should schedule the first audio timestamp far enough in the future so clients can receive 
and queue initial chunks without missing playback start (see [`required_lead_time_ms`](#client--server-clienthello-playerv1-support-object)) -- For live streams, servers may need to delay playback to build and maintain players' [`min_buffer_ms`](#client--server-clienthello-playerv1-support-object) targets +- After sending [`stream/start`](#server--client-streamstart) or [`stream/clear`](#server--client-streamclear) messages, servers should schedule the first audio timestamp far enough in the future so clients can receive and queue initial chunks without missing playback start (see [`required_lead_time_ms`](#client--server-clientstate-player-object)) +- For live streams, servers may need to delay playback to build and maintain players' [`min_buffer_ms`](#client--server-clientstate-player-object) targets - Audio chunks may arrive with timestamps in the past due to network delays or buffering; clients should drop these late chunks to maintain sync - Clients subtract their [`static_delay_ms`](#client--server-clientstate-player-object) from server timestamps before scheduling playback - Servers factor in each client's `static_delay_ms` when calculating how far ahead to send audio, keeping effective buffer headroom constant @@ -453,22 +453,22 @@ The `player@v1_support` object in [`client/hello`](#client--server-clienthello) - `sample_rate`: integer - sample rate in Hz (e.g., 44100) - `bit_depth`: integer - bit depth for this format (e.g., 16, 24) - `buffer_capacity`: integer - max size in bytes of compressed audio messages in the buffer that are yet to be played - - `required_lead_time_ms`: integer - minimum startup lead time in milliseconds (e.g., codec init, decode warmup, audio backend buffering, DAC latency). Measure this from server transmit time of the start/restart trigger message ([`stream/start`](#server--client-streamstart) or [`stream/clear`](#server--client-streamclear)) to the timestamp of the first subsequent audio chunk. 
- - `min_buffer_ms`: integer - requested minimum ongoing buffer duration in milliseconds during playback (primarily for live streams), used to absorb network jitter and continuous-playback pipeline delays. - `supported_commands`: string[] - subset of: 'volume', 'mute' **Note:** Servers must support all audio codecs: 'opus', 'flac', and 'pcm'. -**Note:** `required_lead_time_ms` and `min_buffer_ms` should factor in expected network delay/jitter. On LAN/Wi-Fi they can be small; for remote or high-latency clients they should be increased. Do not include `static_delay_ms` in these values; the server applies `static_delay_ms` separately when calculating send-ahead. +**Note:** [`required_lead_time_ms`](#client--server-clientstate-player-object) and [`min_buffer_ms`](#client--server-clientstate-player-object) are reported via [`client/state`](#client--server-clientstate-player-object). Players should report the lowest values that reliably prevent buffer underruns and start-of-stream truncation under expected conditions, to ensure the lowest possible latency for real-time applications. Both should factor in expected network delay/jitter (small on LAN/Wi-Fi, larger for remote or high-latency clients). Do not include `static_delay_ms` in these values; the server applies `static_delay_ms` separately when calculating send-ahead. **Server behavior:** - For startup/restart timing, compute per-player send-ahead using `required_lead_time_ms + static_delay_ms`. -- For grouped startup/restart, use a common send-ahead of `max(required_lead_time_ms + static_delay_ms)` across grouped players, plus optional `sync_guard_ms`. +- For grouped startup/restart, use a common send-ahead of `max(required_lead_time_ms + static_delay_ms)` across grouped players. - For ongoing playback timing, compute per-player send-ahead using `min_buffer_ms + static_delay_ms`. 
-- For live streams or other real-time content with grouped playback, use a common ongoing send-ahead of `max(min_buffer_ms + static_delay_ms)` across grouped players, plus optional `sync_guard_ms`. Recompute when players join, leave, or update their timing parameters. -- Especially for live streams, servers must keep each player's ongoing buffer duration at or above its `min_buffer_ms`, capped by the maximum buffer size advertised in `buffer_capacity`. +- For live streams or other real-time content with grouped playback, use a common ongoing send-ahead of `max(min_buffer_ms + static_delay_ms)` across grouped players. Recompute when players join, leave, or update their timing parameters. +- When the max `min_buffer_ms` decreases mid-stream (player leaves group, or updates timing), the server may keep the current send-ahead unchanged or reduce it toward the new max. The choice depends on the implementation and the priorities of the server. +- Especially for live streams, servers must keep each player's ongoing buffer duration at or above its `min_buffer_ms`, capped by the maximum buffer size advertised in `buffer_capacity`. If `min_buffer_ms` worth of audio exceeds `buffer_capacity`, `buffer_capacity` takes precedence; players must size `buffer_capacity` to fit their own `min_buffer_ms`. - For buffered streams, prefer filling each player's queue near `buffer_capacity` to maximize stability. - `buffer_capacity` is a hard per-player byte limit; servers should not send data that would cause a player's queued compressed audio to exceed this limit. +- Servers may rate-limit, debounce, or coalesce a player's timing updates to prevent disruption from frequent or small changes. **PCM Encoding Convention:** For the `pcm` codec, samples are encoded as little-endian signed integers (two's complement). 24-bit samples are packed as 3 bytes per sample. 
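Reviewer note (not part of the patch): the grouped send-ahead rules in the **Server behavior** list of patch 4/4 can be sketched as below. This is a minimal illustration, not a reference implementation. The `PlayerTiming` class and both function names are hypothetical; only the three `*_ms` field names come from the spec text. Returning 0 for an empty group is an assumption made here to keep the example total.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PlayerTiming:
    """Hypothetical container for the values a player reports via client/state (ms)."""
    static_delay_ms: int
    required_lead_time_ms: int
    min_buffer_ms: int


def startup_send_ahead_ms(players: Iterable[PlayerTiming]) -> int:
    # Grouped startup/restart: max(required_lead_time_ms + static_delay_ms).
    # default=0 for an empty group is an assumption, not something the spec states.
    return max((p.required_lead_time_ms + p.static_delay_ms for p in players), default=0)


def ongoing_send_ahead_ms(players: Iterable[PlayerTiming]) -> int:
    # Grouped ongoing playback: max(min_buffer_ms + static_delay_ms).
    return max((p.min_buffer_ms + p.static_delay_ms for p in players), default=0)


group = [
    PlayerTiming(static_delay_ms=0, required_lead_time_ms=150, min_buffer_ms=300),
    PlayerTiming(static_delay_ms=200, required_lead_time_ms=80, min_buffer_ms=500),
]
startup = startup_send_ahead_ms(group)   # max(150 + 0, 80 + 200)
ongoing = ongoing_send_ahead_ms(group)   # max(300 + 0, 500 + 200)
```

Per the patch text, a server would recompute both values when players join, leave, or update their timing parameters; whether it lowers the ongoing send-ahead when the maximum drops is left to the implementation.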
@@ -484,10 +484,14 @@ State updates must be sent whenever any state changes, including when the volume - `volume?`: integer - range 0-100, must be included if 'volume' is in `supported_commands` from [`player@v1_support`](#client--server-clienthello-playerv1-support-object) - `muted?`: boolean - mute state, must be included if 'mute' is in `supported_commands` from [`player@v1_support`](#client--server-clienthello-playerv1-support-object) - `static_delay_ms`: integer - static delay in milliseconds (0-5000), always required for players + - `required_lead_time_ms`: integer - minimum startup lead time in milliseconds (e.g., codec init, decode warmup, audio backend buffering, DAC latency), always required for players. Measured from server transmit time of the start/restart trigger ([`stream/start`](#server--client-streamstart) or [`stream/clear`](#server--client-streamclear)) to the timestamp of the first subsequent audio chunk. + - `min_buffer_ms`: integer - requested minimum ongoing buffer duration in milliseconds during playback (primarily for live streams), used to absorb network jitter and ongoing decode/playback timing variance. Always required for players. - `supported_commands?`: string[] - subset of: 'set_static_delay' **Static delay:** The default is 0, meaning audio exits the device's audio port at the timestamp. `static_delay_ms` compensates for additional delay beyond the port (external speakers, amplifiers). Negative values are not supported and should never be required for any compliant implementation. Clients must persist `static_delay_ms` locally across reboots and server reconnections. Clients may update `static_delay_ms` and `supported_commands` when audio output changes (e.g., external speaker connected), persisting separate delays per output. +**Timing parameters:** Clients may update `required_lead_time_ms` and `min_buffer_ms` at any time (e.g., after empirically measuring lead time post-warmup, or on link-type change). 
Servers must factor in updated values for subsequent playback timing. Clients should debounce updates locally, reporting changes only after a shift in conditions appears sustained, not on transient fluctuations. + ### Client → Server: `stream/request-format` player object The `player` object in [`stream/request-format`](#client--server-streamrequest-format) has this structure: