64 changes: 60 additions & 4 deletions README.md
@@ -7,11 +7,12 @@ Sendspin is a multi-room music experience protocol. The goal of the protocol is
## Definitions

- **Sendspin Server** - orchestrates all devices, generates audio streams, manages players and clients, provides metadata
- **Sendspin Client** - a client that can play audio, visualize audio, display metadata, display colors, or provide music controls. Has different possible roles (player, metadata, controller, artwork, visualizer, color). Every client has a unique identifier
- **Sendspin Client** - a client that can play audio, visualize audio, display metadata, display colors, or provide music controls. Has different possible roles (player, metadata, controller, artwork, lyrics, visualizer, color). Every client has a unique identifier
- **Player** - receives audio and plays it in sync. Has its own volume and mute state and preferred format settings
- **Controller** - controls the Sendspin group this client is part of
- **Metadata** - displays text metadata (title, artist, album, etc.)
- **Artwork** - displays artwork images. Has preferred format for images
- **Lyrics** - displays track lyrics or karaoke subtitles. Has preferred format for lyrics data
- **Visualizer** - visualizes music. Has preferred format for audio features
- **Color** - receives colors derived from the current audio
- **Sendspin Group** - a group of clients. Each client belongs to exactly one group, and every group has at least one client. Every group has a unique identifier. Each group has the following states: list of member clients, volume, mute, and playback state
@@ -21,7 +22,7 @@ Sendspin is a multi-room music experience protocol. The goal of the protocol is

Roles define what capabilities and responsibilities a client has. All roles use explicit versioning with the `@` character: `<role>@<version>` (e.g., `player@v1`, `controller@v1`).

This specification defines the following roles: [`player`](#player-messages), [`controller`](#controller-messages), [`metadata`](#metadata-messages), [`artwork`](#artwork-messages), [`visualizer`](#visualizer-messages), [`color`](#color-messages). All servers must implement all versions of these roles described in this specification.
This specification defines the following roles: [`player`](#player-messages), [`controller`](#controller-messages), [`metadata`](#metadata-messages), [`artwork`](#artwork-messages), [`lyrics`](#lyrics-messages), [`visualizer`](#visualizer-messages), [`color`](#color-messages). All servers must implement all versions of these roles described in this specification.

All role names and versions not starting with `_` are reserved for future revisions of this specification.
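
As an illustration of the `<role>@<version>` convention, here is a minimal Python sketch that splits a role identifier and flags application-specific names. The `RoleId` type and `parse_role` helper are hypothetical names introduced for this example; they are not part of the specification.

```python
from typing import NamedTuple


class RoleId(NamedTuple):
    """A parsed '<role>@<version>' identifier (hypothetical helper type)."""

    name: str
    version: str

    @property
    def is_application_specific(self) -> bool:
        # Names and versions that do not start with "_" are reserved by the
        # specification, so a leading "_" on either part marks a custom role.
        return self.name.startswith("_") or self.version.startswith("_")


def parse_role(role: str) -> RoleId:
    """Split a role identifier such as 'player@v1' into name and version."""
    name, sep, version = role.partition("@")
    if not sep or not name or not version:
        raise ValueError(f"expected '<role>@<version>', got {role!r}")
    return RoleId(name, version)
```

For example, `parse_role("lyrics@v1")` yields a reserved role, while `parse_role("player@_experimental")` is flagged as application-specific because of its version.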

@@ -147,7 +148,7 @@ Binary message IDs typically use **bits 7-2** for role type and **bits 1-0** for
- `000000xx` (0-3): Reserved for future use
- `000001xx` (4-7): Player role
- `000010xx` (8-11): Artwork role
- `000011xx` (12-15): Reserved for a future role
- `000011xx` (12-15): Lyrics role
- `00010xxx` (16-23): Visualizer role
- Roles 6-47 (IDs 24-191): Reserved for future roles
- Roles 48-63 (IDs 192-255): Available for use by [application-specific roles](#application-specific-roles)
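
The ID allocation above can be sketched as a small lookup. This is an illustrative Python sketch only; `split_message_id` and `owner_of` are hypothetical helper names, and the range table simply restates the list above.

```python
def split_message_id(message_id: int) -> tuple:
    """Split a type byte into (role index, slot) using the typical layout:
    bits 7-2 select the role, bits 1-0 select the slot within that role."""
    if not 0 <= message_id <= 255:
        raise ValueError("message ID must fit in one byte")
    return message_id >> 2, message_id & 0b11


# (first ID, last ID, owner), restating the allocation list above.
_ID_RANGES = [
    (0, 3, "reserved"),
    (4, 7, "player"),
    (8, 11, "artwork"),
    (12, 15, "lyrics"),
    (16, 23, "visualizer"),  # spans two role indices (4 and 5)
    (24, 191, "reserved"),
    (192, 255, "application-specific"),
]


def owner_of(message_id: int) -> str:
    """Return which role (or reserved block) a binary message ID belongs to."""
    for first, last, owner in _ID_RANGES:
        if first <= message_id <= last:
            return owner
    raise ValueError("message ID must fit in one byte")
```

Note that the visualizer block is wider than four IDs, which is why the sketch uses explicit ranges rather than relying on the role index alone.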
@@ -217,6 +218,9 @@ sequenceDiagram
alt Artwork role
Server->>Client: binary Types 8-11 (artwork channels 0-3)
end
alt Lyrics role
Server->>Client: binary Type 12 (lyrics data)
end
alt Visualizer role
Server->>Client: binary Type 16 (visualization data)
end
@@ -275,10 +279,12 @@ Players that can output audio should have the role `player`.
- `controller@v1` - controls the current Sendspin group
- `metadata@v1` - displays text metadata describing the currently playing audio
- `artwork@v1` - displays artwork images
- `lyrics@v1` - displays track lyrics or karaoke subtitles
- `visualizer@v1` - visualizes audio
- `color@v1` - receives colors derived from the current audio
- `player@v1_support?`: object - only if `player@v1` is listed ([see player@v1 support object details](#client--server-clienthello-playerv1-support-object))
- `artwork@v1_support?`: object - only if `artwork@v1` is listed ([see artwork@v1 support object details](#client--server-clienthello-artworkv1-support-object))
- `lyrics@v1_support?`: object - only if `lyrics@v1` is listed ([see lyrics@v1 support object details](#client--server-clienthello-lyricsv1-support-object))
- `visualizer@v1_support?`: object - only if `visualizer@v1` is listed ([see visualizer@v1 support object details](#client--server-clienthello-visualizerv1-support-object))

**Note:** Each role version may have its own support object (e.g., `player@v1_support`, `player@v2_support`). Application-specific roles or role versions follow the same pattern (e.g., `_myapp_display@v1_support`, `player@_experimental_support`).
@@ -381,6 +387,7 @@ Starts a stream for one or more roles. If sent for a role that already has an ac

- `player?`: object - only sent to clients with the `player` role ([see player object details](#server--client-streamstart-player-object))
- `artwork?`: object - only sent to clients with the `artwork` role ([see artwork object details](#server--client-streamstart-artwork-object))
- `lyrics?`: object - only sent to clients with the `lyrics` role ([see lyrics object details](#server--client-streamstart-lyrics-object))
- `visualizer?`: object - only sent to clients with the `visualizer` role ([see visualizer object details](#server--client-streamstart-visualizer-object))

[Application-specific roles](#application-specific-roles) may also include objects in this message (keys starting with `_`).
@@ -399,6 +406,7 @@ Request different stream format (upgrade or downgrade). Available for clients wi

- `player?`: object - only for clients with the `player` role ([see player object details](#client--server-streamrequest-format-player-object))
- `artwork?`: object - only for clients with the `artwork` role ([see artwork object details](#client--server-streamrequest-format-artwork-object))
- `lyrics?`: object - only for clients with the `lyrics` role ([see lyrics object details](#client--server-streamrequest-format-lyrics-object))

[Application-specific roles](#application-specific-roles) may also include objects in this message (keys starting with `_`).

@@ -410,7 +418,7 @@ Response: [`stream/start`](#server--client-streamstart) for the requested role(s

Ends the stream for one or more roles. When received, clients should stop output and clear buffers for the specified roles.

- `roles?`: string[] - roles to end streams for ('player', 'artwork', 'visualizer'). If omitted, ends all active streams
- `roles?`: string[] - roles to end streams for ('player', 'artwork', 'lyrics', 'visualizer'). If omitted, ends all active streams

[Application-specific roles](#application-specific-roles) may also be included in this array (names starting with `_`).
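
A client might resolve which streams to stop along these lines. This is a minimal Python sketch; `roles_to_end` is a hypothetical helper, and it assumes the caller has already extracted the optional `roles` array from the message (the surrounding message envelope is not shown in this section).

```python
from typing import Optional


def roles_to_end(active_roles: set, roles: Optional[list] = None) -> set:
    """Return the active streams that a stream/end message terminates.

    `roles` is the optional array from the message; None means it was omitted.
    """
    if roles is None:
        # Omitted array: every active stream ends.
        return set(active_roles)
    # Roles without an active stream are ignored in this sketch.
    return set(active_roles) & set(roles)
```

The sketch treats an empty array as "end nothing", which is an assumption; the text above only defines the omitted case.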

@@ -703,6 +711,54 @@ The timestamp indicates when this artwork should be displayed. Clients must tran

**Clearing artwork:** To clear the currently displayed artwork on a specific channel, the server sends an empty binary message (only the message type byte and timestamp, with no image data) for that channel.

## Lyrics messages
This section describes messages specific to clients with the `lyrics` role, which handle display of track lyrics and karaoke subtitles. Lyrics clients receive lyrics data in their preferred format delivered as binary messages.

**Supported formats:**
- `lrc` - LRC text format with embedded timestamps (UTF-8 encoded)
- `cdg` - CD+G karaoke format (binary)
Member

lrc files are pretty ubiquitous, but I haven't heard of CD+G before.
Why should we support CD+G? I mean if it's really niche we can't depend on Sendspin implementations supporting that.

Author

I don't have the data to back this up unfortunately, but karaoke subtitles are not niche and are well supported by both hardware and software.

@Hedda Hedda Apr 27, 2026

> lrc files are pretty ubiquitous, but I haven't heard of CD+G before.
> Why should we support CD+G? I mean if it's really niche we can't depend on Sendspin implementations supporting that.

@maximmaxim345 CD+G is not niche at all. It is an official standard for audio CDs with graphics (Audio Compact Discs carrying digital images) that all professional karaoke machines use, both in karaoke bars and at home, so almost all songs are available in that format. The "G" stands for Graphics, and as I understand it the format is just images for each character, so it does not need a font engine to display unusual characters in any language. More common nowadays is MP3+G, where the Red Book Audio-CD part has been converted to MP3 and the graphics are broken out into separate files (.cdg).

For a player code reference, I suggest checking out the Kodi (formerly XBMC) codebase, which supports CD+G and MP3+G.

Quote from that Wikipedia article:

> Along with dedicated karaoke machines, other consumer devices that play CD+G format CDs include the NEC TurboGrafx-CD (a CD-ROM peripheral for the TurboGrafx-16) and Turbo Duo, as well as the Japan-only successor the PC-FX, the Philips CD-i, the Sega CD, Sega Saturn,[5] the JVC X'Eye, the 3DO Interactive Multiplayer, the Amiga CD32 and Commodore CDTV, and the Atari Jaguar CD (an attachment for the Atari Jaguar). Some CD-ROM drives can also read this data. Pioneer's LaserActive player can also play CD+G discs, as long as either the PAC-S1/S-10 or PAC-N1/N10 game modules are installed.
>
> Since 2003, some standalone DVD players have supported the CD+G format. Regular audio CD players will output only the audio tracks as if it was a normal music CD, unless otherwise designed to read the extra data (lyrics and images).[6]
>
> CD+G karaoke albums are still made today by several UK and US manufacturers including Sunfly, Zoom Entertainments, SBI Karaoke and Vocal Star. Although the popularity of CD sales are dwindling the format is still widely used as MP3+G downloads.


**No transport timestamps:** Unlike audio or artwork, lyrics binary messages do not carry a server clock timestamp. Timing information is embedded within the lyrics data itself (e.g., `[mm:ss.xx]` tags in LRC, frame timing in CDG). The server sends lyrics at the start of each track; clients begin processing as soon as they receive them.
Member

How will the lyrics be in sync then if it's not aligned to any timestamp?
I mean:

- What if you pause or stop playback in the meantime? The lyrics should also stop in that case.
- Depending on the network's latency, the position in the lyrics can be different every time.

Author

The spec instructs clients how to calculate track progress - can't we use that progress value to show the right lyrics? This is what I'm currently doing in my client (which is downloading lyrics independently) and it's working well.
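
The progress-based approach described in this reply could look roughly like the following Python sketch. It assumes the client already computes track progress in milliseconds as the spec describes; `parse_lrc` and `line_at` are hypothetical helper names, and the parser only handles simple `[mm:ss.xx]` line tags, not word-level or offset extensions.

```python
import bisect
import re

# Matches a leading time tag such as [01:23.45]; metadata tags like
# [ar:Artist] do not match because they do not start with digits.
_TIME_TAG = re.compile(r"\[(\d+):(\d{2})(?:[.:](\d{1,3}))?\]")


def parse_lrc(text: str) -> list:
    """Return (start_ms, lyric) pairs sorted by start time."""
    entries = []
    for raw in text.splitlines():
        starts, pos = [], 0
        # A line may carry several tags, e.g. a repeated chorus.
        while (match := _TIME_TAG.match(raw, pos)) is not None:
            minutes, seconds, fraction = match.groups()
            fraction_ms = int((fraction or "0").ljust(3, "0"))
            starts.append((int(minutes) * 60 + int(seconds)) * 1000 + fraction_ms)
            pos = match.end()
        lyric = raw[pos:].strip()
        entries.extend((start, lyric) for start in starts)
    entries.sort(key=lambda entry: entry[0])
    return entries


def line_at(entries: list, progress_ms: int):
    """Return the lyric active at the given track progress, or None before the first tag."""
    starts = [start for start, _ in entries]
    index = bisect.bisect_right(starts, progress_ms) - 1
    return entries[index][1] if index >= 0 else None
```

Because the lookup depends only on the progress value, a paused track keeps returning the same line, and network latency in delivering the lyrics does not shift which line is chosen.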


### Client → Server: `client/hello` lyrics@v1 support object

The `lyrics@v1_support` object in [`client/hello`](#client--server-clienthello) has this structure:

- `lyrics@v1_support`: object
  - `supported_formats`: string[] - list of supported lyrics formats in priority order (first is preferred), e.g. `['lrc', 'cdg']`
  - `max_size_bytes`: integer - maximum size in bytes of lyrics data the client can handle per binary message. If the lyrics for the current track exceed this limit, the server must send [`stream/end`](#server--client-streamend) for the lyrics role instead of the binary data.

**Note:** Servers must support all lyrics formats: 'lrc' and 'cdg'.
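
On the server side, honoring the support object might be sketched as follows. This is an illustrative Python sketch; `choose_lyrics` is a hypothetical helper and `available` (a mapping from format name to encoded lyrics for the current track) is an assumed server-side structure. The sketch does not fall back to a less-preferred format when the preferred one is too large, because the text above does not define such a fallback.

```python
from typing import Optional


def choose_lyrics(support: dict, available: dict) -> Optional[tuple]:
    """Return (format, data) to stream, or None when stream/end should be sent.

    `support` is the client's `lyrics@v1_support` object; `available` maps
    a format name to the lyrics bytes the server has for the current track.
    """
    for fmt in support["supported_formats"]:  # client priority order
        data = available.get(fmt)
        if data is None:
            continue  # no lyrics in this format for this track
        if len(data) > support["max_size_bytes"]:
            return None  # too large for the client: send stream/end instead
        return fmt, data
    return None  # no lyrics available in any supported format
```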

### Client → Server: `stream/request-format` lyrics object

Requests that the server change the lyrics format.

After receiving this message, the server responds with [`stream/start`](#server--client-streamstart) for the lyrics role with the new format, followed by an immediate lyrics update through a binary message.

The `lyrics` object in [`stream/request-format`](#client--server-streamrequest-format) has this structure:

- `lyrics`: object
  - `format`: 'lrc' | 'cdg' - requested lyrics format

### Server → Client: `stream/start` lyrics object

The `lyrics` object in [`stream/start`](#server--client-streamstart) has this structure:

- `lyrics`: object
  - `format`: 'lrc' | 'cdg' - format of the lyrics data

### Server → Client: Lyrics (Binary)

Binary messages should be rejected if there is no active stream.

- Byte 0: message type `12` (uint8)
- Rest of bytes: lyrics data (UTF-8 encoded text for `lrc`; binary data for `cdg`)

The server sends lyrics data at the start of each track and resends after a track change. Clients begin processing the lyrics data immediately upon receipt.

When no lyrics are available for the current track (or lyrics exceed the client's `max_size_bytes`), the server sends [`stream/end`](#server--client-streamend) for the lyrics role. When lyrics become available again (e.g., on the next track), the server sends a new [`stream/start`](#server--client-streamstart) followed by the binary lyrics data.
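
Putting the lyrics stream lifecycle together, a client-side handler might be sketched as below. This is a minimal Python sketch under stated assumptions: `LyricsStream` and its method names are hypothetical, and the caller is assumed to route the `lyrics` object from `stream/start`, the `stream/end` event, and raw binary frames to the matching methods.

```python
LYRICS_MESSAGE_TYPE = 12  # byte 0 of a lyrics binary message


class LyricsStream:
    """Tracks the active lyrics stream for one client (hypothetical helper)."""

    def __init__(self) -> None:
        self.format = None  # 'lrc' or 'cdg' while a stream is active
        self.lyrics = None  # str for lrc, bytes for cdg

    def on_stream_start(self, lyrics_object: dict) -> None:
        # A new stream/start replaces the format and drops stale lyrics.
        self.format = lyrics_object["format"]
        self.lyrics = None

    def on_stream_end(self) -> None:
        # No lyrics for this track (or too large): stop output, clear buffers.
        self.format = None
        self.lyrics = None

    def on_binary(self, message: bytes) -> bool:
        """Store the lyrics payload; return False if the message is rejected."""
        if self.format is None:
            return False  # no active stream
        if not message or message[0] != LYRICS_MESSAGE_TYPE:
            return False  # not a lyrics message
        body = message[1:]
        self.lyrics = body.decode("utf-8") if self.format == "lrc" else body
        return True
```

A track change without lyrics then maps to `on_stream_end()`, and the next track with lyrics maps to `on_stream_start(...)` followed by `on_binary(...)`.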

## Visualizer messages
This section describes messages specific to clients with the `visualizer` role, which create visual representations of the audio being played. Visualizer clients receive audio analysis data like FFT information that corresponds to the current audio timeline.
