WIP: Add encryption and authentication support #77

Draft
maximmaxim345 wants to merge 6 commits into main from encryption

@maximmaxim345
Member

Rough draft of what encryption support could look like.
Related: Sendspin/backlog#32
Still very much WIP, with a couple of limitations and potential security vulnerabilities.

Comment thread README.md
Comment thread README.md
```
PSK = HKDF-SHA256(dh_shared_secret, salt="sendspin-pairing", info="sendspin-psk")
```

This PSK bootstraps a Noise NNpsk0 handshake. Inside the resulting encrypted session, both sides exchange long-term static public keys for future reconnections. See [encryption messages](#encryption-messages) for the full message sequence.
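The HKDF step above can be sketched with only the Python standard library. This is a minimal sketch, not the PR's implementation: it assumes the X25519 shared secret has already been computed elsewhere (X25519 is not in the stdlib) and that the PSK is 32 bytes, the size Noise requires for PSKs. `hkdf_sha256` and `derive_pairing_psk` are hypothetical names.

```python
import hashlib
import hmac

HASH_LEN = 32  # SHA-256 output size in bytes


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF per RFC 5869, instantiated with HMAC-SHA256."""
    if length > 255 * HASH_LEN:
        raise ValueError("HKDF output length too large")
    # Extract step: PRK = HMAC-SHA256(key=salt, msg=IKM)
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    # Expand step: T(i) = HMAC-SHA256(PRK, T(i-1) || info || i), with T(0) empty
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_pairing_psk(dh_shared_secret: bytes) -> bytes:
    """Pairing PSK using the salt and info strings from the draft."""
    return hkdf_sha256(dh_shared_secret, salt=b"sendspin-pairing", info=b"sendspin-psk")
```

In a real client, a maintained HKDF implementation (such as `HKDF` in the `cryptography` package) would be preferable to a hand-rolled one.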
Member Author


This version of the spec does not protect against MITM attacks.
We need to support mandatory PIN confirmation or something similar, at least for commercial devices. It could probably stay optional for DIY clients.

Member Author


PIN is difficult, though: very few speakers have screens.

We could also just accept the risk, since a MITM attack has to happen during setup. After that, the connection is protected.
And for devices that implement `pairing_method: 'button'`, the LRU eviction issue can only be exploited with physical access.

Comment thread README.md Outdated
Comment thread README.md

The receiver decrypts and checks the first byte: `0x00` means strip it and parse the rest as JSON; any other value is handled as a binary message using the existing [binary message ID structure](#binary-message-id-structure).

Nonce management is handled automatically by the Noise transport state.
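The first-byte framing rule can be sketched as follows. This is a minimal sketch assuming UTF-8 JSON; `frame_json` and `parse_plaintext` are hypothetical helpers, encryption and decryption by the Noise transport state are assumed to happen outside them, and the binary path is left to the existing binary message handling.

```python
import json
from typing import Any, Tuple

JSON_MARKER = 0x00  # first plaintext byte that marks a JSON message (per the draft)


def frame_json(message: dict) -> bytes:
    """Build the plaintext for a JSON message: 0x00 marker + UTF-8 JSON."""
    return bytes([JSON_MARKER]) + json.dumps(message, separators=(",", ":")).encode("utf-8")


def parse_plaintext(plaintext: bytes) -> Tuple[str, Any]:
    """Classify a decrypted payload by its first byte."""
    if not plaintext:
        raise ValueError("empty plaintext")
    if plaintext[0] == JSON_MARKER:
        # Strip the marker and parse the remainder as JSON.
        return "json", json.loads(plaintext[1:].decode("utf-8"))
    # Any other first byte: the whole frame (ID byte included) goes to the
    # existing binary message handling, which is not reproduced here.
    return "binary", plaintext
```

Note that the binary branch does not strip the first byte, since that byte is part of the binary message ID structure.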
Member Author


Noise has a 65535-byte transport message limit. Large artwork images may exceed this.
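For a sense of the headroom: the Noise spec caps every message at 65535 bytes, and both proposed suites append a 16-byte authentication tag, so one transport message carries at most 65519 bytes of plaintext, including the leading type byte. A minimal check, with hypothetical constant and function names:

```python
NOISE_MAX_MESSAGE_LEN = 65535  # Noise spec: every message is at most 65535 bytes
AEAD_TAG_LEN = 16              # ChaChaPoly and AESGCM both append a 16-byte tag
MAX_PLAINTEXT_LEN = NOISE_MAX_MESSAGE_LEN - AEAD_TAG_LEN  # 65519


def fits_in_one_transport_message(plaintext_len: int) -> bool:
    """True if a plaintext (type byte included) fits in a single Noise message."""
    return plaintext_len <= MAX_PLAINTEXT_LEN
```

Any artwork payload larger than about 64 KiB therefore needs some form of fragmentation.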

Contributor


Reserve a bit of the role byte: a 0 (the existing behavior) marks a finished message, and a 1 marks an unfinished message that continues in follow-up messages until one is marked finished.
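The continuation-bit idea can be sketched like this. It assumes, hypothetically, that the high bit (`0x80`) of the first plaintext byte is the reserved bit and that every fragment repeats the role byte; neither detail is settled in the thread. Fragments are sized to fit the 65519-byte plaintext budget of one Noise transport message (65535 minus the 16-byte tag).

```python
from typing import Iterable, List

MAX_PLAINTEXT_LEN = 65535 - 16  # Noise message limit minus the 16-byte AEAD tag
CONTINUATION_FLAG = 0x80        # hypothetical choice of reserved bit


def fragment(message: bytes, max_len: int = MAX_PLAINTEXT_LEN) -> List[bytes]:
    """Split one plaintext (role byte + body) into Noise-sized fragments.

    Every fragment repeats the role byte; all but the last carry CONTINUATION_FLAG.
    """
    if not message:
        raise ValueError("empty message")
    role, body = message[0], message[1:]
    if role & CONTINUATION_FLAG:
        raise ValueError("role byte already uses the reserved bit")
    chunk = max_len - 1  # leave room for the role byte in each fragment
    pieces = [body[i:i + chunk] for i in range(0, len(body), chunk)] or [b""]
    out = []
    for idx, piece in enumerate(pieces):
        last = idx == len(pieces) - 1
        header = role if last else role | CONTINUATION_FLAG
        out.append(bytes([header]) + piece)
    return out


def reassemble(fragments: Iterable[bytes]) -> bytes:
    """Concatenate fragment bodies until one arrives with the flag cleared."""
    body = bytearray()
    for frag in fragments:
        body += frag[1:]
        if not frag[0] & CONTINUATION_FLAG:
            return bytes([frag[0]]) + bytes(body)
    raise ValueError("stream ended before a finished fragment")
```

One catch with the high bit: the JSON marker `0x00` becomes `0x80` on continued fragments, so the reserved bit must not already carry meaning in the binary message ID structure.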

Comment thread README.md
Comment thread README.md
Mitigates MITM and rogue server attacks. Physical button press gates pairing and LRU eviction. Unpaired servers cannot displace active connections or trigger pairing from discovery. Identity changes between sessions surface a distinct warning before re-pairing is attempted.
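The button-gated LRU behavior for a `button` client could look roughly like the following. This is a sketch under assumptions: `PairedServerStore`, the string `server_id`, and the capacity are hypothetical, and it covers only the stored pairings, not the rule that unpaired servers cannot displace active connections.

```python
from collections import OrderedDict
from typing import Optional


class PairedServerStore:
    """Hypothetical client-side store of paired servers' static keys.

    Entries are kept in least-recently-used order. Because pairing requires a
    button press, evicting the oldest entry can only happen with physical access.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._keys: "OrderedDict[str, bytes]" = OrderedDict()

    def touch(self, server_id: str) -> Optional[bytes]:
        """Return a paired server's static key and mark it most recently used."""
        key = self._keys.get(server_id)
        if key is not None:
            self._keys.move_to_end(server_id)
        return key

    def pair(self, server_id: str, static_key: bytes, button_pressed: bool) -> bool:
        if not button_pressed:
            return False  # no press: no new pairing and therefore no eviction
        if server_id not in self._keys and len(self._keys) >= self.capacity:
            self._keys.popitem(last=False)  # evict the least recently used server
        self._keys[server_id] = static_key
        self._keys.move_to_end(server_id)
        return True
```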
Comment thread README.md

Clients advertise their pairing capability via the `pairing_method` field in [`client/hello`](#client--server-clienthello):

- `button` — physical pairing button on the device. Pairing requires both server UI approval and a button press. **Recommended.**
Member Author


What is that button? Play/pause? Long-pressing play/pause? A dedicated sync/pairing button?

Not clear.


Remember that not all speakers have buttons, though (so it would be nice to be able to force pairing without a button).

I guess an alternative to a physical pairing button could be what many Zigbee device manufacturers do when a device has no buttons: have the user power the device on and off a certain number of times within a set time window. The downside is that this is not a user-friendly experience, and pairing some Zigbee devices that use this method can require multiple attempts before it works (which is also why many manufacturers recommend always factory-resetting such devices before pairing them).

Member Author


Yep, this is a tricky problem to solve though.

Power cycling wouldn't really work here, since the WebSocket connection has to stay open during this "accept pairing" action. (And I hate switching the power on and off every time I need to pair a lightbulb, haha.)

As an alternative, there is still the `none` pairing option. But that is a compromise between security and convenience: for example, anybody with access to the same network can blast music at full volume.

Another idea I'm thinking about is a password pairing option. Here, any server that wants to pair with the client needs to know a password, for example a factory-provided one that's printed on a sticker.
Alternatively, a device without a button or a factory password could be "upgraded" to this more secure form by setting one.

I think most users wouldn't set a password, but with the option you could set up any Sendspin client (even one without a button) to be secure enough for public Wi-Fi (for example in a cafe). At least when looking only at the protocol itself as an attack surface.

I'm pretty sure this password approach is used by at least a couple of AirPlay speakers.

Comment thread README.md
- `encryption?`: object - omitted if client does not support [encryption](#encryption)
- `suite`: string - preferred cipher suite: `'25519_ChaChaPoly_SHA256'` or `'25519_AESGCM_SHA256'`
- `pairing_method`: string - `'button'` or `'none'`. See [pairing methods](#pairing-methods)
- `server_static_key?`: string - Base64 encoded static public key of this server from a previous pairing. Enables [reconnection](#pairing-and-reconnection) via Noise KK handshake
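For illustration, a client could build the optional `encryption` object like this. It is a sketch that validates only the values listed above; `encryption_field` is a hypothetical helper, and where the object sits inside the rest of `client/hello` is not shown.

```python
import base64
from typing import Optional

SUITES = {"25519_ChaChaPoly_SHA256", "25519_AESGCM_SHA256"}
PAIRING_METHODS = {"button", "none"}


def encryption_field(suite: str, pairing_method: str,
                     server_static_key: Optional[bytes] = None) -> dict:
    """Build the optional `encryption` object for client/hello (draft fields)."""
    if suite not in SUITES:
        raise ValueError(f"unknown suite: {suite}")
    if pairing_method not in PAIRING_METHODS:
        raise ValueError(f"unknown pairing_method: {pairing_method}")
    field = {"suite": suite, "pairing_method": pairing_method}
    if server_static_key is not None:
        # Raw X25519 public key, Base64 encoded as the draft specifies
        field["server_static_key"] = base64.b64encode(server_static_key).decode("ascii")
    return field
```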
Member Author

@maximmaxim345 (Apr 21, 2026)


The client doesn't know yet which server is connecting, so how could it send the correct static key?

But thinking about it, we don't even need this field. The server and client can just attempt the handshake; if it fails, we treat the pairing as invalid.
And if the client suddenly acts as if it isn't paired, the server should also show a warning.
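The handshake-first approach suggested here could be sketched on the server side as follows. `try_kk_handshake` is a hypothetical callable standing in for a real Noise KK attempt with the stored keys, and the three outcomes are an assumption about how a server might classify the result.

```python
from enum import Enum
from typing import Callable, Optional


class ReconnectOutcome(Enum):
    RECONNECTED = "reconnected"            # KK handshake succeeded with stored keys
    NEEDS_PAIRING = "needs_pairing"        # never paired: normal pairing flow
    IDENTITY_CHANGED = "identity_changed"  # was paired but KK failed: warn the user


def reconnect(stored_client_key: Optional[bytes],
              try_kk_handshake: Callable[[bytes], bool]) -> ReconnectOutcome:
    """Attempt KK directly instead of exchanging key hints in client/hello."""
    if stored_client_key is None:
        return ReconnectOutcome.NEEDS_PAIRING
    if try_kk_handshake(stored_client_key):
        return ReconnectOutcome.RECONNECTED
    # The client no longer accepts the stored keys (reset, replaced, or
    # impersonated): surface a warning instead of silently re-pairing.
    return ReconnectOutcome.IDENTITY_CHANGED
```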

Comment thread README.md
Comment on lines +121 to +122
- `button` — physical pairing button on the device. Pairing requires both server UI approval and a button press. **Recommended.**
- `none` — no physical pairing mechanism. Pairing requires only server UI approval. **Discouraged** — vulnerable to rogue server pairing with no physical presence check.
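The approval rule for the two pairing methods reduces to a small predicate. This sketch assumes a single decision point that sees both the server UI approval and the button state; `pairing_approved` is a hypothetical name.

```python
def pairing_approved(pairing_method: str, ui_approved: bool, button_pressed: bool) -> bool:
    """Approval rule for the two pairing methods in the draft."""
    if pairing_method == "button":
        return ui_approved and button_pressed  # server UI approval AND a button press
    if pairing_method == "none":
        return ui_approved                     # server UI approval only
    raise ValueError(f"unknown pairing_method: {pairing_method}")
```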
Member Author


An alternative to always requiring the button press:
Clients allow all servers to connect, but some roles require a more secure pairing process.

In that case playing to a speaker is always as frictionless as possible (and still protects the server from zero-click attacks).
But more sensitive roles are still secured behind additional steps (for example the future source role).

For example, a client implementing the player and source role will only be able to use the player role until the user does the full pairing process.
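A rough sketch of this role-gated alternative: servers that have not completed the full pairing get only the non-sensitive roles. The role names are simplified and the `SENSITIVE_ROLES` set is hypothetical; the comment only names the future source role as an example.

```python
from typing import Set

# Hypothetical split: which roles count as "sensitive" is still an open question.
SENSITIVE_ROLES = {"source"}


def granted_roles(requested: Set[str], fully_paired: bool) -> Set[str]:
    """Roles a client exposes to a server under the role-gated alternative."""
    if fully_paired:
        return set(requested)
    return {role for role in requested if role not in SENSITIVE_ROLES}
```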


3 participants