
@scgbckbone
Owner

@BenWestgate BenWestgate left a comment


concept ACK

I agree we should do something to prevent this unexpected behavior, but it's due to unintuitive mathematical properties of finite fields.

The reason is that GF(32) interpolation does not preserve padding; it operates on full 5-bit values. So storing bytes and throwing away the padding means you no longer have enough information to reconstruct the share you extracted the bytes from.

You MUST pass padding to reconstruct a derived string (one produced by interpolation).

I don't know how to enforce that at the library level, any ideas? Especially since we both agree a default is also a nice feature, but it's a footgun here.
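A minimal sketch of why the padding matters, independent of this library: a 128-bit seed does not divide evenly into 5-bit symbols, so the last symbol carries 2 extra bits. `bytes_to_u5` and `u5_to_bytes` are hypothetical helper names, not part of `Codex32String`, and the sketch ignores the checksum entirely.

```python
def bytes_to_u5(data: bytes, pad_val: int) -> list[int]:
    """Split bytes into 5-bit symbols; pad_val fills the unused low bits of the last symbol."""
    nbits = len(data) * 8
    pad_len = (5 - nbits % 5) % 5
    if not 0 <= pad_val < (1 << pad_len):
        raise ValueError(f"pad_val must fit in {pad_len} bits")
    value = (int.from_bytes(data, "big") << pad_len) | pad_val
    total = nbits + pad_len
    return [(value >> shift) & 31 for shift in range(total - 5, -1, -5)]


def u5_to_bytes(symbols: list[int]) -> tuple[bytes, int]:
    """Inverse of bytes_to_u5: return the payload bytes and the pad bits that get dropped."""
    total = len(symbols) * 5
    pad_len = total % 8  # bits left over after the last whole byte
    value = 0
    for symbol in symbols:
        value = (value << 5) | symbol
    return (value >> pad_len).to_bytes(total // 8, "big"), value & ((1 << pad_len) - 1)


seed = bytes.fromhex("68f14219957131d21b615271058437e8")
# Four different symbol strings, one per possible 2-bit pad value.
encodings = [bytes_to_u5(seed, pad_val) for pad_val in range(4)]
```

All four encodings decode to the same 16 bytes, so the bytes alone cannot say which of the four symbol strings they came from.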

assert d.s == "ms13k00ldp4v5nw8lph96x47mjxzgwjexe44p32swkq99e0w"

# now round-trip d share ('d' is derived via interpolation, NOT via 'from_seed')
dd = Codex32String.from_seed(d.data, "ms13k00ld")
Owner

@BenWestgate BenWestgate Dec 8, 2025

You can't do this. You can only call .from_seed without passing pad_val for the k initial strings; derived strings MUST be passed padding to round-trip.

You would need to be able to do this:

dd = Codex32String.from_seed(d.data, "ms13k00ld", d.pad_val)

This version's Codex32String lacks a pad_val property; I'm working on an update that adds one.

No matter what padding style we use, the padding is less than a full 5-bit value, so it is not an element of GF(32). It therefore does not interpolate into derived shares, and it keeps no linear relationship that would allow round-tripping from bytes (GF(256)) to GF(32)-interpolated strings without passing the padding.
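A minimal sketch of that point, assuming the field polynomial used by bech32 (x^5 + x^3 + 1) and hypothetical evaluation points 1, 2 and 3 rather than real codex32 share indices. `gf_mul`, `gf_inv` and `interpolate_symbol` are illustrative names, not library functions. It interpolates a single 5-bit symbol and collects the cases where both inputs have their low 2 bits clear but the derived symbol does not.

```python
GF32_MODULUS = 0b101001  # x^5 + x^3 + 1 (assumed: the bech32 field polynomial)


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(32) elements (carry-less multiply with reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b100000:
            a ^= GF32_MODULUS
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse by brute force; fine for a 32-element field."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(32)")
    return next(b for b in range(1, 32) if gf_mul(a, b) == 1)


def interpolate_symbol(points: list[tuple[int, int]], x: int) -> int:
    """Lagrange-interpolate one 5-bit symbol at x from (x_i, y_i) pairs."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        weight = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                # Subtraction in GF(32) is XOR.
                weight = gf_mul(weight, gf_mul(x ^ xj, gf_inv(xi ^ xj)))
        total ^= gf_mul(yi, weight)
    return total


# Inputs are multiples of 4, i.e. symbols whose 2 "padding" bits are zero.
counterexamples = []
for y1 in range(0, 32, 4):
    for y2 in range(0, 32, 4):
        derived = interpolate_symbol([(1, y1), (2, y2)], 3)
        if derived & 0b11:  # low 2 bits of the derived symbol are not zero
            counterexamples.append((y1, y2, derived))
```

Any entry in `counterexamples` is a derived last symbol that could not have been produced by zero-padding bytes, which is why the pad bits have to travel with a derived share.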

The only string whose data you should care about after construction is "s". The fact that other share index values can return data is more of a curiosity, and maybe .data should raise InvalidShareIndex or return None if share_idx != "s" to prevent this misuse.

What is your exact use case where you really need to store ALL the shares as bytes and recover back to codex32?

Author

I'm able to do this, which fixes this test case:
dd = Codex32String.from_seed(d.data, "ms13k00ld", pad_val=1)

but I have no idea how I would get to pad_val=1 other than by grinding it against the string, which I already know here (that won't be the case in real life).

> I don't know how to enforce that at the library level, any ideas?

Not really... besides grinding the correct pad_val via round-trips right after constructing the derived share (very meh).
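For reference, the grinding described here can be sketched as below. `grind_pad_val` and its `encode_share` argument are hypothetical: `encode_share` stands in for something like a call to `Codex32String.from_seed` with an explicit pad_val. The sketch only works when the expected string is already known, which is exactly the limitation noted above.

```python
from typing import Callable, Optional


def grind_pad_val(
    data: bytes,
    expected: str,
    encode_share: Callable[[bytes, int], str],
) -> Optional[int]:
    """Return the pad_val whose encoding equals `expected`, or None if none matches."""
    # Number of bits needed to fill the last 5-bit character.
    pad_len = (5 - (len(data) * 8) % 5) % 5
    for pad_val in range(1 << pad_len):
        if encode_share(data, pad_val) == expected:
            return pad_val
    return None
```

For a 16-byte payload this tries at most 4 values, so the cost is negligible; the real problem is needing the target string in the first place.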

> What is your exact use case where you really need to store ALL the shares as bytes and recover back to codex32?

My general idea is that I can use individual shares as normal secrets: load them on a HWW, sign with them, etc. For instance, the user does the Shamir split on one HWW device, with N devices ready, and exports the generated/derived shares, e.g. as QR codes. The user loads these derived shares onto the devices and geo-distributes them. They then serve as decoys and fully functional signers. When the S secret is needed, the user just collects K devices and does some QR scanning to recover S on an empty HWW.

For this I thought I could use these from_seed/to_seed round-trips. Secure element storage is limited, so for me byte encoding is preferable to u5.

But now it seems this was never the intended purpose of the non-secret shares, which look more like recovery tools, i.e. data with one and only one purpose: to recover share S (which is kind of a pity, tbh). Am I reading this correctly?

Author

I also think that if round-trips with derived shares can be achieved somehow, even if passing padding is necessary, that would be desirable.

Owner

@BenWestgate BenWestgate Dec 8, 2025

> but I have no idea how I would get to pad_val=1 other than by grinding it against the string, which I already know here (that won't be the case in real life)

You had to grind it because you discarded the pad_val. Without the padding you don't know the full last data character, so you might recover a different one. interpolate_at operates on 5-bit values, not bytes.

> any ideas?

> not really... besides grinding correct pad_val ... (very meh)

It may be possible if you give up being able to construct "non-encoded" shares from bytes data and instead accept constructing a Codex32ShareSet object with a from_bytes (or from_seeds) factory, and then use an interpolate_at(share_idx) method of that share set object.

> What is your exact use case...?

> generated/derived shares as QR codes for instance.

Make sure to skim this compact CodexQR discussion before speccing a QR design; it's the analog of compact SeedQR. I found a fun way to fit 128-bit codex32 share data into 21x21 QR codes by dropping some of the identifier.

Whatever solution we find for Codex32ShareSet.from_bytes(header, dict) would be very helpful there, as well as here.

> These then serve as decoy, fully functional signers.

This seems useful!

> For this I thought I can use this from_seed/to_seed round-trips.

You may be able to round-trip the share set with from_seeds/to_seeds, or the .data of individual shares, but we need to define the correct Codex32ShareSet from_seeds class method to make this possible.

The source of truth in a Codex32ShareSet should be the common header and the byte payloads of "s", "a", "c" for k = 3, or maybe "a", "c", "d". CRC padding, which does not interpolate, is slightly more useful on a share you can actually find and verify it on than on an unknown share you have to interpolate to in order to check whether it validates.
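One way to picture that source of truth is the sketch below. `ShareSetRecord` is a hypothetical container, not the Codex32ShareSet API (which is not defined yet), and it assumes the first header character is the threshold digit, as in `3k00l`. It only validates the stored data; it does no encoding or interpolation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShareSetRecord:
    """Hypothetical record: common header plus byte payloads keyed by share index."""

    header: str                 # e.g. "3k00l": threshold digit, then the identifier
    payloads: dict[str, bytes]  # e.g. {"a": ..., "c": ..., "d": ...}

    def __post_init__(self) -> None:
        threshold = int(self.header[0])  # assumption: threshold is the first character
        if threshold < 2:
            raise ValueError("this sketch only covers thresholds 2 to 9")
        if len(self.payloads) != threshold:
            raise ValueError(f"expected {threshold} payloads, got {len(self.payloads)}")
        if any(len(idx) != 1 for idx in self.payloads):
            raise ValueError("share indices must be single characters")
        if len({len(p) for p in self.payloads.values()}) != 1:
            raise ValueError("all payloads must have the same length")


record = ShareSetRecord("3k00l", {"a": bytes(16), "c": bytes(16), "d": bytes(16)})
```

Storing exactly k payloads keeps the record minimal: any further share is determined by these k and the header.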

> Secure element storage is limited so for me byte encoding is more desired instead of u5.

A 21x21 QR has only 137.2 bits if using base45 alphanumeric encoding, or 138.2 bits if also using kanji, bytes and numeric modes. So it'd be excellent for us to define a compact encoding of share data: the bare minimum needed to always recover the correct secret, with whatever is left used to prevent user errors.

> But now, it seems this was never intended purpose of the non-secret shares, which seems more as just recovery tools, aka data with one and only one purpose - to recover share S (which is kind of pity tbh). Am I reading this correctly?

Yes, this is not their intended purpose, but they do contain randomness. I think your idea is a cool and efficient use of the otherwise wasted random data needed for SSS, so it's worth pursuing IF it can be done securely (not revealing any more info about "s" than, at most, its padding bits with k-1 shares).

> I also think that if round-trips with derived shares can be achieved somehow, even if passing padding is necessary, it should be desired.

I agree. The solution to recover seeds from bytes alone is non-trivial, but it should exist; let's find it. You'll find this bytes vs 130-bits question tripped up Andrew in the QR discussion. It's always surprising how padding behaves as the finite field changes.

Owner

@scgbckbone do you still want me to come up with a way to recover the same seed from any share's (not just initial strings) bytes without passing padding? I think it's technically possible but I haven't thought about how to do it yet.

Author

agree, 1. sounds better

Owner

Image

Working for your vectors at least.

This approach takes share_idx_list, share_bytes_list and the header. Then, for every possible combination of pad values, it interpolates the initial k shares and checks that they round-trip. If all initial shares round-trip, it returns the target string at "s".

from itertools import product

def recover_with_bytes(hrp, header_str, share_idx_list, share_bytes_list, target="s"):
    """Recover codex32 secret from share bytes assuming default padding."""
    # Number of padding bits needed to fill the last 5-bit character.
    pad_len = (5 - (len(share_bytes_list[0]) * 8) % 5) % 5
    # The first k non-"s" shares are assumed to carry default padding.
    default_padded_shares = IDX_ORDER[1 : 1 + len(share_idx_list)]
    pad_candidates = product(range(1 << pad_len), repeat=len(share_idx_list))
    for pad_vals in pad_candidates:
        given_shares = []
        for idx, data, pad_val in zip(share_idx_list, share_bytes_list, pad_vals):
            # encode() is a helper that is not shown in this snippet.
            share_str = encode(hrp, header_str + idx, data, pad_val)
            given_shares.append(Codex32String(share_str))
        for share_idx in default_padded_shares:
            initial_share = Codex32String.interpolate_at(given_shares, share_idx)
            round_tripped_initial_share = Codex32String.from_seed(
                initial_share.data,
                hrp + "1" + header_str + share_idx,
            )
            if str(round_tripped_initial_share) != str(initial_share):
                break  # wrong padding guess, try the next combination
        else:
            # Every initial share round-tripped, so this padding guess is consistent.
            return Codex32String.interpolate_at(given_shares, target)
    raise ValueError("no padding combination round-trips the initial shares")

I haven't yet proven that this always finds exactly one solution, but my intuition says it will, especially when the default padding bits checksum the missing data bits of the last character.
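Until there is a proof, uniqueness can at least be checked empirically by collecting every padding combination that passes instead of stopping at the first one. In this sketch `all_pad_solutions` is a hypothetical helper and `round_trips` is a hypothetical predicate standing in for the interpolate-and-round-trip check; nothing here calls the library.

```python
from itertools import product
from typing import Callable


def all_pad_solutions(
    share_count: int,
    payload_len: int,
    round_trips: Callable[[tuple[int, ...]], bool],
) -> list[tuple[int, ...]]:
    """Return every pad_val combination accepted by `round_trips`."""
    # Padding bits per share, from the payload length in bytes.
    pad_len = (5 - (payload_len * 8) % 5) % 5
    candidates = product(range(1 << pad_len), repeat=share_count)
    return [pad_vals for pad_vals in candidates if round_trips(pad_vals)]
```

For 16-byte payloads and k = 3 the search space is 4^3 = 64 combinations, so an exhaustive count over many random share sets is cheap; a result list with more than one entry would be a counterexample to uniqueness.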

Author

I'd try it, but some definitions are missing (like encode in the snippet above). Do you have an implementation, even if just a POC, that you can post?

Owner

@BenWestgate BenWestgate Dec 28, 2025

You can use Codex32String.from_seed() in place of encode; just pass a single header string instead of hrp and header separately.

IDX_ORDER = 'sacdefghjklmnpqrtuvwxyz023456789'

The last tip is that you must generate the 'a', 'c', 'd', etc. shares with default padding, as shown in my screenshot, because it recovers by solving for the padding values on its given share data that recover initial shares with correct padding.

from itertools import product

# 's' first, then every other share index exactly once.
IDX_ORDER = 'sacdefghjklmnpqrtuvwxyz023456789'

def recover_with_bytes(hrp, header_str, share_idx_list, share_bytes_list, target="s"):
    """Recover codex32 secret from share bytes assuming default padding."""
    # Number of padding bits needed to fill the last 5-bit character.
    pad_len = (5 - (len(share_bytes_list[0]) * 8) % 5) % 5
    # The first k non-"s" shares are assumed to carry default padding.
    default_padded_shares = IDX_ORDER[1 : 1 + len(share_idx_list)]
    pad_candidates = product(range(1 << pad_len), repeat=len(share_idx_list))
    for pad_vals in pad_candidates:
        given_shares = []
        for idx, data, pad_val in zip(share_idx_list, share_bytes_list, pad_vals):
            given_shares.append(Codex32String.from_seed(data, hrp + '1' + header_str + idx, pad_val))
        for share_idx in default_padded_shares:
            initial_share = Codex32String.interpolate_at(given_shares, share_idx)
            round_tripped_initial_share = Codex32String.from_seed(
                initial_share.data,
                hrp + "1" + header_str + share_idx,
            )
            if str(round_tripped_initial_share) != str(initial_share):
                break  # wrong padding guess, try the next combination
        else:
            # Every initial share round-tripped, so this padding guess is consistent.
            return Codex32String.interpolate_at(given_shares, target)
    raise ValueError("no padding combination round-trips the initial shares")

Try it and see if it always recovers the correct seed from shares interpolated from initial shares that used the default padding. I think it should.

That it recovered the right seed for your vector is a bad omen: the d share was interpolated, whereas per bip85 and bip93 "for a fresh secret" d should be an initial share (with default padding) for threshold 3. By chance it passed the share d padding check and recovered the correct secret.

Author

@scgbckbone scgbckbone Dec 28, 2025

Not sure what I'm missing. I can see that in your screenshot you're using a d that is not round-tripped. I use from_seed, which uses your CRC checksum as padding.

I updated the vector:
dd4767a

It still does not recover the correct secret s.

@BenWestgate BenWestgate added the bug, documentation, good first issue, and question labels Dec 8, 2025

def test_round_trip_recovery():
    # secret share from seed
    s = Codex32String.from_seed(bytes.fromhex("68f14219957131d21b615271058437e8"), "ms13k00ls")
Owner

@BenWestgate BenWestgate Dec 28, 2025

recover_with_bytes expects the first k non-s shares to use the default padding. So if you encode the secret rather than recover it with interpolation, we'd need to tweak default_padded_shares to include 's'. See the comment on that line.

However, for bip85 we don't directly encode secrets unless the threshold is "0".

def recover_with_bytes(hrp, header_str, share_idx_list, share_bytes_list, target="s"):
    """Recover codex32 secret from share bytes assuming default padding."""
    pad_len = (5 - (len(share_bytes_list[0]) * 8) % 5) % 5
    default_padded_shares = IDX_ORDER[1 : 1 + len(share_idx_list)]
Owner

Change this to:
default_padded_shares = IDX_ORDER[: len(share_idx_list)]
and it should be able to solve the padding for your vector where 's' is encoded from bytes.

As a standard, I think A, C, D should get default padding, not S, A, C, for threshold 3 share sets.
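The effect of that one-line change can be seen in isolation with the sketch below. `default_padded` and its `include_secret` flag are hypothetical names used only for illustration, and the constant lists each share index once.

```python
IDX_ORDER = "sacdefghjklmnpqrtuvwxyz023456789"  # 's' first, then the other indices once each


def default_padded(threshold: int, include_secret: bool) -> str:
    """Share indices assumed to carry default padding for a given threshold."""
    if include_secret:
        # Secret encoded directly from bytes: 's' is one of the initial strings.
        return IDX_ORDER[:threshold]
    # Secret recovered by interpolation: only non-'s' shares are initial strings.
    return IDX_ORDER[1 : 1 + threshold]


with_secret = default_padded(3, include_secret=True)      # "sac"
without_secret = default_padded(3, include_secret=False)  # "acd"
```

Whichever convention is chosen has to match how the share set was generated, otherwise the padding check is run against shares that never had default padding.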
