This repository was archived by the owner on Jan 8, 2026. It is now read-only.
hashmap: differentiate serialization of string and byte keys#192
Closed
Member
Author
I also included original notes about
vmx
approved these changes
Sep 11, 2019
Stebalien
pushed a commit
to Stebalien/specs
that referenced
this pull request
Sep 18, 2019
…proofs-to-datastructures add PoStProof description and note size of arrays
Contributor
@rvagg what’s the status of this one?
Member
Author
we're going to drop this (for now at least) - the clear preference is for an ADL to behave just like its data model equivalent. There may be reason for an implementation to offer this, maybe as a side API, but it's probably not something we care about at the spec level where our data structures should be ADLs
This is an alternative to both #180 and #184; I'd like to retire those discussions.
The current state of keys in the HashMap spec: the algorithm can accept both `string` and `bytes` as keys. They are hashed as `bytes` for the purpose of indexing (this is implied but not explicitly stated by the current form of the spec), and for the purpose of serialisation into block form they are stored as `bytes` regardless of whether you provide `string` or `bytes`.

The primary problem with this approach is that we lose the ability to differentiate when we deserialise. You require context to know whether they should be used as `bytes` or converted back into `string`. The algorithm has to be agnostic to this, so it ends up getting pushed up the application stack. In naive usage, where you don't have much context or haven't brought that context along for the ride (perhaps you're inspecting objects through the IPLD explorer), you just get byte arrays, even if you were using them as strings. I believe it's fair to say that common usage of this data structure will be as it is in most programming languages: string keys. So being able to differentiate would be nice.

The proposed solution here is to (1) explicitly allow both `string` and `bytes` in the spec, (2) define some basic rules for how these things should be consistently hashed, and (3) serialize them in their original form. So on the block, a `string` key would be stored as a `string`. A byte array provided as a key would be stored as `bytes`.

Minor complications exist if you use a HashMap with both `string` and `byte` keys. I don't expect this will happen much in reality, particularly in typed languages, where you should have a consistent interface (especially if such interfaces are defined through schemas, where you'd hopefully do something like `type MyMap { String : Foo } representation advanced HashMap`, and there's your context). Implementations have to do some awkward things: sorting buckets of mixed types requires a bit of care, and checking for the existence of a key also requires care, because the same key could be provided as `bytes` or `string` and the hash would be the same, but you have to make sure that "does this already exist?" works properly. IMO these should be left to the implementation for now, and implementations should also probably carry suggestions against mixed types, which I'm doing here: https://github.com/rvagg/iamap/pull/8/files#diff-04c6e90faac2675aa89e2176d2eec7d8R244
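To make the mixed-key concerns concrete, here is a minimal Python sketch. It is not taken from the spec or from iamap. The helper names (`key_bytes`, `index_hash`, `same_key`, `sort_bucket`) are hypothetical, UTF-8 encoding of string keys is an assumption, SHA-256 stands in for whatever hash function the map is configured with, and treating a `string` and a `bytes` key with identical bytes as the same key is just one policy an implementation might choose.

```python
import hashlib
from typing import List, Tuple, Union

Key = Union[str, bytes]


def key_bytes(key: Key) -> bytes:
    """Byte form used for hashing (assumption: strings are UTF-8 encoded)."""
    if isinstance(key, str):
        return key.encode("utf-8")
    if isinstance(key, bytes):
        return key
    raise TypeError("HashMap keys must be str or bytes")


def index_hash(key: Key) -> bytes:
    # SHA-256 is only a stand-in for the map's configured hash function.
    return hashlib.sha256(key_bytes(key)).digest()


def same_key(a: Key, b: Key) -> bool:
    """Hypothetical policy: keys match when their byte forms match,
    so "foo" and b"foo" count as the same key for existence checks."""
    return key_bytes(a) == key_bytes(b)


def sort_bucket(entries: List[Tuple[Key, int]]) -> List[Tuple[Key, int]]:
    """Order bucket entries by byte form so mixed key kinds sort consistently."""
    return sorted(entries, key=lambda entry: key_bytes(entry[0]))


# Each entry keeps the key in the form it was provided in, which is the form
# that would be written to the block under this proposal.
bucket = sort_bucket([("b", 2), (b"a", 1)])
```

Because each entry holds the key exactly as the caller supplied it, a reader of the block can tell `string` keys from `bytes` keys without outside context, while hashing and lookup only ever see the byte form.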