You’re still signing data structures the wrong way

How should you package data before feeding it into a cryptographic algorithm like sign, encrypt, MAC, or hash? This question has persisted for decades without an adequate resolution. There are at least two important problems to solve. First, the encoding must produce canonical output, because systems like Bitcoin struggle when two different encodings decode to the same in-memory data. But more importantly, the encoding system should solve the important problem of domain separation.

To understand the issue, let’s look at a simple example using a well-known IDL like Protobufs. Imagine a distributed system that has two types of messages (among others): TreeRoot, which contains the root of a transparency tree, and KeyRevoke, which indicates that a key is being revoked:

message TreeRoot {
  int64 timestamp = 1;
  bytes hash = 2;
}
message KeyRevoke {
  int64 timestamp = 1;
  bytes publicKeyFingerprint = 2;
}

By a stroke of bad luck, these two data structures end up lining up field-by-field, even though as far as the program and the programmer are concerned, they mean completely different things. If any node in this system signs a TreeRoot and sends the signature into the network, an attacker can construct a KeyRevoke message that serializes byte-for-byte to the same encoding as the signed tree root, then staple the TreeRoot signature onto the KeyRevoke data structure. Now it looks like the signer has signed a KeyRevoke when it never did; it only signed a TreeRoot. A verifier can be fooled into “verifying” a statement that the signer never intended.

This is not a theoretical attack. Bugs of this form have a long history across Bitcoin, Ethereum DEXes, TLS, JWT, and AWS, among others.

And although our little example deals with signing, the same ideas apply to MAC’ing (via HMAC or SHA-3), hashing, and even encryption, since most encryption these days is authenticated. In general, cryptography must guarantee that the sender and receiver agree not only on the content of the payload, but also on the “type” of the data.

Systems that have taken on domain separation use ad-hoc techniques, such as hashing the names of the surrounding program methods in Solana, best-practice conventions in Ethereum, or “context strings” in TLS 1.3. Given the rich variety of potentially serious bugs here, a more systematic approach is needed. While building FOKS, we invented one.

Idea: Domain Separator in IDL

The main idea behind FOKS’s scheme for serializing cryptographic data (called Snowpack) is to inject random, immutable domain separators directly into the IDL:

struct TreeRoot @0x92880d38b74de9fb {
   timestamp @0 : Uint;
   hash @1 : Blob;
}

A simple compiler transpiles the IDL into the target language. There, a runtime library provides a method to sign such an object: it concatenates the domain separator (@0x92880d38b74de9fb) with the serialization of the object, then feeds the byte stream into the signing primitive. Verification likewise checks the corresponding reconstructed concatenation against the supplied signature. Note that the domain separator does not appear in the final serialization (which would waste bytes); both the signer and the receiver agree on it through a shared protocol specification. Encrypt, HMAC, and hash work the same way.

In Go (as well as TypeScript and other languages), the type system enforces safety guarantees. The compiler outputs a method:

func (t TreeRoot) GetUniqueTypeID() uint64 { return 0x92880d38b74de9fb }

and the generated Sign and Verify functions look like this:

func Sign(key Key, obj VerifiableObjecter) ([]byte, error) 
func Verify(key Key, sig []byte, obj VerifiableObjecter) error

VerifiableObjecter is an interface that requires the GetUniqueTypeID() method, in addition to other methods such as EncodeToBytes.

These 64-bit domain separators are not necessary for all structures, and many do not need them. However, untagged structures lack GetUniqueTypeID() methods, and therefore cannot be passed to Sign or Verify without type errors. The same applies to encryption, MAC’ing, prefixed hashing, etc.
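A sketch of how the type system enforces this, assuming an interface shaped like the one above (the EncodeToBytes stub and the Config type are illustrative):

```go
package main

import "fmt"

// VerifiableObjecter follows the shape described in the post; the
// exact interface in the generated code may differ.
type VerifiableObjecter interface {
	GetUniqueTypeID() uint64
	EncodeToBytes() ([]byte, error)
}

type TreeRoot struct {
	Timestamp uint64
	Hash      []byte
}

// Generated only for @-tagged structures.
func (t TreeRoot) GetUniqueTypeID() uint64        { return 0x92880d38b74de9fb }
func (t TreeRoot) EncodeToBytes() ([]byte, error) { return t.Hash, nil } // stub

// An untagged structure: no GetUniqueTypeID, so it does not satisfy
// VerifiableObjecter and cannot be passed to Sign or Verify.
type Config struct{ Verbose bool }

func describe(o VerifiableObjecter) string {
	return fmt.Sprintf("type 0x%x", o.GetUniqueTypeID())
}

func main() {
	fmt.Println(describe(TreeRoot{}))
	// describe(Config{}) // compile error: Config does not implement VerifiableObjecter
}
```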

As long as the random domain separators are unique (which they will be, globally, with high probability), there is no possibility for the signer and verifier to disagree about what data type they are dealing with. As we discussed earlier, any substitution will fail verification. Developers can use simple tooling in either the IDE or CLI to generate these random domain separators and insert them into their protocol specifications.

The logic behind the random generation of domain separators is reminiscent of the random choice of the polynomial p(x) in Rabin fingerprinting. In the base case, if Bob sits down to write a new project today and generates all the domain separators randomly, he knows that, with very good probability, verifiers in his project will never verify signatures generated by another existing project. Random generation saves him the effort of reasoning about accidental collisions. As an inductive step, imagine that Mallory creates a new project after Bob publishes his protocol specification. She may intentionally reuse his domain separators. If Bob gives her project access to his private keys, she can trick verifiers in his project into verifying signatures she generates. We claim there is nothing to be done here: Mallory’s attack works against any domain-separation system, and since her project is malicious, it was a mistake to trust it with his private keys in the first place. On the other hand, if Mallory randomly generates her domain separators, she and Bob enjoy the same desirable guarantees as in the base case.

The second risk is that AI coding or autocompletion agents may copy-paste existing domain separators, or generate them sequentially. The Snowpack compiler and runtime check that all domain separators are unique within a project, and otherwise error or panic (respectively).

Developers are free to rename the TreeRoot structure, but they must keep the domain separator constant over the lifetime of the protocol, regardless of whether they add or remove fields. Like protobufs and Cap’n Proto, the system supports deletion and addition of fields, as long as the positions of the remaining fields (as given by @0 and @1 above) never change, and retired field positions are never reused.

Snowpack IDL: Domain Separation + Canonical Encoding + More!

The built-in domain separation in Snowpack is a new idea. But more broadly, Snowpack has proven to be a simple and effective forward- and backward-compatible system both for RPC and for serializing inputs to cryptographic functions. We emphasize that a single system should serve both purposes well. By contrast, protobufs make no guarantees of canonical encoding. And JSON encodings, although often used in cryptographic settings, have the drawback that they lack binary buffers (which most cryptographic primitives output!), and therefore invite confusion between strings and Base64-encoded binary data.

Snowpack checks all these boxes for us. The simple idea is to encode structures like the TreeRoot above as JSON-like positional arrays:

[ 1234567890, \xdeadbeef ] 

Encoders and decoders consult the protocol specification above, placing, for example, the hash : Blob field at array position @1. Omitted and retired fields are encoded as nils. If the TreeRoot
message is upgraded to something that looks like this:

struct TreeRoot @0x92880d38b74de9fb {
   hash @1 : Blob;
   timestampMsec @2 : Uint;
}

The intermediate encoding becomes:

[ nil, \xdeadbeef, 1234567890123 ] 

Older decoders can still decode the new encoding, but will see a zero value for the timestamp field they were expecting. Newer decoders can decode older encodings, but will see a zero value for the timestampMsec field they were expecting. It is of course up to the application developer to decide whether these conditions should break the program, and consequently whether this protocol evolution makes sense, but they can rest assured that decoding at the protocol level will not fail.
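The old-decoder behavior can be sketched like this (an illustrative stand-in, not FOKS’s actual decoder, operating on the intermediate array form):

```go
package main

import "fmt"

// decodeOldTreeRoot models an older decoder that expects
// [timestamp, hash]. A nil in an expected slot decodes to the zero
// value; unknown trailing positions are ignored.
func decodeOldTreeRoot(arr []any) (ts int64, hash []byte) {
	if len(arr) > 0 {
		if v, ok := arr[0].(int64); ok {
			ts = v
		}
	}
	if len(arr) > 1 {
		if v, ok := arr[1].([]byte); ok {
			hash = v
		}
	}
	return
}

func main() {
	// The upgraded wire form from the example above:
	// [ nil, hash, timestampMsec ]
	upgraded := []any{nil, []byte{0xde, 0xad, 0xbe, 0xef}, int64(1234567890123)}
	ts, hash := decodeOldTreeRoot(upgraded)
	fmt.Printf("timestamp=%d hash=% x\n", ts, hash) // timestamp=0 hash=de ad be ef
}
```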

From this intermediate encoding, Snowpack arrives at a flat byte stream via Msgpack encoding, but with two important restrictions. First, all integer encodings must use the smallest possible size encoding. And second, dictionaries containing more than one key-value pair are never sent to the encoder, so we can sidestep the whole thorny issue of canonical key ordering. As a result, we end up with a canonical encoding every time.

The overall flow is as follows:

Go structures → (snowpack) → intermediate JSON-like objects → (msgpack) → self-describing bytes → (msgpack) → intermediate JSON-like objects → (snowpack) → Go structures

Unlike the outer conversions, the inner conversions (to and from bytes) are self-describing and do not require a Snowpack protocol definition. This design choice enables forward compatibility: older decoders can decode future messages. It also allows convenient debugging and inspection of the byte stream.

We have seen how structures encode and decode. Beyond that, Snowpack provides enough expressiveness to cover every situation encountered in FOKS. The other important features are: lists, options, and variants. The first two find direct expression as array-based encodings. Variants, or tagged unions, encode as single-key-value-pair dictionaries, allowing existing Msgpack libraries to decode them with type safety.
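To illustrate the variant shape, here is a sketch using JSON as a stand-in for the Msgpack layer (the structure is the same: a one-entry dictionary whose key is the arm’s tag); the `encodeVariant` helper is ours, not Snowpack’s API:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodeVariant encodes a tagged union as a dictionary with exactly
// one key-value pair: the arm's position tag mapped to its payload.
// With only one key, canonical key ordering is never an issue.
func encodeVariant(arm int, value any) ([]byte, error) {
	return json.Marshal(map[string]any{fmt.Sprint(arm): value})
}

func main() {
	b, err := encodeVariant(2, "deadbeef")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b)) // {"2":"deadbeef"}
}
```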

Summary

The domain separation bug has repeatedly plagued real systems. Existing mitigations are ad-hoc: context strings, method-name hashes, and hand-rolled prefixes that are easy to forget and hard to audit.

Snowpack takes a different approach: random, immutable 64-bit domain separators reside in the IDL itself, and the type system ensures that you can’t sign, encrypt, or MAC an object that lacks one. We believe this basic idea is bigger than any one system, and we would love to see other serialization schemes adopt it. In the meantime, check out Snowpack, which is open source on GitHub, currently targeting Go and TypeScript with more languages to come.

Credits

Thanks to Jack O’Connor for his feedback on a draft of this post and for building related systems that influenced Snowpack.


