
Anatomy of a Sphinx Packet: The Cryptographic Heart of NOX

Inside the Sphinx packet format — how layered encryption, MACs, and SURBs enable anonymous bidirectional communication through mix networks.

How to make every message look identical and why that's the hardest problem in network privacy.

Part 3 of 6 by Xythum Labs • ~17,000 words • 45-minute read


Here's a fun party trick. Take a 32KB chunk of random-looking bytes and tell me whether it's a Uniswap swap, a loan repayment, a cover traffic dummy, or someone's grocery list. You can't. Neither can the mix node processing it. Neither can a nation-state adversary watching every link in the network.

That's the Sphinx packet format [1]. And it's criminally underappreciated: I've met senior cryptographers who've never read the paper. It's a 2009 result by George Danezis and Ian Goldberg that solves a genuinely hard problem in 6 pages. Six. Satoshi's whitepaper ran longer than that, and Satoshi was solving an easier problem. (Fight me.)

In this post I'm going to walk through how Sphinx works, not at the hand-wavy "it's like an onion" level, but at the level of actual cryptographic operations. We'll trace a real packet byte-by-byte through three mix nodes, watch the group element transform at each hop, see exactly how the routing information shifts without leaking position, and understand why a single flipped bit causes the entire packet to self-destruct at the next honest node.

If you understand Diffie-Hellman key exchange and symmetric encryption, you can follow everything. If you don't, you'll still get the intuition, and the ASCII diagrams should carry you through.

We'll also cover the practical engineering: the implementation gotchas that NOX's Rust implementation taught us the hard way, the performance profile down to the microsecond, the tradeoffs we made and why, and the post-quantum future that will eventually replace all of this. Plus: proof-of-work for spam prevention, SURBs for anonymous replies, Reed-Solomon FEC for reliable delivery, and why NOX's packets are 16x larger than everyone else's.

Fair warning: this is long. Grab coffee. The details matter.


The Problem Sphinx Solves

Let me set the stage with a concrete DeFi example. Alice holds 50 ETH in a privacy pool. She wants to swap 10 ETH for USDC through a private order. If her swap intent travels through the mixnet in a way that allows any observer to correlate "packet entering the network from Alice's IP" with "packet exiting the network containing a swap order," the privacy pool is useless: the observer knows Alice initiated the swap, and from there it's trivial to trace the flow of funds on-chain.

This isn't hypothetical. Ethereum's public mempool gives block builders and searchers real-time visibility into pending transactions. MEV extraction generated over $600M in realized value between 2020-2023 [21]. Privacy pools like Tornado Cash showed that on-chain privacy without network-level privacy is insufficient: researchers at Chainalysis and TRM Labs routinely de-anonymized Tornado Cash deposits by correlating timing, amounts, and network-level metadata.

So Alice needs a mix network. And the mix network needs a packet format that makes her traffic genuinely unlinkable. Say she wants to send a message through three mix nodes M1, M2, M3 to reach Bob (where "Bob" is the exit node that will execute her swap). The naive approach is onion encryption: encrypt the message for M3, then encrypt that for M2, then encrypt that for M1. Each node strips one layer and forwards the result.

This seems fine until you think about it for thirty seconds. Let me count the ways it fails.

Problem 1: The Packet Shrinks

M1 receives a packet of size S. After stripping one encryption layer, it forwards a packet of size S - overhead. M2 receives that, strips another layer, forwards something smaller. An observer who can see packets at different points in the network can match them by size.

"Ah, the 1,200-byte packet entering M1 became the 1,100-byte packet leaving M1 and entering M2." So much for unlinkability.

This isn't a theoretical concern. Murdoch and Danezis demonstrated packet-size correlation attacks against Tor in 2005 [11], achieving route fingerprinting in under five minutes. Tor mitigates this with fixed-size cells (514 bytes), but early onion routing designs without fixed sizes were completely broken by this trivial observation.

Problem 2: The Headers Leak Position

Each node needs routing information: "forward this to the next node." If you naively include all routing info in a plaintext header, every node sees the full path. But even if you encrypt the headers separately, each hop's decryption changes the header differently. Node M1 sees header H1. After decryption, the header becomes H2 (shorter, or differently structured). An observer who sees H1 entering M1 and H2 leaving can correlate them by the structural transformation, even without reading the contents.

Problem 3: The Group Element Problem

Suppose you use a fresh Diffie-Hellman key exchange with each hop. Alice needs to include an ephemeral public key in the packet so each node can compute a shared secret. But if she puts the same public key g^x in every layer, any two colluding nodes can see they received a packet with the same ephemeral key, trivially linking Alice's packet across hops.

The naive fix is to use a different ephemeral key for each hop. But that requires encoding N public keys in the header (one per hop), and an observer at hop i knows they're hop i because they received the i-th key. Position leaked.

Problem 4: Tagging Attacks

Here's a subtle one. An adversary at hop 1 flips a bit in the encrypted payload. If none of the subsequent nodes detect the modification, the adversary can watch for a corrupted packet emerging from the network and correlate it with the packet they tagged. "I flipped bit 47 of this packet entering M1, and a packet with a flipped bit 47 just came out of M3; those are the same message."

This is the tagging attack, first described by Syverson et al. [12], and it's devastating against any onion encryption scheme that doesn't authenticate each layer.

What Sphinx Does

Sphinx solves all four problems simultaneously:

  1. Constant-size packets. Every Sphinx packet is exactly the same size at every hop. No information leaks through packet length.

  2. Self-transforming headers. The routing information is XOR-encrypted with pseudorandom padding that fills the space vacated by the stripped routing command. Headers are always the same length and always look uniformly random.

  3. Re-blinding. A single ephemeral key is transformed at each hop using a deterministic "blinding factor." Each node sees a different group element, and no two group elements are linkable without the node's private key.

  4. Per-hop MACs. Each hop verifies an HMAC before processing. Any modification, even a single bit flip, causes the MAC to fail, and the packet is silently dropped. Tagged packets don't survive.

And the whole thing is provably secure under the Gap Diffie-Hellman assumption [2]. (More on why it's GDH and not DDH in a moment.)


The Sphinx Proof Gap

In December 2023, Scherer et al. [2] published a result that should have been a wake-up call for the entire mixnet community. They demonstrated that Sphinx's original security proof, which relies on the Decisional Diffie-Hellman (DDH) assumption, is insufficient. The actual security guarantee requires the stronger Gap Diffie-Hellman (GDH) assumption.

The difference matters. DDH says an adversary cannot distinguish (g^a, g^b, g^ab) from (g^a, g^b, g^c). GDH says the computational problem stays hard even when the adversary is given a decisional oracle: it can ask "is this tuple a valid DH triple?" as often as it likes and still cannot compute the shared secret. Sphinx's blinding mechanism, where each hop transforms the group element via alpha_{i+1} = alpha_i * b_i, requires this stronger assumption because the adversary can test whether two observed group elements are related.

Think about it concretely. An adversary observing alpha_0 entering M1 and alpha_1 leaving M1 could try to check whether alpha_1 = alpha_0 * b_1 for some b_1 derivable from the observed traffic. Under DDH alone, the proof doesn't rule this out. Under GDH, the oracle doesn't help because the adversary can't derive the blinding factor without M1's private key.

What makes this finding significant is the blast radius. Every system that uses Sphinx (Nym [3], Katzenpost [4], HOPR, the Lightning Network's BOLT-04 onion routing [5]) was operating for years without the formal security guarantees its designers assumed. The proof wasn't wrong; it was incomplete. The cryptographic operations are sound. But the formal reduction from "breaking Sphinx" to "breaking a hard problem" had a gap that nobody caught in 14 years of peer review and production deployment.

Scherer et al. provided the corrected proof under GDH, and the practical impact is minimal: GDH is widely believed to hold in the groups these systems use. But the episode is a powerful reminder that even well-cited security proofs need periodic re-examination. If your system's security argument rests on a single paper from 2009, it is worth checking whether the intervening years have revealed any cracks.

(I'll note that the 14-year gap is actually par for the course in cryptography. The OAEP proof had a gap for 7 years before Shoup found it in 2001 [13]. The Schnorr signature proof assumed the Random Oracle Model for 20 years before anyone formalized the Generic Group Model alternative. Proofs are hard. Checking proofs is harder. Checking proofs of protocols you're actually running in production is apparently hardest of all.)


The Packet Structure

A Sphinx packet has two parts: a header and a payload.

+------------------------ SPHINX PACKET (32,768 bytes) ------------------------+
|                                                                               |
|  +------------------- HEADER (472 bytes) --------------------+               |
|  |                                                           |               |
|  |  Group Element     32 bytes   (X25519 ephemeral pubkey)   |               |
|  |  Routing Info     400 bytes   (XOR-encrypted, per-hop)    |               |
|  |  MAC               32 bytes   (HMAC-SHA256)               |               |
|  |  PoW Nonce          8 bytes   (anti-spam)                 |               |
|  |                                                           |               |
|  +-----------------------------------------------------------+               |
|                                                                               |
|  +------------------- PAYLOAD (~32,296 bytes) ---------------+               |
|  |                                                           |               |
|  |  Encrypted message content                                |               |
|  |  (ChaCha20 per hop, constant size)                        |               |
|  |                                                           |               |
|  +-----------------------------------------------------------+               |
|                                                                               |
+-------------------------------------------------------------------------------+

Let me break down each field:

Group Element (32 bytes). This is an X25519 public key, a point on Curve25519 encoded as 32 bytes. It changes at every hop via the blinding trick. This is the field that enables key agreement without revealing position.

Routing Info (400 bytes). This is the nested, encrypted routing information. It's sized for 3 hops in our stratified topology, with 128 bytes per hop: 1 byte for a hop type flag (forward or exit), 1 byte for address length, 32 bytes for the next-hop MAC, and up to ~90 bytes for the multiaddr of the next node. The remaining bytes in the 400-byte block are pseudorandom padding.

MAC (32 bytes). An HMAC-SHA256 tag computed over the routing info. Each hop verifies this before processing anything else. If verification fails, the packet is dropped. No exceptions. No error messages. Silent death.

PoW Nonce (8 bytes). A proof-of-work nonce for anti-spam. The sender brute-forces this so that SHA256(group_element || routing_info || mac || nonce) has a required number of leading zero bits. The 8-byte nonce gives a 2^64 search space, which is vastly more than needed for any practical difficulty level. Even at difficulty 40 (which would require ~10^12 hash operations on average), the nonce space is not exhausted. More on PoW in the dedicated section below.

Payload (~32,296 bytes). The actual message content, padded to a fixed size and layered with ChaCha20 symmetric encryption. Each hop decrypts one layer. At the exit node, the plaintext payload is recovered.

The exact payload size depends on the packet implementation. In our fixed-size SphinxPacket wrapper, the total packet is 32,768 bytes, with 1,024 bytes reserved for the header area (which includes the 472-byte Sphinx header plus room for future extensions), 12 bytes for a ChaCha20Poly1305 nonce, and 16 bytes for the Poly1305 authentication tag. The maximum payload before padding is 31,716 bytes, comfortably large enough for any DeFi operation.

In our implementation, the total packet is 32,768 bytes (32KB). Why 32KB? That's a section unto itself see "The 32KB Decision" below.
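To make the proof-of-work search concrete, here's a minimal Python sketch of the nonce loop described above. The field bytes are placeholders (zeros standing in for the group element, routing info, and MAC), the hash arrangement SHA256(fields || nonce) follows the description rather than NOX's actual code, and the difficulty is set far below production values so the demo terminates instantly:

```python
import hashlib

def leading_zero_bits(digest: bytes) -> int:
    # Count the leading zero bits of a hash digest.
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def solve_pow(header_fields: bytes, difficulty: int) -> int:
    # Brute-force an 8-byte nonce until the header hash meets the target.
    nonce = 0
    while nonce < 2**64:
        digest = hashlib.sha256(header_fields + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1
    raise RuntimeError("nonce space exhausted")

# Placeholder fields: alpha (32B) || routing info (400B) || mac (32B).
fields = bytes(32) + bytes(400) + bytes(32)
nonce = solve_pow(fields, difficulty=8)   # toy difficulty: ~256 attempts expected
digest = hashlib.sha256(fields + nonce.to_bytes(8, "big")).digest()
assert leading_zero_bits(digest) >= 8
```

At difficulty d the expected number of hashes is 2^d, which is why the 8-byte (2^64) nonce space never runs out at any practical difficulty.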


Step 1: Route Selection

Alice first picks her route through the network. In a stratified topology, she selects one node from each layer uniformly at random:

Alice --> M1 (Layer 1) --> M2 (Layer 2) --> M3 (Layer 3 / Exit) --> Destination

With 10 nodes per layer, there are 1,000 equally probable paths. Alice looks up each node's public key from the topology registry. Now she has:

  • pk_1, pk_2, pk_3: the X25519 public keys of her three hops
  • The routing commands for each hop (basically: "forward to the next node" or "you're the exit, deliver this")

Why three hops? Danezis showed in the original Loopix paper [8] that three layers provide a good tradeoff between anonymity and latency in stratified topologies. Fewer layers reduce the anonymity set (with 2 layers and 10 nodes/layer, only 100 paths exist, too few). More layers add latency without proportional privacy gain, because the mixing at each hop already provides strong unlinkability. The "sweet spot" of 3 layers with 10+ nodes per layer gives 1,000+ equally-probable paths, and at 1ms average per-hop Poisson delay, the 3-hop path adds only ~3ms of mixing delay.

The stratified topology also has a security property: to fully deanonymize a message, an adversary must compromise at least one node in every layer of the path. With 10 nodes per layer, the probability of a fully compromised path is (f/10)^3 where f is the number of compromised nodes per layer. If the adversary controls 2 nodes in each layer (20% corruption), the probability of a fully compromised path is (2/10)^3 = 0.008 less than 1%. Compare this to a free-route topology (like Tor's), where the adversary can position compromised nodes anywhere and potentially control both the entry and exit of a circuit.

The route selection must be uniformly random within each layer. If Alice always picks the same node in layer 1 (say, because it's geographically closest), an adversary who compromises that node can link all of Alice's traffic. The topology registry provides the list of nodes per layer, and Alice selects uniformly at random from each list.

This uniform selection is enforced at the client level: the mix nodes themselves don't know (and can't verify) how the route was selected. A buggy or malicious client implementation that biases route selection weakens its own privacy without affecting other users. This is an important property: one user's poor route selection doesn't compromise the anonymity of other users in the network.
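Both the uniform selection and the (f/10)^3 compromise arithmetic fit in a few lines of Python. This is an illustrative sketch, not client code; `select_route` and the layer names are hypothetical:

```python
import secrets

def select_route(layers: list[list[str]]) -> list[str]:
    # Uniform, independent choice from each layer; done purely client-side.
    return [secrets.choice(layer) for layer in layers]

def p_compromised(f: int, nodes_per_layer: int = 10, layers: int = 3) -> float:
    # Probability a uniformly chosen path is fully compromised: (f / n)^layers.
    return (f / nodes_per_layer) ** layers

assert abs(p_compromised(2) - 0.008) < 1e-12   # 20% corruption per layer: 0.8% of paths
assert p_compromised(5) == 0.125               # even at 50% corruption, 7 of 8 paths survive

route = select_route([["M1a", "M1b"], ["M2a", "M2b"], ["M3a", "M3b"]])
assert len(route) == 3                         # one node from each layer
```

Using a CSPRNG (`secrets`) rather than a general-purpose PRNG matters here: a predictable route choice is itself a deanonymization vector.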


Step 2: Key Derivation (The Clever Part)

This is where Sphinx gets genuinely elegant. I've read a lot of cryptographic protocol designs, and the key derivation scheme in Sphinx is one of the prettiest things I've seen. It solves a seemingly impossible problem: how does Alice establish a shared secret with each of three nodes using only a single group element in the header?

The Setup

Alice generates a single ephemeral private key x, a random Curve25519 scalar. She computes the corresponding public element:

alpha_0 = g * x        (Curve25519 base point multiplication)

This 32-byte alpha_0 is the group element that will travel with the packet. It's the only public-key material in the header.

First Hop: Standard ECDH

For the first hop, Alice does a normal X25519 Diffie-Hellman key exchange:

shared_secret_1 = pk_1 * x = g * (a1 * x)

where a1 is M1's private key. Simple. M1 will compute the same shared secret as alpha_0 * a1 = g * (x * a1). Diffie-Hellman commutativity gives both parties the same result.

Second Hop: The Blinding Trick

For the second hop, Alice can't just do pk_2 * x, because that would require sending the same alpha_0 to M2. If M2 sees the same group element that M1 saw, and M1 and M2 collude, they can trivially link the packets.

Instead, M1 re-blinds the group element before forwarding. After processing, M1 replaces alpha_0 with:

alpha_1 = alpha_0 * b_1

where b_1 is a blinding factor deterministically derived from shared_secret_1:

b_1 = SHA256("blind" || shared_secret_1) mod L

(where L is the order of the Curve25519 group, and mod L gives us a valid scalar). This is a one-way derivation: knowing b_1 doesn't reveal shared_secret_1, and knowing alpha_0 and alpha_1 doesn't reveal b_1 without M1's private key.

Here's the key insight: Alice can predict alpha_1 because she knows shared_secret_1 (she computed it herself), so she knows b_1, so she can compute alpha_1 = alpha_0 * b_1. And she can use this to compute the shared secret M2 will derive:

shared_secret_2 = pk_2 * (x * b_1)

M2, upon receiving alpha_1, computes:

shared_secret_2 = alpha_1 * a2 = (alpha_0 * b_1) * a2 = g * (x * b_1 * a2)

Alice computes:

shared_secret_2 = pk_2 * (x * b_1) = (g * a2) * (x * b_1) = g * (a2 * x * b_1)

Same result. Beautiful.

Third Hop and Beyond

The pattern continues. For the third hop:

alpha_2 = alpha_1 * b_2
shared_secret_3 = pk_3 * (x * b_1 * b_2)

In general, for hop i:

alpha_{i} = alpha_{i-1} * b_i
shared_secret_{i+1} = pk_{i+1} * (x * b_1 * b_2 * ... * b_i)

Alice can derive all shared secrets at packet creation time, because she knows all the blinding factors (they're deterministic from the shared secrets, which she computes iteratively; each one feeds into the next).
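The exponent algebra is easier to see in running code. The sketch below performs the iterative re-blinding in a toy multiplicative group mod the prime 2^255 - 19 (that's the Curve25519 *field* prime used as a stand-in group; it is NOT the curve, just a way to demonstrate the commutativity with nothing but the standard library, and the mod-L scalar reduction is skipped for the same reason):

```python
import hashlib
import secrets

p = 2**255 - 19   # toy group modulus (Curve25519 field prime, NOT the curve)
g = 2             # toy generator

def blind_factor(shared: int) -> int:
    # b = SHA256("blind" || shared_secret); the real code reduces this mod L.
    digest = hashlib.sha256(b"blind" + shared.to_bytes(32, "big")).digest()
    return int.from_bytes(digest, "big")

node_sks = [secrets.randbelow(p) for _ in range(3)]
node_pks = [pow(g, sk, p) for sk in node_sks]

# Alice: one ephemeral secret x; shared secrets derived iteratively.
x = secrets.randbelow(p)
exp, alice_secrets = x, []
for pk in node_pks:
    s = pow(pk, exp, p)          # shared_secret_{i+1} = pk_{i+1} ^ (x * b_1 * ... * b_i)
    alice_secrets.append(s)
    exp *= blind_factor(s)

# Nodes: each sees a different alpha yet derives the matching secret.
alpha = pow(g, x, p)             # alpha_0, the group element in the header
alphas, node_secrets = [], []
for sk in node_sks:
    alphas.append(alpha)
    s = pow(alpha, sk, p)        # the node's DH step: alpha_i ^ sk_i
    node_secrets.append(s)
    alpha = pow(alpha, blind_factor(s), p)   # re-blind before forwarding

assert alice_secrets == node_secrets   # both sides agree at every hop
assert len(set(alphas)) == 3           # each hop sees a different group element
```

The two assertions are exactly the two properties Sphinx needs: Alice and each node independently reach the same secret, and the consecutive alphas are unlinkable without a private key.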

Why X25519?

We use X25519 (Curve25519 in Montgomery form) for the Diffie-Hellman operations. The choice is deliberate:

Constant-time implementation. X25519, as designed by Bernstein [14], is inherently constant-time. The Montgomery ladder for scalar multiplication takes the same number of steps regardless of the scalar value. This is critical in a mix node that processes packets from potentially adversarial senders: any timing variation in the ECDH computation could leak information about the node's private key.

Speed. X25519 is one of the fastest elliptic curve operations available. On modern hardware, a single key exchange takes ~36 microseconds in our Rust implementation. For a mix node processing thousands of packets per second, this matters.

Small keys. 32 bytes for a public key means the group element fits cleanly in the Sphinx header. Compare this with post-quantum alternatives where public keys can be 800+ bytes (ML-KEM-768: 1,184 bytes) or even larger (ML-KEM-1024: 1,568 bytes). In a Sphinx header that needs to carry multiple group elements' worth of data across several hops, key size directly impacts header overhead.

Wide deployment. X25519 is used in TLS 1.3, Signal, WireGuard, SSH (since OpenSSH 6.5), and every major cryptographic library. The implementation is battle-tested across billions of connections. More importantly for our use case, the curve25519-dalek Rust library is one of the most audited and fuzzed elliptic curve implementations in the open-source ecosystem: it's the same library used by the Solana blockchain for signature verification under load far exceeding anything a mixnet will encounter.

No cofactor pitfalls. X25519's design (Bernstein, 2006 [14]) explicitly handles small-subgroup attacks by clamping the scalar's low bits, and Curve25519's twist security means even inputs that aren't points on the curve cause no harm. Our Sphinx implementation doesn't need separate checks for invalid curve points; X25519 handles it at the primitive level. One less thing to get wrong.

From Shared Secret to Symmetric Keys

From each shared secret, Alice derives four symmetric keys using tagged SHA-256 hashing:

rho  = SHA256("rho"  || shared_secret)    // routing info encryption key
mu   = SHA256("mu"   || shared_secret)    // HMAC key for integrity
pi   = SHA256("pi"   || shared_secret)    // payload encryption key
blind = SHA256("blind" || shared_secret)   // blinding factor (as Scalar)

Four calls to SHA-256, each with a unique domain separator prefix. This is essentially a poor person's KDF: not quite HKDF [15], but the domain separation provides the key independence guarantee we need. Each key is used for exactly one purpose, and knowing one key doesn't reveal the others (assuming SHA-256 behaves as a random oracle, which is the standard assumption).

In our implementation (derive_keys() in nox-crypto/src/sphinx/mod.rs), the blinding factor gets an additional step: the 32-byte SHA-256 output is reduced modulo the Curve25519 group order via Scalar::from_bytes_mod_order(). This ensures the result is a valid scalar for the elliptic curve multiplication.
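Here's a hypothetical Python sketch of that derivation. It is not the actual derive_keys() source; the little-endian reduction is my assumption, mirroring the convention of curve25519-dalek's Scalar::from_bytes_mod_order:

```python
import hashlib

# Curve25519 group order: the L that Scalar::from_bytes_mod_order reduces by.
L = 2**252 + 27742317777372353535851937790883648493

def derive_keys(shared_secret: bytes):
    # Four domain-separated SHA-256 calls, one key per purpose.
    rho = hashlib.sha256(b"rho" + shared_secret).digest()    # routing encryption
    mu  = hashlib.sha256(b"mu"  + shared_secret).digest()    # HMAC key
    pi  = hashlib.sha256(b"pi"  + shared_secret).digest()    # payload encryption
    # Blinding factor: hash output reduced mod L so it is a valid scalar.
    blind = int.from_bytes(
        hashlib.sha256(b"blind" + shared_secret).digest(), "little") % L
    return rho, mu, pi, blind

rho, mu, pi, blind = derive_keys(bytes(32))
assert len({rho, mu, pi}) == 3    # domain separation yields independent keys
assert 0 <= blind < L             # a valid Curve25519 scalar
```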

This is the mechanism that Scherer et al. [2] showed requires GDH rather than DDH. The blinding step alpha_{i+1} = alpha_i * b_i creates a relationship between consecutive group elements that a DDH adversary with oracle access could potentially detect. Under GDH, this relationship remains hidden.


A Worked Example: One Packet, Three Hops

Let me trace a concrete Sphinx packet through three hops. I'll use abbreviated hex values to keep things readable, but these are realistic representations of what the actual bytes look like.

Setup

Alice wants to send the message "SWAP 1.5 ETH → USDC" through three mix nodes.

Node keys (X25519):
  M1: sk_1 = 3a8f...c721,  pk_1 = 9b4e...d832
  M2: sk_2 = 7c2d...a156,  pk_2 = 2f71...e944
  M3: sk_3 = 1b5e...f387,  pk_3 = e8a3...1b76
 
Alice generates:
  ephemeral secret:  x     = 4d91...b2e3
  ephemeral public:  alpha_0 = g * x = 5c7a...f128

Alice Computes All Shared Secrets

Step 1: shared_secret_1 = pk_1 * x = 8e3c...a7f1
        b_1 = SHA256("blind" || 8e3c...a7f1) mod L = 2b5d...c043
        rho_1 = SHA256("rho" || 8e3c...a7f1) = a1f2...3e89
        mu_1  = SHA256("mu"  || 8e3c...a7f1) = 7c9a...d412
        pi_1  = SHA256("pi"  || 8e3c...a7f1) = b5e3...8a71
 
Step 2: alpha_1 = alpha_0 * b_1 = 3d9f...e256  (this is what M2 will see)
        shared_secret_2 = pk_2 * (x * b_1) = f471...2b83
        b_2 = SHA256("blind" || f471...2b83) mod L = 9c4a...e871
        rho_2 = SHA256("rho" || f471...2b83) = c3d7...1a45
        mu_2  = SHA256("mu"  || f471...2b83) = 5e82...f937
        pi_2  = SHA256("pi"  || f471...2b83) = 1a8c...4e92
 
Step 3: alpha_2 = alpha_1 * b_2 = a27b...1c94  (this is what M3 will see)
        shared_secret_3 = pk_3 * (x * b_1 * b_2) = 6da2...9e15
        rho_3 = SHA256("rho" || 6da2...9e15) = d9f1...7b23
        mu_3  = SHA256("mu"  || 6da2...9e15) = 3a6e...c548
        pi_3  = SHA256("pi"  || 6da2...9e15) = 8e4b...2d16

Notice: alpha_0, alpha_1, and alpha_2 look completely unrelated. 5c7a...f128, 3d9f...e256, a27b...1c94. Without the blinding factors (which require the shared secrets, which require the private keys), there is no way to determine that these three group elements are related.

Alice Builds the Routing Info (Backwards)

Alice constructs the 400-byte routing info section from the inside out last hop first.

Layer 3 (M3 = exit node):
  routing_cmd_3 = [0x01]                         // "You're the exit"
  plaintext_3 = routing_cmd_3 || zeros(399)       // Pad to 400 bytes
  routing_3 = Encrypt(rho_3, plaintext_3)         // ChaCha20 with key rho_3
 
Layer 2 (M2 = relay):
  routing_cmd_2 = [0x00]                          // "Forward"
            || [addr_len]                          // Length of M3's address
            || mac_3                               // The MAC M3 will verify
            || addr_3                              // M3's network address
  plaintext_2 = routing_cmd_2 || routing_3[0..272] // Prepend to inner layers
  routing_2 = Encrypt(rho_2, plaintext_2)          // ChaCha20 with key rho_2
 
Layer 1 (M1 = entry):
  routing_cmd_1 = [0x00]                          // "Forward"
            || [addr_len]                          // Length of M2's address
            || mac_2                               // The MAC M2 will verify
            || addr_2                              // M2's network address
  plaintext_1 = routing_cmd_1 || routing_2[0..272] // Prepend to inner layers
  routing_1 = Encrypt(rho_1, plaintext_1)          // ChaCha20 with key rho_1

Alice Builds the Payload

payload = "SWAP 1.5 ETH → USDC" || padding      // Pad to ~32,296 bytes
encrypted_payload = Encrypt(pi_1, Encrypt(pi_2, Encrypt(pi_3, payload)))

Three layers of ChaCha20, applied inside-out: Alice applies pi_3 first (innermost), then pi_2, then pi_1 (outermost), because she's building the onion from the inside out. When M1 decrypts with pi_1, it removes the outermost layer, revealing the pi_2 layer. M2 decrypts with pi_2, revealing the pi_3 layer. M3 decrypts with pi_3, revealing the plaintext.
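Because ChaCha20 encryption is just XOR with a keystream, the wrap-and-peel structure is easy to demonstrate with a toy keystream. In this sketch, SHA-256 in counter mode stands in for ChaCha20 and the pi keys are placeholder byte strings:

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    # Toy SHA-256 counter-mode PRG standing in for the ChaCha20 keystream.
    out, ctr = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:length]

def xor_crypt(key: bytes, data: bytes) -> bytes:
    # Stream-cipher encrypt/decrypt: XOR with the keystream (self-inverse).
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

pi_keys = [b"pi_1", b"pi_2", b"pi_3"]     # toy per-hop payload keys
plaintext = b"SWAP 1.5 ETH -> USDC"

# Alice wraps inside-out: pi_3 first (innermost), pi_1 last (outermost).
payload = plaintext
for key in reversed(pi_keys):
    payload = xor_crypt(key, payload)

# Each hop peels exactly one layer, in path order: M1, then M2, then M3.
for key in pi_keys:
    payload = xor_crypt(key, payload)

assert payload == plaintext               # M3 recovers the message
```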

The Assembled Packet

PACKET (32,768 bytes):
  [ alpha_0 (32 bytes) ][ routing_1 (400 bytes) ][ mac_1 (32 bytes) ][ pow_nonce (8 bytes) ][ encrypted_payload (~32,296 bytes) ]
    5c7a...f128            a8b2...encrypted...       9e1d...f743          0000...0042            c3f7...encrypted...

Hop 1: M1 Processes the Packet

M1 receives the 32KB packet.

1. Extract alpha_0 = 5c7a...f128
 
2. ECDH: shared_secret = alpha_0 * sk_1 = 8e3c...a7f1
   (Same value Alice computed!)
 
3. Derive keys:
   rho_1 = a1f2...3e89,  mu_1 = 7c9a...d412,  pi_1 = b5e3...8a71,  b_1 = 2b5d...c043
 
4. Verify MAC: HMAC-SHA256(mu_1, routing_info) == mac_1?
   ✓ YES → continue
 
5. Decrypt routing info with rho_1:
   First 128 bytes reveal: [0x00][len][mac_2][M2's address]
   → "Forward this to M2"
 
6. Shift routing info:
   Remove first 128 bytes, append 128 bytes of PRG padding
   (derived from rho_1 keystream, starting at offset 400)
 
7. Blind the group element:
   alpha_1 = alpha_0 * b_1 = 3d9f...e256
 
8. Decrypt one payload layer:
   ChaCha20_Decrypt(pi_1, encrypted_payload) → still encrypted (2 layers remain)
 
9. Assemble forwarded packet:
   [ alpha_1 ][ shifted_routing ][ mac_2 ][ 0 ][ partially_decrypted_payload ]

The packet leaving M1 is still 32,768 bytes. But every single byte is different. Different group element. Different routing info. Different MAC. Different payload ciphertext. Same size.

Hop 2: M2 Processes the Packet

1. Extract alpha_1 = 3d9f...e256  (completely different from alpha_0!)
 
2. ECDH: shared_secret = alpha_1 * sk_2 = f471...2b83
 
3. Derive keys, verify MAC ✓, decrypt routing → "Forward to M3"
 
4. Shift routing, blind: alpha_2 = alpha_1 * b_2 = a27b...1c94
 
5. Decrypt payload layer with pi_2
 
6. Forward to M3

Hop 3: M3 Processes the Packet (Exit)

1. Extract alpha_2 = a27b...1c94
 
2. ECDH: shared_secret = alpha_2 * sk_3 = 6da2...9e15
 
3. Derive keys, verify MAC ✓, decrypt routing → [0x01] = "You're the exit"
 
4. Decrypt final payload layer with pi_3:
   → "SWAP 1.5 ETH → USDC"
 
5. Execute the requested operation

What an Observer Sees

Let me make the unlinkability concrete with a side-by-side comparison of the three versions of this packet as seen by Alice, by M1→M2, and by M2→M3:

+---------------------------------------------------------------------+
|              The Same Packet at Three Points in the Network          |
+---------------------------------------------------------------------+
 
Alice → M1:
  Group Element:  5c7a 2e91 b3f4 d786 ... f128    (32 bytes)
  Routing Info:   a8b2 17c3 e94f 5d21 ... 6fa0    (400 bytes)
  MAC:            9e1d 4a73 b258 cf96 ... f743    (32 bytes)
  PoW Nonce:      0000 0000 0000 0042              (8 bytes)
  Payload:        c3f7 91a2 4b85 de63 ... 8d1e    (32,296 bytes)
 
M1 → M2:
  Group Element:  3d9f 8b14 a672 e5c3 ... e256    (32 bytes)  ← DIFFERENT
  Routing Info:   f1c4 92d8 7ea5 b310 ... 2cb7    (400 bytes) ← DIFFERENT
  MAC:            5a82 e6f1 c934 71d8 ... 4b29    (32 bytes)  ← DIFFERENT
  PoW Nonce:      0000 0000 0000 0000              (8 bytes)  ← DIFFERENT
  Payload:        87e2 3c5f a149 d0b6 ... 72a4    (32,296 bytes) ← DIFFERENT
 
M2 → M3:
  Group Element:  a27b 45d8 3c91 f6e0 ... 1c94    (32 bytes)  ← DIFFERENT
  Routing Info:   28e7 b5a3 d14c 8f69 ... e8d1    (400 bytes) ← DIFFERENT
  MAC:            c46f 1d98 a572 3be0 ... 917c    (32 bytes)  ← DIFFERENT
  PoW Nonce:      0000 0000 0000 0000              (8 bytes)
  Payload:        4b91 d7e2 8c35 fa60 ... 5de3    (32,296 bytes) ← DIFFERENT

Every single field is different at every hop. The total packet size is always 32,768 bytes. There is no structural fingerprint, no recurring pattern, no field that an adversary could use to correlate across hops.

Now imagine an adversary with a global view: they can see every packet on every link simultaneously. They see 32KB chunks of random-looking data flowing between mix nodes. Without the private keys of the mix nodes, they cannot determine which outgoing packet corresponds to which incoming packet at any node. This is what "bitwise unlinkability" means, and it's the core guarantee Sphinx provides.

To put a number on it: with 10 nodes per layer and 1,000 packets in transit, the adversary's best strategy is random guessing. The probability of correctly linking a packet across all 3 hops is (1/1000) × (1/1000) = 10^-6. Compare this to naive onion routing without constant-size packets, where packet-size correlation gives near-certain linkability, effectively P ≈ 1.


Step 3: Building the Routing Information (The Filler Trick)

The routing info section is 400 bytes: 128 bytes × 3 hops, plus filler. Alice builds it backwards, starting from the last hop. We covered the high-level flow above, but the filler mechanism deserves its own explanation because it's one of the subtle parts of Sphinx that everyone gets wrong the first time.

Here's the problem. When M1 strips its 128-byte routing command and decrypts, the remaining routing info would be 128 bytes shorter. Alice pads it with 128 bytes of pseudorandom data (derived from the keystream of rho_1 at offsets 400–528). Now the routing info is back to 400 bytes. So far so good.

But wait: when M2 decrypts its layer, those padding bytes from M1 get decrypted too. If they don't decrypt to the right thing, the packet is corrupted. Specifically, the padding from M1 needs to decrypt under M2's key to... more valid padding. And M2's padding needs to decrypt under M3's key correctly.

This is the "filler" computation. Alice pre-computes what the padding bytes will look like after each successive decryption, and she writes the correct values into the exit hop's routing layer so that everything lines up. The filler is computed iteratively:

filler = []
for i in range(num_hops - 1):
    keystream = ChaCha20_KeyStream(rho[i], length=ROUTING_INFO_SIZE + SHIFT_SIZE)
    # XOR existing filler with the keystream at the right offset
    for j in range(len(filler)):
        filler[j] ^= keystream[ROUTING_INFO_SIZE - len(filler) + j]
    # Extend filler with the new padding bytes
    filler.extend(keystream[ROUTING_INFO_SIZE : ROUTING_INFO_SIZE + SHIFT_SIZE])

The result is a filler blob that, when placed at the end of the exit hop's routing layer, will survive all the intermediate decryptions and produce valid padding bytes at each step.

This is not intuitive. I had to draw it out on paper three times before it clicked. The key insight is that each encryption/decryption operation is just XOR with a keystream (ChaCha20 is a stream cipher), so Alice can "pre-cancel" the XOR operations that will happen at each hop by building the filler in the same order that the hops will decrypt it.

If the filler is wrong by even one byte, the MAC at the next hop will fail and the packet will be dropped. This is by design: it's part of the integrity chain that prevents any unauthorized modification from surviving.
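To see the pre-cancellation in action, here is a self-contained toy model of the whole round trip. A hash-based XOF stands in for the ChaCha20 keystream, and R, S, N, and the hop keys are illustrative values, not NOX's real parameters:

```python
# Toy Sphinx routing-info construction: filler trick end to end.
from hashlib import shake_256

R, S, N = 40, 8, 3                               # routing size, per-hop shift, hops
rho = [f"rho-{i}".encode() for i in range(N)]    # stand-in per-hop keys

def prg(key: bytes, n: int) -> bytes:
    return shake_256(key).digest(n)              # prefix-consistent keystream

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))    # truncates to the shorter input

# --- Alice: iterative filler computation (mirrors the loop above) ---
filler = b""
for i in range(N - 1):
    ks = prg(rho[i], R + S)
    filler = xor(filler + bytes(S), ks[R - len(filler):])

# --- Alice: build routing info backwards, filler at the end of the exit layer ---
commands = [bytes([i + 1]) * S for i in range(N)]          # dummy per-hop commands
plain_last = commands[-1] + bytes(R - (N - 1) * S - S)     # command + zero pad
beta = xor(plain_last, prg(rho[-1], R)) + filler
for i in reversed(range(N - 1)):
    beta = xor(commands[i] + beta[: R - S], prg(rho[i], R))

# --- Mix nodes: each hop extends with zeros, XORs its keystream, shifts left ---
recovered = []
for i in range(N):
    dec = xor(beta + bytes(S), prg(rho[i], R + S))
    recovered.append(dec[:S])                    # this hop's routing command
    beta = dec[S:]                               # shifted routing info for next hop

assert recovered == commands                     # every hop sees exactly its command
```

Each hop recovers exactly its own command; the zero bytes appended at each hop decrypt to the next layer's forced tail because Alice pre-cancelled them in the filler.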


Step 4: Building the Payload

The payload construction is comparatively simple. Alice writes her message, pads it to the fixed payload size using ISO/IEC 7816-4 padding (append 0x80, fill the remaining bytes with 0x00), and then encrypts it in layers: innermost layer (last hop's key) first, outermost layer (first hop's key) last.

plaintext = "SWAP 1.5 ETH → USDC"
padded = plaintext || 0x80 || 0x00...0x00    (total: ~32,296 bytes)
 
layer_3 = ChaCha20_Encrypt(pi_3, padded)
layer_2 = ChaCha20_Encrypt(pi_2, layer_3)
layer_1 = ChaCha20_Encrypt(pi_1, layer_2)

Each hop decrypts one layer. M1 applies ChaCha20_Decrypt(pi_1, ...) to get layer_2. M2 applies pi_2 to get layer_3. M3 applies pi_3 to get the padded plaintext.
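The ISO/IEC 7816-4 padding step can be sketched in a few lines; BODY_SIZE here is an illustrative stand-in for the real ~32KB payload size:

```python
# Minimal ISO/IEC 7816-4 pad/unpad sketch (illustrative sizes).
BODY_SIZE = 64

def pad(msg: bytes, size: int = BODY_SIZE) -> bytes:
    if len(msg) >= size:
        raise ValueError("message too large for one payload")
    return msg + b"\x80" + bytes(size - len(msg) - 1)   # 0x80, then 0x00 fill

def unpad(padded: bytes) -> bytes:
    body = padded.rstrip(b"\x00")       # strip the zero fill
    if not body.endswith(b"\x80"):
        raise ValueError("invalid padding")
    return body[:-1]                    # drop the 0x80 terminator

msg = b"SWAP 1.5 ETH -> USDC"
assert unpad(pad(msg)) == msg
assert len(pad(msg)) == BODY_SIZE
```

Stripping trailing zeros and then requiring a 0x80 terminator makes unpadding unambiguous even when the message itself ends in 0x00 or 0x80 bytes.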

In the original Sphinx spec, each layer uses a Strong Pseudo-Random Permutation (SPRP), like LIONESS [16]. An SPRP has an avalanche property: any modification to the ciphertext scrambles the entire plaintext unpredictably on decryption. This means if an adversary modifies even one bit of the encrypted payload, the decrypted result is garbage, and the authentication tag won't match. That's how Sphinx detects tagging attacks on the payload.

We use ChaCha20 instead of LIONESS. This is a deliberate tradeoff. ChaCha20 is significantly faster (it's a stream cipher), but it doesn't have the avalanche property. A bit flip in the ciphertext produces a predictable bit flip in the plaintext. This makes our payload layer vulnerable to tagging attacks. We know this.

This is the same tradeoff the Lightning Network made [5]. BOLT-04's modified Sphinx uses ChaCha20-Poly1305 for per-hop encryption instead of LIONESS. In their case, the rationale is identical: LIONESS operates on the entire payload as a single wide-block cipher, which means the entire payload must be in memory and processed as a unit. For our 32KB payloads, LIONESS would process all 32,768 bytes as a single wide-block permutation per hop: that's 98KB of LIONESS computation per packet (3 hops), compared to 98KB of ChaCha20 keystream (orders of magnitude faster). When you're processing thousands of packets per second, this matters.

Our mitigation: the header integrity chain (per-hop HMACs) catches modifications to the routing information, and the mixing layer's Poisson delays prevent timing-based correlation. Payload tagging remains a known limitation documented in our threat model. (More on this in Part 6, where we talk honestly about what we haven't built yet.)


Step 5: Computing the MACs

The final piece is the MAC (Message Authentication Code). Each hop needs to verify that the packet hasn't been tampered with. Alice computes an HMAC-SHA256 for each hop, covering the routing information as it will appear at that hop.

mac_3 = HMAC-SHA256(mu_3, routing_info_as_seen_by_M3)
mac_2 = HMAC-SHA256(mu_2, routing_info_as_seen_by_M2)
mac_1 = HMAC-SHA256(mu_1, routing_info_as_seen_by_M1)

The MAC for the first hop goes into the packet header. The MACs for subsequent hops are embedded inside the encrypted routing commands: each hop's routing command includes "forward to next node, and the MAC the next node should check is [mac_{i+1}]."

This chaining is critical. If an adversary modifies any part of the header between hops, the MAC verification will fail at the next honest node, and the packet gets dropped. No modified packets make it through the network: they're detected and destroyed.

The Tagging Attack and Why HMACs Matter

Let me expand on why this is so important. A tagging attack works like this:

TAGGING ATTACK (without per-hop MACs):
 
1. Adversary controls (or observes) M1 and M3
2. Packet enters M1 with payload P
3. Adversary at M1 flips bit 42 of the payload
4. Modified packet travels through M2 (honest node, doesn't detect the flip)
5. Packet arrives at M3
6. Adversary at M3 checks: "is bit 42 of the decrypted payload flipped?"
7. If yes: this is the same packet I tagged at M1 → I've linked the two hops

The adversary doesn't need to break any encryption. They don't need to read the packet. They just need to mark it and recognize the mark later. With a stream cipher like ChaCha20, a bit flip in the ciphertext produces a predictable bit flip in the plaintext, so the tag survives all intermediate decryption steps.
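You can watch the tag survive in a toy model of the layered stream cipher; hash-based keystreams stand in for ChaCha20, and the keys and payload are illustrative:

```python
# Sketch: a ciphertext bit flip survives layered stream-cipher decryption.
from hashlib import shake_256

def ks(key: bytes, n: int) -> bytes:
    return shake_256(key).digest(n)

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

payload = b"SWAP 1.5 ETH -> USDC" + bytes(12)   # 32 bytes, illustrative
keys = [b"pi-1", b"pi-2", b"pi-3"]              # stand-in layer keys

# Alice encrypts innermost-first (pi_3, then pi_2, then pi_1)
ct = payload
for k in reversed(keys):
    ct = xor(ct, ks(k, len(ct)))

# Adversary at M1 flips bit 42 of the ciphertext
tagged = bytearray(ct)
tagged[42 // 8] ^= 1 << (42 % 8)

# Each hop strips one layer (pi_1, pi_2, pi_3)
pt = bytes(tagged)
for k in keys:
    pt = xor(pt, ks(k, len(pt)))

# Exactly one bit differs from the original payload: the tag survived
diff = [i for i in range(len(pt) * 8)
        if (pt[i // 8] ^ payload[i // 8]) >> (i % 8) & 1]
assert diff == [42]
```

Because XOR layers commute, every intermediate decryption passes the flipped bit through untouched, which is exactly what the adversary at M3 looks for.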

Sphinx's per-hop MACs prevent this for the header:

WITH per-hop MACs:
 
1. Adversary at M1 flips bit 42 of the routing info
2. Packet arrives at M2
3. M2 computes HMAC-SHA256(mu_2, modified_routing_info)
4. MAC does NOT match the expected value embedded in M1's routing command
5. M2 drops the packet immediately
 
Result: the tagged packet never reaches M3. Attack fails.

Why HMAC-SHA256 specifically, and not just a plain hash? Two reasons:

  1. Key-dependent. Each hop's MAC key mu_i is derived from the shared secret between Alice and that hop. An adversary who doesn't know the shared secret can't compute a valid MAC for the modified routing info. If we used a plain hash, the adversary could modify the data, recompute the hash, and replace it.

  2. Resistance to length extension. HMAC's nested construction (H(K ⊕ opad || H(K ⊕ ipad || message))) prevents length-extension attacks that plague naive constructions like H(key || message). With SHA-256, a length-extension attack would let an adversary who sees H(key || message) compute H(key || message || attacker_suffix) without knowing the key. HMAC prevents this.

The MAC verification is constant-time in our implementation (using the subtle crate's ct_eq), preventing timing-based oracle attacks where an adversary could learn partial MAC values by measuring how long verification takes.
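The same check can be sketched with Python's standard library, where hmac.compare_digest plays the role of the subtle crate's ct_eq; the key and routing info here are stand-ins:

```python
# Sketch of the per-hop MAC check with constant-time comparison.
import hmac
import hashlib

mu_2 = bytes(32)              # stand-in for the derived MAC key mu_2
routing_info = b"\x00" * 400  # stand-in routing info

mac = hmac.new(mu_2, routing_info, hashlib.sha256).digest()

# An adversary flips a bit of the routing info in transit...
tampered = bytearray(routing_info)
tampered[5] ^= 1

# ...so the recomputed MAC no longer matches, and M2 drops the packet.
recomputed = hmac.new(mu_2, bytes(tampered), hashlib.sha256).digest()
assert not hmac.compare_digest(mac, recomputed)

# An unmodified packet verifies cleanly.
assert hmac.compare_digest(
    mac, hmac.new(mu_2, routing_info, hashlib.sha256).digest())
```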

MAC Verification in code (nox-crypto/src/sphinx/mod.rs):
 
let calculated_mac = compute_mac(&mu, &self.routing_info);
if calculated_mac.ct_eq(&self.mac).unwrap_u8() == 0 {
    return Err(SphinxError::MacMismatch);
}

Note: the payload is NOT covered by the per-hop MAC. This is by design: the payload is too large (32KB) for the MAC to cover efficiently at each hop, and the payload encryption uses ChaCha20 (a stream cipher without built-in integrity). This is the gap that allows payload tagging attacks. LIONESS would close this gap, at the cost of performance.


How a Mix Node Processes a Packet

Now let's look at the processing path in detail. A Sphinx packet arrives at M2. Here's exactly what happens, operation by operation:

 1. Extract the group element alpha_1 from the header (32 bytes)
 
 2. ECDH: shared_secret = sk_2 * alpha_1                       [35.8 µs]
    → X25519 scalar multiplication, constant-time
 
 3. Derive 4 keys from shared_secret via tagged SHA-256:        [0.49 µs]
    rho = SHA256("rho" || shared_secret)
    mu  = SHA256("mu"  || shared_secret)
    pi  = SHA256("pi"  || shared_secret)
    blind = SHA256("blind" || shared_secret) as Scalar
 
 4. Verify MAC: HMAC-SHA256(mu, routing_info) == header.mac?    [0.82 µs]
    → Constant-time comparison via subtle::ct_eq
    → If MISMATCH: DROP packet immediately. Done.
    → No error response. No logging of packet contents.
    → Silent, unconditional death.
 
 5. Decrypt routing info using ChaCha20(rho, nonce=0):          [0.59 µs]
    → Extends routing_info to 528 bytes (400 + 128)
    → Decrypts all 528 bytes with the rho keystream
    → First 128 bytes: this hop's routing command
    → Bytes 128-528: the shifted routing info for the next hop
 
 6. Parse routing command (first byte):
    → 0x00 = "Forward": extract next_mac, next_hop_address
    → 0x01 = "Exit": this is the destination, extract payload
 
 7. Construct new routing info (400 bytes):
    → Take bytes 128-528 from the decrypted extended buffer
    → This automatically shifts the inner layers left
      and fills the end with PRG-derived padding
 
 8. Blind the group element:                                    [22.9 µs]
    → alpha_2 = alpha_1 * blind
    → Curve25519 scalar multiplication, constant-time
    → Result: a group element that looks completely unrelated to alpha_1
 
 9. Decrypt one layer of payload: ChaCha20(pi, nonce=0)         [0.53 µs]
    → Stream cipher XOR over ~32,296 bytes of payload
    → Removes the outermost encryption layer
 
10. Assemble forwarded packet:
    → New header: [alpha_2][new_routing_info][next_mac][0]
    → Same-size payload with one fewer encryption layer
 
11. Replay detection: compute tag = Blake3(ephemeral_key || mac || nonce)
    → Check against Bloom filter of seen tags
    → If SEEN: DROP. Replay attack detected.
    → If new: insert tag into Bloom filter
 
12. Queue for mixing:
    → Draw random delay from Exponential(lambda) distribution
    → Insert into priority queue with timestamp = now + delay
    → Packet sits in memory until its time comes
 
13. When delay expires: forward to next_hop_address
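The replay check in step 11 can be sketched with a toy in-memory Bloom filter; BLAKE2b stands in for Blake3 here, and the filter size and hash count are illustrative (a production filter would be sized for the expected packet volume per key epoch):

```python
# Toy Bloom-filter replay detection for step 11.
import hashlib

class BloomFilter:
    def __init__(self, bits: int = 1 << 16, k: int = 4):
        self.bits, self.k = bits, k
        self.array = bytearray(bits // 8)

    def _positions(self, tag: bytes):
        # k independent positions via BLAKE2b personalization
        for i in range(self.k):
            h = hashlib.blake2b(tag, person=i.to_bytes(16, "big")).digest()
            yield int.from_bytes(h[:8], "big") % self.bits

    def seen_then_insert(self, tag: bytes) -> bool:
        seen = all(self.array[p // 8] >> (p % 8) & 1
                   for p in self._positions(tag))
        for p in self._positions(tag):
            self.array[p // 8] |= 1 << (p % 8)
        return seen

bf = BloomFilter()
tag = hashlib.blake2b(b"ephemeral||mac||nonce").digest()  # stand-in tag
assert not bf.seen_then_insert(tag)   # first time: new, forward it
assert bf.seen_then_insert(tag)       # second time: replay, drop
```

The usual Bloom-filter caveat applies: false positives (dropping a fresh packet) are possible but tunable via the bit count and hash count, while false negatives (missing a replay) cannot occur.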

After this, M2 has produced a packet that:

  • Has a different group element than the one it received
  • Has different routing info (one layer stripped, padding added)
  • Has a different MAC
  • Has a different payload ciphertext (one layer removed)
  • Is exactly the same size as the incoming packet

The timings above come from our Criterion microbenchmarks (more detail in the Performance section). The total: ~61.6 µs per hop in integration, ~30 µs for the pure crypto path. The bottleneck is the elliptic curve operations (ECDH + blinding), which together account for 95.4% of the processing time.
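The tagged-hash key derivation from step 3 is simple enough to sketch directly; the labels match the list above, and the all-zero shared secret is just a stand-in for a real X25519 output:

```python
# Sketch of the tagged SHA-256 key derivation (step 3).
import hashlib

def derive_keys(shared_secret: bytes) -> dict:
    # Each key is SHA256(label || shared_secret), as in the step list
    return {label: hashlib.sha256(label.encode() + shared_secret).digest()
            for label in ("rho", "mu", "pi", "blind")}

keys = derive_keys(bytes(32))
assert set(keys) == {"rho", "mu", "pi", "blind"}
assert all(len(v) == 32 for v in keys.values())   # four 32-byte keys
```

Domain-separating the four keys with distinct labels ensures that, say, the routing keystream key can never collide with the MAC key even though both derive from the same shared secret.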


The Cover Traffic Protocol

Here's a question that trips up almost every engineer who encounters mixnets for the first time: if we can correlate packets by timing (Alice sends a packet, and 3 ms later a packet emerges from the exit node), doesn't that break all our Sphinx guarantees?

Yes. Absolutely yes. Sphinx solves the packet content linkability problem. But without cover traffic, timing analysis trivially de-anonymizes everything.

The solution, formalized in the Loopix protocol [8], is to flood the network with fake packets (cover traffic) that are cryptographically indistinguishable from real packets. Loopix defines three types:

λ_P: Loop Traffic (Client Self-Loops)

Each client sends packets to itself through the full mixnet path. The packet goes Alice → M1 → M2 → M3 → Alice. From any external observer's perspective, Alice is sending traffic. They can't tell whether it's a real transaction or a self-loop.

Why it works: λ_P packets use the exact same Sphinx format (same 32KB size, same header structure, same PoW nonce). A mix node processing a λ_P packet can't distinguish it from a real forward packet because the Sphinx processing is identical. The only entity that knows it's cover traffic is Alice (because the exit node delivers it back to her, and she recognizes the SURB).

Rate: Clients emit packets at a constant Poisson rate regardless of whether they're doing real transactions; a real message simply takes a slot that would otherwise carry a λ_P loop packet. If Alice's total rate is 11 packets per minute, an observer sees 11 packets/minute at all times: 10 λ_P packets and 1 real one when she's transacting, 11 λ_P packets when she's idle. The observer sees the same rate either way.

λ_L: Loop Traffic (Mix Node Self-Loops)

Mix nodes also send cover traffic to themselves through other nodes. This ensures that the link traffic volume between any two mix nodes stays approximately constant. Without λ_L, an adversary could observe that traffic between M1 and M2 increases when Alice sends a real packet, and decreases when she stops.

λ_D: Drop Traffic (Dummy Messages)

Mix nodes generate packets addressed to a special "drop address." These packets traverse the full mixnet path and are silently discarded at the exit. They exist to keep exit node traffic volume constant; without them, an observer watching exit nodes could count the real messages by subtracting the loop traffic.

The Indistinguishability Requirement

The critical property is that all three types of cover traffic MUST be indistinguishable from real packets at the Sphinx level. This means:

  • Same 32KB packet size
  • Same header structure (group element, routing info, MAC, PoW nonce)
  • Same processing path through mix nodes
  • Same timing distribution (Poisson delays)
  • Random payload content (indistinguishable from encrypted real content)

If cover traffic were even slightly distinguishable (say, if cover packets used a different padding scheme, or if the PoW nonce were always zero), an adversary could filter out all cover traffic and trace real packets trivially. The Sphinx packet format's uniformity is what makes cover traffic possible.

This is why we invest so much effort in making every Sphinx packet bitwise identical in structure regardless of its purpose. A 32KB packet containing a $10M DeFi trade looks exactly like a 32KB packet containing random bytes destined for a drop address. The mix node can't tell. The ISP can't tell. The NSA can't tell. That's the point.

The Formal Model

Piotrowska et al. [8] formalized the cover traffic rates as Poisson processes. Each cover traffic type has its own rate parameter:

  • λ_P (client loops): Rate at which clients generate loop cover traffic. Higher λ_P means better resistance to traffic analysis, but more bandwidth cost for clients.
  • λ_L (mix loops): Rate at which mix nodes generate loop traffic. This must be high enough to maintain approximately constant traffic volume on inter-node links.
  • λ_D (drop traffic): Rate at which mix nodes generate drop traffic. This smooths the output rate of exit nodes.

The Loopix paper proves that under these Poisson cover traffic rates, the system provides (ε, δ)-sender/receiver unlinkability, where ε and δ are functions of the cover traffic rates relative to real traffic rates. Intuitively: if λ_P is 10x the real message rate, the adversary's advantage is roughly 1/10: they can narrow it down to "one of these 10 packets is real" but can't identify which one.

For DeFi, we tune these rates based on expected transaction volume. In a network processing 100 real transactions per minute, we might set λ_P to produce 1,000 cover packets per minute, giving a 10:1 ratio of cover to real traffic. The bandwidth cost (1,000 × 32KB/min = 32 MB/min per client) is non-trivial but acceptable for high-value DeFi transactions.
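A cover-traffic scheduler ultimately boils down to sampling a Poisson process, i.e. exponential inter-send gaps. A minimal sketch, assuming rates in packets per minute (the 10/minute figure echoes the ratio discussed above; the seed is just for reproducibility):

```python
# Sketch: sampling send times for Poisson cover traffic.
import random

def send_times(rate_per_min: float, horizon_min: float, seed: int = 1):
    """Poisson process on [0, horizon): exponential inter-arrival gaps."""
    random.seed(seed)
    t, out = 0.0, []
    while True:
        t += random.expovariate(rate_per_min)   # mean gap = 1/rate
        if t > horizon_min:
            return out
        out.append(t)

times = send_times(10.0, 60.0)   # ~600 cover sends over an hour
assert all(0 < a < b for a, b in zip(times, times[1:]))
```

The memorylessness of the exponential gaps is what matters for anonymity: an observer learns nothing about whether the next send carries a real message from when the previous one went out.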

Cover Traffic and the "Anonymity Trilemma"

Das et al. [18] identified what they call the "anonymity trilemma": any anonymous communication system can simultaneously achieve at most two of three properties: (1) strong anonymity, (2) low bandwidth overhead, and (3) low latency. You can have fast and private (but expensive in bandwidth), cheap and private (but slow), or fast and cheap (but not private).

Sphinx's fixed-size packets address the content linkability dimension. Cover traffic addresses the timing dimension. But cover traffic is expensive; it's where the bandwidth leg of the trilemma bites. Our 32KB packet size makes the bite harder than smaller-packet systems, because each cover packet is 16x larger. This is one of the genuine costs of our 32KB decision, and it's a tradeoff we accept because the alternative (fragmenting DeFi payloads) is worse for privacy.


Sphinx Across the Field: How Other Systems Do It

Sphinx is the dominant packet format for mix networks, but it is not the only one. Four alternatives illuminate the design space and the tradeoffs Sphinx makes.

cMix: Eliminating Real-Time Public-Key Operations

Chaum et al. [6] took a radically different approach with cMix, the protocol underlying the xx Network. Where Sphinx performs an ECDH key exchange at every hop in real time, cMix splits processing into two phases: an offline precomputation phase where nodes collectively prepare cryptographic materials for a batch of messages, and an online phase where the actual message mixing uses only fast symmetric operations (modular multiplications).

The result is striking: cMix's online phase requires zero public-key operations per message, achieving 2-3 second end-to-end delivery in production with 350+ nodes. The cost is architectural rigidity: cMix uses fixed cascades (every message passes through the same ordered set of nodes), not the flexible stratified topology that Sphinx-based systems use.

Fixed cascades simplify the mixing proof but reduce the anonymity set geometry. If a cascade is compromised (all nodes in the cascade collude), every message through that cascade is exposed. In a stratified topology with 10 nodes per layer, an adversary would need to compromise at least one node in every layer, and even then they only compromise routes that selected all compromised nodes, which is (1/10)^3 = 0.1% of paths.

The precomputation phase is the other cost. Nodes must perform the expensive public-key operations in advance, agree on the batch structure, and discard unused precomputed materials. This creates a synchronous batch processing model that's fundamentally different from Sphinx's asynchronous per-packet processing. For DeFi, where transactions are bursty and unpredictable, the asynchronous model is a better fit.

Mixminion Type III: Where SURBs Were Born

Mixminion [7], the Type III anonymous remailer, introduced Single Use Reply Blocks before Sphinx formalized them. Mixminion's SURB design is conceptually similar to Sphinx's but differs in two important ways.

First, Mixminion uses LIONESS [16], a wide-block cipher built from SHA-256 and AES, for payload encryption. LIONESS provides the SPRP (Strong Pseudo-Random Permutation) property that makes tagging attacks detectable: any bit flip in the ciphertext propagates unpredictably across the entire block on decryption. This is the property we sacrifice by using ChaCha20.

Second, Mixminion's reply messages are distinguishable from forward messages at the protocol level (they traverse a "crossover point" where the packet structure changes), whereas Sphinx makes forward messages and SURB replies bitwise indistinguishable. This is a genuine security improvement in Sphinx: an adversary cannot determine whether a given packet is a forward message or a reply.

Mixminion also pioneered the idea of directory authorities for mix node discovery, key rotation, and revocation infrastructure that modern mixnets including Nym and NOX have adopted in various forms.

Lightning's BOLT-04: Sphinx for Payments

The Lightning Network's onion routing [5] is a modified Sphinx adapted for payment channel networks. The modifications are instructive.

Lightning replaces LIONESS with ChaCha20-Poly1305 (the same tradeoff we make, but with added AEAD authentication per hop). It adds per-hop payloads: each node receives not just routing instructions but also payment-specific data (amount, CLTV expiry, payment hash). And it fixes the route length at 20 hops with 65-byte per-hop payloads, giving a 1,300-byte onion packet.

The 20-hop fixed length is interesting. Lightning doesn't use a stratified topology; it routes through an arbitrary graph of payment channels. Fixing at 20 hops means that an adversary can't learn the path length from the packet size, but it also means most payments waste significant space (the median Lightning path is 2-3 hops, so 17-18 of the 20 slots are padding).

Lightning's Sphinx is arguably the most widely deployed variant, running across tens of thousands of nodes processing millions of payments. Its adoption validates the core Sphinx design while demonstrating that the format is adaptable: you can swap ciphers, add per-hop data, and change packet geometry without breaking the fundamental security properties.

One important limitation of Lightning's Sphinx that doesn't apply to mixnet Sphinx: Lightning has no cover traffic. Every onion-routed payment is a real payment. This means that while the content of the payment is hidden from intermediate routing nodes, the traffic patterns are fully visible. A well-positioned routing node can learn significant information about payment flows. Several academic papers (including Kappos et al. 2021) have demonstrated timing-based deanonymization attacks on Lightning. For a mixnet, cover traffic closes this gap, but Lightning chose to accept the timing vulnerability rather than add the bandwidth overhead of cover traffic: a reasonable tradeoff for a payment network that prioritizes low latency.

Katzenpost: The Rigorous Variant

Katzenpost [4] implements what is arguably the most standards-compliant Sphinx variant. Their specification document is exhaustive: every byte offset, every key derivation step, every error-handling case is documented. They support both NIKE (Non-Interactive Key Exchange, the classic Sphinx ECDH approach) and KEM (Key Encapsulation Mechanism, a newer approach that's post-quantum friendly) modes.

Their KEM variant is particularly interesting because it shows how Sphinx can be adapted for post-quantum cryptography. Instead of the blinding trick (which requires the algebraic structure of Diffie-Hellman groups), the KEM variant encapsulates a fresh symmetric key at each hop. This eliminates the blinding factor computation entirely but changes the header structure: each hop's KEM ciphertext must be included in the routing info.

Katzenpost's NIKE mode benchmark gives us a useful comparison point: 144.1 µs per hop in Go, compared to our 30.17 µs in Rust. Their KEM mode runs at 55.7 µs, closer to ours. The performance difference between NIKE and KEM in Katzenpost (2.6x) shows how much of the cost is in the blinding computation versus the key exchange.

A Common Thread

Looking across all four alternatives, a pattern emerges. Every variant makes the same fundamental tradeoff in different ways:

System      Public-key ops   Header format   Tagging defense    Topology
------      --------------   -------------   ---------------    --------
Sphinx      Per-hop ECDH     Compact (DH)    SPRP (LIONESS)     Flexible
cMix        Precomputed      N/A (batch)     Re-encryption      Cascades
Mixminion   Per-hop ECDH     Compact (DH)    LIONESS (SPRP)     Free
BOLT-04     Per-hop ECDH     Compact (DH)    ChaCha20-Poly      Graph
NOX         Per-hop ECDH     Compact (DH)    ChaCha20 (no)      Stratified
Outfox      Per-hop KEM      Larger (KEM)    SPRP capable       Stratified

The ideal combination would be: per-hop KEM (for post-quantum), compact headers, SPRP for tagging defense, and stratified topology. Outfox with LIONESS comes closest, but no deployed system achieves all four simultaneously at acceptable performance. Every real-world system compromises somewhere, usually on tagging defense (replacing LIONESS with a faster stream cipher) or on header compactness (using larger KEM ciphertexts).


The Performance Profile

We benchmarked every operation in the Sphinx processing pipeline, and the results tell an interesting story about where the time actually goes.

Headline Number

At 30.17 microseconds per hop (Criterion median), our Sphinx unwrap is fast. But the headline number hides the real story. Here is the per-operation breakdown from our integration benchmark (1,449 hop samples across 500 packets on AMD Ryzen 7 9800X3D):

Per-Hop Operation Breakdown

+---------------------------------------------------------------------+
|                   Per-Hop Processing Breakdown                       |
|                  (integration benchmark, mean)                       |
+---------------------------------------------------------------------+
|                                                                     |
|  Operation         Mean       p50       p99      Share              |
|  ---------         ----       ---       ---      -----              |
|  ECDH             35.8 µs   28.8 µs   72.7 µs   58.2%             |
|  Key Blinding     22.9 µs   28.1 µs   64.8 µs   37.2%             |
|  MAC Verify        0.82 µs   0.77 µs   1.63 µs    1.3%            |
|  Routing Dec.      0.59 µs   0.53 µs   1.12 µs    1.0%            |
|  Body Decrypt      0.53 µs   0.38 µs   1.09 µs    0.9%            |
|  Key Derivation    0.49 µs   0.48 µs   1.04 µs    0.8%            |
|  ---------         ----       ---       ---      -----              |
|  TOTAL            61.6 µs   59.3 µs  146.8 µs   100%              |
|                                                                     |
+---------------------------------------------------------------------+

The 2x gap between Criterion (30 µs) and integration (62 µs) reflects real-world overhead: event bus dispatch, context switching, tracing instrumentation, and backpressure. The Criterion number isolates the Sphinx cryptographic processing; the integration number measures end-to-end per-hop cost in a running mixnet.

Where the Time Goes

The public-key operations, ECDH and key blinding, account for 95.4% of the per-hop processing time. The symmetric crypto (MAC verification, routing decryption, payload decryption, key derivation) collectively consumes the remaining 4.6%.

This asymmetry has important consequences:

1. Payload size barely matters. Our 32KB packets are 16x larger than Katzenpost's 2KB, but we're still faster overall because the bottleneck is elliptic curve math, not symmetric decryption. Body decryption (the one operation that scales with payload size) accounts for 0.9% of per-hop time. Even if we doubled the payload to 64KB, the body decryption would go from 0.53 µs to roughly 1.06 µs, a rounding error compared to the 58.7 µs of curve operations.

2. Tail latency comes from the OS, not crypto. The p99 for ECDH is 72.7 µs against a mean of 35.8 µs, roughly 2x. The p99 for total Sphinx processing is 146.8 µs against a mean of 61.6 µs, roughly 2.4x. These tails are consistent with OS scheduling jitter, cache evictions, and TLB misses, not cryptographic variance. The symmetric operations show similar tail ratios. On a dedicated mix node with CPU pinning and NUMA-aware allocation, these tails would shrink.

3. Parallelization helps a lot. Since each packet's ECDH is independent, we can process multiple packets concurrently. Our relay pipeline uses N worker tasks (configurable, defaults to CPU cores), each processing packets independently from the ingress queue. On a 16-core machine, that's 16 × 11,000 ≈ 176,000 full 3-hop packets/second of theoretical throughput, or roughly 528,000 hops/second.

4. The blinding cost is nearly as much as the ECDH. Key blinding (22.9 µs) is 64% of the ECDH cost (35.8 µs). Both are Curve25519 scalar multiplications, but ECDH multiplies with a random point (the sender's ephemeral key) while blinding multiplies with the current group element. Neither can use the precomputed basepoint table that speeds up scalar multiplication with the well-known basepoint: both operate on arbitrary points, so both pay the full cost. This means that moving to a KEM-based format like Outfox (which eliminates blinding entirely) would remove 37.2% of the per-hop cost even with the same curve: the KEM approach isn't just about post-quantum security, it's also a performance win for classical crypto.

NOX vs. Competitors

How does this compare? The mixnet field has a transparency problem: most projects publish little to no performance data [10]. Here is what is available:

+---------------------------------------------------------------------+
|            Sphinx Per-Hop Processing: NOX vs Field                   |
+---------------------------------------------------------------------+
|                                                                     |
|  System                            Per-Hop     Relative             |
|  ------                            -------     --------             |
|  NOX (Rust, X25519)                30.17 µs     1.0x                |
|  Outfox (Rust, X25519)              ~31 µs      1.0x (different     |
|                                                  format)            |
|  Katzenpost KEM (Go, X25519)        55.7 µs     1.8x slower        |
|  Katzenpost NIKE (Go, X25519)      144.1 µs     4.8x slower        |
|  Katzenpost Xwing (Go, PQ)         172.6 µs     5.7x slower        |
|  Loopix (Python, 2017)            1500   µs    49.7x slower        |
|  Nym (Rust, X25519)                  N/P        unpublished         |
|                                                                     |
+---------------------------------------------------------------------+
   Sources: NOX Criterion [10]; Outfox paper [9] estimates;
   Katzenpost Go nightly CI; Loopix USENIX Security 2017 [8];
   Nym: benchmarks exist in nymtech/sphinx repo but results
   are unpublished.

Three factors explain NOX's advantage over Katzenpost:

1. Language. Rust compiles to native code with zero-cost abstractions and no garbage collector. Go's GC introduces pause-time variance: a GC pause during Sphinx processing shows up as a latency spike. For a time-sensitive privacy system where latency variance itself can leak information (an adversary might correlate GC pauses across nodes), this matters beyond raw performance. This accounts for roughly 1.5-2x of the gap.

2. NIKE vs KEM path. Katzenpost's NIKE (Non-Interactive Key Exchange) variant performs the full Sphinx blinding computation that their KEM variant avoids through encapsulation. NOX uses ECDH with blinding (closer to NIKE conceptually), but our Rust curve25519-dalek implementation is significantly faster than Go's native implementation. Compare our blinding at 22.9 µs to Katzenpost NIKE at 144.1 µs total: their blinding alone likely exceeds our entire per-hop cost.

3. Curve implementation. NOX uses curve25519-dalek, which is one of the most optimized Curve25519 implementations in existence. It uses precomputed lookup tables for the basepoint multiplication, SIMD-accelerated field arithmetic (on platforms that support it via the simd backend), and careful constant-time implementation throughout. Go's standard library curve implementations, while correct, don't have the same level of micro-optimization.

At 30 µs per hop and 3 hops, a single-threaded node can theoretically process ~33,000 hops per second, or ~11,000 full 3-hop packets per second. In practice, with network I/O, mixing delays, and other overhead, we measured 369 packets per second in a multi-process 10-node simulation with zero loss. Real-world numbers will be different, but the crypto is not going to be the bottleneck: network I/O and mixing delays dominate.

Why Nym's Numbers Are Missing

You'll notice Nym's Sphinx benchmarks are listed as "unpublished." This is not a criticism: Nym has the most mature mixnet deployment in production, with 700+ mix nodes processing real traffic. But their benchmark infrastructure (in the nymtech/sphinx Rust crate) exists without published aggregate results.

This is a broader problem in the mixnet space. Academic papers report performance for prototype implementations (Loopix's 1,500 µs was a Python prototype from 2017), but production systems rarely publish updated benchmarks as their implementations mature. It makes cross-system comparison difficult and forces users to run their own benchmarks, which is one reason we publish ours with full methodology, hardware specs, and commit hashes [10]. Reproducibility matters.

The Katzenpost team deserves credit for the most transparent approach: their Go nightly CI runs benchmarks on every commit, and the results are publicly visible. We aspire to the same level of transparency.


SURBs: Anonymous Replies

There's one more trick Sphinx supports that's critical for DeFi: Single Use Reply Blocks.

The Problem

Alice sends a message to Bob through the mixnet. Bob wants to reply. But Bob doesn't know Alice's address (that's the whole point of anonymity). How does Bob send a response?

In traditional networking, this is trivial: every packet has a source address. In a mixnet, including a source address would defeat the purpose. Alice's whole goal is to hide who she is.

How SURBs Work

Alice pre-builds a SURB, a "return envelope", and includes it in her outgoing message. The SURB contains:

  • A pre-constructed Sphinx header with a route back to Alice (encrypted routing info, group elements, MACs, all pre-computed)
  • The first hop's address (so Bob knows where to inject the reply)
  • A symmetric payload key and IV for encrypting the reply payload
  • A unique 16-byte identifier for tracking which SURB this is

The construction process is almost identical to building a forward Sphinx packet. Alice picks a return path (e.g., M3 → M2 → M1 → Alice), generates ephemeral keys, computes shared secrets, builds the routing info backwards, computes MACs: all the same steps. The difference is that she packages the resulting header as a SURB instead of sending it as a packet.

SURB Construction (Alice):
 
1. Pick return path: [M3, M2, M1] → Alice
2. Generate ephemeral x', compute alpha'_0 = g * x'
3. Compute shared_secret'_i for each hop (same blinding trick)
4. Build routing info backwards (same filler mechanism)
5. Compute per-hop MACs
6. Generate random payload_key and payload_iv
7. Save SurbRecovery{layer_keys: [pi_3, pi_2, pi_1], payload_key, payload_iv}
8. Package Surb{header, first_hop: M3.address, payload_key, payload_iv, id}
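For concreteness, the two artifacts from steps 7 and 8 can be sketched as Rust structs. Field names follow the pseudocode above; the concrete types and sizes are illustrative assumptions, not NOX's actual definitions:

```rust
/// What Alice keeps locally (step 7): everything needed to decrypt a reply.
struct SurbRecovery {
    layer_keys: Vec<[u8; 32]>, // pi_3, pi_2, pi_1, peeled in this order
    payload_key: [u8; 32],
    payload_iv: [u8; 12],
}

/// What Alice ships inside her message (step 8): the pre-built return envelope.
struct Surb {
    header: Vec<u8>,       // pre-computed Sphinx header for the return route
    first_hop: [u8; 32],   // M3's address: where Bob injects the reply
    payload_key: [u8; 32], // symmetric key for encrypting the reply body
    payload_iv: [u8; 12],
    id: [u8; 16],          // matches the reply back to its SurbRecovery
}

fn main() {
    let recovery = SurbRecovery {
        layer_keys: vec![[0u8; 32]; 3],
        payload_key: [1u8; 32],
        payload_iv: [2u8; 12],
    };
    let surb = Surb {
        header: vec![0u8; 4], // a real header is far larger, of course
        first_hop: [3u8; 32],
        payload_key: recovery.payload_key,
        payload_iv: recovery.payload_iv,
        id: [4u8; 16],
    };
    assert_eq!(recovery.layer_keys.len(), 3);
    assert_eq!(surb.id.len(), 16);
}
```

Note that payload_key and payload_iv appear in both: Alice keeps one copy for decryption, and Bob receives the other for encryption.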

How the Reply Works

Bob (or the exit node acting as Bob's proxy) writes his reply, encrypts it with the SURB's payload key, attaches the pre-built header, and sends the resulting packet to the first hop specified in the SURB:

Bob's reply process:
 
1. Write reply: "TX confirmed, gas: 42,156"
2. Pad to body_size with ISO 7816-4 padding (0x80 || 0x00...)
3. Encrypt: ChaCha20(payload_key, payload_iv, padded_reply)
4. Assemble: SURB.header || encrypted_body = 32,768 byte SphinxPacket
5. Send to SURB.first_hop (M3's address)

The packet traverses the mixnet back to Alice, who decrypts it:

Alice receives the reply:
 
1. Extract body from the received packet
2. Peel onion layers: for each key in layer_keys:
     body = ChaCha20(key, nonce=0, body)
3. Decrypt payload: body = ChaCha20(payload_key, payload_iv, body)
4. Remove ISO 7816-4 padding: find 0x80 marker, truncate
5. Read: "TX confirmed, gas: 42,156"
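Steps 2 and 4 rely on ISO 7816-4 padding, which is simple enough to show in full. A dependency-free sketch (a small target size stands in for the real 32KB-class body):

```rust
/// Pad with ISO 7816-4: append the 0x80 marker, then zeros up to `target`.
fn pad_iso7816(mut msg: Vec<u8>, target: usize) -> Vec<u8> {
    assert!(msg.len() < target, "must leave room for the 0x80 marker");
    msg.push(0x80);
    msg.resize(target, 0x00);
    msg
}

/// Strip the padding: scan back past trailing zeros to the 0x80 marker.
fn unpad_iso7816(padded: &[u8]) -> Option<&[u8]> {
    let pos = padded.iter().rposition(|&b| b != 0x00)?;
    if padded[pos] == 0x80 {
        Some(&padded[..pos])
    } else {
        None // malformed padding
    }
}

fn main() {
    let padded = pad_iso7816(b"TX confirmed, gas: 42,156".to_vec(), 64);
    assert_eq!(padded.len(), 64);
    assert_eq!(unpad_iso7816(&padded), Some(&b"TX confirmed, gas: 42,156"[..]));
}
```

Because the marker is always the last non-zero byte, a message that itself ends in 0x80 still round-trips correctly: the appended marker, not the message byte, is what gets found.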

The Indistinguishability Property

Here's what makes SURBs brilliant: a SURB reply is indistinguishable from a forward packet. Mix nodes can't tell whether they're processing a message going from Alice to Bob, or a reply going from Bob to Alice, or a cover traffic dummy going nowhere. They all look identical: same 32KB size, same header structure, same processing operations.

This is the improvement over Mixminion [7], where forward and reply messages are structurally distinguishable because they traverse a "crossover point." In Sphinx, an adversary watching the network can't even determine the direction of a message. They can see traffic going between M1 and M2, but they can't tell if it's a forward message or a return-path reply.

Alice                                                        Bob
  |                                                            |
  |  Creates SURB (return route + keys)                        |
  |---> M1 ---> M2 ---> M3 --->  message + SURB ------------->|
  |                                                            |
  |                               Uses SURB to reply           |
  |<--- M1 <--- M2 <--- M3 <--- SURB reply (indistinguishable)|
  |                                                            |
  |  Decrypts with saved SURB layer_keys + payload_key         |

The Single-Use Property

Each SURB can only be used once. If Bob uses the same SURB twice, the second packet will have the same group element as the first, and any mix node in the return path will detect it as a replay (via the replay detection Bloom filter) and drop it.

This single-use property is enforced by the replay detection mechanism. When a mix node processes a Sphinx packet, it computes Blake3(ephemeral_key || mac || nonce) and checks this tag against a set of seen tags. If the tag has been seen before, the packet is dropped. Since a SURB always produces the same header (it's pre-computed), reusing it produces the same tags at each node.
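The check described above can be sketched in a few lines. A HashSet stands in for the Bloom filter and a toy FNV-1a hash stands in for Blake3, purely to keep the example dependency-free:

```rust
use std::collections::HashSet;

/// Toy 64-bit FNV-1a over concatenated parts; a stand-in for Blake3.
fn tag_hash(parts: &[&[u8]]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for part in parts {
        for &b in *part {
            h ^= b as u64;
            h = h.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }
    h
}

struct ReplayFilter {
    seen: HashSet<u64>,
}

impl ReplayFilter {
    fn new() -> Self {
        Self { seen: HashSet::new() }
    }

    /// True if the packet is fresh (tag recorded); false if it is a replay.
    fn accept(&mut self, ephemeral_key: &[u8], mac: &[u8], nonce: &[u8]) -> bool {
        let tag = tag_hash(&[ephemeral_key, mac, nonce]);
        self.seen.insert(tag) // HashSet::insert returns false on duplicates
    }
}

fn main() {
    let mut filter = ReplayFilter::new();
    let (ek, mac, nonce) = ([1u8; 32], [2u8; 16], [3u8; 8]);
    assert!(filter.accept(&ek, &mac, &nonce));  // first sighting: accepted
    assert!(!filter.accept(&ek, &mac, &nonce)); // same header again: dropped
}
```

A reused SURB carries a byte-identical pre-computed header, so it hashes to the same tag at every node on the return path and is dropped at the first one that saw the original.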

The single-use constraint means Alice needs to pre-generate enough SURBs for all expected replies. For DeFi, this is usually straightforward: one set of SURBs per outgoing transaction (with FEC, that's 6 SURBs per expected reply). But for protocols with unpredictable reply counts (like streaming data or multi-step DeFi operations), SURB management becomes more complex.

The SURB-ACK protocol solves this: each reply message includes fresh SURBs for future messages. Alice sends an initial message with 10 SURBs. Bob uses 6 SURBs for the first reply (Reed-Solomon FEC) and includes 10 fresh SURBs in the reply payload. Alice receives the reply, extracts the new SURBs, and now has 4 + 10 = 14 SURBs for future communication. The SURB supply is self-replenishing as long as messages are flowing.

The elegant part: because SURBs are included in the encrypted payload (not the Sphinx header), the mix nodes never know that a packet contains SURBs. They process it identically to any other packet. The SURB-ACK protocol is invisible at the network layer.

The SURB Reliability Problem

The catch: SURBs are fragile. The packet has to traverse the entire return path without any loss. If even one mix node drops the response packet (network congestion, node failure, adversarial behavior), Alice gets nothing.

We measured this. In a benchmark of 550 SURB round-trips (500 packets, 3 hops per leg, 1ms mixing delay, 5 nodes), we observed:

+---------------------------------------------------------------------+
|                   SURB Round-Trip Measurements                       |
|                  (550 attempts, 3-hop return)                        |
+---------------------------------------------------------------------+
|                                                                     |
|  Successful deliveries:    480 / 550                                |
|  Lost responses:            70 / 550                                |
|  Loss rate:               12.7%                                     |
|                                                                     |
|  Latency (successful):                                              |
|    Mean:    237.8 ms                                                |
|    Median:  228.2 ms                                                |
|    p95:     412.4 ms                                                |
|    p99:     469.8 ms                                                |
|    Max:     508.7 ms                                                |
|    Stddev:  102.0 ms                                                |
|                                                                     |
+---------------------------------------------------------------------+

A 12.7% loss rate. For email, maybe tolerable. For a DeFi transaction receipt where you need to know whether your swap executed or reverted, absolutely not.

The loss compounds with path length. With 3 hops on the return path and per-hop reliability p, the end-to-end delivery probability is p^3. If each hop drops 4.5% of packets, you get 0.955^3 ≈ 0.871, close to our observed 87.3% delivery rate. Adding more hops for stronger anonymity would make it worse.
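The compounding is easy to sanity-check with the per-hop figure above:

```rust
fn main() {
    let per_hop: f64 = 0.955; // per-hop delivery probability (4.5% loss)
    let end_to_end = per_hop.powi(3); // three independent hops in series
    println!("3-hop delivery: {end_to_end:.3}"); // prints 0.871
}
```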

Our Fix: Reed-Solomon FEC

Our solution is Reed-Solomon Forward Error Correction. Instead of sending a single SURB reply, the exit node splits the response into k data fragments, generates n - k parity fragments (where n > k), and sends each fragment via a separate SURB. The client can reconstruct the original response from any k of the n fragments.

With parameters tuned for 10% expected packet loss:

  • k = 4 data fragments
  • n = 6 total fragments (4 data + 2 parity)
  • Client needs any 4 of 6 fragments to reconstruct

The probability that reconstruction fails, i.e., that three or more of the six fragments are lost (leaving fewer than the four needed), is:

P(failure) = P(≥3 lost from 6) at 10% loss per fragment
           = Σ C(6,i) × 0.1^i × 0.9^(6-i) for i=3,4,5,6
           ≈ 0.016

That gives us ~98.4% delivery at 10% loss, compared to 87.3% without FEC. The cost: each response requires 6 SURBs instead of 1, so Alice needs to pre-generate 6x more SURBs for each outgoing transaction. But SURBs are cheap to create (same cost as building a Sphinx header, ~30 µs), so this is an acceptable tradeoff.
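To re-derive that failure probability, or recompute it for other (n, k, loss) choices, here's the binomial sum in plain Rust:

```rust
/// Binomial coefficient C(n, k) as f64 (fine for the small n used here).
fn binom(n: u64, k: u64) -> f64 {
    (1..=k).fold(1.0, |acc, i| acc * (n - k + i) as f64 / i as f64)
}

/// P(reconstruction fails) = P(more than n - k of the n fragments are lost).
fn fec_failure(n: u64, k: u64, loss: f64) -> f64 {
    ((n - k + 1)..=n)
        .map(|i| binom(n, i) * loss.powi(i as i32) * (1.0 - loss).powi((n - i) as i32))
        .sum()
}

fn main() {
    let p = fec_failure(6, 4, 0.10); // k = 4 of n = 6 at 10% loss
    println!("P(failure) = {p:.4}, delivery = {:.1}%", 100.0 * (1.0 - p));
}
```

Pushing n up to 8 (4 data + 4 parity) at the same loss rate drives the failure probability well below a tenth of a percent, at the cost of two more SURBs per reply.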

The full FEC protocol is covered in Part 5. The relevant point for this post is that the FEC fragments are themselves Sphinx packets they go through the same processing pipeline, look identical to every other packet, and provide the same privacy guarantees.


Proof-of-Work: The Spam Gate

Mix networks have a spam problem. Unlike traditional networks, where a server can rate-limit by IP address, a mixnet can't identify the sender of a packet; that's the whole point. An adversary could flood the network with millions of garbage packets, consuming mix node resources and displacing legitimate traffic. This is a denial-of-service vector that's unique to anonymous networks.

Our solution is Hashcash-style Proof-of-Work (PoW) on every packet. The sender must find a nonce such that:

SHA256(ephemeral_key || routing_info || mac || nonce)
 
has at least D leading zero bits, where D is the network difficulty parameter.

This is the same fundamental approach as Bitcoin mining, but applied per-packet rather than per-block.

How It Works

The PoW is integrated directly into the Sphinx header. The 8-byte nonce field at the end of the header is the PoW solution. The sender builds the complete Sphinx packet (group element, routing info, MAC), then brute-forces nonce values until they find one that produces a hash with sufficient leading zeros.

PoW Solving:
 
loop {
    hash = SHA256(ephemeral_key || routing_info || mac || nonce)
    if leading_zeros(hash) >= difficulty:
        break
    nonce += 1
}

Verification is instant: one SHA-256 hash and a leading-zeros check. Solving requires work exponential in the difficulty: at difficulty D, the expected number of hash operations is 2^D.
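The solve/verify asymmetry is worth seeing as runnable code. In this sketch a toy 64-bit FNV-1a hash stands in for SHA-256 (keeping the example dependency-free); the real solver hashes ephemeral_key || routing_info || mac || nonce:

```rust
/// Stand-in hash for the sketch; NOX uses SHA-256 here.
fn toy_hash(header: &[u8], nonce: u64) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in header.iter().chain(nonce.to_le_bytes().iter()) {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Solving: brute-force nonces until the hash has `difficulty` leading zero bits.
fn solve(header: &[u8], difficulty: u32) -> u64 {
    (0u64..)
        .find(|&n| toy_hash(header, n).leading_zeros() >= difficulty)
        .unwrap()
}

/// Verifying: one hash and a leading-zeros check.
fn verify(header: &[u8], nonce: u64, difficulty: u32) -> bool {
    toy_hash(header, nonce).leading_zeros() >= difficulty
}

fn main() {
    let header = b"ephemeral_key || routing_info || mac";
    let difficulty = 12; // expect ~2^12 = 4,096 attempts
    let nonce = solve(header, difficulty);
    assert!(verify(header, nonce, difficulty));
    println!("found nonce {nonce} at difficulty {difficulty}");
}
```

The structural point survives the toy hash: solve() is a loop whose expected length doubles with every difficulty bit, while verify() is a single hash invocation no matter how hard the puzzle was.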

+---------------------------------------------------------------------+
|                   PoW Difficulty vs. Solve Time                      |
|                  (SHA-256, ~5M hashes/sec per core)                  |
+---------------------------------------------------------------------+
|                                                                     |
|  Difficulty    Expected Hashes    Expected Time (1 core)            |
|  ----------    ---------------    ----------------------            |
|      0         0                  instant (no PoW)                  |
|      4         16                 ~3 µs                             |
|      8         256                ~51 µs                            |
|     12         4,096              ~819 µs                           |
|     16         65,536             ~13 ms                            |
|     20         1,048,576          ~210 ms                           |
|     24         16,777,216         ~3.4 sec                          |
|                                                                     |
+---------------------------------------------------------------------+

Algorithm Choice: SHA-256 vs Blake3

Our implementation supports two PoW algorithms via a trait-based abstraction:

  • SHA-256 (Sha256Pow): Battle-tested, ~5M hashes/sec per core. This is the default because it's the most widely analyzed hash function for PoW. Bitcoin's mining ecosystem has proven SHA-256's PoW properties extensively.

  • Blake3 (Blake3Pow): Modern, ~15M hashes/sec per core, roughly 3x faster than SHA-256. Blake3's tree-hashing mode enables parallelism that SHA-256 lacks. We use Blake3 for non-PoW hashing throughout the system (replay tags, content addressing), and it's available as a fast PoW alternative for future use.

The SHA-256 PoW verification uses the same compute_hash() function that hashes the ephemeral key, routing info, MAC, and nonce: the exact fields that are covered by the integrity guarantees. An adversary can't pre-compute PoW for a packet they plan to tamper with, because any modification to the header fields invalidates the PoW solution.

Difficulty Tuning

The difficulty parameter is network-wide and adjustable. The design tradeoffs:

  • Too low: Spam is cheap. An adversary with a GPU cluster can flood the network.
  • Too high: Legitimate clients (especially mobile wallets) can't create packets fast enough for interactive DeFi use.

Our current default is difficulty 0 (no PoW) during the testnet phase. For mainnet, we plan to start at difficulty 12-16, which requires ~1-13ms of work per packet: noticeable but acceptable for DeFi transactions that will take 500+ ms to traverse the mixnet anyway. Dynamic difficulty adjustment based on network load is future work.

Safety Bounds

One non-obvious security measure: we cap the maximum difficulty at 64 bits. Without this cap, an adversary could set a difficulty of 256 (full SHA-256 output), which would require 2^256 hash operations on average, effectively infinite. The cap at 64 bits (MAX_DIFFICULTY = 64) ensures that even the hardest PoW is solvable in finite (if unreasonably long) time. This was a real bug we caught during testing: a malicious difficulty parameter could have caused the solver to loop forever.

We also support parallel solving via rayon for higher difficulties. The parallel solver distributes nonce ranges across all available CPU cores with work-stealing, finding solutions roughly num_cores times faster than the single-threaded solver.

Comparison with Hashcash

Our PoW is conceptually identical to Back's Hashcash [20], the anti-spam proof-of-work system originally designed for email (and later adapted by Nakamoto for Bitcoin's mining). The key differences:

Scope. Hashcash is per-email; ours is per-packet. A single DeFi transaction might generate multiple packets (forward packet + SURB replies + FEC fragments), so the total PoW cost per transaction is proportional to the number of packets, not just one.

Hash binding. Hashcash commits to the recipient's email address in the hash, so a proof generated for one recipient can't be reused for another. Our PoW commits to the Sphinx header fields (ephemeral key, routing info, MAC), so a proof generated for one packet can't be used for a different packet. More importantly, the PoW commits to the MAC, which commits to the routing info, which commits to the route so an adversary can't solve PoW for a packet and then change its destination.

Verification asymmetry. Both Hashcash and our PoW have the crucial property that solving is expensive but verification is trivial. Verification is a single hash computation, sub-microsecond. This is critical for mix nodes that must validate PoW on every incoming packet without creating a DoS vector in the verification itself.

Dynamic difficulty. Hashcash uses a fixed difficulty per token validity period. Our system can adjust difficulty dynamically based on network load: increasing difficulty when the network detects elevated traffic (potential spam attack) and decreasing it when load is normal. This is analogous to Bitcoin's difficulty adjustment, but operating on a much faster timescale (minutes, not weeks).

The economic model is straightforward: at difficulty 16, an attacker needs ~65,536 hash operations per spam packet. At 5M hashes/sec on a single CPU core, that's ~13ms per packet. An attacker with 1,000 CPU cores can generate ~77,000 spam packets/sec. Our network of 30 mix nodes can process ~330,000 legitimate packets/sec. So the attacker needs roughly 4,000 CPU cores to match the network's legitimate throughput, a non-trivial resource commitment that makes sustained spam attacks expensive without making legitimate use burdensome.
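Spelled out as code, using only the figures from the paragraph above:

```rust
/// Expected PoW cost: 2^difficulty hashes at a given per-core hash rate.
fn expected_seconds(difficulty: u32, hash_rate: f64) -> f64 {
    (1u64 << difficulty) as f64 / hash_rate
}

fn main() {
    let t = expected_seconds(16, 5.0e6); // ~13 ms per packet at difficulty 16
    let spam_rate = 1000.0 / t; // what 1,000 attacker cores can produce
    let network_capacity = 30.0 * 11_000.0; // 30 nodes x ~11,000 pkt/s each
    let cores_to_match = network_capacity * t; // cores needed to saturate it

    println!("per-packet work: {:.1} ms", t * 1e3);
    println!("1,000-core spam rate: {spam_rate:.0} pkt/s");
    println!("cores to match network: {cores_to_match:.0}");
}
```

Bumping the difficulty by one bit doubles expected_seconds and therefore doubles the attacker's core requirement, which is what makes dynamic adjustment an effective throttle.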


What Sphinx Does NOT Protect

Being precise about what Sphinx guarantees and what it doesn't is important. Too many privacy projects conflate "encrypted" with "private" and end up with systems that are provably secure against the wrong threat model.

Sphinx DOES guarantee:

Bitwise unlinkability. A packet entering a node and the corresponding packet leaving it share zero observable bytes. Different group element, different routing info, different MAC, different payload ciphertext. An observer watching all network links cannot correlate input and output packets by content alone.

Topological anonymity. A node processing a packet doesn't know whether it's the first, second, or third hop. The packet looks structurally identical at every position. (This is true in our stratified topology only if all layers have the same number of nodes; otherwise packet routing patterns across layers could leak position. We keep layers uniform.)

Replay detection. Each node computes a replay tag from the unique header fields. If it sees the same tag twice, the second packet is dropped.

Integrity. Any modification between hops causes MAC failure at the next honest node.

Sphinx does NOT guarantee:

Timing privacy. Sphinx is a packet format, not a mixing strategy. If Alice sends a packet at time T and a packet emerges from the exit node at time T + 3ms, a timing adversary can correlate them. Timing privacy requires the mixing layer (Poisson delays, covered in Part 4) and cover traffic (constant-rate λ_P, λ_L, λ_D).

Traffic volume privacy. An adversary can count packets. If Alice normally sends 10 packets/minute and suddenly sends 11, the adversary knows one of those packets is real. Hiding traffic volume requires cover traffic at constant rates.

Endpoint privacy. Sphinx hides which mix node processed which packet, but it doesn't hide the fact that Alice is using the mixnet at all. An ISP can see Alice sending 32KB packets to a known mix node. Hiding endpoint participation requires steganographic channels or pluggable transports (like Tor's obfs4 bridges).

Payload tagging (in our implementation). Because we use ChaCha20 instead of LIONESS for payload encryption, an adversary can flip a bit in the payload and detect the flip at a later hop. The header is protected by per-hop MACs, but the payload is not. This is a known limitation documented in our threat model.

Intersection attacks. If Alice is the only user online at 3 AM, and a transaction appears at 3 AM, no amount of Sphinx processing helps. Anonymity requires an anonymity set other users whose traffic is mixed with yours. Sphinx provides the mechanism for mixing, but the anonymity set size depends on the number of concurrent users.

Side-channel attacks. Memory access patterns during Sphinx processing could theoretically leak information to a co-located adversary (e.g., on the same physical server in a data center). We mitigate this with constant-time implementations (X25519, HMAC comparison via subtle), but a full side-channel audit is ongoing.

Long-term traffic analysis. Over extended periods, an adversary who logs all packet timing data can apply statistical methods (like the statistical disclosure attack [19]) to gradually narrow down sender-receiver relationships. Even with cover traffic, if Alice consistently sends more traffic when communicating with Bob, the adversary accumulates evidence over time. Mixnets provide strong anonymity per-message, but the strength degrades over time against persistent adversaries. This is a fundamental limitation of any anonymous communication system, not specific to Sphinx.

The critical takeaway: Sphinx is the packet format. It makes individual messages unlinkable by content. But anonymity requires the full system: Sphinx packets + Poisson mixing + cover traffic + stratified topology. Pull out any one layer and the whole thing collapses, which is exactly what most "privacy" projects do. They ship the ZK proofs and call it done, and then your ISP watches them submit a transaction 200ms before it hits the mempool.

I can't stress this enough. ZK proofs hide what you did on-chain. Sphinx (and the full mixnet stack it's part of) hides who did it. Both are necessary. Neither alone is sufficient. If you only have ZK proofs, your ISP knows you submitted a private transaction, when you submitted it, and how large it was. If you only have Sphinx, your transaction content is visible on-chain. You need both layers, and you need them working together, which is why Parts 4-6 of this series cover the complete end-to-end integration.


The 32KB Decision

Our packets are 32,768 bytes. Nym uses ~2KB. Katzenpost uses ~2KB. Why are we 16x larger?

The DeFi Payload Argument

DeFi operations are large. A Uniswap swap intent, once you include the ZK proof, the encrypted UTXO notes, the Merkle path witnesses, the deposit/withdrawal parameters, and the routing commands, easily exceeds 2KB. Our typical payload sizes:

+---------------------------------------------------------------------+
|                   Typical DeFi Payload Sizes                        |
+---------------------------------------------------------------------+
|                                                                     |
|  Operation                              Payload Size                |
|  ---------                              ------------                |
|  Simple transfer (proof + 2 notes)      ~4 KB                      |
|  Uniswap swap (proof + intent + path)   ~8 KB                      |
|  Multi-output transfer (4 recipients)   ~12 KB                     |
|  Complex DeFi (compound + aave bundle)  ~20 KB                     |
|  SURB + response payload                ~28 KB                     |
|                                                                     |
+---------------------------------------------------------------------+

At 2KB per packet, even a simple transfer would require fragmentation into 2 packets, and a swap would need 4. Fragmentation is bad for three reasons:

1. Linkability. If a swap requires 4 packets and all 4 must arrive to execute, an adversary who observes 4 packets entering the mixnet at the same time from the same source can infer they're related. This partially breaks the unlinkability that Sphinx provides. You can mitigate this by staggering the fragments, but that adds latency.

2. Reliability. With 3-hop paths and ~4.5% per-hop loss, each packet has 87% delivery probability. If you need all 4 fragments, the delivery probability drops to 0.87^4 = 57%. Add Reed-Solomon FEC to recover from fragment loss, and you need even more packets.

3. Atomicity. A DeFi transaction is atomic either the whole thing executes or none of it does. Fragmenting across multiple Sphinx packets means the exit node must reassemble the fragments before executing, which requires state management, timeout handling, and complexity.

The Bandwidth Argument Against

Larger packets mean more bandwidth overhead for cover traffic. If every cover traffic packet is 32KB and you're sending 10 packets/minute, that's 320 KB/minute = ~19 MB/hour of cover traffic alone. At 2KB packets, the same rate would be 1.2 MB/hour.

For mix nodes, the story is worse. A mix node in a 10-node layer receiving traffic from all clients is processing potentially thousands of 32KB packets per second. At 1,000 packets/sec, that's 32 MB/sec = 256 Mbps of raw packet throughput, feasible but non-trivial.

The Decision

We chose 32KB because DeFi payloads don't fit in 2KB, and the fragmentation alternatives are worse than the bandwidth overhead. The 32KB size also maps neatly onto jumbo Ethernet frames (9KB MTU): four jumbo frames carry one Sphinx packet.

For payloads larger than 32KB (which we haven't encountered yet in DeFi, but is theoretically possible with very complex multi-step operations), we support application-level fragmentation: the client splits the payload into 32KB chunks, sends each as a separate Sphinx packet, and the exit node reassembles them. The fragmentation protocol uses sequence numbers and checksums in the payload (not the Sphinx header, which is privacy-sensitive), so the mix nodes are unaware that fragmentation is happening.

Why Not Dynamic Sizing?

An obvious alternative: use the smallest packet size that fits the payload, padded to one of several fixed sizes (e.g., 2KB, 8KB, 32KB). This reduces bandwidth for small messages while still accommodating large DeFi payloads.

The problem is that packet size becomes a metadata signal. If an adversary knows that "only DeFi swaps use 32KB packets" and "simple messages use 2KB," the packet size itself reveals the transaction type. This is exactly the kind of metadata leak that Sphinx is designed to prevent.

Every packet must be the same size. No exceptions. One size for all: real transactions, cover traffic, SURBs, loop messages, drop traffic. The 32KB cost is the price of uniformity, and uniformity is non-negotiable for metadata privacy.

This is, incidentally, the same reason Tor uses fixed 514-byte cells regardless of payload size. The overhead is significant (a 1-byte message still consumes 514 bytes), but the alternative, variable-size cells that leak payload information, is worse.


Packet Lifecycle: From Construction to Delivery

Let me trace the complete lifecycle of a Sphinx packet in the NOX system, from the moment Alice decides to send a DeFi transaction to the moment she receives the result.

+--------------------------------------------------------------------+
|                    Sphinx Packet Lifecycle                           |
+--------------------------------------------------------------------+
 
PHASE 1: Construction (Client-side, ~2ms)
─────────────────────────────────────────
  1. Alice builds transaction intent: "Swap 1.5 ETH → USDC, min 2800"
  2. Alice generates ZK proof (this takes seconds, not shown here)
  3. Alice selects route: M1 → M2 → M3 (random, one per layer)
  4. Alice builds Sphinx packet (key derivation, routing, encryption)
  5. Alice builds N SURBs for the return path
  6. Alice saves SurbRecovery keys locally
  7. Alice solves PoW (brute-force nonce until hash has D leading zeros)
  8. Final packet: 32,768 bytes of random-looking data
 
PHASE 2: Entry (M1, ~62µs crypto + network)
────────────────────────────────────────────
  9. Alice sends 32KB packet to M1 over TLS
 10. M1 validates PoW: SHA256(header_fields || nonce) has D leading zeros?
     → If NO: drop silently (spam)
 11. M1 checks replay tag against Bloom filter
     → If SEEN: drop silently (replay)
 12. M1 processes Sphinx header:
     - ECDH, key derivation, MAC verify, decrypt routing, blind key
 13. M1 extracts: "Forward to M2"
 14. M1 decrypts one payload layer
 15. M1 draws mixing delay from Exponential(λ), e.g. 1.7ms
 16. M1 queues packet for delayed forwarding
 
PHASE 3: Relay (M2, ~62µs crypto + mixing delay)
─────────────────────────────────────────────────
 17. After delay expires, M1 sends packet to M2
 18. M2 performs same processing: PoW check, replay check, Sphinx unwrap
 19. M2 extracts: "Forward to M3"
 20. M2 queues for mixing delay
 
PHASE 4: Exit (M3, ~62µs crypto + execution)
─────────────────────────────────────────────
 21. M3 receives packet from M2
 22. M3 processes Sphinx → routing flag = 0x01 (exit)
 23. M3 extracts plaintext payload:
     - ZK proof, encrypted notes, transaction parameters, SURBs
 24. M3 (as relayer) validates ZK proof
 25. M3 simulates on-chain execution (eth_simulateV1)
 26. M3 checks profitability (gas cost vs. fee)
 27. M3 submits transaction to Ethereum
 28. M3 waits for confirmation
 
PHASE 5: Reply (M3 → M2 → M1 → Alice)
───────────────────────────────────────
 29. M3 constructs reply: "TX confirmed, hash: 0xa1b2..., gas: 42,156"
 30. M3 uses Alice's SURB to encapsulate:
     - Encrypt reply with SURB.payload_key
     - Attach SURB.header (pre-built Sphinx header for return path)
     - Result: 32KB SphinxPacket (indistinguishable from forward)
 31. M3 sends to SURB.first_hop (could be M3 itself, or another node)
 32. Return packet traverses M3 → M2 → M1 via Sphinx processing
     - Each hop strips one layer, blinds key, verifies MAC
     - Same delays, same sizes, same everything
 33. M1 (entry node) receives the return packet
 34. M1 recognizes the exit flag → delivers to Alice
 
PHASE 6: Decryption (Client-side, ~1µs)
────────────────────────────────────────
 35. Alice receives 32KB packet from M1
 36. Alice peels onion layers using saved SurbRecovery.layer_keys
 37. Alice decrypts with SurbRecovery.payload_key
 38. Alice removes ISO 7816-4 padding
 39. Alice reads: "TX confirmed, hash: 0xa1b2..., gas: 42,156"

Total end-to-end latency: typically 200-500ms, dominated by mixing delays (which are intentional; they're the source of timing privacy) and on-chain execution time. The Sphinx cryptography itself adds roughly 180 µs per direction (3 hops × 62 µs), which is negligible.

The interesting thing about this lifecycle is how many of these steps are invisible to the mix nodes. M1 doesn't know Alice is sending a DeFi swap. M2 doesn't know this packet came from Alice. M3 doesn't know that the SURB reply will end up back at Alice (it only knows the first hop of the SURB return path). The exit node (M3) knows the content of the transaction (because it has to execute it), but it doesn't know the sender. The entry node (M1) knows the sender (Alice's IP), but doesn't know the content. No single entity in the system has both pieces of information.

This is the fundamental privacy decomposition that Sphinx enables: knowledge is split across entities, and reassembling the complete picture requires compromising multiple independent nodes in different network layers.


Implementation Lessons: What The Paper Doesn't Tell You

Having built a production Sphinx implementation from the Danezis-Goldberg paper [1], there are several practical issues that the paper doesn't address (because it's a cryptographic design paper, not a systems paper). These are the things that cost us weeks of debugging and might save you the same.

Clamping vs. Raw Scalars

X25519 applies "clamping" to private keys: it clears the bottom 3 bits (forcing the scalar to be a multiple of 8, for cofactor safety) and sets the top bit (forcing a fixed bit length). This is fine for normal ECDH, but it breaks Sphinx's blinding mechanism.

Here's why: the blinding factor b_i is derived from a hash output, which is a random 256-bit value. If we use X25519SecretKey::from() to create a scalar from this hash, the clamping step modifies the value. But Alice, who's computing the shared secrets during packet construction, doesn't apply clamping to the blinding factors she uses the raw hash output as a scalar.

Result: the shared secret computed by Alice during construction doesn't match the shared secret computed by the mix node during processing. The MAC verification fails, and the packet is silently dropped. We debugged this for three days before finding the cause.

Our fix: use curve25519_dalek::Scalar directly (via Scalar::from_bytes_mod_order()), bypassing the clamping that X25519SecretKey applies. This is safe because Curve25519's Montgomery form handles small-subgroup points correctly: the scalar multiplication maps them to the identity point, which produces a predictable (all-zeros) shared secret that fails MAC verification. No security is lost.
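To see concretely what clamping does, here is RFC 7748's transformation applied to a hash-derived scalar (the byte pattern is an arbitrary stand-in for a real blinding factor):

```rust
/// X25519 clamping per RFC 7748: forces the scalar to be a multiple of 8
/// with a fixed top bit. Fine for ECDH keys, fatal for Sphinx blinding factors.
fn clamp(mut s: [u8; 32]) -> [u8; 32] {
    s[0] &= 248;  // clear bottom 3 bits (cofactor safety)
    s[31] &= 127; // clear top bit
    s[31] |= 64;  // set second-highest bit (fixed bit length)
    s
}

fn main() {
    // Stand-in for a hash output used as a blinding factor.
    let blinding = [0x5Au8; 32];
    let clamped = clamp(blinding);

    // Alice multiplies by the raw hash output; a node going through a
    // clamping API multiplies by a different scalar. Shared secrets
    // diverge, MACs fail, and packets are silently dropped.
    assert_ne!(blinding, clamped);
    assert_eq!(clamped[0], 0x58); // bottom 3 bits of 0x5A were cleared
}
```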

The Filler Off-By-One

The filler computation in the Sphinx paper is described abstractly. Implementing it correctly requires getting the keystream offsets exactly right. Specifically, at each step of the filler construction, the XOR with the keystream must start at offset ROUTING_INFO_SIZE - filler.len(), not ROUTING_INFO_SIZE - SHIFT_SIZE. These are different because the filler grows by SHIFT_SIZE bytes at each step, so filler.len() increases with each iteration.

Getting this off by one byte doesn't cause an immediate error: the packet construction succeeds, and the first hop processes correctly. But the second or third hop sees garbage in the padding region, which corrupts the MAC, and the packet is silently dropped. Debugging "packet works for 1 hop but fails at hop 2" with no error message is exactly as fun as it sounds.

Replay Tag Design

The Sphinx paper suggests using H(shared_secret) as a replay tag. But the shared secret is derived from the ephemeral key and the node's private key; if an adversary replays a packet with the same ephemeral key, the shared secret (and thus the replay tag) will be the same.

In our implementation, we use Blake3(ephemeral_key || mac || nonce) instead. This provides the same replay detection but is faster to compute (Blake3 is ~3x faster than SHA-256 for short inputs) and doesn't require storing the shared secret or any derived keys in the replay detection table. The tag is computed from header fields that are visible before any ECDH or key derivation, so replay detection can happen before the expensive cryptographic operations. This is a minor optimization, but at 30,000+ packets/second, "minor" optimizations compound.
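A sketch of that check order, with a toy stand-in for Blake3 (the tag function's inputs, not its internals, are the point: the tag depends only on header fields visible before any ECDH):

```rust
use std::collections::HashSet;

// Stand-in tag function; the real implementation is Blake3 over the same fields.
fn replay_tag(ephemeral_key: &[u8; 32], mac: &[u8; 16], nonce: &[u8; 12]) -> [u8; 32] {
    // Toy byte mixing, NOT a real hash -- illustrates only that the tag is
    // derived solely from header fields, with no ECDH or key derivation.
    let mut tag = [0u8; 32];
    for (i, b) in ephemeral_key.iter().chain(mac.iter()).chain(nonce.iter()).enumerate() {
        tag[i % 32] = tag[i % 32].rotate_left(3) ^ b;
    }
    tag
}

fn main() {
    let mut seen: HashSet<[u8; 32]> = HashSet::new();
    let (ek, mac, nonce) = ([1u8; 32], [2u8; 16], [3u8; 12]);

    // First arrival: tag unseen, proceed to the expensive ECDH + MAC check.
    assert!(seen.insert(replay_tag(&ek, &mac, &nonce)));
    // Replay: identical header fields -> identical tag -> dropped before any crypto.
    assert!(!seen.insert(replay_tag(&ek, &mac, &nonce)));
}
```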

The Zero-Nonce Safety Argument

Our ChaCha20 calls all use a zero nonce ([0u8; 12]). This looks alarming if you're used to the standard advice "never reuse a nonce." The safety argument:

Each ChaCha20 call uses a different key: specifically, a key derived from the shared secret via SHA256(domain_tag || shared_secret). Since the shared secret is unique per packet per hop (determined by the ephemeral key and the node's static key), the key is unique per invocation. A (key, nonce) pair is used exactly once, which satisfies the one-time-pad property. Using a fixed zero nonce with a unique key is cryptographically equivalent to using a unique nonce with a fixed key; both produce a unique keystream.

This design choice simplifies the implementation (no nonce management, no nonce transmission) without sacrificing security.

ISO 7816-4 Padding: More Than Just Padding

Our payload padding scheme (ISO/IEC 7816-4: append 0x80, then fill with 0x00 to fixed size) has a subtle but important security role beyond just making payloads constant-size.

When a SURB reply arrives at the client, the client tries to decrypt it using each saved SurbRecovery. But the client also receives cover traffic packets that look identical to real replies (same size, same structure). How does the client know which packets are real replies and which are cover traffic?

The answer is the padding validation. After decrypting, the client looks for the 0x80 padding marker. Real replies have valid ISO 7816-4 padding (a 0x80 byte followed by only 0x00 bytes). Cover traffic, after "decryption" with the wrong keys, produces random-looking data that almost never has valid padding: the last non-zero byte of random data is exactly 0x80 with probability roughly 2^-8, so the vast majority of cover packets are rejected immediately.

We implement this check in constant time (using the subtle crate's ConditionallySelectable and ConstantTimeEq traits) to prevent timing-based padding oracle attacks. The padding validation scans the entire buffer without early returns, so the execution time is identical whether the padding is valid or not. This is overkill for our use case (the client is decrypting locally, not responding to a remote oracle), but it's good cryptographic hygiene.
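Here is a sketch of the no-early-return scan, using plain integer masks rather than the `subtle` traits the production code uses (the branch-free structure, not the exact API, is the point):

```rust
// Sketch of an ISO/IEC 7816-4 padding check that scans the WHOLE buffer with
// no data-dependent early return. The production code expresses the same idea
// with the subtle crate's constant-time primitives.
fn unpad_len(buf: &[u8]) -> Option<usize> {
    let mut still_zero: u8 = 1;   // 1 while we've seen only 0x00 from the end
    let mut found: u8 = 0;        // becomes 1 when the 0x80 marker is hit
    let mut data_len: usize = 0;  // length of the payload before the marker

    for (i, &b) in buf.iter().enumerate().rev() {
        let is_marker = (b == 0x80) as u8;
        let is_zero = (b == 0x00) as u8;
        // Marker counts only if everything after it was 0x00 and it's the first hit.
        let hit = still_zero & is_marker & (1 - found);
        found |= hit;
        data_len = if hit == 1 { i } else { data_len };
        // A non-zero, non-marker byte before finding 0x80 invalidates the tail.
        still_zero &= is_zero | is_marker;
    }
    if found == 1 { Some(data_len) } else { None }
}

fn main() {
    assert_eq!(unpad_len(&[0xAA, 0xBB, 0x80, 0x00, 0x00]), Some(2)); // valid padding
    assert_eq!(unpad_len(&[0xAA, 0xBB, 0x00, 0x00]), None);          // no marker
}
```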

Testing Sphinx: The Fuzzing Gauntlet

One final implementation note: Sphinx code is uniquely difficult to test because most failures are silent. A bug in the filler computation doesn't produce an error; it produces a packet that works at hop 1 but fails the MAC check at hop 2, resulting in a silent drop. A bug in the blinding factor doesn't crash; it produces the wrong shared secret at the next hop, resulting in a different silent drop.

We run 1,000-iteration fuzz tests on the parser (feeding random bytes to SphinxHeader::from_bytes() and ensuring no panics), multi-hop integration tests that verify end-to-end payload recovery through 3 real hops, and property-based tests that verify identical payloads produce unique (non-correlatable) packets. The property-based tests are critical: they verify the fundamental Sphinx guarantee that the same payload encrypted twice produces packets with zero shared bytes.


The Post-Quantum Future: Outfox

Sphinx's reliance on Diffie-Hellman key exchange makes it vulnerable to quantum adversaries. A sufficiently powerful quantum computer running Shor's algorithm could break the discrete logarithm problem underlying ECDH, allowing retroactive decryption of all recorded Sphinx packets. For a financial system processing high-value transactions, this "harvest now, decrypt later" threat is not theoretical; it's a matter of timeline.

Estimates [17] put the possible arrival of cryptographically-relevant quantum computers at 2035-2040. A Sphinx packet encrypted today with X25519 and recorded by a nation-state adversary becomes fully readable the day that quantum computer boots up. Every routing path, every payload, every transaction, retroactively exposed. For DeFi transactions involving large sums, the economic incentive to record and later decrypt is substantial.

Enter Outfox

Piotrowska et al. [9] (several of whom co-authored the original Loopix paper [8]) proposed Outfox in 2024: a post-quantum Sphinx replacement designed for layered mix networks. The key innovations:

KEMs instead of DH. Sphinx's per-hop ECDH and blinding is replaced with Key Encapsulation Mechanisms (KEMs). Each hop performs a KEM decapsulation rather than a scalar multiplication. This eliminates the blinding factor computation entirely: Outfox does not need the algebraic structure that makes Sphinx's group element transformations work (and that required the GDH assumption [2]).

Any KEM works, including lattice-based schemes like ML-KEM (Kyber) that resist quantum attacks. This is the fundamental architectural difference: Sphinx requires a group (with multiplication, inversion, and DH-like properties), while Outfox requires only a KEM (encapsulate and decapsulate). KEMs exist in the post-quantum world. DH groups (as we know them) do not.

UC-framework security. Outfox is proven secure in the Universal Composability framework, a stronger model than Sphinx's game-based proof. The proof proceeds through a sequence of 16 cryptographic games [9], each replacing one real component with an ideal one, and showing that no efficient adversary can distinguish consecutive games. The UC guarantee means security holds even when the protocol is composed with arbitrary other protocols important for a system like ours where Sphinx interacts with ZK proof generation, relayer economics, and on-chain settlement.

The 16-game proof sequence is notably more complex than Sphinx's original proof (which was a single-game reduction to DDH and even that turned out to need GDH [2]). But the UC framework provides stronger guarantees: composition with arbitrary protocols, adaptive corruption of parties, and security under concurrent execution. For a production system, these guarantees matter.

Compact headers for fixed topologies. In stratified topologies (which NOX, Nym, and Katzenpost all use), the route structure is constrained: one node per layer, in order. Outfox exploits this to produce more compact headers than a general-purpose Sphinx packet would need for the same number of hops. Specifically, instead of encoding the full routing path, Outfox can encode only the node index within each layer (log2(N) bits per hop instead of a full address).

Performance: The Post-Quantum Tax

The critical question is performance. Our current per-hop processing is 30 µs with X25519. What happens with post-quantum KEMs?

The Outfox paper [9] reports Outfox-X25519 at approximately 31 µs per hop, essentially identical to our Sphinx implementation, which makes sense because the per-hop crypto is similar (X25519 decapsulation vs. X25519 ECDH). But the post-quantum variants tell a different story:

+---------------------------------------------------------------------+
|                   Post-Quantum Per-Hop Costs                        |
|                  (Outfox paper estimates [9])                       |
+---------------------------------------------------------------------+
|                                                                     |
|  Variant                    Per-Hop    Overhead vs X25519           |
|  -------                    -------    ------------------           |
|  Outfox-X25519               ~31 µs     1.0x (baseline)            |
|  Outfox-ML-KEM-768          ~243 µs     7.8x                       |
|  Katzenpost Xwing (hybrid)  172.6 µs    5.6x                       |
|                                                                     |
+---------------------------------------------------------------------+

ML-KEM-768 adds roughly 8x overhead compared to X25519. That takes our per-hop processing from 30 µs to ~240 µs, and total 3-hop processing from 90 µs to 720 µs. On a single-threaded node, throughput drops from ~11,000 packets/sec to ~1,400 packets/sec.
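The arithmetic, spelled out (figures from the text; the model is a single thread processing three sequential hops per packet):

```rust
// Single-threaded packets/sec given a per-hop cost in microseconds,
// assuming three sequential hops per packet.
fn pps(per_hop_us: f64) -> f64 {
    1.0e6 / (3.0 * per_hop_us)
}

fn main() {
    let x25519 = 30.0;          // µs per hop today
    let mlkem = 8.0 * x25519;   // ~8x ML-KEM-768 overhead -> ~240 µs per hop

    // 1e6 / 90  ≈ 11,111 pkt/s; 1e6 / 720 ≈ 1,389 pkt/s.
    assert!((pps(x25519) - 11_111.0).abs() < 1.0);
    assert!((pps(mlkem) - 1_389.0).abs() < 1.0);
}
```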

Is 1,400 packets/sec enough? For our current scale, probably yes. For a network handling millions of daily transactions, it might not be. The performance trajectory of ML-KEM implementations is encouraging (optimized implementations are getting faster as hardware support improves), but the 8x overhead is non-trivial.

Migration Path

Our current plan is:

  1. Continue with X25519 Sphinx for the immediate deployment. The threat is "harvest now, decrypt later"; there's no immediate exposure.
  2. Implement Outfox-X25519 as a drop-in replacement. Same performance, better security proof (UC framework vs. game-based under GDH).
  3. Add Outfox-ML-KEM-768 as an optional upgrade path when post-quantum urgency increases.
  4. Hybrid mode (X25519 + ML-KEM) as a conservative option: secure against both classical and quantum adversaries, at roughly the ~5.6x overhead measured for Katzenpost's Xwing above (the hybrid KEM runs both encapsulations and combines the shared secrets).
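A sketch of the hybrid-combiner idea, with a toy mixing function standing in for the real KDF (a real combiner such as Xwing hashes both shared secrets together with transcript data; the function name and sizes here are hypothetical):

```rust
// Toy hybrid shared-secret combiner. NOT a real KDF -- it only illustrates
// that the output depends on BOTH component secrets.
fn combine(ss_x25519: &[u8; 32], ss_mlkem: &[u8; 32]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for (i, b) in ss_x25519.iter().chain(ss_mlkem.iter()).enumerate() {
        out[i % 32] = out[i % 32].rotate_left(5) ^ b;
    }
    out
}

fn main() {
    // The combined secret changes if EITHER component changes: an attacker
    // must break both X25519 and ML-KEM to recover it.
    let a = combine(&[1u8; 32], &[2u8; 32]);
    assert_ne!(a, combine(&[9u8; 32], &[2u8; 32]));
    assert_ne!(a, combine(&[1u8; 32], &[9u8; 32]));
}
```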

Outfox is planned for deployment in the Nym network. For NOX, migration to a KEM-based format is future work. Our current 95.4% public-key bottleneck (ECDH + blinding) means that a KEM transition will reshape the performance profile entirely.

The Header Size Question

One practical concern with post-quantum Sphinx variants: key sizes. An ML-KEM-768 public key is 1,184 bytes. A ciphertext is 1,088 bytes. Compare this to X25519's 32 bytes for both. In Sphinx, the group element in the header is 32 bytes. In an Outfox-ML-KEM packet, the equivalent field (a KEM ciphertext) would be 1,088 bytes, 34x larger.

For a 3-hop packet, the header would need to accommodate 3 × 1,088 = 3,264 bytes of KEM ciphertexts, plus routing info, plus MACs. Our current header is 472 bytes. A post-quantum header could easily be 4,000+ bytes. In a 32KB packet, this still leaves ~28KB for payload: adequate for most DeFi operations, but tighter than the nearly full ~32KB available today.
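A back-of-envelope check of that budget, assuming (hypothetically) that routing info and MACs stay about the size of today's header minus its 32-byte group element:

```rust
fn main() {
    // Sizes from the text; the header estimate is a hypothetical budget.
    const MLKEM768_CT: usize = 1_088;  // ML-KEM-768 ciphertext bytes
    const HOPS: usize = 3;
    const CURRENT_HEADER: usize = 472; // today's X25519 Sphinx header

    let kem_material = HOPS * MLKEM768_CT; // per-hop ciphertexts
    assert_eq!(kem_material, 3_264);

    // Assume routing info + MACs cost about today's header minus the
    // 32-byte group element the ciphertexts replace:
    let pq_header_estimate = kem_material + (CURRENT_HEADER - 32);
    assert!(pq_header_estimate > 3_700); // comfortably in the "4,000+" regime

    // In a 32 KB packet, payload room shrinks but stays workable (~28 KB):
    let payload_room = 32 * 1024 - pq_header_estimate;
    assert!(payload_room > 28_000);
}
```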

For systems using 2KB packets (like Nym and Katzenpost), the header expansion is more problematic. A 4KB header in a 2KB packet obviously doesn't work. This is one reason why Katzenpost's post-quantum work focuses on hybrid schemes (Xwing = ML-KEM-768 + X25519, which has smaller combined ciphertexts due to shared encapsulation structure) and why Outfox optimizes header size for stratified topologies.

Our 32KB packet size gives us breathing room here, yet another advantage of the "big packets" decision, even if that decision was made for payload capacity, not header expansion.


What a Sphinx Packet Actually Guarantees

Let me be precise about the security properties, because "privacy" is not a single thing and most projects are sloppy about what exactly they're claiming.

Bitwise unlinkability: A packet entering a node and the corresponding packet leaving it share zero observable bytes. Different group element, different routing info, different MAC, different payload ciphertext. An observer watching all network links cannot correlate input and output packets by content.

Topological anonymity: A node processing a packet doesn't know whether it's the first, second, or third hop. The packet looks structurally identical at every position. (This is true in our stratified topology only if all layers have the same number of nodes; otherwise packet routing patterns could leak position. We keep layers uniform.)

Replay detection: Each node computes a replay tag from unique header fields. If it's seen that tag before (stored in a data structure with false-positive-tolerant lookup), the packet is dropped. This prevents an adversary from replaying a captured packet to trace its path.

Integrity: The MAC at each hop covers the group element and routing info. Any modification between hops causes MAC verification failure at the next honest node. Modified packets are destroyed.

Forward secrecy (per-packet): Each Sphinx packet uses a fresh ephemeral key. Compromising a mix node's long-term private key after the fact does not reveal the contents of previously-processed packets: the shared secrets were never stored, and the ephemeral keys were discarded after use. (This is true only if the node properly zeroizes shared secrets after processing. Our SurbRecovery struct uses #[derive(Zeroize, ZeroizeOnDrop)] for this reason.)

SURB privacy: Forward messages and SURB replies are bitwise indistinguishable. No observer can determine the direction of a message.

The Security Stack

Let me lay out the full privacy stack, because it's easy to lose sight of how the layers compose:

+---------------------------------------------------------------------+
|                    The Full Privacy Stack                            |
+---------------------------------------------------------------------+
|                                                                     |
|  Layer           What It Hides           Protocol Component         |
|  -----           --------------           -----------------         |
|  ZK Proofs       Transaction content      Noir circuits, Groth16    |
|  Sphinx          Packet content linkage   This entire blog post     |
|  Poisson Mixing  Timing correlation       Exponential delays        |
|  Cover Traffic   Traffic volume patterns  λ_P, λ_L, λ_D            |
|  Stratified      Route predictability     Random node selection     |
|    Topology                               per layer                 |
|  PoW             Spam/DoS resistance      SHA-256 partial preimage  |
|  SURBs + FEC     Reply anonymity +        Pre-built headers +       |
|                    reliability            Reed-Solomon coding       |
|                                                                     |
+---------------------------------------------------------------------+

Remove any one layer and the system has a concrete, exploitable weakness. This is why building privacy systems is so much harder than building confidentiality systems: encryption gives you confidentiality, but privacy requires hiding the metadata around the encrypted content, and metadata lives in at least six different dimensions.

All of this machinery exists to make one thing possible: a DeFi transaction where nobody, not even the person executing it on-chain, knows who sent it. The Sphinx packet format is the foundation, the thing that makes every message look identical regardless of content, direction, or purpose. Without Sphinx, the rest of the stack has nothing to build on.

Next post: we trace a real swap from intent to settlement, every packet, every proof, every gas payment. We'll see how the Sphinx packet we've dissected in this post carries a ZK proof through the mixnet, how the exit node validates it, how the relayer submits it to Ethereum, and how the SURB carries the receipt back.

If you've made it this far, you understand more about mixnet packet formats than 99% of the crypto industry. The Sphinx paper [1] is 6 pages of dense but beautiful cryptography. The implementation is ~800 lines of Rust. The gap between those two (the filler off-by-ones, the clamping bugs, the constant-time requirements, the PoW bounds checks) is where the real engineering happens. Papers describe the destination. Code navigates the terrain.


This is Part 3 of a 6-part series on metadata privacy for DeFi. Part 4: "What Happens When You Send a Private Swap" traces a real transaction end-to-end through the system.


References

[1] Danezis, G. & Goldberg, I. (2009). "Sphinx: A Compact and Provably Secure Mix Format." 2009 30th IEEE Symposium on Security and Privacy, pp. 269-282. DOI: 10.1109/SP.2009.15.

[2] Scherer, P. et al. (2023). "Provable Security for the Onion Routing and Mix Network Packet Format Sphinx." arXiv preprint arXiv:2312.08028.

[3] Diaz, C., Halpin, H., & Kiayias, A. (2021). "The Nym Network: The Next Generation of Privacy Infrastructure." Nym Technologies SA.

[4] Angel, Y., Danezis, G., Diaz, C., Piotrowska, A., & Stainton, D. "Katzenpost Sphinx Specification." katzenpost.network/docs/specs/sphinx.html.

[5] Mouton, E. (2022). "Lightning Network Onion Routing: Sphinx Packet Construction." ellemouton.com. See also: BOLT-04 specification, lightningnetwork/lightning-onion (Go, production-grade modified Sphinx).

[6] Chaum, D., Das, D., Javani, F., Kate, A., Krasnova, A., de Ruiter, J., & Sherman, A. T. (2016). "cMix: Mixing with Minimal Real-Time Asymmetric Cryptographic Operations." IACR Cryptol. ePrint Arch., 2016/008.

[7] Danezis, G., Dingledine, R., & Mathewson, N. (2003). "Mixminion: Design of a Type III Anonymous Remailer Protocol." 2003 IEEE Symposium on Security and Privacy.

[8] Piotrowska, A. M., Hayes, J., Elahi, T., Meiser, S., & Danezis, G. (2017). "The Loopix Anonymity System." 26th USENIX Security Symposium, pp. 1199-1216.

[9] Piotrowska, A. M. et al. (2024). "Outfox: A Post-Quantum Packet Format for Layered Mixnets." arXiv preprint arXiv:2412.19937. Presented at WPES.

[10] NOX benchmark data: scripts/bench/data/per_hop_breakdown.json, surb_rtt.json, competitors.json. Criterion benchmarks on AMD Ryzen 7 9800X3D, Ubuntu 24.04, rustc 1.93.0, commit db6ea3c, 2026-03-02. Katzenpost numbers from Go testing.B nightly CI. Loopix from USENIX Security 2017 paper. Nym benchmarks exist in nymtech/sphinx repo but results are unpublished.

[11] Murdoch, S. J. & Danezis, G. (2005). "Low-cost Traffic Analysis of Tor." 2005 IEEE Symposium on Security and Privacy, pp. 183-195. DOI: 10.1109/SP.2005.12.

[12] Syverson, P., Tsudik, G., Reed, M., & Landwehr, C. (2000). "Towards an Analysis of Onion Routing Security." International Workshop on Designing Privacy Enhancing Technologies, pp. 96-114. Springer.

[13] Shoup, V. (2001). "OAEP Reconsidered." CRYPTO 2001, LNCS 2139, pp. 239-259. Springer. (Demonstrated the gap in the original OAEP security proof by Bellare & Rogaway.)

[14] Bernstein, D. J. (2006). "Curve25519: New Diffie-Hellman Speed Records." Public Key Cryptography (PKC 2006), LNCS 3958, pp. 207-228. Springer.

[15] Krawczyk, H. & Eronen, P. (2010). "HMAC-based Extract-and-Expand Key Derivation Function (HKDF)." RFC 5869. IETF.

[16] Anderson, R. & Biham, E. (1996). "Two Practical and Provably Secure Block Ciphers: BEAR and LION." International Workshop on Fast Software Encryption, LNCS 1039, pp. 113-120. Springer.

[17] National Academies of Sciences, Engineering, and Medicine. (2019). "Quantum Computing: Progress and Prospects." Washington, DC: The National Academies Press. DOI: 10.17226/25196.

[18] Das, D., Meiser, S., Mohammadi, E., & Kate, A. (2018). "Anonymity Trilemma: Strong Anonymity, Low Bandwidth Overhead, Low Latency Choose Two." 2018 IEEE Symposium on Security and Privacy, pp. 108-126. DOI: 10.1109/SP.2018.00011.

[19] Danezis, G. (2003). "Statistical Disclosure Attacks: Traffic Confirmation in Open Environments." Security and Privacy in the Age of Uncertainty, IFIP Advances in Information and Communication Technology, vol 122, pp. 421-426. Springer.

[20] Back, A. (2002). "Hashcash - A Denial of Service Counter-Measure." Technical report. hashcash.org/papers/hashcash.pdf.

[21] Flashbots. "MEV-Explore v1: MEV Dashboard." explore.flashbots.net. Cumulative realized extractable value data from Ethereum mainnet (2020-2023).

0xDenji
0xDenji@XythumL

Building privacy infrastructure for the future of finance.
