Key Ceremonies, Quorum Approvals, and the Operational Choreography of $1B+ Custody

The complete key ceremony procedure for $500M+ scale custody operations: pre-ceremony checklist, air-gap procedures, quorum approval architecture, and catastrophic failure modes.

#key-ceremony #custody #quorum #cold-storage #operational-security #hsm #institutional

The first key generation ceremony I ran, I made mistakes I did not recognize until the second. The second, I identified a gap we had not documented. The third, the procedure was tight enough that the external auditor certified it as meeting Lloyd’s of London underwriting requirements.

A key ceremony is the single moment in the lifecycle of a custody system where everything is genuinely at risk. The keys do not exist before the ceremony. After it, they will exist in distributed form that no single party can misuse. But during the ceremony - when the entropy is flowing and the key material is briefly unified - every security assumption you have made is tested simultaneously.

This post documents the exact procedure, the architectural decisions behind it, and the failure modes I have seen in real key ceremonies at other firms.

What a Key Ceremony Is and Why It Is Uniquely Dangerous

A key generation ceremony is the process of creating the cryptographic keys that control your custody system. For a hot or warm wallet using MPC (multi-party computation), the ceremony is the distributed key generation (DKG) protocol run among the N parties. For cold storage using an HSM, it is the initial key generation inside the HSM and the creation of backup shares.

The ceremony is uniquely dangerous for three reasons:

  1. Irreversibility: the keys generated in a ceremony control real assets for potentially decades. A mistake during ceremony (weak entropy, incorrect backup, compromised participant) cannot be corrected without running a new ceremony and migrating all assets.

  2. One-time window: the full private key exists (or nearly exists) at exactly one point: during key generation. This is the only moment where a sufficiently sophisticated attacker could extract it. Before the ceremony, it does not exist. After it, it is split across multiple parties or secured in hardware that prevents extraction.

  3. Social engineering opportunity: the ceremony requires multiple specific people to be in the same place at the same time with specific hardware and credentials. This creates a predictable, high-value target for social engineering, physical coercion, or substitution attacks.

Pre-Ceremony Checklist

The checklist I use, developed after seeing the corners that get cut when this is not written down:

Logistics (2 weeks before):

  • Identify venue: secure data center with controlled access, no cameras in the ceremony room
  • Confirm attendee identities: passport + secondary ID for all participants, verified against employment records
  • Source hardware: air-gapped laptops (purchased new, not reused from any connected use), USB drives (new, sealed packaging), printer (not network-connected), paper shredder on-site
  • Acquire hardware: HSM if cold storage, hardware security keys for all participants
  • Schedule external auditor: at least one independent observer who is not on the key distribution list
  • Prepare backup documentation: Shamir share envelopes, pre-printed cover sheets, tamper-evident seals
  • Confirm legal: attorney on-call if any dispute about share distribution or participant identity

Hardware preparation (1 week before):

  • Air-gapped machines: fresh OS install from verified ISO (compare SHA-256 against official download)
  • Verify ISO: sha256sum ubuntu-24.04-desktop-amd64.iso against the published hash on Ubuntu’s website
  • No network interfaces: disable WiFi in BIOS, remove or disable NIC if possible
  • Install ceremony software from verified source: compare package hashes against published signatures
  • Prepare sealed envelopes: tamper-evident, numbered, recipient named on outside
# On the ceremony laptop, verify the ceremony software integrity before use
sha256sum key-ceremony-tool-v2.3.1.tar.gz
# Compare against the SHA-256 published on the project's release page
# AND against the SHA-256 you computed when you downloaded it last week
# Both must match

# Verify the signing key of the ceremony software
gpg --verify key-ceremony-tool-v2.3.1.tar.gz.sig key-ceremony-tool-v2.3.1.tar.gz
# Must succeed with the known developer key fingerprint
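
The same double comparison can be scripted so nobody eyeballs 64 hex characters under ceremony pressure. This is a minimal sketch, not part of the ceremony tool itself: the function names are hypothetical, and it assumes the two reference hashes (release page and your own earlier download) were recorded independently and typed in or loaded from separate media.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_all_records(path: Path, recorded_hashes: list[str]) -> bool:
    """Return True only if the file's hash equals every independently recorded hash."""
    if not recorded_hashes:
        return False  # having nothing to compare against is a failure, not a pass
    actual = sha256_of_file(path)
    return all(
        hmac.compare_digest(actual, recorded.strip().lower())
        for recorded in recorded_hashes
    )

# Usage (hypothetical values):
#   ok = matches_all_records(Path("key-ceremony-tool-v2.3.1.tar.gz"),
#                            [hash_from_release_page, hash_recorded_last_week])
```

A hash match only proves the file is the one you recorded; the gpg signature check above is still what ties it to the developer.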

Day-of checklist:

  • Verify attendee identities (passport photo vs in-person)
  • Search room for recording devices (phone detection wand, visual inspection)
  • No phones, smartwatches, or connected devices in the ceremony room
  • All participants sign a statement of understanding before beginning
  • External auditor witness attestation form ready
  • Document start time

The Ceremony Procedure: Cold Storage HSM

For cold storage key generation using a Thales nShield HSM:

Phase 1: HSM initialization (30 minutes)

1. Verify HSM is new/factory-reset: inspect tamper-evident seals, verify serial number
   against purchase receipt

2. Initialize HSM Security World:
   new-world --initialize --acs-quorum=3/5

   This creates the Security World, requiring 3 of 5 Administrator Cards for
   administrative operations. The Administrator Cards are the "skeleton keys" for the HSM.

3. Distribute Administrator Cards:
   Each of the 5 card holders is present. The HSM generates a Shamir-shared key
   across the 5 cards. Each holder writes their PIN on a separate sealed card,
   stores the physical card in an envelope, seals it.

4. Verify: perform a test administrative operation requiring 3 cards.
   Uses cards from 3 different holders. Must succeed.

Phase 2: Key generation (20 minutes)

1. Generate the custody private key inside the HSM:
   generatekey simple recovery=yes type=ECDSA curve=SECP256K1 \
              ident="custody-cold-v1" \
              cardset=<new-operator-card-set>

   The key is generated inside the HSM. It never appears in plaintext.
   The HSM encrypts it under the Security World key, which itself is split
   across the 5 Administrator Cards.

2. Export the public key:
   nfkminfo --key-ident custody-cold-v1

   Record the public key (and derived Ethereum address) on paper.
   All participants verify the address.

3. Record the key ident and address in the ceremony log (paper, witnessed).

Phase 3: Backup share creation (60 minutes)

This is the most operationally complex phase:

1. Export the Key Blob: the HSM produces an encrypted key blob (the key material
   encrypted under the Security World key). This blob is useless without the
   Administrator Cards to decrypt it.

2. For geographic distribution, we create Shamir secret shares of the Administrator
   Card data. This requires all 5 Administrator Card holders to be present.

3. Each share is:
   - Printed on paper (not stored digitally)
   - Sealed in a tamper-evident envelope
   - Signed across the seal by all participants present
   - Labeled with the geographic location where it will be stored
   - The holder signs a chain-of-custody form

4. Transport: each share travels to its destination with the custodian.
   No two shares travel together.
   No share is ever photographed.
   The custodian confirms receipt of their share in the secure vault within 48 hours.

Phase 4: Verification (30 minutes)

1. Verify backup integrity: from 3 of the 5 shares, reconstruct the Administrator
   Card data, re-initialize the HSM from the backup, and verify it produces the
   same public key.

   IMPORTANT: This uses the backup, not the original. If the backup is wrong,
   you find out now, while all participants are present and you can re-run.

2. Test signing: generate a test transaction on a testnet, sign it using the HSM,
   verify the signature on-chain.

3. Document: external auditor signs the ceremony completion attestation.
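
To make the "any 3 of 5 shares reconstruct the secret" property concrete, here is a textbook Shamir secret sharing sketch. It is an illustration only: the field prime, function names, and integer encoding of the secret are assumptions, it says nothing about how the HSM vendor encodes card data internally, and it is not a substitute for an audited ceremony tool.

```python
import secrets

# Assumption: arithmetic in the prime field of the Mersenne prime 2**521 - 1.
# Any prime larger than the secret would do.
PRIME = 2**521 - 1


def split_secret(secret: int, threshold: int, num_shares: int) -> list[tuple[int, int]]:
    """Split `secret` into `num_shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in [0, PRIME)")
    if not 1 <= threshold <= num_shares:
        raise ValueError("need 1 <= threshold <= num_shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for coeff in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + coeff) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

The Phase 4 check corresponds to calling `recover_secret` on a 3-share subset and comparing the result with the original. In a drill it is worth trying more than one subset, since a single corrupted share only shows up in the subsets that include it.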

Quorum Approval Architecture for Ongoing Operations

After the ceremony, routine operations use a quorum approval system. A representative tiering for a custody operation with multiple approvers:

Operation Tier            Threshold           Channels
───────────────────────────────────────────────────────────────────────────────
Automated (< $50K)        Policy engine       Software-only; no human
Warm (< $1M)              1-of-5 approvers    Command Center approval UI
Large (< $10M)            2-of-5 approvers    Command Center + OOB verification
Extraordinary (≥ $10M)    3-of-5 + compliance Command Center + phone call + legal
Cold storage access       4-of-7 + ceremony   Physical key ceremony
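
A sketch of how the amount-based rows of this table might be encoded, assuming amounts are carried as `Decimal` rather than float and that an amount exactly on a boundary escalates to the stricter tier. The names `TIER_LIMITS` and `classify` are hypothetical; cold storage access is not amount-based and is left out.

```python
from decimal import Decimal

# (exclusive upper bound in USD, tier name, human approvals required)
TIER_LIMITS = [
    (Decimal("50000"), "automated", 0),
    (Decimal("1000000"), "warm", 1),
    (Decimal("10000000"), "large", 2),
]


def classify(amount_usd: Decimal) -> tuple[str, int]:
    """Map a transaction amount to (tier, required approvals)."""
    if amount_usd < 0:
        raise ValueError("amount must be non-negative")
    for upper_bound, tier, required in TIER_LIMITS:
        if amount_usd < upper_bound:
            return tier, required
    # Anything at or above the last bound needs the strictest amount-based tier.
    return "extraordinary", 3
```

Keeping the limits in one ordered table makes the boundary behaviour easy to review: each amount falls into the first row whose bound it is strictly below.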

The Command Center approval flow:

# approval_service.py
from dataclasses import dataclass
from enum import Enum
import hashlib
import time

class ApprovalTier(Enum):
    AUTOMATED = "automated"
    WARM = "warm"
    LARGE = "large"
    EXTRAORDINARY = "extraordinary"
    COLD_CEREMONY = "cold_ceremony"

@dataclass
class ApprovalRequest:
    transaction_id: str
    amount_usd: float
    destination: str
    purpose: str
    requested_by: str  # system or human identity
    tier: ApprovalTier

    # Security: approval requests carry a cryptographic reference
    # so approvers can verify they are approving the real transaction
    transaction_hash: str  # SHA-256 of the raw transaction bytes

@dataclass
class ApprovalRecord:
    request: ApprovalRequest
    approver_id: str
    approver_hardware_key_id: str  # YubiKey serial number or HSM key ID
    timestamp: float
    signature: str  # ECDSA signature of (request.transaction_hash + approver_id + timestamp)

class QuorumApprovalService:
    THRESHOLDS = {
        ApprovalTier.AUTOMATED: 0,  # no approvals needed
        ApprovalTier.WARM: 1,
        ApprovalTier.LARGE: 2,
        ApprovalTier.EXTRAORDINARY: 3,
        ApprovalTier.COLD_CEREMONY: 4,  # plus ceremony procedure
    }

    def check_quorum(self, request: ApprovalRequest, approvals: list[ApprovalRecord]) -> bool:
        required = self.THRESHOLDS[request.tier]

        # Verify all approvals are for this specific transaction.
        # Explicit raises, not assert: assert statements are stripped under `python -O`.
        for approval in approvals:
            if approval.request.transaction_hash != request.transaction_hash:
                raise ValueError("Approval references wrong transaction")

        # Verify all approvers are distinct (no double-counting)
        approver_ids = [a.approver_id for a in approvals]
        if len(approver_ids) != len(set(approver_ids)):
            raise ValueError("Duplicate approver")

        # Verify hardware key signatures
        for approval in approvals:
            self._verify_hardware_signature(approval)

        # Check count
        return len(approvals) >= required

    def _verify_hardware_signature(self, approval: ApprovalRecord):
        # The signature must have been produced by a hardware security key
        # This prevents software-only approval forgery
        # Implementation: verify ECDSA signature against known hardware key public keys
        ...
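
The signature comment in `ApprovalRecord` describes signing `transaction_hash + approver_id + timestamp`. If those fields are simply concatenated, two different approvals can serialize to the same bytes. A sketch of an unambiguous encoding, assuming a 32-byte SHA-256 transaction hash, millisecond timestamps, and a hypothetical domain tag `approval-v1`:

```python
import struct


def approval_signing_payload(transaction_hash: str, approver_id: str, timestamp: float) -> bytes:
    """Build the exact bytes the hardware key signs, with length-prefixed fields."""
    tx_hash = bytes.fromhex(transaction_hash)
    if len(tx_hash) != 32:
        raise ValueError("transaction_hash must be a 32-byte SHA-256 digest in hex")
    approver = approver_id.encode("utf-8")
    timestamp_ms = int(round(timestamp * 1000))
    return b"".join([
        b"approval-v1",                       # hypothetical domain-separation tag
        struct.pack(">H", len(tx_hash)), tx_hash,
        struct.pack(">H", len(approver)), approver,
        struct.pack(">Q", timestamp_ms),      # fixed-width, big-endian milliseconds
    ])
```

With naive string concatenation, approver `alice` at time `12.0` and approver `alice1` at time `2.0` both produce `...alice12.0`. The length prefixes remove that ambiguity, so signer and verifier always agree on field boundaries.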

The critical property: every approval must carry a signature from a hardware security key (YubiKey, HSM). A compromised UI or a stolen session cannot produce an approval on its own - the approver has to physically touch their hardware key for each signature. This is the same class of control that MiFID II and most custodial insurance policies require.

Out-of-Band Verification for Large Transactions

For transactions above $1M, we implemented out-of-band verification: the approval request is displayed in the Command Center UI, but the approver must also call a dedicated verification line where an automated system reads back an 8-digit code derived from the first 4 bytes (8 hex characters) of the transaction hash. The approver confirms the code on the call matches the code shown in the UI.

This prevents the Bybit-class attack: even if the UI is showing a different transaction than the one being signed, the OOB verification would reveal the mismatch.

# oob_verification.py
def generate_oob_code(transaction_hash: str) -> str:
    """
    Generate a short, human-readable verification code for OOB confirmation.
    """
    # Take first 4 bytes of hash, convert to decimal, format as 8-digit code
    hash_bytes = bytes.fromhex(transaction_hash[:8])
    code = int.from_bytes(hash_bytes, 'big') % 100_000_000
    return f"{code:08d}"

# Verification call script:
# "This is the ZeroCopy verification line. Your transaction code is
#  [read code digit by digit: 3-7-2-1-9-5-4-8].
#  Please confirm this matches the code shown in your Command Center."
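
The check is only out-of-band if the verification line computes its code from the transaction actually queued for signing, not from a hash the UI hands it. A sketch of that server-side step, assuming the verification service has its own access to the raw transaction bytes; the function names are hypothetical and the code derivation mirrors `generate_oob_code` above.

```python
import hashlib
import hmac


def oob_code_from_raw_tx(raw_tx: bytes) -> str:
    """Derive the 8-digit code from the raw transaction, independent of the UI."""
    digest = hashlib.sha256(raw_tx).digest()
    code = int.from_bytes(digest[:4], "big") % 100_000_000
    return f"{code:08d}"


def approver_confirms(raw_tx_queued_for_signing: bytes, code_read_from_ui: str) -> bool:
    """Compare the code the approver sees in the UI with the independently derived one."""
    expected = oob_code_from_raw_tx(raw_tx_queued_for_signing)
    # Tolerate separators such as "3-7-2-1..." but keep only ASCII digits.
    entered = "".join(ch for ch in code_read_from_ui if ch in "0123456789")
    return hmac.compare_digest(expected, entered)
```

An 8-digit code leaves roughly a 1 in 100 million chance that two unrelated transactions share a code, which is acceptable for catching a swapped transaction but is not a replacement for the hardware-key signature over the full hash.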

Catastrophic Failure Modes in Real Key Ceremonies

Failures at real firms, documented:

FTX: majority of signing authority concentrated. Sam Bankman-Fried reportedly had personal control over a majority of FTX’s signing infrastructure, effectively bypassing the multisig governance structure. When the firm collapsed, the funds were moved by a small number of people. The ceremony had defined a formal threshold, but operational practice had circumvented it. Fix: independent compliance monitoring of who actually approves transactions vs. who is theoretically required to. Regular audits of approval logs vs. threshold requirements.
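
The audit described in that fix can be approximated from the approval log alone. A minimal sketch, assuming the log is available as `(transaction_id, approver_id)` pairs and the required threshold per transaction is known; the function name and data shapes are hypothetical.

```python
from collections import Counter


def audit_approval_log(
    log: list[tuple[str, str]],
    required: dict[str, int],
) -> tuple[list[str], dict[str, float]]:
    """Return (transactions below quorum, fraction of transactions each approver signed)."""
    approvers_by_tx: dict[str, set[str]] = {}
    for tx_id, approver_id in log:
        # A set ignores repeat approvals by the same person.
        approvers_by_tx.setdefault(tx_id, set()).add(approver_id)

    under_quorum = sorted(
        tx_id
        for tx_id, needed in required.items()
        if len(approvers_by_tx.get(tx_id, set())) < needed
    )

    appearances = Counter(a for approvers in approvers_by_tx.values() for a in approvers)
    total = len(approvers_by_tx) or 1  # avoid division by zero on an empty log
    participation = {a: count / total for a, count in appearances.items()}
    return under_quorum, participation
```

The first result catches transactions that went out below their formal threshold. The second shows concentration: an approver who appears on nearly every transaction is a de facto single point of control even when the quorum is met on paper.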

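Wormhole bridge: guardian signature check bypassed. The $320M Wormhole hack (February 2022) did not involve a stolen key: the attacker exploited a flaw in how the Solana-side contract verified guardian signatures and minted 120,000 wETH without valid guardian approval. The custody lesson still applies, because a bridge's guardian set and upgrade authority control as much value as its primary signing keys. Fix: every privileged key must have its custody level explicitly defined and audited, not just the primary signing keys.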

Multiple DeFi protocols: seed phrases in Notion/Google Docs. Several post-incident analyses have revealed that recovery seeds were stored in Notion databases or Google Drive files, sometimes in plain text, sometimes in files labeled “DO NOT SHARE”. Fix: physical-only for mnemonic storage. No cloud storage, ever.

Specific to ceremony execution:

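Failure 1: Entropy source not verified before ceremony. The ceremony machine is a freshly installed, air-gapped laptop with no network traffic and little user input, so the kernel has had few entropy sources to draw on. If software key generation starts before the kernel's random number generator is properly seeded, the key has lower entropy than required. Fix: confirm the kernel RNG is initialized before key generation (cat /proc/sys/kernel/random/entropy_avail; note that on Linux 5.18 and later this reads a constant 256 once the generator is seeded, so it confirms initialization rather than measuring a pool that drains). Consider using a hardware RNG (YubiKey 5’s TRNG, a dedicated HRNG device) to supplement the OS entropy pool.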
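
One common way to supplement the OS source with a hardware RNG is to hash both together, so the result stays unpredictable as long as at least one input is. This is a sketch under that assumption (independent inputs, SHA-256 treated as a good mixing function); the function name and the `ceremony-seed-v1` tag are hypothetical, and how the hardware bytes are read from the device is not shown.

```python
import hashlib
import secrets


def mix_entropy(os_bytes: bytes, hardware_bytes: bytes) -> bytes:
    """Combine two entropy sources into a 32-byte seed via SHA-256."""
    if len(os_bytes) < 32 or len(hardware_bytes) < 32:
        raise ValueError("each source must contribute at least 32 bytes")
    h = hashlib.sha256()
    h.update(b"ceremony-seed-v1")  # hypothetical domain-separation tag
    for source in (os_bytes, hardware_bytes):
        # Length prefixes keep the boundary between the two inputs unambiguous.
        h.update(len(source).to_bytes(4, "big"))
        h.update(source)
    return h.digest()

# Usage (hardware_bytes would come from the external RNG device):
#   seed = mix_entropy(secrets.token_bytes(32), hardware_bytes)
```

This only matters for keys generated in software on the ceremony laptop; a key generated inside the HSM uses the HSM's own random number generator.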

Failure 2: Share holder refusing to return share. One of the 5 Administrator Card holders leaves the company and refuses to return their card. You have 4 of 5 cards - still above your 3-of-5 threshold, but one card is now outside your control and your margin has shrunk from two losses to one. Fix: the threshold should be set so that the departure of one participant does not block operations. 3-of-5 means you can lose 2 without losing access. But with 3-of-5, losing 2 participants means you are one bad day away from lockout. Design for 3-of-7 or 4-of-9 for cold storage.
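
The threshold arithmetic is simple enough to check mechanically when comparing schemes. A small sketch, with a hypothetical function name, that reports how much margin a k-of-n scheme has left after some holders become unavailable:

```python
def quorum_margins(threshold: int, total: int, unavailable: int = 0) -> dict[str, int]:
    """Summarize availability and collusion margins for a threshold-of-total scheme."""
    if not 1 <= threshold <= total:
        raise ValueError("need 1 <= threshold <= total")
    if not 0 <= unavailable <= total:
        raise ValueError("unavailable must be between 0 and total")
    remaining = total - unavailable
    return {
        "remaining_holders": remaining,
        # How many more holders can be lost before lockout; negative means locked out.
        "further_losses_tolerated": remaining - threshold,
        # How many holders must cooperate (or be coerced) to act.
        "holders_needed_to_collude": threshold,
    }
```

For example, `quorum_margins(3, 5, unavailable=2)` leaves zero further losses tolerated, while `quorum_margins(3, 7, unavailable=2)` still tolerates two. Raising the total improves availability without lowering the number of holders an attacker must compromise.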

Failure 3: Ceremony software compromised between download and use. The ceremony software was downloaded on a connected machine 2 days ago. The local copy, or the USB drive that carried it, could have been tampered with between your download and today’s ceremony. Fix: verify the hash of the ceremony software immediately before running it, against a hash you stored when you originally verified it (in a separate secure location). Two independent verifications, from different original sources, are better than one.

Failure 4: Physical coercion during ceremony. One of the participants is physically coerced before arriving at the ceremony (“tell us the PIN or we will harm your family”). This attack is documented in DPRK operations. Fix: ceremony participants should not be publicly known. The ceremony should use a cover story (“board meeting”, “compliance review”). Participants travel independently to the venue. Any participant who feels unsafe should have a pre-agreed abort signal.

Failure 5: Backup shares stored in same jurisdiction as HSM. All Shamir backup shares are stored in vaults in the same city as the HSM. An earthquake, data center fire, or court order affecting that city could simultaneously destroy the HSM and all backup shares. Fix: geographic distribution across multiple legal jurisdictions is not optional. At minimum: 3 continents, 3 different legal systems, such that no single jurisdiction can access more than 1 share (for a 3-of-N scheme).
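
A share placement plan can be checked against this rule before anything is shipped. A minimal sketch, assuming the plan is a mapping from share label to jurisdiction; the function name and the example labels are hypothetical.

```python
from collections import Counter


def placement_violations(
    share_locations: dict[str, str],
    max_per_jurisdiction: int = 1,
) -> dict[str, int]:
    """Return jurisdictions holding more shares than allowed, with their counts."""
    counts = Counter(share_locations.values())
    return {
        jurisdiction: count
        for jurisdiction, count in counts.items()
        if count > max_per_jurisdiction
    }

# Usage (hypothetical plan):
#   placement_violations({"share-1": "CH", "share-2": "SG", "share-3": "CH"})
#   flags "CH" because it would hold two shares.
```

An empty result means no single jurisdiction exceeds the limit. The check says nothing about physical distance, so two vaults in different countries but the same earthquake zone still need a human review.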

Failure 6: Recovery procedure not tested in 18 months. Backup procedure says “stores exist and are accessible.” In practice, one vault has changed its access procedure and the current vault manager does not know the original enrollee. Vault access for that share requires a week of paperwork. Fix: full recovery drill quarterly. Do not just verify that shares exist - verify that you can access and use them within the expected time window, using people who are currently authorized to access them.


Related reading: Hot-Warm-Cold Wallet Tiering covers how the cold storage keys fit into the broader tiering architecture. MPC vs HSM vs Multisig covers the key management technologies used in each tier. The DPRK Threat Model covers state-actor attacks specifically targeting ceremony participants.
