MPC vs HSM vs Multisig: A Decision Framework for Custody Key Management

The choice of custody architecture is not primarily a technology decision. It is a threat model decision. Every approach makes specific assumptions about who the adversary is, how they would attack, and which controls are operationally sustainable. If your threat model is wrong, your controls are wrong - regardless of how technically sophisticated they are.

I have evaluated all three approaches across institutional trading and ICO platform contexts, and the differences in what each approach optimizes for are more significant than most comparisons acknowledge.

Multisig: On-Chain Enforcement with On-Chain Visibility

On-chain multisig (Safe, formerly Gnosis Safe, or native Bitcoin multisig) enforces the key threshold on-chain: in a smart contract for Safe, in the spending script for Bitcoin. A transaction requires signatures from M of N designated keys. The chain itself enforces this; no off-chain policy engine is required.
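The M-of-N rule itself is small enough to sketch. The following is a minimal illustration in Python, not the logic of Safe or Bitcoin script: `MultisigConfig` and `is_authorized` are hypothetical names, and the sketch assumes each approval is an address already recovered from a valid signature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MultisigConfig:
    owners: frozenset[str]  # designated signer addresses (the N)
    threshold: int          # required approvals (the M)

    def __post_init__(self) -> None:
        if not 1 <= self.threshold <= len(self.owners):
            raise ValueError("threshold must be between 1 and the number of owners")


def is_authorized(config: MultisigConfig, approvals: list[str]) -> bool:
    """Return True when at least M distinct designated owners have approved.

    Assumption: signature verification happened upstream, so `approvals`
    contains recovered signer addresses.
    """
    distinct_owner_approvals = set(approvals) & config.owners
    return len(distinct_owner_approvals) >= config.threshold


config = MultisigConfig(owners=frozenset({"0xA", "0xB", "0xC"}), threshold=2)
print(is_authorized(config, ["0xA", "0xB"]))  # two distinct owners
print(is_authorized(config, ["0xA", "0xA"]))  # same owner twice
```

Counting distinct owners, rather than raw approvals, is the detail that matters: a duplicate signature from one key must not count twice.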

Strengths:

The security guarantee is verifiable on-chain by anyone
No single custodian can steal funds without compromising M independent keys
Simple to audit: the multisig contract is public code with known behavior
No specialized hardware required - any signing device (Ledger, Trezor, cold storage) works
Decentralized by default: signers can be geographically and organizationally independent

Weaknesses:

Every signer address is public on-chain. An attacker who wants to steal can enumerate all signers and target them for social engineering or physical attacks
Transaction metadata is public: the amount, destination, and timing of every transfer are visible to anyone monitoring the blockchain
No programmable policy without a separate policy contract
As the Bybit hack demonstrated, multisig plus hardware wallets is not sufficient against UI-layer compromise attacks
Key rotation requires broadcasting a new multisig configuration on-chain (visible, slower, gas cost)

The on-chain visibility of signer addresses is the most underestimated weakness. For any firm with public custody operations (any exchange, any public DeFi protocol, any ICO platform), the signer addresses enumerated from on-chain data become targets for social engineering and physical compromise attempts. This is documented across multiple custody teams and is not a hypothetical.

Appropriate use cases: on-chain governance of smart contracts, DeFi protocol administration, DAO treasuries, any operation where the transparency of on-chain multisig is a feature rather than a bug.

HSM: FIPS-Validated Air-Gapped Key Material

Hardware Security Modules (Thales nShield, AWS CloudHSM, Yubico YubiHSM) are physical or cloud-based devices that store key material in tamper-resistant hardware. The key never leaves the HSM in plaintext - operations (sign, decrypt, derive) are performed inside the device, and only the result is returned.

FIPS 140-2 Level 3 (the standard for most enterprise HSMs) requires:

Physical tamper-evidence: the device must detect and respond to physical intrusion
Key zeroization on tamper: if the tamper sensor fires, the keys are destroyed
Role-based access control: separate authentication for HSM administration vs key operations
Audit logging: all key operations are logged to an immutable audit trail
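To make the audit-trail idea concrete, here is a minimal sketch of a hash-chained log in Python. It is an illustration of tamper evidence only, not how any particular HSM stores its log; the field names and the functions `append_entry` and `verify_chain` are assumptions made for this example.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry (assumption)


def _digest(body: dict) -> str:
    """Hash an entry body deterministically."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(log: list[dict], operation: str, key_id: str, actor: str) -> dict:
    """Append an entry whose hash covers its content and the previous hash."""
    body = {
        "seq": len(log),
        "operation": operation,
        "key_id": key_id,
        "actor": actor,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; any edited, dropped, or reordered entry fails."""
    prev_hash = GENESIS
    for seq, entry in enumerate(log):
        body = {k: v for k, v in entry.items() if k != "hash"}
        if (
            entry["seq"] != seq
            or entry["prev_hash"] != prev_hash
            or entry["hash"] != _digest(body)
        ):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this is tamper-evident rather than tamper-proof: someone who can rewrite the whole log can recompute every hash, so the latest hash needs to be anchored somewhere the log writer cannot modify.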

Strengths:

FIPS certification provides regulatory acceptability (SOC 2, ISO 27001, MiFID II all accept HSM key storage)
Physical air-gap: key material exists only in the hardware device; software compromise cannot extract it
Tamper response: physical attack triggers key destruction rather than extraction
Performance: CloudHSM can sign at ~1,000 RSA/ECDSA operations/second per partition

Weaknesses:

Single point of failure: if the HSM is lost or destroyed, the key is gone. Backup requires a complex key ceremony
Operationally heavyweight: HSM administration requires specialized knowledge, and mistakes are expensive (factory reset loses all keys)
Limited programmability: you can perform standard cryptographic operations, but complex custody policies (“require a second approval if transfer >$1M, unless to a whitelisted address”) require off-HSM policy logic
Vendor lock-in: migrating from Thales to AWS CloudHSM requires a key ceremony and re-enrollment
Cost: Thales nShield Connect XC starts at ~$30,000. AWS CloudHSM is $1.60/hr ($1,168/mo) per cluster
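The policy quoted in the programmability point above ("require a second approval if transfer >$1M, unless to a whitelisted address") is the kind of rule that has to live outside the HSM. A minimal sketch of it in Python follows; `TransferRequest`, `required_approvals`, and `may_sign` are hypothetical names, and a real policy engine would also authenticate the approvers.

```python
from dataclasses import dataclass
from decimal import Decimal

SECOND_APPROVAL_THRESHOLD_USD = Decimal("1000000")  # the $1M limit from the example rule


@dataclass(frozen=True)
class TransferRequest:
    destination: str
    amount_usd: Decimal
    approvers: frozenset[str]  # identities that approved (assumed authenticated upstream)


def required_approvals(request: TransferRequest, whitelist: frozenset[str]) -> int:
    """Two approvals for large transfers to non-whitelisted addresses, else one."""
    is_large = request.amount_usd > SECOND_APPROVAL_THRESHOLD_USD
    if is_large and request.destination not in whitelist:
        return 2
    return 1


def may_sign(request: TransferRequest, whitelist: frozenset[str]) -> bool:
    """Gate that the service in front of the HSM evaluates before requesting a signature."""
    return len(request.approvers) >= required_approvals(request, whitelist)
```

The HSM only ever sees "sign this digest"; everything in this sketch runs in software in front of it, which is why that software becomes part of the attack surface.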

Appropriate use cases: cold storage backup for long-lived keys, regulatory compliance environments where “FIPS 140-2 Level 3” is a contractual requirement, and any operation where the operational overhead is acceptable and the threat model includes physical attackers.

MPC-CMP: Distributed Key Material with No Single Point of Compromise

Multi-Party Computation for key management (Fireblocks, Zengo, Qredo, or open-source implementations like tofn/GG20) distributes the private key as shares among multiple co-signers. The critical property: the full private key is never assembled on any single machine. Signing happens through a cryptographic protocol where each party contributes its key share, and the final signature is produced without any party learning the complete key.

GG20 (Gennaro-Goldfeder 2020) is the most widely deployed MPC protocol:

Key generation: uses Paillier homomorphic encryption to distribute key shares without revealing them
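Signing: requires interaction among the participating threshold parties over several rounds; GG20 moves all but one of them into a message-independent pre-signing phase, leaving a single online round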
Key share never assembled: the protocol produces a valid ECDSA signature without reconstructing the private key

FROST (Flexible Round-Optimized Schnorr Threshold):

Two-round signing, or a single online round when nonce commitments are preprocessed (faster than GG20 for Schnorr-compatible chains: Bitcoin, Solana)
More efficient at high N values (e.g., 10-of-15 threshold)
Standardized in IRTF RFC 9591 - recommended for new implementations
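The claim that a signature can be produced without any party learning the complete key is easier to believe once you see the algebra run. The sketch below is a toy threshold Schnorr signature in Python. It is not FROST and not GG20: it omits FROST's binding factors and all zero-knowledge proofs, uses a trusted dealer instead of distributed key generation, and works in a deliberately tiny group (`P`, `Q`, `G` are illustrative assumptions). It must never be used to protect anything.

```python
import hashlib
import secrets

# Toy parameters, far too small for real use: P = 2*Q + 1, and G generates
# the subgroup of prime order Q.
P, Q, G = 2039, 1019, 4


def challenge(R: int, Y: int, msg: bytes) -> int:
    """Hash the group nonce, public key, and message into a scalar mod Q."""
    data = f"{R}|{Y}|".encode() + msg
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def deal_shares(secret: int, t: int, n: int) -> dict[int, int]:
    """Shamir-share `secret` so that any t of the n shares determine it.

    The trusted dealer is a simplification for this sketch; real systems use
    distributed key generation so that no machine ever holds `secret`.
    """
    coeffs = [secret] + [secrets.randbelow(Q) for _ in range(t - 1)]
    return {
        i: sum(c * pow(i, k, Q) for k, c in enumerate(coeffs)) % Q
        for i in range(1, n + 1)
    }


def lagrange_at_zero(i: int, signer_ids: list[int]) -> int:
    """Lagrange coefficient for signer i, evaluated at x = 0, mod Q."""
    num, den = 1, 1
    for j in signer_ids:
        if j != i:
            num = num * (-j) % Q
            den = den * (i - j) % Q
    return num * pow(den, -1, Q) % Q


def threshold_sign(
    shares: dict[int, int], signer_ids: list[int], Y: int, msg: bytes
) -> tuple[int, int]:
    """Simulate all signers in one process; each uses only its own share."""
    # Step 1: each signer picks a private nonce k_i and publishes G^k_i.
    nonces = {i: secrets.randbelow(Q - 1) + 1 for i in signer_ids}
    R = 1
    for i in signer_ids:
        R = R * pow(G, nonces[i], P) % P
    e = challenge(R, Y, msg)
    # Step 2: each signer computes a partial signature from its own share.
    partials = [
        (nonces[i] + e * lagrange_at_zero(i, signer_ids) * shares[i]) % Q
        for i in signer_ids
    ]
    # The shares are never combined; only the partial signatures are summed.
    return R, sum(partials) % Q


def verify(Y: int, msg: bytes, sig: tuple[int, int]) -> bool:
    """Ordinary single-signer Schnorr verification: G^s == R * Y^e."""
    R, s = sig
    return pow(G, s, P) == R * pow(Y, challenge(R, Y, msg), P) % P


secret = secrets.randbelow(Q - 1) + 1
Y = pow(G, secret, P)
shares = deal_shares(secret, t=3, n=5)
signature = threshold_sign(shares, [1, 3, 5], Y, b"withdraw")
print(verify(Y, b"withdraw", signature))
```

The verifier runs the same check it would run for a single-key signature, which is why a threshold signature is indistinguishable on-chain from a single-signer one.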

Strengths:

No single point of compromise: compromising one signer or one co-signing node does not expose the key
Programmable policy: most MPC implementations allow transaction policies at the API layer
Invisible to the blockchain: on-chain, an MPC transaction looks identical to a single-signer transaction
Key rotation: re-sharing the key without changing the on-chain address is possible (proactive refresh)
Performance: Fireblocks reports <1 second for typical transactions; tofn-based implementations typically sign in 50-200ms
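Proactive refresh, mentioned in the key rotation point above, sounds like magic until you see that it is just adding shares of zero. The Python sketch below shows the idea with plain Shamir sharing. It is an assumption-laden illustration: a single dealer generates the zero polynomial here, whereas in a real refresh protocol every party contributes and no one sees the secret; `reconstruct` exists only to check the demo and is exactly the operation a production system never performs.

```python
import secrets

Q = 2**127 - 1  # a Mersenne prime, used as a toy field modulus (assumption)


def random_poly(constant: int, degree: int) -> list[int]:
    """Random polynomial with a fixed constant term."""
    return [constant] + [secrets.randbelow(Q) for _ in range(degree)]


def eval_poly(coeffs: list[int], x: int) -> int:
    return sum(c * pow(x, k, Q) for k, c in enumerate(coeffs)) % Q


def deal(secret: int, t: int, n: int) -> dict[int, int]:
    """t-of-n Shamir shares: share i is f(i) for a random degree t-1 polynomial."""
    poly = random_poly(secret, t - 1)
    return {i: eval_poly(poly, i) for i in range(1, n + 1)}


def refresh(shares: dict[int, int], t: int) -> dict[int, int]:
    """Add shares of zero: every share changes, the shared secret does not."""
    zero_poly = random_poly(0, t - 1)
    return {i: (s + eval_poly(zero_poly, i)) % Q for i, s in shares.items()}


def reconstruct(points: dict[int, int]) -> int:
    """Lagrange interpolation at x = 0. Demo check only, never done in production."""
    total = 0
    for i, y in points.items():
        num, den = 1, 1
        for j in points:
            if j != i:
                num = num * (-j) % Q
                den = den * (i - j) % Q
        total = (total + y * num * pow(den, -1, Q)) % Q
    return total
```

Because the secret, and therefore the public key and on-chain address, is unchanged, a share stolen before the refresh cannot be combined with shares stolen after it.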

Weaknesses:

Cryptographic complexity: GG20 has had implementation vulnerabilities (the 2022 zero-knowledge proof bypass in several open-source implementations). Rolling your own is dangerous.
Network dependency: threshold signing requires at least the threshold number of co-signers to be online simultaneously and able to reach each other. If outages leave you below the threshold, you cannot sign until a signer recovers or you reshare the key to a new participant set.
Trust model is different: you are trusting the MPC software and network rather than physical hardware. Compromise of the MPC service (Fireblocks has had security incidents) affects all customers.
Audit complexity: proving to a regulator that keys were never assembled requires auditing the cryptographic protocol implementation, not just the hardware

The Decision Matrix

Dimension                  Multisig         HSM                  MPC-CMP
────────────────────────────────────────────────────────────────────────────────────────
Security model             M-of-N keys,     Physical tamper      Threshold shares,
                           any device       resistance           never assembled
On-chain visibility        Public           Not applicable       Private (single sig)
Key compromise             M-of-N req.      Zeroizes on tamper   Any one share leaked
                                                                 → still safe
Physical attack resist.    Low (signers     High (FIPS L3)       Medium (distributed)
                           are known)
Regulatory acceptance      Accepted         Best (FIPS cert)     Accepted (improving)
Operational overhead       Low              High                 Medium
Programmable policy        Off-chain only   Off-device only      Built-in (most vendors)
Key rotation               Costly           Ceremony required    Proactive refresh
Signing latency            Depends/human    <100ms               50ms - 2s
Signer availability        All M online     HSM online           Threshold online
Cost                       ~Free            $15K-$30K+/HSM       $10K-$100K+/yr vendor
Single-point failure       No               Yes (backup needed)  No
Suitable for HFT           No (human req.)  Yes (if automated)   Yes
Suitable for cold storage  Yes              Yes                  Yes

Custody for ICO Launches: The Transient-Holding Problem

The standard MPC-vs-HSM-vs-multisig comparison assumes you are optimizing for long-term, ongoing custody: keys that will hold assets for months or years, signing transactions continuously. But there is a different problem that changes the optimization function entirely: transient custody.

In an ICO whitelabel context, the custody window is bounded. You are holding client and contributor funds during the pre-launch window - from close of the contribution round to the moment the project team is operationally ready to take over. That handoff is the critical event. The question is not “how do we hold these keys safely for the next five years?” but “how do we hold these keys safely for the next 90 days, and hand them off cleanly with cryptographic evidence of clean transfer?”

This changes what you optimize for:

MPC is engineering overhead you do not need. MPC’s main advantage is eliminating a single point of key compromise across ongoing operations. For a bounded window with a hard handoff date, that complexity is not justified. You are not running a continuous signing infrastructure; you are running a holding operation with a defined exit.
HSM gives you the audit trail that the handoff requires. Every operation logged to an immutable store. FIPS 140-2 Level 3 certification that the auditor and the project team’s legal counsel can point to. Clean attestation that key material was never extracted in plaintext. When you hand keys over, the HSM’s audit log is the evidence of responsible custody. That is the record the project team needs for their own compliance story and what regulators reviewing the ICO want to see.
Multisig at the handoff layer. On-chain multisig at the transfer point makes the handoff itself cryptographically verifiable. The transition from Upside’s custody to the project team’s operational control is recorded on-chain. The regulators doing post-launch compliance review can see the chain of custody without relying on any party’s self-report.

The transient ICO custody architecture (illustrative):

Smart-contract deployment pipeline
  → multi-chain deployment to project's target chains
  → per-chain KYC/AML and compliance gate
  → funds held in HSM-backed signing (pre-launch window)
  → multi-jurisdictional due diligence checks
  → project team go-live readiness gate
  → multisig handoff ceremony (Upside's custody → project ops)
  → operator-of-record transfer with on-chain audit trail
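One way to read the pipeline above is as a sequence of gates that must pass strictly in order. The Python sketch below encodes that reading; the stage names are my own shorthand for the steps listed, and `CustodyPipeline` is a hypothetical class, not a description of any deployed system.

```python
from enum import IntEnum


class Stage(IntEnum):
    DEPLOYED = 1              # multi-chain deployment
    COMPLIANCE_CLEARED = 2    # per-chain KYC/AML and compliance gate
    FUNDS_HELD = 3            # HSM-backed signing during the pre-launch window
    DUE_DILIGENCE_PASSED = 4  # multi-jurisdictional due diligence
    GO_LIVE_READY = 5         # project team readiness gate
    HANDOFF_SIGNED = 6        # multisig handoff ceremony
    TRANSFERRED = 7           # operator-of-record transfer


class CustodyPipeline:
    """Tracks completed stages and refuses to skip or repeat one."""

    def __init__(self) -> None:
        self.completed: list[Stage] = []

    def advance(self, stage: Stage) -> None:
        if len(self.completed) == len(Stage):
            raise RuntimeError("pipeline already complete")
        expected = Stage(len(self.completed) + 1)
        if stage != expected:
            raise RuntimeError(
                f"cannot run {stage.name}; next required stage is {expected.name}"
            )
        self.completed.append(stage)

    @property
    def handed_off(self) -> bool:
        return len(self.completed) == len(Stage)
```

The point of making the ordering explicit is that the handoff ceremony cannot be reached unless every compliance and readiness gate before it has been recorded.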

The $500M+ figure represents cumulative funds shepherded through ICO launches across the 16-month engagement, not assets under ongoing management. Each launch was a bounded transient custody operation. Zero breaches across all of them.

Recommendations by Firm Size

Under $100M AUM:
  Use Fireblocks or Qredo (MPC-as-a-service)
  Reason: operational overhead of building and maintaining MPC is not justified
  Cost: $20K-$60K/yr for Fireblocks depending on transaction volume
  Risk: Fireblocks outage affects you; mitigate with backup signing path

$100M - $1B AUM:
  MPC (Fireblocks or custom) for hot/warm + HSM for cold backup
  Key ceremony for cold storage, documented and tested annually
  SOC 2 Type II audit requirement

Over $1B AUM:
  Custom TEE-based signing service (ZeroCopy/similar) + HSM cold storage
  Independent policy engine (not your custody provider's)
  Multiple signing providers for redundancy
  Regular penetration testing of the full signing flow
  Insurance (Lloyd's of London institutional crypto coverage)

How This Breaks in Production

Failure 1: GG20 zero-knowledge proof bypass (2022 vulnerability). Several open-source GG20 implementations had a vulnerability where a malicious co-signer could bypass the zero-knowledge proof that proves their key share is valid. A malicious insider who controlled one of the 5 co-signing nodes could manipulate the signing protocol to produce signatures that reveal information about other shares. Fix: use only GG20 implementations that have been patched after August 2022 (the disclosure date). Check your vendor’s security bulletins. If you use Fireblocks, they patched this. If you use tss-lib directly, verify the patch was applied.

Failure 2: MPC network partition causing signing unavailability. One of your 5 MPC co-signers is unreachable (network outage, machine down). With a 3-of-5 threshold, you still have 4 signers - enough to sign. But your MPC implementation requires all 5 to agree on the participant set before signing. You are locked out until the 5th comes back. Fix: implement a resharing protocol that allows the participant set to be temporarily reduced from 3-of-5 to 3-of-4 when a participant is known to be down (the threshold of 3 stays the same; only the set of share holders shrinks). This is a complex protocol change - validate it carefully before relying on it.
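The availability logic you want in this situation is simple to state, even if the protocol work behind it is not. The Python sketch below assumes an implementation that can sign with any threshold-sized subset of share holders; `select_quorum` and the node names are hypothetical.

```python
def select_quorum(participants: list[str], online: set[str], threshold: int) -> list[str]:
    """Pick `threshold` reachable co-signers, in a deterministic order.

    Assumption: the signing protocol accepts any subset of share holders of
    size `threshold`, rather than requiring every participant to check in.
    """
    available = [p for p in participants if p in online]
    if len(available) < threshold:
        raise RuntimeError(
            f"only {len(available)} of {len(participants)} co-signers reachable; "
            f"need {threshold}"
        )
    return available[:threshold]


nodes = ["node-1", "node-2", "node-3", "node-4", "node-5"]
reachable = {"node-1", "node-2", "node-3", "node-4"}  # node-5 is down
print(select_quorum(nodes, reachable, threshold=3))
```

If your implementation cannot do this, that is worth finding out in a failure drill rather than during an outage.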

Failure 3: HSM backup key export in cleartext. An administrator exports a key backup from the HSM without proper wrapping to “test the backup procedure” and stores it on a laptop. Once the backup leaves the device, the HSM’s tamper resistance no longer protects it: the backup is only as strong as whatever key wraps it, if any. Fix: HSM key exports must always be wrapped under a key that itself requires M-of-N smartcard holders to decrypt. Document and enforce this in your HSM policy. Audit every key export event.

Failure 4: Policy engine bypassed via direct API call. The policy engine sits in front of the MPC signing nodes. An attacker who has compromised one of the signing nodes’ internal networks can call the signing API directly, bypassing the policy engine. Fix: the policy engine must be co-located inside the signing path, not just in front of it. Each MPC share should only produce a partial signature if the policy engine’s signed approval is included in the signing request - and the policy engine should be running in a TEE that the signing node can attest.
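The fix described above amounts to this: the signing node refuses to produce a partial signature unless the request carries an approval bound to that exact transaction. A minimal Python sketch follows. It uses an HMAC with a shared key as a stand-in for the policy engine's signature, which is a simplification: with a shared key the signing node could forge approvals itself, so a real design would use an asymmetric signature from an attested TEE. All function names are hypothetical, and the returned "partial signature" is a placeholder string.

```python
import hashlib
import hmac
from typing import Optional


def tx_digest(destination: str, amount: int, nonce: int) -> bytes:
    """Canonical digest of the transaction fields the approval is bound to."""
    return hashlib.sha256(f"{destination}|{amount}|{nonce}".encode()).digest()


def issue_approval(policy_key: bytes, digest: bytes) -> bytes:
    """Policy engine side: approve one specific transaction digest."""
    return hmac.new(policy_key, digest, hashlib.sha256).digest()


def partial_sign(policy_key: bytes, digest: bytes, approval: Optional[bytes]) -> str:
    """Signing node side: verify the approval before touching the key share."""
    expected = issue_approval(policy_key, digest)
    if approval is None or not hmac.compare_digest(approval, expected):
        raise PermissionError("missing or invalid policy approval")
    # Placeholder for the real MPC partial-signature computation.
    return "partial-signature-over-" + digest.hex()
```

With this check inside the node, calling the signing API directly gains the attacker nothing: without a valid approval for the exact transaction digest, the node does not sign.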

Failure 5: Multisig signer targeted via on-chain enumeration. An attacker queries the Safe contract, finds your signer addresses, cross-references them against ENS names and social media, identifies the signers, and begins a targeted social engineering campaign. This happened to at least two custody teams I am aware of in 2024. Fix: if you use on-chain multisig for high-value operations, do not associate your signer addresses with any human identities. Use dedicated signing addresses that have no transaction history linking them to your personal or organizational identity.

Failure 6: MPC signing node time-of-check to time-of-use race. The policy engine approves a transaction at T1. The transaction is held in a queue and signed at T2 (500ms later). Between T1 and T2, the destination address is added to your OFAC sanctions list by your compliance system. The signed transaction uses the approval from T1, which predates the sanctions list update. Fix: policy checks must run immediately before signature production, not before queuing. The time between policy check and signature must be bounded and monitored.
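Both halves of that fix, a bounded approval age and a re-check at signing time, fit in a few lines. The Python sketch below is illustrative only: `sign_if_still_allowed`, the 250ms bound, and the injected `is_sanctioned` and `sign` callables are assumptions, not any vendor's API.

```python
import time
from typing import Callable

MAX_APPROVAL_AGE_SECONDS = 0.25  # illustrative bound; tune and monitor it


def sign_if_still_allowed(
    destination: str,
    is_sanctioned: Callable[[str], bool],
    approved_at: float,
    sign: Callable[[], str],
    now: Callable[[], float] = time.monotonic,
) -> str:
    """Re-validate immediately before signing instead of trusting the queue.

    `approved_at` must come from the same clock as `now`.
    """
    age = now() - approved_at
    if age > MAX_APPROVAL_AGE_SECONDS:
        raise TimeoutError("approval is stale; re-run the full policy check")
    # Query the live sanctions list, not the snapshot used at approval time.
    if is_sanctioned(destination):
        raise PermissionError("destination is sanctioned at signing time")
    return sign()
```

In the scenario above, the 500ms queue delay would trip the age bound on its own, and the sanctions re-check would catch the update even if the delay were shorter.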

Related reading: AWS Nitro Enclaves for Wallet Signing covers how ZeroCopy implements the TEE policy engine layer. Hot-Warm-Cold Wallet Tiering covers the full tiering architecture for trading desks and long-term custody operations. Key Ceremonies and Quorum Approvals covers the operational choreography of the cold storage tier.
