Designing a Hot-Warm-Cold Wallet Tiering for a Trading Desk Holding $1B+
Framework and worked examples for hot-warm-cold wallet tiering at $1B+ scale: tier ratios, sweep schedules, signing latency tradeoffs, insurance requirements, and threshold math.
Audience: trading desks operating at $1B+ scale.
When I explain this architecture to engineers who have been thinking about the problem, the most common reaction is: “we were going in this direction but had not worked out all the numbers.”
Why You Need Tiers at All
The naive custody architecture is a single wallet: deposit funds, keep them there, withdraw when needed. This is what individuals do. It is catastrophically wrong for an institution.
The fundamental tension in custody is between security (keeping keys offline, requiring multiple humans, minimizing network exposure) and operational liquidity (being able to sign transactions quickly to service withdrawals, trading operations, and inter-exchange transfers).
A fully air-gapped cold storage system is maximally secure and operationally useless for a trading desk - every transaction requires a physical key ceremony taking 30-60 minutes and multiple staff. A fully online hot wallet signs within seconds but exposes the entire fund to any software vulnerability or network compromise.
The tier solution: keep the minimum necessary for day-to-day operations online (hot), a buffer for large transactions online but with human approval (warm), and the majority of funds in air-gapped storage (cold). You optimize each tier for its function, not for a single compromise between security and availability.
Tier Definitions and Ratios
The ratios below are calibrated for a trading desk with typical automated withdrawal and operational flow patterns:
Tier   Characteristics                          % of Total AUM   Target Balance
──────────────────────────────────────────────────────────────────────────────────────
Hot    Online, automated, no human approval     2-5%             $20M-$50M
Warm   Online, human approval >threshold        10-20%           $100M-$200M
Cold   Air-gapped, ceremony required            75-85%           $750M-$850M
The hot wallet balance is sized to cover:
- 24 hours of expected automated withdrawals (highest daily withdrawal over the past 30 days)
- Plus a 3x safety buffer for unexpected spikes
- Minus a minimum balance floor (hot wallet triggers a refill when it drops below this)
For a desk with a 90th percentile daily automated withdrawal volume of $8M, the buffered figure is $8M × 3 = $24M (rounded up to roughly $30M, with the floor at $10M).
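The sizing rule above can be sketched as a small calculation. This is a minimal illustration, not the desk's actual tooling: the function names, the nearest-rank percentile method, and the sample history are all assumptions made for the example.

```python
import math
from decimal import Decimal


def percentile(values: list[Decimal], pct: int) -> Decimal:
    """Nearest-rank percentile (an assumed method; other definitions exist)."""
    if not values:
        raise ValueError("need at least one observation")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def hot_wallet_target(daily_withdrawals_usd: list[Decimal],
                      pct: int = 90,
                      buffer_multiple: int = 3) -> Decimal:
    """Percentile of daily withdrawal volume times the safety buffer."""
    return percentile(daily_withdrawals_usd, pct) * buffer_multiple


# Hypothetical 10-day history of daily automated withdrawals, in USD
history = [Decimal(m) * 1_000_000 for m in (3, 4, 4, 5, 5, 6, 6, 7, 8, 12)]
target = hot_wallet_target(history)
print(f"Hot wallet target before rounding: ${target:,.0f}")
```

Using a percentile instead of the single highest day keeps one outlier (the $12M day in the sample) from inflating the hot balance; whether that trade-off is acceptable depends on how painful a missed withdrawal is for the desk.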
Hot Wallet Architecture
The hot wallet signs transactions automatically, in software, without human intervention. It is the operationally live portion of the fund.
Key management: MPC-CMP (e.g., 3-of-5 shares distributed across internet-connected machines) or a dedicated signing service behind a policy engine. The threshold means no single machine compromise exposes the key. Policy engine enforces:
- Maximum transfer per hour: a hard USD cap (`hourly_limit_usd` in the policy code below)
- Destinations: whitelist only (warm wallet address, exchange cold addresses, pre-approved counterparties)
- Operation types: ETH/ERC-20 transfer, exchange deposits only - no contract calls except to the pre-approved bridge contract
- Velocity: maximum 100 transactions per hour
# hot_wallet_policy.py - policy engine for automated hot wallet transactions
from dataclasses import dataclass
from decimal import Decimal
import time


@dataclass
class TransactionProposal:
    to_address: str
    value_usd: Decimal
    token: str
    calldata: bytes  # empty for pure transfers


class HotWalletPolicy:
    def __init__(self, whitelist: set[str], hourly_limit_usd: Decimal,
                 allowed_contracts: frozenset[str] = frozenset()):
        self.whitelist = whitelist
        self.hourly_limit_usd = hourly_limit_usd
        # Contracts that may receive calldata (e.g. the pre-approved bridge)
        self.allowed_contracts = allowed_contracts
        self._hourly_spend: list[tuple[float, Decimal]] = []  # (timestamp, amount)

    def evaluate(self, tx: TransactionProposal) -> tuple[bool, str]:
        # Rule 1: destination must be whitelisted
        if tx.to_address not in self.whitelist:
            return False, f"Destination {tx.to_address} not in whitelist"

        # Rule 2: calldata is only allowed for whitelisted contracts
        if tx.calldata and tx.to_address not in self.allowed_contracts:
            return False, "Calldata present on non-contract-whitelist address"

        # Rule 3: hourly limit check over a sliding one-hour window
        now = time.time()
        cutoff = now - 3600
        # Drop expired entries so the list does not grow without bound
        self._hourly_spend = [(ts, amt) for ts, amt in self._hourly_spend if ts > cutoff]
        recent_spend = sum((amt for _, amt in self._hourly_spend), Decimal(0))
        if recent_spend + tx.value_usd > self.hourly_limit_usd:
            return False, f"Hourly limit ${self.hourly_limit_usd:,.0f} exceeded"

        # Rule 4: minimum balance check (would hot wallet go below floor?)
        # Implemented in the calling layer with current balance

        # Note: check-and-record is not atomic across concurrent callers (see Failure 6)
        self._hourly_spend.append((now, tx.value_usd))
        return True, "approved"
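The velocity rule in the policy list (maximum 100 transactions per hour) is not part of the `evaluate` method above. One way it could be added is a sliding-window counter. The sketch below is an assumption about how such a check might look; `VelocityLimiter` and its parameters are hypothetical names, not part of the policy engine described here.

```python
import time
from collections import deque
from typing import Optional


class VelocityLimiter:
    """Sliding-window cap on the number of approved transactions."""

    def __init__(self, max_tx: int = 100, window_seconds: float = 3600.0):
        self.max_tx = max_tx
        self.window_seconds = window_seconds
        self._stamps: deque = deque()  # approval timestamps, oldest first

    def allow(self, now: Optional[float] = None) -> bool:
        """Record and allow the transaction if the window has capacity."""
        now = time.time() if now is None else now
        cutoff = now - self.window_seconds
        # Discard approvals that have aged out of the window
        while self._stamps and self._stamps[0] <= cutoff:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_tx:
            return False
        self._stamps.append(now)
        return True
```

The `now` parameter exists only so the limiter can be exercised with fixed timestamps in tests; in production the default wall-clock time would be used.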
Balance monitoring and auto-refill:
# hot_wallet_monitor.py
from decimal import Decimal

HOT_FLOOR = Decimal("10_000_000")   # $10M - trigger refill
HOT_TARGET = Decimal("25_000_000")  # $25M - refill target

# get_hot_wallet_balance, request_warm_wallet_transfer and HOT_WALLET_ADDRESS
# are assumed to be defined elsewhere in the custody service.


async def check_and_refill_hot_wallet():
    balance = await get_hot_wallet_balance()
    if balance < HOT_FLOOR:
        refill_amount = HOT_TARGET - balance
        # Request warm wallet approval for the refill.
        # Human in the loop required for warm → hot sweep.
        await request_warm_wallet_transfer(
            to=HOT_WALLET_ADDRESS,
            amount=refill_amount,
            reason=f"Hot wallet refill: balance ${balance:,.0f} below floor",
        )
Warm Wallet Architecture
The warm wallet is online but requires human approval for transactions above a threshold. It provides liquidity for large client withdrawals and funding sweeps without the operational overhead of a full key ceremony.
Key management: On-chain multisig (Safe, 3-of-5) for governance visibility + MPC for the signing infrastructure. The on-chain multisig provides auditability; the MPC reduces the single-signer risk.
Threshold design:
Transaction Value     Approval Required
────────────────────────────────────────────────────────────────
< $50K                Automated (policy engine + hot wallet path)
$50K - $1M            1 authorized approver (any of 5 named approvers)
$1M - $10M            2 authorized approvers (any 2 of 5)
> $10M                3 authorized approvers + compliance sign-off
Cold wallet sweep     Full 3-of-5 with ceremony procedure
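The value bands in the table can be expressed as a lookup function. This is a sketch under one stated assumption: the table does not say which band a transaction of exactly $1M or $10M falls into, so the code treats upper bounds as inclusive (consistent with the "> $10M" row). The names `ApprovalRequirement` and `required_approval` are hypothetical.

```python
from decimal import Decimal
from typing import NamedTuple


class ApprovalRequirement(NamedTuple):
    approvers: int            # number of human approvers needed
    compliance_signoff: bool  # whether compliance must also sign off


def required_approval(value_usd: Decimal) -> ApprovalRequirement:
    """Map a warm wallet transaction value to its approval requirement."""
    if value_usd < Decimal("50000"):
        return ApprovalRequirement(0, False)   # automated path
    if value_usd <= Decimal("1000000"):        # assumption: $1M is inclusive
        return ApprovalRequirement(1, False)
    if value_usd <= Decimal("10000000"):       # assumption: $10M is inclusive
        return ApprovalRequirement(2, False)
    return ApprovalRequirement(3, True)


for amount in ("8000", "250000", "5000000", "25000000"):
    print(amount, required_approval(Decimal(amount)))
```

Cold wallet sweeps are deliberately left out of the function: they follow the ceremony procedure, not a value band.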
Approval latency targets:
Hot wallet refill from warm is a routine operation that should complete in <5 minutes. We built a Slack-integrated approval flow: the policy engine posts a message with a transaction summary; an approver clicks “Approve” in Slack (which calls the approval API); once the threshold is met, the warm wallet signs automatically.
For the threshold math: with a typical withdrawal of around $8,000 and a 95th percentile near $50K, a $50K floor means only about 5% of withdrawals require human approval - operationally sustainable.
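Whether a given threshold is sustainable can be checked against actual withdrawal history. The sketch below computes the share of withdrawals that would need a human approver; the function name and the 20-item sample are made up for illustration.

```python
from decimal import Decimal


def approval_share(withdrawals_usd: list[Decimal], threshold_usd: Decimal) -> Decimal:
    """Fraction of withdrawals at or above the human-approval threshold."""
    if not withdrawals_usd:
        raise ValueError("need at least one withdrawal")
    needing_approval = sum(1 for w in withdrawals_usd if w >= threshold_usd)
    return Decimal(needing_approval) / Decimal(len(withdrawals_usd))


# Hypothetical sample: 19 small withdrawals and one large one
sample = [Decimal(8000)] * 19 + [Decimal(75000)]
share = approval_share(sample, Decimal(50000))
print(f"{share:.0%} of withdrawals need a human approver")
```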
Cold Storage Architecture
The cold wallet holds the majority of funds and is never connected to the internet during normal operations. Accessing it requires a physical key ceremony.
Key management: FIPS 140-2 Level 3 HSM (Thales nShield or equivalent) as the signing device. The HSM private key is backed up as Shamir secret shares split across:
- 4 geographic locations (e.g., London, Singapore, Dubai, New York)
- Each share requires 2 of the 3 named custodians for that location to be present
- 3 of 4 location shares required to reconstruct
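This is 2-of-3 at each location and 3-of-4 across locations: a minimum of 6 of the 12 custodians, with geographic distribution. It is not a flat 6-of-12 scheme, because the six must form 2-of-3 quorums at three different locations.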
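The nested quorum is easy to get wrong when reasoning informally, so a small check helps. The sketch below only models who must be present; it does not perform any secret sharing. The roster and custodian IDs are placeholders invented for the example.

```python
def locations_with_quorum(present: set[str],
                          roster: dict[str, set[str]],
                          per_location: int = 2) -> list[str]:
    """Locations where enough named custodians are present to release the share."""
    return sorted(
        loc for loc, members in roster.items()
        if len(present & members) >= per_location
    )


def can_reconstruct(present: set[str],
                    roster: dict[str, set[str]],
                    per_location: int = 2,
                    locations_needed: int = 3) -> bool:
    """True if the present custodians satisfy 2-of-3 per location, 3-of-4 overall."""
    return len(locations_with_quorum(present, roster, per_location)) >= locations_needed


# Hypothetical roster: 4 locations x 3 named custodians
ROSTER = {
    "London": {"LON-1", "LON-2", "LON-3"},
    "Singapore": {"SIN-1", "SIN-2", "SIN-3"},
    "Dubai": {"DXB-1", "DXB-2", "DXB-3"},
    "New York": {"NYC-1", "NYC-2", "NYC-3"},
}

# Six custodians forming quorums at three locations: sufficient
print(can_reconstruct({"LON-1", "LON-2", "SIN-1", "SIN-2", "DXB-1", "DXB-2"}, ROSTER))
# Six custodians spread 2-2-1-1: only two location quorums, insufficient
print(can_reconstruct({"LON-1", "LON-2", "SIN-1", "SIN-2", "DXB-1", "NYC-1"}, ROSTER))
```

The second call shows why the headcount alone is not the right mental model: six custodians can be present and still fail to reconstruct.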
Accessing cold funds:
The procedure for a team that has drilled this quarterly:
Cold Wallet Access Procedure:
1. Initiate: CTO + CFO sign a transfer authorization (paper document, witnessed)
2. Assemble: 3 custodians from 3 different geographic locations fly to a pre-designated
   meeting location (we used a tier-IV data center with an air-gapped room)
3. Verify: all devices are air-gapped; no phones; camera-free room
4. Reconstruct: custodians insert hardware tokens, reconstruct key shares in the HSM
5. Sign: transaction constructed on air-gapped machine, verified by all 3 custodians,
   signed by HSM
6. Broadcast: signed transaction moved to an internet-connected machine via USB
   (read-only, verified file transfer), broadcast to network
7. Verify: all 3 custodians verify transaction receipt on a block explorer
8. Secure: key shares re-encrypted and returned to geographic locations
Total time: 4-8 hours depending on travel distances
This is not fast. It is not supposed to be fast. Cold wallet access should be sufficiently operationally painful that it only happens when genuinely necessary. For routine operations, you never touch cold storage.
Funding Sweep Schedule
The sweep schedule determines how funds flow between tiers:
Event                          Action                           Frequency
──────────────────────────────────────────────────────────────────────────────────
Hot wallet < $10M floor        Auto-request warm top-up         Continuous monitoring
Hot wallet > $40M ceiling      Auto-request warm withdrawal     Continuous monitoring
Warm wallet < $50M floor       Alert to ops team                Daily reconciliation
Warm wallet > $250M ceiling    Schedule cold wallet transfer    Weekly review
Cold wallet reconciliation     Verify addresses + balances      Monthly audit
The hot-warm sweep ceiling is important and often overlooked: if the hot wallet accumulates more than the target (e.g., because inflows exceeded outflows), excess funds should be swept to warm storage automatically. An oversized hot wallet increases single-incident loss if the hot wallet is compromised.
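The floor and ceiling rules for the hot wallet can be combined into one decision function. This is a sketch, assuming the $10M floor and $40M ceiling from the table and the $25M refill target from the monitor code; sweeping back down to the same $25M target when the ceiling is breached is an assumption, as the text only says the excess should go to warm storage.

```python
from decimal import Decimal
from enum import Enum


class SweepAction(Enum):
    NONE = "none"
    REFILL_FROM_WARM = "refill_from_warm"
    SWEEP_TO_WARM = "sweep_to_warm"


FLOOR = Decimal("10000000")    # $10M - below this, request a top-up
TARGET = Decimal("25000000")   # $25M - balance to return to (assumed for both directions)
CEILING = Decimal("40000000")  # $40M - above this, sweep the excess out


def hot_wallet_action(balance_usd: Decimal) -> tuple[SweepAction, Decimal]:
    """Return the action to request and the USD amount to move."""
    if balance_usd < FLOOR:
        return SweepAction.REFILL_FROM_WARM, TARGET - balance_usd
    if balance_usd > CEILING:
        return SweepAction.SWEEP_TO_WARM, balance_usd - TARGET
    return SweepAction.NONE, Decimal(0)


for balance in ("8000000", "25000000", "46000000"):
    action, amount = hot_wallet_action(Decimal(balance))
    print(f"balance ${Decimal(balance):,.0f}: {action.value}, move ${amount:,.0f}")
```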
Insurance Implications
Lloyd’s of London institutional crypto coverage (the leading underwriter for crypto custody risk) prices premiums based on:
- Percentage in cold storage: premium multiplier drops significantly above 75% cold
- HSM certification: FIPS 140-2 Level 3 is the minimum; Level 4 (if available) gets a discount
- Geographic distribution: key shares in 3+ jurisdictions reduce premium
- Incident response plan: documented + tested (quarterly drill) gets a discount
- Penetration testing: annual third-party pentest of all online systems
For reference, typical premiums in 2024 for a $1B fund:
- 0% cold storage: 1.5-2.5% of insured value per year
- 75%+ cold storage with HSM: 0.3-0.6% of insured value per year
The premium difference between 0% and 75% cold storage on a $1B fund is roughly $10-20M per year. Building the cold storage architecture is not just a security decision - it is a $10-20M per year financial decision.
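The arithmetic behind that figure can be reproduced from the quoted rate ranges. The sketch assumes the full $1B is the insured value, which may not hold for a real policy; the variable names are illustrative.

```python
from decimal import Decimal

INSURED_USD = Decimal("1000000000")  # assumption: the whole $1B is insured

# Annual premium rate ranges quoted above (low, high)
RATE_NO_COLD = (Decimal("0.015"), Decimal("0.025"))
RATE_75_COLD = (Decimal("0.003"), Decimal("0.006"))


def premium_range(insured_usd: Decimal, rates: tuple[Decimal, Decimal]) -> tuple[Decimal, Decimal]:
    """Annual premium in USD at the low and high ends of a rate range."""
    low, high = rates
    return insured_usd * low, insured_usd * high


no_cold = premium_range(INSURED_USD, RATE_NO_COLD)
with_cold = premium_range(INSURED_USD, RATE_75_COLD)

# Smallest saving: cheapest no-cold premium vs. priciest cold premium, and vice versa
saving_low = no_cold[0] - with_cold[1]
saving_high = no_cold[1] - with_cold[0]

print(f"0% cold:   ${no_cold[0]:,.0f} - ${no_cold[1]:,.0f} per year")
print(f"75%+ cold: ${with_cold[0]:,.0f} - ${with_cold[1]:,.0f} per year")
print(f"Saving:    ${saving_low:,.0f} - ${saving_high:,.0f} per year")
```

Under these assumptions the saving lands between $9M and $22M per year, which is where the $10-20M figure comes from.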
How This Breaks in Production
Failure 1: Hot wallet floor set too low for the warm wallet approval latency. Hot wallet floor is $20M. Approval for warm-to-hot refill requires 2 humans. Both humans are in different time zones; one is asleep. The hot wallet drops below the floor, the refill request sits unapproved, and the balance keeps draining until automated withdrawals start failing. Fix: size the hot wallet floor on the assumption that a refill takes 4 hours (off-hours approval time). The floor must be high enough that you never run out during that window. Additionally, implement an emergency single-approver override for refills during off-hours, with stricter destination restrictions.
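The sizing rule in the fix reduces to one multiplication: the floor has to cover the worst outflow rate for the whole refill window. A minimal sketch, where the peak hourly outflow figure is hypothetical and the 4-hour window comes from the fix above:

```python
from decimal import Decimal


def minimum_floor(peak_hourly_outflow_usd: Decimal, refill_hours: int = 4) -> Decimal:
    """Smallest floor that still covers outflows while a refill is pending."""
    return peak_hourly_outflow_usd * refill_hours


# Hypothetical peak: $2.5M of automated withdrawals in the busiest hour
floor = minimum_floor(Decimal("2500000"))
print(f"Minimum floor: ${floor:,.0f}")
```

The useful input is the peak hourly rate, not the daily average: a floor sized on average flow runs dry exactly when withdrawals spike.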
Failure 2: Warm wallet approval phishing. An attacker who has compromised your internal communication channels (Slack, email) posts a fraudulent approval request that appears to come from the policy engine. A legitimate approver approves it, thinking it is a routine refill. The funds go to the attacker. Fix: all approval requests must carry a cryptographic signature from the policy engine’s signing key. The approval UI must verify this signature before presenting the request to the approver. Out-of-band verification (call the requester on a known number) for any request above $1M.
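The verify-before-display step can be sketched with the standard library. Note the substitution: the fix calls for a cryptographic signature from the policy engine's signing key, while this example uses HMAC-SHA256, a shared-key message authentication code, purely because it needs no third-party package. A production system would more likely use an asymmetric signature so that the approval UI holds only a public key. The request fields and function names are hypothetical.

```python
import hashlib
import hmac
import json


def sign_request(request: dict, key: bytes) -> str:
    """Policy engine side: authenticate a canonical encoding of the request."""
    payload = json.dumps(request, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_request(request: dict, tag: str, key: bytes) -> bool:
    """Approval UI side: refuse to display anything that fails verification."""
    expected = sign_request(request, key)
    return hmac.compare_digest(expected, tag)  # constant-time comparison


key = b"demo-key-do-not-use"  # placeholder; real keys live in an HSM or KMS
request = {"to": "0xHOT", "amount_usd": "15000000", "reason": "Hot wallet refill"}
tag = sign_request(request, key)

print(verify_request(request, tag, key))                      # genuine request
print(verify_request({**request, "to": "0xEVIL"}, tag, key))  # tampered destination
```

Serializing with sorted keys and fixed separators matters: both sides must produce byte-identical payloads or genuine requests will fail verification.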
Failure 3: Cold storage backup keys stored insecurely. The Shamir shares are physically printed on laminated cards and stored in bank safety deposit boxes. One custodian dies unexpectedly. Their heir opens the safety deposit box, finds the laminated card, and inadvertently shows it to their estate lawyer. Fix: physical key material must be sealed in tamper-evident envelopes and stored in vaults that require dual control (2 named individuals) to access, not standard safety deposit boxes. Additionally, Shamir shares should be encrypted under an additional password known only to the custodian - so possession of the physical card alone is not sufficient.
Failure 4: Reconciliation fails to catch a long-running theft. The warm wallet reconciliation runs daily. An attacker gains access to the warm wallet signing infrastructure and begins draining funds in amounts small enough to stay under the absolute alert threshold. By the time anyone notices, $15M is gone. Fix: set reconciliation thresholds as a percentage, not an absolute amount. Any outflow >0.1% of tier balance without a corresponding approval record should alert immediately. Run reconciliation hourly for the hot wallet, not daily.
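A percentage-based check might look like the sketch below. One design choice is an assumption beyond the text: the code sums all unapproved outflows in the reconciliation window and compares the total to 0.1% of the tier balance, so that many small transfers cannot each slip under a per-transaction limit. The `Outflow` type and function name are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Outflow:
    tx_id: str
    amount_usd: Decimal


def unapproved_outflow_breach(outflows: list[Outflow],
                              approved_tx_ids: set[str],
                              tier_balance_usd: Decimal,
                              threshold_pct: Decimal = Decimal("0.1")) -> bool:
    """True if outflows lacking an approval record exceed threshold_pct of the tier."""
    unapproved = sum(
        (o.amount_usd for o in outflows if o.tx_id not in approved_tx_ids),
        Decimal(0),
    )
    limit = tier_balance_usd * threshold_pct / Decimal(100)
    return unapproved > limit


# Hypothetical window: three $60K transfers out of a $150M warm wallet
window = [Outflow(f"tx-{i}", Decimal("60000")) for i in range(3)]
print(unapproved_outflow_breach(window, set(), Decimal("150000000")))
print(unapproved_outflow_breach(window, {"tx-0", "tx-1"}, Decimal("150000000")))
```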
Failure 5: Geographic distribution compromised by legal jurisdiction overlap. Three of your four cold storage locations are in the US (say New York, Chicago, and a third US city), with only one abroad. A US court order can compel all three simultaneously. With 3 of the 4 shares in the US, the government can compel reconstruction without the fourth, international location. Fix: ensure no single legal jurisdiction holds enough shares to meet the reconstruction threshold. For a k-of-N scheme, that means at most k-1 shares in any one jurisdiction; for a 3-of-4 scheme, at most 2.
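The rule in the fix (for a 3-of-4 scheme, at most 2 shares per jurisdiction) generalizes to: no jurisdiction may hold as many shares as the reconstruction threshold. A short check, with hypothetical jurisdiction codes:

```python
from collections import Counter


def jurisdictions_over_limit(share_jurisdictions: list[str], threshold: int) -> dict[str, int]:
    """Jurisdictions holding enough shares to reconstruct the key on their own."""
    counts = Counter(share_jurisdictions)
    return {j: n for j, n in counts.items() if n >= threshold}


# 3-of-4 scheme with three shares in one jurisdiction: flagged
print(jurisdictions_over_limit(["US", "US", "US", "UK"], threshold=3))
# 3-of-4 scheme with at most two shares per jurisdiction: clean
print(jurisdictions_over_limit(["US", "US", "UK", "SG"], threshold=3))
```

An empty result means no single court can compel reconstruction alone; it says nothing about jurisdictions that cooperate with each other.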
Failure 6: Hot wallet policy engine has a time-of-check bypass. The policy engine checks the hourly spend limit, approves a transaction, then waits 50ms before the MPC signing nodes produce the signature. A second transaction arrives in that 50ms window, also passes the hourly limit check (because the first transaction has not yet been recorded), and is also approved. Both transactions sign and broadcast. The hourly limit is exceeded by 2x. Fix: implement an atomic reservation system: the policy engine must atomically reserve the spend against the limit before returning approval. Use a distributed lock or optimistic concurrency control with retry.
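The atomic reservation in the fix can be illustrated within a single process using a lock: the limit check and the ledger write happen inside one critical section, so a second request cannot read the ledger in between. This is a sketch only. `HourlySpendLedger` is a hypothetical name, and a deployment with several policy engine instances would need the distributed lock or database transaction the fix mentions instead of `threading.Lock`.

```python
import threading
import time
from decimal import Decimal
from typing import Optional


class HourlySpendLedger:
    """Check-and-reserve against the hourly limit as one atomic step."""

    def __init__(self, hourly_limit_usd: Decimal):
        self.hourly_limit_usd = hourly_limit_usd
        self._lock = threading.Lock()
        self._entries: list[tuple[float, Decimal]] = []  # (timestamp, amount)

    def try_reserve(self, amount_usd: Decimal, now: Optional[float] = None) -> bool:
        """Reserve the spend if it fits in the window; otherwise reject."""
        now = time.time() if now is None else now
        with self._lock:  # check and record cannot be interleaved
            cutoff = now - 3600
            self._entries = [(ts, amt) for ts, amt in self._entries if ts > cutoff]
            spent = sum((amt for _, amt in self._entries), Decimal(0))
            if spent + amount_usd > self.hourly_limit_usd:
                return False
            self._entries.append((now, amount_usd))
            return True


ledger = HourlySpendLedger(Decimal("1000000"))
results: list[bool] = []


def worker() -> None:
    results.append(ledger.try_reserve(Decimal("600000")))


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"{results.count(True)} of {len(results)} concurrent requests reserved")
```

The reservation is taken before signing, so a signing failure would leave spend counted that never left the wallet; a fuller version would release the reservation in that case.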
Related reading: MPC vs HSM vs Multisig covers the key management technology for each tier. Key Ceremonies and Quorum Approvals covers the cold storage ceremony procedure in detail. The Bybit $1.5B Hack Postmortem illustrates what happens when warm wallet controls fail.