The OTC Desk as Infrastructure: How Wintermute and Cumberland Move $15B/Day
Crypto OTC at institutional scale: RFQ flows, quote engine mechanics, hedge execution timing, settlement windows, and how OTC fills wire into your exchange position and risk system.
When I was studying institutional crypto trading infrastructure to inform ZeroCopy’s design, Wintermute’s architecture was the most instructive reference. They run what is arguably the most sophisticated OTC operation in crypto - executing over $5B per day across hundreds of assets, with a technology stack that would be respectable at a top-tier traditional finance firm. Studying how they operate fundamentally shaped how I think about the relationship between OTC and exchange trading.
This post is about OTC desks as infrastructure - not the trading strategy, but the systems that must exist behind any serious institutional OTC operation.
What OTC Actually Means
OTC (over-the-counter) in crypto means direct bilateral trading between two parties, bypassing the exchange order book. There is no central limit order book, no public price discovery at the moment of the trade, and no anonymity - both sides know exactly who they’re trading with.
Why does OTC exist at scale? Because exchange order books can’t handle large trades without significant market impact. If a hedge fund wants to buy $50M of BTC on Binance spot, market-ordering that quantity would sweep multiple levels of the order book and push the price up by several percentage points. The fund ends up paying far above the pre-trade market price - what traders call slippage or market impact.
An OTC desk solves this by providing a single bilateral quote for the entire $50M at once. The OTC desk (Wintermute, Cumberland, Galaxy, etc.) commits to a price for the full size - and then hedges the residual risk on the exchange order books after the trade is done, using their own execution infrastructure to minimize impact.
The OTC client gets certainty of execution at a known price. The OTC desk earns the spread and takes the risk of hedging the resulting position.
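To make the market-impact argument concrete, here is a minimal sketch of what sweeping an order book costs. The book levels are hypothetical round numbers chosen for easy arithmetic, not real market data, and `sweep_cost` is an illustrative helper name.

```python
from typing import List, Tuple


def sweep_cost(asks: List[Tuple[float, float]], qty: float) -> Tuple[float, float]:
    """Walk ask levels (price, size), best first, until `qty` is filled.

    Returns (average fill price, slippage versus the best ask as a fraction).
    """
    remaining = qty
    notional = 0.0
    for price, size in asks:
        take = min(remaining, size)
        notional += take * price
        remaining -= take
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("order book too thin for requested quantity")
    avg_price = notional / qty
    return avg_price, avg_price / asks[0][0] - 1.0


# Hypothetical three-level book: (price, size)
asks = [(100.0, 10.0), (101.0, 10.0), (102.0, 10.0)]
avg_price, slippage = sweep_cost(asks, 25.0)
print(f"avg fill {avg_price:.2f}, slippage {slippage:.2%}")
```

The buyer pays the average of every level consumed, not the best ask. An OTC quote replaces that uncertain average with one committed price.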
The RFQ Flow
The Request-For-Quote (RFQ) process is the core protocol of OTC trading:
1. Client sends RFQ: “I want to buy 500 BTC. What’s your price?”
2. Desk receives RFQ: The quote engine ingests this, computes a fair value based on:
- Current exchange mid price across reference venues (Binance spot, Coinbase, Kraken)
- Volatility model - how much can price move in the time the quote is valid?
- Inventory - does the desk already hold BTC that they need to work off?
- Spread model - basis points of spread to charge given size and market conditions
3. Desk returns quote: A two-sided price (bid and offer) with a validity window. For 500 BTC, the validity might be 10-30 seconds. The quote carries a bid and an offer for the full 500 BTC; in this example the offer is $81,400.
4. Client accepts or rejects: If the client accepts within the validity window, the trade is done at that price regardless of where the market has moved.
5. Desk executes hedge: Immediately after trade confirmation, the desk’s execution engine begins hedging. If they just sold 500 BTC at $81,400, they need to buy BTC back on exchanges to eliminate the directional exposure. They now have an inventory problem.
At Wintermute’s scale, this entire cycle - RFQ to confirmation to hedge initiation - happens in under a second for most trades. The validity window on large trades (>$100M) might extend to 30-60 seconds, but the expected execution is much faster.
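The accept-within-window rule from the steps above can be sketched as a small data structure. This is an illustration under stated assumptions, not any desk's actual API: the `Quote` fields, the `accept` function, and the $81,300 bid are all hypothetical (only the $81,400 offer comes from the example).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    rfq_id: str
    instrument: str
    quantity: float
    bid: float        # price at which the desk buys
    offer: float      # price at which the desk sells
    issued_at: float  # seconds on any consistent clock
    valid_for: float  # validity window in seconds

    def is_live(self, now: float) -> bool:
        return now - self.issued_at <= self.valid_for


def accept(quote: Quote, client_side: str, now: float) -> float:
    """Return the locked-in trade price, or raise if the quote has expired."""
    if not quote.is_live(now):
        raise TimeoutError("quote expired; client must re-request")
    if client_side == "BUY":
        return quote.offer  # client buys at the desk's offer
    if client_side == "SELL":
        return quote.bid    # client sells at the desk's bid
    raise ValueError(f"unknown side: {client_side}")


# Hypothetical quote: the bid is made up for illustration.
quote = Quote("rfq-1", "BTC/USDT", 500.0, 81_300.0, 81_400.0,
              issued_at=0.0, valid_for=20.0)
print(accept(quote, "BUY", now=5.0))
```

The price returned depends only on the quote, never on the current market, which is exactly the risk the desk is paid to carry during the validity window.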
The Quote Engine: Fair Value + Spread
The quote engine is the intellectual core of an OTC desk. It must solve several problems simultaneously:
Fair value estimation: What is BTC worth right now? At Wintermute’s scale, this is not just the Binance mid price. It’s a weighted composite of multiple venues with outlier rejection, updated on every tick from WebSocket feeds across exchanges. The composite mid is the baseline.
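A minimal sketch of a composite mid with outlier rejection, assuming a simple rule: drop any venue whose mid deviates from the cross-venue median by more than a threshold, then take a weighted average of the rest. The weights, the 0.5% threshold, and the prices are illustrative assumptions; a production engine would likely also account for feed staleness and venue liquidity.

```python
from statistics import median
from typing import Dict


def composite_mid(mids: Dict[str, float],
                  weights: Dict[str, float],
                  max_dev: float = 0.005) -> float:
    """Weighted average of venue mids after rejecting outliers.

    A venue is rejected when its mid differs from the median of all
    venue mids by more than `max_dev` (as a fraction).
    """
    ref = median(mids.values())
    kept = {v: m for v, m in mids.items() if abs(m / ref - 1.0) <= max_dev}
    total_weight = sum(weights[v] for v in kept)
    if total_weight <= 0:
        raise ValueError("no usable venues after outlier rejection")
    return sum(weights[v] * m for v, m in kept.items()) / total_weight


# Hypothetical snapshot: the third venue is clearly off-market.
mids = {"binance": 81_400.0, "coinbase": 81_410.0, "kraken": 83_000.0}
weights = {"binance": 0.5, "coinbase": 0.3, "kraken": 0.2}
fair_value = composite_mid(mids, weights)
print(f"{fair_value:.2f}")
```

Using the median as the reference keeps a single bad feed from dragging the rejection threshold along with it.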
Volatility adjustment: A quote is valid for a time window. During that window, the price can move. If BTC has been trading in tight ranges (realized volatility low), a 30-second quote carries less risk. During high-volatility periods (news events, macro releases), the same validity window is far more dangerous. The quote engine widens spreads during high volatility.
Inventory adjustment: If the desk is already net long 200 BTC, buying another 500 BTC from a client would leave it net long 700 BTC before hedging. The desk therefore makes its bid less competitive (it is not eager for more long exposure) and its offer more competitive (it would love the client to buy from it, which reduces the long inventory). This is inventory-adjusted quoting and it’s core to market-making profitability.
Size impact: Larger trades carry more hedging risk. A 10 BTC trade can usually be hedged almost instantly with negligible market impact. A 500 BTC trade will move markets regardless of how carefully it’s executed. The spread widens with size to compensate for expected hedging slippage.
The output is a pair of prices: bid (desk buys) and offer (desk sells). The spread is the desk’s expected profit if they can hedge perfectly. The actual profit depends on execution quality.
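Putting the four components together, a toy quote function might look like the sketch below. Everything here is an assumption for illustration: the parameter names, the basis-point values, the square-root-of-time volatility term, and the linear size and inventory terms. Real desks calibrate these models privately.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class QuoteParams:
    # All values are illustrative assumptions, not calibrated numbers.
    base_spread_bps: float = 2.0        # minimum half-spread
    vol_bps_per_sqrt_sec: float = 1.0   # volatility charge per sqrt(second) of validity
    size_bps_per_100_btc: float = 1.0   # extra half-spread per 100 BTC quoted
    skew_bps_per_100_btc: float = 0.5   # price shift per 100 BTC of inventory


def make_quote(mid: float, qty: float, inventory: float,
               validity_s: float, p: QuoteParams) -> Tuple[float, float]:
    """Return (bid, offer) around an inventory-skewed fair value."""
    half_spread_bps = (p.base_spread_bps
                       + p.vol_bps_per_sqrt_sec * math.sqrt(validity_s)
                       + p.size_bps_per_100_btc * qty / 100.0)
    # Long inventory shifts both prices down: the bid gets less
    # competitive and the offer more competitive.
    skew_bps = p.skew_bps_per_100_btc * inventory / 100.0
    center = mid * (1.0 - skew_bps / 1e4)
    bid = center * (1.0 - half_spread_bps / 1e4)
    offer = center * (1.0 + half_spread_bps / 1e4)
    return bid, offer


params = QuoteParams()
flat_bid, flat_offer = make_quote(81_400.0, 500.0, 0.0, 16.0, params)
long_bid, long_offer = make_quote(81_400.0, 500.0, 200.0, 16.0, params)
print(f"flat: {flat_bid:.2f} / {flat_offer:.2f}")
print(f"long 200 BTC: {long_bid:.2f} / {long_offer:.2f}")
```

With the desk long 200 BTC, both prices shift down relative to the flat case, which is the inventory skew described above.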
The Execution Engine: Hedging the Residual
After a trade, the desk has residual inventory. They need to flatten this position as quickly as possible while minimizing market impact. This is the execution engine’s job.
The challenge: you can’t just market-order 500 BTC. That would move the price against you, eliminating the spread you earned on the OTC trade. The execution engine uses algorithms to fragment and time the hedges.
Common approaches:
TWAP (Time-Weighted Average Price): Break the 500 BTC into 50 orders of 10 BTC each, executed over the next 10 minutes. Simple but predictable - sophisticated counterparties can front-run a known TWAP algorithm.
Implementation Shortfall: Target the benchmark price (the market price at the time of the OTC trade) and minimize deviation from it. Aggressive when spread widens, passive when it tightens. More sophisticated than TWAP.
Liquidity-seeking algorithms: Monitor order book depth and size across multiple exchanges simultaneously. Execute where and when liquidity is available, balancing urgency against impact. Wintermute runs on 50+ venues simultaneously.
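The TWAP example above (500 BTC, 50 slices, 10 minutes) reduces to a simple schedule. This sketch only produces the timetable; `twap_schedule` is an illustrative name, and order placement is deliberately left out.

```python
from typing import List, Tuple


def twap_schedule(total_qty: float, n_slices: int,
                  duration_s: float) -> List[Tuple[float, float]]:
    """Return evenly spaced (offset_seconds, slice_qty) child orders."""
    if n_slices <= 0:
        raise ValueError("n_slices must be positive")
    slice_qty = total_qty / n_slices
    interval = duration_s / n_slices
    return [(i * interval, slice_qty) for i in range(n_slices)]


schedule = twap_schedule(500.0, 50, 600.0)
print(schedule[:3])
```

The fixed 12-second cadence and 10 BTC clip size are what make a naive TWAP easy to detect, which is one reason the more adaptive approaches exist.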
The unhedged time window is the critical risk metric. From the moment the OTC trade confirms to the moment the hedge is complete, the desk has directional market risk. If BTC moves 0.3% against them during a 3-minute hedge window, that 0.3% comes directly out of the spread they earned. On the 500 BTC trade at $81,400 (about $40.7M notional), that is roughly $120,000 of PnL variance on a single trade.
Settlement: The Operational Layer
The trade is done. The hedge is executing. Now the desk has to actually settle the OTC trade - transfer the assets.
For crypto-to-crypto trades (BTC/USDT), settlement is typically T+0 - same day, often within hours, via on-chain transfer or exchange transfer (if both parties use the same exchange’s internal transfers, which clear instantly with no on-chain fees).
For crypto-to-fiat trades (BTC/USD wire), settlement is T+1 - the wire transfer settles the next business day. During this window, the desk has settlement risk: the client might not pay (counterparty risk) or the wire might fail.
Large OTC desks mitigate settlement risk by:
- Pre-funded accounts: Major counterparties maintain pre-funded accounts with the desk; trades settle instantly against this balance
- Credit relationships: Top-tier institutional counterparties have credit lines; they can trade against future settlement
- Escrow: For large transactions without established credit, funds are escrowed in a trusted third party before trade confirmation
The settlement engine in an OTC desk’s infrastructure must:
- Track all open trades and their settlement status
- Monitor incoming asset transfers (on-chain confirmations, exchange internal transfers)
- Release outbound transfers only when inbound confirmation is received (delivery vs payment)
- Handle failed settlements - reversals, dispute workflows, and credit recovery
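The four responsibilities above can be sketched as a small state machine. This is a simplified illustration with hypothetical class and method names: it tracks open trades, accumulates inbound transfers, releases the outbound leg only once the inbound leg is fully received, and marks a trade failed if its deadline passes first. Dispute workflows and credit recovery are out of scope here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class SettlementStatus(Enum):
    PENDING = "PENDING"
    SETTLED = "SETTLED"
    FAILED = "FAILED"


@dataclass
class OpenTrade:
    trade_id: str
    inbound_qty: float     # what the client owes the desk
    outbound_asset: str
    outbound_qty: float    # what the desk owes the client
    inbound_received: float = 0.0
    status: SettlementStatus = SettlementStatus.PENDING


class SettlementEngine:
    def __init__(self) -> None:
        self.trades: Dict[str, OpenTrade] = {}
        self.released: List[Tuple[str, str, float]] = []  # (trade_id, asset, qty)

    def register(self, trade: OpenTrade) -> None:
        self.trades[trade.trade_id] = trade

    def on_inbound(self, trade_id: str, qty: float) -> None:
        """Record a confirmed inbound transfer; release outbound when fully paid."""
        trade = self.trades[trade_id]
        if trade.status is not SettlementStatus.PENDING:
            return
        trade.inbound_received += qty
        if trade.inbound_received >= trade.inbound_qty:
            self.released.append((trade_id, trade.outbound_asset, trade.outbound_qty))
            trade.status = SettlementStatus.SETTLED

    def on_deadline(self, trade_id: str) -> None:
        """Mark a still-pending trade as failed; nothing is released."""
        trade = self.trades[trade_id]
        if trade.status is SettlementStatus.PENDING:
            trade.status = SettlementStatus.FAILED


engine = SettlementEngine()
engine.register(OpenTrade("t1", inbound_qty=40_700_000.0,
                          outbound_asset="BTC", outbound_qty=500.0))
engine.on_inbound("t1", 20_000_000.0)   # partial payment: nothing released yet
engine.on_inbound("t1", 20_700_000.0)   # fully paid: BTC leg is released
print(engine.released)
```

The key design choice is that `released` is only ever appended to from inside `on_inbound`, so there is no code path that sends assets out before payment is confirmed.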
How OTC Connects to Your Exchange Position System
For a firm running both OTC and exchange trading (as most sophisticated firms do), the OTC trades must flow into the same position aggregation system as exchange fills.
The position vector for a BTC market-maker at end of day:
- Exchange positions: long 50 BTC net across Binance, Bybit, OKX
- OTC fills: short 75 BTC net from client facilitation trades today
- Net position: short 25 BTC
If your OTC fill system and exchange fill system are separate, you can’t see this net. The trader might look at exchange positions and think they’re long 50 BTC, unaware that OTC fills have them net short overall.
The integration architecture:
OTC Trade Confirmation
│
▼
OTC Fill Service
(normalize to internal FillEvent format)
│
▼
Position Aggregation Service
(same service that consumes exchange fills)
│
▼
Net Position Per Instrument
(used by risk limits, hedge engine, reporting)
The fill normalization is critical. An OTC fill from a client chat platform and an exchange fill from a WebSocket stream arrive in completely different formats. Your position aggregation layer needs a canonical FillEvent schema:
from __future__ import annotations  # lets the `str | None` hints work before Python 3.10

from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class FillSource(Enum):
    BINANCE = "binance"
    OKX = "okx"
    BYBIT = "bybit"
    OTC_WINTERMUTE = "otc_wintermute"
    OTC_CUMBERLAND = "otc_cumberland"
    OTC_INTERNAL = "otc_internal"


@dataclass
class FillEvent:
    """
    Canonical fill representation for position aggregation.
    All fill sources (exchange + OTC) must normalize to this format.
    """
    fill_id: str          # Unique ID (exchange order ID or OTC trade ID)
    source: FillSource    # Where this fill came from
    instrument: str       # "BTC/USDT" normalized instrument name
    side: str             # "BUY" or "SELL"
    quantity: float       # In base currency
    price: float          # Execution price
    fee_usd: float        # Fee paid (negative = rebate received)
    timestamp: datetime   # UTC fill time
    # OTC-specific fields (None for exchange fills)
    otc_counterparty: str | None = None
    settlement_status: str | None = None  # "PENDING", "SETTLED", "FAILED"
    settlement_deadline: datetime | None = None


class PositionManager:
    """
    Aggregates fills from all sources into net positions.
    This is the single source of truth for position state.
    """

    def __init__(self) -> None:
        self._positions: dict[str, float] = {}  # instrument -> net qty
        self._fill_log: list[FillEvent] = []

    def process_fill(self, fill: FillEvent) -> None:
        # Buys add to the net position, sells subtract from it.
        sign = 1.0 if fill.side == "BUY" else -1.0
        delta = sign * fill.quantity
        self._positions[fill.instrument] = (
            self._positions.get(fill.instrument, 0.0) + delta
        )
        self._fill_log.append(fill)
        # Log every OTC fill with the resulting net position.
        # A real system would alert only above a size threshold.
        if fill.source.value.startswith("otc_"):
            net = self._positions[fill.instrument]
            print(f"OTC fill processed: {fill.instrument} net position = {net:.4f}")

    def net_position(self, instrument: str) -> float:
        return self._positions.get(instrument, 0.0)

    def all_positions(self) -> dict[str, float]:
        return dict(self._positions)
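To show what normalization involves in practice, here is a hedged sketch of two adapters that map raw messages onto the canonical field names. The payload shapes, key names, and symbol map below are entirely hypothetical (real exchange and OTC messages differ by venue), and plain dicts stand in for the canonical event so the example runs on its own.

```python
from datetime import datetime, timezone
from typing import Any, Dict

# Hypothetical venue-symbol to canonical-instrument map.
INSTRUMENTS = {"BTCUSDT": "BTC/USDT", "BTC-USDT": "BTC/USDT"}


def from_exchange(msg: Dict[str, Any]) -> Dict[str, Any]:
    """Normalize a (hypothetical) exchange fill message."""
    return {
        "fill_id": str(msg["order_id"]),
        "source": msg["venue"],
        "instrument": INSTRUMENTS[msg["symbol"]],
        "side": msg["side"].upper(),
        "quantity": float(msg["qty"]),    # venues often send numbers as strings
        "price": float(msg["px"]),
        "timestamp": datetime.fromtimestamp(msg["ts_ms"] / 1000, tz=timezone.utc),
    }


def from_otc(conf: Dict[str, Any]) -> Dict[str, Any]:
    """Normalize a (hypothetical) OTC trade confirmation.

    Assumes `direction` is already stated from our own perspective.
    """
    return {
        "fill_id": conf["trade_ref"],
        "source": "otc_" + conf["desk"],
        "instrument": INSTRUMENTS[conf["pair"]],
        "side": conf["direction"].upper(),
        "quantity": float(conf["size"]),
        "price": float(conf["price"]),
        "timestamp": datetime.fromisoformat(conf["executed_at"]).astimezone(timezone.utc),
        "settlement_status": "PENDING",
    }


ex_fill = from_exchange({"order_id": 123, "venue": "binance", "symbol": "BTCUSDT",
                         "side": "buy", "qty": "0.5", "px": "81400.0",
                         "ts_ms": 1_700_000_000_000})
otc_fill = from_otc({"trade_ref": "WM-001", "desk": "wintermute", "pair": "BTC-USDT",
                     "direction": "sell", "size": "500", "price": "81400",
                     "executed_at": "2025-01-01T12:00:00+00:00"})
print(ex_fill["instrument"], otc_fill["instrument"])
```

Once both sources agree on instrument naming, side convention, numeric types, and UTC timestamps, the aggregation layer no longer needs to know where a fill came from.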
The Latency Requirements
The quote is valid for a window. During that window, the market moves. The desk must hedge after the trade. Every millisecond of latency in the hedging execution adds expected cost.
For Wintermute running at $5B+ daily volume:
- Quote generation: sub-millisecond (pre-computed fair value, fast spread model)
- RFQ to quote delivery: 1-5ms via their API
- Trade confirmation receipt: depends on client’s connection, typically 10-100ms
- Hedge initiation: sub-100ms from trade confirmation (execution algo triggered immediately)
- Full hedge completion: seconds to minutes depending on size and market conditions
The risk window is the time between trade confirmation and hedge completion. For a $100M trade, it might take 5-10 minutes even with sophisticated execution.
During that risk window, the desk has naked directional exposure. If BTC moves 0.5% in 5 minutes (entirely normal), the PnL swing on the hedge is about ±$500,000 on a $100M trade. The spread on the OTC trade might have been $150,000. The execution quality of the hedge determines whether the trade was profitable.
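The arithmetic behind that comparison is worth spelling out. Using this section's figures (a $100M trade, a 0.5% move, and a $150,000 spread), the sketch below computes how small an adverse move wipes out the spread if the whole position were still unhedged. The function names are illustrative.

```python
def breakeven_move(spread_usd: float, notional_usd: float) -> float:
    """Adverse move (as a fraction) that exactly consumes the spread earned."""
    return spread_usd / notional_usd


def net_pnl(spread_usd: float, notional_usd: float, adverse_move: float) -> float:
    """Spread earned minus the loss from an adverse move on the full notional."""
    return spread_usd - notional_usd * adverse_move


notional = 100_000_000.0
spread = 150_000.0
print(f"breakeven move: {breakeven_move(spread, notional):.2%}")
print(f"net PnL after a 0.5% adverse move: {net_pnl(spread, notional, 0.005):,.0f}")
```

A move of only 0.15% against a fully unhedged position erases the spread. In practice the exposure shrinks as the hedge fills, so this is an upper bound on the loss for a given move, but it shows why hedge speed matters so much.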
Why This Is Relevant If You’re Not an OTC Desk
Even if you’re building trading infrastructure for a hedge fund that only trades on exchanges - not running an OTC desk - understanding OTC mechanics matters.
OTC flow moves markets: Large OTC trades that get hedged on exchanges are a significant source of order flow. If Wintermute is hedging a large $100M buy they facilitated for a sovereign wealth fund, that hedging flow will appear on exchange order books as sustained buying pressure. Understanding that this flow pattern exists helps you model order flow more accurately.
Your institutional counterparties use OTC: If you’re building infrastructure for a fund that eventually trades at scale, some of your fills will come through OTC desks. Your execution stack needs to handle both fill sources without treating them differently from a position accounting perspective.
Fee optimization: OTC trades for large size typically price better than the market impact cost of executing the same volume on exchange. For certain size ranges, OTC is simply the economically superior execution venue. Your smart order routing logic should include OTC desk quotes as a venue option.
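As a rough sketch of treating an OTC quote as one more venue, the comparison can be reduced to all-in price. The impact estimate and fee inputs are assumptions the caller must supply (estimating impact is the hard part), and `choose_venue` is a hypothetical helper rather than a real routing API.

```python
from typing import Tuple


def choose_venue(otc_offer: float, exchange_best_ask: float,
                 est_impact_bps: float, taker_fee_bps: float) -> Tuple[str, float]:
    """Pick the cheaper way to buy: OTC quote vs. estimated all-in exchange price."""
    exchange_all_in = exchange_best_ask * (1.0 + (est_impact_bps + taker_fee_bps) / 1e4)
    if otc_offer <= exchange_all_in:
        return "OTC", otc_offer
    return "EXCHANGE", exchange_all_in


# Large order: the assumed 20 bps of impact makes the OTC quote cheaper.
print(choose_venue(81_450.0, 81_400.0, est_impact_bps=20.0, taker_fee_bps=5.0))
# Small order: with about 1 bp of impact, the exchange wins.
print(choose_venue(81_450.0, 81_400.0, est_impact_bps=1.0, taker_fee_bps=2.0))
```

The OTC quote looks worse than the screen price in both cases; it only wins once expected impact and fees are added to the exchange side.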
The OTC desk is not a separate universe from exchange trading - it’s a parallel execution layer for the same market, connected to exchange infrastructure at every step of the flow. The best institutional trading systems treat both as equivalent fill sources and aggregate them into a single position truth.