Building a Smart Order Router for Fragmented Crypto Liquidity

At Akuna Capital, we maintained connections to 12 exchanges simultaneously. Not because we needed 12 venues for every trade, but because at any given moment, the best execution for a given order might be on any of them. The smart order router (SOR) was the component that decided: given this symbol, this side, this size, and this urgency - which venue, what order type, and what price?

Getting SOR design right is one of the highest-leverage engineering decisions in a crypto trading system. A poor SOR costs you on every single order - worse fills, higher fees, or missed opportunities. A good SOR compounds over time. At $10M/day notional, a 0.5 bps improvement in average execution quality is $500/day.

The SOR Decision Tree

The SOR is not a black box. Every routing decision should be explainable and deterministic for a given market snapshot. Here is the decision tree I use:

1. Is this symbol available on multiple venues?
   No → Use only available venue

2. For each available venue, compute the fee-adjusted effective price
   (far-side quote adjusted for the taker fee, or near-side quote adjusted for the maker fee or rebate)

3. Filter out venues with insufficient liquidity at the target price
   (available quantity at limit price < order quantity)

4. Filter out venues currently at or near their rate limit
   (weight used > 80% of limit, or order count > 85% of limit)

5. Among remaining venues: rank by fee-adjusted price
   Best price wins; tie-break on venue reliability score

6. For large orders (> venue_depth_threshold): split across venues
   Allocate proportional to available liquidity at each level

7. Check if taker vs maker matters for this order
   If urgency=HIGH: taker acceptable
   If urgency=LOW: restrict to venues where we can post maker

8. Execute
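To make the "explainable and deterministic" requirement concrete, here is a minimal sketch of steps 3 to 5 that records why each venue was dropped. The `VenueSnapshot` and `RoutingDecision` types and the reason strings are illustrative names, not part of any production system, and the tie-break uses the venue name as a stand-in for a reliability score.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VenueSnapshot:
    """Hypothetical per-venue inputs for one routing decision."""
    venue: str
    eff_price: float         # fee-adjusted price, lower is better for a buy
    qty_at_limit: float      # quantity available at or better than the limit
    weight_used_frac: float  # fraction of the rate limit already consumed


@dataclass
class RoutingDecision:
    chosen: Optional[str]
    rejected: dict[str, str] = field(default_factory=dict)  # venue -> reason


def route_buy_with_trace(
    snapshots: list[VenueSnapshot],
    order_qty: float,
) -> RoutingDecision:
    """Apply the liquidity and rate-limit filters, then rank by price."""
    decision = RoutingDecision(chosen=None)
    survivors = []
    for s in snapshots:
        if s.qty_at_limit < order_qty:
            decision.rejected[s.venue] = "insufficient_liquidity"
        elif s.weight_used_frac > 0.80:
            decision.rejected[s.venue] = "near_rate_limit"
        else:
            survivors.append(s)
    if survivors:
        # Sorting on (price, name) resolves equal prices the same way every
        # time. The name is an assumption standing in for a reliability score.
        survivors.sort(key=lambda s: (s.eff_price, s.venue))
        decision.chosen = survivors[0].venue
    return decision
```

Logging the returned `rejected` map alongside the chosen venue gives you a replayable record: the same snapshot always yields the same decision and the same reasons.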

Fee-Adjusted Effective Price

The best nominal price across venues is not the execution quality metric. The fee-adjusted effective price is.

For a taker buy order:

Effective price = ask_price × (1 + taker_fee_rate)

For a maker buy order:

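Effective price = bid_price × (1 - maker_rebate_rate)

Here maker_rebate_rate is the rebate expressed as a positive fraction, so a -0.02% maker fee in the table below corresponds to maker_rebate_rate = 0.0002. The Python code in this section stores the signed maker fee instead (negative = rebate) and therefore multiplies by (1 + fee_rate); the two forms are equivalent.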

Fee structure by venue (approximate 2024 values, verify current rates):

Venue     Maker fee    Taker fee    Notes
-------   ----------   ---------    -----
Binance   -0.02%       0.05%        Rebate for makers (negative = rebate)
OKX       -0.02%       0.05%        Similar to Binance
Bybit     -0.01%       0.06%        Lower rebate, higher taker
Deribit   -0.01%       0.03%        Options use a different fee model
Kraken    0.02%        0.05%        No maker rebate

Concrete example: BTC at $43,500. Buy order for 0.1 BTC.

Venue A ask: $43,500 (taker fee: 0.05%)
Effective price = $43,500 × 1.0005 = $43,521.75

Venue B ask: $43,502 (taker fee: 0.02%)
Effective price = $43,502 × 1.0002 = $43,510.70

→ Route to Venue B despite worse nominal price
Net saving: $11.05 per BTC (about 2.5 bps), or roughly $1.10 on this 0.1 BTC fill
At $10M/day notional: ~$2,500/day from fee-adjusted routing alone

In Python pseudocode:

from dataclasses import dataclass
from typing import Optional


@dataclass
class VenueFeeSchedule:
    maker_rebate: float  # Negative = you receive rebate
    taker_fee: float


@dataclass
class VenueQuote:
    venue: str
    bid: float
    bid_qty: float
    ask: float
    ask_qty: float
    fee: VenueFeeSchedule
    rate_limit_headroom: float  # 0.0-1.0, fraction of limit remaining


def fee_adjusted_price(
    nominal_price: float,
    is_buy: bool,
    is_taker: bool,
    fee: VenueFeeSchedule,
) -> float:
    """Lower is better for buys. Higher is better for sells."""
    if is_taker:
        fee_rate = fee.taker_fee
    else:
        fee_rate = fee.maker_rebate  # Negative means cost is negative = rebate

    if is_buy:
        return nominal_price * (1 + fee_rate)
    else:
        return nominal_price * (1 - fee_rate)


def select_venue_for_taker_buy(
    quotes: list[VenueQuote],
    quantity: float,
    min_rate_limit_headroom: float = 0.15,
) -> Optional[str]:
    """
    Select best venue for a taker buy order.
    Returns venue name, or None if no suitable venue available.
    """
    candidates = []

    for q in quotes:
        # Filter: insufficient liquidity
        if q.ask_qty < quantity:
            continue

        # Filter: rate limit too low
        if q.rate_limit_headroom < min_rate_limit_headroom:
            continue

        eff_price = fee_adjusted_price(
            nominal_price=q.ask,
            is_buy=True,
            is_taker=True,
            fee=q.fee,
        )
        candidates.append((eff_price, q.venue, q))

    if not candidates:
        return None

    # Sort by effective price (ascending for buys)
    candidates.sort(key=lambda x: x[0])
    return candidates[0][1]

Liquidity Slippage: When to Split Across Venues

For orders larger than the top-of-book liquidity, you need to decide: fill the remainder on the same venue at worse prices, or split across venues.

The slippage curve tells you the effective average price as a function of fill size:

def compute_slippage_curve(
    order_book: dict,  # {"bids": [[price, qty], ...], "asks": [[price, qty], ...]}
    side: str,
    max_qty: float,
) -> list[tuple[float, float]]:
    """
    Returns list of (cumulative_qty, avg_price) pairs.
    Shows effective average fill price as order size grows.
    """
    levels = order_book['asks'] if side == 'buy' else order_book['bids']
    curve = []
    cumulative_qty = 0.0
    cumulative_notional = 0.0

    for price, qty in levels:
        if cumulative_qty >= max_qty:
            break  # Requested size is filled (also covers max_qty <= 0)

        price, qty = float(price), float(qty)
        fill_qty = min(qty, max_qty - cumulative_qty)
        if fill_qty <= 0:
            continue  # Skip empty levels so avg_price never divides by zero

        cumulative_qty += fill_qty
        cumulative_notional += fill_qty * price
        avg_price = cumulative_notional / cumulative_qty
        curve.append((cumulative_qty, avg_price))

    return curve


def should_split_order(
    quotes: list[VenueQuote],
    order_qty: float,
    side: str,
    max_acceptable_slippage_bps: float = 2.0,
) -> bool:
    """
    Determines if splitting across venues reduces slippage enough to justify
    the added complexity.
    """
    if not quotes:
        return False  # Nothing to route to, so nothing to split

    # Assumes quotes is already sorted best-first for the given side
    best_venue = quotes[0]

    # Simplified: check if we can fill the full order at top-of-book
    available_at_top = best_venue.ask_qty if side == 'buy' else best_venue.bid_qty

    if available_at_top >= order_qty:
        return False  # No split needed, full fill at top-of-book price

    # A full implementation would compute the expected single-venue slippage
    # from the depth (see compute_slippage_curve) and split only when it
    # exceeds max_acceptable_slippage_bps. That needs the full order book,
    # not just top of book, so max_acceptable_slippage_bps is unused here.
    return True  # Simplified: split whenever we exceed top-of-book quantity
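The simplified check splits whenever the order exceeds top-of-book size. If full depth is available, one possible refinement is to measure the single-venue slippage in basis points and compare it against a threshold. The sketch below is self-contained and assumes the depth arrives as a list of (price, qty) tuples with the best price first; the function name is illustrative.

```python
from typing import Optional


def single_venue_slippage_bps(
    levels: list[tuple[float, float]],  # [(price, qty), ...], best price first
    order_qty: float,
) -> Optional[float]:
    """Slippage of the average fill price versus top of book, in basis points.

    Returns None when the displayed depth cannot fill order_qty.
    """
    if order_qty <= 0 or not levels:
        return None

    top_price = levels[0][0]
    remaining = order_qty
    notional = 0.0

    for price, qty in levels:
        take = min(qty, remaining)
        notional += take * price
        remaining -= take
        if remaining <= 0:
            break

    if remaining > 1e-12:
        return None  # Book is too thin for this size

    avg_price = notional / order_qty
    # abs() makes the same function usable for asks (buy) and bids (sell)
    return abs(avg_price - top_price) / top_price * 1e4
```

A caller could then split only when the result is None or exceeds max_acceptable_slippage_bps, which is the comparison the simplified version skips.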

The splitting decision involves a real tradeoff: splitting reduces slippage but doubles the number of orders, which:

Uses twice the rate limit weight
Requires reconciling two fills instead of one
Creates partial fill risk (one venue fills, other doesn’t)
Introduces race conditions (market moves between the two order placements)

For most retail crypto strategies, splitting makes sense only when the order size exceeds roughly $100K notional and the price impact of single-venue execution is measurable.
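When a split is justified, step 6 of the decision tree allocates in proportion to available liquidity. A minimal sketch of that allocation, assuming you already know the quantity each venue shows at or better than your limit price (the function name and the dict input are illustrative):

```python
def allocate_proportional(
    order_qty: float,
    available: dict[str, float],  # venue -> quantity available at the limit
) -> dict[str, float]:
    """Split order_qty across venues in proportion to available quantity.

    If total displayed liquidity is below order_qty, only the fillable part
    is allocated and the shortfall is left for the caller to handle.
    """
    usable = {venue: qty for venue, qty in available.items() if qty > 0}
    total = sum(usable.values())
    if order_qty <= 0 or total <= 0:
        return {}

    fillable = min(order_qty, total)
    # Because fillable <= total, no venue is asked for more than it shows.
    return {venue: fillable * qty / total for venue, qty in usable.items()}
```

This sketch ignores per-venue lot sizes and minimum order quantities; a real allocator would round each leg to the venue's increments and reassign the remainder.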

What Breaks When One Venue Degrades

The SOR must handle venue degradation gracefully. Degradation modes:

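Rate limit approaching: Route around the venue. Remove it from the candidate set when rate_limit_headroom < 15%. Re-enable it only once headroom recovers past a higher threshold (30% in the code below), so the venue does not flap in and out of the candidate set.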

Elevated latency: If order placement RTT for a venue exceeds 3× its normal baseline, route around it. Elevated latency means orders are arriving late and fills are happening at worse prices than intended.

WebSocket lag (sequence gap): When the order book for a venue is desynchronized (see Order Book Reconstruction at Scale), the SOR is operating on a stale book. Route around venues with stale books or increase the minimum acceptable liquidity threshold to account for uncertainty.

Venue downtime: The hardest case. If a venue is down and you have open orders on it:

You can’t cancel those orders (venue is down)
You don’t know if they’re still working, partially filled, or cancelled
Your position may be uncertain

The correct response: mark all outstanding orders on the degraded venue as “uncertain”, route all new orders to other venues, and reconcile position when the venue recovers. Do not assume orders were cancelled just because you can’t reach the venue - they may fill when connectivity is restored.

class VenueStatus:
    HEALTHY = "healthy"
    DEGRADED_LATENCY = "degraded_latency"
    DEGRADED_RATE_LIMIT = "degraded_rate_limit"
    STALE_BOOK = "stale_book"
    DISCONNECTED = "disconnected"


class SORVenueManager:
    def __init__(self, venues: list[str]):
        self._status: dict[str, str] = {v: VenueStatus.HEALTHY for v in venues}
        self._rate_limit_headroom: dict[str, float] = {v: 1.0 for v in venues}
        self._last_order_latency_ms: dict[str, float] = {v: 0.0 for v in venues}
        self._baseline_latency_ms: dict[str, float] = {}

    def get_eligible_venues(self) -> list[str]:
        """Return venues eligible for order routing."""
        return [
            v for v, status in self._status.items()
            if status in (VenueStatus.HEALTHY,)
        ]

    def update_order_latency(self, venue: str, latency_ms: float) -> None:
        self._last_order_latency_ms[venue] = latency_ms

        if venue not in self._baseline_latency_ms:
            self._baseline_latency_ms[venue] = latency_ms
            return

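        baseline = self._baseline_latency_ms[venue]
        if latency_ms > baseline * 3:
            self._status[venue] = VenueStatus.DEGRADED_LATENCY
            return  # Keep outliers out of the baseline

        # Exponential moving average update for baseline
        self._baseline_latency_ms[venue] = 0.95 * baseline + 0.05 * latency_ms

        # Recover once a sample is back under the threshold. A degraded venue
        # receives no orders, so samples must come from a periodic probe.
        if self._status[venue] == VenueStatus.DEGRADED_LATENCY:
            self._status[venue] = VenueStatus.HEALTHY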

    def update_rate_limit(self, venue: str, headroom: float) -> None:
        self._rate_limit_headroom[venue] = headroom
        if headroom < 0.15:
            self._status[venue] = VenueStatus.DEGRADED_RATE_LIMIT
        elif self._status[venue] == VenueStatus.DEGRADED_RATE_LIMIT and headroom > 0.3:
            self._status[venue] = VenueStatus.HEALTHY

    def mark_book_stale(self, venue: str) -> None:
        self._status[venue] = VenueStatus.STALE_BOOK

    def mark_book_synchronized(self, venue: str) -> None:
        if self._status[venue] == VenueStatus.STALE_BOOK:
            self._status[venue] = VenueStatus.HEALTHY
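The venue manager tracks venue health, but the downtime case also needs order-level state. Below is a rough sketch of marking orders "uncertain" on disconnect and reconciling on recovery. `TrackedOrder`, `OrderLedger`, and the shape of `venue_report` are assumptions for illustration; a real system would populate the report from the venue's order-status endpoint.

```python
from dataclasses import dataclass


@dataclass
class TrackedOrder:
    order_id: str
    venue: str
    quantity: float
    filled: float = 0.0
    state: str = "working"  # "working", "uncertain", "filled", "cancelled"


class OrderLedger:
    """Hypothetical in-memory ledger; not part of any exchange SDK."""

    def __init__(self) -> None:
        self._orders: dict[str, TrackedOrder] = {}

    def add(self, order: TrackedOrder) -> None:
        self._orders[order.order_id] = order

    def mark_venue_uncertain(self, venue: str) -> list[str]:
        """On disconnect: flag every working order on the venue."""
        flagged = []
        for order in self._orders.values():
            if order.venue == venue and order.state == "working":
                order.state = "uncertain"
                flagged.append(order.order_id)
        return flagged

    def reconcile(self, venue: str, venue_report: dict[str, tuple[str, float]]) -> None:
        """On recovery: overwrite local state with what the venue reports.

        venue_report maps order_id -> (state, filled_qty). Orders missing
        from the report stay uncertain rather than being assumed cancelled.
        """
        for order in self._orders.values():
            if order.venue != venue or order.state != "uncertain":
                continue
            if order.order_id in venue_report:
                order.state, order.filled = venue_report[order.order_id]

    def worst_case_exposure(self, venue: str) -> float:
        """Quantity that could still fill while the venue state is unknown."""
        return sum(
            order.quantity - order.filled
            for order in self._orders.values()
            if order.venue == venue and order.state == "uncertain"
        )
```

The `worst_case_exposure` figure is what risk checks should add to the known position while the venue is unreachable.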

The Latency Budget for SOR Decisions

The SOR adds latency to every order. The question is: how much is acceptable?

For co-located strategies (1-5µs to exchange):

SOR decision must complete in < 1µs
Requires C++ or Rust implementation
Algorithmic complexity must be O(n) or lower in the number of venues
All market data must be in L1/L2 cache

For cloud-based strategies (10-200ms to exchange):

SOR decision can take up to 100µs
Python is acceptable for the routing logic
Market data doesn’t need to be cache-resident

For Dubai-to-Asia routing (90-175ms to exchange):

SOR can take up to 1ms - a rounding error relative to network RTT
Python implementation with full order book access is fine

My implementation: Python with asyncio, all order book data in memory, SOR decision takes ~50µs. At 175ms network RTT to Binance, this is < 0.03% overhead - negligible.

import time
from typing import Optional


async def route_order(
    symbol: str,
    side: str,
    quantity: float,
    urgency: str,  # "high" or "low"
    order_books: dict,  # symbol -> exchange -> book
    venue_manager: SORVenueManager,
    fee_schedules: dict[str, VenueFeeSchedule],
) -> Optional[dict]:
    """
    Full SOR routing decision. Returns order parameters for the selected venue.
    """
    t0 = time.perf_counter_ns()

    eligible_venues = venue_manager.get_eligible_venues()
    if not eligible_venues:
        return None

    books_for_symbol = order_books.get(symbol, {})
    candidates = []

    for venue in eligible_venues:
        book = books_for_symbol.get(venue)
        if book is None or not book.is_synchronized:
            continue

        # High urgency crosses the spread and takes; low urgency rests on
        # the near side of the book as a maker order.
        is_taker = (urgency == "high")
        if (side == 'buy') == is_taker:
            level = book.best_ask()  # taker buy or maker sell
        else:
            level = book.best_bid()  # taker sell or maker buy

        if not level:
            continue

        nominal_price = level[0]
        # Displayed size caps only taker orders; a resting order posts in full.
        available_qty = level[1] if is_taker else quantity

        eff_price = fee_adjusted_price(
            nominal_price=nominal_price,
            is_buy=(side == 'buy'),
            is_taker=is_taker,
            fee=fee_schedules[venue],
        )

        candidates.append({
            'venue': venue,
            'nominal_price': nominal_price,
            'eff_price': eff_price,
            'available_qty': available_qty,
            'is_taker': is_taker,
        })

    if not candidates:
        return None

    # Sort by effective price (ascending for buy, descending for sell)
    reverse = (side == 'sell')
    candidates.sort(key=lambda c: c['eff_price'], reverse=reverse)

    best = candidates[0]

    decision_latency_us = (time.perf_counter_ns() - t0) / 1000
    # Log decision latency for monitoring - alert if > 500µs

    return {
        'venue': best['venue'],
        'symbol': symbol,
        'side': side,
        'quantity': min(quantity, best['available_qty']),
        'price': best['nominal_price'],
        'order_type': 'MARKET' if best['is_taker'] and urgency == 'high' else 'LIMIT',
        'time_in_force': 'IOC' if best['is_taker'] else 'GTC',
        'decision_latency_us': decision_latency_us,
    }

How This Breaks in Production

1. Position drift when one venue fills and another doesn’t
   Symptom: Strategy’s net position doesn’t match expected. Reconciliation shows fills on Venue A but no fills on Venue B despite both orders being placed.
   Root cause: Split order - half on Venue A, half on Venue B. Venue B filled but the fill notification was delayed (WebSocket lag). Strategy placed a second order before receiving the Venue B fill, resulting in double the intended size.
   Fix: Track pending fills per venue explicitly. Don’t place follow-up orders until all pending fills from the current order batch are confirmed or timed out.
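One way to implement that gate is a small tracker keyed by order ID with a deadline per order. This is a sketch, not the production code: the class name, the 2-second default timeout, and the injectable `now` argument (used to make the logic testable) are all assumptions.

```python
import time
from typing import Optional


class PendingFillTracker:
    """Gate follow-up orders until every leg of a batch is resolved."""

    def __init__(self, timeout_s: float = 2.0) -> None:
        self._timeout_s = timeout_s
        self._deadlines: dict[str, float] = {}  # order_id -> monotonic deadline

    def order_sent(self, order_id: str, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._deadlines[order_id] = now + self._timeout_s

    def order_resolved(self, order_id: str) -> None:
        """Call on a fill, a reject, or a cancel acknowledgement."""
        self._deadlines.pop(order_id, None)

    def timed_out(self, now: Optional[float] = None) -> list[str]:
        """Orders past their deadline; these need an explicit status query."""
        now = time.monotonic() if now is None else now
        return [oid for oid, deadline in self._deadlines.items() if now >= deadline]

    def can_send_follow_up(self, now: Optional[float] = None) -> bool:
        """True when every pending order is either resolved or timed out."""
        now = time.monotonic() if now is None else now
        return all(now >= deadline for deadline in self._deadlines.values())
```

Timed-out orders are not silently forgotten here: they stay in the tracker until `order_resolved` is called, so the caller still has to query the venue for their true state.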

2. Hedge imbalance from stale routing decision
   Symptom: Delta-neutral strategy shows net delta exposure. One leg of a hedge filled, the other didn’t.
   Root cause: SOR selected Venue B for the hedge leg based on order book state from 200ms ago. By the time the order arrived, Venue B’s liquidity had moved. The hedge order filled partially or not at all.
   Fix: For latency-sensitive hedges, use IOC orders rather than GTC. An IOC that doesn’t fill in full is a clear signal to immediately re-hedge. A GTC that partially fills and sits in the book creates a longer-lived position discrepancy.
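The re-hedge loop can be sketched as walking venues in preference order and sending only the unfilled remainder each time. The `send_ioc` callable is a placeholder for whatever order-entry function you have; it is assumed to return the filled quantity synchronously, which a real asynchronous gateway would not do.

```python
from typing import Callable


def hedge_with_ioc(
    target_qty: float,
    venues: list[str],                        # preference order, best first
    send_ioc: Callable[[str, float], float],  # hypothetical: returns filled qty
    min_qty: float = 1e-9,
) -> tuple[float, list[tuple[str, float]]]:
    """Send the unfilled remainder as an IOC to each venue in turn.

    Returns (remaining_unhedged_qty, [(venue, filled_qty), ...]).
    """
    remaining = target_qty
    fills: list[tuple[str, float]] = []

    for venue in venues:
        if remaining <= min_qty:
            break
        # Never trust a reported fill larger than what was requested
        filled = min(send_ioc(venue, remaining), remaining)
        if filled > 0:
            fills.append((venue, filled))
            remaining -= filled

    return max(remaining, 0.0), fills
```

A non-zero first element of the result means the hedge is still open after all venues were tried, which should raise an alert rather than leave a resting order behind.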

3. Rate limit headroom calculation wrong - venue over-routed
   Symptom: After 5 minutes of trading, all orders start failing with “429 too many requests” on a specific venue.
   Root cause: Rate limit headroom was computed from the last REST response’s X-MBX-USED-WEIGHT-1M header, but multiple concurrent orders were in flight. The actual consumed weight was higher than the last observed value.
   Fix: Track in-flight requests separately. Add estimated weight of pending requests to observed weight before computing headroom. Err conservative - if headroom is uncertain, route to the next best venue.
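A minimal sketch of that accounting follows. The class and method names are illustrative, and the per-request weights are whatever your client estimates for each endpoint. Adding in-flight weight to the last observed header value can double count requests the venue has already processed, which errs on the conservative side.

```python
class RateLimitEstimator:
    """Headroom estimate that includes requests still in flight."""

    def __init__(self, limit_weight: int) -> None:
        self._limit = limit_weight
        self._observed = 0   # last used-weight value reported by the venue
        self._in_flight = 0  # estimated weight of requests awaiting a response

    def request_sent(self, weight: int) -> None:
        self._in_flight += weight

    def response_received(self, weight: int, used_weight_header: int) -> None:
        """Call with the request's weight and the used-weight header value."""
        self._in_flight = max(self._in_flight - weight, 0)
        self._observed = used_weight_header

    def headroom(self) -> float:
        """Fraction of the limit remaining, from 0.0 to 1.0."""
        used = self._observed + self._in_flight
        return max(0.0, 1.0 - used / self._limit)
```

The result can be fed straight into SORVenueManager.update_rate_limit in place of a headroom computed from the header alone.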

4. Fee-adjusted routing ignores maker/taker transition
   Symptom: During fast markets, SOR is routing to Venue A because of lower taker fees. But the strategy is actually posting maker orders (the price isn’t crossing). Venue B has better maker rebate.
   Root cause: SOR computes effective price using taker fee for all orders, even when the strategy intends to rest in the book.
   Fix: Pass urgency/intended order type to fee_adjusted_price. Use maker_rebate for intended maker orders, taker_fee for intended taker orders.

5. Venue down → all orders concentrated on remaining venues → rate limits
   Symptom: When one of three venues goes down, the other two hit rate limits within seconds. Orders start failing.
   Root cause: Order flow that was spread across three venues is now concentrated on two. Each venue’s rate limit was sized for 1/3 of the total order flow. At 1/2 each, limits are exceeded.
   Fix: Implement order flow shaping. When venue count drops from 3 to 2, reduce total order rate by 33% or raise risk thresholds to trade fewer symbols until capacity is restored.
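The 33% figure falls out of keeping each surviving venue at its original share. A small sketch of that calculation, assuming flow is spread evenly across venues (the function name and parameters are illustrative):

```python
def shaped_order_rate(
    total_rate: float,         # orders/sec the strategy normally sends
    venues_designed_for: int,  # venue count the rate limits were sized for
    venues_up: int,            # venues currently healthy
) -> float:
    """Cap the total order rate so no venue exceeds its design share."""
    if venues_up <= 0 or venues_designed_for <= 0:
        return 0.0
    per_venue_budget = total_rate / venues_designed_for
    return min(total_rate, per_venue_budget * venues_up)
```

With 90 orders/sec sized for three venues, losing one venue caps the total at 60 orders/sec, which is the one-third reduction described in the fix.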

6. SOR decision latency spikes under order book resync
   Symptom: Order placement latency (measured from signal to send) spikes to 5-10ms intermittently, correlating with order book resync events.
   Root cause: When an order book is resyncing, the SOR iterates over venues and calls book.is_synchronized, which acquires a lock. The resync process holds this lock while fetching the REST snapshot, blocking SOR decisions for 200-800ms.
   Fix: Use a non-blocking read for is_synchronized - a simple boolean flag without locking. Accept the possibility of routing to a briefly stale book (better than stalling all routing). Implement lock-free book status tracking.
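In Python, one simple way to get a non-blocking read is to keep the flag as a plain attribute and reserve the lock for serializing resyncs. The sketch below relies on the assumption that reading or writing a single attribute is atomic in CPython under the GIL; `BookSyncState` and `fetch_snapshot` are illustrative names.

```python
import threading
from typing import Callable


class BookSyncState:
    """Sync flag that readers can check without ever taking a lock."""

    def __init__(self) -> None:
        self._synchronized = False            # plain attribute, never locked for reads
        self._resync_lock = threading.Lock()  # serializes resyncs only
        self.snapshot: dict = {}

    @property
    def is_synchronized(self) -> bool:
        # No lock here: the SOR hot path must not wait on a resync
        return self._synchronized

    def resync(self, fetch_snapshot: Callable[[], dict]) -> None:
        """Rebuild the book. Only concurrent resyncs block each other."""
        with self._resync_lock:
            self._synchronized = False
            # Slow REST call; readers see False immediately and skip the venue
            self.snapshot = fetch_snapshot()
            self._synchronized = True
```

A reader can still observe the flag a moment before it flips, which is the brief staleness the fix accepts in exchange for never stalling routing.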

For the order book data that feeds this SOR, see Order Book Reconstruction at Scale. For the exchange-specific quirks that affect routing decisions, see Binance Connectivity Deep Dive and OKX, Bybit, and Deribit API Guide. For the venue geography that determines routing fallbacks, see Exchange Co-Location in the Cloud Era. For the fee and rebate mechanics that drive fee-adjusted pricing, see Rebate Capture and Maker-Taker Dynamics.
