FIX Protocol from First Principles: Sessions, Sequence Gaps, and Resend Logic

At Upside, where I was Head of Engineering, we managed $500M+ AUM and were MiFID II compliant. The transaction reporting requirements under Article 26 of MiFIR (the regulation that accompanies MiFID II) mandate that you submit a transaction report for every execution in a reportable instrument no later than the close of the following working day (T+1). The practical implementation at our scale required FIX drop copy sessions to three separate trade reporting venues simultaneously. Getting FIX session management correct - not just “it mostly works” correct, but “survives a 3 AM sequence gap during a volatile market open” correct - was one of the harder infrastructure problems I’ve worked on.

This post covers FIX from first principles with emphasis on the session layer mechanics that most engineers learn only after their first production incident.

What FIX Actually Is

FIX (Financial Information eXchange) is a messaging standard for financial transaction communication. It exists in two layers:

FIX Session Layer (Transport Layer): Manages connectivity, sequence numbers, and message recovery. This is the part that most engineers struggle with. Session messages: Logon, Logout, Heartbeat, TestRequest, ResendRequest, SequenceReset, Reject.

FIX Application Layer: The actual business messages - NewOrderSingle, ExecutionReport, MarketDataSnapshotFullRefresh, etc. These are what your strategy cares about.

The session layer’s guarantee is simple but profound: every message sent has a sequence number, and every message must be received exactly once, in order, with no gaps. If a gap is detected, the session must either recover the missing messages or reset.

The Anatomy of a FIX Message

FIX messages are sequences of tag=value pairs. On the wire each pair is terminated by SOH (ASCII 0x01), a non-printable character, so for readability I’ll show it as | here.

A FIX 4.2 NewOrderSingle (35=D) looks like:

Raw format:

8=FIX.4.2|9=156|35=D|49=CLIENTFIRM|56=EXCHANGE|34=1234|52=20240115-10:30:00.000|
11=ORDER-001|1=ACC001|21=1|55=AAPL|54=1|60=20240115-10:30:00.000|
40=2|44=175.00|38=100|59=0|10=123|

Parsed:

Tag 8  = FIX.4.2          (BeginString - protocol version)
Tag 9  = 156              (BodyLength - byte count from the start of tag 35 through the SOH before tag 10)
Tag 35 = D                (MsgType - D = NewOrderSingle)
Tag 49 = CLIENTFIRM       (SenderCompID - your identifier)
Tag 56 = EXCHANGE         (TargetCompID - their identifier)
Tag 34 = 1234             (MsgSeqNum - sequence number)
Tag 52 = 20240115-10:30:00.000  (SendingTime - UTC)
Tag 11 = ORDER-001        (ClOrdID - client order ID, must be unique)
Tag 1  = ACC001           (Account)
Tag 21 = 1                (HandlInst - 1=automated)
Tag 55 = AAPL             (Symbol)
Tag 54 = 1                (Side - 1=Buy, 2=Sell)
Tag 60 = 20240115-10:30:00.000  (TransactTime)
Tag 40 = 2                (OrdType - 2=Limit)
Tag 44 = 175.00           (Price)
Tag 38 = 100              (OrderQty)
Tag 59 = 0                (TimeInForce - 0=Day)
Tag 10 = 123              (CheckSum - placeholder value here; sum of all preceding bytes mod 256, 3 digits)
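To make the tag/value view concrete, here is a minimal sketch of a parser. `parse_fix` is a hypothetical helper of mine, not part of any FIX library, and it assumes there are no raw data fields with embedded SOH bytes, which a real engine has to handle. It returns ordered pairs rather than a dict because field order matters in FIX (header first, repeating groups in sequence).

```python
SOH = "\x01"


def parse_fix(raw: str, delimiter: str = SOH) -> list[tuple[int, str]]:
    """Split a raw FIX message into ordered (tag, value) pairs."""
    pairs: list[tuple[int, str]] = []
    for field in raw.split(delimiter):
        if not field:
            continue  # the trailing delimiter leaves one empty chunk
        tag, _, value = field.partition("=")
        pairs.append((int(tag), value))
    return pairs


# Abbreviated message, with | standing in for SOH as in the text.
fields = parse_fix(
    "35=D|49=CLIENTFIRM|56=EXCHANGE|34=1234|55=AAPL|54=1|38=100|",
    delimiter="|",
)
by_tag = dict(fields)  # fine for lookups when no tag repeats
print(by_tag[35], by_tag[55], by_tag[38])
```

The dict lookup is only safe for messages without repeating groups; for anything else, walk the ordered list.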

The checksum (tag 10) is a 3-digit, zero-padded, mod-256 sum of every byte in the message up to and including the SOH immediately before tag 10. Your FIX engine computes and validates this on every message.

BodyLength (tag 9) is the byte count from the first character of tag 35 up to and including the SOH immediately before tag 10; tags 8, 9, and 10 themselves are not counted. A message whose BodyLength or CheckSum doesn’t match is treated as garbled: most engines discard it (some disconnect), and the loss then surfaces as a sequence gap.
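The framing rules are easier to trust once you have computed them yourself. The following is a sketch only: `build_message`, `checksum`, and `is_well_formed` are hypothetical names, the body is assumed to be plain ASCII, and error handling for malformed input is left out.

```python
SOH = "\x01"


def checksum(data: bytes) -> str:
    """Sum of all bytes mod 256, zero-padded to three digits."""
    return f"{sum(data) % 256:03d}"


def build_message(begin_string: str, body_fields: list[tuple[int, str]]) -> bytes:
    """Frame body fields (tag 35 onward) with tags 8, 9 and 10."""
    body = "".join(f"{tag}={value}{SOH}" for tag, value in body_fields).encode("ascii")
    head = f"8={begin_string}{SOH}9={len(body)}{SOH}".encode("ascii")
    trailer = f"10={checksum(head + body)}{SOH}".encode("ascii")
    return head + body + trailer


def is_well_formed(message: bytes) -> bool:
    """Re-derive BodyLength and CheckSum and compare with the wire values."""
    soh = SOH.encode("ascii")
    trailer_at = message.rfind(soh + b"10=")
    if trailer_at < 0 or not message.endswith(soh):
        return False
    trailer_at += 1  # index of the first byte of the CheckSum field
    declared_sum = message[trailer_at + 3:-1].decode("ascii")
    # BodyLength starts counting after the SOH that terminates tag 9.
    first_soh = message.index(soh)
    second_soh = message.index(soh, first_soh + 1)
    declared_len = int(message[first_soh + 3:second_soh])
    actual_len = trailer_at - (second_soh + 1)
    return declared_len == actual_len and declared_sum == checksum(message[:trailer_at])


heartbeat = build_message(
    "FIX.4.2",
    [(35, "0"), (49, "CLIENT"), (56, "SERVER"), (34, "7"), (52, "20240115-10:00:30")],
)
print(heartbeat.replace(b"\x01", b"|").decode("ascii"))
print(is_well_formed(heartbeat))
```

Running your own framing code against captured counterparty traffic is a cheap way to find off-by-one disagreements about where the body starts and ends before they show up in certification.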

The Logon Sequence

FIX session establishment follows a specific handshake. Here is the correct initiator-side flow:

Initiator sends Logon (MsgType=A):

8=FIX.4.2|9=...|35=A|49=CLIENT|56=SERVER|34=1|52=20240115-10:00:00|
98=0|108=30|10=...|

Tag 98 = 0  (EncryptMethod - 0=None; always 0 in modern FIX)
Tag 108 = 30 (HeartBtInt - heartbeat interval in seconds)

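Sequence number on Logon is 1 for a brand-new session. If you’re reconnecting to a session that was already established (same SenderCompID/TargetCompID pair), you continue from where you left off - you do not reset to 1. Sending sequence 1 on a reconnect without ResetSeqNumFlag (141=Y) looks to the counterparty like a sequence number lower than expected, which is a fatal session error. If you have genuinely lost state, agree a reset with the counterparty (or send 141=Y where they support it). If your sequence number is higher than the counterparty expects, they accept the Logon and then send a ResendRequest for the messages they missed.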

Acceptor responds with Logon (MsgType=A):

8=FIX.4.2|9=...|35=A|49=SERVER|56=CLIENT|34=1|52=20240115-10:00:00|
98=0|108=30|10=...|

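If the sequence number you sent is lower than the acceptor expects, they respond with a Logout carrying an error message, then close the connection. You must reconcile sequence numbers before reconnecting. If it is higher than expected, the acceptor completes the Logon and then issues a ResendRequest for the gap.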
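The acceptor-side decision on an incoming Logon can be sketched as a small pure function. This reflects standard session-layer behaviour as I understand it, but counterparties differ in the details (especially whether they honour ResetSeqNumFlag, tag 141), so treat `evaluate_logon` and the `LogonAction` names as illustrative assumptions rather than any engine's API.

```python
from enum import Enum


class LogonAction(Enum):
    ACCEPT = "accept"
    ACCEPT_THEN_RESEND_REQUEST = "accept, then send ResendRequest"
    LOGOUT_AND_DISCONNECT = "send Logout with reason, then disconnect"
    RESET_AND_ACCEPT = "reset both counters to 1 and accept"


def evaluate_logon(
    received_seq: int,
    expected_seq: int,
    reset_seq_num_flag: bool = False,
) -> LogonAction:
    """Decide how to answer a Logon based on its MsgSeqNum (tag 34)."""
    if reset_seq_num_flag:
        # 141=Y: the initiator explicitly asks for both sides to restart at 1.
        return LogonAction.RESET_AND_ACCEPT
    if received_seq < expected_seq:
        # Too low without PossDupFlag: unrecoverable, needs manual reconciliation.
        return LogonAction.LOGOUT_AND_DISCONNECT
    if received_seq > expected_seq:
        # Too high: we missed messages, so log on and ask for them.
        return LogonAction.ACCEPT_THEN_RESEND_REQUEST
    return LogonAction.ACCEPT


print(evaluate_logon(received_seq=1, expected_seq=57))
print(evaluate_logon(received_seq=60, expected_seq=57))
```

Keeping this decision free of I/O makes it trivial to unit test, which matters because the too-low branch is the one you only ever hit in production.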

Sequence Numbers: The Most Critical Invariant

The sequence number guarantee is: NextExpectedMsgSeqNum = LastReceivedMsgSeqNum + 1. If message 4 arrives and the last received was message 2, the session has a gap - message 3 is missing.

This is not a “might be a problem” situation. A sequence gap means you may have missed a trade execution, a cancellation, or a position update. The session must recover the gap before proceeding.

Gap detection triggers ResendRequest (MsgType=2):

When a receiver detects a gap (received seq 5, expected seq 3), it sends:

35=2|7=3|16=0|

Tag 7  = 3    (BeginSeqNo - start of gap)
Tag 16 = 0    (EndSeqNo - 0 means "resend everything from BeginSeqNo to now")
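On the receiving side, gap detection is a comparison against one integer. A minimal sketch, with `InboundSequence` as a hypothetical class and the simplifying assumption that a message arriving ahead of the gap is dropped and recovered through the resend (which works because EndSeqNo=0 asks for everything from BeginSeqNo onward):

```python
class InboundSequence:
    """Tracks the next expected inbound MsgSeqNum for one session."""

    def __init__(self, next_expected: int = 1):
        self.next_expected = next_expected

    def on_message(self, seq: int, poss_dup: bool = False) -> str:
        """Classify an inbound message by its sequence number."""
        if seq == self.next_expected:
            self.next_expected += 1
            return "process"
        if seq > self.next_expected:
            # Do not advance: the early message comes back in the resend.
            return "gap"
        # Lower than expected is only legitimate when flagged as a possible duplicate.
        return "ignore_duplicate" if poss_dup else "fatal_seq_too_low"

    def resend_request(self) -> str:
        """Body of the ResendRequest covering everything we are missing."""
        return f"35=2|7={self.next_expected}|16=0|"


inbound = InboundSequence(next_expected=3)
if inbound.on_message(5) == "gap":
    print(inbound.resend_request())
```

Some engines queue the early message instead of dropping it; either approach works as long as nothing is applied out of order.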

The sender replays the requested range in sequence order. Application messages are resent with PossDupFlag=Y. Administrative messages (Heartbeats, TestRequests) are not replayed; each run of them is replaced by a SequenceReset-GapFill (MsgType=4) that skips over it:

# If messages 3 and 4 were Heartbeats (don't need replay):
35=4|123=Y|43=Y|34=3|36=5|

Tag 123 = Y    (GapFillFlag - Y means this is a gap fill, not a reset)
Tag 43  = Y    (PossDupFlag - set on everything sent in reply to a ResendRequest)
Tag 34  = 3    (MsgSeqNum - first gap-filled sequence)
Tag 36  = 5    (NewSeqNo - next sequence the receiver should expect)

This tells the receiver: “Messages 3 and 4 were administrative and don’t need to be resent. Jump ahead to sequence 5.”
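From the sender's side, answering a ResendRequest means walking the outbound log and collapsing each run of administrative messages into one GapFill. A sketch under stated assumptions: `plan_resend` and the `sent_log` shape (sequence number to MsgType) are hypothetical, and session-level Reject (35=3) is deliberately left out of the admin set because, as I read the session spec, it is the one admin message that is resent.

```python
# MsgTypes that are gap-filled rather than replayed:
# Heartbeat, TestRequest, ResendRequest, SequenceReset, Logout, Logon.
ADMIN_TYPES = {"0", "1", "2", "4", "5", "A"}


def plan_resend(sent_log: dict[int, str], begin: int, end: int) -> list[tuple]:
    """Return ("resend", seq) and ("gap_fill", MsgSeqNum, NewSeqNo) steps in order."""
    plan: list[tuple] = []
    seq = begin
    while seq <= end:
        if sent_log[seq] in ADMIN_TYPES:
            start = seq
            # Extend the gap fill across the whole run of admin messages.
            while seq <= end and sent_log[seq] in ADMIN_TYPES:
                seq += 1
            plan.append(("gap_fill", start, seq))
        else:
            plan.append(("resend", seq))
            seq += 1
    return plan


# Messages 3-4 were Heartbeats, 5 and 7 ExecutionReports, 6 a Heartbeat.
sent = {3: "0", 4: "0", 5: "8", 6: "0", 7: "8"}
for step in plan_resend(sent, begin=3, end=7):
    print(step)
```

Note that the planner never emits a bare SequenceReset: every skip carries an explicit NewSeqNo that the caller sends with GapFillFlag=Y.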

For application messages (trades, cancels) that must be replayed:

# Resend the original NewOrderSingle with PossDupFlag=Y
35=D|34=3|43=Y|...|

Tag 43 = Y    (PossDupFlag - indicates this is a resent message)

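The receiver processes resent messages it has not seen before and ignores any it has already processed (based on ClOrdID or ExecID uniqueness). A resent message also carries OrigSendingTime (tag 122), so the receiver can see when it was first sent.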

Heartbeat and TestRequest

Heartbeat (MsgType=0) is sent when no other message has been sent within the HeartBtInt interval:

35=0|

TestRequest (MsgType=1) is sent when no message has been received within HeartBtInt + some tolerance:

35=1|112=TEST-001|

Tag 112 = TEST-001  (TestReqID - arbitrary identifier)

The counterparty must respond with a Heartbeat that echoes the TestReqID:

35=0|112=TEST-001|

If no response arrives within HeartBtInt + tolerance, the session is considered dead and must be disconnected then re-established.

The typical pattern in a FIX engine:

import asyncio
import time

class FIXSessionManager:
    def __init__(self, heartbeat_interval: int = 30):
        self._heartbeat_interval = heartbeat_interval
        self._last_sent: float = time.monotonic()
        self._last_received: float = time.monotonic()
        self._pending_test_req_id: str | None = None
        self._test_req_sent_at: float = 0.0

    async def monitor_heartbeat(self) -> None:
        """Run as background task after Logon."""
        tolerance = 5  # Seconds beyond heartbeat interval to wait
        timeout = self._heartbeat_interval + tolerance

        while True:
            await asyncio.sleep(1)
            now = time.monotonic()

            # Time since last sent message
            since_sent = now - self._last_sent
            if since_sent >= self._heartbeat_interval:
                await self._send_heartbeat()
                self._last_sent = now

            # Time since last received message
            since_received = now - self._last_received
            if since_received < timeout:
                self._pending_test_req_id = None  # Peer is talking; nothing to probe
                continue

            if self._pending_test_req_id is None:
                # Send TestRequest and record when, so the peer gets time to answer
                test_req_id = f"TEST-{int(now * 1000)}"
                self._pending_test_req_id = test_req_id
                self._test_req_sent_at = now
                await self._send_test_request(test_req_id)
                self._last_sent = now
            elif now - self._test_req_sent_at >= timeout:
                # TestRequest unanswered for a further timeout - session is dead
                await self._disconnect_session("Heartbeat timeout")
                return

    def on_message_received(self) -> None:
        """Call for every inbound message; any traffic counts as liveness."""
        self._last_received = time.monotonic()

    def on_heartbeat_received(self, msg: dict) -> None:
        self.on_message_received()
        test_req_id = msg.get('112')
        if test_req_id and test_req_id == self._pending_test_req_id:
            self._pending_test_req_id = None  # TestRequest answered

    async def _disconnect_session(self, reason: str) -> None:
        await self._send_logout(reason)
        # Close TCP connection

    # Transport hooks: wire these to your connection's send path
    async def _send_heartbeat(self) -> None:
        raise NotImplementedError

    async def _send_test_request(self, test_req_id: str) -> None:
        raise NotImplementedError

    async def _send_logout(self, reason: str) -> None:
        raise NotImplementedError

Drop Copy Sessions

A drop copy session is a FIX session where one party receives copies of all trades from a third party. Used for:

Regulatory reporting (MiFID II / MiFIR Article 26 transaction reports)
Prime broker position tracking
Risk manager oversight without affecting execution

Drop copy architecture:

Execution Broker
       │
       ├── Primary session (you → broker, orders/cancels)
       │
       └── Drop copy session (broker → you, copies of all executions)

The drop copy session is receive-only - you never send application messages on it, only session-level messages (Logon, Heartbeat, ResendRequest). All application messages come from the broker.

Critical drop copy design consideration: drop copy sequence numbers are independent of the primary session. A gap in the drop copy session means missed execution reports. You must implement ResendRequest recovery specifically for drop copy, because a missed execution report means your risk system has a wrong position.

At Upside, we had drop copy sessions to three reporting venues: our prime broker, an MTF, and a regulatory reporting facility. We maintained a strict invariant: no order was considered “confirmed” until it appeared in all three drop copy streams. If any stream had a gap, we halted new order placement until the gap was recovered or human review was complete.
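The invariant is simple enough to sketch in a few lines. This is an illustration of the idea, not our production code: `DropCopyTracker`, the stream names, and `order_key` (whatever identifier all streams share for an order) are assumptions made for the example.

```python
class DropCopyTracker:
    """Confirms an order only once every drop copy stream has reported it."""

    def __init__(self, streams: set[str]):
        self._streams = frozenset(streams)
        self._seen_on: dict[str, set[str]] = {}  # order_key -> streams that reported it
        self._gapped: set[str] = set()           # streams with an unrecovered gap

    def on_execution(self, stream: str, order_key: str) -> None:
        self._seen_on.setdefault(order_key, set()).add(stream)

    def is_confirmed(self, order_key: str) -> bool:
        return self._seen_on.get(order_key, set()) >= self._streams

    def on_gap(self, stream: str) -> None:
        self._gapped.add(stream)

    def on_gap_recovered(self, stream: str) -> None:
        self._gapped.discard(stream)

    @property
    def may_place_orders(self) -> bool:
        """New orders are halted while any stream has an open gap."""
        return not self._gapped


tracker = DropCopyTracker({"prime_broker", "mtf", "reporting_facility"})
tracker.on_execution("prime_broker", "ORDER-001")
tracker.on_execution("mtf", "ORDER-001")
print(tracker.is_confirmed("ORDER-001"))  # two of three streams so far
tracker.on_gap("mtf")
print(tracker.may_place_orders)
```

A real version also needs a timeout for orders that never reach all streams, plus a manual override for the human-review path.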

FIX 4.2 vs 4.4 vs FIXT/5.0: Practical Differences

FIX 4.2: The most widely deployed version in equities and FX. Missing some fields that later became standard. Options are barely defined. Still required by many prime brokers.

FIX 4.4: Adds parties component (better counterparty identification), improves options handling, adds LastLiquidityInd for market maker reporting. Most MiFID II-era brokers support 4.4.

FIXT 1.1 / FIX 5.0: Transport and application layers separated. FIXT 1.1 is the transport (session layer), FIX 5.0 SP2 is the application layer. In practice, most firms haven’t moved to 5.0 - 4.4 covers most institutional needs. FIX 5.0 is primarily relevant if you’re implementing FAST encoding (binary) or working with complex options strategies.

For crypto trading: FIX matters mostly for institutional connectivity. Crypto exchange APIs are predominantly REST/WebSocket, although several venues (Coinbase Exchange and Deribit, for example) also offer FIX gateways. If you’re connecting to a crypto prime broker (B2C2, SFOX, Galaxy) or a crypto ECN that supports institutional connections, you’ll typically need FIX 4.2 or 4.4.

FIX Engine Libraries

QuickFIX/J (Java): The Java member of the QuickFIX family and the usual open-source choice on the JVM. Well-tested, widely deployed. Configure via session settings file, implement Application interface.

QuickFIX/n (C#/.NET): A native .NET implementation of the QuickFIX design. Used heavily in Windows-based trading environments.

QuickFIX (C++): The original implementation, which also ships the Python bindings. Fastest, but requires more manual work.

Onix Solutions / B2Bits: Commercial implementations with better support for FIX 5.0 and FAST encoding. Relevant for high-frequency equities.

For a Python implementation:

import time

import quickfix
import quickfix44

class TradingApplication(quickfix.Application):
    """
    QuickFIX Python application implementation.
    Called by the session layer for all session and application events.
    """

    def onCreate(self, sessionID: quickfix.SessionID) -> None:
        self._session_id = sessionID

    def onLogon(self, sessionID: quickfix.SessionID) -> None:
        print(f"Session logged on: {sessionID}")

    def onLogout(self, sessionID: quickfix.SessionID) -> None:
        print(f"Session logged out: {sessionID}")

    def toAdmin(self, message: quickfix.Message, sessionID: quickfix.SessionID) -> None:
        """Called before admin messages are sent - add custom fields here."""
        msg_type = quickfix.MsgType()
        message.getHeader().getField(msg_type)
        if msg_type.getValue() == quickfix.MsgType_Logon:
            # Add Username/Password if required by counterparty
            message.setField(quickfix.Username("myuser"))
            message.setField(quickfix.Password("mypass"))

    def fromAdmin(self, message: quickfix.Message, sessionID: quickfix.SessionID) -> None:
        """Called when admin messages are received."""
        pass

    def toApp(self, message: quickfix.Message, sessionID: quickfix.SessionID) -> None:
        """Called before application messages are sent."""
        pass

    def fromApp(self, message: quickfix.Message, sessionID: quickfix.SessionID) -> None:
        """Called when application messages are received - your main handler."""
        msg_type = quickfix.MsgType()
        message.getHeader().getField(msg_type)

        if msg_type.getValue() == quickfix.MsgType_ExecutionReport:
            self._handle_execution_report(message)

    def _handle_execution_report(self, message: quickfix.Message) -> None:
        exec_type = quickfix.ExecType()
        message.getField(exec_type)

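        # FIX 4.4 reports fills as ExecType=F (Trade); ExecType=2 (Fill) is the FIX 4.2 value
        if exec_type.getValue() in (quickfix.ExecType_FILL, quickfix.ExecType_TRADE):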
            order_id = quickfix.ClOrdID()
            last_qty = quickfix.LastQty()
            last_px = quickfix.LastPx()
            message.getField(order_id)
            message.getField(last_qty)
            message.getField(last_px)
            print(f"Fill: {order_id.getValue()} qty={last_qty.getValue()} px={last_px.getValue()}")

    def send_new_order(self, symbol: str, side: str, qty: float, price: float) -> str:
        # Millisecond timestamps can collide under load; use a stronger ID scheme in production
        cl_ord_id = f"ORD-{int(time.time() * 1000)}"

        order = quickfix44.NewOrderSingle()
        order.setField(quickfix.ClOrdID(cl_ord_id))
        order.setField(quickfix.Symbol(symbol))
        order.setField(quickfix.Side(quickfix.Side_BUY if side == "BUY" else quickfix.Side_SELL))
        order.setField(quickfix.OrdType(quickfix.OrdType_LIMIT))
        order.setField(quickfix.Price(price))
        order.setField(quickfix.OrderQty(qty))
        order.setField(quickfix.TimeInForce(quickfix.TimeInForce_DAY))
        order.setField(quickfix.TransactTime())  # Sets to current time

        quickfix.Session.sendToTarget(order, self._session_id)
        return cl_ord_id

How This Breaks in Production

1. Sequence gap during exchange maintenance window

Symptom: After scheduled exchange maintenance at 2 AM UTC, your FIX session reconnects, immediately has to issue a ResendRequest, and is flooded with resent messages. Processing them takes 10+ seconds, during which your risk system has stale position data.

Root cause: During maintenance, some ExecutionReports (fills that completed just before the window closed) were sent but your session was closed before receiving them. The sequence gap spans those undelivered reports.

Fix: Implement a “gap recovery” state that blocks new order placement until all pending ResendRequests are resolved. Log all resent messages and verify position consistency before resuming trading.

2. PossDupFlag mishandling causes double-booked fills

Symptom: Your position is double what it should be. Two fills are booked for the same order, but the exchange confirms only one.

Root cause: A resent ExecutionReport has PossDupFlag=Y (tag 43) set. Your handler doesn’t check this flag and books the fill a second time. The correct behavior is to check the ExecID for uniqueness and ignore already-processed reports.

Fix: Maintain a set of processed ExecIDs (tag 17). Before booking any fill, check ExecID not in processed_exec_ids. Store ExecIDs persistently across session restarts.
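One way to make the processed-ExecID set survive restarts is a single-table SQLite database. This is a sketch rather than a recommendation of SQLite specifically: `ProcessedExecIds` is a hypothetical class, and a real deployment would probably key on session plus ExecID, since uniqueness is only guaranteed per counterparty.

```python
import sqlite3


class ProcessedExecIds:
    """Persistent set of ExecIDs (tag 17) that have already been booked."""

    def __init__(self, path: str = ":memory:"):
        # Pass a file path in production so the set survives restarts.
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS processed (exec_id TEXT PRIMARY KEY)"
        )
        self._db.commit()

    def mark_if_new(self, exec_id: str) -> bool:
        """Return True the first time an ExecID is seen, False on any repeat."""
        cur = self._db.execute(
            "INSERT OR IGNORE INTO processed (exec_id) VALUES (?)", (exec_id,)
        )
        self._db.commit()
        return cur.rowcount == 1


store = ProcessedExecIds()
for exec_id in ("EXEC-1", "EXEC-2", "EXEC-1"):  # third report is a resend
    if store.mark_if_new(exec_id):
        print(f"booking fill {exec_id}")
    else:
        print(f"skipping duplicate {exec_id}")
```

Doing the check and the insert in one statement avoids the window where two handlers both see the ExecID as new.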

3. HeartBtInt mismatch causes immediate disconnection

Symptom: FIX session connects and immediately disconnects with a Logout mentioning “HeartBtInt mismatch”.

Root cause: Your Logon sends HeartBtInt=30, but the counterparty requires a specific value (often 60 for drop copy sessions). Counterparty rejects sessions with non-compliant heartbeat intervals.

Fix: Confirm the required HeartBtInt with your counterparty before implementing. Some prime brokers require 60 seconds, some allow values between 10-60. Make it configurable.

4. SequenceReset without GapFillFlag causes session to logout

Symptom: The counterparty sends you a ResendRequest, you answer with a SequenceReset, and the counterparty immediately disconnects you with a Logout.

Root cause: You sent a SequenceReset (35=4) without GapFillFlag (tag 123=Y). A SequenceReset without GapFillFlag is a hard reset that resets the sequence to NewSeqNo. Many counterparties treat this as a protocol error for a mid-session reset.

Fix: Always include GapFillFlag=Y on SequenceResets used to skip over administrative messages during gap fill. Only omit GapFillFlag on a full session reset (which is rare and should require human approval).

5. Drop copy lag causes risk system to operate on stale position

Symptom: During a volatile market, your strategy places a new order while the drop copy stream is lagged. The new order fills, but the position update arrives late via drop copy. In the gap, the risk system thinks your position is smaller than it is, and allows another order that would push you over your position limit.

Root cause: Drop copy sessions have independent sequence numbers and their own propagation latency. The execution broker queues drop copies asynchronously from execution. During load, the drop copy can be 2-10 seconds behind reality.

Fix: Use the primary execution session’s ExecutionReports for real-time position tracking. Use drop copy as the reconciliation/audit feed, not as the primary position data source.
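Treating drop copy as the audit feed implies a periodic reconciliation against what the primary session booked. A minimal sketch, assuming both feeds expose the same ExecID for a given fill (check this with your broker, it is not guaranteed) and that fills are held as ExecID to signed quantity; `reconcile` is a hypothetical helper.

```python
def reconcile(
    primary: dict[str, float],
    drop_copy: dict[str, float],
) -> dict[str, list[str]]:
    """Compare fills from the primary session with fills from drop copy."""
    shared = primary.keys() & drop_copy.keys()
    return {
        # Usually just drop copy lag; alert only if it persists.
        "missing_from_drop_copy": sorted(primary.keys() - drop_copy.keys()),
        # Serious: an execution the primary session never told us about.
        "missing_from_primary": sorted(drop_copy.keys() - primary.keys()),
        "quantity_mismatch": sorted(k for k in shared if primary[k] != drop_copy[k]),
    }


primary_fills = {"EXEC-1": 100.0, "EXEC-2": -50.0, "EXEC-3": 25.0}
drop_copy_fills = {"EXEC-1": 100.0, "EXEC-2": -40.0, "EXEC-4": 10.0}
report = reconcile(primary_fills, drop_copy_fills)
for category, exec_ids in report.items():
    print(category, exec_ids)
```

The two "missing" categories deserve different urgency: the first tends to resolve itself within seconds, the second means your real-time position is wrong.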

6. SenderCompID/TargetCompID swap causes connection rejection

Symptom: FIX session fails to connect. The acceptor logs show “Invalid session ID” or “Unknown counterparty”.

Root cause: SenderCompID and TargetCompID are always from the perspective of whoever sends the message, and your session configuration is written from your own perspective. Many engineers accidentally swap them (writing the acceptor’s ID as SenderCompID). From your perspective as the initiator: SenderCompID is YOUR identifier, TargetCompID is the EXCHANGE/broker identifier.

Fix: Verify with your counterparty: “My SenderCompID is X, so on your side X is the TargetCompID, correct?” The asymmetry is intentional - the acceptor’s Logon response will have swapped SenderCompID/TargetCompID from your initiator Logon.
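The mirroring is easy to get right once it is written down as data. A small sketch with a hypothetical `SessionIdentity` type, using the CLIENT/SERVER identifiers from the Logon examples earlier in the post:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionIdentity:
    begin_string: str
    sender_comp_id: str  # who we are (tag 49 on messages we send)
    target_comp_id: str  # who we talk to (tag 56 on messages we send)

    def counterparty_view(self) -> "SessionIdentity":
        """The same session as the other side must configure it."""
        return SessionIdentity(
            self.begin_string, self.target_comp_id, self.sender_comp_id
        )


ours = SessionIdentity("FIX.4.2", sender_comp_id="CLIENT", target_comp_id="SERVER")
theirs = ours.counterparty_view()
print(f"ours:   49={ours.sender_comp_id} 56={ours.target_comp_id}")
print(f"theirs: 49={theirs.sender_comp_id} 56={theirs.target_comp_id}")
```

Printing both views into the onboarding ticket gives the counterparty something concrete to confirm.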

For the compliance context that drives FIX drop copy requirements, see MiFID II Clock Sync and Crypto. For the connectivity infrastructure around these FIX sessions, see TCP Tuning for Trading. For real-time synchronization across all venue connections, see SLOs for Trading Systems.