

SOC 2 Type II for Trading Platforms: What Auditors Actually Look At and Where Engineering Teams Fail

I passed a SOC 2 Type II audit at an ICO platform with $500M+ in client funds. Here is what auditors actually looked for - and the three findings that almost failed us.

15 min
#soc2 #compliance #audit #trading #security #fintech

Three weeks before the SOC 2 Type II audit window closed at Upside, our auditor sent a list. On it: 14 production deploys over the prior six months that had no corresponding change management ticket. Fourteen. Each one was technically unauthorized under our own declared controls. The engineers who made those deploys were not malicious - they were moving fast on a Saturday afternoon to patch a latency regression before Monday trading. But the audit does not care about intent. Under the Trust Services Criteria, the Security category (CC8.1, change management) requires that changes to production systems be authorized, documented, and approved, full stop.

We spent two weeks in a controlled panic generating retroactive evidence, interviewing engineers about what they actually did, cross-referencing deploy timestamps against Slack messages and Jira comments. We passed - barely - with a finding noted in the report. That finding cost us a six-week remediation cycle and a follow-up audit. It was entirely preventable.

If you are about to start a SOC 2 Type II engagement for a trading platform, this is the guide I wish I had before that audit window opened.

What SOC 2 Actually Is

SOC 2 (System and Organization Controls 2) is an attestation reporting framework from the AICPA. It evaluates whether a service organization’s controls meet the Trust Services Criteria (TSC). The five criteria categories are:

  • Security (CC) - The system is protected against unauthorized access (logical and physical)
  • Availability (A) - The system is available for operation and use as committed
  • Confidentiality (C) - Information designated as confidential is protected
  • Processing Integrity (PI) - System processing is complete, valid, accurate, timely, and authorized
  • Privacy (P) - Personal information is collected, used, retained, disclosed, and disposed of in accordance with commitments

Most trading platforms only include Security and Availability in their audit scope. This is a mistake. If you handle client data (you do), Confidentiality and Privacy should be in scope. If you execute client orders (you do), Processing Integrity should be in scope. The reason firms exclude them is that they require more engineering work to evidence. The reason you should include them is that institutional clients increasingly ask for them, and the Processing Integrity criteria map almost perfectly to the correctness guarantees trading infrastructure must provide anyway.

Type I vs. Type II: The Difference That Matters

A Type I report evaluates whether your controls are suitably designed at a single point in time. The auditor looks at your control descriptions, your policies, your architecture, and concludes: yes, these controls look like they would work. It is a snapshot audit of intent.

A Type II report evaluates whether your controls operated effectively over a period - typically six to twelve months. The auditor samples evidence from across that window and verifies that each control was actually followed in every sampled instance that required it. A single sampled exception can become a finding. Type II is the report that institutional counterparties and regulators actually care about. A Type I report tells them you had a policy. A Type II report tells them you followed it.

This distinction has a critical implication for engineering teams: controls are measured from the first day of the audit window, not from the day you finish preparing. If you engage an auditor in January and your audit window is January through June, then every production deploy from January onward needs a change ticket. You cannot retroactively create six months of clean evidence. The controls must already be running when the window opens.

The Trust Services Criteria: What Auditors Pull

Security (CC Controls)

The Security criteria are organized into the Common Criteria (CC) series. The evidence auditors actually pull:

CC6.1 - Logical access controls

  • A population list of all accounts with production access, pulled from your IAM system (AWS IAM, Okta, AD)
  • A sample of N accounts from that population, with evidence that each was provisioned through an approved access request
  • Terminated employee records cross-referenced against the access list - auditors look for accounts that stayed active after termination
  • MFA enforcement evidence: auditors want to see that MFA is enforced at the IdP level, not just “encouraged”
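
The cross-reference auditors run against the access population can be approximated before they arrive. A minimal sketch, assuming you can export the production account list and the HR termination list; the `Account` fields and the sample data are hypothetical, not the format of any particular IAM system:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Account:
    username: str
    owner_email: str
    mfa_enforced: bool


def find_access_exceptions(accounts, terminated):
    """Return (orphaned, missing_mfa) lists of usernames.

    `terminated` maps owner email -> termination date from the HR export.
    """
    orphaned = sorted(a.username for a in accounts if a.owner_email in terminated)
    missing_mfa = sorted(a.username for a in accounts if not a.mfa_enforced)
    return orphaned, missing_mfa


# Hypothetical exports for illustration only.
accounts = [
    Account("alice", "alice@example.com", True),
    Account("bob", "bob@example.com", True),
    Account("carol", "carol@example.com", False),
]
terminated = {"bob@example.com": date(2024, 3, 1)}

orphaned, missing_mfa = find_access_exceptions(accounts, terminated)
print("active after termination:", orphaned)   # ['bob']
print("MFA not enforced:", missing_mfa)        # ['carol']
```

Running something like this monthly, and filing a ticket for every non-empty result, means the auditor's sample finds exceptions you have already documented and closed.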

CC6.2 - Access provisioning and deprovisioning

  • New hire access requests with approver signatures
  • Offboarding tickets showing access revoked within your stated SLA (many firms commit to 24 hours - auditors will find the cases where you took four days)

CC6.3 - Role-based access and least privilege

  • Evidence that production access is limited to those who require it
  • For trading platforms specifically: the separation between people who can modify trading parameters and people who can push to production

CC7.1 - Vulnerability management

  • Quarterly vulnerability scans (authenticated network scans, not just external port scans)
  • Remediation evidence: for every critical/high finding, a ticket showing it was remediated within your stated SLA
  • Auditors compare the scan date to the ticket close date - gaps are findings
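
The scan-date-to-close-date comparison is simple enough to run yourself every month. A minimal sketch; the SLA values (7 days for critical, 30 for high) and the finding fields are assumptions for illustration, so substitute whatever your own policy states:

```python
from datetime import date, timedelta

# Hypothetical remediation SLAs in days; use the values from your own policy.
SLA_DAYS = {"critical": 7, "high": 30}


def sla_breaches(findings, as_of):
    """Return IDs of critical/high findings closed late or still open past SLA."""
    breaches = []
    for f in findings:
        limit = SLA_DAYS.get(f["severity"])
        if limit is None:
            continue  # severities without an SLA entry are outside this check
        due = f["scan_date"] + timedelta(days=limit)
        # An open finding is measured against the report date instead.
        effective_close = f["closed_date"] or as_of
        if effective_close > due:
            breaches.append(f["id"])
    return breaches


findings = [
    {"id": "V-1", "severity": "critical", "scan_date": date(2024, 1, 10), "closed_date": date(2024, 1, 15)},
    {"id": "V-2", "severity": "high", "scan_date": date(2024, 1, 10), "closed_date": date(2024, 3, 1)},
    {"id": "V-3", "severity": "critical", "scan_date": date(2024, 2, 1), "closed_date": None},
    {"id": "V-4", "severity": "medium", "scan_date": date(2024, 2, 1), "closed_date": None},
]
print(sla_breaches(findings, as_of=date(2024, 2, 20)))  # ['V-2', 'V-3']
```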

CC8.1 - Change management

  • Every production change linked to an approved change ticket
  • This is where trading firms fail most often

Availability (A Controls)

A1.1 - Performance monitoring

  • Your monitoring tooling (Datadog, Grafana, Prometheus) configured with uptime SLOs
  • Evidence of alerting firing when thresholds were breached
  • Incident records showing the alert was acted upon

A1.2 - Disaster recovery and backup

  • Backup configuration and execution logs
  • Evidence of periodic restore testing (not just backup creation - auditors want to see that you verified the restore works)
  • RTO/RPO commitments documented and tested

Processing Integrity (PI Controls)

This is where trading platforms have the most to prove and the most naturally available evidence.

PI1.1 - Completeness of processing

  • Trade reconciliation records: evidence that every order placed was either filled, partially filled, or rejected, with no silent drops
  • For algorithmic trading: logs showing strategy execution matched the intended signal
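
The "no silent drops" check reduces to a set difference between what was submitted and what reached an accounted-for outcome. A minimal sketch, assuming order IDs are available from your order management system and execution reports carry a `status` field; both the field names and the status strings are hypothetical:

```python
# Outcomes that count as "accounted for", per the list above.
ACCOUNTED_STATES = {"filled", "partially_filled", "rejected"}


def silent_drops(submitted_order_ids, execution_reports):
    """Return submitted order IDs that never reached an accounted-for outcome."""
    resolved = {
        r["order_id"] for r in execution_reports if r["status"] in ACCOUNTED_STATES
    }
    return sorted(set(submitted_order_ids) - resolved)


submitted = ["o1", "o2", "o3", "o4"]
reports = [
    {"order_id": "o1", "status": "filled"},
    {"order_id": "o2", "status": "rejected"},
    {"order_id": "o3", "status": "acknowledged"},  # acknowledged but never resolved
]
print(silent_drops(submitted, reports))  # ['o3', 'o4']
```

A daily run with an empty result, stored alongside the input counts, is the kind of evidence that answers the completeness question directly.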

PI1.2 - Processing accuracy

  • Order validation logs showing that every order was validated against parameters before submission
  • Exception logs for rejected orders with reasons
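
Validation evidence is easiest to produce when the validator returns explicit reasons and every decision, accepted or rejected, is written to the log. A minimal sketch; the `Limits` fields, reason codes, and order shape are assumptions for illustration, not a specific venue's rules:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    max_quantity: float
    max_notional: float
    allowed_symbols: frozenset


def validate_order(order, limits):
    """Return a list of rejection reasons; an empty list means the order passes."""
    reasons = []
    if order["symbol"] not in limits.allowed_symbols:
        reasons.append("symbol_not_allowed")
    if order["quantity"] <= 0:
        reasons.append("non_positive_quantity")
    elif order["quantity"] > limits.max_quantity:
        reasons.append("quantity_exceeds_limit")
    if order["quantity"] * order["price"] > limits.max_notional:
        reasons.append("notional_exceeds_limit")
    return reasons


def audit_record(order, reasons):
    """Build the log entry kept as evidence for accepted and rejected orders alike."""
    return {
        "order_id": order["id"],
        "decision": "rejected" if reasons else "accepted",
        "reasons": reasons,
    }


limits = Limits(max_quantity=100, max_notional=50_000, allowed_symbols=frozenset({"BTC-USD"}))
bad = {"id": "o7", "symbol": "DOGE-USD", "quantity": 200, "price": 1.0}
print(audit_record(bad, validate_order(bad, limits)))
```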

The Three Most Common Findings for Trading Platforms

Finding 1: Production Access Without a Change Ticket

This is the finding that almost failed us at Upside. The pattern is consistent across trading firms: engineers have SSH or console access to production for operational reasons, and they use it. Sometimes to fix a problem. Sometimes to run a one-off query. Sometimes to tweak a config parameter under time pressure.

The audit control requires that every change to a production system - code, configuration, infrastructure - follows an approved change process. Direct SSH access that bypasses this process is a control failure, regardless of the justification.

The fix is architectural, not cultural. You cannot rely on engineers to file tickets under pressure. The controls that work:

  • Immutable infrastructure: production systems are deployed from artifacts, SSH access is disabled by default, changes require a new deployment
  • Break-glass access: emergency SSH access is available but every session is logged, alerted, and requires a ticket to be filed within two hours or access is auto-revoked
  • Deployment pipeline as the only path to production: if the only way to change production is through your CI/CD pipeline, and every pipeline run requires a ticket, the audit trail generates itself
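
To make "every pipeline run requires a ticket" concrete, here is a minimal sketch of a gate that a required pipeline step could run before deploying. The ticket key format (`CHG-1234`) and the idea that approved ticket IDs are fetched into a set beforehand are assumptions; in practice the set would come from your tracker's API:

```python
import re

# Matches keys such as CHG-1234; adjust to your tracker's key format.
TICKET_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def check_deploy(commit_message, approved_tickets):
    """Return (allowed, reason) for a proposed production deploy."""
    referenced = set(TICKET_PATTERN.findall(commit_message))
    if not referenced:
        return False, "no change ticket referenced"
    approved = referenced & approved_tickets
    if not approved:
        return False, "referenced tickets not approved: " + ", ".join(sorted(referenced))
    return True, "approved via " + ", ".join(sorted(approved))


approved_tickets = {"CHG-101"}  # hypothetical: fetched from the tracker
print(check_deploy("CHG-101: patch latency regression", approved_tickets))
print(check_deploy("quick fix before Monday open", approved_tickets))
```

The pipeline step would exit non-zero when `allowed` is false and write the reason to the run log, so the log itself becomes the change management evidence.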

Finding 2: Shared Service Account Credentials

Trading platforms almost universally have service accounts - credentials used by applications to authenticate to other services. The finding pattern: a single API key or password that is used by multiple services, rotated infrequently, and whose access is broader than any individual service requires.

Auditors care about this because shared credentials mean you cannot attribute an action to a specific service (or a specific compromised service). When your NATS broker logs show an event from service_account_prod, you cannot tell whether that was the order management system or the monitoring agent.

The fix: one credential per service, with access scoped to what that service actually needs, with rotation on a documented schedule. For trading systems specifically:

  • Exchange API keys: one key per strategy, rotated quarterly or on personnel change
  • Database credentials: each service has its own database user with access only to the tables it needs
  • Internal service auth: use short-lived tokens (JWTs with 1-hour expiry) issued by an identity service, not long-lived shared secrets
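
The short-lived token idea can be illustrated with the standard library alone. This sketch is a simplified HMAC-signed token, not a standards-compliant JWT, and the claim names and signing key handling are assumptions; a production identity service should use a maintained JWT library and a managed key store:

```python
import base64
import hashlib
import hmac
import json
import time

TOKEN_TTL_SECONDS = 3600  # one-hour expiry, as suggested above


def issue_token(service, scopes, key, now=None):
    """Issue a signed, expiring token that names exactly one service."""
    now = int(time.time()) if now is None else now
    claims = {"sub": service, "scopes": sorted(scopes), "exp": now + TOKEN_TTL_SECONDS}
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    sig = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + sig


def verify_token(token, key, now=None):
    """Return the claims if the signature is valid and unexpired, else None."""
    now = int(time.time()) if now is None else now
    try:
        payload, sig = token.rsplit(".", 1)
    except ValueError:
        return None  # no signature separator present
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return claims if claims["exp"] > now else None


key = b"demo-signing-key"  # hypothetical; never hard-code a real key
token = issue_token("order-management", {"orders:write"}, key, now=1_000)
print(verify_token(token, key, now=1_000 + 60))
print(verify_token(token, key, now=1_000 + TOKEN_TTL_SECONDS))  # None (expired)
```

Because the `sub` claim names a single service, a broker log entry can be attributed to the order management system rather than to a shared account.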

Finding 3: No Documented Incident Response Procedure That Was Actually Followed

Every trading firm has an incident response policy document. Most of them were written for the SOC 2 engagement and have never been consulted during an actual incident. Auditors know this, and they check.

The evidence pull for this control: when an incident occurred during the audit window, provide the incident timeline, the communications log, the root cause analysis document, and the post-incident ticket for remediation. Auditors then check that this sequence matches your documented procedure.

The failure mode: the incident response policy says “the on-call engineer will create an incident ticket within 30 minutes of detection and notify the security team within one hour.” During the audit window, an incident occurred. The on-call engineer fixed it in 15 minutes via Slack and never created a ticket. The security team was not notified. The policy was not followed.

The fix: incident response must be a workflow, not a document. Your on-call tooling (PagerDuty, Incident.io) should enforce the creation of an incident record on alert fire. The security team notification should be automated. Post-incident RCAs should be templated so engineers fill them out rather than writing from scratch.
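
You can run the auditor's comparison yourself by checking each incident record against the deadlines in your policy. A minimal sketch using the 30-minute and one-hour deadlines from the example policy quoted above; the record field names are hypothetical:

```python
from datetime import datetime, timedelta

# Deadlines taken from the example policy above.
TICKET_SLA = timedelta(minutes=30)
SECURITY_NOTIFY_SLA = timedelta(hours=1)


def policy_deviations(incident):
    """List the ways one incident record departs from the documented procedure."""
    detected = incident["detected_at"]
    problems = []
    for field, sla, label in (
        ("ticket_created_at", TICKET_SLA, "incident ticket"),
        ("security_notified_at", SECURITY_NOTIFY_SLA, "security notification"),
    ):
        happened = incident.get(field)
        if happened is None:
            problems.append(f"{label} missing")
        elif happened - detected > sla:
            problems.append(f"{label} late")
    if not incident.get("rca_link"):
        problems.append("RCA missing")
    return problems


incident = {
    "detected_at": datetime(2024, 5, 4, 2, 0),
    "ticket_created_at": datetime(2024, 5, 4, 2, 10),
    "security_notified_at": datetime(2024, 5, 4, 3, 30),
    "rca_link": "RCA-17",
}
print(policy_deviations(incident))  # ['security notification late']
```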

What Auditors Do Not Care About

This is as important as what they do care about.

They do not care what language your code is in. Rust vs. Python vs. Go is irrelevant to the audit. What matters is whether the code change went through an approved change process.

They do not care which specific tools you use. You can use GitHub Actions or Jenkins for CI/CD. AWS or GCP for cloud. Okta or Azure AD for identity. The specific tool is irrelevant - what matters is that the tool generates evidence of the controls operating.

They do not care about your architecture diagrams. Diagrams are supporting documentation. Auditors care about whether the controls described in the diagrams are evidenced in the logs.

They do not care about aspirational controls. “We plan to implement quarterly access reviews” is not a control. Either the review happened with evidence, or it did not.

The Evidence Checklist Engineering Must Maintain

This is what you need to produce for every month of the audit window. Build the automation to generate this before the window opens.

Access and identity

  • Monthly: full list of accounts with production access, with role and last active date
  • Per event: access provisioning tickets for new employees/contractors
  • Per event: access revocation tickets, with timestamp of ticket vs. timestamp of access removal
  • Quarterly: access review sign-off - a named reviewer certified that each account in the list is appropriate

Change management

  • Per deploy: CI/CD pipeline run log with commit SHA, deployer identity, timestamp
  • Per deploy: linked change ticket in your project tracker showing approval
  • Quarterly: change management process review - verify no deploys occurred outside the pipeline
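
The quarterly "no deploys outside the pipeline" review is a join between what actually ran in production and what the pipeline recorded. A minimal sketch, assuming you can export observed deploys (for example from orchestrator or cloud audit events) and pipeline runs with their linked tickets; all field names and sample values are hypothetical:

```python
def unmatched_deploys(observed_deploys, pipeline_runs):
    """Return observed production deploys with no ticketed pipeline run for that commit."""
    ticketed_shas = {r["commit_sha"] for r in pipeline_runs if r.get("ticket")}
    return [d for d in observed_deploys if d["commit_sha"] not in ticketed_shas]


pipeline_runs = [
    {"commit_sha": "a1b2c3", "deployer": "alice", "ticket": "CHG-101"},
    {"commit_sha": "d4e5f6", "deployer": "bob", "ticket": None},  # ran without a ticket
]
observed_deploys = [
    {"commit_sha": "a1b2c3", "deployed_at": "2024-03-02T14:05:00Z"},
    {"commit_sha": "d4e5f6", "deployed_at": "2024-03-09T16:40:00Z"},
    {"commit_sha": "0f9e8d", "deployed_at": "2024-03-16T15:12:00Z"},  # never went through the pipeline
]
for d in unmatched_deploys(observed_deploys, pipeline_runs):
    print("needs explanation:", d["commit_sha"], d["deployed_at"])
```

Each row in the output is a deploy you will be asked about; finding it at the end of the quarter is far cheaper than finding it three weeks before the window closes.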

Vulnerability management

  • Quarterly: authenticated vulnerability scan results
  • Per finding: remediation ticket with severity, discovery date, and close date
  • Monthly: summary report showing no critical/high findings open beyond SLA

Incident response

  • Per incident: incident ticket created within the SLA after alert
  • Per incident: timeline of response actions
  • Per incident: post-incident RCA document
  • Per incident: remediation ticket for identified root cause

Availability

  • Monthly: uptime report against SLO commitments
  • Per event: backup execution logs
  • Semi-annual: restore test results, with timestamp and verified data integrity
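
The arithmetic behind the monthly uptime report is worth pinning down, because the downtime budget is smaller than it sounds. A minimal sketch, assuming a 99.9% SLO purely as an example; use the figure you have actually committed to:

```python
def availability_pct(downtime_minutes, days_in_month):
    """Measured availability for the month, as a percentage."""
    total_minutes = days_in_month * 24 * 60
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def allowed_downtime_minutes(slo_pct, days_in_month):
    """Downtime budget implied by an SLO for a month of the given length."""
    return days_in_month * 24 * 60 * (1 - slo_pct / 100.0)


slo = 99.9          # example commitment, not a recommendation
downtime = 50       # minutes of recorded downtime in a 30-day month
measured = availability_pct(downtime, 30)
budget = allowed_downtime_minutes(slo, 30)
print(f"measured {measured:.3f}% against SLO {slo}%, budget {budget:.1f} min")
print("SLO met" if measured >= slo else "SLO missed")
```

For a 30-day month, 99.9% leaves about 43.2 minutes of downtime, so a single 50-minute incident misses the commitment and should appear in the report with its incident record attached.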

Training

  • Annual: training completion records for all personnel with production access
  • The records must include the specific training completed, the date, and a sign-off

The Timeline: Type II Takes Six Months Minimum

If you want a Type II report for a twelve-month period, you need your controls operating for twelve months before you can get the report. If you are starting from zero controls today, the earliest you can have a twelve-month Type II report is thirteen months from now (twelve months of evidence plus audit fieldwork).

The practical path for most trading platforms:

  1. Month 0-1: Implement controls, run a readiness assessment (this can be done with an internal audit or a pre-assessment engagement with your audit firm)
  2. Month 1-3: Controls run under observation, gaps identified and remediated
  3. Month 3-9: Audit window (six-month minimum for Type II)
  4. Month 9-11: Auditor fieldwork, evidence collection, draft report
  5. Month 11-12: Management responses to findings, final report issued

A six-month window for a first Type II audit is achievable if the controls are clean. Twelve months is more defensible and required if you want to cover a full calendar year.

Choosing an Audit Firm

For trading platforms, not every audit firm has the relevant experience. The questions to ask in the selection process:

Financial services experience. Has the auditor conducted SOC 2 audits for other trading firms, asset managers, or fintech companies? Auditors without financial services experience will spend the first half of the engagement learning your domain rather than understanding your controls.

Crypto-specific competency. If your platform handles digital assets, does the auditor understand the custody model, the key management architecture, and the regulatory treatment of crypto assets? An auditor who conflates smart contract security with SOC 2 controls is not the right choice.

System description quality. The SOC 2 report includes a description of your system, written with auditor input. Ask to see sample system descriptions from similar clients. A well-written system description is clear, technically accurate, and does not obscure complexity with vague language.

Readiness assessment. Most audit firms offer a readiness assessment before the formal audit window. This is a pre-audit that identifies control gaps while there is still time to remediate them. Schedule the readiness assessment about three months before the audit window opens.

In my experience, the firms with the best combination of financial services experience and technical competency have been Deloitte (for large regulated entities), Armanino (for crypto-native firms), and KPMG Spark (for mid-market). Practice areas change, so confirm that any firm you shortlist currently offers SOC 2 attestation for your segment. A SOC 2 report must be issued by a licensed CPA firm, and boutique CPA firms with fintech specialization can be faster and more cost-effective for smaller engagements.

Scoping the Audit: Inclusion Decisions

The scope of your SOC 2 audit determines which systems, controls, and evidence are evaluated. Scoping too broadly increases cost and complexity. Scoping too narrowly produces a report that customers will not rely on.

The principle for trading platforms: include every system that handles customer data or customer assets in the audit scope. If a system touches customer order data, execution records, or custody balances, it belongs in scope.

Common scoping mistakes:

Excluding the trading engine. Some firms try to scope the audit to their customer portal and exclude the core trading engine. Customers are placing orders through the portal that the trading engine executes. The trading engine’s change management, access controls, and logging are directly relevant to the controls customers care about.

Excluding third-party cloud providers. Your AWS or GCP environment is in scope. The cloud provider’s own infrastructure is not: the provider has its own attestations and certifications (SOC 2 Type II, ISO 27001), and your auditor will typically treat the provider as a subservice organization and rely on those reports. You remain responsible for the controls you operate within the cloud environment.

Excluding the engineering team’s toolchain. Your CI/CD pipeline, your source code repository, and your deployment automation are in scope. A compromise of your build pipeline can undermine every other control in the system.

How This Breaks in Production

The failure mode I see most often is not lack of controls - it is controls that exist in policy documents but are not enforced architecturally. The change management policy exists. Engineers are just also allowed to SSH directly into production, and when they are under pressure at 2am, they do.

The way to think about this: every control in your SOC 2 scope needs a technical enforcement mechanism, not just a policy statement. If the control is “all production changes require an approved ticket,” then the technical enforcement is “it is architecturally impossible to deploy to production without going through a pipeline that requires a linked ticket.” If you rely on human compliance under pressure, you will have findings.

At Upside, the remediation after our change management finding was to disable direct SSH access to production for all engineers except a named group of three with explicit break-glass procedures. Every other change had to go through the pipeline. The audit finding went away. The operational overhead was real but manageable.

The second failure mode specific to trading platforms: neglecting the Processing Integrity and Confidentiality criteria because the scope only includes Security and Availability. A customer’s institutional compliance team will read your report and note that you have not attested to the accuracy and completeness of trade processing, or to the protection of their confidential order data. They will ask why. The answer “we did not include those criteria in our audit scope” is a harder conversation than the upfront cost of including them.

The auditors are not trying to catch you doing something wrong. They are trying to verify that your controls work when no one is watching. Design your controls for 2am on a Saturday when the on-call engineer is stressed and moving fast. If the control holds then, it will hold during the audit.
