Compare

Why not oasdiff / oasdiff Pro / Spectral / Speakeasy contract testing?

These are excellent tools. StateAnchor uses oasdiff as an internal dependency for its diff primitives; it is not a competitor. The question is not “does it detect changes?” but “what happens after detection?”

Detection accuracy: 34-scenario ground-truth corpus

spec-diff: 100% on 34 scenarios · api-smart-diff: 65% on 20-scenario comparable subset

Full corpus (34 scenarios): 28 breaking + 6 non-breaking. spec-diff: 28/28 detected, 0/6 false positives. api-smart-diff evaluated on the 20 scenarios with prev/curr OpenAPI fixture pairs: 13/20 detected (65%).

The benchmark compares StateAnchor’s spec-diff engine against api-smart-diff — the library underlying oasdiff — on a shared set of OpenAPI 3.x fixture pairs. The 20 scenarios with OpenAPI pairs cover the most common categories of breaking change: field removals, auth changes, constraint tightening, parameter changes, response shape changes, and oneOf/anyOf modifications.

ID	Scenario	spec-diff	api-smart-diff
BC-01	Required request field removed	ok	x
BC-02	Optional request field removed	ok	x
BC-03	Request field type changed string→integer	ok	ok
BC-04	Auth scheme changed null→bearer	ok	x
BC-05	Auth token scheme changed bearer→apikey	ok	ok
BC-06	Enum value removed from request field	ok	ok
BC-07	New required query param added	ok	x
BC-08	Optional query param made required	ok	ok
BC-09	Query param removed entirely	ok	ok
BC-10	minLength added to request field (was unconstrained)	ok	ok
BC-11	maximum decreased on request field (100→10)	ok	ok
BC-12	pattern added to request string field	ok	ok
BC-13	Endpoint removed entirely	ok	ok
BC-14	Required response field removed	ok	ok
BC-15	Response field type changed integer→string	ok	x
BC-16	Response schema type changed object→array	ok	ok
BC-17	200 status code removed (204 remains)	ok	ok
BC-18	Error response shape changed (property added to error object)	ok	x
BC-19	oneOf variant removed from request field	ok	ok
BC-20	Nested object field removed (request_body.address.zip)	ok	x

api-smart-diff result on BC-01 through BC-20: 13/20 (65%). All 20 are breaking scenarios. spec-diff result: 20/20. Full corpus (28 breaking + 6 non-breaking): spec-diff 28/28 detected, 0 false positives.
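
The percentages can be re-derived from the table. The sketch below transcribes the seven rows marked x for api-smart-diff and recomputes both rates; the function and variable names are illustrative, not part of any StateAnchor tooling.

```python
# Scenario IDs BC-01 .. BC-20, as listed in the table above.
SCENARIOS = [f"BC-{i:02d}" for i in range(1, 21)]

# Rows marked "x" in the api-smart-diff column (transcribed from the table).
API_SMART_DIFF_MISSES = {"BC-01", "BC-02", "BC-04", "BC-07", "BC-15", "BC-18", "BC-20"}


def detection_rate(scenarios, misses):
    """Return (detected, total, percent) for one tool."""
    detected = [s for s in scenarios if s not in misses]
    return len(detected), len(scenarios), 100 * len(detected) / len(scenarios)


smart = detection_rate(SCENARIOS, API_SMART_DIFF_MISSES)
spec = detection_rate(SCENARIOS, set())  # spec-diff has no "x" rows

print(smart)  # (13, 20, 65.0)
print(spec)   # (20, 20, 100.0)
```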

Why the 35-point gap exists

api-smart-diff asks “does the shape change?” — spec-diff asks “does this change break consumer contracts?”

api-smart-diff operates on raw OpenAPI 3.x specs using structural diff semantics. It classifies changes by action (add/remove/replace) at the JSON path level but applies conservative breaking-change rules designed for general OpenAPI tooling. The missed scenarios fall into three categories where structural rules fall short:

Auth scheme transitions (BC-04): Changing null to bearer adds a security requirement where none existed. Structurally additive; contractually breaking for every unauthenticated caller.
Response type and shape changes (BC-15, BC-18): api-smart-diff applies conservative rules for response bodies, so adding a property may not be flagged as breaking. spec-diff applies response-context rules: response field type changes and error response shape changes break consumers reading that data.
Request field removals and required param additions (BC-01, BC-02, BC-07, BC-20): api-smart-diff may classify request-body field removals as non-breaking because clients can omit optional fields. spec-diff applies StateAnchor-specific semantic rules: any required-field removal is ERR regardless of context, and required-param additions break every existing caller that doesn’t send them.
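
The three categories above can be read as context-aware rules: the same structural action (add, remove, replace) gets a different answer depending on where it happens. The sketch below is one way to express that idea. The Change record, the location names, and the rules themselves are assumptions made for illustration; they are not spec-diff's actual rule set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """Hypothetical change record, not a real spec-diff type."""
    action: str       # "add", "remove", or "replace"
    location: str     # "security", "request", "param", "response", "error_response"
    detail: str = ""  # e.g. "required", "optional", "type"


def is_breaking(change: Change) -> bool:
    # A security requirement appears where none existed (as in BC-04).
    if change.location == "security" and change.action == "add":
        return True
    # A request field is removed (as in BC-01, BC-02, BC-20).
    if change.location == "request" and change.action == "remove":
        return True
    # A new required parameter (as in BC-07).
    if change.location == "param" and change.action == "add" and change.detail == "required":
        return True
    # A response field changes type (as in BC-15).
    if change.location == "response" and change.action == "replace" and change.detail == "type":
        return True
    # Any change to the error response shape (as in BC-18).
    if change.location == "error_response":
        return True
    return False


missed_by_structural_rules = {
    "BC-01": Change("remove", "request", "required"),
    "BC-04": Change("add", "security"),
    "BC-07": Change("add", "param", "required"),
    "BC-15": Change("replace", "response", "type"),
    "BC-18": Change("add", "error_response"),
}

for scenario, change in missed_by_structural_rules.items():
    print(scenario, is_breaking(change))
```

Under these assumed rules, adding an optional parameter, Change("add", "param", "optional"), is not flagged, which is the kind of distinction a purely structural add/remove classification cannot make.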

Why detection quality matters more than rule count

Rule-based tools count violations. A change either matches a rule or it doesn’t. This works for structural changes but fails for semantic ones. A field renamed from amount to amount_usd is technically additive (old field removed, new field added), but every caller using the old name breaks. A type change from string to string-formatted-date is structurally identical but behaviorally breaking. Rules do not know which SDK versions are pinned to which behavior.
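
To make the rename case concrete, here is a minimal sketch that pairs removed and added field names by string similarity. This is a plain heuristic built on Python's difflib, used only to illustrate why a remove-plus-add pair deserves a second look; it is not how StateAnchor evaluates changes, and the function name and threshold are assumptions.

```python
from difflib import SequenceMatcher


def likely_renames(old_fields, new_fields, threshold=0.7):
    """Pair each removed field with any added field whose name is similar.

    The 0.7 threshold is an arbitrary illustrative value.
    """
    removed = sorted(set(old_fields) - set(new_fields))
    added = sorted(set(new_fields) - set(old_fields))
    pairs = []
    for old in removed:
        for new in added:
            if SequenceMatcher(None, old, new).ratio() >= threshold:
                pairs.append((old, new))
    return pairs


print(likely_renames({"id", "amount"}, {"id", "amount_usd"}))
# [('amount', 'amount_usd')]
```

A name-similarity check can suggest that a rename happened, but it cannot say whether the unit or meaning changed with it.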

StateAnchor adds a semantic layer. After structural diffing classifies a change, an LLM ensemble evaluates whether the change is actually breaking in context — considering the field’s semantic role, existing exceptions for similar patterns, and the behavioral contract implied by the spec. The LLM layer catches what rules miss.

The exception ledger creates a feedback loop that compounds over time. Every exception a developer creates — “we know this breaks callers, we’re shipping it anyway for reason X” — is a human label. These accumulate into a corpus of real production breaking change decisions. The detection system can improve from this corpus. This is structurally impossible for any static rule-based tool.

Capability	oasdiff	oasdiff Pro	Spectral	Speakeasy CT	StateAnchor
Detects breaking changes	✓	✓	✓ (rules-based)	∼	✓
Detection approach	Structural rules (OSS CLI)	Structural rules (450+)	Linting rules only	SDK generation checks	Structural (33 kinds) + LLM semantic evaluation + production exception corpus
Improves from production experience	— static rules	— static rules	— static rules	— static rules	✓ exception corpus accumulates labeled production outcomes
Categorical gate (ERR / WARN / INFO)	—	—	—	—	✓
LLM-evaluated verdict	—	—	—	—	✓
PR gating (reports to GitHub PRs)	—	✓	—	✓	✓
Exception ledger	—	—	—	—	✓
SOC 2 compliance export	—	—	—	—	✓
Multi-syndrome baseline comparison	—	—	—	—	✓
SDK-generation-centric	—	—	—	✓	—
Spec-governance-centric	∼	∼	∼	—	✓
Drift pressure tracking	—	—	—	—	✓
Artifact generation (SDK, MCP)	—	—	—	✓	✓
Tamper-evident audit log	—	—	—	—	✓
Git-native (no hosted service required)	✓	∼	✓	∼	✓

∼ = partial / SDK-scoped. CT = Speakeasy contract testing (GitHub Actions, shipped Jan-Feb 2026). Check each tool’s site for latest capabilities.

oasdiff

Great CLI for point-in-time spec diff. You point it at two OpenAPI files and get a structured change list plus a breaking-or-not flag per rule. Runs locally, runs in CI, runs anywhere you can run Go.

What it does not do: persist anything, produce a verdict you can route on, compare against more than one baseline, or coordinate with downstream artifacts. Use it in scripts. Use StateAnchor for enforcement.

oasdiff Pro

Launched April 2026. oasdiff Pro is the hosted GitHub App layer on top of the open-source oasdiff CLI. It posts breaking change reports directly to GitHub PRs with one-click approve/reject. It ships 450+ rule-based checks, the most comprehensive rule library in the OSS ecosystem.

Where oasdiff Pro stops: it is rule-based. Rules match patterns; they don’t evaluate semantic intent. A field renamed from amount to amount_usd fires a WARN-lane rule, but whether it is actually safe to ship depends on what consumers assumed about the unit. StateAnchor’s LLM evaluation reads the endpoint context and assesses the real consumer impact. oasdiff Pro also offers no exception ledger with SOC 2-grade dual-control: one-click approve is a single actor bypassing the gate unilaterally.
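
The difference between one-click approval and dual-control can be sketched in a few lines. The record below is hypothetical: the field names and the validity check are assumptions chosen to illustrate the idea, not StateAnchor's ledger schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class GateException:
    """Hypothetical exception-ledger entry."""
    change_id: str
    reason: str
    requested_by: str
    approved_by: str
    expires: date


def is_active(exc: GateException, today: date) -> bool:
    # Dual-control: the approver must be a different person than the requester.
    dual_control = exc.requested_by != exc.approved_by
    # Exceptions lapse after their expiry date.
    not_expired = today <= exc.expires
    return dual_control and not_expired


today = date(2026, 5, 1)
self_approved = GateException("BC-07", "launch deadline", "alice", "alice", date(2026, 6, 1))
peer_approved = GateException("BC-07", "launch deadline", "alice", "bob", date(2026, 6, 1))
stale = GateException("BC-07", "launch deadline", "alice", "bob", date(2026, 4, 1))

print(is_active(self_approved, today))  # False
print(is_active(peer_approved, today))  # True
print(is_active(stale, today))          # False
```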

oasdiff tells you what changed. StateAnchor tells you whether it’s safe to ship.

Spectral

Excellent linter for OpenAPI style rules. Run it on every spec to catch things like missing descriptions, non-canonical enums, inconsistent tag casing, or any organization-specific convention you encode as a rule. It is the single most-used piece of API governance tooling in the wild.

Spectral and StateAnchor are complementary. Lint first (Spectral), gate after (StateAnchor). StateAnchor is not a linter and does not try to be — it enforces contract compatibility across baselines, not style.

Speakeasy contract testing

Speakeasy shipped SDK contract testing via GitHub Actions in January–February 2026 — their first move into enforcement territory. Contract testing verifies that your OpenAPI schema still generates a valid, compilable SDK on every PR.

Speakeasy contract testing is SDK-generation-centric: it answers “Does my schema still produce a valid SDK?” StateAnchor is spec-governance-centric: it answers “Is this change safe to ship to all consumers?” These are different questions. A schema can pass contract testing while still containing consumer-breaking changes (a field removed, a type tightened, a required parameter added to an existing endpoint). StateAnchor catches those.
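
As an illustration of the last case, both operation fragments below are valid OpenAPI and would each describe a generatable client, yet the second requires a parameter that existing callers do not send. The sketch uses the standard OpenAPI Parameter Object fields (name, in, required); the helper names and the sample parameters are made up for this example.

```python
def required_params(operation):
    """Set of (name, location) for parameters marked required."""
    return {
        (p["name"], p["in"])
        for p in operation.get("parameters", [])
        if p.get("required", False)
    }


def newly_required_params(prev_op, curr_op):
    """Parameters that are required now but were absent or optional before."""
    return sorted(required_params(curr_op) - required_params(prev_op))


prev_op = {
    "parameters": [
        {"name": "limit", "in": "query", "required": False, "schema": {"type": "integer"}},
    ]
}
curr_op = {
    "parameters": [
        {"name": "limit", "in": "query", "required": False, "schema": {"type": "integer"}},
        {"name": "tenant", "in": "query", "required": True, "schema": {"type": "string"}},
    ]
}

print(newly_required_params(prev_op, curr_op))  # [('tenant', 'query')]
```

The same check also flags an optional parameter that becomes required, since it compares the required sets rather than the parameter lists.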

The two tools are complementary, not competing. Speakeasy contract testing for SDK generation quality. StateAnchor for consumer safety governance.

openapi-diff

Similar shape to oasdiff: a Java-native diff tool with breaking-change detection. Mature, widely used in enterprise Java shops. Same structural limit: one baseline, no gate, no downstream.

The core gap: detection without enforcement

All the open-source tools above answer “what changed between these two spec files?” That is a necessary input to governance. It is not, on its own, governance.

Knowing a change is breaking does not stop it from shipping. Detecting it on your local machine does not prevent it from landing on someone else’s branch. Surfacing it in a CI log does not record an audit trail that survives the next force-push.

StateAnchor is the layer that turns detection into enforcement:

Runs four independent syndromes (parent, merge-base, LKG, deployed) on every sync — see Gate engine.
Classifies every change into ERR / WARN / INFO so the gate produces a decision, not a diff — see Gate classification.
Adds LLM evaluation on top of rules to catch semantic breaking changes that pattern matching misses.
Records a scoped exception ledger for intentional breaking changes, with expiry and approver metadata: dual-control, not one-click.
Anchors every verdict in a tamper-evident audit log so downstream teams can verify what shipped without access to your account.
Exports SOC 2 compliance evidence per gate decision for audit programs.
Mints a share link for every verdict so the evidence trail does not live inside a CI log line.
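
Two of the ideas in this list, a single verdict across several baselines and a tamper-evident log, can be sketched together. Everything here is an assumption for illustration: the page does not say how the four syndromes are combined, so the sketch takes the worst severity, and the log is a generic SHA-256 hash chain rather than StateAnchor's actual format.

```python
import hashlib
import json

SEVERITY = {"INFO": 0, "WARN": 1, "ERR": 2}
GENESIS = "0" * 64  # placeholder hash for the first entry


def gate_verdict(per_baseline):
    """Assumed combination rule: the worst severity across baselines wins."""
    return max(per_baseline.values(), key=SEVERITY.__getitem__)


def entry_hash(prev_hash, record):
    payload = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode()).hexdigest()


def append_entry(log, record):
    """Append a record whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev_hash, "hash": entry_hash(prev_hash, record)})


def verify(log):
    """Recompute the chain; any edited record breaks every later hash."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


results = {"parent": "INFO", "merge-base": "INFO", "LKG": "WARN", "deployed": "ERR"}
log = []
append_entry(log, {"sync": 1, "verdict": gate_verdict(results)})
append_entry(log, {"sync": 2, "verdict": "INFO"})

print(gate_verdict(results))  # ERR
print(verify(log))            # True
```

Because each hash covers the previous one, anyone holding the log can re-run verify without trusting the system that produced it.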

Where to use what

If you’re already running oasdiff (or openapi-diff, Spectral, Speakeasy contract testing) in CI: keep it. StateAnchor sits upstream of your merge. It catches what your diff tool would have caught, plus the delta against your deployed version and last known-good build, plus an LLM-evaluated verdict, plus an enforceable exception ledger, plus a SOC 2-ready audit trail. Spectral for lint. oasdiff for ad-hoc diffs. StateAnchor for the gate.
