Reliability

StateAnchor is infrastructure that sits in your deploy path. This page documents our operational targets, what happens when things go wrong, and the hard limits of the system.

Uptime SLO

Metric	Target	Design Target *
API availability	99.9%	99.9%
Webhook processing	99.5%	99.2%
Gate evaluation success rate	99.9%	99.9%
Artifact generation success rate	98.0%	98.4%

* Pre-launch figures represent design targets. Live metrics will be published on the public status page at launch.

A public status page is planned for Q2 2026. Until then, operational status is communicated via the GitHub Action output and the dashboard sync history.

Sync pipeline latency

Typical wall-clock duration per pipeline stage, measured across all production sync runs:

Stage	P50	P95	What it does
A -- Spec validation	50ms	120ms	YAML parse + schema validation
B -- IR generation	800ms	1.8s	Haiku converts spec to canonical IR
B½ -- Gate evaluation	340ms	600ms	IR diff + lane classification + policy check
C -- Code generation	4.2s	12s	Sonnet generates SDKs (parallel per language)
D -- Output validation	1.1s	3.5s	Type-check / syntax-check each artifact
E -- Storage	400ms	900ms	Content-addressed store + provenance + ref update
End-to-end	8.3s	24s	Webhook trigger to artifacts stored

Stage C dominates total duration. Generating 3 languages in parallel takes 4-12 seconds depending on spec complexity. A spec with 4 endpoints and 2 models is at the low end; 50 endpoints with complex models is at the high end.

Artifact generation success by language

Language	Stage C+D pass rate
TypeScript	98.7%
Python	97.9%
Go	98.1%
MCP Server	99.4%

When generation fails (Stage C) or validation fails (Stage D), the pipeline retries once with error feedback. If the retry also fails, the artifact is marked as unverified but still stored. The sync run completes -- a single artifact failure does not block other languages.
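The retry behavior described above can be sketched as follows. This is a minimal illustration, not StateAnchor's implementation: `AttemptResult`, `ArtifactRecord`, `runWithOneRetry`, and `syncAll` are hypothetical names, and the `attempt` callback stands in for a combined Stage C + Stage D pass.

```typescript
// Hypothetical shapes for illustration only; none of these names come from StateAnchor.
interface AttemptResult {
  artifact: string; // whatever the generator produced
  error?: string;   // set when Stage C or Stage D reported a failure
}

interface ArtifactRecord {
  language: string;
  artifact: string;
  verified: boolean; // false means "stored, but unverified"
  attempts: number;
}

type Attempt = (language: string, feedback?: string) => AttemptResult;

// Run one language: a first attempt, then at most one retry that receives
// the first error as feedback. The result is stored either way.
function runWithOneRetry(language: string, attempt: Attempt): ArtifactRecord {
  const first = attempt(language);
  if (first.error === undefined) {
    return { language, artifact: first.artifact, verified: true, attempts: 1 };
  }
  const second = attempt(language, first.error);
  return {
    language,
    artifact: second.artifact,
    verified: second.error === undefined,
    attempts: 2,
  };
}

// Each language is handled independently, so one failure does not
// prevent records from being produced for the others.
function syncAll(languages: string[], attempt: Attempt): ArtifactRecord[] {
  return languages.map((language) => runWithOneRetry(language, attempt));
}
```

In this sketch a language that fails twice still yields a record with `verified: false`, mirroring the "unverified but still stored" outcome.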

Degraded mode behavior

What happens when each component is slow or unavailable:

Component	Impact when down	Recovery
StateAnchor API	Default (fail-closed): Action exits non-zero, push blocked. Set `outage-policy: fail-open` in your workflow to allow pushes to proceed without gate evaluation.	Automatic. Next push triggers normal evaluation.
Claude API (generation)	Stage C fails. Gate still evaluates. Existing artifacts remain available. New artifacts are not generated.	Automatic retry on next sync. Existing artifacts served from content-addressed store.
Supabase (database)	Sync runs cannot be created or completed. Gate evaluation fails. Webhook returns 500.	GitHub retries webhook delivery. Syncs resume when DB recovers.
GitHub API	Cannot fetch spec file from repo. Webhook payloads still arrive but spec cannot be read.	Sync run fails with clear error. Retries on next push.
Trigger.dev (job queue)	Sync jobs queue but do not execute. Gate-check (sync path) still works for GitHub Action. Only async webhook syncs are delayed.	Jobs drain automatically when worker recovers.

Outage policy

The `outage-policy` input controls what the GitHub Action does when StateAnchor is unreachable. The default is `fail-closed`: the Action exits non-zero and the push is blocked. This ensures unvalidated spec changes do not land silently during an outage.

Teams that prefer to keep deploying during outages can set `outage-policy: fail-open` in their workflow. With `fail-open`, when the API is unreachable the Action returns `gate-action: allow` and the push proceeds.

Internal system failures (gate engine errors, Trigger.dev job timeouts, artifact generation failures) are handled separately -- they surface errors in the dashboard but do not produce silent push blocks. A gate block only occurs when the gate engine successfully evaluates your spec and returns an ERR lane.
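As a rough sketch of the decision the Action makes, the function below maps a gate result and an outage policy to an outcome. It rests on several assumptions: `decideOutcome`, `GateResult`, and `ActionOutcome` are hypothetical names, `block` is an assumed label for the opposite of `allow`, and exit code 1 is just one possible non-zero value. Internal system failures are not modeled here.

```typescript
type OutagePolicy = "fail-closed" | "fail-open";

// "allow" appears in the documentation; "block" is an assumed counterpart.
type GateAction = "allow" | "block";

// Hypothetical result shape: either the API could not be reached,
// or the gate engine evaluated the spec and returned a lane.
type GateResult =
  | { reachable: false }
  | { reachable: true; lane: string };

interface ActionOutcome {
  gateAction: GateAction;
  exitCode: number; // 0 = push proceeds, non-zero = push blocked
}

function decideOutcome(
  result: GateResult,
  policy: OutagePolicy = "fail-closed",
): ActionOutcome {
  if (result.reachable === false) {
    // Outage: the policy alone decides.
    return policy === "fail-open"
      ? { gateAction: "allow", exitCode: 0 }
      : { gateAction: "block", exitCode: 1 };
  }
  // Successful evaluation: only an ERR lane blocks, regardless of policy.
  return result.lane === "ERR"
    ? { gateAction: "block", exitCode: 1 }
    : { gateAction: "allow", exitCode: 0 };
}
```

Note that `fail-open` only affects the unreachable case in this sketch; an ERR lane from a successful evaluation still blocks.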

Rate limits

Endpoint	Limit	Scope
`/api/action/validate`	10 req/min	Per repository
`/api/action/gate-check`	60 req/min (Layer 1); 100 req/min (Layer 2)	Per repository (Layer 1); per API key (Layer 2)
`/api/webhooks/github`	No limit	Signature-verified
`/api/projects/:id/sync`	No limit	Session auth
`/api/waitlist`	5 req/min	Per IP

Rate-limited requests receive a 429 response with a `Retry-After` header.
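A client can use that header to decide how long to wait before retrying. The helper below is a sketch under assumptions: `retryDelayMs` is a hypothetical name, the 60-second fallback is our own choice for a missing or unparseable header, and the header is handled in both forms HTTP allows (a number of seconds or an HTTP date). Which form StateAnchor sends is not stated here.

```typescript
// Returns how long to wait before retrying, in milliseconds,
// or null when the response was not rate limited.
function retryDelayMs(
  status: number,
  retryAfter: string | null,
  nowMs: number,
  fallbackMs: number = 60000, // assumed default, not a documented value
): number | null {
  if (status !== 429) return null;
  if (retryAfter === null) return fallbackMs;

  const value = retryAfter.trim();

  // Form 1: delay in whole seconds, e.g. "30".
  if (/^\d+$/.test(value)) return Number(value) * 1000;

  // Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const dateMs = Date.parse(value);
  if (Number.isNaN(dateMs)) return fallbackMs;

  // A date in the past means the caller can retry immediately.
  return Math.max(0, dateMs - nowMs);
}
```

A caller would pass the response status, the raw header value (or `null` if absent), and the current time, then sleep for the returned duration before the next request.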

Concurrency

Sync jobs are processed by Trigger.dev with configurable concurrency. The atomic sync lock prevents duplicate runs for the same contract (repo + commit + output configuration). If a sync is already running for a given contract, new triggers are deduplicated -- not queued.
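To illustrate the deduplication idea, here is a minimal in-memory sketch. `SyncContract`, `contractKey`, and `SyncLock` are hypothetical names, and the key format is invented for the example. A single-process `Set` is only a stand-in: a production lock would need an atomic operation in shared storage.

```typescript
// Hypothetical contract shape: repo + commit + output configuration.
interface SyncContract {
  repo: string;
  commit: string;
  outputConfig: string[]; // e.g. the requested output languages
}

function contractKey(c: SyncContract): string {
  // Sort a copy so the order of outputs does not change the key.
  const outputs = c.outputConfig.slice().sort().join(",");
  return `${c.repo}@${c.commit}#${outputs}`;
}

class SyncLock {
  private readonly running = new Set<string>();

  // Returns true if the caller may start the sync, false if an identical
  // contract is already running (the trigger is dropped, not queued).
  tryAcquire(c: SyncContract): boolean {
    const key = contractKey(c);
    if (this.running.has(key)) return false;
    this.running.add(key);
    return true;
  }

  // Called when the sync run finishes, whether it succeeded or failed.
  release(c: SyncContract): void {
    this.running.delete(contractKey(c));
  }
}
```

The caller checks `tryAcquire` before starting a run and simply returns when it gets `false`, which is what distinguishes deduplication from queueing.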

Known system limits

Limit	Value	Enforced at
Max spec file size	128 KB	Stage A -- spec validation
Max endpoints per spec	50	Stage A -- spec validation
Max output languages per sync	5 + MCP + docs	Stage C -- parallel generation
Max service name length	100 chars	Stage A -- spec validation
Max version string length	20 chars	Stage A -- spec validation
Max sync run history (API)	20 per request	sync-runs API endpoint
Max exception TTL	90 days	Exception ledger
YAML anchors/aliases	Rejected	Stage A -- YAML parser
Sync cost ceiling	$0.15 per run	Stage E -- cost tracker
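Several of the Stage A limits can be checked locally before pushing. The sketch below is an assumption-laden illustration: `SpecSummary` and `checkLimits` are hypothetical names, 128 KB is taken to mean 128 × 1024 bytes, and the caller is expected to supply the file size and endpoint count. It does not cover YAML anchor detection or the other limits in the table.

```typescript
// Values copied from the limits table; the byte interpretation of "128 KB"
// (128 * 1024) is an assumption.
const LIMITS = {
  maxSpecBytes: 128 * 1024,
  maxEndpoints: 50,
  maxServiceNameChars: 100,
  maxVersionChars: 20,
};

// Hypothetical summary of a spec, computed by the caller.
interface SpecSummary {
  sizeBytes: number;
  endpointCount: number;
  serviceName: string;
  version: string;
}

// Returns a list of human-readable violations; an empty list means
// the spec is within the limits checked here.
function checkLimits(spec: SpecSummary): string[] {
  const violations: string[] = [];
  if (spec.sizeBytes > LIMITS.maxSpecBytes) {
    violations.push(`spec file is ${spec.sizeBytes} bytes (max ${LIMITS.maxSpecBytes})`);
  }
  if (spec.endpointCount > LIMITS.maxEndpoints) {
    violations.push(`spec has ${spec.endpointCount} endpoints (max ${LIMITS.maxEndpoints})`);
  }
  if (spec.serviceName.length > LIMITS.maxServiceNameChars) {
    violations.push(`service name exceeds ${LIMITS.maxServiceNameChars} chars`);
  }
  if (spec.version.length > LIMITS.maxVersionChars) {
    violations.push(`version string exceeds ${LIMITS.maxVersionChars} chars`);
  }
  return violations;
}
```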

SLAs by plan tier

Plan	Uptime SLA	Support response	Notes
Free	No SLA	Community (GitHub Issues)	Best-effort. No uptime guarantee.
Pro	No SLA	Email, 2 business day target	Best-effort. No contractual uptime guarantee.
Team	No SLA	Email + Slack, 1 business day target	Best-effort. No contractual uptime guarantee.
Enterprise	99.9% monthly uptime	Dedicated Slack channel, 4-hour response during business hours	Contractual SLA with credits for downtime. Contact hello@stateanchor.dev.
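For a sense of scale, an uptime percentage can be converted into a downtime budget. The function below is a simple arithmetic sketch: `downtimeBudgetMinutes` is a hypothetical helper, and a 30-day month is assumed by default. How the contractual SLA actually measures a month or counts downtime is not specified here.

```typescript
// Converts a monthly uptime percentage into the minutes of downtime
// it permits, assuming every minute of the month counts equally.
function downtimeBudgetMinutes(uptimePercent: number, daysInMonth: number = 30): number {
  const totalMinutes = daysInMonth * 24 * 60;
  return (totalMinutes * (100 - uptimePercent)) / 100;
}
```

Under these assumptions, 99.9% over a 30-day month leaves roughly 43 minutes of downtime, since 0.1% of 43,200 minutes is about 43.2.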

Status page

A public status page is in progress at status.stateanchor.dev (coming Q3 2026). Until then, operational status is communicated via the GitHub Action output and the dashboard sync history. Incidents are announced on the GitHub repo issue tracker.

How we measure

Sync success rate tracks end-to-end pipeline completions (status = completed). End-to-end P50/P95 durations measure wall-clock time from webhook trigger to artifact storage. Gate evaluation latency covers the IR diff + lane classification step only. Artifact success rates are per-language Stage C + Stage D pass rates.
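As an illustration of what a P50 or P95 figure means, the sketch below computes a percentile from a list of durations using the nearest-rank method. The choice of method is an assumption made for the example, and `percentile` is a hypothetical helper; the exact calculation behind the published figures is not described here.

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");

  // Sort a copy numerically; the default sort would compare as strings.
  const sorted = samples.slice().sort((a, b) => a - b);

  // 1-based rank, computed with integer-friendly arithmetic.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}
```

With this method, P95 over 20 runs is the 19th fastest duration, so a single very slow run does not move it.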
