Reliability
StateAnchor is infrastructure that sits in your deploy path. This page documents our operational targets, what happens when things go wrong, and the hard limits of the system.
Uptime SLO
| Metric | Target | Design Target * |
|---|---|---|
| API availability | 99.9% | 99.9% |
| Webhook processing | 99.5% | 99.2% |
| Gate evaluation success rate | 99.9% | 99.9% |
| Artifact generation success rate | 98.0% | 98.4% |
* Pre-launch figures represent design targets. Live metrics will be published via public status page at launch.
Until the public status page launches, operational status is communicated via the GitHub Action output and the dashboard sync history.
Sync pipeline latency
Typical wall-clock duration per pipeline stage, measured across all production sync runs:
| Stage | P50 | P95 | What it does |
|---|---|---|---|
| A -- Spec validation | 50ms | 120ms | YAML parse + schema validation |
| B -- IR generation | 800ms | 1.8s | Haiku converts spec to canonical IR |
| B½ -- Gate evaluation | 340ms | 600ms | IR diff + lane classification + policy check |
| C -- Code generation | 4.2s | 12s | Sonnet generates SDKs (parallel per language) |
| D -- Output validation | 1.1s | 3.5s | Type-check / syntax-check each artifact |
| E -- Storage | 400ms | 900ms | Content-addressed store + provenance + ref update |
| End-to-end | 8.3s | 24s | Webhook trigger to artifacts stored |
Stage C dominates total duration. Generating 3 languages in parallel takes 4-12 seconds depending on spec complexity. A spec with 4 endpoints and 2 models is at the low end; 50 endpoints with complex models is at the high end.
Artifact generation success by language
| Language | Stage C+D pass rate |
|---|---|
| TypeScript | 98.7% |
| Python | 97.9% |
| Go | 98.1% |
| MCP Server | 99.4% |
When generation fails (Stage C) or validation fails (Stage D), the pipeline retries once with error feedback. If the retry also fails, the artifact is marked as unverified but still stored. The sync run completes -- a single artifact failure does not block other languages.
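The retry rule above can be sketched as follows. This is a minimal illustration, not StateAnchor's implementation: `ArtifactResult`, `generate_with_retry`, `run_all`, and the `generate`/`validate` callables are hypothetical names, and the loop runs languages sequentially for brevity where the real pipeline runs them in parallel.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ArtifactResult:
    language: str
    content: Optional[str]  # last output produced, if any
    verified: bool          # False means "stored but unverified"
    attempts: int


def generate_with_retry(
    language: str,
    generate: Callable[[str, Optional[str]], str],
    validate: Callable[[str], Optional[str]],
) -> ArtifactResult:
    """Run Stage C + Stage D, retrying once with error feedback.

    generate(language, feedback) returns artifact text or raises (hypothetical).
    validate(content) returns None on success or an error message (hypothetical).
    """
    feedback: Optional[str] = None
    content: Optional[str] = None
    for attempt in (1, 2):  # the initial attempt plus exactly one retry
        try:
            content = generate(language, feedback)
            error = validate(content)
        except Exception as exc:  # treated as a Stage C failure
            error = str(exc)
        if error is None:
            return ArtifactResult(language, content, True, attempt)
        feedback = error  # passed to the retry as error feedback
    # Both attempts failed: keep whatever was produced, flagged as unverified.
    return ArtifactResult(language, content, False, 2)


def run_all(languages: List[str], generate, validate) -> List[ArtifactResult]:
    # A failure in one language never prevents the others from running.
    return [generate_with_retry(lang, generate, validate) for lang in languages]
```

Returning a result object instead of raising is what lets the sync run complete even when one language ends up unverified.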
Degraded mode behavior
What happens when each component is slow or unavailable:
| Component | Impact when down | Recovery |
|---|---|---|
| StateAnchor API | Default (fail-closed): Action exits non-zero, push blocked. Set outage-policy: fail-open in your workflow to allow pushes to proceed without gate evaluation. | Automatic. Next push triggers normal evaluation. |
| Claude API (generation) | Stage C fails. Gate still evaluates. Existing artifacts remain available. New artifacts are not generated. | Automatic retry on next sync. Existing artifacts served from content-addressed store. |
| Supabase (database) | Sync runs cannot be created or completed. Gate evaluation fails. Webhook returns 500. | GitHub does not redeliver failed webhooks automatically; redeliver the failed deliveries or push again. Syncs resume when DB recovers. |
| GitHub API | Cannot fetch spec file from repo. Webhook payloads still arrive but spec cannot be read. | Sync run fails with clear error. Retries on next push. |
| Trigger.dev (job queue) | Sync jobs queue but do not execute. Gate-check (the synchronous path) still works for the GitHub Action. Only async webhook syncs are delayed. | Jobs drain automatically when worker recovers. |
Outage policy
The outage-policy input controls what the GitHub Action does when StateAnchor is unreachable. The default is fail-closed: the Action exits non-zero and the push is blocked. This ensures unvalidated spec changes do not land silently during an outage.
Teams that prefer to keep deploying during outages can set outage-policy: fail-open in their workflow. With fail-open, the Action returns gate-action: allow and the push proceeds when the API is unreachable.
Internal system failures (gate engine errors, Trigger.dev job timeouts, artifact generation failures) are handled separately -- they surface errors in the dashboard but do not produce silent push blocks. A gate block only occurs when the gate engine successfully evaluates your spec and returns an ERR lane.
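The decision rules in this section can be summarized in a few lines. The sketch below is an assumption-laden illustration: the function name `resolve_gate_action`, its parameters, and the "block" return value are hypothetical. Only the `fail-closed` / `fail-open` policy values, the `allow` gate action, and the ERR lane come from this page.

```python
from typing import Optional


def resolve_gate_action(
    outage_policy: str,
    api_reachable: bool,
    lane: Optional[str],
) -> str:
    """Return "allow" or "block" for a push (hypothetical helper).

    lane is the gate engine's result, or None when the engine produced
    no result (for example, an internal error).
    """
    if outage_policy not in ("fail-closed", "fail-open"):
        raise ValueError(f"unknown outage-policy: {outage_policy!r}")

    if not api_reachable:
        # Outage: the configured policy decides.
        return "allow" if outage_policy == "fail-open" else "block"

    # API reachable: only a successful evaluation that returns the ERR lane
    # blocks. Internal failures (lane is None) surface elsewhere and do not
    # block the push.
    return "block" if lane == "ERR" else "allow"
```

For example, `resolve_gate_action("fail-closed", False, None)` yields a block, while the same call with `"fail-open"` yields an allow.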
Rate limits
| Endpoint | Limit | Scope |
|---|---|---|
| /api/action/validate | 10 req/min | Per repository |
| /api/action/gate-check | 60 req/min | Per repository (Layer 1); 100 req/min per API key (Layer 2) |
| /api/webhooks/github | No limit | Signature-verified |
| /api/projects/:id/sync | No limit | Session auth |
| /api/waitlist | 5 req/min | Per IP |
Rate-limited responses return 429 with a Retry-After header.
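A client can honor the Retry-After header along these lines. This is a sketch under assumptions: the page does not say whether the header carries delay-seconds or an HTTP date, so both forms allowed by the HTTP specification are handled. `retry_after_seconds`, `call_with_rate_limit`, and the `send` callable are hypothetical names, and headers are assumed to arrive as a plain dict keyed `Retry-After`.

```python
import email.utils
import time
from datetime import datetime, timezone
from typing import Callable, Dict, Optional, Tuple


def retry_after_seconds(
    header_value: Optional[str],
    now: Optional[datetime] = None,
    default: float = 1.0,
) -> float:
    """Convert a Retry-After value into a delay in seconds."""
    if not header_value:
        return default
    value = header_value.strip()
    if value.isdigit():  # delay-seconds form, e.g. "30"
        return float(value)
    try:  # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = email.utils.parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def call_with_rate_limit(
    send: Callable[[], Tuple[int, Dict[str, str]]],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it stops returning 429 or attempts run out.

    send() returns (status_code, headers); it stands in for a real HTTP call.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = 0
    for attempt in range(1, max_attempts + 1):
        status, headers = send()
        if status != 429 or attempt == max_attempts:
            break
        sleep(retry_after_seconds(headers.get("Retry-After")))
    return status
```

Capping the number of attempts keeps a CI job from waiting indefinitely when a repository is persistently over its limit.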
Concurrency
Sync jobs are processed by Trigger.dev with configurable concurrency. The atomic sync lock prevents duplicate runs for the same contract (repo + commit + output configuration). If a sync is already running for a given contract, new triggers are deduplicated -- not queued.
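To illustrate deduplication by contract, here is a minimal in-memory sketch. It assumes the contract key is a hash over repo, commit, and output configuration; the actual key format and lock storage are not described on this page, and `contract_key` and `SyncLock` are hypothetical names. A production lock would live in shared storage rather than process memory.

```python
import hashlib
import json
import threading


def contract_key(repo: str, commit: str, output_config: dict) -> str:
    """Derive a stable key from repo + commit + output configuration."""
    payload = json.dumps(
        {"repo": repo, "commit": commit, "output": output_config},
        sort_keys=True,           # key order must not change the hash
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class SyncLock:
    """Tracks which contracts currently have a sync in flight."""

    def __init__(self) -> None:
        self._running = set()
        self._mutex = threading.Lock()

    def try_acquire(self, key: str) -> bool:
        # Check-and-set happens under one mutex, so it is atomic.
        with self._mutex:
            if key in self._running:
                return False  # duplicate trigger: dropped, not queued
            self._running.add(key)
            return True

    def release(self, key: str) -> None:
        with self._mutex:
            self._running.discard(key)
```

A trigger that gets `False` from `try_acquire` simply stops; nothing is enqueued for later, which matches the "deduplicated, not queued" behavior described above.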
Known system limits
| Limit | Value | Enforced at |
|---|---|---|
| Max spec file size | 128 KB | Stage A -- spec validation |
| Max endpoints per spec | 50 | Stage A -- spec validation |
| Max output languages per sync | 5 + MCP + docs | Stage C -- parallel generation |
| Max service name length | 100 chars | Stage A -- spec validation |
| Max version string length | 20 chars | Stage A -- spec validation |
| Max sync run history (API) | 20 per request | sync-runs API endpoint |
| Max exception TTL | 90 days | Exception ledger |
| YAML anchors/aliases | Rejected | Stage A -- YAML parser |
| Sync cost ceiling | $0.15 per run | Stage E -- cost tracker |
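Several of these limits can be checked locally before pushing. The following is a rough pre-flight sketch, not the Stage A validator: `check_spec_limits` and its parameters are hypothetical, and "128 KB" is assumed to mean 128 * 1024 bytes.

```python
from typing import List

MAX_SPEC_BYTES = 128 * 1024   # assumption: "128 KB" read as 128 * 1024 bytes
MAX_ENDPOINTS = 50
MAX_SERVICE_NAME_CHARS = 100
MAX_VERSION_CHARS = 20


def check_spec_limits(
    raw_spec: bytes,
    service_name: str,
    version: str,
    endpoint_count: int,
) -> List[str]:
    """Return human-readable limit violations; an empty list means none."""
    problems: List[str] = []
    if len(raw_spec) > MAX_SPEC_BYTES:
        problems.append(
            f"spec is {len(raw_spec)} bytes; limit is {MAX_SPEC_BYTES}"
        )
    if endpoint_count > MAX_ENDPOINTS:
        problems.append(
            f"spec has {endpoint_count} endpoints; limit is {MAX_ENDPOINTS}"
        )
    if len(service_name) > MAX_SERVICE_NAME_CHARS:
        problems.append(
            f"service name is {len(service_name)} chars; "
            f"limit is {MAX_SERVICE_NAME_CHARS}"
        )
    if len(version) > MAX_VERSION_CHARS:
        problems.append(
            f"version is {len(version)} chars; limit is {MAX_VERSION_CHARS}"
        )
    return problems
```

The sketch does not detect YAML anchors or aliases; that check depends on the parser and is left to Stage A.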
SLAs by plan tier
| Plan | Uptime SLA | Support response | Notes |
|---|---|---|---|
| Free | No SLA | Community (GitHub Issues) | Best-effort. No uptime guarantee. |
| Pro | No SLA | Email, 2 business day target | Best-effort. No contractual uptime guarantee. |
| Team | No SLA | Email + Slack, 1 business day target | Best-effort. No contractual uptime guarantee. |
| Enterprise | 99.9% monthly uptime | Dedicated Slack channel, 4-hour response during business hours | Contractual SLA with credits for downtime. Contact hello@stateanchor.dev. |
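For a sense of scale, a 99.9% monthly uptime figure leaves roughly 43.2 minutes of allowable downtime in a 30-day month. The arithmetic is sketched below; the helper name `downtime_budget_minutes` is hypothetical, and the measurement window and credit terms that actually apply are defined by the contract, not by this calculation.

```python
def downtime_budget_minutes(uptime_percent: float, days_in_month: int = 30) -> float:
    """Minutes of downtime permitted in a month at a given uptime percentage."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


# 30 days = 43,200 minutes; 0.1% of that is about 43.2 minutes.
budget = round(downtime_budget_minutes(99.9), 1)
```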
Status page
A public status page is in progress at status.stateanchor.dev (coming Q3 2026). Until then, operational status is communicated via the GitHub Action output and the dashboard sync history. Incidents are announced on the GitHub repo issue tracker.
How we measure
Sync success rate tracks end-to-end pipeline completions (status = completed). P50/P95 durations measure wall-clock time from webhook trigger to artifact storage. Gate eval latency covers the IR diff + lane classification step only. Artifact success rates are per-language Stage C + Stage D pass rates.
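As an illustration of these definitions, the sketch below computes a success rate and P50/P95 from raw run data. The page does not state which percentile method is used, so the nearest-rank method here is an assumption, and `success_rate` and `percentile` are hypothetical names.

```python
import math
from typing import List, Sequence


def success_rate(statuses: Sequence[str]) -> float:
    """Fraction of sync runs whose status is "completed"."""
    if not statuses:
        raise ValueError("no sync runs to measure")
    completed = sum(1 for s in statuses if s == "completed")
    return completed / len(statuses)


def percentile(durations: Sequence[float], p: int) -> float:
    """Nearest-rank percentile of wall-clock durations (p from 1 to 100)."""
    if not durations:
        raise ValueError("no durations to measure")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered: List[float] = sorted(durations)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]
```

Nearest-rank always returns an observed duration rather than an interpolated one, which keeps reported figures traceable to a real run.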