Emergency bypass
StateAnchor is designed to block breaking changes. Occasionally, you need to ship a breaking change immediately -- a production incident, a time-critical security fix, or a case where the gate is catching a false positive. This page explains how to bypass the gate safely, what gets recorded, and how to re-enable enforcement afterward.
Bypass is a tool, not an escape hatch. Every bypass is logged with actor, reason, timestamp, and expiration. There is no way to bypass silently.
When to use emergency bypass
- Production incident: A deployed API is broken and the fix requires a breaking spec change. You need the gate to let this through while you stabilize.
- Time-critical security fix: A vulnerability requires removing or modifying an endpoint immediately. The gate would normally block an endpoint removal (ERR lane).
- False positive: The gate is blocking a change that is not actually breaking. The diff engine classified it incorrectly. You need to ship now and fix the classification later.
- Planned deprecation: You are intentionally removing an endpoint as part of a migration. The gate is doing its job -- but you have already communicated the deprecation to consumers.
How to bypass: drift exception
The primary bypass mechanism is the drift exception. This is the recommended approach for all cases.
- Go to the project detail page and find the blocked sync run in the history section.
- Note the specific finding that caused the block -- endpoint, change kind, and gate reason.
- In project settings, create a drift exception:
  - Endpoint: the specific endpoint (e.g., DELETE /users/:id)
  - Kind: the change kind (e.g., endpoint_removed)
  - Approver: your name (required -- cannot be blank)
  - TTL: set the shortest duration that covers the incident (hours, not months)
- Push the change again or trigger a manual sync. The excepted finding is suppressed. If no other ERR items exist, the gate proceeds.
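The suppression rule in the last step can be sketched in a few lines of Python. This is an illustrative model only, not StateAnchor's implementation: the `Finding` and `DriftException` classes, the `evaluate_gate` function, and the "block" result value are hypothetical names chosen for this example. WARN thresholds are left out of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    endpoint: str  # e.g. "DELETE /users/:id"
    kind: str      # e.g. "endpoint_removed"
    lane: str      # "ERR" or "WARN"


@dataclass(frozen=True)
class DriftException:
    endpoint: str
    kind: str
    approver: str

    def __post_init__(self):
        # The approver field is required and cannot be blank.
        if not self.approver.strip():
            raise ValueError("approver is required and cannot be blank")


def evaluate_gate(findings: list[Finding], exceptions: list[DriftException]) -> dict:
    """Suppress findings that match an exception by (endpoint, kind);
    block only if an ERR finding remains active. Hypothetical model."""
    excepted = {(e.endpoint, e.kind) for e in exceptions}
    active = [f for f in findings if (f.endpoint, f.kind) not in excepted]
    suppressed = [f for f in findings if (f.endpoint, f.kind) in excepted]
    blocked = any(f.lane == "ERR" for f in active)
    return {
        "gate_action": "block" if blocked else "allow",  # "block" is an assumed value
        "active_items": active,
        "suppressed_items": suppressed,
    }
```

Because the match is on the exact endpoint and kind, an exception for DELETE /users/:id does not hide an unrelated ERR finding on another endpoint; the gate would still block on that one.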
Exception TTL guidance
| Scenario | Recommended TTL |
|---|---|
| Production incident (hotfix) | 24 hours |
| Security fix | 48 hours |
| Planned deprecation | 30 days |
| False positive (pending classification fix) | 7 days |
Maximum TTL is 90 days. When the exception expires, the finding reactivates. If the underlying change is still present, the gate will block again on the next push.
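The TTL rules above amount to simple date arithmetic. The sketch below assumes that a TTL over the 90-day maximum is rejected outright (the product may instead clamp it) and uses hypothetical names throughout (`RECOMMENDED_TTL`, `expires_at`, `is_active`).

```python
from datetime import datetime, timedelta, timezone

MAX_TTL = timedelta(days=90)  # maximum TTL stated in this guide

# Recommended values from the table above; the dictionary keys are illustrative.
RECOMMENDED_TTL = {
    "production_incident": timedelta(hours=24),
    "security_fix": timedelta(hours=48),
    "planned_deprecation": timedelta(days=30),
    "false_positive": timedelta(days=7),
}


def expires_at(created_at: datetime, ttl: timedelta) -> datetime:
    """Return the expiry time, rejecting non-positive or over-limit TTLs."""
    if ttl <= timedelta(0):
        raise ValueError("TTL must be positive")
    if ttl > MAX_TTL:
        raise ValueError("TTL exceeds the 90-day maximum")
    return created_at + ttl


def is_active(now: datetime, expiry: datetime) -> bool:
    """An exception suppresses its finding only until it expires."""
    return now < expiry


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
expiry = expires_at(created, RECOMMENDED_TTL["production_incident"])
```

Once `is_active` returns false, the finding counts again, which is why a change that is still present blocks the next push.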
What bypass records in the audit trail
Every exception is stored in the exception ledger with full provenance:
- Actor: Who created the exception (name, not just a user ID)
- Endpoint + kind: Exactly what is being suppressed
- Created at: When the exception was created
- Expires at: When it automatically deactivates
- Sync runs affected: Which sync runs proceeded because of this exception
The sync run record also logs that an exception was applied. The spec_diff_json field includes both active_items (findings that affected the gate) and suppressed_items (findings that were suppressed by exceptions).
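If you export a sync run record, a short script can tell you whether an exception was applied. The sketch below assumes spec_diff_json is a JSON object whose active_items and suppressed_items are arrays, and that each item carries endpoint and kind fields; the item shape and the `summarize_spec_diff` helper are assumptions for illustration, not a documented schema.

```python
import json


def summarize_spec_diff(spec_diff_json: str) -> dict:
    """Summarize which findings affected the gate and which were suppressed."""
    diff = json.loads(spec_diff_json)
    active = diff.get("active_items", [])
    suppressed = diff.get("suppressed_items", [])
    return {
        "active_count": len(active),
        "suppressed_count": len(suppressed),
        "exception_applied": bool(suppressed),
        # The endpoint/kind keys on each item are assumed for this example.
        "suppressed": [(item.get("endpoint"), item.get("kind")) for item in suppressed],
    }


sample = json.dumps({
    "active_items": [],
    "suppressed_items": [
        {"endpoint": "DELETE /users/:id", "kind": "endpoint_removed"}
    ],
})
summary = summarize_spec_diff(sample)
```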
This means you can always answer: "Who bypassed the gate for this deploy, why, and when does enforcement resume?"
How to re-enable enforcement
After the incident is resolved:
- Option A: Delete the exception in project settings. The finding reactivates immediately. Next push is evaluated normally.
- Option B: Let the exception expire. If you set a short TTL (24-48 hours), enforcement resumes automatically.
- Option C: Update the spec to reflect the new state. If the breaking change was intentional (e.g., endpoint deprecated), update stateanchor.yaml to match. The finding disappears because the spec and reality agree.
What NOT to do
| Anti-pattern | Why it is wrong | What to do instead |
|---|---|---|
| Create an effectively permanent exception (e.g., a 90-day TTL for a hotfix) | Suppresses the finding long after the incident is over. Masks real drift. | Use the shortest TTL that covers the incident. |
| Remove the GitHub Action from your workflow | Disables all gate enforcement for all changes, not just the incident. | Use a scoped exception for the specific finding. |
| Push directly to main without the action running | No gate evaluation, no audit trail, no artifact regeneration. You are flying blind. | Always push through the normal flow, even with an exception. |
| Set warn_count_threshold: 999 to silence WARN items | Effectively disables WARN enforcement. WARN items accumulate silently. | Use a targeted exception for the specific WARN finding. |
StateAnchor outage behavior
If StateAnchor itself is unavailable (network error, service outage, timeout), the Action behavior is controlled by the outage-policy input. The default is fail-closed: the push is blocked. Set outage-policy: fail-open in your workflow to allow pushes to proceed during outages -- the Action returns gate-action: allow with a soft-pass result. You do not need to manually bypass StateAnchor if StateAnchor is down and you have configured fail-open.
The outage-policy input is a deliberate design decision. StateAnchor is a safety layer, not a hard dependency. With fail-open configured, a StateAnchor outage is never the reason you cannot deploy during an incident.
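The outage behavior reduces to a small decision function. The Python sketch below models only the two policies described above. The `resolve_gate_on_outage` function, the "block" action value, and the "outage" result label are hypothetical; only outage-policy, fail-closed, fail-open, gate-action: allow, and the soft-pass result come from this page.

```python
def resolve_gate_on_outage(outage_policy: str = "fail-closed") -> dict:
    """Decide the gate outcome when StateAnchor cannot be reached.

    The default mirrors the documented default: fail-closed blocks the push.
    """
    if outage_policy == "fail-open":
        # Documented behavior: allow the push with a soft-pass result.
        return {"gate-action": "allow", "result": "soft-pass"}
    if outage_policy == "fail-closed":
        # "block" and "outage" are assumed labels for this sketch.
        return {"gate-action": "block", "result": "outage"}
    raise ValueError(f"unknown outage-policy: {outage_policy!r}")
```

Rejecting unknown policy values, rather than silently falling back to one of the two, keeps a typo in the workflow from changing enforcement without anyone noticing.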