Production Readiness

Operational runbook for teams evaluating StateAnchor for production use.

Outage behavior

If StateAnchor is unreachable during a push, the Action's behavior is controlled by the outage-policy input. Default: fail-closed (push blocked). Set outage-policy: fail-open to let pushes proceed during outages, at the cost of skipping remote gate checks.

GitHub permissions required

permissions:
  id-token: write   # OIDC token exchange with StateAnchor
  contents: read    # Read stateanchor.yaml from the repo

id-token: write -- GitHub generates a short-lived OIDC token. StateAnchor exchanges it for a single-use action token. The GitHub OIDC token is never stored.
contents: read -- the Action reads stateanchor.yaml from the repo. No write access to your code.

No other permissions are required. StateAnchor never requests write access to your repository.
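
For orientation, here is a minimal sketch of a workflow that grants only the two permissions above. The file name, trigger, runner, and checkout step are assumptions made for illustration; only the permissions block and the stateanchor/sync-action@v1 step come from this page.

```yaml
# .github/workflows/stateanchor.yml  (hypothetical file name)
name: StateAnchor sync

on:
  push:
    branches: [main]        # assumed trigger; use whatever branches you gate

# Listing permissions explicitly sets every unlisted scope to none,
# so the job gets no write access to repository contents.
permissions:
  id-token: write           # OIDC token exchange with StateAnchor
  contents: read            # Read stateanchor.yaml from the repo

jobs:
  sync:
    runs-on: ubuntu-latest  # assumed runner
    steps:
      # Assumption: the Action reads stateanchor.yaml from a checked-out workspace
      - uses: actions/checkout@v4
      - uses: stateanchor/sync-action@v1
        with:
          api-key: ${{ secrets.STATEANCHOR_API_KEY }}
```

Declaring permissions at the workflow level applies them to every job; if the workflow has other jobs that need broader scopes, the same block can be moved under the sync job instead.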

OIDC claims checked

The OIDC token exchange validates three claims before issuing an action token:

repository -- must match a repo linked in your StateAnchor project.
ref -- the branch or tag that triggered the workflow.
workflow -- the workflow file path. Logged for audit.

If any claim fails validation, the exchange returns 403 and the Action exits.

What happens if the StateAnchor API times out

Fail closed. The Action exits with a non-zero exit code. The push is blocked.

Override with the outage-policy input:

- uses: stateanchor/sync-action@v1
  with:
    api-key: ${{ secrets.STATEANCHOR_API_KEY }}
    outage-policy: fail-open    # default: fail-closed
    local-fallback: true        # run local validation if remote unavailable

fail-open runs local schema validation only and posts a SOFT-PASS result. Remote gate checks are skipped.

SOFT-PASS is not a gate lane. It means the remote evaluation could not run: no breaking-change diff, no ERR / WARN / INFO classification, no artifact generation. The Action emits result: soft-pass and exits 0 so the workflow doesn’t fail during an outage, but the sync was not validated end-to-end. Treat it as “spec parsed locally but the gate was unreachable” and review the sync manually before deploying.
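
One way to make the manual review hard to skip is to hold later steps when the result is soft-pass. The sketch below assumes the Action exposes result as a step output readable through steps.<id>.outputs.result; this page only says the Action "emits" it, so verify the output name before relying on it. The step id and the deploy command are hypothetical.

```yaml
- id: stateanchor                      # hypothetical step id
  uses: stateanchor/sync-action@v1
  with:
    api-key: ${{ secrets.STATEANCHOR_API_KEY }}
    outage-policy: fail-open
    local-fallback: true

# Assumption: `result` is available as a step output.
- name: Flag unvalidated sync
  if: steps.stateanchor.outputs.result == 'soft-pass'
  run: echo "::warning::StateAnchor gate was unreachable. Review this sync manually before deploying."

# The deploy step is skipped on SOFT-PASS, but the workflow still succeeds.
- name: Deploy
  if: steps.stateanchor.outputs.result != 'soft-pass'
  run: ./deploy.sh                     # placeholder for your own deploy command
```

This keeps the fail-open benefit (the workflow stays green during an outage) while ensuring nothing ships on a result the remote gate never evaluated.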

What happens if artifact generation partially succeeds

The sync run is marked failed. No partial artifacts are published. Either all artifacts generate successfully, or none are committed. This is not configurable.

Branch protection: warn vs block

The gate engine assigns each change to a categorical lane (not a score threshold). The lane drives the Action exit code; the composite 0-100 score is display-only and never determines the block decision.

PASS / INFO -- additive-only change. Action exits 0. PR check passes.
WARN -- degradations below the configured warn_count_threshold (default 0, which means WARN never blocks). Action exits 0 with a warning annotation. PR check passes. A comment is posted listing the warning findings.
ERR -- any breaking-change finding (endpoint removed, required field removed, type changed, auth changed, enum value removed, oneOf/anyOf modified). Action exits non-zero. PR check fails. If branch protection requires this check, the merge is blocked.

Configure the WARN threshold in stateanchor.yaml under gate policy.
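
As a rough illustration, a threshold setting might look like the fragment below. This page names only the warn_count_threshold key and says it lives under the gate policy, so the gate / policy nesting is an assumption, as is the reading that WARN blocks once the finding count exceeds a non-zero value. Check the gate engine docs for the actual schema.

```yaml
# stateanchor.yaml (sketch; the gate/policy nesting is assumed, not confirmed)
gate:
  policy:
    # Default is 0, which means WARN never blocks.
    # Assumed semantics for non-zero values: the run blocks when the
    # number of WARN findings is above this threshold.
    warn_count_threshold: 3
```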

How approvals are authorized and logged

When the gate blocks a change (an ERR finding, or WARN findings above the configured threshold) but an engineer approves it:

An exception is created in the reviewed exceptions list.
Each exception records: approver clerk_id, timestamp, expiry date, the specific operation being excepted, and a reason.
Exceptions are time-bounded. After expiry, the same change will trigger the gate again.
All exceptions are visible in the project dashboard audit trail.

How to run in check-only mode

Check-only mode (mode: audit) evaluates the spec and runs the gate engine but does not generate artifacts or block the push.

- uses: stateanchor/sync-action@v1
  with:
    api-key: ${{ secrets.STATEANCHOR_API_KEY }}
    mode: audit

Use this to observe gate results on PRs before enforcing the check as required.

Gate failure behavior

Internal system failures never produce silent blocks. A gate block only occurs when the gate engine successfully evaluates your spec and returns an ERR lane (or a WARN lane above the configured threshold). The behavior when StateAnchor is unreachable from your workflow is controlled by the outage-policy input (default: fail-closed, see above).

Failure scenario -- Behavior

StateAnchor API unreachable -- Default (fail-closed): Action exits non-zero, push blocked. Set outage-policy: fail-open to allow pushes to proceed.
Trigger.dev sync job timeout -- Sync marked failed, push proceeds, dashboard shows failed run.
Gate engine error (not a gate decision) -- Push proceeds, error surfaced in dashboard.
Artifact generation failure -- Gate result recorded, artifacts not generated, push proceeds.
Scanner timeout or error -- Drift detection skipped for that run, gate proceeds from spec only.

You can verify this behavior at any time by checking the GitHub Action logs -- every run produces an explicit exit code and reason.
