Reconciliation API

Operator state machine, health checks, and upgrade lifecycle

State Machine

Every Lux CRD follows the same lifecycle phases:

Pending --> Creating --> Bootstrapping --> Running <--> Degraded
| Phase | Description |
| --- | --- |
| Pending | CR accepted, waiting for dependencies |
| Creating | Kubernetes resources being provisioned (StatefulSet, Services, ConfigMaps) |
| Bootstrapping | Pods running, waiting for chain sync and peer connectivity |
| Running | All validators healthy, chains synced, APIs responding |
| Degraded | One or more health checks failing, operator attempting recovery |

Transitions are evaluated every 60 seconds during reconciliation.
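
As a rough illustration of the lifecycle above, the sketch below models the phases as an enum and evaluates one transition per reconciliation pass. The function name `next_phase` and its boolean inputs are hypothetical; they stand in for whatever signals the operator actually gathers, and the real controller logic may differ.

```python
from enum import Enum


class Phase(Enum):
    PENDING = "Pending"
    CREATING = "Creating"
    BOOTSTRAPPING = "Bootstrapping"
    RUNNING = "Running"
    DEGRADED = "Degraded"


def next_phase(
    current: Phase,
    *,
    dependencies_ready: bool = False,  # hypothetical input
    resources_ready: bool = False,     # hypothetical input
    bootstrapped: bool = False,        # hypothetical input
    healthy: bool = True,              # hypothetical input
) -> Phase:
    """Return the phase after one reconciliation pass (illustrative only)."""
    if current is Phase.PENDING:
        return Phase.CREATING if dependencies_ready else current
    if current is Phase.CREATING:
        return Phase.BOOTSTRAPPING if resources_ready else current
    if current is Phase.BOOTSTRAPPING:
        return Phase.RUNNING if bootstrapped else current
    # Running and Degraded flip in both directions based on health.
    return Phase.RUNNING if healthy else Phase.DEGRADED
```

Each call advances at most one step, which mirrors the idea that phases are re-evaluated on every reconciliation rather than jumped through in a single pass.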

Health Checks

The operator performs two levels of health checking on each validator pod:

Liveness -- HTTP GET to /ext/health/liveness expecting status 200.

Readiness -- JSON-RPC call to health.health checking result.healthy is true.
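
The two checks can be approximated from outside the operator as follows. This is a minimal sketch, not the operator's implementation: the function names, the `base_url` parameter, and the `/ext/health` path used for the JSON-RPC call are assumptions (only `/ext/health/liveness`, `health.health`, and `result.healthy` come from the description above).

```python
import json
import urllib.request


def liveness_ok(base_url, timeout=5.0, opener=urllib.request.urlopen):
    """Liveness: HTTP GET to /ext/health/liveness must return status 200."""
    try:
        with opener(f"{base_url}/ext/health/liveness", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, and socket timeouts
        return False


def readiness_request(request_id=1):
    """Build the JSON-RPC 2.0 payload for the health.health method."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "health.health", "params": {}}


def readiness_from_response(body):
    """Readiness passes only if result.healthy is exactly true."""
    try:
        doc = json.loads(body)
    except (TypeError, ValueError):
        return False
    result = doc.get("result") if isinstance(doc, dict) else None
    return isinstance(result, dict) and result.get("healthy") is True


def readiness_ok(base_url, timeout=5.0, opener=urllib.request.urlopen):
    """POST the JSON-RPC call; the /ext/health path is an assumption."""
    req = urllib.request.Request(
        f"{base_url}/ext/health",
        data=json.dumps(readiness_request()).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with opener(req, timeout=timeout) as resp:
            return readiness_from_response(resp.read())
    except OSError:
        return False
```

The `opener` parameter exists only so the functions can be exercised without a live node; in normal use the default `urllib.request.urlopen` is called.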

Health policy is configurable per LuxNetwork:

| Field | Default | Description |
| --- | --- | --- |
| maxHeightSkew | 10 | Maximum allowed P-chain height difference between nodes |
| gracePeriodSeconds | 300 | Seconds after pod start before enforcing checks |
| checkIntervalSeconds | 60 | Seconds between health check cycles |
| requireInboundValidators | false | Require inbound peer connections |
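
To make the policy fields concrete, the sketch below mirrors them in a small dataclass with the documented defaults and shows how two of them might be applied. The helper names are hypothetical, and two details are assumptions rather than documented behavior: skew is measured as the gap between the highest and lowest reported heights, and checks are enforced once the pod age reaches the grace period.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class HealthPolicy:
    max_height_skew: int = 10
    grace_period_seconds: int = 300
    check_interval_seconds: int = 60
    require_inbound_validators: bool = False


def checks_enforced(policy: HealthPolicy, pod_age_seconds: float) -> bool:
    """Checks are skipped until the pod is older than the grace period (assumed >=)."""
    return pod_age_seconds >= policy.grace_period_seconds


def height_skew_ok(policy: HealthPolicy, heights: Sequence[int]) -> bool:
    """Assumes skew means max minus min P-chain height across nodes."""
    if not heights:
        return True
    return max(heights) - min(heights) <= policy.max_height_skew
```

With the defaults, a pod that started two minutes ago is still inside its grace period, and nodes whose heights differ by 10 or fewer blocks are treated as within tolerance.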

Upgrade Strategy

OnDelete (default)

The StatefulSet uses the OnDelete update strategy, so the operator does not restart pods automatically. Delete pods manually to pick up image changes.

RollingCanary

Automated rolling upgrade with safety gates:

  1. The operator detects an image tag change on the StatefulSet
  2. Deletes pods highest-index-first (e.g., luxd-4 before luxd-0)
  3. Waits for the pod to become Ready and pass the liveness check
  4. Waits stabilizationSeconds (default 60s)
  5. Runs the readiness health check before proceeding to the next pod
  6. On failure after 5 retries, aborts the upgrade and sets the phase to Degraded

PodDisruptionBudget maxUnavailable matches the upgrade strategy setting.
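
The RollingCanary steps above can be sketched as a loop over pods with injected callbacks. Everything here is illustrative: `rolling_canary`, its callback parameters, and `retry_delay_seconds` are hypothetical names, and the sketch reads "5 retries" as one initial readiness attempt plus up to five retries, which may not match the operator exactly.

```python
import time
from typing import Callable, Iterable


def ordinal(pod_name: str) -> int:
    """Extract the StatefulSet ordinal, e.g. 'luxd-4' -> 4."""
    return int(pod_name.rsplit("-", 1)[1])


def rolling_canary(
    pod_names: Iterable[str],
    delete_pod: Callable[[str], None],
    wait_ready_and_live: Callable[[str], None],
    readiness_ok: Callable[[str], bool],
    stabilization_seconds: float = 60,
    max_retries: int = 5,
    retry_delay_seconds: float = 10,  # hypothetical; not in the docs
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if every pod upgraded; False means abort (phase Degraded)."""
    # Highest ordinal first, so luxd-4 is replaced before luxd-0.
    for pod in sorted(pod_names, key=ordinal, reverse=True):
        delete_pod(pod)
        wait_ready_and_live(pod)
        sleep(stabilization_seconds)
        # One initial attempt plus up to max_retries retries (assumption).
        for attempt in range(1 + max_retries):
            if readiness_ok(pod):
                break
            if attempt < max_retries:
                sleep(retry_delay_seconds)
        else:
            return False  # caller would set the phase to Degraded
    return True
```

Because the loop stops at the first pod that never becomes ready, lower-index pods keep running the old image when an upgrade is aborted.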

Startup Gate

Before starting luxd, an init container checks TCP reachability of peer pods:

| Field | Default | Description |
| --- | --- | --- |
| minPeers | 2 | Peers that must be TCP-reachable |
| timeoutSeconds | 300 | Maximum wait time |
| checkIntervalSeconds | 5 | Seconds between attempts |
| onTimeout | StartAnyway | Action on timeout: Fail or StartAnyway |
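
A gate of this kind could look roughly like the sketch below, which polls a list of peers until enough are reachable or the timeout expires. The function names, the `peers` list of `(host, port)` pairs, and the per-connection timeout are assumptions; the actual init container may be implemented quite differently.

```python
import socket
import time


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def startup_gate(
    peers,
    min_peers=2,
    timeout_seconds=300,
    check_interval_seconds=5,
    on_timeout="StartAnyway",
    probe=tcp_reachable,
    clock=time.monotonic,
    sleep=time.sleep,
):
    """Return True if luxd should start, False if the gate fails."""
    if on_timeout not in ("Fail", "StartAnyway"):
        raise ValueError(f"unknown onTimeout action: {on_timeout!r}")
    deadline = clock() + timeout_seconds
    while True:
        reachable = sum(1 for host, port in peers if probe(host, port))
        if reachable >= min_peers:
            return True
        if clock() >= deadline:
            # StartAnyway lets the node boot without peers; Fail blocks startup.
            return on_timeout == "StartAnyway"
        sleep(check_interval_seconds)
```

The `probe`, `clock`, and `sleep` parameters are there so the loop can be tested without real sockets or real waiting; with the defaults it performs live TCP connects.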
