Lux Monitoring
Full Observability Stack
Overview
Lux Monitoring provides a complete observability stack for Lux validators and network services. It ships two Docker Compose configurations -- one for Lux mainnet node monitoring and one for DEX-specific monitoring -- with pre-built Grafana dashboards, Prometheus scrape targets, Loki log aggregation, and Alertmanager routing. There is also a standalone monitoring-installer.sh shell script for bare-metal Ubuntu validators that installs Prometheus, Grafana, and node_exporter via systemd.
Quick Reference
| Item | Value |
|---|---|
| Repo | github.com/luxfi/monitoring |
| Stack | Docker Compose |
| Public URL | monitor.lux.network |
| Grafana default creds | admin / luxnetwork |
| Grafana port | 3100 (mapped from container 3000) |
| Prometheus port | 9090 |
| Loki port | 3101 (mapped from container 3100) |
| Alertmanager port | 9093 |
| Node Exporter port | 9100 |
| Postgres Exporter port | 9187 |
| TSDB retention | 30 days |
| Loki retention | 30 days (720h) |
Compose Files
There is no root compose.yml. Two compose files exist:
compose.luxnet.yml -- Lux Network Monitoring
The primary stack. Starts 8 services:
| Service | Image | Container Name | Purpose |
|---|---|---|---|
| luxd | ubuntu:22.04 | lux-node | Lux node (network-id 96369, ports 9630/9631/8546) |
| prometheus | prom/prometheus:latest | lux-prometheus | Metrics TSDB, 30d retention |
| grafana | grafana/grafana:latest | lux-grafana | Dashboard UI |
| loki | grafana/loki:latest | lux-loki | Log aggregation (TSDB schema v13) |
| promtail | grafana/promtail:latest | lux-promtail | Log shipping (Docker, luxd, syslog, Blockscout) |
| node-exporter | prom/node-exporter:latest | lux-node-exporter | System metrics |
| postgres-exporter | prometheuscommunity/postgres-exporter:latest | lux-postgres-exporter | PostgreSQL metrics |
| alertmanager | prom/alertmanager:latest | lux-alertmanager | Alert routing |
Networks: lux-monitoring (internal bridge), hanzo-network (external, shared with other Hanzo services).
Volumes: prometheus_data, grafana_data, loki_data, alertmanager_data, luxd_data.
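As a point of reference, a minimal service entry for the Prometheus piece of this stack might look like the sketch below. This is illustrative only; the actual compose.luxnet.yml may use different flags, mounts, and network attachments.

```yaml
# Sketch of a Prometheus service entry (illustrative, not the real file).
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: lux-prometheus
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.retention.time=30d   # matches the 30d TSDB retention above
    volumes:
      - ./prometheus:/etc/prometheus:ro
      - prometheus_data:/prometheus
    ports:
      - "9090:9090"
    networks:
      - lux-monitoring
```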
compose.dex.yml -- DEX Extension
Extends the main stack with DEX-specific services:
| Service | Container Name | Purpose |
|---|---|---|
| dex-exporter | lux-dex-exporter | Node exporter for DEX host (port 9101) |
| dex-metrics-aggregator | lux-dex-metrics | Collects DEX engine/consensus/HFT metrics |
| vector | lux-dex-vector | High-perf log aggregation (Vector, port 8686) |
| jaeger | lux-dex-jaeger | Distributed tracing (Jaeger UI port 16686) |
| cadvisor | lux-dex-cadvisor | Container metrics (port 8088) |
Additional Grafana plugins installed: grafana-piechart-panel, grafana-worldmap-panel.
Pre-configured Dashboards
13 dashboard JSON files across three directories:
Core Dashboards (grafana/dashboards/)
| File | Dashboard | Key Panels |
|---|---|---|
| c_chain.json | C-Chain | EVM block production, gas usage, tx throughput |
| c_chain_load.json | C-Chain Load | Block gas utilization, pending tx pool depth |
| p_chain.json | P-Chain | Validator set size, staking metrics, subnet creation |
| x_chain.json | X-Chain | UTXO transactions, asset creation rates |
| machine.json | Machine Metrics | CPU, memory, disk I/O, network bandwidth |
| database.json | Database | PostgreSQL connections, query perf, replication |
| logs.json | Logs | Structured log search via Loki datasource |
| main.json | Main Overview | Network-wide health summary, peer count |
| network.json | Network | P2P message latency, bandwidth in/out, warp delivery |
| subnets.json | Subnets | Subnet health, validator participation |
| lux-comprehensive.json | Comprehensive | All-in-one overview dashboard |
DEX Dashboard (grafana/dashboards/dex/)
| File | Dashboard |
|---|---|
| dex-performance.json | Order matching latency (p50/p95/p99/p999), throughput, DPDK/RDMA metrics |
MorpheusVM Dashboard (grafana/dashboards/morpheusvm/)
| File | Dashboard |
|---|---|
| performance.json | MorpheusVM execution performance |
Grafana Provisioning
Datasources (grafana/provisioning/datasources/datasources.yml)
| Name | Type | URL |
|---|---|---|
| Prometheus | prometheus | http://prometheus:9090 (default, POST method) |
| Loki | loki | http://loki:3100 (max 5000 lines) |
| Lux Node | prometheus | http://host.docker.internal:9650/ext/metrics |
| PostgreSQL | postgres | lux-postgres:5432, database explorer_luxnet |
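For orientation, the Prometheus and Loki entries from the table above would provision roughly as follows. This is a sketch following Grafana's provisioning schema; any fields beyond what the table states are assumptions.

```yaml
# Sketch of grafana/provisioning/datasources/datasources.yml
# (illustrative; only table-backed values are confirmed).
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    isDefault: true
    jsonData:
      httpMethod: POST
  - name: Loki
    type: loki
    url: http://loki:3100
    jsonData:
      maxLines: 5000
```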
Dashboard Provider (grafana/provisioning/dashboards/dashboards.yml)
Provider name: Lux Network Dashboards, folder: Lux Network, uid: lux-network. Auto-updates every 10s and allows UI edits.
Prometheus Scrape Configuration
Main Config (prometheus/prometheus.yml)
Global scrape interval: 15s. Evaluation interval: 15s.
| Job Name | Metrics Path | Target | Instance Label |
|---|---|---|---|
| prometheus | /metrics | localhost:9090 | -- |
| node | /metrics | node-exporter:9100 | -- |
| luxd | /ext/metrics | luxd:9630, host.docker.internal:9630 | lux-mainnet |
| c-chain | /ext/bc/C/metrics | luxd:9630 | c-chain-mainnet |
| c-chain-rpc | /ext/bc/C/rpc | luxd:9630 | c-chain-rpc |
| x-chain | /ext/bc/X/metrics | luxd:9630 | x-chain-mainnet |
| p-chain | /ext/bc/P/metrics | luxd:9630 | p-chain-mainnet |
| lux-health | /ext/health | luxd:9630 | lux-health |
| blockscout-lux | /metrics | host.docker.internal:4000 | blockscout-lux-mainnet |
| postgres | /metrics | postgres-exporter:9187 | -- |
| grafana | /metrics | grafana:3000 | -- |
| loki | /metrics | loki:3100 | -- |
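The luxd row translates to a scrape job along these lines. A sketch, not the literal file; label placement beyond the instance label in the table is an assumption.

```yaml
# Sketch of the luxd scrape job from prometheus/prometheus.yml.
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: luxd
    metrics_path: /ext/metrics
    static_configs:
      - targets: ['luxd:9630', 'host.docker.internal:9630']
        labels:
          instance: lux-mainnet
```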
DEX Config (prometheus/dex-prometheus.yml)
Adds 16 scrape jobs for DEX monitoring, with intervals as short as 100ms for HFT metrics. Representative jobs:
| Job | Scrape Interval | Target | Purpose |
|---|---|---|---|
| dex-engine | 15s | lux-dex-node:9090 | Order book engine |
| dex-matching | 1s | lux-dex-node:8080 | Match latency (HFT) |
| dex-websocket | 15s | lux-dex-node:8081 | WebSocket feed |
| dex-consensus | 15s | lux-dex-node:5000 | FPC consensus (K=1) |
| dex-quantum | 15s | lux-dex-node:8080 | Ringtail-BLS signatures |
| dex-hft | 100ms | lux-dex-node:8080 | Ultra-HF trading |
| dex-dpdk | 15s | lux-dex-node:8080 | Kernel bypass (DPDK) |
| dex-rdma | 15s | lux-dex-node:8080 | Zero-copy replication |
| dex-gpu | 15s | lux-dex-node:8080 | MLX/CUDA acceleration |
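Prometheus duration strings accept sub-second values, so the 100ms HFT job would look roughly like this sketch:

```yaml
# Sketch of the 100ms HFT scrape job; "100ms" is a valid
# Prometheus duration for scrape_interval.
scrape_configs:
  - job_name: dex-hft
    scrape_interval: 100ms
    static_configs:
      - targets: ['lux-dex-node:8080']
```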
DEX Recording Rules (prometheus/dex-rules.yml)
Evaluated every 5s:
| Rule | Expression |
|---|---|
| dex:matching_latency:p50 | histogram_quantile(0.50, rate(dex_matching_latency_bucket[1m])) |
| dex:matching_latency:p95 | histogram_quantile(0.95, ...) |
| dex:matching_latency:p99 | histogram_quantile(0.99, ...) |
| dex:matching_latency:p999 | histogram_quantile(0.999, ...) |
| dex:orders_per_second | rate(dex_orders_processed_total[1m]) |
| dex:trades_per_second | rate(dex_trades_executed_total[1m]) |
| dex:consensus_round_time | avg rate of dex_consensus_round_duration_seconds |
| dex:consensus_finality_time | histogram_quantile(0.95, rate(dex_consensus_finality_bucket[1m])) |
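In rule-file form, the 5s evaluation and a couple of the rules above would be expressed like this sketch (the group name is an assumption):

```yaml
# Sketch of a recording-rule group from prometheus/dex-rules.yml.
groups:
  - name: dex_latency
    interval: 5s
    rules:
      - record: dex:matching_latency:p99
        expr: histogram_quantile(0.99, rate(dex_matching_latency_bucket[1m]))
      - record: dex:orders_per_second
        expr: rate(dex_orders_processed_total[1m])
```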
Alert Rules
Chain Alerts (prometheus/alerts/chain-alerts.yml)
Evaluated every 30s:
| Alert | Expression | For | Severity |
|---|---|---|---|
| LuxNodeDown | up{job="luxd"} == 0 | 5m | critical |
| CChainNotSyncing | rate(lux_C_last_accepted_height[5m]) == 0 | 10m | warning |
| CChainHighRejectionRate | rejection > 10% of accepted | 5m | warning |
| PChainNotSyncing | rate(lux_P_last_accepted_height[5m]) == 0 | 10m | warning |
| XChainNotSyncing | rate(lux_X_last_accepted_height[5m]) == 0 | 10m | warning |
| LowPeerCount | lux_network_peers < 5 | 10m | warning |
| HighCPUUsage | process_cpu_seconds_total > 80% | 10m | warning |
| HighMemoryUsage | MemAvailable < 10% | 10m | critical |
| PostgresDown | up{job="postgres"} == 0 | 5m | critical |
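As a concrete instance, the LuxNodeDown row maps onto a Prometheus alerting rule like the following sketch (group name and annotation wording are assumptions):

```yaml
# Sketch of the LuxNodeDown alert from prometheus/alerts/chain-alerts.yml.
groups:
  - name: chain_alerts
    interval: 30s
    rules:
      - alert: LuxNodeDown
        expr: up{job="luxd"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Lux node has been unreachable for 5 minutes"
```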
DEX Alerts (prometheus/dex-rules.yml)
| Alert | Threshold | Severity |
|---|---|---|
| DEXHighLatency | p99 > 1ms | critical |
| DEXElevatedLatency | p95 > 500us | warning |
| DEXLowThroughput | < 1M trades/sec | warning |
| DEXConsensusFailure | consensus node down | critical |
| DEXHighMemoryUsage | > 90% memory | warning |
| DEXDatabaseDown | PostgreSQL down | critical |
| DEXWebSocketDrops | > 10 disconnects/sec | warning |
| DEXOrderBookDesync | sync errors > 0 | critical |
| DEXPacketDrops | > 100 DPDK drops/sec | warning |
| DEXGPUFailure | GPU unavailable | warning |
Alertmanager Configuration (alertmanager/alertmanager.yml)
- Route grouping by alertname, cluster, service
- Group wait: 10s, repeat interval: 12h
- Critical alerts route to the critical receiver (configure webhook/email)
- Inhibition: critical suppresses warning for the same alertname/instance
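Put together, those settings correspond to an Alertmanager config along these lines. A sketch: receiver bodies are placeholders, and the real file may use the newer matchers syntax instead of match.

```yaml
# Sketch of alertmanager/alertmanager.yml reflecting the settings above.
route:
  group_by: [alertname, cluster, service]
  group_wait: 10s
  repeat_interval: 12h
  receiver: default
  routes:
    - match:
        severity: critical
      receiver: critical
receivers:
  - name: default
  - name: critical   # add webhook_configs / email_configs here
inhibit_rules:
  - source_match: {severity: critical}
    target_match: {severity: warning}
    equal: [alertname, instance]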
Loki Configuration
- Schema: TSDB v13 with filesystem storage
- Ingestion rate: 16 MB/s (burst 32 MB/s)
- Per-stream rate limit: 2 MB/s (burst 4 MB/s)
- Max entries per query: 5000
- Reject samples older than 168h (7 days)
- Retention: 720h (30 days)
- Query cache: 100 MB embedded cache
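The ingestion and retention limits above live in Loki's limits_config block; field names below follow Loki's config schema, though the exact file layout may differ:

```yaml
# Sketch of the Loki limits implied by the bullets above.
limits_config:
  ingestion_rate_mb: 16
  ingestion_burst_size_mb: 32
  per_stream_rate_limit: 2MB
  per_stream_rate_limit_burst: 4MB
  max_entries_limit_per_query: 5000
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  retention_period: 720h
```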
Promtail Log Sources
| Job | Path Pattern | Pipeline |
|---|---|---|
| docker | /var/lib/docker/containers/*/*log | JSON parse, container name extraction |
| luxd | /lux-logs/network_current/node*/logs/*.log | Multiline, regex for timestamp/level/logger |
| syslog | /var/log/syslog | Regex for hostname/program/pid |
| blockscout | Docker container logs | JSON parse, stream filtering |
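The luxd job's multiline-plus-regex pipeline would be shaped roughly like this sketch; the timestamp format in the regexes is an assumption about luxd's log layout, not taken from the actual config.

```yaml
# Sketch of the luxd promtail job (regex patterns are assumed formats).
scrape_configs:
  - job_name: luxd
    static_configs:
      - targets: [localhost]
        labels:
          job: luxd
          __path__: /lux-logs/network_current/node*/logs/*.log
    pipeline_stages:
      - multiline:
          firstline: '^\[\d{4}-\d{2}-\d{2}'   # a new entry starts with a timestamp
      - regex:
          expression: '^\[(?P<timestamp>[^\]]+)\]\s+(?P<level>\w+)\s+(?P<logger>\S+)'
      - labels:
          level:
```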
Bare-metal Installer
grafana/monitoring-installer.sh -- 5-step installer for Ubuntu validators:
1. Install Prometheus (systemd service)
2. Install Grafana (systemd service)
3. Install node_exporter (systemd service)
4. Install Lux Grafana dashboards
5. Install additional dashboards (optional)
Supports both amd64 and arm64 architectures. Run without arguments to download latest dashboards only.
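The amd64/arm64 support comes down to mapping `uname -m` onto the architecture suffix used by upstream release tarballs. A minimal sketch of that logic (variable names and the version number are illustrative, not taken from the script):

```shell
#!/bin/sh
# Map the kernel's machine name to the arch suffix used by
# Prometheus/Grafana/node_exporter release tarballs (sketch).
machine=$(uname -m)
case "$machine" in
  x86_64)        arch=amd64 ;;
  aarch64|arm64) arch=arm64 ;;
  *) echo "unsupported architecture: $machine" >&2; exit 1 ;;
esac
# Example tarball name built from the detected arch (version illustrative).
echo "node_exporter-1.8.2.linux-${arch}.tar.gz"
```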
Quickstart
```bash
git clone https://github.com/luxfi/monitoring.git
cd monitoring

# Create the external network if needed
docker network create hanzo-network

# Start the full Lux monitoring stack
./start-monitoring.sh
# Or directly:
docker compose -f compose.luxnet.yml up -d

# Start with the DEX monitoring extension
docker compose -f compose.luxnet.yml -f compose.dex.yml up -d

# Access points:
# Grafana:      http://localhost:3100 (admin/luxnetwork)
# Prometheus:   http://localhost:9090
# Loki:         http://localhost:3101
# Alertmanager: http://localhost:9093
# Jaeger (DEX): http://localhost:16686
```
Production Deployment
deploy-monitor-nginx.sh configures an nginx reverse proxy for monitor.lux.network:
- Symlinks the nginx config into /etc/nginx/sites-enabled/
- Tests and reloads nginx
- Requires a Cloudflare DNS A record pointing to the server
- Public URL: https://monitor.lux.network
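The proxied server block would look roughly like the sketch below, forwarding to Grafana on the host port from the compose file. Paths, TLS termination (likely handled by Cloudflare here), and header details are assumptions.

```nginx
# Sketch of the server block deploy-monitor-nginx.sh symlinks into sites-enabled.
server {
    listen 80;
    server_name monitor.lux.network;

    location / {
        proxy_pass http://127.0.0.1:3100;   # Grafana host port
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        # WebSocket support for Grafana live features
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```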
Related Skills
- lux/lux-node.md -- Node that exports metrics at /ext/metrics
- lux/lux-universe.md -- Production K8s infrastructure