Lux Docs

Overview

LuxCPP is a multi-repo C++ workspace providing GPU-accelerated compute for blockchain, cryptography, FHE, ML, and DEX workloads. The workspace at ~/work/luxcpp/ contains 15+ sub-repos under the github.com/luxcpp/* GitHub organization, with public-facing packages referenced as github.com/luxfi/* where applicable. The architecture is plugin-based: a core GPU library provides a stable C ABI, and backend plugins (Metal, CUDA, WebGPU) are built separately and loaded at runtime.

Tech Stack

Layer	Technology
Language	C++17 (core), C++23 (accel SDK), Objective-C++ (.mm for Metal)
GPU Backends	Metal (Apple Silicon), CUDA 12.0+ (NVIDIA), WebGPU/Dawn (cross-platform), CPU/SIMD (fallback)
Shader Languages	Metal Shading Language (.metal), CUDA (.cu), WGSL (.wgsl)
Build System	CMake 3.20+, Conan 2.x (package manager)
FHE Base	OpenFHE fork (TFHE/CGGI, CKKS, BGV/BFV)
HTTP Framework	Drogon (fork as `luxcpp/http`)
RPC	gRPC 1.62 (fork as `luxcpp/grpc`)
Crypto	BLS12-381, BN254, secp256k1, ML-DSA, ML-KEM, FROST, Poseidon2, BLAKE3, KZG
Testing	GoogleTest, doctest, CTest
License	BSD-3-Clause-Eco (open), Proprietary (CUDA kernels)

Key Dependencies

System: OpenSSL 3.2, zlib, brotli, OpenMP (optional)
Conan packages: nlohmann_json 3.11, fmt 10.2, spdlog 1.12, cereal 1.3, Boost 1.84 (DEX), GTest 1.14
GPU: Metal.framework (macOS), CUDA Toolkit 12.0+ (NVIDIA), Dawn/wgpu-native (WebGPU)
FHE: OpenMP (optional parallelization)

When to use

Building or modifying C++ GPU kernels for blockchain/crypto operations
Adding new GPU-accelerated operations (tensor, crypto, FHE, ZK)
Working on the plugin-based backend architecture
Building Go/Rust/Python FFI bindings against libluxaccel or libluxgpu
Optimizing NTT, MSM, TFHE bootstrap, or other crypto primitives on GPU
Adding a new backend plugin (e.g., ROCm, Vulkan)
Working on the DEX matching engine
Modifying the session/storage server for the Session network

Hard requirements

CMake 3.20+ for all sub-repos
C++17 minimum (C++23 for lux-accel SDK)
macOS 12+ for Metal backend (Apple Silicon recommended)
CUDA Toolkit 12.0+ and Compute Capability 7.0+ (Volta+) for CUDA backend
Dawn or wgpu-native for WebGPU backend
All repos use github.com/luxcpp/* remotes (NOT github.com/luxfi/* for the C++ repos)
Public C API header is <lux/gpu.h> -- stable ABI, never break it
Backend plugins export exactly one symbol: lux_gpu_backend_init
Plugin naming: libluxgpu_backend_<name>.{so,dylib,dll}
Backend ABI version must match: LUX_GPU_BACKEND_ABI_VERSION (currently 2)

Quick reference

Repository Map

Sub-repo	Remote	Purpose
`gpu/`	`luxcpp/gpu`	Core library: plugin loader, CPU backend, C API (`libluxgpu`)
`metal/`	`luxcpp/metal`	Metal backend plugin (Apple Silicon, MLX integration)
`cuda/`	`luxcpp/cuda`	CUDA backend plugin (NVIDIA, proprietary license)
`webgpu/`	`luxcpp/webgpu`	WebGPU backend plugin (Dawn/wgpu, cross-platform)
`crypto/`	`luxcpp/crypto`	BLS12-381, ML-DSA, ML-KEM, secp256k1
`lattice/`	`luxcpp/lattice`	NTT, polynomial rings, Gaussian sampling, Go bindings
`fhe/`	`luxcpp/fhe`	OpenFHE fork: TFHE, CKKS, BGV, threshold FHE, Go bindings
`lux-accel/`	`luxcpp/accel`	Unified SDK: session-based API, C ABI for Go/FFI (`libluxaccel`)
`lux-gpu/`	`luxcpp/lux-gpu`	C++ GPU SDK (device, buffer, kernel, registry abstractions)
`lux-metal/`	(subdir)	Metal plugin wrapper (Conan package)
`lux-cuda/`	(subdir)	CUDA plugin wrapper (Conan package)
`lux-webgpu/`	(subdir)	WebGPU plugin wrapper (Conan package)
`dex/`	`luxcpp/dex`	Order book matching engine (lock-free, sub-microsecond)
`consensus/`	(subdir)	Consensus acceleration kernels
`session/`	`luxcpp/session`	Session network storage server (PQ crypto, HTTPS/QUIC)
`http/`	`luxcpp/http`	Drogon HTTP framework fork
`grpc/`	`luxcpp/grpc`	gRPC fork
`mlx-c-api/`	(subdir)	MLX C API bridge (`liblux_gpu_api.dylib`)

Dependency Hierarchy

gpu/              Foundation (plugin loader, CPU backend, C ABI)
  |
  +-- metal/      Metal backend plugin (macOS)
  +-- cuda/       CUDA backend plugin (NVIDIA)
  +-- webgpu/     WebGPU backend plugin (cross-platform)
  |
crypto/           BLS pairings, post-quantum (depends on gpu optionally)
lattice/          NTT acceleration (depends on gpu)
  |
fhe/              TFHE/CKKS/BGV (depends on crypto + lattice)
  |
lux-accel/        Unified SDK: session + tensor + ml + crypto + zk + lattice + fhe + dex

Build (any sub-repo)

cd ~/work/luxcpp/<repo>
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)
ctest --test-dir build --output-on-failure
cmake --install build --prefix /usr/local

Build with Conan (full workspace)

cd ~/work/luxcpp
conan install . --output-folder=build --build=missing
cd build && cmake .. -DCMAKE_TOOLCHAIN_FILE=conan_toolchain.cmake -DCMAKE_BUILD_TYPE=Release
cmake --build . -j$(nproc)

Consume in CMake

find_package(lux-gpu REQUIRED)
target_link_libraries(myapp PRIVATE lux::gpu)

# Or with the unified SDK
find_package(lux-accel REQUIRED)
target_link_libraries(myapp PRIVATE lux::accel)

Consume via pkg-config (for CGO)

export CGO_CFLAGS="$(pkg-config --cflags lux-gpu)"
export CGO_LDFLAGS="$(pkg-config --libs lux-gpu)"
CGO_ENABLED=1 go build ./...

One-file quickstart

// quickstart.c - GPU-accelerated tensor operations
// Build: cc -o quickstart quickstart.c -lluxgpu -ldl
#include <lux/gpu.h>
#include <stdint.h>  // int64_t, uint64_t
#include <stdio.h>

int main(void) {
    LuxGPU* gpu = lux_gpu_create();  // auto-detect best backend
    if (!gpu) { fprintf(stderr, "No GPU backend\n"); return 1; }

    printf("Backend: %s\n", lux_gpu_backend_name(gpu));

    // Tensor operations
    int64_t shape[] = {4, 4};
    LuxTensor* a = lux_tensor_ones(gpu, shape, 2, LUX_FLOAT32);
    LuxTensor* b = lux_tensor_full(gpu, shape, 2, LUX_FLOAT32, 2.0);
    LuxTensor* c = lux_tensor_matmul(gpu, a, b);

    lux_gpu_sync(gpu);

    float result[16];
    lux_tensor_to_host(c, result, sizeof(result));
    printf("c[0] = %.1f (expect 8.0)\n", result[0]);

    // Crypto: Poseidon2 hash
    uint64_t inputs[4] = {1, 2, 3, 4};
    uint64_t outputs[2];
    lux_poseidon2_hash(gpu, inputs, outputs, 2, 2);

    // NTT
    uint64_t poly[4] = {1, 2, 3, 4};
    lux_ntt_forward(gpu, poly, 4, 0xFFFFFFFF00000001ULL);

    lux_tensor_destroy(c);
    lux_tensor_destroy(b);
    lux_tensor_destroy(a);
    lux_gpu_destroy(gpu);
    return 0;
}
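
When checking the NTT call in the quickstart, a slow CPU reference can help. The sketch below is a naive O(n^2) cyclic NTT over the same modulus used above (0xFFFFFFFF00000001, the Goldilocks prime). It is not the library's implementation: the helper names are hypothetical, it assumes 7 generates the multiplicative group mod p, it relies on the GCC/Clang `unsigned __int128` extension, and the ordering or negacyclic convention that lux_ntt_forward actually uses is not stated in this document, so outputs may differ by a permutation or twist.

```cpp
// Reference cyclic NTT over the Goldilocks prime p = 2^64 - 2^32 + 1.
// O(n^2) on purpose: intended for checking small results, not for speed.
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint64_t kGoldilocks = 0xFFFFFFFF00000001ULL;

// (a * b) mod p using a 128-bit intermediate (GCC/Clang extension).
inline std::uint64_t mul_mod(std::uint64_t a, std::uint64_t b) {
    return static_cast<std::uint64_t>(
        (static_cast<unsigned __int128>(a) * b) % kGoldilocks);
}

// (a + b) mod p without 64-bit overflow.
inline std::uint64_t add_mod(std::uint64_t a, std::uint64_t b) {
    return static_cast<std::uint64_t>(
        (static_cast<unsigned __int128>(a) + b) % kGoldilocks);
}

// base^exp mod p by square-and-multiply.
inline std::uint64_t pow_mod(std::uint64_t base, std::uint64_t exp) {
    std::uint64_t result = 1;
    while (exp != 0) {
        if (exp & 1) result = mul_mod(result, base);
        base = mul_mod(base, base);
        exp >>= 1;
    }
    return result;
}

// Primitive n-th root of unity; n must be a nonzero power of two (<= 2^32).
// Assumption: 7 generates the multiplicative group mod p.
inline std::uint64_t root_of_unity(std::size_t n) {
    return pow_mod(7, (kGoldilocks - 1) / n);
}

// out[k] = sum_j in[j] * w^(j*k) mod p.
std::vector<std::uint64_t> ntt_naive(const std::vector<std::uint64_t>& in,
                                     std::uint64_t w) {
    const std::size_t n = in.size();
    std::vector<std::uint64_t> out(n, 0);
    for (std::size_t k = 0; k < n; ++k) {
        const std::uint64_t wk = pow_mod(w, k);
        std::uint64_t term = 1;  // holds w^(j*k), advanced once per j
        for (std::size_t j = 0; j < n; ++j) {
            out[k] = add_mod(out[k], mul_mod(in[j] % kGoldilocks, term));
            term = mul_mod(term, wk);
        }
    }
    return out;
}

std::vector<std::uint64_t> ntt_forward_ref(const std::vector<std::uint64_t>& in) {
    return ntt_naive(in, root_of_unity(in.size()));
}

// Inverse: transform with w^-1, then scale by n^-1 (both via Fermat inverses).
std::vector<std::uint64_t> ntt_inverse_ref(const std::vector<std::uint64_t>& in) {
    const std::uint64_t w_inv = pow_mod(root_of_unity(in.size()), kGoldilocks - 2);
    const std::uint64_t n_inv = pow_mod(in.size(), kGoldilocks - 2);
    std::vector<std::uint64_t> out = ntt_naive(in, w_inv);
    for (auto& x : out) x = mul_mod(x, n_inv);
    return out;
}
```

A round trip (ntt_inverse_ref of ntt_forward_ref) should return the input unchanged, and the first forward coefficient is always the sum of the inputs mod p, which gives two cheap sanity checks for a GPU result.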

Core Concepts

Plugin Architecture

The core library (gpu/) defines a stable C ABI via backend_plugin.h. Each backend is a shared library exporting lux_gpu_backend_init, which returns a vtable of function pointers. The core dispatches all operations through this vtable.
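
The dispatch pattern described above can be sketched roughly as follows. This is an illustration only: the struct layout, field names, and the demo_backend_init / stale_backend_init functions are hypothetical, and the real definitions live in backend_plugin.h. The "plugin" is defined in-process so the sketch runs without dlopen; a real plugin would export its entry point with C linkage under the name lux_gpu_backend_init.

```cpp
// Sketch of vtable-based backend dispatch with an ABI version check.
// All type and function names here are hypothetical stand-ins.
#include <cstddef>
#include <cstdint>

constexpr std::uint32_t kAbiVersion = 2;  // mirrors LUX_GPU_BACKEND_ABI_VERSION

// Table of function pointers that a backend hands to the core.
struct BackendVTable {
    std::uint32_t abi_version;
    const char* name;
    void (*add_f32)(const float* a, const float* b, float* out, std::size_t n);
};

// Signature of the single entry point each plugin provides.
using BackendInitFn = const BackendVTable* (*)();

// A trivial CPU "backend" implementing the one operation in the table.
static void cpu_add_f32(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

static const BackendVTable kCpuVTable{kAbiVersion, "cpu", cpu_add_f32};
static const BackendVTable kStaleVTable{kAbiVersion - 1, "stale", cpu_add_f32};

const BackendVTable* demo_backend_init() { return &kCpuVTable; }
const BackendVTable* stale_backend_init() { return &kStaleVTable; }

// Core side: call the entry point and reject mismatched ABI versions.
const BackendVTable* load_backend(BackendInitFn init) {
    if (init == nullptr) return nullptr;
    const BackendVTable* vt = init();
    if (vt == nullptr || vt->abi_version != kAbiVersion) return nullptr;
    return vt;
}
```

Once load_backend returns a non-null table, every operation goes through its function pointers (for example vt->add_f32(a, b, out, n)), so the core never links against a backend directly.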

Runtime backend selection (priority order):

LUX_BACKEND env var: metal, cuda, webgpu, cpu
Explicit API: lux_gpu_create_with_backend(LUX_BACKEND_METAL)
Auto-detect: CUDA > Metal > WebGPU > CPU
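
The priority order above could be expressed as a small pure function. This is a sketch, not the loader's actual code: select_backend and env_backend are hypothetical names, and the fall-through behavior when a requested backend is unavailable is an assumption, since the document does not say whether the real loader falls back or fails.

```cpp
// Sketch of the backend selection order: env var, explicit request, auto-detect.
#include <cstdlib>
#include <initializer_list>
#include <optional>
#include <set>
#include <string>

// Reads LUX_BACKEND; an unset or empty variable counts as "not specified".
std::optional<std::string> env_backend() {
    const char* v = std::getenv("LUX_BACKEND");
    if (v == nullptr || *v == '\0') return std::nullopt;
    return std::string(v);
}

// Returns the backend to use, or "" if nothing in `available` qualifies.
std::string select_backend(const std::optional<std::string>& env_value,
                           const std::optional<std::string>& explicit_request,
                           const std::set<std::string>& available) {
    // 1. LUX_BACKEND environment variable.
    if (env_value && available.count(*env_value)) return *env_value;
    // 2. Backend requested explicitly through the API.
    if (explicit_request && available.count(*explicit_request)) return *explicit_request;
    // 3. Auto-detect: CUDA > Metal > WebGPU > CPU.
    for (const char* name : {"cuda", "metal", "webgpu", "cpu"}) {
        if (available.count(name)) return name;
    }
    return "";
}
```

Typical use would be select_backend(env_backend(), requested, discovered), where `discovered` is the set of plugins that loaded successfully.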

Plugin search paths:

LUX_GPU_BACKEND_PATH environment variable
System library paths (/usr/lib/lux-gpu/, etc.)
Relative to executable
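
Combining the search locations with the plugin naming rule (libluxgpu_backend_<name> plus the platform extension), candidate paths might be built as below. The function names are hypothetical, only /usr/lib/lux-gpu/ is taken from the list above as the system location, and treating the list as a strict search order is an assumption.

```cpp
// Sketch: build candidate plugin paths for one backend name.
#include <string>
#include <vector>

// Shared-library extension for the platform this code is compiled on.
inline const char* plugin_extension() {
#if defined(_WIN32)
    return "dll";
#elif defined(__APPLE__)
    return "dylib";
#else
    return "so";
#endif
}

// libluxgpu_backend_<name>.<ext>
std::string plugin_filename(const std::string& backend) {
    return "libluxgpu_backend_" + backend + "." + plugin_extension();
}

// Candidates in the order listed above: env path, system path, executable dir.
// `env_path` is the value of LUX_GPU_BACKEND_PATH ("" if unset).
std::vector<std::string> plugin_candidates(const std::string& backend,
                                           const std::string& env_path,
                                           const std::string& exe_dir) {
    const std::string file = plugin_filename(backend);
    std::vector<std::string> out;
    if (!env_path.empty()) out.push_back(env_path + "/" + file);
    out.push_back("/usr/lib/lux-gpu/" + file);
    out.push_back(exe_dir + "/" + file);
    return out;
}
```

A loader would then try each candidate in turn and stop at the first one that opens and passes the ABI check.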

Capability flags on the vtable descriptor indicate what a backend supports:

LUX_CAP_TENSOR_OPS | LUX_CAP_MATMUL | LUX_CAP_NTT | LUX_CAP_MSM |
LUX_CAP_FHE | LUX_CAP_TFHE | LUX_CAP_BLS12_381 | LUX_CAP_BN254 |
LUX_CAP_KZG | LUX_CAP_POSEIDON2 | LUX_CAP_BLAKE3 | ...
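
Capability flags like these are typically tested with a bitmask check before dispatching an operation. The sketch below uses hypothetical DEMO_CAP_* constants with made-up bit positions; the real LUX_CAP_* values come from the library headers and are not reproduced here.

```cpp
// Sketch: checking whether a backend advertises every required capability.
#include <cstdint>

// Hypothetical bit assignments, for illustration only.
constexpr std::uint64_t DEMO_CAP_TENSOR_OPS = 1ULL << 0;
constexpr std::uint64_t DEMO_CAP_MATMUL     = 1ULL << 1;
constexpr std::uint64_t DEMO_CAP_NTT        = 1ULL << 2;
constexpr std::uint64_t DEMO_CAP_MSM        = 1ULL << 3;

// True only if every bit in `required` is also set in `supported`.
constexpr bool has_caps(std::uint64_t supported, std::uint64_t required) {
    return (supported & required) == required;
}
```

Comparing against `required` (rather than testing for a non-zero result) matters when several flags are requested at once: a backend with NTT but without MSM should fail a check for both.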

GPU Kernel Categories

All three GPU backends (Metal, CUDA, WebGPU) implement matching kernel sets:

Category	Kernels	Notes
gpu/	binary, unary, reduce, softmax, layer_norm, rms_norm, gemv, conv, fft, rope, attention, scan, sort	ML tensor operations
crypto/	bls12_381, bn254, secp256k1, msm, kzg, poseidon, blake3, goldilocks, frost_*, mldsa_verify, corona_*, shamir	Cryptographic primitives
fhe/	ntt_*, tfhe_bootstrap, tfhe_keyswitch, blind_rotate, external_product, scheme_switch	FHE operations
lattice/	modular, ntt_negacyclic, poly_arithmetic, twiddle	Lattice crypto acceleration
steel/	steel_gemm_*, steel_attention_*, steel_conv_*	High-perf GEMM/attention/conv
zk/	poseidon2, merkle	ZK-specific hash and tree

lux-accel SDK (Unified FFI Layer)

lux-accel/ is the single shared library for Go/FFI consumers. It statically embeds libluxgpu and exports only C ABI functions via <lux/accel/c_api.h>. The C++ SDK (accel.hpp) provides a session-based API with typed accessors:

auto session = lux::accel::Session::create();
session->ml().matmul(a, b, c);
session->crypto().blsVerify(sig, msg, pubkey);
session->fhe().nttForward(poly, modulus);
session->zk().merkleRoot(leaves);
session->dex().matchOrders(bids, asks);

FHE (OpenFHE Fork)

The fhe/ sub-repo is a fork of OpenFHE with Lux GPU extensions. Supports:

TFHE/CGGI: Boolean circuits, ~10ms bootstrap per gate
CKKS: Approximate arithmetic on reals
BGV/BFV: Exact integer arithmetic
Threshold FHE: Distributed decryption (Shamir-based)
Go bindings: github.com/luxfi/fhe/go (tfhe, ckks, threshold packages)
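
For readers unfamiliar with the "Shamir-based" part of threshold FHE, the underlying secret-sharing step can be sketched as follows. This is plain Shamir sharing over a small prime field, not the fhe/ implementation: a real threshold scheme combines partial decryptions rather than reconstructing the key, coefficients must be drawn from a cryptographic RNG, and the field (2^61 - 1 here) and all function names are assumptions chosen for illustration. It relies on the GCC/Clang `unsigned __int128` extension.

```cpp
// Sketch: Shamir secret sharing over the prime field with p = 2^61 - 1.
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

constexpr std::uint64_t kP = (1ULL << 61) - 1;  // Mersenne prime

inline std::uint64_t mulm(std::uint64_t a, std::uint64_t b) {
    return static_cast<std::uint64_t>((static_cast<unsigned __int128>(a) * b) % kP);
}
// Inputs are assumed reduced (< kP), so these sums cannot overflow 64 bits.
inline std::uint64_t addm(std::uint64_t a, std::uint64_t b) { return (a + b) % kP; }
inline std::uint64_t subm(std::uint64_t a, std::uint64_t b) { return (a + kP - b) % kP; }

inline std::uint64_t powm(std::uint64_t base, std::uint64_t exp) {
    std::uint64_t r = 1;
    for (; exp != 0; exp >>= 1) {
        if (exp & 1) r = mulm(r, base);
        base = mulm(base, base);
    }
    return r;
}
inline std::uint64_t invm(std::uint64_t a) { return powm(a, kP - 2); }  // Fermat

using Share = std::pair<std::uint64_t, std::uint64_t>;  // (x, f(x))

// coeffs[0] is the secret; coeffs.size() is the threshold t.
// Shares are f(1), ..., f(n), evaluated with Horner's rule.
std::vector<Share> shamir_split(const std::vector<std::uint64_t>& coeffs, std::size_t n) {
    std::vector<Share> shares;
    for (std::uint64_t x = 1; x <= n; ++x) {
        std::uint64_t y = 0;
        for (std::size_t i = coeffs.size(); i-- > 0;) {
            y = addm(mulm(y, x), coeffs[i] % kP);
        }
        shares.emplace_back(x, y);
    }
    return shares;
}

// Lagrange interpolation at x = 0; needs at least t shares with distinct x.
std::uint64_t shamir_reconstruct(const std::vector<Share>& shares) {
    std::uint64_t secret = 0;
    for (std::size_t i = 0; i < shares.size(); ++i) {
        std::uint64_t num = 1, den = 1;
        for (std::size_t j = 0; j < shares.size(); ++j) {
            if (j == i) continue;
            num = mulm(num, shares[j].first);                          // x_j
            den = mulm(den, subm(shares[j].first, shares[i].first));  // x_j - x_i
        }
        secret = addm(secret, mulm(shares[i].second, mulm(num, invm(den))));
    }
    return secret;
}
```

With a degree-2 polynomial (threshold 3), any three of the shares recover coeffs[0], while two shares interpolate a different polynomial and yield an unrelated value.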

CUDA Licensing

CUDA kernels (cuda/) are proprietary and require commercial licensing:

Ecosystem tier: included with Lux Network staking
Developer tier: $999/month
Enterprise tier: custom pricing

Performance claims on A100: NTT 2^24 in 4.2ms, MSM 2^24 in 1.8s.

Session Storage Server

session/ is a C++ storage server for the Lux Session network (encrypted messaging). Uses:

Post-quantum crypto (ML-KEM-768, ML-DSA-65) via luxcpp/crypto
GPU acceleration for batch crypto
HTTPS/QUIC serving
CGO bindings for Go integration (libsession_cgo.a)

DEX Matching Engine

dex/ is a sub-microsecond order book engine with:

Price-time priority (FIFO)
Lock-free data structures (Boost)
C/C++/Go bindings
GPU-accelerated AMM swap computation
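
Price-time priority means the best price matches first and, within one price level, the oldest order matches first. The sketch below shows that rule for an incoming buy against resting asks. It is a single-threaded illustration using std::map and std::deque, not the engine's lock-free design, and the Order, Fill, and AskBook types are hypothetical.

```cpp
// Sketch: price-time (FIFO) matching of a buy order against resting asks.
#include <algorithm>
#include <cstdint>
#include <deque>
#include <map>
#include <vector>

struct Order {
    std::uint64_t id;
    std::int64_t price;   // integer ticks
    std::uint64_t qty;
};

struct Fill {
    std::uint64_t maker_id;
    std::uint64_t taker_id;
    std::int64_t price;   // trades execute at the resting (maker) price
    std::uint64_t qty;
};

class AskBook {
public:
    // Orders at the same price queue in arrival order.
    void add(const Order& o) { levels_[o.price].push_back(o); }

    // Lowest ask first; FIFO within a level; stop at the taker's limit price.
    std::vector<Fill> match_buy(Order taker) {
        std::vector<Fill> fills;
        auto it = levels_.begin();
        while (taker.qty > 0 && it != levels_.end() && it->first <= taker.price) {
            std::deque<Order>& queue = it->second;
            while (taker.qty > 0 && !queue.empty()) {
                Order& maker = queue.front();
                const std::uint64_t q = std::min(maker.qty, taker.qty);
                fills.push_back({maker.id, taker.id, maker.price, q});
                maker.qty -= q;
                taker.qty -= q;
                if (maker.qty == 0) queue.pop_front();
            }
            if (queue.empty()) {
                it = levels_.erase(it);  // level exhausted, move to next price
            } else {
                ++it;                    // taker is filled; loop will exit
            }
        }
        return fills;
    }

private:
    std::map<std::int64_t, std::deque<Order>> levels_;  // ascending price
};
```

For example, with asks resting at 100 (ids 2 then 3) and 101 (id 1), a buy for 12 units at limit 101 fills id 2, then id 3, then part of id 1, even though id 1 arrived first, because price takes precedence over time.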

Naming Conventions

Item	Convention	Example
CMake package	`lux-<pkg>`	`lux-gpu`, `lux-accel`
CMake target	`lux::<pkg>`	`lux::gpu`, `lux::accel`
Library file	`liblux<pkg>`	`libluxgpu.dylib`, `libluxaccel.so`
Header path	`lux/<pkg>/`	`#include <lux/gpu.h>`, `#include <lux/accel/c_api.h>`
Plugin file	`libluxgpu_backend_<name>`	`libluxgpu_backend_metal.dylib`
pkg-config	`lux-<pkg>`	`pkg-config --libs lux-gpu`

Conan Build Options

Key options in the top-level conanfile.py:

Option	Default	Description
`with_gpu`	ON	GPU compute library
`with_metal`	ON (macOS)	Metal backend
`with_cuda`	OFF	CUDA backend (requires license)
`with_webgpu`	ON	WebGPU/Dawn backend
`with_fhe`	ON	Fully homomorphic encryption
`with_crypto`	ON	Cryptographic primitives
`with_lattice`	ON	Lattice-based crypto
`with_http`	ON	HTTP/REST framework
`with_grpc`	OFF	gRPC (large dependency)
`with_dex`	ON	DEX matching engine
`embed_kernels`	ON	Embed shader sources in binary

Troubleshooting

No GPU backend detected

# Point the loader at the directory containing the backend plugins
export LUX_GPU_BACKEND_PATH=/usr/local/lib/lux-gpu
# Or force CPU fallback
export LUX_BACKEND=cpu

Metal backend not loading

Ensure macOS 12+ and the Xcode Command Line Tools are installed
Build the metal plugin separately: cd ~/work/luxcpp/metal && cmake -B build && cmake --build build
Install to plugin path: cmake --install build --prefix /usr/local

CUDA build fails

Requires CUDA Toolkit 12.0+ and nvcc in PATH
Compute Capability 7.0+ (Volta/Turing/Ampere/Hopper)
Private repo requires license and NDA

CMake can't find lux-gpu

# Set CMAKE_PREFIX_PATH to where lux-gpu is installed
cmake -B build -DCMAKE_PREFIX_PATH=/usr/local
# Or use pkg-config
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

FHE build slow

FHE (OpenFHE) is a large codebase; use -j$(nproc) for parallel builds
Disable tests with -DBUILD_UNITTESTS=OFF for faster iteration

CGO linking errors

# Ensure libluxaccel is installed and discoverable
export CGO_CFLAGS="-I/usr/local/include"
export CGO_LDFLAGS="-L/usr/local/lib -lluxaccel"
CGO_ENABLED=1 go build ./...

ABI version mismatch

Backend plugins must match LUX_GPU_BACKEND_ABI_VERSION (currently 2)
Rebuild all plugins when core ABI changes
To confirm, look for "ABI version mismatch" in the core's log output when a plugin fails to load

Related

lux/lux-gpu.md -- Go bindings for GPU acceleration (consumes libluxaccel)
lux/lux-fhe.md -- FHE Go package (consumes luxcpp/fhe C++ library)
lux/lux-crypto.md -- Go crypto package (BLS, ML-DSA, ML-KEM)
lux/lux-lattice.md -- Go lattice crypto (NTT, polynomial rings)
lux/lux-dex.md -- DEX matching engine details
lux/lux-session.md -- Session network and storage server
