Lux Accel - C++ GPU/Crypto/FHE Acceleration Libraries
Overview
LuxCPP is a multi-repository C++ workspace providing GPU-accelerated compute for blockchain, cryptography, FHE, ML, and DEX workloads. The workspace at ~/work/luxcpp/ contains 15+ sub-repos under the github.com/luxcpp/* GitHub organization; public-facing packages are referenced as github.com/luxfi/* where applicable. The architecture is plugin-based: a core GPU library provides a stable C ABI, and backend plugins (Metal, CUDA, WebGPU) are built separately and loaded at runtime.
Tech Stack
| Layer | Technology |
|---|---|
| Language | C++17 (core), C++23 (accel SDK), Objective-C++ (.mm for Metal) |
| GPU Backends | Metal (Apple Silicon), CUDA 12.0+ (NVIDIA), WebGPU/Dawn (cross-platform), CPU/SIMD (fallback) |
| Shader Languages | Metal Shading Language (.metal), CUDA (.cu), WGSL (.wgsl) |
| Build System | CMake 3.20+, Conan 2.x (package manager) |
| FHE Base | OpenFHE fork (TFHE/CGGI, CKKS, BGV/BFV) |
| HTTP Framework | Drogon (fork as luxcpp/http) |
| RPC | gRPC 1.62 (fork as luxcpp/grpc) |
| Crypto | BLS12-381, BN254, secp256k1, ML-DSA, ML-KEM, FROST, Poseidon2, BLAKE3, KZG |
| Testing | GoogleTest, doctest, CTest |
| License | BSD-3-Clause-Eco (open), Proprietary (CUDA kernels) |
Key Dependencies
- System: OpenSSL 3.2, zlib, brotli, OpenMP (optional)
- Conan packages: nlohmann_json 3.11, fmt 10.2, spdlog 1.12, cereal 1.3, Boost 1.84 (DEX), GTest 1.14
- GPU: Metal.framework (macOS), CUDA Toolkit 12.0+ (NVIDIA), Dawn/wgpu-native (WebGPU)
- FHE: OpenMP (optional parallelization)
When to use
- Building or modifying C++ GPU kernels for blockchain/crypto operations
- Adding new GPU-accelerated operations (tensor, crypto, FHE, ZK)
- Working on the plugin-based backend architecture
- Building Go/Rust/Python FFI bindings against `libluxaccel` or `libluxgpu`
- Optimizing NTT, MSM, TFHE bootstrap, or other crypto primitives on GPU
- Adding a new backend plugin (e.g., ROCm, Vulkan)
- Working on the DEX matching engine
- Modifying the session/storage server for the Session network
Hard requirements
- CMake 3.20+ for all sub-repos
- C++17 minimum (C++23 for lux-accel SDK)
- macOS 12+ for Metal backend (Apple Silicon recommended)
- CUDA Toolkit 12.0+ and Compute Capability 7.0+ (Volta+) for CUDA backend
- Dawn or wgpu-native for WebGPU backend
- All repos use `github.com/luxcpp/*` remotes (NOT `github.com/luxfi/*` for the C++ repos)
- Public C API header is `<lux/gpu.h>`: a stable ABI, never break it
- Backend plugins export exactly one symbol: `lux_gpu_backend_init`
- Plugin naming: `libluxgpu_backend_<name>.{so,dylib,dll}`
- Backend ABI version must match `LUX_GPU_BACKEND_ABI_VERSION` (currently 2)
Quick reference
Repository Map
| Sub-repo | Remote | Purpose |
|---|---|---|
| gpu/ | luxcpp/gpu | Core library: plugin loader, CPU backend, C API (libluxgpu) |
| metal/ | luxcpp/metal | Metal backend plugin (Apple Silicon, MLX integration) |
| cuda/ | luxcpp/cuda | CUDA backend plugin (NVIDIA, proprietary license) |
| webgpu/ | luxcpp/webgpu | WebGPU backend plugin (Dawn/wgpu, cross-platform) |
| crypto/ | luxcpp/crypto | BLS12-381, ML-DSA, ML-KEM, secp256k1 |
| lattice/ | luxcpp/lattice | NTT, polynomial rings, Gaussian sampling, Go bindings |
| fhe/ | luxcpp/fhe | OpenFHE fork: TFHE, CKKS, BGV, threshold FHE, Go bindings |
| lux-accel/ | luxcpp/accel | Unified SDK: session-based API, C ABI for Go/FFI (libluxaccel) |
| lux-gpu/ | luxcpp/lux-gpu | C++ GPU SDK (device, buffer, kernel, registry abstractions) |
| lux-metal/ | (subdir) | Metal plugin wrapper (Conan package) |
| lux-cuda/ | (subdir) | CUDA plugin wrapper (Conan package) |
| lux-webgpu/ | (subdir) | WebGPU plugin wrapper (Conan package) |
| dex/ | luxcpp/dex | Order book matching engine (lock-free, sub-microsecond) |
| consensus/ | (subdir) | Consensus acceleration kernels |
| session/ | luxcpp/session | Session network storage server (PQ crypto, HTTPS/QUIC) |
| http/ | luxcpp/http | Drogon HTTP framework fork |
| grpc/ | luxcpp/grpc | gRPC fork |
| mlx-c-api/ | (subdir) | MLX C API bridge (liblux_gpu_api.dylib) |
Dependency Hierarchy
```
gpu/          Foundation (plugin loader, CPU backend, C ABI)
  |
  +-- metal/    Metal backend plugin (macOS)
  +-- cuda/     CUDA backend plugin (NVIDIA)
  +-- webgpu/   WebGPU backend plugin (cross-platform)
  |
crypto/       BLS pairings, post-quantum (depends on gpu optionally)
lattice/      NTT acceleration (depends on gpu)
  |
fhe/          TFHE/CKKS/BGV (depends on crypto + lattice)
  |
lux-accel/    Unified SDK: session + tensor + ml + crypto + zk + lattice + fhe + dex
```
Build (any sub-repo)
```bash
cd ~/work/luxcpp/<repo>
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)   # macOS has no nproc by default: use -j$(sysctl -n hw.ncpu)
ctest --test-dir build --output-on-failure
cmake --install build --prefix /usr/local
```
Build with Conan (full workspace)
```bash
cd ~/work/luxcpp
conan install . --output-folder=build --build=missing
cd build && cmake .. -DCMAKE_TOOLCHAIN_FILE=conan_toolchain.cmake -DCMAKE_BUILD_TYPE=Release
cmake --build . -j$(nproc)
```
Consume in CMake
```cmake
find_package(lux-gpu REQUIRED)
target_link_libraries(myapp PRIVATE lux::gpu)

# Or with the unified SDK
find_package(lux-accel REQUIRED)
target_link_libraries(myapp PRIVATE lux::accel)
```
Consume via pkg-config (for CGO)
```bash
export CGO_CFLAGS="$(pkg-config --cflags lux-gpu)"
export CGO_LDFLAGS="$(pkg-config --libs lux-gpu)"
CGO_ENABLED=1 go build ./...
```
One-file quickstart
```c
// quickstart.c - GPU-accelerated tensor operations
// Build: cc -o quickstart quickstart.c -lluxgpu -ldl
#include <lux/gpu.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    LuxGPU* gpu = lux_gpu_create();  // auto-detect best backend
    if (!gpu) { fprintf(stderr, "No GPU backend\n"); return 1; }
    printf("Backend: %s\n", lux_gpu_backend_name(gpu));

    // Tensor operations: 4x4 of ones times 4x4 filled with 2.0
    int64_t shape[] = {4, 4};
    LuxTensor* a = lux_tensor_ones(gpu, shape, 2, LUX_FLOAT32);
    LuxTensor* b = lux_tensor_full(gpu, shape, 2, LUX_FLOAT32, 2.0);
    LuxTensor* c = lux_tensor_matmul(gpu, a, b);
    lux_gpu_sync(gpu);  // synchronize before reading results back

    float result[16];
    lux_tensor_to_host(c, result, sizeof(result));
    printf("c[0] = %.1f (expect 8.0)\n", result[0]);  // 4 products of 1.0 * 2.0

    // Crypto: Poseidon2 hash
    uint64_t inputs[4] = {1, 2, 3, 4};
    uint64_t outputs[2];
    lux_poseidon2_hash(gpu, inputs, outputs, 2, 2);

    // NTT modulo the Goldilocks prime p = 2^64 - 2^32 + 1
    uint64_t poly[4] = {1, 2, 3, 4};
    lux_ntt_forward(gpu, poly, 4, 0xFFFFFFFF00000001ULL);

    lux_tensor_destroy(c);
    lux_tensor_destroy(b);
    lux_tensor_destroy(a);
    lux_gpu_destroy(gpu);
    return 0;
}
```
Core Concepts
Plugin Architecture
The core library (gpu/) defines a stable C ABI via backend_plugin.h. Each backend is a shared library exporting lux_gpu_backend_init, which returns a vtable of function pointers. The core dispatches all operations through this vtable.
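A minimal sketch of the vtable pattern in plain C follows. It is illustrative only. The struct, its fields, the demo_* function names, and the load-time checks are all hypothetical and do not reproduce backend_plugin.h. To keep the example self-contained, the core calls the plugin's init function directly instead of resolving lux_gpu_backend_init from a shared library with dlopen/dlsym. The ABI value 2 mirrors LUX_GPU_BACKEND_ABI_VERSION as stated in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical mirror of LUX_GPU_BACKEND_ABI_VERSION (currently 2 per this doc). */
#define DEMO_BACKEND_ABI_VERSION 2

/* Hypothetical vtable; the real layout is defined in backend_plugin.h. */
typedef struct {
    int abi_version;   /* ABI the plugin was built against */
    const char *name;  /* backend name, e.g. "cpu" */
    int (*vector_add)(const float *a, const float *b, float *out, size_t n);
} DemoBackendVTable;

/* Toy CPU implementation of one operation. */
static int cpu_vector_add(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = a[i] + b[i];
    return 0;
}

/* Stand-in for a plugin's single exported init symbol; a real core would
 * obtain this function pointer with dlsym after loading the plugin. */
const DemoBackendVTable *demo_backend_init(void) {
    static const DemoBackendVTable vt = {DEMO_BACKEND_ABI_VERSION, "cpu", cpu_vector_add};
    return &vt;
}

/* A plugin built against an older ABI, which the core must reject. */
const DemoBackendVTable *demo_stale_backend_init(void) {
    static const DemoBackendVTable vt = {1, "stale", cpu_vector_add};
    return &vt;
}

/* Core side: accept a plugin only if its ABI matches and required entries exist. */
const DemoBackendVTable *demo_load_backend(const DemoBackendVTable *(*init)(void)) {
    if (!init) return NULL;
    const DemoBackendVTable *vt = init();
    if (!vt) return NULL;
    if (vt->abi_version != DEMO_BACKEND_ABI_VERSION) {
        fprintf(stderr, "ABI version mismatch (plugin %d, core %d)\n",
                vt->abi_version, DEMO_BACKEND_ABI_VERSION);
        return NULL;
    }
    return vt->vector_add ? vt : NULL;
}

/* Core dispatch: every operation goes through the vtable. */
int demo_vector_add(const DemoBackendVTable *be, const float *a, const float *b,
                    float *out, size_t n) {
    assert(be != NULL);
    return be->vector_add(a, b, out, n);
}
```

The core rejects a mismatched plugin at load time instead of calling into it. This prevents a plugin compiled against an older vtable layout from being invoked through the wrong function pointers.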
Runtime backend selection (priority order):
- `LUX_BACKEND` env var: `metal`, `cuda`, `webgpu`, `cpu`
- Explicit API: `lux_gpu_create_with_backend(LUX_BACKEND_METAL)`
- Auto-detect: CUDA > Metal > WebGPU > CPU
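The priority order above can be sketched as a pure function, which keeps the selection logic easy to test. Several details are hypothetical: the DemoBackend enum, the availability bitmask, and the choice to fall through to the next rule when LUX_BACKEND names an unavailable backend (the real core might fail instead). A caller would pass getenv("LUX_BACKEND") as env_value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical backend identifiers; the real enum lives in <lux/gpu.h>. */
typedef enum { DEMO_CPU = 0, DEMO_WEBGPU, DEMO_METAL, DEMO_CUDA, DEMO_NONE } DemoBackend;

static const char *const demo_names[] = {"cpu", "webgpu", "metal", "cuda"};

/* available: bit (1u << backend) is set for each backend whose plugin loaded.
 * explicit_choice: backend requested through the API, or DEMO_NONE.
 * env_value: contents of LUX_BACKEND, or NULL when unset. */
DemoBackend demo_select_backend(unsigned available, DemoBackend explicit_choice,
                                const char *env_value) {
    /* 1. LUX_BACKEND environment variable */
    if (env_value) {
        for (int b = DEMO_CPU; b < DEMO_NONE; b++)
            if (strcmp(env_value, demo_names[b]) == 0 && (available & (1u << b)))
                return (DemoBackend)b;
    }
    /* 2. Explicit API request */
    if (explicit_choice != DEMO_NONE && (available & (1u << explicit_choice)))
        return explicit_choice;
    /* 3. Auto-detect: CUDA > Metal > WebGPU > CPU */
    static const DemoBackend order[] = {DEMO_CUDA, DEMO_METAL, DEMO_WEBGPU, DEMO_CPU};
    for (size_t i = 0; i < sizeof order / sizeof order[0]; i++)
        if (available & (1u << order[i])) return order[i];
    return DEMO_NONE;
}
```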
Plugin search paths:
- `LUX_GPU_BACKEND_PATH` environment variable
- System library paths (`/usr/lib/lux-gpu/`, etc.)
- Relative to the executable
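Combining the search paths with the plugin naming convention gives the candidate files a loader would try for one backend. This sketch makes several assumptions. LUX_GPU_BACKEND_PATH is taken to hold a single directory. /usr/lib/lux-gpu is used as the only system directory. The executable's directory is passed in rather than discovered. The demo_plugin_candidates name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Platform-specific extension from the libluxgpu_backend_<name>.{so,dylib,dll} convention. */
#if defined(_WIN32)
#define DEMO_PLUGIN_EXT "dll"
#elif defined(__APPLE__)
#define DEMO_PLUGIN_EXT "dylib"
#else
#define DEMO_PLUGIN_EXT "so"
#endif

/* Writes up to `max` candidate paths for backend `name`, in search order:
 * env_dir (LUX_GPU_BACKEND_PATH), a system directory, then exe_dir.
 * env_dir and exe_dir may be NULL or empty. Returns the number written. */
size_t demo_plugin_candidates(const char *name, const char *env_dir, const char *exe_dir,
                              char out[][512], size_t max) {
    const char *dirs[3] = {env_dir, "/usr/lib/lux-gpu", exe_dir};
    size_t count = 0;
    for (size_t i = 0; i < 3 && count < max; i++) {
        if (!dirs[i] || dirs[i][0] == '\0') continue;
        int len = snprintf(out[count], 512, "%s/libluxgpu_backend_%s.%s",
                           dirs[i], name, DEMO_PLUGIN_EXT);
        if (len > 0 && len < 512) count++;  /* drop paths that would be truncated */
    }
    return count;
}
```

The loader would try each candidate in order and stop at the first one that loads and passes the ABI check.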
Capability flags on the vtable descriptor indicate what a backend supports:
```
LUX_CAP_TENSOR_OPS | LUX_CAP_MATMUL | LUX_CAP_NTT | LUX_CAP_MSM |
LUX_CAP_FHE | LUX_CAP_TFHE | LUX_CAP_BLS12_381 | LUX_CAP_BN254 |
LUX_CAP_KZG | LUX_CAP_POSEIDON2 | LUX_CAP_BLAKE3 | ...
```
GPU Kernel Categories
All three GPU backends (Metal, CUDA, WebGPU) implement matching kernel sets:
| Category | Kernels | Notes |
|---|---|---|
| gpu/ | binary, unary, reduce, softmax, layer_norm, rms_norm, gemv, conv, fft, rope, attention, scan, sort | ML tensor operations |
| crypto/ | bls12_381, bn254, secp256k1, msm, kzg, poseidon, blake3, goldilocks, `frost_*`, mldsa_verify, `ringtail_*`, shamir | Cryptographic primitives |
| fhe/ | ntt_*, tfhe_bootstrap, tfhe_keyswitch, blind_rotate, external_product, scheme_switch | FHE operations |
| lattice/ | modular, ntt_negacyclic, poly_arithmetic, twiddle | Lattice crypto acceleration |
| steel/ | `steel_gemm_*`, `steel_attention_*`, `steel_conv_*` | High-perf GEMM/attention/conv |
| zk/ | poseidon2, merkle | ZK-specific hash and tree |
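A slow CPU reference is useful when validating the ntt_* and goldilocks kernels. The sketch below is a naive O(n^2) cyclic NTT over the Goldilocks prime p = 2^64 - 2^32 + 1, the modulus passed to lux_ntt_forward in the quickstart. It relies on the fact that 2^96 ≡ -1 (mod p), so 2 has order 192 and 2^(192/n) is a primitive n-th root of unity for power-of-two n up to 64. It assumes natural-order input and output and a plain cyclic transform. The library kernels may use a different ordering, a negacyclic twist, or Montgomery form, so outputs are comparable only once those conventions match. It needs a compiler with unsigned __int128 (GCC or Clang), and the gl_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GL_P 0xFFFFFFFF00000001ULL  /* Goldilocks prime p = 2^64 - 2^32 + 1 */

static uint64_t gl_mul(uint64_t a, uint64_t b) {
    return (uint64_t)(((unsigned __int128)a * b) % GL_P);
}

static uint64_t gl_add(uint64_t a, uint64_t b) {
    return (uint64_t)(((unsigned __int128)a + b) % GL_P);  /* 128-bit sum cannot overflow */
}

static uint64_t gl_pow(uint64_t base, uint64_t exp) {
    uint64_t result = 1;
    base %= GL_P;
    while (exp) {                      /* square-and-multiply */
        if (exp & 1) result = gl_mul(result, base);
        base = gl_mul(base, base);
        exp >>= 1;
    }
    return result;
}

/* 2 has multiplicative order 192 mod p (2^96 = -1 mod p), so for a power of
 * two n <= 64, w = 2^(192/n) is a primitive n-th root of unity. */
static uint64_t gl_root(size_t n) {
    assert(n >= 2 && n <= 64 && (n & (n - 1)) == 0);
    return gl_pow(2, 192 / n);
}

/* Naive O(n^2) transform: out[k] = sum_j in[j] * w^(j*k) mod p. */
static void gl_dft(const uint64_t *in, uint64_t *out, size_t n, uint64_t w) {
    for (size_t k = 0; k < n; k++) {
        uint64_t wk = gl_pow(w, k), term = 1, acc = 0;
        for (size_t j = 0; j < n; j++) {
            acc = gl_add(acc, gl_mul(in[j], term));  /* gl_mul also reduces in[j] */
            term = gl_mul(term, wk);
        }
        out[k] = acc;
    }
}

void gl_ntt_forward(const uint64_t *in, uint64_t *out, size_t n) {
    gl_dft(in, out, n, gl_root(n));
}

void gl_ntt_inverse(const uint64_t *in, uint64_t *out, size_t n) {
    uint64_t w_inv = gl_pow(gl_root(n), GL_P - 2);  /* Fermat: x^(p-2) = x^-1 */
    uint64_t n_inv = gl_pow((uint64_t)n, GL_P - 2);
    gl_dft(in, out, n, w_inv);
    for (size_t i = 0; i < n; i++) out[i] = gl_mul(out[i], n_inv);
}
```

For the quickstart input {1, 2, 3, 4}, output index 0 is the plain sum 10. Index 2 is 1 - 2 + 3 - 4 = -2, which is p - 2, because w^2 = -1.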
lux-accel SDK (Unified FFI Layer)
lux-accel/ is the single shared library for Go/FFI consumers. It statically embeds libluxgpu and exports only C ABI functions via <lux/accel/c_api.h>. The C++ SDK (accel.hpp) provides a session-based API with typed accessors:
```cpp
auto session = lux::accel::Session::create();
session->ml().matmul(a, b, c);
session->crypto().blsVerify(sig, msg, pubkey);
session->fhe().nttForward(poly, modulus);
session->zk().merkleRoot(leaves);
session->dex().matchOrders(bids, asks);
```
FHE (OpenFHE Fork)
The fhe/ sub-repo is a fork of OpenFHE with Lux GPU extensions. Supports:
- TFHE/CGGI: Boolean circuits, ~10ms bootstrap per gate
- CKKS: Approximate arithmetic on reals
- BGV/BFV: Exact integer arithmetic
- Threshold FHE: Distributed decryption (Shamir-based)
- Go bindings: `github.com/luxfi/fhe/go` (tfhe, ckks, threshold packages)
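The list describes threshold decryption as Shamir-based. As background only, the sketch below shows plain Shamir secret sharing over a small prime field (p = 2^31 - 1, chosen so products fit in 64 bits). The secret becomes the constant term of a random polynomial of degree t-1. Each party receives one evaluation, and any t shares recover the secret by Lagrange interpolation at x = 0. This is not the fork's threshold-FHE protocol, which shares key material over the scheme's own moduli. The ss_* names are hypothetical, and rand() appears only for demonstration; real use needs a cryptographically secure RNG.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SS_P 2147483647ULL  /* prime 2^31 - 1 */
#define SS_MAX_T 16

typedef struct { uint64_t x, y; } Share;

static uint64_t ss_pow(uint64_t b, uint64_t e) {
    uint64_t r = 1;
    b %= SS_P;
    while (e) { if (e & 1) r = r * b % SS_P; b = b * b % SS_P; e >>= 1; }
    return r;
}

/* Split `secret` into n shares such that any t of them recover it. */
void ss_split(uint64_t secret, int t, int n, Share *out) {
    assert(t >= 1 && t <= n && t <= SS_MAX_T);
    uint64_t coeff[SS_MAX_T];
    coeff[0] = secret % SS_P;                     /* f(0) = secret */
    for (int i = 1; i < t; i++)
        coeff[i] = (uint64_t)rand() % SS_P;       /* demo RNG only, NOT secure */
    for (int k = 0; k < n; k++) {
        uint64_t x = (uint64_t)k + 1, y = 0;      /* x = 0 is reserved for the secret */
        for (int i = t - 1; i >= 0; i--)
            y = (y * x + coeff[i]) % SS_P;        /* Horner evaluation of f(x) */
        out[k].x = x;
        out[k].y = y;
    }
}

/* Recover f(0) from t shares with distinct x values via Lagrange interpolation:
 * L_i(0) = prod_{j != i} x_j / (x_j - x_i). */
uint64_t ss_combine(const Share *s, int t) {
    uint64_t secret = 0;
    for (int i = 0; i < t; i++) {
        uint64_t num = 1, den = 1;
        for (int j = 0; j < t; j++) {
            if (j == i) continue;
            num = num * s[j].x % SS_P;
            den = den * ((s[j].x + SS_P - s[i].x) % SS_P) % SS_P;
        }
        uint64_t li = num * ss_pow(den, SS_P - 2) % SS_P;  /* Fermat inverse of den */
        secret = (secret + s[i].y * li) % SS_P;
    }
    return secret;
}
```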
CUDA Licensing
CUDA kernels (cuda/) are proprietary and require commercial licensing:
- Ecosystem tier: included with Lux Network staking
- Developer tier: $999/month
- Enterprise tier: custom pricing
Claimed performance on an NVIDIA A100: NTT of size 2^24 in 4.2 ms, MSM of size 2^24 in 1.8 s.
Session Storage Server
session/ is a C++ storage server for the Lux Session network (encrypted messaging). Uses:
- Post-quantum crypto (ML-KEM-768, ML-DSA-65) via `luxcpp/crypto`
- GPU acceleration for batch crypto
- HTTPS/QUIC serving
- CGO bindings for Go integration (`libsession_cgo.a`)
DEX Matching Engine
dex/ is a sub-microsecond order book engine with:
- Price-time priority (FIFO)
- Lock-free data structures (Boost)
- C/C++/Go bindings
- GPU-accelerated AMM swap computation
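Price-time priority works as follows. An incoming order first matches the best-priced resting order. Among resting orders at the same price, the earliest one matches first. The sketch below applies that rule to an incoming buy limit order against a tiny array-based ask book. It is a hypothetical single-threaded illustration, not the dex/ engine's lock-free structures or API. The Order type and the match_buy name are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t id;
    int64_t price;  /* integer ticks, avoids floating point */
    int64_t qty;    /* remaining quantity; 0 means fully filled */
    uint64_t seq;   /* arrival sequence number: lower = earlier */
} Order;

/* Index of the best resting ask: lowest price, then earliest seq; -1 if none. */
static int best_ask(const Order *asks, size_t n) {
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (asks[i].qty <= 0) continue;
        if (best < 0 || asks[i].price < asks[best].price ||
            (asks[i].price == asks[best].price && asks[i].seq < asks[best].seq))
            best = (int)i;
    }
    return best;
}

/* Match a buy limit order against `asks`. Fills are recorded in fill_ids and
 * fill_qty, which need room for n entries. Returns the unfilled quantity. */
int64_t match_buy(Order *asks, size_t n, int64_t limit, int64_t qty,
                  uint64_t *fill_ids, int64_t *fill_qty, size_t *n_fills) {
    *n_fills = 0;
    while (qty > 0) {
        int b = best_ask(asks, n);
        if (b < 0 || asks[b].price > limit) break;  /* nothing left that crosses */
        int64_t traded = qty < asks[b].qty ? qty : asks[b].qty;
        asks[b].qty -= traded;
        qty -= traded;
        fill_ids[*n_fills] = asks[b].id;
        fill_qty[*n_fills] = traded;
        (*n_fills)++;
    }
    return qty;
}
```

A production engine avoids the linear best_ask scan by keeping price levels sorted, each holding a FIFO queue of orders. The linear scan is used here only because it makes the priority rule explicit.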
Naming Conventions
| Item | Convention | Example |
|---|---|---|
| CMake package | lux-<pkg> | lux-gpu, lux-accel |
| CMake target | lux::<pkg> | lux::gpu, lux::accel |
| Library file | liblux<pkg> | libluxgpu.dylib, libluxaccel.so |
| Header path | lux/<pkg>/ | #include <lux/gpu.h>, #include <lux/accel/c_api.h> |
| Plugin file | libluxgpu_backend_<name> | libluxgpu_backend_metal.dylib |
| pkg-config | lux-<pkg> | pkg-config --libs lux-gpu |
Conan Build Options
Key options in the top-level conanfile.py:
| Option | Default | Description |
|---|---|---|
| with_gpu | ON | GPU compute library |
| with_metal | ON (macOS) | Metal backend |
| with_cuda | OFF | CUDA backend (requires license) |
| with_webgpu | ON | WebGPU/Dawn backend |
| with_fhe | ON | Fully homomorphic encryption |
| with_crypto | ON | Cryptographic primitives |
| with_lattice | ON | Lattice-based crypto |
| with_http | ON | HTTP/REST framework |
| with_grpc | OFF | gRPC (large dependency) |
| with_dex | ON | DEX matching engine |
| embed_kernels | ON | Embed shader sources in binary |
Troubleshooting
No GPU backend detected
```bash
# Point the loader at the directory containing the backend plugins
export LUX_GPU_BACKEND_PATH=/usr/local/lib/lux-gpu
# Or force the CPU fallback
export LUX_BACKEND=cpu
```
Metal backend not loading
- Ensure macOS 12+ and Xcode Command Line Tools installed
- Build the metal plugin separately: `cd ~/work/luxcpp/metal && cmake -B build && cmake --build build`
- Install to the plugin path: `cmake --install build --prefix /usr/local`
CUDA build fails
- Requires CUDA Toolkit 12.0+ and `nvcc` in PATH
- Compute Capability 7.0+ (Volta/Turing/Ampere/Hopper)
- Private repo requires license and NDA
CMake can't find lux-gpu
```bash
# Set CMAKE_PREFIX_PATH to where lux-gpu is installed
cmake -B build -DCMAKE_PREFIX_PATH=/usr/local
# Or use pkg-config
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
```
FHE build slow
- FHE (OpenFHE) is a large codebase; use `-j$(nproc)` for parallel builds
- Disable tests with `-DBUILD_UNITTESTS=OFF` for faster iteration
CGO linking errors
```bash
# Ensure libluxaccel is installed and discoverable
export CGO_CFLAGS="-I/usr/local/include"
export CGO_LDFLAGS="-L/usr/local/lib -lluxaccel"
CGO_ENABLED=1 go build ./...
```
ABI version mismatch
- Backend plugins must match `LUX_GPU_BACKEND_ABI_VERSION` (currently 2)
- Rebuild all plugins when the core ABI changes
- To diagnose: when a plugin fails to load, the core logs "ABI version mismatch"
Related Skills
- `lux/lux-gpu.md`: Go bindings for GPU acceleration (consumes `libluxaccel`)
- `lux/lux-fhe.md`: FHE Go package (consumes the `luxcpp/fhe` C++ library)
- `lux/lux-crypto.md`: Go crypto package (BLS, ML-DSA, ML-KEM)
- `lux/lux-lattice.md`: Go lattice crypto (NTT, polynomial rings)
- `lux/lux-dex.md`: DEX matching engine details
- `lux/lux-session.md`: Session network and storage server