Lux Accel - C++ GPU/Crypto/FHE Acceleration Libraries

Overview

LuxCPP is a multi-repo C++ workspace providing GPU-accelerated compute for blockchain, cryptography, FHE, ML, and DEX workloads. The workspace at ~/work/luxcpp/ contains 15+ sub-repos, each with its own remote under the github.com/luxcpp/* GitHub organization; public-facing packages are referenced as github.com/luxfi/* where applicable. The architecture is plugin-based: a core GPU library provides a stable C ABI, and backend plugins (Metal, CUDA, WebGPU) are built separately and loaded at runtime.

Tech Stack

| Layer | Technology |
|---|---|
| Language | C++17 (core), C++23 (accel SDK), Objective-C++ (.mm for Metal) |
| GPU Backends | Metal (Apple Silicon), CUDA 12.0+ (NVIDIA), WebGPU/Dawn (cross-platform), CPU/SIMD (fallback) |
| Shader Languages | Metal Shading Language (.metal), CUDA (.cu), WGSL (.wgsl) |
| Build System | CMake 3.20+, Conan 2.x (package manager) |
| FHE Base | OpenFHE fork (TFHE/CGGI, CKKS, BGV/BFV) |
| HTTP Framework | Drogon (fork as luxcpp/http) |
| RPC | gRPC 1.62 (fork as luxcpp/grpc) |
| Crypto | BLS12-381, BN254, secp256k1, ML-DSA, ML-KEM, FROST, Poseidon2, BLAKE3, KZG |
| Testing | GoogleTest, doctest, CTest |
| License | BSD-3-Clause-Eco (open), Proprietary (CUDA kernels) |

Key Dependencies

  • System: OpenSSL 3.2, zlib, brotli, OpenMP (optional)
  • Conan packages: nlohmann_json 3.11, fmt 10.2, spdlog 1.12, cereal 1.3, Boost 1.84 (DEX), GTest 1.14
  • GPU: Metal.framework (macOS), CUDA Toolkit 12.0+ (NVIDIA), Dawn/wgpu-native (WebGPU)
  • FHE: OpenMP (optional parallelization)

When to use

  • Building or modifying C++ GPU kernels for blockchain/crypto operations
  • Adding new GPU-accelerated operations (tensor, crypto, FHE, ZK)
  • Working on the plugin-based backend architecture
  • Building Go/Rust/Python FFI bindings against libluxaccel or libluxgpu
  • Optimizing NTT, MSM, TFHE bootstrap, or other crypto primitives on GPU
  • Adding a new backend plugin (e.g., ROCm, Vulkan)
  • Working on the DEX matching engine
  • Modifying the session/storage server for the Session network

Hard requirements

  • CMake 3.20+ for all sub-repos
  • C++17 minimum (C++23 for lux-accel SDK)
  • macOS 12+ for Metal backend (Apple Silicon recommended)
  • CUDA Toolkit 12.0+ and Compute Capability 7.0+ (Volta+) for CUDA backend
  • Dawn or wgpu-native for WebGPU backend
  • All repos use github.com/luxcpp/* remotes (NOT github.com/luxfi/* for the C++ repos)
  • Public C API header is <lux/gpu.h> -- stable ABI, never break it
  • Backend plugins export exactly one symbol: lux_gpu_backend_init
  • Plugin naming: libluxgpu_backend_<name>.{so,dylib,dll}
  • Backend ABI version must match: LUX_GPU_BACKEND_ABI_VERSION (currently 2)

Quick reference

Repository Map

| Sub-repo | Remote | Purpose |
|---|---|---|
| gpu/ | luxcpp/gpu | Core library: plugin loader, CPU backend, C API (libluxgpu) |
| metal/ | luxcpp/metal | Metal backend plugin (Apple Silicon, MLX integration) |
| cuda/ | luxcpp/cuda | CUDA backend plugin (NVIDIA, proprietary license) |
| webgpu/ | luxcpp/webgpu | WebGPU backend plugin (Dawn/wgpu, cross-platform) |
| crypto/ | luxcpp/crypto | BLS12-381, ML-DSA, ML-KEM, secp256k1 |
| lattice/ | luxcpp/lattice | NTT, polynomial rings, Gaussian sampling, Go bindings |
| fhe/ | luxcpp/fhe | OpenFHE fork: TFHE, CKKS, BGV, threshold FHE, Go bindings |
| lux-accel/ | luxcpp/accel | Unified SDK: session-based API, C ABI for Go/FFI (libluxaccel) |
| lux-gpu/ | luxcpp/lux-gpu | C++ GPU SDK (device, buffer, kernel, registry abstractions) |
| lux-metal/ | (subdir) | Metal plugin wrapper (Conan package) |
| lux-cuda/ | (subdir) | CUDA plugin wrapper (Conan package) |
| lux-webgpu/ | (subdir) | WebGPU plugin wrapper (Conan package) |
| dex/ | luxcpp/dex | Order book matching engine (lock-free, sub-microsecond) |
| consensus/ | (subdir) | Consensus acceleration kernels |
| session/ | luxcpp/session | Session network storage server (PQ crypto, HTTPS/QUIC) |
| http/ | luxcpp/http | Drogon HTTP framework fork |
| grpc/ | luxcpp/grpc | gRPC fork |
| mlx-c-api/ | (subdir) | MLX C API bridge (liblux_gpu_api.dylib) |

Dependency Hierarchy

gpu/              Foundation (plugin loader, CPU backend, C ABI)
  |
  +-- metal/      Metal backend plugin (macOS)
  +-- cuda/       CUDA backend plugin (NVIDIA)
  +-- webgpu/     WebGPU backend plugin (cross-platform)
  |
crypto/           BLS pairings, post-quantum (depends on gpu optionally)
lattice/          NTT acceleration (depends on gpu)
  |
fhe/              TFHE/CKKS/BGV (depends on crypto + lattice)
  |
lux-accel/        Unified SDK: session + tensor + ml + crypto + zk + lattice + fhe + dex

Build (any sub-repo)

cd ~/work/luxcpp/<repo>
cmake -B build -DCMAKE_BUILD_TYPE=Release
# nproc is Linux-only; on macOS substitute $(sysctl -n hw.ncpu)
cmake --build build -j$(nproc)
ctest --test-dir build --output-on-failure
cmake --install build --prefix /usr/local

Build with Conan (full workspace)

cd ~/work/luxcpp
conan install . --output-folder=build --build=missing
cd build && cmake .. -DCMAKE_TOOLCHAIN_FILE=conan_toolchain.cmake
cmake --build . -j$(nproc)

Consume in CMake

find_package(lux-gpu REQUIRED)
target_link_libraries(myapp PRIVATE lux::gpu)

# Or with the unified SDK
find_package(lux-accel REQUIRED)
target_link_libraries(myapp PRIVATE lux::accel)

Consume via pkg-config (for CGO)

export CGO_CFLAGS=$(pkg-config --cflags lux-gpu)
export CGO_LDFLAGS=$(pkg-config --libs lux-gpu)
CGO_ENABLED=1 go build ./...

One-file quickstart

// quickstart.c - GPU-accelerated tensor operations
// Build: cc -o quickstart quickstart.c -lluxgpu -ldl
#include <lux/gpu.h>
#include <stdint.h>  // int64_t, uint64_t
#include <stdio.h>

int main(void) {
    LuxGPU* gpu = lux_gpu_create();  // auto-detect best backend
    if (!gpu) { fprintf(stderr, "No GPU backend\n"); return 1; }

    printf("Backend: %s\n", lux_gpu_backend_name(gpu));

    // Tensor operations
    int64_t shape[] = {4, 4};
    LuxTensor* a = lux_tensor_ones(gpu, shape, 2, LUX_FLOAT32);
    LuxTensor* b = lux_tensor_full(gpu, shape, 2, LUX_FLOAT32, 2.0);
    LuxTensor* c = lux_tensor_matmul(gpu, a, b);

    lux_gpu_sync(gpu);

    float result[16];
    lux_tensor_to_host(c, result, sizeof(result));
    printf("c[0] = %.1f (expect 8.0)\n", result[0]);

    // Crypto: Poseidon2 hash
    uint64_t inputs[4] = {1, 2, 3, 4};
    uint64_t outputs[2];
    lux_poseidon2_hash(gpu, inputs, outputs, 2, 2);

    // NTT
    uint64_t poly[4] = {1, 2, 3, 4};
    lux_ntt_forward(gpu, poly, 4, 0xFFFFFFFF00000001ULL);

    lux_tensor_destroy(c);
    lux_tensor_destroy(b);
    lux_tensor_destroy(a);
    lux_gpu_destroy(gpu);
    return 0;
}
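
When checking the NTT call in the quickstart, a slow CPU reference over the same modulus can be handy. The sketch below is not part of libluxgpu: all function names are hypothetical, the root of unity is derived from the generator 7 as one common choice, and the output is in natural order. The ordering and root convention used by lux_ntt_forward are not documented here, so GPU results may differ by a permutation or a different root. Treat this as a reference for the arithmetic only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// The modulus passed to lux_ntt_forward in the quickstart: 2^64 - 2^32 + 1.
constexpr std::uint64_t kP = 0xFFFFFFFF00000001ULL;

// unsigned __int128 is a GCC/Clang extension (not available on MSVC).
inline std::uint64_t mul_mod(std::uint64_t a, std::uint64_t b) {
    return static_cast<std::uint64_t>(static_cast<unsigned __int128>(a) * b % kP);
}

inline std::uint64_t add_mod(std::uint64_t a, std::uint64_t b) {
    return static_cast<std::uint64_t>((static_cast<unsigned __int128>(a) + b) % kP);
}

// Square-and-multiply exponentiation mod kP.
std::uint64_t pow_mod(std::uint64_t base, std::uint64_t exp) {
    std::uint64_t result = 1;
    base %= kP;
    while (exp > 0) {
        if (exp & 1) result = mul_mod(result, base);
        base = mul_mod(base, base);
        exp >>= 1;
    }
    return result;
}

// Primitive n-th root of unity for a power-of-two n <= 2^32.
// 7 is a quadratic non-residue mod kP, so 7^((p-1)/n) has order exactly n.
std::uint64_t root_of_unity(std::size_t n) {
    assert(n != 0 && (n & (n - 1)) == 0);
    return pow_mod(7, (kP - 1) / n);
}

// O(n^2) transform: X[k] = sum_j x[j] * omega^(j*k) mod p, natural order.
std::vector<std::uint64_t> naive_ntt(const std::vector<std::uint64_t>& x,
                                     std::uint64_t omega) {
    const std::size_t n = x.size();
    std::vector<std::uint64_t> out(n, 0);
    for (std::size_t k = 0; k < n; ++k) {
        for (std::size_t j = 0; j < n; ++j) {
            out[k] = add_mod(out[k], mul_mod(x[j], pow_mod(omega, (j * k) % n)));
        }
    }
    return out;
}

// Inverse: run the same transform with omega^-1, then scale by n^-1.
std::vector<std::uint64_t> naive_intt(const std::vector<std::uint64_t>& X,
                                      std::uint64_t omega) {
    const std::uint64_t omega_inv = pow_mod(omega, kP - 2);
    const std::uint64_t n_inv = pow_mod(X.size(), kP - 2);
    std::vector<std::uint64_t> x = naive_ntt(X, omega_inv);
    for (auto& v : x) v = mul_mod(v, n_inv);
    return x;
}
```

A round trip such as naive_intt(naive_ntt(poly, w), w) with w = root_of_unity(poly.size()) should return the original coefficients, which gives a quick sanity check before comparing against GPU output.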

Core Concepts

Plugin Architecture

The core library (gpu/) defines a stable C ABI via backend_plugin.h. Each backend is a shared library exporting lux_gpu_backend_init, which returns a vtable of function pointers. The core dispatches all operations through this vtable.
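
The handshake described above can be pictured with a small self-contained sketch. Everything in it is hypothetical: the struct layout, the add_f32 slot, and the helper names are stand-ins chosen for illustration and are not taken from backend_plugin.h. Only the ideas come from the text: one init entry point per plugin, a vtable of function pointers, and an ABI version that must match.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for LUX_GPU_BACKEND_ABI_VERSION (the document states 2).
constexpr std::uint32_t kCoreAbiVersion = 2;

// Hypothetical vtable; the real layout is defined in backend_plugin.h.
struct BackendVTable {
    std::uint32_t abi_version;
    const char* name;
    int (*add_f32)(const float* a, const float* b, float* out, std::size_t n);
};

// Assumed shape of a plugin entry point such as lux_gpu_backend_init.
using BackendInitFn = const BackendVTable* (*)();

static int cpu_add_f32(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
    return 0;  // 0 means success in this sketch
}

// A "plugin" built against the current ABI.
const BackendVTable* cpu_backend_init() {
    static const BackendVTable vt{kCoreAbiVersion, "cpu", &cpu_add_f32};
    return &vt;
}

// A "plugin" built against an older ABI; the core must refuse it.
const BackendVTable* stale_backend_init() {
    static const BackendVTable vt{kCoreAbiVersion - 1, "stale", &cpu_add_f32};
    return &vt;
}

// Core side: call the entry point and accept the vtable only if the ABI matches.
const BackendVTable* load_backend(BackendInitFn init) {
    if (init == nullptr) return nullptr;
    const BackendVTable* vt = init();
    if (vt == nullptr || vt->abi_version != kCoreAbiVersion) return nullptr;
    return vt;
}

// Core side: every operation is routed through the function pointers.
int dispatch_add(const BackendVTable* vt, const float* a, const float* b,
                 float* out, std::size_t n) {
    assert(vt != nullptr && vt->add_f32 != nullptr);
    return vt->add_f32(a, b, out, n);
}
```

In a real loader the init pointer would come from resolving the exported symbol in the plugin's shared library rather than from a function in the same translation unit.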

Runtime backend selection (priority order):

  1. LUX_BACKEND env var: metal, cuda, webgpu, cpu
  2. Explicit API: lux_gpu_create_with_backend(LUX_BACKEND_METAL)
  3. Auto-detect: CUDA > Metal > WebGPU > CPU
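
The priority order above can be sketched as a pure function. The enum, function names, and parameters are hypothetical and do not mirror the library's API. One behavior is an assumption rather than something the text states: a choice that is unrecognized or unavailable falls through to the next rule instead of failing.

```cpp
#include <algorithm>
#include <optional>
#include <string_view>
#include <vector>

enum class Backend { Cuda, Metal, WebGpu, Cpu };

// Map the LUX_BACKEND spellings listed above to an enum value.
std::optional<Backend> parse_backend(std::string_view s) {
    if (s == "cuda") return Backend::Cuda;
    if (s == "metal") return Backend::Metal;
    if (s == "webgpu") return Backend::WebGpu;
    if (s == "cpu") return Backend::Cpu;
    return std::nullopt;
}

static bool is_available(Backend b, const std::vector<Backend>& available) {
    // Assumption: the CPU backend is built into the core and always usable.
    return b == Backend::Cpu ||
           std::find(available.begin(), available.end(), b) != available.end();
}

// env_value: contents of LUX_BACKEND, or nullptr if unset
//            (for example the result of std::getenv("LUX_BACKEND")).
// requested: backend passed to an explicit creation call, if any.
// available: backends whose plugins were found.
Backend select_backend(const char* env_value, std::optional<Backend> requested,
                       const std::vector<Backend>& available) {
    // 1. The environment variable has the highest priority.
    if (env_value != nullptr) {
        if (auto b = parse_backend(env_value); b && is_available(*b, available)) {
            return *b;
        }
    }
    // 2. Then the backend named explicitly through the API.
    if (requested && is_available(*requested, available)) return *requested;
    // 3. Otherwise auto-detect: CUDA > Metal > WebGPU > CPU.
    for (Backend b : {Backend::Cuda, Backend::Metal, Backend::WebGpu}) {
        if (is_available(b, available)) return b;
    }
    return Backend::Cpu;
}
```

Taking the environment value as a parameter, rather than reading it inside the function, keeps the rule easy to unit-test.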

Plugin search paths:

  1. LUX_GPU_BACKEND_PATH environment variable
  2. System library paths (/usr/lib/lux-gpu/, etc.)
  3. Relative to executable
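
As a rough illustration of the search order, the following sketch builds the list of candidate plugin paths for a backend name. The helper names are hypothetical. It assumes that LUX_GPU_BACKEND_PATH holds a single directory and that "relative to executable" means the executable's own directory; the actual loader may differ on both points.

```cpp
#include <string>
#include <vector>

// Platform-specific shared-library suffix, following the plugin naming rule.
inline const char* plugin_suffix() {
#if defined(_WIN32)
    return ".dll";
#elif defined(__APPLE__)
    return ".dylib";
#else
    return ".so";
#endif
}

// libluxgpu_backend_<name>.<suffix>
inline std::string plugin_filename(const std::string& backend) {
    return "libluxgpu_backend_" + backend + plugin_suffix();
}

// env_dir: value of LUX_GPU_BACKEND_PATH ("" if unset).
// exe_dir: directory containing the running executable.
// Returns candidate paths in the order they would be tried.
std::vector<std::string> candidate_plugin_paths(const std::string& backend,
                                                const std::string& env_dir,
                                                const std::string& exe_dir) {
    const std::string file = plugin_filename(backend);
    std::vector<std::string> paths;
    if (!env_dir.empty()) paths.push_back(env_dir + "/" + file);  // 1. env var
    paths.push_back("/usr/lib/lux-gpu/" + file);                  // 2. system path
    paths.push_back(exe_dir + "/" + file);                        // 3. next to the binary
    return paths;
}
```

Printing this list for the backend that fails to load is a quick way to see where a plugin is expected to be installed.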

Capability flags on the vtable descriptor indicate what a backend supports:

LUX_CAP_TENSOR_OPS | LUX_CAP_MATMUL | LUX_CAP_NTT | LUX_CAP_MSM |
LUX_CAP_FHE | LUX_CAP_TFHE | LUX_CAP_BLS12_381 | LUX_CAP_BN254 |
LUX_CAP_KZG | LUX_CAP_POSEIDON2 | LUX_CAP_BLAKE3 | ...
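
Capability flags of this kind are usually tested with a bitmask comparison. The sketch below reuses a few flag names from the list above, but the bit positions are invented for illustration and the supports helper is hypothetical; the real values and any query function come from the library headers.

```cpp
#include <cstdint>

// Hypothetical bit assignments; the real values are defined by the library.
enum Capability : std::uint64_t {
    LUX_CAP_TENSOR_OPS = 1ull << 0,
    LUX_CAP_MATMUL     = 1ull << 1,
    LUX_CAP_NTT        = 1ull << 2,
    LUX_CAP_MSM        = 1ull << 3,
    LUX_CAP_POSEIDON2  = 1ull << 4,
};

// True only if every bit in `required` is also set in `caps`.
constexpr bool supports(std::uint64_t caps, std::uint64_t required) {
    return (caps & required) == required;
}
```

A caller that needs both NTT and MSM would check supports(caps, LUX_CAP_NTT | LUX_CAP_MSM) once and fall back to another backend if the check fails.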

GPU Kernel Categories

All three GPU backends (Metal, CUDA, WebGPU) implement matching kernel sets:

| Category | Kernels | Notes |
|---|---|---|
| gpu/ | binary, unary, reduce, softmax, layer_norm, rms_norm, gemv, conv, fft, rope, attention, scan, sort | ML tensor operations |
| crypto/ | bls12_381, bn254, secp256k1, msm, kzg, poseidon, blake3, goldilocks, frost_*, mldsa_verify, ringtail_*, shamir | Cryptographic primitives |
| fhe/ | ntt_*, tfhe_bootstrap, tfhe_keyswitch, blind_rotate, external_product, scheme_switch | FHE operations |
| lattice/ | modular, ntt_negacyclic, poly_arithmetic, twiddle | Lattice crypto acceleration |
| steel/ | steel_gemm_*, steel_attention_*, steel_conv_* | High-perf GEMM/attention/conv |
| zk/ | poseidon2, merkle | ZK-specific hash and tree |

lux-accel SDK (Unified FFI Layer)

lux-accel/ is the single shared library for Go/FFI consumers. It statically embeds libluxgpu and exports only C ABI functions via <lux/accel/c_api.h>. The C++ SDK (accel.hpp) provides a session-based API with typed accessors:

auto session = lux::accel::Session::create();
session->ml().matmul(a, b, c);
session->crypto().blsVerify(sig, msg, pubkey);
session->fhe().nttForward(poly, modulus);
session->zk().merkleRoot(leaves);
session->dex().matchOrders(bids, asks);

FHE (OpenFHE Fork)

The fhe/ sub-repo is a fork of OpenFHE with Lux GPU extensions. Supports:

  • TFHE/CGGI: Boolean circuits, ~10ms bootstrap per gate
  • CKKS: Approximate arithmetic on reals
  • BGV/BFV: Exact integer arithmetic
  • Threshold FHE: Distributed decryption (Shamir-based)
  • Go bindings: github.com/luxfi/fhe/go (tfhe, ckks, threshold packages)
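
To make "Shamir-based" concrete, here is a minimal sketch of Shamir secret sharing over a small prime field. It illustrates only the sharing idea; how fhe/ actually splits and recombines decryption material is not described here. The field (2^61 - 1), the function names, and the use of std::mt19937_64 are all choices made for this example. In particular, std::mt19937_64 is not a cryptographically secure generator and stands in for one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

constexpr std::uint64_t kPrime = (1ULL << 61) - 1;  // Mersenne prime 2^61 - 1

// Field helpers; inputs are assumed to be already reduced mod kPrime.
inline std::uint64_t f_add(std::uint64_t a, std::uint64_t b) { return (a + b) % kPrime; }
inline std::uint64_t f_sub(std::uint64_t a, std::uint64_t b) { return (a + kPrime - b) % kPrime; }
inline std::uint64_t f_mul(std::uint64_t a, std::uint64_t b) {
    // unsigned __int128 is a GCC/Clang extension.
    return static_cast<std::uint64_t>(static_cast<unsigned __int128>(a) * b % kPrime);
}
inline std::uint64_t f_inv(std::uint64_t a) {  // Fermat: a^(p-2) mod p
    std::uint64_t result = 1, exp = kPrime - 2;
    while (exp > 0) {
        if (exp & 1) result = f_mul(result, a);
        a = f_mul(a, a);
        exp >>= 1;
    }
    return result;
}

using Share = std::pair<std::uint64_t, std::uint64_t>;  // (x, f(x))

// Split `secret` so that any `threshold` of the `num_shares` shares recover it.
std::vector<Share> split_secret(std::uint64_t secret, std::size_t threshold,
                                std::size_t num_shares, std::mt19937_64& rng) {
    assert(threshold >= 1 && num_shares >= threshold);
    // Random polynomial of degree threshold-1 with f(0) = secret.
    std::vector<std::uint64_t> coeffs{secret % kPrime};
    std::uniform_int_distribution<std::uint64_t> dist(0, kPrime - 1);
    for (std::size_t i = 1; i < threshold; ++i) coeffs.push_back(dist(rng));

    std::vector<Share> shares;
    for (std::uint64_t x = 1; x <= num_shares; ++x) {
        std::uint64_t y = 0;  // Horner evaluation of f(x)
        for (auto it = coeffs.rbegin(); it != coeffs.rend(); ++it) {
            y = f_add(f_mul(y, x), *it);
        }
        shares.emplace_back(x, y);
    }
    return shares;
}

// Lagrange interpolation at x = 0; needs at least `threshold` distinct shares.
std::uint64_t reconstruct(const std::vector<Share>& shares) {
    std::uint64_t secret = 0;
    for (std::size_t i = 0; i < shares.size(); ++i) {
        std::uint64_t num = 1, den = 1;
        for (std::size_t j = 0; j < shares.size(); ++j) {
            if (j == i) continue;
            num = f_mul(num, shares[j].first);
            den = f_mul(den, f_sub(shares[j].first, shares[i].first));
        }
        secret = f_add(secret, f_mul(shares[i].second, f_mul(num, f_inv(den))));
    }
    return secret;
}
```

With a threshold of 3 and 5 shares, any three shares passed to reconstruct yield the original secret, while fewer than three reveal nothing about it.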

CUDA Licensing

CUDA kernels (cuda/) are proprietary and require commercial licensing:

  • Ecosystem tier: included with Lux Network staking
  • Developer tier: $999/month
  • Enterprise tier: custom pricing

Claimed performance on an NVIDIA A100: NTT of size 2^24 in 4.2 ms; MSM of size 2^24 in 1.8 s.

Session Storage Server

session/ is a C++ storage server for the Lux Session network (encrypted messaging). Uses:

  • Post-quantum crypto (ML-KEM-768, ML-DSA-65) via luxcpp/crypto
  • GPU acceleration for batch crypto
  • HTTPS/QUIC serving
  • CGO bindings for Go integration (libsession_cgo.a)

DEX Matching Engine

dex/ is a sub-microsecond order book engine with:

  • Price-time priority (FIFO)
  • Lock-free data structures (Boost)
  • C/C++/Go bindings
  • GPU-accelerated AMM swap computation
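
Price-time priority means the best-priced resting order trades first, and orders at the same price trade in arrival order. The sketch below shows that rule for an incoming buy order against resting asks. It is a plain single-threaded illustration using standard containers, not the lock-free engine in dex/, and every type and method name in it is hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>
#include <map>
#include <vector>

struct Order {
    std::uint64_t id;
    std::int64_t price;  // integer ticks
    std::uint64_t qty;
};

struct Fill {
    std::uint64_t resting_id;
    std::int64_t price;
    std::uint64_t qty;
};

class AskBook {
public:
    // Orders at the same price queue in arrival order (FIFO).
    void add(const Order& o) { levels_[o.price].push_back(o); }

    // Match a buy order: lowest ask first, then oldest order at that price.
    std::vector<Fill> match_buy(std::int64_t limit_price, std::uint64_t qty) {
        std::vector<Fill> fills;
        auto it = levels_.begin();
        while (qty > 0 && it != levels_.end() && it->first <= limit_price) {
            auto& queue = it->second;
            while (qty > 0 && !queue.empty()) {
                Order& head = queue.front();
                const std::uint64_t traded = std::min(qty, head.qty);
                fills.push_back({head.id, it->first, traded});
                head.qty -= traded;
                qty -= traded;
                if (head.qty == 0) queue.pop_front();  // fully filled
            }
            // Drop empty price levels; otherwise the buy is exhausted.
            if (queue.empty()) it = levels_.erase(it); else ++it;
        }
        return fills;
    }

private:
    // std::map keeps ask prices sorted ascending, so begin() is the best ask.
    std::map<std::int64_t, std::deque<Order>> levels_;
};
```

For example, with asks of 5 @ 100 (id 1), 5 @ 100 (id 2), and 3 @ 99 (id 3), a buy of 6 with limit 100 first takes all 3 from id 3 at 99, then 3 from id 1 at 100; id 2 is untouched because it arrived after id 1 at the same price.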

Naming Conventions

| Item | Convention | Example |
|---|---|---|
| CMake package | lux-<pkg> | lux-gpu, lux-accel |
| CMake target | lux::<pkg> | lux::gpu, lux::accel |
| Library file | liblux<pkg> | libluxgpu.dylib, libluxaccel.so |
| Header path | lux/<pkg>/ | #include <lux/gpu.h>, #include <lux/accel/c_api.h> |
| Plugin file | libluxgpu_backend_<name> | libluxgpu_backend_metal.dylib |
| pkg-config | lux-<pkg> | pkg-config --libs lux-gpu |

Conan Build Options

Key options in the top-level conanfile.py:

| Option | Default | Description |
|---|---|---|
| with_gpu | ON | GPU compute library |
| with_metal | ON (macOS) | Metal backend |
| with_cuda | OFF | CUDA backend (requires license) |
| with_webgpu | ON | WebGPU/Dawn backend |
| with_fhe | ON | Fully homomorphic encryption |
| with_crypto | ON | Cryptographic primitives |
| with_lattice | ON | Lattice-based crypto |
| with_http | ON | HTTP/REST framework |
| with_grpc | OFF | gRPC (large dependency) |
| with_dex | ON | DEX matching engine |
| embed_kernels | ON | Embed shader sources in binary |

Troubleshooting

No GPU backend detected

# Point the loader at the directory that contains the backend plugins
export LUX_GPU_BACKEND_PATH=/usr/local/lib/lux-gpu
# Or force CPU fallback
export LUX_BACKEND=cpu

Metal backend not loading

  • Ensure macOS 12+ and the Xcode Command Line Tools are installed
  • Build the metal plugin separately: cd ~/work/luxcpp/metal && cmake -B build && cmake --build build
  • Install to plugin path: cmake --install build --prefix /usr/local

CUDA build fails

  • Requires CUDA Toolkit 12.0+ and nvcc in PATH
  • Compute Capability 7.0+ (Volta/Turing/Ampere/Hopper)
  • Private repo requires license and NDA

CMake can't find lux-gpu

# Set CMAKE_PREFIX_PATH to where lux-gpu is installed
cmake -B build -DCMAKE_PREFIX_PATH=/usr/local
# Or use pkg-config
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

FHE build slow

  • FHE (OpenFHE) is a large codebase; use -j$(nproc) for parallel builds
  • Disable tests with -DBUILD_UNITTESTS=OFF for faster iteration

CGO linking errors

# Ensure libluxaccel is installed and discoverable
export CGO_CFLAGS="-I/usr/local/include"
export CGO_LDFLAGS="-L/usr/local/lib -lluxaccel"
CGO_ENABLED=1 go build ./...

ABI version mismatch

  • Backend plugins must match LUX_GPU_BACKEND_ABI_VERSION (currently 2)
  • Rebuild all plugins when core ABI changes
  • To diagnose: the core logs "ABI version mismatch" when a plugin fails to load for this reason

Related Skills

  • lux/lux-gpu.md -- Go bindings for GPU acceleration (consumes libluxaccel)
  • lux/lux-fhe.md -- FHE Go package (consumes luxcpp/fhe C++ library)
  • lux/lux-crypto.md -- Go crypto package (BLS, ML-DSA, ML-KEM)
  • lux/lux-lattice.md -- Go lattice crypto (NTT, polynomial rings)
  • lux/lux-dex.md -- DEX matching engine details
  • lux/lux-session.md -- Session network and storage server
