A general-purpose, efficient Riemannian Gaussian sampler. The library provides exact spectral sampling from isotropic Gaussian distributions on compact matrix manifolds, with current support for the rotation group $SO(d)$ and the Stiefel manifold $V(n, k)$.
Samples lie exactly on the manifold (up to floating-point precision) with the correct concentration around a user-supplied mean frame. The spectral approach — decomposing each sample into a Weyl-chamber angle draw (Phase I, done once at construction) and a manifold lift (Phase II, per sample) — avoids the exponential rejection rates of naive methods and the geodesic discretisation error of random walks, while remaining efficient enough for large batches on both CPU and GPU.
| Manifold | Algorithm | Phase I (per HMC step) | Phase II (per sample) |
|---|---|---|---|
| $V(n, k)$ — Stiefel | Algorithm 4.2 | $O(k^2)$ — HMC over $k$ principal angles; Dyson repulsion gradient is $O(k^2)$ | $O(k^3)$ matrix exponential on $2k \times 2k$ active core, then $O(nk^2)$ QR + frame assembly |
| $SO(d)$ — Rotations | Algorithm 4.3 | $O(d^2)$ — HMC over $\lfloor d/2 \rfloor$ angles; cosine-difference repulsion gradient is $O(m^2),\ m = \lfloor d/2 \rfloor$ | $O(d)$ Givens-block canonical rotation $\Theta$, then $O(d^3)$ Haar $O(d)$ draw via QR + conjugation $Q\Theta Q^\top \hat{M}$ |
Phase I runs once at construction (burn-in). Each sample() call executes Phase II only. The dominant per-sample cost is $O(nk^2)$ for Stiefel and $O(d^3)$ for $SO(d)$, both arising from the QR decomposition.
This library uses isomorphism as its tensor backend. Install it first, choosing the backend that matches your hardware:
brew tap c0rmac/homebrew-isomorphism
# Apple Silicon (recommended — uses the Metal GPU via MLX)
brew install isomorphism --with-mlx
# Any CPU (Eigen)
brew install isomorphism --with-eigen
# PyTorch / LibTorch
brew install isomorphism --with-torch
For full installation options and building from source, see the isomorphism documentation.
brew tap c0rmac/homebrew-riemannian-gaussian-sampler
brew install riemannian-gaussian-sampler
git clone https://github.com/c0rmac/riemannian-gaussian-sampler.git
cd riemannian-gaussian-sampler
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
cmake --install build
If isomorphism is not on the default CMake prefix path, point to its source tree directly:
cmake -S . -B build \
-DISOMORPHISM_DIR=/path/to/isomorphism \
-DCMAKE_BUILD_TYPE=Release
find_package(sampler REQUIRED)
add_executable(my_app main.cpp)
target_link_libraries(my_app PRIVATE sampler::sampler)
The Riemannian Gaussian distribution on a smooth Riemannian manifold $(\mathcal{M}, g)$ was first established by Pennec (2006) as the maximum-entropy distribution given a fixed mean point and covariance. For an isotropic covariance with concentration parameter $\alpha > 0$, the density with respect to the Riemannian volume measure is
\[\mu_\infty(X) = \frac{1}{Z(\alpha,\widehat{M})}\exp\left(-\alpha\, d_g^2(X,\, \widehat{M})\right), \qquad Z(\alpha,\widehat{M}) = \int_{\mathcal{M}} \exp\left(-\alpha\, d_g^2(X,\widehat{M})\right) d\mu_g(X)\]where $d_g(X, \widehat{M})$ is the geodesic distance between $X$ and the mean frame $\widehat{M} \in \mathcal{M}$, and $\alpha = \lambda/\delta^2$ plays the role of a precision (inverse variance). Larger $\alpha$ concentrates the distribution tightly around $\widehat{M}$; as $\alpha \to 0$ the distribution approaches the uniform (Haar) measure on $\mathcal{M}$.
Direct sampling from $\mu_\infty$ is difficult: $Z(\alpha, \widehat{M})$ has no closed form, the manifold is high-dimensional, and naive rejection sampling has exponentially poor acceptance rates in the concentrated regime.
The key insight that makes exact and efficient sampling possible is a decomposition of the manifold into a low-dimensional shape space and a high-dimensional but analytically tractable orientation fibre.
$SO(d)$ is a compact symmetric space. $V(n, k)$ is a compact homogeneous space — not symmetric in general, but it admits an analogous spectral decomposition via its reductive structure as the quotient $O(n) / O(n-k)$. In both cases, every point $X \in \mathcal{M}$ can be written as
\[X = h \cdot \exp(A(\theta)) \cdot \widehat{M}, \qquad h \in H,\quad \theta \in \mathcal{W}\]where $H$ is the stabiliser subgroup (the “orientation” degrees of freedom), $\mathcal{W}$ is the Weyl chamber (a cone of dimension $m \ll \dim \mathcal{M}$), and $A(\theta)$ is the canonical Lie-algebra element encoding the shape parameters $\theta$.
A generalised integration formula factorises the Riemannian volume measure exactly:
\[\int_{\mathcal{M}} f(X)\, d\mu_g(X) = C \int_{\mathcal{W}} f(\theta)\, w(\theta)\, d\theta\]where the density kernel $w(\theta)$ is the Jacobian of the spectral change of variables, given explicitly by
\[w(\theta) = \prod_{\alpha \in \Phi^+} \left|\sin\left(c_\alpha\, \alpha(\theta)\right)\right|^{m_\alpha}\]with $\Phi^+$ the positive restricted roots, $m_\alpha$ the geometric multiplicity of each root, and $c_\alpha$ a structural constant. The induced density on the shape space is therefore
\[p_\infty(\theta) = \frac{1}{Z_{\mathcal{W}}}\, w(\theta)\, \exp\left(-\alpha\, \|A(\theta)\|_g^2\right), \qquad Z_{\mathcal{W}} = \int_{\mathcal{W}} w(\theta)\, \exp\left(-\alpha\, \|A(\theta)\|_g^2\right) d\theta\]This is the exact marginal of $\mu_\infty$ under the spectral projection — no approximation is made. Sampling $\theta \sim p_\infty$ via HMC (Phase I) and then drawing an orientation $h \sim \mathrm{Haar}(H)$ and forming $X = h \cdot \exp(A(\theta)) \cdot \widehat{M}$ (Phase II) produces exact, unbiased samples from $\mu_\infty$. For $SO(d)$, $H = O(d)$ (the full orthogonal group); for $V(n, k)$, $H = O(n-k)$ (the stabiliser of the canonical frame, acting on the orthogonal complement).
For $SO(d)$ the Weyl chamber has dimension $m = \lfloor d/2 \rfloor$; for $V(n, k)$ it has dimension $k$. This reduction — from $\frac{d(d-1)}{2}$ or $nk - \frac{k(k+1)}{2}$ manifold dimensions down to $m$ shape parameters — is what makes Monte Carlo sampling practical.
A detailed paper with full proofs is currently pending review.
#include <sampler/so_gaussian_sampler.hpp>
#include <isomorphism/math.hpp>
namespace math = isomorphism::math;
int main() {
const int d = 4;
// Mean rotation: identity
isomorphism::Tensor M_hat = math::eye(d, isomorphism::DType::Float32);
sampler::SOdGaussianSampler::Config cfg;
cfg.alpha = 2.0; // concentration: higher = tighter around M_hat
cfg.num_samples = 1;
sampler::SOdGaussianSampler samp(M_hat, d, cfg);
// Returns [1, d, d]. Call sample() repeatedly — burn-in only runs once.
isomorphism::Tensor X = samp.sample();
}
#include <sampler/stiefel_gaussian_sampler.hpp>
#include <isomorphism/math.hpp>
namespace math = isomorphism::math;
int main() {
const int n = 10, k = 3;
// Mean frame: first k columns of I_n (canonical Stiefel origin)
std::vector<float> data(n * k, 0.f);
for (int i = 0; i < k; ++i) data[i * k + i] = 1.f;
isomorphism::Tensor X_hat = math::array(data, {n, k}, isomorphism::DType::Float32);
sampler::StiefelGaussianSampler::Config cfg;
cfg.alpha = 1.0;
cfg.num_samples = 1;
sampler::StiefelGaussianSampler samp(X_hat, n, k, cfg);
// Returns [n, k]. X^T X ≈ I_k.
isomorphism::Tensor X = samp.sample();
}
Set num_samples = N to draw a batch of N independent samples in one call.
The returned tensor has shape [N, d, d] (SO) or [N, n, k] (Stiefel).
sampler::SOdGaussianSampler::Config cfg;
cfg.alpha = 1.0;
cfg.num_samples = 256; // 256 independent SO(4) rotations per call
cfg.angle_cfg.num_chains = 256; // one HMC chain per sample — fully parallel
cfg.angle_cfg.num_threads = 8; // use 8 threads across those chains
sampler::SOdGaussianSampler samp(M_hat, 4, cfg);
isomorphism::Tensor batch = samp.sample(); // [256, 4, 4]
The num_chains chains run in parallel during burn-in; at sampling time each
chain advances one thinned HMC step and the resulting angle vector is lifted to
the manifold independently, giving exact i.i.d. samples across the batch.
By default all samplers are single-threaded (polite for library consumers). To
enable parallelism, set num_threads per sampler or globally:
#include <sampler/thread_config.hpp>
// Per-sampler: use 4 threads for this sampler's HMC chains
cfg.angle_cfg.num_threads = 4;
cfg.angle_cfg.num_chains = 16;
// Global override: cap all samplers to 8 threads for this process
sampler::set_num_threads(8);
// Clear the global cap (revert to per-sampler values)
sampler::set_num_threads(0);
The global override (set_num_threads) takes priority over the per-sampler
num_threads field. This lets you tune thread usage once at application startup
without modifying individual sampler configs.
If your application already holds MLX arrays, torch tensors, or Eigen matrices you can pass them directly to the sampler without any CPU round-trip using the interop headers provided by isomorphism.
#include <isomorphism/interop/mlx.hpp>
#include <sampler/so_gaussian_sampler.hpp>
namespace iso_mlx = isomorphism::interop::mlx;
// Wrap a native MLX array as an isomorphism::Tensor (zero-copy)
mlx::core::array m_hat_mlx = /* your array */;
isomorphism::Tensor M_hat = iso_mlx::wrap(m_hat_mlx);
sampler::SOdGaussianSampler samp(M_hat, d, cfg);
isomorphism::Tensor X = samp.sample();
// Unwrap the result back to a native MLX array (zero-copy)
mlx::core::array X_mlx = iso_mlx::unwrap(X);
#include <isomorphism/interop/torch.hpp>
#include <sampler/stiefel_gaussian_sampler.hpp>
namespace iso_torch = isomorphism::interop::torch;
torch::Tensor x_hat_torch = torch::eye(n).narrow(1, 0, k);
isomorphism::Tensor X_hat = iso_torch::wrap(x_hat_torch);
sampler::StiefelGaussianSampler samp(X_hat, n, k, cfg);
isomorphism::Tensor X = samp.sample();
torch::Tensor X_torch = iso_torch::unwrap(X);
#include <isomorphism/interop/eigen.hpp>
#include <sampler/stiefel_gaussian_sampler.hpp>
namespace iso_eigen = isomorphism::interop::eigen;
Eigen::MatrixXf x_hat_eigen = Eigen::MatrixXf::Identity(n, k);
isomorphism::Tensor X_hat = iso_eigen::wrap(x_hat_eigen);
sampler::StiefelGaussianSampler samp(X_hat, n, k, cfg);
isomorphism::Tensor X = samp.sample();
// Zero-copy map back into Eigen (contiguous tensors only)
auto X_map = iso_eigen::to_matrix_map(X); // Eigen::Map<MatrixXf>
// Or copy into a new MatrixXf
Eigen::MatrixXf X_eigen = iso_eigen::to_matrix(X);
sample() is cheap to call repeatedly — burn-in runs only at construction
time. The typical pattern for an iterative algorithm is:
// Build once (triggers burn-in)
sampler::SOdGaussianSampler samp(M_hat, d, cfg);
for (int step = 0; step < n_steps; ++step) {
// Draw a fresh batch — only Phase II runs
isomorphism::Tensor X = samp.sample();
// ... update M_hat based on X ...
// If the mean frame changes substantially, rebuild with the new mean.
// This does NOT re-run burn-in; call rebuild_angle_sampler() explicitly
// only if alpha or the HMC config has changed.
samp = sampler::SOdGaussianSampler(new_M_hat, d, cfg);
}
// Acceptance rate after burn-in should sit in [0.30, 0.95].
// Values outside this range suggest epsilon needs tuning.
double acc = samp.angle_acceptance_rate();
std::printf("HMC acceptance rate: %.3f\n", acc);
// Adjust epsilon and warm-up via angle_cfg if needed:
cfg.angle_cfg.init_epsilon = 5e-4; // coarser initial step size
cfg.angle_cfg.target_accept = 0.70; // dual-averaging target
cfg.angle_cfg.burn_in = 3000; // longer warm-up
SOdGaussianSampler::Config| Field | Default | Description |
|---|---|---|
num_samples |
1 |
Batch size N — number of $SO(d)$ matrices per sample() call |
alpha |
1.0 |
Concentration α = λ/δ². Higher = tighter around M̂ |
dtype |
Float32 |
Tensor element type |
angle_cfg |
— | Nested SOdAngleSampler::Config (see below) |
StiefelGaussianSampler::Config| Field | Default | Description |
|---|---|---|
num_samples |
1 |
Batch size N — number of $V(n,k)$ frames per sample() call |
alpha |
1.0 |
Concentration α = λ/δ². Higher = tighter around X̂ |
dtype |
Float32 |
Tensor element type |
angle_cfg |
— | Nested PrincipalAngleSampler::Config (see below) |
These fields appear in both SOdAngleSampler::Config and
PrincipalAngleSampler::Config and are set via cfg.angle_cfg.*:
| Field | Default | Description |
|---|---|---|
num_chains |
1 |
Parallel HMC chains (set equal to num_samples for i.i.d. batches) |
num_threads |
1 |
OpenMP threads for this sampler’s HMC regions |
burn_in |
2000 |
Burn-in steps per chain |
leapfrog_steps |
5 |
Leapfrog steps per HMC trajectory |
init_epsilon |
1e-4 |
Initial step size (dual-averaging adapts it during burn-in) |
target_accept |
0.65 |
Dual-averaging acceptance rate target |
thinning |
1 |
HMC steps between consecutive samples |
seed |
0 |
RNG seed (0 = time-based; chain c receives seed + c) |
include/sampler/
sampler_base.hpp # Abstract SamplerBase interface
thread_config.hpp # Global thread budget (set_num_threads)
angle_sampler_hmc.hpp # CRTP HMC base (declaration only)
principal_angle_sampler.hpp # Stiefel principal angle density
so_angle_sampler.hpp # $SO(d)$ rotation angle density
stiefel_gaussian_sampler.hpp # $V(n,k)$ top-level sampler (Algorithm 4.2)
so_gaussian_sampler.hpp # $SO(d)$ top-level sampler (Algorithm 4.3)
src/
thread_config.cpp
angle_sampler_hmc.cpp # CRTP method bodies + explicit instantiations
principal_angle_sampler.cpp
so_angle_sampler.cpp
stiefel_gaussian_sampler.cpp
so_gaussian_sampler.cpp
MIT