Langevin Dynamics
Langevin dynamics is a [[Stochastic Differential Equation (SDE)|stochastic differential equation]] ([[Stochastic Differential Equation (SDE)|SDE]]) that describes the motion of a particle under the combined influence of a deterministic drift (gradient of a potential) and random thermal fluctuations. In machine learning, it serves as a sampling algorithm that generates samples from a target distribution $p(\mathbf{x})$ using only its score $\nabla_{\mathbf{x}} \log p(\mathbf{x})$.
1. Core Concept
1.1 Physical Origin: Brownian Motion with Drift
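Langevin dynamics originates from statistical physics, where it describes a Brownian particle in a potential field $U(\mathbf{x})$:

$$m\,\ddot{\mathbf{x}}_t = -\nabla U(\mathbf{x}_t) - \gamma\,\dot{\mathbf{x}}_t + \sqrt{2\gamma k_B T}\,\boldsymbol{\xi}(t)$$

where:
- $m$: particle mass
- $-\nabla U(\mathbf{x})$: deterministic force from the potential
- $-\gamma\,\dot{\mathbf{x}}$: friction (dissipation)
- $\sqrt{2\gamma k_B T}\,\boldsymbol{\xi}(t)$: thermal fluctuations (white noise)
- $\boldsymbol{\xi}(t)$: standard Gaussian white noise, $\mathbb{E}[\boldsymbol{\xi}(t)\,\boldsymbol{\xi}(t')^\top] = \delta(t - t')\,\mathbf{I}$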
1.2 Overdamped Limit
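In the overdamped limit ($m \to 0$, so friction dominates inertia), the inertial term drops out. Setting $\gamma = 1$ (equivalently, rescaling time) gives

$$d\mathbf{x}_t = -\nabla U(\mathbf{x}_t)\,dt + \sqrt{2 k_B T}\,d\mathbf{W}_t$$

This is an [[Stochastic Differential Equation (SDE)|Itô SDE]] with:
- Drift: $-\nabla U(\mathbf{x})$, which follows the negative gradient of the potential
- Diffusion: $\sqrt{2 k_B T}$, a constant additive noise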
1.3 From Physics to Sampling
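Replace the physical potential with the negative log-density, $U(\mathbf{x}) = -\log p(\mathbf{x})$, and set $k_B T = 1$, so that the force $-\nabla U(\mathbf{x})$ becomes the score $\nabla_{\mathbf{x}} \log p(\mathbf{x})$.
This yields the Langevin sampling equation:

$$d\mathbf{x}_t = \nabla_{\mathbf{x}} \log p(\mathbf{x}_t)\,dt + \sqrt{2}\,d\mathbf{W}_t$$

The stationary distribution of this [[Stochastic Differential Equation (SDE)|SDE]] is exactly $p(\mathbf{x})$, the Boltzmann distribution $p(\mathbf{x}) \propto e^{-U(\mathbf{x})}$ at unit temperature.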
2. Mathematical Foundation
2.1 Stationary Distribution
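Theorem: The overdamped Langevin SDE

$$d\mathbf{x}_t = \nabla_{\mathbf{x}} \log p(\mathbf{x}_t)\,dt + \sqrt{2}\,d\mathbf{W}_t$$

has $p(\mathbf{x})$ as its stationary distribution.
Proof sketch (via [[Fokker-Planck Equation|Fokker-Planck]]):
The [[Fokker-Planck Equation]] for this [[Stochastic Differential Equation (SDE)|SDE]] is:

$$\frac{\partial p_t}{\partial t} = -\nabla \cdot \big(p_t\, \nabla_{\mathbf{x}} \log p\big) + \Delta p_t$$

Setting $p_t = p$ and using $p\,\nabla \log p = \nabla p$, the right-hand side becomes $-\nabla \cdot \nabla p + \Delta p = 0$, so $p$ does not change in time.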
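The stationarity argument can also be checked numerically. The sketch below assumes a hypothetical 1-D two-component Gaussian mixture as the target (not from the text), evaluates the right-hand side of the Fokker-Planck equation at $p_t = p$ with finite differences, and finds that the drift and diffusion terms cancel up to discretization error, even though each term alone is of order one.

```python
import numpy as np

def mixture_density(x):
    # Hypothetical target: 0.3 * N(-2, 0.5^2) + 0.7 * N(1.5, 1)
    g1 = np.exp(-0.5 * ((x + 2.0) / 0.5) ** 2) / (0.5 * np.sqrt(2 * np.pi))
    g2 = np.exp(-0.5 * (x - 1.5) ** 2) / np.sqrt(2 * np.pi)
    return 0.3 * g1 + 0.7 * g2

def fokker_planck_rhs_at_stationarity(x):
    """Evaluate -d/dx(p * d/dx log p) + d^2 p / dx^2 on the grid x."""
    p = mixture_density(x)
    score = np.gradient(np.log(p), x)                     # d/dx log p
    drift_term = -np.gradient(p * score, x)               # -d/dx (p * score)
    diffusion_term = np.gradient(np.gradient(p, x), x)    # d^2 p / dx^2
    return drift_term + diffusion_term

x = np.linspace(-6.0, 6.0, 4001)
residual = fokker_planck_rhs_at_stationarity(x)
# Skip a few boundary points, where np.gradient falls back to one-sided differences
print(np.max(np.abs(residual[5:-5])))
```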
2.2 Discrete-Time Approximation (Euler-Maruyama)
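The continuous [[Stochastic Differential Equation (SDE)|SDE]] is discretized using the Euler-Maruyama scheme:

$$\mathbf{x}_{k+1} = \mathbf{x}_k + \epsilon\,\nabla_{\mathbf{x}} \log p(\mathbf{x}_k) + \sqrt{2\epsilon}\,\mathbf{z}_k$$

where $\epsilon > 0$ is the step size and $\mathbf{z}_k \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ are drawn independently at each step.

```python
import numpy as np

def langevin_dynamics(score_fn, x_init, n_steps, step_size, rng=None):
    """Unadjusted Langevin algorithm (ULA).

    score_fn(x) returns an estimate of grad_x log p(x) with the same shape as x.
    rng is an optional numpy Generator, added here for reproducibility.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x_init, dtype=float)
    for _ in range(n_steps):
        z = rng.standard_normal(x.shape)
        # x_{k+1} = x_k + eps * score(x_k) + sqrt(2 eps) * z_k
        x = x + step_size * score_fn(x) + np.sqrt(2.0 * step_size) * z
    return x
```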
2.3 Discretization Error
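The Euler-Maruyama discretization introduces an $O(\epsilon)$ bias: with a fixed step size, the stationary distribution of the discrete chain differs from $p$.

| Algorithm | Acronym | Accept/Reject | Bias | Variance |
|---|---|---|---|---|
| Unadjusted Langevin | ULA | ❌ No | $O(\epsilon)$ | Lower |
| Metropolis-Adjusted | MALA | ✅ Yes | Asymptotically unbiased | Higher (rejections) |
| Stochastic Gradient Langevin | SGLD | ❌ No | $O(\epsilon)$ plus mini-batch gradient noise | Lower (scalable) |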
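For a standard Gaussian target the $O(\epsilon)$ bias of ULA has a closed form. With $\nabla \log p(x) = -x$ the update is $x_{k+1} = (1-\epsilon)\,x_k + \sqrt{2\epsilon}\,z_k$, so the stationary variance $v$ solves $v = (1-\epsilon)^2 v + 2\epsilon$, giving $v = 1/(1 - \epsilon/2) = 1 + \epsilon/2 + O(\epsilon^2)$ instead of the exact value 1. A minimal sketch comparing this prediction with simulation; the chain count, step count and step sizes are arbitrary illustrative choices:

```python
import numpy as np

def ula_stationary_variance(step_size):
    # Fixed point of v = (1 - eps)^2 v + 2 eps for the target N(0, 1)
    return 1.0 / (1.0 - step_size / 2.0)

def simulate_ula_variance(step_size, n_chains=20000, n_steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_chains)   # start at the target, so only the bias remains
    for _ in range(n_steps):
        # ULA step with score(x) = -x, run on many independent chains at once
        x = x - step_size * x + np.sqrt(2.0 * step_size) * rng.standard_normal(n_chains)
    return x.var()

for eps in (0.5, 0.1, 0.01):
    print(f"eps={eps}: predicted {ula_stationary_variance(eps):.4f}, "
          f"simulated {simulate_ula_variance(eps):.4f}, exact target 1.0")
```

The bias shrinks linearly with $\epsilon$ but never vanishes at a fixed step size, which is what the Metropolis correction in MALA removes.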
2.4 Convergence Rate
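Under log-concavity ($-\log p$ is $m$-strongly convex, i.e. $-\nabla^2 \log p(\mathbf{x}) \succeq m\,\mathbf{I}$), the continuous-time dynamics converges exponentially fast:

$$\mathrm{KL}(p_t \,\Vert\, p) \le e^{-2mt}\,\mathrm{KL}(p_0 \,\Vert\, p), \qquad W_2(p_t, p) \le e^{-mt}\,W_2(p_0, p)$$

Key takeaway: convergence is exponentially fast in continuous time, with a discretization bias that shrinks as the step size $\epsilon \to 0$.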
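For a Gaussian target the KL rate can be verified exactly. The sketch assumes $p = \mathcal{N}(0, 1/m)$ (which is $m$-strongly log-concave) and a Gaussian start $\mathcal{N}(\mu_0, s_0^2)$ with illustrative values. The Langevin SDE is then an Ornstein-Uhlenbeck process, so $p_t$ stays Gaussian with mean $\mu_0 e^{-mt}$ and variance $1/m + (s_0^2 - 1/m)\,e^{-2mt}$, and the bound can be checked at any $t$.

```python
import numpy as np

def gaussian_kl(mu, var, target_var):
    # KL( N(mu, var) || N(0, target_var) ) in one dimension
    return 0.5 * (np.log(target_var / var) + (var + mu ** 2) / target_var - 1.0)

def ou_marginal(t, mu0, var0, m):
    # Exact marginal at time t of dx = -m x dt + sqrt(2) dW started from N(mu0, var0)
    mean = mu0 * np.exp(-m * t)
    var = 1.0 / m + (var0 - 1.0 / m) * np.exp(-2.0 * m * t)
    return mean, var

m, mu0, var0 = 2.0, 3.0, 0.1          # illustrative constants
kl0 = gaussian_kl(mu0, var0, 1.0 / m)
for t in (0.0, 0.25, 0.5, 1.0, 2.0):
    mean, var = ou_marginal(t, mu0, var0, m)
    kl_t = gaussian_kl(mean, var, 1.0 / m)
    print(f"t={t:4.2f}  KL={kl_t:.5f}  bound={np.exp(-2 * m * t) * kl0:.5f}")
```

For a start that differs from the target only in its mean, the bound holds with equality, so the rate $2m$ cannot be improved for this target.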
3. Langevin Dynamics for Generative Modeling
3.1 Score-Based Sampling
The breakthrough insight of score-based generative modeling:
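If we can learn a model $s_\theta(\mathbf{x}) \approx \nabla_{\mathbf{x}} \log p(\mathbf{x})$, we can sample from $p(\mathbf{x})$ via Langevin dynamics without ever computing the normalization constant. Writing $p(\mathbf{x}) = \tilde{p}(\mathbf{x})/Z$, the constant vanishes under the gradient: $\nabla_{\mathbf{x}} \log p(\mathbf{x}) = \nabla_{\mathbf{x}} \log \tilde{p}(\mathbf{x})$.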
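Score-Based Sampling Pipeline: data samples → train $s_\theta(\mathbf{x})$ by score matching → run Langevin dynamics with $s_\theta$ in place of $\nabla_{\mathbf{x}} \log p$ → approximate samples from $p(\mathbf{x})$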
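A minimal sketch of sampling an unnormalized density, assuming a hypothetical double-well target $\tilde{p}(x) = \exp(-(x^2 - 1)^2)$ whose analytic score $-4x(x^2 - 1)$ stands in for a learned $s_\theta$. The sampler never touches $Z$; the density is normalized numerically only to obtain a reference value of $\mathbb{E}[x^2]$ for comparison. Chain count, step size and step count are illustrative.

```python
import numpy as np

def log_p_tilde(x):
    # Unnormalized log-density of a double well; Z is never computed by the sampler
    return -(x ** 2 - 1.0) ** 2

def score(x):
    # d/dx log p_tilde(x), identical to the score of the normalized density
    return -4.0 * x * (x ** 2 - 1.0)

def langevin_samples(n_chains=5000, n_steps=3000, step_size=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_chains)
    for _ in range(n_steps):
        x = x + step_size * score(x) + np.sqrt(2.0 * step_size) * rng.standard_normal(n_chains)
    return x

# Reference E[x^2] by quadrature on a uniform grid (normalization used only for checking)
grid = np.linspace(-4.0, 4.0, 20001)
weights = np.exp(log_p_tilde(grid))
ref_second_moment = np.sum(grid ** 2 * weights) / np.sum(weights)

samples = langevin_samples()
print(ref_second_moment, np.mean(samples ** 2))
```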
3.2 Annealed Langevin Dynamics
Problem: A single score model struggles with multi-modal, complex distributions.
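Solution (NCSN): Train a noise-conditional score model $s_\theta(\mathbf{x}, \sigma)$ at multiple noise levels $\sigma_1 > \sigma_2 > \dots > \sigma_L$, then run Langevin dynamics at each level from the largest $\sigma$ to the smallest, initializing each level with the samples from the previous one.

```python
import numpy as np

def annealed_langevin(score_model, noise_levels, steps_per_level, step_size, x_init, rng=None):
    """Annealed Langevin dynamics in the style of NCSN.

    score_model(x, sigma) estimates grad_x log p_sigma(x), the score of the data
    perturbed with Gaussian noise of scale sigma. noise_levels is sorted from the
    largest to the smallest sigma. x_init and rng are arguments added here.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x_init, dtype=float)
    sigma_min = noise_levels[-1]
    for sigma in noise_levels:
        # alpha_i = eps * sigma_i^2 / sigma_L^2: larger moves at coarser noise levels
        alpha = step_size * (sigma / sigma_min) ** 2
        for _ in range(steps_per_level):
            z = rng.standard_normal(x.shape)
            # Uses this note's (eps, sqrt(2 eps)) convention; NCSN writes alpha/2 and sqrt(alpha)
            x = x + alpha * score_model(x, sigma) + np.sqrt(2.0 * alpha) * z
    return x
```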
Annealing schedule design:
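| Parameter | Typical Value | Rationale |
|---|---|---|
| $\sigma_1$ (largest noise) | 1.0 – 10.0 | Large enough to cover data modes |
| $\sigma_L$ (smallest noise) | 0.01 | Small enough for precision |
| $L$ (number of levels) | 10 – 50 | Geometric progression: constant ratio $\sigma_{i+1}/\sigma_i$ |
| Steps per level | 10 – 100 | Longer at smaller $\sigma$ |
| Step size $\alpha_i$ | $\alpha_i = \epsilon\,\sigma_i^2/\sigma_L^2$ | Ensures stable dynamics at each scale |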
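A sketch of the schedule in the table, assuming $\sigma_1 = 1$, $\sigma_L = 0.01$, $L = 10$ and a base step $\epsilon = 2 \times 10^{-5}$ (illustrative values inside the ranges above): consecutive noise levels share a constant ratio, and the per-level step size follows $\alpha_i = \epsilon\,\sigma_i^2/\sigma_L^2$.

```python
import numpy as np

def geometric_noise_schedule(sigma_max, sigma_min, num_levels):
    # sigma_1 = sigma_max, ..., sigma_L = sigma_min with a constant ratio between levels
    return np.geomspace(sigma_max, sigma_min, num_levels)

def level_step_sizes(sigmas, base_step):
    # alpha_i = eps * sigma_i^2 / sigma_L^2, so the smallest level uses eps itself
    return base_step * (sigmas / sigmas[-1]) ** 2

sigmas = geometric_noise_schedule(1.0, 0.01, 10)
alphas = level_step_sizes(sigmas, 2e-5)
for s, a in zip(sigmas, alphas):
    print(f"sigma={s:.4f}  alpha={a:.2e}")
```

Scaling $\alpha_i$ with $\sigma_i^2$ keeps the ratio of step size to local variance roughly constant across levels.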
3.3 Correctors in Predictor-Corrector Framework
In [[Diffusion Model|diffusion models]], Langevin dynamics serves as the corrector that refines samples:
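Predictor-Corrector Sampling Loop: for each noise level, from high to low, (1) predictor: one step of the reverse-time SDE or probability flow ODE to the next noise level; (2) corrector: a few Langevin steps using the score at that level.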
Why Langevin as corrector?
- The predictor step may drift away from the true distribution
- Langevin dynamics, given the exact score, converges toward the correct marginal $p_t(\mathbf{x})$ at the current noise level
- A few corrector steps significantly improve sample quality
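A minimal predictor-corrector sketch, assuming a variance-exploding (VE) forward process and hypothetical 1-D Gaussian data $\mathcal{N}(\mu, s^2)$, so that the exact score of every noisy marginal, $\nabla_x \log p_\sigma(x) = -(x - \mu)/(s^2 + \sigma^2)$, can stand in for a trained network. The predictor is a reverse-diffusion step between consecutive noise levels and the corrector is one Langevin step; the corrector rule $\alpha = r\,\sigma^2$ and all constants are illustrative assumptions.

```python
import numpy as np

MU, S = 2.0, 0.5   # hypothetical data distribution N(MU, S^2)

def score(x, sigma):
    # Exact score of the noisy marginal p_sigma = N(MU, S^2 + sigma^2)
    return -(x - MU) / (S ** 2 + sigma ** 2)

def pc_sample(n_samples=10000, sigma_max=20.0, sigma_min=0.01, num_levels=200,
              corrector_steps=1, r=0.05, seed=0):
    rng = np.random.default_rng(seed)
    sigmas = np.geomspace(sigma_max, sigma_min, num_levels)
    x = sigma_max * rng.standard_normal(n_samples)      # prior N(0, sigma_max^2)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        # Predictor: reverse-diffusion step of the VE SDE from sigma to sigma_next
        dvar = sigma ** 2 - sigma_next ** 2
        x = x + dvar * score(x, sigma) + np.sqrt(dvar) * rng.standard_normal(n_samples)
        # Corrector: Langevin steps targeting p_{sigma_next}
        alpha = r * sigma_next ** 2
        for _ in range(corrector_steps):
            x = x + alpha * score(x, sigma_next) + np.sqrt(2.0 * alpha) * rng.standard_normal(n_samples)
    return x

samples = pc_sample()
print(samples.mean(), samples.std())   # should be close to MU = 2.0 and S = 0.5
```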
3.4 Comparison: Langevin vs. ODE vs. [[Stochastic Differential Equation (SDE)|SDE]] Sampling
| Aspect | Langevin Dynamics | ODE (Probability Flow) | Reverse [[Stochastic Differential Equation (SDE)|SDE]] |
|---|---|---|---|
| Stochasticity | Stochastic | Deterministic | Stochastic |
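| Score usage | $\nabla_{\mathbf{x}} \log p(\mathbf{x})$ at a fixed noise level | Drift $\mathbf{f}(\mathbf{x}, t) - \tfrac{1}{2} g(t)^2 \nabla_{\mathbf{x}} \log p_t(\mathbf{x})$ | Drift $\mathbf{f}(\mathbf{x}, t) - g(t)^2 \nabla_{\mathbf{x}} \log p_t(\mathbf{x})$ |
| Convergence guarantee | Yes ($t \to \infty$) | Path-dependent (fixed start) | Path-dependent |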
| Step efficiency | Many steps needed | Fewer steps ([[DPM-Solver]]) | Many steps |
| Quality | High (stochastic refinement) | Good (fast) | High |
| Role | Corrector, standalone sampler | Predictor | Predictor |
4. Algorithmic Variants
4.1 MALA: Metropolis-Adjusted Langevin Algorithm
MALA adds a Metropolis-Hastings accept-reject step to remove discretization bias:
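```python
import numpy as np

def mala_step(x, log_prob_fn, score_fn, step_size, rng):
    """One Metropolis-adjusted Langevin step.

    The accept/reject test needs log p(x) up to an additive constant, so
    log_prob_fn is required in addition to score_fn. Returns (new_x, accepted).
    """
    def log_q(x_to, x_from):
        # Log density (up to a constant) of the proposal N(x_from + eps * score, 2 eps I)
        mean = x_from + step_size * score_fn(x_from)
        return -np.sum((x_to - mean) ** 2) / (4.0 * step_size)

    # Langevin proposal: one Euler-Maruyama step
    x_prop = x + step_size * score_fn(x) + np.sqrt(2.0 * step_size) * rng.standard_normal(x.shape)
    # Metropolis-Hastings log acceptance ratio: [p(x') q(x | x')] / [p(x) q(x' | x)]
    log_alpha = (log_prob_fn(x_prop) + log_q(x, x_prop)) - (log_prob_fn(x) + log_q(x_prop, x))
    if np.log(rng.uniform()) < log_alpha:
        return x_prop, True
    return x, False
```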
4.2 SGLD: Stochastic Gradient Langevin Dynamics
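For large datasets, SGLD samples a parameter posterior $p(\theta \mid \mathcal{D}) \propto p(\theta)\prod_{i=1}^{N} p(x_i \mid \theta)$ using mini-batch gradients:

$$\theta_{t+1} = \theta_t + \epsilon_t\Big(\nabla_\theta \log p(\theta_t) + \frac{N}{n}\sum_{i \in B_t} \nabla_\theta \log p(x_i \mid \theta_t)\Big) + \sqrt{2\epsilon_t}\,\mathbf{z}_t$$

where $B_t$ is a random mini-batch of $n$ out of $N$ data points, $\mathbf{z}_t \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, and the step sizes satisfy $\sum_t \epsilon_t = \infty$ and $\sum_t \epsilon_t^2 < \infty$.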
Key properties:
- Scalable to massive datasets
- No accept-reject step (unadjusted)
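- Decreasing step sizes ensure convergence: the discretization bias vanishes, and the mini-batch gradient noise (scaling like $\epsilon_t$) is eventually dominated by the injected noise (scaling like $\sqrt{\epsilon_t}$)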
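A minimal SGLD sketch on a conjugate toy problem where the exact posterior is known. It assumes data $x_i \sim \mathcal{N}(\theta, 1)$, a prior $\theta \sim \mathcal{N}(0, 10^2)$, mini-batches of $n = 100$ drawn with replacement, and a polynomially decaying step size $\epsilon_t = \epsilon_0 (1 + t/\tau)^{-0.55}$; the exponent lies in $(0.5, 1]$ as the step-size conditions require, and all constants are illustrative.

```python
import numpy as np

def sgld_gaussian_mean(data, n_iters=6000, batch_size=100, eps0=1e-5, tau=1000.0,
                       prior_var=100.0, seed=0):
    rng = np.random.default_rng(seed)
    N = data.shape[0]
    theta = 0.0
    trace = np.empty(n_iters)
    for t in range(n_iters):
        eps = eps0 * (1.0 + t / tau) ** -0.55             # decreasing step size
        batch = data[rng.integers(0, N, size=batch_size)]
        # Unbiased mini-batch estimate of grad log posterior (unit-variance likelihood)
        grad = -theta / prior_var + (N / batch_size) * np.sum(batch - theta)
        theta = theta + eps * grad + np.sqrt(2.0 * eps) * rng.standard_normal()
        trace[t] = theta
    return trace

rng = np.random.default_rng(1)
data = rng.normal(1.5, 1.0, size=10000)
# Conjugate posterior: precision = 1 / prior_var + N, mean = sum(x) / precision
post_prec = 1.0 / 100.0 + data.size
post_mean = data.sum() / post_prec
trace = sgld_gaussian_mean(data)
print(post_mean, trace[1000:].mean())   # SGLD average after burn-in vs exact posterior mean
```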
4.3 Underdamped Langevin Dynamics
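Reintroducing momentum (kinetic Langevin) for faster mixing, with velocity $\mathbf{v}$, unit mass and $k_B T = 1$:

$$d\mathbf{x}_t = \mathbf{v}_t\,dt, \qquad d\mathbf{v}_t = \big(\nabla_{\mathbf{x}} \log p(\mathbf{x}_t) - \gamma\,\mathbf{v}_t\big)\,dt + \sqrt{2\gamma}\,d\mathbf{W}_t$$

Its stationary distribution is $p(\mathbf{x})\,\mathcal{N}(\mathbf{v}; \mathbf{0}, \mathbf{I})$, so the $\mathbf{x}$-marginal is still $p$.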
Advantages over overdamped:
- Often faster convergence (momentum suppresses random-walk behavior)
- Better exploration of multi-modal distributions
- Used in advanced MCMC samplers (Hamiltonian Monte Carlo is a related approach)
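```python
import numpy as np

def underdamped_langevin_step(x, v, score_fn, gamma, step_size, rng):
    """One semi-implicit Euler step of kinetic Langevin dynamics (unit mass, k_B T = 1).

    dv = (score(x) - gamma * v) dt + sqrt(2 gamma) dW,   dx = v dt
    rng is a numpy Generator (an argument added here).
    """
    z = rng.standard_normal(np.shape(v))
    # Velocity update: force from the score, friction, and thermal noise
    v = v + step_size * (score_fn(x) - gamma * v) + np.sqrt(2.0 * gamma * step_size) * z
    # Position update uses the new velocity (semi-implicit Euler)
    x = x + step_size * v
    return x, v
```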
5. Connection to Key Concepts
5.1 Langevin Dynamics → [[Score Function]]
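Langevin dynamics is the primary consumer of the [[Score Function]] in generative modeling; the score is the only model output the update needs:

$$\mathbf{x}_{k+1} = \mathbf{x}_k + \epsilon\, s_\theta(\mathbf{x}_k) + \sqrt{2\epsilon}\,\mathbf{z}_k$$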
Without the [[Score Function]], Langevin dynamics cannot sample; conversely, Langevin dynamics is the simplest mechanism that turns a learned score into samples. This mutual dependency makes them the core pair of score-based generation.
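For a Gaussian mixture the score has a closed form, which makes it a convenient stand-in for a learned $s_\theta$ when testing a sampler. The sketch assumes a hypothetical isotropic 2-D mixture (weights, means and shared variance are not from the text). The score is a responsibility-weighted average of component scores, $\nabla \log p(\mathbf{x}) = \sum_k r_k(\mathbf{x})\,(\boldsymbol{\mu}_k - \mathbf{x})/\sigma^2$, and it is checked against a central finite-difference gradient of $\log p$.

```python
import numpy as np

WEIGHTS = np.array([0.5, 0.3, 0.2])                       # hypothetical mixture weights
MEANS = np.array([[-2.0, 0.0], [2.0, 1.0], [0.0, -2.5]])  # hypothetical component means
SIGMA2 = 0.5                                              # shared isotropic variance

def log_p(x):
    # x: (n, 2) -> (n,): log sum_k w_k N(x; mu_k, sigma^2 I), via log-sum-exp
    sq = np.sum((x[:, None, :] - MEANS[None]) ** 2, axis=-1)           # (n, K)
    log_comp = np.log(WEIGHTS) - sq / (2 * SIGMA2) - np.log(2 * np.pi * SIGMA2)
    top = log_comp.max(axis=1, keepdims=True)
    return (top + np.log(np.exp(log_comp - top).sum(axis=1, keepdims=True)))[:, 0]

def score(x):
    # Responsibilities r_k(x); the shared Gaussian normalizer cancels, so it is omitted
    sq = np.sum((x[:, None, :] - MEANS[None]) ** 2, axis=-1)
    log_comp = np.log(WEIGHTS) - sq / (2 * SIGMA2)
    r = np.exp(log_comp - log_comp.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)                                   # (n, K)
    return (r[:, :, None] * (MEANS[None] - x[:, None, :])).sum(axis=1) / SIGMA2

def finite_difference_score(x, h=1e-5):
    grad = np.zeros_like(x)
    for j in range(x.shape[1]):
        e = np.zeros(x.shape[1])
        e[j] = h
        grad[:, j] = (log_p(x + e) - log_p(x - e)) / (2 * h)
    return grad

pts = np.random.default_rng(0).normal(scale=2.0, size=(5, 2))
print(np.max(np.abs(score(pts) - finite_difference_score(pts))))
```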
5.2 Langevin Dynamics → [[Diffusion Model]]
In diffusion models, Langevin dynamics appears as:
- Corrector step: Refines samples after predictor ODE/[[Stochastic Differential Equation (SDE)|SDE]] steps
- Ancestral sampling connection: the DDPM reverse process can be viewed as Langevin-like dynamics driven by a learned score
- Quality boost: even 1-2 corrector Langevin steps per noise level can noticeably improve FID
5.3 Langevin Dynamics → [[Stochastic Differential Equation (SDE)]]
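The overdamped Langevin equation is an Itô SDE of the general form $d\mathbf{x}_t = \mathbf{f}(\mathbf{x}_t, t)\,dt + g(t)\,d\mathbf{W}_t$ with

$$\mathbf{f}(\mathbf{x}, t) = \nabla_{\mathbf{x}} \log p(\mathbf{x}), \qquad g(t) = \sqrt{2}$$

This connects to the general [[Stochastic Differential Equation (SDE)|SDE]] framework used in score-based generative models (Song et al., 2021), where different choices of $\mathbf{f}$ and $g$ yield different forward noising processes, such as the variance-exploding (VE) and variance-preserving (VP) SDEs.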
5.4 Langevin Dynamics → [[Fokker-Planck Equation]]
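The [[Fokker-Planck Equation]] provides the density-level description of Langevin dynamics:

$$\frac{\partial p_t}{\partial t} = -\nabla \cdot \big(p_t\, \nabla_{\mathbf{x}} \log p\big) + \Delta p_t$$

This PDE describes how the ensemble distribution evolves, and shows that $p$ is its stationary solution.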
5.5 Langevin Dynamics → [[Wiener Process|Wiener Process]]
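The noise term $\sqrt{2}\,d\mathbf{W}_t$ is driven by a [[Wiener Process|Wiener process]], whose increments are Gaussian with variance equal to the elapsed time, $\mathbf{W}_{t+\Delta t} - \mathbf{W}_t \sim \mathcal{N}(\mathbf{0}, \Delta t\,\mathbf{I})$. This is why the discretized update adds $\sqrt{2\epsilon}\,\mathbf{z}_k$ rather than $\sqrt{2}\,\epsilon\,\mathbf{z}_k$.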
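The $\sqrt{\Delta t}$ scaling can be checked directly: summing $n$ Wiener increments of size $\Delta t = T/n$ gives $\mathbf{W}_T \sim \mathcal{N}(0, T)$ regardless of $n$. A short sketch with arbitrary sample sizes:

```python
import numpy as np

def wiener_endpoint_samples(T, n_steps, n_paths, rng):
    dt = T / n_steps
    # Each increment W_{t+dt} - W_t ~ N(0, dt): a standard normal scaled by sqrt(dt)
    increments = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
    return increments.sum(axis=1)        # W_T for each path, with W_0 = 0

rng = np.random.default_rng(0)
for n_steps in (10, 100, 1000):
    w_T = wiener_endpoint_samples(T=2.0, n_steps=n_steps, n_paths=10000, rng=rng)
    print(n_steps, w_T.var())            # about T = 2.0 for every discretization
```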
6. Practical Implementation
6.1 Complete Langevin Sampler
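```python
import numpy as np

class LangevinSampler:
    """Unadjusted Langevin sampler for batches x of shape (n_samples, d)."""

    def __init__(self, score_fn, step_size=1e-2, n_steps=1000, clip_norm=None, seed=None):
        self.score_fn = score_fn      # callable: x -> grad_x log p(x), same shape as x
        self.step_size = step_size    # epsilon in the Euler-Maruyama update
        self.n_steps = n_steps
        self.clip_norm = clip_norm    # optional cap on each sample's score norm (see 6.2)
        self.rng = np.random.default_rng(seed)

    def _score(self, x):
        s = self.score_fn(x)
        if self.clip_norm is not None:
            # Rescale each row so that its norm does not exceed clip_norm
            norms = np.linalg.norm(s, axis=-1, keepdims=True)
            s = s * np.minimum(1.0, self.clip_norm / np.maximum(norms, 1e-12))
        return s

    def step(self, x):
        z = self.rng.standard_normal(x.shape)
        return x + self.step_size * self._score(x) + np.sqrt(2.0 * self.step_size) * z

    def sample(self, x_init):
        x = np.array(x_init, dtype=float)
        for _ in range(self.n_steps):
            x = self.step(x)
        return x
```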
6.2 Step Size Tuning
| Symptom | Likely Cause | Fix |
|---|---|---|
| Diverging samples | Step size too large | Reduce $\epsilon$ |
| No mixing (stuck) | Step size too small | Increase $\epsilon$ |
| Mode collapse | Insufficient noise | Use annealing schedule |
| High autocorrelation | Random-walk behavior of overdamped dynamics | Add momentum (kinetic Langevin) |
| Numerical instability | Poor score estimate | Gradient clipping, check score model |
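The "diverging samples" row can be made concrete on a Gaussian target $\mathcal{N}(0, v)$: with score $-x/v$ the ULA update is $x_{k+1} = (1 - \epsilon/v)\,x_k + \sqrt{2\epsilon}\,z_k$, which is stable only when $|1 - \epsilon/v| < 1$, i.e. $0 < \epsilon < 2v$. For general targets, analyses of ULA typically require $\epsilon$ of order $1/L$, where $L$ is the Lipschitz constant of the score, so the narrowest direction of $p$ limits the step. A sketch with illustrative values and an arbitrary divergence threshold:

```python
import numpy as np

def run_ula_gaussian(step_size, var=1.0, n_steps=500, n_chains=1000, seed=0):
    """Run ULA on N(0, var); return final samples, or None if the chains blow up."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_chains)
    for _ in range(n_steps):
        x = x - step_size * x / var + np.sqrt(2.0 * step_size) * rng.standard_normal(n_chains)
        if not np.all(np.isfinite(x)) or np.max(np.abs(x)) > 1e6:
            return None                    # treated as diverged
    return x

for eps in (0.1, 1.0, 1.9, 2.5):
    result = run_ula_gaussian(eps)
    print(eps, "diverged" if result is None else f"var={result.var():.3f}")
```

Note that $\epsilon = 1.9$ is stable but badly biased (stationary variance $1/(1 - \epsilon/2) = 20$ instead of 1), so staying below the stability limit is necessary but not sufficient.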
6.3 Computational Complexity
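For a single $d$-dimensional sample run for $T$ steps, with $C_s$ the cost of one score evaluation:

| Component | Cost per step | Total cost |
|---|---|---|
| Score evaluation | $O(C_s)$ | $O(T\,C_s)$ |
| Noise generation | $O(d)$ | $O(T\,d)$ |
| State update | $O(d)$ | $O(T\,d)$ |
| Total | n/a | $O(T\,(C_s + d))$ |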
The dominant cost is score model evaluation — in diffusion models, the [[U-Net]]/[[DiT]] forward pass for each Langevin corrector step.
7. Theoretical Properties
7.1 Reversibility and Detailed Balance
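The overdamped Langevin [[Stochastic Differential Equation (SDE)|SDE]] is reversible with respect to $p$, i.e. it satisfies detailed balance:

$$p(\mathbf{x})\,P_t(\mathbf{x} \to \mathbf{y}) = p(\mathbf{y})\,P_t(\mathbf{y} \to \mathbf{x}) \quad \text{for all } \mathbf{x}, \mathbf{y}, t$$

where $P_t(\mathbf{x} \to \mathbf{y})$ is the transition density of the process over time $t$. In equilibrium, the probability flux from $\mathbf{x}$ to $\mathbf{y}$ equals the flux back.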
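For the Gaussian target $p = \mathcal{N}(0, 1)$ the transition density is known in closed form (an Ornstein-Uhlenbeck process), $P_t(x \to y) = \mathcal{N}\big(y;\, x e^{-t},\, 1 - e^{-2t}\big)$, so detailed balance can be checked pointwise. A sketch with an arbitrary time $t$ and random test points:

```python
import numpy as np

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def ou_transition_density(x, y, t):
    # Transition density of dx = -x dt + sqrt(2) dW over time t
    a = np.exp(-t)
    return normal_pdf(y, a * x, 1.0 - a ** 2)

rng = np.random.default_rng(0)
x, y = rng.normal(size=1000), rng.normal(size=1000)
t = 0.7
forward = normal_pdf(x, 0.0, 1.0) * ou_transition_density(x, y, t)    # p(x) P_t(x -> y)
backward = normal_pdf(y, 0.0, 1.0) * ou_transition_density(y, x, t)   # p(y) P_t(y -> x)
print(np.max(np.abs(forward - backward)))   # zero up to floating-point rounding
```

The transition density alone is not symmetric; only the product with $p$ is, which is exactly the content of detailed balance.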
7.2 Ergodicity
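Under mild conditions (positive density, smooth score, proper tails), Langevin dynamics is ergodic: for integrable $f$,

$$\lim_{T \to \infty} \frac{1}{T}\int_0^T f(\mathbf{x}_t)\,dt = \mathbb{E}_{\mathbf{x} \sim p}[f(\mathbf{x})] \quad \text{almost surely}$$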
This guarantees that time averages converge to ensemble averages — a crucial property for MCMC applications.
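A sketch of estimating $\mathbb{E}_p[f]$ from a single long trajectory, assuming the target $\mathcal{N}(0, 1)$ (score $-x$), $f(x) = x^2$, and ULA with a small step. The time average converges to the stationary value of the discretized chain, $1/(1 - \epsilon/2)$, rather than exactly 1; the gap is the $O(\epsilon)$ discretization bias.

```python
import numpy as np

def time_average(f, n_steps=400000, step_size=0.01, x0=3.0, burn_in=1000, seed=0):
    """Average f along ONE ULA trajectory targeting N(0, 1) (score(x) = -x)."""
    rng = np.random.default_rng(seed)
    # Pre-draw the scaled noise and convert to Python floats for a fast scalar loop
    noise = (np.sqrt(2.0 * step_size) * rng.standard_normal(n_steps)).tolist()
    x, total = x0, 0.0
    for k in range(n_steps):
        x = x - step_size * x + noise[k]
        if k >= burn_in:
            total += f(x)
    return total / (n_steps - burn_in)

estimate = time_average(lambda x: x * x)
print(estimate)   # close to the discretized chain's value 1 / (1 - 0.01 / 2), about 1.005
```

Successive states are strongly correlated (roughly $1/\epsilon$ steps per independent sample here), so a single chain needs many more steps than independent draws would for the same accuracy.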
7.3 Mixing Time
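The mixing time (time to reach accuracy $\delta$ in KL divergence) of the continuous-time dynamics follows from the exponential rate in 2.4:

$$t_{\text{mix}}(\delta) \le \frac{1}{2m}\log\frac{\mathrm{KL}(p_0 \,\Vert\, p)}{\delta}$$

where $m$ is the strong log-concavity constant. For the discretized chain, the number of steps additionally grows with the dimension $d$ and with the condition number $\kappa = L/m$, where $L$ is the Lipschitz constant of the score.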
8. Comparison with Other Sampling Methods
| Method | Gradient | Stochastic | Acceptance | Scaling (steps vs. dimension $d$) | Best For |
|---|---|---|---|---|---|
| Langevin (ULA) | Score only | Yes | No | $O(d)$ | Continuous, differentiable |
| MALA | Score + log-p | Yes | Yes | $O(d^{1/3})$ | Exact sampling, high-dim |
| HMC | Score + log-p | Yes (momentum resampling) | Yes | $O(d^{1/4})$ | Multi-modal, correlated |
| Gibbs | None | Conditional | Yes | Problem-dependent | Factorized conditionals |
| RW Metropolis | None | Yes | Yes | $O(d)$ | Low-dim, non-diff. |
| Rejection | None | Yes | Yes | Exponential in $d$ | Low-dim only |
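Langevin advantage: Only needs $\nabla_{\mathbf{x}} \log p(\mathbf{x})$, never $p(\mathbf{x})$ itself or its normalization constant, which is exactly what a learned score model provides.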
9. Core Formula Cards
| # | Formula | Meaning |
|---|---|---|
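| 1 | $d\mathbf{x}_t = \nabla_{\mathbf{x}} \log p(\mathbf{x}_t)\,dt + \sqrt{2}\,d\mathbf{W}_t$ | Overdamped Langevin [[Stochastic Differential Equation (SDE)|SDE]] (continuous) |
| 2 | $\mathbf{x}_{k+1} = \mathbf{x}_k + \epsilon\,\nabla_{\mathbf{x}} \log p(\mathbf{x}_k) + \sqrt{2\epsilon}\,\mathbf{z}_k$ | Euler-Maruyama discretization (ULA) |
| 3 | $\partial_t p_t = -\nabla \cdot (p_t\, \nabla_{\mathbf{x}} \log p) + \Delta p_t$ | [[Fokker-Planck Equation]] for Langevin |
| 4 | $\mathrm{KL}(p_t \,\Vert\, p) \le e^{-2mt}\,\mathrm{KL}(p_0 \,\Vert\, p)$ | Convergence rate (log-concave) |
| 5 | $\mathbb{E}[\boldsymbol{\xi}(t)\,\boldsymbol{\xi}(t')^\top] = \delta(t - t')\,\mathbf{I}$ | White noise correlation (formal derivative of the [[Wiener Process]]) |
| 6 | $p(\mathbf{x}) \propto e^{-U(\mathbf{x})/k_B T}$, i.e. $U(\mathbf{x}) = -\log p(\mathbf{x})$ at $k_B T = 1$ | Physical potential ↔ probability connection |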
10. Summary
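Langevin dynamics bridges statistical physics and deep generative modeling through a simple yet profound connection: the physical force $-\nabla U(\mathbf{x})$ becomes the score $\nabla_{\mathbf{x}} \log p(\mathbf{x})$ once the potential is identified with the negative log-density, $U(\mathbf{x}) = -\log p(\mathbf{x})$.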
Its three key roles in modern ML:
| Role | Context | Significance |
|---|---|---|
| Standalone sampler | Score-based models (NCSN) | Generates samples from learned score without normalization |
| Corrector | Predictor-corrector diffusion | Refines samples, improves quality with 1-2 steps |
| Theoretical bridge | [[Stochastic Differential Equation (SDE)|SDE]] ↔ Density evolution | Links particle trajectories ([[Stochastic Differential Equation (SDE)|SDE]]) to distribution evolution (Fokker-Planck) |
The equation itself is deceptively simple: $d\mathbf{x}_t = \nabla_{\mathbf{x}} \log p(\mathbf{x}_t)\,dt + \sqrt{2}\,d\mathbf{W}_t$.
Related Concepts
- [[Score Function]]
- [[Diffusion Model]]
- [[Stochastic Differential Equation (SDE)]]
- [[Fokker-Planck Equation]]
- [[Wiener Process|Wiener Process]]
- [[Probability Flow ODE]]
- [[DDIM]]
- [[DPM-Solver]]
- [[Markov Process]]
- [[Martingale]]
- [[Metropolis-Hastings]]
- [[Hamiltonian Monte Carlo]]