Probability Flow ODE
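The Probability Flow ODE is a deterministic ordinary differential equation whose solutions share the same marginal probability densities $p_t(x)$ as a given diffusion [[Stochastic Differential Equation (SDE)|SDE]] at every time $t$.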
1. Core Concept
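In diffusion models, the forward process is typically described by a [[Stochastic Differential Equation (SDE)|Stochastic Differential Equation]]:
$$dx = f(x, t)\,dt + g(t)\,dw$$
where $f(x, t)$ is the drift, $g(t)$ is the diffusion coefficient, and $w$ is a standard Wiener process.
The Probability Flow ODE is a deterministic counterpart that has the same marginal distribution $p_t(x)$ at every time $t$:
$$\frac{dx}{dt} = f(x, t) - \frac{1}{2} g(t)^2 \nabla_x \log p_t(x)$$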
[!NOTE] Key Insight
While the [[Stochastic Differential Equation (SDE)|SDE]] introduces randomness through the Wiener increment $dw$, the ODE achieves the same evolution of the probability distribution through a deterministic trajectory guided by the [[Score Function]] $\nabla_x \log p_t(x)$.
2. Derivation from Fokker-Planck Equation
2.1 Fokker-Planck Equation (Forward [[Stochastic Differential Equation (SDE)|SDE]])
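The evolution of the probability density $p_t(x)$ under the forward SDE $dx = f(x, t)\,dt + g(t)\,dw$ is governed by the Fokker-Planck equation:
$$\frac{\partial p_t(x)}{\partial t} = -\nabla \cdot \big( f(x, t)\, p_t(x) \big) + \frac{1}{2} g(t)^2 \Delta p_t(x)$$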
2.2 Probability Flow Velocity Field
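We can rewrite the Fokker-Planck equation as a continuity equation, using $\Delta p_t = \nabla \cdot (p_t \nabla_x \log p_t)$:
$$\frac{\partial p_t(x)}{\partial t} + \nabla \cdot \big( p_t(x)\, v(x, t) \big) = 0$$
where the velocity field is
$$v(x, t) = f(x, t) - \frac{1}{2} g(t)^2 \nabla_x \log p_t(x)$$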
2.3 ODE Formulation
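The deterministic ODE that follows this velocity field is:
$$\frac{dx}{dt} = f(x, t) - \frac{1}{2} g(t)^2 \nabla_x \log p_t(x)$$
This ODE preserves the same marginal distributions $p_t(x)$ as the forward SDE, because both densities obey the same continuity equation.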
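The marginal-preservation claim can be checked numerically on a toy problem. The sketch below assumes a variance-preserving SDE with constant rate `BETA` and one-dimensional Gaussian data, for which the marginals and the score are available in closed form; the constants and helper names are illustrative choices, not part of the original note.

```python
import math

BETA = 2.0          # constant noise rate (assumed for this toy example)
M0, S0 = 1.5, 0.5   # toy data distribution N(M0, S0^2) (assumed)

def mean_std(t):
    """Mean and standard deviation of the SDE marginal p_t for Gaussian data."""
    decay = math.exp(-BETA * t)
    return M0 * math.sqrt(decay), math.sqrt(S0 ** 2 * decay + 1.0 - decay)

def velocity(x, t):
    """v(x, t) = f(x, t) - 0.5 * g(t)^2 * score, with f = -0.5*BETA*x, g^2 = BETA."""
    m, s = mean_std(t)
    score = -(x - m) / s ** 2
    return -0.5 * BETA * x - 0.5 * BETA * score

def integrate(x, t0, t1, n_steps=2000):
    """Explicit Euler integration of dx/dt = velocity(x, t) from t0 to t1."""
    dt = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        x += dt * velocity(x, t)
        t += dt
    return x

m1, s1 = mean_std(1.0)
for z in (-1.0, 0.0, 2.0):
    x1 = integrate(M0 + S0 * z, 0.0, 1.0)
    # A particle that starts z standard deviations from the mean of p_0
    # should end z standard deviations from the mean of p_1.
    print(f"z={z:+.1f}  ODE: {x1:.4f}  expected: {m1 + s1 * z:.4f}")
```

Each particle keeps its standardized position while the marginal changes from $p_0$ to $p_1$, which is what it means for the deterministic flow to transport $p_0$ onto $p_1$ in this Gaussian case.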
3. Comparison: [[Stochastic Differential Equation (SDE)|SDE]] vs ODE
| Aspect | Forward [[Stochastic Differential Equation (SDE)|SDE]] | Probability Flow ODE |
|---|---|---|
| Form | Stochastic | Deterministic |
| Noise term | $g(t)\,dw$ | None |
| Trajectory | Random paths | Deterministic paths |
| Marginal distribution | $p_t(x)$ | Same $p_t(x)$ |
| Sampling | Requires random noise | Deterministic integration |
| Reversibility | Reverse [[Stochastic Differential Equation (SDE)|SDE]] needed | Simply integrate backward |
[!TIP] Practical Advantage
The ODE formulation allows using advanced ODE solvers (like Runge-Kutta methods) for more efficient and accurate sampling compared to Euler-Maruyama discretization of SDEs.
4. Reverse-Time Probability Flow ODE
4.1 Reverse ODE Formulation
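To generate samples, we integrate the ODE backward from $t = T$ (noise) to $t = 0$ (data):
$$\frac{dx}{dt} = f(x, t) - \frac{1}{2} g(t)^2 s_\theta(x, t)$$
where the [[Score Function]] $\nabla_x \log p_t(x)$ is approximated by a trained network $s_\theta(x, t)$.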
4.2 Variance-Preserving [[Stochastic Differential Equation (SDE)|SDE]] Case
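For the variance-preserving [[Stochastic Differential Equation (SDE)|SDE]] where $f(x, t) = -\frac{1}{2}\beta(t)\,x$ and $g(t) = \sqrt{\beta(t)}$:
$$\frac{dx}{dt} = -\frac{1}{2}\beta(t)\,\big[ x + s_\theta(x, t) \big]$$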
4.3 Variance-Exploding [[Stochastic Differential Equation (SDE)|SDE]] Case
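For the variance-exploding [[Stochastic Differential Equation (SDE)|SDE]] where $f(x, t) = 0$ and $g(t) = \sqrt{\frac{d\sigma^2(t)}{dt}}$:
$$\frac{dx}{dt} = -\frac{1}{2}\frac{d\sigma^2(t)}{dt}\, s_\theta(x, t)$$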
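As an illustration of the variance-exploding case, the following sketch integrates the reverse-time ODE with classical RK4. It assumes a geometric noise schedule and one-dimensional Gaussian data so that the exact score can stand in for a trained network; `SIGMA_MIN`, `SIGMA_MAX`, `S0`, and the function names are hypothetical.

```python
import math

SIGMA_MIN, SIGMA_MAX = 0.01, 10.0   # assumed noise schedule endpoints
S0 = 0.5                            # toy data distribution N(0, S0^2) (assumed)

def sigma(t):
    """Geometric VE schedule sigma(t) = sigma_min * (sigma_max / sigma_min)^t."""
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t

def dsigma2_dt(t):
    """Time derivative of sigma(t)^2 for the geometric schedule."""
    return 2.0 * sigma(t) ** 2 * math.log(SIGMA_MAX / SIGMA_MIN)

def score(x, t):
    """Exact score of p_t = N(0, S0^2 + sigma(t)^2); stands in for s_theta."""
    return -x / (S0 ** 2 + sigma(t) ** 2)

def ve_velocity(x, t):
    """VE probability flow ODE: dx/dt = -0.5 * d[sigma^2]/dt * score(x, t)."""
    return -0.5 * dsigma2_dt(t) * score(x, t)

def rk4_reverse(x, t_start=1.0, t_end=0.0, n_steps=200):
    """Integrate from t_start down to t_end with classical RK4 (negative step)."""
    h = (t_end - t_start) / n_steps
    t = t_start
    for _ in range(n_steps):
        k1 = ve_velocity(x, t)
        k2 = ve_velocity(x + 0.5 * h * k1, t + 0.5 * h)
        k3 = ve_velocity(x + 0.5 * h * k2, t + 0.5 * h)
        k4 = ve_velocity(x + h * k3, t + h)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += h
    return x

x_T = 7.0                       # a fixed "noise" value at t = 1
x_0 = rk4_reverse(x_T)
# Closed form for this Gaussian toy problem: x scales with the marginal std.
expected = x_T * math.sqrt((S0 ** 2 + sigma(0.0) ** 2) / (S0 ** 2 + sigma(1.0) ** 2))
print(x_0, expected)
```

With a learned model, `score` would be replaced by the network; the closed-form comparison is only available because the data distribution here is Gaussian.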
5. Relationship to Score-Based Models
5.1 [[Score Function]] Connection
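The Probability Flow ODE makes the role of the [[Score Function|score function]] $\nabla_x \log p_t(x)$ explicit:
- The [[Score Function]] points in the direction of steepest increase of the log-density $\log p_t(x)$
- The ODE velocity combines the drift $f(x, t)$ with the score scaled by $-\frac{1}{2} g(t)^2$, so integrating backward in time moves samples toward higher-density regions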
5.2 Connection to Langevin Dynamics
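Discretizing the reverse ODE is closely related to [[Langevin Dynamics|Langevin Dynamics]], whose update adds Gaussian noise to a step along the score (with step size $\eta$):
$$x_{k+1} = x_k + \eta\, \nabla_x \log p(x_k) + \sqrt{2\eta}\, z_k, \quad z_k \sim \mathcal{N}(0, I)$$
The ODE has no noise term, making the process deterministic.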
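To make the role of the noise term concrete, here is a small sketch that runs unadjusted Langevin dynamics on a fixed standard normal target and compares it with the same update with the noise removed. The target, step size `eta`, and chain counts are assumptions chosen for illustration.

```python
import math
import random

def score(x):
    """Score of the standard normal target N(0, 1): d/dx log p(x) = -x."""
    return -x

def langevin_step(x, eta, rng):
    """One unadjusted Langevin step: drift along the score plus Gaussian noise."""
    return x + eta * score(x) + math.sqrt(2.0 * eta) * rng.gauss(0.0, 1.0)

def deterministic_step(x, eta):
    """The same update with the noise term removed (pure score ascent)."""
    return x + eta * score(x)

rng = random.Random(0)
eta, n_steps, n_chains = 0.05, 300, 2000

noisy = [3.0] * n_chains      # all chains start far from the mode
quiet = 3.0
for _ in range(n_steps):
    noisy = [langevin_step(x, eta, rng) for x in noisy]
    quiet = deterministic_step(quiet, eta)

mean = sum(noisy) / n_chains
var = sum((x - mean) ** 2 for x in noisy) / n_chains
print(mean, var, quiet)
```

On a fixed target, the noise-free update simply collapses to the mode, while the Langevin chains spread out to approximately the target variance. The Probability Flow ODE avoids this collapse because it follows the time-dependent score of $p_t$ with the specific $\frac{1}{2} g(t)^2$ scaling, rather than ascending a single fixed density.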
6. Sampling Algorithms
6.1 Basic ODE Solver
Pseudocode: reverse-time Probability Flow ODE sampling
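A minimal Python sketch of a basic Euler sampler for the reverse-time Probability Flow ODE is given below. It is not the listing from the original note: the function `pf_ode_sample`, its parameters, and the toy VP setup with an exact Gaussian score are all assumptions made so the example runs without a trained model.

```python
import math
import random

def pf_ode_sample(score_fn, drift, diffusion, x_T, T=1.0, t_min=1e-3, n_steps=1000):
    """Euler integration of dx/dt = f(x,t) - 0.5*g(t)^2*score(x,t) from T down to t_min."""
    x, t = x_T, T
    dt = (t_min - T) / n_steps          # negative: we step backward in time
    for _ in range(n_steps):
        v = drift(x, t) - 0.5 * diffusion(t) ** 2 * score_fn(x, t)
        x = x + dt * v
        t = t + dt
    return x

# Toy VP setup (assumed): f = -0.5*BETA*x, g = sqrt(BETA), data ~ N(MU, S0^2).
BETA, MU, S0 = 8.0, 2.0, 0.3

def marginal(t):
    """Mean and variance of p_t for the toy Gaussian data."""
    a = math.exp(-0.5 * BETA * t)
    return MU * a, a * a * S0 ** 2 + 1.0 - a * a

def exact_score(x, t):
    """Closed-form score of p_t; a trained s_theta(x, t) would replace this."""
    m, var = marginal(t)
    return -(x - m) / var

rng = random.Random(0)
samples = [
    pf_ode_sample(exact_score,
                  lambda x, t: -0.5 * BETA * x,
                  lambda t: math.sqrt(BETA),
                  rng.gauss(0.0, 1.0))          # x_T drawn from the N(0, 1) prior
    for _ in range(500)
]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(mean, std)   # should be close to MU and S0
```

Stopping at a small `t_min` instead of exactly zero is a common precaution, since the score can become very large as the noise level vanishes.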
6.2 Advanced ODE Solvers
| Solver | Order | Characteristics |
|---|---|---|
| Euler | 1st | Simple, but requires small steps |
| RK2 (Midpoint) | 2nd | Better accuracy, moderate cost |
| RK4 | 4th | High accuracy, commonly used |
| DOPRI5 | 5th (adaptive) | Automatic step size control |
| [[DPM-Solver]] | Specialized | Designed for diffusion models |
[!NOTE] [[DPM-Solver]]
[[DPM-Solver]] exploits the semi-linear structure of the Probability Flow ODE to achieve high-order convergence with fewer function evaluations, significantly accelerating sampling.
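The effect of solver order can be seen on a toy problem where the exact answer is known. The sketch below compares Euler and classical RK4 at the same budget of 40 velocity evaluations, assuming a constant-rate VP schedule and zero-mean Gaussian data; all constants and names are illustrative.

```python
import math

BETA, S0 = 4.0, 0.5        # assumed VP rate and data standard deviation
T, T_END = 1.0, 0.05       # integrate backward from T to T_END

def var_t(t):
    """Variance of the marginal p_t for data N(0, S0^2)."""
    a2 = math.exp(-BETA * t)
    return a2 * S0 ** 2 + 1.0 - a2

def velocity(x, t):
    """Probability flow velocity with the exact score -x / var_t(t)."""
    return 0.5 * BETA * x * (1.0 / var_t(t) - 1.0)

def euler_step(x, t, h):
    return x + h * velocity(x, t)

def rk4_step(x, t, h):
    k1 = velocity(x, t)
    k2 = velocity(x + 0.5 * h * k1, t + 0.5 * h)
    k3 = velocity(x + 0.5 * h * k2, t + 0.5 * h)
    k4 = velocity(x + h * k3, t + h)
    return x + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def solve(step, x, n_steps):
    """Apply a one-step method n_steps times from T down to T_END."""
    h = (T_END - T) / n_steps
    t = T
    for _ in range(n_steps):
        x = step(x, t, h)
        t += h
    return x

x_T = 1.0
exact = x_T * math.sqrt(var_t(T_END) / var_t(T))   # closed form for Gaussian data
err_euler = abs(solve(euler_step, x_T, 40) - exact)  # 40 velocity evaluations
err_rk4 = abs(solve(rk4_step, x_T, 10) - exact)      # 10 steps * 4 evaluations
print(err_euler, err_rk4)
```

For this smooth toy problem the higher-order method is far more accurate at equal cost; with a neural score model the gap depends on how smooth the learned velocity field is.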
6.3 [[DPM-Solver]] Details
Semi-linear Structure:
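The Probability Flow ODE can be rewritten as:
$$\frac{dx}{dt} = f(t)\,x + \frac{g(t)^2}{2\sigma_t}\,\epsilon_\theta(x, t)$$
where $f(t) = \frac{d \log \alpha_t}{dt}$ for a perturbation kernel $x_t = \alpha_t x_0 + \sigma_t \epsilon$, and $\epsilon_\theta(x, t) = -\sigma_t\, s_\theta(x, t)$ is the noise-prediction network.
Key Insight: The linear part $f(t)\,x$ can be integrated exactly, so only the nonlinear network term needs numerical approximation.
[[DPM-Solver]]-2: with half log-SNR $\lambda_t = \log(\alpha_t / \sigma_t)$, step $h_i = \lambda_{t_i} - \lambda_{t_{i-1}}$, and midpoint time $s_i$ such that $\lambda_{s_i} = (\lambda_{t_{i-1}} + \lambda_{t_i})/2$:
$$u_i = \frac{\alpha_{s_i}}{\alpha_{t_{i-1}}}\, x_{i-1} - \sigma_{s_i}\left(e^{h_i/2} - 1\right)\epsilon_\theta(x_{i-1}, t_{i-1})$$
$$x_i = \frac{\alpha_{t_i}}{\alpha_{t_{i-1}}}\, x_{i-1} - \sigma_{t_i}\left(e^{h_i} - 1\right)\epsilon_\theta(u_i, s_i)$$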
Advantages:
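- Roughly 10-20 steps for high-quality samples (vs. about 1000 for [[Diffusion Model|DDPM]] ancestral sampling)
- Derived from the exact solution of the semi-linear ODE rather than from a generic discretization
- Training-free: applies to pretrained [[Stochastic Differential Equation (SDE)|SDE]]/ODE-based diffusion models that expose a noise or score prediction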
[!TIP] Practical Usage
For most applications, [[DPM-Solver]]-2 or [[DPM-Solver]]-3 with 10-20 steps provides excellent quality-speed tradeoff.
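The second-order DPM-Solver update can be sketched in a few lines for a schedule where the inverse map from log-SNR to time is available in closed form. The example assumes a VP schedule with constant rate `BETA`, steps that are uniform in $\lambda = \log(\alpha_t/\sigma_t)$, and an exact noise predictor for Gaussian data in place of $\epsilon_\theta$; it is an illustrative sketch, not a reference implementation.

```python
import math

BETA, S0 = 8.0, 0.5   # assumed constant-rate VP schedule and data std

def alpha(t):
    return math.exp(-0.5 * BETA * t)

def sigma(t):
    return math.sqrt(-math.expm1(-BETA * t))        # sqrt(1 - alpha(t)^2)

def lam(t):
    return math.log(alpha(t) / sigma(t))            # half log-SNR

def t_of_lam(l):
    """Inverse of lam(t) for this schedule, using alpha^2 = sigmoid(2 * lambda)."""
    return math.log1p(math.exp(-2.0 * l)) / BETA

def eps_model(x, t):
    """Exact noise prediction for data ~ N(0, S0^2); stands in for eps_theta."""
    return sigma(t) * x / (alpha(t) ** 2 * S0 ** 2 + sigma(t) ** 2)

def dpm_solver_2(x, t_start, t_end, n_steps):
    """DPM-Solver-2 with steps uniform in lambda (two model calls per step)."""
    l0, l1 = lam(t_start), lam(t_end)
    lams = [l0 + (l1 - l0) * i / n_steps for i in range(n_steps + 1)]
    for l_prev, l_next in zip(lams[:-1], lams[1:]):
        h = l_next - l_prev
        t_prev, t_next = t_of_lam(l_prev), t_of_lam(l_next)
        s = t_of_lam(0.5 * (l_prev + l_next))        # midpoint in lambda
        u = (alpha(s) / alpha(t_prev)) * x \
            - sigma(s) * math.expm1(0.5 * h) * eps_model(x, t_prev)
        x = (alpha(t_next) / alpha(t_prev)) * x \
            - sigma(t_next) * math.expm1(h) * eps_model(u, s)
    return x

def var_t(t):
    return alpha(t) ** 2 * S0 ** 2 + sigma(t) ** 2

x_T = 1.3
approx = dpm_solver_2(x_T, 1.0, 1e-3, 20)
exact = x_T * math.sqrt(var_t(1e-3) / var_t(1.0))    # closed form for Gaussian data
print(approx, exact)
```

The linear coefficient ratios `alpha(t_next) / alpha(t_prev)` are applied exactly, and only the noise-prediction term is approximated, which is the semi-linear structure the solver relies on.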
7. Theoretical Properties
7.1 Marginal Distribution Preservation
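Theorem: The Probability Flow ODE and the forward [[Stochastic Differential Equation (SDE)|SDE]] have identical marginal distributions $p_t(x)$ for all $t \in [0, T]$, provided both start from the same initial distribution $p_0$.
Proof Sketch: The densities of both processes satisfy the same Fokker-Planck equation (written as a continuity equation for the ODE), so with the same initial condition they must have the same solution.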
7.2 Deterministic Mapping
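The ODE defines a deterministic bijection between data points $x_0$ and latent codes $x_T$, assuming the velocity field is regular enough for solutions to exist and be unique.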
This property enables:
- Exact likelihood computation via change of variables
- Latent space interpolation with meaningful trajectories
- Inversion of real data to latent space
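The inversion and interpolation properties can be illustrated with a round trip on a toy problem. The sketch assumes a constant-rate VP schedule with one-dimensional Gaussian data and an exact score, and uses RK4 in both directions; `encode` and `decode` are hypothetical helper names.

```python
import math

BETA, MU, S0 = 4.0, 1.0, 0.5   # assumed VP rate and toy data N(MU, S0^2)

def velocity(x, t):
    """Probability flow velocity with the exact Gaussian score."""
    a = math.exp(-0.5 * BETA * t)
    var = a * a * S0 ** 2 + 1.0 - a * a
    score = -(x - MU * a) / var
    return -0.5 * BETA * x - 0.5 * BETA * score

def rk4(x, t0, t1, n_steps=200):
    """Classical RK4 from t0 to t1 (works in either time direction)."""
    h = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        k1 = velocity(x, t)
        k2 = velocity(x + 0.5 * h * k1, t + 0.5 * h)
        k3 = velocity(x + 0.5 * h * k2, t + 0.5 * h)
        k4 = velocity(x + h * k3, t + h)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += h
    return x

def encode(x0):
    """Data -> latent: integrate forward from t = 0 to t = 1."""
    return rk4(x0, 0.0, 1.0)

def decode(x_T):
    """Latent -> data: integrate backward from t = 1 to t = 0."""
    return rk4(x_T, 1.0, 0.0)

x_a, x_b = 0.3, 1.4
z_a, z_b = encode(x_a), encode(x_b)
print(decode(z_a), decode(z_b))           # round trip recovers the inputs
print(decode(0.5 * (z_a + z_b)))          # decoded latent midpoint
```

Because the flow is affine for Gaussian data, the decoded latent midpoint coincides with the data midpoint here; for real data the decoded interpolation follows a nonlinear path.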
7.3 Trajectory Regularity
ODE trajectories are smoother than [[Stochastic Differential Equation (SDE)|SDE]] paths:
- [[Stochastic Differential Equation (SDE)|SDE]]: Rough, non-differentiable paths (due to the Wiener increment $dw$)
- ODE: Smooth, differentiable paths (enabling gradient-based optimization)
8. Applications
8.1 Likelihood Computation
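Using the instantaneous change of variables formula along an ODE trajectory $x_t$ with velocity $v(x, t)$:
$$\frac{d \log p_t(x_t)}{dt} = -\nabla \cdot v(x_t, t)$$
Integrating from $t = 0$ to $t = T$ gives the data log-likelihood:
$$\log p_0(x_0) = \log p_T(x_T) + \int_0^T \nabla \cdot v(x_t, t)\,dt$$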
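A one-dimensional sketch of this likelihood computation is shown below. It assumes a constant-rate VP schedule and Gaussian data so that the velocity, its divergence, and the true log-density are all known in closed form; the state and the accumulated divergence are integrated together with Heun's method. All names and constants are illustrative.

```python
import math

BETA, S0 = 4.0, 0.5   # assumed VP rate and toy data N(0, S0^2)

def var_t(t):
    a2 = math.exp(-BETA * t)
    return a2 * S0 ** 2 + 1.0 - a2

def velocity(x, t):
    score = -x / var_t(t)                 # exact score for Gaussian data
    return -0.5 * BETA * x - 0.5 * BETA * score

def divergence(x, t):
    """d(velocity)/dx; in one dimension the divergence is this derivative."""
    return -0.5 * BETA + 0.5 * BETA / var_t(t)

def log_normal(x, var):
    return -0.5 * (x * x / var + math.log(2.0 * math.pi * var))

def log_likelihood(x0, T=1.0, n_steps=2000):
    """log p_0(x0) = log p_T(x_T) + integral of the divergence along the path."""
    x, acc, t = x0, 0.0, 0.0
    h = T / n_steps
    for _ in range(n_steps):
        v1, d1 = velocity(x, t), divergence(x, t)
        x_pred = x + h * v1
        v2, d2 = velocity(x_pred, t + h), divergence(x_pred, t + h)
        x += 0.5 * h * (v1 + v2)          # Heun update of the state
        acc += 0.5 * h * (d1 + d2)        # trapezoidal update of the integral
        t += h
    # The toy example uses the exact terminal marginal as the prior density.
    return log_normal(x, var_t(T)) + acc

x0 = 0.7
print(log_likelihood(x0), log_normal(x0, S0 ** 2))
```

In higher dimensions the divergence is the trace of the Jacobian of the velocity field, which is usually estimated stochastically rather than computed exactly.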
8.2 Latent Space Manipulation
Since the ODE is deterministic:
- Interpolation: Linearly interpolate in latent space, then decode
- Editing: Modify latent code and integrate back
- Attribute manipulation: Navigate in semantic directions
8.3 Accelerated Sampling
- Few-step generation: Use adaptive ODE solvers with large steps
- Distillation: Train student model to match ODE trajectories
- Consistency models: Learn direct mapping along ODE paths
9. Core Formula Cards
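[!QUOTE] Forward [[Stochastic Differential Equation (SDE)|SDE]]
$$dx = f(x, t)\,dt + g(t)\,dw$$
[!QUOTE] Probability Flow ODE
$$\frac{dx}{dt} = f(x, t) - \frac{1}{2} g(t)^2 \nabla_x \log p_t(x)$$
[!QUOTE] Reverse-Time ODE
$$\frac{dx}{dt} = f(x, t) - \frac{1}{2} g(t)^2 s_\theta(x, t), \quad \text{integrated from } t = T \text{ to } t = 0$$
[!QUOTE] Velocity Field
$$v(x, t) = f(x, t) - \frac{1}{2} g(t)^2 \nabla_x \log p_t(x)$$
[!QUOTE] Likelihood Computation
$$\log p_0(x_0) = \log p_T(x_T) + \int_0^T \nabla \cdot v(x_t, t)\,dt$$
[!QUOTE] Variance-Preserving [[Stochastic Differential Equation (SDE)|SDE]] (VP-[[Stochastic Differential Equation (SDE)|SDE]])
$$dx = -\frac{1}{2}\beta(t)\,x\,dt + \sqrt{\beta(t)}\,dw$$
[!QUOTE] Variance-Exploding [[Stochastic Differential Equation (SDE)|SDE]] (VE-[[Stochastic Differential Equation (SDE)|SDE]])
$$dx = \sqrt{\frac{d\sigma^2(t)}{dt}}\,dw$$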
10. Theoretical Analysis
10.1 Connection to Continuous Normalizing Flows
The Probability Flow ODE defines a continuous normalizing flow (CNF):
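- Forward map: $x_0 \mapsto x_T$ (data to noise)
- Inverse map: $x_T \mapsto x_0$ (noise to data)
- Log-determinant: Tractable via instantaneous change of variables
Change of Variables Formula:
$$\log p_0(x_0) = \log p_T(x_T) + \int_0^T \nabla \cdot v_\theta(x_t, t)\,dt$$
where $v_\theta(x, t) = f(x, t) - \frac{1}{2} g(t)^2 s_\theta(x, t)$ is the learned velocity field and $x_t$ is the ODE trajectory starting at $x_0$.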
10.2 [[Score Function]] Approximation
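The true [[Score Function]] $\nabla_x \log p_t(x)$ is unknown and is approximated by a network $s_\theta(x, t)$:
1. Denoising Score Matching:
$$\mathcal{L}(\theta) = \mathbb{E}_{t,\, x_0,\, x_t \mid x_0} \left[ \lambda(t) \left\| s_\theta(x_t, t) - \nabla_{x_t} \log p_t(x_t \mid x_0) \right\|^2 \right]$$
2. Explicit Score for Gaussian Perturbation:
For $x_t = \alpha_t x_0 + \sigma_t \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$:
$$\nabla_{x_t} \log p_t(x_t \mid x_0) = -\frac{x_t - \alpha_t x_0}{\sigma_t^2} = -\frac{\epsilon}{\sigma_t}$$
3. Network Parameterization:
- Score form: $s_\theta(x, t)$ directly predicts the score
- Noise form: $\epsilon_\theta(x, t)$ predicts the noise (equivalent via $s_\theta = -\epsilon_\theta / \sigma_t$)
- Velocity form: $v_\theta(x, t)$ predicts the velocity field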
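The relationships between these parameterizations are simple enough to write down directly. The sketch below assumes a Gaussian perturbation $x_t = \alpha_t x_0 + \sigma_t \epsilon$, a linear drift $f(x, t) = f_t\, x$, and the weighting $\lambda(t) = \sigma_t^2$; the function names are hypothetical.

```python
import random

def score_from_eps(eps_pred, sigma_t):
    """Noise form -> score form: s = -eps / sigma_t."""
    return -eps_pred / sigma_t

def eps_from_score(score, sigma_t):
    """Score form -> noise form: eps = -sigma_t * s."""
    return -sigma_t * score

def velocity_from_score(x, score, f_t, g_t):
    """Score form -> velocity form for a linear drift f(x, t) = f_t * x."""
    return f_t * x - 0.5 * g_t ** 2 * score

def conditional_score(x_t, x0, alpha_t, sigma_t):
    """Score of the Gaussian kernel p_t(x_t | x0) = N(alpha_t * x0, sigma_t^2)."""
    return -(x_t - alpha_t * x0) / sigma_t ** 2

def dsm_loss(score_model, x0_batch, alpha_t, sigma_t, t, rng):
    """Monte Carlo denoising score matching loss at one time t, weight sigma_t^2."""
    total = 0.0
    for x0 in x0_batch:
        eps = rng.gauss(0.0, 1.0)
        x_t = alpha_t * x0 + sigma_t * eps
        target = conditional_score(x_t, x0, alpha_t, sigma_t)
        total += sigma_t ** 2 * (score_model(x_t, t) - target) ** 2
    return total / len(x0_batch)

alpha_t, sigma_t = 0.8, 0.6
x0, eps = 1.2, -0.7
x_t = alpha_t * x0 + sigma_t * eps
print(conditional_score(x_t, x0, alpha_t, sigma_t), score_from_eps(eps, sigma_t))

# If all data sit at one point c, the marginal score equals the conditional
# score, so a model implementing it attains zero loss.
c = 1.2
perfect = lambda x, t: -(x - alpha_t * c) / sigma_t ** 2
print(dsm_loss(perfect, [c] * 100, alpha_t, sigma_t, 0.5, random.Random(0)))
```

For non-degenerate data the minimal loss is positive, because the conditional score is a noisy target for the marginal score.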
10.3 Optimal Transport Perspective
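The Probability Flow ODE can be viewed as one transport path between the data and noise distributions, which connects it to optimal transport, although its trajectories are generally not the optimal transport map:
- Wasserstein-2 distance: Measures the minimal quadratic transport cost between two distributions
- Benamou-Brenier formula: Expresses that cost as a minimal kinetic energy over velocity fields, connecting OT with fluid dynamics
- Straight flows: Recent work aims to make ODE trajectories more linear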
10.4 Rectified Flows
Key Idea: Learn a straighter ODE trajectory for faster sampling.
Standard Probability Flow ODE:
- Curved trajectories
- Requires many ODE solver steps
Rectified Flows:
- Iteratively “straighten” the ODE path
- Can achieve high quality with just 1-2 Euler steps
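- Formula: $\frac{dx}{dt} = v_\theta(x, t)$, where $v_\theta$ is trained on linear interpolations $x_t = (1 - t)\,x_0 + t\,x_1$ to match $x_1 - x_0$ and thereby learns straight paths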
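Why straight paths allow very few solver steps can be seen in a toy case. The sketch assumes a deterministic, monotone coupling $x_1 = 2 x_0 + 1$ between endpoints, for which the straight-line velocity field is available in closed form; note that here $t = 0$ is the source and $t = 1$ the target, following the usual rectified flow convention rather than the diffusion time convention used earlier. The coupling and function names are illustrative.

```python
def interpolate(x0, x1, t):
    """Point on the straight line between the paired endpoints."""
    return (1.0 - t) * x0 + t * x1

def straight_velocity(x, t):
    """Velocity field of the straight paths for the toy coupling x1 = 2*x0 + 1.

    On a path, x = x0 * (1 + t) + t, so x0 = (x - t) / (1 + t) and the
    constant path velocity is x1 - x0 = x0 + 1.
    """
    return (x - t) / (1.0 + t) + 1.0

def euler(velocity, x, n_steps):
    """Explicit Euler integration of dx/dt = velocity(x, t) from t = 0 to t = 1."""
    t, h = 0.0, 1.0 / n_steps
    for _ in range(n_steps):
        x += h * velocity(x, t)
        t += h
    return x

x0 = 0.3
print(euler(straight_velocity, x0, 1))    # one step already lands on 2*x0 + 1
print(euler(straight_velocity, x0, 10))   # more steps give the same endpoint
print(interpolate(x0, 2.0 * x0 + 1.0, 0.5))
```

Along a straight path the velocity is constant, so Euler's method incurs no discretization error; curved trajectories lose this property and need more steps.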
11. Practical Implementation Tips
11.1 Numerical Stability
Common Issues:
- Stiff ODEs: When the noise schedule (e.g. $\beta(t)$) varies rapidly
  - Solution: Use adaptive step size solvers
- Score explosion: Near $t = 0$, where $\sigma_t \to 0$
  - Solution: Clip score values, use logarithmic time grids
- Overflow or underflow in exponential terms
  - Solution: Work in log-space, use numerically stable formulas
Best Practices:
Numerically stable score computation
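A few of these precautions can be sketched with the standard library alone. The helpers below are illustrative assumptions, not the note's original listing: `sigma_stable` uses `math.expm1` to avoid cancellation when the integrated noise rate is tiny, `clipped_score` floors $\sigma_t$ and clips the result, and `log_time_grid` produces geometrically spaced times that concentrate steps near $t = 0$.

```python
import math

def sigma_naive(integrated_beta):
    """sigma_t = sqrt(1 - exp(-B)); loses all precision when B is tiny."""
    return math.sqrt(1.0 - math.exp(-integrated_beta))

def sigma_stable(integrated_beta):
    """Same quantity via expm1, accurate for small B = integral of beta."""
    return math.sqrt(-math.expm1(-integrated_beta))

def clipped_score(eps_pred, sigma_t, max_abs=1e3, sigma_floor=1e-8):
    """Score from a noise prediction, with sigma floored and the result clipped."""
    score = -eps_pred / max(sigma_t, sigma_floor)
    return max(-max_abs, min(max_abs, score))

def log_time_grid(t_max, t_min, n_points):
    """Geometrically spaced times from t_max down to t_min."""
    ratio = t_min / t_max
    return [t_max * ratio ** (i / (n_points - 1)) for i in range(n_points)]

print(sigma_naive(1e-20), sigma_stable(1e-20))   # naive version underflows to 0.0
print(clipped_score(1.0, 0.0))                   # clipped instead of dividing by zero
print(log_time_grid(1.0, 1e-3, 4))
```

The clipping threshold and the floor are tuning choices; values that are too aggressive bias the samples, so they are best set by monitoring score magnitudes during sampling.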
11.2 Efficient Sampling Strategies
1. Time Discretization:
| Strategy | Steps | Quality | Speed |
|---|---|---|---|
| Uniform | 1000 | High | Slow |
| Log-spaced | 50-100 | High | Medium |
| [[DPM-Solver]] | 10-20 | High | Fast |
| Consistency | 1-5 | Medium | Very Fast |
2. Caching Strategies:
- Cache $\alpha_t$, $\sigma_t$ values
- Precompute integral terms
- Batch multiple samples together
3. Parallelization:
- Score evaluation is batch-parallelizable
- ODE integration is sequential (cannot parallelize time steps)
- Use GPU for score model, CPU for ODE solver if needed
11.3 Debugging Checklist
Sampling Quality Issues:
- [ ] Score model trained correctly? Check loss curves
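- [ ] Time encoding correct? Verify whether the model expects continuous $t \in [0, 1]$ or discrete timestep indices
- [ ] ODE solver step size appropriate? Try reducing step size
- [ ] Boundary conditions correct? Check that $x_T$ is drawn from the prior the model was trained with (e.g. $\mathcal{N}(0, I)$ for VP)
- [ ] Score clipping needed? Monitor score magnitudes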
Likelihood Computation Issues:
- [ ] Trace computation stable? Use Hutchinson’s estimator for high dimensions
- [ ] Integration method accurate? Use adaptive quadrature
- [ ] Initial log-density correct? Verify that $\log p_T(x_T)$ is evaluated under the same prior used for sampling
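Hutchinson's estimator, mentioned in the checklist, replaces an exact trace by an average of quadratic forms $v^\top A v$ over random probe vectors. The sketch below uses Rademacher probes and a small explicit matrix standing in for the Jacobian of the velocity field; in practice `matvec_fn` would be a Jacobian-vector product from an autodiff library, which is an assumption outside this example.

```python
import itertools
import random

def matvec(A, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def hutchinson_trace(matvec_fn, dim, n_probes, rng):
    """Estimate trace(A) as the mean of v^T A v over Rademacher probes v."""
    total = 0.0
    for _ in range(n_probes):
        v = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        total += sum(vi * wi for vi, wi in zip(v, matvec_fn(v)))
    return total / n_probes

A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, -1.0],
     [0.5, -1.0, -1.0]]          # trace = 2 + 3 - 1 = 4

estimate = hutchinson_trace(lambda v: matvec(A, v), 3, 4000, random.Random(0))
print(estimate)                  # noisy estimate of the trace

# Averaging over all 2^3 sign vectors removes the cross terms exactly.
probes = list(itertools.product((-1.0, 1.0), repeat=3))
exact = sum(sum(vi * wi for vi, wi in zip(v, matvec(A, v))) for v in probes) / len(probes)
print(exact)
```

The estimator is unbiased, and its variance depends only on the off-diagonal entries, so a handful of probes per step is often used when the ODE is integrated many times.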
12. Recent Advances (2023-2024)
12.1 [[Flow Matching]]
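Key Insight: Instead of deriving the ODE from an [[Stochastic Differential Equation (SDE)|SDE]], directly learn a velocity field that matches a target flow.
Objective:
$$\mathcal{L}_{\mathrm{CFM}}(\theta) = \mathbb{E}_{t,\, x_1,\, x \sim p_t(x \mid x_1)} \left\| v_\theta(x, t) - u_t(x \mid x_1) \right\|^2$$
where $x_1$ is a data sample, $p_t(x \mid x_1)$ is a chosen conditional probability path, and $u_t(x \mid x_1)$ is the velocity field that generates it.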
Advantages:
- Simpler than [[Stochastic Differential Equation (SDE)|SDE]]-based approach
- More flexible flow design
- Connections to optimal transport
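The objective can be illustrated with a Monte Carlo estimate on a toy problem. The sketch assumes linear conditional paths between independent noise $x_0 \sim \mathcal{N}(0, 1)$ and data $x_1 \sim \mathcal{N}(\mu, s_1^2)$, with $t = 0$ at noise and $t = 1$ at data; for this Gaussian case the loss-minimizing velocity field has a closed form, which is compared against a naive constant field. All names and constants are illustrative.

```python
import random

MU, S1 = 2.0, 0.5    # toy data distribution N(MU, S1^2) (assumed)

def cfm_loss(model, n_samples, rng):
    """Monte Carlo conditional flow matching loss with linear paths."""
    total = 0.0
    for _ in range(n_samples):
        t = rng.random()
        x0 = rng.gauss(0.0, 1.0)          # noise endpoint
        x1 = rng.gauss(MU, S1)            # data endpoint
        x_t = (1.0 - t) * x0 + t * x1     # linear conditional path
        target = x1 - x0                  # its constant conditional velocity
        total += (model(x_t, t) - target) ** 2
    return total / n_samples

def optimal_velocity(x, t):
    """E[x1 - x0 | x_t = x] for this Gaussian toy problem (closed form)."""
    cov = t * S1 ** 2 - (1.0 - t)                 # Cov(x1 - x0, x_t)
    var = (1.0 - t) ** 2 + t ** 2 * S1 ** 2       # Var(x_t)
    return MU + cov / var * (x - t * MU)

def constant_velocity(x, t):
    """Baseline that ignores x and t and predicts the mean displacement."""
    return MU

# Using the same seed makes both losses use identical samples.
loss_opt = cfm_loss(optimal_velocity, 5000, random.Random(0))
loss_const = cfm_loss(constant_velocity, 5000, random.Random(0))
print(loss_opt, loss_const)
```

The loss of the optimal field is not zero: the conditional target $x_1 - x_0$ varies among pairs that pass through the same $(x, t)$, and the regression can only recover its conditional mean.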
12.2 Consistency Models
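Goal: Learn a direct mapping from any point $x_t$ on an ODE trajectory to that trajectory's starting point $x_0$, i.e. a function $F_\theta(x_t, t) \approx x_0$ that returns the same output for all points on the same trajectory.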
Benefits:
- One-step generation possible
- Few-step generation (2-8 steps) with high quality
- Self-consistency loss (no need for score matching)
12.3 Rectified Flows
Idea: Iteratively straighten ODE trajectories.
Algorithm:
1. Train initial model with standard Probability Flow ODE
2. Generate samples, create new (noise, data) pairs
3. Retrain model to learn straighter paths
4. Repeat the generate-and-retrain loop (steps 2 and 3) 2-3 times
Result: Nearly linear trajectories, enabling 1-2 step generation.
12.4 Comparison of Fast Sampling Methods
| Method | Steps | Training Cost | Quality | Flexibility |
|---|---|---|---|---|
| [[DPM-Solver]] | 10-20 | None (solver only) | High | High |
| Consistency Models | 1-8 | High (distillation) | High | Medium |
| Rectified Flows | 1-10 | Medium (retraining) | High | High |
| [[Flow Matching]] | 10-50 | Similar to [[Stochastic Differential Equation (SDE)|SDE]] | High | Very High |
13. Comparison with Related Methods
13.1 Probability Flow ODE vs Reverse [[Stochastic Differential Equation (SDE)|SDE]]
| Aspect | Probability Flow ODE | Reverse [[Stochastic Differential Equation (SDE)|SDE]] |
|---|---|---|
| Form | Deterministic | Stochastic |
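| Equation | $dx = \left[ f(x, t) - \frac{1}{2} g(t)^2 \nabla_x \log p_t(x) \right] dt$ | $dx = \left[ f(x, t) - g(t)^2 \nabla_x \log p_t(x) \right] dt + g(t)\,d\bar{w}$ |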
| Sampling | ODE solver | [[Stochastic Differential Equation (SDE)|SDE]] solver |
| Variance | Zero (deterministic) | Non-zero (random) |
| Likelihood | Exact computation | Intractable |
| Quality | High (with sufficient steps) | High (with corrector) |
13.2 Probability Flow ODE vs Neural ODE
| Aspect | Probability Flow ODE | Neural ODE |
|---|---|---|
| Origin | From diffusion models | From continuous-depth networks |
| Velocity field | $f(x, t) - \frac{1}{2} g(t)^2 s_\theta(x, t)$ | Learned $f_\theta(x, t)$ |
| Training | Score matching | End-to-end backprop |
| Purpose | Generative modeling | Continuous dynamics |
| Likelihood | Exact via divergence | Exact via divergence when used as a continuous normalizing flow (the adjoint method computes gradients) |
13.3 Generative Model Comparison
| Model | Sampling | Likelihood | Training Stability | Sample Quality |
|---|---|---|---|---|
| GAN | Fast (1 step) | Intractable | Unstable | High |
| VAE | Fast (1 step) | Lower bound | Stable | Medium |
| Normalizing Flow | Fast (parallel) | Exact | Stable | Medium-High |
| Diffusion ([[Stochastic Differential Equation (SDE)|SDE]]) | Slow (100+ steps) | Tractable | Stable | Very High |
| Diffusion (ODE) | Medium (10-50 steps) | Exact | Stable | Very High |
Related Concepts
- [[Diffusion Model]]
- [[Stochastic Differential Equation (SDE)]]
- [[Score Function]]
- [[Wiener Process|Wiener Process]]
- [[Fokker-Planck Equation]]
- [[Kolmogorov Equations]]
- [[Langevin Dynamics]]
- [[DPM-Solver]]
- [[DDIM]]
- [[Continuous Normalizing Flow]]
- [[Neural ODE]]
- [[Flow Matching]]
- [[Consistency Models]]
- [[Rectified Flows]]
- [[Optimal Transport]]
- [[Martingale]]
References
- Paper: Score-Based Generative Modeling through SDEs (Song et al., 2021)
- Paper: Maximum Likelihood Training of Score-Based Diffusion Models (Song et al., 2021)
- Paper: [[DPM-Solver]]: A Fast ODE Solver for [[Diffusion Model|Diffusion Probabilistic Model]] Sampling (Lu et al., 2022)
- Paper: [[Flow Matching]] for Generative Modeling (Lipman et al., 2023)
- Paper: Consistency Models (Song et al., 2023)
- Paper: Building Normalizing Flows with Stochastic Interpolants (Albergo et al., 2023)
- Paper: Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow (Liu et al., 2022)
- Blog: What are Diffusion Models? - Lilian Weng
- Blog: [[Flow Matching]] - AI papers summary
- Course: CS236 Deep Generative Models (Stanford)
- Course: STAT 260 High-Dimensional Statistics (Berkeley)