Probability Flow ODE

The Probability Flow ODE is a deterministic ordinary differential equation whose solutions have the same marginal probability density $p_t(x)$ as the forward [[Stochastic Differential Equation (SDE)|SDE]] in diffusion models. It provides an alternative view of the diffusion process: the stochastic component is removed while the evolution of the distribution is preserved.


1. Core Concept

In diffusion models, the forward process is typically described by a [[Stochastic Differential Equation (SDE)|Stochastic Differential Equation]]:

$$dx = f(t)\,x\,dt + g(t)\,dW_t$$

The Probability Flow ODE is a deterministic counterpart that has the same marginal distribution $p_t(x)$ at every time $t$:

$$dx = \left[f(t)\,x - \frac{1}{2} g(t)^2\, \nabla_x \log p_t(x)\right] dt$$

[!NOTE] Key Insight
While the [[Stochastic Differential Equation (SDE)|SDE]] introduces randomness through $W_t$, the ODE achieves the same evolution of the probability distribution through a deterministic trajectory guided by the [[Score Function]] $\nabla_x \log p_t(x)$.


2. Derivation from Fokker-Planck Equation

2.1 Fokker-Planck Equation (Forward [[Stochastic Differential Equation (SDE)|SDE]])

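The evolution of the probability density $p_t(x)$ under the forward [[Stochastic Differential Equation (SDE)|SDE]] follows the Fokker-Planck equation:

$$\frac{\partial p_t(x)}{\partial t} = -\nabla_x \cdot \left[f(t)\,x\,p_t(x)\right] + \frac{1}{2} g(t)^2\, \nabla_x^2\, p_t(x)$$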

2.2 Probability Flow Velocity Field

We can rewrite the Fokker-Planck equation as a continuity equation:

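$$\frac{\partial p_t(x)}{\partial t} = -\nabla_x \cdot \left[v_t(x)\,p_t(x)\right]$$

where the velocity field $v_t(x)$ is:

$$v_t(x) = f(t)\,x - \frac{1}{2} g(t)^2\, \nabla_x \log p_t(x)$$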
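The rewriting step relies on the identity $\nabla_x p_t(x) = p_t(x)\,\nabla_x \log p_t(x)$, which lets the diffusion term be absorbed into the divergence. A short worked derivation in the notation of this section:

$$
\begin{aligned}
\frac{\partial p_t(x)}{\partial t}
&= -\nabla_x \cdot \big[f(t)\,x\,p_t(x)\big] + \tfrac{1}{2} g(t)^2\, \nabla_x \cdot \nabla_x p_t(x) \\
&= -\nabla_x \cdot \big[f(t)\,x\,p_t(x)\big] + \tfrac{1}{2} g(t)^2\, \nabla_x \cdot \big[p_t(x)\,\nabla_x \log p_t(x)\big] \\
&= -\nabla_x \cdot \Big[\big(f(t)\,x - \tfrac{1}{2} g(t)^2\, \nabla_x \log p_t(x)\big)\,p_t(x)\Big]
\end{aligned}
$$

The bracketed factor multiplying $p_t(x)$ in the last line is the velocity field $v_t(x)$.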

2.3 ODE Formulation

The deterministic ODE that follows this velocity field is:

$$\frac{dx}{dt} = f(t)\,x - \frac{1}{2} g(t)^2\, \nabla_x \log p_t(x)$$

This ODE preserves the same marginal distributions $p_t(x)$ as the original [[Stochastic Differential Equation (SDE)|SDE]].
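The claim can be checked numerically on a toy problem. The sketch below is an illustration only and rests on assumptions not in the text: 1D Gaussian data $\mathcal{N}(0, S_0^2)$, a VP-type SDE with constant $\beta$, and the exact score of the Gaussian marginal in place of a trained network. It pushes the same samples through the forward SDE (Euler-Maruyama) and through the ODE (Euler) and compares the variances at $t = T$.

```python
import math
import random

BETA = 2.0   # constant beta(t); an assumption for this toy example
S0 = 2.0     # standard deviation of the 1D Gaussian "data" distribution
T = 1.0

def marginal_var(t):
    """Variance of p_t for N(0, S0^2) data under the VP SDE with constant beta."""
    a2 = math.exp(-BETA * t)             # alpha_t^2
    return S0 ** 2 * a2 + (1.0 - a2)     # alpha_t^2 * S0^2 + sigma_t^2

def score(x, t):
    """Exact score of the Gaussian marginal p_t = N(0, marginal_var(t))."""
    return -x / marginal_var(t)

def simulate(x0, n_steps, rng=None):
    """Integrate from t=0 to t=T; rng=None selects the deterministic ODE."""
    dt = T / n_steps
    x, t = x0, 0.0
    for _ in range(n_steps):
        drift = -0.5 * BETA * x          # f(t) * x with f(t) = -beta / 2
        if rng is None:
            # Probability flow ODE: f(t) x - 0.5 g(t)^2 * score, with g(t)^2 = beta
            x += (drift - 0.5 * BETA * score(x, t)) * dt
        else:
            # Forward SDE, Euler-Maruyama step
            x += drift * dt + math.sqrt(BETA * dt) * rng.gauss(0.0, 1.0)
        t += dt
    return x

def sample_variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

rng = random.Random(0)
data = [rng.gauss(0.0, S0) for _ in range(2000)]
var_sde = sample_variance([simulate(x, 100, rng) for x in data])
var_ode = sample_variance([simulate(x, 100) for x in data])
print(f"analytic {marginal_var(T):.3f}  SDE {var_sde:.3f}  ODE {var_ode:.3f}")
```

Both empirical variances should land near the analytic value, even though individual SDE paths are random and individual ODE paths are not.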


3. Comparison: [[Stochastic Differential Equation (SDE)|SDE]] vs ODE

| Aspect | Forward [[Stochastic Differential Equation (SDE)\|SDE]] | Probability Flow ODE |
| --- | --- | --- |
| Form | Stochastic | Deterministic |
| Noise term | $g(t)\,dW_t$ | None |
| Trajectory | Random paths | Deterministic paths |
| Marginal distribution | $p_t(x)$ | Same $p_t(x)$ |
| Sampling | Requires random noise | Deterministic integration |
| Reversibility | Reverse [[Stochastic Differential Equation (SDE)\|SDE]] needed | Simply integrate backward |

[!TIP] Practical Advantage
The ODE formulation allows using advanced ODE solvers (like Runge-Kutta methods) for more efficient and accurate sampling compared to Euler-Maruyama discretization of SDEs.


4. Reverse-Time Probability Flow ODE

4.1 Reverse ODE Formulation

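To generate samples, we integrate the ODE backward from $t=T$ to $t=0$. The equation is the same as in the forward direction; only the direction of integration changes (negative time steps):

$$\frac{dx}{dt} = f(t)\,x - \frac{1}{2} g(t)^2\, \nabla_x \log p_t(x)$$

Equivalently, in the reversed time variable $\tau = T - t$, $\frac{dx}{d\tau} = -f(t)\,x + \frac{1}{2} g(t)^2\, \nabla_x \log p_t(x)$. In both forms the [[Score Function]] $\nabla_x \log p_t(x)$ is approximated by a neural network $s_\theta(x,t)$.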

4.2 Variance-Preserving [[Stochastic Differential Equation (SDE)|SDE]] Case

For the variance-preserving [[Stochastic Differential Equation (SDE)|SDE]], where $f(t) = -\frac{1}{2}\beta(t)$ and $g(t) = \sqrt{\beta(t)}$:

$$\frac{dx}{dt} = -\frac{1}{2}\beta(t)\,x - \frac{1}{2}\beta(t)\,\nabla_x \log p_t(x)$$

4.3 Variance-Exploding [[Stochastic Differential Equation (SDE)|SDE]] Case

For the variance-exploding [[Stochastic Differential Equation (SDE)|SDE]], where $f(t) = 0$ and $g(t) = \sqrt{\frac{d\sigma^2(t)}{dt}}$:

$$\frac{dx}{dt} = -\frac{1}{2}\frac{d\sigma^2(t)}{dt}\,\nabla_x \log p_t(x)$$
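As a concrete illustration, the two cases can be written as coefficient functions feeding one shared velocity. The schedules here are assumptions chosen for the sketch (a linear $\beta(t)$ for VP and a geometric $\sigma(t)$ for VE, with commonly used but arbitrary endpoint values); the text above does not fix any particular schedule.

```python
import math

def vp_coefficients(t, beta_min=0.1, beta_max=20.0):
    """f(t), g(t) for a VP SDE with a linear beta schedule (assumed values)."""
    beta = beta_min + t * (beta_max - beta_min)
    return -0.5 * beta, math.sqrt(beta)

def ve_sigma(t, sigma_min=0.01, sigma_max=50.0):
    """Geometric noise schedule sigma(t) for a VE SDE (assumed values)."""
    return sigma_min * (sigma_max / sigma_min) ** t

def ve_coefficients(t, sigma_min=0.01, sigma_max=50.0):
    """f(t), g(t) for the VE SDE, with g(t) = sqrt(d sigma^2 / dt)."""
    sigma = ve_sigma(t, sigma_min, sigma_max)
    # For the geometric schedule, d(sigma^2)/dt = 2 * log(sigma_max / sigma_min) * sigma^2
    g = sigma * math.sqrt(2.0 * math.log(sigma_max / sigma_min))
    return 0.0, g

def pf_ode_velocity(x, score, f_t, g_t):
    """Right-hand side f(t) x - 0.5 g(t)^2 * score of the probability flow ODE."""
    return f_t * x - 0.5 * g_t ** 2 * score

f_t, g_t = vp_coefficients(0.5)
print(pf_ode_velocity(x=1.0, score=-1.0, f_t=f_t, g_t=g_t))
```

Keeping `f` and `g` separate from the velocity means the same solver code can serve both SDE families.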

5. Relationship to Score-Based Models

5.1 [[Score Function]] Connection

The Probability Flow ODE explicitly reveals the role of the [[Score Function|score function]] $\nabla_x \log p_t(x)$:

  • The [[Score Function]] points in the direction of steepest increase of the log-density
  • Integrated backward in time, the ODE trajectory moves along this direction, scaled by $\frac{1}{2} g(t)^2$, on top of the linear drift $f(t)\,x$

5.2 Connection to Langevin Dynamics

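Discretizing the reverse process is closely related to [[Langevin Dynamics|Langevin Dynamics]]. With the drift term omitted ($f(t) = 0$), one backward step reads:

$$x_{t-\Delta t} = x_t + \frac{1}{2} g(t)^2\, \nabla_x \log p_t(x_t)\,\Delta t + \text{noise term}$$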

The ODE removes the noise term, making the process deterministic.


6. Sampling Algorithms

6.1 Basic ODE Solver

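```python
import torch

def sample_probability_flow_ode(score_model, f, g, shape, T=1.0, N=100):
    """Reverse-time Probability Flow ODE sampling with Euler steps.

    score_model(x, t) approximates the score; f(t) and g(t) are the SDE coefficients.
    """
    # Sample x_T from the prior, usually N(0, I)
    x = torch.randn(shape)
    # Time grid running backward from T to 0
    timesteps = torch.linspace(T, 0.0, N)

    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        # Estimate the score function at the current state
        score = score_model(x, t)

        # Velocity field of the Probability Flow ODE
        velocity = f(t) * x - 0.5 * g(t) ** 2 * score

        # Euler step (higher-order methods can be substituted);
        # dt is negative because time runs backward
        dt = t_next - t
        x = x + velocity * dt

    return x  # approximate sample x_0
```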

6.2 Advanced ODE Solvers

| Solver | Order | Characteristics |
| --- | --- | --- |
| Euler | 1st | Simple, but requires small steps |
| RK2 (Midpoint) | 2nd | Better accuracy, moderate cost |
| RK4 | 4th | High accuracy, commonly used |
| DOPRI5 | Adaptive | Automatic step size control |
| [[DPM-Solver]] | Specialized | Designed for diffusion models |
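The gap between first- and second-order solvers in the table can be seen on a toy problem. This is a minimal sketch under assumptions: 1D Gaussian data, a VP SDE with constant $\beta$, and the exact score in place of a network. The second-order method shown is Heun's method, a different RK2 variant from the midpoint rule listed above, chosen only because it is compact.

```python
import math

BETA, S0 = 2.0, 2.0   # assumed toy values: constant beta, data standard deviation

def marginal_var(t):
    a2 = math.exp(-BETA * t)
    return S0 ** 2 * a2 + (1.0 - a2)

def velocity(x, t):
    """PF ODE velocity for the VP SDE with the exact Gaussian score -x / var_t."""
    return -0.5 * BETA * x + 0.5 * BETA * x / marginal_var(t)

def euler_step(x, t, dt):
    return x + velocity(x, t) * dt

def heun_step(x, t, dt):
    """Second-order Heun step: two velocity evaluations per step."""
    k1 = velocity(x, t)
    k2 = velocity(x + k1 * dt, t + dt)
    return x + 0.5 * (k1 + k2) * dt

def solve(step, x, t_start, t_end, n_steps):
    dt = (t_end - t_start) / n_steps   # negative when integrating backward
    t = t_start
    for _ in range(n_steps):
        x = step(x, t, dt)
        t += dt
    return x

def exact(x_start, t_start, t_end):
    """Closed-form solution, available only because this toy problem is linear."""
    return x_start * math.sqrt(marginal_var(t_end) / marginal_var(t_start))

x_T = 1.0
for name, step in [("Euler", euler_step), ("Heun", heun_step)]:
    err = abs(solve(step, x_T, 1.0, 0.0, 10) - exact(x_T, 1.0, 0.0))
    print(f"{name}: error with 10 steps = {err:.2e}")
```

Heun costs twice as many score evaluations per step, so a fair comparison in practice fixes the number of function evaluations rather than the number of steps.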

[!NOTE] [[DPM-Solver]]
[[DPM-Solver]] exploits the semi-linear structure of the Probability Flow ODE to achieve high-order convergence with fewer function evaluations, significantly accelerating sampling.

6.3 [[DPM-Solver]] Details

Semi-linear Structure:

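The Probability Flow ODE can be rewritten in semi-linear form. With the noise-prediction parameterization $\nabla_x \log p_t(x) \approx -\epsilon_\theta(x,t)/\sigma_t$:

$$\frac{dx}{dt} = f(t)\,x + \frac{g(t)^2}{2\sigma_t}\,\epsilon_\theta(x,t)$$

Key Insight: The linear part $f(t)\,x$ can be solved analytically, while only the nonlinear part (the network term derived from the [[Score Function]]) requires numerical approximation.

First-order update ([[DPM-Solver]]-1) from time $s$ to time $t < s$, with $\lambda_t = \log(\alpha_t/\sigma_t)$ and $h = \lambda_t - \lambda_s$:

$$x_t = \frac{\alpha_t}{\alpha_s}\,x_s - \sigma_t\,(e^{h} - 1)\,\epsilon_\theta(x_s, s)$$

[[DPM-Solver]]-2 adds one intermediate network evaluation per step to reach second order.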
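To make the semi-linear idea concrete, here is a sketch of the first-order exponential-integrator update from Lu et al. (2022), $x_t = \frac{\alpha_t}{\alpha_s} x_s - \sigma_t (e^{h} - 1)\,\epsilon_\theta(x_s, s)$ with $\lambda_t = \log(\alpha_t/\sigma_t)$ and $h = \lambda_t - \lambda_s$. Everything else is an assumption for illustration: 1D Gaussian data, constant $\beta$, a uniform time grid, and the exact posterior-mean noise `eps_pred` standing in for the trained network.

```python
import math

BETA, S0 = 2.0, 2.0   # assumed toy values: constant beta, data standard deviation

def alpha(t):
    return math.exp(-0.5 * BETA * t)

def sigma(t):
    return math.sqrt(-math.expm1(-BETA * t))    # sqrt(1 - alpha_t^2)

def lam(t):
    return math.log(alpha(t) / sigma(t))        # lambda_t = log(alpha_t / sigma_t)

def marginal_var(t):
    return (alpha(t) * S0) ** 2 + sigma(t) ** 2

def eps_pred(x, t):
    """Exact E[eps | x_t] for N(0, S0^2) data; stands in for the network."""
    return sigma(t) * x / marginal_var(t)

def dpm_solver_1_step(x, s, t):
    """One first-order step from time s to time t < s."""
    h = lam(t) - lam(s)
    return alpha(t) / alpha(s) * x - sigma(t) * math.expm1(h) * eps_pred(x, s)

def sample(x, t_start, t_end, n_steps):
    ts = [t_start + (t_end - t_start) * i / n_steps for i in range(n_steps + 1)]
    for s, t in zip(ts[:-1], ts[1:]):
        x = dpm_solver_1_step(x, s, t)
    return x

def exact(x, t_start, t_end):
    """Closed-form ODE solution for this linear toy problem."""
    return x * math.sqrt(marginal_var(t_end) / marginal_var(t_start))

print(sample(1.0, 1.0, 0.05, 20), exact(1.0, 1.0, 0.05))
```

The linear term is handled exactly through the ratio `alpha(t) / alpha(s)`; only the noise prediction is frozen over the step, which is where the first-order error comes from.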

Advantages:

  • 10-20 steps for high-quality samples (vs 1000+ for [[Diffusion Model|DDPM]])
  • Rigorous mathematical foundation
  • Compatible with all [[Stochastic Differential Equation (SDE)|SDE]]/ODE-based diffusion models

[!TIP] Practical Usage
For most applications, [[DPM-Solver]]-2 or [[DPM-Solver]]-3 with 10-20 steps provides excellent quality-speed tradeoff.


7. Theoretical Properties

7.1 Marginal Distribution Preservation

Theorem: The Probability Flow ODE and the forward [[Stochastic Differential Equation (SDE)|SDE]] have identical marginal distributions $p_t(x)$ for all $t \in [0, T]$.

Proof Sketch: The density transported by the ODE satisfies the continuity equation of Section 2.2, which is the Fokker-Planck equation of the SDE in rewritten form. Given the same initial condition, the two must have the same solution.

7.2 Deterministic Mapping

The ODE defines a deterministic bijection between $x_0$ (data) and $x_T$ (noise):

$$x_T = \Phi_T(x_0), \qquad x_0 = \Phi_T^{-1}(x_T)$$

This property enables:

  • Exact likelihood computation via change of variables
  • Latent space interpolation with meaningful trajectories
  • Inversion of real data to latent space
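A round trip through $\Phi_T$ and $\Phi_T^{-1}$ can be sketched as follows. The setup is a toy assumption (1D Gaussian data, constant $\beta$, exact score), and the names `encode` and `decode` are illustrative labels rather than terms from the text. A classical RK4 integrator is used so that the inversion error is negligible.

```python
import math

BETA, S0, T = 2.0, 2.0, 1.0   # assumed toy values

def velocity(x, t):
    """PF ODE velocity with the exact Gaussian score for N(0, S0^2) data."""
    var_t = S0 ** 2 * math.exp(-BETA * t) - math.expm1(-BETA * t)
    return -0.5 * BETA * x * (1.0 - 1.0 / var_t)

def rk4(x, t0, t1, n_steps=100):
    """Classical RK4 integration of dx/dt = velocity(x, t) from t0 to t1."""
    dt = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        k1 = velocity(x, t)
        k2 = velocity(x + 0.5 * dt * k1, t + 0.5 * dt)
        k3 = velocity(x + 0.5 * dt * k2, t + 0.5 * dt)
        k4 = velocity(x + dt * k3, t + dt)
        x += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += dt
    return x

def encode(x0):
    """Data to latent: integrate forward from t=0 to t=T."""
    return rk4(x0, 0.0, T)

def decode(x_T):
    """Latent to data: integrate the same ODE backward from t=T to t=0."""
    return rk4(x_T, T, 0.0)

x0 = 1.5
x_T = encode(x0)
print(x0, x_T, decode(x_T))
```

With a learned score the round trip is only as exact as the solver tolerance allows, which is worth keeping in mind for inversion-based editing.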

7.3 Trajectory Regularity

ODE trajectories are smoother than [[Stochastic Differential Equation (SDE)|SDE]] paths:

  • [[Stochastic Differential Equation (SDE)|SDE]]: Rough, non-differentiable paths (due to $W_t$)
  • ODE: Smooth, differentiable paths (enabling gradient-based optimization)

8. Applications

8.1 Likelihood Computation

Using the instantaneous change of variables formula:

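$$\frac{d}{dt}\log p_t(x(t)) = -\operatorname{tr}\left(\frac{\partial v_t(x)}{\partial x}\right)$$

Integrating from $t=0$ to $t=T$:

$$\log p_0(x_0) = \log p_T(x_T) + \int_0^T \operatorname{tr}\left(\frac{\partial v_t(x(t))}{\partial x}\right) dt$$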
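The formula can be verified on a case where the answer is known. The sketch below assumes 1D Gaussian data $\mathcal{N}(0, S_0^2)$, constant $\beta$, and the exact score, so that $\log p_0$ has a closed form to compare against. In one dimension the trace reduces to a single derivative.

```python
import math

BETA, S0, T = 2.0, 2.0, 1.0   # assumed toy values

def marginal_var(t):
    a2 = math.exp(-BETA * t)
    return S0 ** 2 * a2 + (1.0 - a2)

def velocity(x, t):
    return -0.5 * BETA * x * (1.0 - 1.0 / marginal_var(t))

def divergence(x, t):
    """d v_t / d x; independent of x here because the toy velocity is linear in x."""
    return -0.5 * BETA * (1.0 - 1.0 / marginal_var(t))

def gaussian_logpdf(x, var):
    return -0.5 * x * x / var - 0.5 * math.log(2.0 * math.pi * var)

def log_likelihood(x0, n_steps=2000):
    """Integrate the state and the divergence together from t=0 to t=T (Euler)."""
    dt = T / n_steps
    x, t, div_integral = x0, 0.0, 0.0
    for _ in range(n_steps):
        div_integral += divergence(x, t) * dt
        x += velocity(x, t) * dt
        t += dt
    # The toy marginal p_T is known exactly; a real model would use its prior here.
    return gaussian_logpdf(x, marginal_var(T)) + div_integral

print(log_likelihood(1.5), gaussian_logpdf(1.5, S0 ** 2))
```

The two printed values should agree up to discretization error.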

8.2 Latent Space Manipulation

Since the ODE is deterministic:

  • Interpolation: Linearly interpolate in latent space, then decode
  • Editing: Modify latent code and integrate back
  • Attribute manipulation: Navigate in semantic directions

8.3 Accelerated Sampling

  • Few-step generation: Use adaptive ODE solvers with large steps
  • Distillation: Train student model to match ODE trajectories
  • Consistency models: Learn direct mapping along ODE paths

9. Core Formula Cards

[!QUOTE] Forward [[Stochastic Differential Equation (SDE)|SDE]]

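$$dx = f(t)\,x\,dt + g(t)\,dW_t$$

[!QUOTE] Probability Flow ODE

$$dx = \left[f(t)\,x - \frac{1}{2} g(t)^2\, \nabla_x \log p_t(x)\right] dt$$

[!QUOTE] Reverse-Time ODE

$$\frac{dx}{d\tau} = -f(t)\,x + \frac{1}{2} g(t)^2\, \nabla_x \log p_t(x), \qquad \tau = T - t$$

[!QUOTE] Velocity Field

$$v_t(x) = f(t)\,x - \frac{1}{2} g(t)^2\, \nabla_x \log p_t(x)$$

[!QUOTE] Likelihood Computation

$$\log p_0(x_0) = \log p_T(x_T) + \int_0^T \operatorname{tr}\left(\frac{\partial v_t(x(t))}{\partial x}\right) dt$$

[!QUOTE] Variance-Preserving [[Stochastic Differential Equation (SDE)|SDE]] (VP-[[Stochastic Differential Equation (SDE)|SDE]])

$$f(t) = -\frac{1}{2}\beta(t), \qquad g(t) = \sqrt{\beta(t)}$$

[!QUOTE] Variance-Exploding [[Stochastic Differential Equation (SDE)|SDE]] (VE-[[Stochastic Differential Equation (SDE)|SDE]])

$$f(t) = 0, \qquad g(t) = \sqrt{\frac{d\sigma^2(t)}{dt}}$$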

10. Theoretical Analysis

10.1 Connection to Continuous Normalizing Flows

The Probability Flow ODE defines a continuous normalizing flow (CNF):

  • Forward map: $x_0 \to x_T$ (data to noise)
  • Inverse map: $x_T \to x_0$ (noise to data)
  • Log-determinant: Tractable via instantaneous change of variables

Change of Variables Formula:

$$\log p_0(x_0) = \log p_T(x_T) + \int_0^T \nabla_x \cdot v_t(x(t))\, dt$$

where $\nabla_x \cdot v_t(x)$ is the divergence of the velocity field.

10.2 [[Score Function]] Approximation

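The true [[Score Function]] $\nabla_x \log p_t(x)$ is intractable. We approximate it using:

1. Denoising Score Matching:

$$\mathcal{L}(\theta) = \mathbb{E}_{t,\,x_0,\,x_t}\left[\left\| s_\theta(x_t, t) - \nabla_{x_t} \log p(x_t \mid x_0) \right\|^2\right]$$

2. Explicit Score for Gaussian Perturbation:

For $x_t = \alpha_t x_0 + \sigma_t \epsilon$ where $\epsilon \sim \mathcal{N}(0, I)$:

$$\nabla_{x_t} \log p(x_t \mid x_0) = -\frac{\epsilon}{\sigma_t}$$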

3. Network Parameterization:

  • Score form: $s_\theta(x_t, t)$ directly predicts the score
  • Noise form: $\epsilon_\theta(x_t, t)$ predicts the noise (equivalent, via $s_\theta = -\epsilon_\theta / \sigma_t$)
  • Velocity form: $v_\theta(x_t, t)$ predicts the velocity field
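The pieces above fit together as in the following sketch, written for scalar data to stay dependency-free. The function names are illustrative, `score_fn` stands in for the network $s_\theta$, and a single fixed time $t$ is used instead of sampling $t$.

```python
import random

def perturb(x0, alpha_t, sigma_t, eps):
    """Forward perturbation x_t = alpha_t * x0 + sigma_t * eps."""
    return alpha_t * x0 + sigma_t * eps

def dsm_target(eps, sigma_t):
    """Conditional score of p(x_t | x0), equal to -eps / sigma_t."""
    return -eps / sigma_t

def score_from_noise(eps_pred, sigma_t):
    """Convert a noise prediction into a score estimate."""
    return -eps_pred / sigma_t

def dsm_loss(score_fn, batch, alpha_t, sigma_t, t, rng):
    """Monte Carlo estimate of the denoising score matching loss at one time t."""
    total = 0.0
    for x0 in batch:
        eps = rng.gauss(0.0, 1.0)
        x_t = perturb(x0, alpha_t, sigma_t, eps)
        total += (score_fn(x_t, t) - dsm_target(eps, sigma_t)) ** 2
    return total / len(batch)

# If every data point equals 2.0, then p_t = N(alpha_t * 2.0, sigma_t^2) and the
# true score -(x - alpha_t * 2.0) / sigma_t^2 matches the target exactly.
true_score = lambda x, t: -(x - 0.8 * 2.0) / 0.6 ** 2
print(dsm_loss(true_score, [2.0] * 10, 0.8, 0.6, 0.5, random.Random(0)))
```

For non-degenerate data the minimum of the loss is positive, because the conditional target varies around the marginal score; only its minimizer matters.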

10.3 Optimal Transport Perspective

The Probability Flow ODE defines a transport map between the data and noise distributions, which connects it to optimal transport (the map itself is not, in general, the Wasserstein-2 optimal one):

  • Wasserstein-2 distance: Minimizes transport cost
  • Benamou-Brenier formula: Connects OT with fluid dynamics
  • Straight flows: Recent work aims to make ODE trajectories more linear

10.4 Rectified Flows

Key Idea: Learn a straighter ODE trajectory for faster sampling.

Standard Probability Flow ODE:

  • Curved trajectories
  • Requires many ODE solver steps

Rectified Flows:

  • Iteratively “straighten” the ODE path
  • Can achieve high quality with just 1-2 Euler steps
  • Formula: $\frac{dx}{dt} = v_\theta(x, t)$, where $v_\theta$ learns straight paths

11. Practical Implementation Tips

11.1 Numerical Stability

Common Issues:

  1. Stiff ODEs: When $\beta(t)$ varies rapidly
    • Solution: Use adaptive step size solvers
  2. Score explosion: Near $t=0$, where $\sigma_t \to 0$
    • Solution: Clip score values, use logarithmic time grids
  3. Overflow or underflow in exponential terms: $\alpha_t = \exp\left(-\frac{1}{2}\int_0^t \beta(s)\,ds\right)$
    • Solution: Work in log-space, use numerically stable formulas
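One way to apply the log-space advice is to keep $\log \alpha_t$ as the primary quantity and derive $\sigma_t$ from it with `expm1`. The sketch assumes a VP SDE with a linear $\beta(t)$ schedule, so $\alpha_t^2 + \sigma_t^2 = 1$; the schedule endpoints are arbitrary illustrative values.

```python
import math

def vp_alpha_sigma(t, beta_min=0.1, beta_max=20.0):
    """alpha_t, sigma_t and log(alpha_t) for a linear beta schedule (assumed values)."""
    # Closed-form integral of beta(s) = beta_min + s * (beta_max - beta_min) over [0, t]
    integral = beta_min * t + 0.5 * (beta_max - beta_min) * t * t
    log_alpha = -0.5 * integral
    alpha = math.exp(log_alpha)
    # sigma^2 = 1 - alpha^2 = -expm1(2 * log_alpha); expm1 avoids cancellation near t = 0
    sigma = math.sqrt(-math.expm1(2.0 * log_alpha))
    return alpha, sigma, log_alpha

def naive_sigma(t, beta_min=0.1, beta_max=20.0):
    """Direct formula; loses all precision once alpha rounds to 1."""
    alpha, _, _ = vp_alpha_sigma(t, beta_min, beta_max)
    return math.sqrt(1.0 - alpha * alpha)

t_small = 1e-18
print(naive_sigma(t_small), vp_alpha_sigma(t_small)[1])
```

At very small $t$ the naive formula collapses to zero, which would turn a later division by $\sigma_t$ into a division by zero, while the `expm1` form keeps a usable value.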

Best Practices:

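```python
import math
import torch

# Numerically stable score computation
def stable_score(x_t, predicted_noise, sigma_t, min_sigma=1e-6):
    # x_t is unused; it is kept so the call signature stays unchanged.
    # Avoid division by a very small sigma near t = 0
    sigma_t = torch.clamp(sigma_t, min=min_sigma)
    return -predicted_noise / sigma_t

# Non-uniform time grid for reverse-time integration
def get_time_schedule(N, schedule='loglinear', t_min=1e-5, t_max=1.0):
    if schedule == 'loglinear':
        # Log-spaced grid: more steps near t=0 where dynamics are faster
        t = torch.logspace(math.log10(t_min), math.log10(t_max), N)
    else:
        t = torch.linspace(t_min, t_max, N)
    # Descending order (t_max down to t_min) for backward integration
    return torch.flip(t, dims=[0])
```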

11.2 Efficient Sampling Strategies

1. Time Discretization:

| Strategy | Steps | Quality | Speed |
| --- | --- | --- | --- |
| Uniform | 1000 | High | Slow |
| Log-spaced | 50-100 | High | Medium |
| [[DPM-Solver]] | 10-20 | High | Fast |
| Consistency | 1-5 | Medium | Very Fast |

2. Caching Strategies:

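  • Cache $\alpha_t$, $\sigma_t$ values
  • Precompute schedule-only solver terms, e.g. $\lambda_t = \log(\alpha_t/\sigma_t)$ and $e^{h} - 1$ with $h = \lambda_t - \lambda_s$ for [[DPM-Solver]]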
  • Batch multiple samples together

3. Parallelization:

  • Score evaluation is batch-parallelizable
  • ODE integration is sequential (cannot parallelize time steps)
  • Use GPU for score model, CPU for ODE solver if needed

11.3 Debugging Checklist

Sampling Quality Issues:

  • [ ] Score model trained correctly? Check loss curves
  • [ ] Time encoding correct? Verify $t \in [0, T]$ or $t \in [0, 1]$
  • [ ] ODE solver step size appropriate? Try reducing step size
  • [ ] Boundary conditions correct? Check $p_T(x) \approx \mathcal{N}(0, I)$
  • [ ] Score clipping needed? Monitor score magnitudes

Likelihood Computation Issues:

  • [ ] Trace computation stable? Use Hutchinson’s estimator for high dimensions
  • [ ] Integration method accurate? Use adaptive quadrature
  • [ ] Initial log-density correct? Verify $\log p_T(x_T) = -\frac{1}{2}\|x_T\|^2 - \frac{D}{2}\log(2\pi)$
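For the trace item in the checklist, Hutchinson's estimator replaces the exact trace with an average of $z^\top A z$ over random probe vectors $z$. The sketch below uses a small hypothetical matrix `A` in place of the Jacobian $\partial v_t / \partial x$; in a real model `matvec` would be a Jacobian-vector product computed by automatic differentiation rather than an explicit matrix.

```python
import random

def hutchinson_trace(matvec, dim, n_samples, rng):
    """Estimate tr(A) as the average of z^T A z over Rademacher vectors z."""
    total = 0.0
    for _ in range(n_samples):
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        Az = matvec(z)
        total += sum(zi * azi for zi, azi in zip(z, Az))
    return total / n_samples

# Hypothetical 3x3 Jacobian; its exact trace is 2.0 - 1.0 + 0.5 = 1.5
A = [[2.0, 0.5, -1.0],
     [0.3, -1.0, 0.2],
     [0.7, 0.1, 0.5]]

def matvec(z):
    return [sum(a * zj for a, zj in zip(row, z)) for row in A]

estimate = hutchinson_trace(matvec, 3, 20000, random.Random(0))
print(f"estimate {estimate:.3f} vs exact 1.5")
```

The estimator is unbiased but noisy; the number of probe vectors trades variance against the cost of extra Jacobian-vector products.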

12. Recent Advances (2023-2024)

12.1 [[Flow Matching]]

Key Insight: Instead of deriving ODE from [[Stochastic Differential Equation (SDE)|SDE]], directly learn a velocity field that matches a target flow.

Objective:

$$\mathcal{L}(\theta) = \mathbb{E}_{t,\,x_0,\,x_1}\left[\left\| v_\theta(x_t, t) - u_t(x_t) \right\|^2\right]$$

where $u_t(x)$ is the conditional velocity field from $x_0$ to $x_1$.
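A minimal sketch of the objective, assuming the straight-line conditional path $x_t = (1 - t)\,x_0 + t\,x_1$, for which the conditional velocity is the constant $x_1 - x_0$. That choice of path is an assumption here (the text above leaves $u_t$ generic), and `v_fn` stands in for the network $v_\theta$. Scalars are used to keep the example dependency-free.

```python
def interpolate(x0, x1, t):
    """Point on the straight path from x0 (t=0) to x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1

def target_velocity(x0, x1):
    """Conditional velocity of the straight path; constant in t."""
    return x1 - x0

def flow_matching_loss(v_fn, pairs, times):
    """Mean squared error between v_fn(x_t, t) and the conditional target."""
    total = 0.0
    for (x0, x1), t in zip(pairs, times):
        x_t = interpolate(x0, x1, t)
        total += (v_fn(x_t, t) - target_velocity(x0, x1)) ** 2
    return total / len(pairs)

# Every pair is displaced by +2, so a constant velocity of 2 fits exactly.
pairs = [(0.0, 2.0), (1.0, 3.0), (-1.0, 1.0)]
times = [0.1, 0.5, 0.9]
print(flow_matching_loss(lambda x, t: 2.0, pairs, times))
print(flow_matching_loss(lambda x, t: 0.0, pairs, times))
```

No score function or SDE appears anywhere in the loss, which is the sense in which the approach is simpler.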

Advantages:

  • Simpler than [[Stochastic Differential Equation (SDE)|SDE]]-based approach
  • More flexible flow design
  • Connections to optimal transport

12.2 Consistency Models

Goal: Learn a direct mapping from any $x_t$ to $x_0$ along the ODE trajectory.

$$f_\theta(x_t, t) \approx x_0 \qquad \forall\, t \in [0, T]$$
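The self-consistency property can be illustrated on a toy problem where the ODE has a closed-form solution, so the ideal consistency function is known exactly. The assumptions are 1D Gaussian data $\mathcal{N}(0, S_0^2)$, constant $\beta$, and the exact score; a real consistency model learns this map with a network.

```python
import math

BETA, S0 = 2.0, 2.0   # assumed toy values

def marginal_std(t):
    a2 = math.exp(-BETA * t)
    return math.sqrt(S0 ** 2 * a2 + (1.0 - a2))

def trajectory(x0, t):
    """Closed-form PF ODE solution through x0 at t=0 for this toy problem."""
    return x0 * marginal_std(t) / S0

def consistency_fn(x_t, t):
    """Exact consistency function: maps a point on a trajectory to its t=0 endpoint."""
    return x_t * S0 / marginal_std(t)

x0 = 1.5
for t in (0.0, 0.3, 1.0):
    print(t, consistency_fn(trajectory(x0, t), t))
```

Every point on the same trajectory maps to the same output, and at $t = 0$ the function is the identity, which is the boundary condition consistency models enforce.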

Benefits:

  • One-step generation possible
  • Few-step generation (2-8 steps) with high quality
  • Self-consistency loss (no need for score matching)

12.3 Rectified Flows

Idea: Iteratively straighten ODE trajectories.

Algorithm:

  1. Train initial model with standard Probability Flow ODE
  2. Generate samples, create new (noise, data) pairs
  3. Retrain model to learn straighter paths
  4. Repeat 2-3 times

Result: Nearly linear trajectories, enabling 1-2 step generation.

12.4 Comparison of Fast Sampling Methods

| Method | Steps | Training Cost | Quality | Flexibility |
| --- | --- | --- | --- | --- |
| [[DPM-Solver]] | 10-20 | None (solver only) | High | High |
| Consistency Models | 1-8 | High (distillation) | High | Medium |
| Rectified Flows | 1-10 | Medium (retraining) | High | High |
| [[Flow Matching]] | 10-50 | Similar to [[Stochastic Differential Equation (SDE)\|SDE]] | High | Very High |

13.1 Probability Flow ODE vs Reverse [[Stochastic Differential Equation (SDE)|SDE]]

| Aspect | Probability Flow ODE | Reverse [[Stochastic Differential Equation (SDE)\|SDE]] |
| --- | --- | --- |
| Form | Deterministic | Stochastic |
| Equation | $dx = v_t(x)\,dt$ | $dx = \left[v_t(x) - \frac{1}{2} g(t)^2\, \nabla_x \log p_t(x)\right] dt + g(t)\,d\bar{W}_t$ |
| Sampling | ODE solver | [[Stochastic Differential Equation (SDE)\|SDE]] solver |
| Variance | Zero (deterministic) | Non-zero (random) |
| Likelihood | Exact computation | Intractable |
| Quality | High (with sufficient steps) | High (with corrector) |

13.2 Probability Flow ODE vs Neural ODE

| Aspect | Probability Flow ODE | Neural ODE |
| --- | --- | --- |
| Origin | From diffusion models | From continuous-depth networks |
| Velocity field | $f(t)\,x - \frac{1}{2} g(t)^2\, \nabla_x \log p_t(x)$ | Learned $f_\theta(x, t)$ |
| Training | Score matching | End-to-end backprop (adjoint method) |
| Purpose | Generative modeling | Continuous dynamics |
| Likelihood | Exact via divergence | Exact via divergence when used as a CNF |

13.3 Generative Model Comparison

| Model | Sampling | Likelihood | Training Stability | Sample Quality |
| --- | --- | --- | --- | --- |
| GAN | Fast (1 step) | Intractable | Unstable | High |
| VAE | Fast (1 step) | Lower bound | Stable | Medium |
| Normalizing Flow | Fast (parallel) | Exact | Stable | Medium-High |
| Diffusion ([[Stochastic Differential Equation (SDE)\|SDE]]) | Slow (100+ steps) | Lower bound | Stable | Very High |
| Diffusion (ODE) | Medium (10-50 steps) | Exact | Stable | Very High |

  • [[Diffusion Model]]
  • [[Stochastic Differential Equation (SDE)]]
  • [[Score Function]]
  • [[Wiener Process|Wiener Process]]
  • [[Fokker-Planck Equation]]
  • [[Kolmogorov Equations]]
  • [[Langevin Dynamics]]
  • [[DPM-Solver]]
  • [[DDIM]]
  • [[Continuous Normalizing Flow]]
  • [[Neural ODE]]
  • [[Flow Matching]]
  • [[Consistency Models]]
  • [[Rectified Flows]]
  • [[Optimal Transport]]
  • [[Martingale]]


References

  • Paper: Score-Based Generative Modeling through SDEs (Song et al., 2021)
  • Paper: Maximum Likelihood Training of Score-Based Diffusion Models (Song et al., 2021)
  • Paper: [[DPM-Solver]]: A Fast ODE Solver for [[Diffusion Model|Diffusion Probabilistic Model]] Sampling (Lu et al., 2022)
  • Paper: [[Flow Matching]] for Generative Modeling (Lipman et al., 2023)
  • Paper: Consistency Models (Song et al., 2023)
  • Paper: Building Normalizing Flows with Stochastic Interpolants (Albergo et al., 2023)
  • Paper: Rectified Flow (Liu et al., 2022)
  • Blog: What are Diffusion Models? - Lilian Weng
  • Blog: [[Flow Matching]] - AI papers summary
  • Course: CS236 Deep Generative Models (Stanford)
  • Course: STAT 260 High-Dimensional Statistics (Berkeley)