2026-06-30

Probability Flow ODE

The Probability Flow ODE is a deterministic ordinary differential equation whose solutions have the same marginal probability densities $p_{t} (x)$ as the forward [[Stochastic Differential Equation (SDE)|SDE]] of a diffusion model. It gives an alternative view of the diffusion process: the stochastic term is removed, yet the evolution of the distribution is unchanged.

1. Core Concept

In diffusion models, the forward process is typically described by a [[Stochastic Differential Equation (SDE)|Stochastic Differential Equation]]:

d x = f (t) x d t + g (t) d W_{t}

The Probability Flow ODE is a deterministic counterpart that has the same marginal distribution $p_{t} (x)$ at every time $t$ :

d x = [f (t) x - \frac{1}{2} g (t)^{2} \nabla_{x} \log p_{t} (x)] d t

[!NOTE] Key Insight
While the [[Stochastic Differential Equation (SDE)|SDE]] introduces randomness through $W_{t}$ , the ODE achieves the same probability distribution evolution through a deterministic trajectory guided by the [[Score Function]] $\nabla_{x} \log p_{t} (x)$ .
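
The claim can be checked numerically on a toy problem. The sketch below makes assumptions that are not part of the text above: one-dimensional Gaussian data $N (0, s_{0}^{2})$ and a constant noise rate, so that $f (t) = - \frac{1}{2} β$, $g (t) = \sqrt{β}$ and the score of $p_{t}$ is known in closed form. All names and constants are illustrative. It simulates the SDE with Euler-Maruyama and the ODE with Euler, then compares the sample variances at time $T$ with the analytic value.

```python
import math
import random

BETA = 1.0     # constant noise rate (assumed for illustration)
S0_SQ = 0.25   # variance of the toy data distribution N(0, S0_SQ)
T = 1.0

def marginal_var(t):
    """Variance of p_t when x_0 ~ N(0, S0_SQ) and dx = -0.5*BETA*x dt + sqrt(BETA) dW."""
    decay = math.exp(-BETA * t)
    return S0_SQ * decay + (1.0 - decay)

def score(x, t):
    """Exact score of the Gaussian marginal p_t."""
    return -x / marginal_var(t)

def simulate(x0, n_steps, rng, stochastic):
    """Euler-Maruyama for the SDE, or plain Euler for the probability flow ODE."""
    dt = T / n_steps
    x = x0
    for i in range(n_steps):
        t = i * dt
        if stochastic:
            x += -0.5 * BETA * x * dt + math.sqrt(BETA * dt) * rng.gauss(0.0, 1.0)
        else:
            x += (-0.5 * BETA * x - 0.5 * BETA * score(x, t)) * dt
    return x

def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

rng = random.Random(0)
starts = [rng.gauss(0.0, math.sqrt(S0_SQ)) for _ in range(5000)]
var_sde = sample_variance([simulate(x, 100, rng, stochastic=True) for x in starts])
var_ode = sample_variance([simulate(x, 100, rng, stochastic=False) for x in starts])
print(var_sde, var_ode, marginal_var(T))
```

Individual trajectories differ (the ODE paths are smooth rescalings of the starting point, the SDE paths are noisy), but both sample variances should land close to the analytic `marginal_var(T)`, up to sampling and discretization error.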

2. Derivation from Fokker-Planck Equation

2.1 Fokker-Planck Equation (Forward [[Stochastic Differential Equation (SDE)|SDE]])

The evolution of probability density $p_{t} (x)$ under the forward [[Stochastic Differential Equation (SDE)|SDE]] follows the Fokker-Planck equation:

\frac{\partial p_{t} (x)}{\partial t} = - \nabla_{x} \cdot [f (t) x p_{t} (x)] + \frac{1}{2} g (t)^{2} \nabla_{x}^{2} p_{t} (x)

2.2 Probability Flow Velocity Field

We can rewrite the Fokker-Planck equation as a continuity equation:

\frac{\partial p_{t} (x)}{\partial t} = - \nabla_{x} \cdot [v_{t} (x) p_{t} (x)]

where the velocity field $v_{t} (x)$ is:

v_{t} (x) = f (t) x - \frac{1}{2} g (t)^{2} \nabla_{x} \log p_{t} (x)
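
The step from the Fokker-Planck equation to this velocity field relies on the identity $\nabla_{x} p_{t} = p_{t} \nabla_{x} \log p_{t}$. A short worked derivation, using only the symbols already defined:

```latex
\begin{aligned}
\frac{1}{2} g(t)^{2} \nabla_{x}^{2} p_{t}(x)
  &= \nabla_{x} \cdot \left[ \frac{1}{2} g(t)^{2} \nabla_{x} p_{t}(x) \right]
   = \nabla_{x} \cdot \left[ \frac{1}{2} g(t)^{2} \, p_{t}(x) \, \nabla_{x} \log p_{t}(x) \right], \\
\frac{\partial p_{t}(x)}{\partial t}
  &= - \nabla_{x} \cdot \left[ \left( f(t) x - \frac{1}{2} g(t)^{2} \nabla_{x} \log p_{t}(x) \right) p_{t}(x) \right].
\end{aligned}
```

The diffusion term is absorbed into the drift, and the bracketed factor multiplying $p_{t} (x)$ is exactly $v_{t} (x)$.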

2.3 ODE Formulation

The deterministic ODE that follows this velocity field is:

\frac{d x}{d t} = f (t) x - \frac{1}{2} g (t)^{2} \nabla_{x} \log p_{t} (x)

This ODE preserves the same marginal distributions $p_{t} (x)$ as the original [[Stochastic Differential Equation (SDE)|SDE]].

3. Comparison: [[Stochastic Differential Equation (SDE)|SDE]] vs ODE

| Aspect | Forward [[Stochastic Differential Equation (SDE)\|SDE]] | Probability Flow ODE |
| --- | --- | --- |
| Form | Stochastic | Deterministic |
| Noise term | $g (t) d W_{t}$ | None |
| Trajectory | Random paths | Deterministic paths |
| Marginal distribution | $p_{t} (x)$ | Same $p_{t} (x)$ |
| Sampling | Requires random noise | Deterministic integration |
| Reversibility | Reverse [[Stochastic Differential Equation (SDE)\|SDE]] needed | Simply integrate backward |

[!TIP] Practical Advantage
The ODE formulation allows using advanced ODE solvers (like Runge-Kutta methods) for more efficient and accurate sampling compared to Euler-Maruyama discretization of SDEs.

4. Reverse-Time Probability Flow ODE

4.1 Reverse ODE Formulation

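To generate samples, we integrate the same ODE backward from $t = T$ to $t = 0$ . The right-hand side does not change; only the direction of integration does. Equivalently, in terms of the reversed time variable $τ = T - t$ , which increases from $0$ to $T$ during sampling:

\frac{d x}{d τ} = - f (t) x + \frac{1}{2} g (t)^{2} \nabla_{x} \log p_{t} (x), \quad t = T - τ

where the [[Score Function]] $\nabla_{x} \log p_{t} (x)$ is approximated by a neural network $s_{θ} (x, t)$ .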

4.2 Variance-Preserving [[Stochastic Differential Equation (SDE)|SDE]] Case

For the variance-preserving [[Stochastic Differential Equation (SDE)|SDE]] where $f (t) = - \frac{1}{2} β (t)$ and $g (t) = \sqrt{β (t)}$ :

\frac{d x}{d t} = - \frac{1}{2} β (t) x - \frac{1}{2} β (t) \nabla_{x} \log p_{t} (x)
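
For a concrete VP schedule, a common choice is a linear $β (t) = β_{min} + t (β_{max} - β_{min})$ on $t \in [0, 1]$. This is an assumption for illustration, as are the default values and function names below. With the perturbation kernel $x_{t} = α_{t} x_{0} + σ_{t} ϵ$, the coefficients follow in closed form from $\log α_{t} = - \frac{1}{2} \int_{0}^{t} β (s) d s$ and $σ_{t}^{2} = 1 - α_{t}^{2}$.

```python
import math

def beta(t, beta_min=0.1, beta_max=20.0):
    """Linear noise schedule on t in [0, 1] (illustrative defaults)."""
    return beta_min + t * (beta_max - beta_min)

def log_alpha(t, beta_min=0.1, beta_max=20.0):
    """log alpha_t = -0.5 * integral_0^t beta(s) ds, evaluated analytically."""
    return -0.5 * (beta_min * t + 0.5 * (beta_max - beta_min) * t * t)

def alpha_sigma(t, beta_min=0.1, beta_max=20.0):
    """Return (alpha_t, sigma_t) with sigma_t^2 = 1 - alpha_t^2."""
    la = log_alpha(t, beta_min, beta_max)
    # expm1 keeps precision when alpha_t is close to 1 (small t)
    return math.exp(la), math.sqrt(-math.expm1(2.0 * la))

def vp_pf_ode_velocity(x, t, score_value, beta_min=0.1, beta_max=20.0):
    """Right-hand side of the VP probability flow ODE for a scalar state."""
    b = beta(t, beta_min, beta_max)
    return -0.5 * b * x - 0.5 * b * score_value

alpha_half, sigma_half = alpha_sigma(0.5)
print(alpha_half, sigma_half, alpha_half ** 2 + sigma_half ** 2)
```

A useful sanity check: if $p_{t}$ is already the standard normal, its score at $x$ is $- x$, and the two terms of the velocity cancel, so the prior is stationary under the ODE.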

4.3 Variance-Exploding [[Stochastic Differential Equation (SDE)|SDE]] Case

For the variance-exploding [[Stochastic Differential Equation (SDE)|SDE]] where $f (t) = 0$ and $g (t) = \sqrt{\frac{d σ^{2} (t)}{d t}}$ :

\frac{d x}{d t} = - \frac{1}{2} \frac{d σ^{2} (t)}{d t} \nabla_{x} \log p_{t} (x)
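
When $σ (t)$ is strictly increasing, the VE case can also be parameterized by the noise level itself. Writing $p (x; σ)$ for the marginal at noise level $σ$ (notation introduced here only for illustration) and using $\frac{d σ^{2}}{d t} = 2 σ \frac{d σ}{d t}$, the chain rule gives:

```latex
\frac{d x}{d t} = - \sigma(t) \, \frac{d \sigma(t)}{d t} \, \nabla_{x} \log p_{t}(x)
\quad \Longrightarrow \quad
\frac{d x}{d \sigma} = - \sigma \, \nabla_{x} \log p(x; \sigma)
```

In this form the ODE no longer depends on how $σ$ is scheduled in $t$, only on the noise level.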

5. Relationship to Score-Based Models

5.1 [[Score Function]] Connection

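The Probability Flow ODE makes the role of the [[Score Function|score function]] $\nabla_{x} \log p_{t} (x)$ explicit:

The [[Score Function]] points in the direction in which the log-density increases fastest
In forward time, the term $- \frac{1}{2} g (t)^{2} \nabla_{x} \log p_{t} (x)$ moves samples against this direction (toward noise); integrating backward in time reverses the motion, so sampling follows the score, scaled by $\frac{1}{2} g (t)^{2}$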

5.2 Connection to Langevin Dynamics

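An Euler step of the reverse-time ODE resembles a [[Langevin Dynamics|Langevin Dynamics]] update without its noise term. For the case $f (t) = 0$ :

x_{t - Δ t} = x_{t} + \frac{1}{2} g (t)^{2} \nabla_{x} \log p_{t} (x_{t}) Δ t

A Langevin step with step size $\frac{1}{2} g (t)^{2} Δ t$ would add the noise term $g (t) \sqrt{Δ t} z$ with $z \sim N (0, I)$ . The ODE omits this term, making the process deterministic.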

6. Sampling Algorithms

6.1 Basic ODE Solver

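# Reverse-time Probability Flow ODE sampling (Euler method)
import torch

def sample_pf_ode(score_model, f, g, shape, T=1.0, N=100):
    # f(t), g(t): drift and diffusion coefficients of the forward SDE
    # score_model(x, t): approximation of the score function
    x_t = torch.randn(shape)  # Sample from the prior, usually N(0, I)
    timesteps = torch.linspace(T, 0.0, N)

    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        # Estimate score function
        score = score_model(x_t, t)

        # Compute velocity field
        dt = t_next - t  # Negative: we integrate backward in time
        velocity = f(t) * x_t - 0.5 * g(t) ** 2 * score

        # Euler integration (can use higher-order methods)
        x_t = x_t + velocity * dt

    return x_t  # Approximate sample x_0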

6.2 Advanced ODE Solvers

| Solver | Order | Characteristics |
| --- | --- | --- |
| Euler | 1st | Simple, but requires small steps |
| RK2 (Midpoint) | 2nd | Better accuracy, moderate cost |
| RK4 | 4th | High accuracy, commonly used |
| DOPRI5 | Adaptive | Automatic step size control |
| [[DPM-Solver]] | Specialized | Designed for diffusion models |
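
To illustrate the effect of solver order, the following sketch compares Euler with Heun's method (a second-order Runge-Kutta scheme) at the same number of steps. It assumes a toy setting not taken from the text: one-dimensional Gaussian data and a constant noise rate, for which the score and the exact ODE solution $x (t) = x (0) \sqrt{var_{t} / var_{0}}$ are known. Names and constants are illustrative.

```python
import math

BETA, S0_SQ, T = 1.0, 0.25, 1.0  # toy constants (assumed for illustration)

def marginal_var(t):
    decay = math.exp(-BETA * t)
    return S0_SQ * decay + (1.0 - decay)

def velocity(x, t):
    """PF ODE velocity f(t) x - 0.5 g(t)^2 score, using the exact Gaussian score."""
    score = -x / marginal_var(t)
    return -0.5 * BETA * x - 0.5 * BETA * score

def euler_step(x, t, dt):
    return x + velocity(x, t) * dt

def heun_step(x, t, dt):
    # Predictor-corrector: average the slopes at both ends of the step
    k1 = velocity(x, t)
    k2 = velocity(x + k1 * dt, t + dt)
    return x + 0.5 * (k1 + k2) * dt

def integrate_backward(x_T, n_steps, step):
    dt = -T / n_steps  # negative step: from t = T down to t = 0
    x, t = x_T, T
    for _ in range(n_steps):
        x = step(x, t, dt)
        t += dt
    return x

x_T = 1.0
exact_x0 = x_T * math.sqrt(marginal_var(0.0) / marginal_var(T))
err_euler = abs(integrate_backward(x_T, 10, euler_step) - exact_x0)
err_heun = abs(integrate_backward(x_T, 10, heun_step) - exact_x0)
print(err_euler, err_heun)
```

Heun's method costs two velocity evaluations per step instead of one, so a fair comparison in practice is at equal numbers of score-network evaluations rather than equal numbers of steps.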

[!NOTE] [[DPM-Solver]]
[[DPM-Solver]] exploits the semi-linear structure of the Probability Flow ODE to achieve high-order convergence with fewer function evaluations, significantly accelerating sampling.

6.3 [[DPM-Solver]] Details

Semi-linear Structure:

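The Probability Flow ODE can be rewritten in terms of a noise-prediction network:

\frac{d x}{d t} = f (t) x + \frac{g (t)^{2}}{2 σ_{t}} ϵ_{θ} (x, t)

where $ϵ_{θ} (x, t) \approx - σ_{t} \nabla_{x} \log p_{t} (x)$ .

Key Insight: The linear part $f (t) x$ can be solved analytically, while only the nonlinear part (the network term) requires numerical approximation.

First-order update ([[DPM-Solver]]-1), stepping from $t$ to $t_{s}$ :

x_{t_{s}} = \frac{α_{t_{s}}}{α_{t}} x_{t} - σ_{t_{s}} (e^{h} - 1) ϵ_{θ} (x_{t}, t), \quad h = λ_{t_{s}} - λ_{t}, \quad λ_{t} = \log (α_{t} / σ_{t})

[[DPM-Solver]]-2 and [[DPM-Solver]]-3 refine this step with additional network evaluations at intermediate points.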

Advantages:

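About 10-20 network evaluations for high-quality samples (vs. roughly 1000 steps for [[Diffusion Model|DDPM]] ancestral sampling)
The linear term is integrated exactly, so discretization error comes only from the network term
Training-free: applies to pretrained [[Stochastic Differential Equation (SDE)|SDE]]/ODE-based diffusion models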

[!TIP] Practical Usage
For most applications, [[DPM-Solver]]-2 or [[DPM-Solver]]-3 with 10-20 steps provides excellent quality-speed tradeoff.
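
The benefit of solving the linear part analytically can be seen in a minimal sketch of the first-order exponential-integrator update (DPM-Solver-1), $x_{t} = \frac{α_{t}}{α_{s}} x_{s} - σ_{t} (e^{h} - 1) ϵ_{θ} (x_{s}, s)$ with $h = λ_{t} - λ_{s}$ and $λ = \log (α / σ)$. The setting is an assumption chosen so that the answer is known: a VP model with a linear $β (t)$ and a data distribution concentrated at a single point `MU`, for which the exact noise prediction is constant along each ODE trajectory. `eps_model` stands in for a trained network, and all names are illustrative.

```python
import math

BETA_MIN, BETA_MAX = 0.1, 20.0  # assumed linear schedule beta(t) on t in [0, 1]
MU = 2.0                        # toy data distribution: a point mass at MU

def beta(t):
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)

def alpha_sigma(t):
    """Coefficients of x_t = alpha_t * x_0 + sigma_t * eps for the VP case."""
    log_alpha = -0.5 * (BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t * t)
    return math.exp(log_alpha), math.sqrt(-math.expm1(2.0 * log_alpha))

def log_snr(t):
    alpha, sigma = alpha_sigma(t)
    return math.log(alpha / sigma)

def eps_model(x, t):
    """Exact noise prediction for point-mass data (stand-in for a trained network)."""
    alpha, sigma = alpha_sigma(t)
    return (x - alpha * MU) / sigma

def dpm_solver_1_step(x, s, t):
    """First-order exponential-integrator update from time s to time t."""
    alpha_s, _ = alpha_sigma(s)
    alpha_t, sigma_t = alpha_sigma(t)
    h = log_snr(t) - log_snr(s)
    return (alpha_t / alpha_s) * x - sigma_t * math.expm1(h) * eps_model(x, s)

def euler_step(x, s, t):
    """Plain Euler step on dx/dt = f(t) x + g(t)^2 / (2 sigma_t) * eps."""
    _, sigma_s = alpha_sigma(s)
    velocity = -0.5 * beta(s) * x + beta(s) / (2.0 * sigma_s) * eps_model(x, s)
    return x + (t - s) * velocity

eps, s, t = 0.7, 1.0, 0.01
alpha_s, sigma_s = alpha_sigma(s)
alpha_t, sigma_t = alpha_sigma(t)
x_s = alpha_s * MU + sigma_s * eps      # point on the trajectory at time s
exact = alpha_t * MU + sigma_t * eps    # same trajectory at time t
dpm_error = abs(dpm_solver_1_step(x_s, s, t) - exact)
euler_error = abs(euler_step(x_s, s, t) - exact)
print(dpm_error, euler_error)
```

In this special case a single exponential-integrator step is exact up to rounding, while a single Euler step over the same interval is far off. With a real network the noise prediction varies along the trajectory, which is what the higher-order variants account for.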

7. Theoretical Properties

7.1 Marginal Distribution Preservation

Theorem: The Probability Flow ODE and the forward [[Stochastic Differential Equation (SDE)|SDE]] have identical marginal distributions $p_{t} (x)$ for all $t \in [0, T]$ .

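Proof Sketch: The density transported by the ODE satisfies the continuity equation with velocity field $v_{t} (x)$ , which is the Fokker-Planck equation of the [[Stochastic Differential Equation (SDE)|SDE]] in rewritten form. Starting from the same initial condition $p_{0}$ , both therefore have the same solution.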

7.2 Deterministic Mapping

Provided the velocity field is sufficiently regular (for example, Lipschitz in $x$ ), the ODE defines a deterministic bijection between $x_{0}$ (data) and $x_{T}$ (noise):

x_{T} = Φ_{T} (x_{0}), x_{0} = Φ_{T}^{- 1} (x_{T})

This property enables:

Exact likelihood computation via change of variables
Latent space interpolation with meaningful trajectories
Inversion of real data to latent space
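
A minimal round-trip sketch of this mapping: encode a data point to its latent by integrating forward, then decode by integrating the same ODE backward. It assumes the toy setting of one-dimensional Gaussian data with a constant noise rate, so that the velocity is available in closed form, and uses a classical fourth-order Runge-Kutta integrator. All names and constants are illustrative.

```python
import math

BETA, S0_SQ, T = 1.0, 0.25, 1.0  # toy constants (assumed for illustration)

def marginal_var(t):
    decay = math.exp(-BETA * t)
    return S0_SQ * decay + (1.0 - decay)

def velocity(x, t):
    """PF ODE velocity with the exact score -x / var_t of the Gaussian marginal."""
    return -0.5 * BETA * x + 0.5 * BETA * x / marginal_var(t)

def rk4_integrate(x, t_start, t_end, n_steps=200):
    """Classical RK4 from t_start to t_end; works in either time direction."""
    h = (t_end - t_start) / n_steps
    t = t_start
    for _ in range(n_steps):
        k1 = velocity(x, t)
        k2 = velocity(x + 0.5 * h * k1, t + 0.5 * h)
        k3 = velocity(x + 0.5 * h * k2, t + 0.5 * h)
        k4 = velocity(x + h * k3, t + h)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += h
    return x

x0 = 0.8
latent = rk4_integrate(x0, 0.0, T)             # encode: data -> noise
reconstruction = rk4_integrate(latent, T, 0.0)  # decode: noise -> data
print(latent, reconstruction)
```

With a learned score the round trip is only as accurate as the solver tolerance allows, which is why inversion quality in practice depends on the step count.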

7.3 Trajectory Regularity

ODE trajectories are smoother than [[Stochastic Differential Equation (SDE)|SDE]] paths:

[[Stochastic Differential Equation (SDE)|SDE]]: Rough, non-differentiable paths (due to $W_{t}$ )
ODE: Smooth, differentiable paths (enabling gradient-based optimization)

8. Applications

8.1 Likelihood Computation

Using the instantaneous change of variables formula:

\frac{d}{d t} \log p_{t} (x (t)) = - \operatorname{tr} \left( \frac{\partial v_{t} (x)}{\partial x} \right)

Integrating from $t = 0$ to $t = T$ :

\log p_{0} (x_{0}) = \log p_{T} (x_{T}) + \int_{0}^{T} \operatorname{tr} \left( \frac{\partial v_{t} (x (t))}{\partial x} \right) d t
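
The formula can be verified on a case where the answer is known. The sketch below assumes one-dimensional Gaussian data $N (0, s_{0}^{2})$ and a constant noise rate, so the Jacobian trace reduces to a scalar derivative and $p_{T}$ is an exact Gaussian. With a trained model one would instead use the prior as $p_{T}$ and a learned score. Names and constants are illustrative.

```python
import math

BETA, S0_SQ, T = 1.0, 0.25, 1.0  # toy constants (assumed for illustration)

def marginal_var(t):
    decay = math.exp(-BETA * t)
    return S0_SQ * decay + (1.0 - decay)

def velocity(x, t):
    return -0.5 * BETA * x + 0.5 * BETA * x / marginal_var(t)

def divergence(x, t):
    """d v_t / d x; in one dimension the trace of the Jacobian is this derivative."""
    return -0.5 * BETA + 0.5 * BETA / marginal_var(t)

def gaussian_logpdf(x, var):
    return -0.5 * x * x / var - 0.5 * math.log(2.0 * math.pi * var)

def log_likelihood(x0, n_steps=2000):
    """Integrate state and divergence jointly from t = 0 to t = T with Euler steps."""
    dt = T / n_steps
    x, t, div_integral = x0, 0.0, 0.0
    for _ in range(n_steps):
        div_integral += divergence(x, t) * dt
        x += velocity(x, t) * dt
        t += dt
    # log p_0(x_0) = log p_T(x_T) + integral of the divergence along the path
    return gaussian_logpdf(x, marginal_var(T)) + div_integral

estimate = log_likelihood(0.3)
reference = gaussian_logpdf(0.3, S0_SQ)
print(estimate, reference)
```

The two printed values should agree up to the Euler discretization error, which shrinks as `n_steps` grows.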

8.2 Latent Space Manipulation

Since the ODE is deterministic:

Interpolation: Linearly interpolate in latent space, then decode
Editing: Modify latent code and integrate back
Attribute manipulation: Navigate in semantic directions

8.3 Accelerated Sampling

Few-step generation: Use higher-order or specialized ODE solvers that tolerate large steps
Distillation: Train student model to match ODE trajectories
Consistency models: Learn direct mapping along ODE paths

9. Core Formula Cards

[!QUOTE] Forward [[Stochastic Differential Equation (SDE)|SDE]]
$d x = f (t) x d t + g (t) d W_{t}$

[!QUOTE] Probability Flow ODE
$d x = [f (t) x - \frac{1}{2} g (t)^{2} \nabla_{x} \log p_{t} (x)] d t$

[!QUOTE] Reverse-Time ODE
$\frac{d x}{d τ} = - f (t) x + \frac{1}{2} g (t)^{2} \nabla_{x} \log p_{t} (x), \quad τ = T - t$

[!QUOTE] Velocity Field
$v_{t} (x) = f (t) x - \frac{1}{2} g (t)^{2} \nabla_{x} \log p_{t} (x)$

[!QUOTE] Likelihood Computation
$\log p_{0} (x_{0}) = \log p_{T} (x_{T}) + \int_{0}^{T} \operatorname{tr} \left( \frac{\partial v_{t} (x (t))}{\partial x} \right) d t$

[!QUOTE] Variance-Preserving [[Stochastic Differential Equation (SDE)|SDE]] (VP-[[Stochastic Differential Equation (SDE)|SDE]])
$f (t) = - \frac{1}{2} β (t), g (t) = \sqrt{β (t)}$

[!QUOTE] Variance-Exploding [[Stochastic Differential Equation (SDE)|SDE]] (VE-[[Stochastic Differential Equation (SDE)|SDE]])
$f (t) = 0, g (t) = \sqrt{\frac{d σ^{2} (t)}{d t}}$

10. Theoretical Analysis

10.1 Connection to Continuous Normalizing Flows

The Probability Flow ODE defines a continuous normalizing flow (CNF):

Forward map: $x_{0} \to x_{T}$ (data to noise)
Inverse map: $x_{T} \to x_{0}$ (noise to data)
Log-determinant: Tractable via instantaneous change of variables

Change of Variables Formula:

\log p_{0} (x_{0}) = \log p_{T} (x_{T}) + \int_{0}^{T} \nabla_{x} \cdot v_{t} (x (t)) d t

where $\nabla_{x} \cdot v_{t} (x)$ is the divergence of the velocity field.

10.2 [[Score Function]] Approximation

The true [[Score Function]] $\nabla_{x} \log p_{t} (x)$ is intractable. We approximate it using:

1. Denoising Score Matching:

L (θ) = E_{t, x_{0}, x_{t}} [∥ s_{θ} (x_{t}, t) - \nabla_{x_{t}} \log p (x_{t} ∣ x_{0}) ∥^{2}]

2. Explicit Score for Gaussian Perturbation:

For $x_{t} = α_{t} x_{0} + σ_{t} ϵ$ where $ϵ \sim N (0, I)$ :

\nabla_{x_{t}} \log p (x_{t} ∣ x_{0}) = - \frac{ϵ}{σ_{t}}

3. Network Parameterization:

Score form: $s_{θ} (x_{t}, t)$ directly predicts score
Noise form: $ϵ_{θ} (x_{t}, t)$ predicts noise (equivalent via $s_{θ} (x_{t}, t) = - ϵ_{θ} (x_{t}, t) / σ_{t}$ )
Velocity form: $v_{θ} (x_{t}, t)$ predicts velocity field
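
The relation between the noise and score forms follows from the Gaussian perturbation kernel and can be checked with a finite difference. The sketch below is a scalar example with arbitrary illustrative values; the function names are not from any library.

```python
import math

def gaussian_log_density(x_t, x0, alpha_t, sigma_t):
    """log p(x_t | x_0) for x_t = alpha_t * x_0 + sigma_t * eps (scalar case)."""
    diff = x_t - alpha_t * x0
    return -0.5 * diff * diff / sigma_t ** 2 - 0.5 * math.log(2.0 * math.pi * sigma_t ** 2)

def score_from_noise(eps, sigma_t):
    """Conditional score in terms of the noise that produced x_t."""
    return -eps / sigma_t

def noise_from_score(score, sigma_t):
    """Inverse conversion, e.g. to feed a score model into a noise-based solver."""
    return -sigma_t * score

x0, alpha_t, sigma_t, eps = 1.5, 0.8, 0.6, -0.4
x_t = alpha_t * x0 + sigma_t * eps

# Central finite difference of the log-density with respect to x_t
h = 1e-5
numeric_score = (
    gaussian_log_density(x_t + h, x0, alpha_t, sigma_t)
    - gaussian_log_density(x_t - h, x0, alpha_t, sigma_t)
) / (2.0 * h)
analytic_score = score_from_noise(eps, sigma_t)
print(numeric_score, analytic_score)
```

Because the conversion is a rescaling by $σ_{t}$, a model trained in one form can be used with solvers written for the other.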

10.3 Optimal Transport Perspective

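The Probability Flow ODE transports the data distribution to the noise distribution and can be related to optimal transport (OT), although its map is in general not the optimal transport map:

Wasserstein-2 distance: Measures the minimal quadratic transport cost between two distributions
Benamou-Brenier formula: Expresses that cost as the minimal kinetic energy of a velocity field satisfying the continuity equation, connecting OT with fluid dynamics
Straight flows: Recent work aims to make ODE trajectories more linear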

10.4 Rectified Flows

Key Idea: Learn a straighter ODE trajectory for faster sampling.

Standard Probability Flow ODE:

Curved trajectories
Requires many ODE solver steps

Rectified Flows:

Iteratively “straighten” the ODE path
Can achieve high quality with just 1-2 Euler steps
Formula: $\frac{d x}{d t} = v_{θ} (x, t)$ where $v_{θ}$ learns straight paths

11. Practical Implementation Tips

11.1 Numerical Stability

Common Issues:

Stiff ODEs: When $β (t)$ varies rapidly
- Solution: Use adaptive step size solvers
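Score explosion: Near $t = 0$ , where $σ_{t} \to 0$ and the score scales like $1 / σ_{t}$
- Solution: Stop integration at a small $t_{min} > 0$ , clip score values, use logarithmic time grids
Underflow in exponential terms: $α_{t} = e^{- \frac{1}{2} \int_{0}^{t} β (s) d s}$ becomes very small for large $t$
- Solution: Work in log-space, use numerically stable formulas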

Best Practices:

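import math
import torch

# Numerically stable score computation
def stable_score(x_t, predicted_noise, sigma_t):
    # score = -eps / sigma; x_t is unused and kept only for a uniform call signature
    # Avoid division by very small sigma
    sigma_t = torch.clamp(torch.as_tensor(sigma_t), min=1e-6)
    return -predicted_noise / sigma_t

# Non-uniform time grid (fixed in advance, not adapted during integration)
def get_time_schedule(N, schedule='loglinear', t_min=1e-5, t_max=1.0):
    if schedule == 'loglinear':
        # Geometric spacing from t_max down to t_min:
        # more steps near t=0 where dynamics are faster
        return torch.logspace(math.log10(t_max), math.log10(t_min), N)
    if schedule == 'uniform':
        return torch.linspace(t_max, t_min, N)
    raise ValueError(f"Unknown schedule: {schedule}")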

11.2 Efficient Sampling Strategies

1. Time Discretization:

| Strategy | Steps | Quality | Speed |
| --- | --- | --- | --- |
| Uniform | 1000 | High | Slow |
| Log-spaced | 50-100 | High | Medium |
| [[DPM-Solver]] | 10-20 | High | Fast |
| Consistency | 1-5 | Medium | Very Fast |

2. Caching Strategies:

Cache $α_{t}$ , $σ_{t}$ values
Precompute the log-SNR values $λ_{t} = \log (α_{t} / σ_{t})$ on the time grid, as used by exponential-integrator solvers such as [[DPM-Solver]]
Batch multiple samples together

3. Parallelization:

Score evaluation is batch-parallelizable
ODE integration is sequential (cannot parallelize time steps)
Use GPU for score model, CPU for ODE solver if needed

11.3 Debugging Checklist

Sampling Quality Issues:

[ ] Score model trained correctly? Check loss curves
[ ] Time encoding correct? Verify $t \in [0, T]$ or $t \in [0, 1]$
[ ] ODE solver step size appropriate? Try reducing step size
[ ] Boundary conditions correct? Check $p_{T} (x) \approx N (0, I)$
[ ] Score clipping needed? Monitor score magnitudes

Likelihood Computation Issues:

[ ] Trace computation stable? Use Hutchinson’s estimator for high dimensions
[ ] Integration method accurate? Use adaptive quadrature
[ ] Initial log-density correct? Verify $\log p_{T} (x_{T}) = - \frac{1}{2} ∥ x_{T} ∥^{2} - \frac{D}{2} \log (2 π)$
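
For the trace item in the checklist, Hutchinson's estimator replaces the exact trace with an average of $z^{T} A z$ over random probe vectors $z$ with zero mean and identity covariance. The sketch below uses a small explicit matrix and Rademacher probes to show the idea. In a diffusion setting `matvec` would be a Jacobian-vector product of the velocity field computed by automatic differentiation, which is an assumption about usage and not shown here.

```python
import random

def hutchinson_trace(matvec, dim, n_probes, rng):
    """Estimate tr(A) as the average of z^T (A z) over Rademacher probes z."""
    total = 0.0
    for _ in range(n_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        az = matvec(z)
        total += sum(zi * ai for zi, ai in zip(z, az))
    return total / n_probes

# Small illustrative matrix standing in for a Jacobian
A = [
    [2.0, 0.5, 0.0],
    [0.5, -1.0, 0.3],
    [0.0, 0.3, 0.5],
]

def matvec(v):
    """Matrix-vector product; only this operation is needed, never A itself."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

estimate = hutchinson_trace(matvec, dim=3, n_probes=20000, rng=random.Random(0))
exact = sum(A[i][i] for i in range(3))
print(estimate, exact)
```

The estimator is unbiased, and its variance is driven by the off-diagonal entries, so the number of probes trades cost against noise in the log-likelihood estimate.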

12. Recent Advances (2023-2024)

12.1 [[Flow Matching]]

Key Insight: Instead of deriving the ODE from an [[Stochastic Differential Equation (SDE)|SDE]], directly learn a velocity field that matches a target flow.

Objective:

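L (θ) = E_{t, x_{0}, x_{1}} [∥ v_{θ} (x_{t}, t) - u_{t} (x_{t} ∣ x_{0}, x_{1}) ∥^{2}]

where $u_{t} (x_{t} ∣ x_{0}, x_{1})$ is the conditional velocity field of a chosen path from $x_{0}$ to $x_{1}$ ; for the linear path $x_{t} = (1 - t) x_{0} + t x_{1}$ it is simply $x_{1} - x_{0}$ .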
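
A minimal sketch of how one training target is formed, assuming the linear interpolation path $x_{t} = (1 - t) x_{0} + t x_{1}$ with conditional velocity $x_{1} - x_{0}$. Note that flow matching papers commonly take $x_{0}$ to be noise and $x_{1}$ to be data, the opposite of the indexing used earlier in this note. The samples below are synthetic stand-ins and the function names are illustrative.

```python
import random

def interpolate(x0, x1, t):
    """Point on the straight path from x0 to x1 at time t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Conditional velocity of the linear path; constant in t."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(v_pred, x0, x1):
    """Squared error between a predicted velocity and the conditional target."""
    u = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(v_pred, u))

rng = random.Random(0)
x0 = [rng.gauss(0.0, 1.0) for _ in range(4)]   # noise sample
x1 = [rng.gauss(3.0, 0.5) for _ in range(4)]   # stand-in for a data sample
t = rng.random()
x_t = interpolate(x0, x1, t)

perfect_prediction = target_velocity(x0, x1)
loss = flow_matching_loss(perfect_prediction, x0, x1)

# With the exact velocity, one Euler step from time t to time 1 lands on x1
x_end = [a + (1.0 - t) * v for a, v in zip(x_t, perfect_prediction)]
print(loss, x_end, x1)
```

In training, `v_pred` would come from a network evaluated at `(x_t, t)` and the loss would be averaged over many sampled triples; the straightness of the conditional paths is what makes few-step Euler integration attractive.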