2026-06-30

DDIM (Denoising Diffusion Implicit Models)

DDIM is a faster deterministic sampling method for [[Diffusion Model|diffusion models]] that generalizes [[Diffusion Model|DDPM]] by relaxing the Markovian assumption. It enables 10-50× fewer sampling steps while maintaining comparable quality, and introduces the crucial concept of deterministic inversion between noise and data.

1. Core Concept

1.1 Motivation

[[Diffusion Model|DDPM]] problem: Requires 1000 steps for high-quality sampling.

Root cause: The reverse process must closely follow the forward process’s path, which was defined as a [[Markov Process|Markov chain]] with small Gaussian steps.

DDIM insight: The [[Diffusion Model|DDPM]] objective only depends on marginals $q (x_{t} ∣ x_{0})$ , not on the specific joint distribution $q (x_{1 : T} ∣ x_{0})$ . This means we can define a different forward process that shares the same marginals but allows faster reverse sampling.

1.2 Key Innovation

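DDIM replaces the Markovian forward process with a non-Markovian one:

[[Diffusion Model|DDPM]] forward: $q (x_{t} ∣ x_{t - 1})$ (Markovian)
DDIM: defined through the inference distribution $q_{σ} (x_{t - 1} ∣ x_{t}, x_{0})$; the implied forward process $q_{σ} (x_{t} ∣ x_{t - 1}, x_{0})$ conditions on $x_{0}$ and is therefore non-Markovian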

Both share the same marginal distribution:

$$
q (x_{t} ∣ x_{0}) = N (x_{t}; \sqrt{{\bar{α}}_{t}} x_{0}, (1 - {\bar{α}}_{t}) I)
$$

[!NOTE] Core Insight
DDIM proves that the [[Diffusion Model|DDPM]] training objective is valid for a family of inference distributions, not just the Markovian one. By choosing a non-Markovian inference distribution, we can produce higher quality samples with fewer steps.

2. Mathematical Foundation

2.1 [[Diffusion Model|DDPM]] Review

[[Diffusion Model|DDPM]] forward process (Markov):

$$
q (x_{1 : T} ∣ x_{0}) = \prod_{t = 1}^{T} q (x_{t} ∣ x_{t - 1})
$$

where $q (x_{t} ∣ x_{t - 1}) = N (x_{t}; \sqrt{1 - β_{t}} x_{t - 1}, β_{t} I)$ .

Marginal $q (x_{t} ∣ x_{0})$ :

$$
x_{t} = \sqrt{{\bar{α}}_{t}} x_{0} + \sqrt{1 - {\bar{α}}_{t}} ϵ, \quad ϵ \sim N (0, I)
$$

where ${\bar{α}}_{t} = \prod_{s = 1}^{t} α_{s}$ and $α_{t} = 1 - β_{t}$ .
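Here is a minimal sketch of the closed-form marginal in plain Python, with scalars standing in for image tensors. It assumes the linear $β$ schedule of Ho et al. (2020), running from $β_{1} = 10^{-4}$ to $β_{T} = 0.02$ with $T = 1000$, and the convention ${\bar{α}}_{0} = 1$. `make_alpha_bar` and `q_sample` are illustrative names, not a library API.

```python
import math
import random

def make_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Return [alpha_bar_0, ..., alpha_bar_T] with alpha_bar_0 = 1 (no noise at t = 0)."""
    alpha_bar = [1.0]
    for t in range(1, T + 1):
        beta_t = beta_start + (beta_end - beta_start) * (t - 1) / (T - 1)  # linear schedule
        alpha_bar.append(alpha_bar[-1] * (1.0 - beta_t))                     # alpha_t = 1 - beta_t
    return alpha_bar

def q_sample(x0, t, alpha_bar, eps=None):
    """Draw x_t ~ q(x_t | x_0) in one jump instead of simulating t Markov steps."""
    if eps is None:
        eps = random.gauss(0.0, 1.0)
    return math.sqrt(alpha_bar[t]) * x0 + math.sqrt(1.0 - alpha_bar[t]) * eps

alpha_bar = make_alpha_bar()
x_500 = q_sample(1.0, 500, alpha_bar)  # a noisy version of x_0 = 1.0 at t = 500
```

With this schedule ${\bar{α}}_{T} \approx 4 \times 10^{-5}$, so $x_{T}$ is almost pure noise.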

2.2 DDIM Inference Distribution

DDIM defines a non-Markovian forward process:

$$
q_{σ} (x_{1 : T} ∣ x_{0}) = q_{σ} (x_{T} ∣ x_{0}) \prod_{t = 2}^{T} q_{σ} (x_{t - 1} ∣ x_{t}, x_{0})
$$

where:

$$
q_{σ} (x_{T} ∣ x_{0}) = N (x_{T}; \sqrt{{\bar{α}}_{T}} x_{0}, (1 - {\bar{α}}_{T}) I)
$$

and for $t > 1$ :

$$
q_{σ} (x_{t - 1} ∣ x_{t}, x_{0}) = N (x_{t - 1}; \sqrt{{\bar{α}}_{t - 1}} x_{0} + \sqrt{1 - {\bar{α}}_{t - 1} - σ_{t}^{2}} \cdot \frac{x_{t} - \sqrt{{\bar{α}}_{t}} x_{0}}{\sqrt{1 - {\bar{α}}_{t}}}, σ_{t}^{2} I)
$$

Parameter $σ_{t}$ controls stochasticity:

$$
σ_{t} = η \sqrt{\frac{1 - {\bar{α}}_{t - 1}}{1 - {\bar{α}}_{t}}} \sqrt{1 - \frac{{\bar{α}}_{t}}{{\bar{α}}_{t - 1}}}
$$

$η = 1$ : $σ_{t}^{2}$ equals the [[Diffusion Model|DDPM]] posterior variance, recovering DDPM (fully stochastic)
$η = 0$ : DDIM (fully deterministic)
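This small numerical check of the two limits uses scalar ${\bar{α}}$ values; `ddim_sigma` and `ddpm_posterior_var` are hypothetical helpers. For adjacent timesteps with $η = 1$, $σ_{t}^{2}$ should match the DDPM posterior variance $\tilde{β}_{t} = \frac{1 - {\bar{α}}_{t - 1}}{1 - {\bar{α}}_{t}} β_{t}$:

```python
import math

def ddim_sigma(eta, ab_t, ab_prev):
    """sigma_t for a step from alpha_bar = ab_t to alpha_bar = ab_prev (ab_prev > ab_t)."""
    return eta * math.sqrt((1.0 - ab_prev) / (1.0 - ab_t)) * math.sqrt(1.0 - ab_t / ab_prev)

def ddpm_posterior_var(ab_t, ab_prev):
    """Variance beta_tilde_t of the DDPM posterior q(x_{t-1} | x_t, x_0) for adjacent steps."""
    beta_t = 1.0 - ab_t / ab_prev          # since alpha_t = alpha_bar_t / alpha_bar_{t-1}
    return (1.0 - ab_prev) / (1.0 - ab_t) * beta_t

ab_prev, ab_t = 0.60, 0.55                 # illustrative values of alpha_bar_{t-1}, alpha_bar_t
sigma_ddim = ddim_sigma(0.0, ab_t, ab_prev)  # eta = 0: no injected noise
sigma_ddpm = ddim_sigma(1.0, ab_t, ab_prev)  # eta = 1: DDPM-level noise
# sigma_ddpm ** 2 and ddpm_posterior_var(ab_t, ab_prev) agree up to floating-point rounding
```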

2.3 Marginally Consistent

Theorem: For any choice of $σ_{t}$ , the DDIM forward process satisfies:

$$
q_{σ} (x_{t} ∣ x_{0}) = N (x_{t}; \sqrt{{\bar{α}}_{t}} x_{0}, (1 - {\bar{α}}_{t}) I)
$$

This means all $σ_{t}$ choices produce the same marginals as [[Diffusion Model|DDPM]].

Consequence: The [[Diffusion Model|DDPM]] training objective depends only on these marginals, so a DDPM-trained model can be used with any $σ_{t}$ without retraining.
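The theorem can be sketched by induction downward from $t = T$, where the marginal holds by definition. Assume $q_{σ} (x_{t} ∣ x_{0})$ has the stated form, then write a sample of $x_{t - 1}$ explicitly:

$$
\begin{aligned}
x_{t} &= \sqrt{{\bar{α}}_{t}} x_{0} + \sqrt{1 - {\bar{α}}_{t}}\, ϵ, \qquad ϵ \sim N (0, I) \\
x_{t - 1} &= \sqrt{{\bar{α}}_{t - 1}} x_{0} + \sqrt{1 - {\bar{α}}_{t - 1} - σ_{t}^{2}} \cdot \underbrace{\frac{x_{t} - \sqrt{{\bar{α}}_{t}} x_{0}}{\sqrt{1 - {\bar{α}}_{t}}}}_{=\, ϵ} + σ_{t} z, \qquad z \sim N (0, I) \\
\mathbb{E} [x_{t - 1} ∣ x_{0}] &= \sqrt{{\bar{α}}_{t - 1}} x_{0} \\
\operatorname{Cov} [x_{t - 1} ∣ x_{0}] &= (1 - {\bar{α}}_{t - 1} - σ_{t}^{2}) I + σ_{t}^{2} I = (1 - {\bar{α}}_{t - 1}) I
\end{aligned}
$$

Since $ϵ$ and $z$ are independent Gaussians, $x_{t - 1}$ is Gaussian with exactly the DDPM marginal. The $σ_{t}^{2}$ terms cancel, so the result holds for every admissible $σ_{t}$ (those with $σ_{t}^{2} \le 1 - {\bar{α}}_{t - 1}$).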

3. DDIM Sampling

3.1 Generative Process

DDIM reverse process:

Given a noisy sample $x_{t}$ and prediction $x_{0}^{(t)}$ , generate $x_{t - 1}$ :

$$
x_{t - 1} = \sqrt{{\bar{α}}_{t - 1}} \underbrace{\left(\frac{x_{t} - \sqrt{1 - {\bar{α}}_{t}}\, ϵ_{θ} (x_{t}, t)}{\sqrt{{\bar{α}}_{t}}}\right)}_{\text{predicted } x_{0}} + \underbrace{\sqrt{1 - {\bar{α}}_{t - 1} - σ_{t}^{2}} \cdot ϵ_{θ} (x_{t}, t)}_{\text{direction pointing to } x_{t}} + \underbrace{σ_{t} z_{t}}_{\text{random noise}}
$$

where $z_{t} \sim N (0, I)$ .

Three components:

Predicted $x_{0}$ : Estimate of clean data from noisy $x_{t}$
Direction to $x_{t}$ : Points toward the current noisy sample
Random noise: Controlled by $σ_{t}$ (zero for deterministic case)
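The three components map one-to-one onto code. This is a minimal scalar sketch (the same arithmetic applies elementwise to tensors); the name `ddim_step` and the scalar stand-in are assumptions:

```python
import math
import random

def ddim_step(x_t, eps_pred, ab_t, ab_prev, eta=0.0, rng=random):
    """One generalized DDIM update x_t -> x_{t-1}, given eps_pred = eps_theta(x_t, t)."""
    # 1. Predicted x_0: estimate of the clean data
    x0_pred = (x_t - math.sqrt(1.0 - ab_t) * eps_pred) / math.sqrt(ab_t)
    # Stochasticity sigma_t controlled by eta
    sigma = eta * math.sqrt((1.0 - ab_prev) / (1.0 - ab_t)) * math.sqrt(1.0 - ab_t / ab_prev)
    # 2. Direction pointing to x_t (max() guards against tiny negative rounding errors)
    direction = math.sqrt(max(1.0 - ab_prev - sigma ** 2, 0.0)) * eps_pred
    # 3. Random noise, zero in the deterministic case
    noise = sigma * rng.gauss(0.0, 1.0) if eta > 0 else 0.0
    return math.sqrt(ab_prev) * x0_pred + direction + noise
```

If `eps_pred` equals the true noise that produced $x_{t}$, the $η = 0$ step lands exactly on $\sqrt{{\bar{α}}_{t - 1}} x_{0} + \sqrt{1 - {\bar{α}}_{t - 1}} ϵ$. In other words, it re-noises the same $x_{0}$ with the same $ϵ$ at the lower noise level.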

3.2 Deterministic DDIM ( $η = 0$ )

When $η = 0$ (no random noise):

$$
x_{t - 1} = \sqrt{{\bar{α}}_{t - 1}} (\frac{x_{t} - \sqrt{1 - {\bar{α}}_{t}} ϵ_{θ} (x_{t}, t)}{\sqrt{{\bar{α}}_{t}}}) + \sqrt{1 - {\bar{α}}_{t - 1}} \cdot ϵ_{θ} (x_{t}, t)
$$

Properties:

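Deterministic mapping: $x_{0}$ is uniquely determined by $x_{T}$
Approximately invertible: $x_{T}$ can be recovered from $x_{0}$ by running the update in reverse (DDIM inversion), up to discretization error
Consistent: The same $x_{T}$ gives the same output on every run, and similar high-level content across different step counts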

3.3 Accelerated Sampling

DDIM can use a subsequence of timesteps:

Full schedule: $τ = [T, T - 1, \dots, 1]$

Subsampled: $τ = [τ_{S}, τ_{S - 1}, \dots, τ_{1}]$ where $S ≪ T$

Example (T=1000, S=50):

Original: $[1000, 999, 998, \dots, 1]$ (1000 steps)
Subsampled: $[1000, 980, 960, \dots, 40, 20]$ (50 steps; the last step goes from $t = 20$ directly to $x_{0}$)
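The evenly strided subsequence can be generated as follows. This is a sketch that assumes $S$ divides $T$; `ddim_timesteps` is a hypothetical helper:

```python
def ddim_timesteps(T=1000, S=50):
    """Descending DDIM subsequence [T, T - c, ..., c] with stride c = T // S."""
    c = T // S
    return list(range(T, 0, -c))

taus = ddim_timesteps(1000, 50)  # 1000, 980, ..., 20 (50 entries)
```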

[!TIP] Practical Choice
DDIM with 50-100 steps typically achieves quality close to full [[Diffusion Model|DDPM]] (1000 steps), giving 10-20× speedup.

3.4 Sampling Pseudocode

```python
# DDIM Sampling (deterministic when eta = 0)
import torch

@torch.no_grad()
def ddim_sample(model, x_T, timesteps, alpha_bar, eta=0.0):
    """
    model:     Noise prediction network epsilon_theta(x, t)
    x_T:       Initial noise ~ N(0, I)
    timesteps: Descending subsequence of [T, ..., 1], e.g. [1000, 980, ..., 20]
    alpha_bar: 1-D tensor of length T + 1 with alpha_bar[0] = 1, so the
               final step (t_next = 0) returns the predicted x_0
    eta:       0 for deterministic DDIM, 1 for DDPM-like stochastic sampling
    """
    x_t = x_T

    for i, t in enumerate(timesteps):
        # Next (less noisy) timestep; 0 after the last entry
        t_next = timesteps[i + 1] if i < len(timesteps) - 1 else 0
        a_t, a_next = alpha_bar[t], alpha_bar[t_next]

        # Predict noise
        eps_theta = model(x_t, t)

        # Predict x_0
        x0_pred = (x_t - torch.sqrt(1 - a_t) * eps_theta) / torch.sqrt(a_t)

        # Compute sigma (stochasticity)
        sigma_t = eta * torch.sqrt((1 - a_next) / (1 - a_t)) * torch.sqrt(1 - a_t / a_next)

        # Direction pointing to x_t (clamped against tiny negative rounding errors)
        direction = torch.sqrt(torch.clamp(1 - a_next - sigma_t**2, min=0.0)) * eps_theta

        # Random noise (only when eta > 0)
        z = torch.randn_like(x_t) if eta > 0 else torch.zeros_like(x_t)

        # Update
        x_t = torch.sqrt(a_next) * x0_pred + direction + sigma_t * z

    return x_t
```
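A quick sanity check for the sampler uses a toy "dataset" in which every image equals a constant $c$. In that case the ideal noise predictor is known in closed form, and deterministic DDIM must return $c$ from any starting noise. The sketch is written in plain Python on scalars so it runs without PyTorch. `ddim_sample_scalar` and `eps_point_mass` are hypothetical helpers that mirror the `ddim_sample` loop with $η = 0$.

```python
import math

def make_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    # Linear DDPM beta schedule with alpha_bar[0] = 1 (assumption for this check)
    ab = [1.0]
    for t in range(1, T + 1):
        ab.append(ab[-1] * (1.0 - (beta_start + (beta_end - beta_start) * (t - 1) / (T - 1))))
    return ab

def ddim_sample_scalar(eps_model, x_T, timesteps, alpha_bar):
    """Deterministic (eta = 0) DDIM loop on a scalar."""
    x_t = x_T
    for i, t in enumerate(timesteps):
        t_next = timesteps[i + 1] if i + 1 < len(timesteps) else 0
        eps = eps_model(x_t, t)
        x0_pred = (x_t - math.sqrt(1 - alpha_bar[t]) * eps) / math.sqrt(alpha_bar[t])
        x_t = math.sqrt(alpha_bar[t_next]) * x0_pred + math.sqrt(1 - alpha_bar[t_next]) * eps
    return x_t

c = 0.42
alpha_bar = make_alpha_bar()

def eps_point_mass(x_t, t):
    # Exact noise for data that is always c: x_t = sqrt(ab) * c + sqrt(1 - ab) * eps
    return (x_t - math.sqrt(alpha_bar[t]) * c) / math.sqrt(1 - alpha_bar[t])

steps = list(range(1000, 0, -20))
x0 = ddim_sample_scalar(eps_point_mass, 1.7, steps, alpha_bar)  # should be ~c
```

If the result drifts from $c$, the usual culprits are the indexing convention for ${\bar{α}}$ (whether ${\bar{α}}_{0} = 1$) or a sign error in the $x_{0}$ prediction.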

4. DDIM Inversion

4.1 Forward (Encoding) Process

DDIM inversion reverses the sampling process: given a real image $x_{0}$ , find the noise $x_{T}$ that would generate it.

Algorithm:

$$
x_{t + 1} = \sqrt{{\bar{α}}_{t + 1}} (\frac{x_{t} - \sqrt{1 - {\bar{α}}_{t}} ϵ_{θ} (x_{t}, t)}{\sqrt{{\bar{α}}_{t}}}) + \sqrt{1 - {\bar{α}}_{t + 1}} \cdot ϵ_{θ} (x_{t}, t)
$$

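```python
# DDIM Inversion (encoding real image to noise)
import torch

@torch.no_grad()
def ddim_inversion(model, x_0, timesteps, alpha_bar):
    """
    Encode a real image x_0 into its DDIM noise code x_T (eta = 0).

    timesteps: Ascending subsequence, e.g. [20, 40, ..., 1000]
    alpha_bar: 1-D tensor with alpha_bar[0] = 1, so x_0 sits at t = 0
    """
    x_t = x_0
    t = 0  # x_0 is the sample at t = 0

    for t_next in timesteps:
        # Predict noise at the current (less noisy) point. The exact reverse of a
        # sampling step would need eps at x_{t_next}, which is unknown; using
        # eps(x_t, t) is the standard approximation, accurate for small steps.
        # Assumes the model accepts t = 0 on the first step; some implementations
        # query model(x_t, t_next) instead.
        eps_theta = model(x_t, t)

        # Predict x_0
        x0_pred = (x_t - torch.sqrt(1 - alpha_bar[t]) * eps_theta) / torch.sqrt(alpha_bar[t])

        # DDIM forward step toward more noise
        direction = torch.sqrt(1 - alpha_bar[t_next]) * eps_theta
        x_t = torch.sqrt(alpha_bar[t_next]) * x0_pred + direction
        t = t_next

    return x_t  # Approximately x_T (the noise code)
```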
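A toy experiment shows why inversion is only approximate. It assumes 1-D Gaussian data $x_{0} \sim N(0, 0.25)$, for which the ideal predictor $E[ϵ ∣ x_{t}] = \sqrt{1 - {\bar{α}}_{t}}\, x_{t} / ({\bar{α}}_{t} s^{2} + 1 - {\bar{α}}_{t})$ is exact and stands in for $ϵ_{θ}$. The round trip $x_{0} \to x_{T} \to x_{0}$ is exact only in the continuous-time limit. With a coarse schedule, evaluating $ϵ$ at $x_{t}$ instead of $x_{t + 1}$ leaves a visible error. All names here are hypothetical.

```python
import math

def make_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    ab = [1.0]  # alpha_bar[0] = 1
    for t in range(1, T + 1):
        ab.append(ab[-1] * (1.0 - (beta_start + (beta_end - beta_start) * (t - 1) / (T - 1))))
    return ab

alpha_bar = make_alpha_bar()
s2 = 0.25  # toy data distribution x_0 ~ N(0, s2) (assumption)

def eps_gauss(x_t, t):
    """Exact E[eps | x_t] for Gaussian data; plays the role of a trained eps_theta."""
    ab = alpha_bar[t]
    return math.sqrt(1 - ab) * x_t / (ab * s2 + 1 - ab)

def ddim_move(x, t, t_to):
    """Deterministic DDIM update from level t to level t_to, using eps at the current level."""
    eps = eps_gauss(x, t)
    x0_pred = (x - math.sqrt(1 - alpha_bar[t]) * eps) / math.sqrt(alpha_bar[t])
    return math.sqrt(alpha_bar[t_to]) * x0_pred + math.sqrt(1 - alpha_bar[t_to]) * eps

def round_trip(x0, S, T=1000):
    """Invert x_0 to x_T with S steps, then sample back to x_0 with the same S steps."""
    ts = [0] + list(range(T // S, T + 1, T // S))
    x = x0
    for a, b in zip(ts, ts[1:]):                # inversion: x_0 -> x_T
        x = ddim_move(x, a, b)
    for a, b in zip(ts[::-1], ts[::-1][1:]):    # sampling: x_T -> x_0
        x = ddim_move(x, a, b)
    return x

err_10 = abs(round_trip(1.0, 10) - 1.0)
err_1000 = abs(round_trip(1.0, 1000) - 1.0)
```

The reconstruction error shrinks as the step count grows, which is why editing pipelines usually invert with a reasonably fine schedule.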

4.2 Applications of Inversion

1. Real Image Editing:

Encode image to noise: $x_{0} \to x_{T}$
Modify or guide the reverse process
Decode back: $x_{T} \to {\hat{x}}_{0}$ (edited)

2. Semantic Interpolation:

Encode two images: $x_{0}^{(1)} \to x_{T}^{(1)}$ , $x_{0}^{(2)} \to x_{T}^{(2)}$
Interpolate in noise space: $x_{T} = (1 - λ) x_{T}^{(1)} + λ x_{T}^{(2)}$
Decode: $x_{T} \to {\hat{x}}_{0}$ (interpolated)

3. Attribute Manipulation:

Encode image
Apply semantic direction in noise space
Decode for controlled editing

5. Theoretical Analysis

5.1 Why Fewer Steps Work

[[Diffusion Model|DDPM]] problem: Each reverse step is a small Gaussian step that assumes close proximity between $x_{t}$ and $x_{t - 1}$ .

DDIM solution: The generative process directly jumps to the predicted $x_{0}$ , then mixes back with noise.

Mathematical justification:

The [[Diffusion Model|DDPM]] training loss:

$$
L = \sum_{t = 1}^{T} E [∥ ϵ - ϵ_{θ} (x_{t}, t) ∥^{2}]
$$

depends only on the marginal $q (x_{t} ∣ x_{0})$ , not the joint $q (x_{1 : T} ∣ x_{0})$ . Therefore, any inference distribution with matching marginals is valid.

5.2 Consistency Properties

Observation (Consistency, Song et al., 2021): With $η = 0$, samples generated from the same initial noise $x_{T}$ with trajectories of different lengths share similar high-level features. As the number of steps increases, they converge to the solution of the underlying ODE.

Practical implication (same $x_{T}$, compared with the 1000-step DDIM result):

10 steps: Approximate result, some artifacts
50 steps: Good quality, minor differences from the 1000-step result
100 steps: Nearly identical to the 1000-step result
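The convergence can be checked on the same kind of toy problem. The sketch assumes 1-D Gaussian data $x_{0} \sim N(0, s^{2})$ with $s^{2} = 0.25$, where the exact $E[ϵ ∣ x_{t}]$ replaces $ϵ_{θ}$. For this data the exact ODE maps $x_{T}$ to $x_{T} \cdot s / \sqrt{{\bar{α}}_{T} s^{2} + 1 - {\bar{α}}_{T}}$, so each step count can be compared against that limit. `eps_gauss` and `ddim_deterministic` are hypothetical helpers.

```python
import math

def make_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    ab = [1.0]  # alpha_bar[0] = 1
    for t in range(1, T + 1):
        ab.append(ab[-1] * (1.0 - (beta_start + (beta_end - beta_start) * (t - 1) / (T - 1))))
    return ab

alpha_bar = make_alpha_bar()
s2 = 0.25  # toy data x_0 ~ N(0, s2) (assumption)

def eps_gauss(x_t, t):
    ab = alpha_bar[t]
    return math.sqrt(1 - ab) * x_t / (ab * s2 + 1 - ab)

def ddim_deterministic(x_T, S, T=1000):
    """eta = 0 DDIM from level T down to 0 with S evenly spaced steps (S divides T)."""
    ts = list(range(T, 0, -(T // S))) + [0]
    x = x_T
    for t, t_next in zip(ts, ts[1:]):
        eps = eps_gauss(x, t)
        x0_pred = (x - math.sqrt(1 - alpha_bar[t]) * eps) / math.sqrt(alpha_bar[t])
        x = math.sqrt(alpha_bar[t_next]) * x0_pred + math.sqrt(1 - alpha_bar[t_next]) * eps
    return x

x_T = 1.3  # one fixed initial noise value
results = {S: ddim_deterministic(x_T, S) for S in (10, 50, 100, 1000)}
exact = x_T * math.sqrt(s2 / (alpha_bar[1000] * s2 + 1 - alpha_bar[1000]))
```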

5.3 Connection to [[Probability Flow ODE]]

DDIM as ODE discretization:

With $η = 0$, each DDIM step is an Euler step of an ODE in rescaled variables. In the limit of infinitely many steps, DDIM solves the [[Probability Flow ODE]]:

$$
\frac{d x}{d t} = f (t) x - \frac{1}{2} g (t)^{2} \nabla_{x} \log p_{t} (x)
$$
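The Euler-step claim can be made concrete with a short derivation following Song et al. (2021). It uses the rescaled variables $\bar{x}_{t} = x_{t} / \sqrt{{\bar{α}}_{t}}$ and $ρ_{t} = \sqrt{(1 - {\bar{α}}_{t}) / {\bar{α}}_{t}}$; the symbol $ρ_{t}$ is chosen here to avoid a clash with $σ_{t}$. Divide the deterministic update by $\sqrt{{\bar{α}}_{t - 1}}$:

$$
\begin{aligned}
\bar{x}_{t - 1} = \frac{x_{t - 1}}{\sqrt{{\bar{α}}_{t - 1}}}
&= \frac{x_{t} - \sqrt{1 - {\bar{α}}_{t}}\, ϵ_{θ} (x_{t}, t)}{\sqrt{{\bar{α}}_{t}}} + \frac{\sqrt{1 - {\bar{α}}_{t - 1}}}{\sqrt{{\bar{α}}_{t - 1}}}\, ϵ_{θ} (x_{t}, t) \\
&= \bar{x}_{t} - ρ_{t}\, ϵ_{θ} (x_{t}, t) + ρ_{t - 1}\, ϵ_{θ} (x_{t}, t) \\
&= \bar{x}_{t} + (ρ_{t - 1} - ρ_{t})\, ϵ_{θ} \left(\frac{\bar{x}_{t}}{\sqrt{ρ_{t}^{2} + 1}}, t\right)
\end{aligned}
$$

This is one Euler step, with step size $ρ_{t - 1} - ρ_{t}$, of $\frac{d \bar{x}}{d ρ} = ϵ_{θ} \left(\frac{\bar{x}}{\sqrt{ρ^{2} + 1}}, t(ρ)\right)$. Shrinking the steps therefore makes DDIM converge to the ODE solution.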

DDIM vs ODE solvers:

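| Method | Type | Steps | Quality |
|---|---|---|---|
| DDIM-50 | 1st-order ODE solver | 50 | High |
| DDIM-100 | 1st-order ODE solver | 100 | Very High |
| [[DPM-Solver]]-2 | 2nd-order ODE solver | 20 | Very High |
| Euler (on the ODE in $x$) | 1st-order ODE solver | 100 | Medium-High |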

[!NOTE] Historical Significance
DDIM was among the first works to show that a pretrained diffusion model can be sampled with far fewer steps. The [[Probability Flow ODE]] was introduced concurrently (Song et al., 2021), and later ODE-based samplers such as [[DPM-Solver]] build on both ideas.

6. Comparison with [[Diffusion Model|DDPM]]

6.1 Forward Process

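| Aspect | [[Diffusion Model\|DDPM]] | DDIM |
|---|---|---|
| Type | Markovian | Non-Markovian |
| Joint distribution | $q (x_{1 : T} ∣ x_{0}) = \prod_{t} q (x_{t} ∣ x_{t - 1})$ | $q_{σ} (x_{1 : T} ∣ x_{0}) = q_{σ} (x_{T} ∣ x_{0}) \prod_{t = 2}^{T} q_{σ} (x_{t - 1} ∣ x_{t}, x_{0})$ |
| Marginals | $N (\sqrt{{\bar{α}}_{t}} x_{0}, (1 - {\bar{α}}_{t}) I)$ | Same |
| Posterior used for sampling | $q (x_{t - 1} ∣ x_{t}, x_{0})$, variance $\tilde{β}_{t}$ fixed by the forward process | $q_{σ} (x_{t - 1} ∣ x_{t}, x_{0})$, variance $σ_{t}^{2}$ chosen freely |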

6.2 Reverse (Sampling) Process

| Aspect | [[Diffusion Model\|DDPM]] | DDIM ( $η = 1$ ) | DDIM ( $η = 0$ ) |
|---|---|---|---|
| Stochasticity | Random | Random | Deterministic |
| Noise injection | Yes | Yes | No |
| Invertible | No | No | Yes (approximately) |
| Steps | 1000 | 10-1000 | 10-1000 |

6.3 Quality vs Speed Trade-off

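| Method | FID on CIFAR-10 (lower is better) | Steps | Time (relative, ∝ steps) |
|---|---|---|---|
| [[Diffusion Model\|DDPM]] | 3.17 | 1000 | 1.0× |
| DDIM ( $η = 0$ ) | 4.16 | 100 | 0.1× |
| DDIM ( $η = 0$ ) | 4.67 | 50 | 0.05× |
| DDIM ( $η = 0$ ) | 6.84 | 20 | 0.02× |
| DDIM ( $η = 0$ ) | 13.36 | 10 | 0.01× |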

[!TIP] Speed-Quality Balance
DDIM-100 offers a good balance. Its FID on CIFAR-10 is within about one point of [[Diffusion Model|DDPM]]-1000 (4.16 vs 3.17), and it needs 10× fewer network evaluations.

7. Stochastic DDIM ( $η > 0$ )

7.1 Continuous Stochasticity Control

General DDIM update:

$$
x_{t - 1} = \sqrt{{\bar{α}}_{t - 1}} x_{0}^{(t)} + \sqrt{1 - {\bar{α}}_{t - 1} - σ_{t}^{2}} \cdot ϵ_{θ} (x_{t}, t) + σ_{t} z_{t}
$$

Effects of $η$ :

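| $η$ | Stochasticity | Diversity | Determinism | Best For |
|---|---|---|---|---|
| 0.0 | None | From $x_{T}$ only | Perfect | Editing, inversion, few-step sampling |
| 0.2 | Low | Some variation | Near-deterministic | Balanced quality |
| 0.5 | Medium | Moderate | Partial | General sampling |
| 0.8 | High | High | Low | Diverse generation |
| 1.0 | Full ([[Diffusion Model\|DDPM]]) | Maximum | None | Long schedules; quality degrades quickly with few steps |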

7.2 When to Use Each Mode

Deterministic ( $η = 0$ ):

Image editing (need reproducibility)
DDIM inversion
Semantic interpolation
Latent space exploration

Stochastic ( $η = 1$ ):

Maximum sample diversity
Unconditional generation
Long schedules (hundreds of steps or more); with few steps, $η = 1$ gives much worse FID than $η = 0$

Intermediate ( $0 < η < 1$ ):

Controlled diversity
Balanced speed-quality trade-off

8. Applications

8.1 Real Image Editing

DDIM Inversion + Editing Pipeline:

Encode: Use DDIM inversion to map real image to noise
Modify: Apply text-guided editing in the denoising process
Decode: Generate edited image

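Key advantage: Starting the denoising process from the inverted code rather than fresh random noise reconstructs the input closely, so edits preserve the original image's layout and structure. This holds at least without strong classifier-free guidance; Null-text Inversion was proposed to address the guided case.

Examples: Prompt-to-Prompt (applied to real images via inversion), Null-text Inversion, EDICT.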

8.2 Semantic Interpolation

Process:

Encode image A and B via DDIM inversion
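Interpolate noise codes: $x_{T}^{λ} = (1 - λ) x_{T}^{A} + λ x_{T}^{B}$. The DDIM paper uses spherical linear interpolation instead, since linear mixing shrinks the norm of Gaussian codes (see 11.3).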
Decode interpolated noise

Result: Smooth semantic transition between images.

8.3 Latent Space Manipulation

Finding semantic directions:

Encode many images
Find directions in noise space corresponding to attributes
Apply directional shifts for controlled editing

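8.4 Distillation for Faster Sampling

Progressive distillation (Salimans & Ho, 2022):

Train a teacher model with the standard [[Diffusion Model|DDPM]] objective
Train a student so that one of its deterministic DDIM steps matches two DDIM steps of the teacher
Use the student as the new teacher, halving the number of sampling steps each round
Repeat for further acceleration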

9. Practical Implementation

9.1 Training

Key fact: DDIM uses the same training as [[Diffusion Model|DDPM]]!

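```python
# DDIM Training = DDPM Training
# No modification needed!
import torch
import torch.nn.functional as F

def ddpm_training_loss(model, x_0, alpha_bar, T=1000):
    """
    x_0:       Batch of clean images, shape (B, C, H, W)
    alpha_bar: 1-D tensor of length T + 1 with alpha_bar[0] = 1
    """
    B = x_0.shape[0]

    # Sample one timestep per example, uniformly from {1, ..., T}
    t = torch.randint(1, T + 1, (B,), device=x_0.device)

    # Sample noise
    eps = torch.randn_like(x_0)

    # Forward diffusion (closed-form marginal), broadcast over image dims
    a_t = alpha_bar.to(x_0.device)[t].view(B, *([1] * (x_0.dim() - 1)))
    x_t = torch.sqrt(a_t) * x_0 + torch.sqrt(1 - a_t) * eps

    # Predict noise
    eps_theta = model(x_t, t)

    # MSE loss
    loss = F.mse_loss(eps_theta, eps)

    return loss

# Same model can be used for DDPM and DDIM sampling!
```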

9.2 Timestep Selection

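Strategies for subsampling:

Uniform (linear): Evenly spaced steps, i.e. every $k$-th timestep
- Simple, but not always the best choice

```python
import numpy as np

def linear_schedule(T, S):
    # S + 1 evenly spaced timesteps from 0 to T (inclusive)
    return np.linspace(0, T, S + 1).astype(int)
```

Quadratic: More steps near $t = 0$

```python
import numpy as np

def quadratic_schedule(T, S):
    # Square evenly spaced points, then rescale to [0, T]: spacing grows with t
    steps = np.linspace(0, T, S + 1) ** 2
    steps = (steps / steps[-1] * T).astype(int)
    # Integer rounding can create duplicates, so the result may have fewer than S + 1 entries
    return sorted(set(steps.tolist()))
```

Recommendation: The DDIM paper uses quadratic spacing for CIFAR-10 and linear spacing for its other datasets, picking whichever gave slightly better FID. Treat the choice as dataset-dependent and compare both, especially for very short schedules (< 20 steps).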

9.3 Debugging Checklist

[ ] Verify $α_{t}$ and ${\bar{α}}_{t}$ values are correct
[ ] Check $x_{0}$ prediction formula matches noise schedule
[ ] Test with $η = 0$ (deterministic) first
[ ] Compare output with [[Diffusion Model|DDPM]] baseline
[ ] Verify inversion consistency: $x_{0} \to x_{T} \to x_{0}$
[ ] Monitor numerical stability (no NaN/Inf)
[ ] Test with different step counts (10, 20, 50, 100)

10. Core Formula Cards

[!QUOTE] DDIM Marginal Distribution
$q_{σ} (x_{t} ∣ x_{0}) = N (x_{t}; \sqrt{{\bar{α}}_{t}} x_{0}, (1 - {\bar{α}}_{t}) I)$

[!QUOTE] DDIM Generative Process
$x_{t - 1} = \sqrt{{\bar{α}}_{t - 1}} (\frac{x_{t} - \sqrt{1 - {\bar{α}}_{t}} ϵ_{θ} (x_{t}, t)}{\sqrt{{\bar{α}}_{t}}}) + \sqrt{1 - {\bar{α}}_{t - 1} - σ_{t}^{2}} \cdot ϵ_{θ} (x_{t}, t) + σ_{t} z_{t}$

[!QUOTE] Deterministic DDIM ( $η = 0$ )
$x_{t - 1} = \sqrt{{\bar{α}}_{t - 1}} (\frac{x_{t} - \sqrt{1 - {\bar{α}}_{t}} ϵ_{θ} (x_{t}, t)}{\sqrt{{\bar{α}}_{t}}}) + \sqrt{1 - {\bar{α}}_{t - 1}} \cdot ϵ_{θ} (x_{t}, t)$

[!QUOTE] Predicted $x_{0}$ (One-step Estimation)
$x_{0}^{(t)} = \frac{x_{t} - \sqrt{1 - {\bar{α}}_{t}} ϵ_{θ} (x_{t}, t)}{\sqrt{{\bar{α}}_{t}}}$

[!QUOTE] DDIM Inversion (Forward)
$x_{t + 1} = \sqrt{{\bar{α}}_{t + 1}} (\frac{x_{t} - \sqrt{1 - {\bar{α}}_{t}} ϵ_{θ} (x_{t}, t)}{\sqrt{{\bar{α}}_{t}}}) + \sqrt{1 - {\bar{α}}_{t + 1}} \cdot ϵ_{θ} (x_{t}, t)$

[!QUOTE] Stochasticity Parameter
$σ_{t} = η \sqrt{\frac{1 - {\bar{α}}_{t - 1}}{1 - {\bar{α}}_{t}}} \sqrt{1 - \frac{{\bar{α}}_{t}}{{\bar{α}}_{t - 1}}}$

11. Extensions and Variants

11.1 DDIM with Classifier Guidance

Combine DDIM sampling with classifier guidance:

$$
{\tilde{ϵ}}_{θ} (x_{t}, t) = ϵ_{θ} (x_{t}, t) - w \sqrt{1 - {\bar{α}}_{t}} \nabla_{x_{t}} \log p_{ϕ} (y ∣ x_{t})
$$

where $w$ is the guidance scale.
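The formula follows from the relation between the noise predictor and the score, $\nabla_{x_{t}} \log p_{t} (x_{t}) \approx -ϵ_{θ} (x_{t}, t) / \sqrt{1 - {\bar{α}}_{t}}$. Add $w$ times the classifier gradient to the score and rewrite the result in $ϵ$-form:

$$
\begin{aligned}
\nabla_{x_{t}} \log p_{t} (x_{t}) + w \nabla_{x_{t}} \log p_{ϕ} (y ∣ x_{t})
&\approx -\frac{ϵ_{θ} (x_{t}, t)}{\sqrt{1 - {\bar{α}}_{t}}} + w \nabla_{x_{t}} \log p_{ϕ} (y ∣ x_{t}) \\
&= -\frac{1}{\sqrt{1 - {\bar{α}}_{t}}} \Big( ϵ_{θ} (x_{t}, t) - w \sqrt{1 - {\bar{α}}_{t}}\, \nabla_{x_{t}} \log p_{ϕ} (y ∣ x_{t}) \Big)
= -\frac{{\tilde{ϵ}}_{θ} (x_{t}, t)}{\sqrt{1 - {\bar{α}}_{t}}}
\end{aligned}
$$

The guided ${\tilde{ϵ}}_{θ}$ can therefore be used in place of $ϵ_{θ}$ in any DDIM update.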

11.2 DDIM with Classifier-Free Guidance

$$
{\tilde{ϵ}}_{θ} (x_{t}, t, c) = ϵ_{θ} (x_{t}, t, \emptyset) + w (ϵ_{θ} (x_{t}, t, c) - ϵ_{θ} (x_{t}, t, \emptyset))
$$
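Applied elementwise, the combination is a one-liner. This minimal sketch works on plain Python lists; `cfg_eps` is a hypothetical helper, and in practice the two predictions come from conditional and unconditional passes of the same network:

```python
def cfg_eps(eps_uncond, eps_cond, w):
    """Classifier-free guidance: eps_uncond + w * (eps_cond - eps_uncond), elementwise."""
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]

eps_u = [0.5, -1.0]  # unconditional prediction (made-up values)
eps_c = [1.0, 0.0]   # conditional prediction (made-up values)

print(cfg_eps(eps_u, eps_c, 0.0))  # w = 0: unconditional        -> [0.5, -1.0]
print(cfg_eps(eps_u, eps_c, 1.0))  # w = 1: plain conditional    -> [1.0, 0.0]
print(cfg_eps(eps_u, eps_c, 3.0))  # w > 1: extrapolates further -> [2.0, 2.0]
```

The guided prediction replaces $ϵ_{θ} (x_{t}, t)$ in the DDIM update. Each step then needs two network evaluations, which are often batched together.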

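11.3 Spherical Interpolation of Noise Codes

Motivation: In $d$ dimensions, $x_{T} \sim N (0, I)$ has norm $\approx \sqrt{d}$, so almost all noise codes lie near the sphere of radius $\sqrt{d}$. Linear interpolation between two independent codes shrinks the norm (to about $\sqrt{d / 2}$ at the midpoint), which gives the model an atypical input.

Fix: Interpolate with spherical linear interpolation (slerp), as in the DDIM paper, or rescale the interpolated code back onto the sphere of radius $\sqrt{d}$:

$$
x_{T}^{sphere} = \sqrt{d} \cdot \frac{x_{T}}{∥ x_{T} ∥}
$$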
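This minimal slerp sketch in plain Python assumes noise codes are flattened to lists of floats; `slerp` and `norm` are hypothetical helpers, and a vectorized version is straightforward. It also shows the norm shrinkage that motivates the fix:

```python
import math
import random

def norm(z):
    return math.sqrt(sum(a * a for a in z))

def slerp(z1, z2, lam):
    """Spherical linear interpolation between two noise codes."""
    cos_theta = sum(a * b for a, b in zip(z1, z2)) / (norm(z1) * norm(z2))
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp for rounding safety
    if theta < 1e-8:  # nearly parallel: fall back to linear interpolation
        return [(1 - lam) * a + lam * b for a, b in zip(z1, z2)]
    w1 = math.sin((1 - lam) * theta) / math.sin(theta)
    w2 = math.sin(lam * theta) / math.sin(theta)
    return [w1 * a + w2 * b for a, b in zip(z1, z2)]

rng = random.Random(0)
d = 4096
zA = [rng.gauss(0, 1) for _ in range(d)]
zB = [rng.gauss(0, 1) for _ in range(d)]

mid_lin = [(a + b) / 2 for a, b in zip(zA, zB)]  # norm drops to about sqrt(d / 2)
mid_slerp = slerp(zA, zB, 0.5)                   # norm stays near sqrt(d)
```

High-dimensional Gaussian codes are nearly orthogonal. Slerp moves along the arc between them and so keeps the interpolant in the region the model saw during training.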

11.4 Comparison with Other Fast Samplers

| Method | Type | Steps | Inversion | Training Required |
|---|---|---|---|---|
| DDIM | Discrete ODE | 10-100 | Yes | No (uses [[Diffusion Model\|DDPM]] model) |
| [[Diffusion Model\|DDPM]] (few-step) | [[Stochastic Differential Equation (SDE)\|SDE]] | 50-100 | No | No |
| [[DPM-Solver]] | High-order ODE | 10-20 | Possible | No |
| [[Flow Matching]] | ODE | 10-100 | Yes | Separate training |
| Consistency Models | Direct mapping | 1-8 | No | Separate training |

[[Diffusion Model]]
[[Probability Flow ODE]]
[[Stochastic Differential Equation (SDE)]]
[[Score Function]]
[[DPM-Solver]]
[[Langevin Dynamics]]
[[Flow Matching]]
[[Consistency Models]]
[[Wiener Process|Wiener Process]]
[[Markov Process]]
[[Neural ODE]]
[[Prompt-to-Prompt]]

Dataview Query

```dataview
LIST
FROM #ddim OR #diffusion_model OR #fast_sampling
SORT file.ctime DESC
```

References

Paper: Denoising Diffusion Implicit Models (Song et al., 2021)
Paper: Denoising Diffusion Probabilistic Models (Ho et al., 2020)
Paper: Score-Based Generative Modeling through SDEs (Song et al., 2021)
Blog: What are Diffusion Models? - Lilian Weng
Blog: DDIM Explained - Papers with Code
GitHub: https://github.com/ermongroup/ddim
Course: CS236 Deep Generative Models (Stanford)