DDIM (Denoising Diffusion Implicit Models)

DDIM is a faster sampling method for [[Diffusion Model|diffusion models]] that generalizes [[Diffusion Model|DDPM]] by relaxing the Markovian assumption on the forward process; with $\eta = 0$ the sampler is fully deterministic. It needs 10-50× fewer sampling steps at comparable quality, and introduces the crucial concept of deterministic inversion between noise and data.


1. Core Concept

1.1 Motivation

[[Diffusion Model|DDPM]] problem: Requires 1000 steps for high-quality sampling.

Root cause: The reverse process must closely follow the forward process’s path, which was defined as a [[Markov Process|Markov chain]] with small Gaussian steps.

DDIM insight: The [[Diffusion Model|DDPM]] objective only depends on the marginals $q(x_t \mid x_0)$, not on the specific joint distribution $q(x_{1:T} \mid x_0)$. This means we can define a different forward process that shares the same marginals but allows faster reverse sampling.

1.2 Key Innovation

DDIM defines a non-Markovian forward process:

  • [[Diffusion Model|DDPM]] forward: $q(x_t \mid x_{t-1})$ (Markovian)
  • DDIM forward: $q_\sigma(x_{t-1} \mid x_t, x_0)$ (non-Markovian, conditions on $x_0$)

Both share the same marginal distribution:

$$q(x_t \mid x_0) = \mathcal{N}\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\,I\right)$$

[!NOTE] Core Insight
DDIM proves that the [[Diffusion Model|DDPM]] training objective is valid for a family of inference distributions, not just the Markovian one. By choosing a non-Markovian inference distribution, we can produce higher quality samples with fewer steps.


2. Mathematical Foundation

2.1 [[Diffusion Model|DDPM]] Review

[[Diffusion Model|DDPM]] forward process (Markov):

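$$q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1})$$

where $q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right)$.

Marginal $q(x_t \mid x_0)$:

$$x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)$$

where $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$ and $\alpha_t = 1 - \beta_t$.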
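The closed form above can be checked numerically. The following is a minimal sketch, assuming a linear $\beta$ schedule from $10^{-4}$ to $0.02$ with $T = 1000$ (an illustrative choice, not something fixed by this note); `q_sample` is a hypothetical helper name. It iterates the mean and variance of the Markov chain step by step and compares them with $\bar{\alpha}_t$ and $1 - \bar{\alpha}_t$.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear schedule: beta_1 ... beta_T
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)       # alpha_bar[t - 1] holds \bar{alpha}_t

def q_sample(x0, t, eps):
    """Draw x_t directly from q(x_t | x_0) using the closed form."""
    a = alpha_bar[t - 1]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps

# Track the mean coefficient and the variance through t single Markov steps.
t = 300
mean_coef, var = 1.0, 0.0
for s in range(t):
    mean_coef *= np.sqrt(alphas[s])      # each step scales the mean by sqrt(alpha_s)
    var = alphas[s] * var + betas[s]     # each step shrinks the variance and adds beta_s

print(mean_coef**2, alpha_bar[t - 1])    # both are \bar{alpha}_t
print(var, 1.0 - alpha_bar[t - 1])       # both are 1 - \bar{alpha}_t
```

The two printed pairs agree up to floating-point error, which is why $x_t$ can be drawn in one shot during training instead of simulating $t$ small steps.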

2.2 DDIM Inference Distribution

DDIM defines a non-Markovian forward process:

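$$q_\sigma(x_{1:T} \mid x_0) = q_\sigma(x_T \mid x_0) \prod_{t=2}^{T} q_\sigma(x_{t-1} \mid x_t, x_0)$$

where:

$$q_\sigma(x_T \mid x_0) = \mathcal{N}\left(x_T;\ \sqrt{\bar{\alpha}_T}\,x_0,\ (1-\bar{\alpha}_T)\,I\right)$$

and for $t > 1$:

$$q_\sigma(x_{t-1} \mid x_t, x_0) = \mathcal{N}\left(x_{t-1};\ \sqrt{\bar{\alpha}_{t-1}}\,x_0 + \sqrt{1-\bar{\alpha}_{t-1}-\sigma_t^2}\cdot\frac{x_t - \sqrt{\bar{\alpha}_t}\,x_0}{\sqrt{1-\bar{\alpha}_t}},\ \sigma_t^2 I\right)$$

Parameter $\sigma_t$ controls stochasticity:

$$\sigma_t = \eta\,\sqrt{\frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}}\,\sqrt{1-\frac{\bar{\alpha}_t}{\bar{\alpha}_{t-1}}}$$
  • $\eta = 1$: [[Diffusion Model|DDPM]] (fully stochastic)
  • $\eta = 0$: DDIM (fully deterministic)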
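To make the role of $\eta$ concrete, here is a small sketch that evaluates $\sigma_t$ for one step. The two $\bar{\alpha}$ values are made up for illustration and `sigma` is a hypothetical helper name. It also compares $\sigma_t^2$ at $\eta = 1$ with the DDPM posterior variance $\frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\beta_t$.

```python
import math

def sigma(alpha_bar_t, alpha_bar_prev, eta):
    """sigma_t for the step t -> t-1, given the two cumulative products."""
    return (eta
            * math.sqrt((1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t))
            * math.sqrt(1.0 - alpha_bar_t / alpha_bar_prev))

# Hypothetical values for two adjacent timesteps
alpha_bar_prev, alpha_bar_t = 0.60, 0.57
beta_t = 1.0 - alpha_bar_t / alpha_bar_prev   # because alpha_t = alpha_bar_t / alpha_bar_prev
posterior_var = (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t) * beta_t

for eta in (0.0, 0.5, 1.0):
    print(eta, sigma(alpha_bar_t, alpha_bar_prev, eta) ** 2)
print("DDPM posterior variance:", posterior_var)
```

At $\eta = 0$ the variance is exactly zero, and at $\eta = 1$ it coincides with the DDPM posterior variance, which is the sense in which the two bullet points above are the endpoints of one family.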

2.3 Marginally Consistent

Theorem: For any choice of $\sigma_t$, the DDIM forward process satisfies:

$$q_\sigma(x_t \mid x_0) = \mathcal{N}\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\,I\right)$$

This means all $\sigma_t$ choices produce the same marginals as [[Diffusion Model|DDPM]].

Consequence: A [[Diffusion Model|DDPM]]-trained model (which only depends on these marginals) can be used with any $\sigma_t$!
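A sketch of the induction step behind this statement, assuming the marginal already has the stated form at step $t$ (it does at $t = T$ by definition). Since $x_{t-1}$ is Gaussian given $x_t$ and $x_0$ with a mean that is linear in $x_t$, it is enough to track the mean and covariance:

$$
\begin{aligned}
x_t &= \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I) \\
\mathbb{E}[x_{t-1} \mid x_0] &= \sqrt{\bar{\alpha}_{t-1}}\,x_0 + \sqrt{1-\bar{\alpha}_{t-1}-\sigma_t^2}\;\frac{\mathbb{E}[x_t \mid x_0] - \sqrt{\bar{\alpha}_t}\,x_0}{\sqrt{1-\bar{\alpha}_t}} = \sqrt{\bar{\alpha}_{t-1}}\,x_0 \\
\operatorname{Cov}[x_{t-1} \mid x_0] &= \sigma_t^2 I + \frac{1-\bar{\alpha}_{t-1}-\sigma_t^2}{1-\bar{\alpha}_t}\,(1-\bar{\alpha}_t)\,I = (1-\bar{\alpha}_{t-1})\,I
\end{aligned}
$$

The $\sigma_t^2$ terms cancel in the covariance, so the marginal at $t-1$ does not depend on the choice of $\sigma_t$.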


3. DDIM Sampling

3.1 Generative Process

DDIM reverse process:

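Given a noisy sample $x_t$ and prediction $x_0^{(t)}$, generate $x_{t-1}$:

$$x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\,\underbrace{\left(\frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}}\right)}_{\text{predicted } x_0} + \underbrace{\sqrt{1-\bar{\alpha}_{t-1}-\sigma_t^2}\;\epsilon_\theta(x_t, t)}_{\text{direction pointing to } x_t} + \underbrace{\sigma_t z_t}_{\text{random noise}}$$

where $z_t \sim \mathcal{N}(0, I)$.

Three components:

  1. Predicted $x_0$: Estimate of clean data from noisy $x_t$
  2. Direction to $x_t$: Points toward the current noisy sample
  3. Random noise: Controlled by $\sigma_t$ (zero for deterministic case)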
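The three components can be written out as a single update function. This is a minimal NumPy sketch rather than a reference implementation: `ddim_step` is a hypothetical name, the $\bar{\alpha}$ values are made up, and the "network" is replaced by an oracle that returns the true noise used to build $x_t$.

```python
import numpy as np

def ddim_step(x_t, eps, ab_t, ab_prev, sigma_t=0.0, rng=None):
    """One DDIM update x_t -> x_{t-1}, split into the three components."""
    x0_pred = (x_t - np.sqrt(1.0 - ab_t) * eps) / np.sqrt(ab_t)   # 1. predicted x_0
    direction = np.sqrt(1.0 - ab_prev - sigma_t**2) * eps         # 2. direction pointing to x_t
    noise = 0.0                                                   # 3. random noise (off if sigma_t = 0)
    if sigma_t > 0:
        rng = np.random.default_rng() if rng is None else rng
        noise = sigma_t * rng.standard_normal(x_t.shape)
    return np.sqrt(ab_prev) * x0_pred + direction + noise

rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)
eps = rng.standard_normal(4)
ab_t, ab_prev = 0.5, 0.7   # hypothetical \bar{alpha}_t and \bar{alpha}_{t-1}

x_t = np.sqrt(ab_t) * x0 + np.sqrt(1.0 - ab_t) * eps
x_prev = ddim_step(x_t, eps, ab_t, ab_prev)   # oracle: pass the true noise as eps_theta

# With a perfect noise prediction, the deterministic step re-noises x0 at level t-1.
expected = np.sqrt(ab_prev) * x0 + np.sqrt(1.0 - ab_prev) * eps
print(np.allclose(x_prev, expected))
```

With an exact noise estimate the step lands on the same noise realization at the lower noise level, which is a useful sanity check when wiring up a real model.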

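3.2 Deterministic DDIM ($\eta = 0$)

When $\eta = 0$ (no random noise):

$$x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\left(\frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}}\right) + \sqrt{1-\bar{\alpha}_{t-1}}\;\epsilon_\theta(x_t, t)$$

Properties:

  • Deterministic mapping: $x_0$ uniquely determined by $x_T$
  • Invertible: $x_T$ can be recovered from $x_0$, approximately in practice (DDIM inversion)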
  • Consistent: Same noise produces same output

3.3 Accelerated Sampling

DDIM can use a subsequence of timesteps:

Full schedule: $\tau = [T, T-1, \ldots, 1]$

Subsampled: $\tau = [\tau_S, \tau_{S-1}, \ldots, \tau_1]$ where $S \ll T$

Example ($T = 1000$, $S = 50$):

  • Original: $[1000, 999, 998, \ldots, 1]$ (1000 steps)
  • Subsampled: $[1000, 980, 960, \ldots, 40, 20]$ (50 steps)
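A strided subsequence like the one in the example can be generated in a few lines. This is a sketch with a hypothetical helper name, `subsample_timesteps`, and it assumes $S$ divides $T$ so that the stride is an integer; it produces the evenly strided list that starts at $T$ and ends at $T / S$.

```python
def subsample_timesteps(T, S):
    """Return S evenly strided timesteps from T down to T // S."""
    if T % S != 0:
        raise ValueError("this sketch assumes S divides T")
    stride = T // S
    return list(range(T, 0, -stride))

tau = subsample_timesteps(1000, 50)
print(tau[:3], tau[-2:], len(tau))
```

The sampler then treats consecutive entries of `tau` as $(t, t_{\text{next}})$ pairs and uses $t_{\text{next}} = 0$ after the last entry.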

[!TIP] Practical Choice
DDIM with 50-100 steps typically achieves quality close to full [[Diffusion Model|DDPM]] (1000 steps), giving 10-20× speedup.

3.4 Sampling Pseudocode

```python
import torch

# DDIM sampling: deterministic for eta = 0, stochastic for eta > 0
@torch.no_grad()
def ddim_sample(model, x_T, timesteps, alpha_bar, eta=0.0):
    """
    model:     noise prediction network epsilon_theta(x, t)
    x_T:       initial noise ~ N(0, I)
    timesteps: decreasing subsequence of [T, ..., 1]
    alpha_bar: 1-D tensor of length T + 1 with alpha_bar[0] = 1
    eta:       0 for DDIM, 1 for DDPM
    """
    x_t = x_T

    for i, t in enumerate(timesteps):
        # Predict noise
        eps_theta = model(x_t, t)

        # Predict x_0
        x0_pred = (x_t - torch.sqrt(1 - alpha_bar[t]) * eps_theta) / torch.sqrt(alpha_bar[t])

        # Next (less noisy) timestep; the final step lands on t = 0
        t_next = timesteps[i + 1] if i < len(timesteps) - 1 else 0

        # Compute sigma (stochasticity)
        sigma_t = (eta
                   * torch.sqrt((1 - alpha_bar[t_next]) / (1 - alpha_bar[t]))
                   * torch.sqrt(1 - alpha_bar[t] / alpha_bar[t_next]))

        # Direction pointing to x_t
        direction = torch.sqrt(1 - alpha_bar[t_next] - sigma_t**2) * eps_theta

        # Random noise (only injected when eta > 0)
        z = torch.randn_like(x_t) if eta > 0 else torch.zeros_like(x_t)

        # Update
        x_t = torch.sqrt(alpha_bar[t_next]) * x0_pred + direction + sigma_t * z

    return x_t
```

4. DDIM Inversion

4.1 Forward (Encoding) Process

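DDIM inversion reverses the sampling process: given a real image $x_0$, find the noise $x_T$ that would generate it.

Algorithm:

$$x_{t+1} = \sqrt{\bar{\alpha}_{t+1}}\left(\frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}}\right) + \sqrt{1-\bar{\alpha}_{t+1}}\;\epsilon_\theta(x_t, t)$$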
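```python
import torch

# DDIM inversion (encoding a real image to noise)
@torch.no_grad()
def ddim_inversion(model, x_0, timesteps, alpha_bar):
    """
    Encode real image x_0 to noise x_T.

    timesteps: increasing subsequence of [0, ..., T - 1]; x_0 is treated as
               the sample at timesteps[0]
    alpha_bar: 1-D tensor of length T + 1 with alpha_bar[0] = 1
    """
    T = len(alpha_bar) - 1
    x_t = x_0

    for i, t in enumerate(timesteps):
        # Predict noise at the current level
        eps_theta = model(x_t, t)

        # Predict x_0
        x0_pred = (x_t - torch.sqrt(1 - alpha_bar[t]) * eps_theta) / torch.sqrt(alpha_bar[t])

        # Next (noisier) timestep; the final step lands on t = T
        t_next = timesteps[i + 1] if i < len(timesteps) - 1 else T

        # DDIM forward step
        direction = torch.sqrt(1 - alpha_bar[t_next]) * eps_theta
        x_t = torch.sqrt(alpha_bar[t_next]) * x0_pred + direction

    return x_t  # This is x_T (the noise code)
```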
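How well the round trip $x_0 \to x_T \to x_0$ reconstructs the input depends on the number of steps. The sketch below illustrates this in NumPy under strong assumptions: the trained network is replaced by the analytically exact noise predictor for data drawn from $\mathcal{N}(0, I)$, and a linear $\beta$ schedule is assumed. The names `eps_model`, `ddim_move`, and `round_trip_error` are hypothetical.

```python
import numpy as np

T = 1000
# Assumed linear beta schedule; alpha_bar[0] = 1 so index t matches timestep t.
alpha_bar = np.concatenate([[1.0], np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))])

def eps_model(x, t):
    # Toy stand-in for a trained network: exact noise predictor when the data is N(0, I)
    return np.sqrt(1.0 - alpha_bar[t]) * x

def ddim_move(x, t, t_to):
    """Deterministic DDIM update from level t to level t_to (either direction)."""
    eps = eps_model(x, t)
    x0_pred = (x - np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
    return np.sqrt(alpha_bar[t_to]) * x0_pred + np.sqrt(1.0 - alpha_bar[t_to]) * eps

def round_trip_error(x0, num_steps):
    ts = np.arange(0, T + 1, T // num_steps)             # 0 = t_0 < ... < t_S = T
    x = x0
    for t, t_next in zip(ts[:-1], ts[1:]):               # inversion: x_0 -> x_T
        x = ddim_move(x, t, t_next)
    for t, t_prev in zip(ts[::-1][:-1], ts[::-1][1:]):   # sampling: x_T -> x_0
        x = ddim_move(x, t, t_prev)
    return np.max(np.abs(x - x0))

x0 = np.random.default_rng(0).standard_normal(8)
for n in (10, 100, 1000):
    print(n, round_trip_error(x0, n))
```

In this toy setting the reconstruction error shrinks as the step count grows, because each update evaluates the noise estimate at the starting level of the step rather than along the whole interval. With a real network the same discretization error is present, which is why inversion is only approximate at low step counts.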

4.2 Applications of Inversion

1. Real Image Editing:

  • Encode image to noise: $x_0 \to x_T$
  • Modify or guide the reverse process
  • Decode back: $x_T \to \hat{x}_0$ (edited)

2. Semantic Interpolation:

  • Encode two images: $x_0^{(1)} \to x_T^{(1)}$, $x_0^{(2)} \to x_T^{(2)}$
  • Interpolate in noise space: $x_T = (1-\lambda)\,x_T^{(1)} + \lambda\,x_T^{(2)}$
  • Decode: $x_T \to \hat{x}_0$ (interpolated)

3. Attribute Manipulation:

  • Encode image
  • Apply semantic direction in noise space
  • Decode for controlled editing

5. Theoretical Analysis

5.1 Why Fewer Steps Work

[[Diffusion Model|DDPM]] problem: Each reverse step is a small Gaussian step that assumes close proximity between $x_t$ and $x_{t-1}$.

DDIM solution: The generative process directly jumps to the predicted $x_0$, then mixes back with noise.

Mathematical justification:

The [[Diffusion Model|DDPM]] training loss:

$$L = \sum_{t=1}^{T} \mathbb{E}\left[\left\|\epsilon - \epsilon_\theta(x_t, t)\right\|^2\right]$$

depends only on the marginal $q(x_t \mid x_0)$, not the joint $q(x_{1:T} \mid x_0)$. Therefore, any inference distribution with matching marginals is valid.

5.2 Consistency Properties

Theorem (Consistency): For the same initial noise $x_T$ and model $\epsilon_\theta$, DDIM with different numbers of steps produces samples that converge to the same output as the number of steps increases.

Practical implication:

  • 10 steps: Approximate result, some artifacts
  • 50 steps: Good quality, minor differences from 1000-step [[Diffusion Model|DDPM]]
  • 100 steps: Nearly identical to [[Diffusion Model|DDPM]]
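The convergence with step count can be illustrated on a case where the limit is known in closed form. This is a toy sketch, not a statement about any trained model: it assumes 1-D data distributed as $\mathcal{N}(m, s^2)$, for which the exact noise predictor is available analytically, and a linear $\beta$ schedule. For Gaussian data the deterministic limit keeps the standardized value of the sample fixed, which gives `x0_exact` below. All names are hypothetical.

```python
import numpy as np

T = 1000
# Assumed linear beta schedule; alpha_bar[0] = 1 so index t matches timestep t.
alpha_bar = np.concatenate([[1.0], np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))])
m, s = 2.0, 0.5   # hypothetical 1-D data distribution N(m, s^2)

def eps_model(x, t):
    # Exact noise predictor for N(m, s^2) data (toy stand-in for a trained network)
    ab = alpha_bar[t]
    return np.sqrt(1.0 - ab) * (x - np.sqrt(ab) * m) / (ab * s**2 + 1.0 - ab)

def ddim_sample_toy(x_T, num_steps):
    ts = np.arange(T, 0, -(T // num_steps))          # T, ..., T // num_steps
    x = x_T
    for t, t_prev in zip(ts, np.append(ts[1:], 0)):  # the last step goes to t = 0
        eps = eps_model(x, t)
        x0_pred = (x - np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
        x = np.sqrt(alpha_bar[t_prev]) * x0_pred + np.sqrt(1.0 - alpha_bar[t_prev]) * eps
    return x

x_T = 1.3
# Limit of infinitely many steps for Gaussian data: the standardized value is preserved.
z = (x_T - np.sqrt(alpha_bar[T]) * m) / np.sqrt(alpha_bar[T] * s**2 + 1.0 - alpha_bar[T])
x0_exact = m + s * z

errors = {n: abs(ddim_sample_toy(x_T, n) - x0_exact) for n in (10, 50, 100, 1000)}
print(errors)
```

Starting from the same $x_T$, the gap to the limiting output shrinks as the number of steps goes from 10 to 1000, which mirrors the qualitative list above.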

5.3 Connection to [[Probability Flow ODE]]

DDIM as ODE discretization:

With $\eta = 0$ and as $T \to \infty$, DDIM converges to the [[Probability Flow ODE]]:

$$\frac{dx}{dt} = f(t)\,x - \frac{1}{2}\,g(t)^2\,\nabla_x \log p_t(x)$$
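One way to see the ODE structure is to rescale the deterministic update. The symbols $\bar{x}_t$ and $\lambda_t$ are introduced here only for this sketch. Dividing the $\eta = 0$ update by $\sqrt{\bar{\alpha}_{t-1}}$ gives:

$$
\begin{aligned}
\frac{x_{t-1}}{\sqrt{\bar{\alpha}_{t-1}}} &= \frac{x_t}{\sqrt{\bar{\alpha}_t}} + \left(\sqrt{\frac{1-\bar{\alpha}_{t-1}}{\bar{\alpha}_{t-1}}} - \sqrt{\frac{1-\bar{\alpha}_t}{\bar{\alpha}_t}}\right)\epsilon_\theta(x_t, t) \\
\bar{x}_{t-1} &= \bar{x}_t + (\lambda_{t-1} - \lambda_t)\,\epsilon_\theta(x_t, t), \qquad \bar{x}_t = \frac{x_t}{\sqrt{\bar{\alpha}_t}}, \quad \lambda_t = \sqrt{\frac{1-\bar{\alpha}_t}{\bar{\alpha}_t}}
\end{aligned}
$$

The second line is an explicit Euler step for $d\bar{x} = \epsilon_\theta\, d\lambda$, so each deterministic DDIM update is a first-order discretization of an ODE in the rescaled variables.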

DDIM vs ODE solvers:

| Method | Type | Steps | Quality |
| --- | --- | --- | --- |
| DDIM-50 | Discrete ODE | 50 | High |
| DDIM-100 | Discrete ODE | 100 | Very High |
| [[DPM-Solver]]-2 | Higher-order ODE | 20 | Very High |
| Euler ODE | 1st-order ODE | 100 | Medium-High |

[!NOTE] Historical Significance
DDIM was among the first methods to show that diffusion models can be sampled with far fewer steps, paving the way for ODE-based samplers such as [[Probability Flow ODE]] solvers and [[DPM-Solver]].


6. Comparison with [[Diffusion Model|DDPM]]

6.1 Forward Process

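| Aspect | [[Diffusion Model\|DDPM]] | DDIM |
| --- | --- | --- |
| Type | Markovian | Non-Markovian |
| Joint distribution | $q(x_{1:T} \mid x_0) = \prod_t q(x_t \mid x_{t-1})$ | $q_\sigma(x_{1:T} \mid x_0)$ (not factorized) |
| Marginals | $\mathcal{N}(\sqrt{\bar{\alpha}_t}\,x_0, (1-\bar{\alpha}_t)\,I)$ | Same |
| Inference | $q(x_{t-1} \mid x_t)$ | $q_\sigma(x_{t-1} \mid x_t, x_0)$ |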

6.2 Reverse (Sampling) Process

| Aspect | [[Diffusion Model\|DDPM]] | DDIM ($\eta = 1$) | DDIM ($\eta = 0$) |
| --- | --- | --- | --- |
| Stochasticity | Random | Random | Deterministic |
| Noise injection | Yes | Yes | No |
| Invertible | No | No | Yes |
| Steps | 1000 | 10-1000 | 10-1000 |

6.3 Quality vs Speed Trade-off

| Method | FID (CIFAR-10) | Steps | Time (relative) |
| --- | --- | --- | --- |
| [[Diffusion Model\|DDPM]] | 3.17 | 1000 | 1.0× |
| DDIM | 4.16 | 100 | 0.1× |
| DDIM | 4.67 | 50 | 0.05× |
| DDIM | 6.84 | 20 | 0.02× |
| DDIM | 13.36 | 10 | 0.01× |

[!TIP] Speed-Quality Balance
DDIM-100 provides an excellent balance: nearly the same quality as [[Diffusion Model|DDPM]]-1000 but 10× faster.


7. Stochastic DDIM ($\eta > 0$)

7.1 Continuous Stochasticity Control

General DDIM update:

$$x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\;x_0^{(t)} + \sqrt{1-\bar{\alpha}_{t-1}-\sigma_t^2}\;\epsilon_\theta(x_t, t) + \sigma_t z_t$$

Effects of $\eta$:

| $\eta$ | Stochasticity | Diversity | Determinism | Best For |
| --- | --- | --- | --- | --- |
| 0.0 | None | Fixed output | Perfect | Editing, inversion |
| 0.2 | Low | Some variation | Near-deterministic | Balanced quality |
| 0.5 | Medium | Moderate | Partial | General sampling |
| 0.8 | High | High | Low | Diverse generation |
| 1.0 | Full ([[Diffusion Model\|DDPM]]) | Maximum | None | Maximum quality |
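The effect of $\eta$ on reproducibility can be seen by sampling twice from the same $x_T$ with different random seeds for the injected noise. The sketch below is illustrative only: it assumes a linear $\beta$ schedule and replaces the network with the exact noise predictor for $\mathcal{N}(0, I)$ data. The names `eps_model` and `sample` are hypothetical.

```python
import numpy as np

T = 1000
# Assumed linear beta schedule; alpha_bar[0] = 1 so index t matches timestep t.
alpha_bar = np.concatenate([[1.0], np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))])

def eps_model(x, t):
    # Toy stand-in for a trained network: exact noise predictor when the data is N(0, I)
    return np.sqrt(1.0 - alpha_bar[t]) * x

def sample(x_T, eta, seed, num_steps=50):
    rng = np.random.default_rng(seed)                # drives only the injected noise z_t
    ts = np.arange(T, 0, -(T // num_steps))
    x = x_T
    for t, t_prev in zip(ts, np.append(ts[1:], 0)):
        ab, ab_prev = alpha_bar[t], alpha_bar[t_prev]
        sigma = eta * np.sqrt((1 - ab_prev) / (1 - ab)) * np.sqrt(1 - ab / ab_prev)
        eps = eps_model(x, t)
        x0_pred = (x - np.sqrt(1 - ab) * eps) / np.sqrt(ab)
        x = (np.sqrt(ab_prev) * x0_pred
             + np.sqrt(1 - ab_prev - sigma**2) * eps
             + sigma * rng.standard_normal(x.shape))
    return x

x_T = np.random.default_rng(0).standard_normal(4)
for eta in (0.0, 0.5, 1.0):
    gap = np.abs(sample(x_T, eta, seed=1) - sample(x_T, eta, seed=2)).max()
    print(f"eta={eta}: max difference between two runs = {gap:.4f}")
```

With $\eta = 0$ the two runs coincide exactly because $\sigma_t = 0$ at every step; for $\eta > 0$ the outputs differ even though the starting noise is shared.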

7.2 When to Use Each Mode

Deterministic ($\eta = 0$):

  • Image editing (need reproducibility)
  • DDIM inversion
  • Semantic interpolation
  • Latent space exploration

Stochastic ($\eta = 1$):

  • Maximum sample diversity
  • Unconditional generation
  • When quality trumps speed

Intermediate ($0 < \eta < 1$):

  • Controlled diversity
  • Balanced speed-quality trade-off

8. Applications

8.1 Real Image Editing

DDIM Inversion + Editing Pipeline:

  1. Encode: Use DDIM inversion to map real image to noise
  2. Modify: Apply text-guided editing in the denoising process
  3. Decode: Generate edited image

Key advantage: DDIM inversion preserves image structure better than random noise.

Example: Prompt-to-Prompt, Null-text Inversion, EDICT.

8.2 Semantic Interpolation

Process:

  1. Encode image A and B via DDIM inversion
  2. Interpolate noise codes: $x_T^{\lambda} = (1-\lambda)\,x_T^{A} + \lambda\,x_T^{B}$
  3. Decode interpolated noise

Result: Smooth semantic transition between images.
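The interpolation step itself is a one-liner. The sketch below uses random vectors as stand-ins for two inverted noise codes (a real pipeline would obtain them from DDIM inversion) and a hypothetical helper name, `interpolate_noise`. It also prints the norm of each interpolant relative to $\sqrt{d}$.

```python
import numpy as np

def interpolate_noise(x_T_a, x_T_b, lam):
    """Linear interpolation between two noise codes, 0 <= lam <= 1."""
    return (1.0 - lam) * x_T_a + lam * x_T_b

rng = np.random.default_rng(0)
d = 3 * 32 * 32                                                  # hypothetical image dimensionality
x_T_a, x_T_b = rng.standard_normal(d), rng.standard_normal(d)   # stand-ins for inverted codes

for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    x_T_lam = interpolate_noise(x_T_a, x_T_b, lam)
    print(lam, np.linalg.norm(x_T_lam) / np.sqrt(d))             # norm relative to sqrt(d)
```

For nearly orthogonal codes the interpolants in the middle have a noticeably smaller norm than the endpoints, so they are less typical of $\mathcal{N}(0, I)$ than the codes the model was trained on.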

8.3 Latent Space Manipulation

Finding semantic directions:

  1. Encode many images
  2. Find directions in noise space corresponding to attributes
  3. Apply directional shifts for controlled editing

8.4 Accelerated Training

Progressive distillation:

  1. Train teacher model with [[Diffusion Model|DDPM]]
  2. Use DDIM to generate high-quality samples
  3. Train student model with fewer steps
  4. Repeat for further acceleration

9. Practical Implementation

9.1 Training

Key fact: DDIM uses the same training as [[Diffusion Model|DDPM]]!

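```python
import random

import torch
import torch.nn.functional as F

# DDIM training = DDPM training
# No modification needed!

def ddpm_training_loss(model, x_0, alpha_bar):
    """alpha_bar: 1-D tensor of length T + 1 with alpha_bar[0] = 1."""
    T = len(alpha_bar) - 1

    # Sample timestep uniformly from {1, ..., T}
    t = random.randint(1, T)

    # Sample noise
    eps = torch.randn_like(x_0)

    # Forward diffusion
    x_t = torch.sqrt(alpha_bar[t]) * x_0 + torch.sqrt(1 - alpha_bar[t]) * eps

    # Predict noise
    eps_theta = model(x_t, t)

    # MSE loss
    loss = F.mse_loss(eps_theta, eps)

    return loss

# Same model can be used for DDPM and DDIM sampling!
```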

9.2 Timestep Selection

Strategies for subsampling:

  1. Uniform: Select every $k$-th step

    • Simple but suboptimal
  2. Quadratic: More steps near $t = 0$

    ```python
    import numpy as np

    def quadratic_schedule(T, S):
        # Square evenly spaced points so the steps cluster near t = 0
        steps = np.linspace(0, T, S + 1) ** 2
        steps = (steps / steps[-1] * T).astype(int)
        return sorted(set(steps.tolist()))  # duplicates near 0 are merged
    ```
  3. Linear: Evenly spaced steps

    ```python
    import numpy as np

    def linear_schedule(T, S):
        return np.linspace(0, T, S + 1).astype(int)
    ```

Recommendation: Linear schedule works well for 50+ steps; quadratic better for very few steps (< 20).

9.3 Debugging Checklist

  • [ ] Verify $\alpha_t$ and $\bar{\alpha}_t$ values are correct
  • [ ] Check $x_0$ prediction formula matches noise schedule
  • [ ] Test with $\eta = 0$ (deterministic) first
  • [ ] Compare output with [[Diffusion Model|DDPM]] baseline
  • [ ] Verify inversion consistency: $x_0 \to x_T \to x_0$
  • [ ] Monitor numerical stability (no NaN/Inf)
  • [ ] Test with different step counts (10, 20, 50, 100)

10. Core Formula Cards

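[!QUOTE] DDIM Marginal Distribution

$$q_\sigma(x_t \mid x_0) = \mathcal{N}\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\,I\right)$$

[!QUOTE] DDIM Generative Process

$$x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\left(\frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}}\right) + \sqrt{1-\bar{\alpha}_{t-1}-\sigma_t^2}\;\epsilon_\theta(x_t, t) + \sigma_t z_t$$

[!QUOTE] Deterministic DDIM ($\eta = 0$)

$$x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\left(\frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}}\right) + \sqrt{1-\bar{\alpha}_{t-1}}\;\epsilon_\theta(x_t, t)$$

[!QUOTE] Predicted $x_0$ (One-step Estimation)

$$x_0^{(t)} = \frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}}$$

[!QUOTE] DDIM Inversion (Forward)

$$x_{t+1} = \sqrt{\bar{\alpha}_{t+1}}\left(\frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}}\right) + \sqrt{1-\bar{\alpha}_{t+1}}\;\epsilon_\theta(x_t, t)$$

[!QUOTE] Stochasticity Parameter

$$\sigma_t = \eta\,\sqrt{\frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}}\,\sqrt{1-\frac{\bar{\alpha}_t}{\bar{\alpha}_{t-1}}}$$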

11. Extensions and Variants

11.1 DDIM with Classifier Guidance

Combine DDIM sampling with classifier guidance:

$$\tilde{\epsilon}_\theta(x_t, t) = \epsilon_\theta(x_t, t) - w\,\sqrt{1-\bar{\alpha}_t}\;\nabla_{x_t} \log p_\phi(y \mid x_t)$$

where $w$ is the guidance scale.
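A minimal sketch of this correction, using a made-up quadratic "classifier" so that the gradient is available in closed form. `classifier_guided_eps` and `predict_x0` are hypothetical helper names, and the zero noise estimate stands in for a network output.

```python
import numpy as np

def classifier_guided_eps(eps, grad_log_p, alpha_bar_t, w):
    """Guided noise estimate: eps - w * sqrt(1 - alpha_bar_t) * grad_x log p(y | x_t)."""
    return eps - w * np.sqrt(1.0 - alpha_bar_t) * grad_log_p

def predict_x0(x_t, eps, alpha_bar_t):
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)

# Toy classifier (assumption): log p(y | x) = -0.5 * ||x - c||^2 + const, so the gradient is c - x
c = np.array([1.0, 1.0])
x_t = np.zeros(2)
eps = np.zeros(2)            # stand-in for the network output
alpha_bar_t, w = 0.5, 1.0    # hypothetical values

eps_guided = classifier_guided_eps(eps, c - x_t, alpha_bar_t, w)
dist_plain = np.linalg.norm(predict_x0(x_t, eps, alpha_bar_t) - c)
dist_guided = np.linalg.norm(predict_x0(x_t, eps_guided, alpha_bar_t) - c)
print(dist_plain, dist_guided)
```

The guided estimate moves the predicted $x_0$ toward the region the classifier favors; the modified $\tilde{\epsilon}_\theta$ then simply replaces $\epsilon_\theta$ in the DDIM update.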

11.2 DDIM with Classifier-Free Guidance

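$$\tilde{\epsilon}_\theta(x_t, t, c) = \epsilon_\theta(x_t, t, \varnothing) + w\,\left(\epsilon_\theta(x_t, t, c) - \epsilon_\theta(x_t, t, \varnothing)\right)$$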
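In code the combination is a single line. The sketch below uses fixed arrays as stand-ins for the unconditional and conditional network outputs; `cfg_eps` is a hypothetical name.

```python
import numpy as np

def cfg_eps(eps_uncond, eps_cond, w):
    """Classifier-free guidance: move from the unconditional estimate toward the conditional one."""
    return eps_uncond + w * (eps_cond - eps_uncond)

eps_uncond = np.array([0.0, 1.0])   # stand-in for eps_theta(x_t, t, empty condition)
eps_cond = np.array([1.0, 1.0])     # stand-in for eps_theta(x_t, t, c)

for w in (0.0, 1.0, 3.0):
    print(w, cfg_eps(eps_uncond, eps_cond, w))
```

With $w = 0$ the result is the unconditional estimate, with $w = 1$ the conditional one, and $w > 1$ extrapolates past the conditional estimate.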

11.3 Spherical DDIM

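Motivation: For $x_T \sim \mathcal{N}(0, I)$ in $d$ dimensions, the norm $\|x_T\|$ concentrates around $\sqrt{d}$.

Fix: Rescale to the sphere of radius $\sqrt{d}$ for better interpolation.

$$x_T^{\text{sphere}} = \sqrt{d}\,\frac{x_T}{\|x_T\|}$$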
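A short sketch of the rescaling, applied to the linear midpoint of two random codes (stand-ins for real noise codes); `project_to_sphere` is a hypothetical name.

```python
import numpy as np

def project_to_sphere(x_T):
    """Rescale a noise code so its norm equals sqrt(d), d being the number of elements."""
    d = x_T.size
    return np.sqrt(d) * x_T / np.linalg.norm(x_T)

rng = np.random.default_rng(0)
a, b = rng.standard_normal(3072), rng.standard_normal(3072)   # stand-ins for two noise codes
mid = 0.5 * (a + b)                                            # linear midpoint has a reduced norm

print(np.linalg.norm(mid) / np.sqrt(mid.size))
print(np.linalg.norm(project_to_sphere(mid)) / np.sqrt(mid.size))
```

After projection the relative norm is 1 by construction, so the interpolated code has the magnitude the model expects at $t = T$.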

11.4 Comparison with Other Fast Samplers

| Method | Type | Steps | Inversion | Training Required |
| --- | --- | --- | --- | --- |
| DDIM | Discrete ODE | 10-100 | Yes | No (uses [[Diffusion Model\|DDPM]] model) |
| [[Diffusion Model\|DDPM]] (few-step) | [[Stochastic Differential Equation (SDE)\|SDE]] | 50-100 | No | No |
| [[DPM-Solver]] | High-order ODE | 10-20 | Possible | No |
| [[Flow Matching]] | ODE | 10-100 | Yes | Separate training |
| Consistency Models | Direct mapping | 1-8 | No | Separate training |

  • [[Diffusion Model]]
  • [[Probability Flow ODE]]
  • [[Stochastic Differential Equation (SDE)]]
  • [[Score Function]]
  • [[DPM-Solver]]
  • [[Langevin Dynamics]]
  • [[Flow Matching]]
  • [[Consistency Models]]
  • [[Wiener Process|Wiener Process]]
  • [[Markov Process]]
  • [[Neural ODE]]
  • [[Prompt-to-Prompt]]

Dataview Query

LIST
FROM #ddim OR #diffusion_model OR #fast_sampling
SORT file.ctime DESC

References

  • Paper: Denoising Diffusion Implicit Models (Song et al., 2021)
  • Paper: Denoising Diffusion Probabilistic Models (Ho et al., 2020)
  • Paper: Score-Based Generative Modeling through SDEs (Song et al., 2021)
  • Blog: What are Diffusion Models? - Lilian Weng
  • Blog: DDIM Explained - Papers with Code
  • GitHub: https://github.com/ermongroup/ddim
  • Course: CS236 Deep Generative Models (Stanford)