2026-06-30

Kolmogorov Equations

The Kolmogorov equations are a family of fundamental equations that govern the time evolution of transition probabilities in [[Markov Process|Markov processes]]. They form the mathematical backbone connecting discrete-state jump processes, continuous-state diffusions, and the PDE descriptions of stochastic dynamics — including the [[Fokker-Planck Equation|Fokker-Planck equation]] and the backward equation used in option pricing and hitting-time problems.

1. Core Concept

1.1 The Kolmogorov Triplet

Kolmogorov’s legacy in stochastic processes crystallizes into three interlocking equations:

Kolmogorov Equations — Three Pillars
═══════════════════════════════════════════════════════

  Chapman-Kolmogorov
  (Semigroup Property)
  P(s+t) = P(s) P(t)
        │
        ├── Differentiate w.r.t. forward time t
        │         ↓
        │   Kolmogorov Forward Equation
        │   ∂_t p = L* p
        │   (Fokker-Planck / Master Equation)
        │   "Given where I started, where will I be?"
        │
        └── Differentiate w.r.t. backward time s
                  ↓
            Kolmogorov Backward Equation
            ∂_t u + L u = 0
            "Given where I am now, what payoff do I expect at the end?"
═══════════════════════════════════════════════════════

Equation	Domain	What It Describes	Key Application
Chapman-Kolmogorov	Discrete + Continuous time	Semigroup property of transitions	Foundation of all Markov models
Forward (Fokker-Planck)	Continuous time	Evolution of probability density $p (x, t)$	Diffusion models, physics, population dynamics
Backward	Continuous time	Evolution of conditional expectations $u (x, t)$	Option pricing, hitting times, Feynman-Kac

1.2 Why They Matter Together

The three equations are not independent — they are different facets of the same underlying semigroup structure:

Chapman-Kolmogorov is the algebraic identity — it asserts that transitions compose
Forward equation is the differential identity — it describes the future of the density
Backward equation is the adjoint identity — it describes the past-dependence of expectations

In modern [[Diffusion Model|diffusion models]], all three appear:

Chapman-Kolmogorov: the Markov chain of the forward noising process
Forward (Fokker-Planck): the evolution of $p_{t} (x)$ along the forward SDE
Backward: the foundation for score matching via denoising

2. Chapman-Kolmogorov Equation

2.1 Discrete-Time Markov Chains (DTMC)

For a time-homogeneous discrete-time [[Markov Process|Markov chain]] with transition matrix $P = [p_{i j}]$ , the $n$-step transition probability is:

$$ p_{i j}^{(n)} = P (X_{m + n} = j \mid X_{m} = i) $$

The Chapman-Kolmogorov equation states that multi-step transitions compose via matrix multiplication:

$$ p_{i j}^{(m + n)} = \sum_{k} p_{i k}^{(m)} p_{k j}^{(n)} $$

Or in matrix form:

$$ P^{(m + n)} = P^{(m)} P^{(n)} $$

Interpretation: To go from $i$ to $j$ in $m + n$ steps, you must pass through some intermediate state $k$ at step $m$ — and the probability is the sum over all possible intermediate states.
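The composition rule can be checked numerically. The sketch below uses a made-up 3-state transition matrix (the numbers are purely illustrative) and compares the matrix form with the explicit sum over intermediate states.

```python
import numpy as np

# Hypothetical 3-state transition matrix; each row sums to 1.
P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
])

def n_step(P, n):
    """n-step transition matrix P^(n), computed as a matrix power."""
    return np.linalg.matrix_power(P, n)

def ck_holds(P, m, n):
    """Check P^(m+n) == P^(m) P^(n) up to floating-point tolerance."""
    return np.allclose(n_step(P, m + n), n_step(P, m) @ n_step(P, n))

def via_intermediate(P, i, j, m, n):
    """p_ij^(m+n) as an explicit sum over the intermediate state k at step m."""
    Pm, Pn = n_step(P, m), n_step(P, n)
    return sum(Pm[i, k] * Pn[k, j] for k in range(P.shape[0]))

print(ck_holds(P, 2, 3))
print(via_intermediate(P, 0, 2, 2, 3), n_step(P, 5)[0, 2])
```

The two printed probabilities should agree to rounding error, since both compute the same entry of $P^{(5)}$ .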

2.2 Continuous-Time Markov Chains (CTMC)

For a CTMC with generator matrix $Q$ , the transition probability $P (t) = [p_{i j} (t)]$ satisfies:

$$ P (s + t) = P (s) P (t), \qquad P (0) = I $$

This is the semigroup property: the transition operator $P (t)$ forms a one-parameter semigroup.

2.3 General State Space

For a Markov process on a general (possibly continuous) state space with transition kernel $P (t, x, A)$ :

$$ P (s + t, x, A) = \int_{S} P (t, y, A) \, P (s, x, d y) $$

This is the most general form: to transition from $x$ to any state in set $A$ over time $s + t$ , integrate over all possible intermediate positions $y$ at time $s$ .
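As a concrete instance on a continuous state space, the Gaussian transition density of standard Brownian motion should satisfy this integral identity. The following is a minimal sketch; the function names and the integration window are illustrative choices, not part of the note.

```python
import numpy as np

def heat_kernel(t, x, y):
    """Transition density of standard Brownian motion from x to y over time t."""
    return np.exp(-(y - x) ** 2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)

def ck_integral(s, t, x, z, half_width=12.0, n=20001):
    """Riemann-sum approximation of the integral over the intermediate point y."""
    y = np.linspace(x - half_width, x + half_width, n)
    dy = y[1] - y[0]
    return np.sum(heat_kernel(s, x, y) * heat_kernel(t, y, z)) * dy

s, t, x, z = 0.7, 1.3, 0.2, -0.5
lhs = heat_kernel(s + t, x, z)   # direct transition over time s + t
rhs = ck_integral(s, t, x, z)    # integrate over the position at time s
print(lhs, rhs)
```

The window is wide enough that the truncated Gaussian tails are negligible for these parameters; other parameter choices may need a wider window or a finer grid.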

2.4 Probabilistic Interpretation

The Chapman-Kolmogorov equation is more than a formula — it’s a consistency condition:

For a time-homogeneous chain, the one-step transition probabilities together with the initial distribution determine every finite-dimensional distribution of the process.

Step:  0 ────── m ────── m+n
        i ──→── k ──→── j
         \______________/
              m+n steps

Every path from $i$ to $j$ decomposes uniquely into a prefix and a suffix — and the probability factors accordingly.

3. Kolmogorov Forward Equation

3.1 CTMC Form (Master Equation)

Starting from $P (s + t) = P (s) P (t)$ , differentiating with respect to the forward increment $t$ at $t = 0$ , and using $P' (0) = Q$ (then renaming $s$ as $t$ ):

$$ \frac{d P (t)}{d t} = P (t) Q, \qquad P (0) = I $$

In component form:

$$ \frac{d p_{i j} (t)}{d t} = \sum_{k} p_{i k} (t) q_{k j} $$

Interpretation: The rate of change of $p_{i j}$ equals the net probability flux into state $j$ — transitions from occupied states minus transitions out.
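A two-state chain makes the master equation easy to test, because its solution is known in closed form. The sketch below integrates $d P / d t = P Q$ with a classical Runge-Kutta step; the rates `a` and `b` are hypothetical values chosen for illustration.

```python
import numpy as np

a, b = 2.0, 1.0                 # hypothetical rates: 0 -> 1 at rate a, 1 -> 0 at rate b
Q = np.array([[-a, a],
              [b, -b]])         # generator matrix, rows sum to zero

def integrate_forward(Q, t_end, n_steps=2000):
    """Integrate dP/dt = P Q from P(0) = I with classical RK4."""
    P = np.eye(Q.shape[0])
    h = t_end / n_steps
    for _ in range(n_steps):
        k1 = P @ Q
        k2 = (P + 0.5 * h * k1) @ Q
        k3 = (P + 0.5 * h * k2) @ Q
        k4 = (P + h * k3) @ Q
        P = P + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return P

def p00_exact(t):
    """Closed-form p_00(t) for the two-state chain."""
    return b / (a + b) + a / (a + b) * np.exp(-(a + b) * t)

P1 = integrate_forward(Q, 1.0)
print(P1[0, 0], p00_exact(1.0))
```

Each row of the computed matrix should still sum to one, which reflects conservation of probability under the forward equation.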

3.2 Diffusion Form (Fokker-Planck Equation)

For a [[Stochastic Differential Equation (SDE)|diffusion process]] $d X_{t} = μ (X_{t}) d t + σ (X_{t}) d W_{t}$ , the forward equation becomes the [[Fokker-Planck Equation]]:

$$ \frac{\partial p (x, t)}{\partial t} = - \frac{\partial}{\partial x} [μ (x) p (x, t)] + \frac{1}{2} \frac{\partial^{2}}{\partial x^{2}} [σ^{2} (x) p (x, t)] $$

Compact operator notation:

$$ \frac{\partial p}{\partial t} = L^{*} p $$

where $L^{*}$ is the adjoint of the infinitesimal generator.
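One way to build confidence in the equation is to plug a known solution into it. For the Ornstein-Uhlenbeck process $d X_{t} = - θ X_{t} d t + σ d W_{t}$ started at $x_{0}$ , the density is Gaussian with mean $x_{0} e^{- θ t}$ and variance $\frac{σ^{2}}{2 θ} (1 - e^{- 2 θ t})$ . The sketch below (parameter values and step sizes are illustrative assumptions) evaluates the Fokker-Planck residual by central finite differences.

```python
import numpy as np

theta, sigma, x0 = 1.5, 0.8, 1.0   # hypothetical OU parameters

def ou_density(x, t):
    """Gaussian transition density of the OU process started at x0."""
    m = x0 * np.exp(-theta * t)
    v = sigma**2 / (2 * theta) * (1 - np.exp(-2 * theta * t))
    return np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

def fp_residual(x, t, dx=1e-3, dt=1e-5):
    """d_t p - ( -d_x[mu p] + 0.5 sigma^2 d_xx p ), with mu(x) = -theta x."""
    dp_dt = (ou_density(x, t + dt) - ou_density(x, t - dt)) / (2 * dt)
    flux = lambda y: -theta * y * ou_density(y, t)            # mu(y) p(y, t)
    d_flux = (flux(x + dx) - flux(x - dx)) / (2 * dx)
    d2_p = (ou_density(x + dx, t) - 2 * ou_density(x, t)
            + ou_density(x - dx, t)) / dx**2
    return dp_dt - (-d_flux + 0.5 * sigma**2 * d2_p)

xs = np.linspace(-1.0, 2.0, 61)
max_residual = np.max(np.abs(fp_residual(xs, t=0.5)))
print(max_residual)
```

The residual is not exactly zero because of finite-difference truncation, but it should be several orders of magnitude smaller than the individual terms.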

[!NOTE] Unified View
Both the CTMC master equation and the Fokker-Planck equation are Kolmogorov forward equations — they differ only in the state space (discrete vs continuous) and the form of the generator.

3.3 Forward vs Fokker-Planck Terminology

Context	Equation Name	Generator
Discrete state (CTMC)	Kolmogorov Forward / Master Equation	$Q$ -matrix
Continuous state (diffusion)	Fokker-Planck Equation / Forward Kolmogorov	$L^{*} = - \partial_{x} μ + \frac{1}{2} \partial_{x}^{2} σ^{2}$
General Markov process	Kolmogorov Forward Equation	$A^{*}$ (adjoint of generator)

3.4 Role in Diffusion Models

In [[Diffusion Model|diffusion models]], the forward noising process is a Markov diffusion. Its density $p_{t} (x)$ satisfies the forward equation:

$$ \frac{\partial p_{t}}{\partial t} = - \nabla \cdot [f (t) x p_{t}] + \frac{1}{2} g (t)^{2} \nabla^{2} p_{t} $$

This equation determines how the data distribution $p_{0}$ evolves toward Gaussian noise $p_{T}$ , and it forms the theoretical basis for deriving the [[Probability Flow ODE]].

4. Kolmogorov Backward Equation

4.1 CTMC Form

Differentiating $P (s + t) = P (s) P (t)$ with respect to the backward increment $s$ at $s = 0$ , and using $P' (0) = Q$ :

$$ \frac{d P (t)}{d t} = Q P (t), \qquad P (0) = I $$

In component form:

$$ \frac{d p_{i j} (t)}{d t} = \sum_{k} q_{i k} p_{k j} (t) $$

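Key difference from forward: In the forward equation, $Q$ multiplies on the right and the sum runs over its first index (the state $k$ occupied just before the final jump into $j$ ); in the backward equation, $Q$ multiplies on the left and the sum runs over its second index (the state $k$ reached by the first jump out of $i$ ).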

4.2 Diffusion Form

For a diffusion, the backward equation governs the conditional expectation:

$$ u (x, t) = E [f (X_{T}) \mid X_{t} = x] $$

$$ \frac{\partial u}{\partial t} + μ (x) \frac{\partial u}{\partial x} + \frac{1}{2} σ^{2} (x) \frac{\partial^{2} u}{\partial x^{2}} = 0 $$

with terminal condition $u (x, T) = f (x)$ .

Compact operator form:

$$ \frac{\partial u}{\partial t} + L u = 0 $$

where $L$ is the infinitesimal generator (NOT its adjoint — this is the key difference from the forward equation).
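A short worked example may help here. Take driftless Brownian motion with constant volatility, $μ = 0$ and $σ (x) = σ$ , and the payoff $f (x) = x^{2}$ (both chosen purely for illustration). Since $X_{T} = x + σ (W_{T} - W_{t})$ given $X_{t} = x$ , the conditional expectation is explicit:

$$ u (x, t) = E [(x + σ (W_{T} - W_{t}))^{2}] = x^{2} + σ^{2} (T - t) $$

Substituting into the backward equation:

$$ \frac{\partial u}{\partial t} + \frac{1}{2} σ^{2} \frac{\partial^{2} u}{\partial x^{2}} = - σ^{2} + \frac{1}{2} σ^{2} \cdot 2 = 0, \qquad u (x, T) = x^{2} = f (x) $$

so both the PDE and the terminal condition are satisfied.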

4.3 Forward vs Backward — Side-by-Side

Aspect	Forward (Fokker-Planck)	Backward
Variable	$x$ (future state)	$x$ (state at the current time $t$ )
Unknown	Density $p (x, t ∣ x_{0}, 0)$	Expectation $u (x, t) = E [f (X_{T}) ∣ X_{t} = x]$
Operator	$L^{*}$ (adjoint)	$L$ (generator)
Initial/Boundary	$p (x, 0) = δ (x - x_{0})$ (initial)	$u (x, T) = f (x)$ (terminal)
Direction	Forward in time	Backward in time
Propagation	From $p_{t}$ to $p_{t + d t}$	From expectation at $t$ to $t - d t$
CTMC Form	$d P / d t = P Q$	$d P / d t = Q P$

[!WARNING] The Adjoint Distinction
The forward equation uses $L^{*}$ (adjoint), the backward uses $L$ . For self-adjoint generators (e.g., pure Brownian motion $L = \frac{1}{2} Δ$ ), forward and backward equations coincide — but this is the exception, not the rule.
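The adjoint relationship can be made concrete on a small CTMC: evolving the distribution forward and pairing it with a payoff should give the same number as evolving the payoff backward and pairing it with the initial distribution. This is a minimal sketch; the generator, initial distribution, and payoff vector are made-up values.

```python
import numpy as np

Q = np.array([[-1.0, 0.7, 0.3],
              [0.4, -0.9, 0.5],
              [0.2, 0.6, -0.8]])    # hypothetical generator, rows sum to zero
pi0 = np.array([1.0, 0.0, 0.0])    # initial distribution (row vector)
f = np.array([2.0, -1.0, 0.5])     # payoff / test function on the states

def rk4(rhs, y, t_end, n_steps=2000):
    """Classical RK4 integrator for dy/dt = rhs(y)."""
    h = t_end / n_steps
    for _ in range(n_steps):
        k1 = rhs(y)
        k2 = rhs(y + 0.5 * h * k1)
        k3 = rhs(y + 0.5 * h * k2)
        k4 = rhs(y + h * k3)
        y = y + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return y

T = 1.5
pi_T = rk4(lambda p: p @ Q, pi0, T)   # forward: d(pi)/dt = pi Q
u_0 = rk4(lambda u: Q @ u, f, T)      # backward: du/d(tau) = Q u, tau = time-to-go

forward_value = pi_T @ f    # E[f(X_T)] from the distribution at time T
backward_value = pi0 @ u_0  # the same expectation from u at the initial state
print(forward_value, backward_value)
```

The forward route must be rerun for each new initial distribution, while the backward route must be rerun for each new payoff, which is often what decides between them in practice.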

4.4 Feynman-Kac Extension

The backward equation generalizes to include a potential (discount/killing) term $r (x)$ :

$$ \frac{\partial u}{\partial t} + L u - r (x) u = 0, \qquad u (x, T) = f (x) $$

This has the stochastic representation:

$$ u (x, t) = E \left[ f (X_{T}) \exp \left( - \int_{t}^{T} r (X_{s}) d s \right) \,\middle|\, X_{t} = x \right] $$

Applications: Option pricing (Black-Scholes), exit problems, reaction-diffusion systems.
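The stochastic representation suggests a Monte Carlo estimator. The sketch below treats the simplest case, assumed here for illustration only: standard Brownian motion, a constant rate $r$ , and payoff $f (x) = x^{2}$ , for which the exact value is $e^{- r (T - t)} (x^{2} + T - t)$ . The function names, sample size, and seed are hypothetical choices.

```python
import numpy as np

def feynman_kac_mc(x, t, T, r, f, n_paths=200_000, seed=0):
    """Monte Carlo estimate of E[f(X_T) exp(-r (T - t)) | X_t = x]
    for standard Brownian motion and a constant rate r."""
    rng = np.random.default_rng(seed)
    x_T = x + np.sqrt(T - t) * rng.standard_normal(n_paths)
    return np.exp(-r * (T - t)) * np.mean(f(x_T))

def exact_value(x, t, T, r):
    """Closed form for f(x) = x^2: exp(-r (T - t)) * (x^2 + (T - t))."""
    return np.exp(-r * (T - t)) * (x**2 + (T - t))

estimate = feynman_kac_mc(1.0, 0.0, 1.0, 0.05, lambda y: y**2)
reference = exact_value(1.0, 0.0, 1.0, 0.05)
print(estimate, reference)
```

With a state-dependent $r (x)$ the discount factor would have to be accumulated along each simulated path instead of being pulled out of the expectation.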

5. The Generator and Semigroup Framework

5.1 Transition Semigroup

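The transition operators $\{P_{t}\}_{t \geq 0}$ form a semigroup on the space of bounded measurable functions; for Feller processes this semigroup is strongly continuous (a C₀-semigroup) on $C_{0}$ , the continuous functions vanishing at infinity:

$$ (P_{t} f) (x) = E [f (X_{t}) \mid X_{0} = x] $$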

Properties:

Identity: $P_{0} = I$
Semigroup: $P_{s + t} = P_{s} P_{t}$ (Chapman-Kolmogorov)
Continuity: $\lim_{t \to 0} P_{t} f = f$ (strong continuity)

5.2 Infinitesimal Generator

The generator $A$ is the derivative of the semigroup at zero, defined for those $f$ for which the limit exists:

$$ A f = \lim_{t \to 0} \frac{P_{t} f - f}{t} $$

Together with its domain, this single operator determines the semigroup and hence the dynamics of the process.

Process	Generator $A$
[[Wiener Process|Wiener Process]]	$\frac{1}{2} \frac{d^{2}}{d x^{2}}$
General diffusion	$μ (x) \frac{d}{d x} + \frac{σ^{2} (x)}{2} \frac{d^{2}}{d x^{2}}$
CTMC	$(Q f) (i) = \sum_{j} q_{i j} f (j)$
Jump-diffusion	$A_{diff} + λ \int [f (x + y) - f (x)] ν (d y)$
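The first row of the table can be checked directly from the definition of the generator. The sketch below approximates $(P_{t} f) (x) = E [f (x + \sqrt{t} Z)]$ for standard Brownian motion with Gauss-Hermite quadrature and forms the difference quotient at a small $t$ ; the test function $\cos$ , the evaluation point, and the step size are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Quadrature nodes and weights for the weight function exp(-z^2 / 2).
nodes, weights = hermegauss(60)

def P_t(f, x, t):
    """(P_t f)(x) = E[f(x + sqrt(t) Z)], Z standard normal, by quadrature."""
    return np.sum(weights * f(x + np.sqrt(t) * nodes)) / np.sqrt(2 * np.pi)

def generator_estimate(f, x, t=1e-4):
    """Difference quotient (P_t f - f) / t at a small time t."""
    return (P_t(f, x, t) - f(x)) / t

x = 0.3
approx = generator_estimate(np.cos, x)
exact = -0.5 * np.cos(x)     # (1/2) f''(x) for f = cos
print(approx, exact)
```

Making `t` much smaller eventually amplifies rounding error in the numerator, so the step size trades truncation error against cancellation.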

5.3 The Unified Kolmogorov Equations

From the semigroup property, BOTH Kolmogorov equations follow:

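Forward: $\frac{d}{d t} P_{t} f = P_{t} A f$ (generator applied before the semigroup)
→ By duality, acting on the density: $\frac{\partial p_{t}}{\partial t} = A^{*} p_{t}$

Backward: $\frac{d}{d t} P_{t} f = A P_{t} f$ (generator applied after the semigroup)
→ Acting on the test function: $u (x, t) = (P_{t} f) (x)$ satisfies $\partial_{t} u = A u$ with $u (x, 0) = f (x)$ , where $t$ is time-to-go; substituting $t \to T - t$ gives the terminal-value form $\partial_{t} u + A u = 0$ of Section 4.2

For $f$ in the domain of $A$ the two right-hand sides agree, because $A$ and $P_{t}$ commute there.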

               Chapman-Kolmogorov
               P_{s+t} = P_s P_t
                    │
       ┌────────────┴────────────┐
       │                         │
 Differentiate s             Differentiate t
 (backward)                  (forward)
       │                         │
       ▼                         ▼
Backward Equation          Forward Equation
d/dt P_t f = A P_t f      ∂p/∂t = A* p
(test functions)           (densities)

6. Connection to Diffusion Models

6.1 The Full Kolmogorov Picture

In [[Diffusion Model|diffusion models]], the Kolmogorov equations provide the complete mathematical scaffolding:

Component	Kolmogorov Equation	Role
Forward noising process	Chapman-Kolmogorov	$q (x_{t} ∣ x_{0}) = \int q (x_{t} ∣ x_{s}) q (x_{s} ∣ x_{0}) d x_{s}$
Marginal density evolution	Forward (Fokker-Planck)	$\partial_{t} p_{t} = - \nabla \cdot [f x p_{t}] + \frac{1}{2} g^{2} \nabla^{2} p_{t}$
Score matching	Backward	Links $\nabla_{x} \log p_{t} (x)$ to denoising objective
Probability Flow ODE	Forward (continuity form)	Same marginals as SDE
Reverse-time SDE	Backward (time-reversed)	$d x = [f x - g^{2} \nabla \log p_{t}] d t + g d {\bar{W}}_{t}$

6.2 From Chapman-Kolmogorov to DDPM

The DDPM forward process is defined by discrete Markov transitions:

$$ q (x_{t} \mid x_{t - 1}) = \mathcal{N} (x_{t}; \sqrt{1 - β_{t}} x_{t - 1}, β_{t} I) $$

The Chapman-Kolmogorov equation allows compressing multiple steps:

$$ q (x_{t} \mid x_{0}) = \int q (x_{t} \mid x_{t - 1}) \dots q (x_{1} \mid x_{0}) \, d x_{1} \dots d x_{t - 1} $$

Because a composition of linear-Gaussian transitions is again Gaussian, this simplifies to a single Gaussian, with $α_{s} = 1 - β_{s}$ and ${\bar{α}}_{t} = \prod_{s = 1}^{t} α_{s}$ :

$$ q (x_{t} \mid x_{0}) = \mathcal{N} (x_{t}; \sqrt{{\bar{α}}_{t}} x_{0}, (1 - {\bar{α}}_{t}) I) $$

This is the Chapman-Kolmogorov equation in action: multi-step transitions reduce to a single closed-form expression, making efficient training possible.
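The closed form can be recovered by composing the one-step Gaussians recursively. In the sketch below the mean coefficient and variance of $q (x_{t} ∣ x_{0})$ are tracked step by step and compared with $\sqrt{{\bar{α}}_{t}}$ and $1 - {\bar{α}}_{t}$ , where ${\bar{α}}_{t}$ denotes the cumulative product of $1 - β_{s}$ . The linear schedule is one common choice and is used here only as an example.

```python
import numpy as np

betas = np.linspace(1e-4, 0.02, 1000)   # illustrative linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)     # cumulative product of (1 - beta_s)

def compose_steps(betas):
    """Track q(x_t | x_0) = N(c_t x_0, v_t I) by composing one-step Gaussians."""
    c, v = 1.0, 0.0                      # at t = 0 the state equals x_0 exactly
    cs, vs = [], []
    for beta in betas:
        c = np.sqrt(1.0 - beta) * c      # the mean is scaled at every step
        v = (1.0 - beta) * v + beta      # old variance is scaled, new noise added
        cs.append(c)
        vs.append(v)
    return np.array(cs), np.array(vs)

cs, vs = compose_steps(betas)
print(np.allclose(cs, np.sqrt(alpha_bar)), np.allclose(vs, 1.0 - alpha_bar))
```

The variance recursion is the Gaussian special case of the Chapman-Kolmogorov integral: integrating out the intermediate state adds the scaled old variance to the fresh noise variance.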

6.3 Kolmogorov Equations in Score-Based Models

In score-based generative models ([[Score Function|Score SDE]] framework):

Forward equation describes how $p_{t}$ spreads from data → noise
Reverse-time SDE, whose derivation rests on the Kolmogorov equations, describes the dynamics used for sampling
The score function $\nabla_{x} \log p_{t} (x)$ appears in the reverse-time SDE as the drift correction term

The equivalence between SDE sampling and [[Probability Flow ODE]] sampling follows from the fact that both share the same Kolmogorov forward equation — they produce identical marginal distributions at all times.
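The reasoning behind this equivalence fits in two lines. Using the identity $\nabla^{2} p_{t} = \nabla \cdot (p_{t} \nabla \log p_{t})$ , the Fokker-Planck equation of the forward SDE with drift $f (t) x$ and diffusion coefficient $g (t)$ can be rewritten as a pure transport (continuity) equation:

$$ \frac{\partial p_{t}}{\partial t} = - \nabla \cdot [f (t) x \, p_{t}] + \frac{1}{2} g (t)^{2} \nabla^{2} p_{t} = - \nabla \cdot \left[ \left( f (t) x - \frac{1}{2} g (t)^{2} \nabla \log p_{t} \right) p_{t} \right] $$

The right-hand side is the continuity equation of the deterministic flow

$$ \frac{d x}{d t} = f (t) x - \frac{1}{2} g (t)^{2} \nabla_{x} \log p_{t} (x) $$

so particles transported by this ODE have the same marginals $p_{t}$ as the SDE, provided both start from the same initial distribution.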

7. Mathematical Properties

7.1 Uniqueness

Under standard regularity conditions (Lipschitz drift, bounded diffusion, non-degenerate noise):

The forward equation has a unique solution for a given initial density
The backward equation has a unique solution for a given terminal condition
Both solutions are $C^{1, 2}$ (continuously differentiable in $t$ , twice in $x$ )

7.2 Positivity and Conservation

Both the forward and backward equations preserve fundamental properties:

Forward: $\int p (x, t) d x = 1$ for all $t$ (conservation of probability)
Forward: $p (x, 0) \geq 0 \Rightarrow p (x, t) \geq 0$ (positivity preservation)
Backward: Maximum principle — $u (x, t)$ is bounded by its terminal values

7.3 Self-Adjoint Case

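When $L = L^{*}$ in $L^{2} (d x)$ (self-adjoint with respect to Lebesgue measure), forward and backward equations coincide. This occurs for:

[[Wiener Process|Wiener Process]]: $L = \frac{1}{2} \frac{d^{2}}{d x^{2}}$ (the Laplacian is self-adjoint)

A weaker symmetry holds for gradient (reversible) diffusions $d X_{t} = - \nabla V (X_{t}) d t + \sqrt{2} d W_{t}$ : the generator $L = Δ - \nabla V \cdot \nabla$ is self-adjoint in the weighted space $L^{2} (e^{- V} d x)$ , so $L^{*} (e^{- V} h) = e^{- V} L h$ and the two equations are related by the change of variables $p = e^{- V} h$ rather than being identical.

For diffusions with non-zero drift, the generator is not self-adjoint in $L^{2} (d x)$ , and forward and backward equations are genuinely different.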

7.4 Spectral Interpretation

The forward and backward equations share the same spectrum (eigenvalues of $L$ ), but different eigenfunctions:

Forward eigenfunctions = left eigenvectors of $L$
Backward eigenfunctions = right eigenvectors of $L$

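The spectral gap $λ_{1}$ (the smallest non-zero eigenvalue of $- L$ in absolute value) controls the exponential rate of convergence to the stationary distribution, and hence the mixing time of the process.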
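For a finite-state chain these statements reduce to linear algebra on the generator matrix. The sketch below uses a two-state chain with hypothetical rates `a` and `b`: the left eigenvector for eigenvalue zero gives the stationary distribution, the matching right eigenvector is constant, and the gap equals $a + b$ .

```python
import numpy as np

a, b = 2.0, 1.0                      # hypothetical jump rates
Q = np.array([[-a, a],
              [b, -b]])

right_vals, right_vecs = np.linalg.eig(Q)    # Q v = lambda v   (backward modes)
left_vals, left_vecs = np.linalg.eig(Q.T)    # w Q = lambda w   (forward modes)

# Stationary distribution: left eigenvector for eigenvalue 0, normalised to sum 1.
k0 = np.argmin(np.abs(left_vals))
pi = np.real(left_vecs[:, k0])
pi = pi / pi.sum()

# Right eigenvector for eigenvalue 0: proportional to the constant vector.
r0 = np.real(right_vecs[:, np.argmin(np.abs(right_vals))])

# Spectral gap: smallest non-zero decay rate |Re lambda|.
rates = np.sort(np.abs(np.real(right_vals)))
spectral_gap = rates[1]
print(pi, spectral_gap)
```

Both eigenvalue lists coincide up to ordering, which is the finite-dimensional version of the shared spectrum.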

8. Core Formula Cards

[!QUOTE] Chapman-Kolmogorov (General)
$P (s + t, x, A) = \int_{S} P (t, y, A) P (s, x, d y)$

[!QUOTE] Chapman-Kolmogorov (DTMC)
$P^{(m + n)} = P^{(m)} P^{(n)}$

[!QUOTE] Kolmogorov Forward (CTMC)
$\frac{d P (t)}{d t} = P (t) Q, P (0) = I$

[!QUOTE] Kolmogorov Backward (CTMC)
$\frac{d P (t)}{d t} = Q P (t), P (0) = I$

[!QUOTE] Kolmogorov Forward (Diffusion / Fokker-Planck)
$\frac{\partial p}{\partial t} = - \frac{\partial}{\partial x} [μ p] + \frac{1}{2} \frac{\partial^{2}}{\partial x^{2}} [σ^{2} p]$

[!QUOTE] Kolmogorov Backward (Diffusion)
$\frac{\partial u}{\partial t} + μ \frac{\partial u}{\partial x} + \frac{σ^{2}}{2} \frac{\partial^{2} u}{\partial x^{2}} = 0$

[!QUOTE] Infinitesimal Generator
$A f (x) = \lim_{t \to 0} \frac{E [f (X_{t}) ∣ X_{0} = x] - f (x)}{t}$

[!QUOTE] Semigroup Property
$P_{s + t} = P_{s} P_{t}, P_{0} = I$

9. Summary

Aspect	Description
What they describe	Time evolution of transition probabilities in Markov processes
Chapman-Kolmogorov	Algebraic consistency: transitions compose via semigroup property
Forward (Fokker-Planck)	How probability density flows forward in time
Backward	How conditional expectations evolve backward from terminal conditions
Unifying framework	All three derive from the semigroup property of Markov transitions
Key distinction	Forward uses adjoint generator $L^{*}$ , backward uses generator $L$
Role in diffusion models	Forward = density evolution; Backward = reverse process / score matching foundation
Named after	Andrey Kolmogorov (1931) — who also axiomatized probability theory

Kolmogorov’s equations are the mathematical thread that connects the algebraic (Chapman-Kolmogorov), probabilistic (forward density), and analytic (backward expectations) descriptions of stochastic dynamics — a unification that remains at the heart of modern generative modeling.

Dataview Query

LIST
FROM #kolmogorov_equations OR #stochastic_process OR #markov_process
SORT file.ctime DESC

Related Concepts

[[Markov Process]]
[[Fokker-Planck Equation]]
[[Stochastic Differential Equation (SDE)]]
[[Wiener Process|Wiener Process]]
[[Diffusion Model]]
[[Score Function]]
[[Probability Flow ODE]]
[[Martingale]]
[[Langevin Dynamics]]
[[Feynman-Kac Formula]]

References

Paper: Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung (Kolmogorov, 1931 — foundational paper)
Book: Continuous Martingales and Brownian Motion (Revuz & Yor, Chapter III: Markov Processes)
Book: Stochastic Differential Equations (Øksendal, Chapter 8: Diffusions and Kolmogorov Equations)
Book: Markov Processes: Characterization and Convergence (Ethier & Kurtz)
Book: Probability and Random Processes (Grimmett & Stirzaker)
Book: Diffusion Models: A Comprehensive Guide (Yang Song, Chapter on Score SDE)
Wikipedia: Chapman-Kolmogorov equation, Fokker-Planck equation, C₀-semigroup, Infinitesimal generator

ChungMG

Mathematics & Machine Learning
