2026-06-30

Markov Process

A Markov Process is a stochastic process that satisfies the Markov Property: the future state depends only on the current state, not on the past history. This “memoryless” property makes Markov processes fundamental in modeling random systems across physics, finance, biology, and machine learning.

1. Core Concept

1.1 Markov Property

Intuitive Understanding:

“Given the present, the future is independent of the past.”

Formal Definition:

A stochastic process $\{X_{t}\}_{t \in T}$ satisfies the Markov property if:

P (X_{t + n} = x ∣ X_{t} = x_{t}, X_{t - 1} = x_{t - 1}, \dots, X_{0} = x_{0}) = P (X_{t + n} = x ∣ X_{t} = x_{t})

for all $n \geq 1$ and all states $x, x_{t}, x_{t - 1}, \dots, x_{0}$.

[!NOTE] Key Insight
The Markov property means that the current state contains all the information needed to predict the future. Past history provides no additional predictive power once we know the present.

1.2 State Space and Time

Markov processes are classified by:

State Space:

Discrete: Finite or countable states (Markov Chain)
Continuous: Real-valued or vector-valued states

Time Parameter:

Discrete-time: $t \in \{0, 1, 2, \dots\}$
Continuous-time: $t \in [0, \infty)$

1.3 Classification

| Type | Time | State Space | Example |
|---|---|---|---|
| Discrete-time Markov Chain (DTMC) | Discrete | Discrete | Random walk, board games |
| Continuous-time Markov Chain (CTMC) | Continuous | Discrete | Queueing systems, Poisson process |
| Markov Process (general) | Continuous | Continuous | [[Wiener Process]] |
| Hidden Markov Model (HMM) | Discrete | Discrete (hidden) | Speech recognition, NLP |

2. Discrete-Time Markov Chains (DTMC)

2.1 Definition

A discrete-time Markov chain is characterized by:

State space: $S = \{s_{1}, s_{2}, \dots, s_{N}\}$
Initial distribution: $π_{0} (i) = P (X_{0} = s_{i})$
Transition matrix: $P = [p_{i j}]$ , where $p_{i j} = P (X_{n + 1} = s_{j} ∣ X_{n} = s_{i})$

Properties of transition matrix:

$p_{i j} \geq 0$ for all $i, j$
$\sum_{j} p_{i j} = 1$ for all $i$ (row stochastic)

2.2 Chapman-Kolmogorov Equation

The $n$-step transition probability is:

p_{i j}^{(n)} = P (X_{m + n} = s_{j} ∣ X_{m} = s_{i}) = (P^{n})_{i j}

Matrix form:

P^{(m + n)} = P^{(m)} P^{(n)}
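Here is a minimal numerical check of the Chapman-Kolmogorov equation, assuming NumPy is available. The 3-state matrix `P` is an arbitrary illustrative example, and `n_step_matrix` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical 3-state transition matrix (rows sum to 1), chosen only for illustration
P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
])

def n_step_matrix(P, n):
    """n-step transition matrix P^(n), which equals the matrix power P^n."""
    return np.linalg.matrix_power(P, n)

m, n = 2, 3
lhs = n_step_matrix(P, m + n)                      # P^(m+n)
rhs = n_step_matrix(P, m) @ n_step_matrix(P, n)    # P^(m) P^(n)
print(np.allclose(lhs, rhs))                       # True
print(lhs.sum(axis=1))                             # rows of P^(m+n) still sum to 1
```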

2.3 State Classification

Reachability:

State $j$ is reachable from state $i$ if $p_{i j}^{(n)} > 0$ for some $n \geq 0$

Communication:

States $i$ and $j$ communicate ($i \leftrightarrow j$) if each is reachable from the other

Classification:

| Property | Definition | Meaning |
|---|---|---|
| Recurrent | Returns to the state with probability 1 | Visited infinitely often |
| Transient | Return probability < 1 | Visited only finitely many times; may never return |
| Absorbing | $p_{i i} = 1$ | Once entered, never left |
| Ergodic | Positive recurrent + aperiodic | Converges to stationary distribution |

2.4 Stationary Distribution

A distribution $π$ is stationary if:

π P = π

or equivalently:

π_{j} = \sum_{i} π_{i} p_{i j} \forall j
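One common way to compute $π$ for a finite, irreducible chain is to solve the balance equations together with the normalization $\sum_{i} π_{i} = 1$. The sketch below assumes NumPy, and `stationary_distribution` is a hypothetical helper. It stacks both sets of equations into one least-squares system. The 2-state example has the known answer $π = (5/6, 1/6)$.

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 for a finite, irreducible chain.

    Stacks the balance equations (P^T - I) pi = 0 with the normalization
    row [1, ..., 1] pi = 1 and solves the (consistent) system by least squares.
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.concatenate([np.zeros(n), [1.0]])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
pi = stationary_distribution(P)
print(pi)   # approximately [0.8333, 0.1667]
```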

Existence and Uniqueness Theorem:

For an irreducible, aperiodic, positive recurrent Markov chain:

Unique stationary distribution $π$ exists
$\lim_{n \to \infty} p_{i j}^{(n)} = π_{j}$ for all $i, j$
$π_{j} = \frac{1}{E [T_{j}]}$, where $T_{j}$ is the return time to state $j$ starting from $j$
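The identity $π_{j} = 1 / E [T_{j}]$ can be checked on a small chain. First-step analysis gives the mean hitting times of $j$ as the solution of a linear system, and one more step gives the mean return time.

This is a sketch under stated assumptions:

- the chain is finite, irreducible and aperiodic;
- NumPy is available;
- the matrix and the helper `mean_return_time` are illustrative;
- $π$ is read off a row of $P^{n}$ for large $n$, as the limit statement above allows.

```python
import numpy as np

def mean_return_time(P, j):
    """E[T_j | X_0 = j] for a finite irreducible chain, via first-step analysis.

    For i != j, the mean hitting times h_i = E[T_j | X_0 = i] solve
    h_i = 1 + sum_{k != j} p_ik h_k; the return time is 1 + sum_{k != j} p_jk h_k.
    """
    P = np.asarray(P, dtype=float)
    others = [k for k in range(P.shape[0]) if k != j]
    sub = P[np.ix_(others, others)]                  # transitions that avoid j
    h = np.linalg.solve(np.eye(len(others)) - sub, np.ones(len(others)))
    return 1.0 + P[j, others] @ h

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

# Stationary distribution from the limit of P^n (every row converges to pi)
pi = np.linalg.matrix_power(P, 200)[0]
for j in range(3):
    print(j, pi[j], 1.0 / mean_return_time(P, j))   # the two columns agree
```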

2.5 Example: Random Walk

Simple random walk on the integers (symmetric when $p = 0.5$):

X_{n + 1} = \begin{cases} X_{n} + 1 & \text{with probability } p \\ X_{n} - 1 & \text{with probability } 1 - p \end{cases}

Transition probabilities:

$p_{i, i + 1} = p$
$p_{i, i - 1} = 1 - p$

Properties:

If $p = 0.5$: Null recurrent (returns with probability 1, but infinite expected return time)
If $p \neq 0.5$: Transient (drifts to $+\infty$ if $p > 0.5$ and to $-\infty$ if $p < 0.5$)
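The recurrence/transience split can be seen through the expected number of returns to the origin, $\sum_{n \geq 1} p_{00}^{(2n)}$ with $p_{00}^{(2n)} = \binom{2n}{n} p^{n} (1 - p)^{n}$. The state is recurrent exactly when this series diverges.

The sketch below uses plain Python, and `expected_returns` is a hypothetical name. It computes partial sums through the ratio of consecutive terms, which avoids huge binomial coefficients. For $p \neq 0.5$ the series converges to $1 / |2p - 1| - 1$.

```python
def expected_returns(p, n_terms):
    """Partial sum sum_{n=1}^{n_terms} p_00^(2n) for the simple random walk.

    Uses p_00^(2n) = C(2n, n) p^n (1-p)^n, updated through the ratio
    p_00^(2n) / p_00^(2n-2) = 2 (2n - 1) / n * p (1 - p).
    """
    pq = p * (1.0 - p)
    term, total = 1.0, 0.0           # term starts at p_00^(0) = 1
    for n in range(1, n_terms + 1):
        term *= 2.0 * (2 * n - 1) / n * pq
        total += term
    return total

# Transient case: the series converges to 1 / |2p - 1| - 1 = 4 for p = 0.6
print(expected_returns(0.6, 10_000))
# Symmetric case: partial sums keep growing (roughly like sqrt(n)), so the walk is recurrent
print(expected_returns(0.5, 10_000), expected_returns(0.5, 40_000))
```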

3. Continuous-Time Markov Chains (CTMC)

3.1 Definition

A continuous-time Markov chain is characterized by:

State space: $S = \{s_{1}, s_{2}, \dots\}$
Generator matrix (Q-matrix): $Q = [q_{i j}]$

Properties:

$q_{i j} \geq 0$ for $i \neq j$ (transition rates)
$q_{i i} = - \sum_{j \neq i} q_{i j}$ (non-positive diagonal)
$\sum_{j} q_{i j} = 0$ for all $i$

3.2 Transition Probability Matrix

The transition probability matrix $P (t)$ satisfies the Kolmogorov equations:

Forward equation:

\frac{d P (t)}{d t} = P (t) Q

Backward equation:

\frac{d P (t)}{d t} = Q P (t)

Solution (for a finite state space):

P (t) = e^{Q t} = \sum_{n = 0}^{\infty} \frac{(Q t)^{n}}{n!}
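For a finite state space, $P(t) = e^{Qt}$ can be computed directly. The sketch below is an illustrative scaling-and-squaring Taylor approximation; in practice `scipy.linalg.expm` is the usual tool. It checks the result against the known closed form for a two-state chain. It also checks both Kolmogorov equations by central differences. The rates `a`, `b` and the helper `expm_taylor` are assumptions for illustration.

```python
import numpy as np

def expm_taylor(M, n_terms=30):
    """Matrix exponential e^M via scaling and squaring plus a truncated Taylor series."""
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    norm = np.linalg.norm(M, ord=np.inf)
    # Choose s so that ||M / 2^s|| <= 1/2, where the Taylor series converges fast
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    A = M / 2 ** s
    result, term = np.eye(n), np.eye(n)
    for k in range(1, n_terms + 1):
        term = term @ A / k              # A^k / k!
        result = result + term
    for _ in range(s):
        result = result @ result         # e^M = (e^{M / 2^s})^(2^s)
    return result

# Two-state chain: rate a for 0 -> 1, rate b for 1 -> 0 (illustrative values)
a, b, t = 2.0, 1.0, 0.7
Q = np.array([[-a, a],
              [b, -b]])
P_t = expm_taylor(Q * t)

# Closed form for this chain: P_00(t) = b/(a+b) + a/(a+b) e^{-(a+b) t}
p00_exact = b / (a + b) + a / (a + b) * np.exp(-(a + b) * t)
print(P_t[0, 0], p00_exact)

# Central-difference check of the forward (P Q) and backward (Q P) equations
h = 1e-6
dP = (expm_taylor(Q * (t + h)) - expm_taylor(Q * (t - h))) / (2 * h)
print(np.allclose(dP, P_t @ Q, atol=1e-6), np.allclose(dP, Q @ P_t, atol=1e-6))
```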

3.3 Holding Times

Key property: The time spent in state $i$ before transitioning is exponentially distributed:

T_{i} \sim \mathrm{Exp} (λ_{i}), \quad λ_{i} = - q_{i i}

Memoryless property:

P (T_{i} > s + t ∣ T_{i} > s) = P (T_{i} > t)

3.4 Embedded Markov Chain

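For a non-absorbing state $i$ (i.e. $q_{i i} < 0$), the jump chain (embedded DTMC) has transition probabilities below. An absorbing state has ${\tilde{p}}_{i i} = 1$.

{\tilde{p}}_{i j} = \begin{cases} \frac{q_{i j}}{- q_{i i}} & \text{if } i \neq j \\ 0 & \text{if } i = j \end{cases}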
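Splitting a generator into holding rates and the jump chain is mechanical. Here is a minimal NumPy sketch that also handles absorbing states; the helper `jump_chain` and the example `Q` are hypothetical.

```python
import numpy as np

def jump_chain(Q):
    """Embedded (jump) chain and holding rates of a finite CTMC generator Q.

    Returns (P_jump, rates), where rates[i] = -q_ii is the Exp rate of the
    holding time in state i. Absorbing states (q_ii = 0) get P_jump[i, i] = 1.
    """
    Q = np.asarray(Q, dtype=float)
    rates = -np.diag(Q)
    P_jump = np.zeros_like(Q)
    for i, r in enumerate(rates):
        if r > 0:
            P_jump[i] = Q[i] / r          # q_ij / (-q_ii) off the diagonal
            P_jump[i, i] = 0.0
        else:
            P_jump[i, i] = 1.0            # absorbing state stays put
    return P_jump, rates

Q = np.array([[-3.0, 2.0, 1.0],
              [0.5, -0.5, 0.0],
              [0.0, 0.0, 0.0]])         # state 2 is absorbing
P_jump, rates = jump_chain(Q)
print(P_jump)   # rows: [0, 2/3, 1/3], [1, 0, 0], [0, 0, 1]
print(rates)    # holding times Exp(3) and Exp(0.5); state 2 is never left
```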

3.5 Stationary Distribution

For an irreducible CTMC, a stationary distribution $π$ (normalized so that $\sum_{i} π_{i} = 1$) satisfies:

π Q = 0

or:

\sum_{i} π_{i} q_{i j} = 0 \forall j
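As in discrete time, $π Q = 0$ can be solved together with $\sum_{i} π_{i} = 1$. Here is a sketch for a finite chain, assuming NumPy; `ctmc_stationary` is a hypothetical helper.

For the two-state chain with rates $a$ ($0 \to 1$) and $b$ ($1 \to 0$), balance of flows $π_{0} a = π_{1} b$ gives $π = (b, a) / (a + b)$.

```python
import numpy as np

def ctmc_stationary(Q):
    """Solve pi Q = 0 with sum(pi) = 1 for a finite irreducible CTMC."""
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones((1, n))])      # balance equations + normalization
    b = np.concatenate([np.zeros(n), [1.0]])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Two-state chain: rate a from 0 to 1, rate b from 1 to 0 (illustrative values)
a, b = 2.0, 1.0
Q = np.array([[-a, a],
              [b, -b]])
pi = ctmc_stationary(Q)
print(pi)   # approximately [1/3, 2/3] = [b, a] / (a + b)
```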

4. General Markov Processes

4.1 Transition Kernel

For a general (e.g. continuous) state space, the transition probabilities of a time-homogeneous process are described by a kernel:

P (t, x, A) = P (X_{s + t} \in A ∣ X_{s} = x)

where $A$ is a measurable set.

4.2 Chapman-Kolmogorov Equation (General)

P (s + t, x, A) = \int_{S} P (t, y, A) P (s, x, d y)
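For the [[Wiener Process|Wiener Process]], whose transition density is given in section 5.1, the general Chapman-Kolmogorov equation reduces to a convolution of Gaussian densities:

$p(s + t, x, z) = \int p(s, x, y) p(t, y, z) dy$

Below is a numerical check with a simple Riemann sum, assuming NumPy. The grid and parameter values are illustrative.

```python
import numpy as np

def heat_kernel(t, x, y):
    """Wiener transition density p(t, x, y) = exp(-(y - x)^2 / (2t)) / sqrt(2 pi t)."""
    return np.exp(-(y - x) ** 2 / (2 * t)) / np.sqrt(2 * np.pi * t)

s, t, x, z = 0.4, 0.9, 0.3, -0.5
y = np.linspace(-15, 15, 30_001)     # grid for the intermediate state y
dy = y[1] - y[0]

# Density form: p(s + t, x, z) = integral of p(s, x, y) p(t, y, z) over y
lhs = heat_kernel(s + t, x, z)
rhs = np.sum(heat_kernel(s, x, y) * heat_kernel(t, y, z)) * dy
print(lhs, rhs)   # agree to many decimal places
```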

4.3 Feller Property

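A Markov process is Feller if its transition semigroup has two properties:

- It maps $C_{0}(S)$ (continuous functions vanishing at infinity) into itself.
- It is strongly continuous at $t = 0$.

This ensures:

Well-behaved sample paths (a càdlàg version exists, and the process is strong Markov)
Existence of an infinitesimal generator
Connection to PDEs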

4.4 Infinitesimal Generator

The generator $A$ of a Markov process acts on functions $f$:

A f (x) = \lim_{t \to 0} \frac{E [f (X_{t}) ∣ X_{0} = x] - f (x)}{t}

Examples:

[[Wiener Process|Wiener Process]]: $A = \frac{1}{2} \frac{d^{2}}{d x^{2}}$
[[Stochastic Differential Equation (SDE)|SDE]] $d X_{t} = μ (X_{t}) d t + σ (X_{t}) d W_{t}$ :

A = μ (x) \frac{d}{d x} + \frac{1}{2} σ^{2} (x) \frac{d^{2}}{d x^{2}}
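The limit that defines the generator can be checked numerically for Brownian motion with constant drift and volatility, $X_{t} = x + μ t + σ W_{t}$. This is the SDE above with constant $μ$ and $σ$.

This sketch makes three choices:

- it takes $f = \sin$;
- it evaluates the expectation with NumPy's Gauss-Hermite quadrature (`numpy.polynomial.hermite_e.hermegauss`) rather than Monte Carlo;
- it uses a small fixed $t$.

The helper names are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def generator_estimate(f, x, mu, sigma, t=1e-5, deg=40):
    """(E[f(X_t) | X_0 = x] - f(x)) / t for X_t = x + mu t + sigma W_t.

    W_t has the law of sqrt(t) Z with Z standard normal; the expectation over Z
    uses probabilists' Gauss-Hermite quadrature.
    """
    z, w = hermegauss(deg)              # nodes/weights for the weight exp(-z^2 / 2)
    w = w / np.sqrt(2 * np.pi)          # normalize so the weights sum to 1
    expectation = np.sum(w * f(x + mu * t + sigma * np.sqrt(t) * z))
    return (expectation - f(x)) / t

def generator_exact(x, mu, sigma):
    """A f(x) = mu f'(x) + 0.5 sigma^2 f''(x) for f = sin."""
    return mu * np.cos(x) - 0.5 * sigma ** 2 * np.sin(x)

x = 0.3
# Wiener process (mu = 0, sigma = 1): A f = 0.5 f''
print(generator_estimate(np.sin, x, mu=0.0, sigma=1.0), generator_exact(x, 0.0, 1.0))
print(generator_estimate(np.sin, x, mu=0.5, sigma=1.2), generator_exact(x, 0.5, 1.2))
```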

5. Connection to Other Concepts

5.1 Markov Process and [[Wiener Process|Wiener Process]]

The [[Wiener Process|Wiener Process]] is a Markov process with:

Continuous state space: $\mathbb{R}$
Continuous time: $t \in [0, \infty)$
Transition density: $p (t, x, y) = \frac{1}{\sqrt{2 π t}} e^{- (y - x)^{2} / (2 t)}$

Markov property:

P (W_{t + s} \in A ∣ W_{u}, 0 \leq u \leq t) = P (W_{t + s} \in A ∣ W_{t})

5.2 Markov Process and [[Stochastic Differential Equation (SDE)|SDE]]

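Solutions to SDEs are Markov processes, for example when $μ$ and $σ$ are Lipschitz, which guarantees a unique strong solution:

d X_{t} = μ (X_{t}, t) d t + σ (X_{t}, t) d W_{t}

Conditions for Markov property:

Coefficients $μ, σ$ depend only on the current state and time $(X_{t}, t)$, not on the path history (explicit time dependence makes the process time-inhomogeneous)
[[Wiener Process|Wiener Process]] increments are independent of the past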
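The Markov structure is visible in the standard Euler-Maruyama discretization. Each update uses only the current value and a fresh Gaussian increment that is independent of the past.

This is a minimal sketch, assuming NumPy. The Ornstein-Uhlenbeck example and its parameters are illustrative. Its stationary variance $σ^{2} / (2θ)$ gives a rough check; Euler-Maruyama adds a small $O(dt)$ bias.

```python
import numpy as np

def euler_maruyama(mu, sigma, x0, T, n_steps, n_paths, rng):
    """Simulate dX = mu(X, t) dt + sigma(X, t) dW with the Euler-Maruyama scheme.

    Each update uses only the current state (the Markov property) and an
    independent increment dW ~ N(0, dt).
    """
    dt = T / n_steps
    x = np.full(n_paths, x0, dtype=float)
    for k in range(n_steps):
        t = k * dt
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        x = x + mu(x, t) * dt + sigma(x, t) * dW
    return x

# Ornstein-Uhlenbeck example: dX = -theta X dt + s dW, stationary variance s^2 / (2 theta)
theta, s = 1.0, 1.0
rng = np.random.default_rng(0)
x_T = euler_maruyama(lambda x, t: -theta * x,
                     lambda x, t: s * np.ones_like(x),
                     x0=0.0, T=5.0, n_steps=500, n_paths=20_000, rng=rng)
print(x_T.var(), s ** 2 / (2 * theta))   # close, up to discretization and sampling error
```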

5.3 Markov Process and [[Martingale]]

Relationship:

Not all Markov processes are [[Martingale|martingales]]
Not all martingales are Markov processes

Example: [[Wiener Process|Wiener Process]] is both Markov and [[Martingale|martingale]].

[[Martingale]] condition for Markov process:

An integrable Markov process $X_{t}$ is a [[Martingale|martingale]] (with respect to its natural filtration) iff:

E [X_{t + s} ∣ X_{t}] = X_{t}

5.4 Markov Process and [[Diffusion Model|Diffusion Models]]

Forward process in diffusion models:

q (x_{t} ∣ x_{t - 1}) = \mathcal{N} (x_{t}; \sqrt{1 - β_{t}} x_{t - 1}, β_{t} I)

This is a Markov chain, so the joint distribution of the forward process factorizes:

q (x_{0 : T}) = q (x_{0}) \prod_{t = 1}^{T} q (x_{t} ∣ x_{t - 1})

Key implications:

Efficient sampling: only need current state
Tractable posterior: $q (x_{t - 1} ∣ x_{t}, x_{0})$ is Gaussian with closed-form mean and variance
Training objective decomposes over time steps
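The forward chain can be simulated step by step, each step using only $x_{t - 1}$. Iterating the Gaussian transition gives a closed-form marginal:

$q (x_{t} ∣ x_{0}) = \mathcal{N} (\sqrt{\bar{α}_{t}} x_{0}, (1 - \bar{α}_{t}) I)$, with $\bar{α}_{t} = \prod_{s \leq t} (1 - β_{s})$

The sketch checks this empirically for a scalar $x_{0}$. NumPy is assumed. The linear $β$ schedule is an illustrative choice, not a prescribed one.

```python
import numpy as np

def forward_diffusion(x0, betas, rng):
    """Run the forward chain x_t = sqrt(1 - beta_t) x_{t-1} + sqrt(beta_t) eps_t."""
    x = np.array(x0, dtype=float)
    for beta in betas:
        eps = rng.standard_normal(x.shape)
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * eps   # depends only on x_{t-1}
    return x

# Hypothetical linear noise schedule, for illustration only
betas = np.linspace(1e-4, 0.02, 200)
alpha_bar = np.prod(1.0 - betas)        # alpha_bar_T = prod_t (1 - beta_t)

rng = np.random.default_rng(0)
x0 = np.full(50_000, 2.0)               # many independent copies of the scalar x_0 = 2
x_T = forward_diffusion(x0, betas, rng)

# Closed-form marginal: q(x_T | x_0) = N(sqrt(alpha_bar) x_0, 1 - alpha_bar)
print(x_T.mean(), np.sqrt(alpha_bar) * 2.0)
print(x_T.var(), 1.0 - alpha_bar)
```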

6. Important Theorems

6.1 Strong Markov Property

Definition: A process has the strong Markov property if the Markov property holds at stopping times $τ$ (on the event $\{τ < \infty\}$):

P (X_{τ + t} \in A ∣ \mathcal{F}_{τ}) = P (X_{τ + t} \in A ∣ X_{τ})

Theorem: The [[Wiener Process|Wiener Process]] and solutions to SDEs with Lipschitz coefficients satisfy the strong Markov property.

6.2 Ergodic Theorem

For an ergodic Markov chain with stationary distribution $π$ and any bounded function $f$:

\frac{1}{n} \sum_{k = 0}^{n - 1} f (X_{k}) \overset{a . s .}{\to} \sum_{i} π_{i} f (s_{i})

Interpretation: Time average converges to ensemble average.
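Here is a simulation sketch of the ergodic theorem for a small chain, assuming NumPy. The transition matrix `P` and the function values `f` are illustrative. $π$ is taken from a row of $P^{n}$ for large $n$. Sampling uses inverse-transform lookup on each row's cumulative sums.

```python
import numpy as np

# Hypothetical 3-state ergodic chain and test function f (illustration only)
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
f = np.array([1.0, -2.0, 5.0])          # f(s_i) for each state i

# Ensemble average sum_i pi_i f(s_i), with pi read off a row of P^n for large n
pi = np.linalg.matrix_power(P, 200)[0]
ensemble_avg = pi @ f

# Time average (1/n) sum_{k<n} f(X_k) along one path started in state 0
rng = np.random.default_rng(0)
n = 100_000
cdf = np.cumsum(P, axis=1)              # row-wise CDFs for inverse-transform sampling
u = rng.random(n)
x, total = 0, 0.0
for k in range(n):
    total += f[x]
    # Next state: first index whose CDF exceeds u; min() guards against round-off in cdf[-1]
    x = min(int(np.searchsorted(cdf[x], u[k], side="right")), len(f) - 1)
time_avg = total / n

print(time_avg, ensemble_avg)           # close for large n
```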

6.3 Convergence Rate

For a finite, irreducible, aperiodic Markov chain:

∥ P^{n} (x, \cdot) - π ∥_{TV} \leq C λ^{n}

where:

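$λ \in [0, 1)$: when $P$ is reversible (more generally, diagonalizable), $λ$ can be taken as the second-largest eigenvalue modulus (SLEM) of $P$; in general, any $λ$ strictly between the SLEM and 1 works
$C$ is a constant depending on the chain
$∥ \cdot ∥_{TV}$ is total variation distance

Mixing time: $t_{mix} (ϵ) = \min \{n : \max_{x} ∥ P^{n} (x, \cdot) - π ∥_{TV} \leq ϵ\}$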
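For a small chain, the SLEM and the decay of the total variation distance can be computed exactly from matrix powers. This is a sketch under illustrative assumptions: NumPy is assumed, and the matrix `P`, the threshold `eps = 0.25`, and the helpers `worst_tv` and `mixing_time` are hypothetical.

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

# Second-largest eigenvalue modulus (SLEM)
moduli = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
slem = moduli[1]

# Stationary distribution from a row of P^n for large n
pi = np.linalg.matrix_power(P, 500)[0]

def worst_tv(P, pi, n):
    """max_x ||P^n(x, .) - pi||_TV, with TV distance = half the L1 distance."""
    Pn = np.linalg.matrix_power(P, n)
    return 0.5 * np.abs(Pn - pi).sum(axis=1).max()

def mixing_time(P, pi, eps=0.25, n_max=10_000):
    """Smallest n with max_x ||P^n(x, .) - pi||_TV <= eps."""
    for n in range(n_max + 1):
        if worst_tv(P, pi, n) <= eps:
            return n
    raise ValueError("chain did not mix within n_max steps")

print(slem)
print([round(worst_tv(P, pi, n), 6) for n in range(1, 6)])   # decays roughly like slem**n
print(mixing_time(P, pi))
```

For this particular matrix, the two non-unit eigenvalues are complex conjugates whose product is $\det P = 0.13$, so the SLEM is $\sqrt{0.13} \approx 0.361$.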

6.4 Reversibility

A Markov chain is reversible if it satisfies detailed balance:

π_{i} p_{i j} = π_{j} p_{j i} \forall i, j

Implications:

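Easier to compute stationary distribution: any $π$ satisfying detailed balance is stationary, since summing over $i$ gives $\sum_{i} π_{i} p_{i j} = π_{j} \sum_{i} p_{j i} = π_{j}$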
Connection to physics (detailed balance in thermodynamics)
Used in MCMC algorithms ([[Metropolis-Hastings]])
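An irreducible finite birth-death chain (moves only to neighboring states) is reversible. Its $π$ therefore follows from the running product $π_{i + 1} / π_{i} = p_{i, i + 1} / p_{i + 1, i}$ without solving a linear system. The sketch below uses hypothetical birth and death probabilities and assumes NumPy.

```python
import numpy as np

def birth_death_matrix(up, down):
    """Transition matrix of a finite birth-death chain on {0, ..., N}.

    up[i] = P(i -> i+1) and down[i] = P(i+1 -> i); the remaining probability
    mass stays at the current state.
    """
    N = len(up)
    P = np.zeros((N + 1, N + 1))
    for i in range(N):
        P[i, i + 1] = up[i]
        P[i + 1, i] = down[i]
    P[np.arange(N + 1), np.arange(N + 1)] = 1.0 - P.sum(axis=1)
    return P

def detailed_balance_pi(up, down):
    """Solve pi_i p_{i,i+1} = pi_{i+1} p_{i+1,i} by a running product, then normalize."""
    weights = np.concatenate([[1.0], np.cumprod(np.asarray(up) / np.asarray(down))])
    return weights / weights.sum()

up = [0.3, 0.4, 0.2, 0.5]       # hypothetical birth probabilities
down = [0.2, 0.3, 0.4, 0.1]     # hypothetical death probabilities
P = birth_death_matrix(up, down)
pi = detailed_balance_pi(up, down)

flows = pi[:, None] * P          # flows[i, j] = pi_i p_ij
print(np.allclose(flows, flows.T))   # detailed balance holds
print(np.allclose(pi @ P, pi))       # hence pi is stationary
```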

7. Applications

7.1 [[Diffusion Model|Diffusion Models]]

Forward process (Markov chain):

x_{0} \to x_{1} \to x_{2} \to \dots \to x_{T}

Key properties:

Each step depends only on previous step
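Enables closed-form computation of $q (x_{t} ∣ x_{0}) = \mathcal{N} (x_{t}; \sqrt{\bar{α}_{t}} x_{0}, (1 - \bar{α}_{t}) I)$ with $\bar{α}_{t} = \prod_{s = 1}^{t} (1 - β_{s})$
Reverse process is also Markov; its transitions are approximately Gaussian when the $β_{t}$ are small and are learned by the model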

7.2 [[Martingale|Martingale]] Theory

Markov martingales:

[[Wiener Process|Wiener Process]]
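Geometric [[Wiener Process|Brownian motion]] with zero drift ($d S_{t} = σ S_{t} d W_{t}$)
Solutions to driftless SDEs $d X_{t} = σ (X_{t}) d W_{t}$ (under integrability conditions; in general these are only local martingales)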

Applications:

Financial modeling (stock prices)
Optimal stopping problems
Stochastic control

7.3 Queueing Theory

M/M/1 queue:

Arrivals: Poisson process (rate $λ$)
Service: Exponential service times (rate $μ$)
State: Number of customers in system

Transition rates:

$q_{i, i + 1} = λ$ (arrival)
$q_{i, i - 1} = μ$ for $i \geq 1$ (departure)

Stationary distribution (if $λ < μ$):

π_{n} = (1 - ρ) ρ^{n}, ρ = \frac{λ}{μ}
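The geometric formula can be checked numerically by truncating the state space at a large capacity $K$ and solving $π Q = 0$. Truncation is an assumption of the sketch: arrivals are blocked at $K$, which changes $π$ only by a factor $1 / (1 - ρ^{K + 1})$. NumPy is assumed, and the rates are illustrative.

```python
import numpy as np

def mm1_truncated_generator(lam, mu, K):
    """Generator of an M/M/1 queue truncated at K customers (arrivals blocked at K)."""
    Q = np.zeros((K + 1, K + 1))
    for n in range(K):
        Q[n, n + 1] = lam        # arrival: n -> n + 1
        Q[n + 1, n] = mu         # departure: n + 1 -> n
    Q[np.arange(K + 1), np.arange(K + 1)] = -Q.sum(axis=1)
    return Q

lam, mu, K = 1.0, 2.0, 200      # rho = 0.5; K large so the truncation error is negligible
Q = mm1_truncated_generator(lam, mu, K)

# Solve pi Q = 0 together with sum(pi) = 1
A = np.vstack([Q.T, np.ones((1, K + 1))])
b = np.concatenate([np.zeros(K + 1), [1.0]])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

rho = lam / mu
geometric = (1 - rho) * rho ** np.arange(K + 1)
print(np.abs(pi - geometric).max())   # tiny: matches (1 - rho) rho^n
```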

7.4 Population Dynamics

Birth-death process:

State: Population size
Transitions: Birth (+1) or death (-1)

Applications:

Ecology (species population)
Epidemiology (disease spread)
Genetics (allele frequencies)

7.5 Reinforcement Learning

Markov Decision Process (MDP):

State space $S$
Action space $A$
Transition: $P (s^{'} ∣ s, a)$
Reward: $R (s, a, s^{'})$

Bellman optimality equation (discount factor $γ \in [0, 1)$):

V (s) = \max_{a} \sum_{s^{'}} P (s^{'} ∣ s, a) [R (s, a, s^{'}) + γ V (s^{'})]
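Value iteration applies the right-hand side of the Bellman equation repeatedly until it stops changing. Because the update is a $γ$-contraction, it converges for $γ < 1$.

This sketch assumes NumPy; the tiny 2-state, 2-action MDP and the helper `value_iteration` are hypothetical. It uses the reward $R (s, a, s^{'})$ from the MDP definition, so the expected reward is taken inside the sum over $s^{'}$.

```python
import numpy as np

def value_iteration(P, R, gamma, tol=1e-10):
    """Iterate V <- max_a sum_{s'} P[a, s, s'] (R[a, s, s'] + gamma V[s']).

    P, R: arrays of shape (n_actions, n_states, n_states), gamma in [0, 1).
    Returns the (approximate) optimal value function and a greedy policy.
    """
    V = np.zeros(P.shape[1])
    while True:
        Qsa = np.einsum("ast,ast->as", P, R + gamma * V[None, None, :])  # Q[a, s]
        V_new = Qsa.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Qsa.argmax(axis=0)
        V = V_new

# Hypothetical MDP. Action 0 ("stay"): 0 -> 0 with reward 0, 1 -> 1 with reward 2.
# Action 1 ("go"): 0 -> 1 with reward 1, 1 -> 1 with reward 2.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [0.0, 1.0]]])
R = np.array([[[0.0, 0.0], [0.0, 2.0]],
              [[0.0, 1.0], [0.0, 2.0]]])
V, policy = value_iteration(P, R, gamma=0.9)
print(V, policy)   # V close to [19, 20]; policy[0] == 1 ("go")
```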

7.6 PageRank Algorithm

Web surfing as Markov chain:

States: Web pages
Transitions: Following links
Stationary distribution: PageRank scores

Transition matrix:

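P = α W + (1 - α) \frac{1}{N} \mathbf{1} \mathbf{1}^{\top}

In this formula:

- $W$ is the row-stochastic link matrix, with $W_{i j} = 1 / \text{outdeg}(i)$ if page $i$ links to page $j$;
- $\mathbf{1} \mathbf{1}^{\top}$ is the $N \times N$ all-ones matrix;
- $N$ is the number of pages;
- $α \approx 0.85$ is the damping factor.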
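PageRank is usually computed by power iteration, $r \leftarrow r P$. It converges at rate $α$ or faster, because the damping mixes in the uniform distribution.

This is a minimal sketch, assuming NumPy. The `pagerank` helper and the example graph are illustrative. Treating pages without out-links as linking to every page is also an assumption, not part of the definition above.

```python
import numpy as np

def pagerank(links, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for the stationary distribution of P = alpha W + (1 - alpha)/N * ones.

    links[i] lists the pages that page i links to. Pages with no out-links are
    treated as linking to every page (a simplifying assumption).
    """
    N = len(links)
    W = np.zeros((N, N))
    for i, outs in enumerate(links):
        if outs:
            W[i, outs] = 1.0 / len(outs)     # row-stochastic link matrix
        else:
            W[i, :] = 1.0 / N                # dangling page (assumption)
    P = alpha * W + (1 - alpha) / N          # adds (1 - alpha)/N to every entry
    r = np.full(N, 1.0 / N)
    for _ in range(max_iter):
        r_new = r @ P                        # one step of the chain: r <- r P
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r

links = [[1, 2], [2], [0], [0, 2]]   # hypothetical 4-page web graph
r = pagerank(links)
print(r, r.sum())   # page 3 has no in-links, so its score is (1 - alpha) / N = 0.0375
```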

8. Simulation and Sampling

8.1 DTMC Simulation

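```python
import numpy as np

def simulate_markov_chain(P, x0, n_steps, rng=None):
    """
    P: Transition matrix (each row sums to 1)
    x0: Initial state (integer index)
    n_steps: Number of steps
    rng: Optional numpy.random.Generator, for reproducible runs
    Returns the list of visited states (length n_steps + 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    P = np.asarray(P, dtype=float)
    states = [x0]
    current_state = x0

    for _ in range(n_steps):
        # Sample next state from row `current_state` of P
        probs = P[current_state]
        next_state = int(rng.choice(len(probs), p=probs))
        states.append(next_state)
        current_state = next_state

    return states
```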

8.2 CTMC Simulation (Gillespie Algorithm)

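```python
import numpy as np

def simulate_ctmc(Q, x0, T_max, rng=None):
    """
    Q: Generator matrix (off-diagonal rates >= 0, rows sum to 0)
    x0: Initial state (integer index)
    T_max: Maximum time
    Returns a list of (jump_time, new_state) pairs, starting with (0.0, x0).
    """
    rng = np.random.default_rng() if rng is None else rng
    Q = np.asarray(Q, dtype=float)
    trajectory = [(0.0, x0)]
    current_state = x0
    current_time = 0.0

    while current_time < T_max:
        # Total exit rate of the current state
        rate = -Q[current_state, current_state]
        if rate <= 0:
            break  # absorbing state: no further jumps

        # Sample holding time ~ Exp(rate); numpy uses scale = 1 / rate
        holding_time = rng.exponential(1 / rate)

        # Update time
        current_time += holding_time
        if current_time > T_max:
            break

        # Sample next state from the jump chain q_ij / (-q_ii)
        rates = Q[current_state].copy()
        rates[current_state] = 0.0
        probs = rates / rates.sum()

        next_state = int(rng.choice(len(probs), p=probs))
        trajectory.append((current_time, next_state))
        current_state = next_state

    return trajectory
```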

8.3 MCMC Sampling

[[Metropolis-Hastings]] algorithm:

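```python
import numpy as np

def metropolis_hastings(target, propose, proposal_density, x0, n_samples, rng=None):
    """
    target(x): unnormalized target density pi(x), positive at x0
    propose(x, rng): draws a candidate x' ~ q(. | x)
    proposal_density(x, x_new): evaluates q(x_new | x)
    Returns the chain (n_samples + 1 states, including x0).
    """
    rng = np.random.default_rng() if rng is None else rng
    samples = [x0]
    current = x0

    for _ in range(n_samples):
        # Propose new state
        proposed = propose(current, rng)

        # Acceptance probability: pi(x') q(x | x') / (pi(x) q(x' | x))
        alpha = min(1.0, target(proposed) * proposal_density(proposed, current) /
                         (target(current) * proposal_density(current, proposed)))

        # Accept or reject
        if rng.random() < alpha:
            current = proposed

        samples.append(current)

    return samples
```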

9. Advanced Topics

9.1 Hidden Markov Models (HMM)

Structure:

Hidden states: $X_{t}$ (Markov chain)
Observations: $Y_{t}$ (depends only on $X_{t}$)

Components:

Transition matrix: $A = [a_{i j}]$
Emission probabilities: $B = [b_{j} (y)]$
Initial distribution: $π$

Algorithms:

Forward algorithm: Compute $P (Y_{1 : T})$
Viterbi algorithm: Find most likely state sequence
Baum-Welch algorithm: Learn parameters (EM)
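Here is a compact sketch of the forward and Viterbi algorithms for a discrete HMM, assuming NumPy. Matrices are indexed as $A [i, j] = a_{i j}$ and $B [j, y] = b_{j} (y)$, and the model parameters are hypothetical.

For a sequence this short, both results can be checked against brute-force enumeration of all hidden paths. Viterbi works in log space to avoid underflow on long sequences.

```python
import itertools
import numpy as np

def forward(A, B, pi, obs):
    """Forward algorithm: P(Y_{1:T} = obs).

    alpha_t(j) = P(Y_1..Y_t, X_t = j), updated as alpha_t = (alpha_{t-1} A) * B[:, y_t].
    """
    alpha = pi * B[:, obs[0]]
    for y in obs[1:]:
        alpha = (alpha @ A) * B[:, y]
    return alpha.sum()

def viterbi(A, B, pi, obs):
    """Most likely hidden state sequence, computed in log space."""
    logA, logB = np.log(A), np.log(B)
    delta = np.log(pi) + logB[:, obs[0]]
    back = []
    for y in obs[1:]:
        scores = delta[:, None] + logA          # scores[i, j]: best path ending i -> j
        back.append(scores.argmax(axis=0))      # best predecessor of each j
        delta = scores.max(axis=0) + logB[:, y]
    path = [int(delta.argmax())]
    for bp in reversed(back):                   # backtrack through the predecessors
        path.append(int(bp[path[-1]]))
    return path[::-1]

# Hypothetical 2-state, 3-symbol HMM
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])   # B[j, y] = b_j(y)
pi = np.array([0.6, 0.4])
obs = [0, 1, 2, 2]

def joint(states):
    """P(X_{1:T} = states, Y_{1:T} = obs) by direct multiplication."""
    p = pi[states[0]] * B[states[0], obs[0]]
    for prev, cur, y in zip(states, states[1:], obs[1:]):
        p *= A[prev, cur] * B[cur, y]
    return p

all_paths = list(itertools.product(range(2), repeat=len(obs)))
print(forward(A, B, pi, obs), sum(joint(s) for s in all_paths))   # equal
print(viterbi(A, B, pi, obs), list(max(all_paths, key=joint)))    # equal
```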

9.2 Markov Random Fields (MRF)

Definition: Undirected graphical model with Markov property:

P (X_{i} ∣ X_{V \setminus \{i\}}) = P (X_{i} ∣ X_{N (i)})

where $N (i)$ is the neighborhood of node $i$.
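The Ising model has $P(x) \propto \exp(β \sum_{(i, j) \in E} x_{i} x_{j})$ with $x_{i} \in \{-1, +1\}$. There, the Markov property above says the full conditional of one spin reduces to:

$P (X_{i} = +1 ∣ X_{N (i)}) = 1 / (1 + e^{-2 β \sum_{j \in N (i)} x_{j}})$

The sketch below checks this by brute force over all $2^{9}$ configurations. It assumes NumPy and uses a 3x3 grid with free boundaries and $β = 0.4$; both choices are illustrative.

```python
import itertools
import numpy as np

L, beta = 3, 0.4

def site(r, c):
    return r * L + c

# Nearest-neighbour edges of an L x L grid with free boundaries
edges = [(site(r, c), site(r, c + 1)) for r in range(L) for c in range(L - 1)]
edges += [(site(r, c), site(r + 1, c)) for r in range(L - 1) for c in range(L)]
neighbors = {k: [] for k in range(L * L)}
for i, j in edges:
    neighbors[i].append(j)
    neighbors[j].append(i)

def weight(x):
    """Unnormalized Ising probability exp(beta * sum over edges of x_i x_j)."""
    return np.exp(beta * sum(x[i] * x[j] for i, j in edges))

def full_conditional(x, i):
    """P(X_i = +1 | all other spins), computed from the joint distribution."""
    plus, minus = list(x), list(x)
    plus[i], minus[i] = 1, -1
    return weight(plus) / (weight(plus) + weight(minus))

def local_conditional(x, i):
    """P(X_i = +1 | X_N(i)), using only the neighbours of i."""
    field = sum(x[j] for j in neighbors[i])
    return 1.0 / (1.0 + np.exp(-2.0 * beta * field))

max_gap = max(abs(full_conditional(x, i) - local_conditional(x, i))
              for x in itertools.product([-1, 1], repeat=L * L)
              for i in range(L * L))
print(max_gap)   # only floating-point round-off remains
```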

Applications:

Image segmentation
Spatial statistics
Physics (Ising model)

9.3 Jump-Diffusion Processes

Combine continuous diffusion with discrete jumps:

d X_{t} = μ (X_{t}) d t + σ (X_{t}) d W_{t} + d J_{t}

where $J_{t}$ is a jump process (e.g., a compound [[Poisson Process]]).

Applications:

Financial modeling (asset prices with sudden crashes)
Neuroscience (neuron spiking)
Queueing systems
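Here is a simulation sketch of a jump-diffusion. It makes simplifying assumptions: constant $μ$ and $σ$, and a compound Poisson jump part with normally distributed jump sizes. NumPy is assumed, and the parameter values are illustrative.

Within each step, the number of jumps is Poisson with mean $λ \, dt$. The expected value $E[X_{T}] = x_{0} + μ T + λ T m$, with $m$ the mean jump size, gives a rough check.

```python
import numpy as np

def simulate_jump_diffusion(x0, mu, sigma, lam, jump_mean, jump_std, T, n_steps, n_paths, rng):
    """Euler scheme for dX = mu dt + sigma dW + dJ, with J a compound Poisson process.

    In each step of length dt, N ~ Poisson(lam dt) jumps occur; their sizes are
    i.i.d. N(jump_mean, jump_std^2), so their sum is N(N jump_mean, N jump_std^2).
    """
    dt = T / n_steps
    x = np.full(n_paths, x0, dtype=float)
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        n_jumps = rng.poisson(lam * dt, size=n_paths)
        dJ = rng.normal(n_jumps * jump_mean, jump_std * np.sqrt(n_jumps))
        x = x + mu * dt + sigma * dW + dJ
    return x

rng = np.random.default_rng(0)
x_T = simulate_jump_diffusion(x0=0.0, mu=0.1, sigma=0.2, lam=3.0, jump_mean=-0.5,
                              jump_std=0.1, T=1.0, n_steps=250, n_paths=20_000, rng=rng)
# E[X_T] = x0 + mu T + lam T jump_mean = 0.1 - 1.5 = -1.4
print(x_T.mean(), 0.0 + 0.1 * 1.0 + 3.0 * 1.0 * (-0.5))
```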

9.4 Feynman-Kac Formula

Connects Markov processes to PDEs (stated here without a potential/discounting term):

u (x, t) = E [\int_{0}^{t} f (X_{s}, t - s) d s + g (X_{t}) ∣ X_{0} = x]

solves the PDE:

\frac{\partial u}{\partial t} = A u + f, \quad u (x, 0) = g (x)

where $A$ is the generator of the time-homogeneous process $X_{t}$.
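Here is a Monte Carlo sanity check of the formula for a standard [[Wiener Process|Wiener Process]], with generator $A = \frac{1}{2} \frac{d^{2}}{d x^{2}}$, and time-independent $f = g = \cos$.

Since $E [\cos (x + W_{s})] = \cos (x) e^{-s / 2}$, the formula gives $u (x, t) = \cos (x) (2 - e^{-t / 2})$. This satisfies $u_{t} = \frac{1}{2} u_{x x} + \cos x$ with $u (x, 0) = \cos x$.

NumPy is assumed; the path counts and step sizes are illustrative.

```python
import numpy as np

def u_monte_carlo(x, t, f, g, n_paths=20_000, n_steps=200, seed=0):
    """Monte Carlo estimate of u(x, t) = E[int_0^t f(X_s) ds + g(X_t) | X_0 = x]
    for a standard Wiener process X (time-independent f)."""
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    X = np.full(n_paths, x, dtype=float)
    integral = np.zeros(n_paths)
    for _ in range(n_steps):
        integral += f(X) * dt                            # left Riemann sum of int f(X_s) ds
        X = X + rng.normal(0.0, np.sqrt(dt), size=n_paths)
    return np.mean(integral + g(X))

def u_exact(x, t):
    # f = g = cos: E[cos(x + W_s)] = cos(x) e^{-s/2}, so u = cos(x) (2 - e^{-t/2})
    return np.cos(x) * (2.0 - np.exp(-t / 2))

print(u_monte_carlo(0.4, 1.0, np.cos, np.cos), u_exact(0.4, 1.0))

# Finite-difference check that u_exact solves u_t = A u + f with A = 0.5 d^2/dx^2
x, t, h = 0.4, 1.0, 1e-4
u_t = (u_exact(x, t + h) - u_exact(x, t - h)) / (2 * h)
u_xx = (u_exact(x + h, t) - 2 * u_exact(x, t) + u_exact(x - h, t)) / h ** 2
print(u_t, 0.5 * u_xx + np.cos(x))
```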

10. Core Formula Cards

[!QUOTE] Markov Property
$P (X_{t + n} = x ∣ X_{t}, X_{t - 1}, \dots, X_{0}) = P (X_{t + n} = x ∣ X_{t})$

[!QUOTE] Chapman-Kolmogorov Equation (DTMC)
$P^{(m + n)} = P^{(m)} P^{(n)}$

[!QUOTE] Stationary Distribution (DTMC)
$π P = π$

[!QUOTE] Kolmogorov Forward Equation (CTMC)
$\frac{d P (t)}{d t} = P (t) Q$

[!QUOTE] Stationary Distribution (CTMC)
$π Q = 0$

[!QUOTE] Infinitesimal Generator
$A f (x) = \lim_{t \to 0} \frac{E [f (X_{t}) ∣ X_{0} = x] - f (x)}{t}$

[!QUOTE] Detailed Balance (Reversibility)
$π_{i} p_{i j} = π_{j} p_{j i}$

Related Concepts

[[Stochastic Process]]
[[Wiener Process|Wiener Process]]
[[Stochastic Differential Equation (SDE)]]
[[Martingale]]
[[Poisson Process]]
[[Diffusion Model]]
[[Kolmogorov Equations]]
[[Langevin Dynamics]]
[[Metropolis-Hastings]]
[[Hidden Markov Model]]
[[Markov Decision Process]]

Dataview Query

```dataview
LIST
FROM #markov_process OR #stochastic_process OR #probability_theory
SORT file.ctime DESC
```

