Markov Process

A Markov Process is a stochastic process that satisfies the Markov Property: the future state depends only on the current state, not on the past history. This “memoryless” property makes Markov processes fundamental in modeling random systems across physics, finance, biology, and machine learning.


1. Core Concept

1.1 Markov Property

Intuitive Understanding:

“Given the present, the future is independent of the past.”

Formal Definition:

A stochastic process {Xt}tT satisfies the Markov property if:

P(Xt+n=xXt=xt,Xt1=xt1,,X0=x0)=P(Xt+n=xXt=xt)

for all n1 and all states x,xt,xt1,,x0 .

[!NOTE] Key Insight
The Markov property means that the current state contains all the information needed to predict the future. Past history provides no additional predictive power once we know the present.

1.2 State Space and Time

Markov processes are classified by:

State Space:

  • Discrete: Finite or countable states (Markov Chain)
  • Continuous: Real-valued or vector-valued states

Time Parameter:

  • Discrete-time: t{0,1,2,}
  • Continuous-time: t[0,)

1.3 Classification

Type Time State Space Example
Discrete-time Markov Chain (DTMC) Discrete Discrete Random walk, board games
Continuous-time Markov Chain (CTMC) Continuous Discrete Queueing systems, Poisson process
Markov Process (general) Continuous Continuous [[Wiener Process
Hidden Markov Model (HMM) Discrete Discrete (hidden) Speech recognition, NLP

2. Discrete-Time Markov Chains (DTMC)

2.1 Definition

A discrete-time Markov chain is characterized by:

  1. State space: S={s1,s2,,sN}
  2. Initial distribution: π0(i)=P(X0=si)
  3. Transition matrix: P=[pij] , where pij=P(Xn+1=sjXn=si)

Properties of transition matrix:

  • pij0 for all i,j
  • jpij=1 for all i (row stochastic)

2.2 Chapman-Kolmogorov Equation

The n -step transition probability is:

pij(n)=P(Xm+n=sjXm=si)=(Pn)ij

Matrix form:

P(m+n)=P(m)P(n)

2.3 State Classification

Reachability:

  • State j is reachable from state i if pij(n)>0 for some n0

Communication:

  • States i and j communicate ( ij ) if i is reachable from j and vice versa

Classification:

Property Definition Meaning
Recurrent Return to state with probability 1 Will visit again
Transient Return probability < 1 May never return
Absorbing pii=1 Once entered, never leaves
Ergodic Recurrent + aperiodic + positive recurrent Converges to stationary distribution

2.4 Stationary Distribution

A distribution π is stationary if:

πP=π

or equivalently:

πj=iπipijj

Existence and Uniqueness Theorem:

For an irreducible, aperiodic, positive recurrent Markov chain:

  • Unique stationary distribution π exists
  • limnpij(n)=πj for all i,j
  • πj=1E[Tj] , where Tj is return time to state j

2.5 Example: Random Walk

Simple symmetric random walk on integers:

Xn+1={Xn+1with probability pXn1with probability 1p

Transition probabilities:

  • pi,i+1=p
  • pi,i1=1p

Properties:

  • If p=0.5 : Null recurrent (returns with probability 1, but infinite expected return time)
  • If p0.5 : Transient (drifts to ± )

3. Continuous-Time Markov Chains (CTMC)

3.1 Definition

A continuous-time Markov chain is characterized by:

  1. State space: S={s1,s2,}
  2. Generator matrix (Q-matrix): Q=[qij]

Properties:

  • qij0 for ij (transition rates)
  • qii=jiqij (negative diagonal)
  • jqij=0 for all i

3.2 Transition Probability Matrix

The transition probability matrix P(t) satisfies the Kolmogorov equations:

Forward equation:

dP(t)dt=P(t)Q

Backward equation:

dP(t)dt=QP(t)

Solution:

P(t)=eQt=n=0(Qt)nn!

3.3 Holding Times

Key property: The time spent in state i before transitioning is exponentially distributed:

TiExp(λi),where λi=qii

Memoryless property:

P(Ti>s+tTi>s)=P(Ti>t)

3.4 Embedded Markov Chain

The jump chain (embedded DTMC) has transition matrix:

p~ij={qijqiiif ij0if i=j

3.5 Stationary Distribution

For an irreducible CTMC, stationary distribution π satisfies:

πQ=0

or:

iπiqij=0j

4. General Markov Processes

4.1 Transition Kernel

For continuous state space, the transition probability is described by a kernel:

P(t,x,A)=P(Xs+tAXs=x)

where A is a measurable set.

4.2 Chapman-Kolmogorov Equation (General)

P(s+t,x,A)=SP(t,y,A)P(s,x,dy)

4.3 Feller Property

A Markov process is Feller if its transition semigroup maps continuous functions to continuous functions. This ensures:

  • Well-behaved sample paths
  • Existence of generator
  • Connection to PDEs

4.4 Infinitesimal Generator

The generator A of a Markov process acts on functions f :

Af(x)=limt0E[f(Xt)X0=x]f(x)t

Examples:

  1. [[Wiener Process|Wiener Process]]: A=12d2dx2

  2. [[Stochastic Differential Equation (SDE)|SDE]] dXt=μ(Xt)dt+σ(Xt)dWt :

A=μ(x)ddx+12σ2(x)d2dx2

5. Connection to Other Concepts

5.1 Markov Process and [[Wiener Process|Wiener Process]]

The [[Wiener Process|Wiener Process]] is a Markov process with:

  • Continuous state space: R
  • Continuous time: t[0,)
  • Transition density: p(t,x,y)=12πte(yx)2/(2t)

Markov property:

P(Wt+sAWu,0ut)=P(Wt+sAWt)

5.2 Markov Process and [[Stochastic Differential Equation (SDE)|SDE]]

Solutions to SDEs are Markov processes:

dXt=μ(Xt,t)dt+σ(Xt,t)dWt

Conditions for Markov property:

  • Coefficients μ,σ depend only on current state Xt (not history)
  • [[Wiener Process|Wiener Process]] increments are independent

5.3 Markov Process and [[Martingale]]

Relationship:

  • Not all Markov processes are [[Martingale|martingales]]
  • Not all martingales are Markov processes

Example: [[Wiener Process|Wiener Process]] is both Markov and [[Martingale|martingale]].

[[Martingale]] condition for Markov process:

A Markov process Xt is a [[Martingale|martingale]] iff:

E[Xt+sXt]=Xt

5.4 Markov Process and [[Diffusion Model|Diffusion Models]]

Forward process in diffusion models:

q(xtxt1)=N(xt;1βtxt1,βtI)

This is a Markov chain! The full forward process:

q(x0:T)=q(x0)t=1Tq(xtxt1)

Key implications:

  • Efficient sampling: only need current state
  • Tractable posterior: q(xt1xt,x0) simplifies
  • Training objective decomposes over time steps

6. Important Theorems

6.1 Strong Markov Property

Definition: A process has the strong Markov property if the Markov property holds at stopping times τ :

P(Xτ+tAFτ)=P(Xτ+tAXτ)

Theorem: [[Wiener Process|Wiener Process]] and solutions to SDEs satisfy the strong Markov property.

6.2 Ergodic Theorem

For an ergodic Markov chain with stationary distribution π :

1nk=0n1f(Xk)a.s.iπif(si)

Interpretation: Time average converges to ensemble average.

6.3 Convergence Rate

For a finite, irreducible, aperiodic Markov chain:

Pn(x,)πTVCλn

where:

  • λ(0,1) is the second-largest eigenvalue modulus
  • C is a constant
  • TV is total variation distance

Mixing time: tmix(ϵ)=min{n:Pn(x,)πTVϵ}

6.4 Reversibility

A Markov chain is reversible if it satisfies detailed balance:

πipij=πjpjii,j

Implications:

  • Easier to compute stationary distribution
  • Connection to physics (detailed balance in thermodynamics)
  • Used in MCMC algorithms ([[Metropolis-Hastings]])

7. Applications

7.1 [[Diffusion Model|Diffusion Models]]

Forward process (Markov chain):

x0x1x2xT

Key properties:

  • Each step depends only on previous step
  • Enables tractable computation of q(xtx0)
  • Reverse process also Markov (approximately)

7.2 [[Martingale|Martingale]] Theory

Markov martingales:

  • [[Wiener Process|Wiener Process]]
  • Geometric [[Wiener Process|Brownian motion]]
  • Solutions to certain SDEs

Applications:

  • Financial modeling (stock prices)
  • Optimal stopping problems
  • Stochastic control

7.3 Queueing Theory

M/M/1 queue:

  • Arrivals: Poisson process (rate λ )
  • Service: Exponential (rate μ )
  • State: Number of customers in system

Transition rates:

  • qi,i+1=λ (arrival)
  • qi,i1=μ (departure)

Stationary distribution (if λ<μ ):

πn=(1ρ)ρn,ρ=λμ

7.4 Population Dynamics

Birth-death process:

  • State: Population size
  • Transitions: Birth (+1) or death (-1)

Applications:

  • Ecology (species population)
  • Epidemiology (disease spread)
  • Genetics (allele frequencies)

7.5 Reinforcement Learning

Markov Decision Process (MDP):

  • State space S
  • Action space A
  • Transition: P(ss,a)
  • Reward: R(s,a,s)

Bellman equation:

V(s)=maxa[R(s,a)+γsP(ss,a)V(s)]

7.6 PageRank Algorithm

Web surfing as Markov chain:

  • States: Web pages
  • Transitions: Following links
  • Stationary distribution: PageRank scores

Transition matrix:

P=αW+(1α)1N1

where W is link matrix, α0.85 is damping factor.


8. Simulation and Sampling

8.1 DTMC Simulation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
def simulate_markov_chain(P, x0, n_steps):
"""
P: Transition matrix
x0: Initial state
n_steps: Number of steps
"""
states = [x0]
current_state = x0

for _ in range(n_steps):
# Sample next state
probs = P[current_state]
next_state = np.random.choice(len(probs), p=probs)
states.append(next_state)
current_state = next_state

return states

8.2 CTMC Simulation (Gillespie Algorithm)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
def simulate_ctmc(Q, x0, T_max):
"""
Q: Generator matrix
x0: Initial state
T_max: Maximum time
"""
trajectory = [(0, x0)]
current_state = x0
current_time = 0

while current_time < T_max:
# Sample holding time
rate = -Q[current_state, current_state]
holding_time = np.random.exponential(1 / rate)

# Update time
current_time += holding_time
if current_time > T_max:
break

# Sample next state
rates = Q[current_state].copy()
rates[current_state] = 0
rates /= rates.sum()

next_state = np.random.choice(len(rates), p=rates)
trajectory.append((current_time, next_state))
current_state = next_state

return trajectory

8.3 MCMC Sampling

[[Metropolis-Hastings]] algorithm:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
def metropolis_hastings(target_proposal, x0, n_samples):
samples = [x0]
current = x0

for _ in range(n_samples):
# Propose new state
proposed = proposal(current)

# Acceptance probability
alpha = min(1, target(proposed) * proposal(proposed, current) /
(target(current) * proposal(current, proposed)))

# Accept or reject
if np.random.random() < alpha:
current = proposed

samples.append(current)

return samples

9. Advanced Topics

9.1 Hidden Markov Models (HMM)

Structure:

  • Hidden states: Xt (Markov chain)
  • Observations: Yt (depends on Xt )

Components:

  • Transition matrix: A=[aij]
  • Emission probabilities: B=[bj(y)]
  • Initial distribution: π

Algorithms:

  • Forward algorithm: Compute P(Y1:T)
  • Viterbi algorithm: Find most likely state sequence
  • Baum-Welch algorithm: Learn parameters (EM)

9.2 Markov Random Fields (MRF)

Definition: Undirected graphical model with Markov property:

P(XiXV{i})=P(XiXN(i))

where N(i) is the neighborhood of node i .

Applications:

  • Image segmentation
  • Spatial statistics
  • Physics (Ising model)

9.3 Jump-Diffusion Processes

Combine continuous diffusion with discrete jumps:

dXt=μ(Xt)dt+σ(Xt)dWt+dJt

where Jt is a jump process (e.g., [[Poisson Process]]).

Applications:

  • Financial modeling (asset prices with sudden crashes)
  • Neuroscience (neuron spiking)
  • Queueing systems

9.4 Feynman-Kac Formula

Connects Markov processes to PDEs:

u(x,t)=E[0tf(Xs,s)ds+g(Xt)X0=x]

solves the PDE:

ut=Au+f,u(x,0)=g(x)

where A is the generator of Xt .


10. Core Formula Cards

[!QUOTE] Markov Property

P(Xt+n=xXt,Xt1,,X0)=P(Xt+n=xXt)

[!QUOTE] Chapman-Kolmogorov Equation (DTMC)

P(m+n)=P(m)P(n)

[!QUOTE] Stationary Distribution (DTMC)

πP=π

[!QUOTE] Kolmogorov Forward Equation (CTMC)

dP(t)dt=P(t)Q

[!QUOTE] Stationary Distribution (CTMC)

πQ=0

[!QUOTE] Infinitesimal Generator

Af(x)=limt0E[f(Xt)X0=x]f(x)t

[!QUOTE] Detailed Balance (Reversibility)

πipij=πjpji

  • [[Stochastic Process]]
  • [[Wiener Process|Wiener Process]]
  • [[Stochastic Differential Equation (SDE)]]
  • [[Martingale]]
  • [[Poisson Process]]
  • [[Diffusion Model]]
  • [[Kolmogorov Equations]]
  • [[Langevin Dynamics]]
  • [[Metropolis-Hastings]]
  • [[Hidden Markov Model]]
  • [[Markov Decision Process]]

Dataview Query

1
2
3
LIST
FROM #markov_process OR #stochastic_process OR #probability_theory
SORT file.ctime DESC

References

  • Book: Introduction to Probability Models - Sheldon Ross
  • Book: Markov Chains - J.R. Norris
  • Book: Stochastic Processes - Sheldon Ross
  • Course: STAT 260 High-Dimensional Statistics (Berkeley)
  • Course: CS236 Deep Generative Models (Stanford)
  • Wikipedia: Markov chain, Markov property, Chapman-Kolmogorov equations