The Markov Chain Monte Carlo Method
The Markov Chain Monte Carlo (MCMC) method is a powerful computational technique based on Markov chains, which has numerous applications in physics as well as computer science.
Basic formulation
The basic idea behind the MCMC method is simple. Suppose we have a set of states labelled by an integer index $n$, where each state is associated with a probability $p_n$. For example, in statistical mechanics, for a system maintained at a constant temperature $T$, each state occurs with probability
$$p_n = \frac{\exp(-E_n/k_B T)}{Z},$$
where $E_n$ is the energy of the state, $k_B$ is Boltzmann's constant, and $Z$ is the partition function
$$Z = \sum_n \exp\left(-\frac{E_n}{k_B T}\right).$$
From $\{p_n\}$, we would like to calculate various expectation values, which describe the thermodynamic properties of the system. For example, we might be interested in the average energy, which is defined as
$$\langle E \rangle = \sum_n E_n \, p_n.$$
The most straightforward way to find $\langle E \rangle$ is to explicitly calculate the above sum. But if the number of states is very large, this is prohibitively time-consuming (unless there is a tractable analytic solution, and frequently there isn't). For example, if we are interested in describing 1000 distinct atoms each having 2 possible energy levels, the total number of states is $2^{1000}$. Trying to calculate a sum with this mind-bogglingly many terms would take longer than the age of the universe.
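To make the scaling problem concrete, here is a minimal Python sketch that computes $\langle E \rangle$ by brute-force summation for a toy system of $N$ independent two-level atoms; the level energies (0 and eps) and the temperature kT are illustrative choices, not part of the discussion above:

```python
import itertools
import numpy as np

# Brute-force computation of <E> for N independent two-level atoms.
# The sum has 2^N terms, so the cost blows up exponentially with N.
N = 10       # number of atoms; try raising this and watch the runtime
eps = 1.0    # energy of the upper level (arbitrary units)
kT = 1.0     # k_B * T (same units as eps)

Z = 0.0      # partition function
E_sum = 0.0  # accumulates E_n * exp(-E_n / kT)
for state in itertools.product([0, 1], repeat=N):  # all 2^N states
    E = eps * sum(state)            # energy of this state
    w = np.exp(-E / kT)             # Boltzmann weight
    Z += w
    E_sum += E * w

print(E_sum / Z)   # exact <E>; equals N*eps/(exp(eps/kT) + 1) for this model
```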
The MCMC method gets around this problem by selectively sampling the states. To accomplish this, we design a Markov process whose stationary distribution is identically equal to the given probabilities $\{p_n\}$. We will discuss how to design the Markov process in the next section. Once we have an appropriate Markov process, we can use it to generate a long Markov chain, and use that chain to calculate moving averages of our desired quantities, like $\langle E \rangle$. If the Markov chain is sufficiently long, the average calculated this way will converge to the true expectation value $\langle E \rangle$.
The key fact which makes all this work is that the required length of the Markov chain is usually much less than the total number of possible states. For the above-mentioned problem of 1000 distinct two-level atoms, there are $2^{1000}$ states, but a Markov chain vastly shorter than that can get within several percent of the true value of $\langle E \rangle$. (The actual accuracy will vary from system to system.) The reason for this is that the vast majority of states are extremely unlikely, and their contributions to the sum defining $\langle E \rangle$ are very small. A Markov chain can get a good estimate for $\langle E \rangle$ by sampling the states that have the highest probabilities, without spending much time on low-probability states.
The Metropolis algorithm
The MCMC method requires us to design a Markov process to match a given stationary distribution $\{p_n\}$. This is an open-ended problem, and generally there are many good ways to accomplish this goal. The most common method, called the Metropolis algorithm, is based on the principle of detailed balance, which we discussed in the article on Markov chains. To recap, the principle of detailed balance states that under generic circumstances (which are frequently met in physics), a Markov process's transition probabilities are related to the stationary distribution by
$$P(m \leftarrow n)\, p_n = P(n \leftarrow m)\, p_m.$$
The Metropolis algorithm specifies the following Markov process:
- Suppose that on step $k$, the system is in state $n$. Randomly choose a candidate state, $m$, by making an unbiased random step through the space of possible states. (Just how this choice is made is system-dependent, and we'll discuss this below.)
- Compare the probabilities $p_n$ and $p_m$:
  - If $p_m \geq p_n$, accept the candidate.
  - If $p_m < p_n$, accept the candidate with probability $p_m / p_n$. Otherwise, reject the candidate.
- If the candidate is accepted, the state on step $k+1$ is $m$. Otherwise, the state on step $k+1$ remains $n$.
- Repeat.
Based on the above description, let us verify that the stationary distribution of the Markov process satisfies the desired detailed-balance condition. Consider any two states $n$ and $m$, and assume without loss of generality that $p_m \leq p_n$. If we start from $n$, suppose we choose a candidate step $n \rightarrow m$ with some probability $q$. Then, according to the Metropolis rules, the probability of actually making this transition, $P(m \leftarrow n)$, is $q$ times the acceptance probability $p_m / p_n$. On the other hand, suppose we start from $m$ instead. Because the candidate choice is unbiased, we will choose a candidate step $m \rightarrow n$ with the same probability $q$ as in the previous case. Hence, the transition probability for $m \rightarrow n$ is $q$ times the acceptance probability of $1$. As a result,
$$\frac{P(m \leftarrow n)}{P(n \leftarrow m)} = \frac{q \, (p_m / p_n)}{q} = \frac{p_m}{p_n}.$$
Since this reasoning holds for arbitrary $n$ and $m$, the principle of detailed balance implies that the stationary distribution of our Markov process follows the desired distribution $\{p_n\}$.
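As an illustration, here is a minimal Python sketch of the Metropolis rules applied to a small made-up discrete distribution; the $\pm 1$ neighbour-stepping proposal and the five-state example are illustrative assumptions, not part of the general algorithm:

```python
import numpy as np

rng = np.random.default_rng()

def metropolis_chain(p, steps, n0=0):
    """Generate a Markov chain whose stationary distribution is p (an
    array of probabilities, or unnormalized weights).  Candidates are
    proposed by an unbiased +/-1 step through the state indices, with
    periodic wrap-around."""
    chain = np.empty(steps, dtype=int)
    n = n0
    for k in range(steps):
        m = (n + rng.choice([-1, 1])) % len(p)    # unbiased candidate
        # Metropolis rule: accept with probability min(1, p[m]/p[n]).
        if p[m] >= p[n] or rng.random() < p[m] / p[n]:
            n = m
        chain[k] = n
    return chain

# Example: five states with made-up energies, in units where k_B*T = 1.
E = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
w = np.exp(-E)                        # Boltzmann weights (unnormalized)
chain = metropolis_chain(w, steps=200_000)
print(np.mean(E[chain]))              # MCMC estimate of <E>
print(np.sum(E * w) / np.sum(w))      # exact <E>, for comparison
```

Note that only the ratio $p_m / p_n$ enters the acceptance rule, so unnormalized weights suffice; in the thermodynamic case, this is what lets us avoid ever computing the partition function $Z$.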
Expression in terms of energies
In physics, the MCMC method is commonly applied to thermodynamic states, for which
$$p_n = \frac{\exp(-E_n/k_B T)}{Z}.$$
In such cases, the acceptance ratio is $p_m / p_n = \exp[-(E_m - E_n)/k_B T]$, which does not require knowing $Z$. The Metropolis algorithm can thus be equivalently expressed in terms of the state energies (a short code sketch follows the list):
- Suppose that on step $k$, the system is in state $n$. Randomly choose a candidate state, $m$, by making an unbiased random step through the space of possible states.
- Compare the energies $E_n$ and $E_m$:
  - If $E_m \leq E_n$, accept the candidate.
  - If $E_m > E_n$, accept the candidate with probability $\exp[-(E_m - E_n)/k_B T]$. Otherwise, reject the candidate.
- If the candidate is accepted, the state on step $k+1$ is $m$. Otherwise, the state on step $k+1$ remains $n$.
- Repeat.
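In code, the energy-based acceptance rule fits in a couple of lines; this is a sketch assuming the dimensionless ratio $\Delta E / k_B T$ has already been computed:

```python
import numpy as np

rng = np.random.default_rng()

def accept_flip(dE_over_kT):
    """Energy-based Metropolis acceptance rule: always accept moves that
    lower the energy; accept uphill moves with probability exp(-dE/kT)."""
    return dE_over_kT <= 0 or rng.random() < np.exp(-dE_over_kT)
```

For instance, a candidate with $\Delta E = 1.5\,k_B T$ would be accepted with probability $e^{-1.5} \approx 0.22$.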
Stepping through state space
One way of thinking about the Metropolis algorithm is that it takes a scheme for performing an unbiased random walk through the space of possible states (represented by our candidate choices), and converts it into a scheme for performing a biased random walk. The biased random walk corresponds to a Markov process with the stationary distribution we are interested in.
The way the Metropolis candidates are chosen (i.e., the "unbiased random walk" part) varies from system to system, and once again there are multiple valid schemes that we could employ. For example, suppose we have a collection of 6 atoms, where each atom can be in the level labelled $0$ or the level labelled $1$. Each state of the overall system is described by a list of 6 symbols, e.g. $(0,1,1,0,0,1)$. Then we can make an unbiased walk through the "state space" by randomly choosing one of the 6 atoms (with equal probability), and flipping it. For example, we might choose to flip the second atom:
$$(0,1,1,0,0,1) \;\rightarrow\; (0,0,1,0,0,1), \quad \text{with probability } 1/6.$$
If we start from the other state, the reverse process has the same probability:
$$(0,0,1,0,0,1) \;\rightarrow\; (0,1,1,0,0,1), \quad \text{with probability } 1/6.$$
Hence, this scheme for walking through the "state space" is said to be unbiased. Note that, for a given walking scheme, it is not always possible to connect every two states by a single step; for example, in this case we can't go from $(0,1,1,0,0,1)$ to $(1,0,1,0,0,1)$ in one step, since that would require flipping two atoms.
There is more than one possible walking scheme; for instance, a different scheme could involve randomly choosing two atoms and flipping them. Whatever scheme we choose, however, the most important thing is that the walk must be unbiased: each possible step must occur with the same probability as the reverse step. Otherwise, the above proof that the Metropolis algorithm satisfies detailed balance would not work.
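As a sketch, here is what the single-atom-flip walking scheme might look like for the six-atom example; representing a state as a NumPy array of 0s and 1s is an implementation choice:

```python
import numpy as np

rng = np.random.default_rng()

def propose_flip(state):
    """Unbiased walking scheme: choose one atom uniformly at random and
    flip it between the levels 0 and 1.  The reverse move is proposed
    with the same probability (1/len(state)), as detailed balance requires."""
    candidate = state.copy()
    i = rng.integers(len(state))      # each atom chosen with equal probability
    candidate[i] = 1 - candidate[i]   # flip 0 <-> 1
    return candidate

state = np.array([0, 1, 1, 0, 0, 1])
print(propose_flip(state))   # differs from state in exactly one position
```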
The Ising model
Problem statement
To better understand the above general formulation of the MCMC method, let us apply it to the 2D Ising model, a simple and instructive model which is commonly used to teach statistical mechanics concepts. The system is described by a set of $N$ "spins", arranged in a 2D square lattice, where the value of each spin $S_i$ is either $+1$ (spin up) or $-1$ (spin down). This describes a hypothetical two-dimensional magnetic material, where the magnetization of each atom is constrained to point either up or down.
Each state can be described by a grid of $\pm 1$ values. For example, for a $4 \times 4$ grid, a typical state can be represented as
$$\begin{pmatrix} +1 & -1 & -1 & +1 \\ -1 & -1 & +1 & +1 \\ +1 & +1 & +1 & -1 \\ -1 & +1 & -1 & -1 \end{pmatrix},$$
and the total number of possible states is $2^N$.
The energy of each state is given by
$$E = -\sum_{\langle ij \rangle} S_i S_j,$$
where $\langle ij \rangle$ denotes pairs of spins $S_i$ and $S_j$ on sites $i$ and $j$ which are adjacent to each other on the grid (without double-counting). We'll assume periodic boundary conditions at the edges of the lattice. Thus, for example, a spin in the last column of a row is adjacent to the spin in the first column of the same row, and likewise for the top and bottom rows of each column.
For each state, we can compute various quantities of interest, such as the mean spin
$$\bar{S} = \frac{1}{N} \sum_{i=1}^{N} S_i.$$
Here, $\bar{S}$ denotes the average over the lattice, for a given spin configuration. We are then interested in the thermodynamic average $\langle S \rangle$, which is obtained by averaging over a thermodynamic ensemble of spin configurations:
$$\langle S \rangle = \sum_n \bar{S}_n \, p_n,$$
where $p_n$ denotes the probability of spin configuration $n$:
$$p_n = \frac{\exp(-E_n / k_B T)}{Z}.$$
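As a sketch, the energy and mean spin of a single configuration can be computed with NumPy as follows; using np.roll to implement the periodic boundary conditions is an implementation choice:

```python
import numpy as np

def ising_energy(S):
    """Energy E = -sum over adjacent pairs of S_i * S_j for a 2D spin grid
    (entries +/-1) with periodic boundary conditions.  np.roll pairs each
    spin with its neighbour below and to the right, so every adjacent pair
    is counted exactly once."""
    return -(np.sum(S * np.roll(S, 1, axis=0)) +
             np.sum(S * np.roll(S, 1, axis=1)))

def mean_spin(S):
    """Lattice-averaged spin of a single configuration."""
    return S.mean()

S = np.ones((4, 4), dtype=int)   # the all-up 4x4 configuration
print(ising_energy(S))           # -32: all 2*16 bonds contribute -1
print(mean_spin(S))              # 1.0
```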
Metropolis Monte Carlo simulation
To apply the MCMC method, we design a Markov process using the Metropolis algorithm discussed above. In the context of the Ising model, the steps are as follows (a code sketch follows the list):
- On step $k$, randomly choose one of the spins, $S_i$, and consider a candidate move which consists of flipping that spin: $S_i \rightarrow -S_i$.
- Calculate the change in energy that would result from flipping spin $S_i$, relative to $k_B T$, i.e. the quantity:
$$\frac{\Delta E}{k_B T} = -\frac{\Delta S_i}{k_B T} \sum_{j \,\in\, \mathrm{adj}(i)} S_j,$$
where $\Delta S_i$ is the change in $S_i$ due to the spin-flip, which is $-2$ if $S_i = +1$ currently, and $+2$ if $S_i = -1$ currently. (The reason we calculate $\Delta E / k_B T$, rather than $\Delta E$, is to keep the quantities in our program dimensionless, and to avoid dealing with very large or very small floating-point numbers. Note also that we can do this calculation without summing over the entire lattice; we only need to find the values of the spins adjacent to the spin we are considering flipping.)
- If $\Delta E \leq 0$, accept the spin-flip.
- If $\Delta E > 0$, accept the spin-flip with probability $\exp(-\Delta E / k_B T)$. Otherwise, reject the flip.
- This tells us the state on step $k+1$ of the Markov chain (whether the spin was flipped, or remained as it was). Use this to update our moving average of $\bar{S}$ (or whatever other average we're interested in).
- Repeat.
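Putting the pieces together, here is a minimal sketch of the whole simulation; the lattice size, temperature, sweep count, and the choice to discard the early part of the chain as "burn-in" are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng()

def metropolis_sweep(S, kT):
    """One sweep = N single-spin-flip Metropolis moves on the spin grid S
    (entries +/-1), at dimensionless temperature kT (in units where the
    spin-spin coupling is 1).  Modifies S in place."""
    L = S.shape[0]
    for _ in range(S.size):
        x, y = rng.integers(L), rng.integers(L)
        # Sum of the four neighbouring spins, with periodic boundaries:
        nbrs = (S[(x + 1) % L, y] + S[(x - 1) % L, y] +
                S[x, (y + 1) % L] + S[x, (y - 1) % L])
        dE = 2 * S[x, y] * nbrs          # energy change if S[x, y] flips
        if dE <= 0 or rng.random() < np.exp(-dE / kT):
            S[x, y] = -S[x, y]           # accept the flip

L, kT, nsweeps = 20, 2.0, 500            # illustrative parameters
S = np.ones((L, L), dtype=int)           # ordered initial state (see below)
samples = []
for sweep in range(nsweeps):
    metropolis_sweep(S, kT)
    if sweep >= nsweeps // 2:            # discard the first half as burn-in
        samples.append(S.mean())
print(np.mean(samples))                  # MCMC estimate of <S>
```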
The MCMC method consists of repeatedly applying the above Markov process, starting from some initial state. We can choose either a "perfectly ordered" initial state, where $S_i = +1$ for all spins, or a "perfectly disordered" state, where each $S_i$ is assigned $+1$ or $-1$ randomly.
In some systems, the choice of initial state is relatively unimportant; you can choose whatever you want, and leave it to the Markov chain to reach the stationary distribution. For the Ising model, however, there is a practical reason to prefer a "perfectly ordered" initial state. Depending on the temperature $T$, the Ising model either settles into a "ferromagnetic" phase where the spins are mostly aligned, or a "paramagnetic" phase where the spins are mostly random. If the model is in the paramagnetic phase and you start with an ordered (ferromagnetic) initial state, it is easy for the spin lattice to "melt" into disordered states by flipping individual spins, as shown in Fig. 1.
In the ferromagnetic phase, however, if you start with a disordered initial state, the spin lattice will "freeze" by aligning adjacent spins. When this happens, large domains with opposite spins will form, as shown in Fig. 2. These separate domains cannot be easily aligned by flipping individual spins, and as a result the Markov chain gets "trapped" in this part of the state space for a long time, failing to access the more energetically favorable set of states where most of the spins form a single aligned domain. (The simulation will eventually get unstuck, but only if you wait a very long time.) The presence of domains will bias the calculation of $\langle S \rangle$, because the spins in different domains will cancel out. Hence, in this situation it is better to start the MCMC simulation in an ordered state.
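In code, the two initializations might look like this (the lattice size is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng()

L = 20
# "Perfectly ordered" initial state: all spins +1.
S_ordered = np.ones((L, L), dtype=int)
# "Perfectly disordered" initial state: each spin +1 or -1 at random.
S_disordered = rng.choice([-1, 1], size=(L, L))
# In the ferromagnetic (low-temperature) phase, starting from S_disordered
# risks freezing into long-lived opposite-spin domains that bias <S>, so
# S_ordered is the safer choice there.
```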