Monte Carlo Methods and
The Metropolis Algorithm

Contents

Motivation: The Problem Of Summing
Probability models arise naturally in many fields. However, the systems can easily become so large that the "Expected Values" within the models cannot be evaluated using straightforward sums.

Let "Nature" Do All The Work
The problems with large systems are solved by adding not all terms but rather only a carefully selected set of "representative terms", where "representative" reflects the order in which things might be seen in a real experiment.

The Monte Carlo Method: Simulating Nature By Random Calculations
The (very long) sums of statistical physics can be valuated using a computer simulation of "nature". This involves the use of random numbers to choose among various possible paths in the evolution of a system.

The Metropolis Algorithm For Random Updates
The algorithm of choice for Monte Carlo analysis of the Ising Model involves repeated, probabilistic updates of individual spins. This leads to a collection of system "snapshots" which are consistent with the overall system probability function.


Motivation: The Problem Of Summing

Probability models are common in a variety of areas, ranging from physics and chemistry to sociology and finance. The simplest probability models involve systems with a finite number (N) of distinct outcomes and non-zero probabilities which can be assigned to these events:

The system could be very simple. For example, there are 36 possible outcomes from tossing a pair of dice and, if the dice are fair, the probabilities are P=1/36 for each of the outcomes. Or, the system could be incredibly complicated, such as the Ising lattice systems of this unit, with an almost incomprehensible number of possible states and state probabilities which vary over many orders of magnitude.

Consider some probability system (that is, a set of outcomes and values for the probabilities of these outcomes), and let F be some numerical function of these outcomes. For example, in the dice model, one could consider F(A) to be the total number of spots showing for outcome A. As a more complicated system, the set {A} could be the set of possible future price movements for some financial security, and F(A) could be a payoff function depending on the price of the stock at some "option expiration date". A particularly important quantity in such situations is the Expected Value of the function F (e.g., an investor will be really interested in knowing the expected income from a purchase). This quantity is given by a weighted sum over all possible outcomes:

Equation 1

This sum considers each possible outcome, evaluates the function F for that outcome, and then "weights" the function value by the probability that the selected outcome actually occurs. The expected value is thus an average of sorts, with the "of sorts" part meaning that really rare events (i.e., P(A) is very tiny) don't count much in evaluating the "average".

The Expected Value is a key concept in statistical physics, and many measurable quantities for a macroscopic chunk of matter are simply related to expected values of "atomic level" functions. Probabilities of the various microscopic states are given by the Boltzmann function:

Equation 2

In principle, then, the "averaging" problem for statistical physics would seem to be solved. Given a physical model, we could evaluate the energies E(S) for each of the possible (microscopic) states, and Equation 2 would then give the probabilities of these states. To evaluate any "average value" function of the system, we need only perform the sum in Equation 1.

However, as emphasized in the Ising Model Introduction document, the number of terms in summation in Equation 1 is incomprehensibly large. (If we could somehow use all computing resources on the planet for one full year we could only evaluate Equation 1 for "trivially simple" problems.)

In practice, the "formal" solution in Equation 1 is worthless!


Let "Nature" Do All The Work

Due to the "exponential explosion" in the number of possible states, it is simply impossible to calculate expected values from the definition in Equation 1. On the other hand, for any real system, if is fairly simple to estimate this quantity from a number of measurements.

To have a definite example in mind, let's pretend that the 2-dimensional Ising model described in the Ising Model Introduction document is a "real" system and, more importantly, let's assume that one can do actual experiments to measure the total magnetization of the system. The theoretical model for the average total magnetization is simply

Equation 3

For any particular state S, N(UP) and N(DOWN) are the number of individual sites with spins pointing up and down. The sum in Equation 4 is over all possible states, and the probabilities for each state are given by Equation 2.

We cannot evaluate the predicted magnetization in Equation 3, but we can imagine doing an experiment to measure the magnetization of the (assumed) real system. In an absolutely idealized world (which we assume), such a measurement provides an instantaneous "snapshot" of the fluctuating physical system. For the particular state at the time of the measurement, we actually measure the relative number of Up and Down spins.

A good experiment is generally not limited to a single measurement, and instead it is common to average the the values from a number of separate measurements.

Equation 4

("L" indicates the number of measurements.)

Unlike the computationally overwhelming sum in Equation 1, one often gets reasonable values from the "experimental sum" in Equation 4 with a modest number of measurements.

"Status Report:"

One of the most important results in all of Statistics (not just statistical physics) is the fact that these two sums are essentially the same thing.:

Equation 5

That is,

The Law of Large Numbers is so important that it is worth rephrasing it in a slightly poetic fashion:

Nature freely (or at least cheaply) gives what man and all his computers cannot accomplish.

(The sum on the left hand side of Equation 5 has a "godzillion" terms, while that on the right hand side might well have only a few hundred.)

The solution to the "summation problem" of the first section is now clear: the exact but impossible expression in Equation 1 must be replaced by an algorithm which approximates the "natural sampling" of Equation 2.


The Monte Carlo Method
Simulating Nature By Random Calculations

There are, in fact, two significant differences between the sums in Equation 1 and Equation 4:

  1. The sum in Equation 4 has far, far fewer terms.
  2. There are no "Probability Factors" P(S) in any of the terms being added in Equation 4.

Thinking about these differences a bit leads to the identification of two obvious problems with the "theory sum" in Equation 1:

  1. The formal sum spends as much computational effort on that are highly unlikely (i.e., P(A) is extremely small) as it does with the important terms.

    That is: Equation 1 leads to a lot of effort "computing zero".

  2. Equation 1 effectively recomputes the same thing over and over again, corresponding to simple transformations (e.g., rotations) of a single physical state.

    That is: Equation 1 can be highly redundant.

Any effective numerical alternative to the brute force summation in Equation 1 needs to avoid these two problems. The most effective approach is to mimic the way nature works:
Alternative Evaluation Strategy
  1. Replace the complete summation in Equation 1 by a (vastly) restricted sum over "representative" states.

  2. Select these representative states carefully according to the full probability function of Equation 2, meaning:

    The frequency with which some class of events is included in the representative sum must be the same as the actual probability for that class.

Ignoring, for the moment, the question of how the selection of representative states is actually done, this leads to a fairly simple "flow chart" of a computer algorithm which simulates the "experimental evaluation" of Equation 4:

Schematic: Simulation Of Equation 4

Each pass down through the three boxes amounts to a single computer simulation of an "experimental" event. In the first (and trickiest) step, random numbers are used to generate a state - such as the specification of all spins in an Ising model. The next two steps are "data analysis", performing the same calculations that might be done in a real experiment. The event generation loop is simply repeated until one has produced a large enough sample size "L" for the evaluation of Equation 4.

The usefulness of this procedure clearly depends on the ability of the computer code to generate random events "just like nature". That is, the frequency with which the algorithm generates any particular type of events must be the same as that specified by the Boltzmann distribution:

Equation 6

This correspondence is required in order to justify the use of a small sum to evaluate predicted expectations (as in Equation 4).

There are many different ways to design computer algorithms which satisfy Equation 6. Most of these algorithms share two common features:

  1. Complicated states and systems are built up through a number of simpler "decision steps".
  2. Choices at the decision steps are made by comparing (known) decision probabilities with computer-generated random numbers.
These common features are easily illustrated in terms of the simple decision process illustrated below.

Illustration of a "Decision Process"

Suppose that the algorithm as gotten to the box labelled "Old State". This means that all details of the Old State are specified (and "known" within the computer program). The algorithm must now decide between two alternatives for the next step. The procedure used is straightforward:

  1. The probability (P) for moving from the Old State to Future Option 1 is evaluated (this can be done since all details of the Old State are known).
  2. A random number (R) is selected (using standard routines available on the computer), with R in the interval [0,1].
  3. If P > R, the algorithm selects the "Future Option 1" path (completely discarding Option 2 for purpose of the current simulated event).
  4. If P < R, Option 2 is selected instead.
If we imagine repeating this decision process a large number of times from some fixed "Old State", then the fraction of times in which the algorithm chooses the upper branch in decision process will be close to the true probability, P. That is, this very simple procedure satisfies the consistency requirement in Equation 6.

A Monte Carlo Calculation is simply a sequence of random choices. This reliance on chance is the origin of the name of the procedure, referring to the famous casino in Monaco.


The Metropolis Algorithm For Random Updates

There are many, many forms for Monte Carlo calculations depending on the details of the problem at hand. The standard technique for Statistical Physics simulations is the Metropolis Algorithm, which can be described as follows:

The decision to flip a particular spin in the second step is done by using the Boltzmann probability function of Equation 2 and the Ising Model for the system energy, as described in the Ising Model Introduction document. The ratio of the probabilities before and after the possible spin flip is given by

Equation 7

The "Random Spin Flipping" algorithm uses this ratio as follows:

  1. The energy difference between the "After" and "Before" states is evaluated using the Ising Model energy expressions.

  2. If the energy difference is negative (i.e., the transition reduces the overall system energy), the spin flip is immediately accepted.

  3. If the energy difference is positive, a random number R is generated and the spin flip is accepted only if R is less than the ratio in Equation 7.

And that's it! The random decision in the third step is the key to ensuring that the constraint in Equation 6 is (eventually) satisfied.

The Ising Model code included as part of this unit is ultimately a simple package for probabilistic decisions based on Equation 7. Indeed, most of the actual code is devoted to extraneous "junk", such as checking bad entries in user requests and producing graphics output files.