tl;dr Monte Carlo

MCMC

Basic Monte-Carlo description and theory.

Robert Settlage
2022-06-01

Monte-Carlo summary. Much of this is following notes from Scotland Lemon’s MCMC class in Spring 2017 at Virginia Tech.

Monte Carlo

A class of algorithms that produce a random, approximate solution to a problem in a \(\textbf{fixed}\) amount of time. This is in contrast to Las Vegas algorithms, which always produce a correct solution but take a random, unknown amount of time.

Monte Carlo methods rely on sampling \(iid\) from the distribution and then computing the estimate from those samples. The class I am more interested in uses random walks, where the next sample depends on the current sample; that setting is more aptly described as Markov chain Monte Carlo.

General idea

If \(x\sim p(x)\), where we can sample from \(p(x)\), and we want \(E[g(x)]\), then

\[ \begin{eqnarray} \tag{1} E[g(x)] &=& \int_{\Omega_x}g(x)p(x)dx \\ &\approx& \frac{\sum_{i=1}^N g(x_i)}{N} \text{ where }x_i\sim p(x) \end{eqnarray} \]
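As a quick sketch of equation (1) in R: the choices \(g(x)=x^2\) and \(p(x)=unif(0,1)\) below are mine, purely for illustration, with known answer \(E[g(x)]=1/3\).

set.seed(12475)
g <- function(x) x^2   # illustrative g(x); any function of x works
x <- runif(100000)     # iid samples from p(x) = unif(0,1)
mean(g(x))             # Monte Carlo estimate of E[g(x)], should be near 1/3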

Example 1

We want \(E[u]\) where \(u\sim unif[0,1]\). The pdf of \(u\) is

\[ \begin{equation} \tag{2} p(u) = \begin{cases} 1 & \text{ if } 0 \le u \le 1 \\ 0 & \text{ otherwise} \end{cases} \end{equation} \] So we could do this analytically as:

\[ \begin{equation} \tag{3} E[u] = \int_0^1 u\;p(u)\;du = \frac{u^2}{2} \Big| _0^1 = \frac{1}{2}\text{ similarly, var}(u)=\frac{1}{12} \end{equation} \] Alternatively, using Monte-Carlo approximation, we would follow:

  1. init \(u_0 = runif(1)\)
  2. for \(i=1:N\)
    \(u_i = runif(1)\)
    end

Now:

\[ \begin{eqnarray} \tag{4} mean(u) &=& \frac{\sum_{i=1}^N u_i}{N} \approx \int_0^1 u\;p(u)\;du \\ \tag{5} Var(u) &=& \frac{\sum_{i=1}^N (u_i-E[u])^2}{N} \approx \int_0^1 (u-E[u])^2\;p(u)\;du \end{eqnarray} \]

Quick try with N going from 100 to 10,000:

library(knitr)
set.seed(12475)
draws <- runif(10000)
results <- data.frame(means=rbind(mean(draws[1:100]),mean(draws[1:1000]),mean(draws)),
                      vars=rbind(var(draws[1:100]),var(draws[1:1000]),var(draws)),
                      row.names = c("100","1,000","10,000"))
kable(results,digits = 4)
          means    vars
100      0.5131  0.0816
1,000    0.4950  0.0848
10,000   0.5009  0.0856

Looks like the mean is approaching the desired value of 0.5, and the variance is close to \(1/12 \approx 0.0833\).

Example 2

\(x\sim N(\mu=10, \sigma^2=1)\)

Want \(E[x^4]\):

\[ \begin{equation} \tag{6} E[x^4] = \int_{-\infty}^{\infty} x^4\;p(x)\;dx = \int_{-\infty}^{\infty}x^4\;\frac{1}{\sqrt{2\pi}}e^{-\tfrac{(x-E[x])^2}{2}}\;dx = ?? \end{equation} \] We may be able to do this integral using some tricks, but instead, perhaps use Monte Carlo:

  1. init \(x_0 = rnorm(1,10,1)\)
  2. for \(i=1:N\)
    \(x_i = rnorm(1,10,1)\)
    end

Now:

\[ \begin{equation} \tag{7} E[X^4] \approx \frac{\sum_{x_i\sim N(10,1)}x_i^4}{N} \end{equation} \]

set.seed(12475)
draws <- rnorm(10000,10,1)^4
results <- data.frame(means=rbind(mean(draws[1:100]),mean(draws[1:1000]),mean(draws)),
                      row.names = c("100","1,000","10,000"))
kable(results,digits = 4)
            means
100      10258.86
1,000    10608.94
10,000   10615.04

It appears to be converging to something a bit above \(10^4\). In fact, for \(x\sim N(\mu,\sigma^2)\), \(E[x^4]=\mu^4+6\mu^2\sigma^2+3\sigma^4\), which here is \(10{,}603\), in line with the estimate.
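As a quick check of that closed form in R (this is the standard normal fourth-moment formula, not a computation from the original post):

mu <- 10; sigma <- 1
mu^4 + 6*mu^2*sigma^2 + 3*sigma^4   # exact E[x^4] = 10603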

Example 3

\(x\sim exp(\lambda)\)

Want:

\(E[e^{sin(x)}]\) where \(0 \le x \le \pi\)

\[ \begin{eqnarray} \tag{8} E[e^{\sin(x)}\mid 0 \le x \le \pi] &=& \frac{\int_0^{\pi} e^{\sin(x)}\, \lambda e^{-\lambda x}\, dx}{Pr(0 \le x \le \pi)} \\ &=& \frac{\int_0^{\infty} \mathbb{1}_{[0,\pi]}(x)\, e^{\sin(x)}\, \lambda e^{-\lambda x}\, dx}{\int_0^{\infty} \mathbb{1}_{[0,\pi]}(x)\, \lambda e^{-\lambda x}\, dx} \\ &\approx& \frac{1}{N_{acc}}\sum_{x_i \sim exp(\lambda)} \mathbb{1}_{[0,\pi]}(x_i)\, e^{\sin(x_i)} \end{eqnarray} \]

The key here is that \(N_{acc}\) counts only the samples \(x_i\) that pass the criterion, i.e. fall inside the indicator \(\mathbb{1}_{[0,\pi]}\); dividing by the accepted count rather than the total count is what supplies the normalization by \(Pr(0 \le x \le \pi)\).
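A minimal R sketch of this accept-and-average estimator; the rate \(\lambda = 1\) below is my assumption, since the text never fixes a value for \(\lambda\).

set.seed(12475)
lambda <- 1                        # assumed rate, not specified in the text
x <- rexp(100000, rate = lambda)   # x_i ~ exp(lambda)
keep <- x >= 0 & x <= pi           # indicator: samples that land in [0, pi]
mean(exp(sin(x[keep])))            # average over the accepted samples only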

Bias, variance and convergence

A couple of things to clean up to close: is the estimate biased, and how does it converge? For this, let's assume we are looking at

\(u_i\stackrel{iid}{\sim}unif(0,1)\text{, } i=1,\ldots,N\)

\(E[u]\approx \frac{\sum_{u_i\stackrel{iid}{\sim}unif(0,1)}u_i}{N}=\hat{M}\) and similar for variance.

Bias

Is the estimate biased?

\[ \begin{eqnarray} \tag{9} E[\hat{M}] &=& E\left[\frac{\sum_{i=1}^Nu_i}{N}\right] \\ &=& \frac{\sum_{i=1}^N E[u_i]}{N} \\ &=& \frac{\sum_{i=1}^N E[u]}{N} \text{ b/c iid}\\ &=& E[u] \\ &=& M \end{eqnarray} \]

\[ \begin{eqnarray} \tag{10} Var[\hat{M}] &=& Var\left[\frac{\sum_{i=1}^Nu_i}{N}\right] \\ &=& \frac{\sum_{i=1}^N Var[u_i]}{N^2} \\ &=& \frac{\sum_{i=1}^N Var[u]}{N^2} \text{ b/c iid}\\ &=& \frac{Var[u]}{N} \\ &\xrightarrow[]{N\rightarrow \infty}& 0 \end{eqnarray} \]

So our estimate is unbiased, and its variance shrinks to zero as \(N\rightarrow\infty\). Can we say anything about convergence in probability?

\[ \begin{eqnarray} \tag{11} Pr(\mid \hat{M}-M \mid \lt \epsilon) &=& Pr\left(\mid \hat{M}-M \mid \lt \delta \cdot \sqrt{V/N}\right) \text{ where } V=Var(u) \text{ and } \epsilon=\delta\sqrt{V/N} \\ &=& Pr\left(\frac{\mid \hat{M}-M \mid}{\sqrt{V/N}} \lt \delta\right) \text{, i.e. approximately } \mid N(0,1)\mid\lt\delta \text{ by the CLT}\\ &\approx& Pr\left(\frac{\mid \hat{M}-M \mid}{\sqrt{\hat{V}/N}} \lt \delta\right) \text{ where }\hat{V}\text{ is the sample estimate of the variance, giving a }t\text{-statistic} \end{eqnarray} \]

And by the CLT, we can make probabilistic statements. We also note our probabilistic error bound depends on \(N\) only through \(\sqrt{\frac{V}{N}}\approx\sqrt{\frac{\hat{V}}{N}}\), so that regardless of the dimension of the problem, the error decreases at the rate \(N^{-\tfrac{1}{2}}\), i.e. we have \(O(N^{-\tfrac{1}{2}})\) convergence.
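A small empirical check of that \(N^{-\tfrac{1}{2}}\) rate on the \(unif(0,1)\) mean; the values of \(N\) and the single-replicate setup are my choices, so the numbers are noisy, but \(\sqrt{N}\cdot\mid\hat{M}-M\mid\) should stay roughly flat while the raw error shrinks.

set.seed(12475)
Ns <- c(100, 1000, 10000, 100000)
errs <- sapply(Ns, function(N) abs(mean(runif(N)) - 0.5))   # |M_hat - M| at each N
data.frame(N = Ns, abs_error = errs, sqrtN_x_error = sqrt(Ns) * errs)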

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/rsettlage/rsettlage.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Settlage (2022, June 1). Robert E. Settlage: tl;dr Monte Carlo. Retrieved from https://rsettlage.github.io/mcmc/2022-06-01-monte-carlo-tldr/

BibTeX citation

@misc{settlage2022tl;dr,
  author = {Settlage, Robert},
  title = {Robert E. Settlage: tl;dr Monte Carlo},
  url = {https://rsettlage.github.io/mcmc/2022-06-01-monte-carlo-tldr/},
  year = {2022}
}