
Monte Carlo Integration

What For

In computer graphics, we often need to solve integrals (e.g. the rendering equation) whose analytic solutions don't exist or are too computationally expensive to be feasible. Monte Carlo integration is one of the available numerical methods for approximating such integrals.

Monte Carlo Estimator

Expectation

The expectation is what you would expect to get, on average, from a single sample drawn from the population (a statistics term for the dataset of interest).

Suppose \(X\) is a random variable describing a random/stochastic process. Its expectation, or expected value, is defined as:

$$ E[X] = \int x \: pdf(x) \: \mathrm{d}x $$

where \( pdf(x) \) is the probability density function of the continuous random variable \(X\).

This is the connection from statistics to integration.

Similarly, for a discrete random variable:

$$ E[X] = \sum_i x_i \: p_i $$
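
As a quick worked example, for \(X\) uniformly distributed on \([0, 1]\) we have \(pdf(x) = 1\), and the expectation works out to:

$$ E[X] = \int_0^1 x \cdot 1 \: \mathrm{d}x = \frac{1}{2} $$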

Law of the Unconscious Statistician

Suppose \(X\) is a random variable; then \(Y = f(X)\), a function of \(X\), is itself a random variable (it can take on random values as well).

The expectation of \(Y\) is then given by the Law of the Unconscious Statistician:

$$ E[Y] = \int f(x) \: pdf(x) \: \mathrm{d}x $$

What this says is that to compute the expected value of \(Y\), we don't need to know the probability distribution of \(Y\) (hence "unconscious"), as long as we know the probability distribution of \(X\).
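
For example, with \(X\) uniform on \([0, 1]\) and \(f(x) = x^2\), we can compute \(E[Y]\) directly from the distribution of \(X\):

$$ E[Y] = E[X^2] = \int_0^1 x^2 \cdot 1 \: \mathrm{d}x = \frac{1}{3} $$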

Monte Carlo Integral

Given a function \(f(x)\), construct the new function \(y = \frac{f(x)}{pdf(x)}\).

If we regard these as random variables \(X\) and \(Y\), the statistics theorems above give:

$$ E[Y] = \int \frac{f(x)}{pdf(x)} \: pdf(x) \: \mathrm{d}x = \int f(x) \: \mathrm{d}x $$

The first equality is given by the Law of the Unconscious Statistician; the second by \(pdf(x)\) canceling out.

This is the core of the statistics-based Monte Carlo method for calculating the integral of a function:

The expected value of the random variable \(Y = \frac{f(X)}{pdf(X)}\) is equal to the integral of the function \(f(x)\).
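
As a minimal sketch in Python (assuming uniform samples on \([a, b]\), so \(pdf(x) = \frac{1}{b - a}\); the function name and parameters are just illustrative):

```python
import random

def mc_integrate(f, a, b, n):
    """Basic Monte Carlo estimate of the integral of f over [a, b].

    Uniform samples give pdf(x) = 1 / (b - a), so each sample's
    contribution is f(x) / pdf(x) = f(x) * (b - a).
    """
    total = 0.0
    for _ in range(n):
        x = random.uniform(a, b)    # draw X ~ Uniform(a, b)
        total += f(x) * (b - a)     # accumulate Y = f(X) / pdf(X)
    return total / n                # sample mean of Y approximates the integral

# Example: the integral of x^2 over [0, 1] is 1/3.
print(mc_integrate(lambda x: x * x, 0.0, 1.0, 100_000))
```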

Law of Large Numbers

Now we know that \(E[Y]\) equals the integral; how do we actually get \(E[Y]\)?

The Law of Large Numbers states that if you have a large number (denoted \(N\)) of samples, the sample mean of those \(N\) samples (denoted \( \overline{X_n} \)) converges to the expected value of the random variable in question.

$$ \lim \limits_{n \to \infty} \overline{X_n} = E[X] $$

That is to say: to get \(E[X]\), you draw a large number of samples; the more you draw, the closer the sample mean gets to \(E[X]\). Short of infinity, the sample mean remains an approximation:

$$ E[X] \approx \overline{X_n} $$
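
A quick numerical illustration (a sketch, not tied to any particular integral): the sample mean of fair die rolls drifts toward the expected value \(E[X] = 3.5\) as the sample size grows.

```python
import random

random.seed(0)
for n in (10, 1_000, 100_000):
    rolls = [random.randint(1, 6) for _ in range(n)]
    print(n, sum(rolls) / n)  # sample mean approaches E[X] = 3.5 as n grows
```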

Summary

To sum up, given \(f(x)\), the steps to approximate its integral are:

  1. Pick a sample size \(N\).
  2. Construct the proxy function \(\frac{f(x)}{pdf(x)}\) as a new random variable \(Y\).
  3. Generate \(N\) samples from the random process \(X\), using whatever distribution you want, as long as it's random: uniform, normal/Gaussian, etc.
  4. Compute the sample mean of the \(N\) samples; that's your estimate of the function's integral (see the sketch below).
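
Putting the four steps together in Python, this time with a non-uniform distribution to emphasize step 3 (an assumed setup for this sketch: \(pdf(x) = 2x\) on \([0, 1]\), sampled via the inverse CDF \(x = \sqrt{u}\)); the estimate still converges to \(\int_0^1 x^2 \, \mathrm{d}x = \frac{1}{3}\):

```python
import math
import random

def estimate_integral(n):
    """Estimate the integral of f(x) = x^2 over [0, 1] using pdf(x) = 2x.

    The CDF of this pdf is x^2, so inverse-CDF sampling gives x = sqrt(u).
    """
    total = 0.0
    for _ in range(n):                # step 3: generate N samples of X
        u = 1.0 - random.random()     # u in (0, 1], avoids x = 0
        x = math.sqrt(u)              # X sampled with pdf(x) = 2x
        total += (x * x) / (2.0 * x)  # step 2: Y = f(X) / pdf(X)
    return total / n                  # step 4: sample mean estimates the integral

print(estimate_integral(100_000))     # step 1: N = 100,000; result is close to 1/3
```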

Estimator

The sample mean \(\overline{X_n}\) of \(N\) samples is a function of the gathered samples; we use it to estimate the parameter (the true value) we are interested in, in this case the integral.

A function used like this is called an estimator.

Consistency

An estimator is said to be consistent if it converges to the true value given enough samples.

Suppose \(T_n\) is an estimator, based on \(n\) samples, of the parameter \(\theta\); consistency can be expressed as:

$$ \lim \limits_{n \to \infty} T_n = \theta $$

The basic Monte Carlo estimator is consistent, as ensured by the Law of Large Numbers: it converges to its expected value, which is also the integral of the function.

Bias

Bias is the difference between the estimator's expected value and the true value.

For an estimator \(\hat\theta\) of the true value \(\theta\), it can be expressed as:

$$ Bias = E[\hat\theta - \theta] = E[\hat\theta] - \theta $$

The basic Monte Carlo estimator is unbiased: by construction, its expected value equals the integral (the true value we are interested in).
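
Spelled out: the estimator is the sample mean of \(N\) independent samples of \(Y\), so by linearity of expectation its expected value is exactly \(E[Y]\), and the bias is zero.

$$ E\left[\frac{1}{N} \sum_{i=1}^{N} \frac{f(X_i)}{pdf(X_i)}\right] = \frac{1}{N} \sum_{i=1}^{N} E\left[\frac{f(X_i)}{pdf(X_i)}\right] = E[Y] = \int f(x) \: \mathrm{d}x $$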

Variance

While we expect to get the expected value from a single sample from the population, it is a random process, so every sample could take any value.

The variance measures the fluctuation around the expected value; this statistical variation (noise, error) is caused by the nature of randomness.

The lower the variance, the more likely a single sample lands close to the expectation.
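
Formally, for a random variable \(X\):

$$ Var[X] = E[(X - E[X])^2] $$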

Variance of Sample Mean

Why are we interested in the properties of the sample mean?

Remember that the basic Monte Carlo estimator is the sample mean of \(Y = \frac{f(X)}{pdf(X)}\).

The Central Limit Theorem establishes:

  1. The sampling distribution of the sample mean \(\overline{X_n}\) (not of the sampled random variable itself; in fact, no matter what distribution the sampled random variable follows) tends towards a normal distribution. Furthermore,
  2. \(E[\overline{X_n}] = E[X]\)
  3. \(Var[\overline{X_n}] = \frac{Var[X]}{N}\)

One interesting consequence: a single draw from \(\overline{X_n}\) (which is itself a random variable, hence the sampling distribution we can study) is, in probability, closer to \(E[X]\) than a single draw from \(X\) itself.

Now we know that the variance (or noise, or error) of a single (basic Monte Carlo estimator) sample depends on:

  1. The variance of the sampled random variable (in this case \(Y = \frac{f(X)}{pdf(X)}\)), not the distribution of the integrand \(f(x)\) itself.
  2. The sample size \(N\). The convergence rate (the rate at which the variance goes to 0) is \(\frac{1}{N}\).

This gives us two directions for reducing the statistical error of the estimate.
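
A quick sketch of the second point: running the basic estimator repeatedly (assuming uniform sampling of \(f(x) = x^2\) on \([0, 1]\)) at increasing \(N\) shows the empirical variance of the sample mean shrinking roughly as \(\frac{1}{N}\).

```python
import random
import statistics

def mc_estimate(n):
    # Basic Monte Carlo estimate of the integral of x^2 over [0, 1] (true value 1/3).
    return sum(random.random() ** 2 for _ in range(n)) / n

random.seed(0)
for n in (100, 1_000, 10_000):
    runs = [mc_estimate(n) for _ in range(200)]  # 200 independent runs of the estimator
    print(n, statistics.variance(runs))          # empirical variance shrinks roughly as 1/n
```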

Conclusion

Compared to other numerical methods for approximating integrals, such as the Riemann sum (a deterministic one), the advantage of Monte Carlo integration (a randomized one) is that it is independent of dimension.

As seen above, the noise/error depends only on the variance of the sampled random variable and the sample size \(N\), whereas for a Riemann sum the sample size must grow exponentially with the number of dimensions to reach the same quality of approximation.

Generally speaking, Monte Carlo integration is preferred for functions in higher dimensions, which is usually the case in computer graphics.
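
A sketch of this dimensionality point (the integrand here is an arbitrary choice for illustration): the same estimator, with the same sample count, handles an integral over \([0, 1]^d\) for any \(d\). The integrand averages the coordinates, so the true value is \(\frac{1}{2}\) regardless of dimension.

```python
import random

def mc_integrate_unit_cube(f, d, n):
    # Monte Carlo estimate of the integral of f over the unit cube [0, 1]^d.
    # Uniform samples there have pdf(x) = 1, so each sample contributes f(x) directly.
    return sum(f([random.random() for _ in range(d)]) for _ in range(n)) / n

def mean_of_coords(xs):
    return sum(xs) / len(xs)  # its integral over [0, 1]^d is 1/2 for any d

random.seed(0)
for d in (1, 3, 10):
    print(d, mc_integrate_unit_cube(mean_of_coords, d, 100_000))
```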

Bonus

MSE

You may have heard of the mean squared error (MSE); how does it differ from variance?

  • Variance is the deviation from the estimator’s expected value.
  • MSE is the deviation from the true value.

So, if the estimator is unbiased, these two are equivalent.

In fact, taking \(\hat\theta\) as an estimator of the parameter \(\theta\), this can be expressed mathematically:

$$ MSE[\hat\theta] = E[(\hat\theta - \theta)^2] $$

$$ Var[\hat\theta] = E[(\hat\theta - E[\hat\theta])^2] $$

$$ Bias[\hat\theta] = E[\hat\theta - \theta] $$

$$ Var[\hat\theta] = E[\hat{\theta}^2 - 2 \hat\theta E[\hat\theta] + E[\hat\theta]^2] = E[\hat{\theta}^2] - E[\hat\theta]^2 $$

$$ Var[\hat\theta] + Bias[\hat\theta]^2 = E[\hat{\theta}^2] - E[\hat\theta]^2 + (E[\hat\theta] - \theta)^2 = E[\hat{\theta}^2] - 2 \theta E[\hat\theta] + \theta^2 = E[(\hat\theta - \theta)^2] = MSE[\hat\theta] $$

$$ Var[\hat\theta] = MSE[\hat\theta], \quad \text{if } Bias[\hat\theta] = 0 $$
