Chebychev Inequality proof and use.
Quick proof and example of use of the Chebychev Inequality.
Suppose we have \(x \sim f(x)\), a function \(g(x) \gt 0\), and \(\epsilon \gt 0\). The Chebychev Inequality is written as:
\[ \begin{equation} \tag{1} Pr(g(x) \ge \epsilon) \le \frac{E[g(x)]}{\epsilon} \end{equation} \]
\[ \begin{eqnarray} \tag{2} E[g(x)] &=& \int_{\Omega_x} g(x) f(x)\, dx \;\;\;\;\;\;\;\;\;\;\;\;\text{ :def of expectation}\\ &\ge& \int_{\{x:\, g(x)\ge \epsilon\}} g(x) f(x)\, dx \;\;\;\;\text{ :restrict domain; integrand}\ge 0\\ &\ge& \int_{\{x:\, g(x)\ge \epsilon\}} \epsilon f(x)\, dx \;\;\;\;\;\;\;\;\text{ :}g(x)\ge\epsilon \text{ on this set}\\ &=& \epsilon\, Pr(g(x)\ge \epsilon) \;\;\;\;\;\;\;\;\;\;\;\;\text{ :def of probability} \end{eqnarray} \]
Slight rearrangement gives the desired Chebychev Inequality, equation (1).
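As a quick sanity check, here is a minimal simulation sketch of (1). The distribution and the choice of \(g\) below are arbitrary illustrations, not anything fixed by the derivation above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) setup: x ~ Exponential(1) and g(x) = x^2 > 0.
x = rng.exponential(scale=1.0, size=1_000_000)
g = x**2

for eps in (1.0, 4.0, 9.0):
    lhs = np.mean(g >= eps)   # empirical Pr(g(x) >= eps)
    rhs = g.mean() / eps      # E[g(x)] / eps; E[x^2] = 2 for Exponential(1)
    print(f"eps={eps:4.1f}  Pr(g(x)>=eps)={lhs:.4f}  bound={rhs:.4f}")
```

The empirical probability always sits below the bound, though often far below it; Chebychev-style bounds are loose by design.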
Say we observe data \(\mathfrak D\) that was generated by some process \(f(\cdot)\), and we want to know \(Pr(X=\mathfrak D)\). We can use Monte Carlo to approximate this. How?
\[ \begin{equation} Pr(X=\mathfrak D) \approx \frac{\sum_{x_i \sim f(\cdot)} \mathbb{1}_{\{x_i = \mathfrak D\}}}{N} \end{equation} \] That is, draw \(N\) samples \(x_i\) from \(f(\cdot)\) and take the fraction that equal \(\mathfrak D\).
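As a concrete sketch of the estimator, with hypothetical stand-ins for \(f(\cdot)\) and \(\mathfrak D\) (a Binomial(10, 0.3) process and an observed value of 6):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical process f(.) and observed data D.
N = 100_000
draws = rng.binomial(n=10, p=0.3, size=N)  # x_i ~ f(.)
p_hat = np.mean(draws == 6)                # (1/N) * sum of 1{x_i = D}
print(p_hat)  # should land near the exact value, about 0.0368 for this f and D
```

The question then becomes: how good is this estimate, and how do we measure that? We could use relative error: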
\[ \begin{equation} \frac{\mid \frac{\sum_{x_i \sim f(\cdot)} \mathbb{1}_{\{x_i = \mathfrak D\}}}{N} - Pr(X=\mathfrak D) \mid }{Pr(X=\mathfrak D)} \end{equation} \] Or, perhaps, we want a bit more and want to know the probability that the error is within some bound:
\[ \begin{equation} Pr\left(\frac{\mid \frac{\sum_{x_i \sim f(\cdot)} \mathbb{1}_{\{x_i = \mathfrak D\}}}{N} - Pr(X=\mathfrak D) \mid }{Pr(X=\mathfrak D)} \le \epsilon \right) \end{equation} \]
Now this is starting to feel like Chebychev. Write \(p = Pr(X=\mathfrak D)\) and \(\hat p_N = \frac{\sum_{x_i \sim f(\cdot)} \mathbb{1}_{\{x_i = \mathfrak D\}}}{N}\) for the estimator. If we square the inner terms and apply (1) to the complementary event \(\frac{\mid \hat p_N - p \mid}{p} \ge \epsilon\):
\[ \begin{eqnarray} Pr\left(\frac{\left(\hat p_N - p\right)^2}{p^2} \ge \epsilon^2 \right) &\le& \frac{E\left[\left(\hat p_N - p\right)^2\right]}{\epsilon^2 p^2} \;\;\;\;\text{ :by (1)}\\ &=& \frac{Var\left(\hat p_N\right)}{\epsilon^2 p^2} \;\;\;\;\text{ :}\hat p_N \text{ is unbiased, } E[\hat p_N]=p\\ &=& \frac{N\, Var\left(\mathbb{1}_{\{x_i = \mathfrak D\}}\right)}{N^2 \epsilon^2 p^2} \;\;\;\;\text{ :iid draws}\\ &=& \frac{\left(1-p\right) p}{N \epsilon^2 p^2} \;\;\;\;\text{ :indicator is Bernoulli}(p)\\ &=& \frac{1-p}{N \epsilon^2 p} \end{eqnarray} \] Equivalently, \(Pr\left(\frac{\mid \hat p_N - p \mid}{p} \le \epsilon\right) \ge 1 - \frac{1-p}{N \epsilon^2 p}\). This leads to some insights. As \(N\) increases, the error bound decreases. If \(p = Pr(X=\mathfrak D)\) is small, the relative error is larger for fixed \(N\), so rare events need many more samples. Finally, the bound depends on \(N\) and \(p\) only through the product \(Np\), the expected number of successes, so sampling until a fixed number of successes is observed pins down the error rate. This last observation is the Las Vegas alternative to Monte Carlo.
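To see the bound in action, here is a sketch that reuses the hypothetical Binomial setup from above: it repeats the \(N\)-sample estimator many times and compares the observed frequency of large relative errors to the bound just derived.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(0)

# Hypothetical target from the earlier sketch: p = Pr(X = D) for
# X ~ Binomial(10, 0.3) and D = 6, computed exactly here.
p = comb(10, 6) * 0.3**6 * 0.7**4

eps, N, reps = 0.1, 10_000, 20_000
# The sum of the indicators 1{x_i = D} over N iid draws is Binomial(N, p),
# so each Monte Carlo estimate can be simulated directly.
hits = rng.binomial(n=N, p=p, size=reps)
rel_err = np.abs(hits / N - p) / p

observed = np.mean(rel_err >= eps)     # empirical Pr(relative error >= eps)
bound = (1 - p) / (N * eps**2 * p)     # Chebychev bound from the derivation
print(f"observed={observed:.4f}  bound={bound:.4f}")
```

And a minimal sketch of the Las Vegas alternative, where we fix the number of successes \(k\) rather than \(N\) (the function name and the simple \(k/n\) estimator here are illustrative choices, and \(k/n\) is slightly biased):

```python
def las_vegas_estimate(k, draw_hit):
    """Sample until k successes; the run length n is random, the error rate is not."""
    n = hits = 0
    while hits < k:
        n += 1
        hits += draw_hit()  # True/False: did a fresh draw x_i from f(.) equal D?
    return k / n            # estimate of Pr(X = D)

# Usage with the same hypothetical Binomial target (rng from the sketch above):
print(las_vegas_estimate(500, lambda: rng.binomial(10, 0.3) == 6))
```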