Robert E. Settlage: Hobbs-Hooten Chapter 3

Robert Settlage

The models we create are deliberate approximations of truth: we simplify them (sometimes without realizing it) into something understandable and tractable with the data at hand. Statistics helps us gain insight into these approximations by understanding and quantifying our uncertainty. The main sources of variation include:

Process variance

Process variance includes all the uncertainty associated with our imperfect model specification. Good models have low process variance because they capture the majority of the variation in the state they predict. The only way to reduce process variance is to improve the model; collecting more of the same data or improving our instrumentation will not help a poorly specified model. This highlights the need to separate process variance from other sources of variance, such as observation variance.

Observation variance

Observation variance is the variance associated with an actual observation of the state of the system. It has two causes: sampling from a larger population, and potential biases in how we collect the observations. Sampling variance shrinks as we take more samples, and corrections for sampling bias become more certain with additional samples.
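To see sampling variance shrink with more sampling, here is a minimal simulation sketch. It assumes a hypothetical, normally distributed population (mean 10, sd 2, values chosen only for illustration) and repeats a "survey" of size n many times, then compares the spread of the resulting sample means to the textbook result for simple random sampling, sd²/n.

```python
import random
import statistics

# Hypothetical population: normal with mean 10 and sd 2 (illustrative values only).
POP_MEAN, POP_SD = 10.0, 2.0
N_REPS = 500  # number of repeated surveys for each sample size

def sampling_variance(n, rng):
    """Variance of the sample mean across N_REPS independent surveys of size n."""
    means = [
        statistics.fmean(rng.gauss(POP_MEAN, POP_SD) for _ in range(n))
        for _ in range(N_REPS)
    ]
    return statistics.variance(means)

rng = random.Random(42)  # fixed seed so the run is reproducible
results = {n: sampling_variance(n, rng) for n in (10, 100, 1000)}

for n, v in results.items():
    # For simple random sampling, Var(sample mean) = sd^2 / n.
    print(f"n={n:5d}  simulated={v:.5f}  theory={POP_SD**2 / n:.5f}")
```

Each tenfold increase in n cuts the simulated variance by roughly a factor of ten. Note that this only addresses sampling variance; a biased collection method would shift every sample mean and would not show up here at all.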

Individual variation

For processes where we are interested in individuals, differences among the individuals themselves give rise to uncertainty. Spatial location can also be thought of in the same way as individual variation.

Model selection uncertainty

This uncertainty arises from our choice of a specific model. In many cases we may not care about the uncertainty that comes from committing to one model; in others, we may want to quantify the uncertainty across different model choices.

Rules of probability

Finally. Something both concrete and squishy. ;)

We are seeking to learn about unobserved quantities from data (observed quantities). Bayesians treat all unobserved quantities as random variables. Random variables are, well, random: they can take on a range of values due to chance. Chance is governed by the rules of probability, so a random variable takes values according to some probability distribution.

Sample space, outcomes, and events

outcome - a possible result of an experiment
event - a set of possible outcomes of an experiment
sample space - the set of all possible outcomes of an experiment

Consider rolling a six-sided die. The sample space contains the outcomes 1 through 6. An event could be rolling an even number (2, 4, 6) or rolling an odd number (1, 3, 5).
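The die example translates directly into sets: the sample space is a set of outcomes, and each event is a subset of it. A minimal sketch, assuming a fair die so that every outcome is equally likely and an event's probability is simply its share of the sample space:

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (assumed fair: outcomes equally likely).
sample_space = {1, 2, 3, 4, 5, 6}

# Events are subsets of the sample space.
evens = {2, 4, 6}
odds = {1, 3, 5}

def prob(event):
    """Probability of an event under equally likely outcomes: |event| / |S|."""
    return Fraction(len(event & sample_space), len(sample_space))

print(prob(evens), prob(odds))       # 1/2 1/2
print(evens | odds == sample_space)  # True: together they cover every outcome
```

Using `Fraction` keeps the arithmetic exact, so the results read like the hand calculations.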

Conditional, independent, and disjoint probabilities

Given 2 events, we can talk about conditional, independent, and disjoint probabilities.

Conditional – events A and B are dependent if knowing that event B occurred changes the probability of event A; that updated probability is the conditional probability of A given B. For instance, suppose we are drawing balls, without replacement, from a bag in which 2 of the 5 balls are blue and the rest are red. The probability of drawing a blue ball on the first draw is 2/5. After a blue ball has been drawn, 1 of the remaining 4 balls is blue, so the probability that the second ball is also blue is 1/4. This is written as:

\[ \tag{1} P(A \mid B) = \frac{P(A,B)}{P(B)} \]
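One way to check equation (1) against the bag example is to enumerate every ordered pair of draws. In this sketch, B is "first ball is blue" and A is "second ball is blue"; it assumes the balls are drawn without replacement and that every ordered pair is equally likely.

```python
from fractions import Fraction
from itertools import permutations

# Bag from the example: 2 blue and 3 red balls, drawn without replacement.
bag = ["blue", "blue", "red", "red", "red"]

# Every ordered (first, second) pair of distinct balls is equally likely.
draws = list(permutations(bag, 2))
total = len(draws)  # 5 * 4 = 20 ordered pairs

# B: first ball is blue.  A: second ball is blue.
p_B = Fraction(sum(first == "blue" for first, _ in draws), total)
p_AB = Fraction(sum(first == "blue" and second == "blue" for first, second in draws), total)

p_A_given_B = p_AB / p_B  # equation (1)
print(p_B, p_AB, p_A_given_B)  # 2/5 1/10 1/4
```

The 1/4 from the ratio matches the intuitive "one blue left out of four balls" argument.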

Independent

If the occurrence of one event does not change the probability of the other, the events are said to be independent. For example, when flipping a fair coin twice, getting heads on the first flip does not influence the outcome of the second flip. This is written as:

\[ \begin{eqnarray} \tag{2} P(A \text{ and } B) = P(A,B) = P(A) \ast P(B) \\ \tag{3} P(A|B) = P(A) \end{eqnarray} \]
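Equations (2) and (3) can be checked by listing the four outcomes of two coin flips. A minimal sketch, assuming a fair coin so all four outcomes are equally likely; here B is "first flip is heads" and A is "second flip is heads".

```python
from fractions import Fraction
from itertools import product

# Two flips of a fair coin: 4 equally likely outcomes (HH, HT, TH, TT).
outcomes = list(product("HT", repeat=2))

def prob(pred):
    """Probability of the event defined by a predicate on (flip1, flip2)."""
    return Fraction(sum(1 for o in outcomes if pred(o)), len(outcomes))

p_A = prob(lambda o: o[1] == "H")                   # A: second flip is heads
p_B = prob(lambda o: o[0] == "H")                   # B: first flip is heads
p_AB = prob(lambda o: o[0] == "H" and o[1] == "H")  # both heads

print(p_AB)               # 1/4
print(p_AB == p_A * p_B)  # True, equation (2)
print(p_AB / p_B == p_A)  # True, equation (3)
```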

Disjoint

If two events are disjoint (mutually exclusive), they cannot both occur, so the probability of both occurring is \(P(A,B) = 0\). As long as \(P(A) > 0\), this is written as:

\[ \tag{4} P(B\mid A) = \frac{P(A,B)}{P(A)} = 0 \]

NOTE: this is different from independence. For a pair of disjoint events, knowing that event A occurred tells us that event B has NOT occurred. For a pair of independent events, knowing that A occurred tells us nothing about whether B occurred.
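The even/odd events from the die example make the distinction concrete: they are disjoint, and precisely because of that they are not independent. A minimal sketch, again assuming a fair die:

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # fair die, equally likely outcomes

def prob(event):
    """Probability of an event as its share of the sample space."""
    return Fraction(len(event), len(sample_space))

A = {2, 4, 6}  # even
B = {1, 3, 5}  # odd

p_AB = prob(A & B)            # disjoint: no shared outcomes
p_B_given_A = p_AB / prob(A)  # equation (4)

print(p_AB, p_B_given_A)          # 0 0
print(p_AB == prob(A) * prob(B))  # False: 0 != 1/2 * 1/2, so not independent
```

Learning that the roll was even drops the probability of odd from 1/2 to 0, which is exactly the kind of information independent events never provide.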

Misc helpful equations

Union, or inclusive or (the probability that A, B, or both occur): \[ \tag{5} P(A \cup B) = P(A) + P(B) - P(A,B) \]
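The subtraction in equation (5) removes the outcomes counted twice. A small check on a fair die, using "even" and a hypothetical second event "three or less" that overlaps it only at 2:

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # fair six-sided die

def prob(event):
    """Probability of an event as its share of the sample space."""
    return Fraction(len(event), len(sample_space))

A = {2, 4, 6}  # even
B = {1, 2, 3}  # three or less (illustrative event)

lhs = prob(A | B)                      # direct count of {1, 2, 3, 4, 6}
rhs = prob(A) + prob(B) - prob(A & B)  # equation (5): remove the double-counted {2}
print(lhs, rhs)  # 5/6 5/6
```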

If the sample space \(S\) is partitioned into \(n\) disjoint sets \(B_1, \dots, B_n\), and we are interested in an event \(A\) that may overlap one or more of the \(B_i\), the law of total probability gives: \[ \tag{6} P(A) = \sum_{i=1}^{n} P(A \mid B_i)P(B_i) \]

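When \(B\) is continuous (think of the partition becoming infinitely fine), the sum becomes an integral. In the bracket notation Hobbs and Hooten use, where \([A \mid B]\) is the conditional distribution of \(A\) given \(B\), \([B]\) is the distribution of \(B\), and \([A]\) is the resulting marginal distribution of \(A\): \[ \tag{7} [A] = \int [A \mid B][B] \, dB \]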
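Here is a sketch of the law of total probability in both its discrete and continuous forms. The discrete part, equation (6), reuses the bag of 2 blue and 3 red balls and partitions on the color of the first draw to get the probability that the second draw is blue. The continuous part, equation (7), uses a hypothetical model chosen only because its answer is easy to verify by hand: \(B \sim \text{Uniform}(0, 1)\) and, given \(B\), the event \(A\) occurs with probability \(B\), so \(P(A) = \int_0^1 b \, db = 1/2\). The integral is approximated with a simple midpoint Riemann sum rather than any particular numerical library.

```python
from fractions import Fraction

# Discrete case, equation (6): bag with 2 blue and 3 red balls, no replacement.
# Partition on the first draw.
p_B = {"blue": Fraction(2, 5), "red": Fraction(3, 5)}
# P(second ball blue | first draw): one fewer blue ball left if the first was blue.
p_A_given_B = {"blue": Fraction(1, 4), "red": Fraction(2, 4)}

p_A = sum(p_A_given_B[b] * p_B[b] for b in p_B)
print(p_A)  # 2/5: the second draw is blue as often as the first

# Continuous case, equation (7), with a hypothetical model:
#   [B] = 1 on (0, 1) (uniform), and P(A | B = b) = b.
def integrate_unit_interval(f, n=10_000):
    """Midpoint Riemann sum of f over (0, 1)."""
    width = 1.0 / n
    return sum(f((i + 0.5) * width) * width for i in range(n))

p_A_cont = integrate_unit_interval(lambda b: b * 1.0)  # [A | B] * [B]
print(round(p_A_cont, 6))  # 0.5
```

The discrete result is a nice sanity check: averaging over what might have happened on the first draw, the second ball is blue with the same 2/5 probability as the first.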

Factoring joint probabilities

Factoring joint probabilities will simplify our thinking later. Rearranging (1) gives:

\[ \tag{8} P(A,B) = P(A\mid B)P(B) \]

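Applying (8) repeatedly factors any joint probability into a chain of conditional probabilities:

\[ P(a_1, a_2, \dots, a_n) = P(a_n \mid a_{n-1}, \dots, a_1) \cdots P(a_2 \mid a_1) P(a_1) \]

If, in addition, the \(a_i\) are conditionally independent given the parameters \(\{p_i\}\), each conditional term no longer depends on the other \(a\)'s, and the joint probability reduces to a product:

\[ \tag{9} P(a_1, a_2, \dots, a_n \mid p_1, p_2, \dots, p_n) = \prod_{i=1}^n P(a_i \mid \{p_i\}) \]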
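A sketch contrasting the general factorization with the product form of equation (9). Applying (8) repeatedly gives the chain rule, which always holds; the first part checks it on three draws from the bag example (no replacement, so the draws are dependent) by exact enumeration. The second part assumes hypothetical 0/1 observations \(a_i\), each Bernoulli with its own success probability \(p_i\) and conditionally independent of the others, so the joint probability is just a product. The observations and probabilities are made up for illustration.

```python
import math
from fractions import Fraction
from itertools import permutations

# Chain rule, checked by enumeration: bag with 2 blue, 3 red, no replacement.
bag = ["blue", "blue", "red", "red", "red"]
seqs = list(permutations(bag, 3))  # 60 equally likely ordered draws
p_joint = Fraction(sum(s == ("blue", "blue", "red") for s in seqs), len(seqs))
# P(blue) * P(blue | blue) * P(red | blue, blue): each factor conditions on earlier draws.
p_chain = Fraction(2, 5) * Fraction(1, 4) * Fraction(3, 3)
print(p_joint, p_chain)  # 1/10 1/10

# Product form, equation (9): hypothetical independent Bernoulli observations.
a = [1, 0, 1, 1]          # observed outcomes (illustrative)
p = [0.2, 0.5, 0.7, 0.9]  # success probability for each observation (illustrative)

def bernoulli(a_i, p_i):
    """P(a_i | p_i) for a single 0/1 observation."""
    return p_i if a_i == 1 else 1.0 - p_i

likelihood = math.prod(bernoulli(ai, pi) for ai, pi in zip(a, p))
# Summing logs gives the same quantity and avoids underflow for long products.
log_likelihood = sum(math.log(bernoulli(ai, pi)) for ai, pi in zip(a, p))
print(likelihood, math.exp(log_likelihood))
```

For the bag, dropping the conditioning and multiplying 2/5 * 2/5 * 3/5 would give the wrong answer; the product form is only safe once conditional independence has been assumed.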

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/rsettlage/rsettlage.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Settlage (2022, July 3). Robert E. Settlage: Hobbs-Hooten Chapter 3. Retrieved from https://rsettlage.github.io/books/2022-07-03-hobbs-hooton-ch3/

BibTeX citation

@misc{settlage2022hobbs-hooten,
  author = {Settlage, Robert},
  title = {Robert E. Settlage: Hobbs-Hooten Chapter 3},
  url = {https://rsettlage.github.io/books/2022-07-03-hobbs-hooton-ch3/},
  year = {2022}
}