Accept-Reject and Weighted Bootstrap algorithms.
Accept-reject, following:
Smith and Gelfand, "Bayesian statistics without tears: A sampling-resampling perspective," The American Statistician, 1992. DOI: 10.1080/00031305.1992.10475856.
We want samples from \(h(\theta)\), but can only sample from \(g(\theta)\). Additionally, we know \(h(\theta) = \frac{f(\theta)}{\int f(\theta)\, d\theta}\) and that there exists some constant \(M>0\) such that \(\frac{f(\theta)}{g(\theta)} \le M\) for all \(\theta\).
Procedure:

1. Generate \(\theta\) from \(g(\theta)\).
2. Generate \(u\) from Uniform(0,1).
3. If \(u \le \frac{f(\theta)}{M g(\theta)}\), accept \(\theta\); otherwise reject and return to step 1.
Any accepted \(\theta\) is a random variate from \(h(\theta) = \frac{f(\theta)}{\int f(\theta)\, d\theta}\). The proof is relatively simple and given in about three lines in the Smith and Gelfand paper, to which I will point you if you need to see it.
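To make the procedure concrete, here is a minimal R sketch on a toy problem of my own choosing (not from the paper): \(f(\theta) = \theta(1-\theta)\), the unnormalized Beta(2,2) density, with proposals from \(g =\) Uniform(0,1), so \(f/g \le M = 1/4\).

```r
# toy accept-reject: sample from h(theta) proportional to f(theta) = theta*(1-theta)
f <- function(theta) theta * (1 - theta)
M <- 0.25                    # max of f on (0,1), so f/g <= M with g = Uniform(0,1)
draws <- numeric(1000)
n_accepted <- 0
while (n_accepted < 1000) {
  theta <- runif(1)          # 1. propose from g
  u <- runif(1)              # 2. u ~ Uniform(0,1)
  if (u <= f(theta) / M) {   # 3. accept if u <= f/(M*g); here g(theta) = 1
    n_accepted <- n_accepted + 1
    draws[n_accepted] <- theta
  }
}
# draws now holds 1000 (approximately) Beta(2,2) variates
```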
If we don’t know M:
\[ \begin{eqnarray} \tag{1} M &=& \int f(\theta) d\theta \approx \frac{1}{n} \sum_i \omega_i \text{ , where } \\ \tag{2} \omega_i &=& \frac{f(\theta_i)}{g(\theta_i)} \end{eqnarray} \] and the \(\theta_i\) are draws from \(g(\theta)\).
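Reusing the toy \(f\) and \(g\) from the sketch above, estimating \(M\) via (1)-(2) is just an average of importance weights:

```r
# estimate M = integral of f via the average importance weight, eqs. (1)-(2)
theta_g <- runif(10000)   # draws from g = Uniform(0,1)
w <- f(theta_g) / 1       # omega_i = f/g, with g(theta) = 1 on (0,1)
M_hat <- mean(w)          # should land near the integral of theta*(1-theta) = 1/6
```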
Alternatively, we could use *weighted bootstrapping*, where we draw \(\theta^{\ast}\) from a realization \(\{\theta_1, \dots, \theta_n\}\) of \(\theta\). From this we can calculate: \[ \begin{equation} \tag{3} q_i = \frac{\omega_i}{\sum_j \omega_j} \text{ ; } \omega_i \text{ as given in (2)} \end{equation} \]
Now \(q_i\) is our sampling probability for drawing from the \(\theta_i\)’s.
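Continuing the same toy example, the weighted bootstrap is just a weighted resample of the draws from \(g\):

```r
# weighted bootstrap, eq. (3): resample the draws with probabilities q_i
q <- w / sum(w)
theta_star <- sample(theta_g, size = 1000, replace = TRUE, prob = q)
# theta_star is an approximate sample from h(theta), here Beta(2,2)
```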
In the Bayesian setting, we have prior knowledge in \(p(\theta)\) and want to update it given new data via the posterior \(p(\theta | \textbf{X})\).
If we let: \[ \begin{eqnarray} \tag{4} f_x(\theta) &=& \ell(\theta|\textbf{X})\, p(\theta) \\ \tag{5} g(\theta) &=& p(\theta) \text{, further, if we know } \hat{\theta}_{ML} \\ \tag{6} M &=& \ell(\hat{\theta}_{ML}|\textbf{X}) \end{eqnarray} \]
So, returning to our procedure, we accept a draw \(\theta\) from the prior when
\[ \begin{eqnarray} \tag{7} u &\le& \frac{f(\theta)}{Mg(\theta)} \\ &=& \frac{\ell(\theta|\textbf{X})\, p(\theta)}{\ell(\hat{\theta}_{ML}|\textbf{X})\, p(\theta)} \\ &=& \frac{\ell(\theta|\textbf{X})}{\ell(\hat{\theta}_{ML}|\textbf{X})} \end{eqnarray} \]
that is, we accept with probability equal to the likelihood ratio against the MLE.
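Here is a minimal sketch of (7) on a problem I am making up purely for illustration: a single observation \(X \sim \text{Binomial}(n, \theta)\) with a Uniform(0,1) prior, so \(\hat{\theta}_{ML} = x/n\) and the true posterior is Beta\((x+1, n-x+1)\) to check against.

```r
# accept-reject from the posterior using M = likelihood at the MLE, eq. (7)
n <- 20; x <- 13                       # hypothetical data
lik <- function(theta) dbinom(x, n, theta)
theta_hat <- x / n                     # MLE
post <- numeric(0)
while (length(post) < 1000) {
  theta <- runif(1)                    # draw from the prior g = p
  if (runif(1) <= lik(theta) / lik(theta_hat)) post <- c(post, theta)
}
# post approximates draws from Beta(x + 1, n - x + 1)
```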
Smith and Gelfand reworked an example originally given by McCullagh and Nelder, Generalized Linear Models, 1989. Essentially, consider two conditionally independent (given their parameters) random variables observed 3 times through their sum:
\[ \begin{eqnarray} \tag{8} \textbf{X}_{i1} &\sim& Binomial(n_{i1}, \theta_1) \\ \tag{9} \textbf{X}_{i2} &\sim& Binomial(n_{i2}, \theta_2) \\ \tag{10} \textbf{Y}_i &=& \textbf{X}_{i1} + \textbf{X}_{i2}; i \in \{1,2,3\} \end{eqnarray} \]
Likelihood:
\[ \begin{equation} \tag{11} \ell(\theta_1,\theta_2 | \textbf{Y}) = \prod_{i=1}^{3} \left[\sum_{j_i} {{n_{i1}}\choose{j_i}} {{n_{i2}}\choose{y_i-j_i}} \theta_1^{j_i}(1-\theta_1)^{n_{i1}-j_i}\,\theta_2^{y_i-j_i}(1-\theta_2)^{n_{i2}-y_i+j_i}\right] \end{equation} \]
And we are given the following data:
| \(i\) | 1 | 2 | 3 |
|---|---|---|---|
| \(n_{i1}\) | 5 | 6 | 4 |
| \(n_{i2}\) | 5 | 4 | 6 |
| \(y_i\) | 7 | 5 | 6 |
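One housekeeping note: the code below indexes this table as `test_data` (rows \(n_{i1}\), \(n_{i2}\), \(y_i\); columns \(i = 1,2,3\)) without ever defining it, so here is the definition the table implies:

```r
# rows: n_i1, n_i2, y_i; columns: i = 1, 2, 3
test_data <- data.frame(rbind(c(5, 6, 4),
                              c(5, 4, 6),
                              c(7, 5, 6)))
```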
Ok, so our data shows observed sums \(Y_i = X_{i1} + X_{i2}\) ranging from 5 to 7, given the unknown \(\theta_1\) and \(\theta_2\).
Following the paper, we place Uniform(0,1) priors on both \(\theta_1\) and \(\theta_2\); the authors generated about 1000 \((\theta_1, \theta_2)\) pairs. Following suit:
```r
set.seed(15239) # just a random number as seed
observations <- 1000
t1 <- runif(n = observations, min = 0, max = 1)
t2 <- runif(n = observations, min = 0, max = 1)
theta_draws <- data.frame(theta1 = t1, theta2 = t2)
```
We now sample from these priors using the weighted bootstrap procedure as outlined above. To get at the weights, we need to start by calculating \(q_i\). Remembering \(\omega_i = \frac{f(\theta_i)}{g(\theta_i)}\), \(g(\theta_i) = p(\theta_i)\) and \(f(\theta_i) = \ell(\theta_i|\textbf{X})\, p(\theta_i)\):
\[ \begin{eqnarray} \tag{12} q_i &=& \frac{\omega_i}{\sum_{j=1}^n \omega_j} \\ &=& \frac{\frac{f(\theta_i)}{g(\theta_i)}}{\sum_{j=1}^n \frac{f(\theta_j)}{g(\theta_j)}} \\ &=& \frac{\ell(\theta_i|\textbf{X})}{\sum_{j=1}^n \ell(\theta_j|\textbf{X})} \end{eqnarray} \] I have been a little lax with the \(i\) and \(j\) here: \(i\) refers to the \(i\)th draw from the priors and \(j\) runs across all draws. Note that as we inspect the likelihood (11), we encounter \(j_i\). This represents the possible values of \(\textbf{X}_{i1}\) given the parameters \(n_{i1}\) and \(n_{i2}\) and the observed \(y_i\), so \(\max(0, y_i-n_{i2}) \le j_i \le \min(n_{i1}, y_i)\). Let's code this up and get our \(q_i\)'s.
```r
# j_i bounds per observation i: max(0, y_i - n_i2) and min(n_i1, y_i)
ji_min_max <- data.frame(min = c(2, 1, 0), max = c(5, 5, 4))
omega <- rep(1, observations)
for (k in 1:observations) {        # loop over prior draws
  for (i in 1:3) {                 # loop over the three observed sums
    ji <- ji_min_max[i, 1]
    temp_omega <- 0
    while (ji <= ji_min_max[i, 2]) {  # <=, not <, so the upper j_i term is included
      temp_omega <- temp_omega +
        choose(test_data[1, i], ji) * choose(test_data[2, i], test_data[3, i] - ji) *
        (theta_draws$theta1[k]^ji) * ((1 - theta_draws$theta1[k])^(test_data[1, i] - ji)) *
        (theta_draws$theta2[k]^(test_data[3, i] - ji)) *
        ((1 - theta_draws$theta2[k])^(test_data[2, i] - test_data[3, i] + ji))
      ji <- ji + 1
    }
    omega[k] <- omega[k] * temp_omega
  }
}
q_i <- omega / sum(omega)
```
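As an aside, the \(j_i\) bounds need not be hard-coded; they follow directly from `test_data` and the inequality above:

```r
# same values as above: min = max(0, y_i - n_i2), max = min(n_i1, y_i)
ji_min_max <- data.frame(min = pmax(0, unlist(test_data[3, ]) - unlist(test_data[2, ])),
                         max = pmin(unlist(test_data[1, ]), unlist(test_data[3, ])))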
Now do the draws according to the \(q_i\)'s and form the posterior sample.
```r
thetas_index <- sample(1:observations, size = 1000, replace = TRUE, prob = q_i)
posterior_thetas <- theta_draws[thetas_index, ]  # the resampled (posterior) pairs
```
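The original post displayed scatter plots of the prior draws and the resampled posterior at this point; a base-graphics sketch along those lines:

```r
# prior draws in grey, resampled posterior pairs in red
plot(theta_draws$theta1, theta_draws$theta2, pch = 20, col = "grey",
     xlab = expression(theta[1]), ylab = expression(theta[2]))
points(posterior_thetas$theta1, posterior_thetas$theta2, pch = 20, col = "red")
```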
Note that every point in the posterior sample above also appears in the earlier plot of prior draws, as it must, since we are resampling from those draws. The paper appears to have used a different set of points for its plot of the posterior.
Now let's do a real accept-reject, estimating \(M\) on the fly and using a "burn-in" to iteratively converge on \(M\).
```r
ji_min_max <- data.frame(min = c(2, 1, 0), max = c(5, 5, 4))
# quick function to calc f(theta) = likelihood, basically taken from above
lhood <- function(t1 = 0.5, t2 = 0.5, tdata = test_data, ji_minimax = ji_min_max) {
  current_likelihood <- 1            # initialize once, outside the loop over i
  for (i in 1:3) {
    ji <- ji_minimax[i, 1]
    temp_likelihood <- 0
    while (ji <= ji_minimax[i, 2]) { # <=, so the upper j_i term is included
      temp_likelihood <- temp_likelihood +
        choose(tdata[1, i], ji) * choose(tdata[2, i], tdata[3, i] - ji) *
        (t1^ji) * ((1 - t1)^(tdata[1, i] - ji)) *
        (t2^(tdata[3, i] - ji)) * ((1 - t2)^(tdata[2, i] - tdata[3, i] + ji))
      ji <- ji + 1
    }
    current_likelihood <- current_likelihood * temp_likelihood
  }
  return(current_likelihood)
}
```
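A quick sanity check that the function runs (the argument values are arbitrary):

```r
lhood(t1 = 0.9, t2 = 0.5)   # returns a single likelihood value in (0, 1)
```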
```r
stored_thetas <- rbind(data.frame(theta1 = 0, theta2 = 0, likelihood = 0, u = 0, accept = 1, M = 0),
                       data.frame(theta1 = 0, theta2 = 0, likelihood = 0, u = 0, accept = 1, M = 0))
k <- 2
M_change <- 0  # initialized so the status line can print before the first acceptance
## actual accept/reject
while (sum(stored_thetas$accept) < 10000) {
  k <- k + 1
  # 1. make proposal(s)
  # 2. generate u from uniform(0,1)
  stored_thetas <- rbind(stored_thetas,
                         data.frame(theta1 = runif(1, 0, 1), theta2 = runif(1, 0, 1),
                                    likelihood = 0, u = runif(1, 0, 1), accept = 0, M = 0))
  # 3. if u <= f/(Mg) then accept, else reject, repeat
  stored_thetas$likelihood[k] <- lhood(stored_thetas$theta1[k],
                                       stored_thetas$theta2[k])
  # calc running M real quick (mean of the accepted likelihoods so far)
  M <- mean(stored_thetas$likelihood[stored_thetas$accept > 0])
  # accept or reject?
  stored_thetas$accept[k] <-
    ifelse(stored_thetas$u[k] < stored_thetas$likelihood[k] / M, 1, 0)
  # update M if needed, this is for the status indicator
  stored_thetas$M[k] <-
    ifelse(stored_thetas$accept[k] == 1, M, stored_thetas$M[k - 1])
  if (stored_thetas$accept[k] == 1) {
    M_change <- stored_thetas$M[k] - stored_thetas$M[k - 1]
  }
  # little status indicator
  if ((k %% 1000) == 0) {
    cat("current k: ", k, " current accepted: ", sum(stored_thetas$accept),
        " current/change in M:", M, " : ", M_change, "\n", sep = "")
  }
}
```
```
current k: 1000 current accepted: 452 current/change in M:0.1967587 : -0.000284144
current k: 2000 current accepted: 932 current/change in M:0.2024283 : -9.629906e-05
current k: 3000 current accepted: 1417 current/change in M:0.2009935 : -4.114977e-05
current k: 4000 current accepted: 1865 current/change in M:0.2008423 : -9.366024e-05
current k: 5000 current accepted: 2333 current/change in M:0.1992392 : -4.590816e-05
current k: 6000 current accepted: 2804 current/change in M:0.1992795 : 4.192012e-06
current k: 7000 current accepted: 3296 current/change in M:0.1997426 : -1.830395e-05
current k: 8000 current accepted: 3768 current/change in M:0.2011892 : 1.26757e-05
current k: 9000 current accepted: 4201 current/change in M:0.2017363 : -1.827057e-05
current k: 10000 current accepted: 4690 current/change in M:0.2014897 : 4.914835e-05
current k: 11000 current accepted: 5187 current/change in M:0.2015779 : 1.636161e-05
current k: 12000 current accepted: 5657 current/change in M:0.2017894 : -8.658366e-06
current k: 13000 current accepted: 6116 current/change in M:0.2009656 : -2.302136e-05
current k: 14000 current accepted: 6585 current/change in M:0.2010067 : 6.980897e-06
current k: 15000 current accepted: 7054 current/change in M:0.2017694 : 4.251824e-06
current k: 16000 current accepted: 7536 current/change in M:0.2019006 : 6.399018e-06
current k: 17000 current accepted: 8011 current/change in M:0.202628 : -9.512693e-06
current k: 18000 current accepted: 8475 current/change in M:0.2031243 : 1.733376e-06
current k: 19000 current accepted: 8943 current/change in M:0.2029498 : 2.093441e-05
current k: 20000 current accepted: 9452 current/change in M:0.2023991 : 2.138806e-06
current k: 21000 current accepted: 9901 current/change in M:0.2022406 : -1.338133e-05
final M is: 0.2021795
```
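To actually use the result, the posterior sample is the set of accepted pairs (after dropping the two zero rows that seeded the data frame):

```r
# accepted (theta1, theta2) pairs form the posterior sample
posterior_ar <- stored_thetas[-(1:2), ]
posterior_ar <- posterior_ar[posterior_ar$accept == 1, c("theta1", "theta2")]
```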
Alternatives (ideas) for a future post?