[MUSIC] Hello. My name is Lee. You may remember me from the previous course. I'm a professor of statistics and applied math here at the University of California, Santa Cruz. Here I'm going to give you a demonstration of using Markov chain Monte Carlo to estimate posterior probabilities in a simplified case, where we can actually work out the correct answer in closed form, and show that the Metropolis–Hastings algorithm is indeed working and giving us the right answer.

Recall from the previous course the example where your brother, or maybe your sister, has a loaded coin that you know will come up heads 70% of the time. They come to you with some coin, you're not sure if it's the loaded coin or a fair coin, and they want to make a bet with you. And you have to figure out which coin this is. Suppose you have a prior probability of 0.6 that they'll bring the loaded coin to you. They let you flip it five times, and you get two heads and three tails. And then you need to figure out your posterior probability that this is the loaded coin.

I've written this all out in advance. Our unknown parameter theta can take either the value fair or the value loaded. Our prior for theta is the probability that theta equals loaded, which is 0.6, the prior probability that we think they're bringing us the loaded coin. Our likelihood follows a binomial distribution, depending upon the value of theta. For the posterior, we look at the posterior for theta given that we saw x = 2 heads: the posterior is the likelihood times the prior, divided by a normalizing constant. In this case, we can work out the binomial and our prior, and we see that we get these expressions at the end. We get the posterior probability that theta is loaded, given that we saw two heads, to be 0.388. This is all review from the previous course so far.

But suppose we had a more complicated problem, where we couldn't work this all out in closed form? We'd know the likelihood and the prior, but we may not be able to get this normalizing constant. Can we instead do this by simulation? Indeed, yes we can. We can do this with Markov chain Monte Carlo, in particular using the Metropolis–Hastings algorithm. What we'll do is set up a Markov chain whose equilibrium distribution is this posterior distribution. So we'll consider a Markov chain with two states, theta = fair and theta = loaded, and we'll allow the chain to move between those two states with certain transition probabilities. We set this up using the Metropolis–Hastings algorithm.

Under the Metropolis–Hastings algorithm, step one is to start at an arbitrary location. In this case we can start at either theta_0 = fair or theta_0 = loaded. It doesn't really matter where we start; we'll be moving back and forth, and we're going to look at the long-run average of the simulations. So the key is that we'll be simulating. We'll run m iterations, and in each iteration we'll propose a candidate and either accept it or reject it. The first part is proposing a new candidate, which we'll call theta star. We're going to propose it to be the other state compared to where we are now, theta_{i-1}. So we'll propose to move: if our current state is fair, we'll propose theta star to be loaded; if our current state is loaded, we'll propose theta star to be fair. Then we want to think about: what's our acceptance probability alpha?
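To make the closed-form calculation concrete, here is a minimal Python sketch of it. This is my own illustration, not code from the lecture; the names `prior` and `likelihood` are just placeholders I've chosen.

```python
from math import comb

# Prior: P(theta = loaded) = 0.6, so P(theta = fair) = 0.4
prior = {"loaded": 0.6, "fair": 0.4}

# Binomial likelihood of x = 2 heads in n = 5 flips;
# the loaded coin comes up heads 70% of the time.
def likelihood(theta, x=2, n=5):
    p = 0.7 if theta == "loaded" else 0.5
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Posterior = likelihood * prior / normalizing constant
joint = {t: likelihood(t) * prior[t] for t in prior}
norm = sum(joint.values())
posterior = {t: joint[t] / norm for t in joint}
print(posterior)  # loaded: ~0.388, fair: ~0.612
```

Note that `norm` here is exactly the normalizing constant that, in harder problems, we won't be able to compute; that's what the simulation approach below avoids.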
The general form for alpha is g(theta*) / q(theta* | theta_{i-1}), divided by g(theta_{i-1}) / q(theta_{i-1} | theta*). In this case, for g we're going to use the un-normalized likelihood times the prior. So the numerator is f(x=2 | theta*) times f(theta*), divided by the q function; this is over the likelihood for theta_{i-1} times the prior for theta_{i-1}, divided by its q function. Our q function here is always going to be one, because we have a deterministic proposal: with probability one we propose theta star to be the other state. So it's really easy here, it's just a one.

So if theta star equals loaded, then alpha equals this value for loaded, 0.007938, divided by this value here for fair, 0.0125. (These values omit the binomial coefficient, which is the same in both and cancels in the ratio.) If theta star equals fair, then alpha is just the reciprocal of this. So when theta star is loaded, we get 0.635. If it's fair, we get 1.574.

Given these probabilities, we can then do the acceptance or rejection step. The easier case is when theta star equals fair: alpha is bigger than one, so we always accept, and in accepting we set theta_i = fair. If theta star equals loaded, alpha equals 0.635, so we accept theta star with probability 0.635. If we accept it, we set theta_i = loaded; otherwise, we set theta_i = theta_{i-1}, that is, the chain stays in that same old fair state.

We can draw this out as a Markov chain with two states, fair and loaded. If it's in the loaded state, it will move with probability one to the fair state. If it's in the fair state, it will move with probability 0.635 to the loaded state, and with probability 0.365 it will stay in the fair state. And so here's a little diagram for this Markov chain with two states, in which it moves back and forth with these probabilities.

Thus if we want to find our posterior probability f(theta = loaded | x = 2), we can simulate from this Markov chain using these transition probabilities and observe the fraction of time that it spends in the state theta = loaded. This gives us a good estimate of the posterior probability that it's the loaded coin.

In this particular case, we can also show that this gives us the theoretically correct answer. If you've seen a little bit of the theory of Markov chains: a Markov chain with transition probability matrix P has stationary distribution pi if pi times P equals pi. That's the definition of a stationary distribution. Here we have a transition probability matrix P over the states fair and loaded. Starting from the fair state, the chain remains in the fair state with probability 0.365 and moves from fair to loaded with probability 0.635. If it's in the loaded state, it will move to the fair state with probability one, and it will stay in the loaded state with probability zero. We want our stationary distribution to be the posterior probabilities, which you can recall are 0.612 for fair and 0.388 for loaded. And indeed, if you do just a minimal amount of matrix algebra, you can see that (0.612, 0.388) multiplied by this matrix, with rows (0.365, 0.635) and (1, 0), does indeed give you (0.612, 0.388), at least to within rounding error. Thus in this case we can see that we do get the correct stationary distribution for the Markov chain using the Metropolis–Hastings algorithm.
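Here is a short Python sketch of this sampler and of the stationary-distribution check. Again, this is my own illustration under the assumptions above, not the lecture's code; the function name `g` follows the acceptance-ratio notation, and it implements the un-normalized likelihood times prior.

```python
import random

# Un-normalized posterior g(theta) = likelihood * prior; the binomial
# coefficient is common to both states and cancels in alpha, so it is
# omitted here, matching 0.007938 (loaded) and 0.0125 (fair) above.
def g(theta):
    p = 0.7 if theta == "loaded" else 0.5
    prior = 0.6 if theta == "loaded" else 0.4
    return p**2 * (1 - p) ** 3 * prior

m = 100_000
theta = "fair"  # arbitrary starting state theta_0
visits_loaded = 0
for _ in range(m):
    # Deterministic proposal: always propose the other state, so q = 1
    theta_star = "loaded" if theta == "fair" else "fair"
    alpha = g(theta_star) / g(theta)  # here, 0.635 or 1.574
    if random.random() < alpha:       # accept with probability min(alpha, 1)
        theta = theta_star
    # otherwise reject and keep theta_{i-1}
    if theta == "loaded":
        visits_loaded += 1

print(visits_loaded / m)  # ~0.388, estimating P(theta = loaded | x = 2)

# Stationary-distribution check: pi * P should give back pi
pi = [0.612, 0.388]            # [fair, loaded]
P = [[0.365, 0.635],           # from fair: stay, move to loaded
     [1.0, 0.0]]               # from loaded: move to fair, stay
pi_P = [sum(pi[k] * P[k][j] for k in range(2)) for j in range(2)]
print(pi_P)  # ~[0.612, 0.388], up to rounding
```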
And when we simulate it, we do get correct estimates of the posterior probabilities. This is a nice simple example where we can work out the posterior probabilities in closed form, so we don't actually need to run Markov chain Monte Carlo. But this method is very powerful, because all we need is to be able to evaluate the likelihood and the prior; we don't need to evaluate the full posterior or compute that normalizing constant. And so this applies to a much broader range of more complicated problems, where we can use Markov chain Monte Carlo simulation to get these probabilities. We'll make good use of this in the rest of this course. [MUSIC]