[MUSIC] This week we're going to play the kids game Tic-tac-toe. And an interesting thing about tic-tac-toe, is that if both players know what they're doing, no one can ever win this game. Okay. Now we're actually going to build a machine player for tic-tac-toe. Given that no one can ever win, your objective is not to create a machine player that can win the game. Rather, you're trying to create a machine player that's good enough that it doesn't lose the game. All right. How are we going to do this? Well, we're going to use Monte Carlo methods. So, what does this mean? Well, when your machine player is trying to make a move, what it's going to do is just play a bunch of random games. And, you're going to look at the results of these random games, and use that information to decide what your next best move is. So let's talk more about how all this works. Okay, if you haven't seen it before, tic tac toe is a simple game played on a three by three grid with x's and o's. And x always goes first and the objective is simple, get three in a row, so let the machine be x here. That's x, I'm going to be o. And it's right here. Okay. And it's a tie even though I didn't play very well there. All right. And you can play again and she player plays there. All right. I'm defending here. Oh, I lost. Okay, so they said no one can win if they know what they're doing. I didn't actually say I knew what I was doing in playing tic-tac-toe, okay? But this is the basics behind the game. I mean, this is what the GUI that we're providing you looks like. You don't have to use this GUI. We also provided a text interface that allows you to see the, the moves, and the board positions on the console. So, what is our machine player actually going to do? We're going to follow this simple strategy outline here. We're given a board, the current state of the game, and which player we are. And from that we then do the following. We run a certain number of trials. This is a Monte Carlo simulation. So, we want to run a large number of trials. Those are going to be random. So, where's the randomness. Well, we start with that current board, and then we just randomly play out a game. So, if we're told we're player x we start by putting an x on the board. Then we put an o and an x and so on. Right, until somebody wins or there's a draw then we have a final position of the board. We take that board and we score it, and we throw it away, and we start all over again. We go back to the initial board and we play another random game. When we're done we score it, we throw it away. Start again with the initial board, play another random game, score it and so on. And once we've done the number of trials that we desire. We then add up all the scores across the trials right. Now we have an aggregate score for all of the, the entire Monte Carlo simulation. Take those scores, and we're going to end up with a score for each grid square. We look at the empty squares on the original board, and we find the empty square that has the highest score. And that's the move were going to pick. Right? If there's only one it's easy, if there's more than one well we just randomly pick, you know, one of the, the squares that has the highest score, right, if there's a tie, okay. And that's it. The questions now is simply how do we assign these scores? So, how to I actually score the board. Well, all right, there are a lot of different ways you can score tic-tac-toe board, but I'm going to show you one that works well with the Monte Carlo methods we're using here. All right, so I, I start with a board here, and this board is the result of a random trial. It doesn't even matter where we started. Okay. This is just the result. Right. What does matter is who's who. So X is the machine, player, and o is the human player, 'kay? And we're going to give each of these constants. We're going to say okay the machine player, I'm going to use the constant MCMATCH to score. And the human player, we're going to use the constant MCOTHER. Okay. These are going to match the costs that are in the template that we've given you, all right? And, what are we going to do with these costs? Well, we're going to add or subtract them to each square on the board, the question is how, all right? Well, first we going to figure out who won the game, all right? Well, it looks like o won the game, okay? I can see three in a row, for o right here, all right. And, so what that means is that I'm going to subtract this MCMATCH value every time there's, I see and x, for y. Well the X plays weren't very good. I didn't win the game. In fact, I lost the game. 'Kay. All right. On the flip side though, o did win the game, so I'm going to add MCOTHER. Every time I see a play, or a human player play, or a y because I actually want to play there, so I can block the human from actually winning this game, all right. Now you have to actually pick these constants all right, to be sensible, but for the point of this demonstration, I'm just going to set them to be one. And to make them different, so that you can tell when I'm adding one or the other. All right? Okay, so then all we do is go through each screed square on the board and we score it. All right, so here's an empty square in the upper left, well that one gets a score of zero right, because I don't know, there was no x or o there, I don't know if it was a good score, good place to play or not. Right here's an x. So I'm going to say, hey that was a bad spot, so I subtract 1. Hee's no, that was a good spot, I'll add 2. Okay, here's an x, subtract 1, subtract 1. Here's an o add 2. Here's an x subtract 1, o add 2. O, add 2. All right, now, I've scored this board. So, basically I have these nine values or however many scores there are if I was playing a different sized game. Okay, and I will now just. Say this is the score for this particular board. I will sum them up, all the scores for all the boards, across all the trials, and then I will get a result, okay. Now, obviously every game's going to be different. And, so let's imagine I have another game here, I'm going to draw it smaller where, x actually wins. So, let's imagine the final state looks something like this, okay. Now, x has won the game. All right. And since x is the machine player still, now I'm going to add MCMATCH. [BLANK_AUDIO] To each location where x is played. And I'm going to subtract MCOTHER. To each play location where o is played. And intuitively again this makes sense, right, I want to try and play in these squares where I actually won the game. So this board you know, I'd get a plus one for each location on this, on this row. And over here, I would subtract two on these locations with o where o'd played, and then all these empty squares would get a value of zero. Okay. All right. So now, again, you have to figure out who won the game, which player is the machine player and then you score the board. We could also have a draw. When you have a draw there's no useful information there, all the squares should just get zero, all right, because it doesn't help me figure out how to win or how to not lose the game. All right. So, how do we implement this? Well it comes down to four functions. We have MC Trial or Monte Carlo Trial. It takes a board and player. So, this is the current board position. And which player the machine is, all right. And you play one random game. This is a Monte Carlo Trial. All right, and we have MC updates scores. This takes the current. You know, running total of the scores. It takes a board, which is a completed game, so this is one of these trials, the outcome of one of these trials, and it takes which player the machine player is, so that you know how to score the board. You score the board, and then you add that board's scores into the total scores, right. And we have get_best_move, which takes the current board, and the scores and tells you which move is best for the machine player to make next, okay? And then finally, we have the mc_move function, which takes the current board, a player, and the number of trials, that you want to use for your machine player, okay? And this is going to use the other functions. To basically make a move. All right. Now, we are giving you a template. But I want to point out the template is missing a lot of stuff this time around. We may be pushing you a little bit outside of your comfort zone. You're going to have to write more things. And you're going to have to write them correctly, because the tester is assuming that you did write them, you know, the, these functions the way that we've asked you to. Okay. All right. It shouldn't be too difficult, all right. But we want you to get used to the fact that you're not always starting with a whole bunch of code in the screen, right. Sometimes you're starting with almost an empty screen. 'Kay. Don't let that deter you. I'm sure that you can do it. So I played tic-tac-toe with my kids. Perhaps you do to. It's not and overly exciting game though, right. But, I want to point out that the techniques that we're teaching you here transcend tic-tac-toe. Monte Carlo methods can be used for all kinds of useful things. Even if we're just talking about games, there are far more complicated games out there as I'm sure you're well aware, right? And If I have a game that has so many possible outcomes, or so many possible ways that the game could play out. From the current position it might be impossible to exhaustively sort of enumerate those possibilities and figure out what the best one is. In that case these kind of Money Carlo methods maybe actually the best that you can do. Now, actually, playing out random games and seeing what happens will give you some good insights into which moves are good and which moves are bad. And there are some good machine players for other types of games. Which do exactly this, so while tic-tac-toe may seem silly, the concepts and techniques you're learning here are extremely valuable.