So before we begin discussing probability, we need some very basic mathematics. Everyone listening to this lecture will have encountered set notation at some point, whether from a very basic or a more advanced mathematical perspective. In probability, set notation follows the same rules, of course; it is just ordinary set notation. However, the interpretations are slightly different. Usually when you talk about set notation, you talk about some über-space that contains everything. In statistics, we call this the sample space, usually denoted with an uppercase omega, Ω, and it is the collection of all possible outcomes of an experiment. As a simple example, let's conduct an experiment: we roll a die, so the possible outcomes are one, two, three, four, five, or six. Here we're not going to play the mental game that the die could land on an edge or a corner or something like that; it has to land on one of the numbers, so the sample space is Ω = {1, 2, 3, 4, 5, 6}. An event is any subset of the sample space. So, for example, you could have the event that the die roll is even, i.e., E = {2, 4, 6}. Certain kinds of events are so commonly talked about that we give them separate names. An elementary or simple event is a particular result of the experiment. So, for example, if the die roll is a four, we usually denote this with a lowercase omega: ω = 4. Here we don't tend to split hairs between the number four and the set containing the element four. In the traditional definition, a simple event is the element itself, not the set {4}, but for our purposes that distinction won't be necessary. And then it's always useful to talk about nothing: the null event is the event that nothing occurs.
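The die-roll example above can be sketched directly with Python's built-in set type. This is just an illustrative aside, not part of the lecture: the names `sample_space`, `E`, and `omega` are my own labels for Ω, the even event, and the elementary event.

```python
# A minimal sketch of the die-roll example using Python sets.
sample_space = {1, 2, 3, 4, 5, 6}  # Omega: all possible outcomes of the roll

E = {2, 4, 6}        # the event "the die roll is even"
omega = 4            # an elementary event: the roll came up four
null_event = set()   # the null event: nothing occurs

print(E <= sample_space)  # every event is a subset of the sample space -> True
print(omega in E)         # the elementary event 4 lies in the event E -> True
```

The `<=` operator on Python sets tests the subset relation, which matches the definition of an event as any subset of the sample space.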
The null event, or empty set, is usually denoted ∅. Again, the sets in probability theory follow all the same rules as ordinary set notation, because it is exactly ordinary set notation, just with different interpretations. So when we say that an elementary event ω is an element of an event E, written ω ∈ E, that means E occurs when ω occurs. For example, looking back at the previous slide: if our elementary event is that the die roll is a four and the event E is that the roll is even, then rolling a four means the roll is even. If the elementary event is not in the event, ω ∉ E, that means E does not occur when ω occurs. For example, if the elementary event is a five: five not being in the set of even numbers means that when you roll a five, you have not rolled an even number. We can continue along this logic. E being a subset of F, E ⊂ F, means that the occurrence of E implies the occurrence of F. For example, let E be the event that the die roll is even, E = {2, 4, 6}, and let F be the event that the roll is either even or a five, so F = {2, 4, 5, 6}. Then the occurrence of E implies the occurrence of F: if you roll an even number, you have also rolled an element of the set of even rolls plus five. The standard set intersection E ∩ F is the event that both E and F occur. To give a specific example, imagine E is the event that the die roll is even and F is the event that the die roll is a prime number. The primes on a die are two, three, and five, so F = {2, 3, 5}. Then E ∩ F, the event that the roll is both even and prime, is just {2}. So the event E ∩ F occurring means you get a number that is both even and prime, which in this case means you get a two.
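The membership, subset, and intersection relations just described translate one-for-one to Python set operations. A small sketch (my own illustration, with the same events as in the lecture):

```python
E = {2, 4, 6}        # even rolls
F = {2, 4, 5, 6}     # even rolls, or a five
primes = {2, 3, 5}   # prime rolls

print(5 in E)        # a five is not an even roll -> False
print(E <= F)        # E is a subset of F: an even roll is always in F -> True
print(E & primes)    # rolls that are both even and prime -> {2}
```

Here `&` is Python's set intersection, mirroring E ∩ F.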
E ∪ F is the standard set notation for union, but in the probabilistic interpretation it means that at least one of E or F occurs. So in my previous example, it would mean that I get either an even number or a prime number, or both in the case of two. If E ∩ F is the null set, that means E and F cannot simultaneously occur. So imagine E is the set of even numbers and F is the set of odd numbers; you cannot roll a die that is both even and odd, so E ∩ F = ∅. That's important enough that we give it its own name, which you see in bold here: mutually exclusive. If we say that two events are mutually exclusive, that means they cannot both occur, and you frequently hear people use the phrase mutually exclusive incorrectly. What it technically means is that the two events cannot both simultaneously occur. And then the complement of an event, Eᶜ, or sometimes written as E with a little bar on top, is the event that E did not occur. In our case, where E is the even numbers {2, 4, 6}, Eᶜ is the odd numbers {1, 3, 5}. Since an event and its opposite cannot simultaneously occur, their intersection is always the null set. So E and Eᶜ are always mutually exclusive. There are some standard set theory facts we should also remind you of: the famous De Morgan's laws. The first is (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ, and the way to think about this is that the little complement symbol distributes itself across the parentheses onto A and B, giving Aᶜ and Bᶜ, and it flips the cap into a cup. And what's nice is that in the second De Morgan's law, (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ, the same thing happens: the complement distributes across the parentheses, so you get Aᶜ and Bᶜ, and in this case the cup turns into a cap.
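Union, mutual exclusivity, and complements can also be sketched with Python sets. This is my own illustration; note that a complement must always be taken relative to the sample space, which is why `sample_space - E` appears below rather than a built-in "complement" operation.

```python
sample_space = {1, 2, 3, 4, 5, 6}
E = {2, 4, 6}       # even rolls
F = {2, 3, 5}       # prime rolls
odd = {1, 3, 5}     # odd rolls

print(E | F)              # union: at least one of even or prime -> {2, 3, 4, 5, 6}
print(E & odd == set())   # even and odd are mutually exclusive -> True

E_complement = sample_space - E   # the event that E did not occur
print(E_complement)               # -> {1, 3, 5}
print(E & E_complement)           # an event and its complement never overlap -> set()
```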
So De Morgan's laws basically say: if you complement across either an intersection or a union, the complement distributes itself, but it flips everything, all the cups and caps. Now, I struggled to come up with a verbal example of De Morgan's laws, and here's the best I could do. Let A be the event that you're an alligator and B be the event that you're a turtle, so A ∪ B is the event that you are either an alligator or a turtle. Complementing that, (A ∪ B)ᶜ means that an alligator or a turtle you are not. De Morgan's law says that's Aᶜ ∩ Bᶜ: Aᶜ is that you are not an alligator, Bᶜ is that you are not a turtle. So the set theory, put into English, would be: if an alligator or a turtle you are not, then you are not an alligator and you are also not a turtle. That's the equivalence between those two sentences, and I think everyone would agree those two sentences agree. Another example, for the other De Morgan's law: let A be the event that your car is hybrid and B be the event that your car is diesel, and complement their intersection. If your car is not both hybrid and diesel, then your car is either not hybrid or not diesel; that is, (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ. Some other small facts that I'm sure you remember from set theory: (Aᶜ)ᶜ = A, so if you do not not get an even number, you get an even number. And (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C). The way to remember this is to think of the union as a sort of plus and the intersection as multiplication; the rule then looks exactly like the distributive property. C gets multiplied by A, C gets multiplied by B, and it distributes across the plus sign, the union. That's the way you can remember that one.
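Because the die's sample space is finite, De Morgan's laws and the distributive law can be checked exhaustively over all of its subsets. The sketch below is my own verification, not part of the lecture; `comp` and `all_events` are helper names I introduced.

```python
from itertools import combinations

Omega = {1, 2, 3, 4, 5, 6}

def comp(s):
    """Complement of an event relative to the sample space Omega."""
    return Omega - s

def all_events():
    """Every subset of Omega, i.e. all 2**6 = 64 possible events."""
    xs = list(Omega)
    return [set(c) for r in range(len(xs) + 1)
            for c in combinations(xs, r)]

events = all_events()

# De Morgan's laws hold for every pair of events:
assert all(comp(A & B) == comp(A) | comp(B) for A in events for B in events)
assert all(comp(A | B) == comp(A) & comp(B) for A in events for B in events)

# Double complement: (A^c)^c = A
assert all(comp(comp(A)) == A for A in events)

# Distributive law (A u B) n C = (A n C) u (B n C), here with C fixed as the primes:
C = {2, 3, 5}
assert all((A | B) & C == (A & C) | (B & C) for A in events for B in events)

print("all identities verified on every subset of Omega")
```

Brute-force checking like this is only feasible because the sample space is tiny, but it makes it easy to convince yourself the identities are not accidents of one example.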
So that gives you a very basic Rosetta Stone, connecting ordinary set notation to how we think about it in probability. Next, we're going to actually use set notation to develop probability. This is a very brief section, and in this discussion we're just going to talk about probability at a very conceptual level. In the next section, we'll talk about probability at its mathematical foundation, but I wanted to spend a minute talking about where we're going with probability as a modeling tool to analyze data, and here's a strategy that underlies much of science. The idea is this. For a given experiment, attribute everything you know to a systematic model. Good examples of this are things like lines, planes, and hyperplanes, where people presume that an outcome, say something like hypertension, depends on a collection of predictors in a linear fashion. That relationship is either known, or theorized, or assumed for the sake of convenience, but it relates known predictors to the known outcome. Then, attribute everything else to randomness. Now, this is a very difficult bullet to swallow, I think, for many people, because in nearly all applications of probability, what the word random means is very difficult to pin down. As an example, earlier in the lecture we talked about retrospectively sampling hospital records. In that case, if you were to model the outcome of whether or not a person had a disease as predicted by their history, where we performed some form of retrospective sampling, it's not exactly clear where the randomness is coming from, or even what randomness means in that context. Even so, we still often use probability to evaluate the collection of unknown things in an experiment, treating them as if they were random, and then we have to be careful in how we interpret our probability statements relative to what the word random means in that case.
In some other settings, people have very specific definitions of what random means. For example, sometimes people will analyze clinical trials using the randomization that was used to assign patients to treatment or control as the actual probability mechanism in their mathematical models. There, they can point very directly to the randomness they're modeling. However, that has its own problems as well. So I just want to say that this process, the third point, of using probability to quantify the uncertainty in your conclusions, to model this randomness, is actually a very delicate subject. And as you can imagine from this discussion, all three of these first bullet points come with quite a bit of baggage in terms of assumptions, some of which you cannot evaluate at all. So what we'd like to do is check how sensitive our conclusions are to the assumptions in these models. In some cases, we can directly verify them: we can check whether the relationship between the response and the predictors looks roughly like a line, so that we're okay modeling it as a line. In other cases, they involve assumptions that we can't possibly check, because they involve variables that we did not collect, or variables that we don't even know about. In that case, we have to evaluate the sensitivity of our conclusions to the unknowns in our model, to evaluate how robust our approach is, and this comes from studying how the data were collected, how the statistics were used, and what exactly the probability is actually modeling. In what follows, we're going to cover the mathematics of probability, but hopefully also touch on these subjects. Now, I want to emphasize that these are very, very difficult topics that many people struggle with when thought about with sufficient depth, and what we hope to do in this class is mostly get you started thinking about them.
And I think if you do just one thing when thinking about probability in the data you're analyzing, it's this: whenever you say "I have a 95% confidence interval" or "my p-value is such-and-such," or anything else where you actually use probability in your data analysis, go through the exercise of asking what it is that you're modeling as random. What are the sources of this randomness, and how good a job do you think your probability statements do at characterizing it? This is the end of Mathematical Biostatistics Boot Camp lecture one. In this lecture, we covered basic conceptual ideas, and next lecture we're going to cover much of the basic mathematics that underlies probability. So make sure you have plenty of coffee to get ready.