So, let's put together the various aspects of hypothesis testing theory covered so far into a numerical, and hence a real-world, example. Now, the last time you were in the supermarket, let's say in the mineral water section, you may have noticed multiple bottles of the same branded mineral water on the shelf. But if you took a very close look at the water levels in each of those bottles, you would have noticed some slight variations among them. Now, whether this was a main mineral water brand or a supermarket's own brand, there would have been this variation. And if you look closely at the labeling on whichever kind of mineral water bottle it was, you would have seen a claim made by the manufacturer. And let's imagine that on the mineral water bottles of interest to you, the labels said 500 milliliters, 500ml. Now, what this actually denotes is not the amount of mineral water in that specific bottle you've picked up; rather, it is a claim made by the manufacturer about the average content across all the mineral water bottles which they've produced. So, I want to link together many of the themes we've seen throughout this course so far. So we recognize there would be some slight variations in the quantities of mineral water in each of these bottles. Let's assume a normal distribution for the amount of water in each of these bottles. Because I'm guessing that in the manufacturing plant there would be a load of empty bottles coming along the conveyor belt, some sort of tube goes in, and it puts a designated amount of mineral water into each of those bottles. And the hope, given the claim of 500 milliliters, is that on average that's the amount of water going into each bottle. But this machine is not perfect. It won't be putting identical amounts of water into each bottle. There would inevitably be some variation. Hopefully, not too great a variation, but some variation nonetheless.
And on this normal distribution, remember a normal distribution has two key parameters: its mean, its measure of location, which we hope in this case would be 500 milliliters, the average contents per bottle, and also its variance, its measure of spread. And we expect some minor variation in the water levels in each of these bottles. So let's suppose you are perhaps the manager of this factory, and you want to take a sample of bottles to judge whether your claim about the average content is being met or not. Now I say take a sample, because it would not be practical to test every single bottle coming off this manufacturing line. It would be far too time consuming to do so. So let's imagine we took a random sample of 100 bottles. Now, think back to the sampling distribution of the sample mean. We know that as we go from one random sample of size n to another, we're going to have different component members within that sample. Here, that would equate to different samples of 100 mineral water bottles, and inevitably the contents would vary a little bit across these. And hence if we calculated the sample mean, it would vary from one set of 100 mineral water bottles to another. Let's put some simple numbers to this. So, we have a sample size n of 100. Let's assume that in our sample of 100 bottles the sample mean was 503 milliliters. Of course the claim made by the manufacturer is 500 milliliters. Now at first glance, we might think, "Oh, the factory is overfilling the bottles because 503 is greater than 500." Well, true. 503 is greater than 500. But here we're not comparing apples with apples, if you will. Because the 500 is a claim made about the population mean μ, whereas the 503 represents the sample mean x̅ obtained from our random sample of mineral water bottles. And we know our point estimate is susceptible to some sampling error. So, yes, there is a difference of three milliliters.
But is this difference (a) just due to sampling error, with the machine filling these bottles correctly on average, or (b) statistically significant? So, we now want to use hypothesis testing to provide an answer to this question. So we have a sample mean of 503 milliliters, a claim by the manufacturer that μ is 500 milliliters, and a sample size n of 100. Just to keep the numbers very simple, let us also assume that the standard deviation, the σ related to that normal distribution, is 10 milliliters. So how are we going to calculate a p-value? Well, we need to go through a couple of stages to get there, but it's worth the effort. Let's backtrack now to our sampling distribution of x̅. We said that when sampling from a normal distribution, which is our assumption here in this mineral water bottle example, x̅ is normally distributed with a mean of μ and a variance of σ² over n. You may also recall, many sessions ago when we considered the normal distribution, we spoke about the idea of standardizing a random variable, whereby we subtract its mean and divide by its standard deviation. So what I want to do is to standardize this random variable x̅ by subtracting its mean μ and dividing by its standard deviation. Now just to complicate things perhaps ever so slightly, x̅ is a random variable, yes, but one being used to estimate a population parameter. And hence, this is a special kind of random variable, which we may call an estimator. And rather than refer to its standard deviation as such, we will call it the standard error of the sample mean. Now, as you do more statistics in future you will see many more references to standard errors. For now just think of it as the standard deviation of x̅. So when we standardize x̅, we subtract its mean μ and we divide by its standard error, i.e. the square root of its variance. So, the square root of σ² over n, which is simply σ over root n.
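To make the standard error a little more concrete, here is a small Python sketch. It computes σ over root n for the numbers in this example, and then simulates many random samples of 100 bottles to check that the spread of the simulated sample means is close to that value. The seed, the number of repeated samples (2000) and the variable names are illustrative choices, not part of the lecture, and the fills are drawn under the assumption that the claim μ = 500 is true.

```python
import math
import random
import statistics

mu = 500.0     # claimed population mean in ml (from the example)
sigma = 10.0   # assumed population standard deviation in ml (from the example)
n = 100        # sample size (from the example)

# Standard error of the sample mean: sqrt(sigma^2 / n) = sigma / sqrt(n)
standard_error = sigma / math.sqrt(n)

# Illustrative simulation: the seed and the 2000 repetitions are arbitrary.
rng = random.Random(42)
sample_means = [
    statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
    for _ in range(2000)
]

# The standard deviation of the simulated sample means should be close
# to the theoretical standard error.
simulated_se = statistics.stdev(sample_means)

print("theoretical standard error:", standard_error)
print("simulated standard error:  ", round(simulated_se, 3))
```

The point of the simulation is only to show that the sample mean varies far less from sample to sample than the individual bottles do, which is why a 3 milliliter gap can matter even when σ is 10.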
And we've now created a standardized variable. And given x̅ has a normal distribution, the standardized version of x̅ also follows a normal distribution. We will refer to the standardized value as Z, following a standard normal distribution, i.e. with a mean of zero and a variance, and hence a standard deviation, of 1. So now let's put our numerical values into this formula. Sample mean 503, minus the assumed value of μ under a null hypothesis that the manufacturer's claim of 500 milliliters is correct, so minus 500, divided by the standard error: σ, which is 10, divided by the square root of n, so the square root of 100, which is conveniently also 10. And this will give us a Z value of 3. Now in our hypothesis testing language, we will refer to three not as the p-value, because a p-value of course is a probability, and three is not a probability; rather, this is our test statistic value. So what we'd like to ask is, from this three, can we now calculate a p-value? And the answer is yes, we can. Now to calculate this exactly, we would typically defer to a computer, but do remember that when we introduced standardized variables previously we said that there was approximately a 95% chance of being within two standard deviations of the mean, so on a Z score that meant from -2 to +2, and a 99.7% chance of being within three standard deviations of the mean, i.e. on the Z score being between plus and minus 3. So if, briefly, we now consider the p-value as the probability of observing our test statistic value, which here is +3, or a more extreme value, assuming the null hypothesis is true, well, this is exactly what we need. So again in practice we will defer to the computer to do the number crunching for us. But we've already seen that there was a 99.7% chance of being within three standard deviations of the mean, i.e. being between plus and minus three on a Z score, and hence there would be a 0.3% chance of being beyond plus or minus three. And hence that 0.3%, or about 0.003, would in fact equate to our p-value in this instance. So we got there in the end.
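Since we said we would defer to a computer for the exact figure, here is a minimal Python sketch of that number crunching, using only the standard library. The variable names are my own; the values are those of the example. It treats "more extreme" as beyond plus or minus 3, i.e. a two-sided p-value, which matches the 0.3% reasoning above.

```python
import math
from statistics import NormalDist

x_bar = 503.0   # observed sample mean in ml
mu_0 = 500.0    # value of mu under the null hypothesis (the manufacturer's claim)
sigma = 10.0    # assumed population standard deviation in ml
n = 100         # sample size

# Test statistic: standardize x_bar using its standard error sigma / sqrt(n).
z = (x_bar - mu_0) / (sigma / math.sqrt(n))

# Two-sided p-value: probability of a Z value at least as far from zero
# as the one observed, under the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print("test statistic z:", z)
print("two-sided p-value:", round(p_value, 4))
```

The exact figure is roughly 0.0027, which the 99.7% rule of thumb rounds to 0.003.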
Once we have this p-value it's very easy to draw our statistical conclusion and make our decision of whether or not to reject our null hypothesis. So our null hypothesis is that the claim of the manufacturer is true: μ is equal to 500 milliliters. The alternative hypothesis is that the claim is not true and that the mean μ is some value other than 500 milliliters. Let's opt for a 5% significance level, which we've previously said is our preferred default choice. So we have an α of 0.05. Our p-value is approximately 0.003, clearly far below that threshold and very close to zero, and hence we are very happy to reject the null hypothesis and reject the claim by the manufacturer. And hence it seems that the difference of 3 milliliters between the observed sample mean of 503 and the claimed value of 500 is not simply due to sampling error; rather, that difference seems to be statistically significant. So that is our first example of a hypothesis test applied to some real-world data. Now perhaps as a little exercise for you, and there is more of this in the accompanying online resources, think about what happens as you change the effect size. So here our effect size was 3, the 3 milliliter differential between the sample mean x̅ and the claimed population mean μ. But also consider the impact of the sample size n as n gets bigger or smaller; we've already alluded to the influence which the sample size can have.
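As one way of trying that exercise, here is a short Python sketch which recomputes the p-value and the decision at the 5% significance level for a few different sample sizes and effect sizes. The helper function two_sided_p_value and the particular values of n (25, 100, 400) and of the difference (1, 2, 3 milliliters) are hypothetical choices for illustration; σ = 10 and μ = 500 under the null hypothesis are as in the example.

```python
import math
from statistics import NormalDist


def two_sided_p_value(x_bar, mu_0, sigma, n):
    """p-value for H0: mu = mu_0 against H1: mu != mu_0, with sigma known."""
    z = (x_bar - mu_0) / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05   # significance level
mu_0 = 500.0   # claimed mean in ml
sigma = 10.0   # assumed population standard deviation in ml

# Same 3 ml difference, different sample sizes.
for n in (25, 100, 400):
    p = two_sided_p_value(503.0, mu_0, sigma, n)
    print(f"n={n:3d}, difference=3 ml: p-value={p:.4f}, reject H0: {p < alpha}")

# Same sample size of 100, different effect sizes.
for diff in (1.0, 2.0, 3.0):
    p = two_sided_p_value(mu_0 + diff, mu_0, sigma, 100)
    print(f"n=100, difference={diff:.0f} ml: p-value={p:.4f}, reject H0: {p < alpha}")
```

For instance, with only 25 bottles the same 3 milliliter difference gives a Z value of 1.5, and the null hypothesis would not be rejected at the 5% level, whereas with 100 bottles it is.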