Let's look at some additional examples related to the material we've just covered. So, let me go back to this paired comparison, where we've done both a confidence interval for the mean difference in blood pressures before and after oral contraceptives, and a p-value, using the paired approach. As I did at the end of the confidence interval section, I also want to speak here about what would happen to our results from a hypothesis test if we ignored the pairing in these data. You may recall, we've looked at this several times: on average, blood pressure increased for the ten women in the study by 4.8 millimeters of mercury after three months on oral contraceptives compared to their starting point before using oral contraceptives, and the standard deviation of the individual differences in the ten women was 4.6 millimeters of mercury. So, if we recognize the pairing, and use that information about the variability of the individual differences computed for each of the women, we get a standard error, under the paired approach, of 1.5 millimeters of mercury. We saw a difference of 4.8 millimeters of mercury, so our distance measure in terms of standard errors would be 4.8 millimeters of mercury, our observed mean difference, over 1.5 millimeters of mercury, and that gives us a result that is 3.2 standard errors above what we'd expect under the null hypothesis, which is a difference of zero. If you look this up on the appropriate t-table, a t with nine degrees of freedom, this results in a p-value of less than 0.05, and the result is statistically significant, consistent with the confidence interval we got when we used the paired approach appropriately, which did not include the value of zero.
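As a quick numerical check, here is a minimal sketch of the paired computation described above. The summary numbers (4.8 and 4.6 mm Hg for n = 10 women) come from the lecture; the critical value 2.262 is the standard two-sided t-table cutoff for 9 degrees of freedom at the 0.05 level.

```python
import math

n = 10            # number of women (paired before/after measurements)
mean_diff = 4.8   # mean within-pair difference, mm Hg
sd_diff = 4.6     # standard deviation of the 10 individual differences, mm Hg

# Paired approach: standard error of the mean difference
se_paired = sd_diff / math.sqrt(n)   # about 1.45, rounds to the 1.5 quoted
t_stat = mean_diff / se_paired       # about 3.3 standard errors above zero

# Two-sided critical value for a t with 9 degrees of freedom at the 0.05 level
t_crit = 2.262
print(f"SE = {se_paired:.2f}, t = {t_stat:.2f}, reject null: {t_stat > t_crit}")
```

The small rounding differences (1.45 vs. 1.5, 3.3 vs. 3.2) come from whether the standard error is rounded before dividing.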
But suppose we ignored the pairing: instead of condensing the before and after measurements into a single sample of differences, we treat them as two independent samples and compute the standard error the way we would for an unpaired study. We'd then get a much larger estimate of the standard error, as we saw before, because we'd be double counting information shared within the pairs. Under the unpaired approach, we get a distance measure of 4.8 millimeters of mercury over a standard error of 5.3, so something that is 0.91 standard errors above what we'd expect under the null, and even on a normal curve, let alone a t with nine degrees of freedom, that results in a p-value well greater than 0.05. So, we would get a non-statistically significant result, and the confidence interval would include zero, if we did not respect the pairing in these data. Now, this is an extreme example where the data are highly correlated before and after, but generally speaking, ignoring the pairing will lead to overestimates of the standard error for a mean difference, leading to wider confidence intervals and larger p-values than are appropriate. So, again, why do we get such a difference in the standard error estimates? Well, there's a lot of shared variability in the before and after measurements on these ten women, and we can see that if we plot, for each of the ten women, her measurement after against her measurement before on a two-dimensional graph: the two track pretty closely. So, if we ignore that tracking, that pairing, we will double count uncertainty in the before and after measurements as they contribute to our standard error, and we'll get an inflated estimate of the standard error.
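To make the contrast concrete, here is a minimal sketch of the (incorrect) unpaired computation, using the numbers quoted in the lecture (mean difference 4.8 mm Hg, unpaired standard error 5.3 mm Hg). The two-sided p-value from the normal curve is an approximation to the t with nine degrees of freedom, but either way the conclusion is the same.

```python
from statistics import NormalDist

mean_diff = 4.8    # observed mean difference, mm Hg
se_unpaired = 5.3  # standard error when the pairing is (wrongly) ignored, mm Hg

z = mean_diff / se_unpaired                  # about 0.91 standard errors
p_two_sided = 2 * (1 - NormalDist().cdf(z))  # far above 0.05
print(f"z = {z:.2f}, two-sided p = {p_two_sided:.2f}")
```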
So, here's another use of the two-sample unpaired t-test. We looked at this randomized study where 303 participants at a study dinner were randomly assigned to receive either a menu without calorie labels, a menu with calorie labels only, or a menu with calorie labels plus a label stating the recommended daily caloric intake for the average adult (calorie labels plus information). We saw before that, at the end of the study, looking at calories consumed during and after the meal, the mean values were very similar for those who got no calorie labels and those who got calorie labels only, but they were lower on average by several hundred calories for those who got the calorie labels plus information. So, to present the resulting mean differences, we would designate the no labels group as our reference, and compare both the calorie labels only group and the calorie labels plus information group to that same reference; we already saw the mean differences and confidence intervals. Those who got the calorie labels only consumed on average only five calories less than those who got no labels; the confidence interval was wide, included zero close to its center, and was not statistically significant, and the two-sample t-test p-value, for testing the null that the underlying population mean calories consumed during and after the meal would be the same whether everyone got calorie labels only or got no labels, was 0.96. So, we would certainly fail to reject the null, which is consistent with that confidence interval that includes zero. Now let's compare the group that got calorie labels plus information to those who got no labels.
Those who got the calorie labels plus information consumed on average 250 fewer calories during and after the meal, and while the confidence interval was somewhat wide, it did not include the null value of zero, so we have a statistically significant result; and if we look at our p-value from the two-sample t-test, it's less than 0.05, coming in at 0.017. So, for this second test, the null is that the population mean calories consumed for the calorie labels plus information group equals the population mean calories consumed for the no labels group, were everybody in the population from which they were sampled given one menu or the other. The alternative is that the means are not equal, and this of course translates into statements about the mean difference: under the null, the mean difference is zero; under the alternative, it is not equal to zero. Let's look at another example of the two-sample t-test. We'll revisit some data we looked at in earlier lectures: we have data on 236 Nepali children who are 12 months old, and we separated them by sex, giving 124 males and 112 females, and we looked at the distributions of weight visually, but now let's compare them statistically. So, we saw, and we can see here in this box plot, that there's a lot of crossover in the individual weights of males and females, but males tended to weigh more, comparing the medians and the 25th and 75th percentiles of the distributions. If we take the mean difference, comparing the mean weight for males to the mean weight for females in kilograms, it is 7.4 kilograms minus 6.7 kilograms, or 0.7 kilograms. Because we took the difference in the direction males minus females, it is positive: on average, male children weigh more than female children by 0.7 kilograms.
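As a rough consistency check, assuming a large-sample normal approximation for the two-sample test, we can back out how many standard errors from zero the 250-calorie difference must have been, given the reported two-sided p-value of 0.017:

```python
from statistics import NormalDist

p_two_sided = 0.017  # reported p-value, calorie labels plus information vs. no labels

# Solve 2 * (1 - Phi(z)) = p for z: the observed distance in standard errors
z = NormalDist().inv_cdf(1 - p_two_sided / 2)  # about 2.4 standard errors
print(f"|z| = {z:.2f} standard errors from zero")
```

A result about 2.4 standard errors from zero is exactly why the 95% confidence interval, which reaches roughly 2 standard errors in each direction, excludes zero.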
The standard deviations and sample sizes for the males and females are as follows: among the 124 individual weight measurements for the males in the sample, the standard deviation was 1.16 kilograms; among the 112 weight measurements for the females in the sample, the standard deviation was 1.19 kilograms. So, let's walk through the hypothesis testing approach. Just to remind you of the general procedure: first, we conceptualize the competing hypotheses, which here are that the mean weights for the two sex groups are the same, that is, the mean difference is zero, versus that they differ. To measure how far our estimate of that population mean difference, our study mean difference x̄ diff, is from what we'd expect the population mean difference to be under the null, which is zero, we measure that distance in units of standard error. So, now let's look at our data and compute that distance in standard errors. The difference in mean weights between males and females was 0.7 kilograms, as we said before: males weighed 0.7 kilograms more on average than females. If you compute the standard error based on the sample standard deviations and sample sizes I gave you, it turns out to be 0.15 kilograms, so we have a result that's approximately 4.67 standard errors above zero. Again, we have a result that is 4.67 standard errors above zero, and zero is what the true mean difference is under the null hypothesis. So, the p-value we want is the probability of getting a sample mean difference like the one we observed, 0.7 kilograms, or something even more extreme, in other words less likely, if the true population mean difference in weights is zero. Translating into standard errors, we can rephrase this as the probability of getting a result as far or farther than 4.67 standard errors from the mean of a normal curve.
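The standard error quoted above comes from the usual two-sample (unpaired) formula, SE = sqrt(s₁²/n₁ + s₂²/n₂). A minimal sketch using the summary statistics from the lecture:

```python
import math

# Summary statistics from the lecture
n_m, sd_m = 124, 1.16   # males: sample size, standard deviation (kg)
n_f, sd_f = 112, 1.19   # females: sample size, standard deviation (kg)
mean_diff = 7.4 - 6.7   # mean weight, males minus females (kg)

# Two-sample (unpaired) standard error of the difference in means
se = math.sqrt(sd_m**2 / n_m + sd_f**2 / n_f)  # about 0.153, rounds to 0.15 kg
z = mean_diff / se                             # about 4.6 standard errors
print(f"SE = {se:.2f} kg, distance = {z:.2f} standard errors")
```

The lecture's 4.67 comes from rounding the standard error to 0.15 before dividing; without rounding the distance is closer to 4.6, which changes nothing about the conclusion.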
So, if we use the pnorm function for that (pnorm, I should say, not rnorm), we want the proportion of results as far or farther than 4.67 standard errors, in either direction, from the mean of a normal curve; everything has been converted to the standard normal curve, which has mean zero, as does the sampling distribution under the null hypothesis. Using our trick with pnorm: if we plug in 4.67, it gives us the proportion of the curve below that value. We take one minus that to get the proportion as far or farther above zero, and because of symmetry, and because we want to go in both directions, as far or farther than 4.67 in either direction, we multiply that by two. We get a p-value that is well less than 0.05, indeed less than 0.01 or even 0.001; it's approximately three times ten to the negative sixth, so it is very low, and we would reject the null of no difference at the population level. Just to recall, and we're going to continually talk about this, because the mechanics will change and I'm going to do due diligence by showing you how these things are done, but I want you to appreciate that they're always doing the same thing, just with different inputs. When we get to the next lecture set, we'll name more hypothesis tests for different outcome situations, but the approach will always be the same. First, you state, and you don't have to state it out loud or write it down, just conceptualize, your null and alternative hypotheses. The null is always that there's no difference between the groups we're comparing, and that is reflected in the value of our population association measure: for differences, that's zero. The alternative is just the complement of that, that there is a difference. So, then you go forth, assume the null, and take your study-based estimate of the quantity of interest.
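The pnorm trick described above, in R, would be 2 * (1 - pnorm(4.67)). Here is the equivalent sketch using Python's standard library, where `statistics.NormalDist().cdf` plays the role of pnorm:

```python
from statistics import NormalDist

z = 4.67  # observed distance from zero, in standard errors

# pnorm-style computation: proportion of a standard normal below z,
# one minus that for the upper tail, times two to cover both directions
p_two_sided = 2 * (1 - NormalDist().cdf(z))
print(f"two-sided p = {p_two_sided:.1e}")  # roughly 3e-06
```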
In what we've done so far, that's the sample mean difference as an estimate of the population mean difference. Then we measure the distance of that estimate from the value we've assumed under the null hypothesis, and that distance is measured in standard errors. Then we convert this standardized distance measure into a p-value, generally by appealing to the normal curve; sometimes, in smaller samples for mean comparisons, we appeal to the t, but again, ultimately this will all be handled by a computer. The p-value is the probability of getting a result as extreme or more extreme, in other words as unlikely or more unlikely, than the study result we got, if the null hypothesis is true. Then we make a decision to reject or fail to reject the null hypothesis based on our p-value, and generally, we said, the approach is to compare it to our cutoff for significance, our type I error level, which is standardly 5%. So, in the next set of lectures, we'll continue on this theme, but look at hypothesis tests for comparing proportions between two populations, and for comparing incidence rates, both when we don't have individual-level time measurements and when we do.
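The recipe above is the same every time, only the inputs change. Here is a minimal, hypothetical helper (the function name and signature are my own, and it uses the large-sample normal approximation only) that captures the steps: estimate, distance in standard errors, p-value, decision.

```python
from statistics import NormalDist

def two_sided_test(estimate: float, null_value: float, std_error: float,
                   alpha: float = 0.05) -> tuple[float, float, bool]:
    """Generic large-sample hypothesis test.

    Measures how many standard errors the estimate lies from its value
    under the null, converts that distance to a two-sided p-value via
    the normal curve, and compares the p-value to the alpha cutoff.
    """
    distance = (estimate - null_value) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(distance)))
    return distance, p_value, p_value < alpha

# The paired blood pressure example: 4.8 mm Hg difference, SE 1.5 mm Hg
# (the lecture used a t with 9 degrees of freedom here; the normal is
# an approximation, but the conclusion is the same)
d, p, reject = two_sided_test(estimate=4.8, null_value=0.0, std_error=1.5)
print(f"distance = {d:.1f} SEs, p = {p:.4f}, reject null: {reject}")
```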