In this video we'll see how to follow up a significant F-test in one-way analysis of variance with pairwise comparisons of group means, using pairwise t-tests or confidence intervals. Pairwise comparisons help us determine why the overall effect occurred: they tell us which group means differ significantly and in what direction. Follow-up comparisons are often referred to as post-hoc comparisons. "Post hoc" indicates that we're making comparisons after the fact, so without having a clear hypothesis, before collecting the data and performing the analysis, about which groups will differ and in which direction. This implies using two-sided tests or confidence intervals. If we do have a clear expectation about how the individual group means will differ, we can perform planned comparisons. These are outside the scope of this introduction, however. Suppose we performed an F-test to compare the healthiness of three groups of cats that consumed different diets: raw meat, canned food, and dry food. Health was rated on a scale from zero to ten. Suppose we found an F-value of 3.793 with 2 and 46 degrees of freedom, and a p-value of 0.03, indicating a significant difference between the groups. To find out how we should interpret this significant overall effect, we'll determine post-hoc confidence intervals. If we have g groups, there are g times (g minus 1) divided by 2 comparisons to be made. In our example we have 3 groups, so three comparisons. Remember, these comparisons should only be performed if the overall test is significant. When we perform the comparisons, whether using pairwise t-tests or confidence intervals, the same assumptions should hold as for the F-test: independence, normality, and homogeneity of variances. Of course, you'll have already checked these before performing the overall F-test. The formulas for the t-test and confidence interval are almost the same as for the regular t-test.
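The comparison count described above, g times (g minus 1) divided by 2, can be sketched in a few lines of Python; this is just the number of unordered pairs of groups:

```python
from math import comb


def n_pairwise(g):
    """Number of pairwise comparisons among g group means: g * (g - 1) / 2."""
    return g * (g - 1) // 2


# Three diet groups (raw meat, canned food, dry food) give three comparisons,
# which matches choosing 2 groups out of g.
assert n_pairwise(3) == comb(3, 2) == 3
```

With five groups this would already be ten comparisons, which is why the multiple-comparison corrections discussed later matter.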
That is, the test and confidence interval for two independent groups, assuming equal population variances. In post-hoc comparisons we use Fisher's least significant difference method, which refers to the use of the residual standard deviation, the square root of the within-group variance, instead of the pooled standard deviation of the two groups, to calculate the standard error. So in each pairwise comparison we estimate the standard error based on the variance in all groups, including the ones not in the comparison. Here's the formula for the confidence interval: it's the difference between the means of groups j and k, minus and plus the appropriate t-value times the standard error. The t-value is the value associated with half the significance level and the error degrees of freedom, which equal the total number of observations in all groups minus the number of groups. The standard error equals the residual standard deviation, the square root of the within sum of squares divided by the error degrees of freedom, times the square root of one over the size of group j plus one over the size of group k. If the assumption of homogeneity of variances is violated, you should use the formula that makes no assumption about the population variances. Since we'll be making multiple comparisons, we should correct for the inflated family-wise error rate: the probability that at least one of the comparisons will result in a false rejection of the null hypothesis. There are many correction methods, often referred to as multiple comparison methods. We'll consider two of these. The Bonferroni method involves dividing the desired overall alpha by the number of comparisons and using the resulting corrected alpha for the individual comparisons. With this correction, the actual probability of falsely rejecting the null hypothesis will be smaller than or equal to the desired overall alpha. In many cases the correction is overly conservative, resulting in a smaller alpha and less power.
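The interval formula just described can be written as a small Python function. The residual standard deviation (2.057) and critical t-value (2.7045) below are the ones used later in the video; the group means and group sizes are hypothetical placeholders, since the video doesn't report them:

```python
import math


def lsd_interval(mean_j, mean_k, n_j, n_k, s_resid, t_crit):
    """Fisher LSD confidence interval for mean_j - mean_k.

    s_resid: residual standard deviation, sqrt(SS_within / df_error).
    t_crit:  t-value at half the (corrected) alpha, with df_error
             degrees of freedom (total observations minus number of groups).
    """
    se = s_resid * math.sqrt(1 / n_j + 1 / n_k)  # standard error of the difference
    diff = mean_j - mean_k
    return diff - t_crit * se, diff + t_crit * se


# Hypothetical means and group sizes, with the video's s_resid and t_crit:
lo, hi = lsd_interval(7.1, 5.2, 16, 16, s_resid=2.057, t_crit=2.7045)
```

If the interval contains zero, the two group means do not differ significantly at the chosen (corrected) alpha level.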
Tukey's honestly significant difference method is less conservative: the actual probability of falsely rejecting the null hypothesis is closer to the desired overall alpha. It results in more powerful tests and narrower confidence intervals than the Bonferroni method. Tukey's method uses a test statistic with a distribution slightly different from the t-distribution, so we'll leave the calculation of the test statistic, confidence intervals, and p-values to software. In this example we'll use the Bonferroni method and divide the standard alpha level of 0.05 by three, resulting in a corrected alpha of 0.017. If we use tables to determine significance, we'll have to settle for a corrected alpha of 0.010, since 0.017 isn't in the table. The critical t-value is the value listed at 40 degrees of freedom, rounding down from 46, and at half the significance level, so 0.005. We find a critical t-value of 2.7045. The residual standard deviation equals 2.057. Using the formula for each of the three comparisons with the appropriate group means, we find confidence intervals ranging from minus 0.07 to 3.93 for the difference between raw meat and canned food, from minus 0.41 to 3.48 for the difference between raw meat and dry food, and from minus 1.52 to 2.31 for the difference between canned and dry food. None of the intervals exclude zero, so none of the differences are significant. This is not only because the Bonferroni method has less power, but also because we rounded down our alpha and degrees of freedom by using tables. If we use Tukey's method, which has more power, we find that only the interval for the difference between raw meat and canned food does not contain zero, so we reject the null hypothesis for this comparison only. Looking at the mean health scores, we can conclude that raw meat results in a significantly higher average health score than canned food. The mean for dry food lies in between these two means and does not differ significantly from either raw meat or canned food.
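The Bonferroni correction and the zero-containment check above can be sketched directly from the numbers reported in the video:

```python
# Bonferroni: divide the desired overall alpha by the number of comparisons.
alpha_corrected = 0.05 / 3  # about 0.017

# Confidence intervals reported in the video (Bonferroni, table-based alpha):
intervals = {
    ("raw meat", "canned food"): (-0.07, 3.93),
    ("raw meat", "dry food"):    (-0.41, 3.48),
    ("canned food", "dry food"): (-1.52, 2.31),
}

# A difference is significant only when its interval excludes zero.
significant = [pair for pair, (lo, hi) in intervals.items()
               if not (lo <= 0 <= hi)]
print(significant)  # [] -- every Bonferroni interval contains zero
```

This confirms the conclusion in the video: under the Bonferroni correction none of the three pairwise differences is significant, while Tukey's HSD (available in statistical software) flags the raw meat versus canned food comparison.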