In this video, we'll discuss how to compare two independent groups on a quantitative variable using a t-test for two independent means. We'll consider a version of this test that adds the assumption of equal population variances, and we'll also see how to calculate a confidence interval comparing two independent means.

We use a t-test or confidence interval for two independent means if we have a quantitative response variable and a binary independent variable that distinguishes two independent samples. An example of a research question is: does the average number of television shows watched differ between people with a full-time job and people who are unemployed? Or: is the mean score on a happiness scale lower for people who have children than for people who don't?

The first requirement for testing such questions is that the samples are independent: the cases should be assigned to the groups randomly, or, in a non-experimental design, drawn randomly from the population. Second, both samples should be normally distributed. However, the t-test is robust against violation of this assumption for large samples, due to the central limit theorem. It's even robust for small samples when using a two-sided test. Normality is important if the samples are small and the test is one-sided. There's no simple rule that says what sample size is large enough; it depends on the variation in the population and the true effect size. However, for the purpose of this course, let's say that for one-sided tests both samples need to have more than 30 cases.

The statistical hypotheses are expressed in terms of the difference between the population means mu 1 and mu 2. If both groups come from the same population, the difference will be zero. This is the null hypothesis. Possible alternative hypotheses are that the difference is not zero, or that the difference is greater or smaller than zero.

The test statistic t equals the difference in sample means, minus the expected difference under the null hypothesis (zero), divided by the standard error, which equals the square root of the sum of each group's variance divided by its sample size. The test statistic follows a Student's t distribution with a rather complicated number of degrees of freedom. The formula looks horrific, but if you need to do this by hand, you already calculated each sample's variance divided by its sample size when you determined the standard error. If you reuse that information, the calculation is not so bad. Once we know the degrees of freedom, we can calculate or look up the one-sided or two-sided p-value and compare it to the predetermined significance level. Finally, we reject or fail to reject the null hypothesis.

Suppose I want to test whether a raw meat diet is healthier for cats than regular canned food. I randomly assign half of my sample to a raw meat diet and the other half to a canned food diet. Cat health is measured by a veterinarian on a scale from zero to ten. To check whether the groups are normally distributed, I look at the histograms of the health scores of each group. These look normal enough. The null hypothesis states that the difference in means between these groups is zero. My one-sided alternative hypothesis is that the difference will be larger than zero if I subtract the canned mean from the raw mean; I expect the raw diet to result in a higher mean health score. I'll set the significance level to 0.05. The value of the test statistic is 5.12 minus 4.86, divided by the square root of 1.10/150 plus 1.89/148. This equals 1.87.
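As a rough illustration (not part of the original lecture), the following Python sketch reproduces this calculation from the rounded summary statistics quoted above; the variable names are mine, and because the inputs are rounded, the last decimals may differ slightly from the values reported in the video.

```python
import numpy as np
from scipy import stats

# Rounded summary statistics from the cat-diet example (hypothetical course data;
# variable names are my own, not from the lecture).
mean_raw, var_raw, n_raw = 5.12, 1.10, 150   # raw meat group
mean_can, var_can, n_can = 4.86, 1.89, 148   # canned food group

# Standard error of the difference, without assuming equal variances.
se = np.sqrt(var_raw / n_raw + var_can / n_can)

# Test statistic: observed difference minus the difference under H0 (zero),
# divided by the standard error.
t_stat = (mean_raw - mean_can) / se

# Welch-Satterthwaite degrees of freedom (the "complicated" formula).
a, b = var_raw / n_raw, var_can / n_can
df = (a + b) ** 2 / (a ** 2 / (n_raw - 1) + b ** 2 / (n_can - 1))

# One-sided p-value: probability of a t value at least this large under H0.
p_one_sided = stats.t.sf(t_stat, df)

print(f"t = {t_stat:.2f}, df = {df:.2f}, one-sided p = {p_one_sided:.3f}")
```

The same result can also be obtained directly with scipy.stats.ttest_ind_from_stats with equal_var=False, which takes standard deviations rather than variances.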
As expected, the value is positive and falls in the right tail. The degrees of freedom are 274.95. If I use a table to look up the p-value, I find that it lies somewhere between 0.025 and 0.050. If I calculate the p-value with statistical software, I find a value of 0.032. This is smaller than the significance level of 0.05, so I can reject the null hypothesis. I conclude that the mean health score in the population is higher for cats on a raw meat diet than for cats on a canned food diet.

There's an alternative version of the t-test for independent means that adds the assumption that the population variances are equal. If we're willing to make this assumption, the standard error and degrees of freedom are calculated differently. The standard error is calculated by taking the pooled standard deviation times the square root of the sum of the reciprocals of the sample sizes. You can think of the pooled standard deviation as a weighted average of the standard deviations of the two samples. Fortunately, the calculation of the degrees of freedom is now much simpler: they're equal to the total sample size minus two.

The additional assumption of equal population variances can be useful, because it results in larger degrees of freedom and possibly a smaller standard error, which gives a slightly larger chance of rejecting the null hypothesis. However, most people would recommend using the unequal-variances version, to be on the safe side and because in almost all cases it's just as good. Some people decide whether to make the extra assumption based on a test that determines whether the variances are equal, but these tests are not very robust against violation of normality and will be significant too often. If you want to make the extra assumption, then as a rule of thumb, only do so if the standard deviations differ by less than a factor of two.

We calculate the confidence interval using this formula: the difference in sample means, plus and minus t times the standard error. Plus and minus t are the t values associated with the required confidence level and the degrees of freedom that we just calculated. With 274.95 degrees of freedom and a confidence level of 95%, the values are minus and plus 1.96. The standard error is calculated the same way as before. Also, remember that we need to meet the same assumptions required for a two-sided t-test. The confidence interval for our example data is 0.26 plus and minus 1.96 times 0.14. This results in a confidence interval that ranges from minus 0.01 to plus 0.54. This corresponds to performing a two-sided test. Since the value of zero (no difference in means) lies inside the interval, zero is considered a plausible value. This means that a two-sided test, which is more conservative than the one-sided test we performed, would have been non-significant.
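To make these last two calculations concrete, here is a second sketch in the same spirit, again using the rounded summary statistics assumed above (not the original raw data): it computes the pooled standard deviation and degrees of freedom for the equal-variances version, and the 95% confidence interval based on the unequal-variances standard error. Small rounding differences from the numbers quoted in the video are to be expected.

```python
import numpy as np
from scipy import stats

# Same assumed (rounded) summary statistics as in the earlier sketch.
mean_raw, var_raw, n_raw = 5.12, 1.10, 150
mean_can, var_can, n_can = 4.86, 1.89, 148
diff = mean_raw - mean_can

# Equal-variances version: pooled standard deviation (a weighted average of the
# two sample variances) and degrees of freedom equal to the total sample size minus 2.
sd_pooled = np.sqrt(((n_raw - 1) * var_raw + (n_can - 1) * var_can) / (n_raw + n_can - 2))
se_pooled = sd_pooled * np.sqrt(1 / n_raw + 1 / n_can)
df_pooled = n_raw + n_can - 2

# 95% confidence interval using the unequal-variances standard error and the
# Welch degrees of freedom (about 275, as computed before).
se = np.sqrt(var_raw / n_raw + var_can / n_can)
a, b = var_raw / n_raw, var_can / n_can
df = (a + b) ** 2 / (a ** 2 / (n_raw - 1) + b ** 2 / (n_can - 1))
t_crit = stats.t.ppf(0.975, df)          # about 1.97; the video rounds this to 1.96
lower, upper = diff - t_crit * se, diff + t_crit * se

print(f"pooled SD = {sd_pooled:.2f}, pooled SE = {se_pooled:.3f}, df = {df_pooled}")
print(f"95% CI for the difference: [{lower:.2f}, {upper:.2f}]")
```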