0:00
In this video, we're going to talk about accuracy and
precision of confidence intervals.
We define accuracy in terms of whether or
not the confidence interval contains the true population parameter.
And precision refers to the width of a confidence interval.
We're going to start by defining the confidence level.
Then we're going to talk about the interplay between the confidence level and
the width of an interval,
and finally about the trade-offs between accuracy and precision.
0:29
First, let's define the confidence level.
Suppose we took many samples and built a confidence interval from each sample using
the formula: point estimate plus or minus 1.96 times the standard error.
Then about 95% of those confidence intervals
would be expected to contain the true population mean, μ.
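Written out as a formula, this is the interval below. It assumes the parameter of interest is a population mean μ estimated by the sample mean x̄ from a random sample of size n; the standard-error form s/√n (with s the sample standard deviation) is the usual one for that case and is an assumption here, not something stated above.

$$\bar{x} \pm 1.96 \times SE, \qquad SE = \frac{s}{\sqrt{n}}$$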
For example, in this figure, the vertical line represents the true population mean,
which we rarely know, and each horizontal line is an interval calculated from
a different random sample.
There are 25 intervals plotted in total, and
24 of them contain the true population mean while 1 does not.
Therefore, the proportion of these intervals that capture the true mean
is 24 out of 25, or 0.96, which is 96%.
This is not exactly 95%, but it's close enough.
If we examined many more intervals, the percentage of those capturing the true
population parameter would get closer to 95%.
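To see the "many samples" idea in action, here is a minimal simulation sketch in Python. It assumes a hypothetical normal population (mu = 10, sigma = 2, values invented for illustration, not from the video), draws repeated samples of size n, builds point estimate ± 1.96 × SE from each, and reports the fraction of intervals that contain mu. With a couple of thousand intervals the rate typically lands close to 95%.

```python
import random
import statistics


def coverage_rate(mu=10.0, sigma=2.0, n=100, num_samples=2000, z_star=1.96, seed=1):
    """Fraction of intervals x_bar +/- z_star * SE (one per simulated sample) that contain mu.

    mu and sigma describe a hypothetical normal population chosen only for illustration.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)               # point estimate
        se = statistics.stdev(sample) / n ** 0.5       # standard error of the sample mean
        lower, upper = x_bar - z_star * se, x_bar + z_star * se
        if lower <= mu <= upper:                       # did this interval capture mu?
            hits += 1
    return hits / num_samples


print(f"capture rate over 2000 intervals: {coverage_rate():.3f}")
print(f"capture rate over 25 intervals:   {coverage_rate(num_samples=25):.2f}")
```

Calling `coverage_rate(num_samples=25)` mimics the 25-interval figure; trying a few different `seed` values shows how much the capture rate bounces around with so few intervals, compared with the 2000-interval run.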
Obviously, this is not how we calculate the confidence level,
since we usually only work with one sample from the population.
In fact, the confidence level is something we choose, as opposed to calculate, and
we base the rest of our calculations on it.
1:50
Commonly used confidence levels in practice are 90%, 95%, 98%, and 99%.
Remember from earlier that changing the confidence level simply
means adjusting the critical value in the confidence interval formula.
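The critical value for each of these confidence levels can be read off the standard normal distribution: it is the cutoff that leaves the middle chosen percentage between minus and plus that value. A minimal sketch using Python's standard-library `statistics.NormalDist`, assuming a normal-based interval like the 1.96 example (not a t-based one):

```python
from statistics import NormalDist


def critical_value(confidence_level):
    """Return z* such that the middle `confidence_level` of N(0, 1) lies in [-z*, z*]."""
    tail = (1 - confidence_level) / 2          # area left in each tail
    return NormalDist().inv_cdf(1 - tail)      # upper cutoff of the middle region


for cl in (0.90, 0.95, 0.98, 0.99):
    print(f"{cl:.0%}: z* = {critical_value(cl):.3f}")
```

For 95% this recovers the familiar 1.96; higher confidence levels give larger critical values.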
2:18
Looking at this figure, it seems like a wider interval would indeed
be much better at capturing the true parameter.
Think about the red interval plotted on this figure and
imagine that it extends even further.
It would then be much more likely to capture the true
population parameter, which is shown here as the vertical dashed line.
Therefore, to capture the parameter more often we need wider intervals:
as the confidence level increases, so does the width of the confidence interval.
2:43
Another way of thinking about this is in terms of the width of the region
that captures the middle 95% or 99% of the distribution.
The middle 99% will inevitably span a wider region,
and hence the 99% confidence interval is going to be wider.
Therefore, as we increase the confidence level,
the width of the interval increases as well.
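As a worked example, the full width of a normal-based interval is 2 × z* × SE. Holding the standard error fixed and using the critical values 2.576 (99%) and 1.960 (95%), the ratio of the two widths is:

$$\frac{\text{width}_{99\%}}{\text{width}_{95\%}} = \frac{2 \times 2.576 \times SE}{2 \times 1.960 \times SE} = \frac{2.576}{1.960} \approx 1.31$$

So, for the same sample, the 99% interval is roughly 31% wider than the 95% interval.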
Remember, more accurate means a higher confidence level.
So if we're saying that we want to increase accuracy,
we also need to increase the confidence level.
But this might come at a cost.
So what do we mean by a cost?
What drawbacks are associated with using a wider interval?
Let me give you a hint.
Say you're watching the weather forecast, and you're told that the next day the low
is negative 20 degrees Fahrenheit, and the high is positive 110 degrees Fahrenheit.
Is this accurate?
Most likely, yes.
Tomorrow's temperature is probably going to be somewhere between negative 20 and
positive 110.
However, is it informative, or in other words, is it precise?
Not really.
Based on this weather report, it would be nearly impossible to figure out what to wear
tomorrow, or what to expect in terms of the weather.
As we discussed before, as the confidence level increases, the width
of the confidence interval increases as well, which increases the accuracy.
However, the precision goes down.
4:14
So how can we get the best of both worlds?
Is it possible to get both higher precision and higher accuracy?
Yes, it is.
The way to do it is to increase the sample size.
If we increase our sample size, that's going to shrink our standard error,
since the sample size appears in its denominator, and therefore our margin of error.
So we can stay at a high confidence level without
necessarily having to widen the confidence interval.
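A quick numeric sketch of this effect, assuming the standard error of a sample mean is s/√n and using a hypothetical standard deviation of 2 (not from the video). The confidence level stays fixed at 95% throughout, yet the margin of error keeps shrinking as n grows.

```python
import math


def margin_of_error(sd, n, z_star=1.96):
    """Margin of error z* * SE for a sample mean, assuming SE = sd / sqrt(n)."""
    return z_star * sd / math.sqrt(n)


sd = 2.0  # hypothetical standard deviation, chosen for illustration
for n in (100, 400, 1600):
    print(f"n = {n:>4}: margin of error = {margin_of_error(sd, n):.3f}")
```

Because n sits under a square root, each fourfold increase in sample size halves the margin of error; precision improves without giving up any confidence.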
4:44
Let's take a look at this example.
The General Social Survey, the GSS,
is a sociological survey used to collect data on demographic characteristics and
attitudes of residents of the United States.
In 2010, the survey collected responses from 1,154 U.S. residents.
Based on the survey results, a 95% confidence interval for
the average number of hours Americans have to relax or
pursue activities that they enjoy after an average workday
was found to be 3.53 to 3.83 hours.
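Because the interval is centered on the sample mean, its two endpoints are enough to back out the pieces of the calculation. A sketch, assuming the reported interval was built as x̄ ± 1.96 × SE with SE = s/√n; the implied standard deviation is a derived quantity under that assumption, not a figure reported in the video.

```python
import math

lower, upper = 3.53, 3.83   # 95% confidence interval from the GSS example (hours)
n = 1154                    # number of respondents
z_star = 1.96               # critical value for 95% confidence

point_estimate = (lower + upper) / 2   # the sample mean sits at the center of the interval
margin = (upper - lower) / 2           # margin of error is half the interval width
se = margin / z_star                   # margin = z* * SE, so SE = margin / z*
implied_sd = se * math.sqrt(n)         # assumes SE = s / sqrt(n)

print(f"sample mean = {point_estimate:.2f} h, margin of error = {margin:.2f} h")
print(f"standard error = {se:.4f} h, implied standard deviation = {implied_sd:.2f} h")
```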
Determine whether each of the following statements is true or false.
A says that 95% of Americans spend between
3.53 and 3.83 hours relaxing after a workday.
This is false, because the confidence interval is not about
individuals in the population but about the true population parameter.
B says that 95% of random samples of 1,154 Americans
will yield confidence intervals that contain the true average
number of hours Americans spend relaxing after a workday.
This is indeed the definition of the confidence level:
the percentage of random samples that yield confidence intervals
containing the true population parameter.
So this is true.
C says that 95% of the time, the true average
number of hours Americans spend relaxing after
a workday is between 3.53 and 3.83 hours.
This is false, because the population parameter is a fixed value, not a moving target
that is sometimes inside an interval and sometimes outside of it.
6:50
This is false, because the confidence interval
is not about the sample mean but about the population mean.
We know exactly what the sample mean is.
It has to lie between these values, because we construct the confidence interval
around the sample mean; in fact, it sits right at the center, 3.68 hours.
Therefore, we could actually say that we are 100% confident that Americans
in this sample spend on average between 3.53 and
3.83 hours relaxing after an average work day.
But that's not a very interesting statement because it's only about
the sample, and not about the unknown population parameter that we're after.