0:00

In this video, we're going to define a confidence interval and

talk about the conditions required to be able to

calculate the confidence interval with the formulas that we provide.

I'm going to give you guys a hint.

It actually is based on the central limit theorem.

So the conditions are going to be very similar.

And lastly, we're going to generally discuss how to

find confidence intervals and how to interpret the results.

A plausible range of values for the

population parameter is called a confidence interval.

Using only a sample statistic to estimate a parameter

is like fishing in a murky lake with a spear.

And using a confidence interval is like fishing with a net.

We can throw a spear where we saw a fish, but we probably will miss.

If we toss a net in that area, though, we have a good chance of catching the fish.

In other words, if we report a point

estimate, we probably won't hit the exact population parameter.

On the other hand, if we report a range of

plausible values, we have a good shot at capturing the parameter.

1:01

So, based on this one sample's mean, how can we figure

out what this range of plausible values is going to be?

Well, this one sample mean, our x bar, is

indeed our best guess for the unknown population mean.

Therefore, any interval we construct should be constructed around

that x bar that we know to be our best guess.

1:22

Also, from the central limit theorem, we know that x bars are distributed

nearly normally, and the center of that

distribution is at the unknown population mean.

One more item that we want to think about is the 68, 95, 99.7% rule,

which tells us that roughly 95% of random samples will have

sample means that are within two standard errors of the population mean.

Clearly then, for 95% of random samples, the unknown true

population mean is going to be within two standard errors of that sample's mean.

Note that we're being very careful about the language here.

The 95% here only applies to random

samples in the abstract.

Once we actually have a sample, the mean of that sample will be

either within two standard errors of the population mean or it won't be.

So the 95% confidence interval can be constructed

approximately as our sample mean, plus or minus two standard errors.
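To see why "plus or minus two standard errors" works out to roughly 95%, here is a minimal simulation sketch in Python. Everything in it is a made-up assumption for illustration: the population mean of 10, the standard deviation of 3, the sample size of 50, and the function name. It draws many random samples, builds the interval around each sample mean, and counts how often the interval captures the population mean.

```python
import random
import statistics

def coverage_of_two_se_interval(n=50, trials=2000, mu=10.0, sigma=3.0, seed=1):
    """Fraction of simulated samples whose x_bar +/- 2*SE interval contains mu."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    hits = 0
    for _ in range(trials):
        # Draw one random sample from a (hypothetical) normal population.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.mean(sample)
        # Standard error estimated from the sample: s / sqrt(n).
        se = statistics.stdev(sample) / n ** 0.5
        if x_bar - 2 * se <= mu <= x_bar + 2 * se:
            hits += 1
    return hits / trials

coverage = coverage_of_two_se_interval()
print(f"Share of intervals that captured mu: {coverage:.3f}")
```

The printed share should land close to 0.95. Any single interval either contains the population mean or it doesn't, and the 95% describes the long-run share across samples.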

2:29

In this formula, what comes after the plus or minus, the two standard errors,

is actually called the margin of error.

So usually we construct a confidence interval as a point estimate

(in this case we are dealing with means, so our point

estimate is the sample mean) plus or minus some margin of error.

The margin of error for a 95% confidence

interval is roughly two times the standard error.

Let's take a look at a practice problem to put

to use some of the concepts that we have recently learned.

One of the earliest examples of behavioral asymmetry is

a preference in humans for turning the head to the

right rather than to the left during the final weeks

of gestation and for the first six months after birth.

This is thought to influence subsequent

development of perceptual and motor preferences.

A study of 124 couples found that 64.5%

turned their heads to the right when kissing.

The standard error associated with this estimate is roughly 4%.

Which of the below is false?

3:36

A says a higher sample size would yield a lower standard error.

We know that this is always true.

We've seen this with the central limit theorem as well.

Conceptually, this is because the larger your sample size is, the less

variable your point estimates from those samples are going to be.

Mathematically speaking, the standard error of a mean is sigma over square root

of n, so n and the standard error are going

to be inversely related. In other words, if n goes up,

the standard error is going to go down, so this is correct.
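As a quick illustration of sigma over square root of n, here is a small Python sketch. The value sigma = 10 and the sample sizes are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math

def standard_error(sigma, n):
    """Standard error of a sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 10.0  # hypothetical population standard deviation
for n in (25, 100, 400):
    # Each time n is quadrupled, the standard error is cut in half.
    print(n, standard_error(sigma, n))
```

Because n sits under a square root, you need four times the sample size to halve the standard error.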

B says the margin of error for a 95% confidence interval for

the percentage of kissers who turned their heads to the right is roughly 8%.

We just learned that the margin of error for a 95%

confidence interval is going to be approximately two times the standard error.

In this case, the standard error is given to

be 4%, and therefore this option is also correct.

4:36

C says the 95% confidence interval for the percentage of kissers

who turn their heads to the right is roughly 64.5% plus or minus 4%.

Remember, the confidence interval is always of the form,

point estimate plus or minus a margin of error.

In this case, what we have is our point estimate, the sample proportion,

plus or minus a standard error, as opposed to the margin of error.

And while those things sound similar, they're not exactly the

same thing. Therefore, this option is wrong.

Let's take a look at the last one real quick as well.

D says the 99.7% confidence interval for the percentage of kissers who turned

their head to the right, is roughly 64.5% plus or minus 12%.

We haven't really talked yet in depth about using different confidence levels

for confidence intervals, but hopefully it's obvious that we can do that.

How did we come up with this 12% number?

Remember, according to the 68, 95, 99.7% rule, roughly 99.7%

of the distribution will be within three standard deviations of the mean.

Or in this case, three standard errors, since we're looking for the variability

of a point estimate. So, 3 times 4 does indeed give us 12%.

So this one also seems right, so the option that's false is C.

How could we make this option correct?

We could actually add and subtract the margin of error, which is given in option B,

so the approximate 95% confidence interval should be 64.5% plus or minus 8%.
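The arithmetic from this practice problem can be sketched in a few lines of Python. The function name is just an illustrative choice, and the multipliers 2 and 3 are the rule-of-thumb values from the 68, 95, 99.7% rule, not exact critical values.

```python
def approximate_interval(point_estimate, standard_error, multiplier):
    """Point estimate plus or minus (multiplier * standard error)."""
    margin_of_error = multiplier * standard_error
    return point_estimate - margin_of_error, point_estimate + margin_of_error

p_hat = 64.5  # sample percentage of right-turning kissers
se = 4.0      # standard error given in the problem, in percentage points

ci_95 = approximate_interval(p_hat, se, 2)   # margin of error 8
ci_997 = approximate_interval(p_hat, se, 3)  # margin of error 12

print("Approximate 95% interval:", ci_95)     # (56.5, 72.5)
print("Approximate 99.7% interval:", ci_997)  # (52.5, 76.5)
```

Using a multiplier of 1, as the false option effectively does, would give 60.5% to 68.5%, which is only one standard error on each side.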

6:22

More formally, the confidence interval for a population mean can be

computed as the sample mean, plus or minus a margin of error.

This margin of error is the critical value corresponding to the middle XX percent

of the normal distribution (I just have XX here as a placeholder for whatever

confidence level you like), times the standard error of the sampling distribution.
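Written out in symbols, the formula just described looks roughly like this. The z star notation for the critical value is the one used later in this video, and SE stands for the standard error, sigma over square root of n.

```latex
\bar{x} \pm z^{\star} \times SE, \qquad SE = \frac{\sigma}{\sqrt{n}}
```

The piece after the plus or minus sign, z star times SE, is the margin of error.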

As with the central limit theorem, there are

some conditions that need to be satisfied to

use this formula to construct confidence intervals.

In fact, since this method is based on the

central limit theorem, these are actually the same conditions.

The first condition is independence.

Sampled observations must be independent.

And we talked about this being difficult to confirm.

However, usually we want either a random sample, if we have an

observational study, or random assignment, if we have an experiment. And

if we're sampling without replacement, we want our sample

size to be less than 10% of our population.

The second condition is about the sample size and skew.

We need n to be greater than or equal

to 30, or larger if the population distribution is very skewed.

And this second condition is actually a little stricter

than what we saw with the central limit theorem,

because it places

a minimum sample size requirement.

That's the n greater than or equal to 30.

And we're going to discuss what we do if the

sample size is smaller than 30 in the next unit.

So for now, let's focus on what we call large samples,

and these are samples that have sample sizes of at least 30,

or even larger if the population distribution is very skewed.
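The numeric parts of these conditions can be written down as a rough checklist. This is only a sketch: the function name, its arguments, and the population size of 100,000 are hypothetical, and the skew part still has to be judged by looking at a plot of the sample.

```python
def check_large_sample_conditions(n, population_size, is_random):
    """Rough yes/no checks; skew must still be judged from a visualization."""
    return {
        # Independence: random sample (observational study)
        # or random assignment (experiment).
        "random sample or random assignment": is_random,
        # Independence when sampling without replacement.
        "n is less than 10% of the population": n < 0.10 * population_size,
        # Large-sample requirement (more is needed if the population is very skewed).
        "n is at least 30": n >= 30,
    }

# Hypothetical example: 124 observations from a population of 100,000.
checks = check_large_sample_conditions(n=124, population_size=100_000, is_random=True)
for condition, satisfied in checks.items():
    print(condition, "->", satisfied)
```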

So when we're checking our conditions,

we're definitely going to want to see a visualization of the distribution from the

sample that we're going to use as

an indicator for what the population looks like.

Or we're going to need to be told that

we can assume the population is nearly normal and proceed.

Earlier we conceptually developed the formula for the confidence interval for

the mean as x bar plus or minus z star times

the standard error.

And we said that the z star for a 95% confidence interval should be

approximately 2, as per the 68, 95, 99.7% rule.

But this rule is simply a rule of thumb, and it's actually not exact.

So how do we find the exact critical value for a 95% confidence interval?

Remember that the confidence level refers

to the middle of the distribution.

So the 95% confidence interval will span the middle 95% of the normal distribution.

So, let's mark that on the normal curve, and we're basically

looking for the cut off values that mark the middle 95%.

9:13

We can use the table to find these, but

first remember that the tables always give us areas under

the curve below a given z score, so the area under the curve below the lower bound

of the middle 95% is simply 1 minus 0.95, divided by 2,

since the total area under the curve is one and

the curve is symmetric, leaving equally sized tails on each side.

So this comes out to be 0.025 or 2.5% on each side.

9:45

Next, we can take a look at a table.

What we want to do is locate 0.025, the

percentile, within the table. And actually this time

we hit exactly 0.025, and then we want to look at

the edges of the table to grab the associated z score,

which here comes out to be negative 1.96, and the

upper bound will then be positive 1.96 since, once again,

the curve is symmetric.

Note that the critical values in a confidence

interval formula are always defined to be positive.

10:36

Also remember that you can get this critical value using R.

When we want to find cutoff values using R, we use

the qnorm function which takes in the percentile as an input.

So qnorm

of 0.025 should also give us negative 1.96 and what

we need to do is to just remind ourselves that

if we are looking for a critical value, we are

always going to need the positive version of this number.
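The video does this lookup with R's qnorm. For anyone following along in Python instead, a comparable sketch uses NormalDist from the standard library's statistics module, whose inv_cdf method plays the same role here. The helper name critical_value is an illustrative choice, not something from the course.

```python
from statistics import NormalDist

def critical_value(confidence_level):
    """Positive z cutoff that marks the middle `confidence_level` of the normal curve."""
    tail_area = (1 - confidence_level) / 2  # e.g. (1 - 0.95) / 2 = 0.025
    lower_cutoff = NormalDist().inv_cdf(tail_area)  # negative z score
    return abs(lower_cutoff)  # critical values are reported as positive

print(round(NormalDist().inv_cdf(0.025), 2))  # -1.96
print(round(critical_value(0.95), 2))         # 1.96
```

Changing the confidence level changes the cutoff. For example, critical_value(0.99) comes out to about 2.58.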