0:00

So far, we have been focusing our

attention on doing inference for one population mean.

We usually have data from one sample, which we use

to find the sample mean and the sample standard

deviation, and then we use those values

to say something meaningful about the unknown population mean.

However, the methods that we're learning are

not limited to just this one special case.

We can actually use the same methods for doing inference for

a variety of other estimators.

We're going to be learning about these in the next two units in more detail, but

this video is kind of a peek into

the unified nature of hypothesis testing and confidence intervals.

0:39

The methods that we've been learning for

doing hypothesis tests and constructing confidence intervals

can easily be adapted for any estimator

that has a nearly normal sampling distribution.

One example that we've been working with is the sample mean.

Another example that's useful is the difference between two sample means.

This type of estimator would be useful, for example,

for comparing the means of two populations.

1:05

Another estimator that might be of interest is the sample proportion.

The sampling distribution of the sample proportion will also be

nearly normal, as long as our sample size is large enough.

And then we can apply the same techniques that we've been learning to

do inference for a proportion, or even

for the difference between two proportions.

So again, this gives us an avenue by which to

compare two groups, or two populations, to each other.
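To see the claim about the sample proportion in action, here is a small simulation sketch. It assumes a hypothetical population in which 33% of people have some trait and draws many samples of size 1,000; the values p = 0.33 and n = 1,000 and the helper name simulate_phats are illustrative choices, not part of the lecture. If the sampling distribution is nearly normal and centered at p, the average of the simulated p hats should sit close to 0.33, and roughly 95% of them should fall within two standard deviations of that center.

```python
import random
import statistics

def simulate_phats(p, n, reps, seed=1):
    """Draw `reps` samples of size n from a population where a fraction p
    has the trait, and return the sample proportion (p hat) of each sample."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    phats = []
    for _ in range(reps):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        phats.append(successes / n)
    return phats

# Hypothetical population: true proportion 0.33, samples of size 1,000
phats = simulate_phats(p=0.33, n=1000, reps=1000)
center = statistics.mean(phats)
spread = statistics.stdev(phats)

# Rough normality check: about 95% of p hats within 2 SDs of the center
within_2sd = sum(abs(x - center) <= 2 * spread for x in phats) / len(phats)
print(round(center, 3), round(within_2sd, 2))
```

Increasing n narrows the spread of the p hats without moving their center, which is the behavior the inference methods rely on.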

An important assumption about the point estimates is that they're unbiased.

In other words, the sampling distribution of the

estimate is centered at the true population parameter

it estimates.

That is, an unbiased estimate does not systematically over- or

underestimate the parameter; on average, it hits the true value.

We know that the sample mean is an example

of an unbiased point estimate, because the central limit theorem

tells us that the sampling distribution of sample means is

going to be nearly normal and centered at the true population mean.

And the other estimates that we listed in the

previous slides are also good examples of unbiased estimators.
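A quick way to check unbiasedness empirically is to repeat the sampling many times and average the estimates. The sketch below assumes two hypothetical right-skewed (exponential) populations with true means 2 and 5, so the true difference is negative 3; the sample size of 50, the 5,000 repetitions, and the helper name sample_mean are illustrative assumptions. If the sample mean and the difference of sample means are unbiased, their averages across repetitions should land close to 2 and negative 3.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is reproducible

def sample_mean(rate, n):
    """Mean of n draws from an exponential (right-skewed) population with mean 1/rate."""
    return statistics.fmean(rng.expovariate(rate) for _ in range(n))

reps, n = 5000, 50
xbars_a = [sample_mean(0.5, n) for _ in range(reps)]  # population A: true mean 2
xbars_b = [sample_mean(0.2, n) for _ in range(reps)]  # population B: true mean 5
diffs = [a - b for a, b in zip(xbars_a, xbars_b)]     # true difference: 2 - 5 = -3

# Averages of the estimates across many samples; for unbiased estimators
# these should be close to 2 and -3 respectively
print(round(statistics.fmean(xbars_a), 2), round(statistics.fmean(diffs), 2))
```

Even though each population is skewed, the estimates neither overshoot nor undershoot on average.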

2:16

So, if we have a point estimate that we know is unbiased and that has a

nearly normal sampling distribution, then we already know

how to construct a confidence interval around it.

We always start with a point estimate when working with confidence intervals.

And then we add and subtract the

same amount to and from that point estimate.

This is kind of the leeway we're

giving ourselves when we're doing this estimation.

And the value that we add and subtract, as we said, is the margin of error.

The margin of error is made up of two components:

the critical value, which is z star if we're working with

a nearly normal sampling distribution, and the standard error.
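The critical value z star depends only on the confidence level: it is the point that leaves (1 minus the confidence level) divided by 2 in each tail of the standard normal distribution. A minimal sketch using Python's standard-library statistics.NormalDist; the function name z_star is a hypothetical helper for illustration.

```python
from statistics import NormalDist

def z_star(confidence):
    """Critical value that leaves (1 - confidence) / 2 in each tail
    of the standard normal distribution."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_star(level), 3))
```

For 95% confidence this gives the familiar 1.96; higher confidence levels give larger critical values and therefore wider intervals.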

The one thing we're not going to get into in this video

is how to find the standard error for different types of point estimates,

because that's going to be the focus of what

we're going to be doing in the next two units.

So, once you have this general structure set up

for the confidence interval, all you need to do is

to swap out the formula for the standard error for

a different estimate, but you keep everything else the same.

In other words, with what you have learned so far you already know how

to calculate confidence intervals for a variety of point

estimates that happen to have nearly normal sampling distributions.
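The "swap out the standard error, keep everything else" idea can be written as a single function: point estimate plus or minus z star times the standard error. This is a sketch, not a library routine; the name normal_ci and the numbers in the usage lines are hypothetical. Only the standard_error argument changes from one estimator to another.

```python
from statistics import NormalDist

def normal_ci(point_estimate, standard_error, confidence=0.95):
    """Point estimate plus or minus z* times the standard error.
    Intended for unbiased estimators with nearly normal sampling
    distributions; the standard error must be supplied by the caller."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * standard_error
    return point_estimate - margin, point_estimate + margin

# Hypothetical inputs: a sample mean, then a difference of two means
print(normal_ci(50.0, 2.0))
print(normal_ci(-3.0, 0.8, confidence=0.99))
```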

What you're still looking forward to finding out though,

is how to calculate the specific standard errors for those.

So we're going to work through a couple of examples of

constructing confidence intervals and doing hypothesis

tests for these different point estimates.

But we're simply going to give you the standard error

in this particular video, and then we're going

to get into more detail in the following units.

3:54

Let's take a look at a practice problem.

A 2010 Pew Research foundation poll indicates that

among 1,099 college graduates, 33% watch The Daily Show,

an American late-night TV show.

The standard error of this estimate is 0.014.

We are asked to construct a

95% confidence interval for the proportion of

college graduates who watch The Daily Show.

Let's start by parsing through some of the information we are given.

The 33% of these observed college graduates who watch The Daily Show

is going to be our p hat, 0.33.

P hat stands for sample proportion, just like x bar stands for sample mean.

And we are also told that

the standard error of this estimate is 0.014,

so let's take a note of that as well.

By now, we know the generic formula for a confidence interval for any estimator.

It's always a point estimate, plus or minus a margin of error.

In this case, our point estimate is p hat, and then we have plus or

minus a critical value, z star, times our

standard error, which together make up the margin of error.

Plugging in 0.33 for p hat, 1.96

for the critical value, and the standard error of 0.014 that we're given in the problem

gives us a margin of error of 0.027, or 2.7%.

Adding that to and subtracting it from our point

estimate, we get a confidence interval that

says that we are 95% confident that

between 30.3% and 35.7% of college graduates watch

The Daily Show.
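The arithmetic for this interval can be reproduced in a few lines. The inputs are exactly the ones given in the problem (p hat = 0.33, SE = 0.014); the only difference is that z star is computed from the standard normal distribution rather than rounded to 1.96 up front, which does not change the three-decimal results.

```python
from statistics import NormalDist

p_hat = 0.33  # sample proportion of college graduates who watch The Daily Show
se = 0.014    # standard error given in the problem
critical = NormalDist().inv_cdf(0.975)  # z* for 95% confidence, about 1.96

margin = critical * se
lower, upper = p_hat - margin, p_hat + margin
print(f"margin of error: {margin:.3f}")      # → margin of error: 0.027
print(f"95% CI: ({lower:.3f}, {upper:.3f})")  # → 95% CI: (0.303, 0.357)
```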

Just like with confidence intervals, we can apply the

same hypothesis testing framework to different estimators as well,

again as long as the estimator is

unbiased and has a nearly normal sampling distribution.

If that's the case, we can use the z statistic as our test statistic, which we

always calculate as the point estimate minus the null

value (kind of like the observed value minus the mean),

divided by the standard error.

Once again, we're not going to

get into calculating the standard error for

these different point estimators; that's something we're

going to focus on in the following units.
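The test statistic recipe (point estimate minus null value, divided by the standard error) plus a two-sided p value fits in one small function. This is a sketch under the assumption of a nearly normal sampling distribution; the name z_test and the example numbers are hypothetical, and, as in the lecture, the standard error is simply passed in.

```python
from statistics import NormalDist

def z_test(point_estimate, null_value, standard_error):
    """Return the z statistic and the two-sided p value for H0: parameter = null_value."""
    z = (point_estimate - null_value) / standard_error
    # Two-sided: area in both tails beyond |z| under the standard normal curve
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value

# Hypothetical example: estimate 0.36, null value 0.33, standard error 0.014
z, p = z_test(0.36, 0.33, 0.014)
print(round(z, 2), p)
```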

6:08

Now let's take a look at a practice problem doing

a hypothesis test on an estimator different than the sample mean.

The third National Health and Nutrition Examination Survey (NHANES) collected body

fat percentage and gender data from over 13,000 subjects aged 20 to 80.

The average body fat percentage for the 6,580 men in the sample was 23.9%.

This value was 35% for the 7,021 women.

The standard error for the difference between the average male and

female body fat percentages was 0.114. Do these data provide convincing

evidence that men and women have different average body fat percentages?

You may assume that the distribution of the point estimate is nearly normal.

So now that we know that the distribution

of the point estimate is going to be nearly

normal, we know we can use the same

framework we've learned before, to do this hypothesis test.

And let's follow the same steps for doing so then.

First we want to set our hypotheses.

The null hypothesis is going to be that there is no difference

between these two populations, since the null hypothesis always represents the status quo.

That means that the average body fat percentages

for men and women are equal to each other.

And the alternative is going to speak to our research question.

Do these data provide convincing

evidence that men and women have different average body fat percentages?

So the alternative is going to be two-sided.

Our point estimate is simply the sample, or x bar, version

of what we have in our hypothesis.

So that's going to be the observed average body fat percentage for men, minus the

observed average body fat percentage for women which comes out to be negative 11.1.
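The setup so far, the hypotheses and the point estimate, can be written down directly from the sample summaries in the problem. The hypotheses are recorded as comments; the variable names are illustrative.

```python
# Sample summaries from the NHANES example
xbar_men = 23.9    # average body fat percentage, 6,580 men
xbar_women = 35.0  # average body fat percentage, 7,021 women

# H0: mu_men = mu_women   (no difference)
# HA: mu_men != mu_women  (two-sided)
null_value = 0.0   # difference between the population means under H0

point_estimate = xbar_men - xbar_women
print(round(point_estimate, 1))  # → -11.1
```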

Lastly, we're going to need to check conditions, but we're told that we

can assume that the distribution of the point estimate is nearly normal.

So we're safe on that account. And since this is a

nationwide survey, I think we can be reasonably certain that they have

used random sampling, so that the observations in the sample

are independent of each other with respect to their body fat percentages.

8:26

And next, what we want to do is to be able to draw our curve.

But before we can do that, we need to figure

out what the sampling distribution of this estimator looks like.

We know the shape.

It is nearly normal, but what is the center going to be?

The center is the null value.

However, our null hypothesis, as currently written, doesn't contain a value.

So we can rewrite our null hypothesis as

the difference between the two population means being

equal to zero.

Because after all, if the two quantities are equal to

each other, then their difference is going to be zero.

Which tells us that the sampling distribution

is nearly normal and centered at zero.

And our p value is going to be the area in the tails

beyond the observed difference between the two means.

So that's the area below negative 11.1 or above positive 11.1.

I think it's looking like this is going to be a pretty tiny p value, because the

shaded regions are so small, but for completeness,

once again, we can actually calculate our z score.

The z score is calculated as the point estimate minus the null value, divided by

the standard error: negative 11.1 divided by 0.114, which is one huge z score of about negative 97.37.

With such a huge z score,

the p value is bound to be really, really tiny

which is going to result in us rejecting the null hypothesis.

And in context, what we would then

determine, is that these data provide convincing evidence

that the average body fat percentages of men

and women are indeed different from each other.
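The whole test can be checked numerically. This sketch uses the values from the problem and computes the two-sided p value with the identity P(|Z| ≥ |z|) = erfc(|z| / √2) for a standard normal Z, via Python's math.erfc. The significance level of 0.05 is an assumption, since the problem does not state one. With a z score this extreme, the p value is far smaller than the smallest positive double-precision number, so it comes out as exactly 0.0.

```python
import math

point_estimate = 23.9 - 35.0  # observed men - women difference in average body fat (%)
null_value = 0.0              # H0: mu_men - mu_women = 0
se = 0.114                    # standard error given in the problem

z = (point_estimate - null_value) / se
# Two-sided p value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z
p_value = math.erfc(abs(z) / math.sqrt(2))

alpha = 0.05  # assumed significance level; the problem does not state one
decision = "reject H0" if p_value < alpha else "fail to reject H0"

print(round(z, 2))  # → -97.37
print(p_value)      # so small it underflows to 0.0 in double precision
print(decision)     # → reject H0
```

Any reasonable significance level leads to the same conclusion here, so the choice of 0.05 does not affect the result.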