A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics.

A course from Johns Hopkins University

Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation

238 ratings

From this lesson

Module 4A: Making Group Comparisons: The Hypothesis Testing Approach

Module 4A shows a complementary approach to confidence intervals when comparing a summary measure between two populations via two samples: statistical hypothesis testing. This module will cover some of the most used statistical tests, including the t-test for means, the chi-squared test for proportions, and the log-rank test for time-to-event outcomes.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

Greetings and welcome back.

Let's do a few review exercises to

discuss the material covered in lecture nine.

So as usual, I'll read you the problems first and advise you to pause the tape,

work on them at your leisure, and then come back when

you're ready to discuss the answers and I'll give you my take.

So here's a general question that applies to

things beyond what we've done in lecture nine,

and it will continue to apply to other hypothesis tests

that we do.

So why can a small difference in sample means for

a paired or unpaired t-test produce a small p-value if n,

or, if there are unequal sample sizes, n1 and n2, are large?

Explain the concept of type-1 error and

its role in the hypothesis testing process.

What is the common level used for setting this type-1 error level?

Why is it potentially difficult to interpret a

non-statistically significant result in a small sample study?

For comparing means between two populations, what is the correspondence

between the 95% confidence interval for the population mean difference

and the resulting p-value from the appropriate hypothesis test? By

correspondence, I mean in terms of the null value for the comparison.

Then what is the basic recipe for a hypothesis test

comparing means between two populations, be they paired or unpaired?

In this second problem, I'm giving you some summary data on charges by sex based

on a random sample of 500 carotid endarterectomy

procedures performed in the state of Maryland in 1995.

So we have

a total of 500 persons, and then we classify them by their sex.

So the average charge for males was on the order of $6,615.

But there was a fair amount

of variability in these charges for the males.

The standard deviation of these 271 charges for the males was $4,220.

For females, the sample mean charge is higher than that

for males.

But as with the males, there's a fair

amount of charge to charge variability amongst the females.

And there were 229 females in this sample.

So you're interested in comparing the population-level mean charges between

males and females for carotid endarterectomies done in 1995.

You've got the sample data, but you wish to take

that and reach out to the population from which the samples came.

So first of all, is this a paired or unpaired comparison?

Now by hand, estimate a 95% confidence interval for

the mean difference in charges for males compared to females.

What are the corresponding null and alternative hypotheses for this test?

Now set up the hypothesis test, and find the corresponding p-value.

And you can either ballpark it as

to how it relates to the 0.05 cutoff,

or if you're really curious about what it actually

turns out to be, go ahead and use

an online normal curve calculator to get the p-value.

And then interpret the resulting p-value in a sentence.

So welcome back.

So, why can a small mean difference in sample means

for a paired or unpaired t-test produce a small p-value if

the sample sizes are large?

This has to do with the relationship between the standard

error of an estimate and the samples that it's based on.

So we've seen this for both the paired and unpaired cases.

For the paired, we simply take

the standard deviation of the differences across

the pairs and divide by the square root of the number of pairs.

For the unpaired situation, the standard error's a little

more intense, but easy to handle computationally.

But you can see in both cases that the larger

the sample sizes, the smaller the standard errors will be.

And ultimately what we do to get a p-value is

figure out how far our sample result, our mean

difference, is from what we'd expect it to be under

the null, which is 0, and we convert that into standard errors.

So if the standard error is very

small, even small differences between what we observe

and what we'd expect under the null will look large in terms of standard errors.

And this will give us something very far away from the expected value

of 0 in terms of standard errors, and give us a small p-value.

So it's always important to look at not just

the p-value from a statistical test, but also the estimated difference

between the groups being compared and the uncertainty bounds.
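As a rough illustration of the point above, the following sketch holds a small mean difference fixed and lets the sample size grow. The numbers (a difference of 1 unit, a standard deviation of 10 in each group) are hypothetical and not from the lecture, and the p-value uses a normal approximation rather than an exact t distribution.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a distance z, using the standard normal curve."""
    return math.erfc(abs(z) / math.sqrt(2))

def unpaired_se(s1, n1, s2, n2):
    """Standard error of a difference in means from two independent samples."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

# Hypothetical values, chosen only for illustration.
diff, sd = 1.0, 10.0

for n in (25, 400, 10_000):
    se = unpaired_se(sd, n, sd, n)   # shrinks as n grows
    z = diff / se                    # same difference, more standard errors from 0
    print(f"n per group = {n:>6}  SE = {se:.3f}  distance = {z:.2f}  p = {two_sided_p(z):.4f}")
```

The observed difference never changes; only the standard error does, which is why the estimate and its uncertainty deserve as much attention as the p-value.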

Explain the concept of type-1 error

and its role in the hypothesis testing process.

What is the common value used for setting this type-1 error level?

And to save you from my handwriting, just this once, I've typed this out.

So, by setting the type-1 error level, frequently also

called the alpha level or the rejection level of the test,

the researchers are a priori, that is, in advance, both

specifying the cutoff for unlikely under the null, in other words what

they'll use to consider the result unlikely versus not unlikely,

and they are also saying how much risk they are willing to take of

rejecting the null if it's actually the underlying

truth that generated the samples of data.

So, if they set this threshold at 5%, their threshold for unlikely is below 5%.

But they're

also conceding that they're willing to take a risk, a

5% risk of rejecting the null when it's the underlying truth.
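One way to see that 5% risk is a small simulation, sketched here under assumptions that are not part of the lecture: both groups are drawn from the same normal population (so the null is true by construction), each study has 50 observations per group, and the p-value comes from a normal approximation. The function name and defaults are made up for this example.

```python
import math
import random
import statistics

def two_sided_p(z):
    """Two-sided p-value for a distance z, using the standard normal curve."""
    return math.erfc(abs(z) / math.sqrt(2))

def simulate_type1_rate(n_studies=2000, n=50, alpha=0.05, seed=1):
    """Fraction of simulated studies that reject the null when the null is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_studies):
        # Both samples come from the same population: no true difference.
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        se = math.sqrt(statistics.variance(a) / n + statistics.variance(b) / n)
        z = (statistics.mean(a) - statistics.mean(b)) / se
        if two_sided_p(z) < alpha:
            rejections += 1
    return rejections / n_studies

rate = simulate_type1_rate()
print(f"Rejected a true null in {rate:.1%} of simulated studies")
```

The rejection rate should land near the chosen alpha level, which is the sense in which alpha is the risk the researchers agree to in advance.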

Why is it potentially difficult to interpret a

non-statistically significant result in a small sample study?

Well, this is sort of the opposite of the first question we started with,

but it gets back to that same idea of standard error and the role

sample size plays in standard error.

So we have, for the paired mean difference,

the standard deviation of the differences divided by

the square root of the number of pairs, and for the unpaired we've seen

that the standard error is a function of the sample sizes.

And so if the sample sizes are relatively small, and there's a fair amount

of variation in the individual level data,

we're going to have a large standard error.

There's going to be a lot of uncertainty in our estimate, such that

in some cases, even if there were an underlying population-level difference,

our uncertainty would be so large and precision

so poor that we wouldn't be able to detect it.

And again, we'll get into a detailed study of this idea of statistical power

in lectures 10 and 11, but this is just a foreshadowing of that idea.

In small sample studies,

a non-statistically significant result is sometimes ambiguous.
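To make that ambiguity concrete, here is a hedged sketch of the chance of detecting a real difference at two sample sizes. The true difference (5 units) and standard deviation (10) are hypothetical, and the calculation is a normal-approximation shortcut, not the detailed treatment of power that comes in the later lectures.

```python
import math

def normal_cdf(x):
    """Cumulative probability under the standard normal curve."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def approx_power(true_diff, sd, n_per_group, z_crit=1.96):
    """Approximate chance that a two-sided test at the 5% level rejects the null."""
    se = math.sqrt(2 * sd**2 / n_per_group)
    shift = abs(true_diff) / se  # true difference in standard-error units
    return normal_cdf(shift - z_crit) + normal_cdf(-shift - z_crit)

# Hypothetical values, chosen only for illustration.
for n in (10, 200):
    print(f"n per group = {n:>3}: approximate chance of detection = {approx_power(5, 10, n):.2f}")
```

With only 10 per group, a real difference of this size would usually be missed, so a non-significant result there says little about whether a difference exists.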

So, for comparing means between two

populations, whether they be paired or unpaired,

what is the correspondence between the 95%

confidence interval for the population mean difference

and the resulting p-value for the appropriate hypothesis test?

And I mean in terms of the null value.

Well, we have seen, and we have laid out the details

for this: if the null value for the difference,

which is

0, is not in the 95% confidence interval for the population-level mean difference,

then the corresponding p-value will come in at less than 0.05.

And we've laid out the geometry of that. And similarly, if 0 is in the interval,

p will be greater than 0.05.

You might say, well, how could p be exactly 0.05?

Well, this would happen if 0 is one of the endpoints of the confidence interval.

Then p would be exactly equal to 0.05.
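The correspondence can be checked numerically. This sketch assumes a normal sampling distribution and uses the multiplier 1.96 (the lecture rounds it to 2 for hand calculation); the estimates and the standard error of 1 are hypothetical.

```python
import math

Z95 = 1.959964  # normal-curve multiplier for a 95% interval

def ci95(estimate, se):
    """95% confidence interval: estimate plus or minus about 1.96 standard errors."""
    return estimate - Z95 * se, estimate + Z95 * se

def two_sided_p(estimate, se, null_value=0.0):
    """Two-sided p-value for the distance between the estimate and the null value."""
    z = (estimate - null_value) / se
    return math.erfc(abs(z) / math.sqrt(2))

se = 1.0  # hypothetical standard error
for estimate in (1.0, Z95, 3.0):
    low, high = ci95(estimate, se)
    covers_null = low <= 0.0 <= high
    print(f"estimate = {estimate:.2f}  CI = ({low:.2f}, {high:.2f})  "
          f"0 in CI: {covers_null}  p = {two_sided_p(estimate, se):.3f}")
```

Note that with the rounded multiplier of 2, an interval endpoint sitting exactly at 0 corresponds to a p-value slightly below 0.05; the match is exact only with 1.96.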

So what is the basic recipe for a hypothesis

test comparing means between two populations paired or unpaired?

And this will hold, the only thing that will

change slightly is the way we measure distance, the metric.

But the concept will hold for every hypothesis test we study in this course.

What is the basic recipe? Well, we start by specifying the null

of no difference in the population means,

in other words, that the two means are equal,

versus the alternative, that the means are not equal or the difference is nonzero.

Then we assume, to start, that the null is the truth that generated our samples of data.

And we measure how far our estimate,

the sample mean difference, is from the

expected difference under the null, in units of standard error.

We take the difference and divide it by its estimated standard error.

And then, using

the sampling distribution of estimates from samples

of the same size under the null hypothesis,

we convert this distance to a p-value to figure out

how likely the result is to have occurred just by chance

under the null, and we make a decision based on the resulting p-value.
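The recipe can be sketched as a few small functions. This is a minimal illustration, not the course's software: the function and class names are made up, the p-value uses a normal approximation (reasonable for large samples), and the example inputs are hypothetical.

```python
import math
from typing import NamedTuple

class Comparison(NamedTuple):
    distance: float   # estimate minus null value, in standard errors
    p_value: float
    reject_null: bool

def paired_se(sd_of_differences, n_pairs):
    """Standard error for a paired mean difference."""
    return sd_of_differences / math.sqrt(n_pairs)

def unpaired_se(s1, n1, s2, n2):
    """Standard error for a difference in means from two independent samples."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def compare_means(mean_diff, se, null_value=0.0, alpha=0.05):
    # Step 1: assume the null (no difference) is the truth.
    # Step 2: measure the distance from the null in standard errors.
    distance = (mean_diff - null_value) / se
    # Step 3: convert the distance to a two-sided p-value.
    p_value = math.erfc(abs(distance) / math.sqrt(2))
    # Step 4: make a decision against the pre-set alpha level.
    return Comparison(distance, p_value, p_value < alpha)

# Hypothetical paired example: mean difference 3, SD of differences 10, 25 pairs.
print(compare_means(3.0, paired_se(10.0, 25)))
```

Only the standard error calculation changes between the paired and unpaired designs; the remaining steps are identical.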

So now,

let's look at this summary data on charges by sex based on a random

sample of 500 carotid endarterectomy procedures performed

in the state of Maryland in 1995.

And here's the summary data that we looked at.

So is this a paired or unpaired comparison?

Well a big clue is unequal sample sizes.

So that gives it away, that this is unpaired.

There's no obvious relationship

between each male in the sample of males and the females in the sample

of females, and the sample sizes are unequal as well.

So there's no relationship like brother and

sister, or husband and wife, in these data.

So the two groups are independent.

That's a giveaway.

If the sample sizes are not equal, it's definitely not paired.

However, the sample sizes being equal is necessary, but not sufficient,

to claim a study is paired.

You can have unpaired studies with equal sample sizes.

So you just have to pay attention to how

the data was generated and the samples were collected.

So next, by hand, estimate a 95% confidence

interval for the mean difference in charges for males compared to females.

Well, the mean difference in charges between

males and females was the $6,615 for

males on average, minus the $7,088 for females

on average, for a difference of negative $473.

So, in the samples, males had estimated charges of

$473 less than females on average.

But remember these charges within both groups were

[INAUDIBLE]

standard deviation was high. So let's now address that plus the size

of our samples to estimate the uncertainty in this estimated mean difference.

So what we'll do is, we'll use our trusty formula to get the standard error for the

difference in means when it comes from two independent

unpaired samples.

And so we would take the standard deviation of 4,220 for

the males, square it, divide by 271, that's the number of males.

Add this to the standard deviation in females

squared divided by the 229 females. And when all the dust settles, and you can

check my math, this turns out to be a standard error of roughly $413.40.
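Written out, the calculation described here takes the following form. The female standard deviation is not stated in the spoken transcript, so it is left symbolic as $s_F$; the subscripts $M$ and $F$ are notation introduced for this sketch.

```latex
\widehat{SE}\left(\bar{x}_M - \bar{x}_F\right)
  = \sqrt{\frac{s_M^2}{n_M} + \frac{s_F^2}{n_F}}
  = \sqrt{\frac{4220^2}{271} + \frac{s_F^2}{229}}
  \approx 413.4
```

The male term alone is $4220^2 / 271 \approx 65{,}713.7$, so most of the remaining variance under the square root comes from the female term.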

And so to get a confidence interval for the true difference,

the population-level difference in charges in

the population of carotid endarterectomy procedures performed

in 1995 in Maryland, for males compared to females, we'll add and subtract 2

estimated standard errors. And when the dust settles,

you get a confidence interval that goes from

negative $1,299.80 to $353.80.

So, you can see right away that

despite the magnitude of this observed mean difference,

there's so much uncertainty in the

estimated means that this confidence interval is

quite wide, and includes the null value of 0.
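The hand calculation can be reproduced in a few lines. This sketch uses only the figures quoted in the lecture (the two sample means and the standard error of roughly $413.40) and the lecture's rounded multiplier of 2; the variable names are made up for the example.

```python
# Figures quoted in the lecture.
mean_males = 6615.0
mean_females = 7088.0
se_diff = 413.4        # estimated standard error of the mean difference

diff = mean_males - mean_females   # males minus females
margin = 2 * se_diff               # lecture's hand rule: plus or minus 2 standard errors
lower, upper = diff - margin, diff + margin

print(f"Mean difference: {diff:.2f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
print("Interval includes 0:", lower <= 0 <= upper)
```

Because the interval spans 0, the data are consistent with no difference in population mean charges between males and females.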

So what are the corresponding null and alternative

hypotheses were we to do the complementary hypothesis test?

Well, the null would be that the mean charge

for males minus the mean for females is 0.

The alternative would be that this difference is not

0, and those would be our 2 competing hypotheses.

So suppose we wanted to set up this hypothesis test.

What can you tell me about the hypothesis test right now?

What can you tell me? Well, you can

tell me that p will be greater than 0.05 because

the 95% confidence interval includes 0.

But if we wanted to set this up formally, we'd start by assuming the null is true.

And then compute the distance, which we frequently call t,

that's what people would call it in

textbooks, between our observed mean difference, negative

473, and what we'd expect the mean difference to be

under the null, which is 0, divided by the standard error.

So this is roughly

[SOUND]

1 point, I'm doing this in my head, I should have computed it.

I have the computation but not in front of me.

So, it's roughly negative 1.15.

Again, we're not as far as two standard errors away from 0,

so again we will not get a p-value of less than 0.05.

And if you actually do look this up or use

a computer to do it, the resulting p-value is 0.25.

And handwriting issues here. Resulting p-value is 0.25.

We knew it would come in at greater than 0.05, but now this tells us where we are.

So, the interpretation of that p-value of 0.25 is that, were these samples to

come from populations of males and females with the same average charges, the chances

of getting the resulting mean difference that we saw between males and

females, or something even more extreme, is one in four, or 25%.

So, not particularly unlikely.
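For anyone who wants to check the distance and the p-value without an online calculator, here is a short sketch using the lecture's figures and the normal curve (a reasonable stand-in for the t distribution with samples this large). The variable names are made up for the example.

```python
import math

observed_diff = -473.0   # males minus females, from the lecture
null_diff = 0.0          # expected difference if the null is true
se_diff = 413.4          # estimated standard error, from the lecture

# Distance from the null in standard errors.
t = (observed_diff - null_diff) / se_diff

# Two-sided p-value: chance of a result this far or farther from 0, in either direction.
p_value = math.erfc(abs(t) / math.sqrt(2))

print(f"distance = {t:.2f} standard errors")
print(f"p-value  = {p_value:.2f}")
```

The distance comes out just under the "roughly negative 1.15" estimated in the lecture, and the p-value rounds to the quoted 0.25.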