Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.

A course from Johns Hopkins University

Mathematical Biostatistics Boot Camp 2

51 ratings

From this lesson

Hypothesis Testing

In this module, you'll get an introduction to hypothesis testing, a core concept in statistics. We'll cover hypothesis testing for basic one and two group settings as well as power. After you've watched the videos and tried the homework, take a stab at the quiz.

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

So let's go through an example.

A respiratory disturbance index of more than 30 events

per hour is considered evidence of severe sleep-disordered breathing.

Here the respiratory disturbance index is usually abbreviated RDI.

30 events per hour is quite a bit. What this means is that a person's upper

airway collapses or partially collapses while they're sleeping,

on average 30 times an hour over the night of

sleep, which is quite a bit because they're

being deprived of oxygen every time this happens.

And they wake up a little bit and then they fall back asleep.

This is a disease called sleep-disordered breathing.

Let's assume 30 events per hour

were some sort of cutoff for severe sleep-disordered breathing.

Now, actually, in case you're interested in this, the

cutoff for diagnosis of sleep-disordered breathing is far lower.

I picked 30 for whatever reason here,

so keep that in mind.

So imagine you had, let's say, overweight

subjects, and you're interested in whether these overweight

subjects were drawn from a population that has

a population mean RDI greater than 30.

So your null hypothesis is

that the average RDI for this population, mu, is 30,

versus the alternative that mu is greater than 30.

Now notice that the mu we're talking about

here refers to the population mean, not the sample mean.

Here we're assuming that the sample mean was 32 events per hour.

So we want to test, with respect to a model of IID

sampling from this population, and with Gaussian assumptions

about the respiratory disturbance index,

whether there is enough evidence in the fact that the sample mean was 32

events per hour, and the standard deviation was 10 events per hour,

to conclude that the population mean is, in fact, bigger than 30.

That's what we'd like to test.

The alternative hypothesis that we specified on the previous

page was specifically that mu was greater than 30.

But there are different versions of the hypothesis test

for our purposes, where we could test less than, greater than, or not equal to.

And there's maybe some philosophical discussion about whether or

not testing whether mu is exactly 30 versus not

equal to 30 is a sensible thing to do,

and we'll talk about that a little bit later.

But for the time being, we'll think that

the alternative is going to come in three varieties:

greater than, less than, or not equal to.

And then we create this two-by-two table of the kinds of decisions that we could make.

If the truth is, in fact, the null hypothesis, and we decide in favor of the null

hypothesis, then we've correctly accepted the null.

If the truth is the null hypothesis, and we decide the alternative hypothesis,

then we will have made what is called a type I error, falsely rejecting the null hypothesis.

If the truth is the alternative hypothesis, and we

fail to reject the null hypothesis, then we've made

a so-called type II error.

And then if the truth is the alternative,

and we reject the null hypothesis, then we will have correctly rejected the null.

So that encapsulates the decision space for hypothesis testing.

So let's revisit this court of law example again.

So, in general, in most courts, the

null hypothesis is that the defendant is innocent.

And then we're going to require evidence

to reject the null hypothesis, that is, to convict a person.

If we require very little evidence, then what, what happens?

Well, we increase the percentage of innocent people convicted, which in this

case would be a type I error.

However, we would also increase the percentage of guilty people

convicted, so this would be correctly rejecting the null.

On the other hand, if the court requires a lot

of evidence to convict someone, then we increase the percentage of

innocent people set free. In this case that's correctly

accepting the null. But then we would also increase the percentage

of guilty people set free.

This case would be a type II error.
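The two-by-two table and its courtroom reading can be written down as a small lookup. This is only a sketch to fix the vocabulary; the function name `classify` and its argument names are made up for illustration and are not from the lecture.

```python
def classify(null_is_true: bool, reject_null: bool) -> str:
    """Name the cell of the two-by-two decision table."""
    if null_is_true:
        return "type I error" if reject_null else "correctly accept the null"
    return "correctly reject the null" if reject_null else "type II error"


# Courtroom reading: null = "the defendant is innocent", reject = "convict".
court_cases = {
    "innocent, convicted": classify(null_is_true=True, reject_null=True),
    "innocent, set free": classify(null_is_true=True, reject_null=False),
    "guilty, convicted": classify(null_is_true=False, reject_null=True),
    "guilty, set free": classify(null_is_true=False, reject_null=False),
}

for case, outcome in court_cases.items():
    print(f"{case}: {outcome}")
```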

So this also goes to show that basically in all decision spaces, not

just in statistical decision spaces (certainly

the same thing happens in policies for law),

the type I and type II errors are

associated: as you increase one you decrease the other, and vice versa.
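To put rough numbers on that trade-off, here is a sketch that borrows the RDI setup used elsewhere in this lecture (null mean 30, standard deviation 10, 100 observations) and the rule "reject if the sample mean exceeds a cutoff". The alternative mean of 32 is an assumption chosen only for illustration, and `error_rates` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

mu_null = 30.0  # population mean under the null (lecture example)
mu_alt = 32.0   # ASSUMPTION: one particular alternative, for illustration only
sigma = 10.0    # standard deviation (lecture example)
n = 100         # sample size (lecture example)
se = sigma / sqrt(n)  # standard error of the sample mean


def error_rates(cutoff):
    """Return (type I, type II) error rates for 'reject if sample mean > cutoff'."""
    type_1 = 1 - NormalDist(mu_null, se).cdf(cutoff)  # reject although the null is true
    type_2 = NormalDist(mu_alt, se).cdf(cutoff)       # fail to reject although the alternative is true
    return type_1, type_2


# Raising the cutoff demands more evidence: type I falls while type II rises.
for cutoff in (30.5, 31.0, 31.5, 32.0, 32.5):
    t1, t2 = error_rates(cutoff)
    print(f"cutoff {cutoff:4.1f}: type I = {t1:.3f}, type II = {t2:.3f}")
```

With the amount of evidence held fixed, moving the cutoff only trades one error for the other. A larger `n` shrinks `se`, which lowers the type II rate at the same type I rate.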

And what

we'll see is that we set up our statistical decision making

in a particular way to control the chance of type I errors.

And in doing so, we've kind of hamstrung ourselves

about what kind of type II error rates we can achieve.

And that will color a lot of the framework and

rubric that we talk about for hypothesis testing:

we're going to very specifically control type I errors, and

then we'll talk later on about how we can

control type II errors and try to minimize them.

Now, obviously, to control type

II errors while also keeping type I errors low, you have to do something.

You have to get better evidence.

Not just more or less, but better

evidence, and we'll talk about that.

So, let's talk a little bit about the standard way to

implement hypothesis testing, and we'll go back

to our example with the respiratory disturbance index.

We want to test whether

or not the population mean is bigger than 30.

The obvious strategy would be to ask: is the sample mean

larger than some constant, where we pick the constant in some traditional way?

Well, the traditional way to do this is to pick the constant so that

the probability of a type I error is

low, and there's a standard benchmark

that occurs very commonly, which is 5%.

But the idea is that the type I error rate is controlled,

so that the probability of making a type I error is low.

And there are a couple of reasons why people do this.

One is that maybe there's some logic to it, right:

the null hypothesis is our status quo hypothesis,

so we want to make sure that we don't reject that hypothesis idly

if it's true.

And so, maybe there's some sense

of scientific conservatism built into it.

It's also, I think, rather practical in terms of the mathematics,

in that, under the null hypothesis, the mean is sharply

specified at mu equal to 30.

That makes the math a little easier as well.

And it turns out you can also state the null as mu being

not just strictly equal to 30, but less than or equal to 30.

You wind up with the same test.

But the fact that the null is this sharp

null hypothesis also helps with the mathematics a little bit,

which maybe isn't the best reason to do it, but

nonetheless it's part of

the reason why I think people do it. But maybe the main reason is this

idea that we want to control the type I error rate and have that probability be low.

To the type I error rate we usually assign the Greek letter alpha.

That's the probability of a type I error,

that is, the probability of rejecting the null hypothesis when in fact it's true.

We want the type I error rate to be 5%.

So what we're going to do is choose this constant C

so that it factors in the uncertainty associated with the

sample mean in such a way that our

type I error rate, the probability of a type I error, is 5%.

So let's go ahead and do that.

Okay, so we want this value C to

be chosen so that the probability of getting a sample

mean larger than it, given that

the population mean is actually 30, is 5%.

That would be the probability of a type I error if we follow the

rule that we reject for an X bar larger than this constant,

okay? Well, either by the Central Limit

Theorem or supposing our data are

Gaussian, we can normalize X bar so that

it has zero mean and unit variance under the null, okay.

So in this case we can subtract off 30 from both sides of this inequality,

because that's the population mean under the null.

And then we can divide by the standard error of the mean, which in this case

is 10 over square root 100: 10 because

that's the standard deviation of the population, which let's assume is known,

and square root 100

because the mean is comprised of 100 observations.

So this quantity right here,

X bar minus 30, over 10 divided by square root 100,

is a standard normal now.

Or, you know, it limits to a standard normal if the data are

[INAUDIBLE]

And so, we can just take this whole quantity here, and replace it with a z.

Okay?
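As a quick check on that standardization, here is a simulation sketch under the null with Gaussian data. The seed and the number of repetitions are arbitrary choices, not from the lecture, and `standardized_mean` is a hypothetical helper name.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)                   # fixed seed so the sketch is repeatable
mu0, sigma, n = 30.0, 10.0, 100  # values from the lecture example
se = sigma / sqrt(n)             # standard error of the mean: 10 / sqrt(100) = 1


def standardized_mean():
    """Draw one sample of size n under the null and standardize its mean."""
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    return (mean(sample) - mu0) / se


# 2000 repetitions is an arbitrary choice for illustration.
z_values = [standardized_mean() for _ in range(2000)]
print(f"mean of z: {mean(z_values):.3f}, sd of z: {stdev(z_values):.3f}")
```

The printed mean should land near 0 and the standard deviation near 1, which is what lets the whole quantity be replaced with a Z.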

And on the right hand side, we have C minus 30.

And remember, the

reason that this quantity is a Z random variable

is that we're calculating this probability under the condition that

the null hypothesis is true, so mu equals 30.

You'll see here the point that I was making earlier, that when the

null is true we have a sharply specified parameter. It

helps, right, because we can actually plug in exactly 30 here.

Okay, so we want 5% to be equal to the

probability that a Z random variable is bigger than C minus 30.

So we could just set C minus 30 (well,

I put divided by 1 here just to remind ourselves

of the standard error in the denominator)

to a value

such that the probability of Z being larger than it is 5%.

Well, we know what that value is.

It's the 95th percentile of the standard normal, which

is 1.645. We can solve for C, and C is 31.645.

Since our sample mean is 32 in this case, we would reject the null hypothesis.
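The calculation above can be reproduced in a few lines. This sketch uses Python's standard library rather than the R session shown in the lecture; `NormalDist().inv_cdf` plays the role of the normal quantile function, and the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 30.0    # population mean under the null
sigma = 10.0  # population standard deviation, assumed known
n = 100       # number of observations
alpha = 0.05  # desired type I error rate
xbar = 32.0   # observed sample mean

se = sigma / sqrt(n)                      # standard error: 10 / sqrt(100) = 1
z_crit = NormalDist().inv_cdf(1 - alpha)  # 95th percentile of the standard normal, about 1.645
c = mu0 + z_crit * se                     # cutoff for the sample mean, about 31.645
z_stat = (xbar - mu0) / se                # standardized sample mean

reject = xbar > c                         # same decision as z_stat > z_crit
print(f"z_crit = {z_crit:.3f}, C = {c:.3f}, z = {z_stat:.1f}, reject null: {reject}")
```

Comparing `xbar` with `c` and comparing `z_stat` with `z_crit` are the same rule written on two different scales.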

[INAUDIBLE]

Okay.

Very briefly, I'd like to just show how to get this normal quantile in R.

Here, I have a cutout of RStudio,

and I'm just typing in the appropriate command.

Okay.

So very briefly, let's just illustrate with a picture

what that calculation is giving us.

So here, I have my RStudio window.

Just to remind ourselves, the 95th percentile of the normal is about 1.645;

we get that with the qnorm function. So let's visualize it.

Let's create some grid points to plot.

This is just a sequence from minus 3 to plus

3, 0.2, and that covers most of the normal distribution.

The y value here is the normal

density evaluated at the x values, so let's plot that.

So we see a plot of the bell-shaped curve from minus 3 to plus 3.

Now let me define the sequence that we want to shade in that represents

5% of this. I'll shade it in with a polygon.

There it is, now shaded in a salmon color.

And then just to remind us, that's 5% of the curve, and the

number that represents that 5% is 1.645.
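The lecture does this in R. As a rough Python counterpart, the sketch below builds the same kind of grid, density values, and polygon vertices with the standard library only. The step of 0.2 follows the spoken "0.2" and may not match the code on screen. The optional `plot_shaded_tail` helper is a hypothetical name and needs matplotlib, which is not part of the standard library.

```python
from statistics import NormalDist

std_normal = NormalDist()          # mean 0, standard deviation 1
z_crit = std_normal.inv_cdf(0.95)  # 95th percentile, about 1.645

# Grid from -3 to 3. ASSUMPTION: step 0.2, taken from the spoken description.
step = 0.2
xs = [round(-3 + step * i, 10) for i in range(31)]
ys = [std_normal.pdf(x) for x in xs]  # normal density at each grid point

# Polygon to shade: start on the baseline at z_crit, follow the curve to 3,
# then drop back to the baseline.
tail_xs = [z_crit] + [x for x in xs if x > z_crit]
polygon = [(z_crit, 0.0)] + [(x, std_normal.pdf(x)) for x in tail_xs] + [(3.0, 0.0)]

# Area to the right of z_crit under the full curve (not just up to 3).
tail_area = 1 - std_normal.cdf(z_crit)
print(f"z_crit = {z_crit:.3f}, upper tail area = {tail_area:.3f}")


def plot_shaded_tail():
    """Optional: draw the curve and the shaded tail (requires matplotlib)."""
    import matplotlib.pyplot as plt

    plt.plot(xs, ys)
    px, py = zip(*polygon)
    plt.fill(px, py, color="salmon")
    plt.show()
```

Calling `plot_shaded_tail()` draws the bell curve with the upper 5% shaded, provided matplotlib is installed.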