Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.

A course from Johns Hopkins University

Mathematical Biostatistics Boot Camp 2

41 ratings

From this lesson

Hypothesis Testing

In this module, you'll get an introduction to hypothesis testing, a core concept in statistics. We'll cover hypothesis testing for basic one and two group settings as well as power. After you've watched the videos and tried the homework, take a stab at the quiz.

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

Okay, so let's get back to some far more pedestrian points now.

Anyway, regression to the mean is a very fun topic, and I would suggest you read up on it. Also read up on Galton. He was quite an interesting character. His name is Francis Galton, and he was just a remarkable figure in history. And so I suggest you read up on Galton just out of interest.

Anyway, so let's get back to some of the more straightforward things that we were going to talk about: the extension, not to paired data but to independent groups. It shouldn't really come as a big surprise either. I'm hoping at this point in the class, after having taken Boot Camp 1, that a lot of this discussion will just come very naturally to you.

So, let's suppose we want to test H0: mu 1 equals mu 2, where we have two independent groups, versus the alternative that they are different, or one of the two one-sided alternatives. Then, assuming a common error variance between the two groups, we would have this statistic right here, okay?

Let's say the x's are for group one and the y's are for group two. Then our natural estimator of mu 1 minus mu 2 would be x bar minus y bar. Under the null hypothesis, that would have a hypothesized mean of 0, so we don't need to subtract anything off up here.

And then the standard error of x bar minus y bar is the common standard deviation across the groups times the square root of 1 over the sample size in the first group plus 1 over the sample size in the second group, and I hope everyone's capable of deriving that. And this follows a t distribution with n x plus n y minus 2 degrees of freedom under the null hypothesis and the usual assumptions.
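
For reference, the statistic described here can be written out as follows. The slide itself is not part of this transcript, so this is the standard pooled-variance form (with S_x^2 and S_y^2 denoting the two sample variances), which is presumably what the slide shows.

```latex
T = \frac{\bar X - \bar Y}{S_p \sqrt{\frac{1}{n_x} + \frac{1}{n_y}}}
\sim t_{n_x + n_y - 2} \quad \text{under } H_0,
\qquad
S_p^2 = \frac{(n_x - 1) S_x^2 + (n_y - 1) S_y^2}{n_x + n_y - 2}
```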

And we covered this in Boot Camp 1 when we talked about confidence intervals, basically when we created the interval x bar minus y bar plus or minus this t quantile times this standard error. So, it's exactly the same thing and it follows exactly the same logic. Here, you know, the normalized test statistic is the estimator minus the hypothesized value under the null hypothesis, divided by the standard error. It follows the same logic as we've been using for one-sample tests.

So, it should be

[INAUDIBLE]

no great stretch for anyone.

And then if you have a large sample size, it should be no surprise to you that this statistic on the previous page follows a normal distribution.

If you're not willing to believe that the variance is the same in the two groups, then the standard error of x bar minus y bar works out to be the square root of: s x squared, which is the variance in group one, divided by n x, plus s y squared, which is the variance in the second group, divided by n y, which is the sample size in the second group. Square root of the whole thing.

And so, of course, if the variances were the same, then you would be able to pull the common variance out and you'd get the formula from the earlier part.

So, just to remind you, I don't think I mentioned it on the previous slide: the S sub p from the previous slide is the pooled standard deviation, the square root of the pooled variance. Whereas here now, we're talking about the variance within group one, s x squared, and the variance within group two, s y squared. So anyway, this follows a standard normal distribution if n x and n y are large. And it approximately follows Student's t distribution if the x's and y's are normally distributed and independent both within and between groups.

But there's this crazy formula for the degrees of freedom. The approximate degrees of freedom are given by this formula here. I'm not going to read it out loud. You can plug into it if you want, and the distributional approximation turns out to be fairly accurate, so you can do this.

It's a little bit of a crazy formula, and there isn't, as far as I know, a ton of obvious intuition as to why it works out this way. But what you want to think about is that, you know, it's not surprising that x bar minus y bar divided by its standard error, if the data are normal, follows a distribution that looks like a t distribution. And the fact that it doesn't follow exactly a t distribution follows from the fact that we don't have the common variance.

But it seems reasonable that you would be able to pick degrees of freedom so that a t distribution comes close to the actual distribution of this normalized statistic, and that's what we're doing with these approximate degrees of freedom. And note that degrees of freedom can be fractional, so it's okay that this doesn't work out to be an integer.
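
The degrees of freedom formula is not read out in the lecture, so the sketch below assumes it is the usual Welch-Satterthwaite approximation. The function name `welch_se_and_df` and the summary statistics are made up for illustration; the point is only to show the unequal-variance standard error and that the resulting degrees of freedom are typically fractional.

```python
import math


def welch_se_and_df(sx, nx, sy, ny):
    """Unequal-variance standard error of x_bar - y_bar and approximate df."""
    vx = sx ** 2 / nx  # estimated variance of x_bar
    vy = sy ** 2 / ny  # estimated variance of y_bar
    se = math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation (assumed to be the slide's formula).
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return se, df


# Hypothetical summary statistics, chosen only for illustration.
se, df = welch_se_and_df(sx=4.0, nx=12, sy=9.0, ny=20)
print(f"standard error = {se:.3f}, approximate df = {df:.2f}")
```

With equal standard deviations and equal sample sizes the approximation returns n x plus n y minus 2, which agrees with the pooled case.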

And then a couple of notes.

So our connection between hypothesis

testing and confidence intervals still holds.

If your independent group t interval contains 0, then you will fail to reject the independent group t test for equality of means, provided you calculated the standard error in the same way, and vice versa. If you construct a confidence interval by finding all the hypothesized values for the difference of means for which you fail to reject, you would wind up with exactly the appropriate t confidence interval.
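
A small sketch of that connection, under stated assumptions: it uses a normal quantile (the large-sample version) instead of a t quantile so that it needs only the standard library, and the function name and numbers are hypothetical. The same logic holds with a t quantile as long as the interval and the test use the same standard error.

```python
from statistics import NormalDist


def z_interval_and_test(xbar, ybar, se, alpha=0.05):
    """Return the interval for mu_x - mu_y and whether H0: mu_x = mu_y is rejected."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    diff = xbar - ybar
    interval = (diff - z * se, diff + z * se)
    reject = abs(diff / se) > z
    return interval, reject


# Hypothetical numbers: same difference in means, two different standard errors.
for se in (1.0, 2.0):
    (lo, hi), reject = z_interval_and_test(xbar=3.0, ybar=0.0, se=se)
    contains_zero = lo <= 0.0 <= hi
    print(f"se={se}: interval=({lo:.2f}, {hi:.2f}), "
          f"contains 0: {contains_zero}, reject: {reject}")
```

In both cases the interval contains 0 exactly when the test fails to reject.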

Another thing that comes up all the time is that people want to test for equality of means by constructing a confidence interval for group one and a confidence interval for group two, and then seeing if those confidence intervals overlap. And that procedure works in one sense: if the confidence intervals don't overlap and you reject, that's an accurate statement. But the confidence intervals can overlap even when the test statistic, constructed correctly, would reject. The fancier way of saying it is that this procedure of checking whether independently constructed intervals overlap just has lower power than doing the right test.
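
To make the lower-power point concrete, here is a sketch that borrows the summary statistics from the exam example worked later in this lecture. Two things are assumptions: which standard deviation belongs to which group, and the use of a normal quantile in place of a t quantile (reasonable with 50 per group).

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)  # two-sided 5% critical value, large-sample approximation

# Summary statistics from the exam example in this lecture (n = 50 per group).
# The pairing of each standard deviation with a group is assumed.
n = 50
mean_x, sd_x = 86.9, 6.07
mean_y, sd_y = 89.8, 6.06

# Separate 95% intervals, one per group.
ci_x = (mean_x - z * sd_x / math.sqrt(n), mean_x + z * sd_x / math.sqrt(n))
ci_y = (mean_y - z * sd_y / math.sqrt(n), mean_y + z * sd_y / math.sqrt(n))
intervals_overlap = ci_x[1] >= ci_y[0] and ci_y[1] >= ci_x[0]

# The two-group test on the difference in means.
se_diff = math.sqrt(sd_x ** 2 / n + sd_y ** 2 / n)
reject = abs(mean_y - mean_x) / se_diff > z

print("intervals overlap:", intervals_overlap)
print("two-group test rejects:", reject)
```

Here the two separate intervals overlap, yet the two-group test rejects, which is exactly the situation where the overlap check misses a difference.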

Also, it has a potential really bad abuse that I've seen happen, which is that people construct an interval for group one and an interval for group two, but they fail to acknowledge that people are paired, and then they do this procedure. And that's quite bad, because then you've thrown out an incredible amount of information in the pairing, right? Or the purpose of the pairing. Sometimes the purpose of the pairing is to account for confounders or something like that. So you're either throwing out information or eliminating the entire reason that the pairing occurred. So don't do that, but I know no one in this class would do that anyway.

Okay. Let's go through the example.

Suppose that instead of having repeated data on two exams, we randomized students to two teaching modules and each group took the same exam. So, I'm going to treat the data from before as if it was independent group data, just to illustrate performing the calculations. This wasn't actually what happened, but whatever. So imagine it; here's a setting where we would analyze the data this way rather than treating it as paired.

And the easiest way to recognize that you don't have paired data is if your n's are different. But that could also be misleading, because you could have some missing data and so have different n's. So the easiest way to determine whether your data are paired is to answer this question: are my data paired or not? And then you will know. Okay.

So here we have two modules that we randomized students to, 50 students in each. The mean for one was about 87, the mean for the other was about 90. We've thought it through and realized that no, there was no pairwise association: student one from group one has nothing to do with student one from group two, and so on.

The standard deviations were both about 6.

So here the pooled variance, because the n's are the same, works out to be the average of the variances, and its square root, the pooled standard deviation, is 6.065. Remember, pooling averages the variances, not the standard deviations, so the pooled standard deviation is the square root of the average variance, and that does work out to be 6.065.

And then our test statistic is the difference in the means, 89.8 minus 86.9, divided by the standard error, which is the pooled standard deviation estimate times the square root of 1 over 50 plus 1 over 50. Go back to Boot Camp 1 and look at how we do the pooled variance estimate, but when the sample sizes are equal, it works out to be the average of the variances.

So in this case you square 6.07, average it with the square of 6.06, then take the square root [INAUDIBLE] 6.065, times the square root of 1 over 50 plus 1 over 50, and that's your test statistic, and you do the rest. You check whether this test statistic is big enough to reject. And the sample sizes are big enough that you can just compare this to a Z distribution; the difference between Z and T is irrelevant for sample sizes this large.
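
The arithmetic of this example can be checked with a few lines of Python. This is only a sketch of the calculation described above: the variable names are made up, which standard deviation goes with which group is assumed, and the p-value uses the normal approximation the lecture says is adequate at this sample size.

```python
import math
from statistics import NormalDist

n_x = n_y = 50
mean_x, mean_y = 86.9, 89.8
sd_x, sd_y = 6.07, 6.06  # which sd belongs to which group is assumed

# General pooled variance; with equal n it reduces to the plain average.
pooled_var = ((n_x - 1) * sd_x ** 2 + (n_y - 1) * sd_y ** 2) / (n_x + n_y - 2)
pooled_sd = math.sqrt(pooled_var)

se = pooled_sd * math.sqrt(1 / n_x + 1 / n_y)
statistic = (mean_y - mean_x) / se

# Two-sided p-value from the normal approximation mentioned in the lecture.
p_value = 2 * (1 - NormalDist().cdf(abs(statistic)))

print(f"pooled sd = {pooled_sd:.3f}")
print(f"test statistic = {statistic:.2f}")
print(f"two-sided p-value (normal approx.) = {p_value:.3f}")
```

The statistic comes out a bit above 2, so at the 5% level the equality of the two module means would be rejected.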

So, we're moving through this stuff kind of quickly

because we want to get to some more interesting stuff.

And I feel like we've mostly already covered this stuff by covering independent and paired group confidence intervals.

But just some final notes. Look over the review notes on formal tests for the equality of variances between two groups. I'm not a huge fan of this. Some books tell you to employ the following strategy: evaluate whether or not the variances are equal by performing a test of equality of variances, and then if you reject, do the unequal variance test; if you fail to reject, do the equal variance test. I never liked those kinds of recipe approaches to statistics.

I would say, well, first of all, these tests aren't very good. They rely on the F distribution. They're heavily dependent on the normality assumption. So I would say just do a bootstrap. If you want to test the equality of variances, do some sort of bootstrapping, resampling type thing. Don't do this F test. We have better things now, so just do them. And for smaller sample sizes, you know, just rely on exploratory data analysis. But in general, if you're concerned about equality of variances, just use the unequal variance test, and that solves the problem right there.
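
The lecture does not say which resampling scheme to use, so the following is just one possible sketch: a percentile bootstrap interval for the ratio of the two sample variances, resampling each group separately. The function name, the simulated data, and the choice of the percentile method are all assumptions made for illustration.

```python
import random
import statistics


def bootstrap_variance_ratio_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for var(x) / var(y)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        # Resample each group separately, with replacement.
        xb = rng.choices(x, k=len(x))
        yb = rng.choices(y, k=len(y))
        ratios.append(statistics.variance(xb) / statistics.variance(yb))
    ratios.sort()
    lo = ratios[int((alpha / 2) * n_boot)]
    hi = ratios[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Simulated data, for illustration only: group y is far more variable than x.
data_rng = random.Random(1)
x = [data_rng.gauss(0, 1) for _ in range(100)]
y = [data_rng.gauss(0, 5) for _ in range(100)]
lo, hi = bootstrap_variance_ratio_ci(x, y)
print(f"95% bootstrap interval for var(x)/var(y): ({lo:.3f}, {hi:.3f})")
print("interval contains 1:", lo <= 1 <= hi)
```

If the interval for the ratio excludes 1, that is evidence against equal variances, without leaning on the F distribution.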

Okay. And then some very final comments.

I know I keep saying final comments and then lying.

Suppose you have equal numbers of observations for two groups, and let's label the groups x and y. The data, you know, are random variables x and random variables y. If the data are matched or paired, then the standard error of the difference is estimating sigma squared y plus sigma squared x, both over n, minus twice the covariance divided by n. And you can do this calculation if you'd like, or you can just trust me. Square root of the whole thing if you want the standard error. If you ignore the matching, then the standard error of the difference is estimating the square root of just sigma squared y plus sigma squared x, both over n, with no covariance term.
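
Written out, the two quantities being compared are as follows. The slide is not included in this transcript, so this is a reconstruction from the spoken description, with n the common number of observations per group.

```latex
\text{paired:}\quad
\operatorname{Var}(\bar Y - \bar X)
  = \frac{\sigma_y^2}{n} + \frac{\sigma_x^2}{n} - \frac{2\operatorname{Cov}(X, Y)}{n}
\qquad
\text{ignoring the pairing:}\quad
\frac{\sigma_y^2}{n} + \frac{\sigma_x^2}{n}
```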

My point is that, in many cases, by ignoring this correlation you're unnecessarily inflating your standard error, right? If this covariance is positive, then by dropping this negative 2 covariance over n, you're throwing out a term that would have been subtracted off.

And this is the idea that, for example, if you're comparing exams and you want to know whether exam one was harder than exam two, if you threw out the information that it was the same students taking it twice, then you discard the fact that some students study more than others. Some students maybe have different backgrounds than others, and so on. And so some students will consistently do well on exams and some students will consistently do poorly on exams. And that is encoded in this negative 2 covariance term. And that's saying that that variation is associated with inter-student variability, and that you're throwing that away and absorbing it into the rest of the variance.
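
Here is a small numerical sketch of that covariance term. The exam scores are invented for illustration (six hypothetical students who took both exams); the code compares the standard error that uses the pairing with the one that ignores it, and checks that the paired formula agrees with the usual standard error of the within-student differences.

```python
import math
import statistics

# Hypothetical scores for the same six students on two exams.
exam1 = [72.0, 85.0, 90.0, 65.0, 78.0, 88.0]
exam2 = [75.0, 89.0, 92.0, 70.0, 80.0, 93.0]
n = len(exam1)

var1 = statistics.variance(exam1)
var2 = statistics.variance(exam2)
m1, m2 = statistics.mean(exam1), statistics.mean(exam2)
# Sample covariance, computed by hand to stay compatible with older Pythons.
cov = sum((a - m1) * (b - m2) for a, b in zip(exam1, exam2)) / (n - 1)

se_paired = math.sqrt(var1 / n + var2 / n - 2 * cov / n)
se_unpaired = math.sqrt(var1 / n + var2 / n)

# The paired formula matches the usual standard error of the differences.
diffs = [b - a for a, b in zip(exam1, exam2)]
se_from_diffs = statistics.stdev(diffs) / math.sqrt(n)

print(f"paired se = {se_paired:.3f}, ignoring pairing = {se_unpaired:.3f}")
```

Because strong students score well on both exams, the covariance is positive and the paired standard error is much smaller than the one that ignores the pairing.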

Now, at any rate, the point is that if you treat paired data as if it's grouped, you're missing out on an important term, the covariance, and we can actually characterize what that term looks like.

And you might think to yourself, why in the world would anyone ever run a non-paired experiment then? And I think, you know, for more complicated settings it isn't always the case that the pairing yields a lower standard error; in more complex settings, other things can happen. As an example, if you are studying tests, there might be a learning effect, right, that you don't intrinsically want to study.

You know, students on the second exam might simply be better by virtue of having already taken the first exam. So it's not fair to ask whether the second exam is harder or easier than the first exam when the students have had an exam to practice on and get used to the teacher's style. That sort of learning effect is an obvious thing.

So, if you really wanted to investigate the intrinsic properties of exam one versus exam two, you might want to use an independent group design. Because otherwise, with only data where exam two has been taken after exam one, you can never learn about that learning effect, right? You know, for all the data points, exam two was after exam one. So you can't investigate that.

Now, there are study designs that try to both take advantage of the pairing and investigate these sorts of learning effects. And I think you're probably thinking to yourself what the obvious thing is that you'd have to do. Well, some people would have to take exam two first, and other people would have to take exam one first. And, you know, maybe in the setting of an exam this isn't a reasonable thing to do. But in the setting of drug development it is a reasonable thing to do.

Let's suppose it's not two exams, but you're testing two headache medications. Then you might, for one set of patients, give them headache medication one, have a washout period, then give them medication two. And then for another set of patients, give them medication two first and then give them headache medication one second. Those kinds of designs are called crossover designs, and they randomize the order in which people receive the treatments in order to adjust exactly for this learning or carryover effect. But at any rate, back to my larger point.

The point of this slide is not to say pairing is always better than independent groups, but simply to say that if your data are paired, treating them as if they're independent is both wrong from the point of view of assumptions and also, in some cases, can actually hurt you by inflating your standard error.

[NOISE].

Oops, I guess that means I gotta go.
