Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.

A course from Johns Hopkins University

Mathematical Biostatistics Boot Camp 2

From this lesson

Techniques

This module is a bit of a hodgepodge of important techniques. It includes methods for discrete matched-pairs data as well as some classical non-parametric methods.

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

So it's not enough just to have a testing procedure; we'd also like to have some sort of confidence interval.

So let's let $\hat\pi_{ij} = n_{ij}/n$ be the sample proportions.

And imagine we want to estimate d, the difference in the marginal proportions, $\pi_{1+} - \pi_{+1}$.

In this case that would be the difference in the marginal probability of an approve vote between the two occasions.

The estimate is $\hat\pi_{1+} - \hat\pi_{+1}$, which is equal to $(n_{12} - n_{21})/n$, because the $n_{11}$ count cancels.

So that estimates the difference in the marginal proportions.
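As a small illustration of that estimate, here is a minimal Python sketch. The counts are hypothetical (they are not the lecture's data), and the variable names are made up for this example; rows are taken to be the first occasion and columns the second, with index 1 meaning "approve".

```python
# Hypothetical matched-pairs counts (illustrative only, not from the lecture).
# n11: approve both times, n12: approve then disapprove,
# n21: disapprove then approve, n22: disapprove both times.
n11, n12, n21, n22 = 50, 15, 5, 30
n = n11 + n12 + n21 + n22

p_first = (n11 + n12) / n   # sample marginal proportion approving on occasion 1
p_second = (n11 + n21) / n  # sample marginal proportion approving on occasion 2

d_hat = p_first - p_second      # difference in sample marginal proportions
d_shortcut = (n12 - n21) / n    # same quantity: n11 cancels in the difference

print(d_hat, d_shortcut)
```

Both expressions agree up to floating-point rounding, which shows that only the discordant cells contribute to the estimate.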

On the previous slide we talked about the variance of this estimator under the null hypothesis.

Let's talk about the variance of the estimator in general. The variance works out to have this form:

$$\sigma_d^2 = \frac{\pi_{1+}(1-\pi_{1+}) + \pi_{+1}(1-\pi_{+1}) - 2(\pi_{11}\pi_{22} - \pi_{12}\pi_{21})}{n}$$

The first two terms, divided by n, are the kind of difference-in-binomials variance that you would expect to see.

And because the samples are correlated, we have this correlation term, minus twice $(\pi_{11}\pi_{22} - \pi_{12}\pi_{21})$.

Okay?

And so that's subtracting out the correlation here.
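For readers who want to see where the correlation term comes from, here is a short derivation sketch. It assumes the n pairs are independent draws from the same 2 by 2 table of cell probabilities (a multinomial model), which is the usual setting for this formula; the steps are not spelled out in the lecture itself.

```latex
\begin{align*}
\operatorname{Var}(\hat\pi_{1+} - \hat\pi_{+1})
  &= \operatorname{Var}(\hat\pi_{1+}) + \operatorname{Var}(\hat\pi_{+1})
     - 2\operatorname{Cov}(\hat\pi_{1+}, \hat\pi_{+1}) \\
  &= \frac{\pi_{1+}(1-\pi_{1+})}{n} + \frac{\pi_{+1}(1-\pi_{+1})}{n}
     - \frac{2(\pi_{11} - \pi_{1+}\pi_{+1})}{n}, \\
\pi_{11} - \pi_{1+}\pi_{+1}
  &= \pi_{11} - (\pi_{11}+\pi_{12})(\pi_{11}+\pi_{21})
   = \pi_{11}\pi_{22} - \pi_{12}\pi_{21}, \\
\sigma_d^2
  &= \frac{\pi_{1+}(1-\pi_{1+}) + \pi_{+1}(1-\pi_{+1})
     - 2(\pi_{11}\pi_{22} - \pi_{12}\pi_{21})}{n}.
\end{align*}
```

The middle line uses the fact that the four cell probabilities sum to one, so $\pi_{11}(1 - \pi_{11} - \pi_{12} - \pi_{21}) = \pi_{11}\pi_{22}$.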

And what would happen if there's a lot of counts in these off-diagonal cells, $\pi_{12}$ and $\pi_{21}$?

Then $\pi_{12}$ times $\pi_{21}$ would be a big number, so the term $\pi_{11}\pi_{22} - \pi_{12}\pi_{21}$ would be negative.

We have minus twice that negative number, which would result in a larger variance.

If the off-diagonal cells are really small, and most of the data lie on the main diagonal, then $\pi_{11}$ times $\pi_{22}$ would be large, and we'd have minus twice that number.

And we'd wind up with a much smaller variance than the standard difference-in-binomials variance.

Okay?
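The effect of the off-diagonal cells can be checked numerically. The sketch below uses two hypothetical probability tables with identical margins (0.5 and 0.5) and a made-up n of 100; the function name `matched_variance` is an illustrative helper, not something from the lecture.

```python
def matched_variance(p11, p12, p21, p22, n):
    """Variance of the difference in marginal proportions for matched pairs."""
    p_row = p11 + p12  # pi_{1+}
    p_col = p11 + p21  # pi_{+1}
    binomial_part = p_row * (1 - p_row) + p_col * (1 - p_col)
    covariance_part = p11 * p22 - p12 * p21
    return (binomial_part - 2 * covariance_part) / n


n = 100  # hypothetical number of pairs

# Most mass on the main diagonal: people tend to agree with themselves.
concordant = matched_variance(0.4, 0.1, 0.1, 0.4, n)
# Most mass off the diagonal: people tend to switch.
discordant = matched_variance(0.1, 0.4, 0.4, 0.1, n)
# Same margins, but with the correlation term dropped.
independent = (0.5 * 0.5 + 0.5 * 0.5) / n

print(concordant, independent, discordant)
```

With these made-up tables the concordant variance is below the difference-in-binomials value and the discordant variance is above it, matching the two cases described above.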

So we could take the estimate minus the true difference in proportions, divided by the estimated standard error.

That follows an asymptotic standard normal distribution, and we can use that, again, to create confidence intervals.

I hope everyone at this point in the class could do something like that.
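One way to carry out that calculation is sketched below, using only the Python standard library. The function name `matched_diff_ci` and the counts are hypothetical, and the interval is the plain Wald interval (estimate plus or minus a normal quantile times the estimated standard error), with sample proportions plugged into the variance formula.

```python
from math import sqrt
from statistics import NormalDist


def matched_diff_ci(n11, n12, n21, n22, level=0.95):
    """Wald interval for the difference in marginal proportions, matched pairs."""
    n = n11 + n12 + n21 + n22
    p11, p12, p21, p22 = n11 / n, n12 / n, n21 / n, n22 / n
    p_row, p_col = p11 + p12, p11 + p21  # sample versions of pi_{1+}, pi_{+1}

    d_hat = p_row - p_col
    var_hat = (p_row * (1 - p_row) + p_col * (1 - p_col)
               - 2 * (p11 * p22 - p12 * p21)) / n
    se = sqrt(var_hat)

    # Two-sided standard normal quantile, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return d_hat, se, (d_hat - z * se, d_hat + z * se)


# Hypothetical counts (illustrative only, not from the lecture).
d_hat, se, (lower, upper) = matched_diff_ci(50, 15, 5, 30)
print(d_hat, se, lower, upper)
```

Because this relies on the asymptotic normal approximation, it is best suited to reasonably large n.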

In this last bullet point I say to compare $\sigma_d$ to what we would use if the proportions were independent.

That is, compare the result to what we'd get if, instead of asking the same people on two occasions whether or not they approve, we asked a different set of people each time.

Then this minus-twice part would go away.

Okay?

But what do we actually think? We think that people who approve on the first occasion would be more likely to approve on the second occasion.

If you're in the U.S. and you're a Democrat, you might approve of, say, President Obama on the first occasion, and you'd be more likely to approve at the second time point.

And the same thing goes for the people who disapprove.

If you're a Republican and you disapproved at the first time point, you'd be more likely to disapprove at the second time point.

That's a very frequent form of correlation, where the measurements tend to be concordant; they tend to agree.

That is exactly the case where $\pi_{11}$ times $\pi_{22}$ will be much larger than $\pi_{12}$ times $\pi_{21}$.

In other words, things will tend to lie on the main diagonal of the matched 2 by 2 table, in that people will tend to agree.

And if that's the case, this covariance term will be positive, so we'll have minus twice a positive number, and you'll get a dramatic reduction in the variance.

In other words, failing to account for the fact that the same people were asked twice would, in this case, be a really kind of dumb thing to do.

You'd have a reduction in precision: you'd get a much wider confidence interval if you failed to account for it.
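To make the loss of precision concrete, the following sketch compares the two standard errors on the same hypothetical counts: one that keeps the correlation term and one that drops it, as if two different groups of n people had been asked. The counts and variable names are illustrative assumptions, not the lecture's data.

```python
from math import sqrt

# Hypothetical matched-pairs counts (illustrative only).
n11, n12, n21, n22 = 50, 15, 5, 30
n = n11 + n12 + n21 + n22
p11, p12, p21, p22 = n11 / n, n12 / n, n21 / n, n22 / n
p_row, p_col = p11 + p12, p11 + p21

binomial_part = p_row * (1 - p_row) + p_col * (1 - p_col)
covariance_part = p11 * p22 - p12 * p21  # positive when responses are concordant

se_independent = sqrt(binomial_part / n)                     # ignores the matching
se_matched = sqrt((binomial_part - 2 * covariance_part) / n)  # accounts for it

# Ratio of interval widths: values above 1 mean the naive interval is wider.
width_ratio = se_independent / se_matched
print(se_matched, se_independent, width_ratio)
```

With these made-up counts most pairs are concordant, so the standard error that ignores the matching comes out noticeably larger than the matched one.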

So it's interesting in general.

But even if accounting for the dependency resulted in a wider interval, you'd still want to do it, because that will give you the correct interval rather than one that's based on completely incorrect assumptions.