Or maybe just to denote that it's on a squared scale a delta squared.

Okay, they're going to go with the Greek letter, why Greek?

Well, classical reasons but, spare notations as possible.

Whereas, the social science community tends to prefer mnemonic devices,

memory devices so here D for design and e-f-f for effect,

deff which is a ratio of variances for that sample proportion p.

All right, same sample size numerator and denominator and

at the last line in the very bottom of our display.

We see the ratio, you probably didn't recall that the sampling

variance we computed from our sample of size 240 was 0.00276.

And for the simple random sampling equivalent taking the same data and

treating it as a simple random sample came out to be 0.0009.

Those numbers are hard to.

Taking differences here would be a real headache.

This is the scale we're getting for a sample of this size involving proportions.

This is what variances would tend to look like.

But what's important is the ratio.

And we notice that the cluster sampling variance is three times larger than

the simple random sampling variance.

A three fold increase.

The same sample size, much reduced cost.

Now you could raise the question.

Well why would I do the cluster sample if I'm going to lose invariance compared to,

I'd rather do the random sample for the same sample size.

No you wouldn't because the simple random sample would cost

much more money to build the list necessary to select it.

Now if you already got that list assembled, that's a different matter.

But here, we don't have a list assembled, and

that's what happens in most of these cases with cluster sampling.

We have a list assembly cost, and

that list assembly cost could be a hundredfold higher.

Let's say it is a hundred fold higher.

It costs us 100 times as much to assemble that list of all of the names as it does

to get a list of the thousand classrooms, select 10 of them and

then go to those classrooms and get the classroom rosters for those 10.

And if it's a hundred fold difference,

a threefold loss in precision from cluster sampling is nothing.

We're still ahead by 33 to 1.

So we're willing to suffer this loss because we save so much money.

Now, we don't often do that calculation.

Often? I don't think I've ever done it at all.

It's just so obvious that the cost would be so much less with a cluster sample.

And I haven't even mentioned the travel cost to go from classroom to classroom.

So it's not about cost here, it's about variance and equal sample sizes.

We lose in precision on that comparison, but it's a convenient kind of comparison.

It's not about now getting these things so

that we can compare this on the basis of the same budget.

That's typically not done although it could be done.

All right, now this design effect comes up all over the place in literature and

it comes up in many different ways.

For example, one way that it comes up is the following.

That design effect is a ratio of variances.

Well if I multiply both sides of it, if I multiply the right hand side and

left hand side of the simple sampling variance.

I get the variance of the proportion, just the numerator the one side is equal to

the design effect times the simple random sampling variance.

That is, the design effect can be thought of as an adjustment

on what happens in simple random sampling.

In this case, as I said,

a three-fold increase in the variance compared to simple random sampling.

And that sometimes is used by people to assess the impact of the cluster sampling

on their design, when they're looking at the analysis.

They've done simple random sampling calculations and they say,

but I think the design effects here, based on other calculations,

are a factor of two or three.

They're going to take the simple random sampling variances and

inflate them correspondingly.