In this video, we'll talk about calibration with an emphasis on poststratification. The idea in calibration is that we start with a set of input weights for the sample, and then we adjust those, calibrate them, in a certain way. In a probability sample, the input weights can be the base weights adjusted for nonresponse, and possibly for unknown eligibility, if you have any of that to correct for. In a non-probability sample, we don't have these repeated-sampling weights, so you can start by setting all the weights to 1, that's one possibility, or you could use the quasi-randomization weights that we talked about earlier. Either one of those will get the process started. The idea in calibration is to use auxiliary variables to reduce variances or to correct for coverage errors, and in order to do that, we need population totals for each auxiliary variable that we use. There are several techniques that fall into the class of calibration estimators. Poststratification is probably the simplest and is often used. Raking is similar, and I'll explain the difference between the two in a minute. The GREG is, in a sense, more general than raking or poststratification because it can use both qualitative and quantitative variables; the usual implementations of poststratification and raking are based on categorical, or qualitative, variables. Now, here's the formula for the poststratification estimator of a total, just to show you what it looks like. The index gamma, which runs from 1 to capital G, indexes the poststrata. These are non-overlapping groups that together exhaust the entire population, and we need to know control totals for them. Cap N sub gamma is the population control, which we have to get from a census or some external data set.
T hat y sub gamma is the estimated total of the analysis variable y, whatever it happens to be, based on the input weights, and N hat sub gamma is the estimated count of units in that poststratum. To specify those more clearly: T hat y sub gamma is a sum over the sample in poststratum gamma, so this notation here, a summation sign with S sub gamma beside it, means sum over the units in the set S sub gamma, the set of sample units in poststratum gamma. I take the input weight, call it d sub i for unit i, times the data value y sub i, and sum those up, and that's an estimated total for the units in poststratum gamma only. So when I take this ratio, an estimated total of the y's divided by an estimated number of units, that's just an estimator of the mean per unit in poststratum gamma. When I inflate that mean by cap N sub gamma, the population count, I get an estimate of the total for poststratum gamma, and when I sum across all the poststrata, I get an estimate of the grand total for the whole population. N hat sub gamma, defined down here, is just the sum of the input weights within poststratum gamma; it looks just like T hat y sub gamma, except without the y. So remember, any time you sum the weights for a subset of units, the way we've got the weights scaled, that is an estimate of the total count of units in whatever group you're summing over, in this case a poststratum. So that's the form of the estimator. A poststratified estimator also has an implied weight, which is a useful thing. It's defined here: I take my input weight and adjust it by what's called the poststratification ratio, the population control total for poststratum gamma divided by my estimate of the count in poststratum gamma. You can see that if I've got undercoverage, the estimate based on my sample will tend to be less than the population control, so this ratio will be bigger than 1.
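Written out, the quantities just described are (my transcription of the notation narrated above):

```latex
\hat{T}_{\mathrm{ps}} \;=\; \sum_{\gamma=1}^{G} N_\gamma \, \frac{\hat{T}_{y\gamma}}{\hat{N}_\gamma},
\qquad
\hat{T}_{y\gamma} \;=\; \sum_{i \in s_\gamma} d_i \, y_i,
\qquad
\hat{N}_\gamma \;=\; \sum_{i \in s_\gamma} d_i,
```

where $s_\gamma$ is the set of sample units in poststratum $\gamma$ and $d_i$ is the input weight. The implied weight for unit $i$ in poststratum $\gamma$ is then $w_i = d_i \, N_\gamma / \hat{N}_\gamma$.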
I'll inflate my input weight some, and I'll make up for that undercoverage. It could work the other way too: if you've got overcoverage, it will tend to deflate the input weights. So intuitively, it's going in the right direction to correct coverage errors. And why do we call these weighting classes poststrata? It's because we apply them after the sample is selected, and in fact, after the data are collected. If we had used them to design the sample, they wouldn't be poststrata, they would just be regular design strata. Another thing to note about poststratification is that you're not limited to poststrata based on one variable like age. You can actually create your own composite or interaction variable, say age group by gender, and call the crossed groups the poststrata. So it's a fairly flexible technique in the sense that you can get as complicated as you want with the definition of the poststrata. You will be limited by your sample sizes, though: you don't want to create poststrata that have only a few units in them. A good minimum might be at least 30 units per poststratum, but people's tastes do differ on that. So in the next video, we'll look at how to actually do this using some R software.
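As a concrete illustration of the weight adjustment just described, here is a minimal Python sketch (the course itself uses R; the function and variable names here are mine, not from the lecture). It computes the implied poststratified weight, input weight times the poststratification ratio, for each unit:

```python
def poststratify(d, strata, controls):
    """Return calibrated weights w_i = d_i * (N_gamma / Nhat_gamma).

    d        : list of input weights d_i
    strata   : poststratum label for each unit
    controls : dict mapping poststratum label to population count N_gamma
    """
    # Nhat_gamma: estimated count in each poststratum = sum of input weights there
    nhat = {}
    for di, g in zip(d, strata):
        nhat[g] = nhat.get(g, 0.0) + di
    # Scale each unit's weight by the poststratification ratio N_gamma / Nhat_gamma
    return [di * controls[g] / nhat[g] for di, g in zip(d, strata)]

# Toy example: all input weights set to 1 (as for a non-probability sample),
# with undercoverage in both poststrata relative to the controls.
d = [1.0, 1.0, 1.0, 1.0]
strata = ["A", "A", "B", "B"]
controls = {"A": 6.0, "B": 4.0}   # known population counts from an external source
w = poststratify(d, strata, controls)
# Stratum "A" weights are inflated by 6/2 = 3, stratum "B" by 4/2 = 2,
# so the calibrated weights sum to the controls within each poststratum.
```

Note how the adjustment behaves exactly as described: where the sample underrepresents a poststratum, the ratio exceeds 1 and the weights are inflated to match the population control.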