That means that elements 502 to 1,000 have zero chance of being selected.

So do, by the way, elements 2, 3, 4 through 10, since we chose the first element to start.

So do elements 12, 13, 14 through 20.

There's obviously a problem here with doing systematic sampling by

always starting with the first element and then taking every tenth.

We don't spread our sample out across the entire list, and

if there's something different about the transactions

in the first half of the list compared to the second, we've missed it.
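A minimal Python sketch of the flawed procedure, assuming a list of 1,000 elements numbered 1 to 1,000, a required sample of 50, and a fixed start at element 1 with an interval of 10 (the function name `fixed_start_every_tenth` is hypothetical). It shows that the selections stop well short of the end of the list, and because nothing is random, element 1 is chosen every time the procedure runs.

```python
def fixed_start_every_tenth(population_size: int, sample_size: int) -> list[int]:
    """Flawed systematic selection: always start at element 1, take every 10th."""
    interval = 10  # a convenient interval, not scaled to the list size
    return [1 + i * interval for i in range(sample_size)]


selected = fixed_start_every_tenth(1000, 50)
chosen = set(selected)
never_selected = [e for e in range(1, 1001) if e not in chosen]

print("first selections:", selected[:3])
print("highest element ever selected:", max(selected))
print("elements with zero chance of selection:", len(never_selected))
```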

So we need to spread our sample out over the whole list.

We're going to need to vary the count, the interval, to account for the size of the list.

In other words, we have to scale it to the size of the list.

And we should also vary where the selection starts.

There's no randomization in this if I always start with the first one.

That poor first transaction's always going to be in all of my samples.

I once drew a sample of students from the population registry in

a university registrar's office, and they had been doing this all along.

They had always been sampling by starting with the first case.

The programmer had found an algorithm in a cookbook,

a collection of algorithms for random sampling.

It was actually systematic sampling, and it always started with the first case.

I pitied that poor student who was first on the list, because they were

in every sample for as long as they stayed first on the list.

So we're going to vary that;

we're going to do two things to modify this procedure.

What we're going to do is not take every 10th, but every 20th.

If there are 1,000 elements in the list and we need our sample of 50 spread across the whole

list, we take 1,000 divided by 50 to figure out the interval.

Not just a convenient interval like 10, but

an interval that fits the size of the list.

So we still count through the list, but

our counting interval may vary depending on the size of the list.

In addition, we won't always start with the first element. If we're going to take

every 20th element, what we need to do is

vary the selection by starting at a random place among the first 20.

That way, we choose one of the first 20 at random, and

we keep adding 20 to it to get our sample selections.

When we're done,

we will have a sample of exactly 50 before we run off the end of the list,

because we scaled the interval to the population size.
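As a quick check on that claim, the Python sketch below assumes the lecture's numbers (a list of 1,000 and a sample of 50), computes the interval as 1,000 divided by 50, and confirms that every possible start among the first 20 yields exactly 50 selections without running off the end of the list. The helper name `selections_from_start` is hypothetical.

```python
population_size = 1000
sample_size = 50
interval = population_size // sample_size  # 1,000 / 50 = 20


def selections_from_start(start: int, interval: int, population_size: int) -> list[int]:
    """Start at `start` and keep adding the interval until we pass the end of the list."""
    return list(range(start, population_size + 1, interval))


# Sample size obtained for each possible start 1, 2, ..., 20.
sizes = {start: len(selections_from_start(start, interval, population_size))
         for start in range(1, interval + 1)}
print(interval, set(sizes.values()))
```

Because the interval is the list size divided by the sample size, the last selection from start r is r + 49 × 20 = r + 980, which never exceeds 1,000 for r between 1 and 20.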

So back to our list then.

In our transaction list,

we randomly choose to start with the fourth one.

We've looked up a random number,

or generated one with our software, and we start with that

random selection: we take the 4th, then we add the interval to get the 24th,

add the interval again to get the 44th, and so on.
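The walk-through above can be reproduced in a few lines of Python. This sketch fixes the start at 4 to match the example; in practice the start would come from a random number generator, for instance Python's `random.randint(1, 20)`, which includes both endpoints.

```python
interval = 20
start = 4  # the random start from the example; e.g. random.randint(1, interval) in practice

# Take the start, then keep adding the interval until we pass element 1,000.
selected = list(range(start, 1000 + 1, interval))
print(selected[:3], selected[-1], len(selected))
```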

And so we've got a very even division of the population distribution, shown on

the lower left-hand side.

We get a very even spacing of our sample selections, and we get our required

sample size.

And we start at random.

There's a random element to this.
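One way to see what the random start buys us: enumerate all 20 equally likely starts and count how often each element is selected. The sketch below, assuming the same list of 1,000 and interval of 20, finds that every element appears in exactly one of the 20 possible samples, so each has a 1-in-20 chance of selection, which equals 50 out of 1,000. Compare that with the fixed-start version, where some elements had no chance at all.

```python
from collections import Counter
from fractions import Fraction

population_size, sample_size = 1000, 50
interval = population_size // sample_size  # 20

# Each of the 20 possible starts is equally likely (probability 1/20).
times_selected = Counter()
for start in range(1, interval + 1):
    times_selected.update(range(start, population_size + 1, interval))

# An element's selection probability is (number of samples containing it) / 20.
probabilities = {e: Fraction(times_selected[e], interval)
                 for e in range(1, population_size + 1)}
print(set(probabilities.values()))
```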

So we've adapted our selection process to the size of the sample and

the size of the list.

To make it more formal, we've calculated an interval.

Let's call it k: it equals the population size divided

by the sample size.

In this case, that's 1,000 divided by 50, or 20.

And we choose the random start anywhere from 1 up to k.
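Putting the two modifications together, here is a minimal Python sketch of the procedure for any list of records: interval k equal to the population size divided by the sample size, and a random start from 1 up to k. It assumes, as in the 1,000 / 50 example, that the population size is an exact multiple of the sample size; the function name `systematic_sample` and the transaction labels are hypothetical.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def systematic_sample(population: Sequence[T], sample_size: int,
                      rng: Optional[random.Random] = None) -> List[T]:
    """Systematic sample with interval k = N / n and a random start in 1..k.

    Assumes N is an exact multiple of n, as in the 1,000 / 50 example.
    """
    n_pop = len(population)
    if not 0 < sample_size <= n_pop or n_pop % sample_size != 0:
        raise ValueError("this sketch assumes N is a positive multiple of n")
    k = n_pop // sample_size        # the sampling interval
    if rng is None:
        rng = random.Random()
    start = rng.randint(1, k)       # random start, 1-based, k inclusive
    # Positions start, start + k, start + 2k, ...; subtract 1 for 0-based indexing.
    return [population[pos - 1] for pos in range(start, n_pop + 1, k)]


transactions = [f"txn-{i:04d}" for i in range(1, 1001)]  # hypothetical list of 1,000 records
sample = systematic_sample(transactions, 50, random.Random(2024))
print(len(sample), sample[:3])
```

Passing a seeded `random.Random` makes a draw reproducible for auditing; leaving it out gives a fresh random start each time.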