Now that we have defined

the major factors which operate on the genetics of populations,

we can even give a definition of a genetic population.

I would like to move on to a somewhat more mathematical and specific example.

But before we can put mathematics in practice,

we really need to come up with a concept,

a very abstract concept, of a genetic population

with which we can operate in mathematical terms.

So what am I going to assume about this hypothetical population,

which will be the subject of our mathematical investigation?

We are going to first assume that it's very large,

in fact infinitely large.

And this is a very important assumption which will allow us to

operate in terms of probability theory.

And then secondly, we are going to assume that generations are not overlapping.

So, what is actually going to happen is that one generation is going

to give rise to a pool of gametes,

and from this pool of gametes,

pairs of gametes are going to be sampled to define the next generation.

And you can think of the first generation as dying out or something like that;

the point is that the generations are not overlapping.

The mating happens only within one generation.

And our third set of assumptions is about segregation and aggregation of alleles.

We will assume random and independent segregation of

the alleles, and also random aggregation of the gametes in forming zygotes.

And this assumption is equivalent to the assumption of

panmixia and that Mendel's rules hold.
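
To make these assumptions a bit more tangible, here is a minimal Python sketch of one round of the scheme: the parents form a gametic pool, and pairs of gametes are drawn from it at random. Everything in it is an assumption made for illustration: the function name `next_generation`, a hypothetical locus with two alleles labeled A and B, and a finite number of offspring standing in for the infinitely large population.

```python
import random
from collections import Counter

def next_generation(parents, n_offspring, rng):
    """Build a gametic pool from `parents` and sample random pairs of gametes."""
    # Each parent puts both of its alleles into the pool (genotypes are
    # two-letter strings such as "AA" or "AB").
    pool = [allele for genotype in parents for allele in genotype]
    offspring = []
    for _ in range(n_offspring):
        # Sampling with replacement approximates an infinitely large pool.
        pair = (rng.choice(pool), rng.choice(pool))
        # Sorting means "AB" and "BA" are recorded as the same genotype.
        offspring.append("".join(sorted(pair)))
    return offspring

rng = random.Random(0)  # fixed seed so the run is repeatable
parents = ["AA"] * 600 + ["BB"] * 400  # hypothetical parental generation
offspring = next_generation(parents, 10_000, rng)
counts = Counter(offspring)
freqs = {g: counts[g] / len(offspring) for g in ("AA", "AB", "BB")}
print(freqs)
```

The parents are replaced entirely by the sampled offspring, which is what non-overlapping generations means here.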

Now, if you think of such an ideal population,

what kind of expectations would there be for the fate of an allele?

So, let's consider a specific system which is made of two alleles, A and B.

And consequently, in the population we could observe three genotypes,

AA, AB and BB,

two homozygous and one heterozygous.

Now, what happens to the frequency of alleles, and what are

the expectations for the genotypic distribution under the model we have just formulated?

First, from the parental generation,

we need to form a gametic pool.

So, each individual will dump two alleles into this pool.

And then we can easily work out the frequency of A alleles in this gametic pool

as the probability of the AA homozygote

in the parental generation plus half of the probability of the heterozygote.

This follows from the very simple fact that an individual of

a homozygous genotype generates only gametes of one specific type,

while a heterozygous individual generates 50% A gametes and 50% B gametes.

So, our frequency of the A allele in the gametic pool,

let's denote it as p,

is the probability of AA in the parental generation plus half the probability of the heterozygote.

And we can figure out the frequency of the alternative allele B in the same manner,

or actually we can obtain it as one minus the frequency of allele A,

so one minus p.

Let's call the frequency of this allele q.
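
As a small illustration of this step, here is a Python sketch that turns parental genotype probabilities into the allele frequencies p and q. The function name `allele_frequencies` and the example numbers are made up for illustration; only the rule itself (AA plus half of AB) comes from the reasoning above.

```python
def allele_frequencies(p_AA, p_AB, p_BB):
    """Return (p, q): frequencies of alleles A and B in the gametic pool."""
    total = p_AA + p_AB + p_BB
    if abs(total - 1.0) > 1e-9:
        raise ValueError("genotype probabilities must sum to 1")
    # Homozygotes AA contribute only A gametes; heterozygotes contribute
    # A gametes half of the time.
    p = p_AA + 0.5 * p_AB
    # With only two alleles, the rest of the pool is B.
    q = 1.0 - p
    return p, q

# Hypothetical parental generation: 50% AA, 20% AB, 30% BB.
p, q = allele_frequencies(0.5, 0.2, 0.3)
print(p, q)
```

Computing q directly as the probability of BB plus half the probability of AB gives the same value, which is a quick sanity check.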

Now, if you assume that the pairing of gametes to form the next generation is at random,

then it's very easy to compute what the

probabilities of the different genotypes in the next generation are going to be.

Well, the probability of sampling two gametes at random and finding that

both gametes are A is simply p squared.

And then for the other homozygote it's q squared, and for the heterozygote it is twice pq.

These proportions are known as Hardy-Weinberg equilibrium proportions.
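
The proportions just described can be sketched in a few lines of Python. The function name `hardy_weinberg` and the value p = 0.6 are assumptions chosen for illustration.

```python
def hardy_weinberg(p):
    """Expected genotype proportions for a two-allele locus with A at frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    q = 1.0 - p
    return {
        "AA": p * p,      # both sampled gametes are A
        "AB": 2 * p * q,  # A then B, or B then A
        "BB": q * q,      # both sampled gametes are B
    }

proportions = hardy_weinberg(0.6)
print(proportions)
```

The factor of two for the heterozygote comes from the two orders in which an A gamete and a B gamete can be drawn, and the three proportions always add up to one because (p + q) squared is one.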

Now, I want you to stop for a moment and think,

this is a highly abstract model we have just considered.

We have this infinitely large population,

we have no overlap between generations.

We have random sampling,

we have a gametic pool.

This is very far from reality if you think of

any natural population, like a population of humans. The generations do

overlap, there is no such thing as a gametic pool, and we pick our partners ourselves.

Despite this, it is very interesting that as long as a population is panmictic,

you can go there, sample a random locus,

and normally you will see that it is in Hardy-Weinberg equilibrium.

So, there is an incredible power in

this simplistic model which does

describe real things we observe in panmictic populations,

like populations of humans or random-bred cats.

Now, we have just considered a system which consists of two alleles.

I would like you to consider a slightly different question.

Let's think of a system which is made of some arbitrary number, say N, of alleles.

And then I would like you to think about two questions.

First, out of N alleles,

how many genotypes could be made?

And secondly, what would be the Hardy-Weinberg expectations for such a system?

One of the possible ways to approach this problem would be in a graphical manner.

So what we can do is draw a matrix where the N alleles we consider

are laid out along both the column and the row dimension.

And then the cells within this matrix will

represent specific genotypes, each made of two of these alleles.

It immediately follows that the number of possible homozygotes, which sit on the diagonal, is simply the number of

alleles, N. And when we want to consider how many possible heterozygotes there are,

they are represented by the off-diagonal elements of the matrix.

Now, mind that we are not distinguishing AB heterozygotes from BA heterozygotes;

at the moment we don't quite care whether A came from

the mother and B came from the father or the other way around.

So in that sense, the off-diagonal elements

are symmetric and equivalent.

And then if we want to count the number of distinct off-diagonal elements,

we can do it in this way.

The total number of elements in this matrix is N squared,

and N of them are homozygotes.

So if we subtract N and divide the result by two,

this will give us the number of possible heterozygotes.

So at the end, if we add up the number of homozygotes and heterozygotes,

we come up with the product of

the number of alleles and the number of alleles plus one, divided by two, that is, N(N + 1)/2.
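
The counting argument can be checked by brute force. Here is a minimal Python sketch that enumerates unordered pairs of alleles and compares the counts with the formulas from the matrix picture; the function name `count_genotypes` and the use of integers as allele labels are assumptions made for illustration.

```python
from itertools import combinations_with_replacement

def count_genotypes(n_alleles):
    """Return (homozygotes, heterozygotes) for a locus with n_alleles alleles."""
    alleles = range(n_alleles)
    # Unordered pairs with repetition: AB and BA count as one genotype.
    genotypes = list(combinations_with_replacement(alleles, 2))
    homozygotes = sum(1 for a, b in genotypes if a == b)
    heterozygotes = len(genotypes) - homozygotes
    return homozygotes, heterozygotes

for n in range(1, 6):
    hom, het = count_genotypes(n)
    # Diagonal: n cells. Off-diagonal: (n*n - n) cells, halved by symmetry.
    assert hom == n
    assert het == (n * n - n) // 2
    assert hom + het == n * (n + 1) // 2
    print(n, hom, het, hom + het)
```

For two alleles this gives two homozygotes and one heterozygote, matching AA, BB and AB from the biallelic example.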

And then thinking of Hardy-Weinberg equilibrium,

we can follow exactly the same logic as we did in the biallelic system:

the probability of a homozygote is simply the square of the frequency of

the respective allele, and the probability of a heterozygote

is twice the product of the frequencies of the respective alleles.
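
The same rule can be written as a short Python sketch for any number of alleles. The function name `hardy_weinberg_multi`, the allele labels and the example frequencies are hypothetical, chosen only to illustrate the rule.

```python
from itertools import combinations_with_replacement

def hardy_weinberg_multi(freqs):
    """Expected genotype proportions given a dict of allele frequencies."""
    if abs(sum(freqs.values()) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    expected = {}
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        if a == b:
            # Homozygote: the square of the allele frequency.
            expected[(a, b)] = freqs[a] ** 2
        else:
            # Heterozygote: twice the product of the two frequencies.
            expected[(a, b)] = 2 * freqs[a] * freqs[b]
    return expected

# Hypothetical three-allele locus.
expected = hardy_weinberg_multi({"A": 0.5, "B": 0.3, "C": 0.2})
for genotype, proportion in expected.items():
    print(genotype, round(proportion, 4))
```

With three alleles the dictionary has six entries, in agreement with N(N + 1)/2, and the proportions again sum to one.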