So, we now come to To P or Not to P. This was in fact the title I gave to my PhD thesis. So if you're really interested in this topic, and/or if you suffer from insomnia and have difficulty sleeping, you can just do a quick search for "To P or Not to P" — I'm James Abdey — and download the thesis. And if you are suffering from insomnia, start reading it and I'll give you about 30 seconds; I'm sure you'll be out like a light. But nonetheless, To P or Not to P is all about p-values. These are very useful tools, useful instruments if you will, that allow us to make the binary decision of whether or not to reject a null hypothesis.

From the previous section, we've taken it that a type one error is more problematic than a type two error, such that we seek to impose a threshold on the probability of committing a type one error, and we do this using the significance level, alpha. In any statistical hypothesis test in our so-called classical world of hypothesis testing, we calculate something called a p-value. Now, the p of p-value stands for probability, which simply means that all p-values are probabilities, i.e. values which lie between zero and one. Remember, our significance level alpha was the probability of a type one error, so it too is a probability, some value between zero and one. And we've said a 5% significance level, 0.05 in decimal form, would be our first choice for a significance level.

So we are now in a position to have a very simple binary decision rule, whereby in any hypothesis test we calculate a p-value and compare it to our benchmark significance level, alpha. Take the unit interval from zero to one and partition it into two parts: the region from zero up to alpha, say from zero to 0.05, and the remaining portion of the interval. We simply see where our p-value lies on this unit interval. If the p-value lies below the significance level, we have what we call a statistically significant result, leading to the decision to reject the null hypothesis. Conversely, if the p-value is greater than alpha, we opt not to reject that null hypothesis. So it doesn't matter what kind of statistical test we wish to perform: provided we have a p-value, the process of interpreting it never changes.

But perhaps you're still wondering: what does a p-value actually represent? Here I'm going to give you a fairly informal definition, or interpretation, of a p-value. Think of it as a simple measure of how consistent the evidence, which in our statistical world is the observed sample data we've obtained, is with our null hypothesis. If we deem the data sufficiently inconsistent with the null hypothesis, then we would be inclined to reject that null hypothesis. Remember the jury: with all of that forensic evidence, fingerprints on the murder weapon, the evidence seems to go against the null hypothesis of the defendant being not guilty, and hence the jury would typically return a verdict of guilty. So in our p-value world of testing, the p-value is a measure of just how consistent the data are with our null hypothesis, such that we would wish to reject the null hypothesis if this probability is sufficiently small. Well, how small is small? Small with respect to our chosen significance level, alpha.
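Just to make that decision rule concrete, here is a minimal sketch in Python; the function and the two example p-values fed into it are purely illustrative assumptions, not part of any particular statistical package.

```python
# Minimal sketch of the binary decision rule: partition the unit
# interval at alpha and see which side of it the p-value falls on.
ALPHA = 0.05  # our chosen significance level

def decide(p_value, alpha=ALPHA):
    """Reject H0 if and only if the p-value lies below alpha."""
    if p_value < alpha:
        return "statistically significant: reject H0"
    return "not significant: do not reject H0"

print(decide(0.03))  # below 0.05 -> reject H0
print(decide(0.40))  # above 0.05 -> do not reject H0
```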
So small p-values are those which lie below that significance level alpha. If we take a 5% significance level as our default position, then p-values below 0.05 suggest the data are sufficiently inconsistent with the null hypothesis that we would wish to reject it.

So how are these p-values calculated? In practice, we make use of computer software, various statistical or econometric packages, and depending on what kind of hypothesis we wish to test, there will be an appropriate statistical test to facilitate this. Now, the good thing about these computer packages is that they crunch the numbers and, in the output which is returned, they typically report a p-value. As far as this introductory look at probability and statistics is concerned, we're not going to delve too deeply into how one calculates these p-values, but a very important skill is knowing how to interpret them. Keeping in mind the simple schematic of the unit interval between zero and one, p-values below your significance level indicate an extreme departure from what we would expect under the null hypothesis, and lead indeed to rejection of H0. So the p-value is a very useful tool for us within our decision theory, or hypothesis testing, framework.

Now, I would just like to conclude this section with a discussion of a couple of key influences on the magnitude of the p-value. To assist us with this, I'd like us to consider a very simple example of tossing a coin, a coin for which we do not know whether or not it is fair. We want to get away now from our not guilty/guilty legal pair of hypotheses to what we are more likely to come across in a statistical form of testing, whereby we seek to test the value of a parameter. So think of tossing a coin as having a binary set of outcomes, heads and tails, and assign pi as our parameter for the probability of success, which let's suppose here equates to getting heads. Of course, we covered this when we introduced the Bernoulli distribution and extended that discussion into the binomial distribution back in the second week of the course.

So let's suppose we suspect that our coin may be biased. We set as our null hypothesis that the probability of success pi is equal to 0.5, which would be indicative of a fair coin, against the alternative hypothesis that the coin is not fair, i.e. it is biased, where the probability of success pi does not equal 0.5. That could mean it is greater than, or indeed less than, 0.5. We then conduct an experiment and see how consistent its outcome is with this null hypothesis. This brings us to the two influences which affect the magnitude of the p-value. Of course this is critical, because it is precisely the magnitude of the p-value which determines whether or not we reject the null hypothesis: does the p-value lie above or below alpha?
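Before looking at those influences, here is a quick sketch of how such a test runs in software, assuming the Python package scipy is available; its binomtest function performs an exact binomial test, and the observed count of 60 heads is purely a hypothetical outcome for illustration.

```python
# Two-sided exact binomial test of H0: pi = 0.5 against H1: pi != 0.5
from scipy.stats import binomtest

heads_observed = 60  # hypothetical experimental outcome
n_tosses = 100       # total number of coin tosses

result = binomtest(k=heads_observed, n=n_tosses, p=0.5,
                   alternative="two-sided")
print(result.pvalue)         # the package crunches the numbers and reports p
print(result.pvalue < 0.05)  # compare against alpha for the binary decision
```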
Let's consider first of all the so-called effect size influence. Imagine we toss this coin 100 times. If the null hypothesis were true, conditioning on it just as the jury has to condition on, and assume, the defendant being not guilty at the start of the trial, how many heads would you expect to observe under this null hypothesis where the probability of heads pi is equal to 0.5? Well, trivially, if it's a fair coin we would expect heads and tails to be equally likely to occur, so in 100 random tosses of the coin we would expect 50 heads and, of course, 50 tails.

Now, imagine we did actually observe that experimental outcome. Of course, this would not prove we have a fair coin. However, we could say that the evidence we've obtained is as consistent as we could ever hope to get in support of that null hypothesis of a fair coin, and in this case our p-value would be equal to one. Remember, on that unit interval, one is very much the upper bound on what is permitted for a p-value, and as far above any significance level as we could ever get. So this does not prove the null hypothesis is true, does not prove it's a fair coin, but we would have no statistical reason to justify rejecting that hypothesis.

Now imagine that when we toss the coin 100 times we don't get 50 heads and 50 tails, but some other outcome. Remember, your expectation is 50 heads and 50 tails. Taking 50 as our benchmark, as the number of heads you observe deviates further and further from 50, whether closer to zero or closer to 100, you are observing evidence which is increasingly inconsistent with the null hypothesis. So as what you observe deviates further from your expectation, what happens to the p-value? Ignoring the technicalities here, we will just note the direction: the p-value becomes smaller as what we observe deviates more from our expectation.

Now, there is clearly going to come some critical point whereby the p-value, having fallen from that maximum value of one as our evidence gets further away from our expectation, passes through that 0.05 threshold, and hence we have a significant result such that we would be inclined to reject the null hypothesis. Note, though, that this doesn't prove we are right when we reject the null hypothesis and say it's a biased coin. Remember, this could just be a type one error, the incorrect rejection of a true null hypothesis. But nonetheless, this is the effect size influence on the p-value: as the effect size, the departure between what we observe and what we expect to observe, becomes larger, the p-value becomes smaller and hence more likely to be significant, as the sketch below illustrates.
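A minimal sketch of the effect size influence, again assuming scipy, with the observed counts chosen purely for illustration:

```python
# Fix n = 100 tosses and let the observed number of heads drift away
# from the 50 we expect under H0: the p-value shrinks toward zero.
from scipy.stats import binomtest

for heads in (50, 55, 60, 65, 70):
    p = binomtest(k=heads, n=100, p=0.5, alternative="two-sided").pvalue
    print(f"{heads} heads out of 100 -> p-value = {p:.4f}")

# At 50 heads the p-value is exactly 1; as the effect size grows it
# falls, and somewhere along the way it crosses the 0.05 threshold,
# tipping us into rejecting H0.
```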
So we'll just round off with the second key influence on the magnitude of the p-value, and this is the sample size influence. As the sample size increases and you get more and more observations, then, other things equal, that will tend to make your p-value smaller, and hence, as the p-value becomes smaller and smaller, more likely of course to fall below that threshold significance level of alpha.

So now we're going to go in reverse: we fix the effect size, let's say at 40% heads, and consider what happens as we vary the sample size; specifically, how does the p-value vary? I'll consider two somewhat extreme cases: a small sample size versus a large sample size. Suppose we toss this coin a total of ten times and observe 40% heads; that would equate to four heads and six tails. Now, if you observed that evidence based on that small number of tosses of the coin, you are not instantly going to dismiss this being a fair coin, because for a fair coin to give four heads and six tails does not seem that unlikely an event. And not being that unlikely simply equates to a very large p-value, and hence one nowhere near the rejection region.

However, if we were to increase the sample size from 10 to, let's say, 400, but maintain that same fixed proportion of 40% heads, then we would have 160 heads and 240 tails. So it's the same effect size, but now based on a much larger sample size. And the influence of the sample size is that as n gets bigger, other things equal, the p-value becomes smaller, such that we now enter the rejection region and hence decide that we don't have a fair coin, concluding that it is a biased coin.

So the effect size influence and the sample size influence are the two key determinants of the magnitude of the p-value. Now, appreciate that there are many different types of statistical tests out there. But nonetheless, these key concepts of effect size and sample size influences will typically determine the magnitude of the p-value, and hence whether you deem yourself to have a statistically significant result or not; a final sketch of the sample size influence follows below.
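A closing sketch of the sample size influence, holding the observed proportion fixed at 40% heads (scipy assumed once more):

```python
# Same observed proportion of heads (40%), two very different sample sizes.
from scipy.stats import binomtest

small = binomtest(k=4, n=10, p=0.5, alternative="two-sided").pvalue
large = binomtest(k=160, n=400, p=0.5, alternative="two-sided").pvalue

print(f"4 heads in 10 tosses    -> p-value = {small:.4f}")  # well above 0.05: do not reject H0
print(f"160 heads in 400 tosses -> p-value = {large:.6f}")  # far below 0.05: reject H0
```

With only ten tosses, 40% heads sits comfortably within chance variation for a fair coin; with four hundred tosses, the very same proportion becomes overwhelming evidence against fairness.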