If you know the probability distribution of a random variable, you can calculate the probability that this variable falls within a certain range. In this video, I'll explain how that works, using a normally distributed random variable as a concrete example.
A probability density function, often abbreviated as pdf, specifies the probability per unit of the random variable. Here is an example of a pdf of the daily waiting time of taxi drivers of the Mokum Taxi Company. On the y-axis you see the probability per hour, and on the x-axis the waiting time in hours. So if you are an MTC taxi driver and you'd like to know the probability of spending more than seven hours a day waiting, you would need to calculate this surface area. On the basis of this graph you can roughly estimate the area. With a cumulative probability function you can do the same, but more accurately, by reading the probability for the relevant values from the y-axis. So you'd read the y-value corresponding to an x-value of seven hours. Next, you would subtract this probability from one, because you're interested in the complementary probability of waiting longer than seven hours, not shorter.
Let's now apply this to a distribution for which we actually know the equation, the normal distribution. Its pdf has this shape, with the center placed at mu and the width defined by sigma. Its corresponding cumulative probability function looks as follows. Interestingly, while the curve changes with any change in these two parameters, mu and sigma, the probability for an interval expressed as a distance in units of sigma around the center is always the same. Let me illustrate this. Here you have a curve with mean 20 and standard deviation 9. And here's a curve with mean 30 and standard deviation 6. For both pdfs, the area between the mean minus one standard deviation and the mean plus one standard deviation is shown. And in both cases the surface area under the pdf is 0.68.
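The comparison of the two curves can be reproduced numerically. The following is a minimal sketch, assuming Python 3.8 or later and its standard-library `statistics.NormalDist` class; the helper name `prob_within` and the variable names are made up for this illustration and do not come from the video. It computes the area between the mean minus one sigma and the mean plus one sigma as a difference of two cumulative probabilities.

```python
from statistics import NormalDist


def prob_within(dist, k):
    """Probability that a normal variable lies within k standard deviations of its mean.

    Hypothetical helper for this illustration: the area under the pdf between
    two x-values equals the difference of the cumulative probabilities there.
    """
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)


# The two example curves from the video.
curve_a = NormalDist(mu=20, sigma=9)
curve_b = NormalDist(mu=30, sigma=6)

area_a = prob_within(curve_a, 1)
area_b = prob_within(curve_b, 1)

# Both areas round to 0.68, even though the curves differ in position and width.
print(round(area_a, 2), round(area_b, 2))
```

Calling `prob_within` with `k` set to 2 or 3 gives the areas for the wider intervals in the same way.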
The area within one standard deviation of the mean is 0.68 for any normal distribution, regardless of the values of mu and sigma. Now suppose you take an interval of not one but two times sigma around the mean. The probability for that interval turns out to be 0.95. When taking three times sigma, it turns out to be 0.997. These probability values for intervals of one, two, and three sigma around the mean of a normally distributed variable are often used in statistical calculations.
Let me illustrate the one, two, and three sigma rules further with an exercise. Assume that the time you spend traveling on a weekday is given by this normal distribution, with a mean of 40 minutes and a standard deviation of 10 minutes. What will then be the range of travel times for 95% of your weekdays? Right. You know that 95% of the cases should lie in the interval from the mean minus two times sigma to the mean plus two times sigma. In this case, that's from 40 minus 20 up to 40 plus 20, which is 20 to 60 minutes.
We can also turn the question around. Let's assume you'd like to know the probability of traveling more than 50 minutes. Can you calculate it, knowing that the average traveling time is 40 minutes, that the standard deviation is 10 minutes, and the one sigma rule? To answer this question, a bit of creativity is required. You know that a normal distribution is symmetric, so half of the probability is located on each side of the mean. Therefore the probability for the interval between the mean and the mean plus one standard deviation is half of 0.68, which is 0.34. So the probability of traveling less than 50 minutes is 0.5 plus 0.34, which is 0.84. But you would like to know the complement, the probability of traveling more than 50 minutes. This is one minus 0.84, which is 0.16.
Let me summarize what I have explained in this video.
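Before the summary, the two travel-time answers can also be checked numerically. This is a minimal sketch, assuming Python 3.8 or later and its standard-library `statistics.NormalDist` class; all variable names are chosen for this illustration only. It compares the rule-of-thumb results with the values obtained from the cumulative probability function.

```python
from statistics import NormalDist

# Travel time on a weekday: mean 40 minutes, standard deviation 10 minutes.
travel = NormalDist(mu=40, sigma=10)

# Two sigma rule: interval from the mean minus 2 sigma to the mean plus 2 sigma.
low = travel.mean - 2 * travel.stdev
high = travel.mean + 2 * travel.stdev
coverage = travel.cdf(high) - travel.cdf(low)

# One sigma rule plus symmetry: P(X > 50) = 1 - (0.5 + 0.68 / 2).
p_over_50_rule = 1 - (0.5 + 0.68 / 2)

# Same probability taken directly from the cumulative probability function.
p_over_50_exact = 1 - travel.cdf(50)

print(low, high, round(coverage, 4))
print(round(p_over_50_rule, 2), round(p_over_50_exact, 4))
```

The interval comes out as 20 to 60 minutes, and the two values for traveling more than 50 minutes agree to two decimals (0.16). The coverage of the two sigma interval is slightly above the rounded 0.95 used in the rule.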
On the basis of a probability density function, you can calculate the probability that the random variable falls within a given range by estimating the area under the curve for that range. With the cumulative probability function you can do the same, but more accurately, by reading the probability for the relevant values from the y-axis. For a normally distributed variable there is a fixed relation between the interval around the mean