In this part of the lecture we will prove our main lemma on the precision guarantees of the final version of our approximate point query primitive. The lemma is shown on the slide. It says that if the number of hash buckets that we hash into is chosen to be sufficiently large, and if the number of repetitions is logarithmic in the length of the stream, then our approximate point query primitive gives an approximation to the frequency count of every data item in the universe, at every position in the stream, that is correct up to an additive epsilon times f_k term, where f_k is the frequency of the k-th most frequent item in the stream. The failure probability in the statement will be inverse polynomial in the length of the stream, and because of that, instead of proving the lemma for every position p between 1 and N in the stream, it in fact suffices to prove it for the last position in the stream: after that we can apply the lemma to every prefix of the original stream and recover the lemma from the previous slide.

Okay, so this is the statement that we would like to prove. Let us start by observing that for every row r of our sketch, that is, for each of the t hash functions h_r that we chose, the analysis of the basic version of our approximate point query primitive shows that every single one of the estimators that we are taking the median of has the right mean. That is, for our data item i, which we consider fixed from now on, and for every r between 1 and t, the expectation of the r-th estimator is exactly the frequency of the i-th item. Now this is very good, it is reassuring, but it doesn't tell us that this estimator is actually better than the basic version of the estimator that we had before. The distinguishing feature of our new estimator is its reduced variance. Specifically, by the analysis of our basic estimator applied to every single one of the little t estimators that we are taking the median of, we have that for every i and every fixed r from 1 to t, the variance of the r-th estimator of the frequency f_i is equal to the following quantity: the summation, over all other data items j different from i that hash into the same bucket as i under the r-th hash function, of the frequency f_j squared. Note that the summation goes only over those j's in the data universe that hash to the same bucket as i.

Now, our hash functions are hashing the data universe into b buckets, so we should expect the right-hand side to be about a factor of 1/b smaller in expectation than in our basic analysis, and this is where our gains will come from. Okay, so we would like to show that the variance is small, and that it reduces by at least roughly a factor of b; in fact, we will be able to do a bit better than that. Specifically, we consider the variance of our estimator, and we separately consider the contribution of the head elements of the stream to this variance and the contribution of the tail elements. So we write the sum, over all data items j different from i that hash to the same bucket as i under the r-th hash function (and we think of r as fixed from now on), as two separate summations. The first one is a summation, over data items j in the head of the stream (think of the heavy hitters) that are different from i and again hash into the same bucket, of f_j squared. Recall that the head of the stream is exactly the top k most frequent elements in the stream.
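To make the unbiasedness claim and the variance formula above concrete, here is a minimal simulation sketch in Python. It assumes a CountSketch-style per-row estimator with random signs, which is consistent with the estimator being exactly unbiased and with the stated variance formula; the function names, the toy frequency table, and the use of fully random hashing (rather than pairwise independent hashing) are all choices made only for this illustration.

```python
import random

# A minimal simulation of one row of the (assumed) CountSketch-style estimator.
# Names like make_row and row_estimate and the toy frequencies are hypothetical.

def make_row(universe, b, seed):
    """Pick one hash function h_r and one sign function sigma_r at random."""
    rng = random.Random(seed)
    h = {j: rng.randrange(b) for j in universe}          # h_r: universe -> {0, ..., b-1}
    sigma = {j: rng.choice([-1, 1]) for j in universe}   # sigma_r: universe -> {-1, +1}
    return h, sigma

def row_estimate(freqs, h, sigma, i):
    """The r-th estimator of f_i: sigma_r(i) times the counter of bucket h_r(i)."""
    bucket = h[i]
    return sigma[i] * sum(sigma[j] * fj for j, fj in freqs.items() if h[j] == bucket)

# Toy frequencies f_j, number of buckets b, and the query item i.
freqs = {"a": 50, "b": 30, "c": 20, "d": 5, "e": 5, "f": 1}
universe = list(freqs)
b, i = 4, "a"

# Over many independent rows, the empirical mean should be f_i (unbiasedness),
# and the empirical variance should be close to sum_{j != i} f_j^2 / b, which is
# the expectation over the hash choice of the collision sum from the lecture.
trials = 20000
estimates = []
for seed in range(trials):
    h, sigma = make_row(universe, b, seed)
    estimates.append(row_estimate(freqs, h, sigma, i))

mean = sum(estimates) / trials
var = sum((x - mean) ** 2 for x in estimates) / trials
print(f"f_i = {freqs[i]}, empirical mean = {mean:.2f}, empirical variance = {var:.1f}")
print(f"sum of f_j^2 over j != i, divided by b = "
      f"{sum(fj ** 2 for j, fj in freqs.items() if j != i) / b:.1f}")
```

Running this, the empirical mean should land close to f_i = 50, and the empirical variance close to the collision sum divided by b, which is the factor-of-b saving the lecture refers to.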
The second term is a summation over the tail of the stream, that is, over the less frequent elements that occur in the stream, of f_j squared, and this is again only over those j that hash to the same bucket as i under the r-th hash function.

Now, our plan is as follows. We would like to bound these two terms. What we will do is the following: for each fixed hash function index r and each data item i, both of which we consider fixed from now on, we define three events. The first event is an event that we will call NO-COLLISION(r, i). This event occurs if and only if our data item i does not collide, under the r-th hash function, with any of the top k most frequent elements in the stream, that is, with the head of the stream. The second event is a small-variance event, again indexed by the hash function index r and by i itself. This event happens if and only if i does not collide with too many of the tail items under the r-th hash function. Note that the first event, if it occurs, implies that the first summation above is just 0: there are no items in the head that i collides with. And the second event will ensure that the second summation is fairly small. After that we will also need a third event, which we denote SMALL-DEVIATION(r, i), and this is essentially the success event from our basic analysis, obtained from the concentration inequality used there. What we will show in the next few minutes is that for every r and every data item i, all three of these events on the slide hold simultaneously with probability strictly better than one half. After that we will show that this implies that if we take a logarithmic number of independent repetitions, the median estimate will give us a good estimate with high probability.
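As a rough illustration of how these three events might be checked and of the median-of-repetitions step, here is a small self-contained Python sketch. The CountSketch-style row with random signs, the top-k head/tail split, and the concrete thresholds used in the SMALL-VARIANCE and SMALL-DEVIATION checks are assumptions made only for this example; the lecture names the events but does not fix these constants here.

```python
import random
import statistics

# Illustration of the three per-row events and of the median over t repetitions.
# The thresholds below (the factors 3 and 2) are hypothetical stand-ins for the
# constants that a full analysis would pin down.

def make_row(universe, b, rng):
    h = {j: rng.randrange(b) for j in universe}          # h_r
    sigma = {j: rng.choice([-1, 1]) for j in universe}   # sigma_r
    return h, sigma

def row_estimate(freqs, h, sigma, i):
    bucket = h[i]
    return sigma[i] * sum(sigma[j] * fj for j, fj in freqs.items() if h[j] == bucket)

freqs = {"a": 100, "b": 60, "c": 40, "d": 8, "e": 6, "f": 5, "g": 4, "x": 3}
universe = list(freqs)
k = 3                                                    # head = top-k items
head = set(sorted(freqs, key=freqs.get, reverse=True)[:k])
tail = [j for j in universe if j not in head]
tail_l2sq = sum(freqs[j] ** 2 for j in tail)
b, i = 8, "d"                                            # buckets, query item
rng = random.Random(0)

def all_events_hold(h, sigma, i):
    # NO-COLLISION(r, i): i shares its bucket with no head item.
    no_collision = all(h[j] != h[i] for j in head if j != i)
    # SMALL-VARIANCE(r, i): the colliding tail mass is at most 3x its mean
    # value tail_l2sq / b (the factor 3 is a hypothetical threshold).
    colliding = sum(freqs[j] ** 2 for j in tail if j != i and h[j] == h[i])
    small_variance = colliding <= 3 * tail_l2sq / b
    # SMALL-DEVIATION(r, i): the r-th estimate is within a (hypothetical)
    # concentration bound of the true frequency f_i.
    deviation = abs(row_estimate(freqs, h, sigma, i) - freqs[i])
    small_deviation = deviation <= 2 * (tail_l2sq / b) ** 0.5
    return no_collision and small_variance and small_deviation

t = 10                                                   # number of repetitions
rows = [make_row(universe, b, rng) for _ in range(t)]
good = sum(all_events_hold(h, sigma, i) for h, sigma in rows)
estimates = [row_estimate(freqs, h, sigma, i) for h, sigma in rows]
print(f"rows where all three events hold: {good} out of {t}")
print(f"true f_i = {freqs[i]}, median of the {t} estimates = {statistics.median(estimates)}")
```

For this toy input the printout shows how many of the t rows satisfy all three events and how close the median of the per-row estimates is to the true frequency, which is the shape of the argument the lecture completes in the next few minutes.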