Sorting is a fundamental algorithm, so let's see how can we do sorting in parallel. We're going to implement a parallel version of merge sort. Merge sort recursively sorts two halves of an array, and we are going to do this thing in parallel. And then we need to merge the two sorted arrays into one larger sorted array. In order to do that we're going to use two arrays, one of which is going to be a temporary array. And we are going to be copying elements between the temporary and the original array. We will implement parMergeSort as the parMergeSort method which takes an array and a maximum depth that determines how much parallelization we will use and it's going to have the side effect of making the given array sorted. We will start by allocating an intermediate array. We will call this array ys. Whereas the original array was xs. And in order to use this auxilliary storage efficiently we will alternate between using xs versus YS as the array whenever we do the work. The shape of parallel merge sort is similar to the shape of many other divide and conquer parallel algorithms that we have seen. In the base case, we are just going to invoke the sequential sorting algorithm. In this case, this will be a quick sort. Previously we have to go this base case when the segment length was sufficiently small. In this case, this is controlled using this depth parameter and we will stop paralyzation when the depth tree the maximum depth. In this case we are making use of the fact that there is no point to do for the parallelization once we have saturated the parallel resources that we have on a given machine. The interesting part is when we have not reached yet the base case, we are going to then split the array in middle, computing the middle point. And then we are going to recursively sort the first half of the array and the second half of the array. We are of course updating the depth as we proceed to smaller array segments. These two sorts are taking place in parallel. Each of the sub routines calls to sort their recurring locations is going to ensure that the appropriate segment of DRA is sorted. So, we will have two, these joined segments of the array, each of which contains assorted list of some elements. Now, we will just like to merge these two segments into one larger segment that is still sorted. We will do this merge using the call to merge. What merge needs to do Is to copy the elements from the original segments into a new one. For that, we will use the auxiliary storage. Whether xs or ys is auxiliary storage is determined by whether the current difference between maxDepth and depth is even or odd. The flip Boolean variable determines whether this is the case. According to the value of flip, source and destination will either be ys and xs or the other way around. So this is the definition of sort that takes the segments. Beginning and the end. In order to sort an array, we simply invoke this version of sort, starting from 0 and going to the entire length of the array. Merging of the array can be done sequentially, this is a function that takes one array. Which is the source and then destination and it's going to take two sorted intervals in the source array and these intervals are given with the points the beginning, the middle point, and the end point. And it's going to write the resulting merged array into the destination. The version of merge that we are using in this implementation is sequential, so you can do it as you would do in usual merge sort. It is nonetheless an interesting exercise to think how one would implement merge in parallel. A simple operation is to copy one array, given by source in this function, into the target. How could we parallelize this? Let's assume that the beginning of this segment that we wish to copy is given by the from, and the end by until. And that we are willing to do parallization with the depth. In the base case, when the depth is equal to maximum depth, we are simply going to invoke a sequential array copy function. Otherwise, we are going to divide our current array segment into two sub-segments. And then we are going to invoke recursively our own function. By copying the two smaller segments, from middle to until, and from beginning until middle. At each step, we are increasing depth by one, so that at some point we will reach the maximum depth. We can now compare the performance of parMergeSort and the performance of the sequential quicksort. Here, I have the implementation of parallel MergeSort in the Scala prompt as well as code that is going to use Scala meter to compare the sequential and parallel execution. Let's run the benchmarking and see what the relative performance of these two versions is. We can see that we have obtained over two fold speed up for the array sizes that we have considered on this machine. So we have seen the main part of merge sort, how to recursively sort the two halves of the array in parallel. Then we have seen the merging part, which is also the fundamental component of the algorithm. For that we have used a temporary array. And finally, we have seen a simple parallel copy operation on the right that allows us to copy the temporary array back into the original array. Together this gives us our parMergeSort operation and we can see that it can, in fact, achieve speed-ups in practice.