>> In this lecture, we will study the conc data type, which is a parallel counterpart to the functional cons list and is used to manipulate data in parallel. This will reveal a data structure with an efficient concatenation method. Let's quickly recall the cons list data type from functional programming. Cons lists are represented with the abstract data type List of T, which has two concrete implementations: Cons, which contains an element and a pointer to the tail of the list, and Nil, which represents the empty list. We usually express operations on the list data type in terms of recursion. Let's implement the filter operation on lists. The filter operation takes a list and a predicate function. It starts by matching the list object against the possible patterns. If the list is non-empty, that is, composed of an element x and a tail xs, and the predicate holds for x, then we recursively filter the tail and prepend the element x. Otherwise, if the predicate does not hold, we skip x and recursively filter the tail. If the list is empty, there is nothing to filter, so we just return Nil.

Due to their recursive structure, lists are built for sequential computations: they have to be traversed going from left to right. Trees have a different recursive structure. Since there is more than one choice for recursion, different subtrees can be traversed in parallel. Let's take a look at one possible data type for trees. We will call this abstract data type Tree of T. The Tree data type will have three concrete implementations. Node of T will represent internal nodes, which contain only two references, to the left and the right subtree, but no elements. Instead, elements will be contained in the Leaf data type. If the tree is completely empty and has no elements, it will be represented with the Empty data type. Note that although this data structure is a tree, it is still used to store element sequences and not sets; we should not confuse it with a search tree. Here is one example of such a tree, a sequence containing the elements one, three, and five. The elements themselves are contained in the leaves, and the leaves are bound together with Node objects.

>> Let's repeat the previous task of implementing filter, this time using the tree data type. The new filter method is once more recursive. If the tree object is a Node, then the two subtrees, left and right, are filtered in parallel. Recursion stops when we reach a Leaf, at which point we either return that leaf or exclude it by returning an empty tree. Note that in practice we would have to insert some additional logic to decide the threshold below which the recursive calls are no longer made in parallel, but we leave that part out for clarity. But is this filter method really what we want? Assume we have the tree shown on the left. Reading the leaves from left to right, the elements in this tree are 1, 3, 3, 4, 5 and 6. Let's now assume that we want to retrieve only the odd integers, so we use the filter method with the following predicate.

>> After filter completes, we obtain the following tree. The trouble is that our tree now contains some empty trees as well, making some of the branches longer than they should be. We could resolve this problem by redefining the Node constructor to eliminate the empty nodes while the tree is created. If we do that, we end up with the tree on the right, which no longer has empty nodes. Unfortunately, this tree is not suitable for parallel computations. It looks more like a list than a tree.
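Before moving on, here is a minimal Scala sketch of the list filter and the parallel tree filter described in this segment. It is an illustrative outline rather than the lecture's exact code: the enclosing object name is an assumption, and the parallel construct from earlier lectures is replaced by a sequential placeholder so that the snippet is self-contained.

    object TreeFilterSketch {
      // Sequential stand-in for the course's parallel construct, which would
      // evaluate both arguments in parallel.
      def parallel[A, B](a: => A, b: => B): (A, B) = (a, b)

      // Filter on cons lists, expressed with recursion and pattern matching.
      def filter[T](list: List[T])(p: T => Boolean): List[T] = list match {
        case x :: xs if p(x) => x :: filter(xs)(p)  // keep x, filter the tail
        case _ :: xs         => filter(xs)(p)       // skip x, filter the tail
        case Nil             => Nil                 // nothing left to filter
      }

      // A tree that stores a sequence of elements (not a search tree).
      sealed trait Tree[+T]
      case class Node[T](left: Tree[T], right: Tree[T]) extends Tree[T]
      case class Leaf[T](elem: T) extends Tree[T]
      case object Empty extends Tree[Nothing]

      // Filter on trees: the two subtrees of a Node are filtered in parallel.
      // Leaves that fail the predicate become Empty, which is exactly how the
      // dangling empty subtrees discussed above arise.
      def filter[T](tree: Tree[T])(p: T => Boolean): Tree[T] = tree match {
        case Node(left, right) =>
          val (l, r) = parallel(filter(left)(p), filter(right)(p))
          Node(l, r)
        case Leaf(elem) => if (p(elem)) tree else Empty
        case Empty      => Empty
      }
    }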
Next time we start a parallel operation on such a tree, the workloads will not be balanced between the processors.

>> Trees are only good for parallelism if they are balanced. To ensure balancing, additional information must be stored in each node. Let's take a look at another data type called conc, used to represent balanced trees. In addition to the left and the right subtree, this abstract data type will define the level and the size of each subtree. The level will be equal to the length of the longest path from the root to a leaf, and the size to the number of elements in the subtree. The level is, in other words, just the height of the tree. This conc data type corresponds to the so-called conc-list, originally introduced by the Fortress language. The conc-list is a neat data abstraction for parallel programming and can have any number of implementations. In what follows, we will study one particularly concise conc-list implementation.

The conc data type has the following concrete subclasses. Again, the Empty class represents empty trees; its level is zero and its total number of elements is zero. The Single class represents leaves containing a single element; its level is again zero and its total number of elements is one. Lastly, the conc class represents the inner nodes of the tree. We will draw the conc class with a small diamond, denoting the choice of going either left or right. Here, the level is equal to one plus the level of the higher subtree, and the total number of elements is equal to the sum of the numbers of elements in the two subtrees.

>> In addition, we will define two invariants on these data types. First, to guard against sparse trees with too many empty subtrees, we disallow inner nodes from pointing to empty trees altogether. The following tree will not be allowed. Second, to ensure that the trees remain balanced, the level difference between the left and the right subtree of an inner node must always be one or less. So, by this invariant, the following tree is not allowed. Let's take a look at the levels of its subtrees. The level of this subtree is zero.

>> The level of this subtree is equal to one.

>> And finally, the level of this subtree is equal to two. We can now see that the second invariant is broken at the root of the tree, because the difference in levels is greater than one. However, this last tree is allowed: the height difference between any two sibling subtrees is one or less. Thanks to these invariants, a tree with n elements will always have a height that is bounded by log n. As a consequence, all conc-tree operations will have the same logarithmic bound. We will use these invariants when implementing concatenation. In the spirit of the cons operator used to build functional lists, we will overload the conc constructor with a method of the same name. The conc method will be a member method on the conc data type, and it will take another conc tree as an argument. This method will start by ensuring the first invariant: it will eliminate the empty trees, and then delegate the real work to another method called concat. This concat method cannot always directly link two trees together by creating a conc node above them, as that could violate the height invariant.

>> Instead, the concat method should reorganize the tree into something like this. This tree contains the same elements, but the nodes have been arranged slightly differently.
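Before looking at concatenation in detail, here is a small, standalone Scala sketch of the conc data type just described. The class names follow the lecture, but the exact definitions are illustrative, and the two invariants are only noted in comments rather than enforced by the types. The conc method and concat themselves are sketched after the concatenation walkthrough below.

    // A conc-tree storing a sequence of elements.
    // Invariants (maintained by the operations, not by the types):
    //  1. an inner node never points to an Empty tree;
    //  2. the levels of the two subtrees of an inner node differ by at most one.
    sealed trait Conc[+T] {
      def level: Int      // height: length of the longest path from the root to a leaf
      def size: Int       // number of elements stored in the tree
      def left: Conc[T]
      def right: Conc[T]
    }

    // An empty conc-tree: level zero, no elements.
    case object Empty extends Conc[Nothing] {
      def level = 0
      def size = 0
      def left = this
      def right = this
    }

    // A leaf holding a single element: level zero, exactly one element.
    class Single[T](val x: T) extends Conc[T] {
      def level = 0
      def size = 1
      def left = Empty
      def right = Empty
    }

    // An inner node, drawn as a small diamond in the lecture: its level is one
    // more than the level of the higher subtree, and its size is the sum of the
    // sizes of its two subtrees.
    case class <>[T](left: Conc[T], right: Conc[T]) extends Conc[T] {
      val level = 1 + math.max(left.level, right.level)
      val size = left.size + right.size
    }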
>> So the concat method must proceed by cases. First, it checks whether the height difference between the two trees is one or less. This case is shown in the following figures, where we link two trees with levels two and one, or, more generally, two trees whose heights differ by at most one. The code for this case is simple: we compute the difference in levels between the two trees, and if the absolute value of the difference is less than or equal to one, we simply link the two trees together. Note that the expression new conc just creates one node, that is, it links the trees. This is different from calling the conc method on two trees, which would also take care of rebalancing.

Otherwise, if the height difference is not one or less, then, without loss of generality, we assume that the left tree is higher than the right tree. If the left tree has some height d, then the right tree must have a height e which is less than or equal to d minus 2. We cannot link these trees directly, so we will first have to break xs into subtrees, recursively concatenate the rightmost subtree with ys, and then relink all the trees together. How exactly we relink those trees depends on the configuration of the left tree. In the first case, we assume that the left tree is left-leaning, that is, its left subtree is deeper than its right subtree. In this case we can recursively concatenate the right subtree with ys, knowing that the resulting tree will increase in height by at most one, and then relink the two trees.

>> This is shown in the following three lines of the concat method. First, we concatenate the right subtree, and then we link the remaining trees together. So the recursive concat call and the instantiation of the new node correspond to the following. Now, let's consider the second case: the left tree xs is right-leaning. This time, we also decompose the right subtree of xs. The heights of the children of the right subtree can be d minus 2 and d minus 2, d minus 3 and d minus 2, or d minus 2 and d minus 3. In any of these cases, we recursively concatenate the rightmost child with ys.

>> We are then left with three trees that we need to relink. If the recursive concatenation gives us a tree with height d minus 1, then we need to relink the two left trees before linking everything together. Otherwise, we first relink the two right trees, which results in a tree with a height of at most d minus 1, and we then relink everything together.

>> The code for case two is as shown: it concatenates the rightmost subtree with the right tree and then relinks the trees in a smart way.

>> We will not explain every line in detail, but we invite you to study it more carefully if you're interested.
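For reference, here is one way the concat method could be written in Scala, assuming the Conc, Empty, Single and <> definitions from the earlier sketch are in scope. The first branch handles trees whose heights differ by at most one, the second branch handles a higher left tree with its two sub-cases, and the final branch is the mirror image for a higher right tree. The normalizing conc method is written here as a standalone helper for brevity, although the lecture describes it as a member method on the conc data type; treat all of this as a sketch rather than the exact lecture code.

    // Concatenate two non-empty conc-trees while preserving the balance invariant.
    def concat[T](xs: Conc[T], ys: Conc[T]): Conc[T] = {
      val diff = ys.level - xs.level
      if (diff >= -1 && diff <= 1) new <>(xs, ys)   // heights comparable: just link them
      else if (diff < -1) {
        // xs is at least two levels higher than ys
        if (xs.left.level >= xs.right.level) {
          // case 1: xs is left-leaning, so concatenate its right subtree with ys
          val nr = concat(xs.right, ys)
          new <>(xs.left, nr)
        } else {
          // case 2: xs is right-leaning, so recurse into its rightmost subtree
          val nrr = concat(xs.right.right, ys)
          if (nrr.level == xs.level - 1) {
            // the result is one level below xs: link the two left trees first
            val nl = new <>(xs.left, xs.right.left)
            new <>(nl, nrr)
          } else {
            // the result is lower than that: link the two right trees first
            val nr = new <>(xs.right.left, nrr)
            new <>(xs.left, nr)
          }
        }
      } else {
        // ys is at least two levels higher than xs: the mirror image of the above
        if (ys.right.level >= ys.left.level) {
          val nl = concat(xs, ys.left)
          new <>(nl, ys.right)
        } else {
          val nll = concat(xs, ys.left.left)
          if (nll.level == ys.level - 1) {
            val nr = new <>(ys.left.right, ys.right)
            new <>(nll, nr)
          } else {
            val nl = new <>(nll, ys.left.right)
            new <>(nl, ys.right)
          }
        }
      }
    }

    // The conc method described earlier: eliminate empty trees (first invariant)
    // and then delegate the balancing work to concat.
    def conc[T](xs: Conc[T], ys: Conc[T]): Conc[T] =
      if (xs == Empty) ys
      else if (ys == Empty) xs
      else concat(xs, ys)

For instance, in a Scala worksheet one could build a conc-tree out of a thousand Single leaves by repeated concatenation and observe that its level stays logarithmic in its size, instead of growing to 999 as it would for a list-like chain:

    val xs = (1 to 1000).map(i => new Single(i): Conc[Int]).reduceLeft((a, b) => conc(a, b))
    println(xs.size)    // 1000
    println(xs.level)   // on the order of log2(1000), roughly 10 to 14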
So, having seen how tree concatenation works, here is a quiz for you. What is the asymptotic complexity of the conc method? Here are your options. Is it possible that the running time is logarithmic in the number of elements in the larger tree? Alternatively, is the number of computational steps related to the height difference between the trees? Another possibility is that the running time is linear in the number of elements in the larger tree. And finally, could the running time be constant?

>> So let's see. Since the concat method sometimes recursively calls itself to descend deeper into the tree, the running time is almost certainly not constant; we can eliminate that option. Then, since the concat method links and relinks the trees, but never actually descends into the left subtree of xs, it never spends time processing those elements either. Therefore, the running time cannot be linear in the number of elements. Let's examine the remaining two cases. Recursion in the concat method ends as soon as the heights of the two trees become comparable. That means that there will be on the order of h1 minus h2 recursive calls, where h1 and h2 are the heights of the two trees. For example, concatenating a tree of height 5 with a tree of height 2 takes at most 3 recursive calls before the heights become comparable. Each recursive call does a constant amount of work, so the total running time must be O of h1 minus h2. The bound of log n is therefore too pessimistic. So if you answered O of h1 minus h2, you answered correctly.

Don't worry if you feel that you didn't understand all the details of concatenation right away. The most important thing to remember is the conc data type, and how it is used as a data abstraction in functional parallel programs. In the next lecture, we will see how the core of this data structure can be used to implement more efficient combiners.