0:00

An important problem in programing is decomposition.

Let's say you have a hierarchy of classes and you want to build tree-like data

structures from the instances of these classes.

How do you build such a tree? And how do you find out what kind of

elements are in the tree? And how do you access the data that's

stored in this elements? That's the problem of decomposition, which

we are going to tackle in this session. Lets look at this with an example.

Suppose you want to write a small interpreter for arithmetic expressions.

To keep it simple, lets restrict ourselves for the moment to just numbers and

additions as expressions. Such expressions can be represented as a

class hierarchy. You'd have a base straight Expr for

expressions and two sub classes, Number and Sum.

Now if you want to explore a tree of nodes consisting of Numbers and Sum, you would

like to know what kind of tree is it. Is it a Number?

Is it a Sum? And what components has it?

To be able to do that, it brings us to the following implementation.

1:10

So here you see, I have my class hierarchy, I have an expression and then I

have two subclasses Number and Sum. Now for a complete functionality of

expression, if I look at an expression I would like to know is it a Number or is it

a Sum and that's done by the first two methods. Here is Number and is Sum.

Now, if it's a Number, then I would like to know the value of that Number.

That would be a number, let's call it n. And if it's a sum, then I would know, want

to know what are it's operands and that would be a leftOp and a rightOp field.

2:21

And we would have the three accessor methods for either Numbers or Sums, which

we've numValue, which, this one applies only to Numbers.

And leftOp, rightOp,, these ones do apply to sums.

So to implement these five methods, what do we do in Number?

Well, isNumber obviously is true, isSum obviously is false. The numeric value of a

number is just the number we pass into it. And the left operand and right operand,

they are operations that are not applicable to Numbers. So both of them

would throw an error that says that say well you have called a leftOp or a rightOp

method on number and that is illegal. So let's look at Sums next.

So the idea would be that Sum of e1, e2 would correspond to the expression that is

e1 plus e2. Its five implementations are as follows.

A Sum is clearly not a Number. A Sum is a Sum.

The numeric number of a Sum is something that's not applicable, that would throw an

error. The left operand is the first argument

that we pass into Sum. The right operation is the second argument

we pass into Sum. So now that we've got the basic wiring of

expressions, let's do something with it. One thing we could do is write an

evaluation function. So the evaluation function should take one

of these expression trees and it should return the number that it represents.

So for instance, I would like to have that eval of Sum of the Number one and the

Number two, Should give me three. So how would I write an evaluation

function like that? Well, one way to do that would be to

simply ask, given an expression, what it is?.

So, we ask this question, is it a number. If yes, then we can return the numeric

value of that expression. Otherwise, if the expression is a Sum then

we take, it's both two operands, the left operand and the right operand and we

evaluate both of them using eval. And finally, you have to guard against the

case that you might have another expression that is neither a Number or a

Sum. Maybe not now, but maybe in the future,

somebody will add such a subclass of expression.

So it's prudent to have a third class which says well, if it's neither a Number

nor a Sum, Then we throw an error which says I found

an unknown expression and here it is. .

Okay. So far so good.

But there's a problem with that and that is that writing all these classification

and access of functions quickly becomes tedious.

We've already written fifteen method definitions only to do something as simple

as expressions consisting of Sums and Numbers.

And things get worse if we add other forms of expressions.

So let's demonstrate that. Let's say we want to add to our expression

trees two new classes. One that represents the product of two

expressions, e1 and e2, So e1 times e2 would be a class product

given arguments e1 and e2. and the other new class would be Var that represents

variables. So variables would take a string that

represents the name. So if you want it to continue with our

scheme of classification and access methods, then we'd need to add new methods

to, of course, those two new classes but also to all the classes defined above.

6:25

So, a question to you. If we wanted to do that for Product and

Var, how many new method definitions would we need?

Here I want you to count the method definitions in Product and Var themselves.

But also any new Accessor and classification methods that we have to

retrofit to the existing classes, to Expression and Sum and Num.

Possible answers would be nine, or ten, nineteen, or maybe 25,

Or even 35, or 40. So to answer that question, let's look

back at our existing solution. What we want to do now is add two new

subclasses of Expression, one for Product, the other for Vars.

And Var would have probably a name and Products would have essentially also

leftOp and rightOp just like Sums. So what do we need as new methods.

Well, we would need new classification method that tells us whether a given

expression is a Product or a Var. So there would be some new methods called

that isVar and isProdu that we would have to add to the trait expression and also to

its subclasses. Now, what about the access methods?

Here we would need, in addition to the Num value, and leftOp and rightOp, also an

operation that gives us the name of a variable.

7:57

Would we need access and methods for leftOp and rightOp and product?

Well, you might think so, but in fact, it's probably better to just re-use the

Accessor methods of Sum, to just say both products and sum.

They're binary operations so they share the accesses for the left and right and

side operance. So that means no new access or methods for

products, just a classification method. And we're left with three methods in,

trait Expr. So that means trait Expr now has eight

methods and all subclasses of trait Expr, also needs to implement these methods.

So that would mean eight methods for class number and eight methods for class Sum.

8:42

And then, of course, we have the two new classes, Product and Var, which also have

to implement all of the eight methods now, isNumber, isSum, isVar, isProd, numValue,

leftOp, rightOp, and name. So that means, overall, we have five classes,

Each of them implementing eight methods for a total of 40 methods.

Our previous solution had three classes, each of them implementing five methods for

a total of fifteen methods. So the answer to the question was, we have

25 new methods. So generally, if we continue with that

game to extend the hierachy with new classes we find out that the number of

methods we need tends to grow quadratically, and that's, of course, very

bad news, Because it means that our, program size,

and ability to understand programs, will quickly be exhausted, as we add new

classes to the hierarchy. So the question is what to do about it.

Can we find another solution that avoids the quadratic increase of methods?

Well, here's one which I would actually call a non solution.

Most languages have some form of type testing and type casting, and Scala let's

you do that as well. So it has a method'is instance of' that

can check whether an object's type conforms to T and it has a method called

isInstanceOf that check whether an object as type conforms type T, so that would be

a type cast. If the cast fails at one time because the

object is not at run time, then it will throw a class cast exception.

So x is instance of T is really the same thing as x instance of T in Java.

And it's asInstanceOf is the same thing as a type cast, which is written like this in

Java. .

The Scala form of these type tests and type casts is as methods on the type class

any. And the methods would take the type that

they want to test or cast against as a type parameter.

If you compare the Scala solution and the Java solution then you see that Scala

solution is quite a bit longer, and you could argue less legible than the Java

solution, and that's completely intentional,

Because, actually it turns out that typetest and typecasts are very low level

and unsafe and that, in fact, Scala has much better solutions.

So, the use of type test and type cast, [INAUDIBLE in Scala because of sometimes

useful for interoperability with Java, but the use is discouraged.

11:35

So using type tests and type class, we could do something like that for eval..

We could say, okay, we have a class expression. and now we n-, don't need to

put anything special inside expression. And you would simply ask: is the

expression an instance of my number class? Then we cast it to number in that case and

we pull out the num value from the number. On the other hand, if the expression is an

instance of sum then we take the left operand after casting it to sum, take the

right operand after casting it to sum and evaluate both operands, finally sum the

results. And the third part would be as before, if

it's neither, then we throw an error. So if we look at the solution, then the

good part of it is that we don't need any classification methods.

These instance of tests, fulfill that role now,

And we need access methods only for classes where the value is defined.

So that means that our base,, trait ede could actually be empty.

Whereas number needs a num value. Because and some.

Needs a left up and a right up. But number doesn't need a leftOp and a

rightOp,, because we will call leftOp and rightOp only after casting to Sum.

So, we need many fewer methods, which is good.

On the other hand, use of typetest and typecast is very low level, it is unsafe

because, when you do a type-cast you do not know necessarily at runtime whether

type-cast will succeed. It might also throw a classcast exception.

In this example, here we have actually guarded every type-cast by a type-test, so

we can assure statistically that none of these casts will fail, but in general

that's not assured. So that's why, we recommend the people to

stay away from type-casts. So here's a solution that looks much

better. If, after all, all we want to do is

evaluate expressions, then we could have a direct object oriented solution for that.

Instead of making eval a method which exists outside our hierarchy, we just

write it as a method of expression itself. So now, what we need to do is that, in the

base trait, we would have an expression eval which returns int, and is abstract,

as usual. Then, for a number, the evaluation of a

number would just give the number that's, well, that, well, passed into the class.

That is the parameter. And to evaluate a Sum, we simply evaluate

its left operand, evaluate its right operand, and form the sum of those two

values. Great.

So this is clearly much preferable to what we did so far,

But what happens if we'd like to display expressions now.

So lets assume we want to have here another method, call it show, that returns

a string. At ths place the expression.

14:49

Then of course we'd need to have an implementation of show in Number and

another one in Sum. That's all very well, but there's one

disadvantage here that we needed to touch all the classes in the hierarchy and add a

method in each one of them. So we have to touch many pieces of code,

which in a real system could be, for instance, in different source files.

So any such change that adds a new message to our hierarchy is rather pervasive.

But there's a more fundamental limitation of this object-oriented decomposition

scheme that we've seen. To see that, let's assume that we don't

want to evaluate an expression nor do we want to show it, but we want to simplify

it, let's say using this rule here. So we want to replace sums of products

with the same left operand with a product of sums using the usual rule of

distribution. How would we go about doing that?

Well, the problem here is we can't really have a simplify method in either product

or sum. Because the simplification involves a

whole subtree, not a single node. So it can't be encapsulated as a method of

a single object without also looking at other objects.

Like we could may be put simplify in the sum operation,

But then it would have to look at its two operands and verify that, indeed, they are

both products and that the left operand of each product is the same tree.

Well, Doing that, we are actually back at

classification and access methods, so we're back to square one.

We need tests and access methods for all the different subclasses.

So that shows that object oriented decomposition is good for some things like

implementing the eval function, but it's can't do other things such as a non local

simplification, and it might not be the best solution if you have many new methods

that you want to introduce because you have to touch all subclasses every time

you introduce a new method. In the next session, we're going to see

another technique that will address these problems.