[SOUND] In this segment I want to talk about how its a really valuable thing
that an ML we don't have mutation for most forms of data the ones you create a
topple or a list or a variable there is no way to change the contents of that
data. So this is going to feel like a little
bit of a strange segment, but it's actually a big idea in the course,
because I'm not going to teach you anything new.
I've already shown you all the features you need for homework one.
Instead, we're going to focus on a non-feature; something that ML doesn't
have. So how can the lack of something in a
programming language be important for that language?
Naively, you might think, well you should always give programmers more and more
stuff they can use, and then they can decide whether to use it or not.
But when you have a lack of a feature in your language, that when you're writing
code, you know that no one using your code will use that thing.
Because it can't. Right.
And that will make it easier for you to write your code correctly and to
understand the results that your code produces.
So, in fact, one of the major things that makes functional programming functional
programming, is that when you create some data, like a pair or a list, there is no
way to subsequently change the contents of that data.
You have to make a new piece of data that has some different values in it, alright.
So let me show you why this can be a valuable thing.
Let's start with a somewhat simple example.
I'm showing you A function that does this before.
This is a sort pair function. So it takes an int star int, and it
returns an int star int. If you call it with three comma four, you
get back three comma four, but if you call it with four comma three, you get
back three comma four, because it always sorts the two things in the pair.
So here are two versions of the function. And in the first version, when the first
component of the pair is less than the second component of the pair, we just
return the pair, because that's already a correct answer.
Whereas in the second version of the code, we make essentially a copy of the
pair. We return a new pair that has the first
of pair in the first part and the second part of pair in the second part.
So suppose that you had, say the second version, and you wanted to maintain it,
edit your code, evolve your code to be the first version instead because that
seems simpler or more efficient or whatever.
Is it possible that some client, some user of your function would break because
of that change you made? And in ML the answer is no, the two
versions of these functions cannot be distinguished by any code in the language
that uses those functions. You can argue the first is better styled
but you can't argue that they make any difference to users of the code.
But if your language allows you to mutate, to update the contents of pairs,
this isn't the case. Suppose that we bound to x the pair 3
comma 4, and then we bound to y the result of sorting that pair.
Now, there's two possibilities. We, if we don't know how sort pair is
implemented. Maybe, as you see in this picture over on
the right. Y refers to that same pair that was
passed to x. So x and y are now what are usually
called aliases in most programming languages.
Or the second possibility is that x and y are not aliases.
That y points to some different pair, 3, 4.
In ML, it doesn't matter. But if there were some way in ML to
mutate, say hash one of x to change it so instead of holding three, it holds five.
Now we have a very difficult question. Does that change from three to five
affect what y refers to, or doesn't it? It depends now whether x and y are
aliases. If you don't have mutation then you can't
tell if things are aliases or copies. This makes it easier to implement sort
pair. It makes it easier to use sort pair and
reason about the results. And it even, can even make it easier to
implement languages like ML efficiently. So let me show you a more interesting
example where I'm going to go from pairs now, to lists.
And let's use one of my favorite functions in the whole course, list
append. This was this elegant recursive function,
that takes two lists, x's and y's, and returns a new list, that is all the
contents of x's, appended to the contents of y's.
Alright, and now we might ask ourselves, if I have the list 2 comma 4, and the
list five through zero and I append them. What if any a listing do I have.
Do I have a situation like here in this top picture where x holds the list 2, 4,
y holds the list 5 3, 0, and z does hold the list 2, 4, 5, 3, 0, but part of that
list alias is y, or does this append function in fact make an entirely new
list for z? Well once again, clients can't tell so
you can implement a pendent in either way and it will behave the same way in your
program even though this bottom version takes up a little more space it turns out
the code up here implements the top picture.
Because when x's is empty, when just return y's.
We don't copy y's, we just return what would become an alias to y's.
So here we're actually saving space. And the languages where you can update
the elements in the list this is usually a really bad idea.
Because if someone comes along and does some notation on this list z, it can end
up affecting the list y. Which may very well not be what you want.
So once again. The lack of notation is what's helping us
not have to worry about all this. All right.
So, in M L, it turns out that we create aliases all of the time, and we don't
even think about it. And that's okay, because you can never
tell if two things are aliases, or two copies of identical values.
Are they pointing to the same pair three comma four, or two copies of three comma
four? And in fact, the tail operation for lists
is probably a great example. It's a very fast operation because it
just returns an alias to the tail of the list that was passed as it's argument.
Ml programs would be much less efficient if the tail function made a copy of the
entire list minus the first element. So when you're writing functional
programs, you don't worry about these aliases, you just focus on your algorithm
because you know there's no mutation. In languages that have mutable data,
which is pretty much every nonfunctional language, for example Java, programmers
are absolutely obsessed with the identity of objects.
Am I making a copy here? Am I creating an alias?
Are these two things reference equal or do they just answer true to the equals
method. And I'm not picking on them for being
obsessed. In fact, I think they have to be
obsessed, because any time you have aliases, the assignment statement affects
all of them. And if you don't have aliases it affects
only one of them and you have to understand this to understand how your
program behaves and whether your code is correct.
So in the next segment I'm actually going to show a rather tricky example of this
in Java. Since we don't require Java for the
course it will be totally optional but it will show you just how hard it is to
reason about aliases and assignment statements.
And in ML the way we avoid having to do that is we just get rid of the assignment
statements.