0:00

In this lecture, I just want to get everyone on board with writing

functions, because functions play a critical

role in any R programming and you

tend to write a lot of them when you're writing doing a

lot of data analysis or doing a lot of kind of statistical analysis.

And so I just want to make sure that everyone can kind of get started writing

functions and and particularly for those who

are less familiar with programming languages in general.

So this is just going to be about writing your first function.

It's kind of like the hello world so to speak of R.

So the first thing you're going to want to do is you

going to want to write the function in a text file, all right.

It's possible to write functions on the command

line in R, but it usually no preferrable.

So usually you're going to want to put your functions, in a separate file,

separate from any interactive stuff that you're doing in the command line.

In the future you'll want to put your functions in

an R package, which is a kind of a more

structured type of kind of, kind of environment with

documentation and everything, but we won't talk about that now.

Right now the first thing you're going to want to

do is put your functions in a text file.

Okay, so the first thing we're going to want to do is open up our studio.

So lets do that.

1:11

And so you can see here in R Studio there's some there's

some stuff going on here from a previous project that I'm working on.

So you, that may happen to you, and generally you

can either close it or you can just ignore it.

I wanted to create a new R script here, so

let's create a clean script here to put our code into.

1:30

So the first function I'm going to write is really simple

it's just going to take two numbers and add them together.

So this function obviously doesn't have a real point to it but it shows you how

to use the function syntax, how to specify the arguments and how to return the value.

So the function that adds two values I'm just going to call it add two.

1:47

And and so you get it you use the function directive to start it off.

Now it's going to take it's going to add two values so it has to take

two arguments so I'm just going call the two arguments x and y and then

I'm going to take the two arguments and add them together with the plus operator

alright x plus y and then I close out the function with the curly brace.

So you can see that I didn't have to

do anything special to return the value that that's the

sum of the two elements because the or any R

function, the, the function returns whatever the last expression was.

So here there's only really one expression.

So therefore its the last expression and, and it equals the sum of x and y.

So here I can, I can highlight this guy and run it

in the console, and you can see now I've got my function here.

I can say add two, and lets give it say three and five and hopefully I get eight.

Yes that's a good sign.

So it adds the two numbers together, and that's that.

So it's a very simple function, and and,

you've now written your first function in R.

S the next function that I want to

talk about is a little slightly more complicated.

It's going to take a vector of numbers, it's going to, it's going to return

the subset of the vector, that's, that's above the vector value of ten.

So any number that's bigger than ten, it's going to return those numbers for you.

3:09

Just because it gives you any number that's above ten.

snd, it's going to take a vector here, we'll call it x,

you don't have to call it x, I'm just calling it that.

And I like to open and close the curly braces right away, just

so you know where the beginning and the end of the function is.

3:25

If you happen to have a lot of code in, you know, in, in a single file.

So the first thing I'm going to want to do is I want to construct a logical

statement that figures out which elements of

this vector x are, are greater than ten.

All right?

So I'm going to assign an object.

I'll call it use because these are, these are the numbers that I'm going to use.

And I'll say x greater than ten.

All right?

So this'll return a logical vector, of trues and falses

to indicating which element of x is greater than ten.

3:51

And then I'm going to subset the vector x with this logical vector.

So now this function returns, the subset of the vector x that is bigger than ten.

Of course if there are no elements of x that are

bigger than ten, that it will return an empty numeric vector.

4:07

Now of course, there's really nothing special about the number ten.

I just kind of made that up, and so you may want to created a

function that allows people to sub, to kind of extract the elements of a vector.

That are above an arbitrary other number, right?

And so, so it could be ten, it could

be 12, it could be five, it could be anything.

So maybe you'll want to allow the user to specify that number.

So I'll just call, I'll create a new function here.

Call above.

So it doesn't have the ten encoded in it.

I'll use the function directive, and I'll have a

second arbitrary called n, which can be any number really.

[SOUND] So let's start it off, we'll get the curly braces in there,

and now I'll create a logical statement that x is greater than n.

Right?

And then I'll subset the vector x based on that logical statement.

So now if I can source this into R.

Oops, and I can run my function here.

So I'll just create a vector.

Let's say x is one through 20.

5:06

Oh I, you see, so I didn't specify the number n, so it's not going to know

what to cut it off at, so I need to specify the threshold, so let's do ah,12.

And you can see it returned all the numbers that are greater than 12.

So that's kind of as we expected, and so the function appears to be working well.

Now let's suppose that maybe there is something

special about the number ten, and maybe it's

something that people are going to be kind of be

doing very often and it's a very common number.

So you might you want to specify a default

argument so you might want to the default to

be ten, so remember when I ran the

function before and I didn't specify the number n.

It gave me an error or maybe you don't want

people to have to encounter that error, and so you'll specify

a default value n equals ten so people don't specify

the cutoff value n, it will just automatically default to ten.

So now I can run this in R and now if I

do above, which is x, you see I don't get the error anymore.

It automatically gives you all the numbers that are bigger than ten.

So it's kind of nice in R when you're writing functions

to be able to specify default values like this that make the

life of the user just a little bit easier, specially for very

common cases, where it's not important that the user specify an argument.

6:13

So those are some very simple functions, in R that can be used

to kind of process data or make do simple calculations, like adding two numbers.

The next function I want to talk about is, is just going to take

a matrix or a dataframe and calculate the mean of each column.

Right, so this is slightly more complicated

you, you have to take your argument and

then you have to loop through each column to calculate the mean of each one, right.

So this is going to involve using a for-loop

and, and so we'll talk about it here.

So let's call this function column mean, because that's what it does,

6:53

And so y is going to be a data frame or a matrix, and we're going to go

through the columns of this data frame or

matrix and calculate the mean of each column.

So the first thing I need to figure out is how

many columns does this thing have, and that can be easily done.

I'll call it n c for number of columns and we can use the n call function for that.

7:10

That will calculate the number of columns, and, and then I need to

initialize a vector that's going to that's going to store the means for each column.

The length of this vector has to equal the number of columns, right.

So I'll just call it means, and it'll be a numeric vector

equal to the length of the number, equal to the number of columns.

So this is just an empty vector.

It doesn't, it's going to have, it's going to

be initialized to, to be all zeros.

But we're going to fill it as we go through the column.

So now we want to for-loop through the columns.

And I'll say i is in and then I'll say one through nc.

So this creates a, an integer vector starts

a one and ends at the number of columns,

and then I'm going to for-loop through and for each

I, I'm just going to assign to my means vector.

The mean of x bracket I, right.

Oh sorry that's called y here now.

8:01

And that's it, and then so for I, I haven't returned

anything yet, so right now this function doesn't do anything particularly useful.

But what I want to do is return the vector

of means and so I'm just going to return that.

And that's, since that's the last expression

in the function that what will get returned.

8:34

Okay, so I, there are six, I think there are

six columns in this dataset, so it gave me six means.

Now you can see that the first two columns have NAs.

And that's because it, if the, if the vector has

an na in it, then you can't calculate the mean.

And so the one thing you might want to do, is, by default, is throw

out all of the missing values and

just calculate the mean amongst the observed values.

And so, you'll notice that a lot of functions have a feature where it's like,

where they, you can, you can choose whether you want to remove the nas or not.

But let me just add up an argument here, it's called [UNKNOWN] na.

And it will default to true, right.

And then I'll pass this argument to the mean function.

So the mean has an na.rm argument, and I'll pass at this value.

9:28

so now the default will be now I get my means

for those columns because the default was to remove the na's.

I could say false here, and then my na's will come back.

So I can always choose to kind of go back to the old behavior if I wanted to.

9:42

So the last thing you want to do any time you're writing a

function the most important thing of course is to save your file.

So right now this file is unsaved.

If you don't save it and R Studio crashes or something

happens you'll lose all your work and so you want to go

to the save I meant save as menu, and just save

your file as, you know, functions or whatever you want to call it.

10:09

So that should get you started, just writing

some simple functions in R, for your programming

assignment you'll have to write a few functions

that kind of go through and look at data.

But I just wanted to get you started writing your first

functions so that you know kind of how the directive, the function directive

works, how the arguments work, and you can play around a little

bit with with more complicated ideas as you work through the assignments.