All right. Welcome to the very first lab section, and let's jump right into it. If you don't have this up and running, that means you don't have JupyterLab installed. If you haven't been able to get this far, go take a look at the video I've recorded about installing JupyterLab, or just go to anaconda.com and follow the instructions there, and make sure that you can at least get this far. Once you've done this, let's open up a new notebook. You just double-click here, and you should be at this point and able to do this 2 plus 2 test. The second thing I want to tell you before I get too deep into this is that I'm assuming you know your way around Python a little bit and that you've worked with pandas. If you haven't, I'll try to explain what's going on as I'm doing these labs, but if at some point you find yourself getting completely lost, you might be better off watching a series of videos that I've recorded and made available to you, titled the crash course. It's a quick and dirty way for you to get up and running with pandas, NumPy, lists, and the basics of Python. So if you find yourself losing your way here, or if I'm going too fast, that is a good place to go. Okay, let's get into the meat of this. What we're going to do here is work with some very basic concepts of returns: how to compute a return from prices, how to go from multi-period returns to a compounded return, and finally, how to annualize a return. That's what we did in class, and we should be able to cover all of it today. Let me remind you, this is basically what we're trying to do: figure out how to compute a return from a sequence of prices. So let's start by typing in a bunch of prices. Let's say this is just a sequence of three prices over three days.
So what is the return over the period from the first day to the second day? Well, it's very simple. I'll use this format: P_t+1 divided by P_t, minus 1. So it's simply 8.91 divided by 8.70, minus 1. That's about a 2.4 percent return, and that makes sense because the price went up from 8.70 to 8.91. Okay, what happened on the next day? Well, it went from 8.91 down to 8.71. So what we want to do is 8.71 divided by 8.91, minus 1. That's about a 2.2 percent drop. That makes sense. Now, the question is this: we've got a sequence of prices that's given here as a Python list, and since prices_a is a list, and therefore a sequence, we should be able to use it to generate a sequence of returns, right? Again, if this stuff looks unfamiliar to you or it looks weird, you can go ahead and watch the crash course video. But let's take a look at all the prices except the first price, in other words, from the second day onwards. If you remember how to do slicing in Python, this is the element at index 0, this is the element at index 1, and this is the element at index 2. So you want everything from the element at index 1 onwards, which is prices_a[1:]. That starts at that 8.91. Then you want to divide that by the sequence of prices for everything except the last one. For that, you could do prices_a[:-1]. Let me just explain this if you haven't seen it before. The colon tells you that you're trying to get a slice of the list. In the first slice, the left-hand side says start from the one at index 1 and go all the way to the end, since I've got nothing after the colon. The second slice says start all the way from the beginning, and the minus 1 means stop before the last one, so you count backwards from the end. So this is the element at index 0, this is the element at index 1, and this is the element at index 2, which is the last element. What this is saying is, get me all the prices except the last element.
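A minimal sketch of the arithmetic and the two slices just described. It assumes the three spoken prices are 8.70, 8.91, and 8.71 (consistent with the 8.91 mentioned later in the lab); the names `ret_day_1_to_2`, `ret_day_2_to_3`, `later_prices`, and `earlier_prices` are illustrative and not from the lecture.

```python
# Three daily prices, assumed to be 8.70, 8.91, 8.71.
prices_a = [8.70, 8.91, 8.71]

# One-period return: P_{t+1} / P_t - 1
ret_day_1_to_2 = prices_a[1] / prices_a[0] - 1   # roughly +2.4 percent
ret_day_2_to_3 = prices_a[2] / prices_a[1] - 1   # roughly -2.2 percent

# Slices: everything from index 1 onwards, and everything except the last element.
later_prices = prices_a[1:]
earlier_prices = prices_a[:-1]

print(ret_day_1_to_2, ret_day_2_to_3)
print(later_prices, earlier_prices)
```

The two slices have the same length, which is what makes an element-by-element division of one by the other meaningful.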
Good, so far so good, nothing surprising here. Now, all we have to do is divide 8.91 by 8.70 and 8.71 by 8.91, subtract 1, and you've got the returns. So you might think you should be able to just divide one slice by the other. In virtually any language where lists are treated as vectors, this would work. If you did the equivalent R syntax in R, or the equivalent MATLAB syntax in MATLAB, this would work. However, in Python this is not going to work. So why didn't it work? Because it's telling you here that it's an unsupported operand type for slash. In other words, it says you're trying to do a slash, a division, between a list and a list. Well, of course, that is exactly what we're doing. We're trying to divide the values in this list by the values in that list, and in Python you cannot do that. The reason you cannot do that is that lists in Python are not vectors. So how could you do this? Let me show you quickly how to do it the slightly cumbersome way, and then I'll show you the way we're going to be doing it most of the time, which is using pandas. But before that, let me show you how to do it in NumPy, because at some level all pandas is, is a very nice wrapper around NumPy. Therefore, it's nice to know that you can do all this in NumPy. The first thing you have to do to be able to use NumPy is to import that module. So you say import numpy as np. What's going on here, if you haven't seen something like this before, is that NumPy is a module you have available to you, and all the code in that module is now available to you through this alias called np. So now you can do things like this. I'm going to create a NumPy array, and I'm going to give it the same list that I had before.
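A short sketch of the failure described here, followed by the NumPy version of the same expression. The prices are assumed to be 8.70, 8.91, and 8.71; the names `error_message`, `prices`, and `returns_a` are illustrative.

```python
import numpy as np

prices_a = [8.70, 8.91, 8.71]

# Dividing a list by a list is not defined in Python, so this raises TypeError.
try:
    prices_a[1:] / prices_a[:-1] - 1
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# Converting the list to a NumPy array makes element-wise arithmetic available.
prices = np.array(prices_a)
returns_a = prices[1:] / prices[:-1] - 1
print(returns_a)
```

The expression inside the `try` block and the one applied to the array are character-for-character the same apart from the variable name; only the type of the operands changed.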
You see, it's the same list, it's the same prices, but instead of just assigning it directly as a list, I'm giving it to NumPy and asking NumPy to give me back an array, and that's what I'm assigning to prices_a. If you hit Shift+Enter, you'll see that it's printing out the numbers, but it's telling you that it's an array of numbers, and this is a NumPy array. Because it's a NumPy array, I can do things that I could not do before. So let me use the exact same command. In fact, let me just copy it from there and paste it here, and now, boom, it worked. It gave you about 2.4 percent and minus 2.2 percent, which are exactly the numbers we had before. Now, why did this work? It worked here because at this point prices_a is not a list. It's been converted from a list to a NumPy array, and you can do vector arithmetic like this on a NumPy array. But as it turns out, NumPy itself is a little tedious, so we're going to use this absolutely fantastic wrapper around NumPy, which we're going to spend a lot of time with, and that's called pandas. So what am I going to do? I'm going to first import pandas as pd. This will give me access to all the pandas code. Then I'm going to create a new variable called prices, and I'm going to create it not as a NumPy array as we had before, but as a data structure called a DataFrame. Now, let's talk about what a DataFrame is. A DataFrame is essentially a data structure that you can think of as rows and columns. It's a matrix for all practical purposes, but it's not just a big blob of numbers. It is organized as rows and columns, and you can index each row and each column. So let's create a DataFrame that has two columns. Let's call them Stock A and Stock B. Actually, let's not call them Stock A and Stock B.
For reasons that will become clear soon, I'm going to call the first one Blue, and I'm going to call the second one Orange. Now, what I just created here, when I put in those curly brackets, is a dictionary in Python. A Python dictionary is nothing more than a collection of key-value pairs. So that's the key, and I'm going to have to give it a value, and then I'll put a little separator here, that's a comma. Then I'll do another one here. What pd.DataFrame is expecting is a dictionary of key-value pairs where each key is the name of a column and each value holds the actual values for that column. I'm going to give it some stock prices here. Let's use the same prices that we had before for this one. I'm going to copy those prices, and in fact let's give it a few more. Let's go 8.43 and 8.73. Now, for the orange stock, let's start at 10.66, then 11.08, then 10.71, then 11.59, and let's end on a happy note. Do you understand what's going on? This is a sequence, it's a list. This is a sequence, it's a list. These are two columns. This column is the blue column, so I'm giving it the name Blue. This column is the orange column, so I'm giving it the name Orange, and I'm separating them with a comma. Well, that looks a little odd, so let me tidy it up. That looks a little better. Now let me do Shift+Enter. Now I've got prices, and I can look at prices and you will see it's printed out nice and neatly, and it's indexed. You'll see that the columns have these column names, and it turns out even the rows have these row indices. It just invented them, because I didn't give it any specific row index. Just like we did before, we should be able to divide everything except the first row by everything except the last row. If you look at this, that's essentially what it's doing.
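A minimal sketch of building the two-column DataFrame from a dictionary. The transcript does not state the final orange price, so this sketch keeps only the first four rows of each column rather than guessing a value; the exact spelling of the column labels (`"Blue"`, `"Orange"`) is also an assumption.

```python
import pandas as pd

# Each key is a column name; each value is the list of prices for that column.
# Only the first four rows are used because the last orange price is not given.
prices = pd.DataFrame({
    "Blue": [8.70, 8.91, 8.71, 8.43],
    "Orange": [10.66, 11.08, 10.71, 11.59],
})

print(prices)
print(list(prices.columns))  # the column labels supplied above
print(list(prices.index))    # the row labels pandas generated automatically
```

Because no row index was supplied, pandas generates the default integer labels starting at 0, and those labels matter as soon as two DataFrames are combined arithmetically.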
Prices, everything except the first price, and here, everything except the last price. Now, the way you do this in a pandas DataFrame is by using this syntax called iloc. Think of it as integer location, that is, selecting by position. So what does iloc do? If you want all of the prices except the first one, what you do is prices.iloc[1:]. It's exactly the list-like syntax that you've seen. It says, start from the one at index 1 and go all the way to the end. Remember, the first one is at index 0, so that will give you everything from here down. This is what you want. You want to divide everything from here down by everything from the top down except the last one. So you want prices.iloc[1:], that's the first piece. That's good. Then, what do you want to divide it by? You want to divide it by prices.iloc[:-1], everything up to but not including the last row. Now again, I'm sorry, but I'm going to have to do something here that doesn't work first, so that I can explain that there's a complication. Let's do what we think should work, which is just to take those numbers and divide them by these numbers. Makes sense? Yeah. Let's try it. That actually gave you an answer, but it gave you a very bizarre answer. The reason it gave you this bizarre answer is precisely because of the index that we got here for free, whether we wanted it or not. What it's doing is aligning the rows, and it's being a little too clever in a sense. It's saying, I know that the first row of this one is row 1 and the first row of that one is row 0, so I'm going to line up the two row 1s and divide them. It's dividing the 8.91 in this row 1 by the 8.91 in that row 1, because they're exactly the same thing, and it's giving you 1. This is basically how alignment works. If what I just said is not clear to you, then please go take a look at the alignment section of my pandas crash course. What are we going to do? Well, I'm going to give you two or three ways to do this.
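The alignment problem can be reproduced in a few lines. This is a sketch on a four-row stand-in for the lab's DataFrame (the column labels and the choice of four rows are assumptions); `later`, `earlier`, and `misaligned` are illustrative names.

```python
import pandas as pd

prices = pd.DataFrame({
    "Blue": [8.70, 8.91, 8.71, 8.43],
    "Orange": [10.66, 11.08, 10.71, 11.59],
})

later = prices.iloc[1:]     # row labels 1, 2, 3
earlier = prices.iloc[:-1]  # row labels 0, 1, 2

# pandas matches rows by label, not by position, before dividing.
misaligned = later / earlier
print(misaligned)
```

Rows 1 and 2 exist on both sides, so each price is divided by itself and the result is 1.0. Rows 0 and 3 exist on only one side, so pandas fills them with NaN.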
Let's do a simple but crude one to begin with. Let me copy that back here, and let's copy that same thing here. The one thing I can do is, instead of treating it as a DataFrame, use an attribute called values. What values does is take a DataFrame, pull the values out, and give you back a NumPy array like the one we had before. It's actually a two-dimensional array, a matrix, but it's just pure NumPy without the index stuff in there. If you're dividing a plain NumPy array without any row index information, then it doesn't have anything to align by, and so it'll do pure positional division. You can see that that is exactly what you got, and in fact what we wanted to do is subtract one, and you'll see that you get the same numbers here. If you remember, the blue series is the same thing that we had before. So it's 2.4 percent on the first day and a loss of 2.2 percent on the second day, which is exactly what we had before. That's one way of doing it. Of course, we could do it another way here. Let me copy that again, and this time put values on just one side. As long as one of these things doesn't have an index, there will be no index to align on, since the only way you can align two things is if both of them have indices. There you go, that works as well. Now I'm going to give you two slightly better ways of doing this. For the first one, let's look at prices again. Prices is just that. What we're trying to do is divide one of these by the other by just shifting the rows down, and there's actually a method in DataFrame that allows you to shift it right away. For example, you could just say prices divided by prices shifted by one.
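A sketch of the `.values` workaround in its three variants, again on a four-row stand-in DataFrame (an assumption, since the last orange price is not given). The names `returns_both`, `returns_left`, and `returns_right` are illustrative.

```python
import numpy as np
import pandas as pd

prices = pd.DataFrame({
    "Blue": [8.70, 8.91, 8.71, 8.43],
    "Orange": [10.66, 11.08, 10.71, 11.59],
})

# Strip the index from both sides: plain NumPy, purely positional division.
returns_both = prices.iloc[1:].values / prices.iloc[:-1].values - 1

# Strip the index from only one side: the result keeps the other side's row labels.
returns_left = prices.iloc[1:].values / prices.iloc[:-1] - 1    # rows labeled 0, 1, 2
returns_right = prices.iloc[1:] / prices.iloc[:-1].values - 1   # rows labeled 1, 2, 3

print(np.round(returns_both, 4))
print(returns_left)
print(returns_right)
```

All three hold the same numbers. They differ only in whether the result is a bare array or a DataFrame, and in which row labels the DataFrame carries.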
So that's prices up there, and if you look at prices shifted by one, prices.shift(1), it's exactly the same thing, except the first slot now has NaNs because there's nothing before it, and the 8.70 has been shifted down, and the 8.91 has been shifted down, and so on and so forth. But now you see I can actually divide, because prices and prices.shift(1) have already been properly lined up for you. Take prices divided by prices.shift(1), subtract 1, and you get the same returns, like that minus 0.02, and there is no return on the first day because you don't have a price for the day before, which makes perfect sense. In general, whenever you have n prices, you can only compute n minus 1 returns. Now, there's one last way I want to show you. Let's go back to prices. Prices is just a sequence of prices. Well, what are we trying to do? How are we trying to compute the return? It's nothing more than the percentage change from one row to the next. Fortunately, DataFrame has a method called pct_change, for percentage change, that does exactly that. So if you look at prices.pct_change(), you'll see you get exactly the same numbers: 2.4 percent, negative 2 percent, negative 3 percent, and so on. So there are several ways you could do it, and here's the one way you cannot do it. The reason you cannot do it is that the alignment gets in the way. You can work around the alignment by getting rid of the index, since it's the alignment of the row index after all. You can eliminate the row index by extracting just the values, which do not have a row index, because the row index is a pure pandas thing and values gives you back just a NumPy object. So that's why that works. You could apply values on just one side too, so that was the second way in which you could do this. Then I showed you a third way, which is using the shift method. We could just do prices divided by prices.shift(1), minus 1, and you've got the same answer.
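A sketch comparing the shift approach with `pct_change`, on the same four-row stand-in DataFrame (an assumption); `returns_shift` and `returns_pct` are illustrative names.

```python
import numpy as np
import pandas as pd

prices = pd.DataFrame({
    "Blue": [8.70, 8.91, 8.71, 8.43],
    "Orange": [10.66, 11.08, 10.71, 11.59],
})

# shift(1) moves every row down by one and leaves NaN in the first row,
# so each price lines up with the previous day's price under the same label.
returns_shift = prices / prices.shift(1) - 1

# pct_change computes the same row-over-row percentage change directly.
returns_pct = prices.pct_change()

print(prices.shift(1))
print(returns_pct)
```

Both results have one row per price, with NaN in the first row, which reflects the point that n prices yield only n minus 1 returns.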
Finally, I showed you the easiest way to do it, which is just using the pct_change method on the prices object. If you have prices and you just want to compute the returns, the simplest way is prices.pct_change(). We went through all this because I really wanted you to understand what pct_change is doing and why the obvious ways might not work, and I think it's useful for you to see why these things don't work. Now let's try this on some other data, but obviously we don't want to sit and type in the numbers like we did before. I've created a little sample file with some sample prices, so let's pull that in, and you can see it here. In fact, let me show you what that file looks like. If you go into the data folder and you look at this sample, that's what you've got here. It's basically nothing more than a CSV file which has the same kind of numbers in sequence. Let's go back here. That's your sequence of numbers right there. What is the first thing we want to do? How would we turn these prices into returns? We could compute returns any of the ways we just saw, but I'm going to do the simplest, which is usually the fastest, and that is prices.pct_change(). If I look at my returns now, there you go. Okay, good. Let's have some fun. Let's do prices.plot(). Sometimes you'll see this happen, where the plot doesn't show up, and there are a couple of things you can do about it. If you're using a modern enough version, like I am, you really shouldn't see this, but sometimes I've noticed that you can just run the cell again and then you get the plot. If you're using an older version, you have to type in this weird thing that says %matplotlib inline. We'll get to this later and understand what the percent sign is doing, and we'll have plenty of time to talk about it. But for now, just treat this as some magic that you have to type in. Once you do that, these plots should work exactly as you expect. So if you go prices.plot(), you can see the plot of the prices.
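A sketch of the load-then-compute workflow. The name and contents of the sample file are not given in the transcript, so an in-memory CSV built from the prices typed earlier in the lab stands in for it; `csv_text` and the column labels are assumptions. With a real file, the `io.StringIO(...)` argument would be replaced by the file's path.

```python
import io
import pandas as pd

# Stand-in for the sample CSV file: a header row, then one row of prices per period.
csv_text = "Blue,Orange\n8.70,10.66\n8.91,11.08\n8.71,10.71\n8.43,11.59\n"

prices = pd.read_csv(io.StringIO(csv_text))
returns = prices.pct_change()

print(prices)
print(returns)

# In a notebook with matplotlib installed, the plots from the lab would be:
#   prices.plot()        # line plot of each price column
#   returns.plot.bar()   # bar plot of each period's returns
```

The plotting calls are left as comments so the sketch runs without a plotting backend.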
Just for fun, let's also look at the returns. So returns.plot, but instead of calling the standard plot, let me look at a bar plot of it. You can see here what's going on with these returns. This is the same data that I showed you in class. The blue returns are obviously less volatile than the orange returns. We should be able to see that in the data. So let's compute the standard deviation of the returns, using the .std method. First, a quick reminder: if you want to look at the first few rows of returns, you can call the head method for that, and you can see the first few there. I can compute the standard deviation of each series by doing returns.std(). Look at that, 2.3 percent for blue and 7.9 percent for orange, which matches what you can see in the plot. Let's be clear about what's going on here. Returns is a two-column DataFrame, and when you call the std method, it computes the standard deviation of each column. Now, what's interesting, of course, is what happens if you look at the returns themselves and just compute the mean of them. As you might expect, returns.mean() is going to compute the average of each column. You can see here that the averages of these two columns are exactly the same. So now let's compound this series of returns. Well, how do you compound a series of returns? You take all of those returns and you convert them. So these are the returns, right? Then you add one. This is a vector addition. Let's do this carefully. So that's returns, and those are numbers in that range. Now I'm going to do returns plus one. You see, it's going to add one element-wise to every single one of these. Now, what do I need to do? I need to multiply these in sequence, within each column. There are a couple of different ways I can do that. The first way is I can call NumPy.
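A sketch of the per-column statistics discussed above. It runs on a four-row stand-in DataFrame rather than the lecture's sample file (an assumption), so the numbers it prints will not match the 2.3 percent and 7.9 percent quoted from the sample data, and the two means will not be equal here. The names `volatility`, `average`, and `growth_factors` are illustrative.

```python
import pandas as pd

prices = pd.DataFrame({
    "Blue": [8.70, 8.91, 8.71, 8.43],
    "Orange": [10.66, 11.08, 10.71, 11.59],
})
returns = prices.pct_change()

print(returns.head())          # first few rows of the returns

volatility = returns.std()     # one standard deviation per column
average = returns.mean()       # one mean per column
print(volatility)
print(average)

# Adding a scalar is element-wise: every return r becomes 1 + r.
growth_factors = returns + 1
print(growth_factors)
```

Both `std` and `mean` skip the NaN in the first row by default and return a Series indexed by column name.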
NumPy has a function in it called prod, and I can call prod on returns plus one. That will take returns, add one to it, and then multiply down each column and give you the result for each column. You'll see the blue column is about 1.12 and the orange column is about 1.08. Now, let's go back and remember that we want the prod of those, minus one. You can see that blue gives you a compounded return of about 12.3 percent and orange gives you a compounded return of about 8 percent, which, again, is very interesting considering the fact that the mean of both of these was exactly the same. There's another way you could do this. You can do returns plus one. Remember, returns plus one is itself a DataFrame. So what you can do is call the .prod method on that DataFrame and take one away from the result. You'll get exactly the same answer. So compounding is nothing more than multiplying all of these returns together, but you have to remember that the returns have to be in this one-plus-return format, as I showed you. The answer itself is in one-plus-return format, so you have to subtract one to get the actual return that you're interested in. By the way, if this 0.123 is bothering you and you want to see something nicer, then you certainly can do that. You can multiply it by 100 and that'll give you 12.3 percent. If you want to get even fancier, you can do that too. Let me add .round, let's say to two places. There are many ways to format these more prettily, but I just want you to understand what's going on here in terms of the multiplication of a DataFrame by a scalar. When you multiply a DataFrame by a scalar, all it's doing is element-wise multiplication. Then that result is itself a pandas object.
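A sketch of compounding by multiplying the one-plus-return factors, on a four-row stand-in DataFrame (an assumption, so the results differ from the 12.3 percent and 8 percent quoted for the sample file). The NumPy variant here passes a plain array with an explicit `axis=0`, which is a deliberate variation on the lecture's call so that the column-wise behavior does not depend on the pandas version. The names `compounded`, `compounded_np`, and `check` are illustrative.

```python
import numpy as np
import pandas as pd

prices = pd.DataFrame({
    "Blue": [8.70, 8.91, 8.71, 8.43],
    "Orange": [10.66, 11.08, 10.71, 11.59],
})
returns = prices.pct_change()

# pandas route: multiply down each column (NaN in the first row is skipped), then subtract 1.
compounded = (returns + 1).prod() - 1

# NumPy route: drop the NaN row, take the raw array, and multiply along axis 0 (down the rows).
compounded_np = np.prod((returns + 1).dropna().to_numpy(), axis=0) - 1

# Sanity check: compounding every period's return equals last price / first price - 1.
check = prices.iloc[-1] / prices.iloc[0] - 1

print((compounded * 100).round(2))  # scalar multiplication is element-wise; round returns a new object
print(compounded_np)
print(check)
```

The check works because the intermediate prices cancel when the factors are multiplied, leaving only the last price over the first.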
So you can continue to call methods on it like round, and all round does is return a new object in which all of the elements have been rounded off to two decimal places. Good. Well, let's end by doing a little math to show you what annualization looks like. Let me remind you of the idea behind annualization with a simple example. Let's say that you have a monthly return, rm, of 0.01, which is 1 percent. Now, how do you convert that to an annualized return? Well, it's very simple. Remember, you take one plus rm, so now you've converted that rm to one-plus-r format. Now you need to raise that to the 12th power. In other words, you need to multiply it by itself 12 times. Let me do this by hand first. So this is the second power, third power, fourth power. This is the growth after four months. This will be after five months, and this will be after six months. Now, obviously it's tedious to do this for 12. So instead, what you can do is just raise that to the 12th power. The way you raise something to a power in Python is with star star, and you do that. That will give you about 1.1268, which itself is in one-plus-r format. So if you want to get just the regular return, you have to remember to take one away from that, and that's 12.68 percent. That's what we did in class. Remember, I told you that if you have a 1 percent return per month and you get 1 percent every month for 12 months, at the end of the year you would not have just 12 percent. You would have 12.68 percent. Let's try this with quarterly returns as well. Let's say your quarterly return, rq, was 0.04, which is 4 percent. So what would you get? One plus rq, to the power four, minus 1. That works out to about a 16.98 percent return. Let's try one last one. Let's say you have a daily return of 0.0001, a really small daily return. Well, what would you get?
You take 1 plus your daily return, whatever it is, raise it to the power 252, because there are approximately 252 trading days in a typical year, and subtract 1. That ends up being about a 2.5 percent return. So that's annualization. I think that should give you a good feel for just typing numbers into cells and playing around with this stuff. In the next class, we'll do a little bit more of this, and we'll also start building our own library so that we can build up a toolkit of useful functionality. I will see you then. Thank you very much.
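The annualization arithmetic from this lab can be sketched as follows. The variable names `rm`, `rq`, and `rd` follow the spoken names where the lecture gives them (`rd` is an assumed name for the daily return), and 252 trading days per year is the approximation used in the lecture.

```python
# Annualized return = (1 + per-period return) ** (periods per year) - 1

rm = 0.01                                   # 1 percent per month
annualized_from_monthly = (1 + rm) ** 12 - 1

rq = 0.04                                   # 4 percent per quarter
annualized_from_quarterly = (1 + rq) ** 4 - 1

rd = 0.0001                                 # 0.01 percent per trading day
annualized_from_daily = (1 + rd) ** 252 - 1

print(f"monthly   -> {annualized_from_monthly:.2%}")
print(f"quarterly -> {annualized_from_quarterly:.2%}")
print(f"daily     -> {annualized_from_daily:.2%}")
```

In each case the compounded figure is slightly higher than the simple product of the per-period return and the number of periods, which is the effect of earning returns on earlier returns.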