In this lecture, we're going to explore the pandas Series structure. By the end of this lecture, you should be familiar with how to store and manipulate single-dimensional indexed data in the Series object. The Series is one of the core data structures in pandas. You can think of it as a cross between a list and a dictionary. The items are all stored in an order, and there are labels with which you can retrieve them. An easy way to visualize this is as two columns of data. The first is the special index, a lot like the keys in a dictionary, while the second is your actual data. It's important to note that the data column has a label of its own, which can be retrieved using the .name attribute. This is different than with dictionaries and is useful when it comes to merging multiple columns of data, and we'll talk about that later on in the course.
Let's import pandas to get things started. So import pandas as pd. As you might expect, you can create a series by passing in a list of values. When you do this, pandas automatically assigns an index starting with zero and sets the name of the series to None. Let's work on an example of this. One of the easiest ways to create a series is to use an array-like object, like a list. So here I'll make a list of three students, Alice, Jack, and Molly, all as strings. So students equals a list of Alice, Jack, and Molly. Now we just call the Series function in pandas and pass in the students, so pd.Series. Series is available at the top level of the pandas module, and we pass in students. The result is a Series object, which is nicely rendered to the screen. We see here that pandas has automatically identified the type of data in the series as object and set the dtype accordingly. We also see that the values are indexed with integers starting with zero. Now, we don't have to use strings. If we pass in a list of whole numbers, for instance, we can see that pandas sets the type to int64.
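
Here's a minimal sketch of the two list examples just described, written out as code. The variable names student_series and number_series are assumptions for illustration, and the exact dtype reported for the strings can depend on the pandas version you have installed.

```python
import pandas as pd

# A list of strings becomes a Series with an automatic integer index.
students = ["Alice", "Jack", "Molly"]
student_series = pd.Series(students)
print(student_series)

# The lecture shows dtype object here; newer pandas releases may
# report a dedicated string dtype instead.
print(student_series.dtype)

# No name was given, so the name attribute is None.
print(student_series.name)

# A list of whole numbers is stored with an integer dtype.
numbers = [1, 2, 3]
number_series = pd.Series(numbers)
print(number_series)
print(number_series.dtype)
```

If you run this, compare the index column and the dtype line at the bottom of each printout with what was described above.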
Underneath, pandas stores series values in a typed array using the NumPy library. This offers a significant speed-up when processing data versus traditional Python lists. Let's create a little list of numbers. So here numbers equals, and I'll just throw in three integers, one, two, and three, and now let's turn that into a series. So again pd.Series, the module-level function, passing in numbers. We see that on my architecture, the result has a dtype of int64.
There are some other typing details that exist for performance that are important to know. The most important is how NumPy, and thus pandas, handles missing data. In Python, we have the None type to indicate a lack of data. But what do we do if we want to have a typed list like this in the Series object? Underneath, pandas does some type conversion for us. If we create a list of strings and one element is None, pandas inserts that as a None and uses the type object for the underlying array. Okay. Let's recreate our list of students, but let's leave out the last one and we'll just set it to None. So students equals Alice, Jack, and then we'll just have a None, and now let's convert that into a series to see what happens. So pd.Series students.
If, on the other hand, we create a list of numbers, integers or floats, and put in a None, pandas automatically converts this into a special floating-point value designated as NaN, which stands for Not a Number. So let's see an example of this. We will create a list with a None value in it. So numbers equals one, two, and None, and we'll turn that into a Series object. So pd.Series numbers. You'll notice a couple of things. First, NaN is a different value from None. Second, pandas set the dtype of this series to floating-point numbers instead of object or ints. That's maybe a bit of a surprise. Why not just leave this as an integer?
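
Before answering that, the two missing-data cases just described can be reproduced with a short sketch. The names students_with_gap and numbers_with_gap are assumptions for illustration. How the missing string is displayed can vary with the pandas version, so the comments only state what is safe to expect.

```python
import pandas as pd

# A list of strings with a missing entry.
students_with_gap = pd.Series(["Alice", "Jack", None])
print(students_with_gap)

# A list of integers with a missing entry: the None is shown as NaN
# and the whole series is stored as floating-point numbers.
numbers_with_gap = pd.Series([1, 2, None])
print(numbers_with_gap)
print(numbers_with_gap.dtype)

# isna() flags the missing position in both series.
print(students_with_gap.isna().tolist())
print(numbers_with_gap.isna().tolist())
```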
Underneath, pandas represents NaN as a floating-point number, and because integers can be typecast to floats, pandas went and converted our integers to floats automatically. So when you're wondering why the list of integers you put into a series is now floats, it's probably because there is some missing data.
For those who might not have done scientific computing in Python before, it's important to stress that None and NaN might be used by the data scientist in the same way, to denote missing data, but underneath these are not represented by pandas in the same way. NaN is not equivalent to None, and when we try the equality test, the result is False. Let's bring in NumPy, which allows us to generate a NaN value. So we'll import numpy as np, and now let's compare it to None. So just np.nan, and does this equal None? It turns out that an equality test of NaN against itself doesn't work either. When you do it, the answer is always False. So if we just do np.nan equals np.nan, we also get False. Instead, you need to use special functions to test for the presence of Not a Number, such as isnan in the NumPy library. So we can use np.isnan, this function, and pass it np.nan, and we see that the result is True. So keep in mind that when you see NaN, its meaning is similar to None, but it's a numeric value and is treated differently for efficiency reasons.
Let's talk more about how pandas series can be created. While a list might be a common way to create some play data, often you have labeled data that you want to manipulate. A series can be created directly from dictionary data. If you do this, the index is automatically assigned from the keys of the dictionary that you provided, rather than just incrementing integers. Here's an example using some data of students and their classes. So I'll create a new dictionary called student scores.
It will have the names as keys, so Alice is in Physics, then I'll put Jack in Chemistry and Molly in English, and now I'm going to create a new series. So pd.Series, passing in student scores, I'll assign it to s and then just print out s. We see that, since it was string data, pandas set the data type of the series to object, and we see that the index, the first column, is a list of strings. Once the series has been created, we can get the index object using the index attribute. So if we just do s.index, for instance. As you play more with pandas, you'll notice that a lot of things are implemented as NumPy arrays and that they have the dtype value set. This is true of indices, and here pandas inferred that we're using objects for the index, and that's cool.
Now this is interesting because the dtype of object is not just for strings, but for arbitrary objects. So let's create a more complex type of data, say a list of tuples. So say students is a list, and we'll just use first name, last name. So Alice Brown as the first tuple, Jack White as the second tuple, and Molly Green as the third one, and then we'll do pd.Series with students. So we see that each of the tuples is stored in the Series object and the dtype is object.
You can also separate your index creation from the data by passing in the index as a list explicitly to the series. So here we'll create a new series, s equals pd.Series, and we'll pass in Physics, Chemistry, and English as the three subjects that we're interested in, and then we'll use the index parameter with a list of Alice, Jack, and Molly, and let's print out s.
So what happens if the list of values in your index is not aligned with the keys in the dictionary you're using to create the series? Well, pandas overrides the automatic creation to favor only and all of the index values that you provide.
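
As a rough sketch of the three constructions covered so far (from a dictionary, from a list of tuples, and from values with an explicit index), something like the following could be run. The names student_scores, name_tuples, tuple_series, and explicit are assumptions for illustration.

```python
import pandas as pd

# From a dictionary: the keys become the index.
student_scores = {"Alice": "Physics", "Jack": "Chemistry", "Molly": "English"}
s = pd.Series(student_scores)
print(s)
print(s.index)

# From a list of tuples: each tuple is stored as one element.
name_tuples = [("Alice", "Brown"), ("Jack", "White"), ("Molly", "Green")]
tuple_series = pd.Series(name_tuples)
print(tuple_series)

# Values and index supplied separately through the index parameter.
explicit = pd.Series(
    ["Physics", "Chemistry", "English"],
    index=["Alice", "Jack", "Molly"],
)
print(explicit)
```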
So pandas will ignore from your dictionary all keys which are not in your index, and it will add None or NaN values for any index value you provide which is not in your dictionary's keys. Here's an example. I'll pass in a dictionary of three items, in this case the students and their courses. So I'll create student scores again, with Alice in Physics, Jack in Chemistry, and I think we had Molly in English. When I create the Series object, though, I'll only ask for an index with three students, and I'll exclude Jack. So let's do pd.Series and pass in all of the student scores. So that's the big dictionary, and you can imagine that this is actually really big, in that it came from some data file somewhere, but that for our index we're only interested in a couple of students. So Alice, Molly, and we'll make up a new one, Sam. The result is that the Series object doesn't have Jack in it, even though he was in our original dataset, but it explicitly does have Sam in it, as a missing value.
In this lecture, we've explored the pandas Series data structure. You've seen how to create a series from lists and dictionaries, how indices work, and the way that pandas typecasts data, including missing values.
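
To tie the last example back to the earlier discussion of NaN, here is a small sketch of the index alignment together with the NaN comparisons from this lecture. The name selected is an assumption for illustration, and pd.isna is used as one convenient way to check a single value for missingness.

```python
import numpy as np
import pandas as pd

student_scores = {"Alice": "Physics", "Jack": "Chemistry", "Molly": "English"}

# Only the requested labels are kept. Jack is dropped and Sam,
# who is not in the dictionary, is added as a missing value.
selected = pd.Series(student_scores, index=["Alice", "Molly", "Sam"])
print(selected)
print("Jack" in selected.index)   # False

# NaN never compares equal to None, nor to itself.
print(np.nan == None)             # False
print(np.nan == np.nan)           # False

# Use a dedicated function to test for missing values instead.
print(np.isnan(np.nan))           # True
print(pd.isna(selected["Sam"]))   # True
```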