In this episode and the next, we are going to focus less on the processing, and more on the data. As you already know, in a programming language, data are represented by variables. And in C++, variables are typed. At this stage of the course, we have mainly introduced the int type, to represent integers, the double type, to represent real variables, and we also evoked the bool type to represent truth values. These types, int, double, bool, which are called the elementary types, are used to represent simple data of the world, like dimensions, sums, logical expressions. But how about more complex, more structured data ? Suppose we'd like to manage the ages of the students following this course. Or suppose we'd like to manage population statistics, how would these statistics be represented ? How would be represented a set, an array, of data (ages) ? Or in the case of statistics, how would a person involved in the statistics be represented ? In both cases, we'll use what is known as data structure. As another example, how would we represent the names of the persons involved in the statistics using the types we have seen until now ? We wouldn't know how to do it. This episode will look at arrays. We'll see later in other episodes, how to represent sequence of characters, and then how to represent structures, namely, how to represent a line in such an array. But let us begin with arrays. Let's suppose that we want to create a game program and that we have to manage the players' scores, we would like for example to display the score of every player, as well as the deviation of these scores from the mean. How would we proceed ? Let's modestly begin by handling the case of a two-player program. What we would do, of course, is to introduce a variable for the score of each of the two players. So, let's say that their scores are here integers, which will be initialized using a particular function here. This function would calculate and return the score. The score is computed like this for the first player. Similarly, computing the second player's score is done using the parameters associated with this second player. And then, as we want the deviation from the mean, we compute the mean of the two players by using the function <i>moyenne</i> (TN : <i>moyenne</i> means <i>mean</i>) that we have already illustrated in previous episodes. This function would take the scores of our two players as parameters. The mean is stored in a variable moyenne_joueurs of type integer. And finally, we could print the scoreboard, by printing the first score and the deviation from the mean, and then by printing the second score and its deviation from the mean. Two players is not bad, but the question is, how can we improve our program to accept more players ? We could simply use several variables. So for example, if we wanted to pass to five players, we would then typically use five scores : one for each player. We would then compute the score of each player using our function (with the players' specific parameters). And we would compute the mean of all these players by changing this time the mean function and passing the five scores to the new mean function, then we would write the scoreboard like this. Here, any good programmer would immediately thinks of using a loop, and will therefore modify this part of the program to introduce a loop here. The loop would go from the first player to the fifth one, it would iterate on each player, and print the score of every player, as well as its deviation from the mean. The question is how to do it concretely ? Because firstly, writing scorei like this, from the previous program, isn't valid. scorei would be here a variable name. It can't be used with different values of i. scorei cannot stand for score1 or score2. So this is something that doesn't work. Also, how would we do if we moved to 100, 1000 players ? It would quickly become tedious to write 100 variables, modify the mean function to take 100 parameters, so it's just impossible in practice. Finally, how would we do if the number of players is unknown at the beginning ? You wouldn't know if you had to deal with five, two, or ten players. The answer to fulfill all these needs is actually the same, it's introducing a new type, the array type. Let me give you by way of preamble the version with an array of the players' score program, but of course we'll detail all this later on in this episode. So, we would declare an array of players, then we would loop, we again encounter this notion of loop to compute the score of each player, then we would compute the mean here, this mean could take an array of scores, and finally we would print the final score with here a for loop which iterates on all the values of the array. Let's have a look at these different elements one by one. Let's begin with the notion of array. An array is a collection, a set of values which all have the same type. We speak of homogeneous collection, to say that they are all the data of the collection are of the same type. For example, the array of scores in which we were interested previously, if we take the case of four scores, it would be an array containing four integers. I decided that my scores are integers. So, all the elements of my array are here of the same type (int). When would we use an array in a program ? When we need to use several variables of the same type, like, for example, the scores of our previous game with players. In C++, we can create arrays of any type, arrays of integers, array of doubles, we can create array of any available type, arrays of bools, and even arrays of array, once we'll have defined this new array type. Generally speaking, there are four kinds of arrays, depending on two questions : at the moment of writing the program, before using the array, do we know its size or not ? And, once known, can this size vary, or is it fixed ? Let's examine the different cases in turn. Let's begin with the case where the size isn't known at the moment of writing the program, and where the size can vary when the program is executing. It's typically the example of the set of ages of the students following this course. At the beginning, I have no idea of how many students will attend this course. Then, at a given moment when I have the array of all the students'ages, I can have other students who register for the course, and therefore add ages, or unfortunately, I can have students who leave the course, and therefore have ages which disappear. So here we have the case where the size can vary, and it isn't known beforehand. Now let's illustrate the extreme opposite, where we suppose that the size is known beforehand, and that it can't change. Typically if you want to program a linear algebra application, if you have 2D vectors, you know that such vectors always have two coordinates, so they will always be arrays with two numbers representing the x and y coordinates. The size is known, it is a priori two, and the size won't change, it will always be two. In the case where the size isn't known at the beginning, but once set, the size doesn't change, we could take the example of a game, with a fixed number of players. At the beginning we don't know how many players we'll have, if three people will participate in the fame, or ten people, but once the number of players is set, we can't change it anymore. So here the size isn't known at the beginning, but once it is known, it can't change during the program's execution. Finally, the last case, maybe the little rarer one, and more difficult to illustrate, is the case where we know the size a priori, at the moment that we write the program, but that size can still change. This would typically be if we wanted to create a program which stores statistics for, for example, the number of "cantons" (TN: "districts" or "states") in a country, it's something we know. At the beginning we have 26 cantons (ed. in Switzerland), and maybe, one day, another canton will be created, or two cantons will merge, so the number would decrease. It's a limit case, and I confess that, indeed, this case is much more rare. Two complementary remarks, to conclude with arrays. First note that if we know how to do arrays with a non-fixed size, and which can vary, then we can obviously create all the other arrays with this one, that's a first remark. Then, second observation : practically no programming language offers the four variants explicitely. For example, in C++, we only have two kinds of arrays, we have dynamic arrays, represented by the vector type, and static, or fixed-size ones, which are, since C++ 2011, represented by the array type, or else a former type which is inherited from the ancestor of C++, which we call C, so the "C-type" arrays.