Welcome to the second video in the introduction to Map/Reduce. The first video motivated our need for Map/Reduce, and in this video we will look at the framework itself. Using the Map/Reduce framework helps us process big data, but we have to adopt Map/Reduce's requirements. In other words, Map/Reduce expects us to write certain kinds of functions in exchange for taking care of the logistics.

The first requirement is that all of our data is placed into key-value pairs. You can think of the key-value pair as our basic unit of data and our unit of analysis. The other requirement is that the user has to specify mapper and reducer functions. The mapper is the function that is applied to the data, and the reducer is the function that is applied to the intermediate results that come from Hadoop. Hadoop handles all the logistics: executing the map and reduce functions in parallel, producing the intermediate results, and communicating those results to the reducers. In some sense, Hadoop is like a card dealer: it distributes the map functions to the data. Perhaps its most distinguishing feature is that Hadoop shuffles and groups the data according to the key, so that all pairs with the same key are grouped together and passed to the same reducer.

Let's draw out the flow diagram of the Map/Reduce framework. The user defines a map function that reads in the data and outputs key-value pairs. Note: when I write key-value pairs in text, I often put angle brackets (the less-than and greater-than signs) around them just to set off the text, but those are not part of your output. The user also defines a reduce function. The reduce function is built to read in the key-value pairs and output a result. Hadoop takes care of the logistics. Let's see what that means. It takes the map function and applies it wherever the data is sitting, and you'll notice that the map function is replicated and distributed. Hadoop then takes the map output and shuffles and groups the data according to the key to produce the intermediate results. Finally, Hadoop replicates and distributes the reduce function to process those intermediate results. And it looks something like this.

In our next video, we will work through the word count example to see an instantiation of this workflow in some detail.
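To make the map, shuffle/group, reduce flow concrete before then, here is a minimal sketch in Python. It is only a local simulation under stated assumptions, not Hadoop itself: the `mapper`, `reducer`, and `simulate_mapreduce` names and the comma-separated record format are illustrative choices, and in a real job the user would supply only the mapper and reducer while Hadoop handled the distribution, shuffling, and grouping.

```python
# A minimal sketch of the map -> shuffle/group -> reduce flow described above.
# Everything here runs locally; in a real Hadoop job, Hadoop (not this script)
# replicates the functions, shuffles the pairs, and groups them by key.
from itertools import groupby
from operator import itemgetter

def mapper(record):
    """User-defined: read one input record, emit (key, value) pairs."""
    key, value = record.split(",", 1)   # assumes comma-separated records
    yield (key, value)

def reducer(key, values):
    """User-defined: read one key and all of its grouped values, emit a result."""
    return (key, len(list(values)))     # e.g., count how many values share the key

def simulate_mapreduce(records):
    # Map phase: apply the mapper to every record.
    pairs = [pair for record in records for pair in mapper(record)]
    # Shuffle/group: Hadoop's job -- bring all pairs with the same key together.
    pairs.sort(key=itemgetter(0))
    grouped = groupby(pairs, key=itemgetter(0))
    # Reduce phase: apply the reducer to each key and its grouped values.
    return [reducer(key, (value for _, value in group)) for key, group in grouped]

print(simulate_mapreduce(["a,x", "b,y", "a,z"]))   # -> [('a', 2), ('b', 1)]
```

Notice that the shuffle/group step is the only part that looks at the keys; the mapper and reducer stay independent of each other, which is what lets Hadoop replicate and distribute them freely.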