Welcome to an introduction to Data Science with Python. This is the first of five courses in a larger Python and Data Science Specialization. Each course progressively builds on your knowledge from previous courses to give you a well-rounded view of what data science is, while helping you develop the skills to practice it. The specialization is of intermediate difficulty, and we expect that you have studied some basic programming and statistics in the past. In this specialization, we're focused on teaching applied skills using the Python programming language. There are many other tools that one can use in data science, such as specialized statistical analysis languages like R, or more general-purpose programming languages like Java and C. We chose Python as the basis for this specialization for three reasons. First, it's easy to learn. Python is now the language of choice for introducing university students to programming. It's used in eight out of ten of the top US computer science programs. Python programs tend to have less of the boilerplate that you might have seen in other languages, and have more natural constructs for typical tasks you might need to accomplish. If you have programming experience, but not Python-specific experience, you can pick up Python very quickly. Second, it's full-featured. Python is a very general programming language with a lot of built-in libraries, and it excels at manipulating data, network programming, and databases. It's mature, and there are plenty of resources available, from books to online courses. Finally, Python has a significant set of data science libraries one can use. The base of these is called the SciPy ecosystem, and it even has its own conference series.
Both the interface that we're going to use for doing assignments, called Jupyter Notebooks, and the main libraries for the first two courses, pandas and Matplotlib, are part of the SciPy stack, and provide an excellent basis for moving into machine learning, text mining, and network analysis. This first course is broken into four modules. The first module focuses on getting prerequisites in place and reviews some of the basics of the Python language. Don't worry: if you already have Python down and you want to be challenged, we have some advanced Python in here as well. The advanced Python isn't strictly necessary for the rest of the specialization, but many of the examples you might see on the web, or broader data science topics like big data and real-time analytics, might require knowledge of some of these more specialized features. In the second module we're going to dig into the pandas toolkit. The pandas toolkit is fundamental to Python data science, and provides a data structure for thinking about data in tabular form. This toolkit helps bring functionality that exists in R into the Python world. It's seen significant adoption over the last five years. Much of the thinking behind pandas is similar to relational theory, so if you have a background in databases, you'll find the pandas environment fairly natural to work in. At the same time, some of the more advanced ways to query and manipulate pandas DataFrames, like Boolean masking and hierarchical indexing, are different from what you'd find in databases and require some careful discussion. So we'll discuss these in module three of this course. The final module of the course is dedicated to the course project, where you'll take some datasets, merge and clean them, then process the data and answer some questions. In that week we'll also discuss basic statistical tests and methods, to ensure you have a solid grasp going forward into the next course.
At the same time, the intent is for your course project to be a demonstration of the skills that you've gained in manipulating messy data into something coherent. Before we go into programming fundamentals, though, we'll talk a bit more about what data science is, and why it's sweeping the world.
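
As a brief aside on the Boolean masking and hierarchical indexing mentioned for module three, here is a minimal sketch of what those two ideas can look like in pandas. The table, its column names, and the population figures are all made up for illustration and are not taken from the course materials; the course itself may introduce these features differently.

```python
import pandas as pd

# Hypothetical data, invented purely for illustration (not real figures).
df = pd.DataFrame(
    {
        "state": ["Michigan", "Michigan", "Ohio", "Ohio"],
        "city": ["Ann Arbor", "Detroit", "Columbus", "Toledo"],
        "population": [120_000, 640_000, 900_000, 270_000],
    }
)

# Boolean masking: the comparison yields a True/False Series,
# and indexing with it keeps only the rows marked True.
mask = df["population"] > 500_000
large_cities = df[mask]

# Hierarchical indexing: two columns together form a multi-level row index.
indexed = df.set_index(["state", "city"])

# Select every row under one outer label, then a single (state, city) entry.
ohio = indexed.loc["Ohio"]
detroit_population = indexed.loc[("Michigan", "Detroit"), "population"]
```

If you come from a database background, the mask plays roughly the role of a WHERE clause, while the two-level index has no direct counterpart in a flat relational table, which is part of why these features get their own discussion.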