Hello and welcome. In this video,

we'll talk about how to use Python for machine learning. So let's get started.

Python is a popular and powerful general-purpose programming language

that has emerged as the preferred language among data scientists.

You can write your machine-learning algorithms using Python and it works very well.

However, there are a lot of modules and libraries already implemented in Python

that can make your life much easier.

We'll introduce these Python packages in

this course and use them in the labs to give you better hands-on experience.

The first package is NumPy, which is

a math library for working with N-dimensional arrays in Python.

It enables you to do computation efficiently and effectively.

For numerical work, it is usually much faster than plain Python lists because its operations run on whole arrays at once.

For example, you need to know NumPy for working with arrays,

data types and images.
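To make that a bit more concrete, here is a minimal sketch of NumPy working on a whole array at once. The array values are made up purely for illustration.

```python
import numpy as np

# A 2-D array (2 rows, 3 columns) built from nested Python lists.
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Arithmetic applies element-wise to the whole array, with no explicit loop.
doubled = a * 2

# Reductions can run along an axis: axis=0 gives one mean per column.
col_means = a.mean(axis=0)

print(a.shape)
print(doubled)
print(col_means)
```

The same computation with nested Python lists would need explicit loops over rows and columns.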

SciPy is a collection of numerical algorithms and domain-specific toolboxes,

including signal processing, optimization,

statistics and much more.

SciPy is a good library for scientific and high-performance computation.
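As a small illustration of the optimization toolbox, the sketch below asks SciPy to find the minimum of a one-variable function. The function `f` is a hypothetical example chosen so the answer is easy to check by hand (the minimum is at x = 2).

```python
from scipy import optimize


def f(x):
    # A simple parabola with its minimum value 1.0 at x = 2.0.
    return (x - 2.0) ** 2 + 1.0


# minimize_scalar searches for the x that minimizes f.
result = optimize.minimize_scalar(f)

print(result.x)    # location of the minimum
print(result.fun)  # value of f at the minimum
```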

Matplotlib is a very popular plotting package that provides 2D plotting,

as well as 3D plotting.
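Here is a rough sketch of a basic 2D plot with Matplotlib. It assumes a non-interactive setup, so it selects the "Agg" backend and writes the figure to a temporary file; the file name `sine_wave.png` is just an illustrative choice.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# 100 evenly spaced points between 0 and 2*pi, and their sine values.
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

# Create a figure with one set of axes and draw the curve on it.
fig, ax = plt.subplots()
ax.plot(x, y, label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.legend()

# Save the figure as a PNG in the system's temporary directory.
out_path = os.path.join(tempfile.gettempdir(), "sine_wave.png")
fig.savefig(out_path)
```

In a notebook you would normally skip the backend line and the save step and just let the figure display inline.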

Basic knowledge of these three packages, which are built on top of Python,

is a good asset for data scientists who want to work on real-world problems.

If you're not familiar with these packages,

I recommend that you take the Data Analysis with Python course first.

That course covers most of the useful topics in these packages.

The pandas library is a high-level Python library

that provides high-performance, easy-to-use data structures.

It has many functions for data importing, manipulation and analysis.

In particular, it offers data structures and

operations for manipulating numerical tables and time series.
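As a quick sketch of what that means in practice, the example below builds a small table indexed by date, filters its rows, and averages it over two-day windows. The column names, dates and values are all made up for illustration.

```python
import pandas as pd

# A small numerical table whose index is a daily time series.
df = pd.DataFrame(
    {
        "temperature": [20.5, 21.0, 19.5, 22.0],
        "humidity": [30, 35, 40, 45],
    },
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)

# Table-style manipulation: keep only the rows warmer than 20 degrees.
warm = df[df["temperature"] > 20.0]

# Time-series manipulation: average the temperature over 2-day windows.
two_day_mean = df["temperature"].resample("2D").mean()

print(warm)
print(two_day_mean)
```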

Scikit-learn is a collection of algorithms and tools for

machine learning, which is our focus here

and which you'll learn to use within this course.

As we'll be using scikit-learn quite a bit in the labs,

let me explain more about it and show you why it is so popular among data scientists.

Scikit-learn is a free machine learning library for the Python programming language.

It includes most of the common classification,

regression and clustering algorithms,

and it's designed to work with

the Python numerical and scientific libraries NumPy and SciPy.

Also, it includes very good documentation.

On top of that,

implementing machine learning models with scikit-learn

takes only a few lines of Python code.

Most of the tasks that need to be done in a machine learning pipeline are

already implemented in scikit-learn, including pre-processing of data,

feature selection, feature extraction, train/test splitting,

defining the algorithms, fitting models,

tuning parameters, prediction, evaluation and exporting the model.

Let me show you an example of what scikit-learn looks like when you use it.

You don't have to understand the code for now but just see

how easily you can build a model with just a few lines of code.

Many machine learning algorithms benefit from standardization of the dataset.

If your dataset has outliers or fields on different scales,

you have to fix them.

The pre-processing package of scikit-learn provides several common utility functions and

transformer classes to change

raw feature vectors into a form that is more suitable for modeling.
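As a minimal sketch of this step, the example below uses `StandardScaler`, one of the transformer classes in `sklearn.preprocessing`, on a tiny made-up feature matrix whose two columns are on very different scales.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative raw features: the second column is 100x larger than the first.
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

# fit() learns each column's mean and standard deviation;
# transform() rescales each column to zero mean and unit variance.
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

print(X_scaled)
```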

You have to split your dataset into train and test sets to

train your model and then test the model's accuracy separately.

Scikit-learn can split arrays or matrices into

random train and test subsets for you in one line of code.
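That one line is a call to `train_test_split`. Here is a small sketch on made-up data; the 20% test size and the fixed `random_state` are illustrative choices, not values from this course.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative data: 10 samples with 2 features each, and one label per sample.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# Hold out 20% of the samples for testing; random_state makes the
# shuffle reproducible from run to run.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape)
```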

Then you can set up your algorithm.

For example, you can build a classifier using a support vector classification algorithm.

We call our estimator instance clf and initialize its parameters.
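Setting up the estimator might look like the sketch below. The `gamma` and `C` values are only illustrative; in practice you would tune them for your own data.

```python
from sklearn import svm

# Create a support vector classifier; nothing is trained yet.
# gamma and C are example hyperparameter values, not recommendations.
clf = svm.SVC(gamma=0.001, C=100.0)

print(clf)
```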

Now you can train your model on the training set.

By passing the training set to the fit method,

the clf model learns to classify unknown cases.

Then we can use our test set to run predictions,

and the result tells us the predicted class of each unknown case.
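Training and prediction together can be sketched as follows. This assumes the small iris dataset that ships with scikit-learn and a classifier with default settings, neither of which is necessarily what the labs use.

```python
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load a small built-in dataset (150 flowers, 4 features, 3 classes).
X, y = load_iris(return_X_y=True)

# Keep 20% of the samples aside as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Train the classifier on the training set only.
clf = svm.SVC()
clf.fit(X_train, y_train)

# Predict a class for each test sample the model has not seen.
y_pred = clf.predict(X_test)

print(y_pred)
```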

You can also use different metrics to evaluate your model's accuracy,

for example, a confusion matrix to show the results.
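As a small sketch of that metric, the example below builds a confusion matrix from two hand-written label lists. The labels are made up, so you can check each count by eye.

```python
from sklearn.metrics import confusion_matrix

# Illustrative true classes and the classes a model predicted for them.
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]

# Rows are true classes and columns are predicted classes,
# in the order given by labels.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])

print(cm)
```

Entries on the diagonal count correct predictions; everything off the diagonal is a misclassification.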

And finally, you save your model.
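One common way to save a model is Python's built-in `pickle` module, sketched below. This is just one option, assumed here for illustration, and the iris dataset again stands in for your own data.

```python
import pickle

from sklearn import svm
from sklearn.datasets import load_iris

# Train a small model so there is something to save.
X, y = load_iris(return_X_y=True)
clf = svm.SVC().fit(X, y)

# Serialize the trained model to bytes; these could be written to a file.
saved = pickle.dumps(clf)

# Later, restore the model and use it without retraining.
restored = pickle.loads(saved)

print(restored.predict(X[:5]))
```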

You may find all or some of these machine-learning terms confusing but don't worry,

we'll talk about all of these topics in the following videos.

The most important point to remember is that the entire process of

a machine learning task can be done in just a few lines of code using scikit-learn.

Please note that, although it is possible,

it would not be that easy to do all of this using only the NumPy or SciPy packages.

And of course, it takes much more code if you use

pure Python to implement all of these tasks.

Thanks for watching.