Welcome back.
In this video, we're going to look at the role Python plays in data science and
analytics.
Python is an open-source, general-purpose programming language that can be used for
everything from building web applications and
enterprise programs to analyzing large amounts of data.
Python is popular because it is free to use,
emphasizes code readability, and is easy for newcomers to learn.
It is also popular because of its user-friendliness and
its ability to integrate with a variety of programs, tools, and websites.
Python has become one of the most popular languages for data management and
analysis.
Today, Python is widely used by startups and
tech companies to embed analytics into their products, and
by data scientists to quickly manage and analyze large amounts of data.
Python has a collection of tools, often called the Python data analytics stack,
that covers each step of the analytics workflow.
These tools are packaged as Python libraries,
which are collections of reusable code that you can import into your own programs.
While the specific libraries in use may change over time,
here are a few examples of libraries commonly used for data science.
Pandas, for importing and assessing data, including outlier analysis,
data cleansing, and summary statistics.
NumPy and SciPy, for performing very fast matrix,
mathematical, and scientific operations.
Statsmodels, for fitting a wide range of statistical models to data.
Scikit-learn, for applying machine learning techniques like clustering,
dimensionality reduction, random forests, and logistic regression.
Matplotlib, Seaborn, and Bokeh, for producing attractive visualizations.
And finally, Apache Spark, used from Python through its PySpark API, for
processing data on a massive scale across a cluster of computers.
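To make a few of these roles concrete, here is a minimal sketch using pandas and NumPy on a small, made-up sales table. It assumes both libraries are installed, and the data, column names, and the choice of the interquartile range (IQR) rule for flagging outliers are illustrative assumptions, not something prescribed here. It shows data cleansing, summary statistics, outlier analysis, and a vectorized NumPy matrix operation.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: daily unit sales with one missing value and one suspicious spike.
sales = pd.DataFrame({
    "day": ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat"],
    "units": [12.0, 15.0, np.nan, 14.0, 13.0, 90.0],
})

# Data cleansing: replace the missing value with the median of the observed values.
sales["units"] = sales["units"].fillna(sales["units"].median())

# Summary statistics (count, mean, std, min, quartiles, max) for the cleaned column.
summary = sales["units"].describe()

# Outlier analysis using the IQR rule (an assumed, commonly used convention).
q1, q3 = sales["units"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (sales["units"] < q1 - 1.5 * iqr) | (sales["units"] > q3 + 1.5 * iqr)
outliers = sales.loc[is_outlier, "day"].tolist()

# NumPy: center the values, then build a 6x6 outer-product matrix without Python loops.
units = sales["units"].to_numpy()
centered = units - units.mean()
gram = np.outer(centered, centered)

print(summary)
print("Outlier days:", outliers)
```

With this data, the spike on Saturday is the only value flagged by the IQR rule. The same calls work unchanged on much larger tables, which is where the speed of these libraries matters.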
Beyond structured, numeric data,
Python also has libraries that can handle unstructured data like text and images.
For example, NLTK, spaCy, and Gensim process text data.
OpenCV manipulates and analyzes images.
BeautifulSoup and Scrapy make web scraping easy and intuitive.
Python can also work with deep learning frameworks like Caffe
on powerful GPU-enabled machines for
cutting-edge machine learning on images, sound, and text.
And Flask and Django can be used to build websites and web services that embed machine
learning models and make them accessible over the Internet.