Predicting Wine Quality with Random Forest and Scikit-Learn

Coursera Community Project Network
In this Guided Project, you will:

Perform Exploratory Data Analysis.

Apply a Random Forest Classifier.

Analyze Random Forest Importances.

2.5 hours
English

In real life we face many classification problems: deciding whether an email is spam, whether a credit card transaction is fraudulent, or which label a mobile phone should assign to the image in its camera (a flower, a dog, a person or something else). Fortunately, machine learning techniques can help with these problems.

In this guided project, we will tackle the problem of predicting red wine quality using a Random Forest Classifier. Specifically, we will implement it in Python with the classifier provided by the Scikit-Learn package. You will learn to train the classifier, calibrate it, tune its hyperparameters and evaluate the accuracy of its predictions. You will also learn how to perform cluster analysis to handle collinearity and reduce the number of predictors without sacrificing model accuracy. In addition, you will draw various graphs to help you interpret the results.

This project is intended for beginners. The prerequisites are basic knowledge of Python, Pandas, Numpy, Matplotlib, Seaborn, Scikit-Learn, Scipy and Random Forest algorithms.

Note: This course runs in the virtual browser of Rhyme, Coursera's hands-on project platform. With this browser you will connect to Google Colaboratory to write and execute Python code in a Jupyter Notebook, without worrying about installing software. All you need is a Google account. This Guided Project was created by a Coursera community member.
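As a rough illustration of the train, evaluate and interpret workflow described above, the following minimal sketch fits a Random Forest Classifier with Scikit-Learn and ranks its feature importances. It is not the project's own notebook: the data are synthetic (generated with `make_classification` as a stand-in for the red wine table), and the feature names, split ratio and `n_estimators` value are assumptions chosen for the example.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumption: synthetic data stands in for the red wine quality table.
X_raw, y = make_classification(
    n_samples=600, n_features=8, n_informative=4, n_redundant=2,
    random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X_raw.shape[1])]  # hypothetical names
X = pd.DataFrame(X_raw, columns=feature_names)

# Hold out a stratified test set so class proportions match in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Train the classifier on the training part only.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

# Evaluate the accuracy of the predictions on unseen data.
test_accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Test accuracy: {test_accuracy:.3f}")

# Impurity-based importances are normalized to sum to 1; larger means the
# feature contributed more to the splits in the forest.
importances = pd.Series(
    clf.feature_importances_, index=feature_names
).sort_values(ascending=False)
print(importances)
```

With the real dataset, `X` and `y` would instead come from the wine table's predictor columns and its quality label.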


Machine Learning, Exploratory Data Analysis, Clustering Analysis



  1. Getting Started

  2. Defining Problem, Importing Libraries and Downloading Data

  3. Cleaning Data

  4. Performing Exploratory Data Analysis (part 1)

  5. Performing Exploratory Data Analysis (part 2)

  6. Generating Training, Validation and Testing Datasets

  7. Creating a Data Visualizer

  8. Applying a Random Forest Classifier

  9. Analyzing Random Forest Importances

  10. Clustering Analysis

  11. Performing Hyperparameter Tuning
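
Tasks 6, 10 and 11 in the outline above can be sketched roughly as follows. This is one possible approach, not necessarily the one taught in the project: it assumes synthetic data, groups collinear predictors by hierarchical clustering on Spearman rank correlations, keeps one predictor per cluster, and then tunes a small, arbitrary hyperparameter grid with `GridSearchCV`. The distance threshold `0.5`, the grid values and the split ratios are all illustrative assumptions.

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumption: synthetic data with deliberately redundant (collinear) columns.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=4, n_redundant=4,
    random_state=0,
)

# Task 6: split into training (60%), validation (20%) and testing (20%) sets.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, stratify=y_trainval, random_state=0
)

# Task 10: cluster predictors by rank correlation computed on training data.
corr = spearmanr(X_train).correlation
corr = (corr + corr.T) / 2          # enforce exact symmetry
np.fill_diagonal(corr, 1.0)
distance = 1.0 - np.abs(corr)       # strongly correlated -> small distance
linkage = hierarchy.ward(squareform(distance))
cluster_ids = hierarchy.fcluster(linkage, t=0.5, criterion="distance")

# Keep the first predictor found in each cluster as its representative.
_, first_index = np.unique(cluster_ids, return_index=True)
selected = np.sort(first_index)
print(f"Kept {len(selected)} of {X.shape[1]} predictors: {selected}")

# Task 11: tune hyperparameters with cross-validation on the training set.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=3, scoring="accuracy"
)
search.fit(X_train[:, selected], y_train)

# Check the tuned model on the validation set; the test set stays untouched
# until a final model has been chosen.
val_accuracy = search.score(X_val[:, selected], y_val)
print(f"Best parameters: {search.best_params_}")
print(f"Validation accuracy: {val_accuracy:.3f}")
```

Comparing the validation accuracy with and without the predictor reduction is a simple way to check whether dropping collinear columns costs any accuracy.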





