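Course information
186,979 recent views

Course 1 of 1

100% online

Start instantly and learn at your own schedule.

Flexible deadlines

Reset deadlines in accordance with your schedule.

Advanced level

Approx. 48 hours to complete

Suggested: 6-10 hours/week...

English

Subtitles: English, Korean

Skills you will gain

Data Analysis, Feature Extraction, Feature Engineering, XGBoost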

Syllabus - what you will learn from this course

Week 1
6 hours to complete

Introduction & Recap

This week we will introduce you to competitive data science. You will learn about the mechanics of competitions, the difference between competitions and real-life data science, and the hardware and software people usually use in competitions. We will also briefly recap the major ML models frequently used in competitions.

8 videos (Total 46 min), 7 readings, 6 quizzes
8 videos
Competition Mechanics (6 min)
Kaggle Overview [screencast] (7 min)
Real World Application vs Competitions (5 min)
Recap of main ML algorithms (9 min)
Software/Hardware Requirements (5 min)
7 readings
Welcome! (10 min)
Week 1 overview (10 min)
Disclaimer (10 min)
Explanation for quiz questions (10 min)
Additional Materials and Links (10 min)
Explanation for quiz questions (10 min)
Additional Material and Links (10 min)
5 practice exercises
Practice Quiz (8 min)
Recap (8 min)
Recap (12 min)
Software/Hardware (6 min)
Graded Soft/Hard Quiz (8 min)
2 hours to complete

Feature Preprocessing and Generation with Respect to Models

In this module we will summarize approaches to working with features: preprocessing, generation and extraction. We will see that the choice of machine learning model affects both the preprocessing we apply to the features and our approach to generating new ones. We will also discuss feature extraction from text with Bag of Words and Word2vec, and feature extraction from images with Convolutional Neural Networks.
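As a rough illustration of the Bag of Words idea, here is a minimal sketch in plain Python that turns each text into a vector of token counts over a shared vocabulary. The helper name `bag_of_words` and the tokenizer (lowercasing plus whitespace splitting) are assumptions for illustration, not course code; in practice a library class such as scikit-learn's `CountVectorizer` handles tokenization and sparse storage more robustly.

```python
from collections import Counter


def bag_of_words(texts):
    """Hypothetical helper: return (vocabulary, count vectors) for a list of documents.

    Assumes a naive tokenizer: lowercase, then split on whitespace.
    """
    tokenized = [text.lower().split() for text in texts]
    # Sorted vocabulary so that the column order is deterministic
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    index = {token: i for i, token in enumerate(vocabulary)}
    vectors = []
    for tokens in tokenized:
        row = [0] * len(vocabulary)
        for token, count in Counter(tokens).items():
            row[index[token]] = count
        vectors.append(row)
    return vocabulary, vectors


vocab, X = bag_of_words(["the cat sat", "the cat saw the dog"])
print(vocab)  # ['cat', 'dog', 'sat', 'saw', 'the']
print(X)      # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

Word order is discarded, which is exactly why such count features are usually combined with weighting schemes or embeddings like Word2vec.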

7 videos (Total 73 min), 4 readings, 4 quizzes
7 videos
Overview (6 min)
Datetime and coordinates (8 min)
Handling missing values (10 min)
Bag of words (10 min)
Word2vec, CNN (13 min)
4 readings
Explanation for quiz questions (10 min)
Additional Material and Links (10 min)
Explanation for quiz questions (10 min)
Additional Material and Links (10 min)
4 practice exercises
Feature preprocessing and generation with respect to models (8 min)
Feature preprocessing and generation with respect to models (8 min)
Feature extraction from text and images (8 min)
Feature extraction from text and images (8 min)
1 hour to complete

Final Project Description

This is just a reminder that it is better to start the final project in this course soon! The final project is in fact a competition; in this module you can find information about it.

1 video (Total 4 min), 2 readings
1 video
2 readings
Final project (10 min)
Final project advice #1 (10 min)
Week 2
2 hours to complete

Exploratory Data Analysis

We will start this week with Exploratory Data Analysis (EDA). It is a very broad and exciting topic and an essential component of the solving process. Besides the regular videos, you will find a walkthrough of the EDA process for the Springleaf competition data and an example of a prolific EDA for the Numerai competition with extraordinary findings.

8 videos (Total 80 min), 2 readings, 1 quiz
8 videos
Visualizations (11 min)
Dataset cleaning and other things to check (7 min)
Springleaf competition EDA I (8 min)
Springleaf competition EDA II (16 min)
Numerai competition EDA (6 min)
2 readings
Week 2 overview (10 min)
Additional material and links (10 min)
1 practice exercise
Exploratory data analysis (12 min)
2 hours to complete

Validation

In this module we will discuss various validation strategies. We will see that the strategy we choose depends on the competition setup, and that a correct validation scheme is one of the building blocks of any winning solution.
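To make "the strategy depends on the competition setup" concrete, here is a minimal sketch of two common schemes: a shuffled K-fold split, suitable when rows look independent and the test set was sampled randomly, and a time-based holdout, suitable when the test set lies in the future. The function names and defaults are hypothetical; libraries such as scikit-learn offer `KFold` and `TimeSeriesSplit` for the same purpose.

```python
import random


def kfold_indices(n_samples, n_folds=5, shuffle=True, seed=0):
    """Yield (train_idx, valid_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so fold sizes differ by at most one
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        valid_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, valid_idx
        start += size


def time_holdout_indices(n_samples, valid_fraction=0.2):
    """Hold out the most recent rows, assuming rows are already sorted by time."""
    n_valid = round(n_samples * valid_fraction)
    split = n_samples - n_valid
    return list(range(split)), list(range(split, n_samples))


folds = list(kfold_indices(10, n_folds=3))
print([len(valid) for _, valid in folds])  # [4, 3, 3]
train, valid = time_holdout_indices(10)
print(valid)  # [8, 9]
```

The time-based holdout never trains on rows that come after the validation period, which mimics a test set drawn from the future.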

4 videos (Total 51 min), 3 readings, 2 quizzes
4 videos
Problems occurring during validation (20 min)
3 readings
Validation strategies (10 min)
Comments on quiz (10 min)
Additional material and links (10 min)
2 practice exercises
Validation (8 min)
Validation (8 min)
5 hours to complete

Data Leakages

Finally, in this module we will cover something unique to data science competitions: we will see examples of how it is sometimes possible to reach a top position in a competition with very little machine learning, just by exploiting a data leakage.

3 videos (Total 26 min), 3 readings, 3 quizzes
3 readings
Comments on quiz (10 min)
Additional material and links (10 min)
Final project advice #2 (10 min)
1 practice exercise
Data leakages (8 min)
Week 3
3 hours to complete

Metrics Optimization

This week we will first study another component of competitions: the evaluation metrics. We will recap the most prominent ones and then see how we can efficiently optimize the metric given in a competition.
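One fact behind metric optimization that is easy to check numerically: if a model may predict only a single constant, MSE is minimized by the target mean and MAE by the target median, so the metric alone changes what the "best" prediction is. The sketch below uses made-up toy data and hypothetical helper names; it is an illustration, not material from the course.

```python
from statistics import mean, median


def mse(y_true, const):
    """Mean squared error of predicting the same constant for every row."""
    return sum((y - const) ** 2 for y in y_true) / len(y_true)


def mae(y_true, const):
    """Mean absolute error of predicting the same constant for every row."""
    return sum(abs(y - const) for y in y_true) / len(y_true)


y = [1.0, 2.0, 3.0, 10.0]
grid = [i / 10 for i in range(121)]  # candidate constants 0.0, 0.1, ..., 12.0

# The mean is the best constant for MSE; the outlier 10.0 pulls it upwards
print(mean(y), min(grid, key=lambda c: mse(y, c)))  # 4.0 4.0
# The median is the best constant for MAE and is robust to the outlier
print(median(y), mae(y, median(y)))  # 2.5 2.5
```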

8 videos (Total 83 min), 3 readings, 2 quizzes
8 videos
Motivation (8 min)
Classification metrics review (20 min)
General approaches for metrics optimization (6 min)
Regression metrics optimization (10 min)
Classification metrics optimization I (7 min)
Classification metrics optimization II (6 min)
3 readings
Week 3 overview (10 min)
Comments on quiz (10 min)
Additional material and links (10 min)
2 practice exercises
Metrics (12 min)
Metrics (12 min)
4 hours to complete

Advanced Feature Engineering I

In this module we will study a very powerful technique for feature generation. It goes by many names, but here we call it "mean encodings". We will see the intuition behind them and learn how to construct, regularize and extend them.
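To give a feel for construction and regularization, here is a minimal sketch of one common variant: out-of-fold mean (target) encoding with smoothing towards a prior. Each row is encoded using target statistics from the other folds only, so its own target never leaks into its feature. The function names, the `alpha` pseudo-count and the round-robin fold assignment are assumptions for illustration; the course may present other regularization schemes.

```python
from collections import defaultdict


def smoothed_mean(target_sum, count, prior, alpha):
    """Blend a category's target mean with the prior; alpha acts as a pseudo-count."""
    return (target_sum + prior * alpha) / (count + alpha)


def oof_mean_encoding(categories, targets, n_folds=3, alpha=1.0):
    """Encode each row using statistics computed on the other folds (n_folds >= 2)."""
    n = len(categories)
    encoded = [0.0] * n
    for fold in range(n_folds):
        # Round-robin fold assignment (a simplification; shuffle in practice)
        train = [i for i in range(n) if i % n_folds != fold]
        valid = [i for i in range(n) if i % n_folds == fold]
        prior = sum(targets[i] for i in train) / len(train)
        sums, counts = defaultdict(float), defaultdict(int)
        for i in train:
            sums[categories[i]] += targets[i]
            counts[categories[i]] += 1
        for i in valid:
            c = categories[i]
            # Categories unseen in the training folds fall back to the prior
            encoded[i] = smoothed_mean(sums[c], counts[c], prior, alpha)
    return encoded


cats = ["a", "a", "b", "b", "a", "b"]
y = [1, 0, 1, 1, 1, 0]
print(oof_mean_encoding(cats, y))  # [0.5, 0.875, 0.875, 0.5, 0.875, 0.875]
```

Larger `alpha` values pull rare categories more strongly towards the prior, which is the usual way to keep noisy encodings from overfitting.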

3 videos (Total 27 min), 2 readings, 2 quizzes
2 readings
Comments on quiz (10 min)
Final project advice #3 (10 min)
1 practice exercise
Mean encodings (8 min)
Week 4
3 hours to complete

Hyperparameter Optimization

In this module we will talk about the hyperparameter optimization process. We will also have a special video with practical tips and tricks, recorded by four instructors.

6 videos (Total 86 min), 4 readings, 2 quizzes
6 videos
Practical guide (16 min)
KazAnova's competition pipeline, part 1 (18 min)
KazAnova's competition pipeline, part 2 (17 min)
4 readings
Week 4 overview (10 min)
Comments on quiz (10 min)
Additional material and links (10 min)
Additional materials and links (10 min)
2 practice exercises
Practice quiz (6 min)
Graded quiz (8 min)
4 hours to complete

Advanced feature engineering II

In this module we will learn about a few more advanced feature engineering techniques.

4 videos (Total 22 min), 2 readings, 2 quizzes
2 readings
Comments on quiz (10 min)
Additional Materials and Links (10 min)
1 practice exercise
Graded Advanced Features II Quiz (12 min)
10 hours to complete

Ensembling

Nowadays it is hard to find a competition won by a single model! Every winning solution incorporates ensembles of models. In this module we will talk about the main ensembling techniques in general and, of course, about how best to ensemble models in practice.

8 videos (Total 92 min), 4 readings, 4 quizzes
8 videos
Bagging (9 min)
Boosting (16 min)
Stacking (16 min)
StackNet (14 min)
Ensembling Tips and Tricks (14 min)
CatBoost 1 (7 min)
CatBoost 2 (7 min)
4 readings
Validation schemes for 2-nd level models (10 min)
Comments on quiz (10 min)
Additional materials and links (10 min)
Final project advice #4 (10 min)
2 practice exercises
Ensembling (8 min)
Ensembling (12 min)
4.7
135 reviews

17%

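started a new career after completing these courses

25%

got a tangible career benefit from this course

20%

got a pay increase or promotion

Top reviews from How to Win a Data Science Competition: Learn from Top Kagglers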

By MS, Mar 29th 2018

Top Kagglers gently introduce one to Data Science Competitions. One will have a great chance to learn various tips and tricks and apply them in practice throughout the course. Highly recommended!

By MM, Nov 10th 2017

This course is fantastic. It's chock full of practical information that is presented clearly and concisely. I would like to thank the team for sharing their knowledge so generously.

Instructors

Dmitry Ulyanov

Visiting lecturer
HSE Faculty of Computer Science

Alexander Guschin

Visiting lecturer at HSE, Lecturer at MIPT
HSE Faculty of Computer Science

Mikhail Trofimov

Visiting lecturer
HSE Faculty of Computer Science

Dmitry Altukhov

Visiting lecturer
HSE Faculty of Computer Science

Marios Michailidis

Research Data Scientist
H2O.ai

About National Research University Higher School of Economics

National Research University - Higher School of Economics (HSE) is one of the top research universities in Russia. Established in 1992 to promote new research and teaching in economics and related disciplines, it now offers programs at all levels of university education across an extraordinary range of fields of study, including business, sociology, cultural studies, philosophy, political science, international relations, law, Asian studies, media and communications, mathematics, engineering, and more. Learn more on www.hse.ru...

About the Advanced Machine Learning Specialization

This specialization gives an introduction to deep learning, reinforcement learning, natural language understanding, computer vision and Bayesian methods. Top Kaggle machine learning practitioners and CERN scientists will share their experience of solving real-world problems and help you to fill the gaps between theory and practice. Upon completion of 7 courses you will be able to apply modern machine learning methods in enterprise and understand the caveats of real-world data and settings....
Advanced Machine Learning

Frequently asked questions

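  • Once you enroll for a certificate, you will have access to all videos, quizzes, and programming assignments (if applicable). Peer-review assignments can only be submitted and reviewed once your session has begun. If you choose to explore the course without purchasing, you may not be able to access certain assignments.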

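  • When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic certificate will be added to your Accomplishments page, from which you can print it or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.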

More questions? Visit the Learner Help Center.