Course Information
4.6
1,479 ratings
264 reviews
Specialization

Course 1 of 1 in

100% online

Start instantly and learn at your own schedule.

Flexible deadlines

Reset deadlines in accordance to your schedule.
Hours to complete

Approx. 21 hours to complete

Suggested: 6 weeks of study, 5-8 hours/week...

Available languages

English

Subtitles: English, Arabic...

Skills you will gain

Data Clustering Algorithms, K-Means Clustering, Machine Learning, K-D Tree

Syllabus - What you will learn from this course

Week 1
Hours to complete
1 hour to complete

Welcome

Clustering and retrieval are some of the most high-impact machine learning tools out there. Retrieval is used in almost every application and device we interact with, like providing a set of products related to one a shopper is currently considering, or a list of people you might want to connect with on a social media platform. Clustering can be used to aid retrieval, but is a more broadly useful tool for automatically discovering structure in data, like uncovering groups of similar patients. This introduction to the course provides you with an overview of the topics we will cover and the background knowledge and resources we assume you have....
4 videos (Total 25 min), 4 readings
Video: 4 videos
Course overview 3 min
Module-by-module topics covered 8 min
Assumed background 6 min
Reading: 4 readings
Important Update regarding the Machine Learning Specialization 10 min
Slides presented in this module 10 min
Software tools you'll need for this course 10 min
A big week ahead! 10 min
Week 2
Hours to complete
4 hours to complete

Nearest Neighbor Search

We start the course by considering a retrieval task of fetching a document similar to one someone is currently reading. We cast this problem as one of nearest neighbor search, which is a concept we have seen in the Foundations and Regression courses. However, here, you will take a deep dive into two critical components of the algorithms: the data representation and metric for measuring similarity between pairs of datapoints. You will examine the computational burden of the naive nearest neighbor search algorithm, and instead implement scalable alternatives using KD-trees for handling large datasets and locality sensitive hashing (LSH) for providing approximate nearest neighbors, even in high-dimensional spaces. You will explore all of these ideas on a Wikipedia dataset, comparing and contrasting the impact of the various choices you can make on the nearest neighbor results produced....
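The locality sensitive hashing idea covered in this module — using random lines (hyperplanes) to partition points into bins, then searching only the query's bin — can be sketched as follows. This is an illustrative toy in NumPy, not the course's assignment code; all function and variable names here are made up:

```python
import numpy as np

def lsh_bins(data, n_planes, seed=0):
    """Hash each row of `data` to a bin via random hyperplanes (cosine LSH)."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((data.shape[1], n_planes))
    # A point's bin is the bit pattern of which side of each plane it lies on.
    bits = (data @ planes) >= 0
    powers = 1 << np.arange(n_planes)
    return bits @ powers, planes

def query_bin(point, planes):
    """Bin index for a single query point under the same planes."""
    bits = (point @ planes) >= 0
    return int(bits @ (1 << np.arange(planes.shape[1])))

# Candidate neighbors are only the points sharing the query's bin, so search
# cost drops from O(N) distance computations per query to roughly one bin's worth.
data = np.array([[1.0, 0.1], [0.9, 0.2], [-1.0, 0.3], [-0.8, -0.1]])
bins, planes = lsh_bins(data, n_planes=2)
candidates = np.where(bins == query_bin(np.array([1.0, 0.0]), planes))[0]
```

Because nearby points can still land in different bins, practical LSH also searches neighboring bins (flipping one bit at a time) or uses multiple hash tables, as the optional videos below discuss.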
22 videos (Total 137 min), 4 readings, 5 quizzes
Video: 22 videos
1-NN algorithm 2 min
k-NN algorithm 6 min
Document representation 5 min
Distance metrics: Euclidean and scaled Euclidean 6 min
Writing (scaled) Euclidean distance using (weighted) inner products 4 min
Distance metrics: Cosine similarity 9 min
To normalize or not and other distance considerations 6 min
Complexity of brute force search 1 min
KD-tree representation 9 min
NN search with KD-trees 7 min
Complexity of NN search with KD-trees 5 min
Visualizing scaling behavior of KD-trees 4 min
Approximate k-NN search using KD-trees 7 min
Limitations of KD-trees 3 min
LSH as an alternative to KD-trees 4 min
Using random lines to partition points 5 min
Defining more bins 3 min
Searching neighboring bins 8 min
LSH in higher dimensions 4 min
(OPTIONAL) Improving efficiency through multiple tables 22 min
A brief recap 2 min
Reading: 4 readings
Slides presented in this module 10 min
Choosing features and metrics for nearest neighbor search 10 min
(OPTIONAL) A worked-out example for KD-trees 10 min
Implementing Locality Sensitive Hashing from scratch 10 min
Quiz: 5 practice exercises
Representations and metrics 12 min
Choosing features and metrics for nearest neighbor search 10 min
KD-trees 10 min
Locality Sensitive Hashing 10 min
Implementing Locality Sensitive Hashing from scratch 10 min
Week 3
Hours to complete
2 hours to complete

Clustering with k-means

In clustering, our goal is to group the datapoints in our dataset into disjoint sets. Motivated by our document analysis case study, you will use clustering to discover thematic groups of articles by "topic". These topics are not provided in this unsupervised learning task; rather, the idea is to output such cluster labels that can be post-facto associated with known topics like "Science", "World News", etc. Even without such post-facto labels, you will examine how the clustering output can provide insights into the relationships between datapoints in the dataset. The first clustering algorithm you will implement is k-means, which is the most widely used clustering algorithm out there. To scale up k-means, you will learn about the general MapReduce framework for parallelizing and distributing computations, and then how the iterates of k-means can utilize this framework. You will show that k-means can provide an interpretable grouping of Wikipedia articles when appropriately tuned....
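The k-means iterations described here — alternating hard assignments and centroid updates (Lloyd's algorithm) — can be sketched in a few lines of NumPy. This is a minimal illustration with made-up toy data, not the scalable MapReduce version the module builds toward:

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain k-means: alternate cluster assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Two well-separated toy clusters.
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
labels, centers = kmeans(X, k=2)
```

Both steps parallelize naturally: assignment maps over points, and the centroid update reduces per-cluster sums, which is exactly the structure MapReduce exploits.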
13 videos (Total 79 min), 2 readings, 3 quizzes
Video: 13 videos
An unsupervised task 6 min
Hope for unsupervised learning, and some challenge cases 4 min
The k-means algorithm 7 min
k-means as coordinate descent 6 min
Smart initialization via k-means++ 4 min
Assessing the quality and choosing the number of clusters 9 min
Motivating MapReduce 8 min
The general MapReduce abstraction 5 min
MapReduce execution overview and combiners 6 min
MapReduce for k-means 7 min
Other applications of clustering 7 min
A brief recap 1 min
Reading: 2 readings
Slides presented in this module 10 min
Clustering text data with k-means 10 min
Quiz: 3 practice exercises
k-means 18 min
Clustering text data with K-means 16 min
MapReduce for k-means 10 min
Week 4
Hours to complete
3 hours to complete

Mixture Models

In k-means, observations are each hard-assigned to a single cluster, and these assignments are based just on the cluster centers, rather than also incorporating shape information. In our second module on clustering, you will perform probabilistic model-based clustering that provides (1) a more descriptive notion of a "cluster" and (2) accounts for uncertainty in assignments of datapoints to clusters via "soft assignments". You will explore and implement a broadly useful algorithm called expectation maximization (EM) for inferring these soft assignments, as well as the model parameters. To gain intuition, you will first consider a visually appealing image clustering task. You will then cluster Wikipedia articles, handling the high-dimensionality of the tf-idf document representation considered....
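The EM iterations sketched in this module — soft assignments (responsibilities) in the E-step, responsibility-weighted parameter re-estimates in the M-step — can be illustrated for a two-component Gaussian mixture in one dimension. This is a toy sketch with invented names and synthetic data, not the course's implementation:

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture."""
    mu = np.array([x.min(), x.max()])      # crude initialization
    var = np.array([x.var(), x.var()])
    weights = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] proportional to weight_k * N(x_i | mu_k, var_k).
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = weights * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: maximum-likelihood updates weighted by the soft assignments.
        nk = r.sum(axis=0)
        weights = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return weights, mu, var

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3, 0.5, 200), rng.normal(3, 0.5, 200)])
weights, mu, var = em_gmm_1d(x)
```

If the responsibilities were forced to 0/1 and all variances tied and shrunk, these updates reduce to the k-means steps from the previous module, which is the relationship the "Relationship to k-means" video makes precise.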
15 videos (Total 91 min), 4 readings, 3 quizzes
Video: 15 videos
Aggregating over unknown classes in an image dataset 6 min
Univariate Gaussian distributions 2 min
Bivariate and multivariate Gaussians 7 min
Mixture of Gaussians 6 min
Interpreting the mixture of Gaussian terms 5 min
Scaling mixtures of Gaussians for document clustering 5 min
Computing soft assignments from known cluster parameters 7 min
(OPTIONAL) Responsibilities as Bayes' rule 5 min
Estimating cluster parameters from known cluster assignments 6 min
Estimating cluster parameters from soft assignments 8 min
EM iterates in equations and pictures 6 min
Convergence, initialization, and overfitting of EM 9 min
Relationship to k-means 3 min
A brief recap 1 min
Reading: 4 readings
Slides presented in this module 10 min
(OPTIONAL) A worked-out example for EM 10 min
Implementing EM for Gaussian mixtures 10 min
Clustering text data with Gaussian mixtures 10 min
Quiz: 3 practice exercises
EM for Gaussian mixtures 18 min
Implementing EM for Gaussian mixtures 12 min
Clustering text data with Gaussian mixtures 8 min
4.6
264 reviews
Career direction

32%

started a new career after completing these courses
Job benefit

83%

got a tangible career benefit from this course

Top Reviews

By JM, Jan 17th 2017

Excellent course, well thought out lectures and problem sets. The programming assignments offer an appropriate amount of guidance that allows the students to work through the material on their own.

By AG, Sep 25th 2017

Nice course with all the practical stuffs and nice analysis about each topic but practical part of LDA was restricted for GraphLab users only which is a weak fallback and rest everything is fine.

Instructors


Emily Fox

Amazon Professor of Machine Learning
Statistics

Carlos Guestrin

Amazon Professor of Machine Learning
Computer Science and Engineering

About University of Washington

Founded in 1861, the University of Washington is one of the oldest state-supported institutions of higher education on the West Coast and is one of the preeminent research universities in the world....

About the Machine Learning Specialization

This Specialization from leading researchers at the University of Washington introduces you to the exciting, high-demand field of Machine Learning. Through a series of practical case studies, you will gain applied experience in major areas of Machine Learning including Prediction, Classification, Clustering, and Information Retrieval. You will learn to analyze large and complex datasets, create systems that adapt and improve over time, and build intelligent applications that can make predictions from data....
Machine Learning

Frequently Asked Questions

  • Once you enroll for a Certificate, you'll have access to all videos, quizzes, and programming assignments (if applicable). Peer review assignments can only be submitted and reviewed once your session has begun. If you choose to explore the course without purchasing, you may not be able to access certain assignments.

  • When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page, from which you can print your Certificate or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.

Have more questions? Visit the Learner Help Center.