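In this course, we will learn the basic concepts of data mining and its fundamental methods and applications, then go deeper into a subfield of data mining, pattern discovery, and learn its in-depth concepts, methods, and applications. We will also introduce methods for pattern-based classification and some interesting applications of pattern discovery. This course gives you the opportunity to learn skills and put them into practice: applying scalable pattern discovery methods to massive transactional data, discussing pattern evaluation measures, and studying methods for mining diverse kinds of patterns, sequential patterns, and subgraph patterns.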

A course from University of Illinois at Urbana-Champaign

Pattern Discovery in Data Mining

144 ratings

From this lesson

Module 3

Module 3 consists of two lessons: Lessons 5 and 6. In Lesson 5, we discuss mining sequential patterns. We will learn several popular and efficient sequential pattern mining methods, including an Apriori-based sequential pattern mining method, GSP; a vertical data format-based sequential pattern method, SPADE; and a pattern-growth-based sequential pattern mining method, PrefixSpan. We will also learn how to directly mine closed sequential patterns. In Lesson 6, we will study concepts and methods for mining spatiotemporal and trajectory patterns as one kind of pattern mining application. We will introduce a few popular kinds of patterns and their mining methods, including mining spatial associations, mining spatial colocation patterns, mining and aggregating patterns over multiple trajectories, mining semantics-rich movement patterns, and mining periodic movement patterns.

- Jiawei Han, Abel Bliss Professor

Department of Computer Science

[MUSIC]

Now we study another interesting issue called Mining Closed Sequential Patterns.

The algorithm for mining them is called CloSpan. So

what is a closed sequential pattern? It is similar to a closed frequent itemset.

For a sequential pattern s, if there exists no superpattern

s prime such that s prime and s have the same support,

then s is a closed sequential pattern. In other words,

a closed pattern means that, for the same support, you keep the longest one.

That's the closed pattern; the other ones probably do not really matter.

Okay, let's look at this example.

Suppose we find three sequences: abc with support 20,

abcd with support 20, and abcde with support 15.

Which ones are closed? abcd is closed in the sense that,

for support 20, abcd is the longest one.

Then abcde is also closed because, at support 15,

there is no longer one that is a supersequence of abcde.
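The closedness check on this example can be sketched in a few lines of Python. This is only an illustration, not CloSpan itself: it assumes every element of a sequence is a single item written as one character, and the helper names `is_subsequence` and `closed_patterns` are made up for this sketch.

```python
def is_subsequence(short, long):
    """Return True if `short` can be obtained from `long` by deleting items."""
    remaining = iter(long)
    # `item in remaining` consumes the iterator up to the first match,
    # so the items of `short` must appear in `long` in the same order.
    return all(item in remaining for item in short)


def closed_patterns(supports):
    """Keep only patterns that have no longer supersequence with equal support."""
    closed = {}
    for pattern, support in supports.items():
        absorbed = any(
            len(other) > len(pattern)
            and other_support == support
            and is_subsequence(pattern, other)
            for other, other_support in supports.items()
        )
        if not absorbed:
            closed[pattern] = support
    return closed


# The three patterns and supports from the example above.
supports = {"abc": 20, "abcd": 20, "abcde": 15}
result = closed_patterns(supports)
print(result)  # abc is absorbed by abcd; abcd and abcde remain
```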

So there are two ways to mine closed sequential patterns.

One way is to first mine all the sequential patterns.

Then you find which ones are not closed, like this abc,

and knock them out; then you get the set of closed sequential patterns.
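A minimal sketch of this first, two-step approach: mine every frequent sequential pattern, then filter out the non-closed ones. The toy database, the `min_support` value, and the function names are assumptions for illustration, and each sequence element is assumed to be a single item. The miner is a simple prefix-growth enumeration, not an optimized implementation.

```python
from collections import defaultdict


def mine_all(database, min_support):
    """Enumerate every frequent sequential pattern by growing prefixes."""
    found = {}

    def grow(prefix, projected):
        # Count how many projected suffixes contain each item.
        counts = defaultdict(int)
        for suffix in projected:
            for item in set(suffix):
                counts[item] += 1
        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + item
            found[pattern] = support
            # Keep what follows the first occurrence of `item`.
            grow(pattern, [s[s.index(item) + 1:] for s in projected if item in s])

    grow("", list(database))
    return found


def is_subsequence(short, long):
    remaining = iter(long)
    return all(item in remaining for item in short)


def keep_closed(patterns):
    """Drop every pattern that has a longer supersequence with equal support."""
    return {
        p: sup
        for p, sup in patterns.items()
        if not any(
            len(q) > len(p) and q_sup == sup and is_subsequence(p, q)
            for q, q_sup in patterns.items()
        )
    }


# Hypothetical toy sequence database (not the one on the lecture slide).
database = ["abc", "abc", "ab"]
all_patterns = mine_all(database, min_support=2)
closed = keep_closed(all_patterns)
print(len(all_patterns), "frequent patterns;", len(closed), "closed:", closed)
```

Here seven frequent patterns collapse to two closed ones, which shows how much redundant work the mine-then-filter route does even on a tiny input.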

But this is not very efficient.

What we want is to directly mine closed sequential patterns.

This will reduce the number of redundant patterns generated in the middle.

But it will retain the same expressive power because it's lossless compression.

So there's one interesting property like this.

If s is a supersequence of s1,

s is closed if and only if the two projected databases have the same size.

Let's look at this example. For this sequence database, suppose you find

the projected database of f, and another sequence is af.

This is its projected database. If these projected databases have exactly the same size,

then this actually means af is closed.

Essentially, you only need to mine one of them; that means you can merge them.

So you will be able to develop two kinds of pruning.

One is called backward subpattern pruning: this is a subpattern,

so you can prune it. Another is backward superpattern pruning. So you can probably see

that you can use the superpattern to chop off the subpattern you would otherwise

get into, okay?
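The projected-database comparison behind this pruning can be sketched as follows. The two toy databases are hypothetical (the lecture's slide database is not reproduced in the transcript), each element is assumed to be a single item, and "size" is taken here to be the total number of items left in the projected database, which is an assumption of this sketch.

```python
def project(database, prefix):
    """Suffixes left after the earliest match of `prefix` in each sequence."""
    suffixes = []
    for seq in database:
        pos = 0
        for item in prefix:
            pos = seq.find(item, pos)
            if pos < 0:
                break  # this sequence does not contain the prefix
            pos += 1
        else:
            suffixes.append(seq[pos:])
    return suffixes


def size(projected):
    """Total number of items remaining in a projected database."""
    return sum(len(suffix) for suffix in projected)


# Hypothetical database in which every f is preceded by an a.
db_same = ["afbc", "afc", "afb"]
# Hypothetical database in which one sequence has f without a before it.
db_diff = ["afbc", "fc", "afb"]

for name, db in [("db_same", db_same), ("db_diff", db_diff)]:
    f_proj, af_proj = project(db, "f"), project(db, "af")
    mergeable = size(f_proj) == size(af_proj)
    print(name, size(f_proj), size(af_proj), "merge" if mergeable else "keep both")
```

When the sizes match, the two projected databases in this sketch are identical, so mining below f would only repeat what is found below af; when they differ, both branches still need to be explored.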

So in this spirit we can develop an efficient algorithm called CloSpan.

It greatly enhances processing efficiency.

I'm not going to discuss it in great detail,

but for the details you can read this paper; it gives you all the details on CloSpan.

[MUSIC]