Quantitative Text Analysis and Textual Similarity in R

Offered by Coursera Project Network

In this guided project, you will:

  • Tokenize the dataset and convert the data into a document feature matrix
  • Calculate cosine similarity across documents and plot the output

Duration: 1 hour
Level: Beginner
No download needed
Split-screen video
Language: English
Desktop only

By the end of this project, you will understand the concept of document similarity in textual analysis and how to work with it in R. You will know how to load and pre-process a set of text documents by converting it into a corpus and then into a document feature matrix. You will also know how to calculate the cosine similarity between documents and how to explore and plot the results.
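For reference, cosine similarity is usually defined as follows. The symbols are assumptions introduced here, not taken from the project: $A$ and $B$ are the rows of the document feature matrix for two documents, and $A_i$ is the count of feature $i$ in the first document.

```latex
\[
\operatorname{sim}(A, B)
  = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
  = \frac{\sum_{i} A_i B_i}{\sqrt{\sum_{i} A_i^{2}} \, \sqrt{\sum_{i} B_i^{2}}}
\]
```

Because feature counts are non-negative, the value lies between 0 (no shared features) and 1 (identical relative feature frequencies).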

Skills you will develop

  • Cosine Similarity
  • Text Analysis
  • Document Similarity
  • Data Visualization (DataViz)
  • Text Corpus

Learn step by step

In a video that plays in a split screen alongside your workspace, your instructor will walk you through each step:

  1. Load textual data into R, turn it into a corpus object, and understand how document similarity is calculated in textual analysis

  2. Extract meta-data from text document filenames and subset the data frame to exclude unwanted data

  3. Tokenize and clean the dataset and convert the data into a document feature matrix

  4. Calculate cosine similarity across documents and plot the output
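The steps above can be sketched in base R as follows. This is a minimal illustration, not the project's own code: the file names, texts, the `year_speaker_type.txt` naming pattern, the tiny stop-word list, and all variable names are hypothetical, and the project itself may use a dedicated text-analysis package instead of base R.

```r
# Step 1: toy "corpus" as a named character vector.
# File names and texts are invented for illustration.
docs <- c(
  "2019_smith_speech.txt" = "The economy is growing and jobs are growing.",
  "2019_jones_speech.txt" = "Jobs and the economy are the priority.",
  "2020_smith_speech.txt" = "Climate policy needs urgent action.",
  "2020_smith_notes.txt"  = "Internal notes, not a speech."
)

# Step 2: extract meta-data from the file names
# (assumed pattern: year_speaker_type.txt) and drop unwanted documents.
parts <- strsplit(sub("\\.txt$", "", names(docs)), "_", fixed = TRUE)
meta <- data.frame(
  file    = names(docs),
  year    = as.integer(sapply(parts, `[`, 1)),
  speaker = sapply(parts, `[`, 2),
  type    = sapply(parts, `[`, 3),
  stringsAsFactors = FALSE
)
meta <- meta[meta$type == "speech", ]
docs <- docs[meta$file]

# Step 3: tokenize, clean, and build a document feature matrix.
stop_words <- c("the", "is", "and", "are")  # tiny illustrative list

tokenize <- function(text) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  words <- words[nzchar(words)]             # drop empty strings
  words[!words %in% stop_words]             # drop stop words
}

tokens <- lapply(docs, tokenize)
vocab  <- sort(unique(unlist(tokens)))

# Rows are documents, columns are features, cells are counts.
dfm_counts <- t(vapply(
  tokens,
  function(tok) as.numeric(table(factor(tok, levels = vocab))),
  numeric(length(vocab))
))
colnames(dfm_counts) <- vocab

# Step 4: cosine similarity between every pair of documents.
norms  <- sqrt(rowSums(dfm_counts^2))
cosine <- tcrossprod(dfm_counts) / outer(norms, norms)

print(round(cosine, 2))

# Plot the similarity matrix as a heatmap (only in an interactive session).
if (interactive()) {
  heatmap(cosine, Rowv = NA, Colv = NA, symm = TRUE, scale = "none",
          main = "Cosine similarity between documents")
}
```

In this toy data the two 2019 speeches share the features "economy" and "jobs", so their similarity is above zero, while the 2020 speech shares no features with either and scores zero against both.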

How guided projects work

Your workspace is a cloud desktop in your browser; no download is required

In a split-screen video, your instructor guides you step by step

Frequently asked questions


Have more questions? Visit the Learner Help Center.