Creating a Wordcloud using NLP and TF-IDF in Python

Offered by
Coursera Project Network
In this Guided Project, you will:

Learn how to clean a dataset by removing encodings and unwanted words/characters

Learn how to lemmatize a text and fit a TF-IDF model

Learn how to create a wordcloud using TF-IDF scores

1.5 hours
Beginner
No download needed
Split-screen video
English
Desktop only

By the end of this project, you will be able to create a professional-looking wordcloud from a text dataset in Python. You will use an open source dataset of Christmas recipes and create a wordcloud of the most important ingredients in those recipes. I will teach you how to load a JSON dataset, clean it by removing encodings and unwanted characters, and lemmatize the text. I will also teach you how to calculate TF-IDF weights for the words in your dataset and use these weights to create a wordcloud. You will finish with a ready-to-use Jupyter notebook for creating a wordcloud from any text dataset.

Lemmatization removes only the inflectional endings of a word and returns its base or dictionary form, known as the lemma. TF-IDF stands for term frequency-inverse document frequency. It assigns each word a weight that reflects how important the word is to a document relative to the rest of the dataset. Together, lemmatization and TF-IDF let you find the important words in a text dataset and build the wordcloud from them. For example, if the dataset consists of customer complaints, the business can use the wordcloud to focus on the most important issues its customers face. A wordcloud is a useful visual for reports and presentations.

Note: This course works best for learners who are based in the North America region. We’re currently working on providing the same experience in other regions.
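As a rough illustration of how TF-IDF weights behave, the sketch below computes them by hand for three tiny made-up "recipes". It assumes one common textbook variant (term count divided by document length, times the natural log of the number of documents over the document frequency). Libraries such as scikit-learn add smoothing and normalization, so their numbers will differ. The function name `tf_idf` and the sample data are illustrative and do not come from the course.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / len(doc), idf = ln(N / df).
    """
    n_docs = len(documents)
    tokenized = [doc.lower().split() for doc in documents]

    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical sample data, standing in for the recipe dataset.
recipes = [
    "sugar butter flour cinnamon",
    "sugar butter flour nutmeg",
    "sugar turkey cranberry",
]
scores = tf_idf(recipes)
```

In this toy example "sugar" appears in every recipe, so its weight is zero, while "cinnamon", which appears in only one recipe, gets the largest weight in the first recipe. That is the property a TF-IDF wordcloud relies on: words common to every document are pushed down, and distinctive words stand out.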

Skills you will develop

  • Natural Language Toolkit (NLTK)
  • Python Programming
  • Term Frequency Inverse Document Frequency (TF-IDF)
  • WordNet

Learn step by step

In a video that plays in a split screen alongside your work area, your instructor will walk you through these steps:

  1. Load a JSON dataset in Python

  2. Clean the dataset

  3. Remove encodings

  4. Lemmatize the text

  5. Fit a TF-IDF model

  6. Create a Wordcloud
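
The six steps above can be sketched end to end as follows. This is a minimal outline under stated assumptions, not the instructor's notebook: the JSON schema (`name`, `ingredients`), the sample record, and all function names are hypothetical. Steps 1 to 3 use only the standard library. Steps 4 to 6 assume the third-party packages `nltk`, `scikit-learn`, `numpy`, and `wordcloud` are installed, and import them lazily so the first half runs without them.

```python
import html
import json
import re
import unicodedata


def load_recipes(raw_json):
    """Step 1: parse a JSON array of recipe objects."""
    return json.loads(raw_json)


def clean_text(text):
    """Steps 2-3: remove encodings, quantities and unwanted characters."""
    text = html.unescape(text)  # "&amp;" -> "&"
    # Fold accented characters to plain ASCII ("crème" -> "creme").
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii").lower()
    text = re.sub(r"\b\d+\w*\b", " ", text)  # drop quantities such as "225g"
    text = re.sub(r"[^a-z\s]", " ", text)    # keep letters and whitespace only
    return re.sub(r"\s+", " ", text).strip()


def lemmatize(text):
    """Step 4: reduce each token to its WordNet lemma."""
    # Needs the WordNet corpus: nltk.download("wordnet")
    # (some NLTK versions also need "omw-1.4").
    from nltk.stem import WordNetLemmatizer
    lemmatizer = WordNetLemmatizer()
    return " ".join(lemmatizer.lemmatize(token) for token in text.split())


def tfidf_weights(documents):
    """Step 5: fit TF-IDF and sum each term's weight over all documents."""
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(documents)
    totals = np.asarray(matrix.sum(axis=0)).ravel()
    terms = vectorizer.get_feature_names_out()
    return {term: float(weight) for term, weight in zip(terms, totals)}


def make_wordcloud(weights, path):
    """Step 6: render a wordcloud sized by TF-IDF weight and save it."""
    from wordcloud import WordCloud
    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(weights)
    cloud.to_file(path)


# Hypothetical one-record dataset; the real dataset's schema may differ.
raw_json = (
    '[{"name": "Mince pies", '
    '"ingredients": "225g Butter &amp; 100g caster sugar, cr\\u00e8me fra\\u00eeche"}]'
)
recipes = load_recipes(raw_json)
cleaned = [clean_text(recipe["ingredients"]) for recipe in recipes]

# With the third-party packages installed, the remaining steps would be:
# lemmatized = [lemmatize(doc) for doc in cleaned]
# make_wordcloud(tfidf_weights(lemmatized), "wordcloud.png")
```

Summing each term's TF-IDF weight across documents is one simple way to get a single score per word for the cloud. Taking the mean or the maximum per term would be equally reasonable choices.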

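How Guided Projects work

Your workspace is a cloud desktop right in your browser, with no download required

In a split-screen video, your instructor guides you step by step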

Frequently asked questions

More questions? Visit the Learner Help Center