电玩城

書名： The Natural Language Processing Workshop
作者名： Rohan Chopra Aniruddha M. Godbole Nipun Sadvilkar Muzaffar Bashir Shah Sohom Ghosh Dwight Gunning
本章字?jǐn)?shù)： 139字
更新時間： 2021-06-11 18:39:26

Summary

In this chapter, you have learned about various types of data and ways to deal with unstructured text data. Text data is usually extremely noisy and needs to be cleaned and preprocessed, which mainly consists of tokenization, stemming, lemmatization, and stop-word removal. After preprocessing, features are extracted from texts using various methods, such as BoW and TFIDF. These methods convert unstructured text data into structured numeric data. New features are created from existing features using a technique called feature engineering. In the last part of this chapter, we explored various ways of visualizing text data, such as word clouds.

In the next chapter, you will learn how to develop machine learning models to classify texts using the feature extraction methods you have learned about in this chapter. Moreover, different sampling techniques and model evaluation parameters will be introduced.

官术网_书友最值得收藏!

The Natural Language Processing Workshop

Summary