
  • The Natural Language Processing Workshop
  • Rohan Chopra, Aniruddha M. Godbole, Nipun Sadvilkar, Muzaffar Bashir Shah, Sohom Ghosh, Dwight Gunning
  • 2021-06-11 18:39:26

Summary

In this chapter, you learned about the various types of data and how to deal with unstructured text data. Text data is usually noisy and needs to be cleaned and preprocessed, which mainly involves tokenization, stemming, lemmatization, and stop-word removal. After preprocessing, features are extracted from the text using methods such as bag-of-words (BoW) and TF-IDF, which convert unstructured text into structured numeric data. Feature engineering then creates new features from existing ones. In the last part of the chapter, we explored ways of visualizing text data, such as word clouds.
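As a brief recap, the pipeline summarized above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the chapter's own code: the function names (`preprocess`, `bag_of_words`, `tf_idf`), the tiny stop-word list, and the sample corpus are all assumptions made for this example, and the TF-IDF weighting uses `idf = log(N / df)`, which is one common variant among several. Stemming and lemmatization are omitted to keep the sketch short.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "of", "to", "in"}


def preprocess(text):
    """Lowercase, tokenize on alphabetic runs, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(docs):
    """Return (vocabulary, count matrix) for a list of token lists."""
    vocab = sorted({t for doc in docs for t in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[t] for t in vocab])
    return vocab, rows


def tf_idf(rows):
    """Weight raw counts by idf = log(N / df), one common TF-IDF variant."""
    n_docs = len(rows)
    n_terms = len(rows[0]) if rows else 0
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in rows if row[j] > 0) for j in range(n_terms)]
    return [
        [row[j] * math.log(n_docs / df[j]) for j in range(n_terms)]
        for row in rows
    ]


# Hypothetical two-document corpus, used only for illustration.
corpus = ["The cat sat in the hat.", "The dog chased the cat."]
tokens = [preprocess(doc) for doc in corpus]
vocab, counts = bag_of_words(tokens)
weights = tf_idf(counts)
```

Under this weighting, a term that appears in every document (here, `cat`) receives a weight of zero, while terms unique to one document keep a positive weight. Library implementations often smooth the IDF term, so their numbers may differ from this sketch.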

In the next chapter, you will learn how to develop machine learning models that classify texts using the feature extraction methods covered in this chapter. Different sampling techniques and model evaluation parameters will also be introduced.
