- The Natural Language Processing Workshop
- Rohan Chopra, Aniruddha M. Godbole, Nipun Sadvilkar, Muzaffar Bashir Shah, Sohom Ghosh, Dwight Gunning
- 2021-06-11 18:39:26
Summary
In this chapter, you learned about the various types of data and how to work with unstructured text data. Text data is usually noisy and needs to be cleaned and preprocessed before use; preprocessing mainly consists of tokenization, stemming, lemmatization, and stop-word removal. After preprocessing, features are extracted from the text using methods such as bag-of-words (BoW) and TF-IDF, which convert unstructured text into structured numeric data. Feature engineering then creates new features from existing ones. In the last part of the chapter, we explored ways of visualizing text data, such as word clouds.
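As a compact recap of the pipeline described above, here is a minimal sketch in plain Python that tokenizes a few sentences, removes stop words, and builds BoW and TF-IDF vectors. It is an illustration under stated assumptions, not the chapter's own code: the function names (`preprocess`, `bag_of_words`, `tf_idf`), the tiny stop-word list, and the sample sentences are all hypothetical, the TF-IDF weighting is the plain `tf * log(N / df)` variant (libraries such as scikit-learn apply smoothing and normalization, so their numbers differ), and stemming and lemmatization are left out because they normally rely on a library such as NLTK or spaCy.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "is", "a", "on", "and", "of"}


def preprocess(text):
    """Lowercase, tokenize on alphabetic runs, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(docs):
    """Return the sorted vocabulary and one count vector per document."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(vectors):
    """Weight raw counts by tf * log(N / df), the plain textbook variant."""
    n_docs = len(vectors)
    n_terms = len(vectors[0]) if vectors else 0
    # Document frequency: in how many documents each term appears.
    df = [sum(1 for v in vectors if v[j] > 0) for j in range(n_terms)]
    weighted = []
    for v in vectors:
        total = sum(v) or 1  # guard against empty documents
        weighted.append(
            [(v[j] / total) * math.log(n_docs / df[j]) for j in range(n_terms)]
        )
    return weighted


# Hypothetical sample corpus.
docs = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
    "The cat chased the dog.",
]
vocab, counts = bag_of_words(docs)
weights = tf_idf(counts)

print(vocab)
for row in counts:
    print(row)
```

In the resulting vectors, a term that occurs in only one document (such as "mat") receives a higher TF-IDF weight than a term shared by several documents (such as "cat"), which is the property that makes TF-IDF more discriminative than raw counts.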
In the next chapter, you will learn how to develop machine learning models that classify text using the feature extraction methods covered in this chapter. Different sampling techniques and model evaluation metrics will also be introduced.