Introduction

In the previous chapter, we learned about the concepts of Natural Language Processing (NLP) and text analytics, and briefly looked at various pre-processing steps. In this chapter, we will learn how to deal with text data, which is mostly unstructured. Unstructured data cannot be represented directly in a tabular format. Because most machine learning algorithms can work only with numbers, it is essential to convert such data into numeric features. We will place more emphasis on pre-processing steps such as tokenization, stemming, lemmatization, and stop-word removal. You will also learn about two popular methods for feature extraction, bag of words and Term Frequency-Inverse Document Frequency (TF-IDF), as well as various methods for creating new features from existing ones. Finally, you will become familiar with how text data can be visualized.
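To make the idea of turning unstructured text into numeric features concrete, here is a minimal bag-of-words sketch that uses only the Python standard library. The regex tokenizer and the helper names `tokenize` and `bag_of_words` are illustrative assumptions, not the methods developed in this chapter. The sketch simply shows each document becoming a vector of word counts over a shared vocabulary.

```python
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption for illustration): lowercase the text and
    # keep runs of letters, digits, and apostrophes; punctuation is dropped.
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(docs):
    """Return (vocabulary, count matrix) for a list of documents."""
    tokenized = [tokenize(doc) for doc in docs]
    # Shared vocabulary: every distinct token across all documents, sorted.
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        # One row per document, one column per vocabulary word.
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


docs = ["The cat sat on the mat.", "The dog sat."]
vocab, vectors = bag_of_words(docs)
print(vocab)    # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(vectors)  # [[1, 0, 1, 1, 1, 2], [0, 1, 0, 0, 1, 1]]
```

Each row is now a plain list of numbers that a machine learning algorithm can consume. Note that a naive tokenizer like this keeps stop words such as "the" and does no stemming or lemmatization; those are exactly the pre-processing steps this chapter focuses on.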

主站蜘蛛池模板: 兰溪市| 巩义市| 马边| 大竹县| 田阳县| 莲花县| 甘谷县| 阿巴嘎旗| 昌黎县| 星子县| 富阳市| 怀仁县| 曲阜市| 弋阳县| 大姚县| 鄂州市| 鸡泽县| 抚顺市| 峨山| 会东县| 简阳市| 石城县| 太白县| 丰原市| 贵定县| 尚志市| 平罗县| 郯城县| 巩义市| 无极县| 象山县| 廉江市| 马关县| 芒康县| 广安市| 江口县| 监利县| 龙川县| 庄浪县| 阿鲁科尔沁旗| 黔江区|