Building Your NLP Vocabulary

In the earlier chapters, you were introduced to why Natural Language Processing (NLP) matters, especially in today's context, followed by a discussion of a few prerequisites and the Python libraries that are most useful for NLP tasks. In this chapter, we take that discussion further and look in detail at the concrete tasks involved in building a vocabulary and preprocessing textual data. We start by learning what a vocabulary is and then carry the notion forward to actually build one, applying methods to text data that appear in most NLP pipelines, whatever the organization.

In this chapter, we'll cover the following topics:

  • Lexicons
  • Phonemes, graphemes, and morphemes
  • Tokenization
  • Understanding word normalization