Book title: Natural Language Processing with Python Quick Start Guide
Author: Nirant Kasliwal
Chapter length: 129 words
Last updated: 2021-06-10 18:36:38

Bread and butter – most common tasks

There are several well-known text cleaning ideas, and all of them have made their way into today's most popular tools, such as NLTK, Stanford CoreNLP, and spaCy. I like spaCy for two main reasons:

It's an industry-grade NLP library, unlike NLTK, which is mainly meant for teaching.
It has a good speed-to-performance trade-off. spaCy is written in Cython, which gives it C-like performance with Python code.

spaCy is actively maintained and developed, and incorporates the best methods available for most challenges.

By the end of this section, you will be able to do the following:

Understand tokenization and do it yourself using spaCy
Understand why stop word removal and case standardization work, with spaCy examples
Differentiate between stemming and lemmatization, with spaCy lemmatization examples
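
As a rough preview of the first two goals, here is a minimal plain-Python sketch of tokenization, case standardization, and stop word removal. It does not use spaCy: the regular expression is only an approximation of a real tokenizer, and the `STOP_WORDS` set and the `tokenize` and `clean` helpers are hypothetical names invented for this illustration.

```python
import re

# Hypothetical, deliberately tiny stop word list for illustration only;
# NLP libraries ship much larger, curated lists.
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "of", "to", "in"}


def tokenize(text):
    """Split text into word tokens and standalone punctuation marks."""
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)


def clean(text):
    """Tokenize, lowercase every token, then drop stop words and punctuation."""
    tokens = [tok.lower() for tok in tokenize(text)]
    return [tok for tok in tokens if tok.isalnum() and tok not in STOP_WORDS]


sentence = "The cats are sitting in the garden."
print(tokenize(sentence))
print(clean(sentence))
```

Lowercasing before the stop word check is what lets "The" and "the" be treated as the same word. Note that "cats" and "sitting" are left untouched: reducing them to "cat" and "sit" is the job of stemming or lemmatization, which needs more than a regular expression.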
