Picking up NLP basics while touring popular NLP libraries

After a short list of real-world applications of NLP, we'll tour the essential stack of Python NLP libraries in this chapter. These packages handle the wide range of NLP tasks mentioned previously, as well as others such as sentiment analysis, text classification, and named entity recognition.

The most famous NLP libraries in Python include the Natural Language Toolkit (NLTK), spaCy, Gensim, and TextBlob. The scikit-learn library also has impressive NLP-related features. Let's take a look at the following popular NLP libraries in Python:

  • NLTK: This library (http://www.nltk.org/) was originally developed for educational purposes and is now widely used in industry as well. It is said that you can't talk about NLP without mentioning NLTK. It is one of the most famous and leading platforms for building Python-based NLP applications. You can install it by running the following command in the terminal (prefer a virtual environment or a user-level install over running pip with sudo):
pip install -U nltk

If you're using conda, then execute the following command line:

conda install nltk
  • spaCy: This library (https://spacy.io/) is considered a more powerful toolkit for industrial use than NLTK. This is mainly for two reasons: first, spaCy is written in Cython, which makes it much more memory-optimized (now you see where the Cy in spaCy comes from) and helps it excel at NLP tasks; second, spaCy keeps adopting state-of-the-art algorithms for core NLP problems, such as convolutional neural network (CNN) models for tagging and named entity recognition. It can seem advanced for beginners, however. In case you're interested, here are the installation instructions.

   Run the following command line in the terminal:

pip install -U spacy

For conda, execute the following command line:

conda install -c conda-forge spacy
  • Gensim: This library (https://radimrehurek.com/gensim/), developed by Radim Rehurek, has been gaining popularity over recent years. It was initially designed in 2008 to generate a list of similar articles given an article, hence the name of this library (generate similar -> Gensim). Radim Rehurek later drastically improved its efficiency and scalability. Again, we can easily install it via pip by running the following command:
pip install --upgrade gensim

With conda, run the following command in the terminal:

conda install -c conda-forge gensim
You should make sure that the dependencies, NumPy and SciPy, are already installed before installing Gensim.
  • TextBlob: This library (https://textblob.readthedocs.io/en/dev/) is a relatively new one built on top of NLTK. It simplifies NLP and text analysis with easy-to-use built-in functions and methods, as well as wrappers around common tasks. We can install TextBlob by running the following command line in the terminal:
pip install -U textblob

TextBlob has some useful features that are not available in NLTK (currently), such as spell checking and correction, language detection, and translation.
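
Once the installation commands above have been run, it can be handy to confirm which of the four libraries actually ended up in the active environment. The following is a minimal sketch that relies only on the standard library's importlib.metadata module (Python 3.8 or later). The helper name installed_versions and the NLP_PACKAGES list are illustrative assumptions, not part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as they appear in the pip commands above.
NLP_PACKAGES = ["nltk", "spacy", "gensim", "textblob"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the current environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(NLP_PACKAGES).items():
        print(f"{name}: {ver if ver else 'not installed'}")
```

Running the script prints one line per library, with either the installed version or "not installed", so it is easy to see which pip or conda command still needs to be run.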
