Named-entity recognition

Given a text sequence, the named-entity recognition (NER) task is to locate and identify words or phrases that belong to predefined categories, such as names of persons, companies, locations, and dates. We will briefly mention it again in Chapter 4, Detecting Spam Email with Naive Bayes.

As an appetizer, let's take a peep at an example of using spaCy for NER.

First, tokenize the input sentence, The book written by Hayden Liu in 2018 was sold at $30 in America, as usual, using the following command:

>>> tokens3 = nlp('The book written by Hayden Liu in 2018 was sold at $30 in America')

The resulting Doc object has an attribute called ents, which holds the recognized named entities. We can extract the text and label of each recognized named entity as follows:

>>> print([(token_ent.text, token_ent.label_) for token_ent in tokens3.ents])
[('Hayden Liu', 'PERSON'), ('2018', 'DATE'), ('30', 'MONEY'), ('America', 'GPE')]

We can see from the results that Hayden Liu is PERSON, 2018 is DATE, 30 is MONEY, and America is GPE (a geopolitical entity, such as a country, city, or state). Please refer to https://spacy.io/api/annotation#section-named-entities for a full list of named entity tags.
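Once the (text, label) pairs are extracted, it is often handy to group them by label. The following is a minimal sketch in plain Python, not part of spaCy: the helper name group_by_label is hypothetical, and the pairs are copied from the output shown earlier so that the snippet runs without a spaCy model.

```python
from collections import defaultdict

# Pairs copied from the NER output shown earlier; with spaCy loaded they would
# come from [(ent.text, ent.label_) for ent in tokens3.ents].
entities = [('Hayden Liu', 'PERSON'), ('2018', 'DATE'),
            ('30', 'MONEY'), ('America', 'GPE')]


def group_by_label(pairs):
    """Map each entity label to the list of entity texts carrying it.

    Hypothetical helper for illustration; it is not a spaCy function.
    """
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)


entities_by_label = group_by_label(entities)
print(entities_by_label['PERSON'])
```

With this grouping, looking up all entities of one category (for example, every PERSON in a longer document) becomes a single dictionary access.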