官术网_书友最值得收藏!

Introduction

An important part of building NLP systems is to work with the appropriate unit for processing. This chapter addresses the abstraction layer associated with the word level of processing. This is called tokenization, which amounts to grouping adjacent characters into meaningful chunks in support of classification, entity finding, and the rest of NLP.

LingPipe provides a broad range of tokenizer needs, which are not covered in this book. Look at the Javadoc for tokenizers that do stemming, Soundex (tokens based on what English words sound like), and more.

主站蜘蛛池模板: 黎川县| 蒙山县| 香港 | 襄城县| 托克托县| 湘潭市| 曲阜市| 襄城县| 新郑市| 垦利县| 新沂市| 东丽区| 新化县| 静乐县| 同德县| 泗水县| 岐山县| 辉县市| 手游| 平昌县| 漯河市| 柳州市| 农安县| 来宾市| 湖南省| 德钦县| 北辰区| 二手房| 宁城县| 永城市| 建瓯市| 莒南县| 绩溪县| 邵武市| 肃北| 广州市| 泾源县| 清苑县| 闻喜县| 道孚县| 汶川县|