Introduction

An important part of building NLP systems is to work with the appropriate unit for processing. This chapter addresses the abstraction layer associated with the word level of processing. This is called tokenization, which amounts to grouping adjacent characters into meaningful chunks in support of classification, entity finding, and the rest of NLP.
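To make the idea of grouping adjacent characters into chunks concrete, here is a minimal sketch in plain Java. It does not use LingPipe's tokenizer API. The class name `SimpleTokenizer` and the token rule (a run of letters or digits is one token, and any other non-whitespace character is a token by itself) are assumptions chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of LingPipe.
class SimpleTokenizer {
    // Assumed rule: a run of letters/digits forms one token;
    // any other single non-whitespace character is its own token.
    private static final Pattern TOKEN =
        Pattern.compile("[\\p{L}\\p{N}]+|[^\\s\\p{L}\\p{N}]");

    // Scans the text left to right and collects each match as a token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [Hello, ,, world, !]
        System.out.println(tokenize("Hello, world!"));
    }
}
```

Whitespace is discarded here and punctuation is kept as separate tokens. A different rule would produce different chunks from the same characters, which is why the choice of tokenizer matters for the downstream tasks.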

LingPipe provides tokenizers for a broad range of needs, not all of which are covered in this book. See the Javadoc for tokenizers that do stemming, Soundex encoding (tokens based on what English words sound like), and more.
