Chapter 2. Finding and Working with Words

In this chapter, we cover the following recipes:

  • Introduction to tokenizer factories – finding words in a character stream
  • Combining tokenizers – lowercase tokenizer
  • Combining tokenizers – stop word tokenizers
  • Using Lucene/Solr tokenizers
  • Using Lucene/Solr tokenizers with LingPipe
  • Evaluating tokenizers with unit tests
  • Modifying tokenizer factories
  • Finding words for languages without white spaces