
Choosing the proper directory configuration

One of the most crucial properties of Apache Lucene and Solr is the Lucene Directory implementation. The directory interface provides an abstraction layer for all I/O operations for the Lucene library. Although it seems simple, choosing the right directory implementation can affect the performance of your Solr setup in a drastic way. This recipe will show you how to choose the right directory implementation.

How to do it...

In order to use the desired directory, all you need to do is choose the right directory factory implementation and inform Solr about it. Let's assume that you want to use NRTCachingDirectory as your directory implementation. In order to do this, you need to place (or replace if it is already present) the following fragment in your solrconfig.xml file:

<directoryFactory name="DirectoryFactory" class="solr.NRTCachingDirectoryFactory" />

That's all. The setup is quite simple, but the question that will probably arise is which directory factories are available. When this book was written, the following directory factories were available:

  • solr.StandardDirectoryFactory
  • solr.SimpleFSDirectoryFactory
  • solr.NIOFSDirectoryFactory
  • solr.MMapDirectoryFactory
  • solr.NRTCachingDirectoryFactory
  • solr.HdfsDirectoryFactory
  • solr.RAMDirectoryFactory

Now, let's see what each of these factories provides.

How it works...

Before we get into the details of each of the presented directory factories, I would like to comment on the directory factory configuration parameter. All you need to remember is that the name attribute of the directoryFactory tag should be set to DirectoryFactory, and the class attribute should be set to the directory factory implementation of your choice. Also, some of the directory implementations can take additional parameters that define their behavior. We will talk about some of them in other recipes in the book (for example, in the Limiting I/O usage recipe in this chapter).
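To illustrate the name and class attributes, here is one way to keep the class configurable. The example solrconfig.xml files shipped with Solr 4.x use Solr's property substitution syntax, ${property:default}, so the implementation can be overridden at startup without editing the file. The sketch below assumes that layout: the solr.directoryFactory system property selects the class, and NRTCachingDirectoryFactory is used when the property is not set.

```xml
<!-- name must stay "DirectoryFactory"; class selects the implementation.
     ${solr.directoryFactory:...} reads the solr.directoryFactory system
     property and falls back to the value after the colon when it is unset. -->
<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}" />
```

Assuming the Solr 4.x example distribution, starting Jetty with java -Dsolr.directoryFactory=solr.MMapDirectoryFactory -jar start.jar would then switch to MMapDirectoryFactory for that run only.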

If you want Solr to make decisions for you, you should use the solr.StandardDirectoryFactory directory factory. It is filesystem-based and tries to choose the best implementation based on your operating system and the Java virtual machine in use. If you implement a small application that won't use many threads, you can use the solr.SimpleFSDirectoryFactory directory factory, which stores the index files on your local filesystem but doesn't scale well with a high number of threads. The solr.NIOFSDirectoryFactory directory factory scales well with many threads, but remember that it doesn't work well on Microsoft Windows platforms (it's much slower) because of a JVM bug (http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6265734).
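As a sketch, a multithreaded deployment on Linux could select NIOFSDirectoryFactory explicitly and keep StandardDirectoryFactory as a commented-out alternative for Windows hosts, where NIOFSDirectory is affected by the JVM bug mentioned above. Whether NIOFS actually beats the other options for your workload is an assumption you should confirm with your own load tests.

```xml
<!-- Many concurrent searches on Linux/Unix: NIOFSDirectory scales
     better with a high number of threads than SimpleFSDirectory. -->
<directoryFactory name="DirectoryFactory" class="solr.NIOFSDirectoryFactory" />

<!-- On Microsoft Windows, use this line instead and let Solr pick the
     implementation, because NIOFSDirectory is much slower there (JVM bug 6265734):
<directoryFactory name="DirectoryFactory" class="solr.StandardDirectoryFactory" />
-->
```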

The solr.MMapDirectoryFactory directory factory has been the default directory factory for Solr for 64-bit Linux systems since Solr 3.1. This directory implementation uses virtual memory and the kernel feature called mmap to access index files stored on disk. This allows Lucene (and thus Solr) to access the operating system's I/O cache directly. This is desirable, and you should stick to this directory if you don't need near real-time searching.
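A minimal sketch of an explicit MMapDirectoryFactory configuration follows. The unmap parameter is assumed here to be read by the Solr 4.x implementation (default true); it controls whether Lucene releases memory-mapped regions as soon as index files are closed instead of waiting for garbage collection. Check the MMapDirectoryFactory javadoc for your Solr version before relying on it.

```xml
<directoryFactory name="DirectoryFactory" class="solr.MMapDirectoryFactory">
  <!-- Assumed parameter (Solr 4.x): unmap closed index files eagerly so the
       virtual address space is freed without waiting for garbage collection. -->
  <bool name="unmap">true</bool>
</directoryFactory>
```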

If you need near real-time indexing and searching, you should use solr.NRTCachingDirectoryFactory. It is designed to keep some parts of the index (small, newly written segments) in memory, and thus greatly speeds up near real-time operations. By near real-time, we mean that documents become available for searching within milliseconds of being indexed.
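NRTCachingDirectoryFactory can also be tuned. In Solr 4.x the implementation reads maxMergeSizeMB (the largest newly flushed or merged segment that will be cached in memory) and maxCachedMB (the total memory the cache may use). The values below are assumed to match its defaults of 4 MB and 48 MB; verify the parameter names and defaults against your Solr version before copying them.

```xml
<directoryFactory name="DirectoryFactory" class="solr.NRTCachingDirectoryFactory">
  <!-- Newly written segments up to this size (MB) are kept in RAM -->
  <double name="maxMergeSizeMB">4.0</double>
  <!-- Upper bound (MB) on RAM used by the cache as a whole -->
  <double name="maxCachedMB">48.0</double>
</directoryFactory>
```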

Note

If you want to know more about near real-time search and indexing, refer to the explanation of the term on the Lucene wiki, available at https://wiki.apache.org/lucene-java/NearRealtimeSearch.

The solr.HdfsDirectoryFactory directory factory is used when Solr stores its index on HDFS, that is, inside a Hadoop cluster. If you are using Solr inside a Hadoop cluster, you will almost certainly want to use this directory implementation.
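Below is a sketch of an HDFS-backed configuration, modeled on the Solr 4.x documentation for running Solr on HDFS. The NameNode host, port, and paths are hypothetical placeholders. solr.hdfs.home points Solr at the HDFS directory that holds the indexes, solr.hdfs.confdir at the Hadoop client configuration, and the hdfs lock type replaces the usual native file lock, which does not work on HDFS.

```xml
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <!-- Hypothetical NameNode address and path: replace with your cluster's -->
  <str name="solr.hdfs.home">hdfs://namenode.example.com:8020/solr</str>
  <!-- Directory containing core-site.xml and hdfs-site.xml (assumed location) -->
  <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
  <!-- Block cache that keeps frequently read HDFS blocks in memory -->
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
</directoryFactory>

<indexConfig>
  <!-- Native OS file locks do not work on HDFS -->
  <lockType>hdfs</lockType>
</indexConfig>
```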

The last directory factory, solr.RAMDirectoryFactory, is the only one that is not persistent. The whole index is stored in RAM, and thus you'll lose your index after a restart or server crash. Also, remember that replication won't work when using solr.RAMDirectoryFactory. One might ask why anyone would use this factory. Imagine a volatile index for autocomplete functionality, unit tests of your queries' relevance, or anything else where you don't need persistent and replicated data. However, remember that this directory is not designed to hold large amounts of data.
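For a test core or a small volatile autocomplete core, the configuration is the same one-line fragment with a different class. The use case in the comment is illustrative; the constraints it lists come from the paragraph above.

```xml
<!-- Volatile, in-memory index: lost on restart or crash, no replication.
     Suitable only for small data sets such as test fixtures. -->
<directoryFactory name="DirectoryFactory" class="solr.RAMDirectoryFactory" />
```

If your solrconfig.xml makes the class overridable through property substitution, as the Solr 4.x example configuration does with ${solr.directoryFactory:solr.NRTCachingDirectoryFactory}, tests can switch to this factory with -Dsolr.directoryFactory=solr.RAMDirectoryFactory instead of maintaining a separate configuration file.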
