
Counting the number of fields

Imagine that we index simple documents into Solr, each with a title and a set of tags. We want to distinguish the premium documents, that is, the ones with more tag values, because they are more valuable to our business. Of course, we could count the tags ourselves before indexing, but why not let Solr do it? This recipe shows how.

How to do it...

Let's look at the steps we need to take to count the number of field values.

  1. We start with the index structure. What we need to do is put the following section in the schema.xml file:
    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="title" type="text_general" indexed="true" stored="true"/>
    <field name="tags" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="tags_count" type="int" indexed="true" stored="true"/>
  2. The next thing is our test data, which looks as follows:
    <add>
     <doc>
      <field name="id">1</field>
      <field name="title">Solr Cookbook 4</field>
      <field name="tags">solr</field>
     </doc>
     <doc>
      <field name="id">2</field>
      <field name="title">Solr Cookbook 4 second edition</field>
      <field name="tags">search</field>
      <field name="tags">solr</field>
      <field name="tags">cookbook</field>
     </doc>
    </add>
  3. In addition to this, we need to alter our solrconfig.xml file. First, we add the proper update request processor to the file:
    <updateRequestProcessorChain name="count">
     <processor class="solr.CloneFieldUpdateProcessorFactory">
      <str name="source">tags</str>
      <str name="dest">tags_count</str>
     </processor>
     <processor class="solr.CountFieldValuesUpdateProcessorFactory">
      <str name="fieldName">tags_count</str>
     </processor>
     <processor class="solr.DefaultValueUpdateProcessorFactory">
      <str name="fieldName">tags_count</str>
      <int name="value">0</int>
     </processor>
     <processor class="solr.LogUpdateProcessorFactory" />
     <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>
  4. We also want our update request processor chain to be used with every indexing request, so we change the /update handler in the solrconfig.xml file so that it looks like this:
    <requestHandler name="/update" class="solr.UpdateRequestHandler">
     <lst name="defaults">
      <str name="update.chain">count</str>
     </lst>
    </requestHandler>
  5. Now, if we want to use the count information Solr automatically added, we will send the following query:
    http://localhost:8983/solr/cookbook/select?q=title:cookbook&bf=field(tags_count)&defType=edismax
  6. Solr will position the document with more tags at the top of the result list:
    <?xml version="1.0" encoding="UTF-8"?>
    <response>
     <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">1</int>
      <lst name="params">
       <str name="q">title:cookbook</str>
       <str name="defType">edismax</str>
       <str name="bf">field(tags_count)</str>
      </lst>
     </lst>
     <result name="response" numFound="2" start="0">
      <doc>
       <str name="id">2</str>
       <str name="title">Solr Cookbook 4 second edition</str>
       <arr name="tags">
        <str>search</str>
        <str>solr</str>
        <str>cookbook</str>
       </arr>
       <int name="tags_count">3</int>
       <long name="_version_">1467535763434373120</long></doc>
      <doc>
       <str name="id">1</str>
       <str name="title">Solr Cookbook 4</str>
       <arr name="tags">
        <str>solr</str>
       </arr>
       <int name="tags_count">1</int>
       <long name="_version_">1467535763382992896</long></doc>
      </result>
    </response>

Now, let's see how it works.

How it works...

The index structure is quite simple. It contains a unique identifier field, a title, a field holding the tags, and a field holding the count of tags. As you can see in the example data, we provide the identifier of the document, its title, and its tags. What we don't provide is the number of tags; Solr calculates it during indexing.

We also defined a new update request processor chain called count. It contains five update processors.

The first update processor, solr.CloneFieldUpdateProcessorFactory, copies the values of the field defined by the source property to the field defined by the dest property. The second update processor, solr.CountFieldValuesUpdateProcessorFactory, replaces the values of the field defined by the fieldName property with the number of those values. Because the count overwrites the field it is applied to, running it directly on tags would destroy the tags themselves; this is why we need solr.CloneFieldUpdateProcessorFactory before solr.CountFieldValuesUpdateProcessorFactory. The third update processor, solr.DefaultValueUpdateProcessorFactory, sets the default value (defined by the value property) for the field defined by the fieldName property when the document has no value in that field. The remaining two processors are responsible for logging the request information and running the update.

By defining this chain, we tell Solr to clone the tags field into tags_count first, then to replace the cloned values in tags_count with their count, and finally, if a document has no value in the tags_count field (because it had no tags), to set it to 0.
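The effect of the first three processors on a single document can be illustrated outside Solr. The following is only a minimal Python sketch of the behavior described above, not Solr's implementation; the function names (clone_field, count_field_values, default_value, run_count_chain) are hypothetical and exist only for this illustration.

```python
from typing import Any, Dict


def clone_field(doc: Dict[str, Any], source: str, dest: str) -> None:
    """Copy the values of `source` into `dest` (illustrates the clone step)."""
    if source in doc:
        values = doc[source]
        doc[dest] = list(values) if isinstance(values, list) else [values]


def count_field_values(doc: Dict[str, Any], field_name: str) -> None:
    """Replace the values of `field_name` with how many values it held."""
    if field_name in doc:
        values = doc[field_name]
        doc[field_name] = len(values) if isinstance(values, list) else 1


def default_value(doc: Dict[str, Any], field_name: str, value: Any) -> None:
    """Set `field_name` to `value` only when the document lacks that field."""
    doc.setdefault(field_name, value)


def run_count_chain(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the three steps in the same order as the `count` chain."""
    result = dict(doc)  # work on a copy so the input stays untouched
    clone_field(result, "tags", "tags_count")
    count_field_values(result, "tags_count")
    default_value(result, "tags_count", 0)
    return result


if __name__ == "__main__":
    docs = [
        {"id": "1", "title": "Solr Cookbook 4", "tags": ["solr"]},
        {"id": "2", "title": "Solr Cookbook 4 second edition",
         "tags": ["search", "solr", "cookbook"]},
        {"id": "3", "title": "Untagged document"},  # hypothetical extra case
    ]
    for d in docs:
        print(d["id"], run_count_chain(d)["tags_count"])
```

The third, untagged document is not part of the recipe's test data; it is included only to show why the default value step is in the chain: with no tags there is nothing to clone or count, so without it tags_count would be missing.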

We also define the solr.UpdateRequestHandler configuration and alter its default behavior by adding the defaults section and setting the update.chain property to count (the name of our update request processor chain). This means that our update request processor chain will be used with every indexing request sent to the /update handler.
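The same update.chain property can also be passed as a request parameter on an individual update request, instead of being set as a handler default. The sketch below only builds such a URL; the helper name build_update_url is hypothetical, and the host and the cookbook core name are taken from the query URL used in this recipe.

```python
from urllib.parse import urlencode


def build_update_url(core_url: str, chain: str) -> str:
    """Return an /update URL that selects an update chain for one request."""
    return f"{core_url}/update?{urlencode({'update.chain': chain})}"


if __name__ == "__main__":
    print(build_update_url("http://localhost:8983/solr/cookbook", "count"))
```

Setting the chain in the handler defaults, as this recipe does, has the advantage that clients do not need to remember the parameter.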

Our query searches for every document that includes the cookbook term in the title field. We use the edismax query parser (defType=edismax) and include a simple boost function that boosts documents by the value of their tags_count field (bf=field(tags_count)). As the results show, the document with three tags is placed above the document with a single tag, which is what we wanted to achieve.
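When the query is sent from application code, the parameters are best URL-encoded rather than concatenated by hand. The following is a minimal sketch using only the Python standard library; the helper name build_boosted_query_url is hypothetical, and the base URL assumes the local cookbook core used above.

```python
from urllib.parse import urlencode


def build_boosted_query_url(core_url: str, term: str) -> str:
    """Build a /select URL that searches titles and boosts by tags_count."""
    params = {
        "q": f"title:{term}",       # search the title field
        "defType": "edismax",       # use the edismax query parser
        "bf": "field(tags_count)",  # boost function from the recipe
    }
    # urlencode percent-encodes reserved characters such as ':' and '('
    return f"{core_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_boosted_query_url("http://localhost:8983/solr/cookbook", "cookbook"))
```

The printed URL carries the same three parameters as the query in step 5, with the reserved characters percent-encoded.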
