Solr JSON Facet API

Introduction

Solr 5 has a completely re-written faceted search and analytics module with a structured JSON API to control the faceting and analytics commands.
NOTE: Some examples use syntax only supported in later Solr 5 releases, or even Solr 6.
Download a recent Solr release or snapshot to try them out.

Nested sub-facets are more naturally expressed in a nested structure like JSON than in the flat structure that normal query parameters provide.

Goals of the new Faceting Module:

First class JSON support
Easier programmatic construction of complex nested facet commands
Support a much more canonical response format that is easier for clients to parse
First class analytics support
Ability to sort facet buckets by any calculated metric
Support a cleaner way to do distributed faceting
Support better integration with other search features

Of course if you prefer to use Solr’s existing faceting capabilities, that’s fine too. You can even use both at once if you want!

UPDATE: The JSON Facet API is now part of the JSON Request API, so a complete request may be expressed in JSON.

Ease of Use

Some of the ease-of-use enhancements over traditional Solr faceting come from the inherent nested structure of JSON.
As an example, here is the faceting command for two different range facets using Solr’s flat legacy API:

&facet=true
&facet.range={!key=age_ranges}age
&f.age.facet.range.start=0
&f.age.facet.range.end=100
&f.age.facet.range.gap=10
&facet.range={!key=price_ranges}price
&f.price.facet.range.start=0
&f.price.facet.range.end=1000
&f.price.facet.range.gap=50

And here is the equivalent faceting command in the new JSON Faceting API:

{
  age_ranges: {
    type : range,
    field : age,
    start : 0,
    end : 100,
    gap : 10
  },
  price_ranges: {
    type : range,
    field : price,
    start : 0,
    end : 1000,
    gap : 50
  }
}

These aren’t even nested facets, but already one can see how much nicer the JSON API looks. With deeply nested sub-facets and statistics, the clarity of the inherently nested JSON API only grows.
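
Because the command is plain nested data, it can also be assembled programmatically. The following Python sketch builds the same two range facets and serializes them for the json.facet parameter. The helper name range_facet is made up for this illustration and is not part of Solr or of any client library.

```python
import json

def range_facet(field, start, end, gap):
    """Return a range facet command as a plain dict (hypothetical helper)."""
    return {"type": "range", "field": field, "start": start, "end": end, "gap": gap}

facets = {
    "age_ranges": range_facet("age", 0, 100, 10),
    "price_ranges": range_facet("price", 0, 1000, 50),
}

# Standard JSON with quoted strings is accepted as well, so json.dumps is enough.
json_facet_param = json.dumps(facets)
print(json_facet_param)
```

Adding a third range facet is then a one-line change to the facets dict, with no per-field parameter prefixes to keep in sync.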

JSON extensions

A number of JSON extensions have been implemented to further increase the clarity and ease of constructing a JSON faceting command by hand. For example:

{  // this is a single-line comment, which can help add clarity to large JSON commands
     /* traditional C-style comments are also supported */
  x : "avg(price)" , // Simple strings, such as the key x, can occur unquoted
  y : 'unique(manu)' // Strings can also use single quotes (easier to embed in another String)
}

Debugging JSON

Nicely indented JSON is very easy to understand. If you get a large piece of non-indented JSON somehow, and are trying to make sense of it, you can cut and paste into one of the online validators:
http://jsonlint.com
http://jsonformatter.curiousconcept.com
Both of these validators will indent your JSON, even when it contains extensions unsupported by them (such as comments or bare strings).
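
For strictly valid JSON, such as a Solr response, the same re-indentation can be done locally. This is a minimal sketch using Python's standard json module; pretty is a hypothetical helper name. Note that the strict parser rejects the extensions described above (comments, unquoted or single-quoted strings), so it only applies to standard JSON.

```python
import json

def pretty(raw):
    """Re-indent a strict JSON string; raises ValueError on relaxed syntax."""
    return json.dumps(json.loads(raw), indent=2)

compact = '{"top_genres":{"buckets":[{"val":"Fantasy","count":122}]}}'
print(pretty(compact))

# The relaxed syntax is not standard JSON, so the strict parser refuses it.
try:
    pretty("{ x : 'unique(manu)' }")
except ValueError as err:
    print("not strict JSON:", err)
```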

Facet Types

There are two types of facets: those that break up the domain into multiple buckets, and aggregations (facet functions) that provide information about the set of documents belonging to each bucket.

Faceting can be nested! Any bucket produced by faceting can further be broken down into multiple buckets by a sub-facet.
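
As a rough illustration of nesting, the Python sketch below attaches sub-facets to a terms facet under a "facet" key, so that each category bucket is broken down into price ranges and also gets an average price. The helper with_subfacets and the names avg_price and price_ranges are hypothetical; the field names cat and price are borrowed from other examples on this page.

```python
import json

def with_subfacets(facet, subfacets):
    """Return a copy of a facet command with sub-facets attached under "facet"."""
    nested = dict(facet)
    nested["facet"] = dict(subfacets)
    return nested

price_ranges = {"type": "range", "field": "price", "start": 0, "end": 100, "gap": 20}

# Each "cat" bucket is broken down further into price ranges,
# and each bucket also gets an average price statistic.
categories = with_subfacets(
    {"type": "terms", "field": "cat"},
    {"avg_price": "avg(price)", "price_ranges": price_ranges},
)

print(json.dumps({"categories": categories}, indent=2))
```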

Statistics are facets

Statistics are now fully integrated into faceting. Since we start off with a single facet bucket with a domain defined by the main query and filters, we can even ask for statistics for this top level bucket, before breaking up into further buckets via faceting. Example:

json.facet={
  x : "avg(price)",           // the average of the price field will appear under "x"
  y : "unique(manufacturer)"  // the number of unique manufacturers will appear under "y"
}

See facet functions for a complete list of the available aggregation functions.

JSON Facet Syntax

The general form of a JSON facet command is:
<facet_name> : { <facet_type> : <facet_parameter(s)> }
Example: top_authors : { terms : { field : authors, limit : 5 } }

After Solr 5.2, a flatter structure with a “type” field may also be used:
<facet_name> : { "type" : <facet_type> , <other_facet_parameter(s)> }
Example: top_authors : { type : terms, field : authors, limit : 5 }

The results will appear in the response under the facet name specified.
Facet commands are specified using json.facet request parameters.
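
The two forms carry the same information, which the following Python sketch makes explicit by converting the first form into the flatter one. The function to_flat_form is a hypothetical helper written for this page, and it assumes the facet parameters are given as an object rather than as a bare string.

```python
def to_flat_form(facet):
    """Convert { <facet_type> : <params> } into { "type" : <facet_type>, ... }."""
    if "type" in facet:
        return dict(facet)  # already in the flat form
    if len(facet) != 1:
        raise ValueError("expected exactly one facet type key")
    facet_type, params = next(iter(facet.items()))
    if not isinstance(params, dict):
        raise ValueError("this sketch only handles object-valued parameters")
    return {"type": facet_type, **params}

nested_form = {"terms": {"field": "authors", "limit": 5}}
print(to_flat_form(nested_form))
```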

Test Using Curl

To test out different facet requests by hand, it’s easiest to use “curl” from the command line. Example:

$ curl http://localhost:8983/solr/query -d 'q=*:*&rows=0&
 json.facet={
   categories:{ 
     type : terms, 
     field : cat,
     sort : { x : desc},
     facet:{
       x : "avg(price)",
       y : "sum(price)"
     }
   }
 }
'
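
When the request is built from code rather than typed by hand, the parameters need to be URL-encoded. The sketch below, using only the Python standard library, prepares the same form body as the curl command without sending it; build_facet_body is a hypothetical helper name.

```python
import json
from urllib.parse import urlencode, parse_qs

def build_facet_body(facets, q="*:*", rows=0):
    """URL-encode the q, rows and json.facet parameters for a POST body."""
    return urlencode({"q": q, "rows": rows, "json.facet": json.dumps(facets)})

facets = {
    "categories": {
        "type": "terms",
        "field": "cat",
        "sort": {"x": "desc"},
        "facet": {"x": "avg(price)", "y": "sum(price)"},
    }
}

body = build_facet_body(facets)
print(body)

# Decoding the body again shows that the facet command survives the round trip.
decoded = parse_qs(body)
print(decoded["json.facet"][0])
```

The body could then be POSTed to the query URL used in the curl example, for instance with urllib.request.urlopen(url, data=body.encode("utf-8")).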

Terms Facet

The terms facet, or field facet, produces buckets from the unique values of a field. The field needs to be indexed or have docValues.

The simplest form of the terms facet:

{
  top_genres : { terms : genre_field }
}

An expanded form allows for more parameters:

{
  top_genres : {
    type : terms,
    field : genre_field,
    limit : 3,
    mincount : 2
  }
}

Example response:

    "top_genres":{
      "buckets":[
        {
          "val":"Science Fiction",
          "count":143},
        {
          "val":"Fantasy",
          "count":122},
        {
          "val":"Biography",
          "count":28}
      ]
    }
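
A client can read the buckets directly from this structure. The following is a minimal Python sketch, assuming the fragment above has been wrapped in braces to form a complete JSON object; bucket_counts is a hypothetical helper name.

```python
import json

response_fragment = json.loads("""
{
  "top_genres": {
    "buckets": [
      {"val": "Science Fiction", "count": 143},
      {"val": "Fantasy", "count": 122},
      {"val": "Biography", "count": 28}
    ]
  }
}
""")

def bucket_counts(facet_result):
    """Map each bucket value to its document count, keeping bucket order."""
    return {bucket["val"]: bucket["count"] for bucket in facet_result["buckets"]}

counts = bucket_counts(response_fragment["top_genres"])
print(counts)
```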

Parameters:

field – The field name to facet over.
offset – Used for paging, this skips the first N buckets. Defaults to 0.
limit – Limits the number of buckets returned. Defaults to 10.
mincount – Only return buckets with a count of at least this number. Defaults to 1.
sort – Specifies how to sort the buckets produced. “count” sorts by document count, and “index” sorts by the index (natural) order of the bucket value. One can also sort by any facet function / statistic that occurs in the bucket. The sort order may be either “asc” or “desc”, and the default is “count desc”. This parameter may also be specified in JSON form, like sort:{count:desc}.
missing – A boolean that specifies if a special “missing” bucket should be returned that is defined by documents without a value in the field. Defaults to false.
numBuckets – A boolean. If true, adds “numBuckets” to the response, an integer representing the number of buckets for the facet (as opposed to the number of buckets returned). Defaults to false.
allBuckets – A boolean. If true, adds an “allBuckets” bucket to the response, representing the union of all of the buckets. For multi-valued fields, this is different than a bucket for all of the documents in the domain since a single document can belong to multiple buckets. Defaults to false.
prefix – Only produce buckets for terms starting with the specified prefix.
method – Provides an execution hint for how to facet the field.
- method:uif – Stands for UninvertedField, a method of faceting indexed, multi-valued fields using top-level data structures that optimize for performance over NRT capabilities.
- method:dv – Stands for DocValues, a method of faceting indexed, multi-valued fields using per-segment data structures. This method mirrors faceting on real docValues fields but works by building on-heap docValues on the fly from the index when docValues aren’t available. This method is better for a quickly changing index.
- method:stream – This method creates each individual facet bucket (including any sub-facets) on-the-fly while streaming the response back to the requester. Currently only supports sorting by index order.
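
As an illustration of how offset and limit combine for paging, the Python sketch below builds the terms facet command for a given page of buckets. The helper terms_page and its zero-based page convention are assumptions made for this example.

```python
def terms_page(field, page, page_size=10, sort="count desc"):
    """Build a terms facet command for a zero-based page of buckets."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return {
        "type": "terms",
        "field": field,
        "offset": page * page_size,  # skip the buckets of earlier pages
        "limit": page_size,          # return at most one page of buckets
        "sort": sort,
    }

print(terms_page("genre_field", page=2, page_size=5))
```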

Query Facet

The query facet produces a single bucket that matches the specified query.

An example of the simplest form of the query facet:

{
  high_popularity : { query : "popularity:[8 TO 10]" }
}

An expanded form allows for more parameters (or sub-facets / facet functions):

{
  high_popularity : { 
    type : query,
    q : "popularity:[8 TO 10]",
    facet : { average_price : "avg(price)" }
  }
}

Example response:

  "high_popularity" : { 
    "count" : 147,
    "average_price" : 74.25
  }

Range Facet

The range facet produces multiple range buckets over numeric fields or date fields.

Range facet example:

{
  prices : { 
    type : range,
    field : price,
    start : 0,
    end : 100,
    gap : 20
  }
}

Example response:

    "prices":{
      "buckets":[
        {
          "val":0.0,  // the bucket value represents the start of each range.  This bucket covers 0-20
          "count":5},
        {
          "val":20.0,
          "count":3},
        {
          "val":40.0,
          "count":2},
        {
          "val":60.0,
          "count":1},
        {
          "val":80.0,
          "count":1}
      ]
    }
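
Since each bucket value is only the start of its range, a client that wants readable labels has to add the gap itself. A small Python sketch, assuming every bucket is exactly one gap wide; label_range_buckets is a hypothetical helper name.

```python
def label_range_buckets(buckets, gap):
    """Return (label, count) pairs, using each "val" as the start of its range."""
    return [(f"{b['val']}-{b['val'] + gap}", b["count"]) for b in buckets]

buckets = [
    {"val": 0.0, "count": 5},
    {"val": 20.0, "count": 3},
    {"val": 40.0, "count": 2},
    {"val": 60.0, "count": 1},
    {"val": 80.0, "count": 1},
]

for label, count in label_range_buckets(buckets, gap=20):
    print(label, count)
```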

To ease migration, these parameter names, values, and semantics were taken directly from the old-style (non JSON) Solr range faceting.

Parameters:

field – The numeric field or date field to produce range buckets from
mincount – Minimum document count for the bucket to be included in the response. Defaults to 0.
start – Lower bound of the ranges
end – Upper bound of the ranges
gap – Size of each range bucket produced
hardend – A boolean, which if true means that the last bucket will end at “end” even if it is less than “gap” wide. If false, the last bucket will be “gap” wide, which may extend past “end”.
other – This param indicates that in addition to the counts for each range constraint between “start” and “end”, counts should also be computed for…

"before" all records with field values lower than the lower bound of the first range
"after" all records with field values greater than the upper bound of the last range
"between" all records with field values between the start and end bounds of all ranges
"none" compute none of this information
"all" shortcut for before, between, and after

include – By default, the ranges computed between “start” and “end” are inclusive of their lower bounds and exclusive of their upper bounds. The “before” range is exclusive and the “after” range is inclusive. This default, equivalent to “lower” below, will not result in double counting at the boundaries. This behavior can be modified by the “include” param, which can be any combination of the following options…

"lower" all gap based ranges include their lower bound
"upper" all gap based ranges include their upper bound
"edge" the first and last gap ranges include their edge bounds (i.e., lower for the first one, upper for the last one) even if the corresponding upper/lower option is not specified
"outer" the “before” and “after” ranges will be inclusive of their bounds, even if the first or last ranges already include those boundaries.
"all" shorthand for lower, upper, edge, outer

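The effect of hardend and of the include options on the gap-based ranges can be sketched in a few lines of Python. This is a direct reading of the parameter descriptions above, not Solr's implementation; the function names gap_ranges and in_gap_range are hypothetical, and the “before” and “after” ranges are left out.

```python
def gap_ranges(start, end, gap, hardend=False):
    """List (lower, upper) bounds of the gap-based ranges between start and end."""
    if gap <= 0:
        raise ValueError("gap must be positive")
    ranges = []
    lower = start
    while lower < end:
        upper = lower + gap
        if hardend and upper > end:
            upper = end  # hardend: the last bucket stops at "end"
        ranges.append((lower, upper))
        lower = upper
    return ranges

def in_gap_range(value, lower, upper, is_first, is_last, include=("lower",)):
    """Test membership of value in one gap-based range under the include options."""
    opts = set(include)
    if "all" in opts:
        opts |= {"lower", "upper", "edge", "outer"}
    incl_lower = "lower" in opts or ("edge" in opts and is_first)
    incl_upper = "upper" in opts or ("edge" in opts and is_last)
    above = value >= lower if incl_lower else value > lower
    below = value <= upper if incl_upper else value < upper
    return above and below

print(gap_ranges(0, 100, 30))                # last range extends past end
print(gap_ranges(0, 100, 30, hardend=True))  # last range is cut off at end
print(in_gap_range(20, 0, 20, True, False))  # upper bound is exclusive by default
```

With the default options, a value that sits exactly on a boundary falls into the range that starts there, which is why no double counting occurs.
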
Common Parameters

Parameters that all faceting methods have in common include

domain – Facet domain transformations, to change the incoming domain of the facet command before faceting is executed. This is useful for multi-select faceting and nested document (block join) faceting. Additional filters to be applied to the domain can also be specified here.
