Solr Facet Functions and Analytics


Traditional faceted search (also called guided navigation) involves counting search results that belong to categories (also called facet constraints). The new facet functions in Solr extend normal faceting by allowing additional aggregations on the document fields themselves. Combined with the new sub-facet feature, this provides powerful new real-time analytics capabilities. Also see the page about the new JSON Facet API.

Aggregation Functions

Faceting involves breaking up the domain into multiple buckets and providing information about each bucket.
There are multiple aggregation functions / statistics that can be used:

Aggregation   Example                            Effect
sum           sum(sales)                         summation of numeric values
avg           avg(popularity)                    average of numeric values
sumsq         sumsq(rent)                        sum of squares
min           min(salary)                        minimum value
max           max(mul(price,popularity))         maximum value
unique        unique(state)                      number of unique values (count distinct)
hll           hll(state)                         number of unique values using the HyperLogLog algorithm
percentile    percentile(salary,50,75,99,99.9)   calculates percentiles
stddev        stddev(salary)                     calculates standard deviation (Solr 6.6+)
variance      variance(salary)                   calculates variance (Solr 6.6+)

 
Numeric aggregation functions such as avg can be applied to any numeric field, or to a function of several numeric fields, as in max(mul(price,popularity)).
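To make the meaning of a few of these aggregations concrete, here is a minimal Python sketch that computes them over a small in-memory list. The sample values and the aggregate helper are invented for illustration; this mirrors what the statistics mean, not how Solr computes them internally.

```python
# Hypothetical sample values; not taken from any real index.
prices = [10.0, 20.0, 20.0, 50.0]


def aggregate(values):
    """Plain-Python equivalents of several facet functions."""
    return {
        "sum": sum(values),                    # sum(price)
        "avg": sum(values) / len(values),      # avg(price)
        "sumsq": sum(v * v for v in values),   # sumsq(price)
        "min": min(values),                    # min(price)
        "max": max(values),                    # max(price)
        "unique": len(set(values)),            # unique(price), i.e. count distinct
    }


stats = aggregate(prices)
print(stats)
```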

See Count Distinct in Solr for more information on distributed cardinality estimation / calcDistinct.

 

Simple Example

The faceting domain starts with the set of documents that match the main query and filters.
We can ask for statistics over this whole set of documents:

http://localhost:8983/solr/query?q=*:*&
   json.facet={x:'avg(price)'}

And the response will contain a facets section:

[...]
  "facets":{
    "count":32,
    "x":164.10218846797943
  }
[...]
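The same request can be assembled from a script. The following is a rough Python sketch that builds the URL-encoded query and reads the statistic out of a response shaped like the one above. The build_facet_url helper is a hypothetical name, the endpoint is the local one from the example, and no request is actually sent here.

```python
import json
from urllib.parse import urlencode

# Local Solr endpoint from the example above (an assumption about your setup).
SOLR_QUERY_URL = "http://localhost:8983/solr/query"


def build_facet_url(facet_spec, q="*:*"):
    """Return a query URL carrying facet_spec as the json.facet parameter."""
    params = {"q": q, "json.facet": json.dumps(facet_spec)}
    return SOLR_QUERY_URL + "?" + urlencode(params)


url = build_facet_url({"x": "avg(price)"})
print(url)

# Response fragment copied from the example above, as a Python dict.
sample_response = {"facets": {"count": 32, "x": 164.10218846797943}}
avg_price = sample_response["facets"]["x"]
print(avg_price)
```

Serializing the facet specification with json.dumps produces strict JSON, which avoids quoting surprises once the request grows beyond a single statistic.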

 
If we want to break up the domain into buckets and then calculate a function per bucket, we simply add a nested facet command to the facet parameters. For example (using curl this time):

$ curl http://localhost:8983/solr/query -d 'q=*:*&
 json.facet={
   categories:{ 
     type : terms, // terms facet creates a bucket for each indexed term (or value) in the field
     field : cat,
     facet:{
       x : "avg(price)",
       y : "sum(price)"
     }
   }
 }
'

The response will contain the two stats we asked for in each category bucket.

[...]
  "facets":{
    "count":32,
    "categories":{
      "buckets":[
        { 
          "val":"electronics",
          "count":12,
          "x":231.02666823069254,
          "y":2772.3200187683105
        },
        { 
          "val":"memory",
          "count":3,
          "x":86.66333262125652,
          "y":259.98999786376953
        },
[...]
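As a mental model of what the nested facet does, the Python sketch below groups a handful of documents by category and computes the average and sum of price per bucket. The documents and the terms_facet helper are made up for illustration, and each document is assumed to have a single cat value; this is not Solr's implementation.

```python
from collections import defaultdict

# Hypothetical documents with a single-valued "cat" field.
docs = [
    {"cat": "electronics", "price": 100.0},
    {"cat": "electronics", "price": 300.0},
    {"cat": "memory", "price": 50.0},
]


def terms_facet(documents, field, value_field):
    """Bucket documents by `field`, then compute avg (x) and sum (y) per bucket."""
    groups = defaultdict(list)
    for doc in documents:
        groups[doc[field]].append(doc[value_field])

    buckets = [
        {
            "val": val,
            "count": len(values),
            "x": sum(values) / len(values),  # avg(price)
            "y": sum(values),                # sum(price)
        }
        for val, values in groups.items()
    ]
    # Mirror the default ordering: bucket count, descending.
    buckets.sort(key=lambda b: b["count"], reverse=True)
    return {"buckets": buckets}


categories = terms_facet(docs, "cat", "price")
print(categories)
```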

 

Facet Sorting

The default sort for a field or terms facet is by bucket count descending.
We can optionally sort ascending or descending by any facet function that appears in each bucket. For example, if we wanted to find the top buckets by average price, then we would add sort:"x desc" to the previous facet request:

$ curl http://localhost:8983/solr/query -d 'q=*:*&
 json.facet={
   categories:{ 
     type : terms,
     field : cat,
     sort : "x desc",   // can also use sort:{x:desc}
     facet:{
       x : "avg(price)",
       y : "sum(price)"
     }
   }
 }
'
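The effect of the sort parameter can be pictured as an ordinary sort over the bucket list. The Python sketch below applies a "field direction" sort string to two buckets whose values are copied from the response shown earlier. The sort_buckets helper is a hypothetical name, and the sketch only handles the string form of the parameter.

```python
# Bucket values copied from the earlier response, listed here in arbitrary order.
buckets = [
    {"val": "memory", "count": 3, "x": 86.66333262125652},
    {"val": "electronics", "count": 12, "x": 231.02666823069254},
]


def sort_buckets(bucket_list, sort_spec):
    """Order buckets by a spec such as "x desc" or "count asc"."""
    key, direction = sort_spec.split()
    return sorted(bucket_list, key=lambda b: b[key], reverse=(direction == "desc"))


top_by_avg_price = sort_buckets(buckets, "x desc")
print([b["val"] for b in top_by_avg_price])
```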

 

Try it out

Facet functions and sub-facets are available in Solr 5.1 and later. Download the latest release and give it a spin!