Traditional faceted search (also called guided navigation) involves counting search results that belong to categories (also called facet constraints). The new facet functions in Solr extend normal faceting by allowing additional aggregations on the document fields themselves. Combined with the new Sub-facet feature, this provides powerful new real-time analytics capabilities. Also see the page about the new JSON Facet API.
Aggregation Functions
Faceting involves breaking up the domain into multiple buckets and providing information about each bucket.
There are multiple aggregation functions / statistics that can be used:
Aggregation | Example | Effect |
---|---|---|
sum | sum(sales) | summation of numeric values |
avg | avg(popularity) | average of numeric values |
sumsq | sumsq(rent) | sum of squares |
min | min(salary) | minimum value |
max | max(mul(price,popularity)) | maximum value |
unique | unique(state) | number of unique values (count distinct) |
hll | hll(state) | number of unique values using the HyperLogLog algorithm |
percentile | percentile(salary,50,75,99,99.9) | calculates percentiles |
stddev | stddev(salary) | calculates standard deviation (Solr 6.6+) |
variance | variance(salary) | calculates variance (Solr 6.6+) |
Numeric aggregation functions such as avg can be applied to any numeric field, or to another function of multiple numeric fields.
See Count Distinct in Solr for more information on distributed cardinality estimation / calcDistinct.
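As a rough sketch of how several of these aggregations might be combined in one request, the Python snippet below builds a json.facet parameter and the matching query URL. The aggregation expressions come from the table above, and the host and /solr/query path are the ones used in the examples later in this post. The helper name `build_facet_url` and the stat labels are made up for illustration, and nothing is actually sent to a server.

```python
import json
from urllib.parse import urlencode, parse_qs, urlsplit

def build_facet_url(base_url, stats):
    """Return a query URL asking for `stats` over all matching documents.

    `stats` maps a caller-chosen label to an aggregation expression string.
    (Hypothetical helper, not part of any Solr client library.)
    """
    params = {
        "q": "*:*",
        # json.facet takes a JSON object; each key is the label under which
        # the computed value would appear in the response.
        "json.facet": json.dumps(stats),
    }
    return base_url + "?" + urlencode(params)

stats = {
    "total_sales": "sum(sales)",
    "median_salary": "percentile(salary,50)",
    # An aggregation over a function of two numeric fields.
    "best_score": "max(mul(price,popularity))",
}
url = build_facet_url("http://localhost:8983/solr/query", stats)
print(url)

# Decode the URL again to confirm the facet definition survives encoding.
decoded = json.loads(parse_qs(urlsplit(url).query)["json.facet"][0])
print(decoded)
```

Building the parameter with `json.dumps` and `urlencode` avoids hand-escaping braces and quotes, which is easy to get wrong when the facet definition grows.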
Simple Example
The faceting domain starts with the set of documents that match the main query and filters.
We can ask for statistics over this whole set of documents:
http://localhost:8983/solr/query?q=*:*&json.facet={x:'avg(price)'}
And the response will contain a facets section:
[...]
  "facets":{
    "count":32,
    "x":164.10218846797943
  }
[...]
If we want to break up the domain into buckets and then calculate a function per bucket, we simply add a nested facet command to the facet parameters. For example (using curl this time):
$ curl http://localhost:8983/solr/query -d 'q=*:*&json.facet={
  categories:{
    type : terms,  // terms facet creates a bucket for each indexed term (or value) in the field
    field : cat,
    facet:{
      x : "avg(price)",
      y : "sum(price)"
    }
  }
}
'
The response will contain the two stats we asked for in each category bucket.
[...]
  "facets":{
    "count":32,
    "categories":{
      "buckets":[
        {
          "val":"electronics",
          "count":12,
          "x":231.02666823069254,
          "y":2772.3200187683105
        },
        {
          "val":"memory",
          "count":3,
          "x":86.66333262125652,
          "y":259.98999786376953
        },
[...]
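To illustrate how a client might read these per-bucket stats, here is a minimal Python sketch that parses an excerpt of the response above. Only the two buckets shown in the post are included; the elided parts are left out rather than guessed. The function name `bucket_stats` is an assumption made for this example.

```python
import json

# Excerpt of the response shown above: just the two visible buckets.
response_text = """
{
  "facets": {
    "count": 32,
    "categories": {
      "buckets": [
        {"val": "electronics", "count": 12,
         "x": 231.02666823069254, "y": 2772.3200187683105},
        {"val": "memory", "count": 3,
         "x": 86.66333262125652, "y": 259.98999786376953}
      ]
    }
  }
}
"""

def bucket_stats(response, facet_name):
    """Map each bucket value to a (count, x, y) tuple.

    `x` and `y` are the labels chosen in the facet request
    (avg(price) and sum(price) in the example above).
    """
    buckets = response["facets"][facet_name]["buckets"]
    return {b["val"]: (b["count"], b["x"], b["y"]) for b in buckets}

stats = bucket_stats(json.loads(response_text), "categories")
for val, (count, avg_price, sum_price) in stats.items():
    print(val, count, avg_price, sum_price)
```

In this excerpt, dividing each bucket's sum by its count reproduces the reported average. That is only a sanity check for these two buckets; it would not hold for a bucket in which some documents lack a price value.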
Facet Sorting
The default sort for a field or terms facet is by bucket count descending.
We can optionally sort ascending or descending by any facet function that appears in each bucket. For example, if we wanted to find the top buckets by average price, then we would add sort:"x desc" to the previous facet request:
$ curl http://localhost:8983/solr/query -d 'q=*:*&json.facet={
  categories:{
    type : terms,
    field : cat,
    sort : "x desc",  // can also use sort:{x:desc}
    facet:{
      x : "avg(price)",
      y : "sum(price)"
    }
  }
}
'
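The same sorted request can be assembled programmatically. The sketch below builds the facet definition as a plain Python dict and serializes it to the value that would be passed as json.facet. The helper `terms_facet` is hypothetical, and both sort spellings mentioned in the comment above are shown.

```python
import json

def terms_facet(field, stats, sort=None):
    """Build a terms facet command as a plain dict (illustrative helper).

    `sort` may be a string such as "x desc" or a dict such as {"x": "desc"}.
    When omitted, no sort key is emitted and the default ordering
    (bucket count descending) applies.
    """
    facet = {"type": "terms", "field": field, "facet": stats}
    if sort is not None:
        facet["sort"] = sort
    return facet

stats = {"x": "avg(price)", "y": "sum(price)"}

# String form of the sort specification.
request = {"categories": terms_facet("cat", stats, sort="x desc")}
# Equivalent object form.
same_request = {"categories": terms_facet("cat", stats, sort={"x": "desc"})}

# The serialized value is what would be sent as the json.facet parameter.
print(json.dumps(request))
print(json.dumps(same_request))
```

The sort key has to name a stat that is defined inside the same facet block (here `x`), since the ordering is computed from the value in each bucket.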
Try it out
Facet functions and Sub-facets are available starting with Solr 5.1. Download the latest release and give it a spin!