Having performance issues with Solr’s faceted search on multivalued fields with many unique terms? Help has arrived in the form of a new Solr faceting algorithm! The new implementation dramatically improves faceting performance, making faceted search suitable for a much wider range of applications.
The existing faceting algorithm for multivalued fields (fields where each document may have multiple values) steps over every term in the index for that field. For each term, it retrieves the set of documents matching that term from the filterCache and counts the size of that set's intersection with the set of documents matching the query. This works well for fields with a limited number of terms (fewer than 1000), but performs poorly for fields with many terms, because every unique term costs a cache lookup and a set intersection.
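To make the per-term cost concrete, here is a minimal sketch of the term-enumeration approach in Python. It is not Solr's code: a plain dict of Python sets (the hypothetical `field_index`) stands in for the filterCache, and `facet_by_term_enum` is an illustrative name. The point is the shape of the loop, which performs one set intersection per unique term regardless of how many documents the query matched.

```python
def facet_by_term_enum(term_doc_sets: dict[str, set[int]], query_docs: set[int]) -> dict[str, int]:
    """Count, for every term in the field, how many query-matching docs contain it.

    term_doc_sets is a hypothetical stand-in for the filterCache:
    it maps each term to the set of document ids containing that term.
    """
    counts = {}
    # One intersection per term: total cost grows with the number of unique terms.
    for term, doc_set in term_doc_sets.items():
        counts[term] = len(doc_set & query_docs)
    return counts


# Toy field with three terms over documents 0..4 (hypothetical data).
field_index = {
    "red": {0, 1, 4},
    "green": {1, 2},
    "blue": {3, 4},
}
query_docs = {1, 3, 4}
print(facet_by_term_enum(field_index, query_docs))
```

With three terms this is trivial, but a field with hundreds of thousands of unique terms means hundreds of thousands of intersections per faceted request, even when only a handful of those terms occur in the matching documents.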
The new method works by un-inverting the indexed field to be faceted, allowing quick lookup of the terms in that field for any given document. It’s actually a hybrid approach: to save memory and increase speed, terms that appear in many documents (over 5% of them) are not un-inverted; instead, their counts are computed with the traditional set intersection logic.
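The hybrid idea can be sketched the same way. This is a simplified illustration, not Solr's implementation: it assumes the same hypothetical term-to-doc-set input as before, keeps per-document term lists as plain Python lists (a real implementation would need a far more compact encoding), and uses illustrative names (`build_uninverted`, `facet_hybrid`, `big_term_fraction`). Terms found in more than 5% of documents stay as doc sets and are counted by intersection; every other term is un-inverted so that counting only has to visit the documents that matched the query.

```python
def build_uninverted(term_doc_sets: dict[str, set[int]], num_docs: int,
                     big_term_fraction: float = 0.05):
    """Split terms into 'big' terms (kept as doc sets) and un-inverted small terms."""
    big_terms: dict[str, set[int]] = {}
    doc_to_terms: list[list[str]] = [[] for _ in range(num_docs)]
    for term, doc_set in term_doc_sets.items():
        if len(doc_set) > big_term_fraction * num_docs:
            big_terms[term] = doc_set          # counted later by set intersection
        else:
            for doc in doc_set:
                doc_to_terms[doc].append(term)  # doc -> terms lookup
    return big_terms, doc_to_terms


def facet_hybrid(big_terms, doc_to_terms, query_docs: set[int]) -> dict[str, int]:
    counts: dict[str, int] = {}
    # Small terms: walk only the matching docs and tally each doc's terms.
    for doc in query_docs:
        for term in doc_to_terms[doc]:
            counts[term] = counts.get(term, 0) + 1
    # Big terms: fall back to the traditional set-intersection count.
    for term, doc_set in big_terms.items():
        counts[term] = len(doc_set & query_docs)
    return counts


# Hypothetical index of 40 docs: the 5% threshold is 2 docs.
num_docs = 40
term_doc_sets = {
    "common": set(range(num_docs)),  # in every doc, so it is a "big" term
    "rare_a": {1, 5},                # 2 docs: not over 5%, so un-inverted
    "rare_b": {5, 30},
}
big, doc_terms = build_uninverted(term_doc_sets, num_docs)
counts = facet_hybrid(big, doc_terms, {1, 5, 7})
print(sorted(counts.items()))
```

Frequent terms would bloat every document's term list and dominate the per-document loop, while a single intersection against their doc set is cheap, which is why splitting on document frequency saves both memory and time.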
Results: up to 5000% increase in queries per second and up to 700% improvement in memory utilization.
Gory details and full benchmark results can be found at http://issues.apache.org/jira/browse/SOLR-475
Try it now with a Solr nightly/test development build dated 11/25/2008 or later.