Solr 6.4 Features


Here’s an overview of some of the new features coming in Solr 6.4.
Download a Solr 6.4 developer snapshot to try these features out and give us feedback!

Filter list for facet domain

Any JSON facet command (terms, range, query) can now filter the facet domain in a simpler manner, without resorting to nested query facets.

Example:

json.facet = {
  categories : {
    type : terms, 
    field : cat, 
    domain : { filter:"user:yonik" }
  } 
}

The filters are applied after other domain change operations and are particularly useful when faceting on child documents.
The filter attribute can be a single query or a list of multiple queries to intersect.
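As a rough illustration of the list form, the following Python sketch builds the json.facet parameter with two filters to intersect. The helper name terms_facet_with_filters and the second filter inStock:true are made up for this example; only the domain/filter structure comes from the text above.

```python
import json

def terms_facet_with_filters(field, filters):
    """Build a terms facet whose domain is narrowed by one or more filter queries."""
    # A single query is sent as a plain string; several queries are sent
    # as a list, which Solr intersects.
    domain_filter = filters[0] if len(filters) == 1 else list(filters)
    return {"type": "terms", "field": field, "domain": {"filter": domain_filter}}

# "inStock:true" is a hypothetical second filter used only for illustration.
facet = {"categories": terms_facet_with_filters("cat", ["user:yonik", "inStock:true"])}

# Request parameters that could be sent to a /select or /query handler.
params = {"q": "*:*", "json.facet": json.dumps(facet)}
print(params["json.facet"])
```

The resulting params dictionary can be passed to any HTTP client; nothing here contacts Solr.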

The ability to use param to refer to a filter supplied in a separate request parameter was added shortly after.


Learning to Rank

Solr 6.4 adds a Learning to Rank (LTR) plugin for reranking results with machine learning models.

See the Lucene/Solr Revolution presentation Learning to Rank in Solr.

snapshotscli.sh command line tool to manage snapshots

The full path of the script is ./solr/server/scripts/cloud-scripts/snapshotscli.sh

An example from the pull request:

# Start Solr and initialize a sample collection
bin/solr start -c
bin/solr create_collection -c books
curl 'http://localhost:8983/solr/books/update?commit=true' -H 'Content-type:application/json' -d '
[ {"id" : "book1", "title" : "American Gods", "author" : "Neil Gaiman" } ]'

# Create and export a snapshot
./snapshotscli.sh --create snap-1 -c books -z localhost:9983
./snapshotscli.sh --list -c books -z localhost:9983
./snapshotscli.sh --describe snap-1 -c books -z localhost:9983
./snapshotscli.sh --export snap-1 -c books -z localhost:9983 -d /tmp
./snapshotscli.sh --delete snap-1 -c books -z localhost:9983

# Restore the backup and verify the doc count
curl 'http://localhost:8983/solr/admin/collections?action=restore&name=snap-1&location=/tmp&collection=books_restored'
curl 'http://localhost:8983/solr/books_restored/select?q=*:*'


More efficient filter queries

When parsing filter queries (including fq parameters), the standard Solr query parser will avoid using BooleanQuery for term disjunctions on string and numeric fields, and will use TermsQuery instead.

This has a number of positive effects:

  • Avoids Lucene’s dreaded static maxBooleanClauses issue that causes “too many boolean clauses” exceptions
  • The resulting query should be smaller to cache
  • The resulting query should have higher performance

For example, the following filter will now be faster, and will no longer throw a “too many boolean clauses” exception:

fq=id:(myid1 myid2 myid3 myid4 ... myid2000)
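For reference, a filter like this is usually generated by client code rather than typed by hand. The sketch below shows one way to do that in Python; the ids_filter helper and the generated ids are hypothetical, and it assumes the ids contain no whitespace or query-syntax characters.

```python
from urllib.parse import urlencode

def ids_filter(field, ids):
    """Return a filter query matching any of the given ids in `field`."""
    # Assumes ids contain no whitespace or query-syntax characters;
    # real values may need escaping or quoting.
    return f"{field}:({' '.join(ids)})"

# Hypothetical ids myid1 .. myid2000, mirroring the example above.
ids = [f"myid{n}" for n in range(1, 2001)]
fq = ids_filter("id", ids)

# URL-encoded parameters, usable as a query string or a POST body.
query_string = urlencode({"q": "*:*", "fq": fq})
print(len(ids), len(query_string))
```

With a list this long, sending the parameters in a POST body rather than in the URL is probably the more practical choice.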

Upcoming Features (uncommitted)

Solr goes even further into Machine Learning with LogisticRegressionQuery and a logit streaming expression that implements a distributed Stochastic Gradient Descent (SGD) optimizer for Logistic Regression.

Currently, each feature must be a numeric value stored in its own field.
From the JIRA, here’s a syntax example for the logit streaming expression:

  logit(collection1, features="a,b,c,d,e,f", outcome="x", maxIterations="80")
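To give a feel for what such an optimizer computes, here is a minimal single-node sketch of logistic regression trained with SGD in plain Python. It is not Solr's implementation and is not distributed; the function names, the learning rate, and the toy data are assumptions made for illustration, with max_iterations defaulting to 80 only to echo the example above.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logit(rows, outcomes, max_iterations=80, learning_rate=0.1, seed=0):
    """Fit weights (bias first) with plain SGD on the log loss."""
    rng = random.Random(seed)
    weights = [0.0] * (len(rows[0]) + 1)  # weights[0] is the bias term
    order = list(range(len(rows)))
    for _ in range(max_iterations):
        rng.shuffle(order)  # visit examples in a fresh random order each pass
        for i in order:
            x = [1.0] + list(rows[i])
            p = sigmoid(sum(w * v for w, v in zip(weights, x)))
            error = outcomes[i] - p  # gradient of the log likelihood w.r.t. z
            for j, v in enumerate(x):
                weights[j] += learning_rate * error * v
    return weights

def predict(weights, row):
    """Return the predicted probability that the outcome is 1."""
    x = [1.0] + list(row)
    return sigmoid(sum(w * v for w, v in zip(weights, x)))

# Toy data: one numeric feature, outcome is 1 for the larger values.
rows = [(0.0,), (1.0,), (2.0,), (3.0,)]
outcomes = [0, 0, 1, 1]
weights = train_logit(rows, outcomes)
print(predict(weights, (0.0,)), predict(weights, (3.0,)))
```

In the streaming expression, the feature fields and the outcome field play the roles of rows and outcomes here.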