New Solr 4.7 Features

Solr 4.7 has been released! Here’s a slightly more in-depth overview of some selected features.

Deep Paging

Both single-node and distributed deep paging have been added to Solr! I previously created an example of how to use Solr’s deep paging, and Hoss has a great set of benchmarks showing the performance increases. Here’s one graph from that post showing the most basic case (sorting by score descending) and how performance varies with paging depth. Ignore the green “strawman” line… that was proof-of-concept code that was never committed.

In short, pass cursorMark=* on the first paging request. The response includes a nextCursorMark value, which you then pass as cursorMark on your next request.
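The cursor loop described above can be sketched in Python roughly as follows. This is only an illustration under a few assumptions: `iter_cursor_pages` and `make_fake_fetch` are hypothetical names, the fake fetch function stands in for a real HTTP call to Solr, and its cursor marks are plain offsets (real marks are opaque strings). The sort includes the uniqueKey field (assumed here to be `id`) as a tie-breaker, which cursors require, and the loop stops when the returned nextCursorMark equals the mark that was sent.

```python
from typing import Callable, Dict, Iterator, List


def iter_cursor_pages(fetch: Callable[[Dict[str, str]], dict],
                      query: str = "*:*",
                      sort: str = "score desc, id asc",
                      rows: int = 100) -> Iterator[List[dict]]:
    """Yield pages of documents until the cursor stops advancing."""
    cursor = "*"  # cursorMark=* starts a new cursor
    while True:
        params = {"q": query, "sort": sort, "rows": str(rows),
                  "cursorMark": cursor, "wt": "json"}
        result = fetch(params)
        docs = result["response"]["docs"]
        if docs:
            yield docs
        next_cursor = result["nextCursorMark"]
        if next_cursor == cursor:  # same mark came back: nothing left to page
            break
        cursor = next_cursor


def make_fake_fetch(ids: List[str]) -> Callable[[Dict[str, str]], dict]:
    """Hypothetical stand-in for a Solr request; marks are just offsets."""
    def fetch(params: Dict[str, str]) -> dict:
        mark = params["cursorMark"]
        start = 0 if mark == "*" else int(mark)
        page = ids[start:start + int(params["rows"])]
        # Return the incoming mark unchanged when the page is empty.
        next_mark = str(start + len(page)) if page else mark
        return {"response": {"docs": [{"id": i} for i in page]},
                "nextCursorMark": next_mark}
    return fetch


pages = list(iter_cursor_pages(make_fake_fetch(["a", "b", "c", "d", "e"]), rows=2))
print([[doc["id"] for doc in page] for page in pages])
```

To talk to a real server, `fetch` would be replaced by a function that sends the parameters to a select handler and returns the parsed JSON response.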

SimpleQueryParser

SimpleQueryParser (registered under the name “simple”) is an alternative to edismax: both aim to handle user queries without throwing exceptions. Unlike edismax, this parser does not handle the full “lucene” query syntax.
The q.operators parameter controls which operators are available (by default, all are).
Example:

  &defType=simple                 # type of the main query is "simple"
  &q=solr -search                 # the user query string
  &q.op=AND                       # all clauses mandatory (the default is OR)
  &q.operators=WHITESPACE,NOT     # enable the "-" operator (we need whitespace parsing too so the "-" will be seen as an operator)
  &qf=title^2 text^3              # query across the title and text fields, boosting title by 2 and text by 3

The output of that example query (in Lucene syntax) would be:

   +(text:solr^3.0 title:solr^2.0) +(-(text:search^3.0 title:search^2.0) *:*)
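When sending such a request from code, the parameters need URL encoding (spaces, "^" and "," are not safe in a query string). A minimal sketch in Python follows; the helper name `simple_query_url`, the base URL and the qf value passed in the usage lines are assumptions for illustration, while the parameter names are the ones from the example above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def simple_query_url(base_url, user_query, qf, operators=None, default_op="OR"):
    """Build a select URL that uses the "simple" query parser."""
    params = {
        "defType": "simple",   # main query is parsed by SimpleQueryParser
        "q": user_query,       # raw user input
        "q.op": default_op,    # AND makes all clauses mandatory
        "qf": qf,              # fields (with optional boosts) to query across
    }
    if operators is not None:
        # Restrict the enabled operators; omit to leave all of them enabled.
        params["q.operators"] = ",".join(operators)
    return base_url + "?" + urlencode(params)


# Hypothetical base URL for a local single-core setup.
url = simple_query_url("http://localhost:8983/solr/collection1/select",
                       "solr -search", "title^2 text",
                       operators=["WHITESPACE", "NOT"], default_op="AND")
print(url)
# Decoding the query string gives back the original parameter values.
print(parse_qs(urlsplit(url).query))
```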

Tri-Level CompositeId Routing

Composite ID routing allows one to partition the hash range such that related documents appear on the same part of the hash ring. This allows one to efficiently query over a set of related documents (say, a user’s email messages) without querying the whole collection. The default compositeId router has been extended to accept tri-level routes, so partitioning may be done at more than one level. For example, one use case would be to partition first by application ID, then by user ID, with the final part of the hash being the user’s document ID.

All this simply works out of the box (no configuration needed!). Index documents with IDs like the following:
{"id" : "heliosearch!yonik!mydoc1", ...
And then at query time you could specify a route key that restricts queries to nodes containing heliosearch documents:
_route_=heliosearch!
Or that restricts queries to nodes containing yonik’s heliosearch documents:
_route_=heliosearch!yonik!
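On the client side, the ID and route-key conventions above amount to joining parts with "!". A small Python sketch, assuming hypothetical helper names `composite_id` and `route_key`, and assuming (for this sketch only) that the individual parts never contain "!" themselves:

```python
def _check_parts(parts):
    """Reject empty parts and parts containing the "!" separator."""
    for part in parts:
        if not part or "!" in part:
            raise ValueError("invalid ID part: %r" % (part,))


def composite_id(*parts):
    """Join one to three parts into a document ID, e.g. app!user!doc."""
    if not 1 <= len(parts) <= 3:
        raise ValueError("expected 1 to 3 parts")
    _check_parts(parts)
    return "!".join(parts)


def route_key(*prefix_parts):
    """Build a _route_ value from one or two leading parts (trailing "!")."""
    if not 1 <= len(prefix_parts) <= 2:
        raise ValueError("expected 1 or 2 prefix parts")
    _check_parts(prefix_parts)
    return "!".join(prefix_parts) + "!"


print(composite_id("heliosearch", "yonik", "mydoc1"))
print(route_key("heliosearch"))
print(route_key("heliosearch", "yonik"))
```

The three printed values correspond to the document ID and the two _route_ values shown above.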

Migrate a set of documents to another collection

A new MIGRATE operation has been added to the Solr Collections API that allows one to move part of one collection to another collection based on _route_ (i.e. the ID prefix when using compositeId routing). This is actually a live migration! While the source documents are being copied to the target collection, any updates to those documents will also be forwarded to the target collection. For a short amount of time after the copy is complete, updates to the source documents will continue being forwarded to the target collection. It’s the client’s responsibility after that point to send the documents to the correct collection.

Here’s a quick example of migration in action:
First start up a single node in ZK mode:

java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myConf -DzkRun -DnumShards=1 -jar start.jar

Create two new collections:

curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=c2&replicationFactor=2&maxShardsPerNode=100&numShards=1"
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=c3&replicationFactor=2&maxShardsPerNode=100&numShards=1"

Index some documents to collection “c2”:

curl "http://localhost:8983/solr/c2/update?commit=true" -H 'Content-type:application/json' -d '[{"id":"a!doc1"}, {"id":"b!doc2"},{"id":"c!doc3"},{"id":"d!doc4"}]'

Now migrate all documents with a route key of a! from collection “c2” to “c3”:

curl 'http://localhost:8983/solr/admin/collections?action=MIGRATE&collection=c2&split.key=a!&target.collection=c3'
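If the migration is triggered from a script rather than by hand, the same request can be assembled programmatically. The Python sketch below only builds the URL; `migrate_url` is a hypothetical helper name. The optional forward.timeout parameter (in seconds) is how I understand the Collections API to expose the post-copy forwarding window, so treat it as an assumption and check it against the documentation for your Solr version.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def migrate_url(solr_base, source, target, split_key, forward_timeout=None):
    """Build a Collections API MIGRATE request URL."""
    params = {
        "action": "MIGRATE",
        "collection": source,         # collection to move documents out of
        "split.key": split_key,       # route key, e.g. "a!"
        "target.collection": target,  # collection to move documents into
    }
    if forward_timeout is not None:
        # Assumed optional parameter: seconds to keep forwarding updates.
        params["forward.timeout"] = str(forward_timeout)
    return solr_base + "/admin/collections?" + urlencode(params)


url = migrate_url("http://localhost:8983/solr", "c2", "c3", "a!")
print(url)  # the "!" is percent-encoded; the server decodes it back
print(parse_qs(urlsplit(url).query))
```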

The docs should now be migrated! To verify, call commit on the target collection to make the docs visible, and do a query.

curl "http://localhost:8983/solr/c3/update?softCommit=true"
curl "http://localhost:8983/solr/c3/query?q=*:*"
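The check can also be automated by inspecting the IDs in the JSON response. In the Python sketch below, `split_by_route` is a hypothetical helper and the sample response is hand-written for illustration (it is not captured output); it only assumes the usual response layout with a "response" object holding a "docs" list.

```python
import json


def split_by_route(response, route_prefix):
    """Return (matching_ids, other_ids) for the docs in a query response."""
    ids = [doc["id"] for doc in response["response"]["docs"]]
    matching = [i for i in ids if i.startswith(route_prefix)]
    others = [i for i in ids if not i.startswith(route_prefix)]
    return matching, others


# Hand-written sample of what a query against "c3" might return.
sample = json.loads(
    '{"response": {"numFound": 1, "start": 0, "docs": [{"id": "a!doc1"}]}}'
)
migrated, unexpected = split_by_route(sample, "a!")
print(migrated, unexpected)
```

After a migration on the a! route key, every ID in the target collection would be expected to land in the first list.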

Lots More!

There are quite a few other new Solr features and improvements, including:

For security-minded folks, SSL support for SolrCloud
The ability to build Solr indexes with Hadoop MapReduce
Many more Suggester options
Updated geospatial support
