Solr’s Realtime Get


Solr took another step toward increasing it’s NoSQL datastore capabilities, with the addition of realtime get.

Background

As readers probably know, Lucene/Solr search works off of point-in-time snapshots of the index. After changes have been made to the index, a commit (or a new Near Real Time softCommit) needs to be done before those changes are visible. Even with Solr’s new NRT (Near Real Time) capabilities, it’s probably not advisable to reopen the searcher more than once a second. However there are some use cases that require the absolute latest version of a document, as opposed to just a very recent version. This is where Solr’s new realtime get comes to the rescue, where the latest version of a document can be retrieved without reopening the searcher and risk disrupting other normal search traffic.

The Realtime-Get API

The realtime get handler is registered at the /get URL. As an example, a request like

 http://localhost:8983/solr/get?id=SOLR1000&fl=id,name

returns a response like

{"doc":{"id":"SOLR1000","name":"Solr, the Enterprise Search Server"}}

Notice that the optional fl (field list) parameter works as normal, allowing you to select the fields you want returned.

There’s also a realtime get component that can be inserted into any request handler, including the standard request handler.

How it works

The realtime get feature uses transaction logging to keep track of uncommitted updates to the index.  When a get request for a document is received, this log is checked first and retrieved from there if found.  If it’s not found, then the latest opened searcher is used to retrieve the document.  Checking the log is super fast, and IO reads from the log are fully concurrent for maximum scalability.

Try it out

Download a recent nightly build of Solr 4.0-dev and follow the Quick Start guide  on the Solr wiki.  Feedback on the solr-user mailing list is always appreciated!