Here’s an overview of some of the new features in Solr 6.3.
Download Solr 6.3 to try these features out and give us feedback!
You can also check out upcoming features of the next Solr release.
New dvhash faceting method in the JSON Facet API
A new faceting method for the JSON Facet API uses hashing instead of direct indexing by ordinal. Currently, when faceting on a field with docValues enabled, or when specifying method=dv, the accumulator arrays are indexed by ordinal. When the cardinality of the domain is low and the cardinality of the field is high, allocating these large arrays wastes memory, and scanning them for the highest accumulated entries takes longer.
Example:
json.facet={ top_authors : { type : terms, field : author, limit : 20, method : dvhash } }
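To make the trade-off concrete, here is a toy model in Python. It is not Solr's Java implementation. It assumes each document in the domain is represented by the list of integer term ordinals it contains. The first function mimics an ordinal-indexed array accumulator, and the second mimics the hash-based idea behind dvhash. Both function names are hypothetical.

```python
import heapq
from collections import Counter


def _top(items, limit):
    # items are (ordinal, count) pairs; highest count first,
    # ties broken by the smaller ordinal.
    return heapq.nsmallest(limit, items, key=lambda kv: (-kv[1], kv[0]))


def top_terms_by_ordinal(doc_ords, num_ords, limit):
    """Array accumulator indexed by ordinal (the method=dv style).

    One slot is allocated per term in the field, however few documents
    are in the domain, and every slot is scanned to find the top entries.
    """
    counts = [0] * num_ords              # size = field cardinality
    for ords in doc_ords:                # one list per document in the domain
        for o in ords:
            counts[o] += 1
    return _top(((o, c) for o, c in enumerate(counts) if c), limit)


def top_terms_by_hash(doc_ords, limit):
    """Hash accumulator (the idea behind method=dvhash).

    Memory and scan time grow with the number of distinct terms actually
    seen in the domain, not with the total cardinality of the field.
    """
    counts = Counter()
    for ords in doc_ords:
        counts.update(ords)              # count each ordinal occurrence
    return _top(counts.items(), limit)
```

With a domain of three documents over a field of one million terms, both functions return the same buckets. The array version still allocates and scans a million slots to produce them.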
Since it may be difficult for a client to know how large the facet domain will be relative to the field cardinality (especially if the domain is based on a user query), future work is expected to switch between faceting methods automatically.
Optimizing, storing and deploying AI models with Streaming Expressions
The classify streaming expression classifies documents according to a model.
classify(<document_stream>,              // the stream to fetch documents that need to be classified
         <model_stream>,                 // the stream to fetch the latest model
         field=<input_text_field>,       // the field on the document to classify
         fieldType=<analysis_fieldType>  // (opt) fieldType to use for tokenization of the input field
)
For each input document, the classifier decorates the document with two (currently hardcoded) fields with the result:
score_d – the raw score
probability_d – the positive probability that the document belongs to the group
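The toy Python model below shows the decoration step. It makes two assumptions for illustration only: the model is linear (a bias plus one weight per term), and probability_d is the logistic function of the raw score. Whitespace splitting stands in for the analysis chain chosen by fieldType. The model layout and function names are hypothetical and do not reflect Solr's stored model format.

```python
import math


def logistic(x):
    # Numerically stable logistic function: never overflows math.exp.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def tokenize(text):
    # Hypothetical stand-in for the analysis chain selected by fieldType.
    return text.lower().split()


def classify(documents, model, field):
    """Yield copies of each document decorated with score_d and probability_d."""
    for doc in documents:
        tokens = tokenize(doc.get(field, ""))
        # Assumed linear model: bias + sum of per-term weights (unknown terms weigh 0).
        score = model["bias"] + sum(model["weights"].get(t, 0.0) for t in tokens)
        decorated = dict(doc)            # leave the input document untouched
        decorated["score_d"] = score
        decorated["probability_d"] = logistic(score)
        yield decorated
```

Usage: with model = {"bias": 0.0, "weights": {"solr": 2.0, "spam": -3.0}}, the document {"body": "Solr rocks"} is decorated with score_d = 2.0 and a probability_d above 0.5.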
The update streaming expression can be used to store classifier models in a SolrCloud collection. The topic streaming expression can be combined with the classify expression both to deploy models and to stream new data through them.
The development JIRA SOLR-9258 has examples and should be consulted until the Solr Reference Guide is updated.
“executor” Streaming Expression
The executor streaming expression wraps another stream containing streaming expressions. By default, each expression is read from the expr_s field of a tuple in the wrapped stream. The executor has an internal thread pool, so expressions can be executed in parallel on a single worker. This expression can further be wrapped in a parallel streaming expression to enable execution across a cluster of worker nodes.
Example syntax from the JIRA for a work queue, in which the topic expression retrieves the expressions to execute (and keeps track of where it left off):
daemon(executor(threads=10, topic(storedExpressions, fl="expr_s", ...)))
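The Python analogy below captures the executor idea: read an expression from a field of each tuple, then evaluate the expressions concurrently on a fixed-size thread pool. It is a sketch, not Solr's implementation. The evaluate callable is a hypothetical stand-in for running a streaming expression. Results are collected only so the behavior is easy to check. Skipping tuples that lack the field is this sketch's own choice.

```python
from concurrent.futures import ThreadPoolExecutor


def run_executor(tuples, evaluate, threads=10, expr_field="expr_s"):
    """Toy executor: evaluate the expression stored in each tuple on a thread pool."""
    # Pull the expression out of each tuple (expr_s by default).
    expressions = [t[expr_field] for t in tuples if expr_field in t]
    # A fixed-size pool runs up to `threads` evaluations at once;
    # pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(evaluate, expressions))
```

In Solr, the parallel wrapper spreads this work over several worker nodes. The sketch covers only the single-worker thread pool.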
“commit” Streaming Expression
The commit streaming expression wraps another stream (normally one involving updates) and performs a commit at the end.
Example:
commit(targetCollection, update(targetCollection, search(sourceCollection, q="myquery", ...)))
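Streaming expressions like the one above are sent to Solr's /stream request handler in the expr parameter. The minimal Python sketch below only builds such a POST request with the standard library and does not send it. The base URL http://localhost:8983/solr and the helper name are assumptions. The "..." is kept from the original example and must be replaced with real search parameters before the request is sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def stream_request(base_url, collection, expr):
    """Build (but do not send) a POST to a collection's /stream handler."""
    body = urlencode({"expr": expr}).encode("utf-8")   # form-encode the expression
    return Request(
        f"{base_url}/{collection}/stream",
        data=body,                                     # a body makes this a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# The "..." is the original example's elision, not valid syntax.
EXPR = ('commit(targetCollection, update(targetCollection, '
        'search(sourceCollection, q="myquery", ...)))')
req = stream_request("http://localhost:8983/solr", "targetCollection", EXPR)
```

Sending it with urllib.request.urlopen(req) requires a running SolrCloud cluster.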
“fetch” Streaming Expression
The fetch streaming expression wraps another expression and fetches additional fields for its tuples in batches. It's very much like an inner join in which one side of the join is very small.
Example that queries books and adds publisher address to each result:
fetch(publisherCollection, search(booksCollection, q="*:*", fl="title,author,publisher", sort="publisher asc"), fl="publisherAddress", on="publisher=publisherId")
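The Python toy model below shows the batching idea. It reads tuples in fixed-size batches, collects the join keys named by on="left=right", runs one lookup per batch, and merges the fetched fields into each tuple. make_dict_lookup is a hypothetical in-memory stand-in for querying the other collection. batch_size is arbitrary here and is not Solr's default. In this sketch, unmatched tuples pass through unchanged; that is a simplification and makes no claim about Solr's behavior.

```python
from itertools import islice


def make_dict_lookup(rows):
    """Hypothetical stand-in for querying the right-hand collection."""
    calls = []

    def lookup(key_field, keys, fields):
        calls.append(sorted(keys))       # record each batch of keys requested
        return {r[key_field]: {f: r[f] for f in fields if f in r}
                for r in rows if r.get(key_field) in keys}

    lookup.calls = calls                 # expose the recorded batches
    return lookup


def fetch(tuples, lookup, fl, on, batch_size=50):
    """Toy fetch: decorate tuples with fields looked up one batch at a time."""
    left_key, right_key = on.split("=")
    fields = [f.strip() for f in fl.split(",")]
    it = iter(tuples)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        keys = {t[left_key] for t in batch if left_key in t}
        found = lookup(right_key, keys, fields)   # one lookup per batch
        for t in batch:
            # Unmatched tuples pass through unchanged in this sketch.
            yield {**t, **found.get(t.get(left_key), {})}
```

Because each batch needs only one lookup, three books from two publishers with batch_size=2 cost two lookups rather than three.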
Lots More…
See the release notes on the Solr wiki and the CHANGES file for an in-depth list of changes.