Here’s an overview of some of the new features in Solr 6.1.
Download Solr 6.1 to try these features out and give us feedback!
You can also check out upcoming features of the next Solr release.
shortestPath Streaming Expression
A shortestPath Streaming Expression was added that implements a distributed breadth-first graph traversal to find the shortest paths in a directed graph.
Example:
shortestPath(collection, from="john@company.com", to="jane@company.com", edge="from=to", threads="6", partitionSize="300", fq="limiting query", maxDepth="4")
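To make the traversal concrete, here is a minimal single-process sketch of a breadth-first shortest-path search over directed edges. It is not Solr's distributed implementation: the function name shortest_paths, the in-memory edge list, and the sample addresses other than john@ and jane@ are assumptions made for illustration. Only max_depth mirrors the maxDepth parameter shown above.

```python
def shortest_paths(edges, start, goal, max_depth=4):
    """Return every shortest path from start to goal using at most max_depth edges."""
    if start == goal:
        return [[start]]

    # Build a directed adjacency list: an edge only leads from src to dst.
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)

    frontier = [[start]]   # all paths that end at the current depth
    visited = {start}      # nodes already reached at a shallower depth

    for _ in range(max_depth):
        next_frontier = []
        found = []
        reached_this_level = set()
        for path in frontier:
            for node in graph.get(path[-1], []):
                if node in visited:
                    continue  # a shorter route to this node already exists
                new_path = path + [node]
                if node == goal:
                    found.append(new_path)
                else:
                    next_frontier.append(new_path)
                    reached_this_level.add(node)
        if found:
            return found  # breadth-first: the first hits are the shortest
        if not next_frontier:
            break
        visited |= reached_this_level
        frontier = next_frontier
    return []


# Hypothetical sample data: each pair is (from, to).
EDGES = [
    ("john@company.com", "amy@company.com"),
    ("john@company.com", "bob@company.com"),
    ("amy@company.com", "jane@company.com"),
    ("bob@company.com", "jane@company.com"),
    ("john@company.com", "carl@company.com"),
    ("carl@company.com", "dave@company.com"),
    ("dave@company.com", "jane@company.com"),
]

PATHS = shortest_paths(EDGES, "john@company.com", "jane@company.com")
```

With this sample data, PATHS holds the two two-hop routes (via amy@ and via bob@), and the longer route through carl@ and dave@ is never returned. Because the edges are directed, searching from jane@ back to john@ finds nothing.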
gatherNodes Streaming Expression (coming soon)
The gatherNodes expression is a more general form of graph traversal than shortestPath and covers a wider range of use cases.
Example:
gatherNodes(friends, gatherNodes(friends, search(articles, q="body:(queryA)", fl="author"), walk="author->user", gather="friend"), walk="friend->user", gather="friend", scatter="roots, branches, leaves")
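One way to read the nested example is as two successive walk/gather steps. The sketch below models a single step over in-memory lists of dicts. It is a conceptual illustration only: the helper gather_step, the sample documents, and the shape of the emitted tuples (a dict keyed by the gathered field name) are assumptions, not the behavior of the actual expression.

```python
def gather_step(incoming, collection, walk, gather):
    """Follow one hop: match incoming tuples against a collection and gather a field."""
    # walk="a->b" means: take field a from incoming tuples, match it on field b.
    src_field, dst_field = (part.strip() for part in walk.split("->"))
    roots = {t[src_field] for t in incoming if src_field in t}

    gathered = []
    seen = set()
    for doc in collection:
        if doc.get(dst_field) in roots:
            node = doc.get(gather)
            if node is not None and node not in seen:
                seen.add(node)                 # emit each gathered node once
                gathered.append({gather: node})
    return gathered


# Hypothetical data standing in for the "articles" search result and "friends" collection.
ARTICLE_HITS = [{"author": "ann"}]
FRIENDS = [
    {"user": "ann", "friend": "bob"},
    {"user": "ann", "friend": "cat"},
    {"user": "bob", "friend": "dan"},
    {"user": "zed", "friend": "eve"},
]

# Inner gatherNodes: friends of the matching authors.
LEVEL_1 = gather_step(ARTICLE_HITS, FRIENDS, walk="author->user", gather="friend")
# Outer gatherNodes: friends of those friends.
LEVEL_2 = gather_step(LEVEL_1, FRIENDS, walk="friend->user", gather="friend")
```

Here LEVEL_1 contains bob and cat, and LEVEL_2 contains dan. The scatter parameter, which controls which levels are returned, is not modeled.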
TolerantUpdateProcessorFactory
TolerantUpdateProcessorFactory will skip update commands that would otherwise cause subsequent updates in a batch to fail.
<updateRequestProcessorChain name="tolerant">
  <processor class="solr.TolerantUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
Passing update.chain=tolerant will use the processor chain defined above. One can also pass maxErrors=10 to limit the number of errors before aborting the complete update request. For example, if one is loading a large CSV file with millions of entries for the first time, it may be useful to abort early if every addition would fail due to a configuration error.
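The skip-and-count behavior described above can be modeled in a few lines. This is a conceptual sketch, not Solr code: tolerant_update, add_doc, and the sample documents are hypothetical, and treating the batch as aborted once the error count exceeds max_errors is an assumption about the exact threshold.

```python
def tolerant_update(docs, add_doc, max_errors=10):
    """Apply add_doc to each document, skipping failures until max_errors is exceeded."""
    added = 0
    errors = []
    for doc in docs:
        try:
            add_doc(doc)
            added += 1
        except ValueError as exc:
            # Record the failure and keep going, so later documents still load.
            errors.append((doc.get("id"), str(exc)))
            if len(errors) > max_errors:
                # Assumed threshold: abort the whole request past max_errors.
                raise RuntimeError(f"aborting after {len(errors)} errors") from exc
    return added, errors


def add_doc(doc):
    """Hypothetical indexer that rejects documents with a non-integer price."""
    if not isinstance(doc.get("price"), int):
        raise ValueError(f"bad price: {doc.get('price')!r}")


DOCS = [
    {"id": "1", "price": 10},
    {"id": "2", "price": "n/a"},   # fails, but is skipped
    {"id": "3", "price": 30},
]

ADDED, ERRORS = tolerant_update(DOCS, add_doc, max_errors=10)
```

In this run two documents are added and one error is recorded for id 2. If every document were malformed, as in the misconfigured CSV load described above, the loop would stop shortly after max_errors failures instead of working through millions of rows.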
Small filter optimization
Filter creation for small cardinality sets (those that match few documents) now produces much less garbage. Producing small sets can be up to 3x faster as a result of the reduced GC overhead.
HDFS optimizations
The HDFS block cache now skips caching “read once” scenarios such as index merges.