Solr 6.2 Features


Here’s an overview of some of the new features in Solr 6.2.

Download Solr 6.2 to try these features out and give us feedback!
You can also check out upcoming features of the next Solr release.

Boolean Comparison Function Queries

Solr 6.2 adds function queries for comparing numeric arguments:

  • gt – greater than
  • gte – greater than or equal
  • lt – less than
  • lte – less than or equal
  • eq – equal to

These functions return a boolean value for each document. They are normally used in conjunction with the if function, which returns its second argument if the first argument is true and its third argument otherwise.

Example: For a given document, if num_reviewers is less than 5, return the value of the editor_score field; otherwise, return the value of the user_score field.

 if( lt(num_reviewers,5), editor_score, user_score)
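To make the semantics concrete, here is a small Python emulation of the comparison functions and if, applied to the example above. It is a local illustration only, since Solr evaluates function queries server-side for each document; the helper names are hypothetical and the field values in the usage line are made up.

```python
# Hypothetical local emulation of Solr's comparison and if() function queries.
# Illustration only: in Solr these are evaluated server-side, per document.
from typing import Any, Mapping


def gt(a: float, b: float) -> bool:
    return a > b


def gte(a: float, b: float) -> bool:
    return a >= b


def lt(a: float, b: float) -> bool:
    return a < b


def lte(a: float, b: float) -> bool:
    return a <= b


def eq(a: float, b: float) -> bool:
    return a == b


def if_(condition: bool, when_true: Any, when_false: Any) -> Any:
    # Like if(): the second argument when the condition holds, else the third.
    return when_true if condition else when_false


def review_score(doc: Mapping[str, float]) -> float:
    # Mirrors: if( lt(num_reviewers,5), editor_score, user_score)
    return if_(lt(doc["num_reviewers"], 5), doc["editor_score"], doc["user_score"])
```

For example, review_score({"num_reviewers": 2, "editor_score": 4.5, "user_score": 3.0}) returns 4.5, because 2 is less than 5.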

An easy way to experiment with function queries and see the results is to use the pseudo-field functionality to return specific function values along with stored fields.

Example: http://localhost:8983/solr/techproducts/query?q=*:*&fl=id,popularity,gt(popularity,3)

[...]
      {
        "id":"SP2514N",
        "popularity":6,
        "gt(popularity,3)":true},
      {
        "id":"6H500F0",
        "popularity":6,
        "gt(popularity,3)":true},
      {
        "id":"F8V7067-APL-KIT",
        "popularity":1,
        "gt(popularity,3)":false},
[...]
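A minimal Python sketch of the same request: it builds the pseudo-field URL from the example and picks out documents whose gt(popularity,3) value is true. The helper names are hypothetical; the base URL and function come from the example above, and fetch_docs needs a running Solr with the techproducts collection.

```python
# Sketch: request a function value as a pseudo-field and read it back.
# fetch_docs() only works against a live Solr instance.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "http://localhost:8983/solr/techproducts/query"
FUNC = "gt(popularity,3)"


def build_url(base: str = BASE) -> str:
    # Listing the function in fl returns its value under a key equal to
    # the function text itself, next to the stored fields.
    return base + "?" + urlencode({"q": "*:*", "fl": "id,popularity," + FUNC})


def popular_ids(response: dict) -> list:
    # Keep the ids of documents whose pseudo-field evaluated to true.
    return [doc["id"] for doc in response["response"]["docs"] if doc.get(FUNC)]


def fetch_docs(url: str) -> dict:
    with urlopen(url) as resp:
        return json.load(resp)
```

Against a live server, popular_ids(fetch_docs(build_url())) would include SP2514N and 6H500F0 from the response above but not F8V7067-APL-KIT. Note that urlencode percent-encodes characters such as *, :, and parentheses, which Solr decodes as usual.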

Feature Selection and Logistic Regression

The features and train streaming expressions are machine learning operations that together implement a text classifier.
The goal is to build a classifier (or model) from a training set of documents that can then be used to tell whether a new document should be considered part of the positive class of that set.

  • features – streaming expression that selects important features (important terms) from a set of documents called the training set (specified by a query).
  • train – trains a logistic regression model on a text field using the selected features and the training set.
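The idea behind features can be illustrated with information gain, a common way to rank terms by how well their presence separates positive from negative documents. This Python sketch is a conceptual model under that assumption, not Solr's code; label 1 plays the role of positiveLabel, and the tiny training set is hypothetical.

```python
# Conceptual sketch of "features": rank terms by information gain, assumed
# here as the selection criterion. Not Solr's code; label 1 = positive class.
import math
from typing import List, Sequence, Set, Tuple

Doc = Tuple[Set[str], int]  # (terms in the document, outcome label)

# Hypothetical training set.
SAMPLE_DOCS: List[Doc] = [
    ({"solr", "search", "index"}, 1),
    ({"solr", "lucene"}, 1),
    ({"cooking", "recipe"}, 0),
    ({"recipe", "search"}, 0),
]


def entropy(labels: Sequence[int]) -> float:
    # Binary entropy (in bits) of a list of 0/1 labels; 0.0 for an empty list.
    h = 0.0
    for count in (sum(labels), len(labels) - sum(labels)):
        if count:
            p = count / len(labels)
            h -= p * math.log2(p)
    return h


def info_gain(term: str, docs: Sequence[Doc]) -> float:
    # Reduction in label entropy from splitting docs on presence of `term`.
    with_t = [label for terms, label in docs if term in terms]
    without_t = [label for terms, label in docs if term not in terms]
    n = len(docs)
    conditional = (len(with_t) / n) * entropy(with_t) + (len(without_t) / n) * entropy(without_t)
    return entropy([label for _, label in docs]) - conditional


def select_features(docs: Sequence[Doc], num_terms: int) -> List[str]:
    # Highest information gain first; ties broken alphabetically.
    vocab = sorted(set().union(*(terms for terms, _ in docs)))
    return sorted(vocab, key=lambda t: (-info_gain(t, docs), t))[:num_terms]
```

Here select_features(SAMPLE_DOCS, 2) returns ["recipe", "solr"]: each of those terms splits the sample perfectly, while "search" appears in one positive and one negative document and gains nothing.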

Here’s an example of the streaming expression syntax taken from the JIRA:

train(collection1, q="*:*",
      features(collection1, 
               q="*:*",  
               field="body", 
               outcome="out_i", 
               positiveLabel=1, 
               numTerms=100),
      field="body",
      outcome="out_i",
      maxIterations=100)
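Conceptually, train fits a logistic regression over the selected terms. The Python sketch below uses plain batch gradient descent on binary term-presence features for a fixed number of iterations, mirroring maxIterations. The learning rate, the feature encoding, and the sample data are illustrative assumptions; Solr's own optimizer may differ.

```python
# Conceptual sketch of "train": logistic regression on binary term-presence
# features, fit by batch gradient descent for a fixed number of iterations.
# Learning rate, encoding and sample data are assumptions, not Solr's code.
import math
from typing import List, Sequence, Set, Tuple

Doc = Tuple[Set[str], int]  # (terms in the document, outcome label)

# Hypothetical training set and previously selected features.
SAMPLE_DOCS: List[Doc] = [
    ({"solr", "search", "index"}, 1),
    ({"solr", "lucene"}, 1),
    ({"cooking", "recipe"}, 0),
    ({"recipe", "search"}, 0),
]
FEATURES = ["recipe", "solr"]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def vectorize(terms: Set[str], features: Sequence[str]) -> List[float]:
    # Leading 1.0 is the bias (intercept) input; then 1.0 per feature present.
    return [1.0] + [1.0 if f in terms else 0.0 for f in features]


def train(docs: Sequence[Doc], features: Sequence[str],
          max_iterations: int = 100, learning_rate: float = 0.5) -> List[float]:
    weights = [0.0] * (len(features) + 1)
    rows = [(vectorize(terms, features), label) for terms, label in docs]
    for _ in range(max_iterations):
        grad = [0.0] * len(weights)
        for x, y in rows:
            # Gradient of the log loss for one document: (prediction - label) * x.
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            for i, xi in enumerate(x):
                grad[i] += error * xi
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def predict(weights: Sequence[float], terms: Set[str], features: Sequence[str]) -> float:
    # Estimated probability that a document belongs to the positive class.
    z = sum(w * xi for w, xi in zip(weights, vectorize(terms, features)))
    return sigmoid(z)
```

After w = train(SAMPLE_DOCS, FEATURES), a new document containing "solr" scores above 0.5 and one containing "recipe" scores below 0.5, because the model learns a positive weight for "solr" and a negative one for "recipe".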

See the features streaming expression and the train streaming expression in the reference guide.

topic streaming expression (Beta)

The topic streaming expression implements publish/subscribe messaging: clients can subscribe to queries and receive deliveries of new documents that match those topic queries. The first call specifies an id that uniquely identifies the topic. It also specifies a checkpointCollection, where topic progress is stored to keep track of which documents have already been returned.

Example syntax:

topic(checkpointCollection,
      collection1,
      id="yonik_topic1", 
      q="title:(lucene solr)",
      fl="id, title, abstract, author") 

A client can keep calling the same topic to receive new documents, or it can be wrapped in a daemon streaming expression to provide push functionality.
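The checkpoint behaviour can be modelled in a few lines of Python. This is an in-memory sketch, not Solr's implementation: a dict stands in for the checkpointCollection, and a single increasing counter (named after Solr's _version_ field) stands in for however Solr actually records progress. The class and method names are hypothetical.

```python
# In-memory model of topic checkpointing (assumption-laden, not Solr's code):
# a dict stands in for the checkpointCollection and one increasing counter,
# named after Solr's _version_ field, stands in for real document versions.
from typing import Callable, Dict, List


class TopicModel:
    def __init__(self) -> None:
        self.docs: List[dict] = []             # documents in indexing order
        self.checkpoints: Dict[str, int] = {}  # topic id -> last version covered
        self._version = 0

    def index(self, doc: dict) -> None:
        # Every new document gets a higher version than all earlier ones.
        self._version += 1
        self.docs.append({**doc, "_version_": self._version})

    def topic(self, topic_id: str, matches: Callable[[dict], bool]) -> List[dict]:
        # Return matching documents newer than this topic's checkpoint,
        # then move the checkpoint to the newest version indexed so far.
        last_seen = self.checkpoints.get(topic_id, 0)
        new_docs = [d for d in self.docs if d["_version_"] > last_seen and matches(d)]
        self.checkpoints[topic_id] = self._version
        return new_docs
```

Calling topic("yonik_topic1", ...) twice in a row returns the matching documents once and then an empty list, until a new matching document is indexed.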

scoreNodes graph streaming expression

The scoreNodes streaming expression adds a relevance score to the nodes returned by a graph expression, similar to tf-idf scoring for text.
The tf factor is how many times a node appears in the graph traversal, and biases scores in favor of nodes that appear often.
The idf factor biases scores in favor of nodes that are more rare in the index.
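A small Python sketch of tf-idf node scoring as described above. The tf counts and document frequencies are inputs here; the idf formula, 1 + ln((numDocs + 1) / (docFreq + 1)), follows Lucene's classic similarity and is an assumption, since scoreNodes may compute it differently. The function name and the product names used below are hypothetical.

```python
# Sketch of tf-idf node scoring. Assumption: classic Lucene-style idf;
# scoreNodes' exact formula may differ.
import math
from typing import Dict, List, Tuple


def score_nodes(counts: Dict[str, int], doc_freq: Dict[str, int],
                num_docs: int) -> List[Tuple[str, float]]:
    """counts: how often each node appeared in the traversal (tf).
    doc_freq: how many documents in the index contain the node."""
    scored = []
    for node, tf in counts.items():
        # Rarer nodes (low docFreq) get a larger idf boost.
        idf = 1.0 + math.log((num_docs + 1) / (doc_freq[node] + 1))
        scored.append((node, tf * idf))
    # Highest score first.
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With counts {"milk": 3, "caviar": 3}, doc_freq {"milk": 900, "caviar": 9} and num_docs 1000, caviar ranks first: both appear equally often in the traversal, but caviar is much rarer in the index.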

The ref guide has a good example, Calculating Market Basket Co-occurrence, which uses scoreNodes to recommend products based on what is in your current shopping basket and what other shopping baskets contain.

collections API: REPLACENODE and DELETENODE commands

New actions have been added to the collections API:

  • REPLACENODE – moves all replicas from one node to another.
  • DELETENODE – deletes all replicas currently on a given node.

Examples:
/admin/collections?action=REPLACENODE&source=MY_OLD_NODE&target=MY_NEW_NODE
/admin/collections?action=DELETENODE&node=MY_NODE_NAME
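The two requests can be built from Python with the standard library. A minimal sketch, assuming Solr on localhost:8983; the helper names are hypothetical, the node names are the placeholders from the examples (real SolrCloud node names usually look like host:8983_solr), and send needs a live cluster.

```python
# Sketch: build the REPLACENODE / DELETENODE requests shown above.
# send() only works against a running SolrCloud cluster.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ADMIN = "http://localhost:8983/solr/admin/collections"


def replace_node_url(source: str, target: str) -> str:
    # Moves every replica on `source` to `target`.
    return ADMIN + "?" + urlencode({"action": "REPLACENODE", "source": source, "target": target})


def delete_node_url(node: str) -> str:
    # Deletes every replica currently hosted on `node`.
    return ADMIN + "?" + urlencode({"action": "DELETENODE", "node": node})


def send(url: str) -> dict:
    with urlopen(url + "&wt=json") as resp:
        return json.load(resp)
```

Building the URL separately from sending it makes it easy to log or review a destructive command such as DELETENODE before running it.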

See the relevant sections in the Collections API for more details.

Kerberos security enhancements

Support for Kerberos delegation tokens was added to the authentication filter. This is especially useful when distributed clients, such as MapReduce, do not have access to the user’s credentials.

See Kerberos Authentication Plugin in the ref guide.

Whole index replication for CDCR (Cross Data Center Replication)

CDCR can now fall back to whole-index replication to bring a replica up to date when transaction logs have not retained enough updates to do so.

“role” tag in replica placement rules

SolrCloud nodes in the cluster have tags that may be used when performing rule-based replica placement. A new “role” tag has been added; it can currently be used to prevent placement of new replicas on the overseer node by adding rule=role:!overseer during collection creation.
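A sketch of the corresponding CREATE request in Python, assuming Solr on localhost:8983; the helper name, collection name, and shard/replica counts are hypothetical. Note that urlencode percent-encodes the rule value, so role:!overseer is sent as role%3A%21overseer, which Solr decodes back.

```python
# Sketch: a Collections API CREATE request that keeps new replicas off the
# overseer node via the "role" tag.
from urllib.parse import urlencode

ADMIN = "http://localhost:8983/solr/admin/collections"


def create_collection_url(name: str, num_shards: int, replication_factor: int) -> str:
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # "!" negates the tag: do not place replicas on the overseer.
        "rule": "role:!overseer",
    }
    return ADMIN + "?" + urlencode(params)
```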

See Rule Based Replica Placement in the ref guide for more info.