Here’s an overview of some of the new features in Solr 5.5.
Also see Solr Download Links and upcoming features of the next Solr release.
Return docValues fields like stored fields
In previous Solr versions, returning the top N documents retrieved field values only from the row store (i.e., fields where “stored” is true). Values are now also retrieved from docValues (which are essentially a column store) for fields where “stored” is false. This avoids having to duplicate a value in the row store when it is already column-stored.
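A minimal sketch of how such a column-store-only field might be set up and queried, assuming a collection whose schema is managed and mutable through the Schema API. The collection name `mycollection`, the field name `popularity_dv`, and the use of an `int` field type are hypothetical; the helpers only build the requests, and nothing is sent unless you pass them to `urlopen`.

```python
import json
import urllib.parse
import urllib.request

SOLR = "http://localhost:8983/solr"  # assumption: local Solr, as in the other examples
COLLECTION = "mycollection"           # hypothetical collection with a mutable managed schema


def add_docvalues_only_field(name, field_type):
    """Build a Schema API POST that adds a field kept only in the column store."""
    payload = {
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": False,    # no copy in the row store
            "docValues": True,  # values live in docValues (the column store)
        }
    }
    return urllib.request.Request(
        f"{SOLR}/{COLLECTION}/schema",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_url(fields):
    """Build a query URL whose fl lists the fields to return by name."""
    params = {"q": "*:*", "fl": ",".join(fields)}
    return f"{SOLR}/{COLLECTION}/query?" + urllib.parse.urlencode(params)


req = add_docvalues_only_field("popularity_dv", "int")
url = query_url(["id", "popularity_dv"])
# urllib.request.urlopen(req) would apply the schema change; it is not sent here.
```

With the field in place, the query asks for `popularity_dv` in `fl` exactly as it would for a stored field.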
Facet debug info
A "facet-trace" section has been added to the debug info for JSON Facets.
The exact format is subject to change, but it currently includes information about each facet command, including the processor used to execute the facet and the domain size. For facets with sub-facets, this information is included recursively for each facet bucket.
Example request:
$ curl http://localhost:8983/solr/techproducts/query -d 'q=*:*&rows=0&debug=true&
json.facet={
  categories:{
    type : terms,
    field : cat
  }
}'
Example response debug info:
[...]
"facet-trace":{
  "processor":"FacetQueryProcessor",
  "elapse":0,
  "query":null,
  "domainSize":32,
  "sub-facet":[{
      "processor":"FacetFieldProcessorUIF",
      "elapse":0,
      "field":"cat",
      "limit":10,
      "numBuckets":16,
      "domainSize":32}]},
[...]
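Because the trace nests sub-facets, a short recursive walk is enough to flatten it into one line per facet command. This sketch assumes the trace sits under the `facet-trace` key of the response's `debug` section and reads only fields visible in the sample above; since the format may change, it uses `.get` rather than strict indexing.

```python
import json


def walk_facet_trace(node, depth=0):
    """Yield (depth, processor, domainSize) for a trace node and all of its sub-facets."""
    yield depth, node.get("processor"), node.get("domainSize")
    for child in node.get("sub-facet", []):
        yield from walk_facet_trace(child, depth + 1)


# The debug section from the example response above (format subject to change).
debug = json.loads("""
{"facet-trace": {"processor": "FacetQueryProcessor", "elapse": 0, "query": null,
  "domainSize": 32,
  "sub-facet": [{"processor": "FacetFieldProcessorUIF", "elapse": 0, "field": "cat",
    "limit": 10, "numBuckets": 16, "domainSize": 32}]}}
""")

for depth, processor, size in walk_facet_trace(debug["facet-trace"]):
    print("  " * depth + f"{processor} domainSize={size}")
# FacetQueryProcessor domainSize=32
#   FacetFieldProcessorUIF domainSize=32
```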
UnInvertedField faceting method restored
The facet.method=uif parameter causes traditional field faceting to delegate to the JSON Facet API with method=uif.
This is roughly equivalent to what the Solr 4 default faceting method was for multi-valued fields. It is optimized for performance on static indexes rather than NRT (near real-time, quickly changing) indexes.
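To compare the restored method with the default on your own index, the same classic field facet can be sent once with facet.method=uif and once without it. A sketch that only builds the two form bodies; the field `cat` comes from the techproducts example, and the bodies are meant to be posted to the same /query endpoint used in the curl examples here.

```python
from urllib.parse import urlencode


def field_facet_query(field, method=None):
    """Build a form body for classic field faceting, optionally pinning facet.method."""
    params = {"q": "*:*", "rows": 0, "facet": "true", "facet.field": field}
    if method is not None:
        params["facet.method"] = method  # e.g. "uif" to delegate to the JSON Facet API
    return urlencode(params)


uif_body = field_facet_query("cat", method="uif")
default_body = field_facet_query("cat")  # lets Solr pick its default method
```

Comparing QTime for the two requests on a static index is one way to see whether uif helps for a given multi-valued field.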
Stored field compression mode
The compression mode for stored fields can now be specified via the codecFactory in solrconfig.xml.
See the Codec Factory section in the Solr reference guide for more details.
Async Collection APIs
Generic support was added for making collection APIs async. Async support was added for the following commands: delete/reload collection, create/delete alias, create/delete shard, delete replica, add/delete replica property, add/remove role, overseer status, balance shard unique, rebalance leaders, modify collection, migrate state format.
See Asynchronous Calls in the Solr ref guide.
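A minimal polling sketch built on the async parameter and the REQUESTSTATUS action. It assumes REQUESTSTATUS replies with a `status.state` value in which `completed`, `failed` and `notfound` are terminal; the `fetch` hook is a hypothetical seam (handy for testing) that by default performs a plain HTTP GET.

```python
import json
import time
import urllib.parse
import urllib.request

SOLR = "http://localhost:8983/solr"  # assumption: local SolrCloud node
TERMINAL_STATES = {"completed", "failed", "notfound"}  # assumed terminal values


def http_get_json(url):
    """Default fetch: plain HTTP GET, response body parsed as JSON."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def collections_api(params, fetch=http_get_json):
    """Call /admin/collections with the given params, asking for a JSON response."""
    query = urllib.parse.urlencode({**params, "wt": "json"})
    return fetch(f"{SOLR}/admin/collections?{query}")


def run_async(params, request_id, fetch=http_get_json, poll_seconds=1.0, timeout=60.0):
    """Submit a Collections API call with async=request_id, then poll REQUESTSTATUS."""
    collections_api({**params, "async": request_id}, fetch)
    deadline = time.monotonic() + timeout
    while True:
        reply = collections_api({"action": "REQUESTSTATUS", "requestid": request_id}, fetch)
        state = reply["status"]["state"]
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"async request {request_id} still {state}")
        time.sleep(poll_seconds)


# Usage (needs a running SolrCloud cluster):
# run_async({"action": "RELOAD", "name": "techproducts"}, "1000")
```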
BlockJoinFacetComponent
There is a new experimental BlockJoinFacetComponent for calculating facets on child documents, using the child.facet.field parameter together with a {!parent} query. The component is not enabled by default.
Note that this component is unrelated to the block join faceting support in the JSON Facet API.
XML Query Syntax
The XML query parser, registered as “xmlparser”, is a direct interface to Lucene’s XMLQueryParser (CoreParser).
Personal recommendation: avoid the use of this query parser unless you have very unusual/specific needs.
Example:
curl http://localhost:8983/solr/techproducts/query -d 'debugQuery=true&
q={!xmlparser}
<BooleanQuery>
  <Clause occurs="must">
    <TermQuery fieldName="name">ipod</TermQuery>
  </Clause>
  <Clause occurs="must">
    <TermQuery fieldName="manu">apple</TermQuery>
  </Clause>
</BooleanQuery>
'
This parser does not do any text analysis on terms, so the provided terms must exactly match what is in the index (i.e., you will need to do things like lowercasing and stemming yourself). Good backward compatibility is unlikely with this parser, as it exposes more internal implementation details. Term queries on numeric, enum, and boolean fields will only work if you know the internal term representation in the index.
See XmlQParser in the Solr ref guide for more info.
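Since the parser does no analysis, one option is to generate the XML in code and normalize terms there. This sketch assumes lowercasing is all the target fields' analyzers do to these terms (sufficient for single-word terms like ipod and apple in the example above; stemming or other filters would need to be reproduced too). `must_terms_query` is a hypothetical helper; building the XML with ElementTree also escapes characters such as & and < that appear in terms.

```python
import urllib.parse
from xml.etree import ElementTree as ET


def must_terms_query(terms):
    """Build a <BooleanQuery> whose clauses are all required <TermQuery> elements.

    Assumption: lowercasing is the only analysis the target fields apply.
    """
    bq = ET.Element("BooleanQuery")
    for field, text in terms:
        clause = ET.SubElement(bq, "Clause", occurs="must")
        tq = ET.SubElement(clause, "TermQuery", fieldName=field)
        tq.text = text.lower()  # ElementTree escapes &, < and > when serializing
    return ET.tostring(bq, encoding="unicode")


xml = must_terms_query([("name", "iPod"), ("manu", "Apple")])
body = urllib.parse.urlencode({"debugQuery": "true", "q": "{!xmlparser}" + xml})
# POST body to http://localhost:8983/solr/techproducts/query, as in the curl example.
```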
upconfig and downconfig
A configset, or configuration set, is a set of config files for a Solr collection.
For SolrCloud mode, an upconfig option has been added to the bin/solr script to upload a configset to ZooKeeper.
A matching downconfig option has been added to download a configset from ZooKeeper.
For examples and documentation, see bin/solr Zookeeper Operations in the Solr reference guide.
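A common workflow is a round trip: download a configset, edit it locally, and upload it again. The sketch below only assembles the argument lists for the script. The flag spellings (`-n` configset name, `-d` local directory, `-z` ZooKeeper host) are an assumption to confirm against your bin/solr usage output, and the configset name, directory and ZooKeeper address (the embedded ZooKeeper of a local SolrCloud example, assumed on port 9983) are hypothetical.

```python
import subprocess


def zk_configset_cmd(action, name, directory, zk_host, script="bin/solr"):
    """Build argv for bin/solr zk -upconfig / -downconfig (flag names assumed)."""
    if action not in ("-upconfig", "-downconfig"):
        raise ValueError(f"unsupported action: {action}")
    return [script, "zk", action, "-n", name, "-d", directory, "-z", zk_host]


def run(cmd):
    """Run the command (expects the Solr install dir as cwd); raise on non-zero exit."""
    subprocess.run(cmd, check=True)


# Hypothetical round trip: fetch the configset, edit files under /tmp/myconfig, push it back.
down = zk_configset_cmd("-downconfig", "myconfig", "/tmp/myconfig", "localhost:9983")
up = zk_configset_cmd("-upconfig", "myconfig", "/tmp/myconfig", "localhost:9983")
# run(down); ...edit...; run(up)
```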
CheckIndex for HDFS
There is an internal CheckHdfsIndex class that can be run from the command line against HDFS indexes, just as CheckIndex can be run against normal indexes.
Example:
java -cp "./server/solr-webapp/webapp/WEB-INF/lib/*:./server/lib/ext/*" -ea:org.apache.lucene... org.apache.solr.index.hdfs.CheckHdfsIndex /path/to/my/index/
For reference, here is also the command to run CheckIndex on a local (non-HDFS) Lucene index:
java -cp "./server/solr-webapp/webapp/WEB-INF/lib/*:./server/lib/ext/*" -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex ./example/techproducts/solr/techproducts/data/index
This results in the following output:
Opening index @ ./example/techproducts/solr/techproducts/data/index

Segments file=segments_2 numSegments=1 version=6.0.0 id=bv1fdquc5dh3nvcf4jxiwfow4 format= userData={commitTimeMSec=1456073865693}
  1 of 1: name=_0 maxDoc=32
    version=6.0.0
    id=bv1fdquc5dh3nvcf4jxiwfow3
    codec=Lucene60
    compound=false
    numFiles=13
    size (MB)=0.026
    diagnostics = {java.runtime.version=1.8.0_40-b25, java.vendor=Oracle Corporation, java.version=1.8.0_40, java.vm.version=25.40-b25, lucene.version=6.0.0, os=Mac OS X, os.arch=x86_64, os.version=10.11.2, source=flush, timestamp=1456073865745}
    no deletions
    test: open reader.........OK [took 0.059 sec]
    test: check integrity.....OK [took 0.000 sec]
    test: check live docs.....OK [took 0.000 sec]
    test: field infos.........OK [25 fields] [took 0.000 sec]
    test: field norms.........OK [5 fields] [took 0.001 sec]
    test: terms, freq, prox...OK [1187 terms; 1813 terms/docs pairs; 1496 tokens] [took 0.025 sec]
    test: stored fields.......OK [356 total field count; avg 11.1 fields per doc] [took 0.013 sec]
    test: term vectors........OK [3 total term vector count; avg 1.0 term/freq vector fields per doc] [took 0.006 sec]
    test: docvalues...........OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET] [took 0.000 sec]

No problems were detected with this index.

Took 0.246 sec total.
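For scripted checks, the report can be reduced to a pass/fail flag plus per-test statuses. This sketch assumes CheckIndex (or CheckHdfsIndex) writes its report to standard output and relies only on the line shapes visible in the output above; `summarize_checkindex` and `run_checkindex` are hypothetical helpers, and the classpath is the one from the commands above, so they must run from the Solr install directory.

```python
import re
import subprocess

# Matches lines such as "test: open reader.........OK [took 0.059 sec]".
TEST_LINE = re.compile(r"test: (.+?)\.{2,}(\w+)")
CLEAN_MARKER = "No problems were detected with this index."


def summarize_checkindex(output):
    """Return (clean, [(test name, status), ...]) parsed from CheckIndex output."""
    tests = [(name.strip(), status) for name, status in TEST_LINE.findall(output)]
    return CLEAN_MARKER in output, tests


def run_checkindex(index_dir, main_class="org.apache.lucene.index.CheckIndex"):
    """Run CheckIndex (or pass org.apache.solr.index.hdfs.CheckHdfsIndex) and summarize stdout."""
    cmd = [
        "java",
        "-cp", "./server/solr-webapp/webapp/WEB-INF/lib/*:./server/lib/ext/*",
        "-ea:org.apache.lucene...",
        main_class,
        index_dir,
    ]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return summarize_checkindex(proc.stdout)


SAMPLE = """\
    test: open reader.........OK [took 0.059 sec]
    test: terms, freq, prox...OK [1187 terms; 1813 terms/docs pairs; 1496 tokens] [took 0.025 sec]
No problems were detected with this index.
"""
clean, tests = summarize_checkindex(SAMPLE)
```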