Heliosearch’s off-heap FieldCache was previously introduced and benchmarked for integer fields.
Support for all numeric field types as well as string fields has now been completed, and this post will focus on the performance of string fields.
A review of nCache (n is for “native”) features and goals:
- nCache uses off-heap data structures, like the off-heap filters, to reduce garbage collection pauses and GC overhead.
- nCache is a managed cache, meaning you can do anything with it that you can do with other Solr caches, including configuring size and warming policies, and viewing cache statistics through the admin page.
- nCache is NRT friendly. Field values are cached on a per-segment basis, enabling rapid turnaround time for new index snapshots.
- nCache is designed for maximum performance, even when the system is not experiencing garbage collection issues.
- Unlike the Lucene FieldCache, nCache uses no weak references.
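To make the off-heap idea concrete, here is a minimal sketch of a per-document int array whose contents live outside the Java heap. It is not Heliosearch's actual implementation; the class name OffHeapIntArray is hypothetical, and it uses a standard direct ByteBuffer purely for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only, not the nCache implementation.
final class OffHeapIntArray {
    private static final int BYTES_PER_INT = 4;

    private final ByteBuffer buf;
    private final int length;

    OffHeapIntArray(int length) {
        this.length = length;
        // allocateDirect reserves zero-filled native memory outside the Java heap,
        // so the collector never has to scan or copy the values themselves.
        this.buf = ByteBuffer.allocateDirect(length * BYTES_PER_INT)
                             .order(ByteOrder.nativeOrder());
    }

    // Absolute reads and writes; the buffer bounds-checks the byte offset.
    int get(int index) {
        return buf.getInt(index * BYTES_PER_INT);
    }

    void set(int index, int value) {
        buf.putInt(index * BYTES_PER_INT, value);
    }

    int length() {
        return length;
    }
}
```

A cache entry built this way costs the garbage collector one small wrapper object per array, regardless of how many documents the array covers.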
String Performance Results
The first benchmark involved sorting by string fields with different numbers of unique values. Queries were of the following form:
q={!cache=false}*:* &sort=my_str_field1 desc
The test index consisted of 10M documents. 80% of the documents had a value for any given field being sorted on. The query was executed 50 times per field, and the median latency for each field was recorded.
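The measurement described above can be sketched roughly as follows. This is not the harness used for the benchmark; the class MedianLatency and its methods are hypothetical, and the task passed in stands for whatever code issues one sort query.

```java
import java.util.Arrays;

// Hypothetical timing helper, assuming each run is one blocking query.
final class MedianLatency {

    /** Median of the samples; for an even count, the mean of the two middle values. */
    static double median(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samples.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        if (sorted.length % 2 == 1) {
            return sorted[mid];
        }
        return (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Runs the task the given number of times and returns the median elapsed nanoseconds. */
    static double medianNanos(Runnable task, int runs) {
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        return median(samples);
    }
}
```

Using the median rather than the mean keeps a single slow run (a GC pause, for example) from dominating the reported latency.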
Next we tested the concurrent query throughput on the same 10M document index. Each query would sort on a random string field with a random sort order (asc or desc). 1000 queries were run for each throughput test, and each test was repeated 5 times (restarting the JVM before each) to generate an average throughput.
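As a rough sketch of how such a test could pick its queries and summarize results, consider the helper below. It is an assumption-laden illustration, not the benchmark code: the class RandomSortQuery is hypothetical, the field names passed in are placeholders, and the query string is left un-URL-encoded for readability.

```java
import java.util.Random;

// Hypothetical helper for a throughput test; not the original benchmark code.
final class RandomSortQuery {

    /** Builds the parameters for one request: a random field and a random sort order. */
    static String next(Random rnd, String[] fields) {
        String field = fields[rnd.nextInt(fields.length)];
        String order = rnd.nextBoolean() ? "asc" : "desc";
        // Not URL-encoded; a real client would encode this before sending it.
        return "q={!cache=false}*:*&sort=" + field + " " + order;
    }

    /** Queries per second for a batch that took elapsedNanos to complete. */
    static double throughput(int queries, long elapsedNanos) {
        return queries / (elapsedNanos / 1e9);
    }

    /** Arithmetic mean, e.g. across the repeated runs of one test. */
    static double mean(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }
}
```

For example, a run of 1000 queries that completes in 2 seconds gives throughput(1000, 2_000_000_000L), which is 500 queries per second; the per-run figures would then be passed to mean.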
The hardware consisted of a 3GHz quad-core AMD processor running Ubuntu Linux 13.10. The latest 64-bit Oracle JVMs for Java7 and Java8 were used.
The previous query sorting performance test was re-run on a 3.4GHz quad-core Intel processor running Windows 8.
nCache shows an even greater performance advantage here (68% throughput increase using Java8). This probably had more to do with the different processor architecture (Intel vs AMD) than the different operating systems.
We also compared the process sizes via “top” and tracked the maximum size during tests (averaging across different test runs).
Try it out!
This new functionality is included in the latest Heliosearch release.
Heliosearch is currently API compatible with Solr at the HTTP level, so it should be easy to try it out and see what types of performance increases you get. Let us know how it goes in the Heliosearch user forum, and join the Heliosearch dev forum if you want to contribute!