Native code faceting for Solr has just been added to Heliosearch, and benchmarks show a performance increase of more than 2x! The faceting code is written in C++, statically compiled for maximum performance, and loaded into the JVM via JNI (Java Native Interface).
nCache, Heliosearch’s off-heap version of the Lucene/Solr FieldCache, was instrumental in allowing this level of optimization. Java arrays (and other on-heap memory) cannot be efficiently accessed from native code. Moving the data structures off-heap not only greatly reduced garbage collection overhead, but also made native code optimizations practical. Top-level nCache string support was recently added, paving the way for native code faceting on single-valued string fields.
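To make the idea concrete, here is a minimal sketch of what counting facets over a single-valued string field can look like in C++. It is not Heliosearch's actual code: the function name `count_facets`, its parameters, and the convention that `-1` marks a document without a value are all assumptions for illustration. The `ords` pointer stands in for an off-heap array of per-document term ordinals, which native code can read directly without copying.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not Heliosearch's actual implementation.
// ords:     term ordinal for each document, indexed by doc id
//           (-1 is assumed to mean "no value"); stands in for an
//           off-heap array that native code can read directly.
// docs:     ids of the documents matching the query.
// numDocs:  number of entries in docs.
// numTerms: number of unique values in the field.
std::vector<int32_t> count_facets(const int32_t* ords,
                                  const int32_t* docs,
                                  std::size_t numDocs,
                                  int32_t numTerms) {
    std::vector<int32_t> counts(static_cast<std::size_t>(numTerms), 0);
    for (std::size_t i = 0; i < numDocs; ++i) {
        int32_t ord = ords[docs[i]];  // look up the doc's term ordinal
        if (ord >= 0) {
            ++counts[static_cast<std::size_t>(ord)];  // one more hit for that term
        }
    }
    return counts;
}
```

The inner loop is just an array lookup and an increment, which is the kind of tight loop a static C++ compiler tends to handle well.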
- 10M document index
- Documents consist of an ID field and 6 different single-valued string fields, with the number of unique values per field ranging from 10 to 1 million
- Faceting request throughput was measured over 1000 requests after a 50-request warmup
- Client had 4 request threads
- Each individual client request uses a random field to make the test realistic and to keep HotSpot from overspecializing the code for a specific field
- Solr versions: Apache Solr 4.8.1, Heliosearch/Solr snapshot (based on Solr 4.9)
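The measurement procedure described above can be sketched as a small harness. This is only an illustration of the methodology, not the benchmark client that produced the numbers below: `measure_throughput` and its parameters are hypothetical, and `request` is a placeholder for sending one faceting request against the field with the given index.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <functional>
#include <random>
#include <thread>
#include <vector>

// Hypothetical harness; `request` stands in for one faceting request
// against the field with the given index (numFields must be >= 1).
// Returns measured requests per second.
double measure_throughput(const std::function<void(std::size_t)>& request,
                          std::size_t numFields,
                          int warmupRequests,
                          int measuredRequests,
                          int numThreads) {
    // Runs `total` requests spread across numThreads worker threads.
    auto run = [&](int total) {
        std::atomic<int> remaining(total);
        std::vector<std::thread> workers;
        for (int t = 0; t < numThreads; ++t) {
            workers.emplace_back([&, t] {
                std::mt19937 rng(static_cast<unsigned>(t) + 1u);
                std::uniform_int_distribution<std::size_t> pick(0, numFields - 1);
                // Each worker claims one request at a time until none remain.
                while (remaining.fetch_sub(1) > 0) {
                    request(pick(rng));  // a random field for every request
                }
            });
        }
        for (auto& w : workers) w.join();
    };

    run(warmupRequests);  // untimed warmup phase

    auto start = std::chrono::steady_clock::now();
    run(measuredRequests);  // timed phase
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return measuredRequests / elapsed.count();
}
```

With the setup listed above, a call would look like `measure_throughput(sendFacetRequest, 6, 50, 1000, 4)`, where `sendFacetRequest` is whatever function issues the request to the server.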
The different operating systems were run on different hardware (hence the large performance differences of the same code across the different platforms).
| OS | CPU | Native code compiler | Performance vs Solr |
|---|---|---|---|
| Ubuntu Linux 13.10 | quad core AMD Phenom II | gcc 4.7.3 | 227% |
| Windows 8.1 | quad core Intel i5 | gcc 4.8.2 | 202% |
| OS X Mavericks 10.9.3 | dual core Intel i5 | gcc 4.8.2 | 246% |
| OS X Mavericks 10.9.3 | dual core Intel i5 | LLVM 5.1 | 235% |
clang/LLVM vs gcc
The gcc/g++ included with OS X is actually clang/LLVM: clang is the C language front end and LLVM is the back end that produces executable code.
At least for this initial native code, g++ 4.8.2 was about 5% faster than clang/LLVM 5.1, hence we’ll most likely use gcc/g++ by default.
The easiest way to get gcc/g++ on your Mac is
$ brew install gcc48
After installation, gcc/g++ will continue pointing to the clang/LLVM versions, but there will be a g++-4.8 you can use instead.
Besides the more than 2x improvement in faceting performance, native code has other advantages as well:
- Avoidance of Java HotSpot bugs in compiling code. Compiling the code just once statically means it’s the same for every run, for everyone.
- No variations from run to run due to how HotSpot compiles the code (unexplained slowdowns).
- No HotSpot warm-up period, and no time spent optimizing or de-optimizing code.
It’s easy to take advantage of these performance improvements and new features since Heliosearch/Solr is currently a drop-in replacement (at the HTTP-API level) for Apache Solr. Download the latest release and try it out.