Native Code Faceting


Native code faceting for Solr has just been added to Heliosearch, and benchmarks show an impressive performance increase of more than 2x! The faceting code is written in C++, statically compiled ahead of time for maximum performance, and loaded into the JVM via JNI (Java Native Interface).

nCache, Heliosearch’s off-heap version of the Lucene/Solr FieldCache, was instrumental in allowing this level of optimization. Java arrays (and other on-heap memory) cannot be efficiently accessed from native code. Moving the data structures off-heap not only greatly reduced garbage collection overhead, but also made native code optimizations practical. Top-level nCache string support was recently added, paving the way for native code faceting on single-valued string fields.
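To make the idea concrete, here is a minimal sketch of the kind of counting loop that faceting on a single-valued string field boils down to. It is not the Heliosearch implementation: the function name `count_facets`, its parameters, and the convention that an ordinal of -1 means "no value" are all assumptions made for illustration. The point is only that once the per-document ordinals live in plain off-heap memory, native code can walk them through a raw pointer with no copying.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical kernel (illustration only, not Heliosearch code).
// ords      : one term ordinal per document for a single-valued field;
//             assumed convention: -1 means the document has no value.
// doc_ids   : ids of the documents that matched the query.
// num_docs  : number of entries in doc_ids.
// num_terms : number of unique values in the field.
// Returns the number of matching documents for each term ordinal.
std::vector<int32_t> count_facets(const int32_t* ords,
                                  const int32_t* doc_ids,
                                  std::size_t num_docs,
                                  std::size_t num_terms) {
    assert(ords != nullptr);
    assert(num_docs == 0 || doc_ids != nullptr);

    std::vector<int32_t> counts(num_terms, 0);
    for (std::size_t i = 0; i < num_docs; ++i) {
        const int32_t ord = ords[doc_ids[i]];  // direct read from the raw buffer
        if (ord >= 0) {
            ++counts[static_cast<std::size_t>(ord)];
        }
    }
    return counts;
}
```

In a JNI setting, the address of an off-heap buffer is commonly passed to native code as a `jlong` and cast back to a pointer before calling a routine like this one. That glue is omitted here so the sketch compiles without the JDK headers.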

Faceting Performance

Benchmark details:

  • 10M document index
  • Documents consist of an ID field and 6 different single-valued string fields, with the number of unique values per field ranging from 10 to 1 million
  • Faceting request throughput was measured over 1000 requests, after a 50-request warm-up
  • Client had 4 request threads
  • Each individual client request facets on a randomly chosen field, to make the test realistic and to keep HotSpot from over-specializing the code for a specific field
  • Solr versions: Apache Solr 4.8.1, Heliosearch/Solr snapshot (based on Solr 4.9)
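The measurement procedure above can be sketched roughly as follows. This is not the actual benchmark client: `measure_throughput` and the `send_request` callback are hypothetical names, the callback stands in for whatever issues the facet request, and the sketch runs a single loop rather than the 4 client threads used in the real test.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <random>
#include <string>
#include <vector>

// Hypothetical harness (illustration only).
// Sends `warmup` untimed requests, then times `measured` requests,
// choosing a random field name for every request.
// Returns throughput in requests per second.
double measure_throughput(
        const std::function<void(const std::string&)>& send_request,
        const std::vector<std::string>& fields,
        std::size_t warmup,
        std::size_t measured,
        unsigned seed) {
    assert(!fields.empty());

    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, fields.size() - 1);

    // Warm-up phase: results are discarded and not timed.
    for (std::size_t i = 0; i < warmup; ++i) {
        send_request(fields[pick(rng)]);
    }

    // Timed phase.
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < measured; ++i) {
        send_request(fields[pick(rng)]);
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;

    return static_cast<double>(measured) / elapsed.count();
}
```

With the numbers from the list above, a call would pass 50 for `warmup` and 1000 for `measured`, along with the names of the six string fields.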

[Figure: native faceting performance]

 
The different operating systems were run on different hardware (hence the large performance differences of the same code across the different platforms).

OS                    | CPU                     | Native code compiler | Performance vs. Solr
Ubuntu Linux 13.10    | quad core AMD Phenom II | gcc 4.7.3            | 227%
Windows 8.1           | quad core Intel i5      | gcc 4.8.2            | 202%
OS-X Mavericks 10.9.3 | dual core Intel i5      | gcc 4.8.2            | 246%
OS-X Mavericks 10.9.3 | dual core Intel i5      | LLVM 5.1             | 235%

 

clang/LLVM vs gcc

The gcc/g++ included with OS-X is actually clang/LLVM – clang is the C language front end and LLVM is the back end that produces executable code.
At least for this initial native code, g++ 4.8.2 was about 5% faster than clang/LLVM 5.1, so we’ll most likely use gcc/g++ by default.
The easiest way to get gcc/g++ on your Mac is with Homebrew:

$ brew install gcc48

After installation, the gcc and g++ commands will still point to the clang/LLVM versions, but gcc-4.8 and g++-4.8 will be available in /usr/local/bin.

 

Conclusion

Besides a faceting performance improvement of more than 2x, native code has other advantages as well:

  • It avoids bugs in how Java’s HotSpot compiles code. Compiling the code just once, statically, means it’s the same for every run, for everyone.
  • No variations from run to run due to how HotSpot compiles the code (unexplained slowdowns).
  • No HotSpot warm-up period, and no time spent optimizing or de-optimizing code.

It’s easy to take advantage of these performance improvements and new features since Heliosearch/Solr is currently a drop-in replacement (at the HTTP-API level) for Apache Solr. Download the latest release and try it out.

We’d love to hear how it’s working for you… drop by the user mailing list and let us know.
Want to dabble in C/C++ code again? Drop by the dev mailing list to help out with development!