I’ve often seen Solr mistakenly described as just “an HTTP wrapper around Lucene”. Unfortunately, that mischaracterization was never nipped in the bud, and it continues to be repeated in many places, including press articles (where it gets picked up and repeated yet again). Of course, people who have been involved with Lucene and Solr from the beginning know better!
In fact, Solr contained so much core functionality that Lucene users wanted that the two projects merged in 2010.
Here’s a partial history of Solr milestones that involve core search functionality (i.e., functionality that goes beyond merely exposing Lucene over HTTP):
Functionality | Implemented in Solr | Available in Lucene |
---|---|---|
Numerics + range queries | Jan 2006 | Sep 2009, Lucene 2.9 |
Index Replication | Jan 2006 | Jul 2013, Lucene 4.4, Replication Module |
Unique keys (overwriting) | Jan 2006 | ? 2007, IndexWriter.updateDocument |
Many analysis filters: WordDelimiterFilter, Soundex, Regex/Pattern, HTML, KStem, trim, reverse wildcard, multi-word synonym, etc. | Jan 2006 – various | Oct 2012, Lucene 4.0, all analysis filters moved from Solr to Lucene |
Searcher concurrency control | Jan 2006 | Nov 2011, Lucene 3.5, SearcherManager |
Faceted search | Sep 2006 | Sep 2011, Lucene 3.4, LUCENE-3079 |
Function queries | Jan 2006 | Jun 2007, Solr’s FunctionQuery was copied (not moved) into Lucene 2.2, but it stagnated there; function queries were later moved from Solr to Lucene for version 4.0 (Oct 2012) |
Distributed search | Feb 2008 | Jul 2011, Lucene 3.3, partial support via TopDocs.merge |
Query-time Join | Apr 2011 | Jan 2012, LUCENE-3602 |
Grouping / Field Collapsing | Aug 2010 (though dev patches were used by many in production much earlier) | May 2011 – Oct 2011, grouping moved from Solr to Lucene, LUCENE-1421, LUCENE-3483, etc. |
Constant score queries, including prefix/range queries that don’t explode when too many terms are matched | Jan 2006 | May 2006, moved from Solr to Lucene LUCENE-383 etc. |
Multi-valued field cache (UnInvertedField) | Nov 2008 SOLR-475 | Mar 2011, moved from Solr to Lucene LUCENE-3003 |
Distributed faceting | Feb 2008 | Jul 2013, Lucene 4.4, partial support via FacetResult.mergeHierarchies? |
Auto-suggest | Aug 2010, SOLR-1316 | May 2011, moved from Solr to Lucene, LUCENE-2995 |
Field types | Jan 2006 | Oct 2012, Lucene 4.0, FieldType class |
Configurable analysis component factories | Jan 2006 | Jul 2012, all analysis factories moved from Solr to Lucene, LUCENE-2510 |
User-oriented query parsers (dismax, edismax) | Jan 2006 (dismax); Nov 2009 (edismax), SOLR-1553 | Nov 2013, LUCENE-5336 |
Real-time Get | Nov 2011 SOLR-2700 | Jan 2013 LUCENE-4695 |
Filter Cache | Jan 2006 | Nov 2014 LUCENE-6077 |
Query Cache | Jan 2006 | Apr 2015, Lucene 5.1 LUCENE-6303 |
Of course, I’ve only touched on some of the features that appeared in Solr first and later became available in Lucene. I’ve left out the features that Lucene still does not have (such as optimistic locking and numeric statistics), as well as the more server-ish features (many query parser types, input/output support for JSON, XML, CSV, etc.).
The reality is that both Lucene and Solr have long been innovating in the open source search space.