solr

Background: I needed a really good hash function for the distributed indexing in SolrCloud. Since it is used for partitioning documents, it needed to be really high quality (well distributed), because we don’t want uneven shards. It also needed to be cross-platform, so a client could calculate this hash […]

MurmurHash3 for Java

September 15, 2011 in analytics / java / solr tagged murmur3 / MurmurHash3 / murmurhash3 128 / murmurhash3 64 / murmurhash3 java by yonik (updated on May 18, 2015)
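As a rough illustration of hash-based partitioning, here is a minimal sketch of the 32-bit x86 variant of MurmurHash3 mapping a document id to a shard. The class name `Murmur3Sketch`, the helper `shardFor`, the seed of 0, and the modulo-based shard choice are assumptions made for this example; they are not taken from the post and are not necessarily how SolrCloud assigns documents.

```java
import java.nio.charset.StandardCharsets;

public class Murmur3Sketch {
    // 32-bit MurmurHash3 (x86 variant) over a byte array.
    static int murmur3_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h = seed;
        int blocks = data.length / 4;

        // Body: mix in one little-endian 4-byte block at a time.
        for (int i = 0; i < blocks; i++) {
            int o = i * 4;
            int k = (data[o] & 0xff)
                  | ((data[o + 1] & 0xff) << 8)
                  | ((data[o + 2] & 0xff) << 16)
                  | ((data[o + 3] & 0xff) << 24);
            k *= c1;
            k = Integer.rotateLeft(k, 15);
            k *= c2;
            h ^= k;
            h = Integer.rotateLeft(h, 13);
            h = h * 5 + 0xe6546b64;
        }

        // Tail: the remaining 1 to 3 bytes (cases fall through on purpose).
        int tail = blocks * 4;
        int k = 0;
        switch (data.length & 3) {
            case 3: k ^= (data[tail + 2] & 0xff) << 16;
            case 2: k ^= (data[tail + 1] & 0xff) << 8;
            case 1: k ^= (data[tail] & 0xff);
                    k *= c1;
                    k = Integer.rotateLeft(k, 15);
                    k *= c2;
                    h ^= k;
        }

        // Finalization: fold in the length and avalanche the bits.
        h ^= data.length;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Hypothetical helper: pick a shard by taking the hash modulo the shard count.
    static int shardFor(String id, int numShards) {
        byte[] bytes = id.getBytes(StandardCharsets.UTF_8);
        return Math.floorMod(murmur3_32(bytes, 0), numShards);
    }

    public static void main(String[] args) {
        System.out.println(shardFor("doc-1", 4));
    }
}
```

Hashing the UTF-8 bytes of the id, rather than Java's UTF-16 chars, is one way to keep the result reproducible from non-Java clients.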

Solr took another step toward increasing its NoSQL datastore capabilities, with the addition of realtime get. Background: As readers probably know, Lucene/Solr search works off of point-in-time snapshots of the index. After changes have been made to the index, a commit (or a new Near Real Time softCommit) needs to […]

Solr’s Realtime Get

September 7, 2011 in lucene / search / solr tagged NoSQL / realtime / solr / solr 4.0 by yonik (updated on April 28, 2015)

Lucene’s default ranking function uses factors such as tf, idf, and norm to help calculate relevancy scores. Solr has now exposed these factors as function queries. docfreq(field,term) returns the number of documents that contain the term in the field. termfreq(field,term) returns the number of times the term appears in the […]

Solr relevancy function queries

March 10, 2011 in lucene / search / solr tagged function query / lucidworks / Similarity / solr / solr 4.0 by yonik (updated on April 28, 2015)
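To make the two quantities concrete, here is a small in-memory sketch of what docfreq and termfreq count. It does not call Solr; the class `TermStats` and the toy token lists are assumptions for illustration, with each document's field represented as a list of already-analyzed tokens.

```java
import java.util.List;

public class TermStats {
    // Number of times the term occurs in one document's field.
    static int termfreq(List<String> fieldTokens, String term) {
        int count = 0;
        for (String token : fieldTokens) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    // Number of documents whose field contains the term at least once.
    static int docfreq(List<List<String>> docs, String term) {
        int count = 0;
        for (List<String> fieldTokens : docs) {
            if (fieldTokens.contains(term)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
            List.of("solr", "search", "solr"),
            List.of("lucene", "search"),
            List.of("solr"));
        System.out.println(docfreq(docs, "solr"));
        System.out.println(termfreq(docs.get(0), "solr"));
    }
}
```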

I previously introduced Solr’s Result Grouping, also called Field Collapsing, that limits the number of documents shown for each “group”, normally defined as the unique values in a field or function query. Since then, there have been a number of bug fixes, performance improvements, and feature enhancements. You’ll need a […]

Solr Result Grouping / Field Collapsing Improvements

December 17, 2010 in lucene / search / solr / Uncategorized tagged field collapsing / lucidworks / result grouping / solr / solr 4.0 by yonik (updated on April 28, 2015)

Result Grouping, also called Field Collapsing, has been committed to Solr! This functionality limits the number of documents for each “group”, usually defined by the unique values in a field (just like field faceting). You can think of it like faceted search, except instead of just getting a count, you […]

Solr Result Grouping / Field Collapsing

September 16, 2010 in search / solr tagged field collapsing / geo search / result grouping / solr / solr 4.0 / spatial search by yonik (updated on April 28, 2015)

Solr has been able to slurp in CSV for quite some time, and now I’ve finally got around to adding the ability to output query results in CSV also. The output format matches what the CSV loader can slurp. Adding a simple wt=csv to a query request will cause the […]

CSV output for Solr

July 29, 2010 in search / solr tagged CSV / solr / solr 3.1 / solr 4.0 by yonik (updated on April 28, 2015)
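A request for CSV output might be assembled as in the sketch below. The base URL `http://localhost:8983/solr`, the `/select` handler path, and the class name `CsvQueryUrl` are assumptions for this example; only the `wt=csv` parameter comes from the post. It relies on the `URLEncoder.encode(String, Charset)` overload, which requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class CsvQueryUrl {
    // Builds a query URL that asks for the response in CSV form via wt=csv.
    static String build(String baseUrl, String query) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return baseUrl + "/select?q=" + q + "&wt=csv";
    }

    public static void main(String[] args) {
        // Assumed local Solr instance; adjust to your own deployment.
        System.out.println(build("http://localhost:8983/solr", "*:*"));
    }
}
```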

Solr 1.4 contains a new feature that allows range queries or range filters over arbitrary functions. It’s implemented as a standard Solr QParser plugin, and thus easily available for use any place that accepts the standard Solr Query Syntax by specifying the frange query type. Here’s an example of a […]

Ranges over Functions in Solr 1.4

July 6, 2009 in lucene / search / solr tagged frange / function query / lucene / qparser / query syntax / range query / solr by yonik (updated on April 28, 2015)
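As a sketch of what selecting the frange query type looks like, the helper below builds a filter string in Solr's local-params syntax, with `l` and `u` as the lower and upper bounds. The class name `FrangeFilter` and the field names `user_ranking` and `editor_ranking` are hypothetical, and this is not necessarily the example the post goes on to give.

```java
public class FrangeFilter {
    // Builds a function range query string: {!frange l=<lower> u=<upper>}<function>
    static String frange(double lower, double upper, String function) {
        return "{!frange l=" + lower + " u=" + upper + "}" + function;
    }

    public static void main(String[] args) {
        // Hypothetical fields; the result could be passed as an fq parameter.
        System.out.println(frange(0, 2.2, "sum(user_ranking,editor_ranking)"));
    }
}
```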

One of the many performance improvements in the upcoming Solr 1.4 release involves improved filtering performance. Solr 1.4 filters are faster (anywhere from 30% to 80% faster to calculate intersections, depending on configuration), take less memory (40% smaller), and are more efficiently applied to the query during a search. […]

Filtered query performance increases for Solr 1.4

May 27, 2009 in lucene / search / solr tagged filtered query / solr 1.4 / solr performance by yonik (updated on April 28, 2015)

With CPU cores constantly increasing, there has been some major work done in Lucene/Solr to increase the scalability under multi-threaded load. Read-only IndexReaders: One bottleneck was synchronization around the checking of deleted docs in a Lucene IndexReader. Since another thread could delete a document at any time, the IndexReader.isDeleted() call […]

Solr scalability improvements

December 1, 2008 in java / lucene / search / solr tagged cache / LRU / lucene / NIO / scalability / solr by yonik (updated on April 28, 2015)

Having performance issues with Solr’s faceted search and certain types of fields? Help has arrived in the form of a new Solr faceting algorithm! This new faceting implementation dramatically improves the performance of faceted search, making it suitable for a much wider range of applications. The existing multivalued field faceting […]

Solr Faceted Search Performance Improvements

November 25, 2008 in java / lucene / search / solr tagged Faceted Search / lucene / performance / search / solr by yonik (updated on April 28, 2015)

Solr 'n Stuff

Open source search and analytics