Solr Tutorial


Solr 8 Tutorial

This is a simple Solr tutorial designed to get your feet wet.
This tutorial was verified with Solr 8, but may work on older versions as well.

1. Download Solr

Download and unpack the latest Solr release from the Apache download mirrors. You may want to check out the Solr Prerequisites as well.

2. Start Solr

$ bin/solr start            # this starts solr
$ bin/solr create -c demo   # this creates a document collection called "demo"

3. Go!

You’re now ready to start using Solr!
To verify it’s up and running, you can point your browser at the admin page:

http://localhost:8983/solr/

 

Indexing and Retrieving a Document

Now that Solr is running, we can add a document (also known as “indexing” a document) with curl:

$ curl http://localhost:8983/solr/demo/update -d '
[
{"id" : "book1",
"title_t" : "The Way of Kings",
"author_s" : "Brandon Sanderson"
}
]'

And then we can ask for it back:

$ curl http://localhost:8983/solr/demo/get?id=book1
{
"doc": {
"id" : "book1",
"author_s": "Brandon Sanderson",
"title_t" : "The Way of Kings",
"_version_": 1410390803582287872
}
}

Of course for queries, you can always just use your browser and click on the link
http://localhost:8983/solr/demo/get?id=book1
or cut’n’paste the URL into your browser and modify the query directly in the address bar to try out different requests.

Dynamic Fields

The “id” field is already pre-defined in every schema. Lucene and Solr need to know the types of fields so that they can be indexed in the correct way. There are a number of options for defining new fields:

  • Edit the schema.xml file to define the fields.
  • Use the Schema API to add the new fields.
  • Use dynamicFields, a form of convention-over-configuration that maps field names to field types based on patterns in the field name. For example, every field ending in “_i” is taken to be an integer.
  • Use “schemaless” mode, where field types are auto-detected (guessed) based on the first value seen for that field

For other document fields in this tutorial, we have chosen to use convention over configuration via dynamic fields. Dynamic fields includes the essential benefits of schemaless – namely the ability to add new fields on the fly without having to pre-define them.

Our schema has some common dynamicField patterns defined for use:

Field Suffix Multivalued Suffix Type Description
_t _txt text_general Indexed for full-text search so individual words or phrases may be matched.
_s _ss string A string value is indexed as a single unit. This is good for sorting, faceting, and analytics. It’s not good for full-text search.
_i _is int a 32 bit signed integer
_l _ls long a 64 bit signed long
_f _fs float IEEE 32 bit floating point number (single precision)
_d _ds double IEEE 64 bit floating point number (double precision)
_b _bs boolean true or false
_dt _dts date A date in Solr’s date format
_p location A lattitude and longitude pair for geo-spatial search

Updating a Document

Let’s update book1 and add cat_s, a category field, a publication year, and an ISBN. Via dynamic fields, a field name ending with _i tells Solr to treat the value as an integer, while a field name ending with _s is treated as a string.

$ curl http://localhost:8983/solr/demo/update -d '
[
{"id"         : "book1",
"cat_s"      : { "add" : "fantasy" },
"pubyear_i"  : { "add" : 2010 },
"ISBN_s"     : { "add" : "978-0-7653-2635-5" }
}
]'

Now go ahead and ask for the document back, and you should see the new fields:

$ curl http://localhost:8983/solr/demo/get?id=book1

See Atomic Updates for more document update options.

 

Solr Search Requests

First, lets add a few more documents so we have something to search for. This time we’ll demonstrate indexing documents in CSV (comma separated values) format:

$ curl http://localhost:8983/solr/demo/update?commitWithin=5000 -H 'Content-type:text/csv' -d '
id,cat_s,pubyear_i,title_t,author_s,series_s,sequence_i,publisher_s
book1,fantasy,2010,The Way of Kings,Brandon Sanderson,The Stormlight Archive,1,Tor
book2,fantasy,1996,A Game of Thrones,George R.R. Martin,A Song of Ice and Fire,1,Bantam
book3,fantasy,1999,A Clash of Kings,George R.R. Martin,A Song of Ice and Fire,2,Bantam
book4,sci-fi,1951,Foundation,Isaac Asimov,Foundation Series,1,Bantam
book5,sci-fi,1952,Foundation and Empire,Isaac Asimov,Foundation Series,2,Bantam
book6,sci-fi,1992,Snow Crash,Neal Stephenson,Snow Crash,,Bantam
book7,sci-fi,1984,Neuromancer,William Gibson,Sprawl trilogy,1,Ace
book8,fantasy,1985,The Black Company,Glen Cook,The Black Company,1,Tor
book9,fantasy,1965,The Black Cauldron,Lloyd Alexander,The Chronicles of Prydain,2,Square Fish
book10,fantasy,2001,American Gods,Neil Gaiman,,,Harper'

We added the commitWithin=5000 parameter to indicate that we would like our updates to be visible within 5000 milliseconds (5 seconds). The Lucene library that Solr uses for full-text search works off of point-in-time snapshots that must be periodically refreshed in order for queries to see new changes.

Note that although we often use JSON in our examples, Solr is actually data format agnostic – you’re not artificially tied to any particular transfer-syntax or serialization format such as JSON or XML.

 

Your First Solr Search Request

Now let’s query our book collection! For example, we can find all books with “black” in the title field:

http://localhost:8983/solr/demo/query?
  q=title_t:black
  fl=author_s,title_t

The fl parameter stands for “field list” and specifies what stored fields should be returned from documents matching the query. We should see a result like the following:

{"response":{"numFound":2,"start":0,"docs":[
{
"title_t":"The Black Company",
"author_s":"Glen Cook"},
{
"title_t":"The Black Cauldron",
"author_s":"Lloyd Alexander"}]
}}

See Solr Query for more solr query examples and syntax.

Solr Search Request in JSON

If you prefer using JSON to search the index, you can use the JSON Request API:

$ curl http://localhost:8983/solr/demo/query -d '
{
"query" : "title_t:black",
"fields" : ["title_t", "author_s"]
}'

Sorting and Paging Search Results

By default, Solr will return the top 10 documents ordered by highest score (relevance) first. Let’s change things up and return the top 3 search results, limiting them to books published by Bantam, and sorting by publication year descending:

$ curl http://localhost:8983/solr/demo/query -d '
q=*:*&
fq=publisher_s:Bantam&
rows=3&
sort=pubyear_i desc&
fl=title_t,pubyear_i'

And we get the response as requested:

"response":{"numFound":5,"start":0,"docs":[
{
"pubyear_i":1999,
"title_t":["A Clash of Kings"]},
{
"pubyear_i":1996,
"title_t":["A Game of Thrones"]},
{
"pubyear_i":1992,
"title_t":["Snow Crash"]}]
}

Parameter Explanation:

  • q=*:* – The *:* query matches all documents in the index.
  • fq=publisher_s:Bantam – “fq” parameters are filter queries, which don’t affect scoring, but filter out documents that don’t match the given query. These are cached separately and reused across different requests, greatly accelerating throughput. See Advanced Filter Caching in Solr for more details.
  • sort=pubyear_i desc – This sorts on the “pubyear_i” field descending. Solr has many advanced sorting options such as tie-break sorts and sorting by a function of document fields!
  • rows=3 – “rows” specifies the number of results to return, while “start” specifies an offset into the sorted list for paging purposes. Also see Deep Paging for options to efficiently page deeply into result sets.

Other things to investigate

JSON Facet API
Facet Functions & Analytics
Sub-Facets / Nested Facets

Finding your way around

If you poke around the web admin interface, you can find a lot of functionality. But just where is stuff stored on disk?

  • ./server/logs contains the log files for the solr node you started.
  • ./server/solr is the so-called “Solr Home” for the node.
  • ./server/solr/demo is the directory for the solr core that was automatically created when we created the demo document collection in the first step.
  • ./server/solr/demo/conf contains configuration files such as managed-schema (look here to see the raw xml defining field types), and solrconfig.xml (other non-schema solr configuration).
  • ./server/solr/demo/data/index contains the actual Lucene index files.

Stopping Solr

$ bin/solr stop  # this stops solr

If you wish, you can also remove the “demo” core/collection by simply deleting the ./server/solr/demo directory after the solr node has been stopped.

Next Steps

Welcome to the Apache Solr community! Now that you’ve discovered just how easy it is to get up and running, you should check out all of the other powerful features that Solr has to offer.

Remember to subscribe to the solr user mailing list where you’ll meet a ton of helpful users and developers!