Solr Nested Objects / Nested Documents and Block Join

Nested Documents

Nested Documents (also called Nested Objects) provide the ability to “nest” some documents inside of other documents in a parent/child relationship.

Why Nested Documents

One reason for using nested documents is to prevent false matches.
For example, we may have a T-Shirt with 2 SKUs, a Large Red, and a Medium Blue.

Say we tried to model this as a single document:

{
 product : "Awesome T-Shirt",
 color : [ "Red", "Blue" ],
 size : [ "L", "M" ]
}

Now if we search for color:Red AND size:M, it would incorrectly match our document, even though no Medium Red shirt exists!
But if we represented the SKUs as two different documents, then there would be no incorrect match.

{
 color : "Red",
 size : "L"
}
{
 color : "Blue",
 size : "M"
}
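
The false match is easy to reproduce outside of Solr. The following is a minimal in-memory sketch in plain Python, not Solr code: the `matches` helper is hypothetical and only mimics the fact that a multi-valued field matches when any one of its values equals the query term.

```python
def matches(doc, **criteria):
    """Return True if every requested field value is found in the doc.

    A multi-valued field (a list) matches if ANY of its values is equal,
    which is what loses the pairing between color and size.
    """
    for field, wanted in criteria.items():
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]
        if wanted not in values:
            return False
    return True


# The single-document model from the text.
flattened = {"product": "Awesome T-Shirt", "color": ["Red", "Blue"], "size": ["L", "M"]}

# The same SKUs modeled as two separate documents.
skus = [{"color": "Red", "size": "L"}, {"color": "Blue", "size": "M"}]

false_match = matches(flattened, color="Red", size="M")
sku_matches = [sku for sku in skus if matches(sku, color="Red", size="M")]

print(false_match)  # True: the flattened document matches a SKU that does not exist
print(sku_matches)  # []: no individual SKU is both Red and M
```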

Lucene Index Representation

Lucene has a flat object model and does not really support “nesting” of documents in the index.
Lucene *does* support adding a list of documents atomically and contiguously (i.e. a virtual “block”), and this is the feature used by Solr to implement “nested objects”.

When you add a parent document with 3 children, these appear in the index contiguously as

child1, child2, child3, parent

There is no Lucene-level information that links parent and child, or distinguishes this parent/child block from the other documents in the index that come before or after. Successfully using parent/child relationships relies on more information being provided at query time.
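
To make the idea concrete, here is a conceptual sketch in Python (not Lucene's actual implementation). It assumes the "more information" supplied at query time is a set of flags marking which positions are parents. The names `index`, `parent_bits`, `parent_of` and `children_of` are made up for illustration.

```python
# Documents in index order: two blocks, children first, parent last in each block.
index = ["child1", "child2", "child3", "parent1", "child4", "parent2"]

# Supplied at query time: which positions hold parent documents.
parent_bits = [False, False, False, True, False, True]


def parent_of(pos, parent_bits):
    """Position of the parent owning the doc at pos (pos itself if it is a parent)."""
    while not parent_bits[pos]:
        pos += 1  # the parent is the next flagged position in the block
    return pos


def children_of(parent_pos, parent_bits):
    """Positions of the children in the block that ends at parent_pos."""
    start = parent_pos
    while start > 0 and not parent_bits[start - 1]:
        start -= 1  # walk back until the previous parent (or the start of the index)
    return list(range(start, parent_pos))


print(index[parent_of(1, parent_bits)])                 # parent1
print([index[i] for i in children_of(5, parent_bits)])  # ['child4']
```

Because a block is contiguous, both lookups only scan neighbouring positions, which is why the parent flags alone are enough to recover the relationship.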

Limitations

All children of a parent document must be indexed together with the parent document. One cannot update any document (parent or child) individually. The entire block needs to be re-indexed if any changes need to be made.

Schema Requirements

There are no schema requirements except that the _root_ field must exist (but that is there by default in all our schemas).
Any document can have nested child documents.

Block Join

“Block Join” refers to the set of related query technologies to efficiently map from parents to children or vice versa at query time. The locality of children and parents can be used to both speed up query operations and lower memory requirements compared to other join methods.

Indexing Nested Documents

NOTE: This example currently requires Solr 5.3 or later.

First, bring up Solr and create a collection (if you have not done so already):

$ bin/solr start            # this starts solr
$ bin/solr create -c demo   # this creates a document collection called "demo"

Let’s remove any leftover docs from other examples:

$ curl 'http://localhost:8983/solr/demo/update?commitWithin=3000' -d '{delete:{query:"*:*"}}'

Now let’s add a book with some reviews as nested child documents (notice the _childDocuments_ element):

$ curl 'http://localhost:8983/solr/demo/update?commitWithin=3000' -d '
[
 {id : book1, type_s:book, title_t : "The Way of Kings", author_s : "Brandon Sanderson",
  cat_s:fantasy, pubyear_i:2010, publisher_s:Tor,
  _childDocuments_ : [
    { id: book1_c1, type_s:review, review_dt:"2015-01-03T14:30:00Z",
      stars_i:5, author_s:yonik,
      comment_t:"A great start to what looks like an epic series!"
    }
    ,
    { id: book1_c2, type_s:review, review_dt:"2014-03-15T12:00:00Z",
      stars_i:3, author_s:dan,
      comment_t:"This book was too long."
    }
  ]
 }
]'

Now we can see that these are really just indexed as 3 documents, all visible by default:

$ curl http://localhost:8983/solr/demo/query -d 'q=*:*&fl=id'

  "response":{"numFound":3,"start":0,"docs":[
      {
        "id":"book1_c1"},
      {
        "id":"book1_c2"},
      {
        "id":"book1"}]
  }

Now let’s add an additional document with nested child documents for use with our query examples:

$ curl 'http://localhost:8983/solr/demo/update?commitWithin=3000' -d '
[
 {id : book2, type_s:book, title_t : "Snow Crash", author_s : "Neal Stephenson",
  cat_s:sci-fi, pubyear_i:1992, publisher_s:Bantam,
  _childDocuments_ : [
    { id: book2_c1, type_s:review, review_dt:"2015-01-03T14:30:00Z",
      stars_i:5, author_s:yonik,
      comment_t:"Ahead of its time... I wonder if it helped inspire The Matrix?"
    }
    ,
    { id: book2_c2, type_s:review, review_dt:"2015-04-10T09:00:00Z",
      stars_i:2, author_s:dan,
      comment_t:"A pizza boy for the Mafia franchise? Really?"
    }
    ,
    { id: book2_c3, type_s:review, review_dt:"2015-06-02T00:00:00Z",
      stars_i:4, author_s:mary,
      comment_t:"Neal is so creative and detailed! Loved the metaverse!"
    }
  ]
 }
]'
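
If you index from a program rather than from curl, the same structure can be built as ordinary JSON. The sketch below uses only the Python standard library and assumes the same local Solr instance and "demo" collection as above. The helper names `with_children` and `build_update_request` are made up for this example, and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumption: same host, port and collection as the curl examples.
SOLR_UPDATE_URL = "http://localhost:8983/solr/demo/update?commitWithin=3000"


def with_children(parent, children):
    """Return a copy of parent with its children nested under _childDocuments_."""
    doc = dict(parent)
    doc["_childDocuments_"] = [dict(child) for child in children]
    return doc


def build_update_request(docs, url=SOLR_UPDATE_URL):
    """Build (but do not send) a POST request carrying docs as strict JSON."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


book = with_children(
    {"id": "book1", "type_s": "book", "title_t": "The Way of Kings",
     "author_s": "Brandon Sanderson", "cat_s": "fantasy",
     "pubyear_i": 2010, "publisher_s": "Tor"},
    [{"id": "book1_c1", "type_s": "review", "review_dt": "2015-01-03T14:30:00Z",
      "stars_i": 5, "author_s": "yonik",
      "comment_t": "A great start to what looks like an epic series!"}],
)

request = build_update_request([book])
payload = json.loads(request.data)
print(payload[0]["_childDocuments_"][0]["id"])  # book1_c1
```

Passing `request` to `urllib.request.urlopen` would send it. Because the parent and its children travel in one JSON object, they always reach Solr together, which is what the block structure requires.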

Block Join Query

TODO

Returning Child Documents

One can return child documents along with every returned parent document by using the [child] doc transformer (it’s added to the fl field list parameter). The list of child documents will be included under the “_childDocuments_” field of each parent.

$ curl http://localhost:8983/solr/demo/query -d '
q=cat_s:(fantasy OR sci-fi)&
fl=id,[child parentFilter=type_s:book]'

  "response":{"numFound":2,"start":0,"docs":[
      {
        "id":"book1",
        "_childDocuments_":[
        {
          "id":"book1_c1",
          "type_s":"review",
          "review_dt":"2015-01-03T14:30:00Z",
          "stars_i":5,
          "author_s":"yonik",
          "comment_t":["A great start to what looks like an epic series!"]},
        {
          "id":"book1_c2",
          "type_s":"review",
[...]

Child Doc Transformer Parameters:

parentFilter – identifies all of the parents. See the section on The Parent Filter for more info.
childFilter – optional query to filter which child documents should be included.
limit – maximum number of child documents to return per parent (defaults to 10)

Also see the Solr ref guide entry on the [child] doc transformer.
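
As a small illustration of how the three parameters combine, the following Python sketch assembles the fl value and a form-encoded request body. The `child_transformer` helper is hypothetical, and it assumes the filter values contain no spaces (values with spaces would need quoting).

```python
from urllib.parse import urlencode


def child_transformer(parent_filter, child_filter=None, limit=None):
    """Build a [child ...] transformer string from the parameters listed above."""
    parts = ["parentFilter=" + parent_filter]
    if child_filter is not None:
        parts.append("childFilter=" + child_filter)
    if limit is not None:
        parts.append("limit=" + str(limit))
    return "[child " + " ".join(parts) + "]"


# Only parentFilter is required; this reproduces the fl value used earlier.
basic = child_transformer("type_s:book")

# Return at most 5 children per parent, and only the 5 star reviews.
filtered = child_transformer("type_s:book", child_filter="stars_i:5", limit=5)

params = {"q": "cat_s:(fantasy OR sci-fi)", "fl": "id," + filtered}
body = urlencode(params)  # suitable as the POST body for the /query handler

print(basic)     # [child parentFilter=type_s:book]
print(filtered)  # [child parentFilter=type_s:book childFilter=stars_i:5 limit=5]
```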

Faceting

The JSON Facet API has support for switching the facet domain based on the nested document relationships.

Faceting on Parents

The main query gives us a document list of reviews by author_s:yonik.
If we want to facet on the book genre (cat_s field), then we need to
switch the domain from the children (type_s:review) to the parents (type_s:book).

$ curl http://localhost:8983/solr/demo/query -d '
q=author_s:yonik&fl=id,comment_t&
json.facet={
  genres : {
    type: terms, 
    field: cat_s,
    domain: { blockParent : "type_s:book" }  
  }
}'

And we get a facet over the books which yonik reviewed:

  "response":{"numFound":2,"start":0,"docs":[
      {
        "id":"book1_c1",
        "comment_t":["A great start to what looks like an epic series!"]},
      {
        "id":"book2_c1",
        "comment_t":["Ahead of its time... I wonder if it helped inspire The Matrix?"]}]
  },
  "facets":{
    "count":2,
    "genres":{
      "buckets":[{
          "val":"fantasy",
          "count":1},
        {
          "val":"sci-fi",
          "count":1}]
  }}
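
To see why the counts come out this way, here is an in-memory Python sketch of the domain switch over the sample data. It is a simulation of the idea, not Solr code. The `block_parent` helper is made up for illustration, and only the fields needed for this facet are included.

```python
from collections import Counter

# Sample documents in block order: children first, parent last.
index = [
    {"id": "book1_c1", "type_s": "review", "author_s": "yonik"},
    {"id": "book1_c2", "type_s": "review", "author_s": "dan"},
    {"id": "book1", "type_s": "book", "cat_s": "fantasy"},
    {"id": "book2_c1", "type_s": "review", "author_s": "yonik"},
    {"id": "book2_c2", "type_s": "review", "author_s": "dan"},
    {"id": "book2_c3", "type_s": "review", "author_s": "mary"},
    {"id": "book2", "type_s": "book", "cat_s": "sci-fi"},
]


def is_parent(doc):
    """Stand-in for the parent filter type_s:book."""
    return doc["type_s"] == "book"


def block_parent(index, positions):
    """Map document positions to the (deduplicated) positions of their parents."""
    parents = set()
    for pos in positions:
        while not is_parent(index[pos]):
            pos += 1  # the parent closes the block, so scan forward
        parents.add(pos)
    return sorted(parents)


# Main query: reviews by yonik (these are child documents).
query_hits = [i for i, doc in enumerate(index) if doc.get("author_s") == "yonik"]

# Switch the domain to the parents, then facet on cat_s.
genres = Counter(index[p]["cat_s"] for p in block_parent(index, query_hits))
print(dict(genres))  # {'fantasy': 1, 'sci-fi': 1}
```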

Faceting on Children

Now let’s say we’re displaying the top sci-fi and fantasy books, and we want to find out who reviews the most books out of our selection. Since our root implicit facet bucket (formed by the query and filters) consists of parent documents (books), we need to switch the facet domain to the children for the author facet.

$ curl http://localhost:8983/solr/demo/query -d '
q=cat_s:(sci-fi OR fantasy)&fl=id,title_t&
json.facet={
  top_reviewers : {
    type: terms, 
    field: author_s,
    domain: { blockChildren : "type_s:book" }  
  }
}'

Response:

  "response":{"numFound":2,"start":0,"docs":[
      {
        "id":"book1",
        "title_t":["The Way of Kings"]},
      {
        "id":"book2",
        "title_t":["Snow Crash"]}]
  },
  "facets":{
    "count":2,
    "top_reviewers":{
      "buckets":[{
          "val":"dan",
          "count":2},
        {
          "val":"yonik",
          "count":2},
        {
          "val":"mary",
          "count":1}]
  }}

Filtering Children

By default, blockChildren will match all children of every parent doc from the input domain. It’s often the case that only a subset of the children are desired. The easiest way to limit children is with the filter clause.

For example, if we wanted to find the same top reviewers as before, but only for 5 star reviews:

$ curl http://localhost:8983/solr/demo/query -d '
q=cat_s:(sci-fi OR fantasy)&fl=id,title_t&
json.facet={
  top_reviewers : {
    type: terms, 
    field: author_s,
    domain: { 
      blockChildren : "type_s:book",
      filter : "stars_i:5"
    }
  }
}'
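
The effect of the filter can be checked against the sample data with an in-memory Python sketch. This simulates the idea and is not output from Solr. The `block_children` helper and its `child_filter` argument are made up for illustration.

```python
from collections import Counter

# Sample documents in block order: children first, parent last.
index = [
    {"id": "book1_c1", "type_s": "review", "author_s": "yonik", "stars_i": 5},
    {"id": "book1_c2", "type_s": "review", "author_s": "dan", "stars_i": 3},
    {"id": "book1", "type_s": "book", "cat_s": "fantasy"},
    {"id": "book2_c1", "type_s": "review", "author_s": "yonik", "stars_i": 5},
    {"id": "book2_c2", "type_s": "review", "author_s": "dan", "stars_i": 2},
    {"id": "book2_c3", "type_s": "review", "author_s": "mary", "stars_i": 4},
    {"id": "book2", "type_s": "book", "cat_s": "sci-fi"},
]


def is_parent(doc):
    """Stand-in for the parent filter type_s:book."""
    return doc["type_s"] == "book"


def block_children(index, parent_positions, child_filter=None):
    """Map parent positions to their children, optionally keeping only some."""
    children = []
    for pos in parent_positions:
        i = pos - 1
        while i >= 0 and not is_parent(index[i]):
            if child_filter is None or child_filter(index[i]):
                children.append(i)
            i -= 1  # walk back through the block until the previous parent
    return sorted(children)


# Main query: sci-fi and fantasy books (these are parent documents).
books = [i for i, doc in enumerate(index) if doc.get("cat_s") in ("sci-fi", "fantasy")]

all_reviewers = Counter(index[i]["author_s"] for i in block_children(index, books))
five_star = Counter(
    index[i]["author_s"]
    for i in block_children(index, books, lambda doc: doc["stars_i"] == 5)
)

print(dict(all_reviewers))  # {'yonik': 2, 'dan': 2, 'mary': 1}
print(dict(five_star))      # {'yonik': 2}
```

Without the filter the counts agree with the earlier response (dan 2, yonik 2, mary 1). With it, only yonik's two 5 star reviews remain in the facet domain.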

The Parent Filter

Note that regardless of which direction we are mapping (parents to children or children to parents), or what documents we are operating on, we provide a parent filter to define the complete set of parents in the index. In these examples, the parent filter is "type_s:book".
