Noggit is the world’s fastest streaming JSON parser for Java.
Noggit is the streaming JSON parser used in Apache Solr; the source code lives on GitHub.
JSON features and extensions
Noggit supports a number of extensions to the JSON grammar. All of these extensions are optional and may be disabled.
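For illustration, here is a minimal sketch of turning the extensions on and off, assuming the flag constants (FLAGS_DEFAULT, FLAGS_STRICT), the setFlags method, and the nested ParseException of org.noggit.JSONParser; exact names may vary between Noggit versions:

    import org.noggit.JSONParser;

    public class FlagsExample {
      public static void main(String[] args) throws Exception {
        String json = "{ first : Yonik, /* a comment */ last : Seeley, }";

        // Default flags: the extensions are enabled, so this lenient input parses.
        JSONParser lenient = new JSONParser(json);
        while (lenient.nextEvent() != JSONParser.EOF) { }   // consume all events

        // Strict mode: disable every extension; the same input is now rejected.
        JSONParser strict = new JSONParser(json);
        strict.setFlags(JSONParser.FLAGS_STRICT);
        try {
          while (strict.nextEvent() != JSONParser.EOF) { }  // consume all events
        } catch (JSONParser.ParseException e) {
          System.out.println("strict parse failed: " + e.getMessage());
        }
      }
    }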
Comments
    {
      // This is a single line comment
      # This is also a single line comment
      /* This is a multi-line
       * C-style comment.
       */
    }
Unquoted Strings
    { first : Yonik, last : Seeley }
Single Quoted Strings
JSON strings are normally enclosed in double quotes. It's often desirable to use single quotes instead, for example when embedding JSON inside another double-quoted string in a program.
    ['how', 'now', 'brown', 'cow']
Backslash escape any character
Sometimes one may not know exactly which characters need to be backslash-escaped. It can be useful to accept a backslash before any character, rather than throwing an exception for an unnecessary escape.
    'This is just a " string'
Trailing commas / extra commas
Allowing trailing commas or extra commas can make it easier to produce JSON that doesn’t throw a parse exception. One use-case is templating JSON. Given the following template,
{ filters:["instock:true", ${FILT1}] } # Note: templating is not part of JSON or Noggit... but may happen before parsing.
If FILT1 is not defined and is replaced with empty space, this results in the following JSON:
{ filters:["instock:true", ] // this will be parsed as filters:["instock:true"] }
Noggit ignores all extra commas, not just trailing commas:
    [
      [,]           // equivalent to []
      ,
      {,}           // equivalent to {}
      ,
      [,,3,,,6,,]   // equivalent to [3,6]
    ]
Streaming Values
Large string values can optionally be handled in a streaming fashion, a piece at a time; Noggit only constructs a single String object holding the entire value if explicitly asked to. This allows for stream processing with very little memory overhead.
{ "big_string" : "A very large string... pretend its's 1GB in size... we can process it and send it on without reading it all into memory at once!" }
Huge values
Noggit can handle huge values that are JSON compliant but may be too large to be parsed into a Java primitive.
{ "big_int" : 1234567890987654321334325343534535342325786237862578625725867258672356711107, "big_float" : 112412133377778226524562431234215423.23421434645743234564758453322342, "big_sci" : 2.342669039282149050282364845982748592e-94321 }
Concatenated JSON Streaming
Noggit can also handle multiple JSON values streamed over a single connection and simply concatenated together. Primitive values should of course be separated by whitespace to avoid ambiguity.
    {first_object:10}
    ['another array object']['yet another object']
    {more:objects}{another:object}
    ['who knows how many json values will be streamed by the writer...']
    42 "is this the end?"
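A sketch of consuming such a stream with the event-based API (assuming the event constants shown; a value-building helper such as ObjectBuilder could be used instead), counting complete top-level values until the writer closes the connection:

    import java.io.Reader;
    import org.noggit.JSONParser;

    public class ConcatenatedStreamExample {
      static void process(Reader in) throws Exception {
        JSONParser p = new JSONParser(in);
        int depth = 0;    // current object/array nesting depth
        int values = 0;   // complete top-level JSON values seen so far
        for (int ev = p.nextEvent(); ev != JSONParser.EOF; ev = p.nextEvent()) {
          switch (ev) {
            case JSONParser.OBJECT_START:
            case JSONParser.ARRAY_START:
              depth++;
              break;
            case JSONParser.OBJECT_END:
            case JSONParser.ARRAY_END:
              depth--;
              if (depth == 0) values++;   // finished a top-level object or array
              break;
            default:
              if (depth == 0) values++;   // a bare top-level primitive such as 42
              break;
          }
        }
        System.out.println("read " + values + " top-level JSON values");
      }
    }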
Memory Efficiency
Noggit can parse huge JSON messages with minimal overhead.
- Only a single byte of state is needed per level of nested object or array, to keep track of the type of the enclosing entity.
- A user can optionally provide an input buffer for Noggit to use when parsing from a Reader, allowing re-use across different parsers and thus lower memory consumption and garbage collection activity (see the sketch after this list).
- Streaming values: very large values (such as strings) can be obtained in chunks, thus the whole value never needs to reside in memory at once.
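For the buffer re-use point above, a minimal sketch, assuming a JSONParser constructor that accepts both a Reader and a caller-owned char[] working buffer:

    import java.io.Reader;
    import org.noggit.JSONParser;

    public class BufferReuseExample {
      // One buffer allocated up front and re-used by every parse on this (single) thread,
      // instead of each parser allocating its own internal buffer.
      private final char[] buffer = new char[8192];

      void parse(Reader in) throws Exception {
        JSONParser p = new JSONParser(in, buffer);
        while (p.nextEvent() != JSONParser.EOF) {
          // handle events...
        }
      }
    }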