Package org.sindice.siren.qparser.keyword

A keyword query parser implemented with Lucene's Flexible Query Parser, with support for twig queries.

Query Parser Syntax

A keyword query can be either a twig expression or a boolean expression.

A boolean expression can mix primitive queries (e.g., a keyword search) and twig queries.

The syntax allows a custom datatype to be applied to any part of the query.

Twig

A twig expression is defined by the special character ':'. For example, the query

 a : b
 
matches documents where the term "a" appears in one of the top-level nodes (i.e., one of the top-level field names) and the term "b" appears in one of its child nodes (i.e., one of the values of the field).
                             +---+
                             | a |
                             +-+-+
                               |
                             +-+-+
                             | b |
                             +---+
 

In a twig query, the child clause has a NodeBooleanClause.Occur.MUST occurrence by default.

Multiple children can be associated with the same top-level node (i.e., one of the top-level field names) by using a JSON-like array syntax:

 a : [ b , c ]
 
where the term "a" appears in one of the top-level nodes, the term "b" appears in one of its child nodes (i.e., one of the values of the field), and the term "c" appears in a second child node.
                             +---+
                             | a |
                             +-+-+
                               |
                          +----+----+
                        +-+-+     +-+-+
                        | b |     | c |
                        +---+     +---+
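
The matching rule for a twig with one or more child clauses can be sketched in plain Java. This is a minimal illustration under stated assumptions, not SIREn's implementation: the class TwigSketch, its Node type and the method matchesTwig are hypothetical names, terms are compared after simple whitespace tokenization, and every child term is treated as a MUST clause. The sketch does not require "b" and "c" to sit in two distinct child nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names belong to the SIREn API.
final class TwigSketch {

    /** A node of a JSON-like tree: some text plus child nodes. */
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<Node>();

        Node(String text, Node... kids) {
            this.text = text;
            this.children.addAll(Arrays.asList(kids));
        }

        /** Case-insensitive lookup of a term among whitespace-separated tokens. */
        boolean containsTerm(String term) {
            for (String token : text.toLowerCase().split("\\s+")) {
                if (token.equals(term.toLowerCase())) {
                    return true;
                }
            }
            return false;
        }
    }

    /**
     * True if root contains rootTerm and, for every child term,
     * at least one direct child of root contains it (MUST semantics).
     */
    static boolean matchesTwig(Node root, String rootTerm, String... childTerms) {
        if (!root.containsTerm(rootTerm)) {
            return false;
        }
        for (String term : childTerms) {
            boolean found = false;
            for (Node child : root.children) {
                if (child.containsTerm(term)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Node a = new Node("a", new Node("b"), new Node("c"));
        System.out.println(matchesTwig(a, "a", "b"));       // a : b         -> true
        System.out.println(matchesTwig(a, "a", "b", "c"));  // a : [ b , c ] -> true
        System.out.println(matchesTwig(a, "a", "z"));       // a : z         -> false
    }
}
```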
 

A JSON-like object syntax is also supported. For example, the following query

 { a : b , c : d }
 
matches nested objects in a JSON document. It is syntactic sugar for

 * : [ a : b , c : d ]
 
where the twigs a : b and c : d become children of a parent whose top-level node is the wildcard *.
                             +---+
                             | * |
                             +-+-+
                               |
                          +----+----+
                        +-+-+     +-+-+
                        | a |     | c |
                        +---+     +---+
                          |         |
                        +-+-+     +-+-+
                        | b |     | d |
                        +---+     +---+
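
The rewriting of the object syntax into its wildcard form can be illustrated with a small string transformation. This is only a sketch of the equivalence stated above, assuming the whole query is a single object and contains no quoted phrases. The class ObjectSugarSketch and its methods are hypothetical names, and the real parser operates on a query tree rather than on strings.

```java
// Hypothetical sketch; none of these names belong to the SIREn API.
final class ObjectSugarSketch {

    /** True if q starts with '{' and that brace is closed by the last character. */
    static boolean isSingleObject(String q) {
        if (q.length() < 2 || q.charAt(0) != '{') {
            return false;
        }
        int depth = 0;
        for (int i = 0; i < q.length(); i++) {
            char c = q.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    return i == q.length() - 1;
                }
            }
        }
        return false;
    }

    /** Rewrites "{ x }" as "* : [ x ]"; any other query is returned trimmed. */
    static String desugar(String query) {
        String q = query.trim();
        if (!isSingleObject(q)) {
            return q;
        }
        String body = q.substring(1, q.length() - 1).trim();
        return "* : [ " + body + " ]";
    }

    public static void main(String[] args) {
        System.out.println(desugar("{ a : b , c : d }"));  // * : [ a : b , c : d ]
    }
}
```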
 

Wildcard Node

The query syntax allows a wildcard * to be used as a node in a twig query. For example, a twig query with no constraint on the top-level node is written as

 * : b
 
The same goes for setting no constraint on the child:

 a : *
 
A chain of wildcards can be used to set a node constraint on a specific descendant. The query

 a : * : * : b
 
defines a twig with the term "a" occurring on the top-level node and the term "b" occurring on a node three levels below.

Wildcards can be used with any of the previous syntaxes.
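
The effect of a wildcard chain can be sketched as a walk down a tree, one level per step. The code below is a minimal illustration, not SIREn's matching algorithm: WildcardChainSketch, Node, steps and matches are hypothetical names, a non-wildcard step is compared with the whole node text, and a trailing wildcard is assumed to require that a node exists at that level.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names belong to the SIREn API.
final class WildcardChainSketch {

    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<Node>();

        Node(String text, Node... kids) {
            this.text = text;
            this.children.addAll(Arrays.asList(kids));
        }
    }

    /** Splits a chain such as "a : * : * : b" into its steps. */
    static String[] steps(String chain) {
        return chain.trim().split("\\s*:\\s*");
    }

    /** True if a downward path starting at node satisfies steps[i] and all later steps. */
    static boolean matches(Node node, String[] steps, int i) {
        String step = steps[i];
        // Simplification: a non-wildcard step must equal the node text.
        if (!step.equals("*") && !step.equalsIgnoreCase(node.text)) {
            return false;
        }
        if (i == steps.length - 1) {
            return true;
        }
        for (Node child : node.children) {
            if (matches(child, steps, i + 1)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "b" sits three levels below "a".
        Node tree = new Node("a", new Node("x", new Node("y", new Node("b"))));
        System.out.println(matches(tree, steps("a : * : * : b"), 0));  // true
        System.out.println(matches(tree, steps("a : * : b"), 0));      // false
    }
}
```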

Nested Twigs

The query syntax allows you to create complex twig queries by nesting arrays, objects, and other twigs. For example:

 a : { * : b , a : [ { b : c : d } , { e : a }, g ] }
 
corresponds to the following query tree:
                             +---+
                             | a |
                             +-+-+
                               |
                        +------+--------+
                        |               |
                      +-+-+           +-+-+
                      | * |           | a |
                      +-+-+           +-+-+
                        |               |
                      +-+-+      +------+------+
                      | b |      |      |      |
                      +---+    +-+-+  +-+-+  +-+-+
                               | * |  | * |  | g |
                               +-+-+  +-+-+  +---+
                                 |      |
                               +-+-+  +-+-+
                               | b |  | e |
                               +-+-+  +-+-+
                                 |      |
                               +-+-+  +-+-+
                               | c |  | a |
                               +-+-+  +---+
                                 |
                               +-+-+
                               | d |
                               +---+
 

Boolean

A boolean expression follows the Lucene query syntax, except that ':' does not define a field query but is instead used to build a twig query. For example,

 a AND b
 
matches documents where the terms "a" and "b" occur in any node of the JSON tree.

Boolean of Twigs

A boolean combination of twig queries is also possible:

 (a : b) AND (c : d)
 
matches JSON documents where both twigs occur.

Boolean in a Twig node

The complete Lucene query syntax, e.g., grouping, boolean operators or range queries, can be used to match a single node of a twig. For example, the query

 a : b AND c
 
matches JSON documents where the term "a" occurs on the top level node, and with both terms "b" and "c" occurring in a child node.
                             +---+
                             | a |
                             +-+-+
                               |
                          +---------+
                          | b AND c |
                          +---------+
 
The twig operator ':' separates the levels of a twig before the boolean operators are applied, so boolean operators only group terms within a single node. Therefore, the query
 a : b AND c : d
 
matches documents as in the previous query, with the additional constraint that the node matching "b AND c" must have a child node containing the term "d". It is the same as the query
 a : (b AND c) : d
 
                             +---+
                             | a |
                             +-+-+
                               |
                          +---------+
                          | b AND c |
                          +---------+
                               |
                             +-+-+
                             | d |
                             +---+
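
The grouping described above can be made concrete by splitting a query on every ':' that sits outside parentheses, brackets, braces and quotes. This is a sketch of the precedence rule only, under the assumption that delimiters are balanced. TwigSplitSketch and splitLevels are hypothetical names, and the real parser builds a query tree rather than a list of strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names belong to the SIREn API.
final class TwigSplitSketch {

    /** Splits a query into twig levels at each top-level ':' character. */
    static List<String> splitLevels(String query) {
        List<String> levels = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        int depth = 0;            // nesting depth of (), [] and {}
        boolean inQuotes = false; // inside a "quoted phrase"
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            }
            if (!inQuotes) {
                if (c == '(' || c == '[' || c == '{') {
                    depth++;
                } else if (c == ')' || c == ']' || c == '}') {
                    depth--;
                }
            }
            if (c == ':' && depth == 0 && !inQuotes) {
                // A top-level ':' closes the current level.
                levels.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        levels.add(current.toString().trim());
        return levels;
    }

    public static void main(String[] args) {
        System.out.println(splitLevels("a : b AND c : d"));      // [a, b AND c, d]
        System.out.println(splitLevels("(a : b) AND (c : d)"));  // [(a : b) AND (c : d)]
    }
}
```

In the first query the boolean expression "b AND c" ends up as a single middle level, which is why it behaves like a : (b AND c) : d. In the second query the parentheses hide each ':' from the split, so the whole query stays a boolean combination of two twigs.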
 

Datatype

Some terms need to be analyzed in a specific way in order to be correctly indexed and searched, e.g., numbers. For those terms to be searchable, the keyword syntax provides a way to set how a query term should be analyzed. Using a function-like syntax:

 datatype( ... )
 
any query elements inside the parentheses are processed using that datatype.

The mapping from a datatype label to an Analyzer is set through the configuration key KeywordQueryConfigHandler.KeywordConfigurationKeys.DATATYPES_ANALYZERS.

For example, the range query below matches documents where the field name contains "age" and the value is an integer between 5 and 50:

 age : int( [ 5 TO 50 ] )
 
The keyword parser in that example is configured to use IntNumericAnalyzer for the datatype int.

The top-level node of a twig query uses the datatype JSONDatatype.JSON_FIELD by default. Any query element that is not wrapped in a custom datatype uses the datatype XSDDatatype.XSD_STRING.
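
The reason a datatype label matters for range queries can be shown with a self-contained sketch. It does not use the SIREn or Lucene API: DatatypeSketch and inRange are hypothetical names, and a plain Comparator stands in for the Analyzer that the real configuration maps each label to. A clause without a datatype wrapper falls back to a "string" entry, mirroring the default described above.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; none of these names belong to the SIREn API.
final class DatatypeSketch {

    // Matches label( body ), e.g. int( [ 5 TO 50 ] ).
    private static final Pattern WRAPPED =
        Pattern.compile("^\\s*(\\w+)\\s*\\((.*)\\)\\s*$");
    // Matches an inclusive range, e.g. [ 5 TO 50 ].
    private static final Pattern RANGE =
        Pattern.compile("^\\s*\\[\\s*(\\S+)\\s+TO\\s+(\\S+)\\s*\\]\\s*$");

    // Stand-in for the label-to-Analyzer mapping: label -> value ordering.
    static final Map<String, Comparator<String>> DATATYPES = new HashMap<>();
    static {
        DATATYPES.put("int", (a, b) -> Integer.compare(Integer.parseInt(a), Integer.parseInt(b)));
        DATATYPES.put("string", (a, b) -> a.compareTo(b));
    }

    /** True if value lies inside the range clause, compared with the clause's datatype. */
    static boolean inRange(String clause, String value) {
        String label = "string"; // default when no datatype wraps the clause
        String body = clause;
        Matcher wrapped = WRAPPED.matcher(clause);
        if (wrapped.matches()) {
            label = wrapped.group(1);
            body = wrapped.group(2);
        }
        Comparator<String> order = DATATYPES.get(label);
        if (order == null) {
            throw new IllegalArgumentException("unknown datatype: " + label);
        }
        Matcher range = RANGE.matcher(body);
        if (!range.matches()) {
            throw new IllegalArgumentException("not a range: " + body);
        }
        return order.compare(range.group(1), value) <= 0
            && order.compare(value, range.group(2)) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(inRange("int( [ 5 TO 50 ] )", "10"));  // true: numeric order
        System.out.println(inRange("[ 5 TO 50 ]", "10"));         // false: "10" sorts before "5" as a string
    }
}
```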

Query Examples

Node query

Match all the documents with one node containing the phrase "Marie Antoinette".
 "Marie Antoinette"
 

Twig query

Match all the documents with one node containing the term "genre" and with a child node containing the term "Drama".
 genre : Drama
 
Such a twig query is the basic building block to query a particular field name of a JSON object. The field name is always the root of the twig query and the field value is defined as a child clause.

More complex twig queries can be constructed by using nested twig queries or using more than one child clause.

 director : { last_name : Eastwood , first_name : Clint }
 

Boolean Query

Node and twig queries can be combined freely.
 (genre : Drama) AND (year : 2010)
 

Copyright © 2014. All rights reserved.