Class | Description |
---|---|
LuceneProxyNodeQuery | Class that acts as a bridge between the SIREn query API and the Lucene query API. |
MultiNodeTermQuery | An abstract NodePrimitiveQuery that matches documents containing a subset of terms provided by a FilteredTermEnum enumeration. |
MultiNodeTermQuery.RewriteMethod | Abstract class that defines how the query is rewritten. |
NodeAutomatonQuery | A MultiNodeTermQuery that will match terms against a finite-state machine. |
NodeBooleanClause | A clause in a NodeBooleanQuery. |
NodeBooleanQuery | A NodeQuery that matches a boolean combination of primitive queries, e.g., NodeTermQuerys, NodePhraseQuerys, NodeBooleanQuerys, ... |
NodeConstantScoreQuery | A query that wraps another query or a filter and simply returns a constant score equal to the query boost for every node that matches the filter or query. |
NodeFuzzyQuery | Implements the fuzzy search query. |
NodeNumericRangeQuery<T extends Number> | A NodePrimitiveQuery that matches numeric values within a specified range. |
NodePhraseQuery | A NodePrimitiveQuery that matches nodes containing a particular sequence of terms. |
NodePrefixQuery | A NodePrimitiveQuery that matches documents containing terms with a specified prefix. |
NodePrimitiveQuery | Abstraction over all the primitive NodeQuerys such as NodeTermQuery. |
NodeQuery | Abstract class for SIREn's node queries. |
NodeRegexpQuery | A NodePrimitiveQuery that provides fast regular expression matching based on the org.apache.lucene.util.automaton package. |
NodeScorer | The abstract Scorer class that defines the interface for iterating over an ordered list of nodes matching a query. |
NodeTermQuery | A NodePrimitiveQuery that matches nodes containing a term. |
NodeTermRangeQuery | A NodePrimitiveQuery that matches documents within a range of terms. |
NodeWildcardQuery | A NodePrimitiveQuery that implements the wildcard search query. |
TupleQuery | |
TwigQuery | A NodeQuery that matches a boolean combination of Ancestor-Descendant and Parent-Child node queries. |
TwigQuery.EmptyRootQuery | An empty root query is used to create a twig query in which the root query is not specified. |
Enum | Description |
---|---|
NodeBooleanClause.Occur | Specifies how clauses are to occur in matching documents. |
Exception | Description |
---|---|
NodeBooleanQuery.TooManyClauses | Thrown when an attempt is made to add more than NodeBooleanQuery.getMaxClauseCount() clauses. |
For the underlying Lucene search API, see the org.apache.lucene.search package documentation.
Like Lucene's Query API, which provides complex querying capabilities to search for documents, SIREn provides a NodeQuery API with complex querying capabilities to search for nodes and documents. The information retrieved consists not only of the matching documents, but also of the matching nodes within these documents.
SIREn offers a wide variety of NodeQuery implementations. Most of them are similar to the ones provided by Lucene's Query API. For example, while Lucene provides a TermQuery implementation to search for documents that contain a specific term, SIREn provides a NodeTermQuery implementation to search for nodes and documents that contain a specific term.
NodeQuery provides methods to set constraints on the nodes matched by the query. There are two types of constraints:
NodeTermQuery
NodeTermQuery matches all the nodes that contain the specified Term, which is a word that occurs in a certain Field containing JSON data. Constructing a NodeTermQuery is as simple as:

    NodeTermQuery tq = new NodeTermQuery(new Term("json-field", "term"));

In this example, the NodeQuery identifies all Documents that have the Field named "json-field" where a node contains the word "term".
NodePhraseQuery
A NodePhraseQuery matches all the nodes containing the specified phrase. A phrase is defined as a sequence of Terms.
NodeBooleanQuery
NodeBooleanQuery matches all the nodes containing the specified boolean combination of queries. A NodeBooleanQuery contains multiple NodeBooleanClauses, where each clause contains a sub-query (a NodePrimitiveQuery instance) and an operator (from NodeBooleanClause.Occur) describing how that sub-query is combined with the other clauses. The semantics of NodeBooleanClause.Occur are identical to those of BooleanClause.Occur.
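To make the role of the operator concrete, here is a minimal, self-contained sketch of how the three occurrence operators combine when a list of clauses is evaluated against a single node. It assumes the usual semantics of Lucene's BooleanClause.Occur (MUST, SHOULD, MUST_NOT). The OccurSketch class, its Clause type and the term helper are hypothetical names used only for illustration; they are not part of the SIREn API, and a node is modelled as a plain string.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model, not SIREn code: mimics how MUST / SHOULD / MUST_NOT
// clauses combine, assuming the semantics of Lucene's BooleanClause.Occur.
class OccurSketch {

  enum Occur { MUST, SHOULD, MUST_NOT }

  // A clause pairs a sub-query (modelled as a predicate over the node text)
  // with an occurrence operator.
  static final class Clause {
    final Predicate<String> query;
    final Occur occur;

    Clause(Predicate<String> query, Occur occur) {
      this.query = query;
      this.occur = occur;
    }
  }

  // Helper: a clause that matches when the node contains the given word.
  static Clause term(String word, Occur occur) {
    return new Clause(
        node -> Arrays.asList(node.split("\\s+")).contains(word), occur);
  }

  static boolean matches(List<Clause> clauses, String node) {
    boolean hasMust = false;
    boolean shouldMatched = false;
    for (Clause c : clauses) {
      boolean hit = c.query.test(node);
      switch (c.occur) {
        case MUST:
          hasMust = true;
          if (!hit) return false;   // every MUST clause has to match
          break;
        case MUST_NOT:
          if (hit) return false;    // no MUST_NOT clause may match
          break;
        case SHOULD:
          shouldMatched |= hit;     // optional, but see below
          break;
      }
    }
    // Without any MUST clause, at least one SHOULD clause has to match.
    return hasMust || shouldMatched;
  }
}
```

In this model a query made only of MUST_NOT clauses matches nothing, which mirrors the behaviour of a Lucene BooleanQuery.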
NodeTermRangeQuery
NodeTermRangeQuery matches all nodes containing a term that occurs in the inclusive or exclusive range of a lower Term and an upper Term according to TermsEnum.getComparator(). It is not intended for numerical ranges; use NodeNumericRangeQuery instead.
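The reason term ranges are unsuitable for numbers can be shown with a small standalone sketch. It assumes a plain lexicographic term ordering, which String.compareTo reproduces for ASCII terms; the TermOrderSketch class and its methods are hypothetical and only illustrate the ordering problem, not the SIREn implementation.

```java
// Illustration only: under a lexicographic ordering, digit strings are
// compared character by character rather than by numeric value.
class TermOrderSketch {

  // True if term lies in the inclusive range [lower, upper] under
  // String's natural (lexicographic) ordering.
  static boolean inTermRange(String term, String lower, String upper) {
    return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
  }

  // True if value lies in the inclusive numeric range [lower, upper].
  static boolean inNumericRange(int value, int lower, int upper) {
    return value >= lower && value <= upper;
  }
}
```

With the bounds "10" and "20", the term "100" falls inside the term range and the term "5" falls outside it, which is the opposite of what a numeric comparison of 100 and 5 against the range 10 to 20 would suggest for "100" and agrees only by accident for "5".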
NodeNumericRangeQuery
NodeNumericRangeQuery matches all nodes containing a value that occurs in a numeric range. For NodeNumericRangeQuery to work, you must index the values with datatypes configured with the appropriate numeric analyzers (NumericAnalyzer).
NodePrefixQuery, NodeWildcardQuery, NodeRegexpQuery
NodePrefixQuery matches all nodes containing terms that begin with the specified string. A NodeWildcardQuery generalizes this by allowing the use of + (matches 1 or more characters), * (matches 0 or more characters) and ? (matches exactly one character) wildcards. Note that a NodeWildcardQuery can be quite slow. Also note that a NodeWildcardQuery should not start with +, * or ?, as such queries are extremely slow. Some QueryParsers may not allow this by default, but provide a setAllowLeadingWildcard method to remove that protection. The NodeRegexpQuery is even more general than NodeWildcardQuery, matching all nodes with terms that match a regular expression pattern.
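The relationship between wildcard and regular expression matching can be sketched by translating a wildcard pattern into a java.util.regex pattern. This is only an illustration of the wildcard semantics described above (+, * and ?); the WildcardSketch class is hypothetical, and the actual queries are assumed to work on the term dictionary rather than on java.util.regex.

```java
import java.util.regex.Pattern;

// Hypothetical helper: expresses the wildcard semantics described in the
// text as an equivalent regular expression. Not part of SIREn.
class WildcardSketch {

  static Pattern toRegex(String wildcard) {
    StringBuilder regex = new StringBuilder();
    for (char c : wildcard.toCharArray()) {
      switch (c) {
        case '*': regex.append(".*"); break;  // 0 or more characters
        case '+': regex.append(".+"); break;  // 1 or more characters
        case '?': regex.append('.');  break;  // exactly one character
        default:
          // Quote everything else so it is matched literally.
          regex.append(Pattern.quote(String.valueOf(c)));
      }
    }
    return Pattern.compile(regex.toString(), Pattern.DOTALL);
  }

  static boolean matches(String wildcard, String term) {
    return toRegex(wildcard).matcher(term).matches();
  }

  // A pattern starting with a wildcard has no literal prefix that could
  // narrow down the candidate terms, hence the warning in the text.
  static boolean hasLeadingWildcard(String wildcard) {
    return !wildcard.isEmpty() && "+*?".indexOf(wildcard.charAt(0)) >= 0;
  }
}
```

A prefix query for "te" corresponds to the wildcard pattern "te*" in this model, which is why the wildcard query is described as a generalization.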
NodeFuzzyQuery
NodeFuzzyQuery matches nodes that contain terms similar to the specified term. Similarity is determined using Levenshtein (edit) distance.
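For readers unfamiliar with the measure, the following sketch computes the Levenshtein distance between two terms with the textbook dynamic-programming recurrence. It only illustrates the similarity measure; the EditDistanceSketch class is hypothetical, and no claim is made that NodeFuzzyQuery computes the distance this way internally.

```java
// Textbook Levenshtein distance using two rolling rows.
// Illustration of the similarity measure only, not SIREn code.
class EditDistanceSketch {

  static int distance(String a, String b) {
    int[] prev = new int[b.length() + 1];
    int[] curr = new int[b.length() + 1];

    // Distance from the empty prefix of a to each prefix of b.
    for (int j = 0; j <= b.length(); j++) {
      prev[j] = j;
    }

    for (int i = 1; i <= a.length(); i++) {
      curr[0] = i; // delete all i characters of a
      for (int j = 1; j <= b.length(); j++) {
        int substitution = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        curr[j] = Math.min(
            Math.min(curr[j - 1] + 1,      // insertion
                     prev[j] + 1),         // deletion
            prev[j - 1] + substitution);   // substitution or match
      }
      // Swap rows for the next iteration.
      int[] tmp = prev;
      prev = curr;
      curr = tmp;
    }
    return prev[b.length()];
  }
}
```

Under this measure "term" and "team" are at distance 1, since a single substitution turns one into the other.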
TwigQuery
TwigQuery allows NodeQuerys to be combined with a Parent-Child or Ancestor-Descendant relation. This is the basic building block for tree-shaped queries. A TwigQuery is composed of a root and of one or more children or descendants:
- The root is a NodeQuery instance. An empty root is considered a wildcard node query and will match all nodes. We call "root nodes" the set of nodes that are retrieved by the root query.
- A child or descendant is a NodeQuery associated with an operator (from NodeBooleanClause.Occur). A descendant query will match all the nodes for which there exists a path to a root node. A descendant is associated with a node level, which corresponds to the relative distance (in terms of levels) from the root.
A twig query is always associated with a level. If no level is specified, the level defaults to 1. When a twig query is used as a child or descendant of another twig query, its level is automatically updated according to the level of the parent twig query. For example, given the following instructions:

    TwigQuery tw1 = new TwigQuery();
    TwigQuery tw2 = new TwigQuery();
    tw1.addChild(tw2, Occur.MUST);

In this example, the first twig query tw1 is defined at the default level 1. The second twig query tw2, after the call to TwigQuery.addChild(NodeQuery, org.sindice.siren.search.node.NodeBooleanClause.Occur), will have its level updated to 2 since it is now a child of a twig query at level 1.
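The level bookkeeping can be modelled in a few lines. The sketch below is a hypothetical stand-in for TwigQuery that keeps only the level and the list of children; it assumes that a level update is propagated recursively to children that were attached earlier, which the text implies but does not spell out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for TwigQuery: models only the level bookkeeping.
// The recursive propagation to existing children is an assumption.
class TwigLevelSketch {

  private int level = 1; // default level when none is specified
  private final List<TwigLevelSketch> children = new ArrayList<>();

  int getLevel() {
    return level;
  }

  // Attaching a child places it one level below this twig.
  void addChild(TwigLevelSketch child) {
    children.add(child);
    child.setLevel(level + 1);
  }

  // Updating a twig's level shifts its whole subtree accordingly.
  private void setLevel(int newLevel) {
    level = newLevel;
    for (TwigLevelSketch child : children) {
      child.setLevel(newLevel + 1);
    }
  }
}
```

With two instances tw1 and tw2, calling tw1.addChild(tw2) leaves tw1 at level 1 and moves tw2 to level 2, matching the example above.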
The NodeScorer abstract class provides common scoring functionality for all the node scorer implementations, which are the heart of the SIREn scoring process. The implementation of the query processing framework follows a node-at-a-time approach, where the query operators (i.e., NodeScorer) process one node at a time. The query processing framework has been designed for high-efficiency processing:
Siren10PostingsReader. For example, there is no concept of next matching document (i.e., DocIdSetIterator.nextDoc()) in the NodeScorer interface, but instead the concept of next candidate document (i.e., NodeScorer.nextCandidateDocument()). This enables NodeConjunctionScorer to efficiently iterate over the document identifiers without having to decode the node labels until a potential candidate is found.
IntsRef) being processed is the same in all the query operators, which means that the same array is reused across operators and no new arrays are created during query processing.
IntsRef) over the array of the uncompressed node block.
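The candidate-document idea can be sketched with a simplified conjunction over two posting lists. The code below is an assumption-laden model, not the SIREn implementation: the CandidateDocSketch and Postings classes are hypothetical, each document carries a single node label for brevity, and a counter stands in for the cost of decoding node labels. It shows document identifiers being aligned first and node labels being decoded only once both lists agree on a candidate document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of "next candidate document" processing.
class CandidateDocSketch {

  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  // Posting list: document identifiers are cheap to read, node labels
  // are "decoded" only on demand (counted in decodedNodes).
  static final class Postings {
    private final int[] docs;    // sorted document identifiers
    private final int[][] nodes; // one node label per document
    private int pos = 0;
    int decodedNodes = 0;

    Postings(int[] docs, int[][] nodes) {
      this.docs = docs;
      this.nodes = nodes;
    }

    // Move to the first document >= target without touching node labels.
    int advance(int target) {
      while (pos < docs.length && docs[pos] < target) {
        pos++;
      }
      return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }

    // Decode the node label of the current document.
    int[] decodeNode() {
      decodedNodes++;
      return nodes[pos];
    }
  }

  // Returns the documents in which both lists occur in the same node.
  static List<Integer> conjunction(Postings a, Postings b) {
    List<Integer> result = new ArrayList<>();
    int da = a.advance(0);
    int db = b.advance(0);
    while (da != NO_MORE_DOCS && db != NO_MORE_DOCS) {
      if (da < db) {
        da = a.advance(db);
      } else if (db < da) {
        db = b.advance(da);
      } else {
        // Candidate document found: only now are node labels decoded.
        if (Arrays.equals(a.decodeNode(), b.decodeNode())) {
          result.add(da);
        }
        da = a.advance(da + 1);
        db = b.advance(db + 1);
      }
    }
    return result;
  }
}
```

In this model the number of node labels decoded depends on the number of candidate documents, not on the length of the posting lists.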
Copyright © 2014. All rights reserved.