public class AnyURIAnalyzer
extends org.apache.lucene.analysis.Analyzer
URI normalisation is configured with setUriNormalisation(URINormalisation). It can be disabled, applied only to the URI local name, or applied to the full URI. Normalising the full URI is costly in terms of CPU at indexing time and can double the size of the index, since each URI is indexed together with the n additional tokens derived from it. URI normalisation is disabled by default.
See Also: URINormalisationFilter, URILocalnameFilter
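To make the trade-off between the three normalisation settings concrete, here is a minimal, library-free sketch. It is not the implementation of URINormalisationFilter or URILocalnameFilter: the `Mode` enum, the `localName` and `split` helpers, and the splitting rule (break on any non-alphanumeric run) are assumptions chosen for illustration only. It only shows how keeping the original URI and adding derived tokens makes the token count grow.

```java
import java.util.ArrayList;
import java.util.List;

class UriNormalisationSketch {

    // Hypothetical stand-in for AnyURIAnalyzer.URINormalisation;
    // the real constant names may differ.
    enum Mode { NONE, LOCALNAME, FULL }

    // Assumed rule: the local name is whatever follows the last '/' or '#'.
    static String localName(String uri) {
        int cut = Math.max(uri.lastIndexOf('/'), uri.lastIndexOf('#'));
        return uri.substring(cut + 1);
    }

    // Assumed rule: split on runs of non-alphanumeric characters, drop empties.
    static List<String> split(String text) {
        List<String> parts = new ArrayList<>();
        for (String p : text.split("[^A-Za-z0-9]+")) {
            if (!p.isEmpty()) {
                parts.add(p);
            }
        }
        return parts;
    }

    // The original URI is always emitted; normalisation only adds tokens.
    static List<String> tokens(String uri, Mode mode) {
        List<String> out = new ArrayList<>();
        out.add(uri);
        if (mode == Mode.LOCALNAME) {
            out.addAll(split(localName(uri)));
        } else if (mode == Mode.FULL) {
            out.addAll(split(uri));
        }
        return out;
    }

    public static void main(String[] args) {
        String uri = "http://example.org/people/john_doe";
        for (Mode m : Mode.values()) {
            System.out.println(m + " -> " + tokens(uri, m));
        }
    }
}
```

Under these assumed rules, the sample URI yields one token with no normalisation, three with local-name normalisation, and seven with full normalisation, which is the kind of growth the index-size warning above refers to.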
Modifier and Type | Class and Description |
---|---|
static class | AnyURIAnalyzer.URINormalisation: Types of URI normalisation |
Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer: org.apache.lucene.analysis.Analyzer.GlobalReuseStrategy, org.apache.lucene.analysis.Analyzer.PerFieldReuseStrategy, org.apache.lucene.analysis.Analyzer.ReuseStrategy, org.apache.lucene.analysis.Analyzer.TokenStreamComponents
Modifier and Type | Field and Description |
---|---|
static org.apache.lucene.analysis.util.CharArraySet | STOP_WORDS_SET: An unmodifiable set containing some common English words that are usually not useful for searching. |
Constructor and Description |
---|
AnyURIAnalyzer(org.apache.lucene.util.Version version) |
AnyURIAnalyzer(org.apache.lucene.util.Version version, org.apache.lucene.analysis.util.CharArraySet stopWords) |
AnyURIAnalyzer(org.apache.lucene.util.Version version, File stopwords) |
AnyURIAnalyzer(org.apache.lucene.util.Version version, Reader stopWords) |
AnyURIAnalyzer(org.apache.lucene.util.Version version, String[] stopWords) |
Modifier and Type | Method and Description |
---|---|
protected org.apache.lucene.analysis.Analyzer.TokenStreamComponents | createComponents(String fieldName, Reader reader) |
void | setUriNormalisation(AnyURIAnalyzer.URINormalisation n) |
public static final org.apache.lucene.analysis.util.CharArraySet STOP_WORDS_SET
public AnyURIAnalyzer(org.apache.lucene.util.Version version)
public AnyURIAnalyzer(org.apache.lucene.util.Version version, org.apache.lucene.analysis.util.CharArraySet stopWords)
public AnyURIAnalyzer(org.apache.lucene.util.Version version, String[] stopWords)
public AnyURIAnalyzer(org.apache.lucene.util.Version version, File stopwords) throws IOException
Throws: IOException
public AnyURIAnalyzer(org.apache.lucene.util.Version version, Reader stopWords) throws IOException
Throws: IOException
public void setUriNormalisation(AnyURIAnalyzer.URINormalisation n)
Copyright © 2014. All rights reserved.