Class ClassicAnalyzer
java.lang.Object
  org.apache.lucene.analysis.Analyzer
    org.apache.lucene.analysis.StopwordAnalyzerBase
      org.apache.lucene.analysis.classic.ClassicAnalyzer
- All Implemented Interfaces:
Closeable, AutoCloseable
Filters ClassicTokenizer with ClassicFilter, LowerCaseFilter and StopFilter, using a list of English stop words.

ClassicAnalyzer was named StandardAnalyzer in Lucene versions prior to 3.1. As of 3.1, StandardAnalyzer implements Unicode text segmentation, as specified by UAX#29.
- Since:
- 3.1
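The filter chain described above can be exercised directly by consuming a token stream. A minimal sketch (the field name "body" and the sample text are illustrative, not part of this API):

```java
import java.io.IOException;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.classic.ClassicAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ClassicAnalyzerDemo {
  public static void main(String[] args) throws IOException {
    // Analyzer implements Closeable, so try-with-resources releases it.
    try (ClassicAnalyzer analyzer = new ClassicAnalyzer()) {
      TokenStream ts = analyzer.tokenStream("body", "The Quick Brown Fox");
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset(); // mandatory before the first incrementToken()
      while (ts.incrementToken()) {
        // LowerCaseFilter lowercases each token; StopFilter drops "the".
        System.out.println(term.toString());
      }
      ts.end();
      ts.close();
    }
  }
}
```

With the default stop words this should emit quick, brown and fox: "The" is lowercased and then removed as a stop word.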
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
-
Field Summary
FieldsModifier and TypeFieldDescriptionstatic final int
Default maximum allowed token lengthprivate int
static final CharArraySet
An unmodifiable set containing some common English words that are usually not useful for searching.Fields inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
stopwords
Fields inherited from class org.apache.lucene.analysis.Analyzer
GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY
-
Constructor Summary
ConstructorsConstructorDescriptionBuilds an analyzer with the default stop words (STOP_WORDS_SET
).ClassicAnalyzer
(Reader stopwords) Builds an analyzer with the stop words from the given reader.ClassicAnalyzer
(CharArraySet stopWords) Builds an analyzer with the given stop words. -
Method Summary
Modifier and Type                         Method                                       Description
protected Analyzer.TokenStreamComponents  createComponents(String fieldName)           Creates a new Analyzer.TokenStreamComponents instance for this analyzer.
int                                       getMaxTokenLength()                          Returns the current maximum allowed token length.
protected TokenStream                     normalize(String fieldName, TokenStream in)  Wrap the given TokenStream in order to apply normalization filters.
void                                      setMaxTokenLength(int length)                Set maximum allowed token length.
Methods inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
getStopwordSet, loadStopwordSet, loadStopwordSet, loadStopwordSet
Methods inherited from class org.apache.lucene.analysis.Analyzer
attributeFactory, close, getOffsetGap, getPositionIncrementGap, getReuseStrategy, initReader, initReaderForNormalization, normalize, tokenStream, tokenStream
-
Field Details
-
DEFAULT_MAX_TOKEN_LENGTH
public static final int DEFAULT_MAX_TOKEN_LENGTH
Default maximum allowed token length
- See Also:
-
maxTokenLength
private int maxTokenLength
-
STOP_WORDS_SET
public static final CharArraySet STOP_WORDS_SET
An unmodifiable set containing some common English words that are usually not useful for searching.
-
-
Constructor Details
-
ClassicAnalyzer
public ClassicAnalyzer(CharArraySet stopWords)
Builds an analyzer with the given stop words.
- Parameters:
stopWords - stop words
-
ClassicAnalyzer
public ClassicAnalyzer()
Builds an analyzer with the default stop words (STOP_WORDS_SET).
-
ClassicAnalyzer
public ClassicAnalyzer(Reader stopwords) throws IOException
Builds an analyzer with the stop words from the given reader.
- Parameters:
stopwords - Reader to read stop words from
- Throws:
IOException
- See Also:
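A sketch of the CharArraySet constructor in use, replacing the default English list with a custom stop set (the stop words and sample text here are illustrative); CharArraySet's collection constructor takes the words plus an ignoreCase flag:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.classic.ClassicAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class CustomStopWordsDemo {
  public static void main(String[] args) throws IOException {
    // Illustrative stop list; ignoreCase = true stops these words in any case.
    CharArraySet stopWords = new CharArraySet(Arrays.asList("foo", "bar"), true);
    try (ClassicAnalyzer analyzer = new ClassicAnalyzer(stopWords)) {
      TokenStream ts = analyzer.tokenStream("body", "Foo sees the bar");
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      List<String> tokens = new ArrayList<>();
      while (ts.incrementToken()) tokens.add(term.toString());
      ts.end();
      ts.close();
      // "foo" and "bar" are stopped; "the" now survives because the
      // default English list is no longer in effect.
      System.out.println(tokens);
    }
  }
}
```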
-
-
Method Details
-
setMaxTokenLength
public void setMaxTokenLength(int length)
Set maximum allowed token length. If a token is seen that exceeds this length then it is discarded. This setting only takes effect the next time tokenStream(String, Reader) or tokenStream(String, String) is called.
-
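A sketch of the behavior described above (field name and sample text are illustrative): an over-length token is silently dropped, and the new limit applies only to token streams created after the call:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.classic.ClassicAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class MaxTokenLengthDemo {
  public static void main(String[] args) throws IOException {
    try (ClassicAnalyzer analyzer = new ClassicAnalyzer()) {
      // Takes effect on the NEXT tokenStream call, not on streams
      // that already exist.
      analyzer.setMaxTokenLength(5);
      TokenStream ts = analyzer.tokenStream("body", "tiny enormously big");
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      List<String> tokens = new ArrayList<>();
      while (ts.incrementToken()) tokens.add(term.toString());
      ts.end();
      ts.close();
      // "enormously" (10 chars) exceeds the 5-char limit and is discarded.
      System.out.println(tokens);
    }
  }
}
```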
getMaxTokenLength
public int getMaxTokenLength()
Returns the current maximum allowed token length.
- See Also:
-
createComponents
protected Analyzer.TokenStreamComponents createComponents(String fieldName)
Description copied from class: Analyzer
Creates a new Analyzer.TokenStreamComponents instance for this analyzer.
- Specified by:
createComponents in class Analyzer
- Parameters:
fieldName - the name of the fields content passed to the Analyzer.TokenStreamComponents sink as a reader
- Returns:
the Analyzer.TokenStreamComponents for this analyzer.
-
normalize
protected TokenStream normalize(String fieldName, TokenStream in)
Description copied from class: Analyzer
Wrap the given TokenStream in order to apply normalization filters. The default implementation returns the TokenStream as-is. This is used by Analyzer.normalize(String, String).
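The protected method above backs the public Analyzer.normalize(String, String) entry point, which applies the normalization chain to a single term (useful, e.g., for terms that bypass full tokenization such as wildcard query parts). A hedged sketch, assuming the lowercasing normalization implied by this analyzer's filter chain (the field name is illustrative):

```java
import java.io.IOException;
import org.apache.lucene.analysis.classic.ClassicAnalyzer;
import org.apache.lucene.util.BytesRef;

public class NormalizeDemo {
  public static void main(String[] args) throws IOException {
    try (ClassicAnalyzer analyzer = new ClassicAnalyzer()) {
      // Runs only the normalization filters (no tokenization, no stop
      // word removal) over the single term and returns its bytes.
      BytesRef normalized = analyzer.normalize("body", "QuIcK");
      System.out.println(normalized.utf8ToString()); // expected "quick"
    }
  }
}
```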
-