public class LinearCounts
extends java.lang.Object
| Modifier and Type | Field and Description |
|---|---|
| `protected java.util.ArrayList` | `analysisOptions` List of optional analysis operations |
| `protected org.ai_heuristic.model.BagOfWords` | `bagOfWords` Bag of words list. |
| `protected boolean` | `compareWithFirst` If true, only compare other analyses with the first document's analysis |
| `protected int` | `maxNestingSequenceNumber` The maximum number of words in the sequences to output to |
| `protected int` | `minNestingSequenceNumber` The minimum number of words in the sequences to output to |
| `protected java.util.ArrayList<java.util.ArrayList<java.lang.String[]>>` | `popularSequences` Popular sequences list. |
| `protected int` | `popularSequencesNumber` The number of sequences to output for each sequence number |
| `protected java.util.ArrayList<java.lang.String[]>` | `popularWords` List of most popular words only. |
| Constructor and Description |
|---|
| `LinearCounts()` Create a new instance of LinearCounts. |
| Modifier and Type | Method and Description |
|---|---|
| `protected void` | `createBoW(java.lang.String theText, boolean wordStem)` Sort all words in the text to determine the most popular ones. |
| `java.util.ArrayList<java.lang.String[]>` | `getPopularWords()` Get the popular words list. |
| `java.util.ArrayList<java.util.ArrayList<java.lang.String[]>>` | `getPopularWordSequences()` Get the popular word sequences list. |
| `protected java.lang.String` | `sequenceKey(java.lang.String sequence, boolean wordStem)` Create a word sequence key, that might be different if word stemming is used. |
| `void` | `sortMostPopularSequences(java.util.ArrayList<java.lang.String> singleWordsList, java.lang.String analysisTerm, int minNestingSequenceNumber, int maxNestingSequenceNumber, int popularSequencesNumber, boolean wordStem)` Sort all words in the text to determine the most popular sequences. |
| `void` | `sortMostPopularSequences(java.lang.String theText, java.lang.String analysisTerm, int minNestingSequenceNumber, int maxNestingSequenceNumber, int popularSequencesNumber, boolean wordStem)` Sort all words in the text to determine the most popular sequences. |
| `void` | `sortWordOrder(java.lang.String theText, int popularWordsNumber, int minWordLength, boolean wordStem)` Sort all words in the text to determine the most popular ones. |
| `void` | `sortWordOrder(java.lang.String theText, java.lang.String analysisTerm, int minWordLength, boolean wordStem)` Sort all words in the text to determine the most popular ones. |
| `protected void` | `sortWordOrder(java.lang.String theText, java.lang.String analysisTerm, int popularWordsNumber, int minWordLength, boolean wordStem)` Sort all words in the text to determine the most popular ones. |
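
The `sortWordOrder` methods are described as ranking the words of a text by popularity, bounded by `popularWordsNumber` and `minWordLength`. The following is a minimal, self-contained sketch of that general idea only. It does not use or reproduce the `LinearCounts` implementation: the class name `WordPopularitySketch`, the method `topWords`, the tokenising rule, and the `{word, count}` layout of each `String[]` are all assumptions made for illustration, and stemming and `analysisTerm` are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not the LinearCounts implementation.
class WordPopularitySketch {

    /**
     * Counts words of at least minWordLength characters and returns the
     * popularWordsNumber most frequent ones as {word, count} pairs
     * (the pair layout is an assumption, not documented behaviour).
     */
    static List<String[]> topWords(String theText, int popularWordsNumber, int minWordLength) {
        // LinkedHashMap keeps first-seen order, so ties stay deterministic.
        Map<String, Integer> counts = new LinkedHashMap<>();
        // Assumed tokenising rule: split on anything that is not a letter or digit.
        for (String token : theText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty() && token.length() >= minWordLength) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // List.sort is stable, so words with equal counts keep first-seen order.
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());

        List<String[]> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            if (result.size() == popularWordsNumber) {
                break;
            }
            result.add(new String[] { e.getKey(), String.valueOf(e.getValue()) });
        }
        return result;
    }

    public static void main(String[] args) {
        for (String[] pair : topWords("the cat and the dog and the bird", 2, 3)) {
            System.out.println(pair[0] + " " + pair[1]);
        }
    }
}
```

With the sample sentence, a limit of 2 words and a minimum length of 3, the sketch keeps "the" (3 occurrences) and "and" (2 occurrences).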
protected int minNestingSequenceNumber
protected int maxNestingSequenceNumber
protected int popularSequencesNumber
protected boolean compareWithFirst
protected org.ai_heuristic.model.BagOfWords bagOfWords
protected java.util.ArrayList<java.lang.String[]> popularWords
protected java.util.ArrayList<java.util.ArrayList<java.lang.String[]>> popularSequences
protected java.util.ArrayList analysisOptions
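
The `minNestingSequenceNumber`, `maxNestingSequenceNumber` and `popularSequencesNumber` fields suggest that sequences are counted per sequence length, with a fixed number kept for each length. As a rough sketch of how such a count could work (an assumption, not the library's code), the hypothetical class `SequenceCountSketch` below counts every run of n consecutive words for each n in the given range and keeps the most frequent ones. The `{sequence, count}` layout of each `String[]` is likewise assumed, and `analysisTerm` and stemming are ignored.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the LinearCounts implementation.
class SequenceCountSketch {

    /**
     * For each length n from minN to maxN, returns the topPerLength most
     * frequent n-word sequences as {sequence, count} pairs (assumed layout).
     */
    static List<List<String[]>> topSequences(List<String> words, int minN, int maxN, int topPerLength) {
        List<List<String[]>> result = new ArrayList<>();
        // Sequences shorter than one word are meaningless, so start at 1 at the lowest.
        for (int n = Math.max(1, minN); n <= maxN; n++) {
            Map<String, Integer> counts = new LinkedHashMap<>();
            // Slide a window of n words across the list.
            for (int i = 0; i + n <= words.size(); i++) {
                String key = String.join(" ", words.subList(i, i + n));
                counts.merge(key, 1, Integer::sum);
            }
            List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
            // Stable sort: equal counts keep first-seen order.
            entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());

            List<String[]> top = new ArrayList<>();
            for (Map.Entry<String, Integer> e : entries) {
                if (top.size() == topPerLength) {
                    break;
                }
                top.add(new String[] { e.getKey(), String.valueOf(e.getValue()) });
            }
            result.add(top);
        }
        return result;
    }
}
```

For the word list `a b a b c` with lengths 2 to 3 and one sequence per length, the sketch returns "a b" (count 2) for length 2 and "a b a" (count 1, first seen among ties) for length 3.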
`public void sortWordOrder(java.lang.String theText, java.lang.String analysisTerm, int minWordLength, boolean wordStem) throws java.lang.Exception`

Parameters:
- `theText` - the text to sort.
- `analysisTerm` - specific analysis term to compare with. Can be null, or if not null then sets popularWordsNumber to 1.
- `minWordLength` - minimum number of characters in a word.
- `wordStem` - if true use word stemming to create the bag-of-words.

Throws:
- `java.lang.Exception` - any error.

`public void sortWordOrder(java.lang.String theText, int popularWordsNumber, int minWordLength, boolean wordStem) throws java.lang.Exception`

Parameters:
- `theText` - the text to sort.
- `popularWordsNumber` - maximum number of popular words to store.
- `minWordLength` - minimum number of characters in a word.
- `wordStem` - if true use word stemming to create the bag-of-words.

Throws:
- `java.lang.Exception` - any error.

`protected void sortWordOrder(java.lang.String theText, java.lang.String analysisTerm, int popularWordsNumber, int minWordLength, boolean wordStem) throws java.lang.Exception`

Parameters:
- `theText` - the text to sort.
- `analysisTerm` - specific analysis term to compare with. Can be null, or if not null then sets popularWordsNumber to 1.
- `popularWordsNumber` - maximum number of popular words to store.
- `minWordLength` - minimum number of characters in a word.
- `wordStem` - if true use word stemming to create the bag-of-words.

Throws:
- `java.lang.Exception` - any error.

`public void sortMostPopularSequences(java.lang.String theText, java.lang.String analysisTerm, int minNestingSequenceNumber, int maxNestingSequenceNumber, int popularSequencesNumber, boolean wordStem) throws java.lang.Exception`

Parameters:
- `theText` - the text to sort.
- `analysisTerm` - specific analysis term to compare with. Can be null, or if not null then sets popularWordsNumber to 1.
- `minNestingSequenceNumber` - minimum number of words in a sequence.
- `maxNestingSequenceNumber` - maximum number of words in a sequence.
- `popularSequencesNumber` - number of sequences for each sequence length.
- `wordStem` - if true use word stemming to create the bag-of-words.

Throws:
- `java.lang.Exception` - any error.

`public void sortMostPopularSequences(java.util.ArrayList<java.lang.String> singleWordsList, java.lang.String analysisTerm, int minNestingSequenceNumber, int maxNestingSequenceNumber, int popularSequencesNumber, boolean wordStem)`

Parameters:
- `singleWordsList` - single list of words to sort.
- `analysisTerm` - specific analysis term to compare with. Can be null, or if not null then sets popularWordsNumber to 1.
- `minNestingSequenceNumber` - minimum number of words in a sequence.
- `maxNestingSequenceNumber` - maximum number of words in a sequence.
- `popularSequencesNumber` - number of sequences for each sequence length.
- `wordStem` - if true use word stemming to create the bag-of-words.

`protected void createBoW(java.lang.String theText, boolean wordStem) throws java.lang.Exception`

Parameters:
- `theText` - the text to sort.
- `wordStem` - if true use word stemming to create the bag-of-words. Default is false.

Throws:
- `java.lang.Exception` - any error.

`protected java.lang.String sequenceKey(java.lang.String sequence, boolean wordStem)`

Parameters:
- `sequence` - the sequence to consider. Word tokens separated by spaces.
- `wordStem` - true if word stem.

`public java.util.ArrayList<java.lang.String[]> getPopularWords()`

`public java.util.ArrayList<java.util.ArrayList<java.lang.String[]>> getPopularWordSequences()`
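
The `sequenceKey` description says the key for a space-separated word sequence may differ when word stemming is used. One plausible reading, sketched below under stated assumptions, is that each token is reduced to its stem so that variants of the same phrase share one key. The class `SequenceKeySketch` and its `toyStem` helper are hypothetical: the suffix-stripping rule is a deliberately crude stand-in and not the stemmer the library uses, which this page does not specify.

```java
import java.util.Locale;
import java.util.StringJoiner;

// Hypothetical illustration only; not the LinearCounts implementation.
class SequenceKeySketch {

    // Toy stemmer (assumption): strips a few common English suffixes while
    // keeping at least three characters. Real stemmers are far more careful.
    static String toyStem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String suffix : new String[] { "ing", "ed", "s" }) {
            if (w.length() > suffix.length() + 2 && w.endsWith(suffix)) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    /** Returns the sequence unchanged, or with every token stemmed when wordStem is true. */
    static String sequenceKey(String sequence, boolean wordStem) {
        if (!wordStem) {
            return sequence;
        }
        StringJoiner key = new StringJoiner(" ");
        // Tokens are assumed to be separated by whitespace.
        for (String token : sequence.trim().split("\\s+")) {
            key.add(toyStem(token));
        }
        return key.toString();
    }
}
```

Under this toy rule, "walking dogs" and "walked dog" both map to the key "walk dog" when stemming is on, so their counts would be merged, while with stemming off each sequence is its own key.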