public class BowHeuristic
extends java.lang.Object
| Modifier and Type | Field and Description |
|---|---|
| protected java.util.HashMap<java.lang.String,java.lang.Integer> | termCount - Count for every term in any document |
| protected java.util.ArrayList<java.lang.String> | termList - List of all terms in all documents |
| Constructor and Description |
|---|
| BowHeuristic() - Create a new instance of BowHeuristic. |
| Modifier and Type | Method and Description |
|---|---|
| java.util.ArrayList<BagOfWords> | createBoW(java.util.ArrayList<java.lang.String> filePaths, java.util.ArrayList<java.lang.String> wordList, boolean wordStem) - Create a bag-of-words for each file. |
| BagOfWords | createBoW(java.lang.String theText, java.util.ArrayList<java.lang.String> wordList, boolean wordStem) - Create a bag-of-words for the input text sequence. |
| protected java.util.HashMap<java.lang.String,java.lang.Integer> | getTermCount(java.util.ArrayList<BagOfWords> bowList) - Count all terms over all bags of words. |
| java.util.ArrayList<java.lang.String> | sortTermOrder(java.util.HashMap<java.lang.String,java.lang.Float> termScore) - Sort the terms in descending order of the term score. |
| java.util.ArrayList<java.lang.String> | sortTopTermOrder(int termNumber, java.util.HashMap<java.lang.String,java.lang.Float> termScore) - Sort the top number of terms to use. |
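To make the createBoW(java.lang.String, ...) contract more concrete, here is a minimal sketch of building term counts from a text while removing common words. It is not the BowHeuristic implementation: the class BowSketch, the method countTerms, the lower-casing, and the tokenization rule are all assumptions made for illustration, a plain HashMap stands in for BagOfWords (whose API is not shown on this page), and word stemming is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in: the real BagOfWords and createBoW code are not shown in this page.
class BowSketch {

    // Count term occurrences in theText, skipping every term listed in wordList.
    // wordList may be null, mirroring the documented "Can be null" note.
    static HashMap<String, Integer> countTerms(String theText, ArrayList<String> wordList) {
        Set<String> commonWords = new HashSet<>();
        if (wordList != null) {
            for (String word : wordList) {
                commonWords.add(word.toLowerCase(Locale.ROOT));
            }
        }
        HashMap<String, Integer> counts = new HashMap<>();
        // Assumption: a term is a lower-cased run of letters or digits.
        for (String token : theText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty() || commonWords.contains(token)) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        ArrayList<String> common = new ArrayList<>(Arrays.asList("the", "a"));
        // Prints the counts for "cat" and "saw"; HashMap iteration order is unspecified.
        System.out.println(countTerms("The cat saw a cat", common));
    }
}
```

The file-based overload could plausibly apply the same step to the contents of each path in filePaths, but how BowHeuristic reads files is not documented here.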
protected java.util.ArrayList<java.lang.String> termList
protected java.util.HashMap<java.lang.String,java.lang.Integer> termCount
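The two fields hold corpus-wide data: a count per term and a list of all terms. As a rough sketch of how such structures might be filled, the code below sums per-document counts into one map and derives a term list from its keys. The names TermCountSketch, totalTermCount, and termList are hypothetical, each bag is modeled as a plain Map because the BagOfWords API is not shown, and summing occurrences (rather than counting documents that contain a term) is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the BowHeuristic implementation.
class TermCountSketch {

    // Sum the per-document counts of every term into one corpus-wide map.
    // Assumption: "count" means total occurrences, not document frequency.
    static HashMap<String, Integer> totalTermCount(List<Map<String, Integer>> bags) {
        HashMap<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> bag : bags) {
            for (Map.Entry<String, Integer> entry : bag.entrySet()) {
                total.merge(entry.getKey(), entry.getValue(), Integer::sum);
            }
        }
        return total;
    }

    // List every distinct term, sorted alphabetically so the order is reproducible.
    static ArrayList<String> termList(Map<String, Integer> termCount) {
        ArrayList<String> terms = new ArrayList<>(termCount.keySet());
        Collections.sort(terms);
        return terms;
    }
}
```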
public java.util.ArrayList<BagOfWords> createBoW(java.util.ArrayList<java.lang.String> filePaths, java.util.ArrayList<java.lang.String> wordList, boolean wordStem) throws java.lang.Exception
filePaths - list of files to read.
wordList - list of common words to remove from each bow.
wordStem - if true use word stemming.
java.lang.Exception - any error.

public BagOfWords createBoW(java.lang.String theText, java.util.ArrayList<java.lang.String> wordList, boolean wordStem) throws java.lang.Exception
theText - the text to read.
wordList - list of common words to remove from each bow. Can be null.
wordStem - if true use word stemming.
java.lang.Exception - any error.

protected java.util.HashMap<java.lang.String,java.lang.Integer> getTermCount(java.util.ArrayList<BagOfWords> bowList) throws java.lang.Exception
bowList - list of bags of words to process. Of type BagOfWords.
java.lang.Exception - any error.

public java.util.ArrayList<java.lang.String> sortTermOrder(java.util.HashMap<java.lang.String,java.lang.Float> termScore) throws java.lang.Exception
termScore - a score (floating point value) for each term.
java.lang.Exception - any error.

public java.util.ArrayList<java.lang.String> sortTopTermOrder(int termNumber, java.util.HashMap<java.lang.String,java.lang.Float> termScore) throws java.lang.Exception
termNumber - the maximum number of terms to add to the sorted list.
termScore - a score (float value) for each term. Key is term name, value is the score.
java.lang.Exception - any error.
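The two sorting methods take a term-to-score map and return term names ordered by score. The sketch below shows one way that behaviour could look, using the same method names and parameter types on a hypothetical standalone class, TermSortSketch. It is not the BowHeuristic code: breaking ties alphabetically and clamping termNumber to the range 0..size are assumptions, since the page does not say how ties or out-of-range values are handled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the documented sorting contract.
class TermSortSketch {

    // Return all terms ordered by descending score.
    static ArrayList<String> sortTermOrder(HashMap<String, Float> termScore) {
        ArrayList<Map.Entry<String, Float>> entries = new ArrayList<>(termScore.entrySet());
        entries.sort((a, b) -> {
            int byScore = Float.compare(b.getValue(), a.getValue()); // higher score first
            // Assumption: equal scores are ordered alphabetically for a stable result.
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey());
        });
        ArrayList<String> terms = new ArrayList<>();
        for (Map.Entry<String, Float> entry : entries) {
            terms.add(entry.getKey());
        }
        return terms;
    }

    // Return at most termNumber of the highest-scoring terms, in descending order.
    static ArrayList<String> sortTopTermOrder(int termNumber, HashMap<String, Float> termScore) {
        ArrayList<String> sorted = sortTermOrder(termScore);
        // Assumption: negative or oversized values are clamped rather than rejected.
        int limit = Math.max(0, Math.min(termNumber, sorted.size()));
        return new ArrayList<>(sorted.subList(0, limit));
    }

    public static void main(String[] args) {
        HashMap<String, Float> scores = new HashMap<>();
        scores.put("apple", 0.5f);
        scores.put("pear", 2.0f);
        scores.put("fig", 1.25f);
        System.out.println(sortTermOrder(scores));       // [pear, fig, apple]
        System.out.println(sortTopTermOrder(2, scores)); // [pear, fig]
    }
}
```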