public class BagOfWords
extends java.lang.Object
| Modifier and Type | Field and Description |
|---|---|
| protected java.util.HashMap<java.lang.String,java.lang.Integer> | bagOfWords: This stores the bag of words. |
| protected java.lang.String | name: Unique name or ID. |
| protected java.util.ArrayList<java.lang.String> | wordOrder: Word ordering from most to least counts. |
| Constructor and Description |
|---|
| BagOfWords(): Create a new instance of BagOfWords. |
| Modifier and Type | Method and Description |
|---|---|
| void | createBagOfWords(java.lang.String theText): Parse the text that is entered to create the bag of words. |
| void | createBagOfWords(java.lang.String theText, boolean wordStem): Parse the text that is entered to create the bag of words, optionally applying word stemming. |
| BagOfWords | difference(BagOfWords thisBagOfWords): Create a new bag of words that is the difference between this one and the one passed in. |
| float | dotproduct(BagOfWords thisBagOfWords): Calculate the dot product of this bag-of-words with the one passed in. |
| java.util.HashMap<java.lang.String,java.lang.Integer> | getBagOfWords(): Get a copy of the bag of words structure. |
| java.lang.String | getName(): Get the name or ID of this bag-of-words. |
| int | getTotalWordCount(): Get the total count of all instances of each word. |
| java.util.ArrayList<java.lang.String> | getWordOrder(): Get the ordered word list from most to least counts. |
| BagOfWords | intersection(BagOfWords thisBagOfWords): Create a new bag of words that is the intersection of this one with the one passed in. |
| float | magnitude(BagOfWords thisBagOfWords): Calculate the magnitude value of this bag-of-words with the one passed in. |
| protected void | orderBagOfWords(): Create an ordered list of most to least counts for the bag of words. |
| void | removeWords(java.util.ArrayList<java.lang.String> toRemove): Remove the list of words from the BOW structures, bagOfWords and wordOrder. |
| boolean | sameWordList(BagOfWords compareWith): Return true if this bag-of-words has the same word list as the bag-of-words passed in. |
| void | setBagOfWords(java.util.HashMap<java.lang.String,java.lang.Integer> thisBagOfWords): Set the bag of words for this class to process. |
| void | setBagOfWords(java.util.HashMap<java.lang.String,java.lang.Integer> thisBagOfWords, java.util.ArrayList<java.lang.String> thisWordOrder): Set the bag of words and its word ordering for this class to process. |
| void | setName(java.lang.String thisName): Set the name or ID of this bag-of-words. |
| BagOfWords | subtract(BagOfWords thisBagOfWords): Create a new bag of words that is the subtraction of the bag passed in from this one. |
| org.licas_xml.abs.Element | toXml(): Convert this bag of words into an XML format. |
| BagOfWords | union(BagOfWords thisBagOfWords): Create a new bag of words that is the union of this one with the one passed in. |
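
The summary above does not say how union, intersection and subtract combine the word counts. The sketch below shows one plausible reading on plain count maps, assuming that union sums counts, intersection keeps the smaller count of each shared word, and subtract drops any word whose count falls to zero or below. The class BagSetOpsSketch and its methods are hypothetical and are not part of BagOfWords; the library's own behaviour may differ.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not the BagOfWords implementation. */
final class BagSetOpsSketch {

    /** Assumed union: every word from both bags, with counts added together. */
    static HashMap<String, Integer> union(Map<String, Integer> a, Map<String, Integer> b) {
        HashMap<String, Integer> out = new HashMap<>(a);
        b.forEach((word, count) -> out.merge(word, count, Integer::sum));
        return out;
    }

    /** Assumed intersection: only shared words, each with the smaller count. */
    static HashMap<String, Integer> intersection(Map<String, Integer> a, Map<String, Integer> b) {
        HashMap<String, Integer> out = new HashMap<>();
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                out.put(e.getKey(), Math.min(e.getValue(), other));
            }
        }
        return out;
    }

    /** Assumed subtraction: a's counts minus b's, dropping words that reach zero or less. */
    static HashMap<String, Integer> subtract(Map<String, Integer> a, Map<String, Integer> b) {
        HashMap<String, Integer> out = new HashMap<>();
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int remaining = e.getValue() - b.getOrDefault(e.getKey(), 0);
            if (remaining > 0) {
                out.put(e.getKey(), remaining);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> a = new HashMap<>();
        a.put("cat", 2);
        a.put("dog", 1);
        Map<String, Integer> b = new HashMap<>();
        b.put("cat", 1);
        b.put("fish", 3);
        System.out.println(union(a, b));
        System.out.println(intersection(a, b));
        System.out.println(subtract(a, b));
    }
}
```

Under these assumptions, the bags {cat=2, dog=1} and {cat=1, fish=3} give a union with cat=3, dog=1 and fish=3, an intersection with cat=1, and a subtraction with cat=1 and dog=1.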
protected java.lang.String name
protected java.util.HashMap<java.lang.String,java.lang.Integer> bagOfWords
protected java.util.ArrayList<java.lang.String> wordOrder
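
To make the relationship between the bagOfWords map and the wordOrder list concrete, here is a minimal sketch that builds both from a piece of text. It is not the library's parser: the class WordCountSketch, the lower-casing, the split on non-letter characters and the alphabetical tie-break are all assumptions made for illustration, and no word stemming is attempted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration only; not the BagOfWords implementation. */
final class WordCountSketch {

    /** Count each lower-cased word; splitting on non-letters is an assumption. */
    static HashMap<String, Integer> countWords(String text) {
        HashMap<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Order words from most to least frequent; ties are broken alphabetically (assumed). */
    static ArrayList<String> orderWords(Map<String, Integer> counts) {
        ArrayList<String> order = new ArrayList<>(counts.keySet());
        order.sort((x, y) -> {
            int byCount = Integer.compare(counts.get(y), counts.get(x));
            return byCount != 0 ? byCount : x.compareTo(y);
        });
        return order;
    }

    public static void main(String[] args) {
        HashMap<String, Integer> counts = countWords("the cat sat on the mat, the end");
        System.out.println(counts.get("the"));   // prints 3
        System.out.println(orderWords(counts));  // prints [the, cat, end, mat, on, sat]
    }
}
```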
public void createBagOfWords(java.lang.String theText)
throws java.lang.Exception
Parameters:
theText - the text sequence to parse.
Throws:
java.lang.Exception - any error.

public void createBagOfWords(java.lang.String theText,
boolean wordStem)
throws java.lang.Exception
Parameters:
theText - the text sequence to parse.
wordStem - if true, re-parse using word stemming; if false, leave the words in their original form.
Throws:
java.lang.Exception - any error.

protected void orderBagOfWords()
throws java.lang.Exception
Throws:
java.lang.Exception - any error.

public void removeWords(java.util.ArrayList<java.lang.String> toRemove)
Remove the list of words from the BOW structures, bagOfWords and wordOrder.
Parameters:
toRemove - list of words to remove.

public boolean sameWordList(BagOfWords compareWith)
Parameters:
compareWith - the bag-of-words to compare with.

public BagOfWords subtract(BagOfWords thisBagOfWords)
throws java.lang.Exception
Parameters:
thisBagOfWords - the bag of words to subtract from this one.
Throws:
java.lang.Exception - any error.

public BagOfWords difference(BagOfWords thisBagOfWords)
throws java.lang.Exception
Parameters:
thisBagOfWords - the bag of words to take the difference with.
Throws:
java.lang.Exception - any error.

public BagOfWords intersection(BagOfWords thisBagOfWords)
throws java.lang.Exception
Parameters:
thisBagOfWords - the bag of words to intersect with.
Throws:
java.lang.Exception - any error.

public BagOfWords union(BagOfWords thisBagOfWords)
throws java.lang.Exception
Parameters:
thisBagOfWords - the bag of words to combine with.
Throws:
java.lang.Exception - any error.

public float dotproduct(BagOfWords thisBagOfWords)
Parameters:
thisBagOfWords - the bag of words to calculate the dot product with.

public float magnitude(BagOfWords thisBagOfWords)
Parameters:
thisBagOfWords - the bag of words to calculate the magnitude with.

public int getTotalWordCount()
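
As a rough illustration of the quantities behind dotproduct, magnitude and getTotalWordCount, the sketch below treats each bag as a vector of word counts. The dot product sums the products of counts for words that appear in both bags, and the magnitude is taken to be the Euclidean norm of a single bag. How the library's two-argument magnitude uses the bag passed in is not stated here, so that part is an assumption. BagVectorSketch and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not the BagOfWords implementation. */
final class BagVectorSketch {

    /** Sum of count products over the words present in both bags. */
    static float dotProduct(Map<String, Integer> a, Map<String, Integer> b) {
        float sum = 0f;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                sum += e.getValue() * other;
            }
        }
        return sum;
    }

    /** Euclidean norm of one bag's count vector (assumed meaning of "magnitude"). */
    static float magnitude(Map<String, Integer> a) {
        double sumOfSquares = 0.0;
        for (int count : a.values()) {
            sumOfSquares += (double) count * count;
        }
        return (float) Math.sqrt(sumOfSquares);
    }

    /** Total number of word occurrences, summed over all words. */
    static int totalWordCount(Map<String, Integer> a) {
        int total = 0;
        for (int count : a.values()) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> a = new HashMap<>();
        a.put("cat", 3);
        a.put("dog", 4);
        Map<String, Integer> b = new HashMap<>();
        b.put("cat", 2);
        b.put("fish", 5);
        System.out.println(dotProduct(a, b));   // prints 6.0
        System.out.println(magnitude(a));       // prints 5.0
        System.out.println(totalWordCount(a));  // prints 7
    }
}
```

Dividing the dot product by the product of the two magnitudes gives the cosine similarity, a common way to compare bags of words.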
public void setName(java.lang.String thisName)
Parameters:
thisName - the bag-of-words name.

public java.lang.String getName()

public void setBagOfWords(java.util.HashMap<java.lang.String,java.lang.Integer> thisBagOfWords)
throws java.lang.Exception
Parameters:
thisBagOfWords - the bag of words structure.
Throws:
java.lang.Exception - any error.

public void setBagOfWords(java.util.HashMap<java.lang.String,java.lang.Integer> thisBagOfWords,
java.util.ArrayList<java.lang.String> thisWordOrder)
throws java.lang.Exception
Parameters:
thisBagOfWords - the bag of words structure.
thisWordOrder - word ordering for the bag-of-words.
Throws:
java.lang.Exception - any error.

public java.util.HashMap<java.lang.String,java.lang.Integer> getBagOfWords()

public java.util.ArrayList<java.lang.String> getWordOrder()

public org.licas_xml.abs.Element toXml()
throws java.lang.Exception
Throws:
java.lang.Exception - any error.
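
Because removeWords is documented as updating both bagOfWords and wordOrder, the two structures have to be kept in step. The sketch below shows one way that could look on plain collections, together with a possible reading of sameWordList that compares only the sets of words and ignores counts and ordering. Both readings are assumptions, and RemoveWordsSketch is a hypothetical class, not part of the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not the BagOfWords implementation. */
final class RemoveWordsSketch {

    /** Remove each listed word from both the count map and the ordered list. */
    static void removeWords(HashMap<String, Integer> bag, ArrayList<String> order, List<String> toRemove) {
        for (String word : toRemove) {
            bag.remove(word);
        }
        order.removeAll(toRemove);
    }

    /** Assumed meaning of "same word list": identical key sets, counts ignored. */
    static boolean sameWordList(Map<String, Integer> a, Map<String, Integer> b) {
        return a.keySet().equals(b.keySet());
    }

    public static void main(String[] args) {
        HashMap<String, Integer> bag = new HashMap<>();
        bag.put("the", 3);
        bag.put("cat", 1);
        bag.put("mat", 1);
        ArrayList<String> order = new ArrayList<>(Arrays.asList("the", "cat", "mat"));

        removeWords(bag, order, Arrays.asList("the"));
        System.out.println(order);                 // prints [cat, mat]
        System.out.println(bag.containsKey("the")); // prints false
    }
}
```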