public class InformationGain extends Function
valueType for the algorithm. The index of the data column that the
classification is being performed for is of type Integer. It is stored in
dataset.getParams() and can be set and retrieved using
AiHeuristicConst.COLINDEX as the key.
str1a str2a str3a
str1b str2b str3a
str1a str2c str3b
str1b str2c str3a
The tabular dataset can be created by adding columns of type MetricValue to a
MetricDataset and passing this as the parameter. For example:
ArrayList allColumns = new ArrayList(); //holds one MetricValue per data column
ArrayList dataPoints = new ArrayList();
dataPoints.add("Sunny");
dataPoints.add("Sunny");
dataPoints.add("Overcast");
dataPoints.add("Rain");
dataPoints.add("Rain");
MetricValue metricValue = MetricValue.toValue("Outlook", valueType, dataPoints);
allColumns.add(metricValue);
...
//repeat for the other data columns
...
MetricDataset evalDataset = MetricDataset.toDataset(null, valueType, allColumns);
//add column to evaluate for
evalDataset.getParams().put(AiHeuristicConst.COLINDEX, Integer.valueOf(4));
...
//initialise and run the algorithm
InformationGain infoGain = new InformationGain(valueType);
ArrayList resultList = (ArrayList)infoGain.evaluate(evalDataset).getValue();
The algorithm automatically counts the number of similar entries for each column variable
and uses these counts to calculate the entropy of each split. The information gain ordering
is then returned as an ArrayList of MetricValue, where the name is the
column variable name and the value is the information gain as a Double. Algorithms in
the ai_heuristic package try to maximise, so a larger final value means a better split.
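To make the counting and entropy steps concrete, the following is a minimal standalone sketch of the standard ID3 information gain calculation. It does not use the library classes and is not taken from the InformationGain source: the class name InfoGainSketch, the methods entropy and gains, and the use of a plain String[][] table are all assumptions made for illustration. The gain of a column is taken to be the entropy of the decision column minus the weighted entropy of the decision values within each group of matching entries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for illustration only; not part of the library.
class InfoGainSketch {

    /** Shannon entropy (base 2) of a list of labels. */
    public static double entropy(List<String> labels) {
        // Count how many times each distinct label occurs.
        Map<String, Integer> counts = new HashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        double h = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / labels.size();
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    /**
     * Information gain of every column with respect to the decision column.
     * The entry for the decision column itself is Double.NaN.
     */
    public static double[] gains(String[][] rows, int decisionCol) {
        int nCols = rows[0].length;

        // Entropy of the decision column before any split.
        List<String> decision = new ArrayList<>();
        for (String[] row : rows) {
            decision.add(row[decisionCol]);
        }
        double baseEntropy = entropy(decision);

        double[] result = new double[nCols];
        for (int col = 0; col < nCols; col++) {
            if (col == decisionCol) {
                result[col] = Double.NaN;
                continue;
            }
            // Group the decision values by the value found in this column.
            Map<String, List<String>> groups = new HashMap<>();
            for (String[] row : rows) {
                groups.computeIfAbsent(row[col], k -> new ArrayList<>())
                      .add(row[decisionCol]);
            }
            // Weighted entropy remaining after splitting on this column.
            double remainder = 0.0;
            for (List<String> group : groups.values()) {
                remainder += ((double) group.size() / rows.length) * entropy(group);
            }
            result[col] = baseEntropy - remainder;
        }
        return result;
    }

    public static void main(String[] args) {
        // Small table; the last column is assumed to be the decision column.
        String[][] rows = {
            {"str1a", "str2a", "str3a"},
            {"str1b", "str2b", "str3a"},
            {"str1a", "str2c", "str3b"},
            {"str1b", "str2c", "str3a"},
        };
        double[] g = gains(rows, 2);
        for (int col = 0; col < g.length; col++) {
            System.out.printf("column %d: %.3f%n", col, g[col]);
        }
    }
}
```

For this four-row table the first two columns each come out at a gain of about 0.311, since either split leaves half of the rows in a group with mixed decision values. The sketch marks the decision column with NaN, whereas the library is documented as returning a null value for it.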
Some help from the tutorial by Mohammad A Rahman at http://www.codeproject.com/Articles/259241/ID3-Decision-Tree-Algorithm-Part-1#
Inherited fields: config, mathCompare, valueType

| Constructor and Description |
|---|
| InformationGain(): Create a new instance of InformationGain. |
| InformationGain(java.lang.String thisValueType): Create a new instance of InformationGain. |
| Modifier and Type | Method and Description |
|---|---|
| ReplySet | evaluate(MetricDataset dataset): Evaluate the resulting IG after splitting the datasets over each attribute (column variable), apart from the decision attribute or column that the IG is being calculated for. |
| protected void | initialise(): Initialise the function values, setting the config parameters or other. |
Inherited methods: checkValueType, createFunction, createFunction, createFunction, getConfigParams, innerObject, isLegalNumber, setConfigParams, setEvaluator, setValueType

public InformationGain()
throws java.lang.Exception
Throws:
java.lang.Exception - any error.

public InformationGain(java.lang.String thisValueType)
throws java.lang.Exception
Parameters:
thisValueType - the type of object being evaluated. Can be null if set later or not used.
Throws:
java.lang.Exception - any error.

protected void initialise()
initialise in class Function

public ReplySet evaluate(MetricDataset dataset) throws java.lang.Exception
evaluate in interface FunctionDef
evaluate in class Function
Parameters:
dataset - tabular list of values. This should store a list of MetricValue entities
that each store a set of values for a single attribute column entry. Each MetricValue in
the list should be assigned the entity name. Note that in addition to this, the config
of the dataset needs to set the data column that the classification is being performed for.
This is of type Integer and can be set and retrieved using AiHeuristicConst.COLINDEX
as the key.
Returns:
ArrayList of MetricValue entities.
Each entry is assigned the attribute entity name and the value is the information gain, as a Double.
For the decision column, the value is null, with only the entity name.
Throws:
java.lang.Exception - any error.