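This is the standard Hoeffding-bound categorical feature proposed by Domingos and Hulten in "Mining High-Speed Data Streams" (KDD '00); the full citation is given in the detailed description below.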
| HoeffdingCategoricalSplit (const size_t numCategories=0, const size_t numClasses=0) | Create the HoeffdingCategoricalSplit given a number of categories for this dimension and a number of classes. |
| HoeffdingCategoricalSplit (const size_t numCategories, const size_t numClasses, const HoeffdingCategoricalSplit &other) | Create the HoeffdingCategoricalSplit given a number of categories for this dimension and a number of classes and another HoeffdingCategoricalSplit to take parameters from. |
template<typename eT > void | Train (eT value, const size_t label) | Train on the given value with the given label. |
void | EvaluateFitnessFunction (double &bestFitness, double &secondBestFitness) const | Given the points seen so far, evaluate the fitness function, returning the gain for the best possible split and the second best possible split. |
size_t | NumChildren () const | Return the number of children, if the node were to split. |
void | Split (arma::Col< size_t > &childMajorities, SplitInfo &splitInfo) | Gather the information for a split: get the labels of the child majorities, and initialize the SplitInfo object. |
size_t | MajorityClass () const | Get the majority class seen so far. |
double | MajorityProbability () const | Get the probability of the majority class given the points seen so far. |
template<typename Archive > void | serialize (Archive &ar, const uint32_t) | Serialize the categorical split. |
template<typename FitnessFunction>
class mlpack::tree::HoeffdingCategoricalSplit< FitnessFunction >
This is the standard Hoeffding-bound categorical feature proposed in the paper below:
@inproceedings{domingos2000mining,
title={{Mining High-Speed Data Streams}},
author={Domingos, P. and Hulten, G.},
year={2000},
booktitle={Proceedings of the Sixth ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining (KDD '00)},
pages={71--80}
}
This class will track the sufficient statistics of the training points it has seen. The HoeffdingSplit class (and other related classes) can use this class to track categorical features and split decision tree nodes.
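As a rough illustration of what "sufficient statistics" can mean for a categorical feature, the sketch below keeps one counter per (class label, category) pair and derives the majority class, its probability, and the per-category majorities from those counters. The struct `CategoricalCounts`, its member `ChildMajorities`, and the use of `std::vector` in place of Armadillo matrices are assumptions made for this example; this is not the mlpack implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the statistics tracked for one categorical
// dimension. counts[label][category] is the number of training points seen
// with that class label and that category value.
struct CategoricalCounts
{
  std::vector<std::vector<std::size_t>> counts;

  CategoricalCounts(std::size_t numCategories, std::size_t numClasses) :
      counts(numClasses, std::vector<std::size_t>(numCategories, 0)) { }

  // Record one training point: its category value and its class label.
  void Train(std::size_t category, std::size_t label)
  {
    assert(label < counts.size() && category < counts[label].size());
    ++counts[label][category];
  }

  // Class with the most points overall (ties go to the lowest index).
  std::size_t MajorityClass() const
  {
    std::size_t best = 0, bestCount = 0;
    for (std::size_t c = 0; c < counts.size(); ++c)
    {
      std::size_t total = 0;
      for (std::size_t n : counts[c]) total += n;
      if (total > bestCount) { best = c; bestCount = total; }
    }
    return best;
  }

  // Fraction of all points seen that belong to the majority class
  // (0 if no points have been seen yet).
  double MajorityProbability() const
  {
    std::size_t all = 0, bestCount = 0;
    for (const auto& row : counts)
    {
      std::size_t total = 0;
      for (std::size_t n : row) total += n;
      all += total;
      bestCount = std::max(bestCount, total);
    }
    return all == 0 ? 0.0 : double(bestCount) / double(all);
  }

  // Majority class within each category, i.e. within each would-be child.
  std::vector<std::size_t> ChildMajorities() const
  {
    const std::size_t numCategories = counts.empty() ? 0 : counts[0].size();
    std::vector<std::size_t> majorities(numCategories, 0);
    for (std::size_t k = 0; k < numCategories; ++k)
      for (std::size_t c = 1; c < counts.size(); ++c)
        if (counts[c][k] > counts[majorities[k]][k])
          majorities[k] = c;
    return majorities;
  }
};
```

Because only counters are stored, each training point costs constant time and memory use does not grow with the length of the stream, which is the property a streaming decision tree needs.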
- Template Parameters
FitnessFunction | Fitness function to use for calculating gain. |
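To make the role of the fitness function concrete, here is a minimal sketch of one possible gain measure, Gini impurity gain, computed from a table of per-class, per-category counts. The free function `GiniGain` and the `counts[label][category]` layout are assumptions made for this example; the interface that an actual `FitnessFunction` type must provide is not specified in this section.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fitness function: the Gini impurity gain obtained by creating
// one child per category. counts[label][category] holds the points seen so far.
double GiniGain(const std::vector<std::vector<std::size_t>>& counts)
{
  const std::size_t numClasses = counts.size();
  const std::size_t numCategories = numClasses == 0 ? 0 : counts[0].size();

  // Totals per class, per category, and overall.
  std::vector<double> perClass(numClasses, 0.0);
  std::vector<double> perCategory(numCategories, 0.0);
  double total = 0.0;
  for (std::size_t c = 0; c < numClasses; ++c)
  {
    assert(counts[c].size() == numCategories);
    for (std::size_t k = 0; k < numCategories; ++k)
    {
      perClass[c] += double(counts[c][k]);
      perCategory[k] += double(counts[c][k]);
      total += double(counts[c][k]);
    }
  }
  if (total == 0.0)
    return 0.0; // No points seen yet, so there is nothing to gain.

  // Impurity of the unsplit node: 1 - sum_c p_c^2.
  double parent = 1.0;
  for (double n : perClass)
    parent -= (n / total) * (n / total);

  // Weighted impurity of the children, one child per category.
  double children = 0.0;
  for (std::size_t k = 0; k < numCategories; ++k)
  {
    if (perCategory[k] == 0.0)
      continue; // An empty child contributes nothing.
    double impurity = 1.0;
    for (std::size_t c = 0; c < numClasses; ++c)
    {
      const double p = double(counts[c][k]) / perCategory[k];
      impurity -= p * p;
    }
    children += (perCategory[k] / total) * impurity;
  }

  return parent - children;
}
```

With two classes that are perfectly separated by two categories the gain is 0.5, and with counts that are identical in every category the gain is 0.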
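template<typename FitnessFunction >
void mlpack::tree::HoeffdingCategoricalSplit< FitnessFunction >::EvaluateFitnessFunction (double &bestFitness, double &secondBestFitness) const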
Given the points seen so far, evaluate the fitness function, returning the gain for the best possible split and the second best possible split.
In this splitting technique, we only split one possible way, so secondBestFitness will always be 0.
- Parameters
bestFitness | The fitness function result for this split. |
secondBestFitness | This is always set to 0 (this split only splits one way). |
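The two returned values are the quantities that the Hoeffding bound from the cited paper compares: a node is split once the gap between the best and second-best gain exceeds epsilon = sqrt(R^2 ln(1/delta) / (2n)), where R is the range of the fitness function, delta the allowed failure probability, and n the number of points seen. The sketch below shows that check; the helper names `HoeffdingEpsilon` and `ShouldSplit` are hypothetical, and how the calling class combines results across dimensions is not covered here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hoeffding bound: with probability 1 - delta, the true mean of a variable
// with the given range lies within epsilon of the mean observed over
// numSamples independent points.
double HoeffdingEpsilon(double range, double delta, std::size_t numSamples)
{
  assert(numSamples > 0 && delta > 0.0 && delta < 1.0);
  return std::sqrt(range * range * std::log(1.0 / delta) /
                   (2.0 * double(numSamples)));
}

// Hypothetical helper: decide whether the observed gap between the best and
// second-best gain is large enough to justify a split. For a categorical
// split that can only be made one way, secondBestFitness is 0.
bool ShouldSplit(double bestFitness,
                 double secondBestFitness,
                 double range,
                 double delta,
                 std::size_t numSamples)
{
  return bestFitness - secondBestFitness >
      HoeffdingEpsilon(range, delta, numSamples);
}
```

Since secondBestFitness is 0 for this split type, the test reduces to asking whether bestFitness itself exceeds epsilon, and epsilon shrinks as more points are seen.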