mlpack
mlpack::tree::HoeffdingCategoricalSplit< FitnessFunction > Class Template Reference

This is the standard Hoeffding-bound categorical feature split proposed in the paper cited in the detailed description below.

#include <hoeffding_categorical_split.hpp>

Public Types

typedef CategoricalSplitInfo SplitInfo
 The type of split information required by the HoeffdingCategoricalSplit.
 

Public Member Functions

 HoeffdingCategoricalSplit (const size_t numCategories=0, const size_t numClasses=0)
 Create the HoeffdingCategoricalSplit given a number of categories for this dimension and a number of classes.
 
 HoeffdingCategoricalSplit (const size_t numCategories, const size_t numClasses, const HoeffdingCategoricalSplit &other)
 Create the HoeffdingCategoricalSplit given a number of categories for this dimension, a number of classes, and another HoeffdingCategoricalSplit to take parameters from.
 
template<typename eT >
void Train (eT value, const size_t label)
 Train on the given value with the given label.
 
void EvaluateFitnessFunction (double &bestFitness, double &secondBestFitness) const
 Given the points seen so far, evaluate the fitness function, returning the gain for the best possible split and the second best possible split.
 
size_t NumChildren () const
 Return the number of children, if the node were to split.
 
void Split (arma::Col< size_t > &childMajorities, SplitInfo &splitInfo)
 Gather the information for a split: get the labels of the child majorities, and initialize the SplitInfo object.
 
size_t MajorityClass () const
 Get the majority class seen so far.
 
double MajorityProbability () const
 Get the probability of the majority class given the points seen so far.
 
template<typename Archive >
void serialize (Archive &ar, const uint32_t)
 Serialize the categorical split.
 

Detailed Description

template<typename FitnessFunction>
class mlpack::tree::HoeffdingCategoricalSplit< FitnessFunction >

This is the standard Hoeffding-bound categorical feature split proposed in the paper below:

@inproceedings{domingos2000mining,
title={{Mining High-Speed Data Streams}},
author={Domingos, P. and Hulten, G.},
year={2000},
booktitle={Proceedings of the Sixth ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining (KDD '00)},
pages={71--80}
}

This class tracks the sufficient statistics of the training points it has seen. The HoeffdingSplit class (and other related classes) can use this class to track categorical features and split decision tree nodes.
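
For context, the Hoeffding bound from the cited paper can be written as below. This is a restatement of the paper's criterion, not something documented for this class: R is the range of the fitness function, δ is the allowed probability of choosing the wrong split, n is the number of points seen so far, and X_a and X_b denote the best and second-best candidate splits. The comparison itself is presumably made by the caller (for example, the tree that owns this object), which is an assumption here.

```latex
\epsilon = \sqrt{\frac{R^{2}\,\ln(1/\delta)}{2n}},
\qquad
\text{split when } \overline{G}(X_a) - \overline{G}(X_b) > \epsilon
```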

Template Parameters
FitnessFunction  Fitness function to use for calculating gain.

Constructor & Destructor Documentation

◆ HoeffdingCategoricalSplit() [1/2]

template<typename FitnessFunction >
mlpack::tree::HoeffdingCategoricalSplit< FitnessFunction >::HoeffdingCategoricalSplit ( const size_t  numCategories = 0,
const size_t  numClasses = 0 
)

Create the HoeffdingCategoricalSplit given a number of categories for this dimension and a number of classes.

Parameters
numCategories  Number of categories in this dimension.
numClasses  Number of classes.

◆ HoeffdingCategoricalSplit() [2/2]

template<typename FitnessFunction >
mlpack::tree::HoeffdingCategoricalSplit< FitnessFunction >::HoeffdingCategoricalSplit ( const size_t  numCategories,
const size_t  numClasses,
const HoeffdingCategoricalSplit< FitnessFunction > &  other 
)

Create the HoeffdingCategoricalSplit given a number of categories for this dimension, a number of classes, and another HoeffdingCategoricalSplit to take parameters from.

In this particular case, there are no parameters to take, but this constructor is required by the HoeffdingTree class.

Member Function Documentation

◆ EvaluateFitnessFunction()

template<typename FitnessFunction >
void mlpack::tree::HoeffdingCategoricalSplit< FitnessFunction >::EvaluateFitnessFunction ( double &  bestFitness,
double &  secondBestFitness 
) const

Given the points seen so far, evaluate the fitness function, returning the gain for the best possible split and the second best possible split.

In this splitting technique, we only split one possible way, so secondBestFitness will always be 0.

Parameters
bestFitness  The fitness function result for this split.
secondBestFitness  This is always set to 0 (this split only splits one way).
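
To make the "one possible split" behavior concrete, here is a minimal standalone sketch that does not use mlpack. It assumes the statistics are a class-by-category count table, that the single candidate split creates one child per category, and that the fitness function is a Gini-style gain. The names `Counts`, `GiniSketch`, and `EvaluateSketch` are hypothetical, and the real FitnessFunction may compute gain differently.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the class's sufficient statistics:
// counts[c][k] = number of points seen with class c and category k.
using Counts = std::vector<std::vector<std::size_t>>;

// Gini impurity of one class histogram: 1 - sum_c p_c^2.
double GiniSketch(const std::vector<std::size_t>& classCounts)
{
  std::size_t total = 0;
  for (const std::size_t n : classCounts)
    total += n;
  if (total == 0)
    return 0.0;

  double sumSquares = 0.0;
  for (const std::size_t n : classCounts)
  {
    const double p = static_cast<double>(n) / static_cast<double>(total);
    sumSquares += p * p;
  }
  return 1.0 - sumSquares;
}

// Gain of the single candidate split (assumed: one child per category).
void EvaluateSketch(const Counts& counts,
                    double& bestFitness,
                    double& secondBestFitness)
{
  const std::size_t numClasses = counts.size();
  const std::size_t numCategories = numClasses ? counts[0].size() : 0;

  // Class histogram of the unsplit node.
  std::vector<std::size_t> parent(numClasses, 0);
  std::size_t total = 0;
  for (std::size_t c = 0; c < numClasses; ++c)
  {
    for (std::size_t k = 0; k < numCategories; ++k)
    {
      parent[c] += counts[c][k];
      total += counts[c][k];
    }
  }

  // Parent impurity minus the size-weighted impurity of each child.
  double gain = GiniSketch(parent);
  for (std::size_t k = 0; k < numCategories; ++k)
  {
    std::vector<std::size_t> child(numClasses, 0);
    std::size_t childTotal = 0;
    for (std::size_t c = 0; c < numClasses; ++c)
    {
      child[c] = counts[c][k];
      childTotal += counts[c][k];
    }
    if (childTotal > 0)
    {
      gain -= (static_cast<double>(childTotal) / static_cast<double>(total)) *
          GiniSketch(child);
    }
  }

  bestFitness = gain;
  secondBestFitness = 0.0; // Only one way to split, so no runner-up.
}
```

Because there is only one candidate, the second output carries no information; it exists so that this class presents the same interface as splits that do have several candidates.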

◆ Split()

template<typename FitnessFunction >
void mlpack::tree::HoeffdingCategoricalSplit< FitnessFunction >::Split ( arma::Col< size_t > &  childMajorities,
SplitInfo &  splitInfo 
)

Gather the information for a split: get the labels of the child majorities, and initialize the SplitInfo object.

Parameters
childMajorities  Majorities of child nodes to be created.
splitInfo  Information for splitting.
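
As an illustration of what "majorities of child nodes" could mean, the following standalone sketch (no mlpack) picks the most frequent class within each category. It assumes a class-by-category count table and one child per category. `Counts` and `ChildMajoritiesSketch` are hypothetical names, and the tie-breaking rule is a choice made for this example only.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the class's sufficient statistics:
// counts[c][k] = number of points seen with class c and category k.
using Counts = std::vector<std::vector<std::size_t>>;

// For each category (one child per category is assumed), return the class
// with the highest count. Ties go to the lowest class index.
std::vector<std::size_t> ChildMajoritiesSketch(const Counts& counts)
{
  const std::size_t numClasses = counts.size();
  const std::size_t numCategories = numClasses ? counts[0].size() : 0;

  std::vector<std::size_t> majorities(numCategories, 0);
  for (std::size_t k = 0; k < numCategories; ++k)
  {
    std::size_t best = 0;
    for (std::size_t c = 1; c < numClasses; ++c)
    {
      if (counts[c][k] > counts[best][k])
        best = c;
    }
    majorities[k] = best;
  }
  return majorities;
}
```

A category that has seen no points gets class 0 in this sketch; the real class may handle that case differently.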

◆ Train()

template<typename FitnessFunction >
template<typename eT >
void mlpack::tree::HoeffdingCategoricalSplit< FitnessFunction >::Train ( eT  value,
const size_t  label 
)

Train on the given value with the given label.

Parameters
value  Value to train on.
label  Label to train on.
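
A minimal sketch of how a training step might update the sufficient statistics, written without mlpack. It assumes the value is a category index stored in a numeric type and that the statistics are a class-by-category count table. `CategoricalStatsSketch` and its `counts` member are hypothetical and are not the library's internals.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in that only tracks counts; not the mlpack class.
struct CategoricalStatsSketch
{
  CategoricalStatsSketch(const std::size_t numCategories,
                         const std::size_t numClasses) :
      counts(numClasses, std::vector<std::size_t>(numCategories, 0))
  { }

  // Record one point: the value is interpreted as a category index.
  template<typename eT>
  void Train(eT value, const std::size_t label)
  {
    const std::size_t category = static_cast<std::size_t>(value);
    assert(label < counts.size());
    assert(category < counts[label].size());
    ++counts[label][category];
  }

  // counts[c][k] = number of points seen with class c and category k.
  std::vector<std::vector<std::size_t>> counts;
};
```

Each call touches a single counter, so the cost per point is constant, which suits the streaming setting the cited paper targets.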
