mlpack
mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType > Class Template Reference

The BinaryNumericSplit class implements the numeric feature splitting strategy devised by Gama, Rocha, and Medas.

#include <binary_numeric_split.hpp>

Public Types

typedef BinaryNumericSplitInfo< ObservationType > SplitInfo
 The splitting information required by the BinaryNumericSplit.
 

Public Member Functions

 BinaryNumericSplit (const size_t numClasses=0)
 Create the BinaryNumericSplit object with the given number of classes.
 
 BinaryNumericSplit (const size_t numClasses, const BinaryNumericSplit &other)
 Create the BinaryNumericSplit object with the given number of classes, using information from the given other split for other parameters.
 
void Train (ObservationType value, const size_t label)
 Train on the given value with the given label.
 
void EvaluateFitnessFunction (double &bestFitness, double &secondBestFitness)
 Given the points seen so far, evaluate the fitness function, returning the best possible gain of a binary split.
 
size_t NumChildren () const
 
void Split (arma::Col< size_t > &childMajorities, SplitInfo &splitInfo)
 Given that a split should happen, return the majority classes of the (two) children and an initialized SplitInfo object.
 
size_t MajorityClass () const
 The majority class of the points seen so far.
 
double MajorityProbability () const
 The probability of the majority class given the points seen so far.
 
template<typename Archive >
void serialize (Archive &ar, const uint32_t)
 Serialize the object.
 

Detailed Description

template<typename FitnessFunction, typename ObservationType = double>
class mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >

The BinaryNumericSplit class implements the numeric feature splitting strategy devised by Gama, Rocha, and Medas in the following paper:

@inproceedings{gama2003accurate,
title={Accurate Decision Trees for Mining High-Speed Data Streams},
author={Gama, J. and Rocha, R. and Medas, P.},
year={2003},
booktitle={Proceedings of the Ninth ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining (KDD '03)},
pages={523--528}
}

This splitting procedure builds a binary tree on the points it has seen so far; EvaluateFitnessFunction() then finds the best possible split in O(n) time, where n is the number of samples seen so far. Every split of this type produces exactly two children: one for values less than the split point, and one for values greater than or equal to the split point. The Train() function should take O(1) time.

Template Parameters
FitnessFunction    Fitness function to use for calculating gain.
ObservationType    Type of observation used by this dimension.
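
The procedure described above can be illustrated with a minimal, self-contained sketch. It does not use mlpack: `BinarySplitSketch`, `GiniImpurity()`, and `Evaluate()` are hypothetical names invented for this example, Gini impurity stands in for the FitnessFunction template parameter, and a `std::multimap` stands in for the binary tree of observations. It only shows the idea of a single ordered sweep that tracks the best and second best gain.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Gini impurity of a vector of class counts (0 when empty or pure).
double GiniImpurity(const std::vector<std::size_t>& counts)
{
  std::size_t total = 0;
  for (std::size_t c : counts)
    total += c;
  if (total == 0)
    return 0.0;

  double impurity = 1.0;
  for (std::size_t c : counts)
  {
    const double p = static_cast<double>(c) / static_cast<double>(total);
    impurity -= p * p;
  }
  return impurity;
}

// Hypothetical stand-in for the split statistics; this is not mlpack's class.
class BinarySplitSketch
{
 public:
  explicit BinarySplitSketch(const std::size_t numClasses) :
      classCounts(numClasses, 0) { }

  // Record one observation; the multimap keeps observations ordered by value.
  void Train(const double value, const std::size_t label)
  {
    assert(label < classCounts.size());
    sorted.emplace(value, label);
    ++classCounts[label];
  }

  // Sweep the ordered observations once and return
  // {best gain, second best gain} over all candidate binary splits.
  std::pair<double, double> Evaluate() const
  {
    double best = 0.0, secondBest = 0.0;
    const std::size_t n = sorted.size();
    if (n < 2)
      return std::make_pair(best, secondBest);

    std::vector<std::size_t> left(classCounts.size(), 0);
    std::vector<std::size_t> right(classCounts);
    const double parent = GiniImpurity(classCounts);

    std::size_t seen = 0;
    auto it = sorted.begin();
    while (it != sorted.end())
    {
      // Move every observation equal to `value` to the left side, so the
      // candidate boundary lies between `value` and the next distinct value.
      const double value = it->first;
      for (; it != sorted.end() && it->first == value; ++it)
      {
        ++left[it->second];
        --right[it->second];
        ++seen;
      }
      if (seen == n)
        break; // Nothing remains on the right, so this is not a split.

      const double wLeft = static_cast<double>(seen) / n;
      const double gain = parent - wLeft * GiniImpurity(left) -
          (1.0 - wLeft) * GiniImpurity(right);

      if (gain > best)
      {
        secondBest = best;
        best = gain;
      }
      else if (gain > secondBest)
      {
        secondBest = gain;
      }
    }
    return std::make_pair(best, secondBest);
  }

 private:
  std::multimap<double, std::size_t> sorted; // Observations ordered by value.
  std::vector<std::size_t> classCounts;      // Per-class totals seen so far.
};
```

In this sketch, training on the values 1 and 2 with label 0 and the values 3 and 4 with label 1 gives the largest gain for the boundary between 2 and 3, where both children are pure. Note that inserting into a `std::multimap` costs O(log n), so the sketch makes no claim about the training cost of the real class.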

Constructor & Destructor Documentation

◆ BinaryNumericSplit() [1/2]

template<typename FitnessFunction , typename ObservationType >
mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::BinaryNumericSplit ( const size_t  numClasses = 0)

Create the BinaryNumericSplit object with the given number of classes.

Parameters
numClasses    Number of classes in dataset.

◆ BinaryNumericSplit() [2/2]

template<typename FitnessFunction , typename ObservationType >
mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::BinaryNumericSplit ( const size_t  numClasses,
const BinaryNumericSplit< FitnessFunction, ObservationType > &  other 
)

Create the BinaryNumericSplit object with the given number of classes, using information from the given other split for other parameters.

In this case, there are no other parameters, but this function is required by the HoeffdingTree class.

Member Function Documentation

◆ EvaluateFitnessFunction()

template<typename FitnessFunction , typename ObservationType >
void mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::EvaluateFitnessFunction ( double &  bestFitness,
double &  secondBestFitness 
)

Given the points seen so far, evaluate the fitness function, returning the best possible gain of a binary split.

Note that this takes O(n) time, where n is the number of points seen so far, so it can be slow when many points have been seen.

The fitness of the best possible split will be stored in bestFitness, and the fitness of the second best possible split will be stored in secondBestFitness.

Parameters
bestFitness          Fitness function value for best possible split.
secondBestFitness    Fitness function value for second best possible split.

◆ Split()

template<typename FitnessFunction , typename ObservationType >
void mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::Split ( arma::Col< size_t > &  childMajorities,
SplitInfo &  splitInfo 
)

Given that a split should happen, return the majority classes of the (two) children and an initialized SplitInfo object.

Parameters
childMajorities    Majority classes of the children after the split.
splitInfo          Split information.
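
As a rough illustration of what the majority classes of the two children mean, the following self-contained sketch computes them for a given split point. It does not use mlpack or Armadillo: `ChildMajorities()` is a hypothetical helper, the observations are assumed to be stored as (value, label) pairs in a `std::multimap`, and the ordering of the children (index 0 for values less than the split point, index 1 for values greater than or equal to it) is an assumption made for this example only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <vector>

// Hypothetical helper, not part of mlpack: return the majority class of each
// of the two children produced by splitting at `splitPoint`.
std::vector<std::size_t> ChildMajorities(
    const std::multimap<double, std::size_t>& observations,
    const double splitPoint,
    const std::size_t numClasses)
{
  // counts[0] holds class counts for value < splitPoint,
  // counts[1] holds class counts for value >= splitPoint.
  std::vector<std::vector<std::size_t>> counts(
      2, std::vector<std::size_t>(numClasses, 0));

  for (const auto& entry : observations)
  {
    assert(entry.second < numClasses);
    const std::size_t child = (entry.first < splitPoint) ? 0 : 1;
    ++counts[child][entry.second];
  }

  // The majority class of a child is the index of its largest count
  // (ties resolve to the lowest class index).
  std::vector<std::size_t> majorities(2, 0);
  for (std::size_t child = 0; child < 2; ++child)
  {
    majorities[child] = static_cast<std::size_t>(std::distance(
        counts[child].begin(),
        std::max_element(counts[child].begin(), counts[child].end())));
  }
  return majorities;
}
```

In the real class the result is written into an `arma::Col< size_t >` passed by reference rather than returned, as the signature above shows.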

◆ Train()

template<typename FitnessFunction , typename ObservationType >
void mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::Train ( ObservationType  value,
const size_t  label 
)

Train on the given value with the given label.

Parameters
value    The value to train on.
label    The label to train on.
