mlpack
mlpack::tree::GiniGain Class Reference

The Gini gain, a measure of set purity usable as a fitness function (FitnessFunction) for decision trees.

#include <gini_gain.hpp>

Static Public Member Functions

template<bool UseWeights, typename CountType>
static double EvaluatePtr (const CountType *counts, const size_t countLength, const CountType totalCount)
 Evaluate the Gini impurity given a vector of class weight counts.
 
template<bool UseWeights, typename RowType, typename WeightVecType>
static double Evaluate (const RowType &labels, const size_t numClasses, const WeightVecType &weights)
 Evaluate the Gini impurity on the given set of labels.
 
static double Range (const size_t numClasses)
 Return the range of the Gini impurity for the given number of classes.
 

Detailed Description

The Gini gain, a measure of set purity usable as a fitness function (FitnessFunction) for decision trees.

This is the well-known Gini impurity, negated: the decision tree tries to maximize gain, whereas the Gini impurity itself would need to be minimized.
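
To make the negation concrete, here is a minimal standalone sketch. It assumes the usual definition of Gini impurity, 1 - sum_i p_i^2, where p_i is the fraction of points in class i. The helper name NegatedGiniImpurity and the use of std::vector are assumptions for illustration only; this is not mlpack's implementation and does not use the GiniGain API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of mlpack: the negated Gini impurity of a
// label set, computed as -(1 - sum_i p_i^2).
double NegatedGiniImpurity(const std::vector<std::size_t>& labels,
                           const std::size_t numClasses)
{
  // An empty set is treated as perfectly pure (an assumption of this sketch).
  if (labels.empty())
    return 0.0;

  // Count how many points fall into each class.
  std::vector<std::size_t> counts(numClasses, 0);
  for (const std::size_t label : labels)
  {
    assert(label < numClasses);
    ++counts[label];
  }

  // Accumulate the sum of squared class proportions.
  const double n = static_cast<double>(labels.size());
  double sumSquares = 0.0;
  for (const std::size_t c : counts)
  {
    const double p = static_cast<double>(c) / n;
    sumSquares += p * p;
  }

  // The impurity is 1 - sumSquares; the gain is its negation.
  return sumSquares - 1.0;
}
```

Under this definition a pure set scores 0, the largest possible value, and a mixed set scores below 0, so maximizing the gain is the same as minimizing the impurity.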

Member Function Documentation

◆ Evaluate()

template<bool UseWeights, typename RowType, typename WeightVecType>
static double mlpack::tree::GiniGain::Evaluate(const RowType& labels,
                                               const size_t numClasses,
                                               const WeightVecType& weights)
inline static

Evaluate the Gini impurity on the given set of labels.

RowType should be an Armadillo vector that holds size_t objects.

Note that, due to floating-point representation issues, the gain returned can be very slightly greater than 0. Thus, if you are checking for a perfect fit, be sure to use 'gain >= 0.0', not 'gain == 0.0'.
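
The recommended comparison can be wrapped in a small predicate. The name IsPerfectFit is hypothetical and is not part of mlpack; the sketch only illustrates the 'gain >= 0.0' check described above.

```cpp
// Hypothetical helper, not part of mlpack: treat any non-negative gain as a
// perfect fit, so that a result that rounds to slightly above zero still
// counts.
bool IsPerfectFit(const double gain)
{
  return gain >= 0.0;
}
```

An exact test such as 'gain == 0.0' would reject a gain that floating-point rounding has pushed slightly above zero, which is the case the note warns about.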

Parameters
labels      Set of labels to evaluate Gini impurity on.
numClasses  Number of classes in the dataset.
weights     Weight of labels.
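
As an illustration of how per-label weights might enter the computation, the sketch below replaces class counts with per-class weight sums and normalizes by the total weight. This is an assumption about the weighted case, not a description of mlpack's code: the name WeightedNegatedGini, the std::vector arguments, and the treatment of a zero total weight are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of mlpack: negated Gini impurity where each
// label contributes its weight instead of a count of one.
double WeightedNegatedGini(const std::vector<std::size_t>& labels,
                           const std::size_t numClasses,
                           const std::vector<double>& weights)
{
  assert(labels.size() == weights.size());

  // Sum the weights that fall into each class, and the overall total.
  std::vector<double> classWeights(numClasses, 0.0);
  double totalWeight = 0.0;
  for (std::size_t i = 0; i < labels.size(); ++i)
  {
    assert(labels[i] < numClasses);
    classWeights[labels[i]] += weights[i];
    totalWeight += weights[i];
  }

  // With no weight at all, treat the set as pure (an assumption).
  if (totalWeight == 0.0)
    return 0.0;

  // Sum of squared weighted class proportions.
  double sumSquares = 0.0;
  for (const double w : classWeights)
  {
    const double p = w / totalWeight;
    sumSquares += p * p;
  }

  return sumSquares - 1.0;
}
```

With labels {0, 1} and weights {3.0, 1.0}, the weighted proportions are 0.75 and 0.25, so the sketch returns 0.625 - 1 = -0.375.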

◆ Range()

static double mlpack::tree::GiniGain::Range(const size_t numClasses)
inline static

Return the range of the Gini impurity for the given number of classes.

(That is, the difference between the maximum possible value and the minimum possible value.)

Parameters
numClasses  Number of classes in the dataset.
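
For intuition about the size of this range, assume the usual definition of Gini impurity, 1 - sum_i p_i^2. Its minimum is 0 (all points in one class) and its maximum, reached when the k classes are equally represented, is 1 - 1/k. The sketch below computes that difference; the name GiniRangeSketch is hypothetical, and the code is not taken from mlpack.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of mlpack: the difference between the largest
// and smallest Gini impurity for numClasses classes, which is
// (1 - 1/numClasses) - 0.
double GiniRangeSketch(const std::size_t numClasses)
{
  assert(numClasses > 0);
  return 1.0 - 1.0 / static_cast<double>(numClasses);
}
```

For two classes this gives 0.5, and the value approaches 1 as the number of classes grows.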

The documentation for this class was generated from the following file: