mlpack
mlpack::tree::RandomForest< FitnessFunction, DimensionSelectionType, NumericSplitType, CategoricalSplitType, UseBootstrap > Class Template Reference

The RandomForest class provides an implementation of random forests, as described in Breiman's seminal paper "Random forests" (Machine Learning, 2001).

#include <random_forest.hpp>

Public Types

typedef DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, DimensionSelectionType > DecisionTreeType
 Allow access to the underlying decision tree type.
 

Public Member Functions

 RandomForest ()
 Construct the random forest without any training or specifying the number of trees.
 
template<typename MatType >
 RandomForest (const MatType &dataset, const arma::Row< size_t > &labels, const size_t numClasses, const size_t numTrees=20, const size_t minimumLeafSize=1, const double minimumGainSplit=1e-7, const size_t maximumDepth=0, DimensionSelectionType dimensionSelector=DimensionSelectionType())
 Create a random forest, training on the given labeled training data with the given number of trees.
 
template<typename MatType >
 RandomForest (const MatType &dataset, const data::DatasetInfo &datasetInfo, const arma::Row< size_t > &labels, const size_t numClasses, const size_t numTrees=20, const size_t minimumLeafSize=1, const double minimumGainSplit=1e-7, const size_t maximumDepth=0, DimensionSelectionType dimensionSelector=DimensionSelectionType())
 Create a random forest, training on the given labeled training data with the given dataset info and the given number of trees.
 
template<typename MatType >
 RandomForest (const MatType &dataset, const arma::Row< size_t > &labels, const size_t numClasses, const arma::rowvec &weights, const size_t numTrees=20, const size_t minimumLeafSize=1, const double minimumGainSplit=1e-7, const size_t maximumDepth=0, DimensionSelectionType dimensionSelector=DimensionSelectionType())
 Create a random forest, training on the given weighted labeled training data with the given number of trees.
 
template<typename MatType >
 RandomForest (const MatType &dataset, const data::DatasetInfo &datasetInfo, const arma::Row< size_t > &labels, const size_t numClasses, const arma::rowvec &weights, const size_t numTrees=20, const size_t minimumLeafSize=1, const double minimumGainSplit=1e-7, const size_t maximumDepth=0, DimensionSelectionType dimensionSelector=DimensionSelectionType())
 Create a random forest, training on the given weighted labeled training data with the given dataset info and the given number of trees.
 
template<typename MatType >
double Train (const MatType &data, const arma::Row< size_t > &labels, const size_t numClasses, const size_t numTrees=20, const size_t minimumLeafSize=1, const double minimumGainSplit=1e-7, const size_t maximumDepth=0, const bool warmStart=false, DimensionSelectionType dimensionSelector=DimensionSelectionType())
 Train the random forest on the given labeled training data with the given number of trees.
 
template<typename MatType >
double Train (const MatType &data, const data::DatasetInfo &datasetInfo, const arma::Row< size_t > &labels, const size_t numClasses, const size_t numTrees=20, const size_t minimumLeafSize=1, const double minimumGainSplit=1e-7, const size_t maximumDepth=0, const bool warmStart=false, DimensionSelectionType dimensionSelector=DimensionSelectionType())
 Train the random forest on the given labeled training data with the given dataset info and the given number of trees.
 
template<typename MatType >
double Train (const MatType &data, const arma::Row< size_t > &labels, const size_t numClasses, const arma::rowvec &weights, const size_t numTrees=20, const size_t minimumLeafSize=1, const double minimumGainSplit=1e-7, const size_t maximumDepth=0, const bool warmStart=false, DimensionSelectionType dimensionSelector=DimensionSelectionType())
 Train the random forest on the given weighted labeled training data with the given number of trees.
 
template<typename MatType >
double Train (const MatType &data, const data::DatasetInfo &datasetInfo, const arma::Row< size_t > &labels, const size_t numClasses, const arma::rowvec &weights, const size_t numTrees=20, const size_t minimumLeafSize=1, const double minimumGainSplit=1e-7, const size_t maximumDepth=0, const bool warmStart=false, DimensionSelectionType dimensionSelector=DimensionSelectionType())
 Train the random forest on the given weighted labeled training data with the given dataset info and the given number of trees.
 
template<typename VecType >
size_t Classify (const VecType &point) const
 Predict the class of the given point.
 
template<typename VecType >
void Classify (const VecType &point, size_t &prediction, arma::vec &probabilities) const
 Predict the class of the given point and return the predicted class probabilities for each class.
 
template<typename MatType >
void Classify (const MatType &data, arma::Row< size_t > &predictions) const
 Predict the classes of each point in the given dataset.
 
template<typename MatType >
void Classify (const MatType &data, arma::Row< size_t > &predictions, arma::mat &probabilities) const
 Predict the classes of each point in the given dataset, also returning the predicted class probabilities for each point.
 
const DecisionTreeType & Tree (const size_t i) const
 Access a tree in the forest.
 
DecisionTreeType & Tree (const size_t i)
 Modify a tree in the forest (be careful!).
 
size_t NumTrees () const
 Get the number of trees in the forest.
 
template<typename Archive >
void serialize (Archive &ar, const uint32_t)
 Serialize the random forest.
 

Detailed Description

template<typename FitnessFunction = GiniGain, typename DimensionSelectionType = MultipleRandomDimensionSelect, template< typename > class NumericSplitType = BestBinaryNumericSplit, template< typename > class CategoricalSplitType = AllCategoricalSplit, bool UseBootstrap = true>
class mlpack::tree::RandomForest< FitnessFunction, DimensionSelectionType, NumericSplitType, CategoricalSplitType, UseBootstrap >

The RandomForest class provides an implementation of random forests, described in Breiman's seminal paper:

@article{breiman2001random,
title={Random forests},
author={Breiman, Leo},
journal={Machine Learning},
volume={45},
number={1},
pages={5--32},
year={2001},
publisher={Springer}
}
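The UseBootstrap template parameter defaults to true, which suggests that each tree is trained on a bootstrap sample of the training points, as in Breiman's bagging procedure. The following is a minimal, library-independent sketch of drawing such a sample; the function name BootstrapIndices is hypothetical and this is not mlpack's own code.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not part of mlpack): draw a bootstrap sample of
// numPoints indices, chosen uniformly at random with replacement from
// [0, numPoints).  Some indices will typically repeat and others be absent.
std::vector<std::size_t> BootstrapIndices(const std::size_t numPoints,
                                          std::mt19937& rng)
{
  std::vector<std::size_t> indices;
  if (numPoints == 0)
    return indices; // Nothing to sample from.

  indices.reserve(numPoints);
  std::uniform_int_distribution<std::size_t> pick(0, numPoints - 1);
  for (std::size_t i = 0; i < numPoints; ++i)
    indices.push_back(pick(rng));

  return indices;
}
```

Each tree would then be built from the columns and labels selected by its own index list, so different trees see different resampled views of the same dataset.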

Constructor & Destructor Documentation

◆ RandomForest() [1/5]

template<typename FitnessFunction , typename DimensionSelectionType , template< typename > class NumericSplitType, template< typename > class CategoricalSplitType, bool UseBootstrap>
mlpack::tree::RandomForest< FitnessFunction, DimensionSelectionType, NumericSplitType, CategoricalSplitType, UseBootstrap >::RandomForest ( )

Construct the random forest without any training or specifying the number of trees.

Classify() will throw an exception until Train() is called.
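The behavior described above (classification fails until the forest has been trained) can be illustrated with a small stand-in class. UntrainedGuardForest and its members are hypothetical, and the exception type std::invalid_argument is an assumption made for this sketch rather than something this page states.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a forest (not mlpack code).  Each "tree" is
// reduced to the single class label it would predict.
class UntrainedGuardForest
{
 public:
  // Default construction performs no training, so the forest has no trees.
  UntrainedGuardForest() = default;

  // "Training" here simply stores one label per tree.
  void Train(const std::vector<std::size_t>& treeLabels)
  {
    trees = treeLabels;
  }

  // Refuse to classify while the forest is empty; otherwise return the
  // first tree's label.
  std::size_t Classify() const
  {
    if (trees.empty())
      throw std::invalid_argument("Classify(): no trees have been trained");
    return trees.front();
  }

 private:
  std::vector<std::size_t> trees;
};
```

Callers that construct a forest first and train it later can wrap early classification calls in a try/catch block, or simply make sure Train() has run before classifying.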

◆ RandomForest() [2/5]

template<typename FitnessFunction , typename DimensionSelectionType , template< typename > class NumericSplitType, template< typename > class CategoricalSplitType, bool UseBootstrap>
template<typename MatType >
mlpack::tree::RandomForest< FitnessFunction, DimensionSelectionType, NumericSplitType, CategoricalSplitType, UseBootstrap >::RandomForest ( const MatType &  dataset,
const arma::Row< size_t > &  labels,
const size_t  numClasses,
const size_t  numTrees = 20,
const size_t  minimumLeafSize = 1,
const double  minimumGainSplit = 1e-7,
const size_t  maximumDepth = 0,
DimensionSelectionType  dimensionSelector = DimensionSelectionType() 
)

Create a random forest, training on the given labeled training data with the given number of trees.

The minimumLeafSize and minimumGainSplit parameters are given to each individual decision tree during tree building. Optionally, you may specify a DimensionSelectionType to set parameters for the strategy used to choose dimensions.

Parameters
dataset            Dataset to train on.
labels             Labels for dataset.
numClasses         Number of classes in dataset.
numTrees           Number of trees in the forest.
minimumLeafSize    Minimum number of points in each tree's leaf nodes.
minimumGainSplit   Minimum gain for splitting a decision tree node.
maximumDepth       Maximum depth for the tree.
dimensionSelector  Instantiated dimension selection policy.

◆ RandomForest() [3/5]

template<typename FitnessFunction , typename DimensionSelectionType , template< typename > class NumericSplitType, template< typename > class CategoricalSplitType, bool UseBootstrap>
template<typename MatType >
mlpack::tree::RandomForest< FitnessFunction, DimensionSelectionType, NumericSplitType, CategoricalSplitType, UseBootstrap >::RandomForest ( const MatType &  dataset,
const data::DatasetInfo &  datasetInfo,
const arma::Row< size_t > &  labels,
const size_t  numClasses,
const size_t  numTrees = 20,
const size_t  minimumLeafSize = 1,
const double  minimumGainSplit = 1e-7,
const size_t  maximumDepth = 0,
DimensionSelectionType  dimensionSelector = DimensionSelectionType() 
)

Create a random forest, training on the given labeled training data with the given dataset info and the given number of trees.

The minimumLeafSize and minimumGainSplit parameters are given to each individual decision tree during tree building. Optionally, you may specify a DimensionSelectionType to set parameters for the strategy used to choose dimensions. This constructor can be used to train on categorical data.

Parameters
dataset            Dataset to train on.
datasetInfo        Dimension info for the dataset.
labels             Labels for dataset.
numClasses         Number of classes in dataset.
numTrees           Number of trees in the forest.
minimumLeafSize    Minimum number of points in each tree's leaf nodes.
minimumGainSplit   Minimum gain for splitting a decision tree node.
maximumDepth       Maximum depth for the tree.
dimensionSelector  Instantiated dimension selection policy.

◆ RandomForest() [4/5]

template<typename FitnessFunction , typename DimensionSelectionType , template< typename > class NumericSplitType, template< typename > class CategoricalSplitType, bool UseBootstrap>
template<typename MatType >
mlpack::tree::RandomForest< FitnessFunction, DimensionSelectionType, NumericSplitType, CategoricalSplitType, UseBootstrap >::RandomForest ( const MatType &  dataset,
const arma::Row< size_t > &  labels,
const size_t  numClasses,
const arma::rowvec &  weights,
const size_t  numTrees = 20,
const size_t  minimumLeafSize = 1,
const double  minimumGainSplit = 1e-7,
const size_t  maximumDepth = 0,
DimensionSelectionType  dimensionSelector = DimensionSelectionType() 
)

Create a random forest, training on the given weighted labeled training data with the given number of trees.

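The minimumLeafSize and minimumGainSplit parameters are given to each individual decision tree during tree building. Optionally, you may specify a DimensionSelectionType to set parameters for the strategy used to choose dimensions.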

Parameters
dataset            Dataset to train on.
labels             Labels for dataset.
numClasses         Number of classes in dataset.
weights            Weights (importances) of each point in the dataset.
numTrees           Number of trees in the forest.
minimumLeafSize    Minimum number of points in each tree's leaf nodes.
minimumGainSplit   Minimum gain for splitting a decision tree node.
maximumDepth       Maximum depth for the tree.
dimensionSelector  Instantiated dimension selection policy.
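To make the role of the weights parameter concrete, here is a library-independent sketch of a weighted Gini impurity, where each point contributes its weight rather than a count of one. The default FitnessFunction is GiniGain, but the exact formula mlpack evaluates is not given on this page, so the function WeightedGiniImpurity below is a hypothetical illustration of the idea only.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not mlpack code): Gini impurity 1 - sum_k p_k^2,
// where p_k is the fraction of the total weight carried by class k.
// Assumes labels.size() == weights.size() and every label < numClasses.
double WeightedGiniImpurity(const std::vector<std::size_t>& labels,
                            const std::vector<double>& weights,
                            const std::size_t numClasses)
{
  // Accumulate the weight observed for each class.
  std::vector<double> classWeight(numClasses, 0.0);
  double total = 0.0;
  for (std::size_t i = 0; i < labels.size(); ++i)
  {
    classWeight[labels[i]] += weights[i];
    total += weights[i];
  }

  // An empty or zero-weight node is treated as pure.
  if (total <= 0.0)
    return 0.0;

  double sumSquares = 0.0;
  for (const double w : classWeight)
  {
    const double p = w / total;
    sumSquares += p * p;
  }
  return 1.0 - sumSquares;
}
```

With labels {0, 0, 1, 1} and equal weights the impurity is 0.5; raising the weights of the class-0 points to 3 each shifts the class fractions to 0.75 and 0.25 and lowers the impurity to 0.375, which is how heavier points pull split decisions toward their class.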

◆ RandomForest() [5/5]

template<typename FitnessFunction , typename DimensionSelectionType , template< typename > class NumericSplitType, template< typename > class CategoricalSplitType, bool UseBootstrap>
template<typename MatType >
mlpack::tree::RandomForest< FitnessFunction, DimensionSelectionType, NumericSplitType, CategoricalSplitType, UseBootstrap >::RandomForest ( const MatType &  dataset,
const data::DatasetInfo &  datasetInfo,
const arma::Row< size_t > &  labels,
const size_t  numClasses,
const arma::rowvec &  weights,
const size_t  numTrees = 20,
const size_t  minimumLeafSize = 1,
const double  minimumGainSplit = 1e-7,
const size_t  maximumDepth = 0,
DimensionSelectionType  dimensionSelector = DimensionSelectionType() 
)

Create a random forest, training on the given weighted labeled training data with the given dataset info and the given number of trees.

The minimumLeafSize and minimumGainSplit parameters are given to each individual decision tree during tree building. Optionally, you may specify a DimensionSelectionType to set parameters for the strategy used to choose dimensions. This constructor can be used for weighted training on categorical data.

Parameters
dataset            Dataset to train on.
datasetInfo        Dimension info for the dataset.
labels             Labels for dataset.
numClasses         Number of classes in dataset.
weights            Weights (importances) of each point in the dataset.
numTrees           Number of trees in the forest.
minimumLeafSize    Minimum number of points in each tree's leaf nodes.
minimumGainSplit   Minimum gain for splitting a decision tree node.
maximumDepth       Maximum depth for the tree.
dimensionSelector  Instantiated dimension selection policy.

Member Function Documentation

◆ Classify() [1/4]

template<typename FitnessFunction , typename DimensionSelectionType , template< typename > class NumericSplitType, template< typename > class CategoricalSplitType, bool UseBootstrap>
template<typename VecType >
size_t mlpack::tree::RandomForest< FitnessFunction, DimensionSelectionType, NumericSplitType, CategoricalSplitType, UseBootstrap >::Classify ( const VecType &  point) const

Predict the class of the given point.

If the random forest has not been trained, this will throw an exception.

Parameters
point  Point to be classified.

◆ Classify() [2/4]

template<typename FitnessFunction , typename DimensionSelectionType , template< typename > class NumericSplitType, template< typename > class CategoricalSplitType, bool UseBootstrap>
template<typename VecType >
void mlpack::tree::RandomForest< FitnessFunction, DimensionSelectionType, NumericSplitType, CategoricalSplitType, UseBootstrap >::Classify ( const VecType &  point,
size_t &  prediction,
arma::vec &  probabilities 
) const

Predict the class of the given point and return the predicted class probabilities for each class.

If the random forest has not been trained, this will throw an exception.

Parameters
point          Point to be classified.
prediction     size_t to store predicted class in.
probabilities  Output vector of class probabilities.
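One plausible way for a forest to produce both a prediction and a probability vector is to average the class probabilities reported by each tree and pick the class with the largest average. This page does not state the aggregation rule mlpack uses, so the sketch below is an assumption; AverageAndVote is a hypothetical name and the code is independent of mlpack.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not mlpack code): average the per-tree class
// probability vectors into `probabilities` and return the index of the
// class with the highest average (ties go to the lowest index).
// Assumes treeProbabilities is non-empty and all inner vectors have the
// same length.
std::size_t AverageAndVote(
    const std::vector<std::vector<double>>& treeProbabilities,
    std::vector<double>& probabilities)
{
  const std::size_t numClasses = treeProbabilities.front().size();

  // Sum the probabilities reported by every tree.
  probabilities.assign(numClasses, 0.0);
  for (const std::vector<double>& treeProbs : treeProbabilities)
    for (std::size_t c = 0; c < numClasses; ++c)
      probabilities[c] += treeProbs[c];

  // Turn the sums into averages.
  for (std::size_t c = 0; c < numClasses; ++c)
    probabilities[c] /= static_cast<double>(treeProbabilities.size());

  // Pick the class with the largest averaged probability.
  std::size_t best = 0;
  for (std::size_t c = 1; c < numClasses; ++c)
    if (probabilities[c] > probabilities[best])
      best = c;

  return best;
}
```

If every tree's vector sums to one, the averaged vector does too, so the output can be read directly as the forest's class probabilities for the point.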

◆ Classify() [3/4]

template<typename FitnessFunction , typename DimensionSelectionType , template< typename > class NumericSplitType, template< typename > class CategoricalSplitType, bool UseBootstrap>
template<typename MatType >
void mlpack::tree::RandomForest< FitnessFunction, DimensionSelectionType, NumericSplitType, CategoricalSplitType, UseBootstrap >::Classify ( const MatType &  data,
arma::Row< size_t > &  predictions 
) const

Predict the classes of each point in the given dataset.

If the random forest has not been trained, this will throw an exception.

Parameters
data         Dataset to be classified.
predictions  Output predictions for each point in the dataset.

◆ Classify() [4/4]

template<typename FitnessFunction , typename DimensionSelectionType , template< typename > class NumericSplitType, template< typename > class CategoricalSplitType, bool UseBootstrap>
template<typename MatType >
void mlpack::tree::RandomForest< FitnessFunction, DimensionSelectionType, NumericSplitType, CategoricalSplitType, UseBootstrap >::Classify ( const MatType &  data,
arma::Row< size_t > &  predictions,
arma::mat &  probabilities 
) const

Predict the classes of each point in the given dataset, also returning the predicted class probabilities for each point.

If the random forest has not been trained, this will throw an exception.

Parameters
data           Dataset to be classified.
predictions    Output predictions for each point in the dataset.
probabilities  Output matrix of class probabilities for each point.

◆ Train() [1/4]

template<typename FitnessFunction , typename DimensionSelectionType , template< typename > class NumericSplitType, template< typename > class CategoricalSplitType, bool UseBootstrap>
template<typename MatType >
double mlpack::tree::RandomForest< FitnessFunction, DimensionSelectionType, NumericSplitType, CategoricalSplitType, UseBootstrap >::Train ( const MatType &  data,
const arma::Row< size_t > &  labels,
const size_t  numClasses,
const size_t  numTrees = 20,
const size_t  minimumLeafSize = 1,
const double  minimumGainSplit = 1e-7,
const size_t  maximumDepth = 0,
const bool  warmStart = false,
DimensionSelectionType  dimensionSelector = DimensionSelectionType() 
)

Train the random forest on the given labeled training data with the given number of trees.

The minimumLeafSize and minimumGainSplit parameters are given to each individual decision tree during tree building. Optionally, you may specify a DimensionSelectionType to set parameters for the strategy used to choose dimensions.

Parameters
data               Dataset to train on.
labels             Labels for dataset.
numClasses         Number of classes in dataset.
numTrees           Number of trees in the forest.
minimumLeafSize    Minimum number of points in each tree's leaf nodes.
minimumGainSplit   Minimum gain for splitting a decision tree node.
maximumDepth       Maximum depth for the tree.
warmStart          When true, numTrees new trees are added to the existing forest; otherwise, a new forest is trained from scratch.
dimensionSelector  Instantiated dimension selection policy.
Returns
The average entropy of all the decision trees trained in the forest.
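The warmStart semantics can be sketched with a toy container: with warmStart set, new trees are appended to the ones already present; without it, the existing trees are discarded first. WarmStartForest and ToyTree are hypothetical names, and the integer id stands in for an actual trained tree; this is an illustration of the documented behavior, not mlpack's implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical placeholder for a trained decision tree (not mlpack code).
struct ToyTree
{
  std::size_t id; // Order in which the tree was created.
};

// Hypothetical forest that mimics only the warmStart bookkeeping.
class WarmStartForest
{
 public:
  // Build numTrees new trees.  If warmStart is false, existing trees are
  // dropped first; if true, the new trees are added to the existing ones.
  void Train(const std::size_t numTrees, const bool warmStart = false)
  {
    if (!warmStart)
      trees.clear();

    for (std::size_t i = 0; i < numTrees; ++i)
      trees.push_back(ToyTree{nextId++});
  }

  std::size_t NumTrees() const { return trees.size(); }

  const ToyTree& Tree(const std::size_t i) const { return trees.at(i); }

 private:
  std::vector<ToyTree> trees;
  std::size_t nextId = 0;
};
```

Calling Train(20) and then Train(5, true) on this toy leaves 25 trees, with the original 20 untouched; a later Train(10) without warmStart replaces them all and leaves 10.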

◆ Train() [2/4]

template<typename FitnessFunction , typename DimensionSelectionType , template< typename > class NumericSplitType, template< typename > class CategoricalSplitType, bool UseBootstrap>
template<typename MatType >
double mlpack::tree::RandomForest< FitnessFunction, DimensionSelectionType, NumericSplitType, CategoricalSplitType, UseBootstrap >::Train ( const MatType &  data,
const data::DatasetInfo &  datasetInfo,
const arma::Row< size_t > &  labels,
const size_t  numClasses,
const size_t  numTrees = 20,
const size_t  minimumLeafSize = 1,
const double  minimumGainSplit = 1e-7,
const size_t  maximumDepth = 0,
const bool  warmStart = false,
DimensionSelectionType  dimensionSelector = DimensionSelectionType() 
)

Train the random forest on the given labeled training data with the given dataset info and the given number of trees.

The minimumLeafSize and minimumGainSplit parameters are given to each individual decision tree during tree building. Optionally, you may specify a DimensionSelectionType to set parameters for the strategy used to choose dimensions. This overload can be used to train on categorical data.

Parameters
data               Dataset to train on.
datasetInfo        Dimension info for the dataset.
labels             Labels for dataset.
numClasses         Number of classes in dataset.
numTrees           Number of trees in the forest.
minimumLeafSize    Minimum number of points in each tree's leaf nodes.
minimumGainSplit   Minimum gain for splitting a decision tree node.
maximumDepth       Maximum depth for the tree.
warmStart          When true, numTrees new trees are added to the existing forest; otherwise, a new forest is trained from scratch.
dimensionSelector  Instantiated dimension selection policy.
Returns
The average entropy of all the decision trees trained in the forest.

◆ Train() [3/4]

template<typename FitnessFunction , typename DimensionSelectionType , template< typename > class NumericSplitType, template< typename > class CategoricalSplitType, bool UseBootstrap>
template<typename MatType >
double mlpack::tree::RandomForest< FitnessFunction, DimensionSelectionType, NumericSplitType, CategoricalSplitType, UseBootstrap >::Train ( const MatType &  data,
const arma::Row< size_t > &  labels,
const size_t  numClasses,
const arma::rowvec &  weights,
const size_t  numTrees = 20,
const size_t  minimumLeafSize = 1,
const double  minimumGainSplit = 1e-7,
const size_t  maximumDepth = 0,
const bool  warmStart = false,
DimensionSelectionType  dimensionSelector = DimensionSelectionType() 
)

Train the random forest on the given weighted labeled training data with the given number of trees.

The minimumLeafSize and minimumGainSplit parameters are given to each individual decision tree during tree building. Optionally, you may specify a DimensionSelectionType to set parameters for the strategy used to choose dimensions.

Parameters
data               Dataset to train on.
labels             Labels for dataset.
numClasses         Number of classes in dataset.
weights            Weights (importances) of each point in the dataset.
numTrees           Number of trees in the forest.
minimumLeafSize    Minimum number of points in each tree's leaf nodes.
minimumGainSplit   Minimum gain for splitting a decision tree node.
maximumDepth       Maximum depth for the tree.
warmStart          When true, numTrees new trees are added to the existing forest; otherwise, a new forest is trained from scratch.
dimensionSelector  Instantiated dimension selection policy.
Returns
The average entropy of all the decision trees trained in the forest.

◆ Train() [4/4]

template<typename FitnessFunction , typename DimensionSelectionType , template< typename > class NumericSplitType, template< typename > class CategoricalSplitType, bool UseBootstrap>
template<typename MatType >
double mlpack::tree::RandomForest< FitnessFunction, DimensionSelectionType, NumericSplitType, CategoricalSplitType, UseBootstrap >::Train ( const MatType &  data,
const data::DatasetInfo &  datasetInfo,
const arma::Row< size_t > &  labels,
const size_t  numClasses,
const arma::rowvec &  weights,
const size_t  numTrees = 20,
const size_t  minimumLeafSize = 1,
const double  minimumGainSplit = 1e-7,
const size_t  maximumDepth = 0,
const bool  warmStart = false,
DimensionSelectionType  dimensionSelector = DimensionSelectionType() 
)

Train the random forest on the given weighted labeled training data with the given dataset info and the given number of trees.

The minimumLeafSize and minimumGainSplit parameters are given to each individual decision tree during tree building. Optionally, you may specify a DimensionSelectionType to set parameters for the strategy used to choose dimensions. This overload can be used for weighted training on categorical data.

Parameters
data               Dataset to train on.
datasetInfo        Dimension info for the dataset.
labels             Labels for dataset.
numClasses         Number of classes in dataset.
weights            Weights (importances) of each point in the dataset.
numTrees           Number of trees in the forest.
minimumLeafSize    Minimum number of points in each tree's leaf nodes.
minimumGainSplit   Minimum gain for splitting a decision tree node.
maximumDepth       Maximum depth for the tree.
warmStart          When true, numTrees new trees are added to the existing forest; otherwise, a new forest is trained from scratch.
dimensionSelector  Instantiated dimension selection policy.
Returns
The average entropy of all the decision trees trained in the forest.
