mlpack
|
A hybrid spill tree is a variant of binary space trees in which the children of a node can "spill over" each other, and contain shared datapoints. More...
#include <spill_tree.hpp>
Classes | |
class | SpillDualTreeTraverser |
A generic dual-tree traverser for hybrid spill trees; see spill_dual_tree_traverser.hpp for implementation. More... | |
class | SpillSingleTreeTraverser |
A generic single-tree traverser for hybrid spill trees; see spill_single_tree_traverser.hpp for implementation. More... | |
Public Types | |
typedef MatType | Mat |
So other classes can use TreeType::Mat. | |
typedef MatType::elem_type | ElemType |
The type of element held in MatType. | |
typedef HyperplaneType< MetricType >::BoundType | BoundType |
The bound type. | |
template<typename RuleType > | |
using | SingleTreeTraverser = SpillSingleTreeTraverser< RuleType, false > |
A single-tree traverser for hybrid spill trees. | |
template<typename RuleType > | |
using | DefeatistSingleTreeTraverser = SpillSingleTreeTraverser< RuleType, true > |
A defeatist single-tree traverser for hybrid spill trees. | |
template<typename RuleType > | |
using | DualTreeTraverser = SpillDualTreeTraverser< RuleType, false > |
A dual-tree traverser for hybrid spill trees. | |
template<typename RuleType > | |
using | DefeatistDualTreeTraverser = SpillDualTreeTraverser< RuleType, true > |
A defeatist dual-tree traverser for hybrid spill trees. | |
Public Member Functions | |
SpillTree (const MatType &data, const double tau=0, const size_t maxLeafSize=20, const double rho=0.7) | |
Construct this as the root node of a hybrid spill tree using the given dataset. More... | |
SpillTree (MatType &&data, const double tau=0, const size_t maxLeafSize=20, const double rho=0.7) | |
Construct this as the root node of a hybrid spill tree using the given dataset. More... | |
SpillTree (SpillTree *parent, arma::Col< size_t > &points, const double tau=0, const size_t maxLeafSize=20, const double rho=0.7) | |
Construct this node as a child of the given parent, including the given list of points. More... | |
SpillTree (const SpillTree &other) | |
Create a hybrid spill tree by copying the other tree. More... | |
SpillTree (SpillTree &&other) | |
Move constructor for a SpillTree; possess all the members of the given tree. More... | |
SpillTree & | operator= (const SpillTree &other) |
Copy the given Spill Tree. More... | |
SpillTree & | operator= (SpillTree &&other) |
Take ownership of the given Spill Tree. More... | |
template<typename Archive > | |
SpillTree (Archive &ar, const typename std::enable_if_t< cereal::is_loading< Archive >()> *=0) | |
Initialize the tree from a cereal archive. More... | |
~SpillTree () | |
Deletes this node, deallocating the memory for the children and calling their destructors in turn. More... | |
const BoundType & | Bound () const |
Return the bound object for this node. | |
BoundType & | Bound () |
Return the bound object for this node. | |
const StatisticType & | Stat () const |
Return the statistic object for this node. | |
StatisticType & | Stat () |
Return the statistic object for this node. | |
bool | IsLeaf () const |
Return whether or not this node is a leaf (true if it has no children). | |
SpillTree * | Left () const |
Gets the left child of this node. | |
SpillTree *& | Left () |
Modify the left child of this node. | |
SpillTree * | Right () const |
Gets the right child of this node. | |
SpillTree *& | Right () |
Modify the right child of this node. | |
SpillTree * | Parent () const |
Gets the parent of this node. | |
SpillTree *& | Parent () |
Modify the parent of this node. | |
const MatType & | Dataset () const |
Get the dataset which the tree is built on. | |
bool | Overlap () const |
Distinguish overlapping nodes from non-overlapping nodes. | |
const HyperplaneType< MetricType > & | Hyperplane () const |
Get the Hyperplane instance. | |
MetricType | Metric () const |
Get the metric that the tree uses. | |
size_t | NumChildren () const |
Return the number of children in this node. More... | |
template<typename VecType > | |
size_t | GetNearestChild (const VecType &point, typename std::enable_if_t< IsVector< VecType >::value > *=0) |
Return the index of the nearest child node to the given query point (this is an efficient estimation based on the splitting hyperplane, the node returned is not necessarily the nearest). More... | |
template<typename VecType > | |
size_t | GetFurthestChild (const VecType &point, typename std::enable_if_t< IsVector< VecType >::value > *=0) |
Return the index of the furthest child node to the given query point (this is an efficient estimation based on the splitting hyperplane, the node returned is not necessarily the furthest). More... | |
size_t | GetNearestChild (const SpillTree &queryNode) |
Return the index of the nearest child node to the given query node (this is an efficient estimation based on the splitting hyperplane, the node returned is not necessarily the nearest). More... | |
size_t | GetFurthestChild (const SpillTree &queryNode) |
Return the index of the furthest child node to the given query node (this is an efficient estimation based on the splitting hyperplane, the node returned is not necessarily the furthest). More... | |
ElemType | FurthestPointDistance () const |
Return the furthest distance to a point held in this node. More... | |
ElemType | FurthestDescendantDistance () const |
Return the furthest possible descendant distance. More... | |
ElemType | MinimumBoundDistance () const |
Return the minimum distance from the center of the node to any bound edge. More... | |
ElemType | ParentDistance () const |
Return the distance from the center of this node to the center of the parent node. More... | |
ElemType & | ParentDistance () |
Modify the distance from the center of this node to the center of the parent node. More... | |
SpillTree & | Child (const size_t child) const |
Return the specified child (0 will be left, 1 will be right). More... | |
SpillTree *& | ChildPtr (const size_t child) |
size_t | NumPoints () const |
Return the number of points in this node (0 if not a leaf). More... | |
size_t | NumDescendants () const |
Return the number of descendants of this node. More... | |
size_t | Descendant (const size_t index) const |
Return the index (with reference to the dataset) of a particular descendant of this node. More... | |
size_t | Point (const size_t index) const |
Return the index (with reference to the dataset) of a particular point in this node. More... | |
ElemType | MinDistance (const SpillTree &other) const |
Return the minimum distance to another node. | |
ElemType | MaxDistance (const SpillTree &other) const |
Return the maximum distance to another node. | |
math::RangeType< ElemType > | RangeDistance (const SpillTree &other) const |
Return the minimum and maximum distance to another node. | |
template<typename VecType > | |
ElemType | MinDistance (const VecType &point, typename std::enable_if_t< IsVector< VecType >::value > *=0) const |
Return the minimum distance to another point. | |
template<typename VecType > | |
ElemType | MaxDistance (const VecType &point, typename std::enable_if_t< IsVector< VecType >::value > *=0) const |
Return the maximum distance to another point. | |
template<typename VecType > | |
math::RangeType< ElemType > | RangeDistance (const VecType &point, typename std::enable_if_t< IsVector< VecType >::value > *=0) const |
Return the minimum and maximum distance to another point. | |
void | Center (arma::vec ¢er) |
Store the center of the bounding region in the given vector. | |
template<typename Archive > | |
void | serialize (Archive &ar, const uint32_t version) |
Serialize the tree. | |
Static Public Member Functions | |
static bool | HasSelfChildren () |
Returns false: this tree type does not have self children. | |
Protected Member Functions | |
SpillTree () | |
A default constructor. More... | |
Friends | |
class | cereal::access |
Friend access is given for the default constructor. | |
A hybrid spill tree is a variant of binary space trees in which the children of a node can "spill over" each other, and contain shared datapoints.
Two new separating planes lplane and rplane are defined, both of which are parallel to the original decision boundary and at a distance tau from it. The region between lplane and rplane is called "overlapping buffer".
For each node, we first split the points considering the overlapping buffer. If either of its children contains more than rho fraction of the total points we undo the overlapping splitting. Instead a conventional partition is used. In this way, we can ensure that each split reduces the number of points of a node by at least a constant factor.
This particular tree does not allow growth, so you cannot add or delete nodes from it. If you need to add or delete a node, the better procedure is to rebuild the tree entirely.
Three runtime parameters are required in the constructor:
For more information on spill trees, see
MetricType | The metric used for tree-building. |
StatisticType | Extra data contained in the node. See statistic.hpp for the necessary skeleton interface. |
MatType | The dataset class. |
HyperplaneType | The splitting hyperplane class. |
SplitType | The class that partitions the dataset/points at a particular node into two parts. Its definition decides the way this split is done. |
mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::SpillTree | ( | const MatType & | data, |
const double | tau = 0 , |
||
const size_t | maxLeafSize = 20 , |
||
const double | rho = 0.7 |
||
) |
Construct this as the root node of a hybrid spill tree using the given dataset.
The dataset will not be modified during the building procedure (unlike BinarySpaceTree).
data | Dataset to create tree from. |
tau | Overlapping size. |
maxLeafSize | Size of each leaf in the tree. |
rho | Balance threshold. |
mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::SpillTree | ( | MatType && | data, |
const double | tau = 0 , |
||
const size_t | maxLeafSize = 20 , |
||
const double | rho = 0.7 |
||
) |
Construct this as the root node of a hybrid spill tree using the given dataset.
This will take ownership of the data matrix; if you don't want this, consider using the constructor that takes a const reference to a dataset.
data | Dataset to create tree from. |
tau | Overlapping size. |
maxLeafSize | Size of each leaf in the tree. |
rho | Balance threshold. |
mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::SpillTree | ( | SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > * | parent, |
arma::Col< size_t > & | points, | ||
const double | tau = 0 , |
||
const size_t | maxLeafSize = 20 , |
||
const double | rho = 0.7 |
||
) |
Construct this node as a child of the given parent, including the given list of points.
This is used for recursive tree-building by the other constructors which don't specify point indices.
parent | Parent of this node. |
points | Vector of indexes of points to be included in this node. |
tau | Overlapping size. |
maxLeafSize | Size of each leaf in the tree. |
rho | Balance threshold. |
mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::SpillTree | ( | const SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > & | other | ) |
Create a hybrid spill tree by copying the other tree.
Be careful! This can take a long time and use a lot of memory.
other | tree to be replicated. |
Be careful! This can take a long time and use a lot of memory.
mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::SpillTree | ( | SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > && | other | ) |
Move constructor for a SpillTree; possess all the members of the given tree.
Move constructor.
other | tree to be moved. |
mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::SpillTree | ( | Archive & | ar, |
const typename std::enable_if_t< cereal::is_loading< Archive >()> * | = 0 |
||
) |
Initialize the tree from a cereal archive.
Initialize the tree from an archive.
ar | Archive to load tree from. Must be an iarchive, not an oarchive. |
mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::~SpillTree | ( | ) |
Deletes this node, deallocating the memory for the children and calling their destructors in turn.
This will invalidate any pointers or references to any nodes which are children of this one.
|
protected |
A default constructor.
This is meant to only be used with cereal, which is allowed with the friend declaration below. This does not return a valid tree! The method must be protected, so that the serialization shim can work with the default constructor.
|
inline |
Return the specified child (0 will be left, 1 will be right).
Return the specified child.
If the index is greater than 1, this will return the right child.
child | Index of child to return. |
|
inline |
Return the index (with reference to the dataset) of a particular descendant of this node.
Return the index of a particular descendant contained in this node.
The index should be greater than zero but less than the number of descendants.
index | Index of the descendant. |
|
inline |
Return the furthest possible descendant distance.
This returns the maximum distance from the centroid to the edge of the bound and not the empirical quantity which is the actual furthest descendant distance. So the actual furthest descendant distance may be less than what this method returns (but it will never be greater than this).
This returns the maximum distance from the center to the edge of the bound and not the empirical quantity which is the actual furthest descendant distance. So the actual furthest descendant distance may be less than what this method returns (but it will never be greater than this).
|
inline |
Return the furthest distance to a point held in this node.
Return a bound on the furthest point in the node from the center.
If this is not a leaf node, then the distance is 0 because the node holds no points.
This returns 0 unless the node is a leaf.
size_t mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::GetFurthestChild | ( | const VecType & | point, |
typename std::enable_if_t< IsVector< VecType >::value > * | = 0 |
||
) |
Return the index of the furthest child node to the given query point (this is an efficient estimation based on the splitting hyperplane, the node returned is not necessarily the furthest).
If this is a leaf node, it will return NumChildren() (invalid index).
size_t mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::GetFurthestChild | ( | const SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > & | queryNode | ) |
Return the index of the furthest child node to the given query node (this is an efficient estimation based on the splitting hyperplane, the node returned is not necessarily the furthest).
Return the index of the furthest child node to the given query point (this is an efficient estimation based on the splitting hyperplane, the node returned is not necessarily the furthest).
If it can't decide it will return NumChildren() (invalid index).
If this is a leaf node, it will return NumChildren() (invalid index).
size_t mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::GetNearestChild | ( | const VecType & | point, |
typename std::enable_if_t< IsVector< VecType >::value > * | = 0 |
||
) |
Return the index of the nearest child node to the given query point (this is an efficient estimation based on the splitting hyperplane, the node returned is not necessarily the nearest).
If this is a leaf node, it will return NumChildren() (invalid index).
size_t mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::GetNearestChild | ( | const SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > & | queryNode | ) |
Return the index of the nearest child node to the given query node (this is an efficient estimation based on the splitting hyperplane, the node returned is not necessarily the nearest).
If it can't decide it will return NumChildren() (invalid index).
|
inline |
Return the minimum distance from the center of the node to any bound edge.
Return the minimum distance from the center to any bound edge.
|
inline |
Return the number of children in this node.
Returns the number of children in this node.
|
inline |
Return the number of descendants of this node.
Return the number of descendants contained in the node.
For a non-leaf spill tree, this is the number of points at the descendant leaves. For a leaf, this is the number of points in the leaf.
|
inline |
Return the number of points in this node (0 if not a leaf).
Return the number of points contained in this node.
SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > & mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::operator= | ( | const SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > & | other | ) |
Copy the given Spill Tree.
Copy assignment operator: copy the given other tree.
other | The tree to be copied. |
SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > & mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::operator= | ( | SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > && | other | ) |
Take ownership of the given Spill Tree.
Move assignment operator: take ownership of the given tree.
other | The tree to take ownership of. |
|
inline |
Return the distance from the center of this node to the center of the parent node.
|
inline |
Modify the distance from the center of this node to the center of the parent node.
|
inline |
Return the index (with reference to the dataset) of a particular point in this node.
Return the index of a particular point contained in this node.
This will happily return invalid indices if the given index is greater than the number of points in this node (obtained with NumPoints()) – be careful.
index | Index of point for which a dataset index is wanted. |