mlpack
Classes | Public Types | Public Member Functions | Static Public Member Functions | Protected Member Functions | Friends | List of all members
mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > Class Template Reference

A hybrid spill tree is a variant of binary space trees in which the children of a node can "spill over" each other, and contain shared datapoints. More...

#include <spill_tree.hpp>

Classes

class  SpillDualTreeTraverser
 A generic dual-tree traverser for hybrid spill trees; see spill_dual_tree_traverser.hpp for implementation. More...
 
class  SpillSingleTreeTraverser
 A generic single-tree traverser for hybrid spill trees; see spill_single_tree_traverser.hpp for implementation. More...
 

Public Types

typedef MatType Mat
 So other classes can use TreeType::Mat.
 
typedef MatType::elem_type ElemType
 The type of element held in MatType.
 
typedef HyperplaneType< MetricType >::BoundType BoundType
 The bound type.
 
template<typename RuleType >
using SingleTreeTraverser = SpillSingleTreeTraverser< RuleType, false >
 A single-tree traverser for hybrid spill trees.
 
template<typename RuleType >
using DefeatistSingleTreeTraverser = SpillSingleTreeTraverser< RuleType, true >
 A defeatist single-tree traverser for hybrid spill trees.
 
template<typename RuleType >
using DualTreeTraverser = SpillDualTreeTraverser< RuleType, false >
 A dual-tree traverser for hybrid spill trees.
 
template<typename RuleType >
using DefeatistDualTreeTraverser = SpillDualTreeTraverser< RuleType, true >
 A defeatist dual-tree traverser for hybrid spill trees.
 

Public Member Functions

 SpillTree (const MatType &data, const double tau=0, const size_t maxLeafSize=20, const double rho=0.7)
 Construct this as the root node of a hybrid spill tree using the given dataset. More...
 
 SpillTree (MatType &&data, const double tau=0, const size_t maxLeafSize=20, const double rho=0.7)
 Construct this as the root node of a hybrid spill tree using the given dataset. More...
 
 SpillTree (SpillTree *parent, arma::Col< size_t > &points, const double tau=0, const size_t maxLeafSize=20, const double rho=0.7)
 Construct this node as a child of the given parent, including the given list of points. More...
 
 SpillTree (const SpillTree &other)
 Create a hybrid spill tree by copying the other tree. More...
 
 SpillTree (SpillTree &&other)
 Move constructor for a SpillTree; possess all the members of the given tree. More...
 
SpillTreeoperator= (const SpillTree &other)
 Copy the given Spill Tree. More...
 
SpillTreeoperator= (SpillTree &&other)
 Take ownership of the given Spill Tree. More...
 
template<typename Archive >
 SpillTree (Archive &ar, const typename std::enable_if_t< cereal::is_loading< Archive >()> *=0)
 Initialize the tree from a cereal archive. More...
 
 ~SpillTree ()
 Deletes this node, deallocating the memory for the children and calling their destructors in turn. More...
 
const BoundTypeBound () const
 Return the bound object for this node.
 
BoundTypeBound ()
 Return the bound object for this node.
 
const StatisticType & Stat () const
 Return the statistic object for this node.
 
StatisticType & Stat ()
 Return the statistic object for this node.
 
bool IsLeaf () const
 Return whether or not this node is a leaf (true if it has no children).
 
SpillTreeLeft () const
 Gets the left child of this node.
 
SpillTree *& Left ()
 Modify the left child of this node.
 
SpillTreeRight () const
 Gets the right child of this node.
 
SpillTree *& Right ()
 Modify the right child of this node.
 
SpillTreeParent () const
 Gets the parent of this node.
 
SpillTree *& Parent ()
 Modify the parent of this node.
 
const MatType & Dataset () const
 Get the dataset which the tree is built on.
 
bool Overlap () const
 Distinguish overlapping nodes from non-overlapping nodes.
 
const HyperplaneType< MetricType > & Hyperplane () const
 Get the Hyperplane instance.
 
MetricType Metric () const
 Get the metric that the tree uses.
 
size_t NumChildren () const
 Return the number of children in this node. More...
 
template<typename VecType >
size_t GetNearestChild (const VecType &point, typename std::enable_if_t< IsVector< VecType >::value > *=0)
 Return the index of the nearest child node to the given query point (this is an efficient estimation based on the splitting hyperplane, the node returned is not necessarily the nearest). More...
 
template<typename VecType >
size_t GetFurthestChild (const VecType &point, typename std::enable_if_t< IsVector< VecType >::value > *=0)
 Return the index of the furthest child node to the given query point (this is an efficient estimation based on the splitting hyperplane, the node returned is not necessarily the furthest). More...
 
size_t GetNearestChild (const SpillTree &queryNode)
 Return the index of the nearest child node to the given query node (this is an efficient estimation based on the splitting hyperplane, the node returned is not necessarily the nearest). More...
 
size_t GetFurthestChild (const SpillTree &queryNode)
 Return the index of the furthest child node to the given query node (this is an efficient estimation based on the splitting hyperplane, the node returned is not necessarily the furthest). More...
 
ElemType FurthestPointDistance () const
 Return the furthest distance to a point held in this node. More...
 
ElemType FurthestDescendantDistance () const
 Return the furthest possible descendant distance. More...
 
ElemType MinimumBoundDistance () const
 Return the minimum distance from the center of the node to any bound edge. More...
 
ElemType ParentDistance () const
 Return the distance from the center of this node to the center of the parent node. More...
 
ElemTypeParentDistance ()
 Modify the distance from the center of this node to the center of the parent node. More...
 
SpillTreeChild (const size_t child) const
 Return the specified child (0 will be left, 1 will be right). More...
 
SpillTree *& ChildPtr (const size_t child)
 
size_t NumPoints () const
 Return the number of points in this node (0 if not a leaf). More...
 
size_t NumDescendants () const
 Return the number of descendants of this node. More...
 
size_t Descendant (const size_t index) const
 Return the index (with reference to the dataset) of a particular descendant of this node. More...
 
size_t Point (const size_t index) const
 Return the index (with reference to the dataset) of a particular point in this node. More...
 
ElemType MinDistance (const SpillTree &other) const
 Return the minimum distance to another node.
 
ElemType MaxDistance (const SpillTree &other) const
 Return the maximum distance to another node.
 
math::RangeType< ElemTypeRangeDistance (const SpillTree &other) const
 Return the minimum and maximum distance to another node.
 
template<typename VecType >
ElemType MinDistance (const VecType &point, typename std::enable_if_t< IsVector< VecType >::value > *=0) const
 Return the minimum distance to another point.
 
template<typename VecType >
ElemType MaxDistance (const VecType &point, typename std::enable_if_t< IsVector< VecType >::value > *=0) const
 Return the maximum distance to another point.
 
template<typename VecType >
math::RangeType< ElemTypeRangeDistance (const VecType &point, typename std::enable_if_t< IsVector< VecType >::value > *=0) const
 Return the minimum and maximum distance to another point.
 
void Center (arma::vec &center)
 Store the center of the bounding region in the given vector.
 
template<typename Archive >
void serialize (Archive &ar, const uint32_t version)
 Serialize the tree.
 

Static Public Member Functions

static bool HasSelfChildren ()
 Returns false: this tree type does not have self children.
 

Protected Member Functions

 SpillTree ()
 A default constructor. More...
 

Friends

class cereal::access
 Friend access is given for the default constructor.
 

Detailed Description

template<typename MetricType, typename StatisticType = EmptyStatistic, typename MatType = arma::mat, template< typename HyperplaneMetricType > class HyperplaneType = AxisOrthogonalHyperplane, template< typename SplitMetricType, typename SplitMatType > class SplitType = MidpointSpaceSplit>
class mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >

A hybrid spill tree is a variant of binary space trees in which the children of a node can "spill over" each other, and contain shared datapoints.

Two new separating planes lplane and rplane are defined, both of which are parallel to the original decision boundary and at a distance tau from it. The region between lplane and rplane is called "overlapping buffer".

For each node, we first split the points considering the overlapping buffer. If either of its children contains more than rho fraction of the total points we undo the overlapping splitting. Instead a conventional partition is used. In this way, we can ensure that each split reduces the number of points of a node by at least a constant factor.

This particular tree does not allow growth, so you cannot add or delete nodes from it. If you need to add or delete a node, the better procedure is to rebuild the tree entirely.

Three runtime parameters are required in the constructor:

For more information on spill trees, see

@inproceedings{
author = {Ting Liu, Andrew W. Moore, Alexander Gray and Ke Yang},
title = {An Investigation of Practical Approximate Nearest Neighbor
Algorithms},
booktitle = {Advances in Neural Information Processing Systems 17},
year = {2005},
pages = {825--832}
}
Template Parameters
MetricTypeThe metric used for tree-building.
StatisticTypeExtra data contained in the node. See statistic.hpp for the necessary skeleton interface.
MatTypeThe dataset class.
HyperplaneTypeThe splitting hyperplane class.
SplitTypeThe class that partitions the dataset/points at a particular node into two parts. Its definition decides the way this split is done.

Constructor & Destructor Documentation

◆ SpillTree() [1/7]

template<typename MetricType , typename StatisticType , typename MatType, template< typename HyperplaneMetricType > class HyperplaneType, template< typename SplitMetricType, typename SplitMatType > class SplitType>
mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::SpillTree ( const MatType &  data,
const double  tau = 0,
const size_t  maxLeafSize = 20,
const double  rho = 0.7 
)

Construct this as the root node of a hybrid spill tree using the given dataset.

The dataset will not be modified during the building procedure (unlike BinarySpaceTree).

Parameters
dataDataset to create tree from.
tauOverlapping size.
maxLeafSizeSize of each leaf in the tree.
rhoBalance threshold.

◆ SpillTree() [2/7]

template<typename MetricType , typename StatisticType , typename MatType, template< typename HyperplaneMetricType > class HyperplaneType, template< typename SplitMetricType, typename SplitMatType > class SplitType>
mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::SpillTree ( MatType &&  data,
const double  tau = 0,
const size_t  maxLeafSize = 20,
const double  rho = 0.7 
)

Construct this as the root node of a hybrid spill tree using the given dataset.

This will take ownership of the data matrix; if you don't want this, consider using the constructor that takes a const reference to a dataset.

Parameters
dataDataset to create tree from.
tauOverlapping size.
maxLeafSizeSize of each leaf in the tree.
rhoBalance threshold.

◆ SpillTree() [3/7]

template<typename MetricType , typename StatisticType , typename MatType, template< typename HyperplaneMetricType > class HyperplaneType, template< typename SplitMetricType, typename SplitMatType > class SplitType>
mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::SpillTree ( SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > *  parent,
arma::Col< size_t > &  points,
const double  tau = 0,
const size_t  maxLeafSize = 20,
const double  rho = 0.7 
)

Construct this node as a child of the given parent, including the given list of points.

This is used for recursive tree-building by the other constructors which don't specify point indices.

Parameters
parentParent of this node.
pointsVector of indexes of points to be included in this node.
tauOverlapping size.
maxLeafSizeSize of each leaf in the tree.
rhoBalance threshold.

◆ SpillTree() [4/7]

template<typename MetricType , typename StatisticType , typename MatType, template< typename HyperplaneMetricType > class HyperplaneType, template< typename SplitMetricType, typename SplitMatType > class SplitType>
mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::SpillTree ( const SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > &  other)

Create a hybrid spill tree by copying the other tree.

Be careful! This can take a long time and use a lot of memory.

Parameters
othertree to be replicated.

Be careful! This can take a long time and use a lot of memory.

◆ SpillTree() [5/7]

template<typename MetricType , typename StatisticType , typename MatType, template< typename HyperplaneMetricType > class HyperplaneType, template< typename SplitMetricType, typename SplitMatType > class SplitType>
mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::SpillTree ( SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > &&  other)

Move constructor for a SpillTree; possess all the members of the given tree.

Move constructor.

Parameters
othertree to be moved.

◆ SpillTree() [6/7]

template<typename MetricType , typename StatisticType , typename MatType , template< typename HyperplaneMetricType > class HyperplaneType, template< typename SplitMetricType, typename SplitMatType > class SplitType>
template<typename Archive >
mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::SpillTree ( Archive &  ar,
const typename std::enable_if_t< cereal::is_loading< Archive >()> *  = 0 
)

Initialize the tree from a cereal archive.

Initialize the tree from an archive.

Parameters
arArchive to load tree from. Must be an iarchive, not an oarchive.

◆ ~SpillTree()

template<typename MetricType , typename StatisticType , typename MatType , template< typename HyperplaneMetricType > class HyperplaneType, template< typename SplitMetricType, typename SplitMatType > class SplitType>
mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::~SpillTree ( )

Deletes this node, deallocating the memory for the children and calling their destructors in turn.

This will invalidate any pointers or references to any nodes which are children of this one.

◆ SpillTree() [7/7]

template<typename MetricType , typename StatisticType , typename MatType, template< typename HyperplaneMetricType > class HyperplaneType, template< typename SplitMetricType, typename SplitMatType > class SplitType>
mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::SpillTree ( )
protected

A default constructor.

This is meant to only be used with cereal, which is allowed with the friend declaration below. This does not return a valid tree! The method must be protected, so that the serialization shim can work with the default constructor.

Member Function Documentation

◆ Child()

template<typename MetricType , typename StatisticType , typename MatType , template< typename HyperplaneMetricType > class HyperplaneType, template< typename SplitMetricType, typename SplitMatType > class SplitType>
SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > & mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::Child ( const size_t  child) const
inline

Return the specified child (0 will be left, 1 will be right).

Return the specified child.

If the index is greater than 1, this will return the right child.

Parameters
childIndex of child to return.

◆ Descendant()

template<typename MetricType , typename StatisticType , typename MatType , template< typename HyperplaneMetricType > class HyperplaneType, template< typename SplitMetricType, typename SplitMatType > class SplitType>
size_t mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::Descendant ( const size_t  index) const
inline

Return the index (with reference to the dataset) of a particular descendant of this node.

Return the index of a particular descendant contained in this node.

The index should be greater than zero but less than the number of descendants.

Parameters
indexIndex of the descendant.

◆ FurthestDescendantDistance()

template<typename MetricType , typename StatisticType , typename MatType , template< typename HyperplaneMetricType > class HyperplaneType, template< typename SplitMetricType, typename SplitMatType > class SplitType>
SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::ElemType mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::FurthestDescendantDistance ( ) const
inline

Return the furthest possible descendant distance.

This returns the maximum distance from the centroid to the edge of the bound and not the empirical quantity which is the actual furthest descendant distance. So the actual furthest descendant distance may be less than what this method returns (but it will never be greater than this).

This returns the maximum distance from the center to the edge of the bound and not the empirical quantity which is the actual furthest descendant distance. So the actual furthest descendant distance may be less than what this method returns (but it will never be greater than this).

◆ FurthestPointDistance()

template<typename MetricType , typename StatisticType , typename MatType , template< typename HyperplaneMetricType > class HyperplaneType, template< typename SplitMetricType, typename SplitMatType > class SplitType>
SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::ElemType mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::FurthestPointDistance ( ) const
inline

Return the furthest distance to a point held in this node.

Return a bound on the furthest point in the node from the center.

If this is not a leaf node, then the distance is 0 because the node holds no points.

This returns 0 unless the node is a leaf.

◆ GetFurthestChild() [1/2]

template<typename MetricType , typename StatisticType , typename MatType , template< typename HyperplaneMetricType > class HyperplaneType, template< typename SplitMetricType, typename SplitMatType > class SplitType>
template<typename VecType >
size_t mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::GetFurthestChild ( const VecType &  point,
typename std::enable_if_t< IsVector< VecType >::value > *  = 0 
)

Return the index of the furthest child node to the given query point (this is an efficient estimation based on the splitting hyperplane, the node returned is not necessarily the furthest).

If this is a leaf node, it will return NumChildren() (invalid index).

◆ GetFurthestChild() [2/2]

template<typename MetricType , typename StatisticType , typename MatType , template< typename HyperplaneMetricType > class HyperplaneType, template< typename SplitMetricType, typename SplitMatType > class SplitType>
size_t mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::GetFurthestChild ( const SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > &  queryNode)

Return the index of the furthest child node to the given query node (this is an efficient estimation based on the splitting hyperplane, the node returned is not necessarily the furthest).

Return the index of the furthest child node to the given query point (this is an efficient estimation based on the splitting hyperplane, the node returned is not necessarily the furthest).

If it can't decide it will return NumChildren() (invalid index).

If this is a leaf node, it will return NumChildren() (invalid index).

◆ GetNearestChild() [1/2]

template<typename MetricType , typename StatisticType , typename MatType , template< typename HyperplaneMetricType > class HyperplaneType, template< typename SplitMetricType, typename SplitMatType > class SplitType>
template<typename VecType >
size_t mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::GetNearestChild ( const VecType &  point,
typename std::enable_if_t< IsVector< VecType >::value > *  = 0 
)

Return the index of the nearest child node to the given query point (this is an efficient estimation based on the splitting hyperplane, the node returned is not necessarily the nearest).

If this is a leaf node, it will return NumChildren() (invalid index).

◆ GetNearestChild() [2/2]

template<typename MetricType , typename StatisticType , typename MatType , template< typename HyperplaneMetricType > class HyperplaneType, template< typename SplitMetricType, typename SplitMatType > class SplitType>
size_t mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::GetNearestChild ( const SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > &  queryNode)

Return the index of the nearest child node to the given query node (this is an efficient estimation based on the splitting hyperplane, the node returned is not necessarily the nearest).

If it can't decide it will return NumChildren() (invalid index).

◆ MinimumBoundDistance()

template<typename MetricType , typename StatisticType , typename MatType , template< typename HyperplaneMetricType > class HyperplaneType, template< typename SplitMetricType, typename SplitMatType > class SplitType>
SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::ElemType mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::MinimumBoundDistance ( ) const
inline

Return the minimum distance from the center of the node to any bound edge.

Return the minimum distance from the center to any bound edge.

◆ NumChildren()

template<typename MetricType , typename StatisticType , typename MatType , template< typename HyperplaneMetricType > class HyperplaneType, template< typename SplitMetricType, typename SplitMatType > class SplitType>
size_t mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::NumChildren ( ) const
inline

Return the number of children in this node.

Returns the number of children in this node.

◆ NumDescendants()

template<typename MetricType , typename StatisticType , typename MatType , template< typename HyperplaneMetricType > class HyperplaneType, template< typename SplitMetricType, typename SplitMatType > class SplitType>
size_t mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::NumDescendants ( ) const
inline

Return the number of descendants of this node.

Return the number of descendants contained in the node.

For a non-leaf spill tree, this is the number of points at the descendant leaves. For a leaf, this is the number of points in the leaf.

◆ NumPoints()

template<typename MetricType , typename StatisticType , typename MatType , template< typename HyperplaneMetricType > class HyperplaneType, template< typename SplitMetricType, typename SplitMatType > class SplitType>
size_t mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::NumPoints ( ) const
inline

Return the number of points in this node (0 if not a leaf).

Return the number of points contained in this node.

◆ operator=() [1/2]

template<typename MetricType , typename StatisticType , typename MatType , template< typename HyperplaneMetricType > class HyperplaneType, template< typename SplitMetricType, typename SplitMatType > class SplitType>
SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > & mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::operator= ( const SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > &  other)

Copy the given Spill Tree.

Copy assignment operator: copy the given other tree.

Parameters
otherThe tree to be copied.

◆ operator=() [2/2]

template<typename MetricType , typename StatisticType , typename MatType , template< typename HyperplaneMetricType > class HyperplaneType, template< typename SplitMetricType, typename SplitMatType > class SplitType>
SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > & mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::operator= ( SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType > &&  other)

Take ownership of the given Spill Tree.

Move assignment operator: take ownership of the given tree.

Parameters
otherThe tree to take ownership of.

◆ ParentDistance() [1/2]

template<typename MetricType, typename StatisticType = EmptyStatistic, typename MatType = arma::mat, template< typename HyperplaneMetricType > class HyperplaneType = AxisOrthogonalHyperplane, template< typename SplitMetricType, typename SplitMatType > class SplitType = MidpointSpaceSplit>
ElemType mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::ParentDistance ( ) const
inline

Return the distance from the center of this node to the center of the parent node.

◆ ParentDistance() [2/2]

template<typename MetricType, typename StatisticType = EmptyStatistic, typename MatType = arma::mat, template< typename HyperplaneMetricType > class HyperplaneType = AxisOrthogonalHyperplane, template< typename SplitMetricType, typename SplitMatType > class SplitType = MidpointSpaceSplit>
ElemType& mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::ParentDistance ( )
inline

Modify the distance from the center of this node to the center of the parent node.

◆ Point()

template<typename MetricType , typename StatisticType , typename MatType , template< typename HyperplaneMetricType > class HyperplaneType, template< typename SplitMetricType, typename SplitMatType > class SplitType>
size_t mlpack::tree::SpillTree< MetricType, StatisticType, MatType, HyperplaneType, SplitType >::Point ( const size_t  index) const
inline

Return the index (with reference to the dataset) of a particular point in this node.

Return the index of a particular point contained in this node.

This will happily return invalid indices if the given index is greater than the number of points in this node (obtained with NumPoints()) – be careful.

Parameters
indexIndex of point for which a dataset index is wanted.

The documentation for this class was generated from the following files: