mlpack
mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType > Class Template Reference

The RASearch class provides a generic way to perform rank-approximate search via random sampling.

#include <ra_search.hpp>

Public Types

typedef TreeType< MetricType, RAQueryStat< SortPolicy >, MatType > Tree
 Convenience typedef.
 

Public Member Functions

 RASearch (MatType referenceSet, const bool naive=false, const bool singleMode=false, const double tau=5, const double alpha=0.95, const bool sampleAtLeaves=false, const bool firstLeafExact=false, const size_t singleSampleLimit=20, const MetricType metric=MetricType())
 Initialize the RASearch object, passing a reference dataset (this is the dataset that will be searched).
 
 RASearch (Tree *referenceTree, const bool singleMode=false, const double tau=5, const double alpha=0.95, const bool sampleAtLeaves=false, const bool firstLeafExact=false, const size_t singleSampleLimit=20, const MetricType metric=MetricType())
 Initialize the RASearch object with the given pre-constructed reference tree.
 
 RASearch (const bool naive=false, const bool singleMode=false, const double tau=5, const double alpha=0.95, const bool sampleAtLeaves=false, const bool firstLeafExact=false, const size_t singleSampleLimit=20, const MetricType metric=MetricType())
 Create an RASearch object with no reference data.
 
 ~RASearch ()
 Delete the RASearch object.
 
void Train (MatType referenceSet)
 "Train" the model on the given reference set.
 
void Train (Tree *referenceTree)
 Set the reference tree to a new reference tree.
 
void Search (const MatType &querySet, const size_t k, arma::Mat< size_t > &neighbors, arma::mat &distances)
 Compute the rank approximate nearest neighbors of each query point in the query set and store the output in the given matrices.
 
void Search (Tree *queryTree, const size_t k, arma::Mat< size_t > &neighbors, arma::mat &distances)
 Compute the rank approximate nearest neighbors of each point in the pre-built query tree and store the output in the given matrices.
 
void Search (const size_t k, arma::Mat< size_t > &neighbors, arma::mat &distances)
 Compute the rank approximate nearest neighbors of each point in the reference set (that is, the query set is taken to be the reference set), and store the output in the given matrices.
 
void ResetQueryTree (Tree *queryTree) const
 This function recursively resets the RAQueryStat of the given query tree to set 'bound' to SortPolicy::WorstDistance and 'numSamplesMade' to 0.
 
const MatType & ReferenceSet () const
 Access the reference set.
 
bool Naive () const
 Get whether or not naive (brute-force) search is used.
 
bool & Naive ()
 Modify whether or not naive (brute-force) search is used.
 
bool SingleMode () const
 Get whether or not single-tree search is used.
 
bool & SingleMode ()
 Modify whether or not single-tree search is used.
 
double Tau () const
 Get the rank-approximation parameter, as a percentile of the data.
 
double & Tau ()
 Modify the rank-approximation parameter, as a percentile of the data.
 
double Alpha () const
 Get the desired success probability.
 
double & Alpha ()
 Modify the desired success probability.
 
bool SampleAtLeaves () const
 Get whether or not sampling is done at the leaves.
 
bool & SampleAtLeaves ()
 Modify whether or not sampling is done at the leaves.
 
bool FirstLeafExact () const
 Get whether or not we traverse to the first leaf without approximation.
 
bool & FirstLeafExact ()
 Modify whether or not we traverse to the first leaf without approximation.
 
size_t SingleSampleLimit () const
 Get the limit on the size of a node that can be approximated.
 
size_t & SingleSampleLimit ()
 Modify the limit on the size of a node that can be approximated.
 
template<typename Archive >
void serialize (Archive &ar, const uint32_t)
 Serialize the object.
 

Friends

class LeafSizeRAWrapper< TreeType >
 For access to mappings when building models.
 

Detailed Description

template<typename SortPolicy = NearestNeighborSort, typename MetricType = metric::EuclideanDistance, typename MatType = arma::mat, template< typename TreeMetricType, typename TreeStatType, typename TreeMatType > class TreeType = tree::KDTree>
class mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >

The RASearch class provides a generic way to perform rank-approximate search via random sampling.

If the 'naive' option is chosen, this rank-approximate search will be done by randomly sampling from the whole set. If the 'naive' option is not chosen, the sampling is done in a stratified manner in the tree as mentioned in the algorithms in Figure 2 of the following paper:

@inproceedings{ram2009rank,
title={{Rank-Approximate Nearest Neighbor Search: Retaining Meaning and
Speed in High Dimensions}},
author={{Ram, P. and Lee, D. and Ouyang, H. and Gray, A. G.}},
booktitle={{Advances in Neural Information Processing Systems}},
year={2009}
}

RASearch is currently known to not work with ball trees (#356).
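
To make the naive mode concrete, here is a minimal sketch under simplifying assumptions: a single neighbor (k = 1), one-dimensional data, and uniform sampling with replacement. It is not mlpack's implementation, and the names MinSamples and SampledNearest are hypothetical. With n independent draws, the chance that at least one lands in the top tau percent is 1 - (1 - tau/100)^n, so the smallest n that reaches a success probability alpha follows from a logarithm.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Smallest number of independent uniform draws (with replacement) such that
// at least one draw lands in the top tau percent with probability >= alpha.
// Simplified k = 1 model; requires 0 < tau < 100 and 0 < alpha < 1.
std::size_t MinSamples(const double tau, const double alpha)
{
  const double miss = 1.0 - tau / 100.0; // P(one draw misses the top tau%).
  return static_cast<std::size_t>(
      std::ceil(std::log(1.0 - alpha) / std::log(miss)));
}

// Naive rank-approximate nearest neighbor for 1-D data: draw `samples`
// random reference points and keep the closest one seen.
// Requires a non-empty reference set.
std::size_t SampledNearest(const std::vector<double>& reference,
                           const double query,
                           const std::size_t samples,
                           std::mt19937& rng)
{
  std::uniform_int_distribution<std::size_t> pick(0, reference.size() - 1);
  std::size_t best = 0;
  double bestDist = std::numeric_limits<double>::max();
  for (std::size_t s = 0; s < samples; ++s)
  {
    const std::size_t i = pick(rng);
    const double d = std::abs(reference[i] - query);
    if (d < bestDist)
    {
      bestDist = d;
      best = i;
    }
  }
  return best;
}
```

In this simplified model, tau = 5 and alpha = 0.95 give 59 draws, regardless of how many points the reference set holds.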

Template Parameters
SortPolicy: The sort policy for distances; see NearestNeighborSort.
MetricType: The metric to use for computation.
TreeType: The tree type to use.

Constructor & Destructor Documentation

◆ RASearch() [1/3]

template<typename SortPolicy , typename MetricType, typename MatType, template< typename TreeMetricType, typename TreeStatType, typename TreeMatType > class TreeType>
mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::RASearch ( MatType  referenceSet,
const bool  naive = false,
const bool  singleMode = false,
const double  tau = 5,
const double  alpha = 0.95,
const bool  sampleAtLeaves = false,
const bool  firstLeafExact = false,
const size_t  singleSampleLimit = 20,
const MetricType  metric = MetricType() 
)

Initialize the RASearch object, passing a reference dataset (this is the dataset that will be searched).

Optionally, perform the computation in naive mode or single-tree mode. An initialized distance metric can be given, for cases where the metric has internal data (e.g. the metric::MahalanobisDistance class).

This method will copy the matrices to internal copies, which are rearranged during tree-building. If you don't need to keep the reference dataset, you can use std::move() to remove the overhead of making copies. Using std::move() transfers the ownership of the dataset.
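
The copy-versus-move behaviour described above can be sketched with standard-library types only. OwningSearch, MakeByCopy and MakeByMove are hypothetical stand-ins (not mlpack classes or functions), and std::vector plays the role of MatType; the point is only that a by-value parameter lets the caller choose between copying and handing over ownership.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for a search object that takes its dataset by value, as the
// constructor above does with MatType. Not an mlpack class.
class OwningSearch
{
 public:
  explicit OwningSearch(std::vector<double> data) :
      referenceSet(std::move(data)) { }

  const std::vector<double>& ReferenceSet() const { return referenceSet; }

 private:
  std::vector<double> referenceSet; // Internal copy; free to be rearranged.
};

// The caller keeps a usable dataset; the object works on its own copy.
OwningSearch MakeByCopy(const std::vector<double>& data)
{
  return OwningSearch(data);
}

// The caller gives up the dataset; no element is copied.
OwningSearch MakeByMove(std::vector<double>& data)
{
  return OwningSearch(std::move(data));
}
```

After MakeByMove, the caller's vector is left in a valid but unspecified state and should not be read until it is reassigned.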

tau, the rank-approximation parameter, specifies that we are looking for k neighbors with probability alpha of being in the top tau percent of nearest neighbors. So, as an example, if our dataset has 1000 points, and we want 5 nearest neighbors with 95% probability of being in the top 5% of nearest neighbors (or, the top 50 nearest neighbors), we set k = 5, tau = 5, and alpha = 0.95.

The method will fail (and throw a std::invalid_argument exception) if the value of tau is too low: tau must be set such that the number of points in the corresponding percentile of the data is greater than k. Thus, if we choose tau = 0.1 with a dataset of 1000 points and k = 5, then we are attempting to choose 5 nearest neighbors out of the closest 1 point – this is invalid.
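
The validity rule for tau can be written down as a small check. This is a sketch following the wording above, not the library's code: PointsInPercentile and CheckTau are hypothetical names, and rounding the percentile size up is an assumption of the sketch.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>

// Number of points that fall inside the top tau percent of n points.
// Rounding up is an assumption of this sketch.
std::size_t PointsInPercentile(const std::size_t n, const double tau)
{
  return static_cast<std::size_t>(std::ceil(tau * n / 100.0));
}

// Throws std::invalid_argument when the percentile is too small to hold k
// neighbors, following the rule stated above (the count must exceed k).
void CheckTau(const std::size_t n, const std::size_t k, const double tau)
{
  if (PointsInPercentile(n, tau) <= k)
    throw std::invalid_argument("tau is too low for the requested k");
}
```

With 1000 points, tau = 5 yields 50 candidate points and accepts k = 5, while tau = 0.1 yields a single point and is rejected.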

Parameters
referenceSet: Set of reference points.
naive: If true, the rank-approximate search will be performed by directly sampling the whole set instead of using the stratified sampling on the tree.
singleMode: If true, single-tree search will be used (as opposed to dual-tree search). This is useful when Search() will be called with few query points.
metric: An optional instance of the MetricType class.
tau: The rank-approximation in percentile of the data. The default value is 5%.
alpha: The desired success probability. The default value is 0.95.
sampleAtLeaves: Sample at leaves for faster but less accurate computation. This defaults to 'false'.
firstLeafExact: Traverse to the first leaf without approximation. This can ensure that the query definitely finds its (near) duplicate if there exists one. This defaults to 'false' for now.
singleSampleLimit: The limit on the largest node that can be approximated by sampling. This defaults to 20.

◆ RASearch() [2/3]

template<typename SortPolicy , typename MetricType, typename MatType, template< typename TreeMetricType, typename TreeStatType, typename TreeMatType > class TreeType>
mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::RASearch ( Tree *  referenceTree,
const bool  singleMode = false,
const double  tau = 5,
const double  alpha = 0.95,
const bool  sampleAtLeaves = false,
const bool  firstLeafExact = false,
const size_t  singleSampleLimit = 20,
const MetricType  metric = MetricType() 
)

Initialize the RASearch object with the given pre-constructed reference tree.

It is assumed that the points in the tree's dataset correspond to the reference set. Optionally, choose to use single-tree mode. Naive mode is not available as an option for this constructor; instead, to run naive computation, use a different constructor. Additionally, an instantiated distance metric can be given, for cases where the distance metric holds data.

There is no copying of the data matrices in this constructor (because tree-building is not necessary), so this is the constructor to use when copies absolutely must be avoided.

tau, the rank-approximation parameter, specifies that we are looking for k neighbors with probability alpha of being in the top tau percent of nearest neighbors. So, as an example, if our dataset has 1000 points, and we want 5 nearest neighbors with 95% probability of being in the top 5% of nearest neighbors (or, the top 50 nearest neighbors), we set k = 5, tau = 5, and alpha = 0.95.

The method will fail (and throw a std::invalid_argument exception) if the value of tau is too low: tau must be set such that the number of points in the corresponding percentile of the data is greater than k. Thus, if we choose tau = 0.1 with a dataset of 1000 points and k = 5, then we are attempting to choose 5 nearest neighbors out of the closest 1 point – this is invalid.

Note
Tree-building may (at least with BinarySpaceTree) modify the ordering of a matrix, so be aware that the results you get from Search() will correspond to the modified matrix.
Parameters
referenceTree: Pre-built tree for reference points.
singleMode: Whether single-tree computation should be used (as opposed to dual-tree computation).
tau: The rank-approximation in percentile of the data. The default value is 5%.
alpha: The desired success probability. The default value is 0.95.
sampleAtLeaves: Sample at leaves for faster but less accurate computation. This defaults to 'false'.
firstLeafExact: Traverse to the first leaf without approximation. This can ensure that the query definitely finds its (near) duplicate if there exists one. This defaults to 'false' for now.
singleSampleLimit: The limit on the largest node that can be approximated by sampling. This defaults to 20.
metric: Instantiated distance metric.

◆ RASearch() [3/3]

template<typename SortPolicy , typename MetricType, typename MatType, template< typename TreeMetricType, typename TreeStatType, typename TreeMatType > class TreeType>
mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::RASearch ( const bool  naive = false,
const bool  singleMode = false,
const double  tau = 5,
const double  alpha = 0.95,
const bool  sampleAtLeaves = false,
const bool  firstLeafExact = false,
const size_t  singleSampleLimit = 20,
const MetricType  metric = MetricType() 
)

Create an RASearch object with no reference data.

If Search() is called before a reference set is set with Train(), an exception will be thrown.

Parameters
naive: Whether naive (brute-force) search should be used.
singleMode: Whether single-tree computation should be used (as opposed to dual-tree computation).
tau: The rank-approximation in percentile of the data. The default value is 5%.
alpha: The desired success probability. The default value is 0.95.
sampleAtLeaves: Sample at leaves for faster but less accurate computation. This defaults to 'false'.
firstLeafExact: Traverse to the first leaf without approximation. This can ensure that the query definitely finds its (near) duplicate if there exists one. This defaults to 'false' for now.
singleSampleLimit: The limit on the largest node that can be approximated by sampling. This defaults to 20.
metric: Instantiated distance metric.

◆ ~RASearch()

template<typename SortPolicy , typename MetricType , typename MatType , template< typename TreeMetricType, typename TreeStatType, typename TreeMatType > class TreeType>
mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::~RASearch ( )

Delete the RASearch object.

The tree and the dataset are the only members we may be responsible for deleting. The others will take care of themselves.

Member Function Documentation

◆ ResetQueryTree()

template<typename SortPolicy , typename MetricType , typename MatType , template< typename TreeMetricType, typename TreeStatType, typename TreeMatType > class TreeType>
void mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::ResetQueryTree ( Tree *  queryTree) const

This function recursively resets the RAQueryStat of the given query tree to set 'bound' to SortPolicy::WorstDistance and 'numSamplesMade' to 0.

This allows a user to perform multiple searches with the same query tree, possibly with different levels of approximation, without needing to build a new pair of trees for every new (approximate) search.

If Search() is called multiple times with the same query tree without calling ResetQueryTree(), the results may not satisfy the theoretical guarantees provided by the rank-approximate neighbor search algorithm.

Parameters
queryTree: Tree whose statistics should be reset.
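
The recursive reset can be pictured with a toy tree. Stat, Node and ResetStats below are hypothetical stand-ins for RAQueryStat, the tree type and ResetQueryTree(); the sketch also assumes nearest-neighbor search, where the worst possible distance is taken to be the largest representable double.

```cpp
#include <cstddef>
#include <limits>
#include <memory>
#include <vector>

// Hypothetical stand-in for the per-node query statistic.
struct Stat
{
  double bound = std::numeric_limits<double>::max();
  std::size_t numSamplesMade = 0;
};

// Hypothetical stand-in for a tree node that owns its children.
struct Node
{
  Stat stat;
  std::vector<std::unique_ptr<Node>> children;
};

// Restore every statistic in the subtree to its initial state so that the
// tree can be reused for another search.
void ResetStats(Node& node)
{
  // Worst distance for nearest-neighbor search (an assumption of this sketch).
  node.stat.bound = std::numeric_limits<double>::max();
  node.stat.numSamplesMade = 0;
  for (auto& child : node.children)
    ResetStats(*child);
}
```

Calling ResetStats on the root between two searches leaves no bound or sample count behind from the first search.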

◆ Search() [1/3]

template<typename SortPolicy , typename MetricType , typename MatType, template< typename TreeMetricType, typename TreeStatType, typename TreeMatType > class TreeType>
void mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::Search ( const MatType &  querySet,
const size_t  k,
arma::Mat< size_t > &  neighbors,
arma::mat &  distances 
)

Compute the rank approximate nearest neighbors of each query point in the query set and store the output in the given matrices.

The best neighbors are stored in neighbors, and the corresponding distances in distances.

The matrices will be set to the size of n columns by k rows, where n is the number of points in the query dataset and k is the number of neighbors being searched for.

If querySet is small or only contains one point, it can be faster to do single-tree search; single-tree search can be set with the SingleMode() function or in the constructor.

Parameters
querySet: Set of query points (can be a single point).
k: Number of neighbors to search for.
neighbors: Matrix storing lists of neighbors for each query point.
distances: Matrix storing distances of neighbors for each query point.
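
The output layout (k rows by n columns) is easy to get backwards, so here is a small sketch of it using a flat buffer in column-major order, which is how Armadillo stores matrices. NeighborTable is a hypothetical stand-in for the neighbors matrix; with a real arma::Mat<size_t>, the same element would be read as neighbors(rank, query).

```cpp
#include <cstddef>
#include <vector>

// k rows by n columns in column-major order: the k results for one query
// point are contiguous and form one column.
struct NeighborTable
{
  std::size_t k; // Rows: neighbors per query point.
  std::size_t n; // Columns: query points.
  std::vector<std::size_t> data;

  NeighborTable(const std::size_t k, const std::size_t n) :
      k(k), n(n), data(k * n, 0) { }

  // Index of the rank-th neighbor (0-based) of the given query point.
  std::size_t& At(const std::size_t rank, const std::size_t query)
  {
    return data[rank + query * k];
  }
};
```

Reading column i from top to bottom therefore walks through the k neighbors reported for query point i.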

◆ Search() [2/3]

template<typename SortPolicy , typename MetricType , typename MatType, template< typename TreeMetricType, typename TreeStatType, typename TreeMatType > class TreeType>
void mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::Search ( Tree *  queryTree,
const size_t  k,
arma::Mat< size_t > &  neighbors,
arma::mat &  distances 
)

Compute the rank approximate nearest neighbors of each point in the pre-built query tree and store the output in the given matrices.

The matrices will be set to the size of n columns by k rows, where n is the number of points in the query dataset and k is the number of neighbors being searched for.

If singleMode or naive is enabled, then this method will throw a std::invalid_argument exception; calling this function implies a dual-tree algorithm.

Note
If the tree type you are using modifies the data matrix, be aware that the results returned from this function will be with respect to the modified data matrix.
Parameters
queryTree: Tree built on query points.
k: Number of neighbors to search for.
neighbors: Matrix storing lists of neighbors for each query point.
distances: Matrix storing distances of neighbors for each query point.

◆ Search() [3/3]

template<typename SortPolicy , typename MetricType , typename MatType, template< typename TreeMetricType, typename TreeStatType, typename TreeMatType > class TreeType>
void mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::Search ( const size_t  k,
arma::Mat< size_t > &  neighbors,
arma::mat &  distances 
)

Compute the rank approximate nearest neighbors of each point in the reference set (that is, the query set is taken to be the reference set), and store the output in the given matrices.

The matrices will be set to the size of n columns by k rows, where n is the number of points in the query dataset (here, the reference set) and k is the number of neighbors being searched for.

Parameters
k: Number of neighbors to search for.
neighbors: Matrix storing lists of neighbors for each point.
distances: Matrix storing distances of neighbors for each query point.

◆ Train()

template<typename SortPolicy , typename MetricType , typename MatType, template< typename TreeMetricType, typename TreeStatType, typename TreeMatType > class TreeType>
void mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::Train ( MatType  referenceSet)

"Train" the model on the given reference set.

If tree-based search is being used (if Naive() is false), the reference tree is rebuilt. Thus, a copy of the reference dataset is made. If you don't need to keep the dataset, you can avoid copying by using std::move(). This transfers the ownership of the dataset.

Parameters
referenceSet: New reference set to use.
