mlpack
Classes | Public Member Functions | Static Public Member Functions | List of all members
mlpack::neighbor::LSHSearch< SortPolicy, MatType > Class Template Reference

The LSHSearch class; this class builds a hash on the reference set and uses this hash to compute the distance-approximate nearest-neighbors of the given queries. More...

#include <lsh_search.hpp>

Public Member Functions

 LSHSearch (MatType referenceSet, const arma::cube &projections, const double hashWidth=0.0, const size_t secondHashSize=99901, const size_t bucketSize=500)
 This function initializes the LSH class. More...
 
 LSHSearch (MatType referenceSet, const size_t numProj, const size_t numTables, const double hashWidth=0.0, const size_t secondHashSize=99901, const size_t bucketSize=500)
 This function initializes the LSH class. More...
 
 LSHSearch ()
 Create an untrained LSH model. More...
 
 LSHSearch (const LSHSearch &other)
 Copy the given LSH model. More...
 
 LSHSearch (LSHSearch &&other)
 Take ownership of the given LSH model. More...
 
LSHSearchoperator= (const LSHSearch &other)
 Copy the given LSH model. More...
 
LSHSearchoperator= (LSHSearch &&other)
 Take ownership of the given LSH model. More...
 
void Train (MatType referenceSet, const size_t numProj, const size_t numTables, const double hashWidth=0.0, const size_t secondHashSize=99901, const size_t bucketSize=500, const arma::cube &projection=arma::cube())
 Train the LSH model on the given dataset. More...
 
void Search (const MatType &querySet, const size_t k, arma::Mat< size_t > &resultingNeighbors, arma::mat &distances, const size_t numTablesToSearch=0, const size_t T=0)
 Compute the nearest neighbors of the points in the given query set and store the output in the given matrices. More...
 
void Search (const size_t k, arma::Mat< size_t > &resultingNeighbors, arma::mat &distances, const size_t numTablesToSearch=0, size_t T=0)
 Compute the nearest neighbors and store the output in the given matrices. More...
 
template<typename Archive >
void serialize (Archive &ar, const uint32_t version)
 Serialize the LSH model. More...
 
size_t DistanceEvaluations () const
 Return the number of distance evaluations performed.
 
size_t & DistanceEvaluations ()
 Modify the number of distance evaluations performed.
 
const MatType & ReferenceSet () const
 Return the reference dataset.
 
size_t NumProjections () const
 Get the number of projections.
 
const arma::mat & Offsets () const
 Get the offsets 'b' for each of the projections. (One 'b' per column.)
 
const arma::vec & SecondHashWeights () const
 Get the weights of the second hash.
 
size_t BucketSize () const
 Get the bucket size of the second hash.
 
const std::vector< arma::Col< size_t > > & SecondHashTable () const
 Get the second hash table.
 
const arma::cube & Projections ()
 Get the projection tables.
 
void Projections (const arma::cube &projTables)
 Change the projection tables (this retrains the LSH model).
 

Static Public Member Functions

static double ComputeRecall (const arma::Mat< size_t > &foundNeighbors, const arma::Mat< size_t > &realNeighbors)
 Compute the recall (% of neighbors found) given the neighbors returned by LSHSearch::Search and a "ground truth" set of neighbors. More...
 

Detailed Description

template<typename SortPolicy = NearestNeighborSort, typename MatType = arma::mat>
class mlpack::neighbor::LSHSearch< SortPolicy, MatType >

The LSHSearch class; this class builds a hash on the reference set and uses this hash to compute the distance-approximate nearest-neighbors of the given queries.

Template Parameters
SortPolicyThe sort policy for distances; see NearestNeighborSort.
MatTypeType of matrix to use to store the data.

Constructor & Destructor Documentation

◆ LSHSearch() [1/5]

template<typename SortPolicy , typename MatType >
mlpack::neighbor::LSHSearch< SortPolicy, MatType >::LSHSearch ( MatType  referenceSet,
const arma::cube &  projections,
const double  hashWidth = 0.0,
const size_t  secondHashSize = 99901,
const size_t  bucketSize = 500 
)

This function initializes the LSH class.

It builds the hash on the reference set with 2-stable distributions. See the individual functions performing the hashing for details on how the hashing is done. In order to avoid copying the reference set, it is suggested to pass that parameter with std::move().

Parameters
referenceSetSet of reference points and the set of queries.
projectionsCube of projection tables. For a cube of size (a, b, c) we set numProj = a, numTables = c. b is the reference set dimensionality.
hashWidthThe width of hash for every table. If 0 (the default) is provided, then the hash width is automatically obtained by computing the average pairwise distance of 25 pairs. This should be a reasonable upper bound on the nearest-neighbor distance in general.
secondHashSizeThe size of the second hash table. This should be a large prime number.
bucketSizeThe size of the bucket in the second hash table. This is the maximum number of points that can be hashed into single bucket. A value of 0 indicates that there is no limit (so the second hash table can be arbitrarily large—be careful!).

◆ LSHSearch() [2/5]

template<typename SortPolicy , typename MatType >
mlpack::neighbor::LSHSearch< SortPolicy, MatType >::LSHSearch ( MatType  referenceSet,
const size_t  numProj,
const size_t  numTables,
const double  hashWidth = 0.0,
const size_t  secondHashSize = 99901,
const size_t  bucketSize = 500 
)

This function initializes the LSH class.

It builds the hash one the reference set using the provided projections. See the individual functions performing the hashing for details on how the hashing is done. In order to avoid copying the reference set, consider passing the set with std::move().

Parameters
referenceSetSet of reference points and the set of queries.
numProjNumber of projections in each hash table (anything between 10-50 might be a decent choice).
numTablesTotal number of hash tables (anything between 10-20 should suffice).
hashWidthThe width of hash for every table. If 0 (the default) is provided, then the hash width is automatically obtained by computing the average pairwise distance of 25 pairs. This should be a reasonable upper bound on the nearest-neighbor distance in general.
secondHashSizeThe size of the second hash table. This should be a large prime number.
bucketSizeThe size of the bucket in the second hash table. This is the maximum number of points that can be hashed into single bucket. A value of 0 indicates that there is no limit (so the second hash table can be arbitrarily large—be careful!).

◆ LSHSearch() [3/5]

template<typename SortPolicy , typename MatType >
mlpack::neighbor::LSHSearch< SortPolicy, MatType >::LSHSearch ( )

Create an untrained LSH model.

Be sure to call Train() before calling Search(); otherwise, an exception will be thrown when Search() is called.

◆ LSHSearch() [4/5]

template<typename SortPolicy , typename MatType >
mlpack::neighbor::LSHSearch< SortPolicy, MatType >::LSHSearch ( const LSHSearch< SortPolicy, MatType > &  other)

Copy the given LSH model.

Parameters
otherOther LSH model to copy.

◆ LSHSearch() [5/5]

template<typename SortPolicy , typename MatType >
mlpack::neighbor::LSHSearch< SortPolicy, MatType >::LSHSearch ( LSHSearch< SortPolicy, MatType > &&  other)

Take ownership of the given LSH model.

Parameters
otherOther LSH model to take ownership of.

Member Function Documentation

◆ ComputeRecall()

template<typename SortPolicy , typename MatType >
double mlpack::neighbor::LSHSearch< SortPolicy, MatType >::ComputeRecall ( const arma::Mat< size_t > &  foundNeighbors,
const arma::Mat< size_t > &  realNeighbors 
)
static

Compute the recall (% of neighbors found) given the neighbors returned by LSHSearch::Search and a "ground truth" set of neighbors.

The recall returned will be in the range [0, 1].

Parameters
foundNeighborsSet of neighbors to compute recall of.
realNeighborsSet of "ground truth" neighbors to compute recall against.

◆ operator=() [1/2]

template<typename SortPolicy , typename MatType >
LSHSearch< SortPolicy, MatType > & mlpack::neighbor::LSHSearch< SortPolicy, MatType >::operator= ( const LSHSearch< SortPolicy, MatType > &  other)

Copy the given LSH model.

Parameters
otherOther LSH model to copy.

◆ operator=() [2/2]

template<typename SortPolicy , typename MatType >
LSHSearch< SortPolicy, MatType > & mlpack::neighbor::LSHSearch< SortPolicy, MatType >::operator= ( LSHSearch< SortPolicy, MatType > &&  other)

Take ownership of the given LSH model.

Parameters
otherOther LSH model to take ownership of.

◆ Search() [1/2]

template<typename SortPolicy , typename MatType >
void mlpack::neighbor::LSHSearch< SortPolicy, MatType >::Search ( const MatType &  querySet,
const size_t  k,
arma::Mat< size_t > &  resultingNeighbors,
arma::mat &  distances,
const size_t  numTablesToSearch = 0,
const size_t  T = 0 
)

Compute the nearest neighbors of the points in the given query set and store the output in the given matrices.

The matrices will be set to the size of n columns by k rows, where n is the number of points in the query dataset and k is the number of neighbors being searched for.

Parameters
querySetSet of query points.
kNumber of neighbors to search for.
resultingNeighborsMatrix storing lists of neighbors for each query point.
distancesMatrix storing distances of neighbors for each query point.
numTablesToSearchThis parameter allows the user to have control over the number of hash tables to be searched. This allows the user to pick the number of tables it can afford for the time available without having to build hashing for every table size. By default, this is set to zero in which case all tables are considered.
TThe number of additional probing bins to examine with multiprobe LSH. If T = 0, classic single-probe LSH is run (default).

◆ Search() [2/2]

template<typename SortPolicy , typename MatType >
void mlpack::neighbor::LSHSearch< SortPolicy, MatType >::Search ( const size_t  k,
arma::Mat< size_t > &  resultingNeighbors,
arma::mat &  distances,
const size_t  numTablesToSearch = 0,
size_t  T = 0 
)

Compute the nearest neighbors and store the output in the given matrices.

The matrices will be set to the size of n columns by k rows, where n is the number of points in the query dataset and k is the number of neighbors being searched for.

Parameters
kNumber of neighbors to search for.
resultingNeighborsMatrix storing lists of neighbors for each query point.
distancesMatrix storing distances of neighbors for each query point.
numTablesToSearchThis parameter allows the user to have control over the number of hash tables to be searched. This allows the user to pick the number of tables it can afford for the time available without having to build hashing for every table size. By default, this is set to zero in which case all tables are considered.
TNumber of probing bins.

◆ serialize()

template<typename SortPolicy , typename MatType >
template<typename Archive >
void mlpack::neighbor::LSHSearch< SortPolicy, MatType >::serialize ( Archive &  ar,
const uint32_t  version 
)

Serialize the LSH model.

Parameters
arArchive to serialize to.
versionserialize class version to provide backward compatibility

◆ Train()

template<typename SortPolicy , typename MatType >
void mlpack::neighbor::LSHSearch< SortPolicy, MatType >::Train ( MatType  referenceSet,
const size_t  numProj,
const size_t  numTables,
const double  hashWidth = 0.0,
const size_t  secondHashSize = 99901,
const size_t  bucketSize = 500,
const arma::cube &  projection = arma::cube() 
)

Train the LSH model on the given dataset.

If a correctly-sized projection cube is not provided, this means building new hash tables. Otherwise, we use the projections provided by the user. In order to avoid copying the reference set, consider passing that parameter with std::move().

Parameters
referenceSetSet of reference points and the set of queries.
numProjNumber of projections in each hash table (anything between 10-50 might be a decent choice).
numTablesTotal number of hash tables (anything between 10-20 should suffice).
hashWidthThe width of hash for every table. If 0 (the default) is provided, then the hash width is automatically obtained by computing the average pairwise distance of 25 pairs. This should be a reasonable upper bound on the nearest-neighbor distance in general.
secondHashSizeThe size of the second hash table. This should be a large prime number.
bucketSizeThe size of the bucket in the second hash table. This is the maximum number of points that can be hashed into single bucket. A value of 0 indicates that there is no limit (so the second hash table can be arbitrarily large—be careful!).
projectionCube of projection tables. For a cube of size (a, b, c) we set numProj = a, numTables = c. b is the reference set dimensionality.

The documentation for this class was generated from the following files: