JASSv2
JASS::quantize< RANKER > Class Template Reference

Quantize an index.

#include <quantize.h>


Public Member Functions

 quantize (size_t documents, std::shared_ptr< RANKER > ranker)
 Constructor.
 
virtual ~quantize ()
 Destructor.
 
virtual void finish (void)
 Do any final cleaning up.
 
virtual void operator() (const slice &term, const index_postings &postings, compress_integer::integer document_frequency, compress_integer::integer *document_ids, index_postings_impact::impact_type *term_frequencies)
 The callback function for each postings list is operator().
 
virtual void operator() (size_t document_id, const slice &primary_key)
 The callback function for primary keys (external document ids) is operator(). Not needed for quantization.
 
virtual void operator() (index_manager::delegate &writer, const slice &term, const index_postings &postings, compress_integer::integer document_frequency, compress_integer::integer *document_ids, index_postings_impact::impact_type *term_frequencies)
 The callback function for each postings list is operator().
 
virtual void operator() (index_manager::delegate &writer, size_t document_id, const slice &primary_key)
 The callback function for primary keys (external document ids) is operator(). Not needed for quantization.
 
void get_bounds (double &smallest, double &largest)
 Get the smallest and largest term / document influence (should be called after the first round of the quantizer).
 
void serialise_index (index_manager &index, std::vector< std::unique_ptr< index_manager::delegate >> &serialisers)
 Given the index and a serialiser, serialise the index to disk.
 
- Public Member Functions inherited from JASS::index_manager::delegate
 delegate (size_t documents)
 Constructor.
 
virtual ~delegate ()
 Destructor.
 
- Public Member Functions inherited from JASS::index_manager::quantizing_delegate
virtual ~quantizing_delegate ()
 Destructor.
 

Static Public Member Functions

static void unittest (void)
 Unit test this class.
 

Private Attributes

double largest_rsv
 The largest score seen for any document/term pair.
 
double smallest_rsv
 The smallest score seen for any document/term pair.
 
std::shared_ptr< RANKER > ranker
 The ranker to use for quantization.
 
compress_integer::integer documents_in_collection
 The number of documents in the collection.
 

Static Private Attributes

static constexpr double impact_range = index_postings_impact::largest_impact - index_postings_impact::smallest_impact
 The number of values in the impact ordering range (normally 255).
 

Additional Inherited Members

- Public Attributes inherited from JASS::index_manager::delegate
size_t documents
 The number of documents in the collection.
 

Detailed Description

template<typename RANKER>
class JASS::quantize< RANKER >

Quantize an index.

Generic quantization class that performs uniform quantization according to the equations in V. N. Anh, O. de Kretser, A. Moffat (2001) Vector-space ranking with effective early termination. SIGIR 2001, pp. 35-42. The ranking function itself is a template parameter and is also passed to the constructor, as the ranker might need initialisation (BM25 does).

Uniform quantization is most effective for BM25 because BM25 has an exponential decay in the rsv scores, so high-impact segments are short and low-impact segments are long. The best documents have high impact scores for each query term and so have high result-list rsvs, and such documents are rare. Uniform quantization also does not require decoding, so the cost of ranking is an integer add.
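
The mapping described above can be sketched as follows. This is a minimal illustration of uniform quantization, not the code in quantize.h: the function name `uniform_quantize` and its parameters are hypothetical, and the impact bounds are passed in rather than read from index_postings_impact. It assumes the smallest and largest rsv for the collection are already known.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not part of JASS): linearly map an rsv in
// [smallest_rsv, largest_rsv] onto an integer impact in
// [smallest_impact, largest_impact].
inline std::uint32_t uniform_quantize(double rsv, double smallest_rsv, double largest_rsv,
                                      std::uint32_t smallest_impact, std::uint32_t largest_impact)
{
    // Degenerate score range: every score maps to the smallest impact.
    if (largest_rsv <= smallest_rsv)
        return smallest_impact;

    const double impact_range = static_cast<double>(largest_impact - smallest_impact);

    // Position of the score within the rsv range, scaled to the impact range.
    double scaled = (rsv - smallest_rsv) / (largest_rsv - smallest_rsv) * impact_range;

    // Guard against scores that fall slightly outside the recorded bounds.
    scaled = std::max(0.0, std::min(scaled, impact_range));

    // Truncate to an integer and shift into the impact range.
    return static_cast<std::uint32_t>(scaled) + smallest_impact;
}
```

Because each stored impact is already an integer, a query-time accumulator under this scheme only needs to add the impact to a running total for the document.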

Constructor & Destructor Documentation

◆ quantize()

template<typename RANKER >
JASS::quantize< RANKER >::quantize ( size_t  documents,
std::shared_ptr< RANKER >  ranker 
)
inline

Constructor.

Parameters
documents[in] The number of documents in the collection.
ranker[in] The ranking function used for quantization.

Member Function Documentation

◆ get_bounds()

template<typename RANKER >
void JASS::quantize< RANKER >::get_bounds ( double &  smallest,
double &  largest 
)
inline

Get the smallest and largest term / document influence (should be called after the first round of the quantizer).

Parameters
smallest[out] This collection's smallest term / document influence.
largest[out] This collection's largest term / document influence.
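
As a rough illustration of why get_bounds() is meaningful only after the first round, the sketch below tracks the extremes of every score it is shown and then reports them through two reference parameters. The class `bounds_tracker` and its `observe()` method are hypothetical stand-ins, not JASS code; only the shape of `get_bounds()` follows the signature documented here.

```cpp
#include <algorithm>
#include <limits>

// Hypothetical stand-in (not part of JASS) for the first quantization pass.
class bounds_tracker
{
private:
    double smallest_rsv = std::numeric_limits<double>::max();     // smallest score seen so far
    double largest_rsv = std::numeric_limits<double>::lowest();   // largest score seen so far

public:
    // Record one document/term score during the first pass.
    void observe(double rsv)
    {
        smallest_rsv = std::min(smallest_rsv, rsv);
        largest_rsv = std::max(largest_rsv, rsv);
    }

    // Report the bounds; meaningful only once every score has been observed.
    void get_bounds(double &smallest, double &largest) const
    {
        smallest = smallest_rsv;
        largest = largest_rsv;
    }
};
```

Calling get_bounds() before any score has been observed would, in this sketch, return the initial sentinel values rather than real bounds.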

◆ operator()() [1/4]

template<typename RANKER >
virtual void JASS::quantize< RANKER >::operator() ( const slice &  term,
const index_postings &  postings,
compress_integer::integer  document_frequency,
compress_integer::integer *  document_ids,
index_postings_impact::impact_type *  term_frequencies 
)
inline virtual

The callback function for each postings list is operator().

Parameters
term[in] The term name.
postings[in] The postings list.
document_frequency[in] The document frequency of the term.
document_ids[in] An array (of length document_frequency) of document ids.
term_frequencies[in] An array (of length document_frequency) of term frequencies (corresponding to document_ids).

Implements JASS::index_manager::delegate.

Reimplemented in JASS::quantize_none< RANKER >.
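
The parameter layout above (two parallel arrays of length document_frequency) can be walked as in the following sketch. Everything here is hypothetical: `toy_ranker`, its `score()` method, and `score_postings()` are illustrative names rather than JASS APIs, plain `std::uint32_t` stands in for the JASS integer types, and the document ids are assumed to be absolute rather than delta-encoded.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical ranker (not a JASS class): a toy score that grows with term
// frequency and shrinks as the term becomes more common.
struct toy_ranker
{
    double score(std::uint32_t /* document_id */, std::uint32_t term_frequency,
                 std::uint32_t document_frequency) const
    {
        return static_cast<double>(term_frequency) / document_frequency;
    }
};

// Walk the parallel arrays and compute one score per posting.
template <typename RANKER>
std::vector<double> score_postings(const RANKER &ranker, std::uint32_t document_frequency,
                                   const std::uint32_t *document_ids,
                                   const std::uint32_t *term_frequencies)
{
    std::vector<double> scores;
    scores.reserve(document_frequency);

    // document_ids[i] and term_frequencies[i] describe the same posting.
    for (std::uint32_t i = 0; i < document_frequency; i++)
        scores.push_back(ranker.score(document_ids[i], term_frequencies[i], document_frequency));

    return scores;
}
```

Making the ranker a template parameter, as this class does, lets the same loop be reused with any scoring function that exposes a compatible method.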

◆ operator()() [2/4]

template<typename RANKER >
virtual void JASS::quantize< RANKER >::operator() ( size_t  document_id,
const slice &  primary_key 
)
inline virtual

The callback function for primary keys (external document ids) is operator(). Not needed for quantization.

Parameters
document_id[in] The internal document identifier.
primary_key[in] This document's primary key (external document identifier).

Implements JASS::index_manager::delegate.

◆ operator()() [3/4]

template<typename RANKER >
virtual void JASS::quantize< RANKER >::operator() ( index_manager::delegate &  writer,
const slice &  term,
const index_postings &  postings,
compress_integer::integer  document_frequency,
compress_integer::integer *  document_ids,
index_postings_impact::impact_type *  term_frequencies 
)
inline virtual

The callback function for each postings list is operator().

Parameters
writer[in] The delegate that writes the quantized result to the output media.
term[in] The term name.
postings[in] The postings list.
document_frequency[in] The document frequency of the term.
document_ids[in] An array (of length document_frequency) of document ids.
term_frequencies[in] An array (of length document_frequency) of term frequencies (corresponding to document_ids).

Implements JASS::index_manager::quantizing_delegate.

Reimplemented in JASS::quantize_none< RANKER >.

◆ operator()() [4/4]

template<typename RANKER >
virtual void JASS::quantize< RANKER >::operator() ( index_manager::delegate &  writer,
size_t  document_id,
const slice &  primary_key 
)
inline virtual

The callback function for primary keys (external document ids) is operator(). Not needed for quantization.

Parameters
writer[in] A delegate object to manage the data once quantized.
document_id[in] The internal document identifier.
primary_key[in] This document's primary key (external document identifier).

Implements JASS::index_manager::quantizing_delegate.

Reimplemented in JASS::quantize_none< RANKER >.

◆ serialise_index()

template<typename RANKER >
void JASS::quantize< RANKER >::serialise_index ( index_manager &  index,
std::vector< std::unique_ptr< index_manager::delegate >> &  serialisers 
)
inline

Given the index and a serialiser, serialise the index to disk.

Parameters
index[in] The index to serialise.
serialisers[in] The serialiser that writes out in the desired format.

The documentation for this class was generated from the following file: