JASSv2
Quantize an index.
#include <quantize.h>


Public Member Functions | |
| quantize (size_t documents, std::shared_ptr< RANKER > ranker) | |
| Constructor. | |
| virtual | ~quantize () |
| Destructor. | |
| virtual void | finish (void) |
| Do any final cleaning up. | |
| virtual void | operator() (const slice &term, const index_postings &postings, compress_integer::integer document_frequency, compress_integer::integer *document_ids, index_postings_impact::impact_type *term_frequencies) |
| The callback function for each postings list is operator(). | |
| virtual void | operator() (size_t document_id, const slice &primary_key) |
| The callback function for primary keys (external document ids) is operator(). Not needed for quantization. | |
| virtual void | operator() (index_manager::delegate &writer, const slice &term, const index_postings &postings, compress_integer::integer document_frequency, compress_integer::integer *document_ids, index_postings_impact::impact_type *term_frequencies) |
| The callback function for each postings list is operator(). | |
| virtual void | operator() (index_manager::delegate &writer, size_t document_id, const slice &primary_key) |
| The callback function for primary keys (external document ids) is operator(). Not needed for quantization. | |
| void | get_bounds (double &smallest, double &largest) |
| Get the smallest and largest term / document influence (should be called after the first round of the quantizer). | |
| void | serialise_index (index_manager &index, std::vector< std::unique_ptr< index_manager::delegate >> &serialisers) |
| Given the index and a serialiser, serialise the index to disk. | |
Public Member Functions inherited from JASS::index_manager::delegate | |
| delegate (size_t documents) | |
| Constructor. | |
| virtual | ~delegate () |
| Destructor. | |
Public Member Functions inherited from JASS::index_manager::quantizing_delegate | |
| virtual | ~quantizing_delegate () |
| Destructor. | |
Static Public Member Functions | |
| static void | unittest (void) |
| Unit test this class. | |
Private Attributes | |
| double | largest_rsv |
| The largest score seen for any document/term pair. | |
| double | smallest_rsv |
| The smallest score seen for any document/term pair. | |
| std::shared_ptr< RANKER > | ranker |
| The ranker to use for quantization. | |
| compress_integer::integer | documents_in_collection |
| The number of documents in the collection. | |
Static Private Attributes | |
| static constexpr double | impact_range = index_postings_impact::largest_impact - index_postings_impact::smallest_impact |
| The number of values in the impact ordering range (normally 255). | |
Additional Inherited Members | |
Public Attributes inherited from JASS::index_manager::delegate | |
| size_t | documents |
| The number of documents in the collection. | |
Quantize an index.
Generic quantization class that performs uniform quantization according to the equations in V. N. Anh, O. de Kretser, A. Moffat (2001) Vector-space ranking with effective early termination. SIGIR 2001, pp. 35-42. The ranking function itself is a template parameter, and is also passed to the constructor because the ranker might need initialisation (BM25 does).
Uniform quantization is most effective for BM25, as BM25 has an exponential decay in the rsv scores and so high impact segments are short and low impact segments are long. The best documents have high impact scores for each query term, so documents with high result list rsvs are rare. Uniform quantization also does not require decoding, so the cost of ranking is an integer add.
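The linear mapping described above can be sketched as follows. This is a minimal illustration, not the JASS implementation: the function name quantize_uniform is hypothetical, and the impact bounds of 0 and 255 are assumed values chosen to give an impact range of 255; the real bounds come from index_postings_impact.

```cpp
#include <cassert>
#include <cstdint>

// Assumed impact bounds, for illustration only. The real values are defined
// by index_postings_impact and are not stated in this documentation.
constexpr std::uint32_t smallest_impact = 0;
constexpr std::uint32_t largest_impact = 255;
constexpr double impact_range = largest_impact - smallest_impact;

// Map a ranker score (rsv) linearly onto the integer impact range.
// smallest_rsv and largest_rsv are the collection-wide score bounds.
inline std::uint32_t quantize_uniform(double rsv, double smallest_rsv, double largest_rsv)
{
    assert(smallest_rsv <= rsv && rsv <= largest_rsv);

    // Degenerate case: every score is identical, so there is a single bucket.
    if (largest_rsv == smallest_rsv)
        return largest_impact;

    // Scale into [0, impact_range], truncate, then shift by the smallest impact.
    double scaled = (rsv - smallest_rsv) / (largest_rsv - smallest_rsv) * impact_range;
    return static_cast<std::uint32_t>(scaled) + smallest_impact;
}
```

Because the stored value is already an integer on a common scale, a query-time accumulator can add it directly without decoding it back into a floating-point score.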
| quantize (size_t documents, std::shared_ptr< RANKER > ranker) | inline |
Constructor.
| documents | [in] The number of documents in the collection. |
| ranker | [in] The ranking function used for quantization. |
| void | get_bounds (double &smallest, double &largest) | inline |
Get the smallest and largest term / document influence (should be called after the first round of the quantizer).
| smallest | [out] This collection's smallest term / document influence. |
| largest | [out] This collection's largest term / document influence. |
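The first round that get_bounds() depends on can be sketched like this. The names toy_ranker and bounds_pass are hypothetical, the ranker's rsv() signature is an assumption, and the integer types stand in for compress_integer::integer and index_postings_impact::impact_type. The sketch only shows the pattern of scoring every posting, tracking the extremes, and reading them back through out-parameters.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <memory>
#include <utility>

// Hypothetical ranker: scores by raw term frequency only. A real ranker
// (BM25, for example) would also use collection and document statistics.
struct toy_ranker
{
    double rsv(std::uint32_t document_id, std::uint32_t term_frequency) const
    {
        (void)document_id;
        return static_cast<double>(term_frequency);
    }
};

// Hypothetical first-pass object: sees every postings list and tracks the
// smallest and largest score produced by the ranker.
template <typename RANKER>
class bounds_pass
{
public:
    explicit bounds_pass(std::shared_ptr<RANKER> ranking_function) :
        ranker(std::move(ranking_function))
    {
    }

    // Called once per postings list; the two arrays are parallel and each
    // holds document_frequency entries.
    void operator()(std::uint32_t document_frequency, const std::uint32_t *document_ids, const std::uint32_t *term_frequencies)
    {
        for (std::uint32_t i = 0; i < document_frequency; ++i)
        {
            double score = ranker->rsv(document_ids[i], term_frequencies[i]);
            if (score < smallest_rsv)
                smallest_rsv = score;
            if (score > largest_rsv)
                largest_rsv = score;
        }
    }

    // Only meaningful once at least one posting has been seen.
    void get_bounds(double &smallest, double &largest) const
    {
        assert(smallest_rsv <= largest_rsv);
        smallest = smallest_rsv;
        largest = largest_rsv;
    }

private:
    double smallest_rsv = std::numeric_limits<double>::max();
    double largest_rsv = std::numeric_limits<double>::lowest();
    std::shared_ptr<RANKER> ranker;
};
```

Calling get_bounds() before any postings list has been processed would return the sentinel initial values, which is why the sketch asserts that at least one score was observed.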
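| virtual void | operator() (const slice &term, const index_postings &postings, compress_integer::integer document_frequency, compress_integer::integer *document_ids, index_postings_impact::impact_type *term_frequencies) | inline, virtual |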
The callback function for each postings list is operator().
| term | [in] The term name. |
| postings | [in] The postings list. |
| document_frequency | [in] The document frequency of the term. |
| document_ids | [in] An array (of length document_frequency) of document ids. |
| term_frequencies | [in] An array (of length document_frequency) of term frequencies (corresponding to document_ids). |
Implements JASS::index_manager::delegate.
Reimplemented in JASS::quantize_none< RANKER >.
| virtual void | operator() (size_t document_id, const slice &primary_key) | inline, virtual |
The callback function for primary keys (external document ids) is operator(). Not needed for quantization.
| document_id | [in] The internal document identifier. |
| primary_key | [in] This document's primary key (external document identifier). |
Implements JASS::index_manager::delegate.
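| virtual void | operator() (index_manager::delegate &writer, const slice &term, const index_postings &postings, compress_integer::integer document_frequency, compress_integer::integer *document_ids, index_postings_impact::impact_type *term_frequencies) | inline, virtual |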
The callback function for each postings list is operator().
| writer | [in] The delegate that writes the quantized result to the output media. |
| term | [in] The term name. |
| postings | [in] The postings list. |
| document_frequency | [in] The document frequency of the term. |
| document_ids | [in] An array (of length document_frequency) of document ids. |
| term_frequencies | [in] An array (of length document_frequency) of term frequencies (corresponding to document_ids). |
Implements JASS::index_manager::quantizing_delegate.
Reimplemented in JASS::quantize_none< RANKER >.
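A second-round callback of this shape can be sketched as below: each term frequency is turned into a score, the score is quantized against the bounds found in the first round, and the result is handed to a writer. Everything here is an assumption made for illustration: recording_writer and quantize_and_forward are hypothetical names, the score is taken to be the raw term frequency, and impacts are assumed to span 0 to 255. The real index_manager::delegate interface differs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical writer: records the quantized postings list it is handed.
struct recording_writer
{
    std::vector<std::uint32_t> document_ids;
    std::vector<std::uint8_t> impacts;

    void operator()(std::uint32_t document_frequency, const std::uint32_t *ids, const std::uint8_t *quantized)
    {
        document_ids.assign(ids, ids + document_frequency);
        impacts.assign(quantized, quantized + document_frequency);
    }
};

// Second-round sketch: replace each term frequency with a quantized impact,
// then forward the postings list to the writer.
// Assumes the rsv is the raw term frequency and impacts span 0..255.
template <typename WRITER>
void quantize_and_forward(WRITER &writer, double smallest_rsv, double largest_rsv, std::uint32_t document_frequency, const std::uint32_t *document_ids, const std::uint32_t *term_frequencies)
{
    assert(largest_rsv > smallest_rsv);

    std::vector<std::uint8_t> impacts(document_frequency);
    for (std::uint32_t i = 0; i < document_frequency; ++i)
    {
        double rsv = static_cast<double>(term_frequencies[i]);
        assert(smallest_rsv <= rsv && rsv <= largest_rsv);

        // Linear scaling onto [0, 255], truncated to an integer.
        impacts[i] = static_cast<std::uint8_t>((rsv - smallest_rsv) / (largest_rsv - smallest_rsv) * 255.0);
    }

    writer(document_frequency, document_ids, impacts.data());
}
```

Passing the writer in as a parameter keeps the quantization step independent of the output format, so the same quantized postings can be sent to more than one serialiser.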
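| virtual void | operator() (index_manager::delegate &writer, size_t document_id, const slice &primary_key) | inline, virtual |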
The callback function for primary keys (external document ids) is operator(). Not needed for quantization.
| writer | [in] A delegate object to manage the data once quantized. |
| document_id | [in] The internal document identifier. |
| primary_key | [in] This document's primary key (external document identifier). |
Implements JASS::index_manager::quantizing_delegate.
Reimplemented in JASS::quantize_none< RANKER >.
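| void | serialise_index (index_manager &index, std::vector< std::unique_ptr< index_manager::delegate >> &serialisers) | inline |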
Given the index and a serialiser, serialise the index to disk.
| index | [in] The index to serialise. |
| serialisers | [in] The serialiser that writes out in the desired format. |