JASSv2
Serialise an index in the format used by JASS version 2 (a better compressed JASS v1 format).
#include <serialise_jass_v2.h>


Public Member Functions | |
| serialise_jass_v2 (size_t documents, jass_v1_codex codex=jass_v1_codex::elias_gamma_simd_vb, int8_t alignment=1) | |
| Constructor. | |
| virtual | ~serialise_jass_v2 () |
| Destructor. | |
| virtual void | serialise_vocabulary_pointers (void) |
| Serialise the pointers that link the vocabulary to the postings (the CIvocab.bin file). | |
| virtual void | serialise_primary_keys (void) |
| Serialise the primary keys (or any extra stuff at the end of the primary key file). | |
| virtual void | operator() (const slice &term, const index_postings &postings, compress_integer::integer document_frequency, compress_integer::integer *document_ids, index_postings_impact::impact_type *term_frequencies) |
| The callback that serialises the postings list of a term. | |
Public Member Functions inherited from JASS::serialise_jass_v1 | |
| serialise_jass_v1 (size_t documents, jass_v1_codex codex=jass_v1_codex::elias_gamma_simd, int8_t alignment=1) | |
| Constructor. | |
| virtual | ~serialise_jass_v1 () |
| Destructor. | |
| virtual void | finish (void) |
| Finish up any serialising that needs to be done. | |
| virtual void | operator() (size_t document_id, const slice &primary_key) |
| The callback that serialises the primary key (external document id) of a document. | |
Public Member Functions inherited from JASS::index_manager::delegate | |
| delegate (size_t documents) | |
| Constructor. | |
| virtual | ~delegate () |
| Destructor. | |
Static Public Member Functions | |
| static void | unittest (void) |
| Unit test this class. | |
Static Public Member Functions inherited from JASS::serialise_jass_v1 | |
| static compress_integer * | get_compressor (jass_v1_codex codex, std::string &name, int32_t &d_ness) |
| Return a pointer to a compressor/decompressor that can be used with this index, and report its name and d-ness through name and d_ness. | |
| static void | unittest (void) |
| Unit test this class. | |
Protected Member Functions | |
| virtual size_t | write_postings (const index_postings &postings, size_t &number_of_impacts, compress_integer::integer document_frequency, compress_integer::integer *document_ids, index_postings_impact::impact_type *term_frequencies) |
| Convert the postings list to the JASS v2 format and serialise it to disk. | |
Protected Attributes | |
| std::vector< slice, allocator_cpp< slice > > | compressed_headers |
Protected Attributes inherited from JASS::serialise_jass_v1 | |
| file | vocabulary_strings |
| The concatenation of the UTF-8 encoded unique tokens in the collection. | |
| file | vocabulary |
| Details about each term (including a pointer to the term, a pointer to the postings, and the quantum count). | |
| file | postings |
| The postings lists. | |
| file | primary_keys |
| The list of external identifiers (document primary keys). | |
| std::vector< vocab_tripple > | index_key |
| The entry point into the JASS v1 index is CIvocab.bin, the index key. | |
| std::vector< uint64_t > | primary_key_offsets |
| A list of locations (on disk) of each primary key. | |
| allocator_pool | memory |
| Memory used to store the impact-ordered postings list. | |
| index_postings_impact | impact_ordered |
| The re-used impact ordered postings list. | |
| std::string | compressor_name |
| The name of the compression algorithm. | |
| int | compressor_d_ness |
| The d-ness of the compression algorithm. | |
| compress_integer * | encoder |
| The integer encoder used to compress postings lists. | |
| allocator_cpp< uint8_t > | allocator |
| C++ allocator that lets std::vector objects allocate from the memory object. | |
| std::vector< uint8_t, allocator_cpp< uint8_t > > | compressed_buffer |
| The buffer used to compress postings into. | |
| std::vector< slice, allocator_cpp< slice > > | compressed_segments |
| Vector of pointers (and lengths) to the compressed postings. | |
| uint8_t | alignment |
| Postings lists are padded to this alignment (used for codexes that require word alignment). | |
Additional Inherited Members | |
Public Types inherited from JASS::serialise_jass_v1 | |
| enum | jass_v1_codex { uncompressed = 's', variable_byte = 'c', simple_8b = '8', qmx = 'q', qmx_d4 = 'Q', qmx_d0 = 'R', elias_gamma_simd = 'G', elias_gamma_simd_vb = 'g', elias_delta_simd = 'D' } |
| The compression schemes that can be selected. | |
Public Attributes inherited from JASS::index_manager::delegate | |
| size_t | documents |
| The number of documents in the collection. | |
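The vocabulary, vocabulary_strings and index_key members above describe a three-part lookup: each term has an entry recording where the term's text is stored, where its postings list starts, and its quantum count. The sketch below shows that idea with in-memory buffers. The struct vocab_entry_sketch, the NUL-terminated string layout and the functions build_vocab and find_term are hypothetical; they illustrate the technique and do not reproduce the actual CIvocab.bin byte layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical vocabulary entry (NOT the real CIvocab.bin format).
struct vocab_entry_sketch
	{
	uint64_t term_offset;      // offset of the NUL-terminated term in the strings buffer
	uint64_t postings_offset;  // offset of the term's postings list in the postings file
	uint64_t impacts;          // number of impact segments (the "quantum count")
	};

// Build the strings buffer and the entries, sorted by term so they can be binary searched.
void build_vocab(const std::vector<std::string> &terms, const std::vector<uint64_t> &postings_offsets, const std::vector<uint64_t> &impacts, std::string &strings, std::vector<vocab_entry_sketch> &entries)
	{
	assert(terms.size() == postings_offsets.size() && terms.size() == impacts.size());
	strings.clear();
	entries.clear();
	for (size_t i = 0; i < terms.size(); i++)
		{
		entries.push_back({strings.size(), postings_offsets[i], impacts[i]});
		strings += terms[i];
		strings += '\0';      // terminator so strcmp() stops at the end of the term
		}
	std::sort(entries.begin(), entries.end(), [&](const vocab_entry_sketch &a, const vocab_entry_sketch &b)
		{
		return std::strcmp(strings.c_str() + a.term_offset, strings.c_str() + b.term_offset) < 0;
		});
	}

// Binary search for a term; returns nullptr if it is not in the vocabulary.
const vocab_entry_sketch *find_term(const std::string &strings, const std::vector<vocab_entry_sketch> &entries, const std::string &term)
	{
	auto found = std::lower_bound(entries.begin(), entries.end(), term, [&](const vocab_entry_sketch &entry, const std::string &key)
		{
		return std::strcmp(strings.c_str() + entry.term_offset, key.c_str()) < 0;
		});
	if (found == entries.end() || term != strings.c_str() + found->term_offset)
		return nullptr;
	return &*found;
	}
```

Sorting the entries by term text is what makes a lookup O(log n) in the vocabulary size; the postings themselves are only touched once the entry is found.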
Serialise an index in the format used by JASS version 2 (a better compressed JASS v1 format).
See description for serialise_jass_v1.
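serialise_primary_keys() together with the primary_keys and primary_key_offsets members suggests a file of external document identifiers plus a table recording where each one starts. A minimal sketch, assuming a layout of NUL-terminated keys followed by a table of host-order uint64_t offsets and a trailing key count; this layout is an assumption for illustration, and primary_key_writer_sketch and read_primary_key are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical primary-key file layout (an assumption, not the documented JASS format):
// [key 0 '\0'][key 1 '\0']...[offset 0]...[offset n-1][n], offsets and n as host-order uint64_t.
class primary_key_writer_sketch
	{
	private:
		std::vector<uint8_t> file;                 // stands in for the on-disk file
		std::vector<uint64_t> primary_key_offsets; // where each key starts in the file

		void append_u64(uint64_t value)
			{
			uint8_t bytes[sizeof(value)];
			std::memcpy(bytes, &value, sizeof(value));
			file.insert(file.end(), bytes, bytes + sizeof(value));
			}

	public:
		// Called once per document, in document_id order.
		void add(size_t document_id, const std::string &primary_key)
			{
			assert(document_id == primary_key_offsets.size());
			primary_key_offsets.push_back(file.size());
			file.insert(file.end(), primary_key.begin(), primary_key.end());
			file.push_back('\0');
			}

		// Append the offset table and the key count, then return the finished file.
		std::vector<uint8_t> finish()
			{
			for (uint64_t offset : primary_key_offsets)
				append_u64(offset);
			append_u64(primary_key_offsets.size());
			return file;
			}
	};

// Read key number document_id back out of a file produced by finish().
std::string read_primary_key(const std::vector<uint8_t> &file, size_t document_id)
	{
	uint64_t count;
	std::memcpy(&count, file.data() + file.size() - sizeof(count), sizeof(count));
	assert(document_id < count);
	const uint8_t *table = file.data() + file.size() - sizeof(count) - count * sizeof(uint64_t);
	uint64_t offset;
	std::memcpy(&offset, table + document_id * sizeof(uint64_t), sizeof(offset));
	return std::string(reinterpret_cast<const char *>(file.data() + offset));
	}
```

Putting the count last lets a reader find the offset table by seeking from the end of the file, without scanning the keys.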
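| JASS::serialise_jass_v2::serialise_jass_v2 (size_t documents, jass_v1_codex codex = jass_v1_codex::elias_gamma_simd_vb, int8_t alignment = 1) | inline |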
Constructor.
| documents | [in] The number of documents in the collection (used to allocate re-usable buffers). |
| codex | [in] The codex used to compress postings lists (default = jass_v1_codex::elias_gamma_simd_vb). |
| alignment | [in] The start address of each postings list is padded to a multiple of this value (needed by compress_integer_QMX_jass_v1, which requires 16, and by some other codexes). Default = 1. |
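The alignment parameter pads the start of each postings list. Rounding an offset up to the next multiple of alignment is the core of that padding; the sketch below shows it with hypothetical helpers align_up and append_aligned, and a std::vector standing in for the postings file. An aligned file offset only yields an aligned memory address if the file is loaded at a suitably aligned base address, and alignment = 1 adds no padding at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Round offset up to the next multiple of alignment (alignment must be >= 1).
size_t align_up(size_t offset, size_t alignment)
	{
	assert(alignment >= 1);
	return (offset + alignment - 1) / alignment * alignment;
	}

// Append a compressed postings list so that it starts on an alignment boundary.
// Returns the offset at which the list starts (what a vocabulary entry would record).
size_t append_aligned(std::vector<uint8_t> &postings_file, const std::vector<uint8_t> &compressed, size_t alignment)
	{
	size_t start = align_up(postings_file.size(), alignment);
	postings_file.resize(start, 0);   // zero-filled padding bytes
	postings_file.insert(postings_file.end(), compressed.begin(), compressed.end());
	return start;
	}
```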
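| void JASS::serialise_jass_v2::operator() (const slice &term, const index_postings &postings, compress_integer::integer document_frequency, compress_integer::integer *document_ids, index_postings_impact::impact_type *term_frequencies) | virtual |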
The callback that serialises the postings list of a term.
| term | [in] The term name. |
| postings | [in] The postings lists. |
| document_frequency | [in] The document frequency of the term. |
| document_ids | [in] An array (of length document_frequency) of document ids. |
| term_frequencies | [in] An array (of length document_frequency) of term frequencies (corresponding to document_ids). |
Reimplemented from JASS::serialise_jass_v1.
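The document_ids passed to operator() are ascending, so a codex can store the gaps between consecutive ids instead of the ids themselves; this sketch assumes that a d-ness of 1 means the codex is given such d1 gaps (an assumption: this page only says d-ness is a property of the compression algorithm). It pairs d1 gap encoding with a generic LEB128-style variable-byte coder (low-order 7 bits first, high bit meaning "more bytes follow"), in the spirit of the variable_byte ('c') scheme. The byte layout of JASS's own variable-byte codec may differ, and to_d1_gaps, vbyte_encode and decode_d1_vbyte are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Replace ascending document ids with the gaps between them (d1 encoding).
// The first id is stored as a gap from 0.
std::vector<uint32_t> to_d1_gaps(const std::vector<uint32_t> &document_ids)
	{
	std::vector<uint32_t> gaps;
	gaps.reserve(document_ids.size());
	uint32_t previous = 0;
	for (uint32_t id : document_ids)
		{
		assert(id >= previous);   // ids must be ascending
		gaps.push_back(id - previous);
		previous = id;
		}
	return gaps;
	}

// Variable-byte encode: 7 payload bits per byte, low-order group first,
// high bit set on every byte except the last byte of each integer.
std::vector<uint8_t> vbyte_encode(const std::vector<uint32_t> &values)
	{
	std::vector<uint8_t> out;
	for (uint32_t value : values)
		{
		while (value >= 0x80)
			{
			out.push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
			value >>= 7;
			}
		out.push_back(static_cast<uint8_t>(value));
		}
	return out;
	}

// Decode the variable bytes and prefix-sum the gaps to recover the document ids.
std::vector<uint32_t> decode_d1_vbyte(const std::vector<uint8_t> &bytes)
	{
	std::vector<uint32_t> ids;
	uint32_t value = 0, previous = 0;
	int shift = 0;
	for (uint8_t byte : bytes)
		{
		value |= static_cast<uint32_t>(byte & 0x7F) << shift;
		if (byte & 0x80)
			shift += 7;            // more bytes of this integer follow
		else
			{
			previous += value;     // undo the d1 gap
			ids.push_back(previous);
			value = 0;
			shift = 0;
			}
		}
	return ids;
	}
```

Small gaps take a single byte, which is why gap encoding before variable-byte coding pays off on dense postings lists.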
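| size_t JASS::serialise_jass_v2::write_postings (const index_postings &postings, size_t &number_of_impacts, compress_integer::integer document_frequency, compress_integer::integer *document_ids, index_postings_impact::impact_type *term_frequencies) | protected, virtual |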
Convert the postings list to the JASS v2 format and serialise it to disk.
| postings | [in] The postings list to serialise. |
| number_of_impacts | [out] The number of distinct impact scores seen in the postings list. |
| document_frequency | [in] The document frequency of the term. |
| document_ids | [in] An array (of length document_frequency) of document ids. |
| term_frequencies | [in] An array (of length document_frequency) of term frequencies (corresponding to document_ids). |
Reimplemented from JASS::serialise_jass_v1.
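write_postings() reports number_of_impacts, the number of distinct impact scores in the postings list. In an impact-ordered layout the postings are grouped into one segment per impact score, highest impact first, with document ids ascending inside each segment, so number_of_impacts equals the number of segments. A minimal sketch of that grouping, treating the term frequencies as impact scores and using std::map for clarity rather than the class's index_postings_impact and allocator_pool members; impact_segment_sketch and impact_order are hypothetical names, and the uint16_t impact width is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// One impact segment: every document that has the same impact score for this term.
struct impact_segment_sketch
	{
	uint16_t impact;                     // the shared impact score (hypothetical width)
	std::vector<uint32_t> document_ids;  // ascending, because the input is ascending
	};

// Group a document-ordered postings list into impact-ordered segments (highest impact first).
// number_of_impacts receives the number of distinct impact scores, i.e. the number of segments.
std::vector<impact_segment_sketch> impact_order(const std::vector<uint32_t> &document_ids, const std::vector<uint16_t> &impacts, size_t &number_of_impacts)
	{
	assert(document_ids.size() == impacts.size());
	std::map<uint16_t, std::vector<uint32_t>, std::greater<uint16_t>> by_impact;
	for (size_t i = 0; i < document_ids.size(); i++)
		by_impact[impacts[i]].push_back(document_ids[i]);

	std::vector<impact_segment_sketch> segments;
	for (auto &[impact, ids] : by_impact)
		segments.push_back({impact, std::move(ids)});
	number_of_impacts = segments.size();
	return segments;
	}
```

Each segment's document ids would then be compressed with the selected codex; how JASS v2 lays out and compresses the per-segment headers (the compressed_headers member) is not described on this page.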