JASSv2
JASS::serialise_jass_v2 Class Reference

Serialise an index in the format used by JASS version 2 (a better compressed JASS v1 format).

#include <serialise_jass_v2.h>

[Inheritance and collaboration diagrams for JASS::serialise_jass_v2 omitted. Per the member lists on this page, the class inherits members from JASS::serialise_jass_v1 and JASS::index_manager::delegate.]

Public Member Functions

 serialise_jass_v2 (size_t documents, jass_v1_codex codex=jass_v1_codex::elias_gamma_simd_vb, int8_t alignment=1)
 Constructor.
 
virtual ~serialise_jass_v2 ()
 Destructor.
 
virtual void serialise_vocabulary_pointers (void)
 Serialise the pointers that point between the vocab and the postings (the CIvocab.bin file).
 
virtual void serialise_primary_keys (void)
 Serialise the primary keys (or any extra stuff at the end of the primary key file).
 
virtual void operator() (const slice &term, const index_postings &postings, compress_integer::integer document_frequency, compress_integer::integer *document_ids, index_postings_impact::impact_type *term_frequencies)
 The callback function to serialise the postings (given the term) is operator().
 
- Public Member Functions inherited from JASS::serialise_jass_v1
 serialise_jass_v1 (size_t documents, jass_v1_codex codex=jass_v1_codex::elias_gamma_simd, int8_t alignment=1)
 Constructor.
 
virtual ~serialise_jass_v1 ()
 Destructor.
 
virtual void finish (void)
 Finish up any serialising that needs to be done.
 
virtual void operator() (size_t document_id, const slice &primary_key)
 The callback function to serialise the primary keys (external document ids) is operator().
 
- Public Member Functions inherited from JASS::index_manager::delegate
 delegate (size_t documents)
 Constructor.
 
virtual ~delegate ()
 Destructor.
 

Static Public Member Functions

static void unittest (void)
 Unit test this class.
 
- Static Public Member Functions inherited from JASS::serialise_jass_v1
static compress_integer & get_compressor (jass_v1_codex codex, std::string &name, int32_t &d_ness)
 Return a reference to a compressor/decompressor that can be used with this index.
 
static void unittest (void)
 Unit test this class.
 

Protected Member Functions

virtual size_t write_postings (const index_postings &postings, size_t &number_of_impacts, compress_integer::integer document_frequency, compress_integer::integer *document_ids, index_postings_impact::impact_type *term_frequencies)
 Convert the postings list to the JASS v2 format and serialise it to disk.
 

Protected Attributes

std::vector< slice, allocator_cpp< slice > > compressed_headers
 
- Protected Attributes inherited from JASS::serialise_jass_v1
file vocabulary_strings
 The concatenation of UTF-8 encoded unique tokens in the collection.
 
file vocabulary
 Details about the term (including a pointer to the term, a pointer to the postings, and the quantum count).
 
file postings
 The postings lists.
 
file primary_keys
 The list of external identifiers (document primary keys).
 
std::vector< vocab_tripple > index_key
 The entry point into the JASS v1 index is CIvocab.bin, the index key.
 
std::vector< uint64_t > primary_key_offsets
 A list of locations (on disk) of each primary key.
 
allocator_pool memory
 Memory used to store the impact-ordered postings list.
 
index_postings_impact impact_ordered
 The re-used impact ordered postings list.
 
std::string compressor_name
 The name of the compression algorithm.
 
int compressor_d_ness
 The d-ness of the compression algorithm.
 
compress_integer & encoder
 The integer encoder used to compress postings lists.
 
allocator_cpp< uint8_t > allocator
 C++ allocator between memory object and std::vector object.
 
std::vector< uint8_t, allocator_cpp< uint8_t > > compressed_buffer
 The buffer used to compress postings into.
 
std::vector< slice, allocator_cpp< slice > > compressed_segments
 Vector of pointers (and lengths) to the compressed postings.
 
uint8_t alignment
 Postings lists are padded to this alignment (used for codexes that require word alignment).
 

Additional Inherited Members

- Public Types inherited from JASS::serialise_jass_v1
enum  jass_v1_codex {
  uncompressed = 's', variable_byte = 'c', simple_8b = '8', qmx = 'q',
  qmx_d4 = 'Q', qmx_d0 = 'R', elias_gamma_simd = 'G', elias_gamma_simd_vb = 'g',
  elias_delta_simd = 'D'
}
 The compression scheme that is active.
 
- Public Attributes inherited from JASS::index_manager::delegate
size_t documents
 The number of documents in the collection.
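
Each jass_v1_codex enumerator listed above has a printable character as its value. As a minimal sketch of how such single-character tags could be mapped to readable names, the following uses a stand-in enum (codex_sketch) that copies the values from this page. The names codex_sketch, codex_name, and codex_from_tag are hypothetical and are not part of the JASS headers.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in that mirrors the enumerator values listed on this page.
// It is NOT the JASS::serialise_jass_v1::jass_v1_codex type itself.
enum class codex_sketch : char
{
    uncompressed = 's', variable_byte = 'c', simple_8b = '8', qmx = 'q',
    qmx_d4 = 'Q', qmx_d0 = 'R', elias_gamma_simd = 'G', elias_gamma_simd_vb = 'g',
    elias_delta_simd = 'D'
};

// Return a readable name for a codex tag character, or "unknown" if it is not listed.
std::string codex_name(char tag)
{
    switch (static_cast<codex_sketch>(tag))
    {
        case codex_sketch::uncompressed:        return "uncompressed";
        case codex_sketch::variable_byte:       return "variable_byte";
        case codex_sketch::simple_8b:           return "simple_8b";
        case codex_sketch::qmx:                 return "qmx";
        case codex_sketch::qmx_d4:              return "qmx_d4";
        case codex_sketch::qmx_d0:              return "qmx_d0";
        case codex_sketch::elias_gamma_simd:    return "elias_gamma_simd";
        case codex_sketch::elias_gamma_simd_vb: return "elias_gamma_simd_vb";
        case codex_sketch::elias_delta_simd:    return "elias_delta_simd";
        default:                                return "unknown";
    }
}

// Convert a tag character back to the enum, asserting that it is one of the listed values.
codex_sketch codex_from_tag(char tag)
{
    assert(codex_name(tag) != "unknown");
    return static_cast<codex_sketch>(tag);
}
```

Note that upper and lower case are distinct tags here: 'G' is elias_gamma_simd, while 'g' is elias_gamma_simd_vb (the default codex in the serialise_jass_v2 constructor signature).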
 

Detailed Description

Serialise an index in the format used by JASS version 2 (a better compressed JASS v1 format).

See description for serialise_jass_v1.

Constructor & Destructor Documentation

◆ serialise_jass_v2()

JASS::serialise_jass_v2::serialise_jass_v2 ( size_t  documents,
jass_v1_codex  codex = jass_v1_codex::elias_gamma_simd_vb,
int8_t  alignment = 1 
)
inline

Constructor.

Parameters
documents[in] The number of documents in the collection (used to allocate re-usable buffers).
codex[in] The codex (a jass_v1_codex value) that selects the compression scheme used for postings lists (default = jass_v1_codex::elias_gamma_simd_vb).
alignment[in] The start address of a postings list is padded to start on these boundaries (needed for compress_integer_QMX_jass_v1 (use 16), and others). Default = 1.
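
The alignment parameter describes padding the start of each postings list to a boundary. As a rough illustration of that arithmetic only, the sketch below computes how many padding bytes would be needed for a given file offset. The helpers padding_for and aligned_start are hypothetical names introduced here; how JASS itself performs the padding is not shown on this page.

```cpp
#include <cassert>
#include <cstddef>

// Number of padding bytes needed so that `offset` lands on a multiple of `alignment`.
// Hypothetical helper for illustration; not part of the JASS API.
std::size_t padding_for(std::size_t offset, std::size_t alignment)
{
    assert(alignment > 0);
    std::size_t remainder = offset % alignment;
    return remainder == 0 ? 0 : alignment - remainder;
}

// The first offset at or after `offset` that is a multiple of `alignment`.
std::size_t aligned_start(std::size_t offset, std::size_t alignment)
{
    return offset + padding_for(offset, alignment);
}
```

With an alignment of 1 every offset is already aligned, so no padding is written. With an alignment of 16, a list that would otherwise start at offset 17 would start at offset 32.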

Member Function Documentation

◆ operator()()

void JASS::serialise_jass_v2::operator() ( const slice &  term,
const index_postings &  postings,
compress_integer::integer  document_frequency,
compress_integer::integer *  document_ids,
index_postings_impact::impact_type *  term_frequencies 
)
virtual

The callback function to serialise the postings (given the term) is operator().

Parameters
term[in] The term name.
postings[in] The postings lists.
document_frequency[in] The document frequency of the term.
document_ids[in] An array (of length document_frequency) of document ids.
term_frequencies[in] An array (of length document_frequency) of term frequencies (corresponding to document_ids).

Reimplemented from JASS::serialise_jass_v1.
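
The callback is a virtual operator() that a derived serialiser reimplements. The sketch below illustrates that pattern in isolation, with simplified stand-in types (std::string for the term, plain integer arrays for the postings). All names here (serialiser_sketch, recording_serialiser, emit_term) are hypothetical and the parameter list is deliberately shorter than the JASS one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical base class standing in for a serialiser with a per-term callback.
class serialiser_sketch
{
public:
    virtual ~serialiser_sketch() = default;
    virtual void operator()(const std::string &term, uint32_t document_frequency,
                            const uint32_t *document_ids, const uint16_t *term_frequencies) = 0;
};

// A derived class reimplements operator(); here it only records what it was given.
class recording_serialiser : public serialiser_sketch
{
public:
    std::vector<std::string> terms;
    std::size_t postings_seen = 0;
    std::size_t total_term_frequency = 0;
    uint32_t highest_document_id = 0;

    void operator()(const std::string &term, uint32_t document_frequency,
                    const uint32_t *document_ids, const uint16_t *term_frequencies) override
    {
        terms.push_back(term);
        postings_seen += document_frequency;
        // Both arrays have document_frequency entries, one per posting.
        for (uint32_t i = 0; i < document_frequency; i++)
        {
            total_term_frequency += term_frequencies[i];
            if (document_ids[i] > highest_document_id)
                highest_document_id = document_ids[i];
        }
    }
};

// Stand-in for the code that walks an index and invokes the callback once per term.
void emit_term(serialiser_sketch &callback, const std::string &term,
               const std::vector<uint32_t> &ids, const std::vector<uint16_t> &tfs)
{
    assert(ids.size() == tfs.size());
    callback(term, static_cast<uint32_t>(ids.size()), ids.data(), tfs.data());
}
```

Because the call goes through a reference to the base class, the caller does not need to know which serialiser it is driving.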

◆ write_postings()

size_t JASS::serialise_jass_v2::write_postings ( const index_postings &  postings,
size_t &  number_of_impacts,
compress_integer::integer  document_frequency,
compress_integer::integer *  document_ids,
index_postings_impact::impact_type *  term_frequencies 
)
protected virtual

Convert the postings list to the JASS v2 format and serialise it to disk.

Parameters
postings[in] The postings list to serialise.
number_of_impacts[out] The number of distinct impact scores seen in the postings list.
document_frequency[in] The document frequency of the term.
document_ids[in] An array (of length document_frequency) of document ids.
term_frequencies[in] An array (of length document_frequency) of term frequencies (corresponding to document_ids).
Returns
The location (in CIpostings.bin) of the start of the serialised postings list.

Reimplemented from JASS::serialise_jass_v1.
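
To make the meaning of number_of_impacts concrete, the sketch below groups document ids by their impact value and reports the number of distinct impacts. It assumes that the per-document values can be treated directly as impact scores, and it says nothing about the actual JASS v2 on-disk layout, which is not described on this page. The names impact_groups and group_by_impact are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <vector>

// Map from impact score to the document ids carrying that score, highest impact first.
using impact_groups = std::map<uint16_t, std::vector<uint32_t>, std::greater<uint16_t>>;

// Group document ids by impact and report how many distinct impacts were seen.
// Hypothetical illustration only; this is not the JASS serialisation routine.
impact_groups group_by_impact(const uint32_t *document_ids, const uint16_t *impacts,
                              std::size_t document_frequency, std::size_t &number_of_impacts)
{
    assert(document_frequency == 0 || (document_ids != nullptr && impacts != nullptr));

    impact_groups groups;
    for (std::size_t i = 0; i < document_frequency; i++)
        groups[impacts[i]].push_back(document_ids[i]);

    number_of_impacts = groups.size();
    return groups;
}
```

Within each group the document ids keep the order in which they appeared in the input array.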


The documentation for this class was generated from the following files: