JASSv2
JASS::deserialised_jass_v1 Class Reference

Load and deserialise a JASS v1 index.

#include <deserialised_jass_v1.h>

Classes

class  metadata
 metadata for a given term including pointer to postings and number of impacts.
 
class  segment_header
 Each impact ordered segment contains a header with the impact score and the pointers to documents.
 
class  segment_header_on_disk
 Each impact ordered segment contains a header with the impact score and the pointers to documents.
 

Public Member Functions

 deserialised_jass_v1 (bool verbose=false)
 Constructor.
 
virtual ~deserialised_jass_v1 ()
 Destructor.
 
size_t read_index (const std::string &directory="")
 Read a JASS v1 index into memory.
 
compress_integer * codex (std::string &name, int32_t &d_ness) const
 Return a reference to a decompressor that can be used with this index.
 
const std::vector< std::string > & primary_keys (void) const
 Return the list of primary keys as a std::vector<std::string>
 
const uint8_t * postings (void) const
 Return a pointer to the start of the postings "file".
 
query::DOCID_TYPE document_count (void) const
 Return the number of documents in the collection.
 
bool postings_details (metadata &metadata, const query_term &term) const
 Return the meta-data about the postings list.
 
virtual size_t get_segment_list (segment_header *segments, metadata &metadata, size_t query_term_frequency, uint32_t &smallest, uint32_t &largest, query::DOCID_TYPE &document_frequency) const
 Extract the segment headers and return them in the parameter called segments.
 
auto begin (void)
 return an iterator over the vocabulary.
 
auto end (void)
 return an iterator to the end of the vocabulary.
 

Protected Member Functions

virtual size_t read_primary_keys (const std::string &primary_key_filename=PRIMARY_KEY_FILENAME)
 Read the JASS v1 index primary key file.
 
virtual size_t read_vocabulary (const std::string &vocab_filename=VOCAB_FILENAME, const std::string &terms_filename=TERMS_FILENAME)
 Read the JASS v1 index vocabulary files.
 
virtual size_t read_postings (const std::string &postings_filename=POSTINGS_FILENAME)
 Read the JASS v1 index postings file.
 
size_t read_index_explicit (const std::string &primary_key_filename=PRIMARY_KEY_FILENAME, const std::string &vocab_filename=VOCAB_FILENAME, const std::string &terms_filename=TERMS_FILENAME, const std::string &postings_filename=POSTINGS_FILENAME)
 Read a JASS v1 index into memory.
 

Protected Attributes

bool verbose
 Should this class produce diagnostics on stdout?
 
query::DOCID_TYPE documents
 The number of documents in the collection.
 
file::file_read_only primary_key_memory
 Memory used to store the primary key strings.
 
std::vector< std::string > primary_key_list
 The array of primary keys.
 
uint64_t terms
 The number of terms in the collection.
 
file::file_read_only vocabulary_memory
 Memory used to store the vocabulary pointers.
 
file::file_read_only vocabulary_terms_memory
 Memory used to store the vocabulary strings.
 
std::vector< metadata > vocabulary_list
 The array of vocabulary terms, sorted in alphabetical order.
 
file::file_read_only postings_memory
 Memory used to store the postings.
 

Static Private Attributes

static constexpr const char * PRIMARY_KEY_FILENAME = "CIdoclist.bin"
 
static constexpr const char * VOCAB_FILENAME = "CIvocab.bin"
 
static constexpr const char * TERMS_FILENAME = "CIvocab_terms.bin"
 
static constexpr const char * POSTINGS_FILENAME = "CIpostings.bin"
 

Detailed Description

Load and deserialise a JASS v1 index.

Constructor & Destructor Documentation

◆ deserialised_jass_v1()

JASS::deserialised_jass_v1::deserialised_jass_v1 ( bool  verbose = false)
inline explicit

Constructor.

Parameters
verbose[in] Should the index reading methods produce messages on stdout?

Member Function Documentation

◆ begin()

auto JASS::deserialised_jass_v1::begin ( void  )
inline

return an iterator over the vocabulary.

Returns
an iterator over the vocabulary.

◆ codex()

compress_integer * JASS::deserialised_jass_v1::codex ( std::string &  name,
int32_t &  d_ness 
) const

Return a reference to a decompressor that can be used with this index.

Parameters
name[out] The name of the compression codex
d_ness[out] Whether the codex requires D0, D1, etc decoding (-1 if it supports decode_and_process via decode_none)
Returns
A reference to a compress_integer that can decode the given codex

◆ document_count()

query::DOCID_TYPE JASS::deserialised_jass_v1::document_count ( void  ) const
inline

Return the number of documents in the collection.

Returns
the number of documents in the collection

◆ end()

auto JASS::deserialised_jass_v1::end ( void  )
inline

return an iterator to the end of the vocabulary.

Returns
an iterator to the end of the vocabulary.
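
Since the class exposes begin() and end(), the vocabulary can presumably be walked with a range-based for loop. The following is a minimal sketch of that pattern, not the JASS implementation: toy_metadata and toy_index are hypothetical stand-ins, and it assumes that iteration yields one metadata-like entry per term in vocabulary order.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a vocabulary entry; the real metadata class has its own layout.
struct toy_metadata
    {
    std::string term;
    std::size_t impacts;
    };

// Hypothetical stand-in for the index: begin() and end() forward to the underlying vector.
class toy_index
    {
    private:
        std::vector<toy_metadata> vocabulary_list;

    public:
        explicit toy_index(std::vector<toy_metadata> vocabulary) : vocabulary_list(std::move(vocabulary)) {}
        auto begin(void) { return vocabulary_list.begin(); }
        auto end(void) { return vocabulary_list.end(); }
    };

// Walk the vocabulary with a range-based for loop and collect the term strings.
std::vector<std::string> list_terms(toy_index &index)
    {
    std::vector<std::string> terms;
    for (const auto &entry : index)
        terms.push_back(entry.term);
    return terms;
    }

// Usage: the terms come back in the order they are stored.
void example(void)
    {
    std::vector<toy_metadata> vocabulary = {{"apple", 2}, {"banana", 1}};
    toy_index index(vocabulary);
    std::vector<std::string> terms = list_terms(index);
    assert(terms.size() == 2 && terms[0] == "apple" && terms[1] == "banana");
    }
```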

◆ get_segment_list()

virtual size_t JASS::deserialised_jass_v1::get_segment_list ( segment_header *  segments,
metadata &  metadata,
size_t  query_term_frequency,
uint32_t &  smallest,
uint32_t &  largest,
query::DOCID_TYPE &  document_frequency 
) const
inline virtual

Extract the segment headers and return them in the parameter called segments.

Parameters
segments[out] The list of segments for the given search term (caller must ensure this points to a large enough array)
metadata[in] The metadata for the given search term
smallest[out] The smallest impact score for this term
largest[out] The largest impact score for this term
Returns
The number of segments extracted and added to the list

Reimplemented in JASS::deserialised_jass_v2.
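
The calling convention described above (the caller supplies the array, the method fills it, returns a count, and reports the impact range through reference parameters) can be sketched as follows. This is an illustration under stated assumptions, not the JASS code: toy_segment_header and toy_get_segment_list are hypothetical, only the impact score is modelled, and the headers come from a plain vector rather than being decoded from the postings.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified segment header: only the impact score is modelled.
struct toy_segment_header
    {
    std::uint32_t impact;
    };

// Copy every header from source into the caller's array and report the impact range.
// The caller must make sure that segments has room for source.size() entries.
std::size_t toy_get_segment_list(toy_segment_header *segments, const std::vector<toy_segment_header> &source, std::uint32_t &smallest, std::uint32_t &largest)
    {
    if (source.empty())
        return 0;                                   // smallest and largest are left untouched

    smallest = largest = source.front().impact;
    std::size_t found = 0;
    for (const auto &header : source)
        {
        segments[found++] = header;
        smallest = std::min(smallest, header.impact);
        largest = std::max(largest, header.impact);
        }
    return found;
    }

// Usage: size the output array from the number of headers before calling.
void example(void)
    {
    std::vector<toy_segment_header> source = {{7}, {3}, {5}};
    std::vector<toy_segment_header> segments(source.size());
    std::uint32_t smallest = 0;
    std::uint32_t largest = 0;
    std::size_t count = toy_get_segment_list(segments.data(), source, smallest, largest);
    assert(count == 3 && smallest == 3 && largest == 7);
    }
```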

◆ postings()

const uint8_t* JASS::deserialised_jass_v1::postings ( void  ) const
inline

Return a pointer to the start of the postings "file".

Returns
A pointer to the start of the postings "file"

◆ postings_details()

bool JASS::deserialised_jass_v1::postings_details ( metadata &  metadata,
const query_term &  term 
) const
inline

Return the meta-data about the postings list.

Parameters
metadata[out] If the term is found then this is changed to contain the metadata about the term
term[in] Find the metadata for this term
Returns
true on success, false on fail (e.g. term not in dictionary)
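
Because vocabulary_list is documented as sorted alphabetically, one plausible way to implement such a lookup is a binary search. The sketch below shows that technique with std::lower_bound; it is an assumption about the approach rather than the confirmed JASS implementation, and toy_metadata and toy_postings_details are hypothetical names with an invented field layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-term metadata.
struct toy_metadata
    {
    std::string term;
    std::size_t offset;         // where the postings for this term start
    std::size_t impacts;        // number of impact segments
    };

// Binary search a vocabulary that is sorted by term.
// Returns true and fills metadata on a hit; returns false and leaves metadata alone on a miss.
bool toy_postings_details(toy_metadata &metadata, const std::vector<toy_metadata> &vocabulary, const std::string &term)
    {
    auto found = std::lower_bound(vocabulary.begin(), vocabulary.end(), term,
        [](const toy_metadata &entry, const std::string &key) { return entry.term < key; });

    if (found == vocabulary.end() || found->term != term)
        return false;

    metadata = *found;
    return true;
    }

// Usage: check the return value before trusting the metadata.
void example(void)
    {
    std::vector<toy_metadata> vocabulary = {{"apple", 0, 2}, {"banana", 16, 1}, {"cherry", 40, 3}};
    toy_metadata result{"", 0, 0};
    assert(toy_postings_details(result, vocabulary, "banana") && result.offset == 16);
    assert(!toy_postings_details(result, vocabulary, "blueberry"));
    }
```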

◆ primary_keys()

const std::vector<std::string>& JASS::deserialised_jass_v1::primary_keys ( void  ) const
inline

Return the list of primary keys as a std::vector<std::string>

Returns
A reference to a vector of primary keys
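
A typical use of this list is turning internal document ids from a results list back into external primary keys. The sketch below assumes that internal ids are zero-based indexes into the vector, which this page does not confirm; to_primary_keys is a hypothetical helper and is not part of JASS.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Translate internal document ids into primary keys.
// Assumption: an id is a zero-based index into primary_keys. Ids outside the vector are skipped.
std::vector<std::string> to_primary_keys(const std::vector<std::string> &primary_keys, const std::vector<std::size_t> &document_ids)
    {
    std::vector<std::string> names;
    for (std::size_t id : document_ids)
        if (id < primary_keys.size())
            names.push_back(primary_keys[id]);
    return names;
    }

// Usage: the out-of-range id 9 is dropped, the others keep their ranking order.
void example(void)
    {
    std::vector<std::string> keys = {"doc-a", "doc-b", "doc-c"};
    std::vector<std::string> names = to_primary_keys(keys, {2, 0, 9});
    assert(names.size() == 2 && names[0] == "doc-c" && names[1] == "doc-a");
    }
```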

◆ read_index()

size_t JASS::deserialised_jass_v1::read_index ( const std::string &  directory = "")

Read a JASS v1 index into memory.

Parameters
directory[in] The directory to search for an index
Returns
0 on failure, non-zero on success
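
The page does not say how the directory is combined with the four default file names (CIdoclist.bin, CIvocab.bin, CIvocab_terms.bin, CIpostings.bin). One plausible approach is sketched below, assuming a '/' separator and that an empty directory means the current directory; join_directory and index_files are hypothetical helpers, not JASS functions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Prefix filename with directory, adding a '/' only when one is needed.
// Assumption: an empty directory means "use the file name as given".
std::string join_directory(const std::string &directory, const std::string &filename)
    {
    if (directory.empty())
        return filename;
    if (directory.back() == '/')
        return directory + filename;
    return directory + "/" + filename;
    }

// Build the four paths a JASS v1 index consists of, using the default file names.
std::vector<std::string> index_files(const std::string &directory)
    {
    static const char *const names[] = {"CIdoclist.bin", "CIvocab.bin", "CIvocab_terms.bin", "CIpostings.bin"};

    std::vector<std::string> paths;
    for (const char *name : names)
        paths.push_back(join_directory(directory, name));
    return paths;
    }

// Usage: with and without a directory.
void example(void)
    {
    assert(join_directory("", "CIdoclist.bin") == "CIdoclist.bin");
    assert(index_files("index")[3] == "index/CIpostings.bin");
    }
```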

◆ read_index_explicit()

size_t JASS::deserialised_jass_v1::read_index_explicit ( const std::string &  primary_key_filename = PRIMARY_KEY_FILENAME,
const std::string &  vocab_filename = VOCAB_FILENAME,
const std::string &  terms_filename = TERMS_FILENAME,
const std::string &  postings_filename = POSTINGS_FILENAME 
)
protected

Read a JASS v1 index into memory.

Parameters
primary_key_filename[in] the name of the file containing the primary key list ("CIdoclist.bin")
vocab_filename[in] the name of the file containing the vocabulary pointers ("CIvocab.bin")
terms_filename[in] the name of the file containing the vocabulary strings ("CIvocab_terms.bin")
postings_filename[in] the name of the file containing the postings ("CIpostings.bin")
Returns
0 on failure, non-zero on success

◆ read_postings()

size_t JASS::deserialised_jass_v1::read_postings ( const std::string &  postings_filename = POSTINGS_FILENAME)
protected virtual

Read the JASS v1 index postings file.

Parameters
postings_filename[in] the name of the file containing the postings ("CIpostings.bin")
Returns
size of the postings file or 0 on failure

◆ read_primary_keys()

size_t JASS::deserialised_jass_v1::read_primary_keys ( const std::string &  primary_key_filename = PRIMARY_KEY_FILENAME)
protected virtual

Read the JASS v1 index primary key file.

Parameters
primary_key_filename[in] the name of the file containing the primary key list ("CIdoclist.bin")
Returns
The number of documents in the collection (or 0 on error)

Reimplemented in JASS::deserialised_jass_v2.

◆ read_vocabulary()

size_t JASS::deserialised_jass_v1::read_vocabulary ( const std::string &  vocab_filename = VOCAB_FILENAME,
const std::string &  terms_filename = TERMS_FILENAME 
)
protected virtual

Read the JASS v1 index vocabulary files.

Parameters
vocab_filename[in] the name of the file containing the vocabulary pointers ("CIvocab.bin")
terms_filename[in] the name of the file containing the vocabulary strings ("CIvocab_terms.bin")
Returns
The number of terms in the collection (or 0 on error)

Reimplemented in JASS::deserialised_jass_v2.


The documentation for this class was generated from the following files: