JASSv2
JASS::index_postings Class Reference

Non-thread-safe object that accumulates a single postings list during indexing.

#include <index_postings.h>

Classes

class  posting
 The representation of a single posting as a tuple of docid, term frequency and position.
 

Public Member Functions

 index_postings (allocator &memory_pool)
 Constructor.
 
virtual void push_back (JASS::compress_integer::integer document_id, index_postings_impact::impact_type amount=1)
 Add to the end of the postings list for this term a term frequency of the given amount.
 
virtual void push_back (const std::vector< JASS::posting > &data)
 Add a set of postings to the end of the postings list (see the warning in the detailed description).
 
compress_integer::integer linearize (uint8_t *temporary, size_t temporary_size, compress_integer::integer *ids, index_postings_impact::impact_type *frequencies, size_t id_and_frequencies_length) const
 Turn the internal format used to accumulate postings into a docid and term-frequencies array.
 
void impact_order (size_t documents_in_collection, index_postings_impact &postings_list) const
 Return the postings list as an impact-ordered postings list with impact headers.
 
void impact_order (size_t documents_in_collection, index_postings_impact &postings_list, compress_integer::integer document_frequency, compress_integer::integer *document_ids, index_postings_impact::impact_type *term_frequencies) const
 Return the postings list as an impact-ordered postings list with impact headers.
 
void text_render (std::ostream &stream) const
 Dump a human-readable version of the postings list down the stream. Format is: <DocID, TF, Pos, Pos, Pos>...
 

Static Public Member Functions

static void unittest (void)
 Unit test this class.
 

Private Attributes

compress_integer::integer highest_document
 The highest document number seen in this postings list (counting from 1)
 
dynamic_array< uint8_t > document_ids
 Array holding the docids (variable byte encoded)
 
dynamic_array< index_postings_impact::impact_type > term_frequencies
 Array holding the term frequencies (as integers)
 

Static Private Attributes

static constexpr size_t initial_size = 16
 Initially allocate space for 16 elements.
 
static constexpr double growth_factor = 1.5
 Grow dynamic arrays by a factor of 1.5.
 

Detailed Description

Non-thread-safe object that accumulates a single postings list during indexing.

Constructor & Destructor Documentation

◆ index_postings()

JASS::index_postings::index_postings ( allocator &  memory_pool )
inline

Constructor.

Parameters
memory_pool [in] All allocation is from this allocator.

Member Function Documentation

◆ impact_order() [1/2]

void JASS::index_postings::impact_order ( size_t  documents_in_collection,
index_postings_impact &  postings_list 
) const
inline

Return the postings list as an impact-ordered postings list with impact headers.

Parameters
documents_in_collection [in] The number of documents in the collection.
postings_list [in / out] The constructed impact ordered postings list.

◆ impact_order() [2/2]

void JASS::index_postings::impact_order ( size_t  documents_in_collection,
index_postings_impact &  postings_list,
compress_integer::integer  document_frequency,
compress_integer::integer *  document_ids,
index_postings_impact::impact_type *  term_frequencies 
) const
inline

Return the postings list as an impact-ordered postings list with impact headers.

Parameters
documents_in_collection [in] The number of documents in the collection.
postings_list [out] The constructed impact ordered postings list.
document_frequency [in] The document frequency of this term (the length of document_ids and term_frequencies).
document_ids [in] The list of document ids.
term_frequencies [in] The list of term frequencies.
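
As a rough illustration of what "impact ordered" means, the sketch below groups documents by impact, emits the highest impact first, and keeps document ids in their original order within each group. It is not the JASS implementation: impact_segment and impact_order_sketch are hypothetical names, the raw term frequency is assumed to serve as the impact score, and documents_in_collection is left out because the sketch does not need it.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for one impact header plus its run of document ids.
struct impact_segment
{
    uint16_t impact;                        // the shared impact score (assumed: the term frequency)
    std::vector<uint32_t> document_ids;     // documents with that score, in input order
};

// Group (docid, tf) pairs by tf and return the groups from highest to lowest impact.
std::vector<impact_segment> impact_order_sketch(size_t document_frequency,
                                                const uint32_t *document_ids,
                                                const uint16_t *term_frequencies)
{
    // std::greater keeps the map sorted with the largest impact first.
    std::map<uint16_t, std::vector<uint32_t>, std::greater<uint16_t>> buckets;

    for (size_t which = 0; which < document_frequency; which++)
        buckets[term_frequencies[which]].push_back(document_ids[which]);

    std::vector<impact_segment> ordered;
    for (auto &bucket : buckets)
        ordered.push_back({bucket.first, std::move(bucket.second)});

    return ordered;
}
```

For document ids {1, 2, 5, 9} with term frequencies {3, 1, 3, 2}, the sketch produces three segments: impact 3 with documents {1, 5}, impact 2 with {9}, and impact 1 with {2}.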

◆ linearize()

compress_integer::integer JASS::index_postings::linearize ( uint8_t *  temporary,
size_t  temporary_size,
compress_integer::integer *  ids,
index_postings_impact::impact_type *  frequencies,
size_t  id_and_frequencies_length 
) const
inline

Turn the internal format used to accumulate postings into a docid and term-frequencies array.

Parameters
temporary [in] Buffer used as scratch space.
temporary_size [in] Length, in bytes, of temporary.
ids [out] Buffer to store the document ids.
frequencies [out] Buffer to store the term frequencies.
id_and_frequencies_length [in] The length of the id and frequencies buffers.
Returns
Returns the document frequency of this term, or 0 on failure.
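
The following sketch shows one way such a linearization could work. It is an illustration under stated assumptions, not the JASS code: linearize_sketch is a hypothetical free function, std::vector stands in for dynamic_array (so no scratch buffer is needed), the stored docids are assumed to be variable-byte encoded gaps with 7 data bits per byte and a set high bit marking continuation, and the output ids are assumed to be absolute document numbers. Like the documented method, it returns the document frequency, or 0 on failure.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical analogue of linearize(): decode variable-byte docid gaps into
// absolute ids and copy the per-document counts alongside them.
uint32_t linearize_sketch(const std::vector<uint8_t> &encoded_gaps,
                          const std::vector<uint16_t> &counts,
                          uint32_t *ids,
                          uint16_t *frequencies,
                          size_t id_and_frequencies_length)
{
    if (counts.size() > id_and_frequencies_length)
        return 0;                               // output buffers are too small

    uint32_t document_id = 0;
    size_t written = 0;
    size_t byte = 0;

    while (written < counts.size())
    {
        uint32_t gap = 0;
        unsigned shift = 0;
        bool finished = false;

        while (!finished)
        {
            if (byte >= encoded_gaps.size() || shift > 28)
                return 0;                       // truncated or malformed input
            uint8_t current = encoded_gaps[byte++];
            gap |= static_cast<uint32_t>(current & 0x7F) << shift;
            shift += 7;
            finished = (current & 0x80) == 0;   // a clear high bit ends the integer
        }

        document_id += gap;                     // undo the delta encoding
        ids[written] = document_id;
        frequencies[written] = counts[written];
        written++;
    }

    return static_cast<uint32_t>(written);
}
```

With the encoded bytes {0x01, 0x02, 0xAC, 0x02} and counts {2, 5, 1}, the gaps decode to 1, 2 and 300, so the ids written are 1, 3 and 303. Passing output buffers of length 2 for the same input returns 0.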

◆ push_back() [1/2]

virtual void JASS::index_postings::push_back ( JASS::compress_integer::integer  document_id,
index_postings_impact::impact_type  amount = 1 
)
inline virtual

Add to the end of the postings list for this term a term frequency of the given amount.

Parameters
document_id [in] The document whose term count is to be incremented.
amount [in] The amount to add to the score (default = 1).

The amount is generally not useful unless pre-computed term frequencies (or impact scores) are known in advance. This might happen if a forward index is being inverted (i.e. term:count values are known).
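
A minimal sketch of the accumulation this method describes is given below. It is inferred from the private attributes listed above (highest_document, variable-byte encoded document_ids, and term_frequencies) and is not the JASS source: postings_sketch and encode_vbyte are hypothetical names, std::vector replaces dynamic_array and the allocator, docids are assumed to be stored as gaps from the previous docid, and the byte layout of the variable-byte code is one common convention rather than the one JASS necessarily uses.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for JASS::index_postings.
struct postings_sketch
{
    uint32_t highest_document = 0;              // highest docid seen so far (docids count from 1)
    std::vector<uint8_t> document_ids;          // variable-byte encoded docid gaps
    std::vector<uint16_t> term_frequencies;     // one count per document

    // Append value using 7 data bits per byte, low bits first; the high bit
    // is set on every byte except the last.
    static void encode_vbyte(std::vector<uint8_t> &into, uint32_t value)
    {
        while (value >= 0x80)
        {
            into.push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
            value >>= 7;
        }
        into.push_back(static_cast<uint8_t>(value));
    }

    void push_back(uint32_t document_id, uint16_t amount = 1)
    {
        assert(document_id >= highest_document);    // documents are assumed to arrive in order

        if (document_id == highest_document && !term_frequencies.empty())
            term_frequencies.back() += amount;      // same document again: add to its count
        else
        {
            encode_vbyte(document_ids, document_id - highest_document);    // new document: store the gap
            term_frequencies.push_back(amount);
            highest_document = document_id;
        }
    }
};
```

Calling push_back(1), push_back(1) and then push_back(3, 5) on a fresh object leaves two postings: document 1 with a count of 2 and document 3 with a count of 5, stored as the gaps 1 and 2. The sketch does not guard against the count overflowing its type.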

◆ push_back() [2/2]

virtual void JASS::index_postings::push_back ( const std::vector< JASS::posting > &  data)
inline virtual

Add a set of postings to the end of the postings list (see the warning below).

This method assumes the caller is adding postings either one at a time or a block at a time, not a mixture of the two. WARNING: it fails with JASS_assert() if the postings list already holds postings added by the document-at-a-time approach.

Parameters
data [in] The postings list to use. Document ids are assumed to be D1-encoded.
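
For readers unfamiliar with the term, the sketch below assumes that "D1-encoded" means first-order delta encoding, where each document id is stored as its difference from the previous id. That reading is an assumption, and d1_encode_sketch and d1_decode_sketch are hypothetical helpers, not JASS functions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Replace each id with its distance from the previous id (the first id is
// measured from 0).
std::vector<uint32_t> d1_encode_sketch(const std::vector<uint32_t> &document_ids)
{
    std::vector<uint32_t> gaps;
    gaps.reserve(document_ids.size());

    uint32_t previous = 0;
    for (uint32_t id : document_ids)
    {
        assert(id >= previous);         // ids are assumed to be sorted ascending
        gaps.push_back(id - previous);
        previous = id;
    }
    return gaps;
}

// Recover the absolute ids with a running sum over the gaps.
std::vector<uint32_t> d1_decode_sketch(const std::vector<uint32_t> &gaps)
{
    std::vector<uint32_t> document_ids;
    document_ids.reserve(gaps.size());

    uint32_t current = 0;
    for (uint32_t gap : gaps)
    {
        current += gap;
        document_ids.push_back(current);
    }
    return document_ids;
}
```

Under this assumption the ids {3, 7, 8, 20} are stored as {3, 4, 1, 12}, and decoding that sequence returns the original ids.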

◆ text_render()

void JASS::index_postings::text_render ( std::ostream &  stream) const
inline

Dump a human-readable version of the postings list down the stream. Format is: <DocID, TF, Pos, Pos, Pos>...

Parameters
stream [in] The stream to write to.
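
The documented format can be illustrated with a small sketch. The names posting_sketch, text_render_sketch and render_to_string are hypothetical, the fields mirror the docid, term frequency and position tuple described for the posting class, and the exact spacing and separators are assumptions based on the format string above rather than on the JASS output.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a single posting with its word positions.
struct posting_sketch
{
    uint32_t document_id;
    uint32_t term_frequency;
    std::vector<uint32_t> positions;
};

// Write each posting as <DocID, TF, Pos, Pos, ...> with no separator between postings.
void text_render_sketch(std::ostream &stream, const std::vector<posting_sketch> &postings)
{
    for (const auto &posting : postings)
    {
        stream << '<' << posting.document_id << ", " << posting.term_frequency;
        for (uint32_t position : posting.positions)
            stream << ", " << position;
        stream << '>';
    }
}

// Convenience wrapper that renders into a string instead of a caller-supplied stream.
std::string render_to_string(const std::vector<posting_sketch> &postings)
{
    std::ostringstream stream;
    text_render_sketch(stream, postings);
    return stream.str();
}
```

Two postings, document 1 with positions 4 and 9 and document 3 with position 7, render as <1, 2, 4, 9><3, 1, 7>.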

The documentation for this class was generated from the following file: