JASSv2
Base class for holding the index during indexing.
#include <index_manager.h>

Classes | |
| class | delegate |
| Base class for the callback function called by iterate. | |
| class | quantizing_delegate |
| Base class for the callback function called by iterate. | |
Public Member Functions | |
| index_manager () | |
| Constructor. | |
| virtual | ~index_manager () |
| Destructor. | |
| virtual void | set_primary_keys (const std::vector< slice > &keys) |
| Add a list of primary keys to the current list. Normally used to set it without actually indexing (warning). | |
| virtual void | begin_document (const slice &primary_key) |
| Tell this object that you're about to start indexing a new object. | |
| virtual void | term (const parser::token &term) |
| Hand a new term from the token stream to this object. | |
| virtual void | term (const parser::token &term, const std::vector< posting > &postings_list) |
| Hand a new term with a pre-computed postings list to this object. | |
| virtual void | end_document (compress_integer::integer document_length) |
| Tell this object that you've finished with the current document (and are about to move on to the next, or are completely finished). | |
| virtual std::vector< compress_integer::integer > & | get_document_length_vector (void) |
| Return a reference to the document length vector. | |
| virtual void | set_document_length_vector (std::vector< compress_integer::integer > &new_lengths) |
| Replace the document length vector with the one passed to this function (warning). | |
| virtual void | text_render (std::ostream &stream) const |
| unimplemented: Dump a human-readable version of the index down the stream. | |
| virtual void | iterate (delegate &callback) |
| Iterate over the index calling callback.operator() with each postings list. | |
| virtual void | iterate (index_manager::quantizing_delegate &quantizer, index_manager::delegate &callback) |
| Iterate over the index calling callback.operator() with each postings list. | |
| compress_integer::integer | get_highest_document_id (void) const |
| Return the number of documents that have been successfully indexed or are in the process of being indexed. | |
Static Public Member Functions | |
| static void | unittest (void) |
| Unit test this class. | |
Private Attributes | |
| compress_integer::integer | highest_document_id |
| The highest document_id seen so far (counts from 1). | |
| std::vector< compress_integer::integer > | document_length_vector |
| vector of document lengths. | |
Base class for holding the index during indexing.
This class is a base class that defines the interface for different approaches to indexing. Once an object of this type has been declared, it is used by calling begin_document() at the beginning of each document, end_document() at the end of each document, and term() for each term in the token stream (i.e. "the cat and the dog" is 5 tokens: "the", "cat", "and", "the", "dog"). This class does not stem and does not remove stop words; that behaviour is external to this class. To find out how many documents have been indexed up to a given point, call get_highest_document_id(). When subclassing, remember to call this class's methods from the overridden methods in the subclass.
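The calling sequence described above can be sketched as follows. This is a minimal, self-contained illustration rather than the JASS implementation: sketch_index_manager, counting_index_manager and index_one_document() are hypothetical names, std::string stands in for slice and parser::token, std::uint32_t stands in for compress_integer::integer, and the bodies of begin_document() and end_document() are assumptions about what the base class tracks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for index_manager: it tracks only document ids and lengths.
class sketch_index_manager
	{
	private:
		std::uint32_t highest_document_id = 0;               // counts from 1, so 0 means nothing indexed yet
		std::vector<std::uint32_t> document_length_vector;

	public:
		virtual ~sketch_index_manager() = default;

		virtual void begin_document(const std::string & /* primary_key */)
			{
			highest_document_id++;
			}

		virtual void term(const std::string & /* token */)
			{
			// This sketch does nothing with the term; a subclass would build its index here.
			}

		virtual void end_document(std::uint32_t document_length)
			{
			document_length_vector.push_back(document_length);
			}

		std::uint32_t get_highest_document_id() const
			{
			return highest_document_id;
			}

		const std::vector<std::uint32_t> &get_document_length_vector() const
			{
			return document_length_vector;
			}
	};

// Hypothetical subclass: counts terms and calls the base class method it overrides.
class counting_index_manager : public sketch_index_manager
	{
	public:
		std::size_t terms_seen = 0;

		void term(const std::string &token) override
			{
			sketch_index_manager::term(token);
			terms_seen++;
			}
	};

// Index one whitespace-separated document and return its length in tokens.
std::uint32_t index_one_document(sketch_index_manager &index, const std::string &primary_key, const std::string &text)
	{
	index.begin_document(primary_key);

	std::istringstream stream(text);
	std::string token;
	std::uint32_t length = 0;
	while (stream >> token)
		{
		index.term(token);
		length++;
		}

	index.end_document(length);
	assert(index.get_document_length_vector().back() == length);
	return length;
	}
```

Calling index_one_document(index, "doc-1", "the cat and the dog") on a counting_index_manager hands five tokens to term(), after which get_highest_document_id() reports one document.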
| virtual void JASS::index_manager::begin_document (const slice &primary_key) | inline, virtual |
Tell this object that you're about to start indexing a new object.
| primary_key | [in] The primary key (i.e. external document identifier) of this document. |
Reimplemented in JASS::index_manager_sequential.
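| virtual std::vector< compress_integer::integer > & JASS::index_manager::get_document_length_vector (void) | inline, virtual |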
Return a reference to the document length vector.
| virtual void JASS::index_manager::iterate (delegate &callback) | inline, virtual |
Iterate over the index calling callback.operator() with each postings list.
| callback | [in] The callback to call. |
Reimplemented in JASS::index_manager_sequential.
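| virtual void JASS::index_manager::iterate (index_manager::quantizing_delegate &quantizer, index_manager::delegate &callback) | inline, virtual |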
Iterate over the index calling callback.operator() with each postings list.
| quantizer | [in] The quantizer that will quantize then call the serialiser callback. |
| callback | [in] The callback that the quantizer should call. |
Reimplemented in JASS::index_manager_sequential.
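The two iterate() overloads can be pictured with the sketch below: one calls the callback directly with each postings list, the other hands each list to a quantizer, which transforms it and then calls the callback. Everything here is an assumption made for illustration: the postings type, the operator() signatures, the sketch_index, collector and capping_quantizer classes, and the "cap term frequencies at 255" rule, which is only a placeholder for a real quantizer.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical postings list: (document id, term frequency) pairs.
using postings = std::vector<std::pair<std::uint32_t, std::uint32_t>>;

// Callback invoked once per postings list.
class delegate
	{
	public:
		virtual ~delegate() = default;
		virtual void operator()(const std::string &term, const postings &list) = 0;
	};

// Callback that transforms a postings list and then forwards it to a delegate.
class quantizing_delegate
	{
	public:
		virtual ~quantizing_delegate() = default;
		virtual void operator()(delegate &callback, const std::string &term, const postings &list) = 0;
	};

// Hypothetical in-memory index keyed by term.
class sketch_index
	{
	private:
		std::map<std::string, postings> index;

	public:
		void add(const std::string &term, std::uint32_t document_id, std::uint32_t term_frequency)
			{
			assert(document_id >= 1);                         // document ids count from 1
			index[term].push_back(std::make_pair(document_id, term_frequency));
			}

		void iterate(delegate &callback) const
			{
			for (const auto &entry : index)
				callback(entry.first, entry.second);
			}

		void iterate(quantizing_delegate &quantizer, delegate &callback) const
			{
			for (const auto &entry : index)
				quantizer(callback, entry.first, entry.second);
			}
	};

// Delegate that records every postings list it is given.
class collector : public delegate
	{
	public:
		std::map<std::string, postings> seen;

		void operator()(const std::string &term, const postings &list) override
			{
			seen[term] = list;
			}
	};

// Placeholder quantizer: caps each term frequency at 255, then calls the callback.
class capping_quantizer : public quantizing_delegate
	{
	public:
		void operator()(delegate &callback, const std::string &term, const postings &list) override
			{
			postings quantized = list;
			for (auto &entry : quantized)
				if (entry.second > 255)
					entry.second = 255;
			callback(term, quantized);
			}
	};
```

Passing a collector to iterate(callback) records the stored lists unchanged, while iterate(quantizer, callback) with a capping_quantizer records the capped copies; the index itself is not modified in either case.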
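| virtual void JASS::index_manager::set_document_length_vector (std::vector< compress_integer::integer > &new_lengths) | inline, virtual |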
Replace the document length vector with the one passed to this function (warning).
| new_lengths | [in] The new document length vector. |
It is possible that new_lengths.size() differs from the current largest document number. If so, the largest document number is set to the number of documents in new_lengths, and future calls to index a single document will fail (the alternative is that documents in the middle get lengths of 0).
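The effect on the largest document number can be sketched as below. This is an illustration under stated assumptions, not the JASS code: sketch_lengths is a hypothetical class, std::uint32_t stands in for compress_integer::integer, and copying the vector (rather than, say, swapping it) is a guess at the mechanism.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in that tracks only the document count and the length vector.
class sketch_lengths
	{
	private:
		std::uint32_t highest_document_id = 0;
		std::vector<std::uint32_t> document_length_vector;

	public:
		void begin_document()
			{
			highest_document_id++;
			}

		void end_document(std::uint32_t document_length)
			{
			document_length_vector.push_back(document_length);
			}

		// Replace the lengths and make the document count agree with the new vector.
		void set_document_length_vector(std::vector<std::uint32_t> &new_lengths)
			{
			document_length_vector = new_lengths;             // assumption: copy, leaving the caller's vector intact
			highest_document_id = static_cast<std::uint32_t>(document_length_vector.size());
			assert(highest_document_id == new_lengths.size());
			}

		std::uint32_t get_highest_document_id() const
			{
			return highest_document_id;
			}

		const std::vector<std::uint32_t> &get_document_length_vector() const
			{
			return document_length_vector;
			}
	};
```

In this sketch, an object that has indexed one document and is then given three lengths reports a highest document id of 3, which is the size mismatch the warning refers to.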
| virtual void JASS::index_manager::set_primary_keys (const std::vector< slice > &keys) | inline, virtual |
Add a list of primary keys to the current list. Normally used to set it without actually indexing (warning).
Normally this method would only be called when an index is being "pushed" into an object rather than being built one document at a time. This method appends to the end of the primary key list, which is assumed to be empty before the method is called but might not be if some indexing has already happened.
| keys | [in] The vector of primary keys. |
Reimplemented in JASS::index_manager_sequential.
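The append behaviour described for set_primary_keys() can be sketched as follows. The class sketch_primary_keys, its primary_key_list member and get_primary_keys() are hypothetical, and std::string stands in for slice; whether the base class stores keys at all is not stated here, so treat this as an illustration of the semantics only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in that keeps one primary key per document.
class sketch_primary_keys
	{
	private:
		std::vector<std::string> primary_key_list;

	public:
		// Document-at-a-time indexing adds one key per document.
		void begin_document(const std::string &primary_key)
			{
			primary_key_list.push_back(primary_key);
			}

		// Pushing keys in bulk appends to whatever is already in the list.
		void set_primary_keys(const std::vector<std::string> &keys)
			{
			const std::size_t before = primary_key_list.size();
			primary_key_list.insert(primary_key_list.end(), keys.begin(), keys.end());
			assert(primary_key_list.size() == before + keys.size());
			}

		const std::vector<std::string> &get_primary_keys() const
			{
			return primary_key_list;
			}
	};
```

If begin_document("a") has already been called, a later set_primary_keys({"b", "c"}) leaves the list as "a", "b", "c", which is why the method is flagged with a warning when the list is not empty.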
| virtual void JASS::index_manager::term (const parser::token &term) | inline, virtual |
Hand a new term from the token stream to this object.
| term | [in] The term from the token stream. |
Reimplemented in JASS::index_manager_sequential.
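| virtual void JASS::index_manager::term (const parser::token &term, const std::vector< posting > &postings_list) | inline, virtual |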
Hand a new term with a pre-computed postings list to this object.
| term | [in] The term from the token stream. |
| postings_list | [in] The pre-computed D1-encoded postings list |
Reimplemented in JASS::index_manager_sequential.
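This page does not define "D1-encoded". Assuming it means that each document id is stored as its difference from the previous id in the list (delta encoding with a gap of one), the idea can be sketched as below. The functions d1_encode() and d1_decode() are hypothetical helpers operating on bare document ids, not on the posting type used by term().

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Turn strictly increasing document ids into differences from the previous id.
std::vector<std::uint32_t> d1_encode(const std::vector<std::uint32_t> &document_ids)
	{
	std::vector<std::uint32_t> deltas;
	deltas.reserve(document_ids.size());

	std::uint32_t previous = 0;
	for (std::uint32_t id : document_ids)
		{
		assert(id > previous);                               // ids start at 1 and must strictly increase
		deltas.push_back(id - previous);
		previous = id;
		}

	return deltas;
	}

// Rebuild the document ids by keeping a running sum of the differences.
std::vector<std::uint32_t> d1_decode(const std::vector<std::uint32_t> &deltas)
	{
	std::vector<std::uint32_t> document_ids;
	document_ids.reserve(deltas.size());

	std::uint32_t running = 0;
	for (std::uint32_t delta : deltas)
		{
		running += delta;
		document_ids.push_back(running);
		}

	return document_ids;
	}
```

Under this assumption the ids 3, 7, 8, 20 would be stored as 3, 4, 1, 12, and decoding restores the original list.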
| virtual void JASS::index_manager::text_render (std::ostream &stream) const | inline, virtual |
unimplemented: Dump a human-readable version of the index down the stream.
| stream | [in] The stream to write to. |
Reimplemented in JASS::index_manager_sequential.