JASSv2
JASS::document Class Reference

Container class representing a document as it passes through the indexing pipeline.

#include <document.h>

Public Member Functions

 document ()
 Constructor using an allocator local to this object (useful when the object needs to contain its own memory)
 
 document (class allocator &memory_source)
 Constructor using the specified allocator (useful when the object needs to allocate memory in a specific location)
 
bool isempty (void)
 Check to see whether this is an empty document.
 
void rewind (void)
 Free up all resources associated with this object and make it ready for re-use.
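The `isempty()` and `rewind()` pair supports the read loop in the example below, which reads every document into the same memory. The following is a minimal sketch of that contract, not the JASS implementation: `pool`, `toy_document`, and `load()` are hypothetical, "empty" is assumed to mean zero-length contents, and rewinding a pool is assumed to reset its fill pointer rather than return memory to the system.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>
#include <vector>

// HYPOTHETICAL pool: rewind() keeps the buffer but forgets what was in it.
class pool
{
public:
    explicit pool(std::size_t capacity) : buffer(capacity), used(0) {}

    // Hand out `size` bytes, or nullptr if the buffer is exhausted.
    char *allocate(std::size_t size)
    {
        if (used + size > buffer.size())
            return nullptr;
        char *result = buffer.data() + used;
        used += size;
        return result;
    }

    void rewind() { used = 0; }  // O(1): nothing is freed
    std::size_t bytes_used() const { return used; }

private:
    std::vector<char> buffer;
    std::size_t used;
};

// HYPOTHETICAL document holding a non-owning view of its contents.
struct toy_document
{
    pool memory{8192};
    std::string_view contents;

    bool isempty() const { return contents.empty(); }

    void rewind()
    {
        contents = std::string_view();  // drop the view first...
        memory.rewind();                // ...then recycle the bytes it pointed at
    }

    // Stand-in for an instream's read(): copy text into the pool.
    void load(std::string_view text)
    {
        char *where = memory.allocate(text.size());
        assert(where != nullptr);  // the sketch assumes the text fits
        std::memcpy(where, text.data(), text.size());
        contents = std::string_view(where, text.size());
    }
};
```

Because `rewind()` only resets a counter, reading many documents through one object costs no per-document frees; the price is that any view taken before the rewind points at bytes that will be overwritten.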
 

Static Public Member Functions

static void unittest (void)
 Unit test this class.
 

Public Attributes

allocator &primary_key_allocator
 If memory is needed for the primary key then allocate from here.
 
allocator &contenst_allocator
 If memory is needed for the document contents then allocate from here.
 
slice primary_key
 The external primary key (e.g. TREC DOCID, or filename) of the document (or empty if that is meaningless).
 
slice contents
 The contents of the document (or empty if that is meaningless).
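The page does not show how `slice` is laid out. A common design for such a field is a non-owning (address, length) view, which lets a primary key such as a TREC DOCID refer to bytes that already live in a buffer. The sketch below assumes that design; `toy_slice` and `find_trec_docno()` are hypothetical names, and JASS may instead copy the key into memory obtained from `primary_key_allocator`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// HYPOTHETICAL slice: a non-owning (address, length) view of bytes stored elsewhere.
struct toy_slice
{
    const char *address = nullptr;
    std::size_t length = 0;

    bool empty() const { return length == 0; }
    std::string str() const { return length == 0 ? std::string() : std::string(address, length); }
};

static bool is_blank(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

// Return a view of the trimmed text between <DOCNO> and </DOCNO>, or an
// empty slice when there is no key (the "empty if meaningless" case).
toy_slice find_trec_docno(const std::string &document)
{
    const std::string open = "<DOCNO>";
    const std::string close = "</DOCNO>";

    std::size_t start = document.find(open);
    if (start == std::string::npos)
        return {};
    start += open.size();

    std::size_t end = document.find(close, start);
    if (end == std::string::npos)
        return {};

    while (start < end && is_blank(document[start]))
        start++;
    while (end > start && is_blank(document[end - 1]))
        end--;

    return {document.data() + start, end - start};
}
```

A view like this is valid only while the bytes it refers to stay alive and unmodified; in the sketch, that means while the `document` string is in scope.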
 

Protected Attributes

allocator_pool default_allocator
 If a document is created without a specified allocator then use a pool allocator.
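The two constructors and the `default_allocator` member suggest a common C++ pattern: the object always owns a pool, and a reference member is bound either to that pool or to an allocator supplied by the caller. The sketch below shows the pattern in isolation under that assumption; `arena` and `toy_document` are hypothetical stand-ins, not the JASS classes, and the real allocator interfaces differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// HYPOTHETICAL bump allocator standing in for JASS's allocator classes.
class arena
{
public:
    explicit arena(std::size_t capacity) : buffer(capacity), used(0) {}

    // Hand out `size` bytes from the buffer, or nullptr if it is exhausted.
    void *allocate(std::size_t size)
    {
        if (used + size > buffer.size())
            return nullptr;
        void *result = buffer.data() + used;
        used += size;
        return result;
    }

    std::size_t bytes_used() const { return used; }

private:
    std::vector<unsigned char> buffer;
    std::size_t used;
};

// HYPOTHETICAL document showing the two-constructor pattern.
class toy_document
{
private:
    // Declared first so it is constructed before the reference that may refer to it.
    arena default_allocator;

public:
    arena &contents_allocator;

    // Memory comes from the pool owned by this object.
    toy_document() : default_allocator(8192), contents_allocator(default_allocator) {}

    // Memory comes from the caller's allocator; the local pool gets no capacity.
    explicit toy_document(arena &memory_source) : default_allocator(0), contents_allocator(memory_source) {}

    // A copy would bind its reference to the original's pool, so forbid copying.
    toy_document(const toy_document &) = delete;
    toy_document &operator=(const toy_document &) = delete;
};
```

Sharing one arena across many documents, as the second constructor allows, lets a caller place all document memory in one region and release it together.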
 

Static Private Attributes

static const size_t default_allocation_size = 8192
 The default size of the allocation unit within the document.
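A default allocation unit of 8192 bytes suggests a pool that obtains memory in blocks of that size and carves requests out of the current block. The page does not describe the actual policy of `allocator_pool`, so the following is a hypothetical sketch: `block_pool` is an invented name, requests larger than the unit are assumed to get a block of their own, and alignment is ignored for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// HYPOTHETICAL block pool: memory is obtained in chunks of at least `unit`
// bytes and handed out sequentially from the most recent chunk.
class block_pool
{
public:
    explicit block_pool(std::size_t unit = 8192) : block_size(unit) {}

    char *allocate(std::size_t size)
    {
        if (blocks.empty() || used + size > current_capacity)
        {
            // Start a new block; oversized requests get a block exactly their size.
            current_capacity = size > block_size ? size : block_size;
            blocks.push_back(std::make_unique<char[]>(current_capacity));
            used = 0;
        }
        char *result = blocks.back().get() + used;
        used += size;
        return result;
    }

    std::size_t block_count() const { return blocks.size(); }

private:
    std::size_t block_size;
    std::vector<std::unique_ptr<char[]>> blocks;
    std::size_t current_capacity = 0;
    std::size_t used = 0;
};
```

For example, requests of 5000, 3000, and 500 bytes need two 8192-byte blocks: the first two share a block (8000 bytes used), and the third does not fit in the 192 bytes that remain. This simple policy wastes that remainder in exchange for constant-time allocation.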
 

Detailed Description

Container class representing a document as it passes through the indexing pipeline.

An example tying together documents, instreams, and parsing to count the number of documents and the number of (non-unique) alphabetic tokens in them is:

/*
PARSER_USE.CPP
--------------
Copyright (c) 2016 Andrew Trotman
Released under the 2-clause BSD license (See:https://en.wikipedia.org/wiki/BSD_licenses)
*/
#include <cstdio>
#include <memory>

#include "parser.h"
#include "document.h"
#include "instream_file.h"
/*
ASSUMPTION: the TREC document reader used below (instream_document_trec) is not documented
on this page; check its header for the exact class name and constructor.
*/
#include "instream_document_trec.h"

/*
MAIN()
------
*/
int main(int argc, char *argv[])
{
    /*
    check parameters: the name of a TREC-formatted input file is required.
    */
    if (argc != 2)
    {
        printf("Usage:%s <infile.trec>\n", argv[0]);
        return 1;
    }

    /*
    allocate a document object and a parser object.
    */
    JASS::document document;
    JASS::parser parser;

    /*
    build a pipeline - recall that deletes cascade so file is deleted when source goes out of scope.
    */
    std::shared_ptr<JASS::instream> file(new JASS::instream_file(argv[1]));
    JASS::instream_document_trec source(file);

    /*
    this program counts documents and alphabetic tokens in those documents.
    */
    size_t total_documents = 0;
    size_t alphas = 0;

    /*
    read documents, then parse them.
    */
    do
    {
        /*
        read the next document into the same memory the last document used.
        */
        document.rewind();
        source.read(document);

        /*
        eof is signaled as an empty document.
        */
        if (document.isempty())
            break;

        /*
        count documents.
        */
        total_documents++;

        /*
        now parse the document.
        */
        parser.set_document(document);
        bool finished = false;
        do
        {
            /*
            get the next token.
            */
            const auto &token = parser.get_next_token();

            /*
            what type is that token?
            ASSUMPTION: the enumerator names (eof, alpha) are not listed on this page.
            */
            switch (token.type)
            {
                case JASS::parser::token::eof:
                    /*
                    At end of document so signal to leave the loop.
                    */
                    finished = true;
                    break;
                case JASS::parser::token::alpha:
                    /*
                    Count the number of alphabetic tokens.
                    */
                    alphas++;
                    break;
                default:
                    /*
                    else ignore the token.
                    */
                    break;
            }
        }
        while (!finished);
    }
    while (!document.isempty());

    /*
    Dump out the number of documents and the number of alphabetic tokens.
    */
    printf("Documents:%lld\n", (long long)total_documents);
    printf("alphas :%lld\n", (long long)alphas);
    return 0;
}
Examples:
parser_use.cpp.
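Readers without the JASS sources can try the same counting logic in isolation. The sketch below is a standalone analogue, not the JASS pipeline: `count_trec()` and `counts` are hypothetical, documents are assumed to be delimited by `<DOC>` and `</DOC>`, markup inside `<...>` is skipped, and an alphabetic token is assumed to be a maximal run of ASCII letters. The real `JASS::parser` may tokenise differently. The outer loop plays the role of the read-until-empty loop, and the inner loop plays the role of `get_next_token()`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// HYPOTHETICAL totals, mirroring total_documents and alphas in parser_use.cpp.
struct counts
{
    std::size_t documents = 0;
    std::size_t alphas = 0;
};

// Count documents and alphabetic tokens in TREC-style text.
counts count_trec(const std::string &text)
{
    counts result;
    const std::string open = "<DOC>";
    const std::string close = "</DOC>";
    std::size_t position = 0;

    for (;;)
    {
        std::size_t start = text.find(open, position);
        if (start == std::string::npos)
            break;  // no more documents: the "empty document" case
        start += open.size();

        std::size_t end = text.find(close, start);
        if (end == std::string::npos)
            end = text.size();  // tolerate a truncated final document
        result.documents++;

        bool in_tag = false;
        bool in_word = false;
        for (std::size_t i = start; i < end; i++)
        {
            unsigned char c = static_cast<unsigned char>(text[i]);
            if (c == '<')
                in_tag = true;
            else if (c == '>')
                in_tag = false;

            bool letter = !in_tag && std::isalpha(c) != 0;
            if (letter && !in_word)
                result.alphas++;  // first letter of a new alphabetic token
            in_word = letter;
        }
        position = end;
    }
    return result;
}
```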

Member Function Documentation

isempty()

inline bool JASS::document::isempty(void)

Check to see whether this is an empty document.

Returns
true if the document has no contents, otherwise false.
Examples:
parser_use.cpp.
