JASSv2
Extract documents from a WARC archive.
#include <instream_document_warc.h>


Public Member Functions | |
| instream_document_warc (std::shared_ptr< instream > &source) | |
| Constructor. | |
| virtual | ~instream_document_warc () |
| Destructor. | |
| virtual void | read (document &buffer) |
| Read the next document from the source instream into document. | |
Public Member Functions inherited from JASS::instream | |
| instream (std::shared_ptr< instream > &source, std::shared_ptr< allocator > &memory) | |
| Constructor. | |
| instream (std::shared_ptr< instream > &source) | |
| Constructor. | |
| instream (void) | |
| Constructor. | |
| virtual | ~instream () |
| Destructor. | |
| size_t | fetch (void *buffer, size_t bytes) |
| fetch() generates a document object, sets its contents to the passed buffer, calls read(), and returns the number of bytes read. | |
Static Public Member Functions | |
| static void | unittest (void) |
| Unit test this class. | |
Private Member Functions | |
| const char * | find_string (const std::string &string) |
| Read lines from the WARC file until one starting with string is found. | |
Private Attributes | |
| std::string | buffer |
| An internal buffer used to store lines. | |
Static Private Attributes | |
| static constexpr size_t | WARC_BUFFER_SIZE = 8 * 1024 |
| The size of the internal buffer used to read lines one at a time. | |
Additional Inherited Members | |
Protected Attributes inherited from JASS::instream | |
| std::shared_ptr< instream > | source |
| If this object is reading from another instream then this is that instream. | |
| std::shared_ptr< allocator > | memory |
| Any and all memory allocation must happen using this object. | |
Extract documents from a WARC archive.
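The inherited fetch() is described above as wrapping the caller's buffer in a document, calling read(), and returning the byte count. The sketch below illustrates that pattern only. Every type in it (`document_sketch`, `instream_sketch`, `instream_fixed_sketch`) is a hypothetical stand-in invented for this example, and the assumption that read() reports the bytes stored through the document's length field is ours, not something this page states about JASS.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical stand-in, not the JASS document class: a view onto caller-owned memory.
struct document_sketch
{
    void *contents = nullptr;   // where the document text is written
    size_t length = 0;          // on entry: capacity; on exit: bytes stored
};

// Hypothetical stand-in for an instream-like base class.
class instream_sketch
{
public:
    virtual ~instream_sketch() = default;

    // Subclasses fill into.contents and set into.length to the bytes stored.
    virtual void read(document_sketch &into) = 0;

    // Wrap the caller's buffer in a document, delegate to read(), report the byte count.
    size_t fetch(void *buffer, size_t bytes)
    {
        assert(buffer != nullptr);
        document_sketch wrapper;
        wrapper.contents = buffer;
        wrapper.length = bytes;
        read(wrapper);
        return wrapper.length;
    }
};

// A toy source that hands out a fixed string in pieces, then reports 0 bytes at end of input.
class instream_fixed_sketch : public instream_sketch
{
public:
    explicit instream_fixed_sketch(std::string text) : text(std::move(text)) {}

    void read(document_sketch &into) override
    {
        size_t count = text.size() < into.length ? text.size() : into.length;
        std::memcpy(into.contents, text.data(), count);
        text.erase(0, count);
        into.length = count;
    }

private:
    std::string text;
};
```

With this shape, a subclass only has to implement read(); callers that just want raw bytes can use fetch() without handling document objects themselves.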
| const char * find_string (const std::string &string) | private |
Read lines from the WARC file until one starting with string is found.
| string | [in] The string to look for (at the start of a line). |
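The line scan described for find_string() can be illustrated with standard streams. This is a minimal sketch, not the JASS implementation: `find_line_starting_with` is a hypothetical name, it reads from a `std::istream` rather than an instream, and returning a pointer to the text after the matched prefix (or `nullptr` at end of input) is an assumption, since this page does not say what the returned `const char *` points at.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for find_string(): reads lines from `in` into `line`
// until one begins with `prefix`. Returns a pointer to the text that follows
// the prefix inside `line`, or nullptr if the stream ends first.
const char *find_line_starting_with(std::istream &in, const std::string &prefix, std::string &line)
{
    // An empty prefix would match every line, which is never what a caller wants here.
    assert(!prefix.empty());

    while (std::getline(in, line))
        if (line.compare(0, prefix.size(), prefix) == 0)
            return line.c_str() + prefix.size();

    return nullptr;
}
```

The returned pointer refers into `line`, so it stays valid only until `line` is next modified. Keeping the line storage in a caller-visible string mirrors the idea of a reusable internal line buffer.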
| void read (document &buffer) | virtual |
Read the next document from the source instream into document.
| buffer | [out] The next document in the source instream. |
Implements JASS::instream.
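To make "read the next document" concrete, the sketch below pulls one record body at a time out of WARC-formatted text. It is a simplified illustration under stated assumptions, not the code behind read(): `read_warc_record_sketch` is a hypothetical name, the input is a `std::istream` rather than an instream, and the only header it interprets is the WARC `Content-Length` field, after which it expects a blank line and then the body.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>

// Simplified sketch: reads the next WARC record body from `in` into `body`.
// Returns false once no further complete record can be read.
bool read_warc_record_sketch(std::istream &in, std::string &body)
{
    const std::string header = "Content-Length:";
    std::string line;

    // Skip header lines until the record's length is announced.
    while (std::getline(in, line))
        if (line.compare(0, header.size(), header) == 0)
            break;
    if (!in)
        return false;

    // strtoul skips the whitespace between the colon and the number.
    size_t length = std::strtoul(line.c_str() + header.size(), nullptr, 10);

    // The header block ends at the first empty line (allowing for a trailing '\r').
    while (std::getline(in, line) && !line.empty() && line != "\r")
    {
        // Remaining headers are ignored in this sketch.
    }

    // Guard against an absurd length before allocating.
    assert(length <= body.max_size());
    body.resize(length);
    in.read(&body[0], static_cast<std::streamsize>(length));

    // A short read means the archive was truncated.
    return static_cast<size_t>(in.gcount()) == length;
}
```

Calling the function in a loop until it returns false visits each record in turn. A production reader would also validate the `WARC/` version line and handle malformed lengths, which this sketch does not attempt.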