kmer_length(kmer_length),
end_of_fasta_document(nullptr)
if (got.type == token::token_type::eof)
const class parser::token & get_next_token_dna(void)
Continue parsing the input looking for the next DNA k-mer token.
Definition: parser_fasta.cpp:24
token current_token
The token that is currently being built. A reference to this is returned when the token is complete...
Definition: parser.h:127
virtual ~parser_fasta()
Destructor.
Definition: parser_fasta.h:89
Simple, but fast, XML parser.
Definition: parser.h:39
void * address(void) const
Extract the pointer value from the slice.
Definition: slice.h:269
const uint8_t * end_of_document
Pointer to the end of the document, used to avoid reading past the end of the buffer.
Definition: parser.h:126
Container class representing a document through the indexing pipeline.
Definition: document.h:31
parser_mode mode
The mode (TEXT or DNA) of the tokenizer.
Definition: parser_fasta.h:48
Parser to turn DNA sequences in FASTA format into k-mers for indexing.
Definition: parser_fasta.h:34
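To illustrate what turning a DNA sequence into k-mers means, here is a minimal sketch. It assumes a window of kmer_length bases that advances one base at a time; the listing above does not state the stride, and extract_kmers is a hypothetical helper, not a member of parser_fasta.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of parser_fasta): return every overlapping
// k-mer of `sequence`, in order of appearance.
// Assumption: the window slides forward by one base at each step.
std::vector<std::string> extract_kmers(const std::string &sequence, std::size_t kmer_length)
{
    assert(kmer_length > 0);            // a zero-length k-mer is meaningless

    std::vector<std::string> kmers;

    // Stop as soon as fewer than kmer_length bases remain.
    for (std::size_t start = 0; start + kmer_length <= sequence.size(); start++)
        kmers.push_back(sequence.substr(start, kmer_length));

    return kmers;
}
```

Under these assumptions a sequence of n bases yields n - kmer_length + 1 k-mers, and a sequence shorter than kmer_length yields none.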
The character is a DNA base (i.e. one of {ACTGactg}).
Definition: ascii_database_to_c.cpp:35
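The membership test described above can be sketched as a plain comparison against the eight accepted characters. This is an illustration only: is_dna_base is a hypothetical name, and the library may implement the check differently (for example, with a generated lookup table).

```cpp
#include <cstdint>

// Hypothetical helper: true only for the characters in {ACTGactg}.
// Any other byte, including ambiguity codes such as 'N', is rejected.
constexpr bool is_dna_base(uint8_t character)
{
    return character == 'A' || character == 'C' || character == 'T' || character == 'G'
        || character == 'a' || character == 'c' || character == 't' || character == 'g';
}
```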
Simple XML parser that doesn't handle either attributes or entities.
slice contents
The contents of the document (or likewise).
Definition: document.h:43
virtual const class parser::token & get_next_token(void)
Continue parsing the input looking for the next token.
Definition: parser_fasta.h:127
A document within the indexing pipeline.
virtual const class parser::token & get_next_token(void)
Continue parsing the input looking for the next token.
Definition: parser.cpp:79
uint8_t * end_of_fasta_document
Pointer to the end of the FASTA document; end_of_document points to the end of the first line (the pr...
Definition: parser_fasta.h:49
parser_fasta(size_t kmer_length)
Constructor.
Definition: parser_fasta.h:70
static void unittest(void)
Unit test this class.
Definition: parser_fasta.cpp:92
token_type type
The type of this token (see token_type).
Definition: parser.h:86
size_t kmer_length
The length of the k-mers to compute from the DNA sequences.
Definition: parser_fasta.h:47
const uint8_t * current
The current location within the document.
Definition: parser.h:125
const document * the_document
The document that is currently being parsed.
Definition: parser.h:124
A token as returned by the parser.
Definition: parser.h:58
parser_mode
Definition: parser_fasta.h:40
Slices (also known as string-descriptors) for C++.
virtual void set_document(const class document &document)
Start parsing from the start of this document.
Definition: parser_fasta.h:106
Definition: compress_integer_elias_delta_simd.c:23
alphabetic token
Definition: parser.h:67
size_t size(void) const
Return the length of this slice.
Definition: slice.h:256
index_postings_impact::impact_type count
The number of times the token is seen (normally 1, but if parsing a forward index it might be known t...
Definition: parser.h:87