JASSv2
Reader for Jimmy Lin's shared index format.


Classes
| class | JASS::ciff_lin | Reader for Jimmy Lin's shared index format. |
| class | JASS::ciff_lin::header | The header of the CIFF file. It appears first in the file and describes how many postings and document details are included. |
| class | JASS::ciff_lin::postings_list | A postings list with a term, df, cf, and a list of <d,tf> pairs. |
| class | JASS::ciff_lin::doc_record | A document record object containing document lengths and primary keys. |
| class | JASS::ciff_lin::postings_list_iterator | Iterator class for iterating over the postings lists of an index. |
| class | JASS::ciff_lin::postings_foreach | An object used to allow iteration over postings lists. |
| class | JASS::ciff_lin::docrecords_iterator | Iterator class for iterating over the document records of an index. |
| class | JASS::ciff_lin::docrecords_foreach | An object used to allow iteration over document records. |
Jimmy uses Anserini to index and then exports using Google protocol buffers. The protocol buffer format is specified by:
Each PostingsList is written using writeDelimitedTo(), so each postings list is prefixed by its length in bytes, encoded as a varint.
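As an illustration of the length prefix, the following is a minimal sketch of decoding one base-128 varint from a memory buffer. The function name decode_varint and its signature are assumptions made for this example and are not part of the JASS API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of JASS): decode one base-128 varint from `data`.
// On success, stores the decoded number in `value` and returns the bytes consumed.
// Returns 0 if the buffer ends before the varint terminates or it exceeds 10 bytes.
size_t decode_varint(const uint8_t *data, size_t length, uint64_t &value)
{
    assert(data != nullptr || length == 0);

    uint64_t result = 0;
    for (size_t index = 0; index < length && index < 10; index++)
    {
        // The low 7 bits of each byte carry payload, least significant group first.
        result |= static_cast<uint64_t>(data[index] & 0x7F) << (7 * index);

        // A clear high bit marks the final byte of the varint.
        if ((data[index] & 0x80) == 0)
        {
            value = result;
            return index + 1;
        }
    }
    return 0;
}
```

For example, the two bytes 0xAC 0x02 decode to 300: the first byte contributes 0x2C (44) and the second contributes 2 << 7 (256). The serialised PostingsList message would then occupy the next 300 bytes of the buffer.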
This code provides an iterator over a file in this format once the file has been read into memory.
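The general idea of iterating over length-delimited records held in memory can be sketched as below. This is not the JASS implementation: the names record_view and delimited_records are hypothetical, and the sketch yields only the raw bytes of each record, leaving the protocol buffer parsing of each message to the caller.

```cpp
#include <cstddef>
#include <cstdint>

// A non-owning view of the serialised bytes of one record (hypothetical type).
struct record_view
{
    const uint8_t *data;
    size_t length;
};

// Hypothetical range over a buffer of varint-length-prefixed records.
class delimited_records
{
public:
    class iterator
    {
    public:
        iterator(const uint8_t *start, const uint8_t *finish) : at(start), end(finish)
        {
            read();
        }

        const record_view &operator*() const { return current; }

        iterator &operator++()
        {
            at = current.data + current.length;   // step over the record body
            read();
            return *this;
        }

        bool operator!=(const iterator &other) const { return at != other.at; }

    private:
        // Decode the varint length prefix at `at` and point `current` at the body.
        // A truncated or malformed record ends the iteration.
        void read()
        {
            uint64_t size = 0;
            int shift = 0;
            const uint8_t *from = at;
            while (from < end && shift < 64)
            {
                uint8_t byte = *from++;
                size |= static_cast<uint64_t>(byte & 0x7F) << shift;
                shift += 7;
                if ((byte & 0x80) == 0)
                {
                    if (size <= static_cast<uint64_t>(end - from))
                    {
                        current = record_view{from, static_cast<size_t>(size)};
                        return;
                    }
                    break;   // the declared length runs past the end of the buffer
                }
            }
            at = end;
            current = record_view{end, 0};
        }

        const uint8_t *at;
        const uint8_t *end;
        record_view current;
    };

    delimited_records(const uint8_t *buffer, size_t length) : first(buffer), last(buffer + length) {}

    iterator begin() const { return iterator(first, last); }
    iterator end() const { return iterator(last, last); }

private:
    const uint8_t *first;
    const uint8_t *last;
};
```

With begin() and end() defined, a range-based for loop such as `for (const auto &record : delimited_records(buffer, length))` visits each record in file order. Stopping at the first malformed prefix is a design choice of this sketch; a production reader might prefer to report the error.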
For details of the encoding see: https://developers.google.com/protocol-buffers/docs/encoding