JASSv2
The ATIRE version of BM25.
#include <ranking_function_atire_bm25.h>
Public Member Functions | |
| ranking_function_atire_bm25 (double k1, double b, std::vector< compress_integer::integer > &document_lengths) | |
| Constructor. | |
| ~ranking_function_atire_bm25 () | |
| Destructor. | |
| forceinline void | compute_idf_component (compress_integer::integer document_frequency, compress_integer::integer documents_in_collection) |
| Called once per term (per query). Computes the IDF component of the ranking function and stores it internally. | |
| forceinline void | compute_tf_component (index_postings_impact::impact_type term_frequency) |
| Compute and internally store the term-frequency-based component of the ranking function (useful when postings lists are impact-ordered). | |
| forceinline double | compute_score (compress_integer::integer document_id, index_postings_impact::impact_type term_frequency) |
| Compute BM25 from the given document, assuming pieces have already been computed. | |
Static Public Member Functions | |
| static void | unittest (void) |
| Unit test this class. | |
Private Attributes | |
| double | idf |
| the IDF of the term being processed | |
| double | top_row |
| the top-row of the ranking function for the term being processed (tf(td) * (k1 + 1)) | |
| double | k1_plus_1 |
| k1 + 1 | |
| double | mean_document_length |
| the mean of the document lengths | |
| std::vector< float > | length_correction |
| most of the bottom row of BM25 (k1 * ((1 - b) + b * length / mean_document_length)), precomputed for each document in the collection | |
The ATIRE version of BM25.
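| ranking_function_atire_bm25::ranking_function_atire_bm25 (double k1, double b, std::vector< compress_integer::integer > &document_lengths) | inline |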
Constructor.
| k1 | [in] the BM25 k1 parameter, 0.9 is a good value. |
| b | [in] the BM25 b parameter, 0.4 is a good value. |
| document_lengths | [in] a vector holding the length of each document in the collection. |
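The constructor receives every document length up front, which suggests that the mean document length and the per-document `length_correction` values listed under Private Attributes can be computed once, before any query is scored. The following is a minimal sketch of that precomputation, not the JASSv2 source: the function name `sketch_length_correction` is hypothetical, and `std::uint32_t` stands in for `compress_integer::integer` as an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of JASSv2). For every document it precomputes
//     k1 * ((1 - b) + b * length / mean_document_length)
// which is the length-dependent part of the BM25 denominator.
// ASSUMPTION: std::uint32_t stands in for compress_integer::integer.
std::vector<float> sketch_length_correction(double k1, double b, const std::vector<std::uint32_t> &document_lengths)
{
    assert(!document_lengths.empty());

    // Mean document length over the whole collection (accumulated as double).
    double sum_of_lengths = std::accumulate(document_lengths.begin(), document_lengths.end(), 0.0);
    double mean_document_length = sum_of_lengths / static_cast<double>(document_lengths.size());

    // One correction value per document, indexed by document ID.
    std::vector<float> length_correction;
    length_correction.reserve(document_lengths.size());
    for (std::uint32_t length : document_lengths)
        length_correction.push_back(static_cast<float>(k1 * ((1.0 - b) + b * (length / mean_document_length))));

    return length_correction;
}
```

With k1 = 0.9, b = 0.4 and lengths 10, 20 and 30, the mean length is 20 and the three corrections come out at roughly 0.72, 0.90 and 1.08, so longer documents get a larger denominator. The sketch does not guard against a collection whose lengths are all zero.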
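| forceinline void ranking_function_atire_bm25::compute_idf_component (compress_integer::integer document_frequency, compress_integer::integer documents_in_collection) | inline |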
Called once per term (per query). Computes the IDF component of the ranking function and stores it internally.
| document_frequency | [in] the number of documents that contain this term. |
| documents_in_collection | [in] the number of documents in the collection. |
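| forceinline double ranking_function_atire_bm25::compute_score (compress_integer::integer document_id, index_postings_impact::impact_type term_frequency) | inline |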
Compute BM25 from the given document, assuming pieces have already been computed.
First compute the IDF for the term using compute_idf_component(), then the TF component using compute_tf_component(), then call this.
| document_id | [in] The ID of the document (used to look up the length) |
| term_frequency | [in] The number of times the term occurs in the document. |
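The call order described above (IDF once per term, then the TF component, then the score) can be illustrated with a small stand-in class. This is a sketch under stated assumptions rather than the JASSv2 implementation: the class name `bm25_sketch` is hypothetical, `std::uint32_t` replaces both `compress_integer::integer` and `index_postings_impact::impact_type`, and the IDF is assumed to be `log(N / n)`, the form usually attributed to ATIRE BM25. The page itself does not state the IDF formula.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the documented class (not the JASSv2 source).
class bm25_sketch
{
private:
    double idf = 0;                        // IDF of the term being processed
    double top_row = 0;                    // tf * (k1 + 1) for the term being processed
    double k1_plus_1;                      // k1 + 1
    std::vector<float> length_correction;  // k1 * ((1 - b) + b * length / mean), per document

public:
    bm25_sketch(double k1, double b, const std::vector<std::uint32_t> &document_lengths) :
        k1_plus_1(k1 + 1.0)
    {
        assert(!document_lengths.empty());

        double sum_of_lengths = 0;
        for (std::uint32_t length : document_lengths)
            sum_of_lengths += length;
        double mean_document_length = sum_of_lengths / static_cast<double>(document_lengths.size());

        length_correction.reserve(document_lengths.size());
        for (std::uint32_t length : document_lengths)
            length_correction.push_back(static_cast<float>(k1 * ((1.0 - b) + b * (length / mean_document_length))));
    }

    // Step 1: once per term (per query).
    // ASSUMPTION: idf = log(N / n); the documentation above does not give the formula.
    void compute_idf_component(std::uint32_t document_frequency, std::uint32_t documents_in_collection)
    {
        idf = std::log(static_cast<double>(documents_in_collection) / static_cast<double>(document_frequency));
    }

    // Step 2: once per distinct term frequency (e.g. once per impact-ordered segment).
    void compute_tf_component(std::uint32_t term_frequency)
    {
        top_row = term_frequency * k1_plus_1;
    }

    // Step 3: per document, combining the stored pieces.
    double compute_score(std::uint32_t document_id, std::uint32_t term_frequency) const
    {
        return idf * (top_row / (term_frequency + length_correction[document_id]));
    }
};
```

A caller would construct the object once per collection, call `compute_idf_component()` when it starts a new term, call `compute_tf_component()` whenever the term frequency changes, and then call `compute_score()` for each document in that run. Splitting the work this way keeps the per-document cost to one lookup, one addition, one division and one multiplication.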
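| forceinline void ranking_function_atire_bm25::compute_tf_component (index_postings_impact::impact_type term_frequency) | inline |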
Compute and internally store the term-frequency-based component of the ranking function (useful when postings lists are impact-ordered).
| term_frequency | [in] The number of times the term occurs in the document. |