mlpack
mlpack::metric::BLEU< ElemType, PrecisionType > Class Template Reference

BLEU, or the Bilingual Evaluation Understudy, is an algorithm for evaluating the quality of text that has been machine-translated from one natural language to another.

#include <bleu.hpp>

Public Member Functions

 BLEU (const size_t maxOrder=4)
 Create an instance of the BLEU class.
 
template<typename ReferenceCorpusType , typename TranslationCorpusType >
ElemType Evaluate (const ReferenceCorpusType &referenceCorpus, const TranslationCorpusType &translationCorpus, const bool smooth=false)
 Computes the BLEU score.
 
template<typename Archive >
void serialize (Archive &ar, const uint32_t version)
 Serialize the metric.
 
size_t MaxOrder () const
 Get the maximum n-gram order, i.e. the maximum number of tokens in an n-gram.
 
size_t & MaxOrder ()
 Modify the maximum n-gram order, i.e. the maximum number of tokens in an n-gram.
 
ElemType BLEUScore () const
 Get the BLEU score.
 
ElemType BrevityPenalty () const
 Get the brevity penalty.
 
size_t TranslationLength () const
 Get the translation length.
 
size_t ReferenceLength () const
 Get the reference length.
 
ElemType Ratio () const
 Get the ratio of translation length to reference length.
 
PrecisionType const & Precisions () const
 Get the precisions, one per n-gram order.
 

Detailed Description

template<typename ElemType = float, typename PrecisionType = std::vector<ElemType>>
class mlpack::metric::BLEU< ElemType, PrecisionType >

BLEU, or the Bilingual Evaluation Understudy, is an algorithm for evaluating the quality of text that has been machine-translated from one natural language to another.

It can also be used to evaluate text generated for a suite of natural language processing tasks.

The BLEU score is calculated using the following formula:

\begin{eqnarray*} \text{B} &=& bp \cdot \exp \left(\sum_{n=1}^{N} w \log p_n \right) \\ \text{where,} \\ bp &=& \text{brevity penalty} = \begin{cases} 1 & \text{if ratio} > 1 \\ \exp \left(1-\frac{1}{ratio}\right) & \text{otherwise} \end{cases} \\ p_n &=& \text{modified precision for n-gram,} \\ w &=& \frac {1}{maxOrder}, \\ ratio &=& \text{translation to reference length ratio,} \\ maxOrder &=& \text{maximum length of tokens in n-grams.} \end{eqnarray*}

The BLEU score lies between 0 and 1.
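
The formula above can be sketched in a few lines of standalone C++. This is an illustration only, not mlpack's implementation: the function names BrevityPenaltySketch and BleuFromPrecisionsSketch are hypothetical, the precisions are assumed to be already computed, and returning 0 when any precision is zero is an assumption made here to avoid taking log(0).

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Brevity penalty as written in the formula above:
// 1 if ratio > 1, exp(1 - 1 / ratio) otherwise.
// (Hypothetical helper, not part of mlpack.)
double BrevityPenaltySketch(const double ratio)
{
  assert(ratio > 0.0);
  return (ratio > 1.0) ? 1.0 : std::exp(1.0 - 1.0 / ratio);
}

// Combine the modified precisions p_1..p_N with uniform weights w = 1 / N
// and scale the result by the brevity penalty.
// (Hypothetical helper, not part of mlpack.)
double BleuFromPrecisionsSketch(const std::vector<double>& precisions,
                                const double ratio)
{
  assert(!precisions.empty());
  double logSum = 0.0;
  for (const double p : precisions)
  {
    // Assumption of this sketch: a zero precision yields a score of 0,
    // which avoids evaluating log(0).
    if (p <= 0.0)
      return 0.0;
    logSum += std::log(p);
  }
  const double w = 1.0 / static_cast<double>(precisions.size());
  return BrevityPenaltySketch(ratio) * std::exp(w * logSum);
}
```

With all precisions equal to 1 and a ratio of 1, the sketch returns 1, the upper bound of the score. A translation half as long as its reference (ratio 0.5) is scaled by exp(-1), roughly 0.37.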

Template Parameters
ElemType: Type of the quantities in BLEU, e.g. long double, double, or float.
PrecisionType: Container type for the precisions, one per n-gram order, e.g. std::vector<float>, std::vector<double>, or any similar boost or armadillo container.

Constructor & Destructor Documentation

◆ BLEU()

template<typename ElemType , typename PrecisionType >
mlpack::metric::BLEU< ElemType, PrecisionType >::BLEU ( const size_t  maxOrder = 4)

Create an instance of the BLEU class.

Parameters
maxOrder: The maximum length of tokens in n-grams.

Member Function Documentation

◆ Evaluate()

template<typename ElemType , typename PrecisionType >
template<typename ReferenceCorpusType , typename TranslationCorpusType >
ElemType mlpack::metric::BLEU< ElemType, PrecisionType >::Evaluate ( const ReferenceCorpusType &  referenceCorpus,
const TranslationCorpusType &  translationCorpus,
const bool  smooth = false 
)

Computes the BLEU score.

Template Parameters
ReferenceCorpusType: Type of reference corpus.
TranslationCorpusType: Type of translation corpus.
Parameters
referenceCorpus: An array of references (documents), \( referenceCorpus = \{reference_1, reference_2, \ldots \} \). Each reference is an array of paragraphs, \( reference_i = \{paragraph_1, paragraph_2, \ldots \} \), and each paragraph is an array of tokenized words (strings), \( paragraph_i = \{word_1, word_2, \ldots \} \). For example:
refCorpus = {{{"this", "is", "paragraph", "1", "from", "document", "1"},
{"this", "is", "paragraph", "2", "from", "document", "1"}},
{{"this", "is", "paragraph", "1", "from", "document", "2"},
{"this", "is", "paragraph", "2", "from", "document", "2"}}}
translationCorpus: An array of paragraphs that have been machine-translated or generated for a natural language processing task, \( translationCorpus = \{paragraph_1, paragraph_2, \ldots \} \). Each paragraph is an array of words; the i-th paragraph of the corpus is \( paragraph_i = \{word_1, word_2, \ldots \} \). For example:
transCorpus = {{"this", "is", "generated", "paragraph", "1"},
{"this", "is", "generated", "paragraph", "2"}}
smooth: Whether or not to apply Lin et al. (2004) smoothing.
Returns
The BLEU score. Evaluate() also computes the other BLEU quantities (brevity penalty, translation length, reference length, ratio, and precisions), which can be read through their corresponding accessor methods.
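
To make the "modified precision for n-gram" term more concrete, the following standalone C++ sketch computes it for a single translated paragraph against a set of reference paragraphs, using the standard clipped-count definition of BLEU. It is not mlpack's code: CountNGrams, ModifiedPrecisionSketch, and the Paragraph alias are hypothetical names, and the add-one form of smoothing shown for the smooth flag is an assumption about what Lin et al. (2004) smoothing amounts to here. How Evaluate() pairs references with translations is not covered by this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One tokenized paragraph, as in the corpus examples above.
using Paragraph = std::vector<std::string>;
using NGramCounts = std::map<std::vector<std::string>, std::size_t>;

// Count every n-gram of exactly `order` tokens in one paragraph.
// (Hypothetical helper, not part of mlpack.)
NGramCounts CountNGrams(const Paragraph& tokens, const std::size_t order)
{
  assert(order > 0);
  NGramCounts counts;
  for (std::size_t i = 0; i + order <= tokens.size(); ++i)
  {
    const auto first = tokens.begin() + static_cast<std::ptrdiff_t>(i);
    const auto last = first + static_cast<std::ptrdiff_t>(order);
    ++counts[std::vector<std::string>(first, last)];
  }
  return counts;
}

// Modified precision for one n-gram order: each translation n-gram count is
// clipped by the largest count of that n-gram in any single reference.
// (Hypothetical helper, not part of mlpack.)
double ModifiedPrecisionSketch(const std::vector<Paragraph>& references,
                               const Paragraph& translation,
                               const std::size_t order,
                               const bool smooth = false)
{
  // Largest count of each n-gram over all references.
  NGramCounts maxRefCounts;
  for (const Paragraph& ref : references)
  {
    for (const auto& entry : CountNGrams(ref, order))
    {
      std::size_t& best = maxRefCounts[entry.first];
      best = std::max(best, entry.second);
    }
  }

  std::size_t matches = 0;   // Clipped matches.
  std::size_t possible = 0;  // All n-grams of this order in the translation.
  for (const auto& entry : CountNGrams(translation, order))
  {
    possible += entry.second;
    const auto it = maxRefCounts.find(entry.first);
    if (it != maxRefCounts.end())
      matches += std::min(entry.second, it->second);
  }

  // Assumed smoothing: add one to both numerator and denominator.
  if (smooth)
    return (matches + 1.0) / (possible + 1.0);
  return possible == 0 ? 0.0 : static_cast<double>(matches) / possible;
}
```

For the translation "the the the the the the the" against the references "the cat is on the mat" and "there is a cat on the mat", the unigram count of "the" is clipped to 2 (its largest count in a single reference), so the unsmoothed unigram precision is 2/7 rather than 7/7.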
