mlpack
The class translates a set of strings into numbers using various encoding algorithms.
#include <string_encoding.hpp>
Public Member Functions | |
template<typename ... ArgTypes> | |
StringEncoding (ArgTypes &&... args) | |
Pass the given arguments to the policy constructor and create the StringEncoding object using the policy. | |
StringEncoding (EncodingPolicyType encodingPolicy) | |
Construct the class from the given encoding policy. | |
StringEncoding (StringEncoding &) | |
A variant of the copy constructor for non-constant objects. | |
StringEncoding (const StringEncoding &) | |
Default copy-constructor. | |
StringEncoding & | operator= (const StringEncoding &)=default |
Default copy assignment operator. | |
StringEncoding (StringEncoding &&) | |
Default move-constructor. | |
StringEncoding & | operator= (StringEncoding &&)=default |
Default move assignment operator. | |
template<typename TokenizerType > | |
void | CreateMap (const std::string &input, const TokenizerType &tokenizer) |
Initialize the dictionary using the given corpus. | |
void | Clear () |
Clear the dictionary. | |
template<typename OutputType , typename TokenizerType > | |
void | Encode (const std::vector< std::string > &input, OutputType &output, const TokenizerType &tokenizer) |
Encode the given text and write the result to the given output. | |
const DictionaryType & | Dictionary () const |
Return the dictionary. | |
DictionaryType & | Dictionary () |
Modify the dictionary. | |
const EncodingPolicyType & | EncodingPolicy () const |
Return the encoding policy object. | |
EncodingPolicyType & | EncodingPolicy () |
Modify the encoding policy object. | |
template<typename Archive > | |
void | serialize (Archive &ar, const uint32_t) |
Serialize the class to the given archive. | |
template<typename MatType , typename TokenizerType , typename PolicyType > | |
void | EncodeHelper (const std::vector< std::string > &input, MatType &output, const TokenizerType &tokenizer, PolicyType &policy) |
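The member list pairs a variadic constructor that forwards its arguments to the policy with a separate `StringEncoding (StringEncoding &)` overload. A likely reason is overload resolution: for a non-const lvalue, a forwarding parameter `ArgTypes&&` deduces `T&`, which is a better match than `const T&`, so without the extra overload a plain copy would be routed to the policy constructor. The sketch below reproduces the pattern with a hypothetical `MiniEncoding` class and `CountingPolicy`; neither is an mlpack type.

```cpp
#include <cassert>
#include <utility>

// Hypothetical policy (not an mlpack type): stores one configuration value.
struct CountingPolicy
{
  CountingPolicy() = default;
  explicit CountingPolicy(int start) : start(start) { }
  int start = 0;
};

// Hypothetical stand-in for a policy-based encoder; not mlpack's class.
template<typename PolicyType>
class MiniEncoding
{
 public:
  // Forward arbitrary arguments to the policy constructor.
  template<typename... ArgTypes>
  MiniEncoding(ArgTypes&&... args) :
      policy(std::forward<ArgTypes>(args)...) { }

  // Construct from an existing policy object.
  MiniEncoding(PolicyType p) : policy(std::move(p)) { }

  // Without this overload, copying a non-const lvalue would select the
  // forwarding template (ArgTypes = MiniEncoding&) and try to build the
  // policy from a MiniEncoding, which does not compile.
  MiniEncoding(MiniEncoding& other) :
      MiniEncoding(static_cast<const MiniEncoding&>(other)) { }

  MiniEncoding(const MiniEncoding&) = default;
  MiniEncoding(MiniEncoding&&) = default;
  MiniEncoding& operator=(const MiniEncoding&) = default;
  MiniEncoding& operator=(MiniEncoding&&) = default;

  const PolicyType& Policy() const { return policy; }

 private:
  PolicyType policy;
};
```

Copying a const object or moving an rvalue uses the defaulted constructors as usual; only copies of non-const lvalues need the extra overload.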
The class translates a set of strings into numbers using various encoding algorithms.
The encoder writes data in either column-major or row-major order, depending on the output data type.
Template Parameters
EncodingPolicyType | Type of the encoding algorithm itself. |
DictionaryType | Type of the dictionary. |
mlpack::data::StringEncoding< EncodingPolicyType, DictionaryType >::StringEncoding | ( | EncodingPolicyType | encodingPolicy | ) |
Construct the class from the given encoding policy.
Parameters
encodingPolicy | The given encoding policy. |
void mlpack::data::StringEncoding< EncodingPolicyType, DictionaryType >::CreateMap | ( | const std::string & | input, |
const TokenizerType & | tokenizer | ||
) |
Initialize the dictionary using the given corpus.
Template Parameters
TokenizerType | Type of the tokenizer. |
Parameters
input | Corpus of text used to build the dictionary. |
tokenizer | The tokenizer object. |
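CreateMap builds the dictionary from a single corpus string. How labels are assigned depends on the encoding policy and dictionary type; the sketch below assumes a dictionary-style policy that gives each previously unseen token the next consecutive label starting at 1, and it splits on whitespace instead of taking a tokenizer object. `Dictionary` and `BuildDictionary` are hypothetical names, not mlpack API.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical dictionary: maps each distinct token to a positive label.
// Label 0 is left unused here (an assumption; e.g. it could mark padding).
using Dictionary = std::unordered_map<std::string, std::size_t>;

// Sketch of dictionary construction: split the corpus on whitespace and
// give every previously unseen token the next consecutive label.
Dictionary BuildDictionary(const std::string& corpus)
{
  Dictionary dict;
  std::istringstream stream(corpus);
  std::string token;
  while (stream >> token)
  {
    if (dict.find(token) == dict.end())
    {
      const std::size_t label = dict.size() + 1;
      dict.emplace(token, label);
    }
  }
  return dict;
}
```

Repeated tokens keep the label from their first occurrence, so the dictionary size equals the number of distinct tokens in the corpus.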
The tokenization algorithm has to be an object with two public methods:
void mlpack::data::StringEncoding< EncodingPolicyType, DictionaryType >::Encode | ( | const std::vector< std::string > & | input, |
OutputType & | output, | ||
const TokenizerType & | tokenizer | ||
) |
Encode the given text and write the result to the given output.
The encoder writes data in column-major or row-major order, depending on the output data type.
If the output type is arma::mat or arma::sp_mat, the function writes the result in column-major order. If the output type is a 2D std::vector, the function writes the result in row-major order.
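The difference between the two layouts can be sketched without Armadillo. Below, the hypothetical `ColMajorMatrix` stands in for arma::mat and stores element (r, c) at index r + c * nRows, so each input string occupies one column; the nested std::vector result holds one row per string. The functions `EncodeColumnMajor` and `EncodeRowMajor` are hypothetical, and zero-padding shorter columns while leaving rows ragged are assumptions of this sketch, not documented behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Dict = std::unordered_map<std::string, std::size_t>;
using Docs = std::vector<std::vector<std::string>>;  // tokens per string

// Hypothetical stand-in for arma::mat: element (r, c) is data[r + c * nRows].
struct ColMajorMatrix
{
  std::size_t nRows = 0;
  std::size_t nCols = 0;
  std::vector<double> data;
  double at(std::size_t r, std::size_t c) const { return data[r + c * nRows]; }
};

// Column-major sketch: one column per input string; shorter columns are
// zero-padded (an assumption of this sketch).
ColMajorMatrix EncodeColumnMajor(const Docs& docs, const Dict& dict)
{
  std::size_t maxLen = 0;
  for (const auto& doc : docs)
    maxLen = std::max(maxLen, doc.size());

  ColMajorMatrix m;
  m.nRows = maxLen;
  m.nCols = docs.size();
  m.data.assign(m.nRows * m.nCols, 0.0);
  for (std::size_t c = 0; c < docs.size(); ++c)
    for (std::size_t r = 0; r < docs[c].size(); ++r)
      m.data[r + c * m.nRows] = static_cast<double>(dict.at(docs[c][r]));
  return m;
}

// Row-major sketch: one inner vector per input string, each keeping its
// own length (also an assumption of this sketch).
std::vector<std::vector<std::size_t>> EncodeRowMajor(const Docs& docs,
                                                     const Dict& dict)
{
  std::vector<std::vector<std::size_t>> rows;
  rows.reserve(docs.size());
  for (const auto& doc : docs)
  {
    std::vector<std::size_t> row;
    for (const std::string& token : doc)
      row.push_back(dict.at(token));
    rows.push_back(std::move(row));
  }
  return rows;
}
```

With the matrix layout, walking down one column visits the tokens of one string contiguously in memory, which matches Armadillo's column-major storage.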
Template Parameters
OutputType | Type of the output container. The function supports the following types: arma::mat, arma::sp_mat, std::vector<std::vector<>>. |
TokenizerType | Type of the tokenizer. |
Parameters
input | Corpus of text to encode. |
output | Output container to store the result. |
tokenizer | The tokenizer object. |
The tokenization algorithm has to be an object with two public methods:
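As an illustration only, the sketch below assumes a tokenizer with two public methods: `operator()`, which takes a `std::string_view&`, returns the next token and removes the consumed prefix from the view, and a static `IsTokenEmpty()`, which reports that no token remains. These names and signatures are assumptions made for the example and should be checked against the mlpack source; `WhitespaceTokenizer` and `Tokenize` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical whitespace tokenizer (not an mlpack class). Its two methods
// follow the interface assumed in the lead-in above.
class WhitespaceTokenizer
{
 public:
  // Returns the next token and removes it (plus leading spaces) from `str`.
  std::string_view operator()(std::string_view& str) const
  {
    const std::size_t start = str.find_first_not_of(' ');
    if (start == std::string_view::npos)
    {
      str = std::string_view();
      return std::string_view();
    }
    str.remove_prefix(start);
    const std::size_t end = str.find(' ');
    const std::string_view token = str.substr(0, end);
    str.remove_prefix(token.size());
    return token;
  }

  // An empty token signals that the input is exhausted.
  static bool IsTokenEmpty(std::string_view token) { return token.empty(); }
};

// Hypothetical driver showing how an encoder could consume such a tokenizer.
template<typename TokenizerType>
std::vector<std::string> Tokenize(std::string_view text,
                                  const TokenizerType& tokenizer)
{
  std::vector<std::string> tokens;
  for (std::string_view token = tokenizer(text);
       !TokenizerType::IsTokenEmpty(token);
       token = tokenizer(text))
    tokens.emplace_back(token);
  return tokens;
}
```

Because the tokenizer returns views into the caller's buffer, the original string must outlive the tokens until they are copied or looked up in the dictionary.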