mlpack
The class translates a set of strings into numbers using various encoding algorithms.
#include <string_encoding.hpp>
Public Member Functions | |
| template<typename ... ArgTypes> | |
| StringEncoding (ArgTypes &&... args) | |
| Pass the given arguments to the policy constructor and create the StringEncoding object using the policy. | |
| StringEncoding (EncodingPolicyType encodingPolicy) | |
| Construct the class from the given encoding policy. | |
| StringEncoding (StringEncoding &) | |
| A variant of the copy constructor for non-constant objects. | |
| StringEncoding (const StringEncoding &) | |
| Default copy-constructor. | |
| StringEncoding & | operator= (const StringEncoding &)=default |
| Default copy assignment operator. | |
| StringEncoding (StringEncoding &&) | |
| Default move-constructor. | |
| StringEncoding & | operator= (StringEncoding &&)=default |
| Default move assignment operator. | |
| template<typename TokenizerType > | |
| void | CreateMap (const std::string &input, const TokenizerType &tokenizer) |
| Initialize the dictionary using the given corpus. | |
| void | Clear () |
| Clear the dictionary. | |
| template<typename OutputType , typename TokenizerType > | |
| void | Encode (const std::vector< std::string > &input, OutputType &output, const TokenizerType &tokenizer) |
| Encode the given text and write the result to the given output. | |
| const DictionaryType & | Dictionary () const |
| Return the dictionary. | |
| DictionaryType & | Dictionary () |
| Modify the dictionary. | |
| const EncodingPolicyType & | EncodingPolicy () const |
| Return the encoding policy object. | |
| EncodingPolicyType & | EncodingPolicy () |
| Modify the encoding policy object. | |
| template<typename Archive > | |
| void | serialize (Archive &ar, const uint32_t) |
| Serialize the class to the given archive. | |
| template<typename MatType , typename TokenizerType , typename PolicyType > | |
| void | EncodeHelper (const std::vector< std::string > &input, MatType &output, const TokenizerType &tokenizer, PolicyType &policy) |
The class translates a set of strings into numbers using various encoding algorithms.
The encoder writes its output in either column-major or row-major order, depending on the output data type.
| EncodingPolicyType | Type of the encoding algorithm itself. |
| DictionaryType | Type of the dictionary. |
mlpack::data::StringEncoding< EncodingPolicyType, DictionaryType >::StringEncoding (EncodingPolicyType encodingPolicy)
Construct the class from the given encoding policy.
| encodingPolicy | The given encoding policy. |
void mlpack::data::StringEncoding< EncodingPolicyType, DictionaryType >::CreateMap (const std::string &input, const TokenizerType &tokenizer)
Initialize the dictionary using the given corpus.
| TokenizerType | Type of the tokenizer. |
| input | Corpus of text to encode. |
| tokenizer | The tokenizer object. |
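The tokenizer must be an object with two public methods: operator(), which takes a reference to a string view, extracts the next token, removes the consumed prefix from the view, and returns the token; and IsTokenEmpty(), which takes a token and returns true if the token is empty.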
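A type meeting this requirement could look like the sketch below. It is a standalone illustration that uses only the standard library. `SpaceTokenizer` is a hypothetical name and is not part of mlpack. The sketch assumes the two required methods are a call operator that consumes the next token from a `std::string_view` and an `IsTokenEmpty()` predicate.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical tokenizer (not mlpack code) that splits on single spaces.
class SpaceTokenizer
{
 public:
  // Extract the next token from `str` and drop it from the front of the view.
  // Returns an empty view once no tokens remain.
  std::string_view operator()(std::string_view& str) const
  {
    // Skip any leading spaces.
    const std::size_t start = str.find_first_not_of(' ');
    if (start == std::string_view::npos)
    {
      str = std::string_view();
      return std::string_view();
    }
    str.remove_prefix(start);

    // The token runs up to the next space, or to the end of the view.
    const std::size_t end = str.find(' ');
    const std::string_view token = str.substr(0, end);
    str.remove_prefix(end == std::string_view::npos ? str.size() : end);
    return token;
  }

  // An empty view signals that the input is exhausted.
  static bool IsTokenEmpty(const std::string_view token)
  {
    return token.empty();
  }
};
```

Calling the object repeatedly on the same view yields one token per call. The caller stops as soon as `IsTokenEmpty()` returns true for the returned token.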
void mlpack::data::StringEncoding< EncodingPolicyType, DictionaryType >::Encode (const std::vector< std::string > &input, OutputType &output, const TokenizerType &tokenizer)
Encode the given text and write the result to the given output.
The storage order depends on the output type. If the output type is arma::mat or arma::sp_mat, the function writes the result in column-major order. If the output type is a 2D std::vector, the function writes the result in row-major order.
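To make the two layouts concrete, the following standalone sketch converts a row-major 2D vector into a flat column-major buffer. It uses only the standard library and does not call mlpack or Armadillo. `ColumnMajor` and `ToColumnMajor` are hypothetical names. The sketch assumes that each input string becomes one row in the 2D vector and one column in the matrix-like buffer, with shorter sequences padded with zeros.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Row-major layout: one inner vector per input string; lengths may differ.
using RowMajor = std::vector<std::vector<std::size_t>>;

// Column-major layout in the style of a dense matrix: element (row, col)
// is stored at data[col * numRows + row].
struct ColumnMajor
{
  std::size_t numRows = 0;
  std::size_t numCols = 0;
  std::vector<std::size_t> data;

  std::size_t operator()(const std::size_t row, const std::size_t col) const
  {
    return data[col * numRows + row];
  }
};

// Copy each encoded string into its own column, padding with zeros.
ColumnMajor ToColumnMajor(const RowMajor& rows)
{
  ColumnMajor out;
  out.numCols = rows.size();
  for (const auto& row : rows)
    out.numRows = std::max(out.numRows, row.size());

  out.data.assign(out.numRows * out.numCols, 0);
  for (std::size_t col = 0; col < rows.size(); ++col)
    for (std::size_t row = 0; row < rows[col].size(); ++row)
      out.data[col * out.numRows + row] = rows[col][row];

  return out;
}
```

In the row-major form, the tokens of one string are contiguous and each string keeps its own length. In the column-major form, every column has the same height, so padding is needed when strings differ in token count.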
| OutputType | Type of the output container. The function supports the following types: arma::mat, arma::sp_mat, std::vector<std::vector<>>. |
| TokenizerType | Type of the tokenizer. |
| input | Corpus of text to encode. |
| output | Output container to store the result. |
| tokenizer | The tokenizer object. |
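The tokenizer must be an object with two public methods: operator(), which takes a reference to a string view, extracts the next token, removes the consumed prefix from the view, and returns the token; and IsTokenEmpty(), which takes a token and returns true if the token is empty.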
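The sketch below shows how an encoder could drive such a tokenizer. It is not the mlpack implementation. `CharTokenizer` and `EncodeSketch` are hypothetical names, and only the standard library is used. The sketch assumes the tokenizer exposes a `TokenType` alias, a call operator that consumes a `std::string_view`, and `IsTokenEmpty()`. The labelling scheme (consecutive integers starting at 1, in order of first appearance) is likewise an assumption chosen for illustration.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical character-level tokenizer (not mlpack code): every byte is a
// token, and -1 signals that the input is exhausted.
struct CharTokenizer
{
  using TokenType = int;

  int operator()(std::string_view& str) const
  {
    if (str.empty())
      return -1;
    const int token = static_cast<unsigned char>(str.front());
    str.remove_prefix(1);
    return token;
  }

  static bool IsTokenEmpty(const int token) { return token == -1; }
};

// Pull tokens until the tokenizer reports an empty one, giving each unseen
// token the next free label. The result is row-major: one row per string.
template<typename TokenizerType>
std::vector<std::vector<std::size_t>> EncodeSketch(
    const std::vector<std::string>& input,
    const TokenizerType& tokenizer)
{
  std::unordered_map<typename TokenizerType::TokenType, std::size_t> labels;
  std::vector<std::vector<std::size_t>> output;
  output.reserve(input.size());

  for (const std::string& line : input)
  {
    std::string_view view(line);
    std::vector<std::size_t> row;

    auto token = tokenizer(view);
    while (!tokenizer.IsTokenEmpty(token))
    {
      // try_emplace leaves an existing label untouched.
      const auto it = labels.try_emplace(token, labels.size() + 1).first;
      row.push_back(it->second);
      token = tokenizer(view);
    }
    output.push_back(std::move(row));
  }
  return output;
}
```

Because the loop depends only on the two methods and the token type, the same function works with a word-level tokenizer whose token type is a string view.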