DictionaryEncodingPolicy is used as a helper class for StringEncoding.
#include <dictionary_encoding_policy.hpp>
|
template<typename Archive > |
void | serialize (Archive &, const uint32_t) |
| Serialize the class to the given archive.
|
|
|
static void | Reset () |
| Clear the necessary internal variables.
|
|
template<typename MatType > |
static void | InitMatrix (MatType &output, const size_t datasetSize, const size_t maxNumTokens, const size_t) |
| The function initializes the output matrix.
|
|
template<typename MatType > |
static void | Encode (MatType &output, const size_t value, const size_t line, const size_t index) |
| The function performs the dictionary encoding algorithm, i.e. it writes the encoded token to the output matrix.
|
|
template<typename ElemType > |
static void | Encode (std::vector< ElemType > &output, size_t value) |
| The function performs the dictionary encoding algorithm, i.e. it writes the encoded token to the output vector.
|
|
static void | PreprocessToken (const size_t, const size_t, const size_t) |
| The function is not used by the dictionary encoding policy.
|
|
DictionaryEncodingPolicy is used as a helper class for StringEncoding.
The encoder assigns a positive integer to each unique token and treats the dataset as categorical. The integers are assigned sequentially, starting from one. The order in which the tokens are labeled is defined by the dictionary used by the StringEncoding class. The encoder writes data in column-major order for matrix outputs and in row-major order for std::vector outputs.
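The labeling scheme described above can be sketched without mlpack. This is a minimal illustration, not the library's implementation: `ToyDictionary`, `AddOrGet` and `EncodeLine` are hypothetical names, and the sketch assumes labels are handed out in order of first appearance, whereas the real order is whatever the dictionary used by StringEncoding defines.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the dictionary owned by StringEncoding; it is not
// the mlpack class. Labels are handed out sequentially, starting from one.
class ToyDictionary
{
 public:
  // Return the label of the token, assigning the next free label if the
  // token has not been seen before (assumption: first-appearance order).
  std::size_t AddOrGet(const std::string& token)
  {
    auto it = labels.find(token);
    if (it != labels.end())
      return it->second;
    const std::size_t label = labels.size() + 1;
    labels.emplace(token, label);
    return label;
  }

  // Number of unique tokens seen so far.
  std::size_t Size() const { return labels.size(); }

 private:
  std::unordered_map<std::string, std::size_t> labels;
};

// Encode one already-tokenized line as a sequence of integer labels.
std::vector<std::size_t> EncodeLine(ToyDictionary& dictionary,
                                    const std::vector<std::string>& tokens)
{
  std::vector<std::size_t> encoded;
  encoded.reserve(tokens.size());
  for (const std::string& token : tokens)
    encoded.push_back(dictionary.AddOrGet(token));
  return encoded;
}
```

Because labels start at one, the value zero never denotes a token, which leaves it free to act as padding in matrix outputs.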
◆ Encode() [1/2]
template<typename MatType >
static void mlpack::data::DictionaryEncodingPolicy::Encode (MatType &output, const size_t value, const size_t line, const size_t index)
inline static
The function performs the dictionary encoding algorithm, i.e. it writes the encoded token to the output. The encoder writes data in column-major order.
- Template Parameters
-
MatType | The output matrix type. |
- Parameters
-
output | Output matrix to store the encoded results (sp_mat or mat). |
value | The encoded token. |
line | The line number at which the encoding is performed. |
index | The token index in the line. |
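The column-major write can be pictured with a small stand-in matrix. This is a sketch under stated assumptions rather than the mlpack source: `ToyMatrix` and `EncodeToMatrix` are hypothetical, and the sketch assumes that `line` selects the column and `index` selects the row, so the tokens of one input line sit next to each other in memory.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical dense matrix with column-major storage (the layout Armadillo
// uses); a stand-in for MatType so the sketch needs no external library.
struct ToyMatrix
{
  std::size_t rows;
  std::size_t cols;
  std::vector<double> data;

  ToyMatrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0)
  { }

  // Element (r, c) lives at offset c * rows + r in column-major storage.
  double& operator()(std::size_t r, std::size_t c)
  {
    return data[c * rows + r];
  }
};

// Assumed behaviour: the line number selects the column and the token index
// selects the row, so one input line fills one contiguous column.
template<typename MatType>
void EncodeToMatrix(MatType& output, const std::size_t value,
                    const std::size_t line, const std::size_t index)
{
  output(index, line) = value;
}
```

With a 3 x 2 `ToyMatrix`, writing the first two tokens of line 0 touches storage offsets 0 and 1, while the first token of line 1 lands at offset 3.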
◆ Encode() [2/2]
template<typename ElemType >
static void mlpack::data::DictionaryEncodingPolicy::Encode (std::vector< ElemType > &output, size_t value)
inline static
The function performs the dictionary encoding algorithm, i.e. it writes the encoded token to the output. This overload saves the result into the given vector to avoid padding. The encoder writes data in row-major order.
- Template Parameters
-
ElemType | Type of the output values. |
- Parameters
-
output | Output vector to store the encoded line. |
value | The encoded token. |
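A rough sketch of the vector overload follows, assuming it simply appends each encoded token to the end of the current line. `EncodeToVector` and `EncodeDataset` are hypothetical helpers written for illustration, not mlpack functions. Since every line keeps its own length, no padding is needed.

```cpp
#include <cstddef>
#include <vector>

// Assumed behaviour: append the encoded token to the end of the current line.
template<typename ElemType>
void EncodeToVector(std::vector<ElemType>& output, std::size_t value)
{
  output.push_back(static_cast<ElemType>(value));
}

// Hypothetical driver: copy already-labeled lines into a ragged, row-major
// container where each inner vector holds exactly one encoded line.
template<typename ElemType>
std::vector<std::vector<ElemType>> EncodeDataset(
    const std::vector<std::vector<std::size_t>>& labeledLines)
{
  std::vector<std::vector<ElemType>> result(labeledLines.size());
  for (std::size_t line = 0; line < labeledLines.size(); ++line)
  {
    for (const std::size_t value : labeledLines[line])
      EncodeToVector(result[line], value);
  }
  return result;
}
```

Encoding lines of three and one tokens yields inner vectors of sizes three and one, in contrast to the matrix overload, where shorter lines share the width of the longest one.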
◆ InitMatrix()
template<typename MatType >
static void mlpack::data::DictionaryEncodingPolicy::InitMatrix (MatType &output, const size_t datasetSize, const size_t maxNumTokens, const size_t)
inline static
The function initializes the output matrix.
The encoder writes data in the column-major order.
- Template Parameters
-
MatType | The output matrix type. |
- Parameters
-
output | Output matrix to store the encoded results (sp_mat or mat). |
datasetSize | The number of strings in the input dataset. |
maxNumTokens | The maximum number of tokens in the strings of the input dataset. |
* | (dictionarySize) The size of the dictionary (not used). |
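The sizing implied by these parameters can be sketched as below. The sketch assumes the matrix is zero-filled with `maxNumTokens` rows and `datasetSize` columns, which matches the column-major convention of one input line per column; the source does not state the fill value, so treat the zero fill as an assumption. `ToyMatrix` and `InitMatrixSketch` are hypothetical, and `zeros(rows, cols)` mirrors the Armadillo member function of the same name.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical column-major matrix; a stand-in for MatType (mat or sp_mat).
struct ToyMatrix
{
  std::size_t rows = 0;
  std::size_t cols = 0;
  std::vector<double> data;

  // Resize to r x c and set every element to zero.
  void zeros(std::size_t r, std::size_t c)
  {
    rows = r;
    cols = c;
    data.assign(r * c, 0.0);
  }
};

// Assumed behaviour: one column per input string and one row per token
// position. The dictionary size is accepted but ignored.
template<typename MatType>
void InitMatrixSketch(MatType& output, const std::size_t datasetSize,
                      const std::size_t maxNumTokens,
                      const std::size_t /* dictionarySize */)
{
  output.zeros(maxNumTokens, datasetSize);
}
```

Under this assumption, strings with fewer than `maxNumTokens` tokens leave trailing zeros in their column, and zero cannot collide with a token label because labels start at one.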
◆ PreprocessToken()
static void mlpack::data::DictionaryEncodingPolicy::PreprocessToken (const size_t, const size_t, const size_t)
inline static
The function is not used by the dictionary encoding policy.
- Parameters
-
* | (line) The line number at which the encoding is performed. |
* | (index) The token sequence number in the line. |
* | (value) The encoded token. |
The documentation for this class was generated from the following file: