crawlserv++
[under development]
Application for crawling and analyzing textual content of websites.
Structure containing information about the currently trained Hierarchical Dirichlet Process (HDP) model.
#include <TopicModelInfo.hpp>
Basic Information | |
| std::string | modelName |
| The name of the model. | |
| std::string | modelVersion |
| The version of the model (as string). | |
| std::size_t | numberOfDocuments {} |
| The number of documents in the model. | |
| std::size_t | numberOfTokens {} |
| The number of tokens in the model. | |
| std::size_t | sizeOfVocabulary {} |
| std::size_t | sizeOfVocabularyUsed {} |
| double | tokenEntropy {} |
| The entropy of tokens in the model. | |
| std::vector< std::string > | removedTokens |
| The top tokens removed before training. | |
Training Information | |
| std::size_t | numberOfIterations {} |
| The number of iterations performed. | |
| std::size_t | numberOfBurnInSteps {} |
| The number of initially skipped, i.e. burn-in, steps. | |
| std::size_t | optimizationInterval {} |
| The optimization interval. | |
| double | logLikelihoodPerToken {} |
| The log-likelihood per token. | |
Initial Parameters | |
| std::string | weighting |
| Term weighting mode as string. | |
| std::size_t | minCollectionFrequency {} |
| Minimum collection frequency of tokens. | |
| std::size_t | minDocumentFrequency {} |
| Minimum document frequency of tokens. | |
| std::size_t | numberOfTopTokensToBeRemoved {} |
| The number of top tokens to be removed. | |
| std::size_t | numberOfInitialTopics {} |
| The initial number of topics, which will be adjusted for the data during training. | |
| float | initialAlpha {} |
| The initial concentration coefficient of the Dirichlet Process for document–table. | |
| float | initialEta {} |
| The initial hyperparameter for the Dirichlet distribution for topic–token. | |
| float | initialGamma {} |
| The initial concentration coefficient of the Dirichlet Process for table–topic. | |
| std::size_t | seed {} |
| The initial seed for random number generation. | |
| std::string | trainedWithVersion {} |
| The version of the modeller the model has been trained with. | |
Parameters | |
| float | alpha {} |
| The concentration coefficient of the Dirichlet Process for document-table (HDP only). | |
| std::vector< float > | alphas |
| The Dirichlet priors on the per-document topic distributions (LDA only). | |
| float | eta {} |
| The Dirichlet prior on the per-topic token distribution (HDP only). | |
| float | gamma {} |
| The concentration coefficient of the Dirichlet Process for table-topic. | |
| std::size_t | numberOfTopics {} |
| The number of topics. | |
| std::size_t | numberOfTables {} |
| The number of tables. | |
Helper Function | |
| std::queue< std::string > | toQueueOfStrings () const |
| Return queue with strings describing the information contained in the structure. | |
Structure containing information about the currently trained Hierarchical Dirichlet Process (HDP) model.
| std::queue<std::string> crawlservpp::Struct::TopicModelInfo::toQueueOfStrings () const | inline |
Return queue with strings describing the information contained in the structure.
References alpha, eta, gamma, initialAlpha, initialEta, initialGamma, logLikelihoodPerToken, minCollectionFrequency, minDocumentFrequency, numberOfBurnInSteps, numberOfDocuments, numberOfInitialTopics, numberOfIterations, numberOfTables, numberOfTokens, numberOfTopics, numberOfTopTokensToBeRemoved, optimizationInterval, seed, sizeOfVocabulary, sizeOfVocabularyUsed, tokenEntropy, and trainedWithVersion.
Referenced by crawlservpp::Module::Analyzer::Algo::TopicModelling::resetAlgo().
| float crawlservpp::Struct::TopicModelInfo::alpha {} |
The concentration coefficient of the Dirichlet Process for document-table (HDP only).
Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
| std::vector<float> crawlservpp::Struct::TopicModelInfo::alphas |
The Dirichlet priors on the per-document topic distributions (LDA only).
Referenced by crawlservpp::Data::TopicModel::getModelInfo().
| float crawlservpp::Struct::TopicModelInfo::eta {} |
The Dirichlet prior on the per-topic token distribution (HDP only).
Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
| float crawlservpp::Struct::TopicModelInfo::gamma {} |
The concentration coefficient of the Dirichlet Process for table-topic.
Not used by LDA models, i.e. set to zero when a fixed number of topics is set.
Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
| float crawlservpp::Struct::TopicModelInfo::initialAlpha {} |
The initial concentration coefficient of the Dirichlet Process for document–table.
Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
| float crawlservpp::Struct::TopicModelInfo::initialEta {} |
The initial hyperparameter for the Dirichlet distribution for topic–token.
Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
| float crawlservpp::Struct::TopicModelInfo::initialGamma {} |
The initial concentration coefficient of the Dirichlet Process for table–topic.
Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
| double crawlservpp::Struct::TopicModelInfo::logLikelihoodPerToken {} |
The log-likelihood per token.
Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
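A per-token log-likelihood is often reported as perplexity instead. The sketch below shows that conversion under the assumption that the value is based on the natural logarithm; this page does not state the base, and the function name `perplexityFromLogLikelihoodPerToken` is hypothetical rather than part of crawlserv++.

```cpp
#include <cmath>

// Converts a per-token log-likelihood into perplexity.
// ASSUMPTION: the log-likelihood uses the natural logarithm (base e).
// A log-likelihood closer to zero yields a lower (better) perplexity.
double perplexityFromLogLikelihoodPerToken(double logLikelihoodPerToken) {
    return std::exp(-logLikelihoodPerToken);
}
```

With this definition, a per-token log-likelihood of zero corresponds to a perplexity of one, and more negative values correspond to higher perplexity.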
| std::size_t crawlservpp::Struct::TopicModelInfo::minCollectionFrequency {} |
Minimum collection frequency of tokens.
Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
| std::size_t crawlservpp::Struct::TopicModelInfo::minDocumentFrequency {} |
Minimum document frequency of tokens.
Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
| std::string crawlservpp::Struct::TopicModelInfo::modelName |
The name of the model.
Referenced by crawlservpp::Data::TopicModel::getModelInfo().
| std::string crawlservpp::Struct::TopicModelInfo::modelVersion |
The version of the model (as string).
Referenced by crawlservpp::Data::TopicModel::getModelInfo().
| std::size_t crawlservpp::Struct::TopicModelInfo::numberOfBurnInSteps {} |
The number of initially skipped, i.e. burn-in, steps.
Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
| std::size_t crawlservpp::Struct::TopicModelInfo::numberOfDocuments {} |
The number of documents in the model.
Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
| std::size_t crawlservpp::Struct::TopicModelInfo::numberOfInitialTopics {} |
The initial number of topics, which will be adjusted for the data during training.
Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
| std::size_t crawlservpp::Struct::TopicModelInfo::numberOfIterations {} |
The number of iterations performed.
Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
| std::size_t crawlservpp::Struct::TopicModelInfo::numberOfTables {} |
The number of tables.
Not used by LDA models, i.e. set to zero when a fixed number of topics is set.
Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
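Because the number of tables is set to zero for LDA models, it can serve as a rough indicator of which kind of model the information describes. The following is a sketch under that assumption only: `TableInfoSketch` and `looksLikeHdp` are hypothetical names, and an HDP model that reports zero tables for some other reason would be misclassified.

```cpp
#include <cstddef>

// Hypothetical stand-in for the two relevant members of
// crawlservpp::Struct::TopicModelInfo.
struct TableInfoSketch {
    std::size_t numberOfTopics{};
    std::size_t numberOfTables{};
};

// Heuristic: treats the model as HDP if it reports at least one table.
// ASSUMPTION: LDA models always report zero tables, as documented above.
bool looksLikeHdp(const TableInfoSketch& info) {
    return info.numberOfTables > 0;
}
```

The check uses the integer table count rather than gamma, since comparing a floating-point value with zero is more fragile.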
| std::size_t crawlservpp::Struct::TopicModelInfo::numberOfTokens {} |
The number of tokens in the model.
Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
| std::size_t crawlservpp::Struct::TopicModelInfo::numberOfTopics {} |
The number of topics.
Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
| std::size_t crawlservpp::Struct::TopicModelInfo::numberOfTopTokensToBeRemoved {} |
The number of top tokens to be removed.
Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
| std::size_t crawlservpp::Struct::TopicModelInfo::optimizationInterval {} |
The optimization interval.
Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
| std::vector<std::string> crawlservpp::Struct::TopicModelInfo::removedTokens |
The top tokens removed before training.
Referenced by crawlservpp::Data::TopicModel::getModelInfo().
| std::size_t crawlservpp::Struct::TopicModelInfo::seed {} |
The initial seed for random number generation.
Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
| std::size_t crawlservpp::Struct::TopicModelInfo::sizeOfVocabulary {} |
Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
| std::size_t crawlservpp::Struct::TopicModelInfo::sizeOfVocabularyUsed {} |
Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
| double crawlservpp::Struct::TopicModelInfo::tokenEntropy {} |
The entropy of tokens in the model.
Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
| std::string crawlservpp::Struct::TopicModelInfo::trainedWithVersion {} |
The version of the modeller the model has been trained with.
Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
| std::string crawlservpp::Struct::TopicModelInfo::weighting |
Term weighting mode as string.
Referenced by crawlservpp::Data::TopicModel::getModelInfo().