crawlserv++  [under development]
Application for crawling and analyzing textual content of websites.
crawlservpp::Struct::TopicModelInfo Struct Reference

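Structure containing information about the currently trained topic model, i.e. a Hierarchical Dirichlet Process (HDP) model or, if a fixed number of topics has been set, a Latent Dirichlet Allocation (LDA) model.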

#include <TopicModelInfo.hpp>

Basic Information

std::string modelName
 The name of the model.
 
std::string modelVersion
 The version of the model (as string).
 
std::size_t numberOfDocuments {}
 The number of documents in the model.
 
std::size_t numberOfTokens {}
 The number of tokens in the model.
 
std::size_t sizeOfVocabulary {}
 The size of the vocabulary.
 
std::size_t sizeOfVocabularyUsed {}
 The size of the vocabulary actually used in the model.
 
double tokenEntropy {}
 The entropy of tokens in the model.
 
std::vector< std::string > removedTokens
 The top tokens removed before training.
 

Training Information

std::size_t numberOfIterations {}
 The number of iterations performed.
 
std::size_t numberOfBurnInSteps {}
 The number of burn-in steps, i.e. steps skipped at the beginning of training.
 
std::size_t optimizationInterval {}
 The optimization interval, i.e. the number of iterations between two optimizations of the model's parameters.
 
double logLikelihoodPerToken {}
 The log-likelihood per token.
 

Initial Parameters

std::string weighting
 Term weighting mode as string.
 
std::size_t minCollectionFrequency {}
 Minimum collection frequency of tokens.
 
std::size_t minDocumentFrequency {}
 Minimum document frequency of tokens.
 
std::size_t numberOfTopTokensToBeRemoved {}
 The number of top tokens to be removed.
 
std::size_t numberOfInitialTopics {}
 The initial number of topics, which will be adjusted for the data during training.
 
float initialAlpha {}
 The initial concentration coefficient of the Dirichlet Process for document–table.
 
float initialEta {}
 The initial hyperparameter for the Dirichlet distribution for topic–token.
 
float initialGamma {}
 The initial concentration coefficient of the Dirichlet Process for table–topic.
 
std::size_t seed {}
 The initial seed for random number generation.
 
std::string trainedWithVersion {}
 The version of the modeller the model has been trained with.
 

Parameters

float alpha {}
 The concentration coefficient of the Dirichlet Process for document–table (HDP only).
 
std::vector< float > alphas
 The Dirichlet priors on the per-document topic distributions (LDA only).
 
float eta {}
 The Dirichlet prior on the per-topic token distribution (HDP only).
 
float gamma {}
 The concentration coefficient of the Dirichlet Process for table–topic (HDP only).
 
std::size_t numberOfTopics {}
 The number of topics.
 
std::size_t numberOfTables {}
 The number of tables.
 

Helper Function

std::queue< std::string > toQueueOfStrings () const
 Returns a queue of strings describing the information contained in the structure.
 

Detailed Description

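Structure containing information about the currently trained topic model, i.e. a Hierarchical Dirichlet Process (HDP) model or, if a fixed number of topics has been set, a Latent Dirichlet Allocation (LDA) model.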

Member Function Documentation

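◆ toQueueOfStrings()

std::queue<std::string> crawlservpp::Struct::TopicModelInfo::toQueueOfStrings() const

Returns a queue of strings describing the information contained in the structure.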

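The following minimal sketch illustrates how a helper like toQueueOfStrings() can flatten the structure into printable entries. It uses a hypothetical, trimmed-down struct (TopicModelInfoSketch) with only four of the members documented here, and the "label: value" format is an assumption; the strings produced by crawlserv++ itself may look different.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>

// Hypothetical, trimmed-down stand-in for crawlservpp::Struct::TopicModelInfo:
// only four members are reproduced here.
struct TopicModelInfoSketch {
	std::size_t numberOfDocuments{};
	std::size_t numberOfTokens{};
	std::size_t numberOfTopics{};
	double logLikelihoodPerToken{};

	// Returns one "label: value" entry per member (assumed format).
	[[nodiscard]] std::queue<std::string> toQueueOfStrings() const {
		std::queue<std::string> lines;

		lines.emplace("#docs: " + std::to_string(this->numberOfDocuments));
		lines.emplace("#tokens: " + std::to_string(this->numberOfTokens));
		lines.emplace("#topics: " + std::to_string(this->numberOfTopics));
		lines.emplace("log-likelihood per token: " + std::to_string(this->logLikelihoodPerToken));

		return lines;
	}
};
```

A caller can then pop the queue front to back, for example to write one entry per log line.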
Member Data Documentation

◆ alpha

float crawlservpp::Struct::TopicModelInfo::alpha {}

The concentration coefficient of the Dirichlet Process for document-table (HDP only).

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ alphas

std::vector<float> crawlservpp::Struct::TopicModelInfo::alphas

The Dirichlet priors on the per-document topic distributions (LDA only).

Referenced by crawlservpp::Data::TopicModel::getModelInfo().

◆ eta

float crawlservpp::Struct::TopicModelInfo::eta {}

The Dirichlet prior on the per-topic token distribution (HDP only).

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ gamma

float crawlservpp::Struct::TopicModelInfo::gamma {}

The concentration coefficient of the Dirichlet Process for table-topic.

Not used by LDA models, i.e. set to zero when a fixed number of topics is set.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
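Because gamma (and numberOfTables) are set to zero when a fixed number of topics is set, a caller could use them to tell an LDA model apart from an HDP model. The sketch below assumes exactly that; ModelParameters and isLdaModel() are hypothetical names, and a model that has not been trained yet may report zeros as well.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical subset of the members relevant here.
struct ModelParameters {
	float gamma{};
	std::size_t numberOfTables{};
};

// Assumption: gamma and numberOfTables are both zero when a fixed number of
// topics was set, i.e. when an LDA (not an HDP) model was trained.
[[nodiscard]] bool isLdaModel(const ModelParameters& parameters) {
	return parameters.gamma == 0.F && parameters.numberOfTables == 0;
}
```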

◆ initialAlpha

float crawlservpp::Struct::TopicModelInfo::initialAlpha {}

The initial concentration coefficient of the Dirichlet Process for document–table.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ initialEta

float crawlservpp::Struct::TopicModelInfo::initialEta {}

The initial hyperparameter for the Dirichlet distribution for topic–token.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ initialGamma

float crawlservpp::Struct::TopicModelInfo::initialGamma {}

The initial concentration coefficient of the Dirichlet Process for table–topic.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ logLikelihoodPerToken

double crawlservpp::Struct::TopicModelInfo::logLikelihoodPerToken {}

The log-likelihood per token.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
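The log-likelihood per token is commonly turned into a perplexity score. The sketch below assumes that logLikelihoodPerToken uses the natural logarithm and equals the total log-likelihood divided by numberOfTokens; both helper functions are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Recovers the total log-likelihood, assuming the per-token value is the
// total divided by the number of tokens in the model.
[[nodiscard]] double totalLogLikelihood(double logLikelihoodPerToken, std::size_t numberOfTokens) {
	return logLikelihoodPerToken * static_cast<double>(numberOfTokens);
}

// Perplexity = exp(-log-likelihood per token), assuming natural logarithms;
// lower values indicate a better fit to the training data.
[[nodiscard]] double perplexity(double logLikelihoodPerToken) {
	return std::exp(-logLikelihoodPerToken);
}
```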

◆ minCollectionFrequency

std::size_t crawlservpp::Struct::TopicModelInfo::minCollectionFrequency {}

Minimum collection frequency of tokens.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ minDocumentFrequency

std::size_t crawlservpp::Struct::TopicModelInfo::minDocumentFrequency {}

Minimum document frequency of tokens.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
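The collection frequency of a token counts all of its occurrences in the corpus, whereas its document frequency counts the documents containing it. The following sketch assumes that a token is kept only if it reaches both minimum thresholds; filterVocabulary() and the Document alias are hypothetical, not part of crawlserv++.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

using Document = std::vector<std::string>; // hypothetical: one tokenized document

// Keeps tokens that occur at least minCollectionFrequency times in total
// AND in at least minDocumentFrequency different documents.
[[nodiscard]] std::set<std::string> filterVocabulary(
		const std::vector<Document>& corpus,
		std::size_t minCollectionFrequency,
		std::size_t minDocumentFrequency
) {
	std::map<std::string, std::size_t> collectionFrequency; // total occurrences
	std::map<std::string, std::size_t> documentFrequency; // documents containing the token

	for(const auto& document : corpus) {
		std::set<std::string> seen; // count each token once per document

		for(const auto& token : document) {
			++collectionFrequency[token];

			if(seen.insert(token).second) {
				++documentFrequency[token];
			}
		}
	}

	std::set<std::string> vocabulary;

	for(const auto& [token, frequency] : collectionFrequency) {
		if(frequency >= minCollectionFrequency && documentFrequency[token] >= minDocumentFrequency) {
			vocabulary.insert(token);
		}
	}

	return vocabulary;
}
```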

◆ modelName

std::string crawlservpp::Struct::TopicModelInfo::modelName

The name of the model.

Referenced by crawlservpp::Data::TopicModel::getModelInfo().

◆ modelVersion

std::string crawlservpp::Struct::TopicModelInfo::modelVersion

The version of the model (as string).

Referenced by crawlservpp::Data::TopicModel::getModelInfo().

◆ numberOfBurnInSteps

std::size_t crawlservpp::Struct::TopicModelInfo::numberOfBurnInSteps {}

The number of burn-in steps, i.e. steps skipped at the beginning of training.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ numberOfDocuments

std::size_t crawlservpp::Struct::TopicModelInfo::numberOfDocuments {}

The number of documents in the model.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ numberOfInitialTopics

std::size_t crawlservpp::Struct::TopicModelInfo::numberOfInitialTopics {}

The initial number of topics, which will be adjusted for the data during training.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ numberOfIterations

std::size_t crawlservpp::Struct::TopicModelInfo::numberOfIterations {}

The number of iterations performed.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ numberOfTables

std::size_t crawlservpp::Struct::TopicModelInfo::numberOfTables {}

The number of tables.

Not used by LDA models, i.e. set to zero when a fixed number of topics is set.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ numberOfTokens

std::size_t crawlservpp::Struct::TopicModelInfo::numberOfTokens {}

The number of tokens in the model.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ numberOfTopics

std::size_t crawlservpp::Struct::TopicModelInfo::numberOfTopics {}

The number of topics.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ numberOfTopTokensToBeRemoved

std::size_t crawlservpp::Struct::TopicModelInfo::numberOfTopTokensToBeRemoved {}

The number of top tokens to be removed.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ optimizationInterval

std::size_t crawlservpp::Struct::TopicModelInfo::optimizationInterval {}

The optimization interval, i.e. the number of iterations between two optimizations of the model's parameters.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
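The interplay of numberOfIterations, numberOfBurnInSteps and optimizationInterval can be sketched as a simple schedule. This is an illustration under stated assumptions, not the modeller's actual training loop: it assumes that no parameter optimization takes place during burn-in and that an interval of zero disables optimization. optimizationSteps() is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the (1-based) iterations at which parameters would be optimized
// under the assumed schedule described above.
[[nodiscard]] std::vector<std::size_t> optimizationSteps(
		std::size_t numberOfIterations,
		std::size_t numberOfBurnInSteps,
		std::size_t optimizationInterval
) {
	std::vector<std::size_t> steps;

	if(optimizationInterval == 0) {
		return steps; // assumption: an interval of zero disables optimization
	}

	for(std::size_t iteration{1}; iteration <= numberOfIterations; ++iteration) {
		// skip the burn-in phase, then optimize every optimizationInterval iterations
		if(iteration > numberOfBurnInSteps && iteration % optimizationInterval == 0) {
			steps.push_back(iteration);
		}
	}

	return steps;
}
```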

◆ removedTokens

std::vector<std::string> crawlservpp::Struct::TopicModelInfo::removedTokens

The top tokens removed before training.

Referenced by crawlservpp::Data::TopicModel::getModelInfo().
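The removed tokens can be thought of as the numberOfTopTokensToBeRemoved most frequent tokens of the corpus. The sketch below assumes that "most frequent" refers to collection frequency and breaks ties alphabetically; topTokens() is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Returns the numberOfTopTokensToBeRemoved most frequent tokens,
// i.e. candidates for removedTokens.
[[nodiscard]] std::vector<std::string> topTokens(
		const std::vector<std::string>& tokens,
		std::size_t numberOfTopTokensToBeRemoved
) {
	std::map<std::string, std::size_t> counts; // ordered alphabetically

	for(const auto& token : tokens) {
		++counts[token];
	}

	std::vector<std::pair<std::string, std::size_t>> sorted(counts.begin(), counts.end());

	// sort by descending frequency; stable_sort keeps alphabetical order for ties
	std::stable_sort(sorted.begin(), sorted.end(), [](const auto& a, const auto& b) {
		return a.second > b.second;
	});

	const std::size_t limit{std::min(numberOfTopTokensToBeRemoved, sorted.size())};
	std::vector<std::string> result;

	for(std::size_t index{}; index < limit; ++index) {
		result.push_back(sorted[index].first);
	}

	return result;
}
```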

◆ seed

std::size_t crawlservpp::Struct::TopicModelInfo::seed {}

The initial seed for random number generation.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
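Storing the seed makes random initialization reproducible: two generators started from the same seed produce the same sequence. The sketch uses std::mt19937_64 purely for illustration, since the generator used by the modeller is not documented on this page; draw() is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Draws count pseudo-random numbers from a generator seeded with seed.
// std::mt19937_64 is an illustrative choice, not necessarily the modeller's generator.
[[nodiscard]] std::vector<std::mt19937_64::result_type> draw(std::size_t seed, std::size_t count) {
	std::mt19937_64 generator(seed);
	std::vector<std::mt19937_64::result_type> numbers;

	numbers.reserve(count);

	for(std::size_t index{}; index < count; ++index) {
		numbers.push_back(generator());
	}

	return numbers;
}
```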

◆ sizeOfVocabulary

std::size_t crawlservpp::Struct::TopicModelInfo::sizeOfVocabulary {}

The size of the vocabulary.

◆ sizeOfVocabularyUsed

std::size_t crawlservpp::Struct::TopicModelInfo::sizeOfVocabularyUsed {}

The size of the vocabulary actually used in the model.

◆ tokenEntropy

double crawlservpp::Struct::TopicModelInfo::tokenEntropy {}

The entropy of tokens in the model.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
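Assuming tokenEntropy is the Shannon entropy, in nats, of the distribution of token frequencies, it can be computed as H = -Σ p·ln(p) over the relative frequencies p. The helper below is a hypothetical sketch; the exact distribution and logarithm base used by crawlserv++ are not stated on this page.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Shannon entropy (natural logarithm) of a list of absolute token frequencies.
[[nodiscard]] double tokenEntropy(const std::vector<std::size_t>& frequencies) {
	std::size_t total{};

	for(const auto frequency : frequencies) {
		total += frequency;
	}

	if(total == 0) {
		return 0.;
	}

	double entropy{};

	for(const auto frequency : frequencies) {
		if(frequency == 0) {
			continue; // 0 * ln(0) is treated as 0
		}

		const double p{static_cast<double>(frequency) / static_cast<double>(total)};

		entropy -= p * std::log(p);
	}

	return entropy;
}
```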

◆ trainedWithVersion

std::string crawlservpp::Struct::TopicModelInfo::trainedWithVersion {}

The version of the modeller the model has been trained with.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ weighting

std::string crawlservpp::Struct::TopicModelInfo::weighting

Term weighting mode as string.

Referenced by crawlservpp::Data::TopicModel::getModelInfo().


The documentation for this struct was generated from the following file: