Structure containing information about the currently trained Hierarchical Dirichlet Process (HDP) model.

#include <TopicModelInfo.hpp>

Basic Information
std::string	modelName
	The name of the model.

std::string	modelVersion
	The version of the model (as string).

std::size_t	numberOfDocuments {}
	The number of documents in the model.

std::size_t	numberOfTokens {}
	The number of tokens in the model.

std::size_t	sizeOfVocabulary {}

std::size_t	sizeOfVocabularyUsed {}

double	tokenEntropy {}
	The entropy of tokens in the model.

std::vector< std::string >	removedTokens
	The top tokens removed before training.

Training Information
std::size_t	numberOfIterations {}
	The number of iterations performed.

std::size_t	numberOfBurnInSteps {}
	The number of initially skipped, i.e. burn-in, steps.

std::size_t	optimizationInterval {}
	The optimization interval.

double	logLikelihoodPerToken {}
	The log-likelihood per token.

Initial Parameters
std::string	weighting
	Term weighting mode as string.

std::size_t	minCollectionFrequency {}
	Minimum collection frequency of tokens.

std::size_t	minDocumentFrequency {}
	Minimum document frequency of tokens.

std::size_t	numberOfTopTokensToBeRemoved {}
	The number of top tokens to be removed.

std::size_t	numberOfInitialTopics {}
	The initial number of topics, which will be adjusted for the data during training.

float	initialAlpha {}
	The initial concentration coefficient of the Dirichlet Process for document–table.

float	initialEta {}
	The initial hyperparameter for the Dirichlet distribution for topic–token.

float	initialGamma {}
	The initial concentration coefficient of the Dirichlet Process for table–topic.

std::size_t	seed {}
	The initial seed for random number generation.

std::string	trainedWithVersion {}
	The version of the modeller the model has been trained with.

Parameters
float	alpha {}
	The concentration coefficient of the Dirichlet Process for document-table (HDP only).

std::vector< float >	alphas
	The Dirichlet priors on the per-document topic distributions (LDA only).

float	eta {}
	The Dirichlet prior on the per-topic token distribution (HDP only).

float	gamma {}
	The concentration coefficient of the Dirichlet Process for table-topic.

std::size_t	numberOfTopics {}
	The number of topics.

std::size_t	numberOfTables {}
	The number of tables.

Helper Function
std::queue< std::string >	toQueueOfStrings () const
	Return queue with strings describing the information contained in the structure.

Detailed Description

Structure containing information about the currently trained Hierarchical Dirichlet Process (HDP) model.
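
The empty braces after most members in the listings above are default member initializers, so a default-constructed instance starts with zeroed counters and parameters, and with empty strings and vectors. The following is a minimal sketch of that behaviour. TopicModelInfoSketch and makeEmptyInfo() are hypothetical stand-ins that copy only a handful of the documented members; they are not the real crawlservpp::Struct::TopicModelInfo.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, trimmed-down stand-in for the documented structure.
// Only a few members are reproduced, with the same types and initializers.
struct TopicModelInfoSketch {
    std::string modelName;              // default-constructed: empty string
    std::size_t numberOfDocuments{};    // value-initialized: 0
    std::size_t numberOfTokens{};       // value-initialized: 0
    double tokenEntropy{};              // value-initialized: 0.0
    float alpha{};                      // value-initialized: 0.0f
    std::vector<float> alphas;          // default-constructed: empty vector
};

// Returns an instance in which every member holds its default value.
inline TopicModelInfoSketch makeEmptyInfo() {
    return TopicModelInfoSketch{};
}
```

A consumer can therefore treat zero or empty values as "not set" without any extra initialization step, as long as zero is not a meaningful value for the member in question.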

Member Function Documentation

◆ toQueueOfStrings()

std::queue<std::string> crawlservpp::Struct::TopicModelInfo::toQueueOfStrings ( ) const

inline

Return queue with strings describing the information contained in the structure.

References alpha, eta, gamma, initialAlpha, initialEta, initialGamma, logLikelihoodPerToken, minCollectionFrequency, minDocumentFrequency, numberOfBurnInSteps, numberOfDocuments, numberOfInitialTopics, numberOfIterations, numberOfTables, numberOfTokens, numberOfTopics, numberOfTopTokensToBeRemoved, optimizationInterval, seed, sizeOfVocabulary, sizeOfVocabularyUsed, tokenEntropy, and trainedWithVersion.

Referenced by crawlservpp::Module::Analyzer::Algo::TopicModelling::resetAlgo().
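
Because the return type is std::queue<std::string>, a caller reads the entries in first-in, first-out order by repeatedly taking the front element and popping it. A minimal sketch of that pattern follows. The helper drainQueue() is hypothetical and not part of crawlserv++, and the content and format of the individual strings are not specified on this page.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: moves all entries of a queue into a vector,
// preserving the order in which they were pushed (FIFO).
inline std::vector<std::string> drainQueue(std::queue<std::string> lines) {
    std::vector<std::string> result;

    result.reserve(lines.size());

    while (!lines.empty()) {
        result.push_back(std::move(lines.front()));
        lines.pop();
    }

    return result;
}
```

With the real structure, the argument would be the value returned by toQueueOfStrings(), and the resulting vector could then be written to a log line by line.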

Member Data Documentation

◆ alpha

float crawlservpp::Struct::TopicModelInfo::alpha {}

The concentration coefficient of the Dirichlet Process for document-table (HDP only).

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ alphas

std::vector<float> crawlservpp::Struct::TopicModelInfo::alphas

The Dirichlet priors on the per-document topic distributions (LDA only).

Referenced by crawlservpp::Data::TopicModel::getModelInfo().

◆ eta

float crawlservpp::Struct::TopicModelInfo::eta {}

The Dirichlet prior on the per-topic token distribution (HDP only).

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ gamma

float crawlservpp::Struct::TopicModelInfo::gamma {}

The concentration coefficient of the Dirichlet Process for table-topic.

Not used by LDA models, i.e. set to zero when a fixed number of topics is set.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
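
Since gamma and numberOfTables are both documented as zero when a fixed number of topics is set, a consumer can check whether the HDP-specific members carry any information. The sketch below shows one way to do that. HdpFieldsSketch and hdpFieldsUnset() are hypothetical names, and the check is only a heuristic: a default-constructed structure that was never filled in has the same zero values.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in holding only the two HDP-specific members.
struct HdpFieldsSketch {
    float gamma{};
    std::size_t numberOfTables{};
};

// Returns true if neither HDP-specific member has been set, which is
// what the documentation describes for models with a fixed number of
// topics (LDA). An unfilled structure yields true as well.
inline bool hdpFieldsUnset(const HdpFieldsSketch& fields) {
    return fields.gamma == 0.0f && fields.numberOfTables == 0;
}
```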

◆ initialAlpha

float crawlservpp::Struct::TopicModelInfo::initialAlpha {}

The initial concentration coefficient of the Dirichlet Process for document–table.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ initialEta

float crawlservpp::Struct::TopicModelInfo::initialEta {}

The initial hyperparameter for the Dirichlet distribution for topic–token.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ initialGamma

float crawlservpp::Struct::TopicModelInfo::initialGamma {}

The initial concentration coefficient of the Dirichlet Process for table–topic.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ logLikelihoodPerToken

double crawlservpp::Struct::TopicModelInfo::logLikelihoodPerToken {}

The log-likelihood per token.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
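
A per-token log-likelihood is often reported as perplexity, which is the exponential of its negation. The following sketch assumes that the stored value uses the natural logarithm, which this page does not state. perplexityFromLogLikelihood() is a hypothetical helper and not part of crawlserv++.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: converts a per-token log-likelihood into
// perplexity, assuming the natural logarithm was used.
// Lower perplexity indicates a better fit to the data.
inline double perplexityFromLogLikelihood(double logLikelihoodPerToken) {
    return std::exp(-logLikelihoodPerToken);
}
```

If the value were stored with a different logarithm base, the same conversion would use that base instead of e.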

◆ minCollectionFrequency

std::size_t crawlservpp::Struct::TopicModelInfo::minCollectionFrequency {}

Minimum collection frequency of tokens.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ minDocumentFrequency

std::size_t crawlservpp::Struct::TopicModelInfo::minDocumentFrequency {}

Minimum document frequency of tokens.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ modelName

std::string crawlservpp::Struct::TopicModelInfo::modelName

The name of the model.

Referenced by crawlservpp::Data::TopicModel::getModelInfo().

◆ modelVersion

std::string crawlservpp::Struct::TopicModelInfo::modelVersion

The version of the model (as string).

Referenced by crawlservpp::Data::TopicModel::getModelInfo().

◆ numberOfBurnInSteps

std::size_t crawlservpp::Struct::TopicModelInfo::numberOfBurnInSteps {}

The number of initially skipped, i.e. burn-in, steps.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ numberOfDocuments

std::size_t crawlservpp::Struct::TopicModelInfo::numberOfDocuments {}

The number of documents in the model.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ numberOfInitialTopics

std::size_t crawlservpp::Struct::TopicModelInfo::numberOfInitialTopics {}

The initial number of topics, which will be adjusted for the data during training.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ numberOfIterations

std::size_t crawlservpp::Struct::TopicModelInfo::numberOfIterations {}

The number of iterations performed.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ numberOfTables

std::size_t crawlservpp::Struct::TopicModelInfo::numberOfTables {}

The number of tables.

Not used by LDA models, i.e. set to zero when a fixed number of topics is set.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ numberOfTokens

std::size_t crawlservpp::Struct::TopicModelInfo::numberOfTokens {}

The number of tokens in the model.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ numberOfTopics

std::size_t crawlservpp::Struct::TopicModelInfo::numberOfTopics {}

The number of topics.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ numberOfTopTokensToBeRemoved

std::size_t crawlservpp::Struct::TopicModelInfo::numberOfTopTokensToBeRemoved {}

The number of top tokens to be removed.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ optimizationInterval

std::size_t crawlservpp::Struct::TopicModelInfo::optimizationInterval {}

The optimization interval.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ removedTokens

std::vector<std::string> crawlservpp::Struct::TopicModelInfo::removedTokens

The top tokens removed before training.

Referenced by crawlservpp::Data::TopicModel::getModelInfo().

◆ seed

std::size_t crawlservpp::Struct::TopicModelInfo::seed {}

The initial seed for random number generation.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ sizeOfVocabulary

std::size_t crawlservpp::Struct::TopicModelInfo::sizeOfVocabulary {}

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ sizeOfVocabularyUsed

std::size_t crawlservpp::Struct::TopicModelInfo::sizeOfVocabularyUsed {}

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ tokenEntropy

double crawlservpp::Struct::TopicModelInfo::tokenEntropy {}

The entropy of tokens in the model.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().
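
This page does not say how the entropy is calculated. As an illustration only, the sketch below computes the Shannon entropy of a token frequency distribution using the natural logarithm. Both the formula and the log base are assumptions, and shannonEntropy() is a hypothetical helper that is not part of crawlserv++.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: Shannon entropy (in nats) of a distribution given
// as raw counts per distinct token. Returns 0 for an empty input.
inline double shannonEntropy(const std::vector<std::size_t>& counts) {
    const double total = static_cast<double>(
        std::accumulate(counts.begin(), counts.end(), std::size_t{0})
    );

    if (total == 0.0) {
        return 0.0;
    }

    double entropy = 0.0;

    for (const std::size_t count : counts) {
        if (count == 0) {
            continue; // zero-probability tokens contribute nothing
        }

        const double p = static_cast<double>(count) / total;

        entropy -= p * std::log(p);
    }

    return entropy;
}
```

Under this definition, a vocabulary in which every token occurs equally often has the highest possible entropy, while a corpus that repeats a single token has an entropy of zero.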

◆ trainedWithVersion

std::string crawlservpp::Struct::TopicModelInfo::trainedWithVersion {}

The version of the modeller the model has been trained with.

Referenced by crawlservpp::Data::TopicModel::getModelInfo(), and toQueueOfStrings().

◆ weighting

std::string crawlservpp::Struct::TopicModelInfo::weighting

Term weighting mode as string.

Referenced by crawlservpp::Data::TopicModel::getModelInfo().

The documentation for this struct was generated from the following file:

Struct/TopicModelInfo.hpp
