crawlserv++  [under development]
Application for crawling and analyzing textual content of websites.
crawlservpp::Module::Analyzer::Algo::TopicModelling Class Reference [final]

Topic Modeller. More...

#include <TopicModelling.hpp>


Construction

 TopicModelling (Main::Database &dbBase, const ThreadOptions &threadOptions, const ThreadStatus &threadStatus)
 Continues a previously interrupted algorithm run. More...
 
 TopicModelling (Main::Database &dbBase, const ThreadOptions &threadOptions)
 Starts a new algorithm run. More...
 

Implemented Getter

std::string_view getName () const override
 Returns the name of the algorithm. More...
 

Implemented Algorithm Functions

void onAlgoInitTarget () override
 Initializes the target table for the algorithm. More...
 
void onAlgoInit () override
 Initializes the algorithm and processes its input. More...
 
void onAlgoTick () override
 Performs a number of training iterations, if necessary. More...
 
void onAlgoPause () override
 Does nothing. More...
 
void onAlgoUnpause () override
 Does nothing. More...
 
void onAlgoClear () override
 Does nothing. More...
 

Implemented Configuration Functions

void parseAlgoOption () override
 Parses a configuration option for the algorithm. More...
 
void checkAlgoOptions () override
 Checks the configuration options for the algorithm. More...
 
void resetAlgo () override
 Resets the algorithm. More...
 

Configuration

struct crawlservpp::Module::Analyzer::Config::Entries config
 Configuration of the analyzer. More...
 

Analyzer-Specific Configuration Parsing

void reset () override
 Resets the analyzer-specific configuration options. More...
 
void parseOption () override
 Parses an analyzer-specific configuration option. More...
 
void checkOptions () override
 Checks the analyzer-specific configuration options. More...
 

Database Connection

Database database
 Database connection for the analyzer thread. More...
 

Corpora

std::vector< Corpus > corpora
 Vector of corpora for the analyzer thread. More...
 

Implemented Thread Functions

void onInit () override
 Initializes the analyzer, the target table, and the algorithm. More...
 
void onTick () override
 Performs an algorithm tick. More...
 
void onPause () override
 Pauses the analyzer. More...
 
void onUnpause () override
 Unpauses the analyzer. More...
 
void onClear () override
 Clears the algorithm. More...
 
void onReset () override
 Resets the algorithm. More...
 

Query Functions

void initQueries () override
 Does nothing. More...
 
void deleteQueries () override
 Does nothing. More...
 
void addOptionalQuery (std::uint64_t queryId, QueryStruct &propertiesTo)
 Adds an optional query. More...
 
void addQueries (const std::vector< std::uint64_t > &queryIds, std::vector< QueryStruct > &propertiesTo)
 Adds multiple queries at once, ignoring empty ones. More...
 

Thread Control for Algorithms

void finished ()
 Sets the status of the analyzer to finished. More...
 
void pause ()
 Pauses the thread. More...
 

Helper Functions for Algorithms

std::string getTargetTableName () const
 Gets the full name of the target table. More...
 
bool addCorpora (bool isCombine, StatusSetter &statusSetter)
 Gets the contents of all corpora, filters and combines them if necessary. More...
 
void checkCorpusSources (StatusSetter &statusSetter)
 Checks the specified sources for creating the corpus. More...
 

Helper Functions for Clean-up

void uploadResult ()
 Upload the specified result via FTP. More...
 
void cleanUpCorpora ()
 Clean up all corpora and free their memory. More...
 
void cleanUpQueries ()
 Clean up all queries and free their memory. More...
 

Configuration Loader

void loadConfig (const std::string &configJson, LogQueue &warningsTo)
 Loads a configuration. More...
 

Parsing Options

enum  StringParsingOption {
  Default = 0, SQL, SubURL, URL,
  Trim
}
 Options for parsing strings. More...
 
enum  CharParsingOption { FromNumber = 0, FromString }
 Options for parsing chars. More...
 

Configuration Parsing

void category (const std::string &category)
 Sets the category of the subsequent configuration items to be checked for. More...
 
void option (const std::string &name, bool &target)
 Checks for a configuration option of type bool. More...
 
void option (const std::string &name, std::vector< bool > &target)
 Checks for a configuration option of type array of bools. More...
 
void option (const std::string &name, char &target, CharParsingOption opt)
 Checks for a configuration option of type char. More...
 
void option (const std::string &name, std::vector< char > &target, CharParsingOption opt)
 Checks for a configuration option of type array of chars. More...
 
void option (const std::string &name, std::int16_t &target)
 Checks for a configuration option of type 16-bit integer. More...
 
void option (const std::string &name, std::vector< std::int16_t > &target)
 Checks for a configuration option of type array of 16-bit integers. More...
 
void option (const std::string &name, std::int32_t &target)
 Checks for a configuration option of type 32-bit integer. More...
 
void option (const std::string &name, std::vector< std::int32_t > &target)
 Checks for a configuration option of type array of 32-bit integers. More...
 
void option (const std::string &name, std::int64_t &target)
 Checks for a configuration option of type 64-bit integer. More...
 
void option (const std::string &name, std::vector< std::int64_t > &target)
 Checks for a configuration option of type array of 64-bit integers. More...
 
void option (const std::string &name, std::uint8_t &target)
 Checks for a configuration option of type unsigned 8-bit integer. More...
 
void option (const std::string &name, std::vector< std::uint8_t > &target)
 Checks for a configuration option of type array of unsigned 8-bit integers. More...
 
void option (const std::string &name, std::uint16_t &target)
 Checks for a configuration option of type unsigned 16-bit integer. More...
 
void option (const std::string &name, std::vector< std::uint16_t > &target)
 Checks for a configuration option of type array of unsigned 16-bit integers. More...
 
void option (const std::string &name, std::uint32_t &target)
 Checks for a configuration option of type unsigned 32-bit integer. More...
 
void option (const std::string &name, std::vector< std::uint32_t > &target)
 Checks for a configuration option of type array of unsigned 32-bit integers. More...
 
void option (const std::string &name, std::uint64_t &target)
 Checks for a configuration option of type unsigned 64-bit integer. More...
 
void option (const std::string &name, std::vector< std::uint64_t > &target)
 Checks for a configuration option of type array of unsigned 64-bit integers. More...
 
void option (const std::string &name, float &target)
 Checks for a configuration option of type floating-point number. More...
 
void option (const std::string &name, std::vector< float > &target)
 Checks for a configuration option of type array of floating-point numbers. More...
 
void option (const std::string &name, std::string &target, StringParsingOption opt=Default)
 Checks for a configuration option of type string. More...
 
void option (const std::string &name, std::vector< std::string > &target, StringParsingOption opt=Default)
 Checks for a configuration option of type array of strings. More...
 
void warning (const std::string &warning)
 Adds a warning to the logging queue. More...
 

Module-specific Configuration Parsing

virtual void parseBasicOption ()
 Parses a basic option. More...
 
virtual void resetBase ()
 Resets basic options. More...
 

Getters

std::uint64_t getId () const
 Gets the ID of the thread. More...
 
std::uint64_t getWebsite () const
 Gets the ID of the website used by the thread. More...
 
std::uint64_t getUrlList () const
 Gets the ID of the URL list used by the thread. More...
 
std::uint64_t getConfig () const
 Gets the ID of the configuration used by the thread. More...
 
bool isShutdown () const
 Checks whether the thread is shutting down or has shut down. More...
 
bool isRunning () const
 Checks whether the thread is still supposed to run. More...
 
bool isFinished () const
 Checks whether the shutdown of the thread has been finished. More...
 
bool isPaused () const
 Checks whether the thread has been paused. More...
 

Thread Control

void end ()
 Waits for the thread until shutdown is completed. More...
 
void reset ()
 Will reset the thread before the next tick. More...
 

Time Travel

void warpTo (std::uint64_t target)
 Jumps to the specified target ID ("time travel"). More...
 

Configuration

std::string websiteNamespace
 Namespace of the website used by the thread. More...
 
std::string urlListNamespace
 Namespace of the URL list used by the thread. More...
 
std::string configuration
 JSON string of the configuration used by the thread. More...
 

Protected Getters

bool isInterrupted () const
 Checks whether the thread has been interrupted. More...
 
std::string getStatusMessage () const
 Gets the current status message. More...
 
float getProgress () const
 Gets the current progress, in percent. More...
 
std::uint64_t getLast () const
 Gets the value of the last ID processed by the thread. More...
 
std::int64_t getWarpedOverAndReset ()
 Gets the number of IDs that have been jumped over, and resets them. More...
 

Protected Setters

void setStatusMessage (const std::string &statusMessage)
 Sets the status message of the thread. More...
 
void setProgress (float newProgress)
 Sets the progress of the thread. More...
 
void setLast (std::uint64_t lastId)
 Sets the last ID processed by the thread. More...
 
void incrementLast ()
 Increments the last ID processed by the thread. More...
 
void incrementProcessed ()
 Increments the number of IDs processed by the thread. More...
 

Protected Thread Control

void sleep (std::uint64_t ms) const
 Lets the thread sleep for the specified number of milliseconds. More...
 
void allowPausing ()
 Allows the thread to be paused. More...
 
void disallowPausing ()
 Prevents the thread from being paused. More...
 
void pauseByThread ()
 Forces the thread to pause. More...
 

Logging

bool isLogLevel (std::uint8_t level) const
 Checks whether a certain logging level is enabled. More...
 
void log (std::uint8_t level, const std::string &logEntry)
 Adds a thread-specific log entry to the database, if the current logging level is high enough. More...
 
void log (std::uint8_t level, std::queue< std::string > &logEntries)
 Adds multiple thread-specific log entries to the database, if the current logging level is high enough. More...
 

Public Getter

bool isQueryUsed (std::uint64_t queryId) const
 Checks whether the specified query is used by the container. More...
 

Setters

void setRepairCData (bool isRepairCData)
 Sets whether to try to repair CData when parsing XML. More...
 
void setRepairComments (bool isRepairComments)
 Sets whether to try to repair broken HTML/XML comments. More...
 
void setRemoveXmlInstructions (bool isRemoveXmlInstructions)
 Sets whether to remove XML processing instructions (<?xml:...>) before parsing HTML/XML content. More...
 
void setMinimizeMemory (bool isMinimizeMemory)
 Sets whether to minimize memory usage. More...
 
void setTidyErrorsAndWarnings (bool warnings, std::uint32_t numOfErrors)
 Sets how tidy-html5 reports errors and warnings. More...
 
void setQueryTarget (const std::string &content, const std::string &source)
 Sets the content to use the managed queries on. More...
 

Getters

std::size_t getNumberOfSubSets () const
 Gets the number of subsets currently acquired. More...
 
bool getTarget (std::string &targetTo)
 Gets the current query target, if available, and writes it to the given string. More...
 
bool getXml (std::string &resultTo, std::queue< std::string > &warningsTo)
 Parses the current query target as tidied XML and writes it to the given string. More...
 

Queries

QueryStruct addQuery (std::uint64_t id, const QueryProperties &properties)
 Adds a query with the given query properties to the container. More...
 
void clearQueries ()
 Clears all queries currently managed by the container and frees the associated memory. More...
 
void clearQueryTarget ()
 Clears the current query target and frees the associated memory. More...
 

Subsets

bool nextSubSet ()
 Requests the next subset for all subsequent queries. More...
 

Results

bool getBoolFromRegEx (const QueryStruct &query, const std::string &target, bool &resultTo, std::queue< std::string > &warningsTo) const
 Gets a boolean result from a RegEx query on a separate string. More...
 
bool getSingleFromRegEx (const QueryStruct &query, const std::string &target, std::string &resultTo, std::queue< std::string > &warningsTo) const
 Gets a single result from a RegEx query on a separate string. More...
 
bool getMultiFromRegEx (const QueryStruct &query, const std::string &target, std::vector< std::string > &resultTo, std::queue< std::string > &warningsTo) const
 Gets multiple results from a RegEx query on a separate string. More...
 
bool getBoolFromQuery (const QueryStruct &query, bool &resultTo, std::queue< std::string > &warningsTo)
 Gets a boolean result from a query of any type on the current query target. More...
 
bool getBoolFromQueryOnSubSet (const QueryStruct &query, bool &resultTo, std::queue< std::string > &warningsTo)
 Gets a boolean result from a query of any type on the current subset. More...
 
bool getSingleFromQuery (const QueryStruct &query, std::string &resultTo, std::queue< std::string > &warningsTo)
 Gets a single result from a query of any type on the current query target. More...
 
bool getSingleFromQueryOnSubSet (const QueryStruct &query, std::string &resultTo, std::queue< std::string > &warningsTo)
 Gets a single result from a query of any type on the current subset. More...
 
bool getMultiFromQuery (const QueryStruct &query, std::vector< std::string > &resultTo, std::queue< std::string > &warningsTo)
 Gets multiple results from a query of any type on the current query target. More...
 
bool getMultiFromQueryOnSubSet (const QueryStruct &query, std::vector< std::string > &resultTo, std::queue< std::string > &warningsTo)
 Gets multiple results from a query of any type on the current subset. More...
 
bool setSubSetsFromQuery (const QueryStruct &query, std::queue< std::string > &warningsTo)
 Sets subsets for subsequent queries using a query of any type. More...
 
bool addSubSetsFromQueryOnSubSet (const QueryStruct &query, std::queue< std::string > &warningsTo)
 Inserts more subsets after the current one based on a query on the current subset. More...
 

Memory

void reserveForSubSets (const QueryStruct &query, std::size_t n)
 Reserves memory for a specific number of subsets. More...
 

Detailed Description

Topic Modeller.

Topic modelling using the Hierarchical Dirichlet Process (HDP) and Latent Dirichlet Allocation (LDA) algorithms.

HDP is used if no fixed number of topics is given; LDA is used if a fixed number of topics is given.

Uses tomoto, the underlying C++ API of tomotopy, see: https://bab2min.github.io/tomotopy/
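
The choice between the two algorithms can be sketched as follows. This is an illustration only: it assumes that a configured value of zero stands for "no fixed number of topics", and TopicAlgorithm, chooseAlgorithm(), and algorithmName() are hypothetical names that do not appear in crawlserv++ or tomoto.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical enumeration of the two supported algorithms.
enum class TopicAlgorithm { Hdp, Lda };

// Assumption: zero means that no fixed number of topics has been configured.
constexpr TopicAlgorithm chooseAlgorithm(std::size_t fixedNumberOfTopics) {
    return fixedNumberOfTopics == 0 ? TopicAlgorithm::Hdp : TopicAlgorithm::Lda;
}

// Returns a short, human-readable name for the chosen algorithm.
constexpr std::string_view algorithmName(TopicAlgorithm algorithm) {
    return algorithm == TopicAlgorithm::Hdp ? "HDP" : "LDA";
}
```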

If you use the HDP topic modelling algorithm, please cite:

Teh, Y. W., Jordan, M. I., Beal, M. J., & Blei, D. M. (2005). Sharing clusters among related groups: Hierarchical Dirichlet processes. In Advances in neural information processing systems, 1385–1392.

Newman, D., Asuncion, A., Smyth, P., & Welling, M. (2009). Distributed algorithms for topic models. Journal of Machine Learning Research, 10 (Aug), 1801–1828.

If you use the LDA topic modelling algorithm, please cite:

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3 (Jan), 993–1022.

Newman, D., Asuncion, A., Smyth, P., & Welling, M. (2009). Distributed algorithms for topic models. Journal of Machine Learning Research, 10 (Aug), 1801–1828.

Member Enumeration Documentation

◆ CharParsingOption

Options for parsing chars.

Enumerator
FromNumber 

Get char by its numeric value.

FromString 

Get char from the beginning of a string.

Also supports certain escaped characters.

See also
Helper::Strings::getFirstOrEscapeChar
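
The difference between the two options can be sketched as below. This is not the implementation of Helper::Strings::getFirstOrEscapeChar, whose signature and set of supported escapes are not shown on this page; firstOrEscapeChar() and charFromNumber() are hypothetical helpers, and the handled escapes (\n, \t, \0, \\) are an assumption.

```cpp
#include <optional>
#include <string_view>

// FromNumber (sketch): interprets the configured number as the char value.
char charFromNumber(int value) {
    return static_cast<char>(value);
}

// FromString (sketch): takes the first char of the string, resolving a few escapes.
// Assumption: only \n, \t, \0 and \\ are recognized as escape sequences.
std::optional<char> firstOrEscapeChar(std::string_view source) {
    if (source.empty()) {
        return std::nullopt;  // nothing to take a char from
    }

    if (source[0] != '\\' || source.size() == 1) {
        return source[0];  // plain first character
    }

    switch (source[1]) {
    case 'n':
        return '\n';
    case 't':
        return '\t';
    case '0':
        return '\0';
    case '\\':
        return '\\';
    default:
        return source[0];  // unknown escape: fall back to the backslash itself
    }
}
```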

◆ StringParsingOption

Options for parsing strings.

Enumerator
Default 

Uses a string as it is.

SQL 

Requires a SQL-compatible string.

SubURL 

Converts a string to a sub-URL.

URL 

Converts a string to a URL (without the protocol).

Trim 

Trims a string.
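
As an illustration of the Trim option, trimming could look like the following sketch. It assumes that "trimming" means removing leading and trailing ASCII whitespace; which characters crawlserv++ actually removes is not stated here, and trimmed() is a hypothetical helper.

```cpp
#include <string>
#include <string_view>

// Returns a copy of the string without leading and trailing ASCII whitespace.
std::string trimmed(std::string_view source) {
    constexpr std::string_view whitespace{" \t\r\n\f\v"};

    const auto first = source.find_first_not_of(whitespace);

    if (first == std::string_view::npos) {
        return {};  // empty or whitespace only
    }

    const auto last = source.find_last_not_of(whitespace);

    return std::string(source.substr(first, last - first + 1));
}
```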

Constructor & Destructor Documentation

◆ TopicModelling() [1/2]

crawlservpp::Module::Analyzer::Algo::TopicModelling::TopicModelling ( Main::Database &  dbBase,
const ThreadOptions &  threadOptions,
const ThreadStatus &  threadStatus 
)

Continues a previously interrupted algorithm run.

Parameters
dbBase: Reference to the main database connection.
threadOptions: Constant reference to a structure containing the options for the thread.
threadStatus: Constant reference to a structure containing the last known status of the thread.

References crawlservpp::Module::Thread::disallowPausing().

◆ TopicModelling() [2/2]

crawlservpp::Module::Analyzer::Algo::TopicModelling::TopicModelling ( Main::Database &  dbBase,
const ThreadOptions &  threadOptions 
)

Starts a new algorithm run.

Parameters
dbBase: Reference to the main database connection.
threadOptions: Constant reference to a structure containing the options for the thread.

References crawlservpp::Module::Thread::disallowPausing().

Member Function Documentation

◆ addCorpora()

◆ addOptionalQuery()

void crawlservpp::Module::Analyzer::Thread::addOptionalQuery ( std::uint64_t  queryId,
QueryStruct &  propertiesTo 
)
protected inherited

Adds an optional query.

Does nothing if the query ID is zero.

Parameters
queryId: ID of the query, as specified in the configuration.
propertiesTo: Reference to a structure to which the properties of the query will be written.
See also
Container::addQuery

References crawlservpp::Query::Container::addQuery(), crawlservpp::Module::Analyzer::Thread::database, and crawlservpp::Wrapper::Database::getQueryProperties().

Referenced by crawlservpp::Module::Analyzer::Algo::Assoc::resetAlgo(), and crawlservpp::Module::Analyzer::Algo::AssocOverTime::resetAlgo().

◆ addQueries()

void crawlservpp::Module::Analyzer::Thread::addQueries ( const std::vector< std::uint64_t > &  queryIds,
std::vector< QueryStruct > &  propertiesTo 
)
protected inherited

Adds multiple queries at once, ignoring empty ones.

Ignores query IDs that are zero.

Parameters
queryIds: IDs of the queries, as specified in the configuration.
propertiesTo: Reference to a vector to which the properties of the queries will be written.
See also
Container::addQuery

References crawlservpp::Query::Container::addQuery(), crawlservpp::Module::Analyzer::Thread::database, and crawlservpp::Wrapper::Database::getQueryProperties().

Referenced by crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Analyzer::Algo::Assoc::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::resetAlgo(), and crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo().
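
The handling of empty query IDs can be sketched as follows. QuerySketch is a hypothetical stand-in for QueryStruct, and addQueriesSketch() only records the ID instead of looking up the query properties in the database. That results are appended to the target vector, rather than replacing its contents, is an assumption.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for QueryStruct: only remembers the ID of the query.
struct QuerySketch {
    std::uint64_t id;
};

// Adds one entry per non-zero query ID; IDs that are zero are skipped.
void addQueriesSketch(
        const std::vector<std::uint64_t>& queryIds,
        std::vector<QuerySketch>& propertiesTo
) {
    for (const auto queryId : queryIds) {
        if (queryId == 0) {
            continue;  // zero marks an empty (unset) query
        }

        propertiesTo.push_back(QuerySketch{queryId});
    }
}
```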

◆ addQuery()

Struct::QueryStruct crawlservpp::Query::Container::addQuery ( std::uint64_t  id,
const QueryProperties &  properties 
)
inline protected inherited

Adds a query with the given query properties to the container.

Parameters
id: The ID of the query. It will be saved in a thread-safe way and only be used by Container::isQueryUsed.
properties: Constant reference to the properties of the query to add to the container.
Returns
A structure to be used to identify the added query, including the index of the query inside the container.
Exceptions
Container::Exception if an error occurred while creating a query with the given properties, or if the specified type of the query is unknown.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Struct::QueryProperties::resultBool, crawlservpp::Struct::QueryStruct::resultBool, crawlservpp::Struct::QueryProperties::resultMulti, crawlservpp::Struct::QueryStruct::resultMulti, crawlservpp::Struct::QueryProperties::resultSingle, crawlservpp::Struct::QueryStruct::resultSingle, crawlservpp::Struct::QueryProperties::resultSubSets, crawlservpp::Struct::QueryStruct::resultSubSets, crawlservpp::Struct::QueryProperties::text, crawlservpp::Struct::QueryProperties::textOnly, crawlservpp::Struct::QueryProperties::type, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Analyzer::Thread::addOptionalQuery(), crawlservpp::Module::Analyzer::Thread::addQueries(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().

◆ addSubSetsFromQueryOnSubSet()

bool crawlservpp::Query::Container::addSubSetsFromQueryOnSubSet ( const QueryStruct &  query,
std::queue< std::string > &  warningsTo 
)
inline protected inherited

Inserts more subsets after the current one based on a query on the current subset.

This function is used for recursive extraction.

Parameters
query: A constant reference to a structure identifying the query that will be performed to acquire the subsets.
warningsTo: A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if new subsets have been added. False, if the execution of the query failed or did not return any results.
Exceptions
Container::Exception if no query target or no subset has been specified, the current subset is invalid, or the given query is of an unknown type.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultSubSets, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Extractor::Thread::onReset().
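
Inserting new subsets directly after the current one might be pictured as in the sketch below. SubSetsSketch is hypothetical and stores subsets as plain strings; the one-based current index, with zero meaning that no subset has been requested yet, is an assumption and not taken from Query::Container. Unlike the documented function, the sketch returns false instead of throwing when no subset is selected.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container of subsets with a one-based index of the current subset.
struct SubSetsSketch {
    std::vector<std::string> subSets;
    std::size_t current{0};  // zero: no subset has been requested yet (assumption)

    // Moves on to the next subset, if there is one.
    bool next() {
        if (current >= subSets.size()) {
            return false;
        }

        ++current;

        return true;
    }

    // Inserts the given results directly after the current subset.
    bool insertAfterCurrent(const std::vector<std::string>& results) {
        if (current == 0 || current > subSets.size() || results.empty()) {
            return false;
        }

        subSets.insert(
                subSets.begin() + static_cast<std::ptrdiff_t>(current),
                results.begin(),
                results.end()
        );

        return true;
    }
};
```

Because the new subsets are placed right after the current one, the next call to next() visits them before any of the remaining original subsets, which is what makes this pattern usable for recursion.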

◆ allowPausing()

void crawlservpp::Module::Thread::allowPausing ( )
protected inherited

Allows the thread to be paused.

Threads are pausable by default. Use this function if pausing has been disallowed via disallowPausing().

Thread-safe: Can be used by both the module and the main thread.
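
One way to make such a switch usable from both the module and the main thread is an atomic flag, as in the following sketch. PausableSketch is hypothetical and does not reproduce Module::Thread; in particular, the check and the state change in pause() are two separate atomic operations here, which a real implementation would probably guard with a mutex.

```cpp
#include <atomic>

// Hypothetical thread state with a pausing switch that is enabled by default.
class PausableSketch {
public:
    void allowPausing() { pausable.store(true); }
    void disallowPausing() { pausable.store(false); }

    // Pauses only if pausing is currently allowed; returns whether it did.
    bool pause() {
        if (!pausable.load()) {
            return false;
        }

        paused.store(true);

        return true;
    }

    bool isPaused() const { return paused.load(); }

private:
    std::atomic<bool> pausable{true};  // threads are pausable by default
    std::atomic<bool> paused{false};
};
```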

◆ category()

◆ checkAlgoOptions()

void crawlservpp::Module::Analyzer::Algo::TopicModelling::checkAlgoOptions ( )
override virtual

Checks the configuration options for the algorithm.

Exceptions
Analyzer::Thread::Exception if no topic table has been specified.

Implements crawlservpp::Module::Analyzer::Config.

References crawlservpp::Module::Analyzer::generalLoggingDefault, and crawlservpp::Module::Thread::log().

◆ checkCorpusSources()

◆ checkOptions()

◆ cleanUpCorpora()

void crawlservpp::Module::Analyzer::Thread::cleanUpCorpora ( )
protected inherited

◆ cleanUpQueries()

void crawlservpp::Module::Analyzer::Thread::cleanUpQueries ( )
protected inherited

Clean up all queries and free their memory.

References crawlservpp::Module::Analyzer::Thread::addQueries(), crawlservpp::Struct::StatusSetter::change(), crawlservpp::Query::Container::clearQueries(), crawlservpp::Module::Analyzer::Config::config, crawlservpp::Module::Analyzer::Thread::corpora, crawlservpp::Module::Analyzer::Thread::database, crawlservpp::Module::Analyzer::Thread::deleteQueries(), crawlservpp::Module::Analyzer::Config::Entries::filterDateEnable, crawlservpp::Module::Analyzer::Config::Entries::filterDateTo, crawlservpp::Module::Analyzer::Config::Entries::filterQueryAll, crawlservpp::Module::Analyzer::Config::Entries::filterQueryQueries, crawlservpp::Helper::Memory::free(), crawlservpp::Module::Analyzer::Config::Entries::generalCorpusChecks, crawlservpp::Module::Analyzer::Config::Entries::generalCorpusSlicing, crawlservpp::Module::Analyzer::Config::Entries::generalInputFields, crawlservpp::Module::Analyzer::Config::Entries::generalInputSources, crawlservpp::Module::Analyzer::Config::Entries::generalInputTables, crawlservpp::Module::Analyzer::Config::Entries::generalLogging, crawlservpp::Module::Analyzer::generalLoggingDefault, crawlservpp::Module::Analyzer::generalLoggingVerbose, crawlservpp::Module::Analyzer::Config::Entries::generalSleepMySql, crawlservpp::Module::Analyzer::Config::Entries::generalTargetTable, crawlservpp::Query::Container::getBoolFromQuery(), crawlservpp::Wrapper::Database::getConfiguration(), crawlservpp::Module::Analyzer::Database::getCorpus(), crawlservpp::Module::Analyzer::Thread::getName(), crawlservpp::Module::Analyzer::Thread::initQueries(), crawlservpp::Module::Thread::isRunning(), crawlservpp::Struct::StatusSetter::isRunning(), crawlservpp::Module::Config::loadConfig(), crawlservpp::Helper::CommaLocale::locale(), crawlservpp::Module::Thread::log(), crawlservpp::Module::Analyzer::Thread::onAlgoInit(), crawlservpp::Module::Analyzer::Thread::onAlgoInitTarget(), crawlservpp::Module::Analyzer::Database::prepare(), 
crawlservpp::Module::Analyzer::Database::setCorpusSlicing(), crawlservpp::Module::Analyzer::Database::setIsRunningCallback(), crawlservpp::Wrapper::Database::setLogging(), crawlservpp::Query::Container::setQueryTarget(), crawlservpp::Wrapper::Database::setSleepOnError(), crawlservpp::Module::Thread::setStatusMessage(), crawlservpp::Module::Analyzer::Database::setTargetTable(), crawlservpp::Timer::Simple::tickStr(), crawlservpp::Module::Analyzer::Config::Entries::tokenizerDicts, crawlservpp::Module::Analyzer::Config::Entries::tokenizerFreeMemoryEvery, crawlservpp::Module::Analyzer::Config::Entries::tokenizerLanguages, crawlservpp::Module::Analyzer::Config::Entries::tokenizerManipulators, crawlservpp::Module::Analyzer::Config::Entries::tokenizerModels, and crawlservpp::Module::Analyzer::Config::Entries::tokenizerSavePoints.

Referenced by crawlservpp::Module::Analyzer::Thread::finished(), and crawlservpp::Module::Analyzer::Thread::onClear().

◆ clearQueries()

void crawlservpp::Query::Container::clearQueries ( )
inline protected inherited

◆ clearQueryTarget()

void crawlservpp::Query::Container::clearQueryTarget ( )
inline protected inherited

◆ deleteQueries()

void crawlservpp::Module::Analyzer::Thread::deleteQueries ( )
override protected virtual inherited

Does nothing.

To be overridden by algorithms that use their own queries.

Implements crawlservpp::Query::Container.

Referenced by crawlservpp::Module::Analyzer::Thread::cleanUpQueries().

◆ disallowPausing()

◆ end()

void crawlservpp::Module::Thread::end ( )
inherited

Waits for the thread until shutdown is completed.

Note
Either stop() or interrupt() must have been called before calling this function.
Warning
May not be used by the thread itself!

References crawlservpp::Main::Database::deleteThread().

Referenced by crawlservpp::Module::Crawler::Thread::onReset(), and crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo().

◆ finished()

◆ getBoolFromQuery()

bool crawlservpp::Query::Container::getBoolFromQuery ( const QueryStruct &  query,
bool &  resultTo,
std::queue< std::string > &  warningsTo 
)
inline protected inherited

Gets a boolean result from a query of any type on the current query target.

Parameters
query: A constant reference to a structure identifying the query that will be performed.
resultTo: A reference to a boolean variable which will be set according to the result of the query.
warningsTo: A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the execution of the query failed.
Exceptions
Container::Exception if no query target has been specified or the query is of an unknown type.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultBool, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().

◆ getBoolFromQueryOnSubSet()

bool crawlservpp::Query::Container::getBoolFromQueryOnSubSet ( const QueryStruct &  query,
bool &  resultTo,
std::queue< std::string > &  warningsTo 
)
inline protected inherited

Gets a boolean result from a query of any type on the current subset.

Parameters
query: A constant reference to a structure identifying the query that will be performed.
resultTo: A reference to a boolean variable which will be set according to the result of the query.
warningsTo: A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the execution of the query failed.
Exceptions
Container::Exception if no query target or no subset has been specified, the current subset is invalid, or the given query is of an unknown type.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultBool, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Extractor::Thread::onReset().

◆ getBoolFromRegEx()

bool crawlservpp::Query::Container::getBoolFromRegEx ( const QueryStruct &  query,
const std::string &  target,
bool &  resultTo,
std::queue< std::string > &  warningsTo 
) const
inline protected inherited

Gets a boolean result from a RegEx query on a separate string.

Parameters
query: A constant reference to a structure identifying the RegEx query that will be performed.
target: A constant reference to a string containing the target on which the query will be performed.
resultTo: A reference to a boolean variable which will be set according to the result of the query.
warningsTo: A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the query is of a different type or its execution failed.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Struct::QueryStruct::resultBool, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), crawlservpp::Module::Crawler::Thread::onReset(), crawlservpp::Module::Analyzer::Algo::Assoc::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::resetAlgo(), and crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo().
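
The calling convention shared by the result functions (a boolean return value for success, the actual result written to resultTo, problems appended to warningsTo) can be sketched with the standard library. This sketch uses std::regex as a stand-in and takes the expression as a string instead of a QueryStruct; it makes no claim about the RegEx engine or the warning texts used by crawlserv++, and boolFromRegExSketch() is a hypothetical name.

```cpp
#include <queue>
#include <regex>
#include <string>

// Returns true if the query could be run; resultTo then says whether it matched.
// On failure, a warning is appended to the queue and resultTo is left unchanged.
bool boolFromRegExSketch(
        const std::string& expression,
        const std::string& target,
        bool& resultTo,
        std::queue<std::string>& warningsTo
) {
    try {
        const std::regex compiled(expression);

        resultTo = std::regex_search(target, compiled);

        return true;
    }
    catch (const std::regex_error& e) {
        warningsTo.emplace(std::string("RegEx error: ") + e.what());

        return false;
    }
}
```

Separating "the query failed" (return value) from "the query did not match" (resultTo) lets callers log the queued warnings without treating a non-match as an error.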

◆ getConfig()

std::uint64_t crawlservpp::Module::Thread::getConfig ( ) const
inherited

Gets the ID of the configuration used by the thread.

Thread-safe: Can be used by both the module and the main thread, because the configuration is not changed after starting the thread.

Returns
The unique ID identifying the used configuration in the database.

References crawlservpp::Struct::ThreadOptions::config.

Referenced by crawlservpp::Module::Thread::Thread().

◆ getId()

std::uint64_t crawlservpp::Module::Thread::getId ( ) const
inherited

Gets the ID of the thread.

Thread-safe: Can be used by both the module and the main thread, because the ID is not changed after starting the thread.

Returns
The unique ID identifying the thread in the database.

◆ getLast()

std::uint64_t crawlservpp::Module::Thread::getLast ( ) const
protected inherited

Gets the value of the last ID processed by the thread.

Warning
May only be used by the thread itself, not by the main thread!
Returns
The ID last processed by the thread, or zero if no ID has been processed yet.

Referenced by crawlservpp::Module::Crawler::Thread::onInit(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), crawlservpp::Module::Crawler::Thread::onReset(), crawlservpp::Module::Parser::Thread::onTick(), and crawlservpp::Module::Extractor::Thread::onTick().

◆ getMultiFromQuery()

bool crawlservpp::Query::Container::getMultiFromQuery ( const QueryStruct query,
std::vector< std::string > &  resultTo,
std::queue< std::string > &  warningsTo 
)
inline protected inherited

Gets multiple results from a query of any type on the current query target.

Parameters
query: A constant reference to a structure identifying the query that will be performed.
resultTo: A reference to a vector to which the results of the query will be appended.
warningsTo: A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the execution of the query failed.
Exceptions
Container::Exception if no query target has been specified or the query is of an unknown type.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultMulti, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().

◆ getMultiFromQueryOnSubSet()

bool crawlservpp::Query::Container::getMultiFromQueryOnSubSet ( const QueryStruct query,
std::vector< std::string > &  resultTo,
std::queue< std::string > &  warningsTo 
)
inline protected inherited

Gets multiple results from a query of any type on the current subset.

Parameters
query: A constant reference to a structure identifying the query that will be performed.
resultTo: A reference to a vector to which the results of the query will be appended.
warningsTo: A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the execution of the query failed.
Exceptions
Container::Exception if no query target or no subset has been specified, the current subset is invalid, or the given query is of an unknown type.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultMulti, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Extractor::Thread::onReset().

◆ getMultiFromRegEx()

bool crawlservpp::Query::Container::getMultiFromRegEx ( const QueryStruct query,
const std::string &  target,
std::vector< std::string > &  resultTo,
std::queue< std::string > &  warningsTo 
) const
inline protected inherited

Gets multiple results from a RegEx query on a separate string.

Parameters
query: A constant reference to a structure identifying the RegEx query that will be performed.
target: A constant reference to a string containing the target on which the query will be performed.
resultTo: A reference to a vector to which the results of the query will be appended.
warningsTo: A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the query is of a different type or its execution failed.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Struct::QueryStruct::resultMulti, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Parser::Thread::onReset().

◆ getName()

std::string_view crawlservpp::Module::Analyzer::Algo::TopicModelling::getName ( ) const
override virtual

Returns the name of the algorithm.

Returns
A string view containing the name of the implemented algorithm.

Implements crawlservpp::Module::Analyzer::Thread.

◆ getNumberOfSubSets()

std::size_t crawlservpp::Query::Container::getNumberOfSubSets ( ) const
inline protected inherited

Gets the number of subsets currently acquired.

Returns
The number of subsets generated by the last query that generated subsets as its result.

Referenced by crawlservpp::Module::Extractor::Thread::onReset().

◆ getProgress()

float crawlservpp::Module::Thread::getProgress ( ) const
protected inherited

Gets the current progress as a fraction.

Thread-safe: Can be used by both the module and the main thread.

Returns
The current progress of the thread, as a fraction between 0.F (none) and 1.F (done).

Referenced by crawlservpp::Module::Parser::Thread::onReset(), and crawlservpp::Module::Extractor::Thread::onReset().
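
Because getProgress() reports values between 0.F and 1.F, a caller that wants to display a percentage has to scale the value itself. The following sketch shows one way to do that. The helper progressToPercentString is hypothetical and not part of crawlserv++, and the clamping of out-of-range values is an assumption made for this example.

```cpp
#include <algorithm>
#include <cmath>
#include <string>

// Hypothetical helper: converts a progress fraction (0.F = none, 1.F = done)
// into a rounded percentage string such as "50%".
// Assumption: values outside [0, 1] are clamped before the conversion.
std::string progressToPercentString(float progress) {
    const float clamped{std::clamp(progress, 0.F, 1.F)};
    const long percent{std::lround(clamped * 100.F)};

    return std::to_string(percent) + "%";
}
```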

◆ getSingleFromQuery()

bool crawlservpp::Query::Container::getSingleFromQuery ( const QueryStruct query,
std::string &  resultTo,
std::queue< std::string > &  warningsTo 
)
inline protected inherited

Gets a single result from a query of any type on the current query target.

Parameters
query: A constant reference to a structure identifying the query that will be performed.
resultTo: A reference to a string to which the result of the query will be written.
warningsTo: A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the execution of the query failed.
Exceptions
Container::Exception if no query target has been specified or the query is of an unknown type.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultSingle, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().

◆ getSingleFromQueryOnSubSet()

bool crawlservpp::Query::Container::getSingleFromQueryOnSubSet ( const QueryStruct query,
std::string &  resultTo,
std::queue< std::string > &  warningsTo 
)
inline protected inherited

Gets a single result from a query of any type on the current subset.

Parameters
query: A constant reference to a structure identifying the query that will be performed.
resultTo: A reference to a string to which the result of the query will be written.
warningsTo: A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the execution of the query failed.
Exceptions
Container::Exception if no query target or no subset has been specified, the current subset is invalid, or the given query is of an unknown type.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultSingle, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Extractor::Thread::onReset().

◆ getSingleFromRegEx()

bool crawlservpp::Query::Container::getSingleFromRegEx ( const QueryStruct query,
const std::string &  target,
std::string &  resultTo,
std::queue< std::string > &  warningsTo 
) const
inline protected inherited

Gets a single result from a RegEx query on a separate string.

Parameters
query: A constant reference to a structure identifying the RegEx query that will be performed.
target: A constant reference to a string containing the target on which the query will be performed.
resultTo: A reference to a string to which the result of the query will be written.
warningsTo: A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the query is of a different type or its execution failed.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Struct::QueryStruct::resultSingle, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().

◆ getStatusMessage()

std::string crawlservpp::Module::Thread::getStatusMessage ( ) const
protected inherited

◆ getTarget()

bool crawlservpp::Query::Container::getTarget ( std::string &  targetTo)
inline protected inherited

Gets the current query target, if available, and writes it to the given string.

Parameters
targetTo: Reference to a string the query target will be written to, if one is available. Its content will not be changed if no query target is available.
Returns
True, if a query target was available and has been written to the referenced string. False, if no query target was available and the referenced string has not been changed.

Referenced by crawlservpp::Module::Extractor::Thread::onReset().
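
The "write only if available" behavior described for getTarget() can be sketched with an optional value. The class TargetHolder and its setTarget()/clearTarget() members are hypothetical and exist only for this illustration. How crawlserv++ actually stores the query target is not stated here.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical model of a container holding an optional query target.
class TargetHolder {
public:
    void setTarget(std::string newTarget) { this->target = std::move(newTarget); }
    void clearTarget() { this->target.reset(); }

    // Writes the target to targetTo and returns true, if one is available.
    // Otherwise returns false and leaves targetTo unchanged.
    bool getTarget(std::string& targetTo) const {
        if(!this->target) {
            return false;
        }

        targetTo = *(this->target);

        return true;
    }

private:
    std::optional<std::string> target;
};
```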

◆ getTargetTableName()

◆ getUrlList()

std::uint64_t crawlservpp::Module::Thread::getUrlList ( ) const
inherited

Gets the ID of the URL list used by the thread.

Thread-safe: Can be used by both the module and the main thread, because the URL list is not changed after starting the thread.

Returns
The unique ID identifying the used URL list in the database.

References crawlservpp::Struct::ThreadOptions::urlList.

Referenced by crawlservpp::Module::Thread::Thread().

◆ getWarpedOverAndReset()

std::int64_t crawlservpp::Module::Thread::getWarpedOverAndReset ( )
protected inherited

Gets the number of IDs that have been jumped over, and resets it.

The number of IDs jumped over is reset to zero by this call.

Warning
May only be used by the thread itself, not by the main thread!
Returns
The number of IDs that have been jumped over due to a call to warpTo(), or zero if no IDs have been jumped over since the last call to getWarpedOverAndReset(). The result might be negative, if warpTo() resulted in a jump to a previous ID.

Referenced by crawlservpp::Module::Crawler::Thread::onReset(), crawlservpp::Module::Parser::Thread::onTick(), and crawlservpp::Module::Extractor::Thread::onTick().
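
The read-and-reset semantics, including the possibility of a negative result after a backward jump, can be illustrated with a small model. The class WarpCounter is hypothetical. In particular, accumulating the signed distance across several warpTo() calls is an assumption of this sketch, not a statement about the crawlserv++ implementation.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical model of the bookkeeping, not the crawlserv++ thread class.
class WarpCounter {
public:
    // Jumps to the given ID and records the signed distance that was jumped.
    void warpTo(std::uint64_t target) {
        this->warpedOver +=
                static_cast<std::int64_t>(target)
                - static_cast<std::int64_t>(this->last);
        this->last = target;
    }

    // Returns the recorded distance and resets it to zero in one step.
    std::int64_t getWarpedOverAndReset() {
        return std::exchange(this->warpedOver, 0);
    }

    std::uint64_t getLast() const { return this->last; }

private:
    std::uint64_t last{0};
    std::int64_t warpedOver{0};
};
```

A second call without an intervening warpTo() yields zero, which is why the value should be consumed exactly once per tick.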

◆ getWebsite()

std::uint64_t crawlservpp::Module::Thread::getWebsite ( ) const
inherited

Gets the ID of the website used by the thread.

Thread-safe: Can be used by both the module and the main thread, because the website is not changed after starting the thread.

Returns
The unique ID identifying the used website in the database.

References crawlservpp::Struct::ThreadOptions::website.

Referenced by crawlservpp::Module::Crawler::Thread::onReset(), and crawlservpp::Module::Thread::Thread().

◆ getXml()

bool crawlservpp::Query::Container::getXml ( std::string &  resultTo,
std::queue< std::string > &  warningsTo 
)
inline protected inherited

Parses the current query target as tidied XML and writes it to the given string.

Parameters
resultTo: Reference to a string the parsed query target will be written to.
warningsTo: Reference to a queue of strings to which warnings that occurred during parsing will be appended.
Returns
True, if the parsing was successful and the tidied XML was written to the given string. False, if the parsing was not successful and the given string has not been changed.

References crawlservpp::Parsing::XML::getContent().

Referenced by crawlservpp::Module::Crawler::Thread::onReset().

◆ incrementLast()

void crawlservpp::Module::Thread::incrementLast ( )
protected inherited

Increments the last ID processed by the thread.

Also sets the number of processed IDs. Make sure to increment that number beforehand if the ID has been processed.

Warning
May only be used by the thread itself, not by the main thread!
See also
incrementProcessed, setLast, Main::Database::setThreadLast

References crawlservpp::Module::Thread::database, and crawlservpp::Module::Database::setThreadLast().

◆ incrementProcessed()

void crawlservpp::Module::Thread::incrementProcessed ( )
protected inherited

Increments the number of IDs processed by the thread.

Warning
May only be used by the thread itself, not by the main thread!
See also
setLast, incrementLast

Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().

◆ initQueries()

void crawlservpp::Module::Analyzer::Thread::initQueries ( )
override protected virtual inherited

Does nothing.

To be overridden by algorithms that use their own queries.

Implements crawlservpp::Query::Container.

Referenced by crawlservpp::Module::Analyzer::Thread::cleanUpQueries().

◆ isFinished()

bool crawlservpp::Module::Thread::isFinished ( ) const
inherited

Checks whether the shutdown of the thread has been finished.

Thread-safe: Can be used by both the module and the main thread.

Returns
True, if the thread has been completely shut down. False otherwise.

◆ isInterrupted()

bool crawlservpp::Module::Thread::isInterrupted ( ) const
protected inherited

Checks whether the thread has been interrupted.

Thread-safe: Can be used by both the module and the main thread.

Returns
True if the thread has been interrupted. False otherwise.

◆ isLogLevel()

bool crawlservpp::Module::Thread::isLogLevel ( std::uint8_t  level) const
protected inherited

Checks whether a certain logging level is enabled.

Warning
May only be used by the thread itself, not by the main thread!
Parameters
level: The logging level to be checked for.
Returns
True, if the current logging level is at least as high as the given level. False, if the current logging level is lower than the given one.

References crawlservpp::Module::Thread::database, and crawlservpp::Module::Database::isLogLevel().

Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), crawlservpp::Module::Crawler::Thread::onReset(), crawlservpp::Module::Parser::Thread::onTick(), and crawlservpp::Module::Extractor::Thread::onTick().

◆ isPaused()

bool crawlservpp::Module::Thread::isPaused ( ) const
inherited

Checks whether the thread has been paused.

Thread-safe: Can be used by both the module and the main thread.

Returns
True, if the thread has been paused. False otherwise.

◆ isQueryUsed()

bool crawlservpp::Query::Container::isQueryUsed ( std::uint64_t  queryId) const
inline inherited

Checks whether the specified query is used by the container.

Thread-safe. This function can be used by any thread.

Parameters
queryId: ID of the query to be checked.

◆ isRunning()

bool crawlservpp::Module::Thread::isRunning ( ) const
inherited

Checks whether the thread is still supposed to run.

Thread-safe: Can be used by both the module and the main thread.

Returns
True, if the thread has not been cancelled, even when it is paused. False, if the thread is not supposed to run any longer.

Referenced by crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Analyzer::Algo::TermsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::ExtractIds::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Assoc::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AllTokens::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Empty::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::onAlgoInit(), onAlgoInit(), crawlservpp::Module::Analyzer::Thread::onInit(), crawlservpp::Module::Parser::Thread::onInit(), crawlservpp::Module::Extractor::Thread::onInit(), crawlservpp::Module::Crawler::Thread::onInit(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), crawlservpp::Module::Crawler::Thread::onReset(), crawlservpp::Module::Parser::Thread::onTick(), crawlservpp::Module::Analyzer::Algo::ExtractIds::resetAlgo(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::Assoc::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AllTokens::resetAlgo(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo(), and resetAlgo().

◆ isShutdown()

bool crawlservpp::Module::Thread::isShutdown ( ) const
inherited

Checks whether the thread is shutting down or has shut down.

Thread-safe: Can be used by both the module and the main thread.

Returns
True, if the thread is shutting down or has been shut down. False, if the thread is continuing to run.
See also
Thread::isFinished

◆ loadConfig()

void crawlservpp::Module::Config::loadConfig ( const std::string &  configJson,
LogQueue &  warningsTo 
)
inline inherited

Loads a configuration.

Parameters
configJson: Constant reference to a string containing the configuration as JSON.
warningsTo: Reference to a queue to which warnings will be added that occur during the parsing of the configuration, also known as the "logging queue".
Exceptions
Module::Config::Exception if the configuration JSON cannot be parsed.

References crawlservpp::Struct::ConfigItem::category, crawlservpp::Module::Config::checkOptions(), crawlservpp::Struct::ConfigItem::name, crawlservpp::Module::Config::parseBasicOption(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::ConfigItem::str(), crawlservpp::Struct::ConfigItem::value, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().

◆ log() [1/2]

void crawlservpp::Module::Thread::log ( std::uint8_t  level,
const std::string &  logEntry 
)
protected inherited

Adds a thread-specific log entry to the database, if the current logging level is high enough.

Removes invalid UTF-8 characters if necessary.

If debug logging is active, the entry will be written to the logging file as well.

The log entry will not be written to the database, if the current logging level is lower than the specified logging level. The logging level does not affect whether log entries are written to the logging file when debug logging is active.

Warning
May only be used by the thread itself, not by the main thread!
Note
String views cannot be used, because they are not supported by the API for the MySQL database.
Parameters
level: The logging level for the entry. The entry will only be written to the database, if the current logging level is at least the logging level for the entry.
logEntry: Constant reference to a string containing the log entry.
See also
Module::Database::log(std::uint8_t, const std::string&)

References crawlservpp::Module::Thread::database, and crawlservpp::Module::Database::log().

Referenced by crawlservpp::Module::Analyzer::Thread::addCorpora(), checkAlgoOptions(), crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Analyzer::Thread::finished(), crawlservpp::Module::Thread::log(), crawlservpp::Module::Analyzer::Algo::TermsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::ExtractIds::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Assoc::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AllTokens::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Empty::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::onAlgoInit(), onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AllTokens::onAlgoTick(), crawlservpp::Module::Parser::Thread::onClear(), crawlservpp::Module::Extractor::Thread::onClear(), crawlservpp::Module::Crawler::Thread::onClear(), crawlservpp::Module::Analyzer::Thread::onReset(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), crawlservpp::Module::Crawler::Thread::onReset(), crawlservpp::Module::Parser::Thread::onTick(), crawlservpp::Module::Extractor::Thread::onTick(), crawlservpp::Module::Analyzer::Algo::TermsOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::ExtractIds::resetAlgo(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::Assoc::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AllTokens::resetAlgo(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo(), resetAlgo(), and crawlservpp::Module::Analyzer::Thread::uploadResult().

◆ log() [2/2]

void crawlservpp::Module::Thread::log ( std::uint8_t  level,
std::queue< std::string > &  logEntries 
)
protected inherited

Adds multiple thread-specific log entries to the database, if the current logging level is high enough.

Removes invalid UTF-8 characters if necessary.

If debug logging is active, the entries will be written to the logging file as well.

The log entries will not be written to the database, if the current logging level is lower than the specified logging level. The logging level does not affect whether log entries are written to the logging file when debug logging is active.

Warning
May only be used by the thread itself, not by the main thread!
Note
String views cannot be used, because they are not supported by the API for the MySQL database.
Parameters
level: The logging level for the entries. The entries will only be written to the database, if the current logging level is at least the logging level for the entries.
logEntries: Reference to a queue of strings containing the log entries to be written. It will be emptied regardless of whether the log entries will be written to the database.
See also
Module::Database::log(std::uint8_t, std::queue<std::string>&)

References crawlservpp::Main::Database::connect(), crawlservpp::Module::Thread::database, crawlservpp::Module::Thread::getStatusMessage(), crawlservpp::Main::Database::getThreadPauseTime(), crawlservpp::Main::Database::getThreadRunTime(), crawlservpp::Module::Database::log(), crawlservpp::Module::Thread::log(), crawlservpp::Helper::DateTime::now(), crawlservpp::Module::Thread::onClear(), crawlservpp::Module::Thread::onInit(), crawlservpp::Module::Thread::onPause(), crawlservpp::Module::Thread::onReset(), crawlservpp::Module::Thread::onTick(), crawlservpp::Module::Thread::onUnpause(), crawlservpp::Module::Thread::pause(), crawlservpp::Module::Thread::pauseByThread(), crawlservpp::Module::Database::prepare(), crawlservpp::Helper::DateTime::secondsToString(), crawlservpp::Module::Thread::setLast(), crawlservpp::Module::Thread::setStatusMessage(), crawlservpp::Main::Database::setThreadPauseTime(), crawlservpp::Main::Database::setThreadRunTime(), and crawlservpp::Module::sleepOnConnectionErrorS.
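
The interplay of logging level, debug logging, and the emptied queue described for log() [2/2] can be summarized in a small model. The class LogModel and its two public vectors are hypothetical stand-ins for the database and the logging file. UTF-8 repair and the actual database access are deliberately left out of this sketch.

```cpp
#include <cstdint>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: 'database' and 'file' stand in for the real log sinks.
class LogModel {
public:
    LogModel(std::uint8_t currentLevel, bool debugLogging)
            : currentLevel(currentLevel), debugLogging(debugLogging) {}

    void log(std::uint8_t level, std::queue<std::string>& logEntries) {
        // entries reach the database only if the current level is high enough
        const bool toDatabase{this->currentLevel >= level};

        // the queue is drained in any case
        while(!logEntries.empty()) {
            if(this->debugLogging) {
                // the logging file is independent of the logging level
                this->file.emplace_back(logEntries.front());
            }

            if(toDatabase) {
                this->database.emplace_back(std::move(logEntries.front()));
            }

            logEntries.pop();
        }
    }

    std::vector<std::string> database;
    std::vector<std::string> file;

private:
    std::uint8_t currentLevel;
    bool debugLogging;
};
```

The practical consequence for callers is that a warnings queue can be reused after the call without clearing it manually.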

◆ nextSubSet()

bool crawlservpp::Query::Container::nextSubSet ( )
inline protected inherited

Requests the next subset for all subsequent queries.

Returns
True, if another subset existed that will be used by subsequent queries. False, if no more subsets exist.
Exceptions
Container::Exception if an invalid subset had previously been selected.

References crawlservpp::Helper::Memory::free(), crawlservpp::Helper::Json::free(), crawlservpp::Helper::Memory::freeIf(), crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, and crawlservpp::Struct::QueryStruct::typeXPath.

Referenced by crawlservpp::Module::Extractor::Thread::onReset().
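
Since nextSubSet() returns false once no more subsets exist, it lends itself to a simple while loop. The sketch below models that iteration pattern. The class SubSetCursor, its currentSubSet() accessor, and the helper visitAll() are hypothetical, the convention that no subset is selected before the first call is an assumption, and the exception for invalid subsets is not modeled.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a container that iterates over subsets.
class SubSetCursor {
public:
    explicit SubSetCursor(std::vector<std::string> subSets)
            : subSets(std::move(subSets)) {}

    std::size_t getNumberOfSubSets() const { return this->subSets.size(); }

    // Selects the next subset, if another one exists.
    bool nextSubSet() {
        if(this->current >= this->subSets.size()) {
            return false;
        }

        ++(this->current);

        return true;
    }

    // Returns the selected subset; throws std::out_of_range if none is selected.
    const std::string& currentSubSet() const {
        return this->subSets.at(this->current - 1);
    }

private:
    std::vector<std::string> subSets;
    std::size_t current{0}; // assumption: zero means no subset selected yet
};

// Visits every subset exactly once, in order.
std::vector<std::string> visitAll(SubSetCursor& cursor) {
    std::vector<std::string> visited;

    while(cursor.nextSubSet()) {
        visited.emplace_back(cursor.currentSubSet());
    }

    return visited;
}
```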

◆ onAlgoClear()

void crawlservpp::Module::Analyzer::Algo::TopicModelling::onAlgoClear ( )
override virtual

Does nothing.

Implements crawlservpp::Module::Analyzer::Thread.

◆ onAlgoInit()

void crawlservpp::Module::Analyzer::Algo::TopicModelling::onAlgoInit ( )
override virtual

Initializes the algorithm and processes its input.

Note
When this function is called, both the prepared SQL statements, and the queries have already been initialized.
See also
initQueries

Implements crawlservpp::Module::Analyzer::Thread.

References crawlservpp::Module::Analyzer::generalLoggingExtended, crawlservpp::Module::Thread::isRunning(), crawlservpp::Module::Thread::log(), crawlservpp::Module::Thread::setProgress(), crawlservpp::Module::Thread::setStatusMessage(), and crawlservpp::Timer::Simple::tick().

◆ onAlgoInitTarget()

void crawlservpp::Module::Analyzer::Algo::TopicModelling::onAlgoInitTarget ( )
override virtual

Initializes the target table for the algorithm.

Note
When this function is called, neither the prepared SQL statements, nor the queries have been initialized yet.

Implements crawlservpp::Module::Analyzer::Thread.

References crawlservpp::Module::Analyzer::Thread::database, crawlservpp::Module::Analyzer::Database::initTargetTable(), and crawlservpp::Module::Analyzer::Database::setTargetFields().

◆ onAlgoPause()

void crawlservpp::Module::Analyzer::Algo::TopicModelling::onAlgoPause ( )
override virtual

Does nothing.

Implements crawlservpp::Module::Analyzer::Thread.

◆ onAlgoTick()

void crawlservpp::Module::Analyzer::Algo::TopicModelling::onAlgoTick ( )
override virtual

◆ onAlgoUnpause()

void crawlservpp::Module::Analyzer::Algo::TopicModelling::onAlgoUnpause ( )
override virtual

Does nothing.

Implements crawlservpp::Module::Analyzer::Thread.

◆ onClear()

void crawlservpp::Module::Analyzer::Thread::onClear ( )
override protected virtual inherited

◆ onInit()

void crawlservpp::Module::Analyzer::Thread::onInit ( )
override protected virtual inherited

Initializes the analyzer, the target table, and the algorithm.

See also
onAlgoInit, onAlgoInitTarget

Implements crawlservpp::Module::Thread.

References crawlservpp::Module::Thread::isRunning().

Referenced by crawlservpp::Module::Analyzer::Thread::onReset().

◆ onPause()

void crawlservpp::Module::Analyzer::Thread::onPause ( )
override protected virtual inherited

◆ onReset()

◆ onTick()

◆ onUnpause()

void crawlservpp::Module::Analyzer::Thread::onUnpause ( )
override protected virtual inherited

◆ option() [1/22]

void crawlservpp::Module::Config::option ( const std::string &  name,
bool &  target 
)
inline protected inherited

Checks for a configuration option of type bool.

Ignores the option and adds a warning to the warnings queue, if the requested type does not match the data type in the configuration JSON.

Parameters
name: Constant reference to a string containing the name of the option to check for.
target: Reference to a boolean variable into which the value of the configuration entry will be written if it is encountered.
Exceptions
Module::Config::Exception if no category has been set.
See also
category

References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.

Referenced by crawlservpp::Module::Analyzer::Algo::Assoc::parseAlgoOption(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::parseAlgoOption(), crawlservpp::Module::Analyzer::Algo::AllTokens::parseAlgoOption(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::parseAlgoOption(), parseAlgoOption(), crawlservpp::Network::Config::parseBasicOption(), crawlservpp::Module::Parser::Config::parseOption(), crawlservpp::Module::Analyzer::Config::parseOption(), crawlservpp::Module::Crawler::Config::parseOption(), and crawlservpp::Module::Extractor::Config::parseOption().
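
The "ignore and warn on type mismatch" behavior shared by the option() overloads can be sketched as follows. JsonValue, ConfigEntry, and boolOption are hypothetical names for this illustration only. The real implementation works on parsed JSON and a previously set category, neither of which is modeled here, and the wording of the warning is invented.

```cpp
#include <cstdint>
#include <queue>
#include <string>
#include <variant>

// Hypothetical stand-in for a parsed JSON value of a configuration entry.
using JsonValue = std::variant<bool, std::int64_t, std::string>;

struct ConfigEntry {
    std::string name;
    JsonValue value;
};

// Writes the value to target, if the entry has the requested name and is a bool.
// If the name matches but the type does not, the entry is ignored and a warning
// is added to the queue. Returns true only if target has been written.
bool boolOption(
        const ConfigEntry& entry,
        const std::string& name,
        bool& target,
        std::queue<std::string>& warningsTo
) {
    if(entry.name != name) {
        return false; // a different option: nothing to do
    }

    if(const bool* value = std::get_if<bool>(&entry.value)) {
        target = *value;

        return true;
    }

    warningsTo.emplace("Option '" + name + "' ignored: value is not a bool.");

    return false;
}
```

Leaving target untouched on a mismatch means that a default assigned before parsing stays in effect.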

◆ option() [2/22]

void crawlservpp::Module::Config::option ( const std::string &  name,
std::vector< bool > &  target 
)
inline protected inherited

Checks for a configuration option of type array of booleans.

Ignores the option and adds a warning to the warnings queue, if the requested type does not match the data type in the configuration JSON.

Parameters
name: Constant reference to a string containing the name of the option to check for.
target: Reference to a vector of booleans into which the value of the configuration entry will be written if it is encountered.
Exceptions
Module::Config::Exception if no category has been set.
See also
category

References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.

◆ option() [3/22]

void crawlservpp::Module::Config::option ( const std::string &  name,
char &  target,
CharParsingOption  opt 
)
inline protected inherited

Checks for a configuration option of type char.

Ignores the option and adds a warning to the warnings queue, if the requested type does not match the data type in the configuration JSON.

Parameters
name: Constant reference to a string containing the name of the option to check for.
target: Reference to a variable of the type char into which the value of the configuration entry will be written if it is encountered.
opt: Parsing options used for the configuration option.
Exceptions
Module::Config::Exception if no category has been set.
See also
category

References crawlservpp::Module::Config::FromNumber, crawlservpp::Module::Config::FromString, crawlservpp::Helper::Strings::getFirstOrEscapeChar(), crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.

◆ option() [4/22]

void crawlservpp::Module::Config::option ( const std::string &  name,
std::vector< char > &  target,
CharParsingOption  opt 
)
inline protected inherited

Checks for a configuration option of type array of chars.

Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.

Parameters
name: Constant reference to a string containing the name of the option to check for.
target: Reference to a vector of chars into which the value of the configuration entry will be written if it is encountered.
opt: Parsing options used for the configuration option.
Exceptions
Module::Config::Exception if no category has been set.
See also
category

References crawlservpp::Module::Config::FromNumber, crawlservpp::Module::Config::FromString, crawlservpp::Helper::Strings::getFirstOrEscapeChar(), crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.

◆ option() [5/22]

void crawlservpp::Module::Config::option ( const std::string &  name,
std::int16_t &  target 
)
inline protected inherited

Checks for a configuration option of type 16-bit integer.

Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.

Parameters
name: Constant reference to a string containing the name of the option to check for.
target: Reference to a variable into which the value of the configuration entry will be written if it is encountered.
Exceptions
Module::Config::Exception if no category has been set.
See also
category

References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.

◆ option() [6/22]

void crawlservpp::Module::Config::option ( const std::string &  name,
std::vector< std::int16_t > &  target 
)
inline protected inherited

Checks for a configuration option of type array of 16-bit integers.

Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.

Parameters
name: Constant reference to a string containing the name of the option to check for.
target: Reference to a vector into which the value of the configuration entry will be stored if it is encountered.
Exceptions
Module::Config::Exception if no category has been set.
See also
category

References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.

◆ option() [7/22]

void crawlservpp::Module::Config::option ( const std::string &  name,
std::int32_t &  target 
)
inline protected inherited

Checks for a configuration option of type 32-bit integer.

Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.

Parameters
name: Constant reference to a string containing the name of the option to check for.
target: Reference to a variable into which the value of the configuration entry will be written if it is encountered.
Exceptions
Module::Config::Exception if no category has been set.
See also
category

References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.

◆ option() [8/22]

void crawlservpp::Module::Config::option ( const std::string &  name,
std::vector< std::int32_t > &  target 
)
inline protected inherited

Checks for a configuration option of type array of 32-bit integers.

Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.

Parameters
name: Constant reference to a string containing the name of the option to check for.
target: Reference to a vector into which the value of the configuration entry will be stored if it is encountered.
Exceptions
Module::Config::Exception if no category has been set.
See also
category

References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.

◆ option() [9/22]

void crawlservpp::Module::Config::option ( const std::string &  name,
std::int64_t &  target 
)
inline protected inherited

Checks for a configuration option of type 64-bit integer.

Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.

Parameters
name: Constant reference to a string containing the name of the option to check for.
target: Reference to a variable into which the value of the configuration entry will be written if it is encountered.
Exceptions
Module::Config::Exception if no category has been set.
See also
category

References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.

◆ option() [10/22]

void crawlservpp::Module::Config::option ( const std::string &  name,
std::vector< std::int64_t > &  target 
)
inline protected inherited

Checks for a configuration option of type array of 64-bit integers.

Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.

Parameters
name: Constant reference to a string containing the name of the option to check for.
target: Reference to a vector into which the value of the configuration entry will be stored if it is encountered.
Exceptions
Module::Config::Exception if no category has been set.
See also
category

References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.

◆ option() [11/22]

void crawlservpp::Module::Config::option ( const std::string &  name,
std::uint8_t &  target 
)
inline protected inherited

Checks for a configuration option of type unsigned 8-bit integer.

Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.

Parameters
name: Constant reference to a string containing the name of the option to check for.
target: Reference to a variable into which the value of the configuration entry will be written if it is encountered.
Exceptions
Module::Config::Exception if no category has been set.
See also
category

References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.

◆ option() [12/22]

void crawlservpp::Module::Config::option ( const std::string &  name,
std::vector< std::uint8_t > &  target 
)
inline protected inherited

Checks for a configuration option of type array of unsigned 8-bit integers.

Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.

Parameters
name: Constant reference to a string containing the name of the option to check for.
target: Reference to a vector into which the value of the configuration entry will be stored if it is encountered.
Exceptions
Module::Config::Exception if no category has been set.
See also
category

References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.

◆ option() [13/22]

void crawlservpp::Module::Config::option ( const std::string &  name,
std::uint16_t &  target 
)
inline protected inherited

Checks for a configuration option of type unsigned 16-bit integer.

Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.

Parameters
name: Constant reference to a string containing the name of the option to check for.
target: Reference to a variable into which the value of the configuration entry will be written if it is encountered.
Exceptions
Module::Config::Exception if no category has been set.
See also
category

References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.

◆ option() [14/22]

void crawlservpp::Module::Config::option ( const std::string &  name,
std::vector< std::uint16_t > &  target 
)
inline protected inherited

Checks for a configuration option of type array of unsigned 16-bit integers.

Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.

Parameters
name: Constant reference to a string containing the name of the option to check for.
target: Reference to a vector into which the value of the configuration entry will be stored if it is encountered.
Exceptions
Module::Config::Exception if no category has been set.
See also
category

References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.

◆ option() [15/22]

void crawlservpp::Module::Config::option ( const std::string &  name,
std::uint32_t &  target 
)
inline protected inherited

Checks for a configuration option of type unsigned 32-bit integer.

Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.

Parameters
name: Constant reference to a string containing the name of the option to check for.
target: Reference to a variable into which the value of the configuration entry will be written if it is encountered.
Exceptions
Module::Config::Exception if no category has been set.
See also
category

References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.

◆ option() [16/22]

void crawlservpp::Module::Config::option ( const std::string &  name,
std::vector< std::uint32_t > &  target 
)
inline protected inherited

Checks for a configuration option of type array of unsigned 32-bit integers.

Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.

Parameters
name: Constant reference to a string containing the name of the option to check for.
target: Reference to a vector into which the value of the configuration entry will be stored if it is encountered.
Exceptions
Module::Config::Exception if no category has been set.
See also
category

References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.

◆ option() [17/22]

void crawlservpp::Module::Config::option ( const std::string &  name,
std::uint64_t &  target 
)
inline protected inherited

Checks for a configuration option of type unsigned 64-bit integer.

Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.

Parameters
name: Constant reference to a string containing the name of the option to check for.
target: Reference to a variable into which the value of the configuration entry will be written if it is encountered.
Exceptions
Module::Config::Exception if no category has been set.
See also
category

References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.

◆ option() [18/22]

void crawlservpp::Module::Config::option ( const std::string &  name,
std::vector< std::uint64_t > &  target 
)
inline protected inherited

Checks for a configuration option of type array of unsigned 64-bit integers.

Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.

Parameters
name: Constant reference to a string containing the name of the option to check for.
target: Reference to a vector into which the value of the configuration entry will be stored if it is encountered.
Exceptions
Module::Config::Exception if no category has been set.
See also
category

References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.

◆ option() [19/22]

void crawlservpp::Module::Config::option ( const std::string &  name,
float &  target 
)
inline protected inherited

Checks for a configuration option of type floating-point number.

Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.

Parameters
name: Constant reference to a string containing the name of the option to check for.
target: Reference to a variable into which the value of the configuration entry will be written if it is encountered.
Exceptions
Module::Config::Exception if no category has been set.
See also
category

References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.

◆ option() [20/22]

void crawlservpp::Module::Config::option ( const std::string &  name,
std::vector< float > &  target 
)
inline protected inherited

Checks for a configuration option of type array of floating-point numbers.

Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.

Parameters
name: Constant reference to a string containing the name of the option to check for.
target: Reference to a vector into which the value of the configuration entry will be stored if it is encountered.
Exceptions
Module::Config::Exception if no category has been set.
See also
category

References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.

◆ option() [21/22]

void crawlservpp::Module::Config::option ( const std::string &  name,
std::string &  target,
StringParsingOption  opt = Default 
)
inline protected inherited

Checks for a configuration option of type string.

Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.

Parameters
name: Constant reference to a string containing the name of the option to check for.
target: Reference to a string into which the value of the configuration entry will be stored if it is encountered.
opt: Parsing option for the configuration entry.
Exceptions
Module::Config::Exception if no category has been set.
See also
category

References crawlservpp::Helper::Strings::checkSQLName(), crawlservpp::Module::Config::Default, crawlservpp::Struct::ConfigItem::name, crawlservpp::Module::Config::SQL, crawlservpp::Struct::ConfigItem::str(), crawlservpp::Module::Config::SubURL, crawlservpp::Helper::Strings::trim(), crawlservpp::Module::Config::Trim, crawlservpp::Module::Config::URL, and crawlservpp::Struct::ConfigItem::value.

◆ option() [22/22]

void crawlservpp::Module::Config::option ( const std::string &  name,
std::vector< std::string > &  target,
StringParsingOption  opt = Default 
)
inline protected inherited

Checks for a configuration option of type array of strings.

Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.

Parameters
name: Constant reference to a string containing the name of the option to check for.
target: Reference to a vector into which the value of the configuration entry will be stored if it is encountered.
opt: Parsing option for the configuration entry.
Exceptions
Module::Config::Exception if no category has been set.
See also
category

References crawlservpp::Helper::Strings::checkSQLName(), crawlservpp::Module::Config::Default, crawlservpp::Struct::ConfigItem::name, crawlservpp::Module::Config::SQL, crawlservpp::Struct::ConfigItem::str(), crawlservpp::Module::Config::SubURL, crawlservpp::Helper::Strings::trim(), crawlservpp::Module::Config::Trim, crawlservpp::Module::Config::URL, and crawlservpp::Struct::ConfigItem::value.
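
The option() overloads share one behaviour: the target is written only when the entry has the requested type, and a mismatch produces a warning instead of an error. The following is a minimal sketch of that pattern, not the crawlserv++ implementation. `MockValue`, `MockItem`, `MockConfig` and the warning text are hypothetical, and `std::variant` stands in for the JSON value.

```cpp
#include <cstdint>
#include <queue>
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-in for a parsed JSON value (not a crawlserv++ type).
using MockValue = std::variant<bool, std::int64_t, std::string>;

// Hypothetical stand-in for the configuration entry currently being parsed.
struct MockItem {
    std::string name;
    MockValue value;
};

class MockConfig {
public:
    explicit MockConfig(MockItem current) : item(std::move(current)) {}

    // Overload for bool targets.
    void option(const std::string& name, bool& target) {
        assign(name, target, "bool");
    }

    // Overload for 64-bit integer targets.
    void option(const std::string& name, std::int64_t& target) {
        assign(name, target, "64-bit integer");
    }

    // Warnings collected while parsing (assumed to be a queue of strings).
    std::queue<std::string> warnings;

private:
    // Writes the value only if the name matches and the stored type is T;
    // on a type mismatch, leaves the target untouched and queues a warning.
    template<typename T>
    void assign(const std::string& name, T& target, const char* typeName) {
        if(name != item.name) {
            return; // a different option is being parsed
        }

        if(const T* value = std::get_if<T>(&item.value)) {
            target = *value;
        }
        else {
            warnings.push("'" + name + "' ignored (not of type " + typeName + ").");
        }
    }

    MockItem item;
};
```

One overload per target type keeps the call site free of explicit type arguments: the compiler picks the overload from the type of the variable passed as target.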

◆ parseAlgoOption()

void crawlservpp::Module::Analyzer::Algo::TopicModelling::parseAlgoOption ( )
override virtual

Parses a configuration option for the algorithm.

Implements crawlservpp::Module::Analyzer::Config.

References crawlservpp::Module::Config::category(), and crawlservpp::Module::Config::option().
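
Since this function references only category() and option(), an implementation presumably selects a category and then checks for each option inside it. The sketch below mimics that pattern with a hypothetical `MockConfigBase`; the category name `"topic-modelling"`, the option name `"k"` and the use of `std::logic_error` are assumptions for illustration, not taken from crawlserv++.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical base class; it only mimics the category()/option() pattern.
class MockConfigBase {
public:
    virtual ~MockConfigBase() = default;

    // Feeds one configuration entry (category, name, value) to the parser.
    void load(std::string cat, std::string name, std::uint64_t value) {
        itemCategory = std::move(cat);
        itemName = std::move(name);
        itemValue = value;
        categorySet = false; // each entry starts without a category
        parseAlgoOption();
    }

protected:
    // Selects the category that subsequent option() calls refer to.
    void category(const std::string& name) {
        currentCategory = name;
        categorySet = true;
    }

    // Writes the value to target if category and name match the entry.
    void option(const std::string& name, std::uint64_t& target) {
        if(!categorySet) {
            throw std::logic_error("option(): no category has been set");
        }

        if(currentCategory == itemCategory && name == itemName) {
            target = itemValue;
        }
    }

    virtual void parseAlgoOption() = 0;

private:
    std::string itemCategory;
    std::string itemName;
    std::uint64_t itemValue{0};
    std::string currentCategory;
    bool categorySet{false};
};

class MockTopicConfig final : public MockConfigBase {
public:
    std::uint64_t numberOfTopics{0}; // hypothetical option target

protected:
    void parseAlgoOption() override {
        category("topic-modelling"); // hypothetical category name
        option("k", numberOfTopics); // hypothetical option name
    }
};

// Forgets to call category() first, so option() throws.
class MockBrokenConfig final : public MockConfigBase {
public:
    std::uint64_t value{0};

protected:
    void parseAlgoOption() override {
        option("k", value);
    }
};
```

Entries from other categories leave the target untouched, so defaults set during a reset survive unless the configuration overrides them.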

◆ parseBasicOption()

void crawlservpp::Module::Config::parseBasicOption ( )
inline protected virtual inherited

Parses a basic option.

Might be overridden by child classes.

Can be used by abstract classes to add additional configuration entries without being the final implementation, as in Network::Config.

Warning
Any reimplementation needs to call parseOption() for the configuration to work properly.
See also
Network::Config::parseBasicOption

Reimplemented in crawlservpp::Network::Config.

References crawlservpp::Module::Config::parseOption().

Referenced by crawlservpp::Module::Config::loadConfig().
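
The warning above can be illustrated with a small sketch of the call chain: loadConfig() calls parseBasicOption(), and an intermediate class that overrides it must forward to parseOption() so that the final class still sees the entry. All class names below are hypothetical mocks, and the `trace` vector exists only to make the call order observable.

```cpp
#include <string>
#include <vector>

// Hypothetical base class mirroring the documented call chain.
class MockConfig {
public:
    virtual ~MockConfig() = default;

    // Entry point: parsing always starts with the basic options.
    void loadConfig() {
        parseBasicOption();
    }

    std::vector<std::string> trace; // records which parsers have run

protected:
    // Default behaviour: no basic options, forward directly.
    virtual void parseBasicOption() {
        parseOption();
    }

    virtual void parseOption() = 0;
};

// Abstract intermediate class that adds its own entries.
class MockNetworkConfig : public MockConfig {
protected:
    void parseBasicOption() override {
        trace.push_back("network"); // handle the additional entries here

        parseOption(); // required, otherwise the final class is skipped
    }
};

// Final implementation that parses the module-specific entries.
class MockCrawlerConfig final : public MockNetworkConfig {
protected:
    void parseOption() override {
        trace.push_back("crawler");
    }
};
```

Omitting the forwarding call in `MockNetworkConfig::parseBasicOption()` would compile without complaint, but `MockCrawlerConfig::parseOption()` would never run.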

◆ parseOption()

void crawlservpp::Module::Analyzer::Config::parseOption ( )
inline override protected virtual inherited

Parses an analyzer-specific configuration option.

Implements crawlservpp::Module::Config.

References crawlservpp::Module::Config::category(), crawlservpp::Module::Analyzer::Config::config, crawlservpp::Module::Analyzer::Config::Entries::filterDateEnable, crawlservpp::Module::Analyzer::Config::Entries::filterDateFrom, crawlservpp::Module::Analyzer::Config::Entries::filterDateTo, crawlservpp::Module::Analyzer::Config::Entries::filterQueryAll, crawlservpp::Module::Analyzer::Config::Entries::filterQueryQueries, crawlservpp::Module::Analyzer::Config::Entries::generalCorpusChecks, crawlservpp::Module::Analyzer::Config::Entries::generalCorpusSlicing, crawlservpp::Module::Analyzer::Config::Entries::generalInputFields, crawlservpp::Module::Analyzer::Config::Entries::generalInputSources, crawlservpp::Module::Analyzer::Config::Entries::generalInputTables, crawlservpp::Module::Analyzer::Config::Entries::generalLogging, crawlservpp::Module::Analyzer::Config::Entries::generalRestartAfter, crawlservpp::Module::Analyzer::Config::Entries::generalSleepMySql, crawlservpp::Module::Analyzer::Config::Entries::generalSleepWhenFinished, crawlservpp::Module::Analyzer::Config::Entries::generalTargetTable, crawlservpp::Module::Analyzer::Config::Entries::groupDateFillGaps, crawlservpp::Module::Analyzer::Config::Entries::groupDateResolution, crawlservpp::Module::Config::option(), crawlservpp::Module::Analyzer::Config::parseAlgoOption(), crawlservpp::Module::Analyzer::Config::Entries::tokenizerDicts, crawlservpp::Module::Analyzer::Config::Entries::tokenizerFreeMemoryEvery, crawlservpp::Module::Analyzer::Config::Entries::tokenizerLanguages, crawlservpp::Module::Analyzer::Config::Entries::tokenizerManipulators, crawlservpp::Module::Analyzer::Config::Entries::tokenizerModels, crawlservpp::Module::Analyzer::Config::Entries::tokenizerSavePoints, crawlservpp::Module::Analyzer::Config::Entries::uploadFTP, crawlservpp::Module::Analyzer::Config::Entries::uploadProxy, crawlservpp::Module::Analyzer::Config::Entries::uploadTargetColumn, and 
crawlservpp::Module::Analyzer::Config::Entries::uploadVerbose.

◆ pause()

void crawlservpp::Module::Analyzer::Thread::pause ( )
protected inherited

Pauses the thread.

Shadows Module::Thread::pause(), which should not be used by the thread.

See also
pauseByThread

References crawlservpp::Module::Thread::pauseByThread().

◆ pauseByThread()

void crawlservpp::Module::Thread::pauseByThread ( )
protected inherited

◆ reserveForSubSets()

◆ reset() [1/2]

void crawlservpp::Module::Thread::reset ( )
inherited

Will reset the thread before the next tick.

◆ reset() [2/2]

void crawlservpp::Module::Analyzer::Config::reset ( )
inline override protected virtual inherited

Resets the analyzer-specific configuration options.

Implements crawlservpp::Module::Config.

References crawlservpp::Module::Analyzer::Config::config, and crawlservpp::Module::Analyzer::Config::resetAlgo().

◆ resetAlgo()

void crawlservpp::Module::Analyzer::Algo::TopicModelling::resetAlgo ( )
override virtual

Resets the algorithm.

Implements crawlservpp::Module::Analyzer::Config.

References crawlservpp::Data::_double, crawlservpp::Data::_string, crawlservpp::Data::_uint64, crawlservpp::Module::Analyzer::Database::addAdditionalTable(), crawlservpp::Module::Analyzer::Thread::addCorpora(), crawlservpp::Data::TopicModel::addDocument(), crawlservpp::Wrapper::Database::addTargetColumn(), crawlservpp::Helper::Math::almostEqual(), crawlservpp::Struct::StatusSetter::change(), crawlservpp::Module::Analyzer::Thread::checkCorpusSources(), crawlservpp::Module::Analyzer::Thread::cleanUpCorpora(), crawlservpp::Timer::Simple::clear(), crawlservpp::Data::TopicModel::clear(), crawlservpp::Data::InsertFieldsMixed::columns_types_values, crawlservpp::Module::Analyzer::Config::config, crawlservpp::Module::Analyzer::Thread::corpora, crawlservpp::Module::Analyzer::Thread::database, crawlservpp::Struct::StatusSetter::finish(), crawlservpp::Module::Analyzer::Thread::finished(), crawlservpp::Helper::Memory::free(), crawlservpp::Module::Analyzer::generalLoggingDefault, crawlservpp::Module::Analyzer::generalLoggingVerbose, crawlservpp::Module::Analyzer::Config::Entries::generalTargetTable, crawlservpp::Module::Analyzer::Database::getAdditionalTableName(), crawlservpp::Data::TopicModel::getDocumentsTopics(), crawlservpp::Data::TopicModel::getModelInfo(), crawlservpp::Data::TopicModel::getNumberOfTopics(), crawlservpp::Helper::FileSystem::getPathSeparator(), crawlservpp::Module::Analyzer::Thread::getTargetTableName(), crawlservpp::Data::Corpus::getTokenized(), crawlservpp::Data::TopicModel::getTopics(), crawlservpp::Data::TopicModel::getTopicsSorted(), crawlservpp::Data::TopicModel::getTopicTopNLabels(), crawlservpp::Data::TopicModel::getTopicTopNTokens(), crawlservpp::Wrapper::Database::insertCustomData(), crawlservpp::Module::Thread::isRunning(), crawlservpp::Data::TopicModel::label(), crawlservpp::Struct::TextMapEntry::length(), crawlservpp::Data::TopicModel::load(), crawlservpp::Helper::DotLocale::locale(), crawlservpp::Helper::CommaLocale::locale(), 
crawlservpp::Module::Thread::log(), crawlservpp::Struct::TextMapEntry::pos(), crawlservpp::Helper::Queue::reverse(), crawlservpp::Data::TopicModel::save(), crawlservpp::Data::TopicModel::setBurnInIteration(), crawlservpp::Data::TopicModel::setFixedNumberOfTopics(), crawlservpp::Data::TopicModel::setInitialParameters(), crawlservpp::Data::TopicModel::setLabelingOptions(), crawlservpp::Data::TopicModel::setParameterOptimizationInterval(), crawlservpp::Module::Thread::setProgress(), crawlservpp::Data::TopicModel::setRandomNumberGenerationSeed(), crawlservpp::Module::Thread::setStatusMessage(), crawlservpp::Data::TopicModel::setTokenRemoval(), crawlservpp::Data::TopicModel::setUseIdf(), crawlservpp::Data::TopicModel::startTraining(), crawlservpp::Data::InsertFieldsMixed::table, crawlservpp::Timer::Simple::tick(), crawlservpp::Timer::Simple::tickStr(), crawlservpp::Module::Analyzer::Algo::topicModellingColumnsPerLabel, crawlservpp::Module::Analyzer::Algo::topicModellingColumnsPerToken, crawlservpp::Module::Analyzer::Algo::topicModellingDirectory, crawlservpp::Module::Analyzer::Algo::topicModellingPrecisionLL, crawlservpp::Module::Analyzer::Algo::topicModellingPrecisionUlp, crawlservpp::Module::Analyzer::Algo::topicModellingTargetColumns, crawlservpp::Module::Analyzer::Algo::topicModellingTopicColumns, crawlservpp::Module::Analyzer::Algo::topicModellingUpdateProgressEvery, crawlservpp::Module::Analyzer::Algo::topicModellingUpdateProgressEveryDocs, crawlservpp::Struct::TopicModelInfo::toQueueOfStrings(), crawlservpp::Data::TopicModel::train(), crawlservpp::Struct::StatusSetter::update(), crawlservpp::Module::Analyzer::Database::updateAdditionalTable(), and crawlservpp::Module::Analyzer::Database::updateTargetTable().

◆ resetBase()

void crawlservpp::Module::Config::resetBase ( )
inline protected virtual inherited

Resets basic options.

Might be overridden by child classes.

Can be used by abstract classes to reset additional configuration entries without being the final implementation, as in Network::Config.

Warning
Any reimplementation needs to call reset() for the reset to work properly.
See also
Network::Config::resetBase

Reimplemented in crawlservpp::Network::Config.

References crawlservpp::Module::protocols, and crawlservpp::Module::Config::reset().

Referenced by crawlservpp::Module::Analyzer::Thread::onReset(), and crawlservpp::Module::Parser::Thread::onReset().

◆ setLast()

void crawlservpp::Module::Thread::setLast ( std::uint64_t  lastId)
protected inherited

Sets the last ID processed by the thread.

Also sets the number of processed IDs. If the ID has been processed, make sure to increment that number before calling this function.

Warning
May only be used by the thread itself, not by the main thread!
Parameters
lastId: The last ID processed by the thread.
See also
incrementProcessed, incrementLast, Main::Database::setThreadLast

References crawlservpp::Module::Thread::database, and crawlservpp::Module::Database::setThreadLast().

Referenced by crawlservpp::Module::Thread::log(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().

◆ setMinimizeMemory()

void crawlservpp::Query::Container::setMinimizeMemory ( bool  isMinimizeMemory)
inline protected inherited

Sets whether to minimize memory usage.

Note
Setting memory minimization to true might negatively affect performance.
Parameters
isMinimizeMemory: Set whether to minimize memory usage, prioritizing low memory usage over performance.

Referenced by crawlservpp::Module::Extractor::Thread::onReset().

◆ setProgress()

void crawlservpp::Module::Thread::setProgress ( float  newProgress)
protected inherited

Sets the progress of the thread.

Warning
May only be used by the thread itself, not by the main thread!
Parameters
newProgress: The new progress of the thread, between 0.f (none) and 1.f (done).
See also
Main::Database::setThreadProgress

References crawlservpp::Module::Thread::database, and crawlservpp::Module::Database::setThreadProgress().

Referenced by crawlservpp::Module::Analyzer::Thread::finished(), crawlservpp::Module::Analyzer::Algo::TermsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::ExtractIds::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Assoc::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AllTokens::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Empty::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::onAlgoInit(), onAlgoInit(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), crawlservpp::Module::Crawler::Thread::onReset(), crawlservpp::Module::Analyzer::Thread::onTick(), crawlservpp::Module::Parser::Thread::onTick(), crawlservpp::Module::Extractor::Thread::onTick(), crawlservpp::Module::Analyzer::Algo::TermsOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::ExtractIds::resetAlgo(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::Assoc::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AllTokens::resetAlgo(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo(), and resetAlgo().

◆ setQueryTarget()

void crawlservpp::Query::Container::setQueryTarget ( const std::string &  content,
const std::string &  source 
)
inline protected inherited

Sets the content to use the managed queries on.

The old query target referencing the old content will be cleared.

Warning
Pointers to the strings will be saved in-class. Make sure the strings remain valid as long as they are used!
Parameters
content: Constant reference to a string containing the content to use the managed queries on.
source: Constant reference to a string containing the source (URL) of the content. It will be used for logging and error reporting purposes only.

References crawlservpp::Query::Container::clearQueryTarget().

Referenced by crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
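
The lifetime warning for setQueryTarget() follows from storing pointers rather than copies. The sketch below shows the idea with a hypothetical `MockContainer`; its members and the `targetLength()` and `hasTarget()` helpers are assumptions for illustration and are not part of Query::Container.

```cpp
#include <cstddef>
#include <string>

// Hypothetical container that keeps pointers to caller-owned strings.
class MockContainer {
public:
    // Clears the old target first, then stores pointers to the new strings.
    // The caller must keep both strings alive while the target is in use.
    void setQueryTarget(const std::string& content, const std::string& source) {
        clearQueryTarget();

        contentPtr = &content;
        sourcePtr = &source;
    }

    // Drops the stored pointers without touching the strings themselves.
    void clearQueryTarget() {
        contentPtr = nullptr;
        sourcePtr = nullptr;
    }

    // Returns true if both content and source are currently set.
    bool hasTarget() const {
        return contentPtr != nullptr && sourcePtr != nullptr;
    }

    // Reads through the stored pointer; dangles if the string is gone.
    std::size_t targetLength() const {
        return contentPtr != nullptr ? contentPtr->size() : 0;
    }

private:
    const std::string* contentPtr{nullptr};
    const std::string* sourcePtr{nullptr};
};
```

With a design like this, passing a temporary such as `std::string("...")` would leave a dangling pointer as soon as the call returns, so the strings should be named variables that outlive every query on the target.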

◆ setRemoveXmlInstructions()

void crawlservpp::Query::Container::setRemoveXmlInstructions ( bool  isRemoveXmlInstructions)
inline protected inherited

Sets whether to remove XML processing instructions (<?xml:...>) before parsing HTML/XML content.

Parameters
isRemoveXmlInstructions: Set whether to remove XML processing instructions.

Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().

◆ setRepairCData()

void crawlservpp::Query::Container::setRepairCData ( bool  isRepairCData)
inline protected inherited

Sets whether to try to repair CData when parsing XML.

Parameters
isRepairCData: Set whether to try to repair CData when parsing XML.

Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().

◆ setRepairComments()

void crawlservpp::Query::Container::setRepairComments ( bool  isRepairComments)
inline protected inherited

Sets whether to try to repair broken HTML/XML comments.

Parameters
isRepairComments: Set whether to try to repair broken HTML/XML comments.

Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().

◆ setStatusMessage()

void crawlservpp::Module::Thread::setStatusMessage ( const std::string &  statusMessage)
protected inherited

Sets the status message of the thread.

Warning
May only be used by the thread itself, not by the main thread!
Note
String views cannot be used, because they are not supported by the API for the MySQL database.
Parameters
statusMessage: Constant reference to a string containing the new status message to be set.
See also
Main::Database::setThreadStatus

References crawlservpp::Module::Thread::database, and crawlservpp::Main::Database::setThreadStatus().

Referenced by crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Analyzer::Thread::finished(), crawlservpp::Module::Thread::log(), crawlservpp::Module::Analyzer::Algo::TermsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::ExtractIds::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Assoc::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AllTokens::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Empty::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::onAlgoInit(), onAlgoInit(), crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoTick(), crawlservpp::Module::Analyzer::Algo::AllTokens::onAlgoTick(), onAlgoTick(), crawlservpp::Module::Parser::Thread::onClear(), crawlservpp::Module::Extractor::Thread::onClear(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), crawlservpp::Module::Crawler::Thread::onReset(), crawlservpp::Module::Parser::Thread::onTick(), crawlservpp::Module::Extractor::Thread::onTick(), crawlservpp::Module::Analyzer::Algo::TermsOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::ExtractIds::resetAlgo(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::Assoc::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AllTokens::resetAlgo(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo(), resetAlgo(), and crawlservpp::Module::Analyzer::Thread::uploadResult().

◆ setSubSetsFromQuery()

bool crawlservpp::Query::Container::setSubSetsFromQuery ( const QueryStruct query,
std::queue< std::string > &  warningsTo 
)
inline protected inherited

Sets subsets for subsequent queries using a query of any type.

The subsets resulting from the query will be saved in-class. Previous subsets will be overwritten.

Parameters
query: A constant reference to a structure identifying the query that will be performed to acquire the subsets.
warningsTo: A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the execution of the query failed.
Exceptions
Container::Exception if no query target has been specified or the query is of an unknown type.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultSubSets, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Extractor::Thread::onReset().
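
Since warnings are appended to the passed queue rather than thrown, a caller will presumably want to empty that queue after the call, whether or not the query succeeded. The following is a minimal sketch of such a step, not code taken from crawlserv++: the helper name drainWarnings is hypothetical, and only the std::queue< std::string > type is taken from the signature above.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of crawlserv++): moves all pending
// warnings out of the queue, preserving the order in which they occurred.
std::vector<std::string> drainWarnings(std::queue<std::string>& warnings) {
    std::vector<std::string> drained;

    while(!warnings.empty()) {
        drained.push_back(std::move(warnings.front()));
        warnings.pop();
    }

    return drained;
}
```

The drained strings could then be passed one by one to whatever logging facility the calling thread uses; the queue is empty afterwards and can be reused for the next query.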

◆ setTidyErrorsAndWarnings()

void crawlservpp::Query::Container::setTidyErrorsAndWarnings ( bool  warnings,
std::uint32_t  numOfErrors 
)
inline protected inherited

Sets how tidy-html5 reports errors and warnings.

The reporting of both errors and warnings is deactivated by default.

For more information about tidy-html5, see its GitHub repository.

Parameters
warnings: Specify whether to report simple warnings.
numOfErrors: Set the number of errors to be reported. Set to zero to deactivate error reporting.

References crawlservpp::Parsing::XML::setOptions().

Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().

◆ sleep()

void crawlservpp::Module::Thread::sleep ( std::uint64_t  ms) const
protected inherited

Lets the thread sleep for the specified number of milliseconds.

The sleep will be interrupted if the thread is stopped.

Thread-safe: Can be used by both the module and the main thread.

Parameters
ms: The number of milliseconds for the thread to sleep, if it is not stopped.

References crawlservpp::Module::sleepMs.

Referenced by crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoTick(), crawlservpp::Module::Crawler::Thread::onReset(), crawlservpp::Module::Analyzer::Thread::onTick(), crawlservpp::Module::Parser::Thread::onTick(), and crawlservpp::Module::Extractor::Thread::onTick().
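
One common way to make a sleep interruptible is to sleep in short slices and check a stop flag between them. The sketch below illustrates that general technique only; it is an assumption, not the crawlserv++ implementation. The class SleeperSketch, its stop() function, the returned value, and the slice length of 10 ms are all hypothetical (the reference to sleepMs above hints at a slice-like constant, but its value and use are not documented here).

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Assumed slice length in milliseconds (hypothetical value).
constexpr std::uint64_t sliceMsSketch = 10;

// Hypothetical stand-in for a module thread; NOT the crawlserv++ class.
class SleeperSketch {
public:
    // Sleeps for up to `ms` milliseconds in short slices and returns
    // early once stop() has been called. Returns the total number of
    // milliseconds that were requested from the operating system.
    std::uint64_t sleep(std::uint64_t ms) const {
        std::uint64_t slept = 0;

        while(slept < ms && this->running.load()) {
            const std::uint64_t slice = std::min<std::uint64_t>(sliceMsSketch, ms - slept);

            std::this_thread::sleep_for(std::chrono::milliseconds(slice));

            slept += slice;
        }

        return slept;
    }

    // Can be called from another thread to interrupt a running sleep.
    void stop() {
        this->running.store(false);
    }

private:
    std::atomic<bool> running{true};
};
```

With this design, a stop request takes effect after at most one slice, at the cost of waking up periodically. A condition variable would avoid the periodic wake-ups but requires a mutex.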

◆ uploadResult()

◆ warning()

◆ warpTo()

void crawlservpp::Module::Thread::warpTo ( std::uint64_t  target)
inherited

Jumps to the specified target ID ("time travel").

Skips the normal process of determining the next ID once the current ID has been processed.

Thread-safe: Can be used by both the module and the main thread.

Parameters
target: The target ID that should be processed next.
Exceptions
Module::Thread::Exception if no target is specified, i.e. the target ID is zero.
See also
getWarpedOverAndReset
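
The behavior described above (store a target, reject zero, and let the target override the next regular ID exactly once) can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual crawlserv++ code: the class WarpSketch and its nextId() function are hypothetical, std::invalid_argument stands in for Module::Thread::Exception, and "current ID plus one" stands in for the normal process of determining the next ID.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for a module thread; NOT the crawlserv++ class.
class WarpSketch {
public:
    // Requests a jump to the given target ID. Zero means "no target"
    // and is rejected (std::invalid_argument is a stand-in exception).
    void warpTo(std::uint64_t target) {
        if(target == 0) {
            throw std::invalid_argument("warpTo(): no target specified");
        }

        this->pendingTarget.store(target);
    }

    // Returns the ID to process after `current`. A pending warp target
    // takes precedence and is consumed; otherwise the assumed normal
    // process (current + 1) applies.
    std::uint64_t nextId(std::uint64_t current) {
        const std::uint64_t warped = this->pendingTarget.exchange(0);

        return warped != 0 ? warped : current + 1;
    }

private:
    // Atomic, so that warpTo() may be called from another thread.
    std::atomic<std::uint64_t> pendingTarget{0};
};
```

Using zero as the "no pending target" marker is what makes a target ID of zero invalid in this sketch, which matches the exception condition documented above.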

Member Data Documentation

◆ config

◆ configuration

std::string crawlservpp::Module::Thread::configuration
protected inherited

JSON string of the configuration used by the thread.

See also
Main::Database::getConfiguration

Referenced by crawlservpp::Module::Thread::Thread().

◆ corpora

◆ database

◆ urlListNamespace

◆ websiteNamespace


The documentation for this class was generated from the following files: