crawlserv++
[under development]
Application for crawling and analyzing textual content of websites.
Counts all tokens in a corpus.
#include <AllTokens.hpp>
Construction
AllTokens (Main::Database &dbBase, const ThreadOptions &threadOptions, const ThreadStatus &threadStatus)
Continues a previously interrupted algorithm run.
AllTokens (Main::Database &dbBase, const ThreadOptions &threadOptions)
Starts a new algorithm run.
Implemented Getter
std::string_view getName () const override
Returns the name of the algorithm.
Implemented Algorithm Functions
void onAlgoInitTarget () override
Initializes the target table for the algorithm.
void onAlgoInit () override
Initializes the algorithm and processes its input.
void onAlgoTick () override
Counts tokens in the current date, article, or token.
void onAlgoPause () override
Does nothing.
void onAlgoUnpause () override
Unpauses the algorithm.
void onAlgoClear () override
Does nothing.
Implemented Configuration Functions
void parseAlgoOption () override
Parses a configuration option for the algorithm.
void checkAlgoOptions () override
Checks the configuration options for the algorithm.
void resetAlgo () override
Resets the algorithm.
Configuration
struct crawlservpp::Module::Analyzer::Config::Entries config
Configuration of the analyzer.
Analyzer-Specific Configuration Parsing
void reset () override
Resets the analyzer-specific configuration options.
void parseOption () override
Parses an analyzer-specific configuration option.
void checkOptions () override
Checks the analyzer-specific configuration options.
Database Connection
Database database
Database connection for the analyzer thread.
Corpora
std::vector< Corpus > corpora
Vector of corpora for the analyzer thread.
Implemented Thread Functions
void onInit () override
Initializes the analyzer, the target table, and the algorithm.
void onTick () override
Performs an algorithm tick.
void onPause () override
Pauses the analyzer.
void onUnpause () override
Unpauses the analyzer.
void onClear () override
Clears the algorithm.
void onReset () override
Resets the algorithm.
Query Functions
void initQueries () override
Does nothing.
void deleteQueries () override
Does nothing.
void addOptionalQuery (std::uint64_t queryId, QueryStruct &propertiesTo)
Adds an optional query.
void addQueries (const std::vector< std::uint64_t > &queryIds, std::vector< QueryStruct > &propertiesTo)
Adds multiple queries at once, ignoring empty ones.
Thread Control for Algorithms
void finished ()
Sets the status of the analyzer to finished.
void pause ()
Pauses the thread.
Helper Functions for Algorithms
std::string getTargetTableName () const
Gets the full name of the target table.
bool addCorpora (bool isCombine, StatusSetter &statusSetter)
Gets the contents of all corpora, filters and combines them if necessary.
void checkCorpusSources (StatusSetter &statusSetter)
Checks the specified sources for creating the corpus.
Helper Functions for Clean-up
void uploadResult ()
Uploads the specified result via FTP.
void cleanUpCorpora ()
Cleans up all corpora and frees their memory.
void cleanUpQueries ()
Cleans up all queries and frees their memory.
Configuration Loader
void loadConfig (const std::string &configJson, LogQueue &warningsTo)
Loads a configuration.
Parsing Options
enum StringParsingOption { Default = 0, SQL, SubURL, URL, Trim }
Options for parsing strings.
enum CharParsingOption { FromNumber = 0, FromString }
Options for parsing chars.
Configuration Parsing
void category (const std::string &category)
Sets the category of the subsequent configuration items to be checked for.
void option (const std::string &name, bool &target)
Checks for a configuration option of type bool.
void option (const std::string &name, std::vector< bool > &target)
Checks for a configuration option of type array of bools.
void option (const std::string &name, char &target, CharParsingOption opt)
Checks for a configuration option of type char.
void option (const std::string &name, std::vector< char > &target, CharParsingOption opt)
Checks for a configuration option of type array of chars.
void option (const std::string &name, std::int16_t &target)
Checks for a configuration option of type 16-bit integer.
void option (const std::string &name, std::vector< std::int16_t > &target)
Checks for a configuration option of type array of 16-bit integers.
void option (const std::string &name, std::int32_t &target)
Checks for a configuration option of type 32-bit integer.
void option (const std::string &name, std::vector< std::int32_t > &target)
Checks for a configuration option of type array of 32-bit integers.
void option (const std::string &name, std::int64_t &target)
Checks for a configuration option of type 64-bit integer.
void option (const std::string &name, std::vector< std::int64_t > &target)
Checks for a configuration option of type array of 64-bit integers.
void option (const std::string &name, std::uint8_t &target)
Checks for a configuration option of type unsigned 8-bit integer.
void option (const std::string &name, std::vector< std::uint8_t > &target)
Checks for a configuration option of type array of unsigned 8-bit integers.
void option (const std::string &name, std::uint16_t &target)
Checks for a configuration option of type unsigned 16-bit integer.
void option (const std::string &name, std::vector< std::uint16_t > &target)
Checks for a configuration option of type array of unsigned 16-bit integers.
void option (const std::string &name, std::uint32_t &target)
Checks for a configuration option of type unsigned 32-bit integer.
void option (const std::string &name, std::vector< std::uint32_t > &target)
Checks for a configuration option of type array of unsigned 32-bit integers.
void option (const std::string &name, std::uint64_t &target)
Checks for a configuration option of type unsigned 64-bit integer.
void option (const std::string &name, std::vector< std::uint64_t > &target)
Checks for a configuration option of type array of unsigned 64-bit integers.
void option (const std::string &name, float &target)
Checks for a configuration option of type floating-point number.
void option (const std::string &name, std::vector< float > &target)
Checks for a configuration option of type array of floating-point numbers.
void option (const std::string &name, std::string &target, StringParsingOption opt=Default)
Checks for a configuration option of type string.
void option (const std::string &name, std::vector< std::string > &target, StringParsingOption opt=Default)
Checks for a configuration option of type array of strings.
void warning (const std::string &warning)
Adds a warning to the logging queue.
Module-specific Configuration Parsing
virtual void parseBasicOption ()
Parses a basic option.
virtual void resetBase ()
Resets basic options.
Getters
std::uint64_t getId () const
Gets the ID of the thread.
std::uint64_t getWebsite () const
Gets the ID of the website used by the thread.
std::uint64_t getUrlList () const
Gets the ID of the URL list used by the thread.
std::uint64_t getConfig () const
Gets the ID of the configuration used by the thread.
bool isShutdown () const
Checks whether the thread is shutting down or has shut down.
bool isRunning () const
Checks whether the thread is still supposed to run.
bool isFinished () const
Checks whether the shutdown of the thread has been finished.
bool isPaused () const
Checks whether the thread has been paused.
Thread Control
void end ()
Waits until the thread has completed its shutdown.
void reset ()
Resets the thread before its next tick.
Time Travel
void warpTo (std::uint64_t target)
Jumps to the specified target ID ("time travel").
Configuration
std::string websiteNamespace
Namespace of the website used by the thread.
std::string urlListNamespace
Namespace of the URL list used by the thread.
std::string configuration
JSON string of the configuration used by the thread.
Protected Getters
bool isInterrupted () const
Checks whether the thread has been interrupted.
std::string getStatusMessage () const
Gets the current status message.
float getProgress () const
Gets the current progress, in percent.
std::uint64_t getLast () const
Gets the value of the last ID processed by the thread.
std::int64_t getWarpedOverAndReset ()
Gets the number of IDs that have been jumped over and resets this number.
Protected Setters
void setStatusMessage (const std::string &statusMessage)
Sets the status message of the thread.
void setProgress (float newProgress)
Sets the progress of the thread.
void setLast (std::uint64_t lastId)
Sets the last ID processed by the thread.
void incrementLast ()
Increments the last ID processed by the thread.
void incrementProcessed ()
Increments the number of IDs processed by the thread.
Protected Thread Control
void sleep (std::uint64_t ms) const
Lets the thread sleep for the specified number of milliseconds.
void allowPausing ()
Allows the thread to be paused.
void disallowPausing ()
Prevents the thread from being paused.
void pauseByThread ()
Forces the thread to pause.
Logging
bool isLogLevel (std::uint8_t level) const
Checks whether a certain logging level is enabled.
void log (std::uint8_t level, const std::string &logEntry)
Adds a thread-specific log entry to the database, if the current logging level is high enough.
void log (std::uint8_t level, std::queue< std::string > &logEntries)
Adds multiple thread-specific log entries to the database, if the current logging level is high enough.
Public Getter
bool isQueryUsed (std::uint64_t queryId) const
Checks whether the specified query is used by the container.
Setters
void setRepairCData (bool isRepairCData)
Sets whether to try to repair CDATA sections when parsing XML.
void setRepairComments (bool isRepairComments)
Sets whether to try to repair broken HTML/XML comments.
void setRemoveXmlInstructions (bool isRemoveXmlInstructions)
Sets whether to remove XML processing instructions (<?xml:...>) before parsing HTML/XML content.
void setMinimizeMemory (bool isMinimizeMemory)
Sets whether to minimize memory usage.
void setTidyErrorsAndWarnings (bool warnings, std::uint32_t numOfErrors)
Sets how tidy-html5 reports errors and warnings.
void setQueryTarget (const std::string &content, const std::string &source)
Sets the content to use the managed queries on.
Getters
std::size_t getNumberOfSubSets () const
Gets the number of subsets currently acquired.
bool getTarget (std::string &targetTo)
Gets the current query target, if available, and writes it to the given string.
bool getXml (std::string &resultTo, std::queue< std::string > &warningsTo)
Parses the current query target as tidied XML and writes it to the given string.
Queries
QueryStruct addQuery (std::uint64_t id, const QueryProperties &properties)
Adds a query with the given query properties to the container.
void clearQueries ()
Clears all queries currently managed by the container and frees the associated memory.
void clearQueryTarget ()
Clears the current query target and frees the associated memory.
Subsets
bool nextSubSet ()
Requests the next subset for all subsequent queries.
Results
bool getBoolFromRegEx (const QueryStruct &query, const std::string &target, bool &resultTo, std::queue< std::string > &warningsTo) const
Gets a boolean result from a RegEx query on a separate string.
bool getSingleFromRegEx (const QueryStruct &query, const std::string &target, std::string &resultTo, std::queue< std::string > &warningsTo) const
Gets a single result from a RegEx query on a separate string.
bool getMultiFromRegEx (const QueryStruct &query, const std::string &target, std::vector< std::string > &resultTo, std::queue< std::string > &warningsTo) const
Gets multiple results from a RegEx query on a separate string.
bool getBoolFromQuery (const QueryStruct &query, bool &resultTo, std::queue< std::string > &warningsTo)
Gets a boolean result from a query of any type on the current query target.
bool getBoolFromQueryOnSubSet (const QueryStruct &query, bool &resultTo, std::queue< std::string > &warningsTo)
Gets a boolean result from a query of any type on the current subset.
bool getSingleFromQuery (const QueryStruct &query, std::string &resultTo, std::queue< std::string > &warningsTo)
Gets a single result from a query of any type on the current query target.
bool getSingleFromQueryOnSubSet (const QueryStruct &query, std::string &resultTo, std::queue< std::string > &warningsTo)
Gets a single result from a query of any type on the current subset.
bool getMultiFromQuery (const QueryStruct &query, std::vector< std::string > &resultTo, std::queue< std::string > &warningsTo)
Gets multiple results from a query of any type on the current query target.
bool getMultiFromQueryOnSubSet (const QueryStruct &query, std::vector< std::string > &resultTo, std::queue< std::string > &warningsTo)
Gets multiple results from a query of any type on the current subset.
bool setSubSetsFromQuery (const QueryStruct &query, std::queue< std::string > &warningsTo)
Sets subsets for subsequent queries using a query of any type.
bool addSubSetsFromQueryOnSubSet (const QueryStruct &query, std::queue< std::string > &warningsTo)
Inserts more subsets after the current one based on a query on the current subset.
Memory
void reserveForSubSets (const QueryStruct &query, std::size_t n)
Reserves memory for a specific number of subsets.
Counts all tokens in a corpus.
Tokens will be counted by date and/or article, if possible.
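As a rough illustration of counting tokens "by date and/or article, if possible", the following sketch assumes a hypothetical flat corpus in which each token carries its article ID and date. `TokenEntry`, `TokenCounts`, and `countAllTokens()` are illustrative names, not part of crawlserv++, whose corpus representation and target-table output are not modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified token record (an assumption for illustration):
// each token knows the article and date it belongs to, if any.
struct TokenEntry {
    std::string token;
    std::string article;  // empty if no article information is available
    std::string date;     // empty if no date information is available
};

// Token counts overall, per date, and per article.
struct TokenCounts {
    std::size_t total{0};
    std::map<std::string, std::size_t> byDate;
    std::map<std::string, std::size_t> byArticle;
};

// Counts all tokens and, where the information exists, also groups the
// counts by date and by article.
TokenCounts countAllTokens(const std::vector<TokenEntry>& corpus) {
    TokenCounts counts;

    for (const auto& entry : corpus) {
        ++counts.total;

        if (!entry.date.empty()) {
            ++counts.byDate[entry.date];
        }

        if (!entry.article.empty()) {
            ++counts.byArticle[entry.article];
        }
    }

    return counts;
}
```

Tokens without date or article information still count towards the total. Using `std::map` keeps the keys sorted, so ISO 8601 date strings come out in chronological order.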
crawlservpp::Module::Analyzer::Algo::AllTokens::AllTokens (Main::Database &dbBase, const ThreadOptions &threadOptions, const ThreadStatus &threadStatus)
Continues a previously interrupted algorithm run.
dbBase | Reference to the main database connection. |
threadOptions | Constant reference to a structure containing the options for the thread. |
threadStatus | Constant reference to a structure containing the last known status of the thread. |
References crawlservpp::Module::Thread::disallowPausing().
crawlservpp::Module::Analyzer::Algo::AllTokens::AllTokens (Main::Database &dbBase, const ThreadOptions &threadOptions)
Starts a new algorithm run.
dbBase | Reference to the main database connection. |
threadOptions | Constant reference to a structure containing the options for the thread. |
References crawlservpp::Module::Thread::disallowPausing().
bool addCorpora (bool isCombine, StatusSetter &statusSetter) [protected, inherited]
Gets the contents of all corpora, filters and combines them if necessary.
isCombine | Specifies whether to combine multiple corpora, if applicable. |
statusSetter | Reference to a Struct::StatusSetter to be used for updating the status while adding the corpus. |
References crawlservpp::Module::Analyzer::Config::config, crawlservpp::Module::Analyzer::Thread::corpora, crawlservpp::Module::Analyzer::Config::Entries::generalInputSources, crawlservpp::Module::Analyzer::generalLoggingDefault, and crawlservpp::Module::Thread::log().
Referenced by crawlservpp::Module::Analyzer::Algo::TermsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::ExtractIds::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Assoc::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::onAlgoInit(), onAlgoInit(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::onAlgoInit(), and crawlservpp::Module::Analyzer::Algo::TopicModelling::resetAlgo().
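The combining step can be pictured with the sketch below. Purely for illustration, it assumes that a corpus is just a vector of tokens (`SimpleCorpus` and `combineCorpora()` are hypothetical names). The real `Corpus` type presumably also keeps article and date information, and the filtering part of addCorpora() is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a corpus: just its tokens.
using SimpleCorpus = std::vector<std::string>;

// If isCombine is set and there is more than one corpus, merges all corpora
// into a single one (keeping their order); otherwise returns them unchanged.
std::vector<SimpleCorpus> combineCorpora(std::vector<SimpleCorpus> corpora, bool isCombine) {
    if (!isCombine || corpora.size() < 2) {
        return corpora;
    }

    std::size_t totalSize{0};

    for (const auto& corpus : corpora) {
        totalSize += corpus.size();
    }

    SimpleCorpus combined;

    combined.reserve(totalSize);  // avoid repeated reallocations while merging

    for (auto& corpus : corpora) {
        // move the tokens instead of copying them
        combined.insert(
            combined.end(),
            std::make_move_iterator(corpus.begin()),
            std::make_move_iterator(corpus.end())
        );
    }

    std::vector<SimpleCorpus> result;

    result.push_back(std::move(combined));

    return result;
}
```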
void addOptionalQuery (std::uint64_t queryId, QueryStruct &propertiesTo) [protected, inherited]
Adds an optional query.
Does nothing if the query ID is zero.
queryId | ID of the query, as specified in the configuration. |
propertiesTo | Reference to a structure to which the properties of the query will be written. |
References crawlservpp::Query::Container::addQuery(), crawlservpp::Module::Analyzer::Thread::database, and crawlservpp::Wrapper::Database::getQueryProperties().
Referenced by crawlservpp::Module::Analyzer::Algo::Assoc::resetAlgo(), and crawlservpp::Module::Analyzer::Algo::AssocOverTime::resetAlgo().
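void addQueries (const std::vector< std::uint64_t > &queryIds, std::vector< QueryStruct > &propertiesTo) [protected, inherited]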
Adds multiple queries at once, ignoring empty ones.
Ignores query IDs that are zero.
queryIds | IDs of the queries, as specified in the configuration. |
propertiesTo | Reference to a vector to which the properties of the queries will be written. |
References crawlservpp::Query::Container::addQuery(), crawlservpp::Module::Analyzer::Thread::database, and crawlservpp::Wrapper::Database::getQueryProperties().
Referenced by crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Analyzer::Algo::Assoc::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::resetAlgo(), and crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo().
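The zero-ID convention shared by addOptionalQuery() and addQueries() reduces to a simple loop. In this sketch, `QueryStub`, `loadQuery()`, and `addQueriesSketch()` are hypothetical. According to the references above, the real functions look up the query properties in the database and register each query with the container via addQuery().

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the properties of a query.
struct QueryStub {
    std::uint64_t id{0};
    std::string text;
};

// Hypothetical lookup; the real properties would come from the database.
QueryStub loadQuery(std::uint64_t id) {
    return QueryStub{id, "query #" + std::to_string(id)};
}

// An ID of zero means "no query configured" and is skipped;
// all other IDs are resolved and appended to propertiesTo.
void addQueriesSketch(
    const std::vector<std::uint64_t>& queryIds,
    std::vector<QueryStub>& propertiesTo
) {
    propertiesTo.reserve(propertiesTo.size() + queryIds.size());

    for (const auto id : queryIds) {
        if (id == 0) {
            continue;
        }

        propertiesTo.push_back(loadQuery(id));
    }
}
```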
QueryStruct addQuery (std::uint64_t id, const QueryProperties &properties) [inline, protected, inherited]
Adds a query with the given query properties to the container.
id | The ID of the query. It will be saved in a thread-safe way and only be used by Container::isQueryUsed. |
properties | Constant reference to the properties of the query to add to the container. |
Container::Exception | if an error occurred while creating a query with the given properties or the specified type of the query is unknown. |
References crawlservpp::Struct::QueryStruct::index, crawlservpp::Struct::QueryProperties::resultBool, crawlservpp::Struct::QueryStruct::resultBool, crawlservpp::Struct::QueryProperties::resultMulti, crawlservpp::Struct::QueryStruct::resultMulti, crawlservpp::Struct::QueryProperties::resultSingle, crawlservpp::Struct::QueryStruct::resultSingle, crawlservpp::Struct::QueryProperties::resultSubSets, crawlservpp::Struct::QueryStruct::resultSubSets, crawlservpp::Struct::QueryProperties::text, crawlservpp::Struct::QueryProperties::textOnly, crawlservpp::Struct::QueryProperties::type, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().
Referenced by crawlservpp::Module::Analyzer::Thread::addOptionalQuery(), crawlservpp::Module::Analyzer::Thread::addQueries(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
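Because addQuery() returns a QueryStruct that records the type and index of a query, one way to picture the bookkeeping is a store that keeps a separate list per query type and throws for unknown types. This sketch rests on several assumptions. The type strings, `QueryType`, `QueryHandle`, and `QueryStore` are hypothetical. Only three of the query types listed above are modelled, and `std::regex` stands in for whatever RegEx engine crawlserv++ actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical subset of the supported query types.
enum class QueryType { RegEx, XPath, JsonPointer };

// Hypothetical handle: which per-type list the query lives in, and where.
struct QueryHandle {
    QueryType type;
    std::size_t index;
};

class QueryStore {
public:
    // Stores the query in the list for its type and returns a handle to it.
    // Throws std::invalid_argument for unknown types; an invalid regular
    // expression makes the std::regex constructor throw std::regex_error.
    QueryHandle add(const std::string& typeName, const std::string& text) {
        if (typeName == "regex") {
            regExes_.emplace_back(text);
            return {QueryType::RegEx, regExes_.size() - 1};
        }

        if (typeName == "xpath") {
            xPaths_.push_back(text);
            return {QueryType::XPath, xPaths_.size() - 1};
        }

        if (typeName == "jsonpointer") {
            jsonPointers_.push_back(text);
            return {QueryType::JsonPointer, jsonPointers_.size() - 1};
        }

        throw std::invalid_argument("Unknown query type: " + typeName);
    }

private:
    std::vector<std::regex> regExes_;        // compiled once, reused later
    std::vector<std::string> xPaths_;        // kept as text in this sketch
    std::vector<std::string> jsonPointers_;  // kept as text in this sketch
};
```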
bool addSubSetsFromQueryOnSubSet (const QueryStruct &query, std::queue< std::string > &warningsTo) [inline, protected, inherited]
Inserts more subsets after the current one based on a query on the current subset.
This function is used for recursive extraction.
query | A constant reference to a structure identifying the query that will be performed to acquire the subset. |
warningsTo | A reference to a queue of strings to which all warnings occurring during the execution of the query will be appended. |
Container::Exception | if no query target or no subset has been specified, the current subset is invalid, or the given query is of an unknown type. |
References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultSubSets, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().
Referenced by crawlservpp::Module::Extractor::Thread::onReset().
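Inserting subsets directly after the current one and then calling nextSubSet() amounts to a depth-first traversal: newly found subsets are visited before the remaining older ones. The sketch below models this with a hypothetical `SubSetCursor` over plain strings. It is not the container's actual data structure.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical subset cursor over plain strings.
class SubSetCursor {
public:
    explicit SubSetCursor(std::vector<std::string> subsets)
        : subsets_(std::move(subsets)) {}

    // Moves to the next subset; returns false once all subsets are used up.
    bool next() {
        if (position_ >= subsets_.size()) {
            return false;
        }

        ++position_;

        return true;
    }

    // Returns the current subset; throws if none has been acquired yet.
    const std::string& current() const {
        if (position_ == 0) {
            throw std::logic_error("No current subset");
        }

        return subsets_[position_ - 1];
    }

    // Inserts new subsets directly after the current one, so that next()
    // visits them before any of the remaining older subsets.
    void insertAfterCurrent(const std::vector<std::string>& more) {
        if (position_ == 0) {
            throw std::logic_error("No current subset");
        }

        subsets_.insert(
            subsets_.begin() + static_cast<std::ptrdiff_t>(position_),
            more.begin(),
            more.end()
        );
    }

private:
    std::vector<std::string> subsets_;
    std::size_t position_{0};  // 1-based index of the current subset, 0 = none
};
```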
void allowPausing () [protected, inherited]
Allows the thread to be paused.
Threads are pausable by default. Use this function if pausing has been disallowed via disallowPausing().
Thread-safe: Can be used by both the module and the main thread.
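A thread-safe pause flag of this kind can be sketched with a mutex guarding two booleans. `PauseControl` and `requestPause()` are hypothetical names. The sketch only mirrors the documented behaviour: threads are pausable by default, and either thread can disallow and re-allow pausing. It does not reflect crawlserv++'s implementation.

```cpp
#include <cassert>
#include <mutex>

class PauseControl {
public:
    // Both functions may be called from the module or the main thread.
    void allowPausing() {
        std::lock_guard<std::mutex> lock(mutex_);
        pausable_ = true;
    }

    void disallowPausing() {
        std::lock_guard<std::mutex> lock(mutex_);
        pausable_ = false;
    }

    // Pauses only if pausing is currently allowed. The check and the update
    // happen under the same lock, so they cannot be interleaved with a
    // concurrent disallowPausing().
    bool requestPause() {
        std::lock_guard<std::mutex> lock(mutex_);

        if (!pausable_) {
            return false;
        }

        paused_ = true;

        return true;
    }

    bool isPaused() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return paused_;
    }

private:
    mutable std::mutex mutex_;
    bool pausable_{true};  // threads are pausable by default
    bool paused_{false};
};
```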
void category (const std::string &category) [inline, protected, inherited]
Sets the category of the subsequent configuration items to be checked for.
category | Constant reference to a string containing the name of the category. |
References crawlservpp::Struct::ConfigItem::category.
Referenced by crawlservpp::Module::Analyzer::Algo::TermsOverTime::parseAlgoOption(), crawlservpp::Module::Analyzer::Algo::Assoc::parseAlgoOption(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::parseAlgoOption(), parseAlgoOption(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::parseAlgoOption(), crawlservpp::Module::Analyzer::Algo::TopicModelling::parseAlgoOption(), crawlservpp::Network::Config::parseBasicOption(), crawlservpp::Module::Parser::Config::parseOption(), crawlservpp::Module::Analyzer::Config::parseOption(), crawlservpp::Module::Crawler::Config::parseOption(), crawlservpp::Module::Extractor::Config::parseOption(), and crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo().
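The way category() scopes subsequent option() checks can be illustrated with a simplified parser over configuration entries that have already been flattened. `ConfigEntry` and `OptionParser` are hypothetical, the JSON parsing step is left out, and only the unsigned 64-bit overload of option() is modelled. The option names are also made up.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, already flattened configuration entry.
struct ConfigEntry {
    std::string category;
    std::string name;
    std::string value;
};

class OptionParser {
public:
    explicit OptionParser(std::vector<ConfigEntry> entries)
        : entries_(std::move(entries)) {}

    // Sets the category that subsequent option() calls are matched against.
    void category(const std::string& name) { category_ = name; }

    // Overwrites target if an entry with the current category and the given
    // name exists; otherwise target keeps its default value.
    // std::stoull throws std::invalid_argument for non-numeric values.
    void option(const std::string& name, std::uint64_t& target) const {
        for (const auto& entry : entries_) {
            if (entry.category == category_ && entry.name == name) {
                target = std::stoull(entry.value);

                return;
            }
        }
    }

private:
    std::vector<ConfigEntry> entries_;
    std::string category_;
};
```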
void checkAlgoOptions () [override, virtual]
Checks the configuration options for the algorithm.
Analyzer::Thread::Exception | if no token count table has been specified. |
Implements crawlservpp::Module::Analyzer::Config.
void checkCorpusSources (StatusSetter &statusSetter) [protected, inherited]
Checks the specified sources for creating the corpus.
statusSetter | Reference to a Struct::StatusSetter to be used for updating the status before checking the sources. |
References crawlservpp::Struct::StatusSetter::change(), crawlservpp::Module::Analyzer::Database::checkSources(), crawlservpp::Module::Analyzer::Config::config, crawlservpp::Module::Analyzer::Thread::database, and crawlservpp::Module::Analyzer::Config::Entries::generalInputSources.
Referenced by crawlservpp::Module::Analyzer::Algo::TermsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::ExtractIds::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Assoc::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::onAlgoInit(), onAlgoInit(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::onAlgoInit(), and crawlservpp::Module::Analyzer::Algo::TopicModelling::resetAlgo().
void checkOptions () [inline, override, protected, virtual, inherited]
Checks the analyzer-specific configuration options.
Implements crawlservpp::Module::Config.
References crawlservpp::Module::Analyzer::Config::checkAlgoOptions(), crawlservpp::Module::Analyzer::Config::config, crawlservpp::Module::Analyzer::defaultPercentageCorpusSlices, crawlservpp::Module::Analyzer::Config::Entries::generalCorpusSlicing, crawlservpp::Module::Analyzer::Config::Entries::generalInputFields, crawlservpp::Module::Analyzer::Config::Entries::generalInputSources, crawlservpp::Module::Analyzer::Config::Entries::generalInputTables, crawlservpp::Module::Analyzer::maxPercentageCorpusSlices, crawlservpp::Module::Analyzer::minPercentageCorpusSlices, crawlservpp::Module::Analyzer::Config::Entries::tokenizerDicts, crawlservpp::Module::Analyzer::Config::Entries::tokenizerLanguages, crawlservpp::Module::Analyzer::Config::Entries::tokenizerManipulators, crawlservpp::Module::Analyzer::Config::Entries::tokenizerModels, and crawlservpp::Module::Config::warning().
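Judging from the names it references, checkOptions() probably checks the corpus slicing percentage against minimum, maximum, and default values and checks the input sources, tables, and fields for consistency, reporting problems via warning(). The sketch below shows one plausible form these checks could take. The bounds, the `InputConfig` struct, and the policy of trimming the three input lists to a common length are all assumptions, not confirmed crawlserv++ behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <string>
#include <vector>

// Hypothetical bounds; the actual values of minPercentageCorpusSlices,
// maxPercentageCorpusSlices, and defaultPercentageCorpusSlices may differ.
constexpr std::uint8_t minSlicing{1};
constexpr std::uint8_t maxSlicing{99};
constexpr std::uint8_t defaultSlicing{30};

// Hypothetical excerpt of the analyzer's input configuration.
struct InputConfig {
    std::uint8_t corpusSlicing{defaultSlicing};
    std::vector<std::uint8_t> sources;
    std::vector<std::string> tables;
    std::vector<std::string> fields;
};

// Resets an out-of-range slicing percentage to its default and trims the
// parallel input lists to their common length, adding one warning per fix.
void checkInputOptions(InputConfig& config, std::queue<std::string>& warningsTo) {
    if (config.corpusSlicing < minSlicing || config.corpusSlicing > maxSlicing) {
        warningsTo.push("Invalid corpus slicing percentage, using the default instead.");

        config.corpusSlicing = defaultSlicing;
    }

    const std::size_t common{
        std::min({config.sources.size(), config.tables.size(), config.fields.size()})
    };

    if (
        config.sources.size() > common
        || config.tables.size() > common
        || config.fields.size() > common
    ) {
        warningsTo.push("Input sources, tables, and fields differ in length, incomplete entries have been ignored.");

        config.sources.resize(common);
        config.tables.resize(common);
        config.fields.resize(common);
    }
}
```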
void cleanUpCorpora () [protected, inherited]
Cleans up all corpora and frees their memory.
References crawlservpp::Module::Analyzer::Thread::corpora, and crawlservpp::Helper::Memory::free().
Referenced by crawlservpp::Module::Analyzer::Thread::finished(), crawlservpp::Module::Analyzer::Thread::onClear(), and crawlservpp::Module::Analyzer::Algo::TopicModelling::resetAlgo().
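Calling clear() on a standard container destroys its elements but usually keeps the allocated capacity, so actually freeing the memory takes an extra step. A common idiom is to swap the container with an empty temporary, shown below as a hypothetical `freeMemory()`. Whether crawlserv++'s Helper::Memory::free() is implemented this way is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Releases all memory held by a standard container by swapping it with an
// empty, default-constructed temporary. The temporary, which now owns the
// old contents and their memory, is destroyed at the end of the statement.
template <typename Container>
void freeMemory(Container& target) {
    Container().swap(target);
}
```

Unlike `shrink_to_fit()`, which is only a non-binding request, the swap reliably leaves the container with the capacity of a default-constructed one.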
void cleanUpQueries () [protected, inherited]
Cleans up all queries and frees their memory.
References crawlservpp::Module::Analyzer::Thread::addQueries(), crawlservpp::Struct::StatusSetter::change(), crawlservpp::Query::Container::clearQueries(), crawlservpp::Module::Analyzer::Config::config, crawlservpp::Module::Analyzer::Thread::corpora, crawlservpp::Module::Analyzer::Thread::database, crawlservpp::Module::Analyzer::Thread::deleteQueries(), crawlservpp::Module::Analyzer::Config::Entries::filterDateEnable, crawlservpp::Module::Analyzer::Config::Entries::filterDateTo, crawlservpp::Module::Analyzer::Config::Entries::filterQueryAll, crawlservpp::Module::Analyzer::Config::Entries::filterQueryQueries, crawlservpp::Helper::Memory::free(), crawlservpp::Module::Analyzer::Config::Entries::generalCorpusChecks, crawlservpp::Module::Analyzer::Config::Entries::generalCorpusSlicing, crawlservpp::Module::Analyzer::Config::Entries::generalInputFields, crawlservpp::Module::Analyzer::Config::Entries::generalInputSources, crawlservpp::Module::Analyzer::Config::Entries::generalInputTables, crawlservpp::Module::Analyzer::Config::Entries::generalLogging, crawlservpp::Module::Analyzer::generalLoggingDefault, crawlservpp::Module::Analyzer::generalLoggingVerbose, crawlservpp::Module::Analyzer::Config::Entries::generalSleepMySql, crawlservpp::Module::Analyzer::Config::Entries::generalTargetTable, crawlservpp::Query::Container::getBoolFromQuery(), crawlservpp::Wrapper::Database::getConfiguration(), crawlservpp::Module::Analyzer::Database::getCorpus(), crawlservpp::Module::Analyzer::Thread::getName(), crawlservpp::Module::Analyzer::Thread::initQueries(), crawlservpp::Module::Thread::isRunning(), crawlservpp::Struct::StatusSetter::isRunning(), crawlservpp::Module::Config::loadConfig(), crawlservpp::Helper::CommaLocale::locale(), crawlservpp::Module::Thread::log(), crawlservpp::Module::Analyzer::Thread::onAlgoInit(), crawlservpp::Module::Analyzer::Thread::onAlgoInitTarget(), crawlservpp::Module::Analyzer::Database::prepare(), crawlservpp::Module::Analyzer::Database::setCorpusSlicing(), crawlservpp::Module::Analyzer::Database::setIsRunningCallback(), crawlservpp::Wrapper::Database::setLogging(), crawlservpp::Query::Container::setQueryTarget(), crawlservpp::Wrapper::Database::setSleepOnError(), crawlservpp::Module::Thread::setStatusMessage(), crawlservpp::Module::Analyzer::Database::setTargetTable(), crawlservpp::Timer::Simple::tickStr(), crawlservpp::Module::Analyzer::Config::Entries::tokenizerDicts, crawlservpp::Module::Analyzer::Config::Entries::tokenizerFreeMemoryEvery, crawlservpp::Module::Analyzer::Config::Entries::tokenizerLanguages, crawlservpp::Module::Analyzer::Config::Entries::tokenizerManipulators, crawlservpp::Module::Analyzer::Config::Entries::tokenizerModels, and crawlservpp::Module::Analyzer::Config::Entries::tokenizerSavePoints.
Referenced by crawlservpp::Module::Analyzer::Thread::finished(), and crawlservpp::Module::Analyzer::Thread::onClear().
void clearQueries () [inline, protected, inherited]
Clears all queries currently managed by the container and frees the associated memory.
References crawlservpp::Helper::Memory::free().
Referenced by crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Parser::Thread::onClear(), crawlservpp::Module::Extractor::Thread::onClear(), and crawlservpp::Module::Crawler::Thread::onClear().
void clearQueryTarget () [inline, protected, inherited]
Clears the current query target and frees the associated memory.
References crawlservpp::Parsing::XML::clear(), crawlservpp::Helper::Memory::free(), and crawlservpp::Helper::Json::free().
Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), crawlservpp::Module::Crawler::Thread::onReset(), and crawlservpp::Query::Container::setQueryTarget().
void deleteQueries () [override, protected, virtual, inherited]
Does nothing.
To be overridden by algorithms that use their own queries.
Implements crawlservpp::Query::Container.
Referenced by crawlservpp::Module::Analyzer::Thread::cleanUpQueries().
void disallowPausing () [protected, inherited]
Prevents the thread from being paused.
Thread-safe: Can be used by both the module and the main thread.
Referenced by AllTokens(), crawlservpp::Module::Analyzer::Algo::Assoc::Assoc(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::AssocOverTime(), crawlservpp::Module::Analyzer::Algo::CorpusGenerator::CorpusGenerator(), crawlservpp::Module::Analyzer::Algo::ExtractIds::ExtractIds(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::SentimentOverTime(), crawlservpp::Module::Analyzer::Algo::TermsOverTime::TermsOverTime(), crawlservpp::Module::Analyzer::Algo::TopicModelling::TopicModelling(), and crawlservpp::Module::Analyzer::Algo::WordsOverTime::WordsOverTime().
void end () [inherited]
Waits until the thread has completed its shutdown.
References crawlservpp::Main::Database::deleteThread().
Referenced by crawlservpp::Module::Crawler::Thread::onReset(), and crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo().
void finished () [protected, inherited]
Sets the status of the analyzer to finished.
Call this function when the algorithm has finished.
Uploads the indicated result, if FTP upload is enabled.
References crawlservpp::Module::Analyzer::Thread::cleanUpCorpora(), crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Analyzer::generalLoggingDefault, crawlservpp::Module::Thread::log(), crawlservpp::Helper::DateTime::now(), crawlservpp::Module::Thread::setProgress(), crawlservpp::Module::Thread::setStatusMessage(), and crawlservpp::Module::Analyzer::Thread::uploadResult().
Referenced by crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::TermsOverTime::onAlgoTick(), crawlservpp::Module::Analyzer::Algo::ExtractIds::onAlgoTick(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::onAlgoTick(), crawlservpp::Module::Analyzer::Algo::Assoc::onAlgoTick(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::onAlgoTick(), onAlgoTick(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::onAlgoTick(), and crawlservpp::Module::Analyzer::Algo::TopicModelling::resetAlgo().
bool getBoolFromQuery (const QueryStruct &query, bool &resultTo, std::queue< std::string > &warningsTo) [inline, protected, inherited]
Gets a boolean result from a query of any type on the current query target.
query | A constant reference to a structure identifying the query that will be performed. |
resultTo | A reference to a boolean variable which will be set according to the result of the query. |
warningsTo | A reference to a queue of strings to which all warnings occurring during the execution of the query will be appended. |
Container::Exception | if no query target has been specified or the query is of an unknown type. |
References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultBool, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().
Referenced by crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
|
inline, protected, inherited |
Gets a boolean result from a query of any type on the current subset.
query | A constant reference to a structure identifying the query that will be performed. |
resultTo | A reference to a boolean variable which will be set according to the result of the query. |
warningsTo | A reference to a vector of strings to which all warnings will be appended that occur during the execution of the query. |
Container::Exception | if no query target or no subset has been specified, the current subset is invalid, or the given query is of an unknown type. |
References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultBool, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().
Referenced by crawlservpp::Module::Extractor::Thread::onReset().
|
inline, protected, inherited |
Gets a boolean result from a RegEx query on a separate string.
query | A constant reference to a structure identifying the RegEx query that will be performed. |
target | A constant reference to a string containing the target on which the query will be performed. |
resultTo | A reference to a boolean variable which will be set according to the result of the query. |
warningsTo | A reference to a vector of strings to which all warnings will be appended that occur during the execution of the query. |
References crawlservpp::Struct::QueryStruct::index, crawlservpp::Struct::QueryStruct::resultBool, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, and crawlservpp::Main::Exception::view().
Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), crawlservpp::Module::Crawler::Thread::onReset(), crawlservpp::Module::Analyzer::Algo::Assoc::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::resetAlgo(), and crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo().
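Below is a minimal sketch of the out-parameter pattern these query functions share. It uses std::regex in place of crawlserv++'s own RegEx query classes. The function name is hypothetical. The sketch also assumes that a boolean RegEx result means "the pattern matches somewhere in the string" (std::regex_search), and the real implementation may differ on that point.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: the result is written to resultTo, and problems are
// appended to warningsTo instead of being thrown to the caller.
void getBoolFromRegExSketch(
        const std::string& pattern,
        const std::string& target,
        bool& resultTo,
        std::vector<std::string>& warningsTo) {
    try {
        const std::regex expression(pattern);

        // true if the pattern matches anywhere in the target
        resultTo = std::regex_search(target, expression);
    }
    catch (const std::regex_error& e) {
        // invalid pattern: report a warning and leave resultTo unchanged
        warningsTo.emplace_back(std::string("RegEx error: ") + e.what());
    }
}
```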
|
inherited |
Gets the ID of the configuration used by the thread.
Thread-safe: Can be used by both the module and the main thread, because the configuration is not changed after starting the thread.
References crawlservpp::Struct::ThreadOptions::config.
Referenced by crawlservpp::Module::Thread::Thread().
|
inherited |
Gets the ID of the thread.
Thread-safe: Can be used by both the module and the main thread.
|
protected, inherited |
Gets the value of the last ID processed by the thread.
Referenced by crawlservpp::Module::Crawler::Thread::onInit(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), crawlservpp::Module::Crawler::Thread::onReset(), crawlservpp::Module::Parser::Thread::onTick(), and crawlservpp::Module::Extractor::Thread::onTick().
|
inline, protected, inherited |
Gets multiple results from a query of any type on the current query target.
query | A constant reference to a structure identifying the query that will be performed. |
resultTo | A reference to a vector to which the results of the query will be appended. |
warningsTo | A reference to a vector of strings to which all warnings will be appended that occur during the execution of the query. |
Container::Exception | if no query target has been specified or the query is of an unknown type. |
References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultMulti, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().
Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
|
inline, protected, inherited |
Gets multiple results from a query of any type on the current subset.
query | A constant reference to a structure identifying the query that will be performed. |
resultTo | A reference to a vector to which the results of the query will be appended. |
warningsTo | A reference to a vector of strings to which all warnings will be appended that occur during the execution of the query. |
Container::Exception | if no query target or no subset has been specified, the current subset is invalid, or the given query is of an unknown type. |
References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultMulti, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().
Referenced by crawlservpp::Module::Extractor::Thread::onReset().
|
inline, protected, inherited |
Gets multiple results from a RegEx query on a separate string.
query | A constant reference to a structure identifying the RegEx query that will be performed. |
target | A constant reference to a string containing the target on which the query will be performed. |
resultTo | A reference to a vector to which the results of the query will be appended. |
warningsTo | A reference to a vector of strings to which all warnings will be appended that occur during the execution of the query. |
References crawlservpp::Struct::QueryStruct::index, crawlservpp::Struct::QueryStruct::resultMulti, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, and crawlservpp::Main::Exception::view().
Referenced by crawlservpp::Module::Parser::Thread::onReset().
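In the multi-result variant, matches are appended to resultTo and do not replace its existing contents. The sketch below demonstrates this with std::sregex_iterator. getMultiFromRegExSketch is a hypothetical name, and std::regex stands in for the query engine that crawlserv++ actually uses.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: appends every non-overlapping match of pattern in
// target to resultTo, keeping any elements that were already there.
void getMultiFromRegExSketch(
        const std::string& pattern,
        const std::string& target,
        std::vector<std::string>& resultTo,
        std::vector<std::string>& warningsTo) {
    try {
        const std::regex expression(pattern);
        const std::sregex_iterator end;

        for (std::sregex_iterator it(target.begin(), target.end(), expression); it != end; ++it) {
            resultTo.emplace_back(it->str()); // whole match (group 0)
        }
    }
    catch (const std::regex_error& e) {
        // invalid pattern: report a warning and leave resultTo unchanged
        warningsTo.emplace_back(std::string("RegEx error: ") + e.what());
    }
}
```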
|
override, virtual |
Returns the name of the algorithm.
Implements crawlservpp::Module::Analyzer::Thread.
|
inline, protected, inherited |
Gets the number of subsets currently acquired.
Referenced by crawlservpp::Module::Extractor::Thread::onReset().
|
protected, inherited |
Gets the current progress.
Thread-safe: Can be used by both the module and the main thread.
Returns the current progress as a value between 0.F (none) and 1.F (done).
Referenced by crawlservpp::Module::Parser::Thread::onReset(), and crawlservpp::Module::Extractor::Thread::onReset().
|
inline, protected, inherited |
Gets a single result from a query of any type on the current query target.
query | A constant reference to a structure identifying the query that will be performed. |
resultTo | A reference to a string to which the result of the query will be written. |
warningsTo | A reference to a vector of strings to which all warnings will be appended that occur during the execution of the query. |
Container::Exception | if no query target has been specified or the query is of an unknown type. |
References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultSingle, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().
Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
|
inline, protected, inherited |
Gets a single result from a query of any type on the current subset.
query | A constant reference to a structure identifying the query that will be performed. |
resultTo | A reference to a string to which the result of the query will be written. |
warningsTo | A reference to a vector of strings to which all warnings will be appended that occur during the execution of the query. |
Container::Exception | if no query target or no subset has been specified, the current subset is invalid, or the given query is of an unknown type. |
References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultSingle, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().
Referenced by crawlservpp::Module::Extractor::Thread::onReset().
|
inline, protected, inherited |
Gets a single result from a RegEx query on a separate string.
query | A constant reference to a structure identifying the RegEx query that will be performed. |
target | A constant reference to a string containing the target on which the query will be performed. |
resultTo | A reference to a string to which the result of the query will be written. |
warningsTo | A reference to a vector of strings to which all warnings will be appended that occur during the execution of the query. |
References crawlservpp::Struct::QueryStruct::index, crawlservpp::Struct::QueryStruct::resultSingle, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, and crawlservpp::Main::Exception::view().
Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
|
protected, inherited |
Gets the current status message.
Thread-safe: Can be used by both the module and the main thread.
Referenced by crawlservpp::Module::Thread::log(), crawlservpp::Module::Parser::Thread::onClear(), crawlservpp::Module::Extractor::Thread::onClear(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
|
inline, protected, inherited |
Gets the current query target, if available, and writes it to the given string.
targetTo | Reference to a string the query target will be written to, if one is available. Its content will not be changed if no query target is available. |
Referenced by crawlservpp::Module::Extractor::Thread::onReset().
|
protected, inherited |
Gets the full name of the target table.
References crawlservpp::Module::Analyzer::Config::config, crawlservpp::Module::Analyzer::Config::Entries::generalTargetTable, crawlservpp::Module::Thread::urlListNamespace, and crawlservpp::Module::Thread::websiteNamespace.
Referenced by crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::ExtractIds::resetAlgo(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::Assoc::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::resetAlgo(), resetAlgo(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::TopicModelling::resetAlgo(), and crawlservpp::Module::Analyzer::Thread::uploadResult().
|
inherited |
Gets the ID of the URL list used by the thread.
Thread-safe: Can be used by both the module and the main thread, because the URL list is not changed after starting the thread.
References crawlservpp::Struct::ThreadOptions::urlList.
Referenced by crawlservpp::Module::Thread::Thread().
|
protected, inherited |
Gets the number of IDs that have been jumped over and resets this number to zero.
Referenced by crawlservpp::Module::Crawler::Thread::onReset(), crawlservpp::Module::Parser::Thread::onTick(), and crawlservpp::Module::Extractor::Thread::onTick().
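Reading a counter and resetting it in one step maps naturally onto std::atomic::exchange, which returns the old value and stores the new one atomically. This is a hypothetical sketch: JumpCounter is not part of crawlserv++, and the documentation above does not say whether the real counter is atomic.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical counter for IDs that have been jumped over.
class JumpCounter {
public:
    void jumpOver(std::uint64_t ids) { this->jumped += ids; }

    // Returns the current count and resets it to zero in one atomic step,
    // so an increment made between the read and the reset cannot be lost.
    std::uint64_t getJumpedAndReset() { return this->jumped.exchange(0); }

private:
    std::atomic<std::uint64_t> jumped{0};
};
```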
|
inherited |
Gets the ID of the website used by the thread.
Thread-safe: Can be used by both the module and the main thread, because the website is not changed after starting the thread.
References crawlservpp::Struct::ThreadOptions::website.
Referenced by crawlservpp::Module::Crawler::Thread::onReset(), and crawlservpp::Module::Thread::Thread().
|
inline, protected, inherited |
Parses the current query target as tidied XML and writes it to the given string.
resultTo | Reference to a string the parsed query target will be written to. |
warningsTo | Reference to a vector of strings to which warnings that occurred during parsing will be appended. |
References crawlservpp::Parsing::XML::getContent().
Referenced by crawlservpp::Module::Crawler::Thread::onReset().
|
protected, inherited |
Increments the last ID processed by the thread.
Also writes the number of processed IDs to the database, so if the current ID has been processed, increment the number of processed IDs before calling this function.
References crawlservpp::Module::Thread::database, and crawlservpp::Module::Database::setThreadLast().
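The requirement to increment the number of processed IDs first can be shown with a hypothetical pair of counters. Here incrementLast() stores a snapshot that stands in for the database write. ProgressCounters, incrementProcessed(), and the stored* fields are all assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the two counters and the database write.
struct ProgressCounters {
    std::uint64_t last{0};
    std::uint64_t processed{0};

    // snapshot standing in for what would be written to the database
    std::uint64_t storedLast{0};
    std::uint64_t storedProcessed{0};

    void incrementProcessed() { ++this->processed; }

    // Increments the last ID and persists BOTH values, so the processed
    // count must already be up to date when this is called.
    void incrementLast() {
        ++this->last;
        this->storedLast = this->last;
        this->storedProcessed = this->processed;
    }
};
```

If the calls are made in the wrong order, the persisted processed count is one step behind until the next write.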
|
protected, inherited |
Increments the number of IDs processed by the thread.
Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
|
override, protected, virtual, inherited |
Does nothing.
To be overridden by algorithms that use their own queries.
Implements crawlservpp::Query::Container.
Referenced by crawlservpp::Module::Analyzer::Thread::cleanUpQueries().
|
inherited |
Checks whether the shutdown of the thread has been finished.
Thread-safe: Can be used by both the module and the main thread.
|
protected, inherited |
Checks whether the thread has been interrupted.
Thread-safe: Can be used by both the module and the main thread.
|
protected, inherited |
Checks whether a certain logging level is enabled.
level | The logging level to be checked for. |
References crawlservpp::Module::Thread::database, and crawlservpp::Module::Database::isLogLevel().
Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), crawlservpp::Module::Crawler::Thread::onReset(), crawlservpp::Module::Parser::Thread::onTick(), and crawlservpp::Module::Extractor::Thread::onTick().
|
inherited |
Checks whether the thread has been paused.
Thread-safe: Can be used by both the module and the main thread.
|
inline, inherited |
Checks whether the specified query is used by the container.
Thread-safe. This function can be used by any thread.
queryId | ID of the query to be checked. |
|
inherited |
Checks whether the thread is still supposed to run.
Thread-safe: Can be used by both the module and the main thread.
Referenced by crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Analyzer::Algo::TermsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::ExtractIds::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Assoc::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::onAlgoInit(), onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Empty::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::TopicModelling::onAlgoInit(), crawlservpp::Module::Analyzer::Thread::onInit(), crawlservpp::Module::Parser::Thread::onInit(), crawlservpp::Module::Extractor::Thread::onInit(), crawlservpp::Module::Crawler::Thread::onInit(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), crawlservpp::Module::Crawler::Thread::onReset(), crawlservpp::Module::Parser::Thread::onTick(), crawlservpp::Module::Analyzer::Algo::ExtractIds::resetAlgo(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::Assoc::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::resetAlgo(), resetAlgo(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo(), and crawlservpp::Module::Analyzer::Algo::TopicModelling::resetAlgo().
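The references above show that initialization and reset routines call this check. A common way to use such a check is to poll it between steps of a long loop, so that work stops soon after the thread is interrupted. The sketch below assumes an atomic flag behind isRunning(). RunFlag and processWhileRunning() are hypothetical names, not crawlserv++ API.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical running flag; another thread may call stop() at any time.
class RunFlag {
public:
    bool isRunning() const { return this->running.load(); }
    void stop() { this->running.store(false); }

private:
    std::atomic<bool> running{true};
};

// Adds up items until all are done or the flag has been cleared.
// Returns the number of items processed before stopping.
std::size_t processWhileRunning(const RunFlag& flag, const std::vector<int>& items, long long& sumTo) {
    std::size_t done{0};

    for (const int item : items) {
        if (!flag.isRunning()) {
            break; // interrupted: leave the remaining items unprocessed
        }

        sumTo += item;
        ++done;
    }

    return done;
}
```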
|
inherited |
Checks whether the thread is shutting down or has shut down.
Thread-safe: Can be used by both the module and the main thread.
|
inline, inherited |
Loads a configuration.
configJson | Constant reference to a string containing the configuration as JSON. |
warningsTo | Reference to a queue to which warnings will be added that occur during the parsing of the configuration, also known as the "logging queue". |
Module::Config::Exception | if the configuration JSON cannot be parsed. |
References crawlservpp::Struct::ConfigItem::category, crawlservpp::Module::Config::checkOptions(), crawlservpp::Struct::ConfigItem::name, crawlservpp::Module::Config::parseBasicOption(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::ConfigItem::str(), crawlservpp::Struct::ConfigItem::value, and crawlservpp::Main::Exception::view().
Referenced by crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
|
protected, inherited |
Adds a thread-specific log entry to the database, if the current logging level is high enough.
Removes invalid UTF-8 characters if necessary.
If debug logging is active, the entry will be written to the logging file as well.
The log entry will not be written to the database if the current logging level is lower than the specified logging level. The logging level does not affect writing log entries to the logging file when debug logging is active.
level | The logging level for the entry. The entry will only be written to the database if the current logging level is at least the logging level for the entry. |
logEntry | Constant reference to a string containing the log entry. |
References crawlservpp::Module::Thread::database, and crawlservpp::Module::Database::log().
Referenced by crawlservpp::Module::Analyzer::Thread::addCorpora(), crawlservpp::Module::Analyzer::Algo::TopicModelling::checkAlgoOptions(), crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Analyzer::Thread::finished(), crawlservpp::Module::Thread::log(), crawlservpp::Module::Analyzer::Algo::TermsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::ExtractIds::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Assoc::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::onAlgoInit(), onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Empty::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::TopicModelling::onAlgoInit(), onAlgoTick(), crawlservpp::Module::Parser::Thread::onClear(), crawlservpp::Module::Extractor::Thread::onClear(), crawlservpp::Module::Crawler::Thread::onClear(), crawlservpp::Module::Analyzer::Thread::onReset(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), crawlservpp::Module::Crawler::Thread::onReset(), crawlservpp::Module::Parser::Thread::onTick(), crawlservpp::Module::Extractor::Thread::onTick(), crawlservpp::Module::Analyzer::Algo::TermsOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::ExtractIds::resetAlgo(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::Assoc::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::resetAlgo(), resetAlgo(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::TopicModelling::resetAlgo(), and crawlservpp::Module::Analyzer::Thread::uploadResult().
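The two independent conditions described above work as follows: the logging level controls the database, and the debug switch controls the file. The sketch below models them. LoggerSketch and its members are hypothetical, and it omits the removal of invalid UTF-8 characters.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of the two logging destinations.
struct LoggerSketch {
    std::uint8_t currentLevel{1}; // assumed: configured logging level
    bool debugLogging{false};     // assumed: debug logging switch

    std::vector<std::string> databaseEntries;
    std::vector<std::string> fileEntries;

    void log(std::uint8_t level, const std::string& entry) {
        // with debug logging, the file receives every entry, regardless of level
        if (this->debugLogging) {
            this->fileEntries.push_back(entry);
        }

        // the database only receives entries whose level is enabled
        if (this->currentLevel >= level) {
            this->databaseEntries.push_back(entry);
        }
    }
};
```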
|
protected, inherited |
Adds multiple thread-specific log entries to the database, if the current logging level is high enough.
Removes invalid UTF-8 characters if necessary.
If debug logging is active, the entries will be written to the logging file as well.
The log entries will not be written to the database if the current logging level is lower than the specified logging level. The logging level does not affect writing log entries to the logging file when debug logging is active.
level | The logging level for the entries. The entries will only be written to the database if the current logging level is at least the logging level for the entries. |
logEntries | Reference to a queue of strings containing the log entries to be written. It will be emptied regardless of whether the log entries are written to the database. |
References crawlservpp::Main::Database::connect(), crawlservpp::Module::Thread::database, crawlservpp::Module::Thread::getStatusMessage(), crawlservpp::Main::Database::getThreadPauseTime(), crawlservpp::Main::Database::getThreadRunTime(), crawlservpp::Module::Database::log(), crawlservpp::Module::Thread::log(), crawlservpp::Helper::DateTime::now(), crawlservpp::Module::Thread::onClear(), crawlservpp::Module::Thread::onInit(), crawlservpp::Module::Thread::onPause(), crawlservpp::Module::Thread::onReset(), crawlservpp::Module::Thread::onTick(), crawlservpp::Module::Thread::onUnpause(), crawlservpp::Module::Thread::pause(), crawlservpp::Module::Thread::pauseByThread(), crawlservpp::Module::Database::prepare(), crawlservpp::Helper::DateTime::secondsToString(), crawlservpp::Module::Thread::setLast(), crawlservpp::Module::Thread::setStatusMessage(), crawlservpp::Main::Database::setThreadPauseTime(), crawlservpp::Main::Database::setThreadRunTime(), and crawlservpp::Module::sleepOnConnectionErrorS.
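The following sketch of the multi-entry variant shows that the queue is drained whether or not the entries are written. logAllSketch and the vector standing in for the database are hypothetical, and debug file logging is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <string>
#include <vector>

// Hypothetical sketch: moves queued entries into a stand-in "database" only
// if their level is enabled, but always leaves the queue empty afterwards.
void logAllSketch(
        std::uint8_t currentLevel,
        std::uint8_t level,
        std::queue<std::string>& logEntries,
        std::vector<std::string>& databaseTo) {
    const bool write{currentLevel >= level};

    while (!logEntries.empty()) {
        if (write) {
            databaseTo.push_back(logEntries.front());
        }

        logEntries.pop(); // dropped even if it was not written
    }
}
```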
|
inline, protected, inherited |
Requests the next subset for all subsequent queries.
Container::Exception | if an invalid subset had previously been selected. |
References crawlservpp::Helper::Memory::free(), crawlservpp::Helper::Json::free(), crawlservpp::Helper::Memory::freeIf(), crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, and crawlservpp::Struct::QueryStruct::typeXPath.
Referenced by crawlservpp::Module::Extractor::Thread::onReset().
|
override, virtual |
Does nothing.
Implements crawlservpp::Module::Analyzer::Thread.
|
override, virtual |
Initializes the algorithm and processes its input.
AllTokens::Exception | if the corpus is empty. |
Implements crawlservpp::Module::Analyzer::Thread.
References crawlservpp::Module::Analyzer::Thread::addCorpora(), crawlservpp::Module::Analyzer::Thread::checkCorpusSources(), crawlservpp::Module::Analyzer::Thread::corpora, crawlservpp::Module::Analyzer::generalLoggingDefault, crawlservpp::Module::Analyzer::generalLoggingExtended, crawlservpp::Module::Analyzer::generalLoggingVerbose, crawlservpp::Module::Thread::isRunning(), crawlservpp::Module::Thread::log(), crawlservpp::Module::Thread::setProgress(), and crawlservpp::Module::Thread::setStatusMessage().
|
override, virtual |
Initializes the target table for the algorithm.
Implements crawlservpp::Module::Analyzer::Thread.
References crawlservpp::Module::Analyzer::Thread::database, crawlservpp::Module::Analyzer::Database::initTargetTable(), and crawlservpp::Module::Analyzer::Database::setTargetFields().
|
override, virtual |
Does nothing.
Implements crawlservpp::Module::Analyzer::Thread.
|
override, virtual |
Counts the tokens in the current date or article, or counts the current token.
If a date map exists, each tick counts the tokens of the current date. If there is no date map but an article map exists, each tick counts the tokens of the current article. If neither a date map nor an article map exists, each tick counts a single token.
Implements crawlservpp::Module::Analyzer::Thread.
References crawlservpp::Module::Analyzer::Algo::allTokensUpdateEveryArticle, crawlservpp::Module::Analyzer::Algo::allTokensUpdateEveryDate, crawlservpp::Module::Analyzer::Algo::allTokensUpdateEveryToken, crawlservpp::Module::Analyzer::Thread::finished(), crawlservpp::Module::Analyzer::generalLoggingDefault, crawlservpp::Module::Thread::log(), and crawlservpp::Module::Thread::setStatusMessage().
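The per-tick rules can be modeled with a simplified corpus in which each entry of the date or article map is a (first token, token count) pair. This is a hypothetical sketch, not the AllTokens implementation. The corpus layout, the convention that an empty map means "no map", and the class names are all assumptions. Status updates and database writes are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical range type: (index of the first token, number of tokens).
using TokenRange = std::pair<std::size_t, std::size_t>;

// Hypothetical, simplified corpus; an empty map stands for "no map".
struct CorpusSketch {
    std::vector<TokenRange> dateMap;
    std::vector<TokenRange> articleMap;
    std::size_t tokenCount{0};
};

class AllTokensSketch {
public:
    explicit AllTokensSketch(CorpusSketch input) : corpus(std::move(input)) {}

    // Performs one tick. Returns false once everything has been counted,
    // which is where the real algorithm would call finished().
    bool tick() {
        const std::vector<TokenRange>* map = this->activeMap();
        const std::size_t units = (map != nullptr) ? map->size() : this->corpus.tokenCount;

        if (this->position >= units) {
            return false;
        }

        // one whole date or article per tick, or a single token without maps
        this->counted += (map != nullptr) ? (*map)[this->position].second : 1;
        ++this->position;

        return true;
    }

    std::size_t getCounted() const { return this->counted; }

private:
    // Prefers the date map, then the article map; nullptr if neither exists.
    const std::vector<TokenRange>* activeMap() const {
        if (!this->corpus.dateMap.empty()) {
            return &this->corpus.dateMap;
        }

        if (!this->corpus.articleMap.empty()) {
            return &this->corpus.articleMap;
        }

        return nullptr;
    }

    CorpusSketch corpus;
    std::size_t position{0};
    std::size_t counted{0};
};
```

With a date map of two dates, the sketch needs two ticks no matter how many articles or tokens those dates contain. Without any map, it needs one tick per token.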
|
override, virtual |
Unpauses the algorithm.
Needs to be implemented by the (child) class for the specific algorithm.
Implements crawlservpp::Module::Analyzer::Thread.
|
override, protected, virtual, inherited |
Clears the algorithm.
Implements crawlservpp::Module::Thread.
References crawlservpp::Module::Analyzer::Thread::cleanUpCorpora(), crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), and crawlservpp::Module::Analyzer::Thread::onAlgoClear().
Referenced by crawlservpp::Module::Analyzer::Thread::onReset().
|
override, protected, virtual, inherited |
Initializes the analyzer, the target table, and the algorithm.
Implements crawlservpp::Module::Thread.
References crawlservpp::Module::Thread::isRunning().
Referenced by crawlservpp::Module::Analyzer::Thread::onReset().
|
override, protected, virtual, inherited |
Pauses the analyzer.
Implements crawlservpp::Module::Thread.
References crawlservpp::Module::Analyzer::Thread::onAlgoPause().
|
override, protected, virtual, inherited |
Resets the algorithm.
Implements crawlservpp::Module::Thread.
References crawlservpp::Module::Analyzer::generalLoggingDefault, crawlservpp::Module::Thread::log(), crawlservpp::Module::Analyzer::Thread::onClear(), crawlservpp::Module::Analyzer::Thread::onInit(), crawlservpp::Module::Analyzer::Config::resetAlgo(), and crawlservpp::Module::Config::resetBase().
Referenced by crawlservpp::Module::Analyzer::Thread::onTick().
|
override, protected, virtual, inherited |
Performs an algorithm tick.
Implements crawlservpp::Module::Thread.
References crawlservpp::Module::Analyzer::Config::config, crawlservpp::Module::Analyzer::Config::Entries::generalRestartAfter, crawlservpp::Module::Analyzer::Config::Entries::generalSleepWhenFinished, crawlservpp::Helper::DateTime::now(), crawlservpp::Module::Analyzer::Thread::onAlgoTick(), crawlservpp::Module::Analyzer::Thread::onReset(), crawlservpp::Module::Thread::setProgress(), and crawlservpp::Module::Thread::sleep().
|
override, protected, virtual, inherited |
Unpauses the analyzer.
Implements crawlservpp::Module::Thread.
References crawlservpp::Module::Analyzer::Thread::onAlgoUnpause().
|
inline, protected, inherited |
Checks for a configuration option of type bool.
Ignores the option and adds a warning to the warnings queue, if the requested type does not match the data type in the configuration JSON.
name | Constant reference to a string containing the name of the option to check for. |
target | Reference to a boolean variable into which the value of the configuration entry will be written if it is encountered. |
Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
Referenced by crawlservpp::Module::Analyzer::Algo::Assoc::parseAlgoOption(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::parseAlgoOption(), parseAlgoOption(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::parseAlgoOption(), crawlservpp::Module::Analyzer::Algo::TopicModelling::parseAlgoOption(), crawlservpp::Network::Config::parseBasicOption(), crawlservpp::Module::Parser::Config::parseOption(), crawlservpp::Module::Analyzer::Config::parseOption(), crawlservpp::Module::Crawler::Config::parseOption(), and crawlservpp::Module::Extractor::Config::parseOption().
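The "ignore and warn on type mismatch" behavior can be sketched with std::variant standing in for a parsed JSON value. ConfigItemSketch, JsonValueSketch, and optionSketch() are hypothetical names. crawlserv++ itself works on JSON values and throws Module::Config::Exception if no category has been set, which this sketch does not model.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <variant>

// Hypothetical stand-in for a parsed JSON value (only the types needed here).
using JsonValueSketch = std::variant<bool, double, std::string>;

// Hypothetical stand-in for the configuration item currently being parsed.
struct ConfigItemSketch {
    std::string name;
    JsonValueSketch value;
};

// Returns true if the item is the requested option, whether or not its value
// could be used. On a type mismatch, target stays unchanged and a warning is queued.
bool optionSketch(
        const ConfigItemSketch& item,
        const std::string& name,
        bool& target,
        std::queue<std::string>& warningsTo) {
    if (item.name != name) {
        return false; // not the option we are looking for
    }

    if (const bool* value = std::get_if<bool>(&item.value)) {
        target = *value;
    }
    else {
        warningsTo.emplace("'" + name + "' ignored because it is not a boolean.");
    }

    return true;
}
```

The test passes string values as std::string explicitly. In C++17, a string literal given to this variant would otherwise select the bool alternative.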
|
inline, protected, inherited |
Checks for a configuration option of type array of bools.
Ignores the option and adds a warning to the warnings queue, if the requested type does not match the data type in the configuration JSON.
name | Constant reference to a string containing the name of the option to check for. |
target | Reference to a vector of bools into which the value of the configuration entry will be written if it is encountered. |
Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline, protected, inherited |
Checks for a configuration option of type char.
Ignores the option and adds a warning to the warnings queue, if the requested type does not match the data type in the configuration JSON.
name | Constant reference to a string containing the name of the option to check for. |
target | Reference to a variable of the type char into which the value of the configuration entry will be written if it is encountered. |
opt | Parsing options used for the configuration option. |
Module::Config::Exception | if no category has been set. |
References crawlservpp::Module::Config::FromNumber, crawlservpp::Module::Config::FromString, crawlservpp::Helper::Strings::getFirstOrEscapeChar(), crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline, protected, inherited |
Checks for a configuration option of type array of chars.
Ignores the option and adds a warning to the warnings queue, if the requested type does not match the data type in the configuration JSON.
name | Constant reference to a string containing the name of the option to check for. |
target | Reference to a vector of chars into which the value of the configuration entry will be written if it is encountered. |
opt | Parsing options used for the configuration option. |
Module::Config::Exception | if no category has been set. |
References crawlservpp::Module::Config::FromNumber, crawlservpp::Module::Config::FromString, crawlservpp::Helper::Strings::getFirstOrEscapeChar(), crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline, protected, inherited |
Checks for a configuration option of type 16-bit integer.
Ignores the option and adds a warning to the warnings queue, if the requested type does not match the data type in the configuration JSON.
name | Constant reference to a string containing the name of the option to check for. |
target | Reference to a variable into which the value of the configuration entry will be written if it is encountered. |
Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited
Checks for a configuration option of type array of 16-bit integers.
Ignores the option and adds a warning to the warnings queue, if the requested type does not match the data type in the configuration JSON.
name | Constant reference to a string containing the name of the option to check for. |
target | Reference to a vector into which the value of the configuration entry will be stored if it is encountered. |
Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
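The array variants can be sketched generically with a template over the element type. The same function then covers the signed, unsigned, 8-, 16-, 32- and 64-bit overloads documented in this section. This is a hypothetical illustration. The text above does not say whether the real module ignores the whole array or only the mismatching elements, so the all-or-nothing behaviour below is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-in for one element of a JSON array.
using JsonScalar = std::variant<std::int64_t, std::string>;

// Returns true if value can be represented by the integer type T.
template<typename T>
bool fitsInto(std::int64_t value) {
    if constexpr (std::is_signed_v<T>) {
        return value >= std::numeric_limits<T>::min()
                && value <= std::numeric_limits<T>::max();
    } else {
        return value >= 0 && static_cast<std::uint64_t>(value)
                <= static_cast<std::uint64_t>(std::numeric_limits<T>::max());
    }
}

// Replaces target only if every element is an integer that fits into T;
// otherwise the whole option is ignored and a single warning is added.
template<typename T>
void checkIntArrayOption(const std::string& name, const std::vector<JsonScalar>& array,
                         std::vector<T>& target, std::vector<std::string>& warnings) {
    std::vector<T> parsed;
    parsed.reserve(array.size());

    for (const auto& element : array) {
        const auto* num = std::get_if<std::int64_t>(&element);

        if (num == nullptr || !fitsInto<T>(*num)) {
            warnings.push_back("'" + name + "' ignored: wrong element type.");
            return;
        }

        parsed.push_back(static_cast<T>(*num));
    }

    target = std::move(parsed);
}
```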
|
inline protected inherited
Checks for a configuration option of type 32-bit integer.
Ignores the option and adds a warning to the warnings queue, if the requested type does not match the data type in the configuration JSON.
name | Constant reference to a string containing the name of the option to check for. |
target | Reference to a variable into which the value of the configuration entry will be written if it is encountered. |
Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited
Checks for a configuration option of type array of 32-bit integers.
Ignores the option and adds a warning to the warnings queue, if the requested type does not match the data type in the configuration JSON.
name | Constant reference to a string containing the name of the option to check for. |
target | Reference to a vector into which the value of the configuration entry will be stored if it is encountered. |
Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited
Checks for a configuration option of type 64-bit integer.
Ignores the option and adds a warning to the warnings queue, if the requested type does not match the data type in the configuration JSON.
name | Constant reference to a string containing the name of the option to check for. |
target | Reference to a variable into which the value of the configuration entry will be written if it is encountered. |
Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited
Checks for a configuration option of type array of 64-bit integers.
Ignores the option and adds a warning to the warnings queue, if the requested type does not match the data type in the configuration JSON.
name | Constant reference to a string containing the name of the option to check for. |
target | Reference to a vector into which the value of the configuration entry will be stored if it is encountered. |
Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited
Checks for a configuration option of type unsigned 8-bit integer.
Ignores the option and adds a warning to the warnings queue, if the requested type does not match the data type in the configuration JSON.
name | Constant reference to a string containing the name of the option to check for. |
target | Reference to a variable into which the value of the configuration entry will be written if it is encountered. |
Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited
Checks for a configuration option of type array of unsigned 8-bit integers.
Ignores the option and adds a warning to the warnings queue, if the requested type does not match the data type in the configuration JSON.
name | Constant reference to a string containing the name of the option to check for. |
target | Reference to a vector into which the value of the configuration entry will be stored if it is encountered. |
Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited
Checks for a configuration option of type unsigned 16-bit integer.
Ignores the option and adds a warning to the warnings queue, if the requested type does not match the data type in the configuration JSON.
name | Constant reference to a string containing the name of the option to check for. |
target | Reference to a variable into which the value of the configuration entry will be written if it is encountered. |
Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited
Checks for a configuration option of type array of unsigned 16-bit integers.
Ignores the option and adds a warning to the warnings queue, if the requested type does not match the data type in the configuration JSON.
name | Constant reference to a string containing the name of the option to check for. |
target | Reference to a vector into which the value of the configuration entry will be stored if it is encountered. |
Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited
Checks for a configuration option of type unsigned 32-bit integer.
Ignores the option and adds a warning to the warnings queue, if the requested type does not match the data type in the configuration JSON.
name | Constant reference to a string containing the name of the option to check for. |
target | Reference to a variable into which the value of the configuration entry will be written if it is encountered. |
Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited
Checks for a configuration option of type array of unsigned 32-bit integers.
Ignores the option and adds a warning to the warnings queue, if the requested type does not match the data type in the configuration JSON.
name | Constant reference to a string containing the name of the option to check for. |
target | Reference to a vector into which the value of the configuration entry will be stored if it is encountered. |
Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited
Checks for a configuration option of type unsigned 64-bit integer.
Ignores the option and adds a warning to the warnings queue, if the requested type does not match the data type in the configuration JSON.
name | Constant reference to a string containing the name of the option to check for. |
target | Reference to a variable into which the value of the configuration entry will be written if it is encountered. |
Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited
Checks for a configuration option of type array of unsigned 64-bit integers.
Ignores the option and adds a warning to the warnings queue, if the requested type does not match the data type in the configuration JSON.
name | Constant reference to a string containing the name of the option to check for. |
target | Reference to a vector into which the value of the configuration entry will be stored if it is encountered. |
Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited
Checks for a configuration option of type floating-point number.
Ignores the option and adds a warning to the warnings queue, if the requested type does not match the data type in the configuration JSON.
name | Constant reference to a string containing the name of the option to check for. |
target | Reference to a variable into which the value of the configuration entry will be written if it is encountered. |
Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited
Checks for a configuration option of type array of floating-point numbers.
Ignores the option and adds a warning to the warnings queue, if the requested type does not match the data type in the configuration JSON.
name | Constant reference to a string containing the name of the option to check for. |
target | Reference to a vector into which the value of the configuration entry will be stored if it is encountered. |
Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited
Checks for a configuration option of type string.
Ignores the option and adds a warning to the warnings queue, if the requested type does not match the data type in the configuration JSON.
name | Constant reference to a string containing the name of the option to check for. |
target | Reference to a string into which the value of the configuration entry will be stored if it is encountered. |
opt | Parsing option for the configuration entry. |
Module::Config::Exception | if no category has been set. |
References crawlservpp::Helper::Strings::checkSQLName(), crawlservpp::Module::Config::Default, crawlservpp::Struct::ConfigItem::name, crawlservpp::Module::Config::SQL, crawlservpp::Struct::ConfigItem::str(), crawlservpp::Module::Config::SubURL, crawlservpp::Helper::Strings::trim(), crawlservpp::Module::Config::Trim, crawlservpp::Module::Config::URL, and crawlservpp::Struct::ConfigItem::value.
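The string overloads reference parsing modes such as Default, Trim and SQL, together with Helper::Strings::trim() and checkSQLName(). The following self-contained sketch shows how such modes could affect a string option. It is hypothetical throughout. `StringMode`, `trimmed()`, `isSafeSqlName()` and `checkStringOption()` are illustrative names. The rule "ASCII letters, digits and underscores only" is an assumption and may differ from what checkSQLName() accepts. URL and SubURL handling is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical modes, loosely modelled on Default / Trim / SQL above.
enum class StringMode { Default, Trim, SQL };

// Removes leading and trailing whitespace.
std::string trimmed(const std::string& str) {
    const auto isSpace = [](unsigned char c) { return std::isspace(c) != 0; };
    const auto begin = std::find_if_not(str.begin(), str.end(), isSpace);
    const auto end = std::find_if_not(str.rbegin(), str.rend(), isSpace).base();

    return begin < end ? std::string(begin, end) : std::string();
}

// Assumed rule for a name that is safe to use in SQL without quoting.
bool isSafeSqlName(const std::string& name) {
    return !name.empty() && std::all_of(name.begin(), name.end(), [](unsigned char c) {
        return std::isalnum(c) != 0 || c == '_';
    });
}

void checkStringOption(const std::string& name, const std::string& value, StringMode mode,
                       std::string& target, std::vector<std::string>& warnings) {
    switch (mode) {
    case StringMode::Default:
        target = value;
        break;

    case StringMode::Trim:
        target = trimmed(value);
        break;

    case StringMode::SQL:
        if (!isSafeSqlName(value)) {
            // invalid names are ignored with a warning, leaving target unchanged
            warnings.push_back("'" + name + "' ignored: invalid SQL name.");
            return;
        }

        target = value;
        break;
    }
}
```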
|
inline protected inherited
Checks for a configuration option of type array of strings.
Ignores the option and adds a warning to the warnings queue, if the requested type does not match the data type in the configuration JSON.
name | Constant reference to a string containing the name of the option to check for. |
target | Reference to a vector into which the value of the configuration entry will be stored if it is encountered. |
opt | Parsing option for the configuration entry. |
Module::Config::Exception | if no category has been set. |
References crawlservpp::Helper::Strings::checkSQLName(), crawlservpp::Module::Config::Default, crawlservpp::Struct::ConfigItem::name, crawlservpp::Module::Config::SQL, crawlservpp::Struct::ConfigItem::str(), crawlservpp::Module::Config::SubURL, crawlservpp::Helper::Strings::trim(), crawlservpp::Module::Config::Trim, crawlservpp::Module::Config::URL, and crawlservpp::Struct::ConfigItem::value.
|
override virtual
Parses a configuration option for the algorithm.
Implements crawlservpp::Module::Analyzer::Config.
References crawlservpp::Module::Config::category(), and crawlservpp::Module::Config::option().
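parseAlgoOption() is documented as referencing category() and option(). An algorithm's implementation therefore presumably selects its category and then declares each option it understands. The simplified, self-contained model below illustrates that pattern. `ConfigBase`, `feed()`, `ExampleAlgoConfig` and the names "example" and "limit" are hypothetical and do not reflect the real signatures of Module::Config.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical, heavily simplified configuration base class.
class ConfigBase {
public:
    virtual ~ConfigBase() = default;

    // Feeds one (category, name, value) entry from the configuration JSON.
    void feed(const std::string& cat, const std::string& name, std::uint64_t value) {
        entry = {cat, name};
        entryValue = value;

        parseAlgoOption();
    }

protected:
    // Implemented by each algorithm to declare the options it understands.
    virtual void parseAlgoOption() = 0;

    // Selects the category that subsequent option() calls refer to.
    void category(const std::string& cat) { currentCategory = cat; }

    // Copies the value into target if the current entry matches category and name.
    void option(const std::string& name, std::uint64_t& target) {
        if (entry.first == currentCategory && entry.second == name) {
            target = entryValue;
        }
    }

private:
    std::string currentCategory;
    std::pair<std::string, std::string> entry;
    std::uint64_t entryValue{0};
};

// Hypothetical algorithm configuration with a single option.
class ExampleAlgoConfig : public ConfigBase {
public:
    std::uint64_t limit{100};  // default, kept if the option is absent

protected:
    void parseAlgoOption() override {
        category("example");
        option("limit", limit);
    }
};
```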
|
inline protected virtual inherited
Parses a basic option.
Might be overridden by child classes.
Can be used by abstract classes to add additional configuration entries without being the final implementation, as in Network::Config.
Reimplemented in crawlservpp::Network::Config.
References crawlservpp::Module::Config::parseOption().
Referenced by crawlservpp::Module::Config::loadConfig().
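The hook described above can be illustrated with a simplified three-level hierarchy. loadConfig() calls parseBasicOption(). Its default implementation forwards to parseOption(), and an abstract intermediate class can override it to handle its own entries first. All class names here are hypothetical simplifications. That the intermediate class forwards to parseOption() afterwards is an assumption based on the references listed above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical base: the default hook just forwards to the final class.
class Config {
public:
    virtual ~Config() = default;

    void loadConfig() { parseBasicOption(); }

    std::vector<std::string> parsed;  // records which layers handled the option

protected:
    virtual void parseBasicOption() { parseOption(); }
    virtual void parseOption() = 0;
};

// Abstract intermediate class: adds its own entries, then hands over.
class NetworkLikeConfig : public Config {
protected:
    void parseBasicOption() override {
        parsed.push_back("network");
        parseOption();
    }
};

// Final implementations.
class CrawlerLikeConfig : public NetworkLikeConfig {
protected:
    void parseOption() override { parsed.push_back("crawler"); }
};

class AnalyzerLikeConfig : public Config {
protected:
    void parseOption() override { parsed.push_back("analyzer"); }
};
```

This keeps parseOption() pure virtual for the final class while still letting the middle layer contribute options.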
|
inline override protected virtual inherited
Parses an analyzer-specific configuration option.
Implements crawlservpp::Module::Config.
References crawlservpp::Module::Config::category(), crawlservpp::Module::Analyzer::Config::config, crawlservpp::Module::Analyzer::Config::Entries::filterDateEnable, crawlservpp::Module::Analyzer::Config::Entries::filterDateFrom, crawlservpp::Module::Analyzer::Config::Entries::filterDateTo, crawlservpp::Module::Analyzer::Config::Entries::filterQueryAll, crawlservpp::Module::Analyzer::Config::Entries::filterQueryQueries, crawlservpp::Module::Analyzer::Config::Entries::generalCorpusChecks, crawlservpp::Module::Analyzer::Config::Entries::generalCorpusSlicing, crawlservpp::Module::Analyzer::Config::Entries::generalInputFields, crawlservpp::Module::Analyzer::Config::Entries::generalInputSources, crawlservpp::Module::Analyzer::Config::Entries::generalInputTables, crawlservpp::Module::Analyzer::Config::Entries::generalLogging, crawlservpp::Module::Analyzer::Config::Entries::generalRestartAfter, crawlservpp::Module::Analyzer::Config::Entries::generalSleepMySql, crawlservpp::Module::Analyzer::Config::Entries::generalSleepWhenFinished, crawlservpp::Module::Analyzer::Config::Entries::generalTargetTable, crawlservpp::Module::Analyzer::Config::Entries::groupDateFillGaps, crawlservpp::Module::Analyzer::Config::Entries::groupDateResolution, crawlservpp::Module::Config::option(), crawlservpp::Module::Analyzer::Config::parseAlgoOption(), crawlservpp::Module::Analyzer::Config::Entries::tokenizerDicts, crawlservpp::Module::Analyzer::Config::Entries::tokenizerFreeMemoryEvery, crawlservpp::Module::Analyzer::Config::Entries::tokenizerLanguages, crawlservpp::Module::Analyzer::Config::Entries::tokenizerManipulators, crawlservpp::Module::Analyzer::Config::Entries::tokenizerModels, crawlservpp::Module::Analyzer::Config::Entries::tokenizerSavePoints, crawlservpp::Module::Analyzer::Config::Entries::uploadFTP, crawlservpp::Module::Analyzer::Config::Entries::uploadProxy, crawlservpp::Module::Analyzer::Config::Entries::uploadTargetColumn, and crawlservpp::Module::Analyzer::Config::Entries::uploadVerbose.
|
protected inherited
Pauses the thread.
Shadows Module::Thread::pause(), which should not be used by the thread.
References crawlservpp::Module::Thread::pauseByThread().
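"Shadows" refers to C++ name hiding. Declaring a member function named pause() in the derived class hides the base class version for unqualified calls made from the derived class. It does not override it, since pause() is not virtual here. The minimal sketch below uses hypothetical class names and simplified bodies to show the effect.

```cpp
#include <cassert>
#include <string>

// Hypothetical base thread: pause() is meant for the controlling thread,
// pauseByThread() for the worker itself.
class ModuleThread {
public:
    void pause() { lastCaller = "main"; }

    std::string lastCaller;

protected:
    void pauseByThread() { lastCaller = "worker"; }
};

class AnalyzerThread : public ModuleThread {
public:
    // Unqualified pause() inside the derived class resolves to the derived version.
    void runStep() { pause(); }

protected:
    // Hides ModuleThread::pause() and routes to the worker-side variant.
    void pause() { pauseByThread(); }
};
```

Calling pause() through a ModuleThread reference still reaches the base version, which is why the documentation only states that the base function "should not be used by the thread".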
|
protected inherited
Forces the thread to pause.
References crawlservpp::Module::Thread::database, and crawlservpp::Main::Database::setThreadStatus().
Referenced by crawlservpp::Module::Thread::log(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), crawlservpp::Module::Crawler::Thread::onReset(), and crawlservpp::Module::Analyzer::Thread::pause().
|
inline protected inherited
Reserves memory for a specific number of subsets.
query | A constant reference to a structure identifying the query for whose type memory will be specifically reserved. |
n | The number of subsets for which memory will be reserved. |
References crawlservpp::Parsing::XML::clear(), crawlservpp::Helper::Memory::free(), crawlservpp::Helper::Json::free(), crawlservpp::Helper::Container::moveInto(), crawlservpp::Parsing::XML::parse(), crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Helper::Json::stringify(), crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().
Referenced by crawlservpp::Module::Extractor::Thread::onReset().
|
inherited |
Resets the thread before its next tick.
|
inline override protected virtual inherited
Resets the analyzer-specific configuration options.
Implements crawlservpp::Module::Config.
References crawlservpp::Module::Analyzer::Config::config, and crawlservpp::Module::Analyzer::Config::resetAlgo().
|
override virtual
Resets the algorithm.
Implements crawlservpp::Module::Analyzer::Config.
References crawlservpp::Data::_string, crawlservpp::Data::_uint64, crawlservpp::Module::Analyzer::Database::addAdditionalTable(), crawlservpp::Module::Analyzer::Algo::allTokensColumns, crawlservpp::Module::Analyzer::Algo::allTokensUpdateEveryRow, crawlservpp::Data::InsertFieldsMixed::columns_types_values, crawlservpp::Module::Analyzer::Thread::corpora, crawlservpp::Module::Analyzer::Thread::database, crawlservpp::Struct::TextMapEntry::end(), crawlservpp::Helper::Memory::free(), crawlservpp::Module::Analyzer::generalLoggingDefault, crawlservpp::Module::Analyzer::Database::getAdditionalTableName(), crawlservpp::Module::Analyzer::Thread::getTargetTableName(), crawlservpp::Wrapper::Database::insertCustomData(), crawlservpp::Module::Thread::isRunning(), crawlservpp::Module::Thread::log(), crawlservpp::Struct::TextMapEntry::pos(), crawlservpp::Module::Thread::setProgress(), crawlservpp::Module::Thread::setStatusMessage(), crawlservpp::Data::InsertFieldsMixed::table, crawlservpp::Timer::Simple::tickStr(), crawlservpp::Module::Analyzer::Database::updateTargetTable(), and crawlservpp::Struct::TextMapEntry::value.
|
inline protected virtual inherited
Resets basic options.
Might be overridden by child classes.
Can be used by abstract classes to reset additional configuration entries without being the final implementation, as in Network::Config.
Reimplemented in crawlservpp::Network::Config.
References crawlservpp::Module::protocols, and crawlservpp::Module::Config::reset().
Referenced by crawlservpp::Module::Analyzer::Thread::onReset(), and crawlservpp::Module::Parser::Thread::onReset().
|
protected inherited
Sets the last ID processed by the thread.
Also sets the number of processed IDs, so make sure to increment that number beforehand if the ID has been processed.
lastId | The last ID processed by the thread. |
References crawlservpp::Module::Thread::database, and crawlservpp::Module::Database::setThreadLast().
Referenced by crawlservpp::Module::Thread::log(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
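The ordering requirement can be illustrated with a tiny model. Because the last ID and the number of processed IDs are stored together, the counter has to be incremented before the call. `ThreadState` and its members are hypothetical stand-ins for the thread's persisted state, not crawlserv++ types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the thread's persisted state.
struct ThreadState {
    std::uint64_t processed{0};        // in-memory counter of processed IDs
    std::uint64_t storedLast{0};       // last ID as written to the database
    std::uint64_t storedProcessed{0};  // count as written to the database

    // Mirrors the documented behaviour: storing the last ID also stores the count.
    void setLast(std::uint64_t lastId) {
        storedLast = lastId;
        storedProcessed = processed;
    }
};
```

Incrementing the counter only after calling setLast() would leave the stored count one behind the stored ID.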
|
inline protected inherited
Sets whether to minimize memory usage.
isMinimizeMemory | Set whether to minimize memory usage, prioritizing memory usage over performance. |
Referenced by crawlservpp::Module::Extractor::Thread::onReset().
|
protected inherited
Sets the progress of the thread.
newProgress | The new progress of the thread, between 0.f (none) and 1.f (done). |
References crawlservpp::Module::Thread::database, and crawlservpp::Module::Database::setThreadProgress().
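A caller typically derives the value passed to setProgress() from a processed/total count. The helper below is a hypothetical sketch of that computation. Clamping to [0.f, 1.f] and treating an empty workload as done are assumptions, not documented behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Converts a processed/total count into a progress value between 0.f and 1.f.
float progressOf(std::size_t processed, std::size_t total) {
    if (total == 0) {
        return 1.f;  // nothing to do counts as done (assumption)
    }

    const float progress = static_cast<float>(processed) / static_cast<float>(total);

    return std::clamp(progress, 0.f, 1.f);
}
```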
Referenced by crawlservpp::Module::Analyzer::Thread::finished(), crawlservpp::Module::Analyzer::Algo::TermsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::ExtractIds::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Assoc::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::onAlgoInit(), onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Empty::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::TopicModelling::onAlgoInit(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), crawlservpp::Module::Crawler::Thread::onReset(), crawlservpp::Module::Analyzer::Thread::onTick(), crawlservpp::Module::Parser::Thread::onTick(), crawlservpp::Module::Extractor::Thread::onTick(), crawlservpp::Module::Analyzer::Algo::TermsOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::ExtractIds::resetAlgo(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::Assoc::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::resetAlgo(), resetAlgo(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo(), and crawlservpp::Module::Analyzer::Algo::TopicModelling::resetAlgo().
|
inline protected inherited
Sets the content to use the managed queries on.
The old query target referencing the old content will be cleared.
content | Constant reference to a string containing the content to use the managed queries on. |
source | Constant reference to a string containing the source (URL) of the content. It will be used for logging and error reporting purposes only. |
References crawlservpp::Query::Container::clearQueryTarget().
Referenced by crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
|
inline protected inherited
Sets whether to remove XML processing instructions (<?xml:...>) before parsing HTML/XML content.
isRemoveXmlInstructions | Sets whether to remove XML processing instructions. |
Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
|
inline protected inherited
Sets whether to try to repair CData when parsing XML.
isRepairCData | Set whether to try to repair CData when parsing XML. |
Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
|
inline protected inherited
Sets whether to try to repair broken HTML/XML comments.
isRepairComments | Set whether to try to repair broken HTML/XML comments. |
Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
|
protected inherited
Sets the status message of the thread.
statusMessage | Constant reference to a string containing the new status message to be set. |
References crawlservpp::Module::Thread::database, and crawlservpp::Main::Database::setThreadStatus().
Referenced by crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Analyzer::Thread::finished(), crawlservpp::Module::Thread::log(), crawlservpp::Module::Analyzer::Algo::TermsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::ExtractIds::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Assoc::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::onAlgoInit(), onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Empty::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::TopicModelling::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoTick(), onAlgoTick(), crawlservpp::Module::Analyzer::Algo::TopicModelling::onAlgoTick(), crawlservpp::Module::Parser::Thread::onClear(), crawlservpp::Module::Extractor::Thread::onClear(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), crawlservpp::Module::Crawler::Thread::onReset(), crawlservpp::Module::Parser::Thread::onTick(), crawlservpp::Module::Extractor::Thread::onTick(), crawlservpp::Module::Analyzer::Algo::TermsOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::ExtractIds::resetAlgo(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::Assoc::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::resetAlgo(), resetAlgo(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::TopicModelling::resetAlgo(), and crawlservpp::Module::Analyzer::Thread::uploadResult().
|
inline protected inherited
Sets subsets for subsequent queries using a query of any type.
The subsets resulting from the query will be saved in-class. Previous subsets will be overwritten.
query | A constant reference to a structure identifying the query that will be performed to acquire the subset. |
warningsTo | A reference to a vector of strings to which all warnings will be appended that occur during the execution of the query. |
Container::Exception | if no query target has been specified or the query is of an unknown type. |
References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultSubSets, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().
Referenced by crawlservpp::Module::Extractor::Thread::onReset().
|
inline protected inherited
Sets how tidy-html5 reports errors and warnings.
The reporting of both errors and warnings is deactivated by default.
For more information about tidy-html5, see its GitHub repository.
warnings | Specify whether to report simple warnings. |
numOfErrors | Set the number of errors to be reported. Set to zero to deactivate error reporting. |
References crawlservpp::Parsing::XML::setOptions().
Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
|
protected inherited
Lets the thread sleep for the specified number of milliseconds.
The sleep will be interrupted if the thread is stopped.
Thread-safe: Can be used by both the module and the main thread.
ms | The number of milliseconds for the thread to sleep, if it is not stopped. |
References crawlservpp::Module::sleepMs.
Referenced by crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoTick(), crawlservpp::Module::Crawler::Thread::onReset(), crawlservpp::Module::Analyzer::Thread::onTick(), crawlservpp::Module::Parser::Thread::onTick(), and crawlservpp::Module::Extractor::Thread::onTick().
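An interruptible, thread-safe sleep as described above is commonly built on std::condition_variable::wait_for with a predicate. The sleeper waits with a timeout, and stopping the thread sets a flag under the mutex and wakes it up early. The `Sleeper` class below is a hypothetical sketch of that technique. It is not the crawlserv++ implementation.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

class Sleeper {
public:
    // Returns true if the full duration elapsed, false if stop() interrupted it.
    bool sleep(std::uint64_t ms) {
        std::unique_lock<std::mutex> lock(mutex);

        // wait_for returns the predicate's value: true if stopped before the timeout
        return !condition.wait_for(lock, std::chrono::milliseconds(ms), [this] { return stopped; });
    }

    // Safe to call from another thread: wakes up any sleeping thread.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex);
            stopped = true;
        }

        condition.notify_all();
    }

private:
    std::mutex mutex;
    std::condition_variable condition;
    bool stopped{false};
};
```

Using the predicate overload also protects against spurious wake-ups, which a plain wait_for without a predicate would not.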
|
protected inherited
Uploads the specified result via FTP.
References crawlservpp::Data::_string, crawlservpp::Data::GetColumn::column, crawlservpp::Data::GetColumnsMixed::columns_types, crawlservpp::Module::Analyzer::Config::config, crawlservpp::Module::Analyzer::Thread::database, crawlservpp::Module::Analyzer::generalLoggingDefault, crawlservpp::Wrapper::Database::getColumnType(), crawlservpp::Module::Analyzer::Database::getCorporaLastUpdated(), crawlservpp::Wrapper::Database::getCustomData(), crawlservpp::Module::Analyzer::Thread::getTargetTableName(), crawlservpp::Module::Analyzer::Database::getTargetTableUpdated(), crawlservpp::Wrapper::Database::isColumnExists(), crawlservpp::Module::Thread::log(), crawlservpp::Data::GetColumnsMixed::order, crawlservpp::Data::parseSQLType(), crawlservpp::Module::Thread::setStatusMessage(), crawlservpp::Helper::Json::stringify(), crawlservpp::Data::GetColumn::table, crawlservpp::Data::GetColumnsMixed::table, crawlservpp::Data::GetColumn::type, crawlservpp::Module::Analyzer::Config::Entries::uploadFTP, crawlservpp::Module::Analyzer::Config::Entries::uploadTargetColumn, crawlservpp::Data::GetColumn::values, crawlservpp::Data::GetColumnsMixed::values, and crawlservpp::Network::FTPUpload::write().
Referenced by crawlservpp::Module::Analyzer::Thread::finished().
|
inline protected inherited
Adds a warning to the logging queue.
warning | Constant reference to a string containing the warning. |
Module::Config::Exception | if no log queue is active. |
Referenced by crawlservpp::Module::Analyzer::Algo::Assoc::checkAlgoOptions(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::checkAlgoOptions(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::checkAlgoOptions(), crawlservpp::Module::Parser::Config::checkOptions(), crawlservpp::Module::Analyzer::Config::checkOptions(), crawlservpp::Module::Crawler::Config::checkOptions(), crawlservpp::Module::Extractor::Config::checkOptions(), and crawlservpp::Network::Config::parseBasicOption().
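The behaviour described above can be modelled as a pointer to an externally owned queue that is only set while a queue is active. `WarningSink` and `setQueue()` are hypothetical. The sketch throws std::logic_error where the documented function throws Module::Config::Exception.

```cpp
#include <cassert>
#include <queue>
#include <stdexcept>
#include <string>

class WarningSink {
public:
    // Activates (or, with nullptr, deactivates) the queue warnings are written to.
    void setQueue(std::queue<std::string>* queue) { warningsTo = queue; }

    void warning(const std::string& text) {
        if (warningsTo == nullptr) {
            throw std::logic_error("No warnings queue is active");
        }

        warningsTo->push(text);
    }

private:
    std::queue<std::string>* warningsTo{nullptr};
};
```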
|
inherited |
Jumps to the specified target ID ("time travel").
Skips the normal process of determining the next ID once the current ID has been processed.
Thread-safe: Can be used by both the module and the main thread.
target | The target ID that should be processed next. |
Module::Thread::Exception | if no target is specified, i.e. the target ID is zero. |
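A thread-safe "jump to ID" can be sketched with an atomic. Any thread may store the requested target, and the worker consumes it exactly once via exchange() when choosing the next ID. `IdCursor` and `next()` are hypothetical. Using `current + 1` as the "normal" next ID is a simplification, and std::invalid_argument stands in for Module::Thread::Exception.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <stdexcept>

class IdCursor {
public:
    // Can be called from any thread; zero is rejected as "no target".
    void warpTo(std::uint64_t target) {
        if (target == 0) {
            throw std::invalid_argument("No target ID specified");
        }

        warpedTo.store(target);
    }

    // Called by the worker after finishing the current ID.
    std::uint64_t next(std::uint64_t current) {
        const std::uint64_t target = warpedTo.exchange(0);  // consume the jump once

        return target != 0 ? target : current + 1;
    }

private:
    std::atomic<std::uint64_t> warpedTo{0};
};
```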
|
inherited |
Configuration of the analyzer.
Referenced by crawlservpp::Module::Analyzer::Thread::addCorpora(), crawlservpp::Module::Analyzer::Thread::checkCorpusSources(), crawlservpp::Module::Analyzer::Config::checkOptions(), crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Analyzer::Thread::getTargetTableName(), crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoInit(), crawlservpp::Module::Analyzer::Thread::onTick(), crawlservpp::Module::Analyzer::Config::parseOption(), crawlservpp::Module::Analyzer::Config::reset(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::TopicModelling::resetAlgo(), and crawlservpp::Module::Analyzer::Thread::uploadResult().
|
protected inherited
JSON string of the configuration used by the thread.
Referenced by crawlservpp::Module::Thread::Thread().
|
protected inherited
Vector of corpora for the analyzer thread.
Referenced by crawlservpp::Module::Analyzer::Thread::addCorpora(), crawlservpp::Module::Analyzer::Thread::cleanUpCorpora(), crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoInit(), onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Assoc::onAlgoTick(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::onAlgoTick(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::onAlgoTick(), crawlservpp::Module::Analyzer::Algo::TermsOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::ExtractIds::resetAlgo(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::Assoc::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::resetAlgo(), resetAlgo(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo(), and crawlservpp::Module::Analyzer::Algo::TopicModelling::resetAlgo().
|
protected inherited
Database connection for the analyzer thread.
Referenced by crawlservpp::Module::Analyzer::Thread::addOptionalQuery(), crawlservpp::Module::Analyzer::Thread::addQueries(), crawlservpp::Module::Analyzer::Thread::checkCorpusSources(), crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::ExtractIds::onAlgoInitTarget(), crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoInitTarget(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::onAlgoInitTarget(), crawlservpp::Module::Analyzer::Algo::Assoc::onAlgoInitTarget(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::onAlgoInitTarget(), onAlgoInitTarget(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::onAlgoInitTarget(), crawlservpp::Module::Analyzer::Algo::TopicModelling::onAlgoInitTarget(), crawlservpp::Module::Analyzer::Algo::ExtractIds::resetAlgo(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::Assoc::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::resetAlgo(), resetAlgo(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::TopicModelling::resetAlgo(), and crawlservpp::Module::Analyzer::Thread::uploadResult().
|
protected inherited
Namespace of the URL list used by the thread.
Referenced by crawlservpp::Module::Analyzer::Thread::getTargetTableName(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), crawlservpp::Module::Crawler::Thread::onReset(), and crawlservpp::Module::Thread::Thread().
|
protected inherited
Namespace of the website used by the thread.
Referenced by crawlservpp::Module::Analyzer::Thread::getTargetTableName(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), crawlservpp::Module::Crawler::Thread::onReset(), and crawlservpp::Module::Thread::Thread().