crawlserv++  [under development]
Application for crawling and analyzing textual content of websites.
crawlservpp::Module::Crawler::Thread Class Reference  [final]

Crawler thread.

#include <Thread.hpp>


Classes

class  Exception
 Class for crawler exceptions.
 

Configuration Loader

void loadConfig (const std::string &configJson, LogQueue &warningsTo)
 Loads a configuration.
 

Parsing Options

enum  StringParsingOption {
  Default = 0, SQL, SubURL, URL,
  Trim
}
 Options for parsing strings.

enum  CharParsingOption { FromNumber = 0, FromString }
 Options for parsing chars.
 

Configuration Parsing

void category (const std::string &category)
 Sets the category of the subsequent configuration items to be checked for.

void option (const std::string &name, bool &target)
 Checks for a configuration option of type bool.

void option (const std::string &name, std::vector< bool > &target)
 Checks for a configuration option of type array of bools.

void option (const std::string &name, char &target, CharParsingOption opt)
 Checks for a configuration option of type char.

void option (const std::string &name, std::vector< char > &target, CharParsingOption opt)
 Checks for a configuration option of type array of chars.

void option (const std::string &name, std::int16_t &target)
 Checks for a configuration option of type 16-bit integer.

void option (const std::string &name, std::vector< std::int16_t > &target)
 Checks for a configuration option of type array of 16-bit integers.

void option (const std::string &name, std::int32_t &target)
 Checks for a configuration option of type 32-bit integer.

void option (const std::string &name, std::vector< std::int32_t > &target)
 Checks for a configuration option of type array of 32-bit integers.

void option (const std::string &name, std::int64_t &target)
 Checks for a configuration option of type 64-bit integer.

void option (const std::string &name, std::vector< std::int64_t > &target)
 Checks for a configuration option of type array of 64-bit integers.

void option (const std::string &name, std::uint8_t &target)
 Checks for a configuration option of type unsigned 8-bit integer.

void option (const std::string &name, std::vector< std::uint8_t > &target)
 Checks for a configuration option of type array of unsigned 8-bit integers.

void option (const std::string &name, std::uint16_t &target)
 Checks for a configuration option of type unsigned 16-bit integer.

void option (const std::string &name, std::vector< std::uint16_t > &target)
 Checks for a configuration option of type array of unsigned 16-bit integers.

void option (const std::string &name, std::uint32_t &target)
 Checks for a configuration option of type unsigned 32-bit integer.

void option (const std::string &name, std::vector< std::uint32_t > &target)
 Checks for a configuration option of type array of unsigned 32-bit integers.

void option (const std::string &name, std::uint64_t &target)
 Checks for a configuration option of type unsigned 64-bit integer.

void option (const std::string &name, std::vector< std::uint64_t > &target)
 Checks for a configuration option of type array of unsigned 64-bit integers.

void option (const std::string &name, float &target)
 Checks for a configuration option of type floating-point number.

void option (const std::string &name, std::vector< float > &target)
 Checks for a configuration option of type array of floating-point numbers.

void option (const std::string &name, std::string &target, StringParsingOption opt=Default)
 Checks for a configuration option of type string.

void option (const std::string &name, std::vector< std::string > &target, StringParsingOption opt=Default)
 Checks for a configuration option of type array of strings.

void warning (const std::string &warning)
 Adds a warning to the logging queue.
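
The interplay of category() and the option() overloads listed above can be illustrated with a minimal sketch. It is not the crawlserv++ implementation: the class ConfigParserSketch, its key/value storage and the "category.name" key format are assumptions made for illustration only. The sketch only shows the general pattern of selecting a category first and then letting type-specific overloads overwrite a target when a matching entry exists.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a configuration parser (NOT the crawlserv++ class).
// Entries are stored as strings under "category.name" keys (an assumption).
class ConfigParserSketch {
public:
    explicit ConfigParserSketch(std::map<std::string, std::string> values)
            : entries(std::move(values)) {}

    // Sets the category used by all subsequent option() calls.
    void category(const std::string& name) { currentCategory = name; }

    // Each overload leaves the target untouched if the option is absent.
    void option(const std::string& name, bool& target) {
        const auto it{entries.find(currentCategory + "." + name)};

        if(it != entries.end()) {
            target = (it->second == "true");
        }
    }

    void option(const std::string& name, std::uint32_t& target) {
        const auto it{entries.find(currentCategory + "." + name)};

        if(it != entries.end()) {
            target = static_cast<std::uint32_t>(std::stoul(it->second));
        }
    }

    void option(const std::string& name, std::string& target) {
        const auto it{entries.find(currentCategory + "." + name)};

        if(it != entries.end()) {
            target = it->second;
        }
    }

private:
    std::map<std::string, std::string> entries;
    std::string currentCategory;
};
```

In this sketch, a target keeps its previous (default) value whenever the option is missing from the current category, so defaults can be assigned before parsing.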
 

Setter

void setCrossDomain (bool isCrossDomain)
 Sets whether the corresponding website is cross-domain.
 

Configuration

struct crawlservpp::Module::Crawler::Config::Entries config
 Configuration of the crawler.
 

Crawler-Specific Configuration Parsing

void parseOption () override
 Parses a crawler-specific configuration option.

void checkOptions () override
 Checks the crawler-specific configuration options.

void reset () override
 Resets the crawler-specific configuration options.
 

Construction

 Thread (Main::Database &dbBase, std::string_view cookieDirectory, const ThreadOptions &threadOptions, const NetworkSettings &networkSettings, const ThreadStatus &threadStatus)
 Constructor initializing a previously interrupted crawler thread.

 Thread (Main::Database &dbBase, std::string_view cookieDirectory, const ThreadOptions &threadOptions, const NetworkSettings &networkSettings)
 Constructor initializing a new crawler thread.
 

Database Connection

Database database
 Database connection for the crawler thread.
 

Networking

const NetworkSettings networkOptions
 Network settings for the crawler thread.

Network::Curl networking
 Networking for the crawler thread.

Network::TorControl torControl
 TOR control for the crawler thread.
 

Implemented Thread Functions

void onInit () override
 Initializes the crawler.

void onTick () override
 Performs a crawler tick.

void onPause () override
 Pauses the crawler.

void onUnpause () override
 Unpauses the crawler.

void onClear () override
 Clears the crawler.

void onReset () override
 Resets the crawler.
 

Getters

std::uint64_t getId () const
 Gets the ID of the thread.

std::uint64_t getWebsite () const
 Gets the ID of the website used by the thread.

std::uint64_t getUrlList () const
 Gets the ID of the URL list used by the thread.

std::uint64_t getConfig () const
 Gets the ID of the configuration used by the thread.

bool isShutdown () const
 Checks whether the thread is shutting down or has shut down.

bool isRunning () const
 Checks whether the thread is still supposed to run.

bool isFinished () const
 Checks whether the shutdown of the thread has been finished.

bool isPaused () const
 Checks whether the thread has been paused.
 

Thread Control

void end ()
 Waits for the thread until shutdown is completed.

void reset ()
 Will reset the thread before the next tick.
 

Time Travel

void warpTo (std::uint64_t target)
 Jumps to the specified target ID ("time travel").
 

Configuration

std::string websiteNamespace
 Namespace of the website used by the thread.

std::string urlListNamespace
 Namespace of the URL list used by the thread.

std::string configuration
 JSON string of the configuration used by the thread.
 

Protected Getters

bool isInterrupted () const
 Checks whether the thread has been interrupted.

std::string getStatusMessage () const
 Gets the current status message.

float getProgress () const
 Gets the current progress, in percent.

std::uint64_t getLast () const
 Gets the value of the last ID processed by the thread.

std::int64_t getWarpedOverAndReset ()
 Gets the number of IDs that have been jumped over, and resets this number.
 

Protected Setters

void setStatusMessage (const std::string &statusMessage)
 Sets the status message of the thread.

void setProgress (float newProgress)
 Sets the progress of the thread.

void setLast (std::uint64_t lastId)
 Sets the last ID processed by the thread.

void incrementLast ()
 Increments the last ID processed by the thread.

void incrementProcessed ()
 Increments the number of IDs processed by the thread.
 

Protected Thread Control

void sleep (std::uint64_t ms) const
 Lets the thread sleep for the specified number of milliseconds.

void allowPausing ()
 Allows the thread to be paused.

void disallowPausing ()
 Prevents the thread from being paused.

void pauseByThread ()
 Forces the thread to pause.
 

Logging

bool isLogLevel (std::uint8_t level) const
 Checks whether a certain logging level is enabled.

void log (std::uint8_t level, const std::string &logEntry)
 Adds a thread-specific log entry to the database, if the current logging level is high enough.

void log (std::uint8_t level, std::queue< std::string > &logEntries)
 Adds multiple thread-specific log entries to the database, if the current logging level is high enough.
 

Configuration

struct crawlservpp::Network::Config::Entries networkConfig
 Configuration for networking.
 

Parsing (Network Configuration)

void parseBasicOption () override
 Parses basic network configuration options.

void resetBase () override
 Resets basic network configuration options.
 

Helper (Network Configuration)

const std::string & getProtocol () const
 Gets the protocol to be used for networking.
 

Public Getter

bool isQueryUsed (std::uint64_t queryId) const
 Checks whether the specified query is used by the container.
 

Setters

void setRepairCData (bool isRepairCData)
 Sets whether to try to repair CData when parsing XML.

void setRepairComments (bool isRepairComments)
 Sets whether to try to repair broken HTML/XML comments.

void setRemoveXmlInstructions (bool isRemoveXmlInstructions)
 Sets whether to remove XML processing instructions (<?xml:...>) before parsing HTML/XML content.

void setMinimizeMemory (bool isMinimizeMemory)
 Sets whether to minimize memory usage.

void setTidyErrorsAndWarnings (bool warnings, std::uint32_t numOfErrors)
 Sets how tidy-html5 reports errors and warnings.

void setQueryTarget (const std::string &content, const std::string &source)
 Sets the content to use the managed queries on.
 

Getters

std::size_t getNumberOfSubSets () const
 Gets the number of subsets currently acquired.

bool getTarget (std::string &targetTo)
 Gets the current query target, if available, and writes it to the given string.

bool getXml (std::string &resultTo, std::queue< std::string > &warningsTo)
 Parses the current query target as tidied XML and writes it to the given string.
 

Queries

QueryStruct addQuery (std::uint64_t id, const QueryProperties &properties)
 Adds a query with the given query properties to the container.

void clearQueries ()
 Clears all queries currently managed by the container and frees the associated memory.

void clearQueryTarget ()
 Clears the current query target and frees the associated memory.
 

Subsets

bool nextSubSet ()
 Requests the next subset for all subsequent queries.
 

Results

bool getBoolFromRegEx (const QueryStruct &query, const std::string &target, bool &resultTo, std::queue< std::string > &warningsTo) const
 Gets a boolean result from a RegEx query on a separate string.

bool getSingleFromRegEx (const QueryStruct &query, const std::string &target, std::string &resultTo, std::queue< std::string > &warningsTo) const
 Gets a single result from a RegEx query on a separate string.

bool getMultiFromRegEx (const QueryStruct &query, const std::string &target, std::vector< std::string > &resultTo, std::queue< std::string > &warningsTo) const
 Gets multiple results from a RegEx query on a separate string.

bool getBoolFromQuery (const QueryStruct &query, bool &resultTo, std::queue< std::string > &warningsTo)
 Gets a boolean result from a query of any type on the current query target.

bool getBoolFromQueryOnSubSet (const QueryStruct &query, bool &resultTo, std::queue< std::string > &warningsTo)
 Gets a boolean result from a query of any type on the current subset.

bool getSingleFromQuery (const QueryStruct &query, std::string &resultTo, std::queue< std::string > &warningsTo)
 Gets a single result from a query of any type on the current query target.

bool getSingleFromQueryOnSubSet (const QueryStruct &query, std::string &resultTo, std::queue< std::string > &warningsTo)
 Gets a single result from a query of any type on the current subset.

bool getMultiFromQuery (const QueryStruct &query, std::vector< std::string > &resultTo, std::queue< std::string > &warningsTo)
 Gets multiple results from a query of any type on the current query target.

bool getMultiFromQueryOnSubSet (const QueryStruct &query, std::vector< std::string > &resultTo, std::queue< std::string > &warningsTo)
 Gets multiple results from a query of any type on the current subset.

bool setSubSetsFromQuery (const QueryStruct &query, std::queue< std::string > &warningsTo)
 Sets subsets for subsequent queries using a query of any type.

bool addSubSetsFromQueryOnSubSet (const QueryStruct &query, std::queue< std::string > &warningsTo)
 Inserts more subsets after the current one based on a query on the current subset.
 

Memory

void reserveForSubSets (const QueryStruct &query, std::size_t n)
 Reserves memory for a specific number of subsets.
 

Detailed Description

Crawler thread.

Constructor & Destructor Documentation

◆ Thread() [1/2]

crawlservpp::Module::Crawler::Thread::Thread ( Main::Database &  dbBase,
std::string_view  cookieDirectory,
const ThreadOptions &  threadOptions,
const NetworkSettings &  networkSettings,
const ThreadStatus &  threadStatus
)

Constructor initializing a previously interrupted crawler thread.

Parameters
dbBase  Reference to the main database connection.
cookieDirectory  View of a string containing the (sub-)directory for storing cookie files.
threadOptions  Constant reference to a structure containing the options for the thread.
networkSettings  Network settings.
threadStatus  Constant reference to a structure containing the last known status of the thread.

◆ Thread() [2/2]

crawlservpp::Module::Crawler::Thread::Thread ( Main::Database &  dbBase,
std::string_view  cookieDirectory,
const ThreadOptions &  threadOptions,
const NetworkSettings &  networkSettings
)

Constructor initializing a new crawler thread.

Member Function Documentation

◆ addQuery()

Struct::QueryStruct crawlservpp::Query::Container::addQuery ( std::uint64_t  id,
const QueryProperties &  properties
)
[inline, protected, inherited]

Adds a query with the given query properties to the container.

Parameters
id  The ID of the query. It is saved in a thread-safe way and used only by Container::isQueryUsed().
properties  Constant reference to the properties of the query to add to the container.
Returns
A structure to be used to identify the added query, including the index of the query inside the container.
Exceptions
Container::Exception  if an error occurred while creating a query with the given properties or the specified type of the query is unknown.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Struct::QueryProperties::resultBool, crawlservpp::Struct::QueryStruct::resultBool, crawlservpp::Struct::QueryProperties::resultMulti, crawlservpp::Struct::QueryStruct::resultMulti, crawlservpp::Struct::QueryProperties::resultSingle, crawlservpp::Struct::QueryStruct::resultSingle, crawlservpp::Struct::QueryProperties::resultSubSets, crawlservpp::Struct::QueryStruct::resultSubSets, crawlservpp::Struct::QueryProperties::text, crawlservpp::Struct::QueryProperties::textOnly, crawlservpp::Struct::QueryProperties::type, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Analyzer::Thread::addOptionalQuery(), crawlservpp::Module::Analyzer::Thread::addQueries(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and onReset().
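
The idea of returning a small structure that identifies the added query by type and index can be sketched as follows. This is a simplified illustration, not the code of Query::Container: all names ending in Sketch, the string-based type selection and the use of std::invalid_argument are assumptions, and the thread-safe storage of the ID mentioned above is omitted (the sketch uses a plain std::set without locking).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical query types (the real container supports more).
enum class TypeSketch { RegEx, XPath };

// Identifies a stored query by its type and its index in the per-type storage.
struct QueryRefSketch {
    TypeSketch type;
    std::size_t index;
};

// Hypothetical stand-in for a query container (NOT Query::Container).
class QueryContainerSketch {
public:
    QueryRefSketch addQuery(std::uint64_t id, const std::string& type, const std::string& text) {
        if(type == "regex") {
            regExQueries.push_back(text);
            usedIds.insert(id);

            return {TypeSketch::RegEx, regExQueries.size() - 1};
        }

        if(type == "xpath") {
            xPathQueries.push_back(text);
            usedIds.insert(id);

            return {TypeSketch::XPath, xPathQueries.size() - 1};
        }

        // unknown query type: report it to the caller via an exception
        throw std::invalid_argument("unknown query type: " + type);
    }

    bool isQueryUsed(std::uint64_t id) const { return usedIds.count(id) > 0; }

private:
    std::vector<std::string> regExQueries;
    std::vector<std::string> xPathQueries;
    std::set<std::uint64_t> usedIds; // no locking here, unlike the documented behavior
};
```

Indices are counted per query type in this sketch, so the first RegEx query and the first XPath query both receive index 0.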

◆ addSubSetsFromQueryOnSubSet()

bool crawlservpp::Query::Container::addSubSetsFromQueryOnSubSet ( const QueryStruct &  query,
std::queue< std::string > &  warningsTo
)
[inline, protected, inherited]

Inserts more subsets after the current one based on a query on the current subset.

This function is used for recursive extraction.

Parameters
query  A constant reference to a structure identifying the query that will be performed to acquire the subsets.
warningsTo  A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if new subsets have been added. False, if the execution of the query failed or did not yield any results.
Exceptions
Container::Exception  if no query target or no subset has been specified, the current subset is invalid, or the given query is of an unknown type.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultSubSets, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Extractor::Thread::onReset().
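
How inserting subsets after the current one enables recursive extraction can be shown with a small sketch. It is an illustration under assumptions, not the container's implementation: SubSetSketch, its string-based subsets and the 1-based position counter are hypothetical, and where the documented function throws Container::Exception if no subset has been selected, the sketch simply returns false.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for subset handling (NOT Query::Container).
class SubSetSketch {
public:
    // Replaces all subsets; no subset is selected afterwards.
    void setSubSets(std::vector<std::string> newSubSets) {
        subSets = std::move(newSubSets);
        current = 0;
    }

    // Selects the next subset; returns false if none is left.
    bool nextSubSet() {
        if(current >= subSets.size()) {
            return false;
        }

        ++current;

        return true;
    }

    // Inserts more subsets directly after the current one,
    // so that they are visited next (recursive extraction).
    bool addSubSetsAfterCurrent(const std::vector<std::string>& more) {
        if(current == 0 || more.empty()) {
            return false; // simplification: the documented function throws if no subset is selected
        }

        subSets.insert(
                subSets.begin() + static_cast<std::ptrdiff_t>(current),
                more.begin(),
                more.end()
        );

        return true;
    }

    const std::string& currentSubSet() const { return subSets.at(current - 1); }
    std::size_t getNumberOfSubSets() const { return subSets.size(); }

private:
    std::vector<std::string> subSets;
    std::size_t current{0}; // 1-based position; 0 means no subset selected yet
};
```

Because the new subsets are placed immediately behind the current position, a loop over nextSubSet() visits them before continuing with the remaining original subsets.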

◆ allowPausing()

void crawlservpp::Module::Thread::allowPausing ( )
[protected, inherited]

Allows the thread to be paused.

Threads are pausable by default. Use this function if pausing has been disallowed via disallowPausing().

Thread-safe: Can be used by both the module and the main thread.
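
The relationship between allowPausing(), disallowPausing() and a pause request might look roughly like the following sketch. It is not taken from Module::Thread: the class PausableSketch and its requestPause() and unpause() functions are hypothetical, and the check-then-set in requestPause() is deliberately simplified (it is not one atomic operation).

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a pausable thread (NOT Module::Thread).
class PausableSketch {
public:
    void allowPausing() { pausable.store(true); }
    void disallowPausing() { pausable.store(false); }

    // Returns true only if the pause request has been accepted.
    // Simplified: checking and setting are two separate atomic operations.
    bool requestPause() {
        if(!pausable.load()) {
            return false;
        }

        paused.store(true);

        return true;
    }

    void unpause() { paused.store(false); }
    bool isPaused() const { return paused.load(); }

private:
    std::atomic<bool> pausable{true}; // pausable by default
    std::atomic<bool> paused{false};
};
```

Using atomic flags lets both the controlling thread and the worker read and change the state without a data race, which matches the thread-safety note above in spirit only.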

◆ clearQueries()

void crawlservpp::Query::Container::clearQueries ( )
[inline, protected, inherited]

Clears all queries currently managed by the container and frees the associated memory.

◆ clearQueryTarget()

void crawlservpp::Query::Container::clearQueryTarget ( )
[inline, protected, inherited]

Clears the current query target and frees the associated memory.

◆ disallowPausing()

◆ end()

void crawlservpp::Module::Thread::end ( )
[inherited]

Waits for the thread until shutdown is completed.

Note
Either stop() or interrupt() must have been called before calling this function.
Warning
Must not be called by the thread itself!

References crawlservpp::Main::Database::deleteThread().

Referenced by onReset(), and crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo().

◆ getBoolFromQuery()

bool crawlservpp::Query::Container::getBoolFromQuery ( const QueryStruct &  query,
bool &  resultTo,
std::queue< std::string > &  warningsTo
)
[inline, protected, inherited]

Gets a boolean result from a query of any type on the current query target.

Parameters
query  A constant reference to a structure identifying the query that will be performed.
resultTo  A reference to a boolean variable which will be set according to the result of the query.
warningsTo  A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the execution of the query failed.
Exceptions
Container::Exception  if no query target has been specified or the query is of an unknown type.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultBool, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and onReset().

◆ getBoolFromQueryOnSubSet()

bool crawlservpp::Query::Container::getBoolFromQueryOnSubSet ( const QueryStruct &  query,
bool &  resultTo,
std::queue< std::string > &  warningsTo
)
[inline, protected, inherited]

Gets a boolean result from a query of any type on the current subset.

Parameters
query  A constant reference to a structure identifying the query that will be performed.
resultTo  A reference to a boolean variable which will be set according to the result of the query.
warningsTo  A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the execution of the query failed.
Exceptions
Container::Exception  if no query target or no subset has been specified, the current subset is invalid, or the given query is of an unknown type.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultBool, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Extractor::Thread::onReset().

◆ getBoolFromRegEx()

bool crawlservpp::Query::Container::getBoolFromRegEx ( const QueryStruct &  query,
const std::string &  target,
bool &  resultTo,
std::queue< std::string > &  warningsTo
) const
[inline, protected, inherited]

Gets a boolean result from a RegEx query on a separate string.

Parameters
query  A constant reference to a structure identifying the RegEx query that will be performed.
target  A constant reference to a string containing the target on which the query will be performed.
resultTo  A reference to a boolean variable which will be set according to the result of the query.
warningsTo  A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the query is of a different type or its execution failed.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Struct::QueryStruct::resultBool, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), onReset(), crawlservpp::Module::Analyzer::Algo::Assoc::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::resetAlgo(), and crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo().
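
The contract described above (result written to a reference, warnings appended to a queue, false returned for non-RegEx queries or failed execution) can be sketched in a self-contained way. The sketch uses std::regex from the standard library purely as a stand-in; which RegEx engine crawlserv++ uses is not stated in this reference. QuerySketch, QueryTypeSketch and getBoolFromRegExSketch() are hypothetical names.

```cpp
#include <cassert>
#include <queue>
#include <regex>
#include <string>

// Hypothetical query description (NOT Struct::QueryStruct).
enum class QueryTypeSketch { None, RegEx, XPath };

struct QuerySketch {
    QueryTypeSketch type;
    std::string expression;
};

// Returns true if the query could be performed; the match result goes to resultTo.
bool getBoolFromRegExSketch(
        const QuerySketch& query,
        const std::string& target,
        bool& resultTo,
        std::queue<std::string>& warningsTo
) {
    if(query.type != QueryTypeSketch::RegEx) {
        if(query.type != QueryTypeSketch::None) {
            warningsTo.emplace("Query is not a RegEx query.");
        }

        return false;
    }

    try {
        // std::regex is only a stand-in for whatever engine the library uses
        resultTo = std::regex_search(target, std::regex(query.expression));

        return true;
    }
    catch(const std::regex_error& e) {
        warningsTo.emplace(std::string("RegEx error: ") + e.what());

        return false;
    }
}
```

Note that the return value only says whether the query could be executed; whether the expression matched is reported separately through resultTo.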

◆ getConfig()

std::uint64_t crawlservpp::Module::Thread::getConfig ( ) const
[inherited]

Gets the ID of the configuration used by the thread.

Thread-safe: Can be used by both the module and the main thread, because the configuration is not changed after starting the thread.

Returns
The unique ID identifying the used configuration in the database.

References crawlservpp::Struct::ThreadOptions::config.

Referenced by crawlservpp::Module::Thread::Thread().

◆ getId()

std::uint64_t crawlservpp::Module::Thread::getId ( ) const
[inherited]

Gets the ID of the thread.

Thread-safe: Can be used by both the module and the main thread, because the ID is not changed after starting the thread.

Returns
The unique ID identifying the thread in the database.

◆ getLast()

std::uint64_t crawlservpp::Module::Thread::getLast ( ) const
[protected, inherited]

Gets the value of the last ID processed by the thread.

Warning
May only be used by the thread itself, not by the main thread!
Returns
The ID last processed by the thread, or zero if no ID has been processed yet.

Referenced by onInit(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), onReset(), crawlservpp::Module::Parser::Thread::onTick(), and crawlservpp::Module::Extractor::Thread::onTick().

◆ getMultiFromQuery()

bool crawlservpp::Query::Container::getMultiFromQuery ( const QueryStruct &  query,
std::vector< std::string > &  resultTo,
std::queue< std::string > &  warningsTo
)
[inline, protected, inherited]

Gets multiple results from a query of any type on the current query target.

Parameters
query  A constant reference to a structure identifying the query that will be performed.
resultTo  A reference to a vector to which the results of the query will be appended.
warningsTo  A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the execution of the query failed.
Exceptions
Container::Exception  if no query target has been specified or the query is of an unknown type.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultMulti, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and onReset().

◆ getMultiFromQueryOnSubSet()

bool crawlservpp::Query::Container::getMultiFromQueryOnSubSet ( const QueryStruct &  query,
std::vector< std::string > &  resultTo,
std::queue< std::string > &  warningsTo
)
[inline, protected, inherited]

Gets multiple results from a query of any type on the current subset.

Parameters
query  A constant reference to a structure identifying the query that will be performed.
resultTo  A reference to a vector to which the results of the query will be appended.
warningsTo  A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the execution of the query failed.
Exceptions
Container::Exception  if no query target or no subset has been specified, the current subset is invalid, or the given query is of an unknown type.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultMulti, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Extractor::Thread::onReset().

◆ getMultiFromRegEx()

bool crawlservpp::Query::Container::getMultiFromRegEx ( const QueryStruct &  query,
const std::string &  target,
std::vector< std::string > &  resultTo,
std::queue< std::string > &  warningsTo
) const
[inline, protected, inherited]

Gets multiple results from a RegEx query on a separate string.

Parameters
query  A constant reference to a structure identifying the RegEx query that will be performed.
target  A constant reference to a string containing the target on which the query will be performed.
resultTo  A reference to a vector to which the results of the query will be appended.
warningsTo  A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the query is of a different type or its execution failed.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Struct::QueryStruct::resultMulti, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Parser::Thread::onReset().

◆ getNumberOfSubSets()

std::size_t crawlservpp::Query::Container::getNumberOfSubSets ( ) const
[inline, protected, inherited]

Gets the number of subsets currently acquired.

Returns
The number of subsets generated by the last query that generated subsets as its result.

Referenced by crawlservpp::Module::Extractor::Thread::onReset().

◆ getProgress()

float crawlservpp::Module::Thread::getProgress ( ) const
[protected, inherited]

Gets the current progress, in percent.

Thread-safe: Can be used by both the module and the main thread.

Returns
The current progress of the thread, expressed as a value between 0.F (none, i.e. 0 %) and 1.F (done, i.e. 100 %).

Referenced by crawlservpp::Module::Parser::Thread::onReset(), and crawlservpp::Module::Extractor::Thread::onReset().
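
Since the returned progress lies between 0.F and 1.F, a caller that wants to display a percentage has to scale the value. The following sketch shows one way such a value could be computed and converted. Both functions are hypothetical helpers for illustration; how crawlserv++ calculates the progress internally is not described in this reference.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: progress as a value between 0.F (none) and 1.F (done).
float progressSketch(std::uint64_t processed, std::uint64_t total) {
    if(total == 0) {
        return 0.F; // nothing to do yet, avoid division by zero
    }

    if(processed >= total) {
        return 1.F; // clamp to "done"
    }

    return static_cast<float>(processed) / static_cast<float>(total);
}

// Hypothetical helper: converts the 0.F..1.F value to a percentage for display.
float toPercentSketch(float progress) {
    return progress * 100.F;
}
```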

◆ getSingleFromQuery()

bool crawlservpp::Query::Container::getSingleFromQuery ( const QueryStruct &  query,
std::string &  resultTo,
std::queue< std::string > &  warningsTo
)
[inline, protected, inherited]

Gets a single result from a query of any type on the current query target.

Parameters
query  A constant reference to a structure identifying the query that will be performed.
resultTo  A reference to a string to which the result of the query will be written.
warningsTo  A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the execution of the query failed.
Exceptions
Container::Exception  if no query target has been specified or the query is of an unknown type.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultSingle, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and onReset().

◆ getSingleFromQueryOnSubSet()

bool crawlservpp::Query::Container::getSingleFromQueryOnSubSet ( const QueryStruct &  query,
std::string &  resultTo,
std::queue< std::string > &  warningsTo
)
[inline, protected, inherited]

Gets a single result from a query of any type on the current subset.

Parameters
query  A constant reference to a structure identifying the query that will be performed.
resultTo  A reference to a string to which the result of the query will be written.
warningsTo  A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the execution of the query failed.
Exceptions
Container::Exception  if no query target or no subset has been specified, the current subset is invalid, or the given query is of an unknown type.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultSingle, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Extractor::Thread::onReset().

◆ getSingleFromRegEx()

bool crawlservpp::Query::Container::getSingleFromRegEx ( const QueryStruct query,
const std::string &  target,
std::string &  resultTo,
std::queue< std::string > &  warningsTo 
) const
inline protected inherited

Gets a single result from a RegEx query on a separate string.

Parameters
query  A constant reference to a structure identifying the RegEx query that will be performed.
target  A constant reference to a string containing the target on which the query will be performed.
resultTo  A reference to a string to which the result of the query will be written.
warningsTo  A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the query is of a different type or its execution failed.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Struct::QueryStruct::resultSingle, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and onReset().
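
The contract of getSingleFromRegEx() (result written to a string, warnings appended to a queue, boolean return value) can be illustrated with a minimal sketch. It uses std::regex as a stand-in, because the RegEx engine behind the container is not shown in this reference. The function name singleFromRegExSketch, the pattern passed as a plain string, and the choice to clear the result when nothing matches are assumptions made for illustration only.

```cpp
#include <cassert>
#include <queue>
#include <regex>
#include <string>

// Hypothetical stand-in: runs a RegEx pattern on a separate target string.
// Returns false (and appends a warning) if the pattern cannot be used.
bool singleFromRegExSketch(
        const std::string& pattern,
        const std::string& target,
        std::string& resultTo,
        std::queue<std::string>& warningsTo
) {
    try {
        const std::regex expression(pattern);
        std::smatch match;

        if(std::regex_search(target, match, expression)) {
            resultTo = match.str(0); // first full match only
        }
        else {
            resultTo.clear(); // assumption: the query ran, but nothing matched
        }

        return true;
    }
    catch(const std::regex_error& e) {
        warningsTo.emplace(std::string("RegEx error: ") + e.what());

        return false;
    }
}

// Usage example: extract the first run of digits from a string.
void exampleSingleFromRegEx() {
    std::string result;
    std::queue<std::string> warnings;

    assert(singleFromRegExSketch("[0-9]+", "id=1234;", result, warnings));
    assert(result == "1234");
    assert(warnings.empty());
}
```

In this sketch, a failed execution leaves the result string untouched, so the caller should check the return value before using the result.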

◆ getStatusMessage()

std::string crawlservpp::Module::Thread::getStatusMessage ( ) const
protected inherited

Gets the current status message.

Thread-safe: Can be used by both the module and the main thread.

Returns
A copy of the current status message.

Referenced by crawlservpp::Module::Thread::log(), crawlservpp::Module::Parser::Thread::onClear(), crawlservpp::Module::Extractor::Thread::onClear(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and onReset().

◆ getTarget()

bool crawlservpp::Query::Container::getTarget ( std::string &  targetTo)
inline protected inherited

Gets the current query target, if available, and writes it to the given string.

Parameters
targetTo  Reference to a string the query target will be written to, if one is available. Its content will not be changed if no query target is available.
Returns
True, if a query target was available and has been written to the referenced string. False, if no query target was available and the referenced string has not been changed.

Referenced by crawlservpp::Module::Extractor::Thread::onReset().

◆ getUrlList()

std::uint64_t crawlservpp::Module::Thread::getUrlList ( ) const
inherited

Gets the ID of the URL list used by the thread.

Thread-safe: Can be used by both the module and the main thread, because the URL list is not changed after starting the thread.

Returns
The unique ID identifying the used URL list in the database.

References crawlservpp::Struct::ThreadOptions::urlList.

Referenced by crawlservpp::Module::Thread::Thread().

◆ getWarpedOverAndReset()

std::int64_t crawlservpp::Module::Thread::getWarpedOverAndReset ( )
protected inherited

Gets the number of IDs that have been jumped over, and resets them.

Resets the number of IDs jumped over to zero.

Warning
May only be used by the thread itself, not by the main thread!
Returns
The number of IDs that have been jumped over due to a call to warpTo(), or zero if no IDs have been jumped over since the last call to getWarpedOverAndReset(). The result may be negative if warpTo() resulted in a jump to a previous ID.

Referenced by onReset(), crawlservpp::Module::Parser::Thread::onTick(), and crawlservpp::Module::Extractor::Thread::onTick().
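
The read-and-reset behaviour of getWarpedOverAndReset() can be sketched as follows. This is a simplified illustration, not the thread's implementation: the class WarpCounterSketch, its members, and the way its warpTo() accumulates the signed distance are assumptions. Only the signed return type and the reset to zero are taken from the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

class WarpCounterSketch {
public:
    // Hypothetical counterpart of warpTo(): records the signed distance
    // between the target ID and the current last ID, then jumps there.
    void warpTo(std::uint64_t target) {
        this->warpedOver += static_cast<std::int64_t>(target)
                - static_cast<std::int64_t>(this->last);
        this->last = target;
    }

    // Returns the number of IDs jumped over and resets it to zero.
    std::int64_t getWarpedOverAndReset() {
        return std::exchange(this->warpedOver, 0);
    }

private:
    std::uint64_t last{0};
    std::int64_t warpedOver{0};
};

// Usage example: a forward jump followed by a backward jump.
void exampleWarpCounter() {
    WarpCounterSketch counter;

    counter.warpTo(10);
    assert(counter.getWarpedOverAndReset() == 10);
    assert(counter.getWarpedOverAndReset() == 0); // already reset

    counter.warpTo(4);
    assert(counter.getWarpedOverAndReset() == -6); // jump to a previous ID
}
```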

◆ getWebsite()

std::uint64_t crawlservpp::Module::Thread::getWebsite ( ) const
inherited

Gets the ID of the website used by the thread.

Thread-safe: Can be used by both the module and the main thread, because the website is not changed after starting the thread.

Returns
The unique ID identifying the used website in the database.

References crawlservpp::Struct::ThreadOptions::website.

Referenced by onReset(), and crawlservpp::Module::Thread::Thread().

◆ getXml()

bool crawlservpp::Query::Container::getXml ( std::string &  resultTo,
std::queue< std::string > &  warningsTo 
)
inline protected inherited

Parses the current query target as tidied XML and writes it to the given string.

Parameters
resultTo  Reference to a string the parsed query target will be written to.
warningsTo  Reference to a queue of strings to which warnings that occurred during parsing will be appended.
Returns
True, if the parsing was successful and the tidied XML was written to the given string. False, if the parsing was not successful and the given string has not been changed.

References crawlservpp::Parsing::XML::getContent().

Referenced by onReset().

◆ incrementLast()

void crawlservpp::Module::Thread::incrementLast ( )
protected inherited

Increments the last ID processed by the thread.

Also sets the number of processed IDs. If the ID has been processed, make sure to increment that number beforehand.

Warning
May only be used by the thread itself, not by the main thread!
See also
incrementProcessed, setLast, Main::Database::setThreadLast

References crawlservpp::Module::Thread::database, and crawlservpp::Module::Database::setThreadLast().

◆ incrementProcessed()

void crawlservpp::Module::Thread::incrementProcessed ( )
protected inherited

Increments the number of IDs processed by the thread.

Warning
May only be used by the thread itself, not by the main thread!
See also
setLast, incrementLast

Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and onReset().
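
Why the order of incrementProcessed() and incrementLast() matters can be shown with a small sketch. It assumes that updating the last ID also saves the current number of processed IDs, as the description of incrementLast() suggests. The class ProgressStateSketch and its savedLast and savedProcessed members are hypothetical and merely stand in for the values written to the database.

```cpp
#include <cassert>
#include <cstdint>

class ProgressStateSketch {
public:
    void incrementProcessed() {
        ++(this->processed);
    }

    void setLast(std::uint64_t lastId) {
        this->last = lastId;
        this->save();
    }

    void incrementLast() {
        ++(this->last);
        this->save();
    }

    // Stand-ins for the values saved to the database (assumption).
    std::uint64_t savedLast{0};
    std::uint64_t savedProcessed{0};

private:
    // Saves the last ID together with the number of processed IDs.
    void save() {
        this->savedLast = this->last;
        this->savedProcessed = this->processed;
    }

    std::uint64_t last{0};
    std::uint64_t processed{0};
};

// Usage example: correct order first, wrong order second.
void exampleProgressOrder() {
    ProgressStateSketch state;

    state.incrementProcessed(); // count the ID first...
    state.incrementLast();      // ...then save both values
    assert(state.savedLast == 1);
    assert(state.savedProcessed == 1);

    state.incrementLast();      // saved before counting
    state.incrementProcessed(); // the saved count now lags behind
    assert(state.savedLast == 2);
    assert(state.savedProcessed == 1);
}
```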

◆ isFinished()

bool crawlservpp::Module::Thread::isFinished ( ) const
inherited

Checks whether the shutdown of the thread has been finished.

Thread-safe: Can be used by both the module and the main thread.

Returns
True, if the thread has been completely shut down. False otherwise.

◆ isInterrupted()

bool crawlservpp::Module::Thread::isInterrupted ( ) const
protected inherited

Checks whether the thread has been interrupted.

Thread-safe: Can be used by both the module and the main thread.

Returns
True if the thread has been interrupted. False otherwise.

◆ isLogLevel()

bool crawlservpp::Module::Thread::isLogLevel ( std::uint8_t  level) const
protected inherited

Checks whether a certain logging level is enabled.

Warning
May only be used by the thread itself, not by the main thread!
Parameters
level  The logging level to be checked for.
Returns
True, if the current logging level is at least as high as the given level. False, if the current logging level is lower than the given one.

References crawlservpp::Module::Thread::database, and crawlservpp::Module::Database::isLogLevel().

Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), onReset(), crawlservpp::Module::Parser::Thread::onTick(), and crawlservpp::Module::Extractor::Thread::onTick().

◆ isPaused()

bool crawlservpp::Module::Thread::isPaused ( ) const
inherited

Checks whether the thread has been paused.

Thread-safe: Can be used by both the module and the main thread.

Returns
True, if the thread has been paused. False otherwise.

◆ isQueryUsed()

bool crawlservpp::Query::Container::isQueryUsed ( std::uint64_t  queryId) const
inline inherited

Checks whether the specified query is used by the container.

Thread-safe. This function can be used by any thread.

Parameters
queryId  ID of the query to be checked.

◆ isRunning()

bool crawlservpp::Module::Thread::isRunning ( ) const
inherited

Checks whether the thread is still supposed to run.

Thread-safe: Can be used by both the module and the main thread.

Returns
True, if the thread has not been cancelled, even when it is paused. False, if the thread is not supposed to run any longer.

Referenced by crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Analyzer::Algo::TermsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::ExtractIds::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Assoc::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AllTokens::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Empty::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::TopicModelling::onAlgoInit(), crawlservpp::Module::Analyzer::Thread::onInit(), crawlservpp::Module::Parser::Thread::onInit(), crawlservpp::Module::Extractor::Thread::onInit(), onInit(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), onReset(), crawlservpp::Module::Parser::Thread::onTick(), crawlservpp::Module::Analyzer::Algo::ExtractIds::resetAlgo(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::Assoc::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AllTokens::resetAlgo(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo(), and crawlservpp::Module::Analyzer::Algo::TopicModelling::resetAlgo().

◆ isShutdown()

bool crawlservpp::Module::Thread::isShutdown ( ) const
inherited

Checks whether the thread is shutting down or has shut down.

Thread-safe: Can be used by both the module and the main thread.

Returns
True, if the thread is shutting down or has been shut down. False, if the thread is continuing to run.
See also
Thread::isFinished
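
The status checks isFinished(), isPaused(), isRunning(), and isShutdown() are documented as usable from both the module and the main thread. One common way to obtain such guarantees is to keep each flag in a std::atomic<bool>. The following sketch assumes that approach. The class ThreadStateSketch and its pause(), unpause(), stop(), and markFinished() members are hypothetical and not taken from the class documented here.

```cpp
#include <atomic>
#include <cassert>

class ThreadStateSketch {
public:
    // Hypothetical state changes.
    void pause() { this->paused.store(true); }
    void unpause() { this->paused.store(false); }
    void stop() {
        this->running.store(false);
        this->shutdown.store(true);
    }
    void markFinished() { this->finished.store(true); }

    // Checks that may be called from any thread.
    bool isRunning() const { return this->running.load(); }
    bool isPaused() const { return this->paused.load(); }
    bool isShutdown() const { return this->shutdown.load(); }
    bool isFinished() const { return this->finished.load(); }

private:
    std::atomic<bool> running{true};
    std::atomic<bool> paused{false};
    std::atomic<bool> shutdown{false};
    std::atomic<bool> finished{false};
};

// Usage example: a paused thread still counts as running.
void exampleThreadState() {
    ThreadStateSketch state;

    state.pause();
    assert(state.isRunning());
    assert(state.isPaused());

    state.stop();
    assert(!state.isRunning());
    assert(state.isShutdown());
    assert(!state.isFinished()); // shutting down, but not finished yet

    state.markFinished();
    assert(state.isFinished());
}
```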

◆ log() [1/2]

void crawlservpp::Module::Thread::log ( std::uint8_t  level,
const std::string &  logEntry 
)
protected inherited

Adds a thread-specific log entry to the database, if the current logging level is high enough.

Removes invalid UTF-8 characters if necessary.

If debug logging is active, the entry will be written to the logging file as well.

The log entry will not be written to the database if the current logging level is lower than the specified logging level. The logging level does not affect whether entries are written to the logging file when debug logging is active.

Warning
May only be used by the thread itself, not by the main thread!
Note
String views cannot be used, because they are not supported by the API for the MySQL database.
Parameters
level  The logging level for the entry. The entry will only be written to the database, if the current logging level is at least the logging level for the entry.
logEntry  Constant reference to a string containing the log entry.
See also
Module::Database::log(std::uint8_t, const std::string&)

References crawlservpp::Module::Thread::database, and crawlservpp::Module::Database::log().

Referenced by crawlservpp::Module::Analyzer::Thread::addCorpora(), crawlservpp::Module::Analyzer::Algo::TopicModelling::checkAlgoOptions(), crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Analyzer::Thread::finished(), crawlservpp::Module::Thread::log(), crawlservpp::Module::Analyzer::Algo::TermsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::ExtractIds::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Assoc::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AllTokens::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Empty::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::TopicModelling::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AllTokens::onAlgoTick(), crawlservpp::Module::Parser::Thread::onClear(), crawlservpp::Module::Extractor::Thread::onClear(), onClear(), crawlservpp::Module::Analyzer::Thread::onReset(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), onReset(), crawlservpp::Module::Parser::Thread::onTick(), crawlservpp::Module::Extractor::Thread::onTick(), crawlservpp::Module::Analyzer::Algo::TermsOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::ExtractIds::resetAlgo(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::Assoc::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AllTokens::resetAlgo(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::TopicModelling::resetAlgo(), and crawlservpp::Module::Analyzer::Thread::uploadResult().

◆ log() [2/2]

void crawlservpp::Module::Thread::log ( std::uint8_t  level,
std::queue< std::string > &  logEntries 
)
protected inherited

Adds multiple thread-specific log entries to the database, if the current logging level is high enough.

Removes invalid UTF-8 characters if necessary.

If debug logging is active, the entries will be written to the logging file as well.

The log entries will not be written to the database if the current logging level is lower than the specified logging level. The logging level does not affect whether entries are written to the logging file when debug logging is active.

Warning
May only be used by the thread itself, not by the main thread!
Note
String views cannot be used, because they are not supported by the API for the MySQL database.
Parameters
level  The logging level for the entries. The entries will only be written to the database, if the current logging level is at least the logging level for the entries.
logEntries  Reference to a queue of strings containing the log entries to be written. It will be emptied regardless of whether the log entries are written to the database.
See also
Module::Database::log(std::uint8_t, std::queue<std::string>&)

References crawlservpp::Main::Database::connect(), crawlservpp::Module::Thread::database, crawlservpp::Module::Thread::getStatusMessage(), crawlservpp::Main::Database::getThreadPauseTime(), crawlservpp::Main::Database::getThreadRunTime(), crawlservpp::Module::Database::log(), crawlservpp::Module::Thread::log(), crawlservpp::Helper::DateTime::now(), crawlservpp::Module::Thread::onClear(), crawlservpp::Module::Thread::onInit(), crawlservpp::Module::Thread::onPause(), crawlservpp::Module::Thread::onReset(), crawlservpp::Module::Thread::onTick(), crawlservpp::Module::Thread::onUnpause(), crawlservpp::Module::Thread::pause(), crawlservpp::Module::Thread::pauseByThread(), crawlservpp::Module::Database::prepare(), crawlservpp::Helper::DateTime::secondsToString(), crawlservpp::Module::Thread::setLast(), crawlservpp::Module::Thread::setStatusMessage(), crawlservpp::Main::Database::setThreadPauseTime(), crawlservpp::Main::Database::setThreadRunTime(), and crawlservpp::Module::sleepOnConnectionErrorS.
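
The filtering rules described for log() can be summarized in a short sketch: entries reach the database only if the current logging level is high enough, they reach the logging file whenever debug logging is active, and the queue is emptied in every case. The class ThreadLogSketch and its two vectors, which stand in for the database and the logging file, are assumptions for illustration. The removal of invalid UTF-8 characters is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <string>
#include <vector>

class ThreadLogSketch {
public:
    ThreadLogSketch(std::uint8_t level, bool isDebug)
            : currentLevel(level), debugLogging(isDebug) {}

    // True, if the current logging level is at least the given level.
    bool isLogLevel(std::uint8_t level) const {
        return this->currentLevel >= level;
    }

    // Writes the entries and always empties the queue.
    void log(std::uint8_t level, std::queue<std::string>& logEntries) {
        const bool toDatabase{this->isLogLevel(level)};

        while(!logEntries.empty()) {
            if(toDatabase) {
                this->database.push_back(logEntries.front());
            }

            if(this->debugLogging) {
                this->file.push_back(logEntries.front());
            }

            logEntries.pop();
        }
    }

    std::vector<std::string> database; // stands in for the database log
    std::vector<std::string> file;     // stands in for the logging file

private:
    std::uint8_t currentLevel;
    bool debugLogging;
};

// Usage example: the entry level exceeds the current level, debug logging is on.
void exampleThreadLog() {
    ThreadLogSketch logger(1, true);
    std::queue<std::string> entries;

    entries.push("first");
    entries.push("second");

    logger.log(2, entries);

    assert(entries.empty());         // emptied regardless
    assert(logger.database.empty()); // level too low for the database
    assert(logger.file.size() == 2); // still written to the logging file
}
```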

◆ nextSubSet()

bool crawlservpp::Query::Container::nextSubSet ( )
inline protected inherited

Requests the next subset for all subsequent queries.

Returns
True, if another subset existed that will be used by subsequent queries. False, if no more subsets exist.
Exceptions
Container::Exception  if an invalid subset had previously been selected.

References crawlservpp::Helper::Memory::free(), crawlservpp::Helper::Json::free(), crawlservpp::Helper::Memory::freeIf(), crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, and crawlservpp::Struct::QueryStruct::typeXPath.

Referenced by crawlservpp::Module::Extractor::Thread::onReset().
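
Because nextSubSet() returns false once no more subsets exist, it lends itself to a simple loop over all subsets. The sketch below illustrates that pattern with a hypothetical cursor class. SubSetCursorSketch, currentSubSet(), and the representation of subsets as plain strings are assumptions, and the exception thrown for invalid subsets is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class SubSetCursorSketch {
public:
    explicit SubSetCursorSketch(std::vector<std::string> subSetsToUse)
            : subSets(std::move(subSetsToUse)) {}

    // Moves on to the next subset; returns false once all have been used.
    bool nextSubSet() {
        if(this->current >= this->subSets.size()) {
            return false;
        }

        ++(this->current);

        return true;
    }

    // Returns the subset selected by the last successful nextSubSet() call.
    const std::string& currentSubSet() const {
        return this->subSets.at(this->current - 1);
    }

private:
    std::vector<std::string> subSets;
    std::size_t current{0}; // 0 means that no subset has been selected yet
};

// Usage example: visit every subset exactly once.
std::vector<std::string> exampleVisitSubSets() {
    SubSetCursorSketch cursor(std::vector<std::string>{"first", "second"});
    std::vector<std::string> visited;

    while(cursor.nextSubSet()) {
        visited.push_back(cursor.currentSubSet());
    }

    return visited;
}
```

In the documented container, queries such as getSingleFromQueryOnSubSet() would take the place of currentSubSet() inside the loop.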

◆ onClear()

◆ onInit()

void crawlservpp::Module::Crawler::Thread::onInit ( )
override protected virtual

Initializes the crawler.

Exceptions
Module::Crawler::Thread::Exception  if no query for link extraction has been specified.

Implements crawlservpp::Module::Thread.

References crawlservpp::Module::Thread::getLast(), and crawlservpp::Module::Thread::isRunning().

Referenced by onReset().

◆ onPause()

void crawlservpp::Module::Crawler::Thread::onPause ( )
override protected virtual

Pauses the crawler.

Stores the current time to keep track of how long the crawler is paused.

Implements crawlservpp::Module::Thread.

References crawlservpp::Helper::DateTime::now().

◆ onReset()

void crawlservpp::Module::Crawler::Thread::onReset ( )
override protected virtual

Resets the crawler.

Implements crawlservpp::Module::Thread.

References crawlservpp::Network::TorControl::active(), crawlservpp::Query::Container::addQuery(), crawlservpp::Module::Crawler::Database::addUrlIfNotExists(), crawlservpp::Module::Crawler::Database::addUrlsIfNotExist(), crawlservpp::Helper::Container::append(), crawlservpp::Module::Crawler::archiveMementoContentType, crawlservpp::Module::Crawler::archiveRefString, crawlservpp::Module::Crawler::archiveRefTimeStampLength, crawlservpp::Module::Crawler::archiveRenewUrlLockEveryMs, crawlservpp::Struct::CrawlTimersTick::archives, crawlservpp::Struct::CrawlStatsTick::checkedUrls, crawlservpp::Struct::CrawlStatsTick::checkedUrlsArchive, crawlservpp::Query::Container::clearQueryTarget(), crawlservpp::Module::Crawler::Config::config, crawlservpp::Helper::DateTime::convertLongDateTimeToSQLTimeStamp(), crawlservpp::Helper::DateTime::convertSQLTimeStampToTimeStamp(), crawlservpp::Helper::DateTime::convertTimeStampToSQLTimeStamp(), crawlservpp::Module::Crawler::Config::Entries::crawlerArchives, crawlservpp::Module::Crawler::Config::Entries::crawlerArchivesNames, crawlservpp::Module::Crawler::Config::Entries::crawlerArchivesUrlsMemento, crawlservpp::Module::Crawler::Config::Entries::crawlerArchivesUrlsSkip, crawlservpp::Module::Crawler::Config::Entries::crawlerArchivesUrlsTimemap, crawlservpp::Module::Crawler::Config::Entries::crawlerLogging, crawlservpp::Module::Crawler::crawlerLoggingDefault, crawlservpp::Module::Crawler::crawlerLoggingExtended, crawlservpp::Module::Crawler::crawlerLoggingVerbose, crawlservpp::Module::Crawler::Config::Entries::crawlerMaxBatchSize, crawlservpp::Module::Crawler::Config::Entries::crawlerParamsAdd, crawlservpp::Module::Crawler::Config::Entries::crawlerParamsBlackList, crawlservpp::Module::Crawler::Config::Entries::crawlerParamsWhiteList, crawlservpp::Module::Crawler::Config::Entries::crawlerQueriesBlackListContent, crawlservpp::Module::Crawler::Config::Entries::crawlerQueriesBlackListTypes, 
crawlservpp::Module::Crawler::Config::Entries::crawlerQueriesBlackListUrls, crawlservpp::Module::Crawler::Config::Entries::crawlerQueriesLinks, crawlservpp::Module::Crawler::Config::Entries::crawlerQueriesLinksBlackListContent, crawlservpp::Module::Crawler::Config::Entries::crawlerQueriesLinksBlackListTypes, crawlservpp::Module::Crawler::Config::Entries::crawlerQueriesLinksBlackListUrls, crawlservpp::Module::Crawler::Config::Entries::crawlerQueriesLinksWhiteListContent, crawlservpp::Module::Crawler::Config::Entries::crawlerQueriesLinksWhiteListTypes, crawlservpp::Module::Crawler::Config::Entries::crawlerQueriesLinksWhiteListUrls, crawlservpp::Module::Crawler::Config::Entries::crawlerQueriesWhiteListContent, crawlservpp::Module::Crawler::Config::Entries::crawlerQueriesWhiteListTypes, crawlservpp::Module::Crawler::Config::Entries::crawlerQueriesWhiteListUrls, crawlservpp::Module::Crawler::Config::Entries::crawlerReCrawl, crawlservpp::Module::Crawler::Config::Entries::crawlerReCrawlStart, crawlservpp::Module::Crawler::Config::Entries::crawlerRemoveXmlInstructions, crawlservpp::Module::Crawler::Config::Entries::crawlerRepairCData, crawlservpp::Module::Crawler::Config::Entries::crawlerRepairComments, crawlservpp::Module::Crawler::Config::Entries::crawlerRestartAfter, crawlservpp::Module::Crawler::Config::Entries::crawlerReTries, crawlservpp::Module::Crawler::Config::Entries::crawlerRetryArchive, crawlservpp::Module::Crawler::Config::Entries::crawlerRetryEmpty, crawlservpp::Module::Crawler::Config::Entries::crawlerRetryHttp, crawlservpp::Module::Crawler::Config::Entries::crawlerSleepError, crawlservpp::Module::Crawler::Config::Entries::crawlerSleepHttp, crawlservpp::Module::Crawler::Config::Entries::crawlerSleepIdle, crawlservpp::Module::Crawler::Config::Entries::crawlerSleepMySql, crawlservpp::Module::Crawler::Config::Entries::crawlerStart, crawlservpp::Module::Crawler::Config::Entries::crawlerStartIgnore, 
crawlservpp::Module::Crawler::Config::Entries::crawlerTidyWarnings, crawlservpp::Module::Crawler::Config::Entries::crawlerTiming, crawlservpp::Module::Crawler::Config::Entries::crawlerUrlCaseSensitive, crawlservpp::Module::Crawler::Config::Entries::crawlerUrlChunks, crawlservpp::Module::Crawler::Config::Entries::crawlerUrlDebug, crawlservpp::Module::Crawler::Config::Entries::crawlerUrlMaxLength, crawlservpp::Module::Crawler::Config::Entries::crawlerUrlStartupCheck, crawlservpp::Module::Crawler::Config::Entries::crawlerWarningsFile, crawlservpp::Module::Crawler::Config::Entries::crawlerXml, crawlservpp::Module::Crawler::Config::Entries::customCounters, crawlservpp::Module::Crawler::Config::Entries::customCountersAlias, crawlservpp::Module::Crawler::Config::Entries::customCountersAliasAdd, crawlservpp::Module::Crawler::Config::Entries::customCountersEnd, crawlservpp::Module::Crawler::Config::Entries::customCountersGlobal, crawlservpp::Module::Crawler::Config::Entries::customCountersStart, crawlservpp::Module::Crawler::Config::Entries::customCountersStep, crawlservpp::Module::Crawler::Config::Entries::customReCrawl, crawlservpp::Module::Crawler::Config::Entries::customRobots, crawlservpp::Module::Crawler::Config::Entries::customTokenHeaders, crawlservpp::Module::Crawler::Config::Entries::customTokens, crawlservpp::Module::Crawler::Config::Entries::customTokensCookies, crawlservpp::Module::Crawler::Config::Entries::customTokensKeep, crawlservpp::Module::Crawler::Config::Entries::customTokensRequired, crawlservpp::Module::Crawler::Config::Entries::customTokensSource, crawlservpp::Module::Crawler::Config::Entries::customTokensUsePost, crawlservpp::Module::Crawler::Config::Entries::customUrls, crawlservpp::Module::Crawler::Config::Entries::customUsePost, database, crawlservpp::Module::Thread::end(), crawlservpp::Module::Crawler::Config::Entries::expectedErrorIfLarger, crawlservpp::Module::Crawler::Config::Entries::expectedErrorIfSmaller, 
crawlservpp::Module::Crawler::Config::Entries::expectedQuery, crawlservpp::Query::Container::getBoolFromQuery(), crawlservpp::Query::Container::getBoolFromRegEx(), crawlservpp::Wrapper::Database::getConfiguration(), crawlservpp::Network::Curl::getContent(), crawlservpp::Network::Curl::getContentType(), crawlservpp::Network::Curl::getCurlCode(), crawlservpp::Module::Thread::getLast(), crawlservpp::Query::Container::getMultiFromQuery(), crawlservpp::Module::Crawler::Database::getNextUrl(), crawlservpp::Module::Crawler::Database::getNumberOfUrls(), crawlservpp::Network::Config::getProtocol(), crawlservpp::Network::Curl::getPublicIp(), crawlservpp::Wrapper::Database::getQueryProperties(), crawlservpp::Network::Curl::getResponseCode(), crawlservpp::Query::Container::getSingleFromQuery(), crawlservpp::Query::Container::getSingleFromRegEx(), crawlservpp::Module::Thread::getStatusMessage(), crawlservpp::Parsing::URI::getSubUri(), crawlservpp::Module::Crawler::Database::getUrlId(), crawlservpp::Module::Crawler::Database::getUrlPosition(), crawlservpp::Module::Thread::getWarpedOverAndReset(), crawlservpp::Module::Thread::getWebsite(), crawlservpp::Wrapper::Database::getWebsiteDomain(), crawlservpp::Query::Container::getXml(), crawlservpp::Struct::CrawlTimersContent::http, crawlservpp::Module::Crawler::httpIgnoreString, crawlservpp::Module::Crawler::httpResponseCodeIgnore, crawlservpp::Module::Crawler::httpResponseCodeMax, crawlservpp::Module::Crawler::httpResponseCodeMin, crawlservpp::Module::Crawler::httpsIgnoreString, crawlservpp::Module::Crawler::httpsString, crawlservpp::Module::Crawler::httpString, crawlservpp::Module::Thread::incrementProcessed(), crawlservpp::Wrapper::DatabaseTryLock< DB >::isActive(), crawlservpp::Module::Crawler::Database::isArchivedContentExists(), crawlservpp::Module::Thread::isLogLevel(), crawlservpp::Module::Thread::isRunning(), crawlservpp::Parsing::URI::isSameDomain(), crawlservpp::Module::Crawler::Database::isUrlCrawled(), 
crawlservpp::Helper::Utf8::isValidUtf8(), crawlservpp::Module::Config::loadConfig(), crawlservpp::Helper::CommaLocale::locale(), crawlservpp::Module::Crawler::Database::lockUrlIfOk(), crawlservpp::Module::Thread::log(), crawlservpp::Parsing::URI::makeAbsolute(), crawlservpp::Network::Config::networkConfig, networking, networkOptions, crawlservpp::Network::TorControl::newIdentity(), crawlservpp::Struct::CrawlStatsTick::newUrls, crawlservpp::Struct::CrawlStatsTick::newUrlsArchive, crawlservpp::Helper::DateTime::now(), onClear(), onInit(), crawlservpp::Struct::CrawlTimersContent::parse, crawlservpp::Parsing::URI::parseLink(), crawlservpp::Module::Thread::pauseByThread(), crawlservpp::Module::Crawler::Database::prepare(), crawlservpp::Module::Crawler::Config::Entries::redirectCookies, crawlservpp::Module::Crawler::Config::Entries::redirectHeaders, crawlservpp::Module::Crawler::Config::Entries::redirectQueryContent, crawlservpp::Module::Crawler::Config::Entries::redirectQueryUrl, crawlservpp::Module::Crawler::redirectSourceContent, crawlservpp::Module::Crawler::redirectSourceUrl, crawlservpp::Module::Crawler::Config::Entries::redirectTo, crawlservpp::Module::Crawler::Config::Entries::redirectUsePost, crawlservpp::Module::Crawler::Config::Entries::redirectVarNames, crawlservpp::Module::Crawler::Config::Entries::redirectVarSources, crawlservpp::Helper::Strings::replaceAll(), crawlservpp::Network::Config::resetBase(), crawlservpp::Network::Curl::resetConnection(), crawlservpp::Network::Config::Entries::resetTor, crawlservpp::Network::Config::Entries::resetTorAfter, crawlservpp::Network::Config::Entries::resetTorOnlyAfter, crawlservpp::Module::Crawler::robotsFirstLetters, crawlservpp::Module::Crawler::robotsMinLineLength, crawlservpp::Module::Crawler::robotsRelativeUrl, crawlservpp::Module::Crawler::robotsSitemapBegin, crawlservpp::Module::Crawler::Database::saveArchivedContent(), crawlservpp::Module::Crawler::Database::saveContent(), 
crawlservpp::Struct::CrawlTimersTick::select, crawlservpp::Network::Curl::setConfigCurrent(), crawlservpp::Network::Curl::setConfigGlobal(), crawlservpp::Network::Curl::setCookies(), crawlservpp::Module::Crawler::Config::setCrossDomain(), crawlservpp::Parsing::URI::setCurrentDomain(), crawlservpp::Parsing::URI::setCurrentOrigin(), crawlservpp::Network::Curl::setHeaders(), crawlservpp::Module::Thread::setLast(), crawlservpp::Wrapper::Database::setLogging(), crawlservpp::Module::Crawler::Database::setMaxBatchSize(), crawlservpp::Network::TorControl::setNewIdentityMax(), crawlservpp::Network::TorControl::setNewIdentityMin(), crawlservpp::Module::Thread::setProgress(), crawlservpp::Query::Container::setQueryTarget(), crawlservpp::Module::Crawler::Database::setRecrawl(), crawlservpp::Query::Container::setRemoveXmlInstructions(), crawlservpp::Query::Container::setRepairCData(), crawlservpp::Query::Container::setRepairComments(), crawlservpp::Wrapper::Database::setSleepOnError(), crawlservpp::Module::Thread::setStatusMessage(), crawlservpp::Query::Container::setTidyErrorsAndWarnings(), crawlservpp::Module::Crawler::Database::setUrlCaseSensitive(), crawlservpp::Module::Crawler::Database::setUrlDebug(), crawlservpp::Module::Crawler::Database::setUrlFinishedIfOk(), crawlservpp::Module::Crawler::Database::setUrlStartupCheck(), crawlservpp::Struct::CrawlTimersContent::sleep, crawlservpp::Module::Thread::sleep(), crawlservpp::Helper::Strings::sortAndRemoveDuplicates(), crawlservpp::Timer::StartStop::start(), crawlservpp::Timer::StartStop::stop(), crawlservpp::Timer::Simple::tick(), torControl, crawlservpp::Struct::CrawlTimersTick::total, crawlservpp::Timer::StartStop::totalStr(), crawlservpp::Helper::Strings::trim(), crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Parsing::URI::unescape(), crawlservpp::Module::Crawler::Database::unLockUrlIfOk(), crawlservpp::Network::Curl::unsetCookies(), crawlservpp::Network::Curl::unsetHeaders(), 
crawlservpp::Struct::CrawlTimersContent::update, crawlservpp::Module::Crawler::updateCustomUrlCountEvery, crawlservpp::Module::Crawler::Database::urlDuplicationCheck(), crawlservpp::Module::Crawler::Database::urlEmptyCheck(), crawlservpp::Module::Crawler::Database::urlHashCheck(), crawlservpp::Module::Thread::urlListNamespace, crawlservpp::Struct::CrawlStatsTick::urlLockTimeArchiveMs, crawlservpp::Main::Exception::view(), crawlservpp::Module::Thread::websiteNamespace, and crawlservpp::Module::Crawler::wwwString.

◆ onTick()

void crawlservpp::Module::Crawler::Thread::onTick ( )
override protected virtual

◆ onUnpause()

void crawlservpp::Module::Crawler::Thread::onUnpause ( )
override protected virtual

Unpauses the crawler.

Calculates how long the crawler was paused.

Implements crawlservpp::Module::Thread.

References crawlservpp::Helper::DateTime::now().
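
The interplay of onPause() and onUnpause() amounts to remembering when a pause began and adding the elapsed time once it ends. The following sketch shows this bookkeeping with std::chrono::steady_clock. The class PauseTimerSketch, its totalPaused() member, and the optional time-point parameters (added to make the example deterministic) are assumptions. The crawler itself references Helper::DateTime::now(), whose interface is not shown here.

```cpp
#include <cassert>
#include <chrono>

class PauseTimerSketch {
public:
    using Clock = std::chrono::steady_clock;

    // Stores the time at which the pause began.
    void onPause(Clock::time_point now = Clock::now()) {
        this->pausedAt = now;
    }

    // Adds the duration of the pause that just ended to the total.
    void onUnpause(Clock::time_point now = Clock::now()) {
        this->pausedTotal += now - this->pausedAt;
    }

    std::chrono::milliseconds totalPaused() const {
        return std::chrono::duration_cast<std::chrono::milliseconds>(
                this->pausedTotal
        );
    }

private:
    Clock::time_point pausedAt{};
    Clock::duration pausedTotal{Clock::duration::zero()};
};

// Usage example: two pauses of three and two seconds.
void examplePauseTimer() {
    PauseTimerSketch timer;
    const PauseTimerSketch::Clock::time_point start{
            PauseTimerSketch::Clock::now()
    };

    timer.onPause(start);
    timer.onUnpause(start + std::chrono::seconds(3));
    timer.onPause(start + std::chrono::seconds(10));
    timer.onUnpause(start + std::chrono::seconds(12));

    assert(timer.totalPaused() == std::chrono::milliseconds(5000));
}
```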

◆ pauseByThread()

void crawlservpp::Module::Thread::pauseByThread ( )
protected inherited

◆ reserveForSubSets()

◆ reset()

void crawlservpp::Module::Thread::reset ( )
inherited

Will reset the thread before the next tick.

◆ setLast()

void crawlservpp::Module::Thread::setLast ( std::uint64_t  lastId)
protected inherited

Sets the last ID processed by the thread.

Also sets the number of processed IDs. If the ID has been processed, make sure to increment that number beforehand.

Warning
May only be used by the thread itself, not by the main thread!
Parameters
lastId  The last ID processed by the thread.
See also
incrementProcessed, incrementLast, Main::Database::setThreadLast

References crawlservpp::Module::Thread::database, and crawlservpp::Module::Database::setThreadLast().

Referenced by crawlservpp::Module::Thread::log(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and onReset().

◆ setMinimizeMemory()

void crawlservpp::Query::Container::setMinimizeMemory ( bool  isMinimizeMemory)
inline protected inherited

Sets whether to minimize memory usage.

Note
Setting memory minimization to true might negatively affect performance.
Parameters
isMinimizeMemory  Set whether to minimize memory usage, prioritizing low memory usage over performance.

Referenced by crawlservpp::Module::Extractor::Thread::onReset().

◆ setProgress()

void crawlservpp::Module::Thread::setProgress ( float  newProgress)
protected inherited

Sets the progress of the thread.

Warning
May only be used by the thread itself, not by the main thread!
Parameters
newProgress  The new progress of the thread, between 0.f (none) and 1.f (done).
See also
Main::Database::setThreadProgress

References crawlservpp::Module::Thread::database, and crawlservpp::Module::Database::setThreadProgress().

Referenced by crawlservpp::Module::Analyzer::Thread::finished(), crawlservpp::Module::Analyzer::Algo::TermsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::ExtractIds::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Assoc::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AllTokens::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Empty::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::TopicModelling::onAlgoInit(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), onReset(), crawlservpp::Module::Analyzer::Thread::onTick(), crawlservpp::Module::Parser::Thread::onTick(), crawlservpp::Module::Extractor::Thread::onTick(), crawlservpp::Module::Analyzer::Algo::TermsOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::ExtractIds::resetAlgo(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::Assoc::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AllTokens::resetAlgo(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo(), and crawlservpp::Module::Analyzer::Algo::TopicModelling::resetAlgo().
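
Since setProgress() expects a value between 0.f and 1.f, callers typically need to derive that fraction from a position and a total. The helper below is a minimal sketch of such a calculation. The function progressSketch and its parameters are hypothetical, and how the modules actually compute their progress is not documented here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: converts a position within a total into a progress
// value between 0.f (none) and 1.f (done).
float progressSketch(std::uint64_t position, std::uint64_t total) {
    if(total == 0) {
        return 0.f; // avoid dividing by zero if there is nothing to process
    }

    const float fraction{
            static_cast<float>(position) / static_cast<float>(total)
    };

    // Keep the result inside the documented range.
    return std::min(1.f, std::max(0.f, fraction));
}

// Usage example.
void exampleProgress() {
    assert(progressSketch(0, 10) == 0.f);
    assert(progressSketch(5, 10) == 0.5f);
    assert(progressSketch(10, 10) == 1.f);
}
```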

◆ setQueryTarget()

void crawlservpp::Query::Container::setQueryTarget (const std::string & content, const std::string & source)
inline protected inherited

Sets the content to use the managed queries on.

The old query target referencing the old content will be cleared.

Warning
Pointers to the strings will be saved in-class. Make sure the strings remain valid as long as they are used!
Parameters
content  Constant reference to a string containing the content to use the managed queries on.
source  Constant reference to a string containing the source (URL) of the content. It will be used for logging and error reporting purposes only.
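
The lifetime requirement in the warning above can be illustrated with a small stand-in class. This is only a sketch under the assumption that the container keeps raw pointers to both strings. TargetHolder and its member names are hypothetical and do not reflect the actual implementation of Query::Container.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in (not Query::Container itself): stores pointers to the
// strings instead of copying them, as the warning above describes.
class TargetHolder {
public:
    void setTarget(const std::string& content, const std::string& source) {
        this->clearTarget(); // the old target is cleared first

        this->contentPtr = &content;
        this->sourcePtr = &source;
    }

    void clearTarget() {
        this->contentPtr = nullptr;
        this->sourcePtr = nullptr;
    }

    bool hasTarget() const {
        return this->contentPtr != nullptr;
    }

    // only valid while the strings passed to setTarget() are still alive
    std::size_t targetSize() const {
        return this->contentPtr != nullptr ? this->contentPtr->size() : 0;
    }

    // returns a copy of the source, or an empty string if no target is set
    std::string targetSource() const {
        return this->sourcePtr != nullptr ? *(this->sourcePtr) : std::string{};
    }

private:
    const std::string* contentPtr{nullptr};
    const std::string* sourcePtr{nullptr};
};
```

With such a design, passing a temporary string would leave a dangling pointer behind once the call returns, which is why the strings need to outlive every query that is run on them.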

References crawlservpp::Query::Container::clearQueryTarget().

Referenced by crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and onReset().

◆ setRemoveXmlInstructions()

void crawlservpp::Query::Container::setRemoveXmlInstructions (bool isRemoveXmlInstructions)
inline protected inherited

Sets whether to remove XML processing instructions (<?xml:...>) before parsing HTML/XML content.

Parameters
isRemoveXmlInstructions  Set whether to remove XML processing instructions.
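
What removing processing instructions before parsing might look like can be sketched as follows. The helper removeXmlInstructions() is hypothetical. It strips every "<?" ... "?>" sequence, which is an assumption, since the exact matching rules used by crawlserv++ are not described here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of crawlserv++): removes every "<?" ... "?>"
// sequence from the content. An unterminated instruction is left untouched.
inline std::string removeXmlInstructions(const std::string& content) {
    std::string result;
    std::size_t pos{0};

    while(pos < content.size()) {
        const auto begin{content.find("<?", pos)};

        if(begin == std::string::npos) {
            break; // no further instruction
        }

        const auto end{content.find("?>", begin + 2)};

        if(end == std::string::npos) {
            break; // unterminated instruction, keep the rest as it is
        }

        result.append(content, pos, begin - pos); // keep text before instruction
        pos = end + 2;                            // skip past the closing "?>"
    }

    result.append(content, pos, std::string::npos); // keep the remainder

    return result;
}
```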

Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and onReset().

◆ setRepairCData()

void crawlservpp::Query::Container::setRepairCData (bool isRepairCData)
inline protected inherited

Sets whether to try to repair CData when parsing XML.

Parameters
isRepairCData  Set whether to try to repair CData when parsing XML.

Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and onReset().

◆ setRepairComments()

void crawlservpp::Query::Container::setRepairComments (bool isRepairComments)
inline protected inherited

Sets whether to try to repair broken HTML/XML comments.

Parameters
isRepairComments  Set whether to try to repair broken HTML/XML comments.

Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and onReset().

◆ setStatusMessage()

void crawlservpp::Module::Thread::setStatusMessage (const std::string & statusMessage)
protected inherited

Sets the status message of the thread.

Warning
May only be used by the thread itself, not by the main thread!
Note
String views cannot be used, because they are not supported by the API for the MySQL database.
Parameters
statusMessage  Constant reference to a string containing the new status message to be set.
See also
Main::Database::setThreadStatus

References crawlservpp::Module::Thread::database, and crawlservpp::Main::Database::setThreadStatus().

Referenced by crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Analyzer::Thread::finished(), crawlservpp::Module::Thread::log(), crawlservpp::Module::Analyzer::Algo::TermsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::ExtractIds::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Assoc::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::AllTokens::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::Empty::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::TopicModelling::onAlgoInit(), crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoTick(), crawlservpp::Module::Analyzer::Algo::AllTokens::onAlgoTick(), crawlservpp::Module::Analyzer::Algo::TopicModelling::onAlgoTick(), crawlservpp::Module::Parser::Thread::onClear(), crawlservpp::Module::Extractor::Thread::onClear(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), onReset(), crawlservpp::Module::Parser::Thread::onTick(), crawlservpp::Module::Extractor::Thread::onTick(), crawlservpp::Module::Analyzer::Algo::TermsOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::ExtractIds::resetAlgo(), crawlservpp::Module::Analyzer::Algo::WordsOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::Assoc::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AllTokens::resetAlgo(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo(), crawlservpp::Module::Analyzer::Algo::TopicModelling::resetAlgo(), and crawlservpp::Module::Analyzer::Thread::uploadResult().

◆ setSubSetsFromQuery()

bool crawlservpp::Query::Container::setSubSetsFromQuery (const QueryStruct & query, std::queue< std::string > & warningsTo)
inline protected inherited

Sets subsets for subsequent queries using a query of any type.

The subsets resulting from the query will be saved in-class. Previous subsets will be overwritten.

Parameters
query  A constant reference to a structure identifying the query that will be performed to acquire the subsets.
warningsTo  A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the execution of the query failed.
Exceptions
Container::Exception  if no query target has been specified or the query is of an unknown type.
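
The overwrite and warning behavior described above can be sketched with a simplified stand-in that only supports regular expressions. SubSetHolder and its members are hypothetical. The real function accepts a QueryStruct and supports several query types, none of which are reproduced here.

```cpp
#include <queue>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in (not the crawlserv++ implementation): keeps the
// subsets produced by the most recent query and reports problems as warnings.
class SubSetHolder {
public:
    // Runs a regular expression on the target; each match becomes one subset.
    // Returns false and appends a warning if the expression is invalid.
    bool setSubSetsFromRegEx(
            const std::string& target,
            const std::string& expression,
            std::queue<std::string>& warningsTo
    ) {
        this->subSets.clear(); // previous subsets are overwritten

        try {
            const std::regex regEx(expression);

            for(
                    std::sregex_iterator it(target.begin(), target.end(), regEx);
                    it != std::sregex_iterator();
                    ++it
            ) {
                this->subSets.emplace_back(it->str());
            }
        }
        catch(const std::regex_error& e) {
            warningsTo.emplace(std::string("Query failed: ") + e.what());

            return false;
        }

        return true;
    }

    const std::vector<std::string>& getSubSets() const {
        return this->subSets;
    }

private:
    std::vector<std::string> subSets;
};
```

In this sketch a failed query leaves the holder without any subsets. Whether the real container behaves the same way on failure is not stated in this reference.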

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultSubSets, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Extractor::Thread::onReset().

◆ setTidyErrorsAndWarnings()

void crawlservpp::Query::Container::setTidyErrorsAndWarnings (bool warnings, std::uint32_t numOfErrors)
inline protected inherited

Sets how tidy-html5 reports errors and warnings.

The reporting of both errors and warnings is deactivated by default.

For more information about tidy-html5, see its GitHub repository.

Parameters
warnings  Specify whether to report simple warnings.
numOfErrors  Set the number of errors to be reported. Set to zero to deactivate error reporting.

References crawlservpp::Parsing::XML::setOptions().

Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and onReset().

◆ sleep()

void crawlservpp::Module::Thread::sleep (std::uint64_t ms) const
protected inherited

Lets the thread sleep for the specified number of milliseconds.

The sleep will be interrupted if the thread is stopped.

Thread-safe: Can be used by both the module and the main thread.

Parameters
ms  The number of milliseconds for the thread to sleep, if it is not stopped.
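
A sleep that can be interrupted when the thread is stopped is commonly built from short slices with a check of a running flag in between. The sketch below shows that general technique. The names interruptibleSleep and sliceMs are hypothetical, and the slice length of 10 ms is an assumption, because the value of Module::sleepMs is not given in this reference.

```cpp
#include <algorithm>
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>

// Assumed slice length; the real value of Module::sleepMs is not given here.
inline constexpr std::uint64_t sliceMs{10};

// Hypothetical helper: sleeps for up to `ms` milliseconds in short slices and
// returns early once `running` becomes false. Returns the number of
// milliseconds requested from the system so far (not a measured duration).
inline std::uint64_t interruptibleSleep(
        std::uint64_t ms,
        const std::atomic<bool>& running
) {
    std::uint64_t slept{0};

    while(slept < ms && running.load()) {
        const std::uint64_t slice{std::min(sliceMs, ms - slept)};

        std::this_thread::sleep_for(std::chrono::milliseconds(slice));

        slept += slice;
    }

    return slept;
}
```

The shorter the slice, the faster the thread reacts to being stopped, at the cost of waking up more often.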

References crawlservpp::Module::sleepMs.

Referenced by crawlservpp::Module::Analyzer::Algo::CorpusGenerator::onAlgoTick(), onReset(), crawlservpp::Module::Analyzer::Thread::onTick(), crawlservpp::Module::Parser::Thread::onTick(), and crawlservpp::Module::Extractor::Thread::onTick().

◆ warpTo()

void crawlservpp::Module::Thread::warpTo (std::uint64_t target)
inherited

Jumps to the specified target ID ("time travel").

Skips the normal process of determining the next ID once the current ID has been processed.

Thread-safe: Can be used by both the module and the main thread.

Parameters
target  The target ID that should be processed next.
Exceptions
Module::Thread::Exception  if no target is specified, i.e. the target ID is zero.
See also
getWarpedOverAndReset
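
The idea of overriding the next ID from another thread can be sketched with an atomic warp target that the worker picks up after finishing the current ID. IdCursor and its members are hypothetical, and the sketch throws std::invalid_argument instead of Module::Thread::Exception. It is not the crawlserv++ implementation.

```cpp
#include <atomic>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in (not the crawlserv++ implementation) for a thread that
// normally processes IDs in ascending order, but can be told to jump.
class IdCursor {
public:
    // May be called from another thread: stores the ID to process next.
    void warpTo(std::uint64_t target) {
        if(target == 0) {
            throw std::invalid_argument("warpTo(): no target specified");
        }

        this->warpTarget.store(target);
    }

    // Called by the worker after it has finished the current ID.
    std::uint64_t next() {
        // fetch and reset a pending warp target in one atomic step
        const std::uint64_t target{this->warpTarget.exchange(0)};

        this->current = target != 0 ? target : this->current + 1;

        return this->current;
    }

private:
    std::atomic<std::uint64_t> warpTarget{0};
    std::uint64_t current{0}; // only touched by the worker
};
```

Using zero as the "no target" marker is what makes a zero target ID invalid in this sketch, which matches the exception condition documented above.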

Member Data Documentation

◆ configuration

std::string crawlservpp::Module::Thread::configuration
protected inherited

JSON string of the configuration used by the thread.

See also
Main::Database::getConfiguration

Referenced by crawlservpp::Module::Thread::Thread().

◆ database

Database crawlservpp::Module::Crawler::Thread::database
protected

Database connection for the crawler thread.

Referenced by onReset().

◆ networking

Network::Curl crawlservpp::Module::Crawler::Thread::networking
protected

Networking for the crawler thread.

Referenced by onReset().

◆ networkOptions

const NetworkSettings crawlservpp::Module::Crawler::Thread::networkOptions
protected

Network settings for the crawler thread.

Referenced by onReset().

◆ torControl

Network::TorControl crawlservpp::Module::Crawler::Thread::torControl
protected

TOR control for the crawler thread.

Referenced by onReset(), and onTick().

◆ urlListNamespace

std::string crawlservpp::Module::Thread::urlListNamespace
protected inherited

◆ websiteNamespace

std::string crawlservpp::Module::Thread::websiteNamespace
protected inherited

The documentation for this class was generated from the following files: