crawlserv++
[under development]
Application for crawling and analyzing textual content of websites.
Configuration for extractors.
#include <Config.hpp>
Classes
  struct Entries
    Configuration entries for extractor threads.
  class Exception
    Class for extractor configuration exceptions.
Configuration Loader
  void loadConfig(const std::string &configJson, LogQueue &warningsTo)
    Loads a configuration.
Parsing Options
  enum StringParsingOption { Default = 0, SQL, SubURL, URL, Trim }
    Options for parsing strings.
  enum CharParsingOption { FromNumber = 0, FromString }
    Options for parsing chars.
Configuration Parsing
  void category(const std::string &category)
    Sets the category of the subsequent configuration items to be checked for.
  void option(const std::string &name, bool &target)
    Checks for a configuration option of type bool.
  void option(const std::string &name, std::vector<bool> &target)
    Checks for a configuration option of type array of bools.
  void option(const std::string &name, char &target, CharParsingOption opt)
    Checks for a configuration option of type char.
  void option(const std::string &name, std::vector<char> &target, CharParsingOption opt)
    Checks for a configuration option of type array of chars.
  void option(const std::string &name, std::int16_t &target)
    Checks for a configuration option of type 16-bit integer.
  void option(const std::string &name, std::vector<std::int16_t> &target)
    Checks for a configuration option of type array of 16-bit integers.
  void option(const std::string &name, std::int32_t &target)
    Checks for a configuration option of type 32-bit integer.
  void option(const std::string &name, std::vector<std::int32_t> &target)
    Checks for a configuration option of type array of 32-bit integers.
  void option(const std::string &name, std::int64_t &target)
    Checks for a configuration option of type 64-bit integer.
  void option(const std::string &name, std::vector<std::int64_t> &target)
    Checks for a configuration option of type array of 64-bit integers.
  void option(const std::string &name, std::uint8_t &target)
    Checks for a configuration option of type unsigned 8-bit integer.
  void option(const std::string &name, std::vector<std::uint8_t> &target)
    Checks for a configuration option of type array of unsigned 8-bit integers.
  void option(const std::string &name, std::uint16_t &target)
    Checks for a configuration option of type unsigned 16-bit integer.
  void option(const std::string &name, std::vector<std::uint16_t> &target)
    Checks for a configuration option of type array of unsigned 16-bit integers.
  void option(const std::string &name, std::uint32_t &target)
    Checks for a configuration option of type unsigned 32-bit integer.
  void option(const std::string &name, std::vector<std::uint32_t> &target)
    Checks for a configuration option of type array of unsigned 32-bit integers.
  void option(const std::string &name, std::uint64_t &target)
    Checks for a configuration option of type unsigned 64-bit integer.
  void option(const std::string &name, std::vector<std::uint64_t> &target)
    Checks for a configuration option of type array of unsigned 64-bit integers.
  void option(const std::string &name, float &target)
    Checks for a configuration option of type floating-point number.
  void option(const std::string &name, std::vector<float> &target)
    Checks for a configuration option of type array of floating-point numbers.
  void option(const std::string &name, std::string &target, StringParsingOption opt = Default)
    Checks for a configuration option of type string.
  void option(const std::string &name, std::vector<std::string> &target, StringParsingOption opt = Default)
    Checks for a configuration option of type array of strings.
  void warning(const std::string &warning)
    Adds a warning to the logging queue.
Configuration
  struct crawlservpp::Module::Extractor::Config::Entries config
    Configuration of the extractor.
Extractor-Specific Configuration Parsing
  void parseOption() override
    Parses an extractor-specific configuration option.
  void checkOptions() override
    Checks the extractor-specific configuration options.
  void reset() override
    Resets the extractor-specific configuration options.
Configuration
  struct crawlservpp::Network::Config::Entries networkConfig
    Configuration for networking.
Parsing (Network Configuration)
  void parseBasicOption() override
    Parses basic network configuration options.
  void resetBase() override
    Resets basic network configuration options.
Helper (Network Configuration)
  const std::string & getProtocol() const
    Gets the protocol to be used for networking.
Configuration for extractors.
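enum CharParsingOption [protected, inherited]
Options for parsing chars.
enum StringParsingOption [protected, inherited]
Options for parsing strings.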
void category(const std::string &category) [inline, protected, inherited]
Sets the category of the subsequent configuration items to be checked for.
Parameters
  category  Constant reference to a string containing the name of the category.
References crawlservpp::Struct::ConfigItem::category.
Referenced by crawlservpp::Module::Analyzer::Algo::TermsOverTime::parseAlgoOption(), crawlservpp::Module::Analyzer::Algo::Assoc::parseAlgoOption(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::parseAlgoOption(), crawlservpp::Module::Analyzer::Algo::AllTokens::parseAlgoOption(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::parseAlgoOption(), crawlservpp::Module::Analyzer::Algo::TopicModelling::parseAlgoOption(), crawlservpp::Network::Config::parseBasicOption(), crawlservpp::Module::Parser::Config::parseOption(), crawlservpp::Module::Analyzer::Config::parseOption(), crawlservpp::Module::Crawler::Config::parseOption(), parseOption(), and crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo().
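The interplay between category() and option() can be pictured with the minimal sketch below. It is not the crawlserv++ implementation: Item and SketchConfig are hypothetical names, the item's value is reduced to a bool, and std::logic_error stands in for Module::Config::Exception. It only illustrates the documented contract, under the assumption that option() writes to its target only when both the category and the name of the current item match, and it throws if no category has been set.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a parsed configuration item (category, name, value).
struct Item {
    std::string category;
    std::string name;
    bool value;
};

// Minimal sketch of the category()/option() handshake; not the crawlserv++ code.
class SketchConfig {
public:
    explicit SketchConfig(const Item& current) : item(current) {}

    // Remembers the category that subsequent option() calls refer to.
    void category(const std::string& name) { currentCategory = name; }

    // Writes the item's value into target only if category and name both match.
    void option(const std::string& name, bool& target) {
        if (currentCategory.empty()) {
            // Stand-in for Module::Config::Exception.
            throw std::logic_error("no category has been set");
        }
        if (item.category == currentCategory && item.name == name) {
            target = item.value;
        }
    }

private:
    const Item& item;
    std::string currentCategory;
};
```

A derived parser would call category("general") once and then option(...) for every entry in that category, so each target is only touched when the current item actually refers to it.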
void checkOptions() [inline, override, protected, virtual]
Checks the extractor-specific configuration options.
Exceptions
  Module::Extractor::Config::Exception  if no target table has been specified.
Implements crawlservpp::Module::Config.
References config, crawlservpp::Module::Extractor::Config::Entries::extractingDateTimeFormats, crawlservpp::Module::Extractor::Config::Entries::extractingDateTimeLocales, crawlservpp::Module::Extractor::Config::Entries::extractingDateTimeQueries, crawlservpp::Module::Extractor::Config::Entries::extractingFieldDateTimeFormats, crawlservpp::Module::Extractor::Config::Entries::extractingFieldDateTimeLocales, crawlservpp::Module::Extractor::Config::Entries::extractingFieldDelimiters, crawlservpp::Module::Extractor::Config::Entries::extractingFieldIgnoreEmpty, crawlservpp::Module::Extractor::Config::Entries::extractingFieldJSON, crawlservpp::Module::Extractor::Config::Entries::extractingFieldNames, crawlservpp::Module::Extractor::Config::Entries::extractingFieldQueries, crawlservpp::Module::Extractor::Config::Entries::extractingFieldTidyTexts, crawlservpp::Module::Extractor::Config::Entries::extractingFieldWarningsEmpty, crawlservpp::Module::Extractor::Config::Entries::generalTargetTable, crawlservpp::Module::Extractor::Config::Entries::linkedDateTimeFormats, crawlservpp::Module::Extractor::Config::Entries::linkedDateTimeLocales, crawlservpp::Module::Extractor::Config::Entries::linkedDelimiters, crawlservpp::Module::Extractor::Config::Entries::linkedFieldNames, crawlservpp::Module::Extractor::Config::Entries::linkedFieldQueries, crawlservpp::Module::Extractor::Config::Entries::linkedIgnoreEmpty, crawlservpp::Module::Extractor::Config::Entries::linkedJSON, crawlservpp::Module::Extractor::Config::Entries::linkedTidyTexts, crawlservpp::Module::Extractor::Config::Entries::linkedWarningsEmpty, crawlservpp::Module::Extractor::Config::Entries::sourceUrl, crawlservpp::Module::Extractor::Config::Entries::sourceUrlFirst, crawlservpp::Module::Extractor::Config::Entries::variablesAlias, crawlservpp::Module::Extractor::Config::Entries::variablesAliasAdd, crawlservpp::Module::Extractor::Config::Entries::variablesDateTimeFormat, 
crawlservpp::Module::Extractor::Config::Entries::variablesDateTimeLocale, crawlservpp::Module::Extractor::Config::Entries::variablesName, crawlservpp::Module::Extractor::Config::Entries::variablesParsedColumn, crawlservpp::Module::Extractor::Config::Entries::variablesParsedTable, crawlservpp::Module::Extractor::Config::Entries::variablesQuery, crawlservpp::Module::Extractor::Config::Entries::variablesSkipQuery, crawlservpp::Module::Extractor::Config::Entries::variablesSource, crawlservpp::Module::Extractor::Config::Entries::variablesTokens, crawlservpp::Module::Extractor::Config::Entries::variablesTokensCookies, crawlservpp::Module::Extractor::Config::Entries::variablesTokensQuery, crawlservpp::Module::Extractor::Config::Entries::variablesTokensSource, crawlservpp::Module::Extractor::Config::Entries::variablesTokensUsePost, and crawlservpp::Module::Config::warning().
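A checkOptions() override validates the configuration once all items have been parsed. The sketch below assumes a much smaller, hypothetical SketchEntries struct and a plain vector of strings in place of warning(); std::invalid_argument stands in for Module::Extractor::Config::Exception. Throwing when the target table is empty mirrors the documented exception. Truncating mismatched parallel arrays with a warning is only an assumption suggested by the References list above, not confirmed behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical subset of the extractor entries; the real Entries struct has many more.
struct SketchEntries {
    std::string generalTargetTable;
    std::vector<std::string> extractingFieldNames;
    std::vector<std::uint64_t> extractingFieldQueries;
};

// Sketch of a checkOptions()-style validation pass (assumed behavior).
void checkOptionsSketch(SketchEntries& entries, std::vector<std::string>& warnings) {
    // Documented: a missing target table is a hard error.
    if (entries.generalTargetTable.empty()) {
        throw std::invalid_argument("No target table has been specified.");
    }

    // Assumption: parallel arrays are cut to their common length, with a warning.
    const auto names = entries.extractingFieldNames.size();
    const auto queries = entries.extractingFieldQueries.size();

    if (names != queries) {
        const auto common = std::min(names, queries);

        entries.extractingFieldNames.resize(common);
        entries.extractingFieldQueries.resize(common);

        warnings.push_back("Field names and field queries differ in length; surplus entries removed.");
    }
}
```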
const std::string & getProtocol() const [inline, inherited]
Gets the protocol to be used for networking.
References crawlservpp::Network::Config::networkConfig, and crawlservpp::Network::Config::Entries::protocol.
Referenced by crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
void loadConfig(const std::string &configJson, LogQueue &warningsTo) [inline, inherited]
Loads a configuration.
Parameters
  configJson  Constant reference to a string containing the configuration as JSON.
  warningsTo  Reference to a queue, also known as the "logging queue", to which any warnings that occur while parsing the configuration will be added.
Exceptions
  Module::Config::Exception  if the configuration JSON cannot be parsed.
References crawlservpp::Struct::ConfigItem::category, crawlservpp::Module::Config::checkOptions(), crawlservpp::Struct::ConfigItem::name, crawlservpp::Module::Config::parseBasicOption(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::ConfigItem::str(), crawlservpp::Struct::ConfigItem::value, and crawlservpp::Main::Exception::view().
Referenced by crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
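Judging from the References list, loadConfig() walks over the parsed configuration items, hands each one to parseBasicOption(), and finally calls checkOptions(). The sketch below models only that control flow. It skips JSON parsing entirely (the real loader uses Helper::Json::parseRapid()), and RawItem, WarningQueue and LoaderSketch are hypothetical names; a plain std::queue of strings is assumed in place of LogQueue. Skipping unnamed items with a warning is likewise an assumption.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical pre-parsed item; the real loader obtains these from JSON.
struct RawItem {
    std::string category;
    std::string name;
    std::string value;
};

// Assumption: a plain queue of strings stands in for crawlserv++'s LogQueue.
using WarningQueue = std::queue<std::string>;

class LoaderSketch {
public:
    virtual ~LoaderSketch() = default;

    // Feeds every item to parseBasicOption(), then validates the result once.
    void loadConfig(const std::vector<RawItem>& items, WarningQueue& warningsTo) {
        warnings = &warningsTo;

        for (const auto& item : items) {
            // Assumption: items without a name are skipped with a warning.
            if (item.name.empty()) {
                warning("Configuration item without a name ignored.");
                continue;
            }

            current = &item;
            parseBasicOption();
        }

        current = nullptr;

        checkOptions(); // may still add warnings
        warnings = nullptr;
    }

protected:
    virtual void parseBasicOption() = 0;
    virtual void checkOptions() = 0;

    void warning(const std::string& text) { warnings->push(text); }

    const RawItem* current = nullptr; // item currently being parsed

private:
    WarningQueue* warnings = nullptr;
};
```

Keeping the per-item parsing and the final validation as separate virtual hooks lets derived configurations add options without touching the loading loop.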
void option(const std::string &name, bool &target) [inline, protected, inherited]
Checks for a configuration option of type bool.
If the requested type does not match the data type in the configuration JSON, the option is ignored and a warning is added to the warnings queue.
Parameters
  name    Constant reference to a string containing the name of the option to check for.
  target  Reference to a boolean variable into which the value of the configuration entry will be written if it is encountered.
Exceptions
  Module::Config::Exception  if no category has been set.
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
Referenced by crawlservpp::Module::Analyzer::Algo::Assoc::parseAlgoOption(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::parseAlgoOption(), crawlservpp::Module::Analyzer::Algo::AllTokens::parseAlgoOption(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::parseAlgoOption(), crawlservpp::Module::Analyzer::Algo::TopicModelling::parseAlgoOption(), crawlservpp::Network::Config::parseBasicOption(), crawlservpp::Module::Parser::Config::parseOption(), crawlservpp::Module::Analyzer::Config::parseOption(), crawlservpp::Module::Crawler::Config::parseOption(), and parseOption().
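The "ignore and warn on type mismatch" rule shared by all option() overloads can be sketched with std::variant standing in for a JSON value. Value and boolOption are hypothetical names; crawlserv++ itself works on RapidJSON values, and the warning text is invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for a JSON value; crawlserv++ itself uses RapidJSON.
using Value = std::variant<bool, std::int64_t, double, std::string>;

// Copies the value into target only if it really is a boolean; otherwise the
// option is ignored (target unchanged) and a warning is recorded instead.
void boolOption(const std::string& name, const Value& value, bool& target,
                std::vector<std::string>& warnings) {
    if (const auto* b = std::get_if<bool>(&value)) {
        target = *b;
        return;
    }

    warnings.push_back("'" + name + "' ignored because it is not a boolean.");
}
```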
void option(const std::string &name, std::vector<bool> &target) [inline, protected, inherited]
Checks for a configuration option of type array of bools.
If the requested type does not match the data type in the configuration JSON, the option is ignored and a warning is added to the warnings queue.
Parameters
  name    Constant reference to a string containing the name of the option to check for.
  target  Reference to a vector of bools into which the value of the configuration entry will be written if it is encountered.
Exceptions
  Module::Config::Exception  if no category has been set.
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
void option(const std::string &name, char &target, CharParsingOption opt) [inline, protected, inherited]
Checks for a configuration option of type char.
If the requested type does not match the data type in the configuration JSON, the option is ignored and a warning is added to the warnings queue.
Parameters
  name    Constant reference to a string containing the name of the option to check for.
  target  Reference to a variable of type char into which the value of the configuration entry will be written if it is encountered.
  opt     Parsing options used for the configuration option.
Exceptions
  Module::Config::Exception  if no category has been set.
References crawlservpp::Module::Config::FromNumber, crawlservpp::Module::Config::FromString, crawlservpp::Helper::Strings::getFirstOrEscapeChar(), crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
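The two CharParsingOption values suggest two ways of reading a char: from a numeric character code (FromNumber) or from the first character of a string, where an escape sequence counts as one character (FromString, via Helper::Strings::getFirstOrEscapeChar()). The sketch below is an assumption about that behavior, not the library's code: firstOrEscapeChar, Value and charOption are hypothetical, and the set of supported escapes is chosen for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>

enum CharParsingOption { FromNumber = 0, FromString };

// Hypothetical stand-in for a JSON value (number or string).
using Value = std::variant<std::int64_t, std::string>;

// Hypothetical helper: first character of s, decoding a few backslash escapes.
// Unknown escapes fall back to the backslash itself; empty input yields '\0'.
char firstOrEscapeChar(const std::string& s) {
    if (s.empty()) {
        return '\0';
    }

    if (s[0] == '\\' && s.size() > 1) {
        switch (s[1]) {
        case 'n': return '\n';
        case 't': return '\t';
        case 'r': return '\r';
        case '0': return '\0';
        case '\\': return '\\';
        default: break;
        }
    }

    return s[0];
}

// Returns true and sets target if the value has the type the option expects;
// returns false (target unchanged) on a type mismatch.
bool charOption(const Value& value, CharParsingOption opt, char& target) {
    if (opt == FromNumber) {
        if (const auto* n = std::get_if<std::int64_t>(&value)) {
            target = static_cast<char>(*n);
            return true;
        }
        return false;
    }

    if (const auto* s = std::get_if<std::string>(&value)) {
        target = firstOrEscapeChar(*s);
        return true;
    }
    return false;
}
```

With FromString, a configuration can spell a tab delimiter as the two-character string "\t", which is easier to type in JSON than the character code 9.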
void option(const std::string &name, std::vector<char> &target, CharParsingOption opt) [inline, protected, inherited]
Checks for a configuration option of type array of chars.
If the requested type does not match the data type in the configuration JSON, the option is ignored and a warning is added to the warnings queue.
Parameters
  name    Constant reference to a string containing the name of the option to check for.
  target  Reference to a vector of chars into which the value of the configuration entry will be written if it is encountered.
  opt     Parsing options used for the configuration option.
Exceptions
  Module::Config::Exception  if no category has been set.
References crawlservpp::Module::Config::FromNumber, crawlservpp::Module::Config::FromString, crawlservpp::Helper::Strings::getFirstOrEscapeChar(), crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
void option(const std::string &name, std::int16_t &target) [inline, protected, inherited]
Checks for a configuration option of type 16-bit integer.
If the requested type does not match the data type in the configuration JSON, the option is ignored and a warning is added to the warnings queue.
Parameters
  name    Constant reference to a string containing the name of the option to check for.
  target  Reference to a variable into which the value of the configuration entry will be written if it is encountered.
Exceptions
  Module::Config::Exception  if no category has been set.
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
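For the integer overloads, one plausible reading of "the requested type does not match" is that a JSON integer must also fit into the requested width. Whether crawlserv++ checks ranges this way is an assumption; the generic helper below (integerOption is a hypothetical name) only covers types narrower than 64 bits, so that every bound can be compared as a std::int64_t without overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <type_traits>
#include <vector>

// Copies a 64-bit JSON integer into a narrower integer type, but only if it
// fits; otherwise the option is ignored and a warning is recorded (assumption).
template <typename T>
void integerOption(const std::string& name, std::int64_t value, T& target,
                   std::vector<std::string>& warnings) {
    static_assert(std::is_integral_v<T> && sizeof(T) < sizeof(std::int64_t),
                  "sketch only covers integer types narrower than 64 bits");

    const auto lowest = static_cast<std::int64_t>(std::numeric_limits<T>::min());
    const auto highest = static_cast<std::int64_t>(std::numeric_limits<T>::max());

    if (value < lowest || value > highest) {
        warnings.push_back("'" + name + "' ignored: value out of range.");
        return;
    }

    target = static_cast<T>(value);
}
```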
void option(const std::string &name, std::vector<std::int16_t> &target) [inline, protected, inherited]
Checks for a configuration option of type array of 16-bit integers.
If the requested type does not match the data type in the configuration JSON, the option is ignored and a warning is added to the warnings queue.
Parameters
  name    Constant reference to a string containing the name of the option to check for.
  target  Reference to a vector into which the value of the configuration entry will be stored if it is encountered.
Exceptions
  Module::Config::Exception  if no category has been set.
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
void option(const std::string &name, std::int32_t &target) [inline, protected, inherited]
Checks for a configuration option of type 32-bit integer.
If the requested type does not match the data type in the configuration JSON, the option is ignored and a warning is added to the warnings queue.
Parameters
  name    Constant reference to a string containing the name of the option to check for.
  target  Reference to a variable into which the value of the configuration entry will be written if it is encountered.
Exceptions
  Module::Config::Exception  if no category has been set.
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
void option(const std::string &name, std::vector<std::int32_t> &target) [inline, protected, inherited]
Checks for a configuration option of type array of 32-bit integers.
If the requested type does not match the data type in the configuration JSON, the option is ignored and a warning is added to the warnings queue.
Parameters
  name    Constant reference to a string containing the name of the option to check for.
  target  Reference to a vector into which the value of the configuration entry will be stored if it is encountered.
Exceptions
  Module::Config::Exception  if no category has been set.
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
void option(const std::string &name, std::int64_t &target) [inline, protected, inherited]
Checks for a configuration option of type 64-bit integer.
If the requested type does not match the data type in the configuration JSON, the option is ignored and a warning is added to the warnings queue.
Parameters
  name    Constant reference to a string containing the name of the option to check for.
  target  Reference to a variable into which the value of the configuration entry will be written if it is encountered.
Exceptions
  Module::Config::Exception  if no category has been set.
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
void option(const std::string &name, std::vector<std::int64_t> &target) [inline, protected, inherited]
Checks for a configuration option of type array of 64-bit integers.
If the requested type does not match the data type in the configuration JSON, the option is ignored and a warning is added to the warnings queue.
Parameters
  name    Constant reference to a string containing the name of the option to check for.
  target  Reference to a vector into which the value of the configuration entry will be stored if it is encountered.
Exceptions
  Module::Config::Exception  if no category has been set.
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
void option(const std::string &name, std::uint8_t &target) [inline, protected, inherited]
Checks for a configuration option of type unsigned 8-bit integer.
If the requested type does not match the data type in the configuration JSON, the option is ignored and a warning is added to the warnings queue.
Parameters
  name    Constant reference to a string containing the name of the option to check for.
  target  Reference to a variable into which the value of the configuration entry will be written if it is encountered.
Exceptions
  Module::Config::Exception  if no category has been set.
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
void option(const std::string &name, std::vector<std::uint8_t> &target) [inline, protected, inherited]
Checks for a configuration option of type array of unsigned 8-bit integers.
If the requested type does not match the data type in the configuration JSON, the option is ignored and a warning is added to the warnings queue.
Parameters
  name    Constant reference to a string containing the name of the option to check for.
  target  Reference to a vector into which the value of the configuration entry will be stored if it is encountered.
Exceptions
  Module::Config::Exception  if no category has been set.
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
void option(const std::string &name, std::uint16_t &target) [inline, protected, inherited]
Checks for a configuration option of type unsigned 16-bit integer.
If the requested type does not match the data type in the configuration JSON, the option is ignored and a warning is added to the warnings queue.
Parameters
  name    Constant reference to a string containing the name of the option to check for.
  target  Reference to a variable into which the value of the configuration entry will be written if it is encountered.
Exceptions
  Module::Config::Exception  if no category has been set.
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
void option(const std::string &name, std::vector<std::uint16_t> &target) [inline, protected, inherited]
Checks for a configuration option of type array of unsigned 16-bit integers.
If the requested type does not match the data type in the configuration JSON, the option is ignored and a warning is added to the warnings queue.
Parameters
  name    Constant reference to a string containing the name of the option to check for.
  target  Reference to a vector into which the value of the configuration entry will be stored if it is encountered.
Exceptions
  Module::Config::Exception  if no category has been set.
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
void option(const std::string &name, std::uint32_t &target) [inline, protected, inherited]
Checks for a configuration option of type unsigned 32-bit integer.
If the requested type does not match the data type in the configuration JSON, the option is ignored and a warning is added to the warnings queue.
Parameters
  name    Constant reference to a string containing the name of the option to check for.
  target  Reference to a variable into which the value of the configuration entry will be written if it is encountered.
Exceptions
  Module::Config::Exception  if no category has been set.
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
void option(const std::string &name, std::vector<std::uint32_t> &target) [inline, protected, inherited]
Checks for a configuration option of type array of unsigned 32-bit integers.
If the requested type does not match the data type in the configuration JSON, the option is ignored and a warning is added to the warnings queue.
Parameters
  name    Constant reference to a string containing the name of the option to check for.
  target  Reference to a vector into which the value of the configuration entry will be stored if it is encountered.
Exceptions
  Module::Config::Exception  if no category has been set.
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
void option(const std::string &name, std::uint64_t &target) [inline, protected, inherited]
Checks for a configuration option of type unsigned 64-bit integer.
If the requested type does not match the data type in the configuration JSON, the option is ignored and a warning is added to the warnings queue.
Parameters
  name    Constant reference to a string containing the name of the option to check for.
  target  Reference to a variable into which the value of the configuration entry will be written if it is encountered.
Exceptions
  Module::Config::Exception  if no category has been set.
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
void option(const std::string &name, std::vector<std::uint64_t> &target) [inline, protected, inherited]
Checks for a configuration option of type array of unsigned 64-bit integers.
If the requested type does not match the data type in the configuration JSON, the option is ignored and a warning is added to the warnings queue.
Parameters
  name    Constant reference to a string containing the name of the option to check for.
  target  Reference to a vector into which the value of the configuration entry will be stored if it is encountered.
Exceptions
  Module::Config::Exception  if no category has been set.
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
void option(const std::string &name, float &target) [inline, protected, inherited]
Checks for a configuration option of type floating-point number.
If the requested type does not match the data type in the configuration JSON, the option is ignored and a warning is added to the warnings queue.
Parameters
  name    Constant reference to a string containing the name of the option to check for.
  target  Reference to a variable into which the value of the configuration entry will be written if it is encountered.
Exceptions
  Module::Config::Exception  if no category has been set.
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
void option(const std::string &name, std::vector<float> &target) [inline, protected, inherited]
Checks for a configuration option of type array of floating-point numbers.
If the requested type does not match the data type in the configuration JSON, the option is ignored and a warning is added to the warnings queue.
Parameters
  name    Constant reference to a string containing the name of the option to check for.
  target  Reference to a vector into which the value of the configuration entry will be stored if it is encountered.
Exceptions
  Module::Config::Exception  if no category has been set.
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
void option(const std::string &name, std::string &target, StringParsingOption opt = Default) [inline, protected, inherited]
Checks for a configuration option of type string.
If the requested type does not match the data type in the configuration JSON, the option is ignored and a warning is added to the warnings queue.
Parameters
  name    Constant reference to a string containing the name of the option to check for.
  target  Reference to a string into which the value of the configuration entry will be stored if it is encountered.
  opt     Parsing option for the configuration entry.
Exceptions
  Module::Config::Exception  if no category has been set.
References crawlservpp::Helper::Strings::checkSQLName(), crawlservpp::Module::Config::Default, crawlservpp::Struct::ConfigItem::name, crawlservpp::Module::Config::SQL, crawlservpp::Struct::ConfigItem::str(), crawlservpp::Module::Config::SubURL, crawlservpp::Helper::Strings::trim(), crawlservpp::Module::Config::Trim, crawlservpp::Module::Config::URL, and crawlservpp::Struct::ConfigItem::value.
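The References list shows that the string overloads call Helper::Strings::trim() and Helper::Strings::checkSQLName() depending on the StringParsingOption. The sketch below models only the Default, Trim and SQL cases. The helpers trimmed() and isSqlName() are hypothetical, and the SQL naming rule they enforce (non-empty, letters, digits, '_' and '$' only) is an assumption rather than the library's actual rule; SubURL and URL handling is not modeled and simply passes values through.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

enum StringParsingOption { Default = 0, SQL, SubURL, URL, Trim };

// Hypothetical helper: removes leading and trailing whitespace.
std::string trimmed(const std::string& s) {
    const auto first = s.find_first_not_of(" \t\r\n");

    if (first == std::string::npos) {
        return "";
    }

    const auto last = s.find_last_not_of(" \t\r\n");

    return s.substr(first, last - first + 1);
}

// Hypothetical SQL name rule: non-empty, letters, digits, '_' and '$' only.
bool isSqlName(const std::string& s) {
    if (s.empty()) {
        return false;
    }

    for (const unsigned char c : s) {
        if (std::isalnum(c) == 0 && c != '_' && c != '$') {
            return false;
        }
    }

    return true;
}

// Applies the parsing option; returns false (target unchanged) on invalid input.
bool stringOption(const std::string& value, StringParsingOption opt,
                  std::string& target, std::vector<std::string>& warnings) {
    switch (opt) {
    case Trim:
        target = trimmed(value);
        return true;

    case SQL:
        if (!isSqlName(value)) {
            warnings.push_back("'" + value + "' is not a valid SQL name.");
            return false;
        }
        target = value;
        return true;

    default:
        // Default, SubURL and URL are passed through unchanged in this sketch.
        target = value;
        return true;
    }
}
```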
void option(const std::string &name, std::vector<std::string> &target, StringParsingOption opt = Default) [inline, protected, inherited]
Checks for a configuration option of type array of strings.
If the requested type does not match the data type in the configuration JSON, the option is ignored and a warning is added to the warnings queue.
Parameters
  name    Constant reference to a string containing the name of the option to check for.
  target  Reference to a vector into which the value of the configuration entry will be stored if it is encountered.
  opt     Parsing option for the configuration entry.
Exceptions
  Module::Config::Exception  if no category has been set.
References crawlservpp::Helper::Strings::checkSQLName(), crawlservpp::Module::Config::Default, crawlservpp::Struct::ConfigItem::name, crawlservpp::Module::Config::SQL, crawlservpp::Struct::ConfigItem::str(), crawlservpp::Module::Config::SubURL, crawlservpp::Helper::Strings::trim(), crawlservpp::Module::Config::Trim, crawlservpp::Module::Config::URL, and crawlservpp::Struct::ConfigItem::value.
void parseBasicOption() [inline, override, virtual, inherited]
Parses basic network configuration options.
Reimplemented from crawlservpp::Module::Config.
References crawlservpp::Module::Config::category(), crawlservpp::Network::Config::Entries::connectionsMax, crawlservpp::Network::Config::Entries::contentLengthIgnore, crawlservpp::Network::Config::Entries::cookies, crawlservpp::Network::Config::Entries::cookiesLoad, crawlservpp::Network::Config::Entries::cookiesOverwrite, crawlservpp::Network::Config::Entries::cookiesSave, crawlservpp::Network::Config::Entries::cookiesSession, crawlservpp::Network::Config::Entries::cookiesSet, crawlservpp::Network::Config::Entries::dnsCacheTimeOut, crawlservpp::Network::Config::Entries::dnsDoH, crawlservpp::Network::Config::Entries::dnsInterface, crawlservpp::Network::Config::Entries::dnsResolves, crawlservpp::Network::Config::Entries::dnsServers, crawlservpp::Network::Config::Entries::dnsShuffle, crawlservpp::Network::Config::Entries::encodingBr, crawlservpp::Network::Config::Entries::encodingDeflate, crawlservpp::Network::Config::Entries::encodingGZip, crawlservpp::Network::Config::Entries::encodingIdentity, crawlservpp::Network::Config::Entries::encodingTransfer, crawlservpp::Network::Config::Entries::encodingZstd, crawlservpp::Network::Config::Entries::headers, crawlservpp::Network::Config::Entries::http200Aliases, crawlservpp::Network::Config::Entries::httpVersion, crawlservpp::Network::Config::Entries::localInterface, crawlservpp::Network::Config::Entries::localPort, crawlservpp::Network::Config::Entries::localPortRange, crawlservpp::Network::Config::networkConfig, crawlservpp::Network::Config::Entries::noReUse, crawlservpp::Module::Config::option(), crawlservpp::Network::Config::parseOption(), crawlservpp::Network::Config::Entries::protocol, crawlservpp::Network::Config::Entries::proxy, crawlservpp::Network::Config::Entries::proxyAuth, crawlservpp::Network::Config::Entries::proxyHeaders, crawlservpp::Network::Config::Entries::proxyPre, crawlservpp::Network::Config::Entries::proxyTlsSrpPassword, crawlservpp::Network::Config::Entries::proxyTlsSrpUser, 
crawlservpp::Network::Config::Entries::proxyTunnelling, crawlservpp::Network::Config::Entries::redirect, crawlservpp::Network::Config::Entries::redirectMax, crawlservpp::Network::Config::Entries::redirectPost301, crawlservpp::Network::Config::Entries::redirectPost302, crawlservpp::Network::Config::Entries::redirectPost303, crawlservpp::Network::Config::Entries::referer, crawlservpp::Network::Config::Entries::refererAutomatic, crawlservpp::Network::Config::Entries::resetTor, crawlservpp::Network::Config::Entries::resetTorAfter, crawlservpp::Network::Config::Entries::resetTorOnlyAfter, crawlservpp::Network::Config::Entries::speedDownLimit, crawlservpp::Network::Config::Entries::speedLowLimit, crawlservpp::Network::Config::Entries::speedLowTime, crawlservpp::Network::Config::Entries::speedUpLimit, crawlservpp::Network::Config::Entries::sslVerifyHost, crawlservpp::Network::Config::Entries::sslVerifyPeer, crawlservpp::Network::Config::Entries::sslVerifyProxyHost, crawlservpp::Network::Config::Entries::sslVerifyProxyPeer, crawlservpp::Network::Config::Entries::sslVerifyStatus, crawlservpp::Network::Config::Entries::tcpFastOpen, crawlservpp::Network::Config::Entries::tcpKeepAlive, crawlservpp::Network::Config::Entries::tcpKeepAliveIdle, crawlservpp::Network::Config::Entries::tcpKeepAliveInterval, crawlservpp::Network::Config::Entries::tcpNagle, crawlservpp::Network::Config::Entries::timeOut, crawlservpp::Network::Config::Entries::timeOutHappyEyeballs, crawlservpp::Network::Config::Entries::timeOutRequest, crawlservpp::Network::Config::Entries::tlsSrpPassword, crawlservpp::Network::Config::Entries::tlsSrpUser, crawlservpp::Network::Config::Entries::userAgent, crawlservpp::Network::Config::Entries::verbose, and crawlservpp::Module::Config::warning().
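The References list suggests a template-method layout: parseBasicOption() checks the network options itself and then delegates to the module's parseOption(). The sketch below shows only that dispatch. NetworkConfigSketch, ExtractorConfigSketch and the option names "useragent" and "target.table" are hypothetical, and the current item is passed as plain strings instead of a ConfigItem.

```cpp
#include <cassert>
#include <string>

// Minimal template-method sketch: the network base class handles its own
// options first and then hands the same item to the module via parseOption().
class NetworkConfigSketch {
public:
    virtual ~NetworkConfigSketch() = default;

    void parseBasicOption(const std::string& category, const std::string& name,
                          const std::string& value) {
        // Network options (hypothetical option name).
        if (category == "network" && name == "useragent") {
            userAgent = value;
        }

        currentCategory = category;
        currentName = name;
        currentValue = value;

        parseOption(); // module-specific options
    }

    std::string userAgent;

protected:
    virtual void parseOption() = 0;

    std::string currentCategory, currentName, currentValue;
};

// Hypothetical module configuration reacting to its own category only.
class ExtractorConfigSketch : public NetworkConfigSketch {
public:
    std::string targetTable;

protected:
    void parseOption() override {
        if (currentCategory == "general" && currentName == "target.table") {
            targetTable = currentValue;
        }
    }
};
```

Because the base class owns the dispatch, every module inherits the network options automatically and only has to implement parseOption() for its own entries.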
void parseOption() [inline, override, protected, virtual]
Parses an extractor-specific configuration option.
Implements crawlservpp::Network::Config.
References crawlservpp::Module::Config::category(), crawlservpp::Module::Config::option(), config, and the following members of crawlservpp::Module::Extractor::Config::Entries:
expectedErrorIfLarger, expectedErrorIfSmaller, expectedParsedColumn, expectedParsedTable, expectedQuery, expectedSource;
extractingDatasetQueries, extractingDateTimeFormats, extractingDateTimeLocales, extractingDateTimeQueries, extractingErrorFail, extractingErrorRetry, extractingFieldDateTimeFormats, extractingFieldDateTimeLocales, extractingFieldDelimiters, extractingFieldIgnoreEmpty, extractingFieldJSON, extractingFieldNames, extractingFieldQueries, extractingFieldTidyTexts, extractingFieldWarningsEmpty, extractingIdIgnore, extractingIdQueries, extractingOverwrite, extractingRecursive, extractingRecursiveMaxDepth, extractingRemoveDuplicates, extractingRemoveXmlInstructions, extractingRepairCData, extractingRepairComments, extractingSkipQuery;
generalCacheSize, generalExtractCustom, generalLock, generalLogging, generalMaxBatchSize, generalMinimizeMemory, generalReExtract, generalReTries, generalRetryHttp, generalSleepError, generalSleepHttp, generalSleepIdle, generalSleepMySql, generalTargetTable, generalTidyErrors, generalTidyWarnings, generalTiming;
linkedDatasetQueries, linkedDateTimeFormats, linkedDateTimeLocales, linkedDelimiters, linkedFieldNames, linkedFieldQueries, linkedIdIgnore, linkedIdQueries, linkedIgnoreEmpty, linkedJSON, linkedLink, linkedOverwrite, linkedTargetTable, linkedTidyTexts, linkedWarningsEmpty;
pagingAlias, pagingAliasAdd, pagingFirst, pagingFirstString, pagingIsNextFrom, pagingNextFrom, pagingNumberFrom, pagingStep, pagingVariable;
sourceCookies, sourceHeaders, sourceUrl, sourceUrlFirst, sourceUsePost;
variablesAlias, variablesAliasAdd, variablesDateTimeFormat, variablesDateTimeLocale, variablesName, variablesParsedColumn, variablesParsedTable, variablesQuery, variablesSkipQuery, variablesSource, variablesTokenHeaders, variablesTokens, variablesTokensCookies, variablesTokensQuery, variablesTokensSource, and variablesTokensUsePost.
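The references above suggest that option parsing works by setting a category with category() and then offering each Entries member to an option() overload. A minimal sketch of that dispatch pattern follows. It assumes the loader hands parseOption() one (category, name, value) item at a time, replaces JSON parsing with a hypothetical ConfigItem struct, and uses hypothetical option names ("cache.size", "target.table", "timing") and a three-member EntriesSketch with placeholder defaults; it is not the crawlserv++ implementation.

```cpp
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for one parsed configuration item; the real
// loader reads JSON, which this sketch leaves out.
struct ConfigItem {
    std::string cat;
    std::string name;
    std::variant<bool, std::int64_t, std::string> value;
};

class ConfigSketch {
public:
    virtual ~ConfigSketch() = default;

    // Offers every item to parseOption() exactly once.
    void load(const std::vector<ConfigItem>& items) {
        for (const auto& item : items) {
            this->current = &item;
            this->parseOption();
        }
        this->current = nullptr;
    }

protected:
    virtual void parseOption() = 0;

    // Sets the category for the option() checks that follow.
    void category(const std::string& cat) { this->currentCategory = cat; }

    // Assigns the current item's value to target if category and name
    // match and the value holds type T (bool, std::int64_t or std::string).
    template<typename T>
    void option(const std::string& name, T& target) {
        if (this->current == nullptr
                || this->current->cat != this->currentCategory
                || this->current->name != name) {
            return;
        }
        if (const T* value = std::get_if<T>(&this->current->value)) {
            target = *value;
        }
    }

private:
    const ConfigItem* current{nullptr};
    std::string currentCategory;
};

// Hypothetical subset of the extractor entries; defaults are placeholders.
struct EntriesSketch {
    std::int64_t generalCacheSize{0};
    std::string generalTargetTable;
    bool generalTiming{false};
};

class ExtractorConfigSketch : public ConfigSketch {
public:
    EntriesSketch config;

protected:
    void parseOption() override {
        this->category("general");
        this->option("cache.size", this->config.generalCacheSize);
        this->option("target.table", this->config.generalTargetTable);
        this->option("timing", this->config.generalTiming);
    }
};
```

In this sketch every option() call is a no-op unless both category and name match, so an item such as {"other", "cache.size", std::int64_t{1}} leaves generalCacheSize untouched, and the order of the calls inside parseOption() does not matter.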
[inline] [override] [protected] [virtual]
Resets the extractor-specific configuration options.
Implements crawlservpp::Network::Config.
References config, and crawlservpp::Module::Extractor::protocolsToRemove.
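One way to reset extractor-specific options, assuming every default lives in a default member initializer of the entries struct, is to assign a default-constructed struct. The sketch below uses a hypothetical EntriesSketch with three members; it does not model protocolsToRemove, whose role is not described here, and reset() is public only so the example can call it directly.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical subset of the extractor entries. Each default lives in a
// default member initializer, so a value-initialized struct is a reset one.
struct EntriesSketch {
    bool generalTiming{false};
    std::vector<std::string> extractingFieldNames;
    std::vector<std::uint64_t> extractingFieldQueries;
};

class ExtractorConfigSketch {
public:
    EntriesSketch config;

    // Restores all extractor-specific options at once by replacing the
    // whole struct, so no member can be forgotten.
    void reset() { this->config = EntriesSketch{}; }
};
```

Replacing the whole struct keeps the defaults in one place (the struct definition) instead of duplicating them in the reset function.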
[inline] [override] [virtual] [inherited]
Resets basic network configuration options.
Reimplemented from crawlservpp::Module::Config.
References crawlservpp::Network::Config::networkConfig, and crawlservpp::Network::Config::reset().
Referenced by crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
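The "Reimplemented from" and "References" entries suggest a two-level reset: a thread triggers a base-level reset, and the network layer clears networkConfig before deferring to the module's own reset(). The sketch below models that layering with hypothetical class names and a hypothetical two-field network struct. Its resetBase() is named after the resetBase() listed among the references to networkConfig; that it is the member documented above is an assumption.

```cpp
#include <string>

// Hypothetical network options with placeholder defaults.
struct NetworkEntriesSketch {
    std::string userAgent;
    bool verbose{false};
};

class ModuleConfigSketch {
public:
    virtual ~ModuleConfigSketch() = default;

    // Entry point a thread would call when resetting.
    virtual void resetBase() { this->reset(); }

protected:
    // Module-specific reset, implemented by the concrete module.
    virtual void reset() = 0;
};

class NetworkConfigSketch : public ModuleConfigSketch {
public:
    NetworkEntriesSketch networkConfig;

    // Resets the shared network options, then the module-specific ones.
    void resetBase() override {
        this->networkConfig = NetworkEntriesSketch{};
        this->reset();
    }
};

class ExtractorConfigSketch : public NetworkConfigSketch {
public:
    bool generalTiming{false};

protected:
    void reset() override { this->generalTiming = false; }
};
```

NetworkConfigSketch stays abstract because it does not implement reset(); only the concrete module class can be instantiated.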
[inline] [protected] [inherited]
Adds a warning to the logging queue.
Parameters:
  warning  Constant reference to a string containing the warning.
Exceptions:
  Module::Config::Exception  if no log queue is active.
Referenced by crawlservpp::Module::Analyzer::Algo::Assoc::checkAlgoOptions(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::checkAlgoOptions(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::checkAlgoOptions(), crawlservpp::Module::Parser::Config::checkOptions(), crawlservpp::Module::Analyzer::Config::checkOptions(), crawlservpp::Module::Crawler::Config::checkOptions(), checkOptions(), and crawlservpp::Network::Config::parseBasicOption().
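The contract described above (push the warning to the active queue, throw if none is active) can be sketched as follows. It assumes LogQueue is a std::queue of strings and adds a hypothetical setLogQueue() for activating the queue, which in the real class presumably happens while loadConfig() runs with its warningsTo argument. warning() is public here only so the example can call it.

```cpp
#include <queue>
#include <stdexcept>
#include <string>

// Assumption: the log queue is a queue of strings.
using LogQueue = std::queue<std::string>;

class WarningSketch {
public:
    // Stand-in for Module::Config::Exception.
    struct Exception : std::runtime_error {
        using std::runtime_error::runtime_error;
    };

    // Hypothetical: activates (or, with nullptr, deactivates) the queue.
    void setLogQueue(LogQueue* queue) { this->logQueue = queue; }

    // Adds a warning to the active queue, throws if none is active.
    void warning(const std::string& warning) {
        if (this->logQueue == nullptr) {
            throw Exception("Config::warning(): No log queue is active");
        }
        this->logQueue->push(warning);
    }

private:
    LogQueue* logQueue{nullptr};
};
```

Throwing instead of silently dropping the message makes a warning issued outside of loading a visible programming error.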
struct crawlservpp::Module::Extractor::Config::Entries crawlservpp::Module::Extractor::Config::config
Configuration of the extractor.
Referenced by checkOptions(), crawlservpp::Module::Extractor::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onTick(), parseOption(), and reset().
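Since checkOptions() reads and adjusts config, a consistency check on parallel arrays can illustrate how such a function might correct the configuration and report what it changed. The sketch below is hypothetical: it assumes extractingFieldNames and extractingFieldQueries must have the same length, represents queries as numeric IDs, and collects warnings in a plain vector instead of the log queue.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical subset of the entries: field names and the IDs of the
// queries that extract them, expected to be parallel arrays.
struct FieldEntriesSketch {
    std::vector<std::string> extractingFieldNames;
    std::vector<std::uint64_t> extractingFieldQueries;
};

// Truncates both arrays to the shorter length and records a warning if
// their lengths differ. Returns true if the arrays were already consistent.
bool checkFieldArrays(FieldEntriesSketch& config, std::vector<std::string>& warnings) {
    const std::size_t names = config.extractingFieldNames.size();
    const std::size_t queries = config.extractingFieldQueries.size();

    if (names == queries) {
        return true;
    }

    const std::size_t common = std::min(names, queries);

    config.extractingFieldNames.resize(common);
    config.extractingFieldQueries.resize(common);

    warnings.push_back(
        "Numbers of field names and field queries differ; extra entries were removed."
    );

    return false;
}
```

Truncating to the shorter array keeps every remaining name paired with a query, at the cost of dropping the unmatched entries, which is why the sketch records a warning.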
[inherited]
Configuration for networking.
Referenced by crawlservpp::Network::Config::getProtocol(), crawlservpp::Module::Extractor::Thread::onReset(), crawlservpp::Module::Crawler::Thread::onReset(), crawlservpp::Network::Config::parseBasicOption(), crawlservpp::Network::Config::resetBase(), crawlservpp::Network::Curl::setConfigCurrent(), and crawlservpp::Network::Curl::setConfigGlobal().