|
crawlserv++
[under development]
Application for crawling and analyzing textual content of websites.
|
Configuration for crawlers. More...
#include <Config.hpp>


Classes | |
| struct | Entries |
| Configuration entries for crawler threads. More... | |
| class | Exception |
| Class for crawler configuration exceptions. More... | |
Configuration Loader | |
| void | loadConfig (const std::string &configJson, LogQueue &warningsTo) |
| Loads a configuration. More... | |
Parsing Options | |
| enum | StringParsingOption { Default = 0, SQL, SubURL, URL, Trim } |
| Options for parsing strings. More... | |
| enum | CharParsingOption { FromNumber = 0, FromString } |
| Options for parsing char values. More... | |
Configuration Parsing | |
| void | category (const std::string &category) |
| Sets the category of the subsequent configuration items to be checked for. More... | |
| void | option (const std::string &name, bool &target) |
| Checks for a configuration option of type bool. More... | |
| void | option (const std::string &name, std::vector< bool > &target) |
| Checks for a configuration option of type array of bool values. More... | |
| void | option (const std::string &name, char &target, CharParsingOption opt) |
| Checks for a configuration option of type char. More... | |
| void | option (const std::string &name, std::vector< char > &target, CharParsingOption opt) |
| Checks for a configuration option of type array of char values. More... | |
| void | option (const std::string &name, std::int16_t &target) |
| Checks for a configuration option of type 16-bit integer. More... | |
| void | option (const std::string &name, std::vector< std::int16_t > &target) |
| Checks for a configuration option of type array of 16-bit integers. More... | |
| void | option (const std::string &name, std::int32_t &target) |
| Checks for a configuration option of type 32-bit integer. More... | |
| void | option (const std::string &name, std::vector< std::int32_t > &target) |
| Checks for a configuration option of type array of 32-bit integers. More... | |
| void | option (const std::string &name, std::int64_t &target) |
| Checks for a configuration option of type 64-bit integer. More... | |
| void | option (const std::string &name, std::vector< std::int64_t > &target) |
| Checks for a configuration option of type array of 64-bit integers. More... | |
| void | option (const std::string &name, std::uint8_t &target) |
| Checks for a configuration option of type unsigned 8-bit integer. More... | |
| void | option (const std::string &name, std::vector< std::uint8_t > &target) |
| Checks for a configuration option of type array of unsigned 8-bit integers. More... | |
| void | option (const std::string &name, std::uint16_t &target) |
| Checks for a configuration option of type unsigned 16-bit integer. More... | |
| void | option (const std::string &name, std::vector< std::uint16_t > &target) |
| Checks for a configuration option of type array of unsigned 16-bit integers. More... | |
| void | option (const std::string &name, std::uint32_t &target) |
| Checks for a configuration option of type unsigned 32-bit integer. More... | |
| void | option (const std::string &name, std::vector< std::uint32_t > &target) |
| Checks for a configuration option of type array of unsigned 32-bit integers. More... | |
| void | option (const std::string &name, std::uint64_t &target) |
| Checks for a configuration option of type unsigned 64-bit integer. More... | |
| void | option (const std::string &name, std::vector< std::uint64_t > &target) |
| Checks for a configuration option of type array of unsigned 64-bit integers. More... | |
| void | option (const std::string &name, float &target) |
| Checks for a configuration option of type floating-point number. More... | |
| void | option (const std::string &name, std::vector< float > &target) |
| Checks for a configuration option of type array of floating-point numbers. More... | |
| void | option (const std::string &name, std::string &target, StringParsingOption opt=Default) |
| Checks for a configuration option of type string. More... | |
| void | option (const std::string &name, std::vector< std::string > &target, StringParsingOption opt=Default) |
| Checks for a configuration option of type array of strings. More... | |
| void | warning (const std::string &warning) |
| Adds a warning to the logging queue. More... | |
Setter | |
| void | setCrossDomain (bool isCrossDomain) |
| Sets whether the corresponding website is cross-domain. More... | |
Configuration | |
| struct crawlservpp::Module::Crawler::Config::Entries | config |
| Configuration of the crawler. More... | |
Crawler-Specific Configuration Parsing | |
| void | parseOption () override |
| Parses a crawler-specific configuration option. More... | |
| void | checkOptions () override |
| Checks the crawler-specific configuration options. More... | |
| void | reset () override |
| Resets the crawler-specific configuration options. More... | |
Configuration | |
| struct crawlservpp::Network::Config::Entries | networkConfig |
| Configuration for networking. More... | |
Parsing (Network Configuration) | |
| void | parseBasicOption () override |
| Parses basic network configuration options. More... | |
| void | resetBase () override |
| Resets basic network configuration options. More... | |
Helper (Network Configuration) | |
| const std::string & | getProtocol () const |
| Gets the protocol to be used for networking. More... | |
Configuration for crawlers.
|
protected inherited |
|
protected inherited |
|
inline protected inherited |
Sets the category of the subsequent configuration items to be checked for.
| category | Constant reference to a string containing the name of the category. |
References crawlservpp::Struct::ConfigItem::category.
Referenced by crawlservpp::Module::Analyzer::Algo::TermsOverTime::parseAlgoOption(), crawlservpp::Module::Analyzer::Algo::Assoc::parseAlgoOption(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::parseAlgoOption(), crawlservpp::Module::Analyzer::Algo::AllTokens::parseAlgoOption(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::parseAlgoOption(), crawlservpp::Module::Analyzer::Algo::TopicModelling::parseAlgoOption(), crawlservpp::Network::Config::parseBasicOption(), crawlservpp::Module::Parser::Config::parseOption(), crawlservpp::Module::Analyzer::Config::parseOption(), parseOption(), crawlservpp::Module::Extractor::Config::parseOption(), and crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo().
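The call pattern documented here (set a category, then check for options under it) can be illustrated with a minimal sketch. `MiniConfig` and `Item`, as well as the category and option names used with them, are hypothetical stand-ins and not the crawlserv++ classes; the sketch only models the documented behavior that an option check fails if no category has been set.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for one parsed configuration entry.
struct Item {
    std::string category;
    std::string name;
    bool value{false};
};

// Hypothetical miniature of the category()/option() pattern.
class MiniConfig {
public:
    explicit MiniConfig(Item item) : item_(std::move(item)) {}

    // Remember the category that subsequent option() calls refer to.
    void category(const std::string& category) {
        currentCategory_ = category;
        categorySet_ = true;
    }

    // Write the value to target only if both category and name match.
    void option(const std::string& name, bool& target) const {
        if (!categorySet_) {
            throw std::logic_error("no category has been set");
        }
        if (item_.category == currentCategory_ && item_.name == name) {
            target = item_.value;
        }
    }

private:
    Item item_;
    std::string currentCategory_;
    bool categorySet_{false};
};
```

In this sketch, a non-matching category leaves the target untouched, so a caller can list all of its options under each category without branching on the entry itself.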
|
inline override protected virtual |
Checks the crawler-specific configuration options.
| Module::Crawler::Config::Exception | if no link extraction query has been specified. |
Implements crawlservpp::Module::Config.
References config, crawlservpp::Module::Crawler::Config::Entries::crawlerArchivesNames, crawlservpp::Module::Crawler::Config::Entries::crawlerArchivesUrlsMemento, crawlservpp::Module::Crawler::Config::Entries::crawlerArchivesUrlsTimemap, crawlservpp::Module::Crawler::Config::Entries::crawlerQueriesLinks, crawlservpp::Module::Crawler::Config::Entries::crawlerUrlChunks, crawlservpp::Module::Crawler::Config::Entries::customCounters, crawlservpp::Module::Crawler::Config::Entries::customCountersAlias, crawlservpp::Module::Crawler::Config::Entries::customCountersAliasAdd, crawlservpp::Module::Crawler::Config::Entries::customCountersEnd, crawlservpp::Module::Crawler::Config::Entries::customCountersStart, crawlservpp::Module::Crawler::Config::Entries::customCountersStep, crawlservpp::Module::Crawler::Config::Entries::customTokens, crawlservpp::Module::Crawler::Config::Entries::customTokensCookies, crawlservpp::Module::Crawler::Config::Entries::customTokensKeep, crawlservpp::Module::Crawler::Config::Entries::customTokensQuery, crawlservpp::Module::Crawler::Config::Entries::customTokensRequired, crawlservpp::Module::Crawler::Config::Entries::customTokensSource, crawlservpp::Module::Crawler::Config::Entries::customTokensUsePost, crawlservpp::Module::Crawler::defaultUrlChunks, crawlservpp::Module::Crawler::Config::Entries::redirectVarNames, crawlservpp::Module::Crawler::Config::Entries::redirectVarQueries, crawlservpp::Module::Crawler::Config::Entries::redirectVarSources, and crawlservpp::Module::Config::warning().
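The documented failure case (no link extraction query specified) might look roughly like the following check. This is a sketch under assumptions: the element type of `crawlerQueriesLinks`, the `ConfigException` class, and the function name `checkLinkQueries` are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical subset of the crawler entries; the element type is an assumption.
struct Entries {
    std::vector<std::uint64_t> crawlerQueriesLinks;
};

// Hypothetical exception type standing in for Module::Crawler::Config::Exception.
class ConfigException : public std::runtime_error {
public:
    using std::runtime_error::runtime_error;
};

// Rejects a configuration that does not contain any link extraction query.
inline void checkLinkQueries(const Entries& config) {
    if (config.crawlerQueriesLinks.empty()) {
        throw ConfigException("No link extraction query specified.");
    }
}
```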
|
inline inherited |
Gets the protocol to be used for networking.
References crawlservpp::Network::Config::networkConfig, and crawlservpp::Network::Config::Entries::protocol.
Referenced by crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
|
inline inherited |
Loads a configuration.
| configJson | Constant reference to a string containing the configuration as JSON. |
| warningsTo | Reference to a queue to which warnings will be added that occur during the parsing of the configuration, also known as the "logging queue". |
| Module::Config::Exception | if the configuration JSON cannot be parsed. |
References crawlservpp::Struct::ConfigItem::category, crawlservpp::Module::Config::checkOptions(), crawlservpp::Struct::ConfigItem::name, crawlservpp::Module::Config::parseBasicOption(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::ConfigItem::str(), crawlservpp::Struct::ConfigItem::value, and crawlservpp::Main::Exception::view().
Referenced by crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
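The relationship between the loader and the overridable hooks can be sketched as a template-method pattern. The call order below (one parsing call per entry, one check after all entries) is inferred from the references listed above and is an assumption, as are the names `ConfigBase`, `DemoConfig`, `Item`, and the use of `std::queue<std::string>` in place of `LogQueue`. The entries are passed in already parsed, because JSON parsing is outside the scope of this sketch.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical pre-parsed entry; the real loader reads these from JSON.
struct Item {
    std::string category;
    std::string name;
    std::string value;
};

// Hypothetical base class: load() drives the hooks in a fixed order.
class ConfigBase {
public:
    virtual ~ConfigBase() = default;

    void load(const std::vector<Item>& items, std::queue<std::string>& warningsTo) {
        warnings_ = &warningsTo;
        for (const auto& item : items) {
            current_ = &item;
            parseOption();  // called once per entry
        }
        current_ = nullptr;
        checkOptions();     // called once after all entries
        warnings_ = nullptr;
    }

protected:
    virtual void parseOption() = 0;
    virtual void checkOptions() = 0;

    const Item& item() const { return *current_; }
    void warning(const std::string& text) { warnings_->push(text); }

private:
    const Item* current_{nullptr};
    std::queue<std::string>* warnings_{nullptr};
};

// Hypothetical derived configuration with a single string option.
class DemoConfig : public ConfigBase {
public:
    std::string userAgent;

protected:
    void parseOption() override {
        if (item().category == "network" && item().name == "useragent") {
            userAgent = item().value;
        }
    }

    void checkOptions() override {
        if (userAgent.empty()) {
            warning("No user agent set.");
        }
    }
};
```

Keeping the warnings in a caller-supplied queue means that a configuration problem that is not fatal does not interrupt loading; the caller decides how to log it.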
|
inline protected inherited |
Checks for a configuration option of type bool.
Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.
| name | Constant reference to a string containing the name of the option to check for. |
| target | Reference to a boolean variable into which the value of the configuration entry will be written if it is encountered. |
| Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
Referenced by crawlservpp::Module::Analyzer::Algo::Assoc::parseAlgoOption(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::parseAlgoOption(), crawlservpp::Module::Analyzer::Algo::AllTokens::parseAlgoOption(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::parseAlgoOption(), crawlservpp::Module::Analyzer::Algo::TopicModelling::parseAlgoOption(), crawlservpp::Network::Config::parseBasicOption(), crawlservpp::Module::Parser::Config::parseOption(), crawlservpp::Module::Analyzer::Config::parseOption(), parseOption(), and crawlservpp::Module::Extractor::Config::parseOption().
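The "ignore and warn" behavior on a type mismatch can be sketched as follows. The `Value` alias, the function `readBool`, and the warning text are hypothetical; the sketch only assumes that the stored value carries its own type and that the warnings queue holds strings.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <variant>

// Hypothetical value type: a JSON-like value that is a bool, a number, or a string.
using Value = std::variant<bool, double, std::string>;

// Returns true if target was updated. On a type mismatch the option is
// ignored, target keeps its previous value, and a warning is queued.
inline bool readBool(
        const std::string& name,
        const Value& value,
        bool& target,
        std::queue<std::string>& warnings
) {
    if (const bool* parsed = std::get_if<bool>(&value)) {
        target = *parsed;
        return true;
    }

    warnings.push("'" + name + "' ignored because it is not a bool.");
    return false;
}
```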
|
inline protected inherited |
Checks for a configuration option of type array of bool values.
Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.
| name | Constant reference to a string containing the name of the option to check for. |
| target | Reference to a vector of bool values into which the value of the configuration entry will be written if it is encountered. |
| Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited |
Checks for a configuration option of type char.
Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.
| name | Constant reference to a string containing the name of the option to check for. |
| target | Reference to a variable of the type char into which the value of the configuration entry will be written if it is encountered. |
| opt | Parsing options used for the configuration option. |
| Module::Config::Exception | if no category has been set. |
References crawlservpp::Module::Config::FromNumber, crawlservpp::Module::Config::FromString, crawlservpp::Helper::Strings::getFirstOrEscapeChar(), crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
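The two char parsing options can be read as two ways of interpreting the stored value: as a numeric character code (`FromNumber`) or as the first character or escape sequence of a string (`FromString`). The sketch below follows that reading. The escape rules in `firstOrEscape` are an assumption and may differ from `Helper::Strings::getFirstOrEscapeChar()`; `Value`, `CharParsing`, and `parseChar` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <string>
#include <variant>

enum class CharParsing { FromNumber, FromString };

// Hypothetical value type: either a number or a string.
using Value = std::variant<std::int64_t, std::string>;

// Assumed escape handling: "\n", "\t" and "\\" map to one char each;
// any other non-empty string yields its first character.
inline std::optional<char> firstOrEscape(const std::string& s) {
    if (s.empty()) {
        return std::nullopt;
    }
    if (s.size() >= 2 && s[0] == '\\') {
        switch (s[1]) {
            case 'n': return '\n';
            case 't': return '\t';
            case '\\': return '\\';
            default: break;
        }
    }
    return s[0];
}

// Returns the parsed char, or std::nullopt if the stored type does not
// match the requested parsing option or the number is out of range.
inline std::optional<char> parseChar(const Value& value, CharParsing opt) {
    if (opt == CharParsing::FromNumber) {
        if (const auto* number = std::get_if<std::int64_t>(&value)) {
            if (*number >= std::numeric_limits<char>::min()
                    && *number <= std::numeric_limits<char>::max()) {
                return static_cast<char>(*number);
            }
        }
        return std::nullopt;
    }
    if (const auto* text = std::get_if<std::string>(&value)) {
        return firstOrEscape(*text);
    }
    return std::nullopt;
}
```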
|
inline protected inherited |
Checks for a configuration option of type array of char values.
Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.
| name | Constant reference to a string containing the name of the option to check for. |
| target | Reference to a vector of char values into which the value of the configuration entry will be written if it is encountered. |
| opt | Parsing options used for the configuration option. |
| Module::Config::Exception | if no category has been set. |
References crawlservpp::Module::Config::FromNumber, crawlservpp::Module::Config::FromString, crawlservpp::Helper::Strings::getFirstOrEscapeChar(), crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited |
Checks for a configuration option of type 16-bit integer.
Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.
| name | Constant reference to a string containing the name of the option to check for. |
| target | Reference to a variable into which the value of the configuration entry will be written if it is encountered. |
| Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited |
Checks for a configuration option of type array of 16-bit integers.
Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.
| name | Constant reference to a string containing the name of the option to check for. |
| target | Reference to a vector into which the value of the configuration entry will be stored if it is encountered. |
| Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited |
Checks for a configuration option of type 32-bit integer.
Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.
| name | Constant reference to a string containing the name of the option to check for. |
| target | Reference to a variable into which the value of the configuration entry will be written if it is encountered. |
| Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited |
Checks for a configuration option of type array of 32-bit integers.
Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.
| name | Constant reference to a string containing the name of the option to check for. |
| target | Reference to a vector into which the value of the configuration entry will be stored if it is encountered. |
| Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited |
Checks for a configuration option of type 64-bit integer.
Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.
| name | Constant reference to a string containing the name of the option to check for. |
| target | Reference to a variable into which the value of the configuration entry will be written if it is encountered. |
| Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited |
Checks for a configuration option of type array of 64-bit integers.
Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.
| name | Constant reference to a string containing the name of the option to check for. |
| target | Reference to a vector into which the value of the configuration entry will be stored if it is encountered. |
| Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited |
Checks for a configuration option of type unsigned 8-bit integer.
Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.
| name | Constant reference to a string containing the name of the option to check for. |
| target | Reference to a variable into which the value of the configuration entry will be written if it is encountered. |
| Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited |
Checks for a configuration option of type array of unsigned 8-bit integers.
Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.
| name | Constant reference to a string containing the name of the option to check for. |
| target | Reference to a vector into which the value of the configuration entry will be stored if it is encountered. |
| Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited |
Checks for a configuration option of type unsigned 16-bit integer.
Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.
| name | Constant reference to a string containing the name of the option to check for. |
| target | Reference to a variable into which the value of the configuration entry will be written if it is encountered. |
| Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited |
Checks for a configuration option of type array of unsigned 16-bit integers.
Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.
| name | Constant reference to a string containing the name of the option to check for. |
| target | Reference to a vector into which the value of the configuration entry will be stored if it is encountered. |
| Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited |
Checks for a configuration option of type unsigned 32-bit integer.
Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.
| name | Constant reference to a string containing the name of the option to check for. |
| target | Reference to a variable into which the value of the configuration entry will be written if it is encountered. |
| Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited |
Checks for a configuration option of type array of unsigned 32-bit integers.
Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.
| name | Constant reference to a string containing the name of the option to check for. |
| target | Reference to a vector into which the value of the configuration entry will be stored if it is encountered. |
| Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited |
Checks for a configuration option of type unsigned 64-bit integer.
Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.
| name | Constant reference to a string containing the name of the option to check for. |
| target | Reference to a variable into which the value of the configuration entry will be written if it is encountered. |
| Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited |
Checks for a configuration option of type array of unsigned 64-bit integers.
Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.
| name | Constant reference to a string containing the name of the option to check for. |
| target | Reference to a vector into which the value of the configuration entry will be stored if it is encountered. |
| Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited |
Checks for a configuration option of type floating-point number.
Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.
| name | Constant reference to a string containing the name of the option to check for. |
| target | Reference to a variable into which the value of the configuration entry will be written if it is encountered. |
| Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited |
Checks for a configuration option of type array of floating-point numbers.
Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.
| name | Constant reference to a string containing the name of the option to check for. |
| target | Reference to a vector into which the value of the configuration entry will be stored if it is encountered. |
| Module::Config::Exception | if no category has been set. |
References crawlservpp::Struct::ConfigItem::name, crawlservpp::Struct::ConfigItem::str(), and crawlservpp::Struct::ConfigItem::value.
|
inline protected inherited |
Checks for a configuration option of type string.
Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.
| name | Constant reference to a string containing the name of the option to check for. |
| target | Reference to a string into which the value of the configuration entry will be stored if it is encountered. |
| opt | Parsing option for the configuration entry. |
| Module::Config::Exception | if no category has been set. |
References crawlservpp::Helper::Strings::checkSQLName(), crawlservpp::Module::Config::Default, crawlservpp::Struct::ConfigItem::name, crawlservpp::Module::Config::SQL, crawlservpp::Struct::ConfigItem::str(), crawlservpp::Module::Config::SubURL, crawlservpp::Helper::Strings::trim(), crawlservpp::Module::Config::Trim, crawlservpp::Module::Config::URL, and crawlservpp::Struct::ConfigItem::value.
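How a string parsing option might post-process a value can be sketched for two of the options, `Trim` and `SQL`. The rule used for SQL names below is an assumption and may differ from `Helper::Strings::checkSQLName()`. `URL` and `SubURL` are left out because this page does not describe what they do. All names in the sketch (`StringParsing`, `applyOption`, `isSqlName`) are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

enum class StringParsing { Default, SQL, Trim };

// Removes leading and trailing whitespace.
inline std::string trim(const std::string& s) {
    std::size_t begin = 0;
    std::size_t end = s.size();

    while (begin < end && std::isspace(static_cast<unsigned char>(s[begin]))) {
        ++begin;
    }
    while (end > begin && std::isspace(static_cast<unsigned char>(s[end - 1]))) {
        --end;
    }
    return s.substr(begin, end - begin);
}

// Assumed rule: a valid SQL name contains only alphanumeric characters, '_' and '$'.
inline bool isSqlName(const std::string& s) {
    for (const char c : s) {
        if (!std::isalnum(static_cast<unsigned char>(c)) && c != '_' && c != '$') {
            return false;
        }
    }
    return true;
}

// Returns the processed value, or std::nullopt if the value is rejected.
inline std::optional<std::string> applyOption(const std::string& raw, StringParsing opt) {
    switch (opt) {
        case StringParsing::Trim:
            return trim(raw);
        case StringParsing::SQL:
            if (!isSqlName(raw)) {
                return std::nullopt;
            }
            return raw;
        case StringParsing::Default:
            break;
    }
    return raw;
}
```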
|
inline protected inherited |
Checks for a configuration option of type array of strings.
Ignores the option and adds a warning to the warnings queue if the requested type does not match the data type in the configuration JSON.
| name | Constant reference to a string containing the name of the option to check for. |
| target | Reference to a vector into which the value of the configuration entry will be stored if it is encountered. |
| opt | Parsing option for the configuration entry. |
| Module::Config::Exception | if no category has been set. |
References crawlservpp::Helper::Strings::checkSQLName(), crawlservpp::Module::Config::Default, crawlservpp::Struct::ConfigItem::name, crawlservpp::Module::Config::SQL, crawlservpp::Struct::ConfigItem::str(), crawlservpp::Module::Config::SubURL, crawlservpp::Helper::Strings::trim(), crawlservpp::Module::Config::Trim, crawlservpp::Module::Config::URL, and crawlservpp::Struct::ConfigItem::value.
|
inline override virtual inherited |
Parses basic network configuration options.
Reimplemented from crawlservpp::Module::Config.
References crawlservpp::Module::Config::category(), crawlservpp::Network::Config::Entries::connectionsMax, crawlservpp::Network::Config::Entries::contentLengthIgnore, crawlservpp::Network::Config::Entries::cookies, crawlservpp::Network::Config::Entries::cookiesLoad, crawlservpp::Network::Config::Entries::cookiesOverwrite, crawlservpp::Network::Config::Entries::cookiesSave, crawlservpp::Network::Config::Entries::cookiesSession, crawlservpp::Network::Config::Entries::cookiesSet, crawlservpp::Network::Config::Entries::dnsCacheTimeOut, crawlservpp::Network::Config::Entries::dnsDoH, crawlservpp::Network::Config::Entries::dnsInterface, crawlservpp::Network::Config::Entries::dnsResolves, crawlservpp::Network::Config::Entries::dnsServers, crawlservpp::Network::Config::Entries::dnsShuffle, crawlservpp::Network::Config::Entries::encodingBr, crawlservpp::Network::Config::Entries::encodingDeflate, crawlservpp::Network::Config::Entries::encodingGZip, crawlservpp::Network::Config::Entries::encodingIdentity, crawlservpp::Network::Config::Entries::encodingTransfer, crawlservpp::Network::Config::Entries::encodingZstd, crawlservpp::Network::Config::Entries::headers, crawlservpp::Network::Config::Entries::http200Aliases, crawlservpp::Network::Config::Entries::httpVersion, crawlservpp::Network::Config::Entries::localInterface, crawlservpp::Network::Config::Entries::localPort, crawlservpp::Network::Config::Entries::localPortRange, crawlservpp::Network::Config::networkConfig, crawlservpp::Network::Config::Entries::noReUse, crawlservpp::Module::Config::option(), crawlservpp::Network::Config::parseOption(), crawlservpp::Network::Config::Entries::protocol, crawlservpp::Network::Config::Entries::proxy, crawlservpp::Network::Config::Entries::proxyAuth, crawlservpp::Network::Config::Entries::proxyHeaders, crawlservpp::Network::Config::Entries::proxyPre, crawlservpp::Network::Config::Entries::proxyTlsSrpPassword, crawlservpp::Network::Config::Entries::proxyTlsSrpUser, 
crawlservpp::Network::Config::Entries::proxyTunnelling, crawlservpp::Network::Config::Entries::redirect, crawlservpp::Network::Config::Entries::redirectMax, crawlservpp::Network::Config::Entries::redirectPost301, crawlservpp::Network::Config::Entries::redirectPost302, crawlservpp::Network::Config::Entries::redirectPost303, crawlservpp::Network::Config::Entries::referer, crawlservpp::Network::Config::Entries::refererAutomatic, crawlservpp::Network::Config::Entries::resetTor, crawlservpp::Network::Config::Entries::resetTorAfter, crawlservpp::Network::Config::Entries::resetTorOnlyAfter, crawlservpp::Network::Config::Entries::speedDownLimit, crawlservpp::Network::Config::Entries::speedLowLimit, crawlservpp::Network::Config::Entries::speedLowTime, crawlservpp::Network::Config::Entries::speedUpLimit, crawlservpp::Network::Config::Entries::sslVerifyHost, crawlservpp::Network::Config::Entries::sslVerifyPeer, crawlservpp::Network::Config::Entries::sslVerifyProxyHost, crawlservpp::Network::Config::Entries::sslVerifyProxyPeer, crawlservpp::Network::Config::Entries::sslVerifyStatus, crawlservpp::Network::Config::Entries::tcpFastOpen, crawlservpp::Network::Config::Entries::tcpKeepAlive, crawlservpp::Network::Config::Entries::tcpKeepAliveIdle, crawlservpp::Network::Config::Entries::tcpKeepAliveInterval, crawlservpp::Network::Config::Entries::tcpNagle, crawlservpp::Network::Config::Entries::timeOut, crawlservpp::Network::Config::Entries::timeOutHappyEyeballs, crawlservpp::Network::Config::Entries::timeOutRequest, crawlservpp::Network::Config::Entries::tlsSrpPassword, crawlservpp::Network::Config::Entries::tlsSrpUser, crawlservpp::Network::Config::Entries::userAgent, crawlservpp::Network::Config::Entries::verbose, and crawlservpp::Module::Config::warning().
|
inline override protected virtual |
Parses a crawler-specific configuration option.
Implements crawlservpp::Network::Config.
References crawlservpp::Module::Config::category(), config, crawlservpp::Module::Crawler::Config::Entries::crawlerArchives, crawlservpp::Module::Crawler::Config::Entries::crawlerArchivesNames, crawlservpp::Module::Crawler::Config::Entries::crawlerArchivesOnly, crawlservpp::Module::Crawler::Config::Entries::crawlerArchivesUrlsMemento, crawlservpp::Module::Crawler::Config::Entries::crawlerArchivesUrlsSkip, crawlservpp::Module::Crawler::Config::Entries::crawlerArchivesUrlsTimemap, crawlservpp::Module::Crawler::Config::Entries::crawlerLock, crawlservpp::Module::Crawler::Config::Entries::crawlerLogging, crawlservpp::Module::Crawler::Config::Entries::crawlerMaxBatchSize, crawlservpp::Module::Crawler::Config::Entries::crawlerParamsAdd, crawlservpp::Module::Crawler::Config::Entries::crawlerParamsBlackList, crawlservpp::Module::Crawler::Config::Entries::crawlerParamsWhiteList, crawlservpp::Module::Crawler::Config::Entries::crawlerQueriesBlackListContent, crawlservpp::Module::Crawler::Config::Entries::crawlerQueriesBlackListTypes, crawlservpp::Module::Crawler::Config::Entries::crawlerQueriesBlackListUrls, crawlservpp::Module::Crawler::Config::Entries::crawlerQueriesLinks, crawlservpp::Module::Crawler::Config::Entries::crawlerQueriesLinksBlackListContent, crawlservpp::Module::Crawler::Config::Entries::crawlerQueriesLinksBlackListTypes, crawlservpp::Module::Crawler::Config::Entries::crawlerQueriesLinksBlackListUrls, crawlservpp::Module::Crawler::Config::Entries::crawlerQueriesLinksWhiteListContent, crawlservpp::Module::Crawler::Config::Entries::crawlerQueriesLinksWhiteListTypes, crawlservpp::Module::Crawler::Config::Entries::crawlerQueriesLinksWhiteListUrls, crawlservpp::Module::Crawler::Config::Entries::crawlerQueriesWhiteListContent, crawlservpp::Module::Crawler::Config::Entries::crawlerQueriesWhiteListTypes, crawlservpp::Module::Crawler::Config::Entries::crawlerQueriesWhiteListUrls, crawlservpp::Module::Crawler::Config::Entries::crawlerReCrawl, 
crawlservpp::Module::Crawler::Config::Entries::crawlerReCrawlAlways, crawlservpp::Module::Crawler::Config::Entries::crawlerReCrawlStart, crawlservpp::Module::Crawler::Config::Entries::crawlerRemoveXmlInstructions, crawlservpp::Module::Crawler::Config::Entries::crawlerRepairCData, crawlservpp::Module::Crawler::Config::Entries::crawlerRepairComments, crawlservpp::Module::Crawler::Config::Entries::crawlerRestartAfter, crawlservpp::Module::Crawler::Config::Entries::crawlerReTries, crawlservpp::Module::Crawler::Config::Entries::crawlerRetryArchive, crawlservpp::Module::Crawler::Config::Entries::crawlerRetryEmpty, crawlservpp::Module::Crawler::Config::Entries::crawlerRetryHttp, crawlservpp::Module::Crawler::Config::Entries::crawlerSleepError, crawlservpp::Module::Crawler::Config::Entries::crawlerSleepHttp, crawlservpp::Module::Crawler::Config::Entries::crawlerSleepIdle, crawlservpp::Module::Crawler::Config::Entries::crawlerSleepMySql, crawlservpp::Module::Crawler::Config::Entries::crawlerStart, crawlservpp::Module::Crawler::Config::Entries::crawlerStartIgnore, crawlservpp::Module::Crawler::Config::Entries::crawlerTidyErrors, crawlservpp::Module::Crawler::Config::Entries::crawlerTidyWarnings, crawlservpp::Module::Crawler::Config::Entries::crawlerTiming, crawlservpp::Module::Crawler::Config::Entries::crawlerUrlCaseSensitive, crawlservpp::Module::Crawler::Config::Entries::crawlerUrlChunks, crawlservpp::Module::Crawler::Config::Entries::crawlerUrlDebug, crawlservpp::Module::Crawler::Config::Entries::crawlerUrlMaxLength, crawlservpp::Module::Crawler::Config::Entries::crawlerUrlStartupCheck, crawlservpp::Module::Crawler::Config::Entries::crawlerWarningsFile, crawlservpp::Module::Crawler::Config::Entries::crawlerXml, crawlservpp::Module::Crawler::Config::Entries::customCounters, crawlservpp::Module::Crawler::Config::Entries::customCountersAlias, crawlservpp::Module::Crawler::Config::Entries::customCountersAliasAdd, 
crawlservpp::Module::Crawler::Config::Entries::customCountersEnd, crawlservpp::Module::Crawler::Config::Entries::customCountersGlobal, crawlservpp::Module::Crawler::Config::Entries::customCountersStart, crawlservpp::Module::Crawler::Config::Entries::customCountersStep, crawlservpp::Module::Crawler::Config::Entries::customReCrawl, crawlservpp::Module::Crawler::Config::Entries::customRobots, crawlservpp::Module::Crawler::Config::Entries::customTokenHeaders, crawlservpp::Module::Crawler::Config::Entries::customTokens, crawlservpp::Module::Crawler::Config::Entries::customTokensCookies, crawlservpp::Module::Crawler::Config::Entries::customTokensKeep, crawlservpp::Module::Crawler::Config::Entries::customTokensQuery, crawlservpp::Module::Crawler::Config::Entries::customTokensRequired, crawlservpp::Module::Crawler::Config::Entries::customTokensSource, crawlservpp::Module::Crawler::Config::Entries::customTokensUsePost, crawlservpp::Module::Crawler::Config::Entries::customUrls, crawlservpp::Module::Crawler::Config::Entries::customUsePost, crawlservpp::Module::Crawler::Config::Entries::expectedErrorIfLarger, crawlservpp::Module::Crawler::Config::Entries::expectedErrorIfSmaller, crawlservpp::Module::Crawler::Config::Entries::expectedQuery, crawlservpp::Module::Config::option(), crawlservpp::Module::Crawler::Config::Entries::redirectCookies, crawlservpp::Module::Crawler::Config::Entries::redirectHeaders, crawlservpp::Module::Crawler::Config::Entries::redirectQueryContent, crawlservpp::Module::Crawler::Config::Entries::redirectQueryUrl, crawlservpp::Module::Crawler::Config::Entries::redirectTo, crawlservpp::Module::Crawler::Config::Entries::redirectUsePost, crawlservpp::Module::Crawler::Config::Entries::redirectVarNames, crawlservpp::Module::Crawler::Config::Entries::redirectVarQueries, and crawlservpp::Module::Crawler::Config::Entries::redirectVarSources.
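The references above suggest that option parsing selects a category with category() and then checks each option name against a target field with option(). A minimal, self-contained sketch of that pattern follows. It is not the crawlserv++ implementation: `OptionSketch`, `Item`, `parse()`, the category name `"crawler"`, the option names `"archives"` and `"retries"`, and the field types are assumptions made for illustration, and values are plain strings here rather than JSON.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for one configuration item (not a crawlserv++ type).
struct Item {
    std::string category;
    std::string name;
    std::string value; // simplified: a string instead of a JSON value
};

// Simplified model of the category()/option() pattern.
class OptionSketch {
public:
    // Field names appear in the reference list; their types are assumed.
    struct Entries {
        bool crawlerArchives{false};
        std::int32_t crawlerReTries{0};
    } config;

    // Checks one item against all known options; returns true on a match.
    bool parse(const Item& item) {
        current = &item;
        found = false;
        parseOption();
        current = nullptr;
        return found;
    }

private:
    // Sets the category that subsequent option() calls are checked against.
    void category(const std::string& name) { currentCategory = name; }

    // Checks for an option of type bool.
    void option(const std::string& name, bool& target) {
        if (matches(name)) {
            target = (current->value == "true");
            found = true;
        }
    }

    // Checks for an option of type 32-bit integer.
    void option(const std::string& name, std::int32_t& target) {
        if (matches(name)) {
            target = std::stoi(current->value);
            found = true;
        }
    }

    // True if the current item is unmatched so far and has this category and name.
    bool matches(const std::string& name) const {
        assert(current != nullptr);
        return !found && current->category == currentCategory
                && current->name == name;
    }

    // One category() call followed by one option() call per entry.
    void parseOption() {
        category("crawler");                        // assumed category name
        option("archives", config.crawlerArchives); // assumed option name
        option("retries", config.crawlerReTries);   // assumed option name
    }

    const Item* current{nullptr};
    std::string currentCategory;
    bool found{false};
};
```

In this sketch, `OptionSketch c; c.parse({"crawler", "retries", "3"});` sets `c.config.crawlerReTries` to 3, while an item from another category leaves all entries unchanged.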
|
inline override protected virtual |
Resets the crawler-specific configuration options.
Implements crawlservpp::Network::Config.
References config.
|
inline override virtual inherited |
Resets basic network configuration options.
Reimplemented from crawlservpp::Module::Config.
References crawlservpp::Network::Config::networkConfig, and crawlservpp::Network::Config::reset().
Referenced by crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
|
inline |
Sets whether the corresponding website is cross-domain.
| isCrossDomain | Whether the website crawled by this crawler is cross-domain. |
Referenced by crawlservpp::Module::Crawler::Thread::onReset().
|
inline protected inherited |
Adds a warning to the logging queue.
| warning | Constant reference to a string containing the warning. |
| Module::Config::Exception | if no log queue is active. |
Referenced by crawlservpp::Module::Analyzer::Algo::Assoc::checkAlgoOptions(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::checkAlgoOptions(), crawlservpp::Module::Analyzer::Algo::SentimentOverTime::checkAlgoOptions(), crawlservpp::Module::Parser::Config::checkOptions(), crawlservpp::Module::Analyzer::Config::checkOptions(), checkOptions(), crawlservpp::Module::Extractor::Config::checkOptions(), and crawlservpp::Network::Config::parseBasicOption().
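The documented contract of the warning function (append the warning to the logging queue, throw Module::Config::Exception if no log queue is active) can be illustrated with a small stand-alone model. This is a sketch under assumptions, not the library code: `WarningSketch`, `setLogQueue()`, the use of `std::queue<std::string>` as the queue type, and the exception message are hypothetical, and `warning()` is public here only so that it can be called directly, whereas the documented member is protected.

```cpp
#include <cassert>
#include <queue>
#include <stdexcept>
#include <string>

// Simplified model of a configuration class that reports warnings to a queue.
class WarningSketch {
public:
    // Assumed queue type; the real LogQueue type is not shown in this section.
    using LogQueue = std::queue<std::string>;

    // Stand-in for Module::Config::Exception.
    class Exception : public std::runtime_error {
    public:
        using std::runtime_error::runtime_error;
    };

    // Hypothetical setter; per the page summary, the real class receives
    // its queue through loadConfig().
    void setLogQueue(LogQueue* queue) { logQueue = queue; }

    // Adds a warning to the queue, or throws if no queue is active.
    void warning(const std::string& text) {
        if (logQueue == nullptr) {
            throw Exception("No log queue is active");
        }

        logQueue->push(text);
    }

private:
    LogQueue* logQueue{nullptr};
};
```

A caller in this model first attaches a queue with `setLogQueue(&queue)`; calling `warning()` before that throws `WarningSketch::Exception`.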
| struct crawlservpp::Module::Crawler::Config::Entries crawlservpp::Module::Crawler::Config::config |
Configuration of the crawler.
Referenced by checkOptions(), crawlservpp::Module::Crawler::Thread::onReset(), crawlservpp::Module::Crawler::Thread::onTick(), parseOption(), and reset().
|
inherited |
Configuration for networking.
Referenced by crawlservpp::Network::Config::getProtocol(), crawlservpp::Module::Extractor::Thread::onReset(), crawlservpp::Module::Crawler::Thread::onReset(), crawlservpp::Network::Config::parseBasicOption(), crawlservpp::Network::Config::resetBase(), crawlservpp::Network::Curl::setConfigCurrent(), and crawlservpp::Network::Curl::setConfigGlobal().