|
crawlserv++
[under development]
Application for crawling and analyzing textual content of websites.
|
Query container. More...
#include <Container.hpp>

Classes | |
| class | Exception |
| Class for query container exceptions. More... | |
Construction and Destruction | |
| Container ()=default | |
| Default constructor. More... | |
| virtual | ~Container ()=default |
| Default destructor. More... | |
Copy and move | |
| Container (const Container &)=delete | |
| Deleted copy constructor. More... | |
| Container & | operator= (const Container &)=delete |
| Deleted copy assignment operator. More... | |
| Container (Container &&)=delete | |
| Deleted move constructor. More... | |
| Container & | operator= (Container &&)=delete |
| Deleted move assignment operator. More... | |
Public Getter | |
| bool | isQueryUsed (std::uint64_t queryId) const |
| Checks whether the specified query is used by the container. More... | |
Setters | |
| void | setRepairCData (bool isRepairCData) |
| Sets whether to try to repair CData when parsing XML. More... | |
| void | setRepairComments (bool isRepairComments) |
| Sets whether to try to repair broken HTML/XML comments. More... | |
| void | setRemoveXmlInstructions (bool isRemoveXmlInstructions) |
| Sets whether to remove XML processing instructions (<?xml:...>) before parsing HTML/XML content. More... | |
| void | setMinimizeMemory (bool isMinimizeMemory) |
| Sets whether to minimize memory usage. More... | |
| void | setTidyErrorsAndWarnings (bool warnings, std::uint32_t numOfErrors) |
| Sets how tidy-html5 reports errors and warnings. More... | |
| void | setQueryTarget (const std::string &content, const std::string &source) |
| Sets the content to use the managed queries on. More... | |
Getters | |
| std::size_t | getNumberOfSubSets () const |
| Gets the number of subsets currently acquired. More... | |
| bool | getTarget (std::string &targetTo) |
| Gets the current query target, if available, and writes it to the given string. More... | |
| bool | getXml (std::string &resultTo, std::queue< std::string > &warningsTo) |
| Parses the current query target as tidied XML and writes it to the given string. More... | |
Initialization and Cleanup | |
| virtual void | initQueries ()=0 |
| Pure virtual function initializing queries. More... | |
| virtual void | deleteQueries ()=0 |
Queries | |
| QueryStruct | addQuery (std::uint64_t id, const QueryProperties &properties) |
| Adds a query with the given query properties to the container. More... | |
| void | clearQueries () |
| Clears all queries currently managed by the container and frees the associated memory. More... | |
| void | clearQueryTarget () |
| Clears the current query target and frees the associated memory. More... | |
Subsets | |
| bool | nextSubSet () |
| Requests the next subset for all subsequent queries. More... | |
Results | |
| bool | getBoolFromRegEx (const QueryStruct &query, const std::string &target, bool &resultTo, std::queue< std::string > &warningsTo) const |
| Gets a boolean result from a RegEx query on a separate string. More... | |
| bool | getSingleFromRegEx (const QueryStruct &query, const std::string &target, std::string &resultTo, std::queue< std::string > &warningsTo) const |
| Gets a single result from a RegEx query on a separate string. More... | |
| bool | getMultiFromRegEx (const QueryStruct &query, const std::string &target, std::vector< std::string > &resultTo, std::queue< std::string > &warningsTo) const |
| Gets multiple results from a RegEx query on a separate string. More... | |
| bool | getBoolFromQuery (const QueryStruct &query, bool &resultTo, std::queue< std::string > &warningsTo) |
| Gets a boolean result from a query of any type on the current query target. More... | |
| bool | getBoolFromQueryOnSubSet (const QueryStruct &query, bool &resultTo, std::queue< std::string > &warningsTo) |
| Gets a boolean result from a query of any type on the current subset. More... | |
| bool | getSingleFromQuery (const QueryStruct &query, std::string &resultTo, std::queue< std::string > &warningsTo) |
| Gets a single result from a query of any type on the current query target. More... | |
| bool | getSingleFromQueryOnSubSet (const QueryStruct &query, std::string &resultTo, std::queue< std::string > &warningsTo) |
| Gets a single result from a query of any type on the current subset. More... | |
| bool | getMultiFromQuery (const QueryStruct &query, std::vector< std::string > &resultTo, std::queue< std::string > &warningsTo) |
| Gets multiple results from a query of any type on the current query target. More... | |
| bool | getMultiFromQueryOnSubSet (const QueryStruct &query, std::vector< std::string > &resultTo, std::queue< std::string > &warningsTo) |
| Gets multiple results from a query of any type on the current subset. More... | |
| bool | setSubSetsFromQuery (const QueryStruct &query, std::queue< std::string > &warningsTo) |
| Sets subsets for subsequent queries using a query of any type. More... | |
| bool | addSubSetsFromQueryOnSubSet (const QueryStruct &query, std::queue< std::string > &warningsTo) |
| Inserts more subsets after the current one based on a query on the current subset. More... | |
Memory | |
| void | reserveForSubSets (const QueryStruct &query, std::size_t n) |
| Reserves memory for a specific number of subsets. More... | |
Query container.
Abstract class to be inherited by module thread classes managing their queries.
Most member functions of the container are protected, as they will only be used from inside its child classes.
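The inheritance pattern described above can be sketched as follows. This is a minimal illustration, not the real crawlserv++ class: `MiniContainer`, `DemoThread`, `countQueries()` and the string-only `addQuery()` are hypothetical simplifications, and only the pure virtual `initQueries()`/`deleteQueries()` pair and the deleted copy and move operations mirror the documented interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for the documented container (NOT the real class).
class MiniContainer {
public:
    MiniContainer() = default;
    virtual ~MiniContainer() = default;

    // Like the documented class: neither copyable nor movable.
    MiniContainer(const MiniContainer&) = delete;
    MiniContainer& operator=(const MiniContainer&) = delete;
    MiniContainer(MiniContainer&&) = delete;
    MiniContainer& operator=(MiniContainer&&) = delete;

protected:
    // Child classes decide which queries exist and how they are released.
    virtual void initQueries() = 0;
    virtual void deleteQueries() = 0;

    // Hypothetical simplification: stores the query text, returns its index.
    std::size_t addQuery(const std::string& text) {
        queries.push_back(text);
        return queries.size() - 1;
    }
    void clearQueries() { queries.clear(); }
    std::size_t countQueries() const { return queries.size(); }

private:
    std::vector<std::string> queries;
};

// Hypothetical module thread that manages two queries of its own.
class DemoThread : public MiniContainer {
public:
    void reset() {
        deleteQueries();
        assert(countQueries() == 0); // deleteQueries() must leave nothing behind
        initQueries();
    }
    std::size_t numberOfQueries() const { return countQueries(); }

protected:
    void initQueries() override {
        addQuery("//title");
        addQuery("//a/@href");
    }
    void deleteQueries() override { clearQueries(); }
};
```

Keeping the query-handling functions protected means that only the deriving thread class can reach them, which matches the note above that most member functions are used from inside child classes only.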
|
default |
Default constructor.
|
virtual default |
Default destructor.
|
delete |
Deleted copy constructor.
|
delete |
Deleted move constructor.
|
inline protected |
Adds a query with the given query properties to the container.
| id | The ID of the query. It will be saved in a thread-safe way and used only by Container::isQueryUsed. |
| properties | Constant reference to the properties of the query to add to the container. |
| Container::Exception | if an error occurred while creating a query with the given properties or the specified type of the query is unknown. |
References crawlservpp::Struct::QueryStruct::index, crawlservpp::Struct::QueryProperties::resultBool, crawlservpp::Struct::QueryStruct::resultBool, crawlservpp::Struct::QueryProperties::resultMulti, crawlservpp::Struct::QueryStruct::resultMulti, crawlservpp::Struct::QueryProperties::resultSingle, crawlservpp::Struct::QueryStruct::resultSingle, crawlservpp::Struct::QueryProperties::resultSubSets, crawlservpp::Struct::QueryStruct::resultSubSets, crawlservpp::Struct::QueryProperties::text, crawlservpp::Struct::QueryProperties::textOnly, crawlservpp::Struct::QueryProperties::type, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().
Referenced by crawlservpp::Module::Analyzer::Thread::addOptionalQuery(), crawlservpp::Module::Analyzer::Thread::addQueries(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
|
inline protected |
Inserts more subsets after the current one based on a query on the current subset.
This function is used for recursive extraction.
| query | A constant reference to a structure identifying the query that will be performed to acquire the subsets. |
| warningsTo | A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended. |
| Container::Exception | if no query target or no subset has been specified, the current subset is invalid, or the given query is of an unknown type. |
References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultSubSets, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().
Referenced by crawlservpp::Module::Extractor::Thread::onReset().
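The idea of inserting new subsets directly behind the current one can be sketched with a plain vector. This is only an illustration under stated assumptions: the real container keeps parsed subsets per query type, whereas here subsets are strings and `insertAfterCurrent()` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Inserts 'found' directly behind the subset at index 'current', so that a
// loop advancing through 'subSets' visits the new entries next.
void insertAfterCurrent(std::vector<std::string>& subSets, std::size_t current,
        const std::vector<std::string>& found) {
    assert(current < subSets.size()); // a current subset must exist

    const auto position = subSets.begin() + static_cast<std::ptrdiff_t>(current) + 1;

    subSets.insert(position, found.begin(), found.end());
}
```

Because the new entries land right after the current position, a caller that keeps advancing processes them before moving on to the remaining original subsets, which is what makes nested (recursive) extraction possible.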
|
inline protected |
Clears all queries currently managed by the container and frees the associated memory.
References crawlservpp::Helper::Memory::free().
Referenced by crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Parser::Thread::onClear(), crawlservpp::Module::Extractor::Thread::onClear(), and crawlservpp::Module::Crawler::Thread::onClear().
|
inline protected |
Clears the current query target and frees the associated memory.
References crawlservpp::Parsing::XML::clear(), crawlservpp::Helper::Memory::free(), and crawlservpp::Helper::Json::free().
Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), crawlservpp::Module::Crawler::Thread::onReset(), and setQueryTarget().
|
protected pure virtual |
Pure virtual function deleting queries.
This function must be implemented by the child classes of the container, so that each child class cleans up its own queries.
Implemented in crawlservpp::Module::Analyzer::Thread.
|
inline protected |
Gets a boolean result from a query of any type on the current query target.
| query | A constant reference to a structure identifying the query that will be performed. |
| resultTo | A reference to a boolean variable which will be set according to the result of the query. |
| warningsTo | A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended. |
| Container::Exception | if no query target has been specified or the query is of an unknown type. |
References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultBool, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().
Referenced by crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
|
inline protected |
Gets a boolean result from a query of any type on the current subset.
| query | A constant reference to a structure identifying the query that will be performed. |
| resultTo | A reference to a boolean variable which will be set according to the result of the query. |
| warningsTo | A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended. |
| Container::Exception | if no query target or no subset has been specified, the current subset is invalid, or the given query is of an unknown type. |
References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultBool, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().
Referenced by crawlservpp::Module::Extractor::Thread::onReset().
|
inline protected |
Gets a boolean result from a RegEx query on a separate string.
| query | A constant reference to a structure identifying the RegEx query that will be performed. |
| target | A constant reference to a string containing the target on which the query will be performed. |
| resultTo | A reference to a boolean variable which will be set according to the result of the query. |
| warningsTo | A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended. |
References crawlservpp::Struct::QueryStruct::index, crawlservpp::Struct::QueryStruct::resultBool, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, and crawlservpp::Main::Exception::view().
Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), crawlservpp::Module::Crawler::Thread::onReset(), crawlservpp::Module::Analyzer::Algo::Assoc::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::resetAlgo(), and crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo().
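The calling convention shared by the result functions (a boolean return value, the result written to an output parameter, and warnings collected in a queue) might look roughly like the sketch below. It uses `std::regex` as a stand-in engine, and `DemoRegExQuery` and `getBoolFromRegExSketch()` are hypothetical names. The documentation above does not spell out what the return value means, so the sketch assumes it signals whether a result could be produced.

```cpp
#include <cassert>
#include <queue>
#include <regex>
#include <string>

// Hypothetical stand-in for a RegEx query (the real QueryStruct stores an index).
struct DemoRegExQuery {
    std::string expression;
    bool resultBool; // whether the query is configured to yield a boolean
};

// Assumption: returns false (and appends a warning) if no result could be
// produced; otherwise writes the match result to resultTo and returns true.
bool getBoolFromRegExSketch(const DemoRegExQuery& query, const std::string& target,
        bool& resultTo, std::queue<std::string>& warningsTo) {
    if(!query.resultBool) {
        warningsTo.emplace("Query is not configured for boolean results.");

        return false;
    }

    try {
        resultTo = std::regex_search(target, std::regex(query.expression));
    }
    catch(const std::regex_error& e) {
        warningsTo.emplace(std::string("RegEx error: ") + e.what());

        return false;
    }

    return true;
}
```

Collecting warnings in a caller-owned queue lets the calling thread decide when and where to log them, instead of the helper writing to a log directly.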
|
inline protected |
Gets multiple results from a query of any type on the current query target.
| query | A constant reference to a structure identifying the query that will be performed. |
| resultTo | A reference to a vector to which the results of the query will be appended. |
| warningsTo | A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended. |
| Container::Exception | if no query target has been specified or the query is of an unknown type. |
References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultMulti, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().
Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
|
inline protected |
Gets multiple results from a query of any type on the current subset.
| query | A constant reference to a structure identifying the query that will be performed. |
| resultTo | A reference to a vector to which the results of the query will be appended. |
| warningsTo | A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended. |
| Container::Exception | if no query target or no subset has been specified, the current subset is invalid, or the given query is of an unknown type. |
References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultMulti, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().
Referenced by crawlservpp::Module::Extractor::Thread::onReset().
|
inline protected |
Gets multiple results from a RegEx query on a separate string.
| query | A constant reference to a structure identifying the RegEx query that will be performed. |
| target | A constant reference to a string containing the target on which the query will be performed. |
| resultTo | A reference to a vector to which the results of the query will be appended. |
| warningsTo | A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended. |
References crawlservpp::Struct::QueryStruct::index, crawlservpp::Struct::QueryStruct::resultMulti, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, and crawlservpp::Main::Exception::view().
Referenced by crawlservpp::Module::Parser::Thread::onReset().
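Since the results are appended to the given vector rather than replacing its content, a multi-result RegEx query on a separate string could be sketched as below. `std::regex` is used purely for illustration, and `appendAllMatches()` is a hypothetical helper that takes the expression directly instead of a query structure.

```cpp
#include <cassert>
#include <queue>
#include <regex>
#include <string>
#include <vector>

// Appends every match of 'expression' in 'target' to resultTo.
// Existing entries in resultTo are kept. Assumption: false means failure.
bool appendAllMatches(const std::string& expression, const std::string& target,
        std::vector<std::string>& resultTo, std::queue<std::string>& warningsTo) {
    try {
        const std::regex re(expression);

        for(std::sregex_iterator it(target.begin(), target.end(), re), end; it != end; ++it) {
            resultTo.push_back(it->str());
        }
    }
    catch(const std::regex_error& e) {
        warningsTo.emplace(std::string("RegEx error: ") + e.what());

        return false;
    }

    return true;
}
```

A caller that wants only the results of the latest query therefore has to clear the vector itself before the call.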
|
inline protected |
Gets the number of subsets currently acquired.
Referenced by crawlservpp::Module::Extractor::Thread::onReset().
|
inline protected |
Gets a single result from a query of any type on the current query target.
| query | A constant reference to a structure identifying the query that will be performed. |
| resultTo | A reference to a string to which the result of the query will be written. |
| warningsTo | A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended. |
| Container::Exception | if no query target has been specified or the query is of an unknown type. |
References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultSingle, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().
Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
|
inline protected |
Gets a single result from a query of any type on the current subset.
| query | A constant reference to a structure identifying the query that will be performed. |
| resultTo | A reference to a string to which the result of the query will be written. |
| warningsTo | A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended. |
| Container::Exception | if no query target or no subset has been specified, the current subset is invalid, or the given query is of an unknown type. |
References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultSingle, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().
Referenced by crawlservpp::Module::Extractor::Thread::onReset().
|
inline protected |
Gets a single result from a RegEx query on a separate string.
| query | A constant reference to a structure identifying the RegEx query that will be performed. |
| target | A constant reference to a string containing the target on which the query will be performed. |
| resultTo | A reference to a string to which the result of the query will be written. |
| warningsTo | A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended. |
References crawlservpp::Struct::QueryStruct::index, crawlservpp::Struct::QueryStruct::resultSingle, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, and crawlservpp::Main::Exception::view().
Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
|
inline protected |
Gets the current query target, if available, and writes it to the given string.
| targetTo | Reference to a string the query target will be written to, if one is available. Its content will not be changed if no query target is available. |
Referenced by crawlservpp::Module::Extractor::Thread::onReset().
|
inline protected |
Parses the current query target as tidied XML and writes it to the given string.
| resultTo | Reference to a string the parsed query target will be written to. |
| warningsTo | Reference to a queue of strings to which warnings that occurred during parsing will be appended. |
References crawlservpp::Parsing::XML::getContent().
Referenced by crawlservpp::Module::Crawler::Thread::onReset().
|
protected pure virtual |
Pure virtual function initializing queries.
This function must be implemented by the child classes of the container, so that each child class initializes its own queries.
Implemented in crawlservpp::Module::Analyzer::Thread.
|
inline |
Checks whether the specified query is used by the container.
Thread-safe. This function can be used by any thread.
| queryId | ID of the query to be checked. |
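One common way to make such a lookup safe to call from any thread is to guard the stored IDs with a mutex. The sketch below shows that technique only. It is an assumption about how thread safety could be achieved, not a description of the actual implementation, and `QueryIdRegistry` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical registry of query IDs, readable and writable from any thread.
class QueryIdRegistry {
public:
    // Remembers the given ID (would be called when a query is added).
    void add(std::uint64_t id) {
        std::lock_guard<std::mutex> lock(mutex);

        ids.push_back(id);
    }

    // Checks whether the given ID has been registered.
    bool isQueryUsed(std::uint64_t id) const {
        std::lock_guard<std::mutex> lock(mutex);

        return std::find(ids.cbegin(), ids.cend(), id) != ids.cend();
    }

private:
    mutable std::mutex mutex; // mutable, so that const readers can lock it
    std::vector<std::uint64_t> ids;
};
```

Only the IDs are shared between threads in this sketch, while the queries themselves would stay private to the owning thread.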
|
inline protected |
Requests the next subset for all subsequent queries.
| Container::Exception | if an invalid subset had previously been selected. |
References crawlservpp::Helper::Memory::free(), crawlservpp::Helper::Json::free(), crawlservpp::Helper::Memory::freeIf(), crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, and crawlservpp::Struct::QueryStruct::typeXPath.
Referenced by crawlservpp::Module::Extractor::Thread::onReset().
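The typical loop around the subset functions (set the subsets, then call `nextSubSet()` until it returns `false`) can be sketched with a small cursor class. Everything here is a simplified assumption: subsets are plain strings, no query is executed, and `SubSetCursor`, `setSubSets()`, `currentSubSet()` and `collect()` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the subset handling of the container.
class SubSetCursor {
public:
    // Overwrites all previous subsets and rewinds to before the first one.
    void setSubSets(std::vector<std::string> newSubSets) {
        subSets = std::move(newSubSets);
        current = 0;
    }

    // Advances to the next subset; returns false once none is left.
    bool nextSubSet() {
        if(current >= subSets.size()) {
            return false;
        }

        ++current;

        return true;
    }

    std::size_t getNumberOfSubSets() const { return subSets.size(); }

    // Only valid after nextSubSet() has returned true.
    const std::string& currentSubSet() const {
        assert(current > 0);

        return subSets.at(current - 1);
    }

private:
    std::vector<std::string> subSets;
    std::size_t current{0}; // 0 = no subset selected yet, otherwise index + 1
};

// Visits every subset once, in order, and returns what was visited.
std::vector<std::string> collect(SubSetCursor& cursor) {
    std::vector<std::string> visited;

    while(cursor.nextSubSet()) {
        visited.push_back(cursor.currentSubSet());
    }

    return visited;
}
```

In the documented class, the body of such a loop would call the `...OnSubSet` result functions instead of reading the subset directly.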
Deleted copy assignment operator.
Deleted move assignment operator.
|
inline protected |
Reserves memory for a specific number of subsets.
| query | A constant reference to a structure identifying the query for whose type memory will be specifically reserved. |
| n | The number of subsets for which memory will be reserved. |
References crawlservpp::Parsing::XML::clear(), crawlservpp::Helper::Memory::free(), crawlservpp::Helper::Json::free(), crawlservpp::Helper::Container::moveInto(), crawlservpp::Parsing::XML::parse(), crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Helper::Json::stringify(), crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().
Referenced by crawlservpp::Module::Extractor::Thread::onReset().
|
inline protected |
Sets whether to minimize memory usage.
| isMinimizeMemory | Set whether to minimize memory usage, prioritizing low memory usage over performance. |
Referenced by crawlservpp::Module::Extractor::Thread::onReset().
|
inline protected |
Sets the content to use the managed queries on.
The old query target referencing the old content will be cleared.
| content | Constant reference to a string containing the content to use the managed queries on. |
| source | Constant reference to a string containing the source (URL) of the content. It will be used for logging and error reporting purposes only. |
References clearQueryTarget().
Referenced by crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
|
inline protected |
Sets whether to remove XML processing instructions (<?xml:...>) before parsing HTML/XML content.
| isRemoveXmlInstructions | Sets whether to remove XML processing instructions. |
Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
|
inline protected |
Sets whether to try to repair CData when parsing XML.
| isRepairCData | Set whether to try to repair CData when parsing XML. |
Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
|
inline protected |
Sets whether to try to repair broken HTML/XML comments.
| isRepairComments | Set whether to try to repair broken HTML/XML comments. |
Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
|
inline protected |
Sets subsets for subsequent queries using a query of any type.
The subsets resulting from the query will be saved in-class. Previous subsets will be overwritten.
| query | A constant reference to a structure identifying the query that will be performed to acquire the subsets. |
| warningsTo | A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended. |
| Container::Exception | if no query target has been specified or the query is of an unknown type. |
References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultSubSets, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().
Referenced by crawlservpp::Module::Extractor::Thread::onReset().
|
inline protected |
Sets how tidy-html5 reports errors and warnings.
The reporting of both errors and warnings is deactivated by default.
For more information about tidy-html5, see its GitHub repository.
| warnings | Specify whether to report simple warnings. |
| numOfErrors | Set the number of errors to be reported. Set to zero to deactivate error reporting. |
References crawlservpp::Parsing::XML::setOptions().
Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().