crawlserv++  [under development]
Application for crawling and analyzing textual content of websites.
crawlservpp::Query::Container Class Reference (abstract)

Query container.

#include <Container.hpp>

Classes

class Exception
 Class for query container exceptions.

Construction and Destruction

 Container ()=default
 Default constructor.
virtual ~Container ()=default
 Default destructor.

Copy and move

The class is neither copyable nor movable.

 Container (const Container &)=delete
 Deleted copy constructor.
Container & operator= (const Container &)=delete
 Deleted copy assignment operator.
 Container (Container &&)=delete
 Deleted move constructor.
Container & operator= (Container &&)=delete
 Deleted move assignment operator.

Public Getter

bool isQueryUsed (std::uint64_t queryId) const
 Checks whether the specified query is used by the container.

Setters

void setRepairCData (bool isRepairCData)
 Sets whether to try to repair CData when parsing XML.
void setRepairComments (bool isRepairComments)
 Sets whether to try to repair broken HTML/XML comments.
void setRemoveXmlInstructions (bool isRemoveXmlInstructions)
 Sets whether to remove XML processing instructions (<?xml:...>) before parsing HTML/XML content.
void setMinimizeMemory (bool isMinimizeMemory)
 Sets whether to minimize memory usage.
void setTidyErrorsAndWarnings (bool warnings, std::uint32_t numOfErrors)
 Sets how tidy-html5 reports errors and warnings.
void setQueryTarget (const std::string &content, const std::string &source)
 Sets the content to use the managed queries on.

Getters

std::size_t getNumberOfSubSets () const
 Gets the number of subsets currently acquired.
bool getTarget (std::string &targetTo)
 Gets the current query target, if available, and writes it to the given string.
bool getXml (std::string &resultTo, std::queue< std::string > &warningsTo)
 Parses the current query target as tidied XML and writes it to the given string.

Initialization and Cleanup

virtual void initQueries ()=0
 Pure virtual function initializing queries.
virtual void deleteQueries ()=0
 Pure virtual function cleaning up queries.

Queries

QueryStruct addQuery (std::uint64_t id, const QueryProperties &properties)
 Adds a query with the given query properties to the container.
void clearQueries ()
 Clears all queries currently managed by the container and frees the associated memory.
void clearQueryTarget ()
 Clears the current query target and frees the associated memory.

Subsets

bool nextSubSet ()
 Requests the next subset for all subsequent queries.

Results

bool getBoolFromRegEx (const QueryStruct &query, const std::string &target, bool &resultTo, std::queue< std::string > &warningsTo) const
 Gets a boolean result from a RegEx query on a separate string.
bool getSingleFromRegEx (const QueryStruct &query, const std::string &target, std::string &resultTo, std::queue< std::string > &warningsTo) const
 Gets a single result from a RegEx query on a separate string.
bool getMultiFromRegEx (const QueryStruct &query, const std::string &target, std::vector< std::string > &resultTo, std::queue< std::string > &warningsTo) const
 Gets multiple results from a RegEx query on a separate string.
bool getBoolFromQuery (const QueryStruct &query, bool &resultTo, std::queue< std::string > &warningsTo)
 Gets a boolean result from a query of any type on the current query target.
bool getBoolFromQueryOnSubSet (const QueryStruct &query, bool &resultTo, std::queue< std::string > &warningsTo)
 Gets a boolean result from a query of any type on the current subset.
bool getSingleFromQuery (const QueryStruct &query, std::string &resultTo, std::queue< std::string > &warningsTo)
 Gets a single result from a query of any type on the current query target.
bool getSingleFromQueryOnSubSet (const QueryStruct &query, std::string &resultTo, std::queue< std::string > &warningsTo)
 Gets a single result from a query of any type on the current subset.
bool getMultiFromQuery (const QueryStruct &query, std::vector< std::string > &resultTo, std::queue< std::string > &warningsTo)
 Gets multiple results from a query of any type on the current query target.
bool getMultiFromQueryOnSubSet (const QueryStruct &query, std::vector< std::string > &resultTo, std::queue< std::string > &warningsTo)
 Gets multiple results from a query of any type on the current subset.
bool setSubSetsFromQuery (const QueryStruct &query, std::queue< std::string > &warningsTo)
 Sets subsets for subsequent queries using a query of any type.
bool addSubSetsFromQueryOnSubSet (const QueryStruct &query, std::queue< std::string > &warningsTo)
 Inserts more subsets after the current one based on a query on the current subset.

Memory

void reserveForSubSets (const QueryStruct &query, std::size_t n)
 Reserves memory for a specific number of subsets.

Detailed Description

Query container.

Abstract base class to be inherited by the module thread classes that manage their own queries.

Most member functions of the container are protected, as they are only meant to be used from within its child classes.
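The inheritance pattern described above can be sketched in isolation as follows. This is a minimal sketch, not the crawlserv++ code: QueryContainerBase, ExampleThread, addQueryId, clearQueryIds and reset are hypothetical names, and the real module threads register their queries via Container::addQuery rather than by storing plain IDs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for an abstract query container (not the real class).
class QueryContainerBase {
public:
    virtual ~QueryContainerBase() = default;

    std::size_t numberOfQueries() const { return this->queryIds.size(); }

protected:
    // Each child class has to set up and clean up its own queries.
    virtual void initQueries() = 0;
    virtual void deleteQueries() = 0;

    // Protected helpers, only available to child classes.
    void addQueryId(std::uint64_t id) { this->queryIds.push_back(id); }
    void clearQueryIds() { this->queryIds.clear(); }

private:
    std::vector<std::uint64_t> queryIds;
};

// The base class cannot be instantiated on its own.
static_assert(std::is_abstract_v<QueryContainerBase>);

// Hypothetical module thread managing two queries.
class ExampleThread : public QueryContainerBase {
public:
    // Re-initializes all queries, e.g. when the thread is (re)started.
    void reset() {
        this->deleteQueries();
        this->initQueries();
    }

    void shutdown() { this->deleteQueries(); }

protected:
    void initQueries() override {
        this->addQueryId(1);
        this->addQueryId(2);
    }

    void deleteQueries() override { this->clearQueryIds(); }
};
```

Calling reset() twice leaves exactly two queries registered, because the child class clears its old queries before initializing new ones.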

Constructor & Destructor Documentation

◆ Container() [1/3]

crawlservpp::Query::Container::Container ( )
default

Default constructor.

◆ ~Container()

virtual crawlservpp::Query::Container::~Container ( )
virtual, default

Default destructor.

◆ Container() [2/3]

crawlservpp::Query::Container::Container ( const Container & )
delete

Deleted copy constructor.

◆ Container() [3/3]

crawlservpp::Query::Container::Container ( Container &&  )
delete

Deleted move constructor.
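With all four copy and move operations deleted, a Container (and therefore any module thread derived from it) can only be used in place or through pointers and references. The sketch below mirrors these special members on a hypothetical stand-in class, NonCopyableContainer, and checks the resulting properties with standard type traits; it does not include the real crawlserv++ header.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Hypothetical stand-in mirroring the documented special members
// of crawlservpp::Query::Container (not the real class).
class NonCopyableContainer {
public:
    NonCopyableContainer() = default;
    virtual ~NonCopyableContainer() = default;

    // neither copyable nor movable
    NonCopyableContainer(const NonCopyableContainer&) = delete;
    NonCopyableContainer& operator=(const NonCopyableContainer&) = delete;
    NonCopyableContainer(NonCopyableContainer&&) = delete;
    NonCopyableContainer& operator=(NonCopyableContainer&&) = delete;
};

// Compile-time checks of the resulting type properties.
static_assert(!std::is_copy_constructible_v<NonCopyableContainer>);
static_assert(!std::is_copy_assignable_v<NonCopyableContainer>);
static_assert(!std::is_move_constructible_v<NonCopyableContainer>);
static_assert(!std::is_move_assignable_v<NonCopyableContainer>);
static_assert(std::has_virtual_destructor_v<NonCopyableContainer>);

// Such objects are typically owned through a (smart) pointer instead.
inline std::unique_ptr<NonCopyableContainer> makeContainer() {
    return std::make_unique<NonCopyableContainer>();
}
```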

Member Function Documentation

◆ addQuery()

Struct::QueryStruct crawlservpp::Query::Container::addQuery ( std::uint64_t id,
const QueryProperties & properties
)
inline, protected

Adds a query with the given query properties to the container.

Parameters
id: The ID of the query. It will be saved in a thread-safe way and will only be used by Container::isQueryUsed.
properties: A constant reference to the properties of the query to add to the container.
Returns
A structure identifying the added query, including the index of the query inside the container.
Exceptions
Container::Exception if an error occurred while creating a query with the given properties, or if the specified type of the query is unknown.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Struct::QueryProperties::resultBool, crawlservpp::Struct::QueryStruct::resultBool, crawlservpp::Struct::QueryProperties::resultMulti, crawlservpp::Struct::QueryStruct::resultMulti, crawlservpp::Struct::QueryProperties::resultSingle, crawlservpp::Struct::QueryStruct::resultSingle, crawlservpp::Struct::QueryProperties::resultSubSets, crawlservpp::Struct::QueryStruct::resultSubSets, crawlservpp::Struct::QueryProperties::text, crawlservpp::Struct::QueryProperties::textOnly, crawlservpp::Struct::QueryProperties::type, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Analyzer::Thread::addOptionalQuery(), crawlservpp::Module::Analyzer::Thread::addQueries(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
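The returned structure works as a handle: it records the query type and the index of the query in the container's internal storage, which is presumably how the result functions locate the query later on (they reference QueryStruct::type and QueryStruct::index). The following sketch shows that idea under simplifying assumptions: Props, Handle and MiniContainer are hypothetical, only RegEx queries are supported (using std::regex), and std::invalid_argument stands in for Container::Exception.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <regex>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified counterparts of QueryProperties and QueryStruct.
struct Props {
    std::string type;  // e.g. "regex"
    std::string text;  // the query itself
};

struct Handle {
    enum Type { typeNone, typeRegEx } type{typeNone};
    std::size_t index{0};  // position inside the container's storage
};

class MiniContainer {
public:
    Handle addQuery(std::uint64_t id, const Props& properties) {
        if (properties.type != "regex") {
            throw std::invalid_argument("unknown query type: " + properties.type);
        }

        // std::regex throws std::regex_error on an invalid expression
        this->regExes.emplace_back(properties.text);
        this->ids.push_back(id);  // kept only to answer "is this query used?"

        Handle handle;
        handle.type = Handle::typeRegEx;
        handle.index = this->regExes.size() - 1;
        return handle;
    }

    const std::regex& regExAt(const Handle& handle) const {
        return this->regExes.at(handle.index);
    }

private:
    std::vector<std::regex> regExes;
    std::vector<std::uint64_t> ids;
};
```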

◆ addSubSetsFromQueryOnSubSet()

bool crawlservpp::Query::Container::addSubSetsFromQueryOnSubSet ( const QueryStruct & query,
std::queue< std::string > & warningsTo
)
inline, protected

Inserts more subsets after the current one based on a query on the current subset.

This function is used for recursive extraction.

Parameters
query: A constant reference to a structure identifying the query that will be performed to acquire the new subsets.
warningsTo: A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if new subsets have been added. False, if the execution of the query failed or did not yield any results.
Exceptions
Container::Exception if no query target or no subset has been specified, the current subset is invalid, or the given query is of an unknown type.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultSubSets, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Extractor::Thread::onReset().
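Because the new subsets are inserted directly after the current one, they are visited before the remaining original subsets. The sketch below models only this bookkeeping, using plain strings as subsets; SubSetList and its members are hypothetical, and the assumption that the first call to next() selects the first subset is not confirmed by this page.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the subset bookkeeping (not the real implementation).
class SubSetList {
public:
    // Replaces all previous subsets; no subset is selected afterwards.
    void set(std::vector<std::string> newSubSets) {
        this->subSets = std::move(newSubSets);
        this->current = 0;  // 0 = no subset selected yet
    }

    // Selects the next subset; returns false if none is left.
    bool next() {
        if (this->current >= this->subSets.size()) {
            return false;
        }
        ++(this->current);
        return true;
    }

    const std::string& get() const { return this->subSets.at(this->current - 1); }

    // Inserts new subsets right after the currently selected one.
    bool addAfterCurrent(const std::vector<std::string>& newSubSets) {
        if (this->current == 0) {
            throw std::logic_error("no subset selected");
        }
        if (newSubSets.empty()) {
            return false;
        }
        this->subSets.insert(
            this->subSets.begin() + static_cast<std::ptrdiff_t>(this->current),
            newSubSets.begin(),
            newSubSets.end()
        );
        return true;
    }

    std::size_t size() const { return this->subSets.size(); }

private:
    std::vector<std::string> subSets;
    std::size_t current{0};
};
```

With the subsets A and B, adding A1 and A2 while A is selected yields the visiting order A, A1, A2, B.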

◆ clearQueries()

void crawlservpp::Query::Container::clearQueries ( )
inline, protected

Clears all queries currently managed by the container and frees the associated memory.

◆ clearQueryTarget()

void crawlservpp::Query::Container::clearQueryTarget ( )
inline, protected

Clears the current query target and frees the associated memory.

◆ deleteQueries()

virtual void crawlservpp::Query::Container::deleteQueries ( )
protected, pure virtual

Pure virtual function cleaning up queries.

This function needs to be implemented by the child classes of the container, which are responsible for cleaning up their own queries.

Implemented in crawlservpp::Module::Analyzer::Thread.

◆ getBoolFromQuery()

bool crawlservpp::Query::Container::getBoolFromQuery ( const QueryStruct & query,
bool & resultTo,
std::queue< std::string > & warningsTo
)
inline, protected

Gets a boolean result from a query of any type on the current query target.

Parameters
query: A constant reference to a structure identifying the query that will be performed.
resultTo: A reference to a boolean variable which will be set according to the result of the query.
warningsTo: A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the execution of the query failed.
Exceptions
Container::Exception if no query target has been specified or the query is of an unknown type.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultBool, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().

◆ getBoolFromQueryOnSubSet()

bool crawlservpp::Query::Container::getBoolFromQueryOnSubSet ( const QueryStruct & query,
bool & resultTo,
std::queue< std::string > & warningsTo
)
inline, protected

Gets a boolean result from a query of any type on the current subset.

Parameters
query: A constant reference to a structure identifying the query that will be performed.
resultTo: A reference to a boolean variable which will be set according to the result of the query.
warningsTo: A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the execution of the query failed.
Exceptions
Container::Exception if no query target or no subset has been specified, the current subset is invalid, or the given query is of an unknown type.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultBool, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Extractor::Thread::onReset().

◆ getBoolFromRegEx()

bool crawlservpp::Query::Container::getBoolFromRegEx ( const QueryStruct & query,
const std::string & target,
bool & resultTo,
std::queue< std::string > & warningsTo
) const
inline, protected

Gets a boolean result from a RegEx query on a separate string.

Parameters
query: A constant reference to a structure identifying the RegEx query that will be performed.
target: A constant reference to a string containing the target on which the query will be performed.
resultTo: A reference to a boolean variable which will be set according to the result of the query.
warningsTo: A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the query is of a different type or its execution failed.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Struct::QueryStruct::resultBool, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), crawlservpp::Module::Crawler::Thread::onReset(), crawlservpp::Module::Analyzer::Algo::Assoc::resetAlgo(), crawlservpp::Module::Analyzer::Algo::AssocOverTime::resetAlgo(), and crawlservpp::Module::Analyzer::Algo::SentimentOverTime::resetAlgo().
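Unlike the functions working on the query target, the RegEx functions on a separate string are const and document no exceptions: a query of the wrong type or a failed execution is reported through the return value. The following sketch imitates the boolean variant with std::regex standing in for the library's own RegEx engine. RegExHandle and boolFromRegEx are hypothetical names, and checking a resultBool flag before running the query is an assumption based on the documented references to QueryStruct::resultBool.

```cpp
#include <cassert>
#include <queue>
#include <regex>
#include <string>

// Hypothetical handle for a RegEx query (stand-in for QueryStruct).
struct RegExHandle {
    bool isRegEx{false};
    bool resultBool{false};  // whether the query was set up for boolean results
    std::regex expression;
};

// Returns false (and adds a warning) if the query is not a boolean RegEx query
// or if matching fails; otherwise sets resultTo to whether the expression
// occurs anywhere in the target.
bool boolFromRegEx(
    const RegExHandle& query,
    const std::string& target,
    bool& resultTo,
    std::queue<std::string>& warningsTo
) {
    if (!query.isRegEx || !query.resultBool) {
        warningsTo.emplace("query is not a boolean RegEx query");
        return false;
    }

    try {
        resultTo = std::regex_search(target, query.expression);
    }
    catch (const std::regex_error& e) {
        warningsTo.emplace(std::string("RegEx error: ") + e.what());
        return false;
    }

    return true;
}
```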

◆ getMultiFromQuery()

bool crawlservpp::Query::Container::getMultiFromQuery ( const QueryStruct & query,
std::vector< std::string > & resultTo,
std::queue< std::string > & warningsTo
)
inline, protected

Gets multiple results from a query of any type on the current query target.

Parameters
query: A constant reference to a structure identifying the query that will be performed.
resultTo: A reference to a vector to which the results of the query will be appended.
warningsTo: A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the execution of the query failed.
Exceptions
Container::Exception if no query target has been specified or the query is of an unknown type.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultMulti, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().

◆ getMultiFromQueryOnSubSet()

bool crawlservpp::Query::Container::getMultiFromQueryOnSubSet ( const QueryStruct & query,
std::vector< std::string > & resultTo,
std::queue< std::string > & warningsTo
)
inline, protected

Gets multiple results from a query of any type on the current subset.

Parameters
query: A constant reference to a structure identifying the query that will be performed.
resultTo: A reference to a vector to which the results of the query will be appended.
warningsTo: A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the execution of the query failed.
Exceptions
Container::Exception if no query target or no subset has been specified, the current subset is invalid, or the given query is of an unknown type.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultMulti, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Extractor::Thread::onReset().

◆ getMultiFromRegEx()

bool crawlservpp::Query::Container::getMultiFromRegEx ( const QueryStruct & query,
const std::string & target,
std::vector< std::string > & resultTo,
std::queue< std::string > & warningsTo
) const
inline, protected

Gets multiple results from a RegEx query on a separate string.

Parameters
query: A constant reference to a structure identifying the RegEx query that will be performed.
target: A constant reference to a string containing the target on which the query will be performed.
resultTo: A reference to a vector to which the results of the query will be appended.
warningsTo: A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the query is of a different type or its execution failed.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Struct::QueryStruct::resultMulti, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Parser::Thread::onReset().
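The multi-result functions append to resultTo instead of replacing its contents, so a caller reusing the same vector may want to clear it first. The sketch below shows this append behavior with std::regex and std::sregex_iterator as stand-ins for the library's RegEx engine; multiFromRegEx is a hypothetical helper, and collecting full matches (rather than capture groups) is an assumption, since this page does not specify which part of a match is returned.

```cpp
#include <cassert>
#include <queue>
#include <regex>
#include <string>
#include <vector>

// Appends every non-overlapping full match of expression in target to resultTo.
bool multiFromRegEx(
    const std::regex& expression,
    const std::string& target,
    std::vector<std::string>& resultTo,
    std::queue<std::string>& warningsTo
) {
    try {
        const std::sregex_iterator end;

        for (std::sregex_iterator it(target.begin(), target.end(), expression); it != end; ++it) {
            resultTo.push_back(it->str());  // append, do not overwrite
        }
    }
    catch (const std::regex_error& e) {
        warningsTo.emplace(std::string("RegEx error: ") + e.what());
        return false;
    }

    return true;
}
```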

◆ getNumberOfSubSets()

std::size_t crawlservpp::Query::Container::getNumberOfSubSets ( ) const
inline, protected

Gets the number of subsets currently acquired.

Returns
The number of subsets generated by the last query that generated subsets as its result.

Referenced by crawlservpp::Module::Extractor::Thread::onReset().

◆ getSingleFromQuery()

bool crawlservpp::Query::Container::getSingleFromQuery ( const QueryStruct & query,
std::string & resultTo,
std::queue< std::string > & warningsTo
)
inline, protected

Gets a single result from a query of any type on the current query target.

Parameters
query: A constant reference to a structure identifying the query that will be performed.
resultTo: A reference to a string to which the result of the query will be written.
warningsTo: A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the execution of the query failed.
Exceptions
Container::Exception if no query target has been specified or the query is of an unknown type.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultSingle, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().

◆ getSingleFromQueryOnSubSet()

bool crawlservpp::Query::Container::getSingleFromQueryOnSubSet ( const QueryStruct & query,
std::string & resultTo,
std::queue< std::string > & warningsTo
)
inline, protected

Gets a single result from a query of any type on the current subset.

Parameters
query: A constant reference to a structure identifying the query that will be performed.
resultTo: A reference to a string to which the result of the query will be written.
warningsTo: A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the execution of the query failed.
Exceptions
Container::Exception if no query target or no subset has been specified, the current subset is invalid, or the given query is of an unknown type.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultSingle, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Extractor::Thread::onReset().

◆ getSingleFromRegEx()

bool crawlservpp::Query::Container::getSingleFromRegEx ( const QueryStruct & query,
const std::string & target,
std::string & resultTo,
std::queue< std::string > & warningsTo
) const
inline, protected

Gets a single result from a RegEx query on a separate string.

Parameters
query: A constant reference to a structure identifying the RegEx query that will be performed.
target: A constant reference to a string containing the target on which the query will be performed.
resultTo: A reference to a string to which the result of the query will be written.
warningsTo: A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the query is of a different type or its execution failed.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Struct::QueryStruct::resultSingle, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().

◆ getTarget()

bool crawlservpp::Query::Container::getTarget ( std::string & targetTo )
inline, protected

Gets the current query target, if available, and writes it to the given string.

Parameters
targetTo: A reference to a string to which the query target will be written, if one is available. Its content will not be changed if no query target is available.
Returns
True, if a query target was available and has been written to the referenced string. False, if no query target was available and the referenced string has not been changed.

Referenced by crawlservpp::Module::Extractor::Thread::onReset().

◆ getXml()

bool crawlservpp::Query::Container::getXml ( std::string & resultTo,
std::queue< std::string > & warningsTo
)
inline, protected

Parses the current query target as tidied XML and writes it to the given string.

Parameters
resultTo: A reference to a string to which the parsed query target will be written.
warningsTo: A reference to a queue of strings to which warnings that occurred during parsing will be appended.
Returns
True, if the parsing was successful and the tidied XML was written to the given string. False, if the parsing was not successful and the given string has not been changed.

References crawlservpp::Parsing::XML::getContent().

Referenced by crawlservpp::Module::Crawler::Thread::onReset().

◆ initQueries()

virtual void crawlservpp::Query::Container::initQueries ( )
protected, pure virtual

Pure virtual function initializing queries.

This function needs to be implemented by the child classes of the container, which are responsible for initializing their own queries.

Implemented in crawlservpp::Module::Analyzer::Thread.

◆ isQueryUsed()

bool crawlservpp::Query::Container::isQueryUsed ( std::uint64_t  queryId) const
inline

Checks whether the specified query is used by the container.

Thread-safe. This function can be used by any thread.

Parameters
queryId: The ID of the query to be checked.
Returns
True, if the specified query is used by the container. False otherwise.
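Since isQueryUsed may be called from any thread while the module thread adds or clears its queries, access to the stored IDs has to be synchronized. A minimal sketch, assuming the IDs are kept in a std::vector guarded by a std::mutex; QueryIdRegistry and its members are hypothetical, and the real container may synchronize differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical registry of query IDs that can be checked from any thread.
class QueryIdRegistry {
public:
    void add(std::uint64_t id) {
        std::lock_guard<std::mutex> lock(this->idsLock);
        this->ids.push_back(id);
    }

    void clear() {
        std::lock_guard<std::mutex> lock(this->idsLock);
        this->ids.clear();
    }

    bool isQueryUsed(std::uint64_t queryId) const {
        std::lock_guard<std::mutex> lock(this->idsLock);
        return std::find(this->ids.cbegin(), this->ids.cend(), queryId) != this->ids.cend();
    }

private:
    mutable std::mutex idsLock;  // mutable, so that the const getter can lock it
    std::vector<std::uint64_t> ids;
};
```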

◆ nextSubSet()

bool crawlservpp::Query::Container::nextSubSet ( )
inline, protected

Requests the next subset for all subsequent queries.

Returns
True, if another subset existed that will be used by subsequent queries. False, if no more subsets exist.
Exceptions
Container::Exception if an invalid subset had previously been selected.

References crawlservpp::Helper::Memory::free(), crawlservpp::Helper::Json::free(), crawlservpp::Helper::Memory::freeIf(), crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, and crawlservpp::Struct::QueryStruct::typeXPath.

Referenced by crawlservpp::Module::Extractor::Thread::onReset().
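Taken together, setSubSetsFromQuery, getNumberOfSubSets, nextSubSet and the ...OnSubSet functions suggest a loop of the following shape: split the target into subsets, then run further queries on each subset in turn. The sketch below is a RegEx-only imitation; SubSetDemo and extractIds are hypothetical, the method names merely mirror the documented ones, and the assumption that the first call to nextSubSet selects the first subset is not confirmed by this page.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, RegEx-only imitation of the subset workflow.
class SubSetDemo {
public:
    explicit SubSetDemo(std::string content) : target(std::move(content)) {}

    // Replaces all previous subsets with the matches of the given query.
    bool setSubSetsFromQuery(const std::regex& query) {
        this->subSets.clear();
        this->subSetCurrent = 0;

        for (std::sregex_iterator it(this->target.cbegin(), this->target.cend(), query), end; it != end; ++it) {
            this->subSets.push_back(it->str());
        }
        return !this->subSets.empty();
    }

    std::size_t getNumberOfSubSets() const { return this->subSets.size(); }

    // Assumption: the first call selects the first subset.
    bool nextSubSet() {
        if (this->subSetCurrent >= this->subSets.size()) {
            return false;
        }
        ++(this->subSetCurrent);
        return true;
    }

    // Runs a query on the currently selected subset (first match only).
    bool getSingleFromQueryOnSubSet(const std::regex& query, std::string& resultTo) const {
        const std::string& subSet = this->subSets.at(this->subSetCurrent - 1);
        std::smatch match;

        if (!std::regex_search(subSet, match, query)) {
            return false;
        }
        resultTo = match.str();
        return true;
    }

private:
    std::string target;
    std::vector<std::string> subSets;
    std::size_t subSetCurrent{0};
};

// Typical loop: split the target into subsets, then query each subset.
inline std::vector<std::string> extractIds(const std::string& content) {
    SubSetDemo container(content);
    std::vector<std::string> ids;

    if (!container.setSubSetsFromQuery(std::regex("id=\\d+"))) {
        return ids;
    }

    ids.reserve(container.getNumberOfSubSets());

    while (container.nextSubSet()) {
        std::string id;

        if (container.getSingleFromQueryOnSubSet(std::regex("\\d+"), id)) {
            ids.push_back(id);
        }
    }
    return ids;
}
```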

◆ operator=() [1/2]

Container& crawlservpp::Query::Container::operator= ( const Container & )
delete

Deleted copy assignment operator.

◆ operator=() [2/2]

Container& crawlservpp::Query::Container::operator= ( Container &&  )
delete

Deleted move assignment operator.

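◆ reserveForSubSets()

void crawlservpp::Query::Container::reserveForSubSets ( const QueryStruct & query,
std::size_t n
)

Reserves memory for a specific number of subsets.
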
◆ setMinimizeMemory()

void crawlservpp::Query::Container::setMinimizeMemory ( bool isMinimizeMemory )
inline, protected

Sets whether to minimize memory usage.

Note
Setting memory minimization to true might negatively affect performance.
Parameters
isMinimizeMemory: Sets whether to minimize memory usage, prioritizing memory usage over performance.

Referenced by crawlservpp::Module::Extractor::Thread::onReset().

◆ setQueryTarget()

void crawlservpp::Query::Container::setQueryTarget ( const std::string & content,
const std::string & source
)
inline, protected

Sets the content to use the managed queries on.

The old query target referencing the old content will be cleared.

Warning
Pointers to the strings will be saved in-class. Make sure the strings remain valid as long as they are used!
Parameters
content: A constant reference to a string containing the content to use the managed queries on.
source: A constant reference to a string containing the source (URL) of the content. It will be used for logging and error reporting purposes only.

References clearQueryTarget().

Referenced by crawlservpp::Module::Analyzer::Thread::cleanUpQueries(), crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
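Because only pointers to content and source are stored, both strings must stay alive until the query target is cleared or replaced. The hypothetical TargetHolder below reproduces this contract in isolation, together with the documented getTarget behavior of leaving the output string untouched when no target is set; the member names are illustrative, not those of the real class.

```cpp
#include <cassert>
#include <string>

// Hypothetical holder reproducing the documented query target contract.
class TargetHolder {
public:
    // Stores pointers only: both strings must outlive their use here.
    void setQueryTarget(const std::string& content, const std::string& source) {
        this->clearQueryTarget();  // the old target is cleared first

        this->queryTargetPtr = &content;
        this->queryTargetSourcePtr = &source;
    }

    void clearQueryTarget() {
        this->queryTargetPtr = nullptr;
        this->queryTargetSourcePtr = nullptr;
    }

    // Copies the target if one is set; leaves targetTo unchanged otherwise.
    bool getTarget(std::string& targetTo) const {
        if (this->queryTargetPtr == nullptr) {
            return false;
        }
        targetTo = *(this->queryTargetPtr);
        return true;
    }

    // The source is only used for log and error messages.
    std::string describe() const {
        return this->queryTargetSourcePtr == nullptr ? "[no target]" : *(this->queryTargetSourcePtr);
    }

private:
    const std::string* queryTargetPtr{nullptr};
    const std::string* queryTargetSourcePtr{nullptr};
};
```

Passing a temporary, e.g. holder.setQueryTarget(std::string("x"), url), compiles but leaves a dangling pointer once the statement ends. A class could reject such calls at compile time by deleting an overload taking std::string&&; whether the real Container does so is not documented here.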

◆ setRemoveXmlInstructions()

void crawlservpp::Query::Container::setRemoveXmlInstructions ( bool isRemoveXmlInstructions )
inline, protected

Sets whether to remove XML processing instructions (<?xml:...>) before parsing HTML/XML content.

Parameters
isRemoveXmlInstructions: Sets whether to remove XML processing instructions.

Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
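This page does not define exactly which sequences are removed beyond the <?xml:...> pattern. A rough sketch of such a pre-processing step, assuming every substring that starts with <?xml: and ends at the next > is dropped before the content is handed to the parser; removeXmlInstructions is a hypothetical helper, not the library's function.

```cpp
#include <cassert>
#include <string>

// Removes every "<?xml:...>" sequence (up to and including the next '>').
// If no closing '>' follows, the rest of the string is removed as well.
inline void removeXmlInstructions(std::string& content) {
    const std::string opening("<?xml:");
    auto pos = content.find(opening);

    while (pos != std::string::npos) {
        const auto end = content.find('>', pos + opening.size());

        if (end == std::string::npos) {
            content.erase(pos);
            break;
        }

        content.erase(pos, end - pos + 1);
        pos = content.find(opening, pos);
    }
}
```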

◆ setRepairCData()

void crawlservpp::Query::Container::setRepairCData ( bool isRepairCData )
inline, protected

Sets whether to try to repair CData when parsing XML.

Parameters
isRepairCData: Sets whether to try to repair CData when parsing XML.

Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().

◆ setRepairComments()

void crawlservpp::Query::Container::setRepairComments ( bool isRepairComments )
inline, protected

Sets whether to try to repair broken HTML/XML comments.

Parameters
isRepairComments: Sets whether to try to repair broken HTML/XML comments.

Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().

◆ setSubSetsFromQuery()

bool crawlservpp::Query::Container::setSubSetsFromQuery ( const QueryStruct & query,
std::queue< std::string > & warningsTo
)
inline, protected

Sets subsets for subsequent queries using a query of any type.

The subsets resulting from the query will be saved in-class. Previous subsets will be overwritten.

Parameters
query: A constant reference to a structure identifying the query that will be performed to acquire the subsets.
warningsTo: A reference to a queue of strings to which all warnings that occur during the execution of the query will be appended.
Returns
True, if the query was successful. False, if the execution of the query failed.
Exceptions
Container::Exceptionif no query target has been specified or the query is of an unknown type.

References crawlservpp::Struct::QueryStruct::index, crawlservpp::Helper::Json::parseCons(), crawlservpp::Helper::Json::parseRapid(), crawlservpp::Struct::QueryStruct::resultSubSets, crawlservpp::Struct::QueryStruct::type, crawlservpp::Struct::QueryStruct::typeJsonPath, crawlservpp::Struct::QueryStruct::typeJsonPointer, crawlservpp::Struct::QueryStruct::typeNone, crawlservpp::Struct::QueryStruct::typeRegEx, crawlservpp::Struct::QueryStruct::typeXPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPath, crawlservpp::Struct::QueryStruct::typeXPathJsonPointer, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Extractor::Thread::onReset().

◆ setTidyErrorsAndWarnings()

void crawlservpp::Query::Container::setTidyErrorsAndWarnings ( bool warnings,
std::uint32_t numOfErrors
)
inline, protected

Sets how tidy-html5 reports errors and warnings.

The reporting of both errors and warnings is deactivated by default.

For more information about tidy-html5, see its GitHub repository.

Parameters
warnings: Specifies whether to report simple warnings.
numOfErrors: Sets the number of errors to be reported. Set to zero to deactivate error reporting.

References crawlservpp::Parsing::XML::setOptions().

Referenced by crawlservpp::Module::Parser::Thread::onReset(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().


The documentation for this class was generated from the following file: