crawlserv++  [under development]
Application for crawling and analyzing textual content of websites.
crawlservpp::Network::Curl Class Reference

Provides an interface to the libcurl library for sending and receiving data over the network.

#include <Curl.hpp>

Classes

class  Exception
 Class for libcurl exceptions.
 

Construction and Destruction

 Curl (std::string_view cookieDirectory, const NetworkSettings &setNetworkSettings)
 Constructor setting the cookie directory and the network options.
 
virtual ~Curl ()=default
 Default destructor.
 

Setters

void setConfigGlobal (const Config &globalConfig, bool limited, std::queue< std::string > &warningsTo)
 Sets the network options for the connection according to the given configuration.
 
void setConfigCurrent (const Config &currentConfig)
 Sets temporary network options for the connection according to the given configuration.
 
void setCookies (const std::string &cookies)
 Sets custom cookies.
 
void setHeaders (const std::vector< std::string > &customHeaders)
 Sets custom HTTP headers.
 
void setVerbose (bool isVerbose)
 Forces libcurl into or out of verbose mode.
 
void unsetCookies ()
 Unsets custom cookies previously set.
 
void unsetHeaders ()
 Unsets custom HTTP headers previously set.
 

Getters

void getContent (std::string_view url, bool usePost, std::string &contentTo, const std::vector< std::uint32_t > &errors)
 Uses the connection to get content by sending an HTTP request to the specified URL.
 
std::uint32_t getResponseCode () const noexcept
 Gets the response code of the HTTP reply received last.
 
std::string getContentType () const noexcept
 Gets the content type of the HTTP reply received last.
 
CURLcode getCurlCode () const noexcept
 Gets the libcurl return code received from the last API call.
 
std::string getPublicIp ()
 Uses the connection to determine its public IP address.
 

Reset

void resetConnection (std::uint64_t sleepForMilliseconds, const IsRunningCallback &isRunningCallback)
 Resets the connection.
 

URL Encoding

std::string escape (const std::string &stringToEscape, bool usePlusForSpace)
 URL encodes the given string.
 
std::string unescape (const std::string &escapedString, bool usePlusForSpace)
 URL decodes the given string.
 
std::string escapeUrl (std::string_view urlToEscape)
 URL encodes the given string while leaving reserved characters (; / ? : @ = & #) intact.
 

Copy and Move

The class is neither copyable nor movable.

 Curl (Curl &)=delete
 Deleted copy constructor.
 
Curl & operator= (Curl &)=delete
 Deleted copy assignment operator.
 
 Curl (Curl &&)=delete
 Deleted move constructor.
 
Curl & operator= (Curl &&)=delete
 Deleted move assignment operator.
 

Helper

static std::string curlStringToString (char *curlString)
 Copies the given libcurl string into a std::string and releases its memory.
 

Header Handling

static int header (char *data, std::size_t size, std::size_t nitems, void *thisPtr)
 Static header function to handle incoming header data.
 
int headerInClass (char *data, std::size_t size)
 In-class header function to handle incoming header data.
 

Writers

static int writer (char *data, std::size_t size, std::size_t nmemb, void *thisPtr)
 Static writer function to handle incoming network data.
 
int writerInClass (char *data, std::size_t size)
 In-class writer function to handle incoming network data.
 

Detailed Description

Provides an interface to the libcurl library for sending and receiving data over the network.

This class is used by both the crawler and the extractor.

It is not thread-safe, so a separate instance is needed for each thread.

Internally, the class uses Wrapper::Curl to interface with the libcurl library.

For more information about the libcurl library, see its website.

Constructor & Destructor Documentation

◆ Curl() [1/3]

crawlservpp::Network::Curl::Curl ( std::string_view  cookieDirectory,
const NetworkSettings &  setNetworkSettings 
)
inline

Constructor setting the cookie directory and the network options.

Initializes libcurl and sets some basic global default options, such as the write function, which handles incoming network traffic and is provided by the class.

Parameters
cookieDirectory  The path to the directory in which cookies will be saved.
setNetworkSettings  The network options for the connection represented by this instance.
Exceptions
Curl::Exception  if the API could not be initialized, the libcurl library in use does not support SSL, or the initial options could not be set.
See also
writer, writerInClass, NetworkSettings

References CURL_VERSION_SSL, crawlservpp::Wrapper::Curl::get(), header(), crawlservpp::Wrapper::Curl::valid(), and writer().

◆ ~Curl()

virtual crawlservpp::Network::Curl::~Curl ( )
virtual default

Default destructor.

◆ Curl() [2/3]

crawlservpp::Network::Curl::Curl ( Curl & )
delete

Deleted copy constructor.

◆ Curl() [3/3]

crawlservpp::Network::Curl::Curl ( Curl &&  )
delete

Deleted move constructor.

Member Function Documentation

◆ curlStringToString()

std::string crawlservpp::Network::Curl::curlStringToString ( char *  curlString)
inline static protected

Copies the given libcurl string into a std::string and releases its memory.

Afterwards, curlString will be invalid and its memory will have been freed.

If curlString is a nullptr, it will be ignored.

Warning
It is imperative that curlString is either a nullptr or a valid libcurl string. Otherwise, the program may crash and memory may be corrupted.
In this case, the API provides no exception handling to prevent crashes or memory leaks.
Parameters
curlString  A pointer to a valid libcurl string or a nullptr.
Returns
A copy of the string originally saved in curlString or an empty string if curlString is a nullptr.

Referenced by escape(), escapeUrl(), and unescape().

◆ escape()

std::string crawlservpp::Network::Curl::escape ( const std::string &  stringToEscape,
bool  usePlusForSpace 
)
inline

URL encodes the given string.

Note
A string view cannot be used, because the underlying API requires a null-terminated string.
Warning
The libcurl library needs to be successfully initialized for URL encoding, except for an empty string.
Parameters
stringToEscape  Const reference to the string to be encoded.
usePlusForSpace  States whether to convert spaces to + instead of %20.
Returns
A copy of the encoded string.
Exceptions
Curl::Exception  if the libcurl library has not been initialized.
See also
curl_escape

References curlStringToString(), crawlservpp::Network::encodedSpace, crawlservpp::Network::encodedSpaceLength, crawlservpp::Wrapper::Curl::get(), and crawlservpp::Wrapper::Curl::valid().
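For illustration, the following stand-alone sketch shows what URL encoding with the usePlusForSpace option amounts to. It does not call libcurl, even though the class itself relies on libcurl for the encoding. The function names are hypothetical, and the sketch assumes that only the unreserved characters A-Z, a-z, 0-9, -, ., _ and ~ are left unchanged.

```cpp
#include <string>

// Returns true for the unreserved characters, which are left unchanged.
inline bool isUnreserved(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
        || c == '-' || c == '.' || c == '_' || c == '~';
}

// Hypothetical sketch: percent-encodes every byte except unreserved
// characters. If usePlusForSpace is true, spaces become '+' instead of "%20".
inline std::string percentEncode(const std::string &input, bool usePlusForSpace) {
    static constexpr char hexDigits[] = "0123456789ABCDEF";
    std::string result;

    result.reserve(input.size() * 3);

    for(const char ch : input) {
        const auto c = static_cast<unsigned char>(ch);

        if(isUnreserved(c)) {
            result.push_back(ch);
        }
        else if(c == ' ' && usePlusForSpace) {
            result.push_back('+');
        }
        else {
            // Encode the byte as '%' followed by two uppercase hex digits.
            result.push_back('%');
            result.push_back(hexDigits[c >> 4]);
            result.push_back(hexDigits[c & 0x0F]);
        }
    }

    return result;
}
```

Encoding "a b/c" this way yields "a%20b%2Fc". With usePlusForSpace set to true, it yields "a+b%2Fc".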

◆ escapeUrl()

std::string crawlservpp::Network::Curl::escapeUrl ( std::string_view  urlToEscape)
inline

URL encodes the given string while leaving reserved characters (; / ? : @ = & #) intact.

The function will copy those parts of the string that need to be escaped and use the libcurl library to escape them.

Leaves the characters ; / ? : @ = & # unchanged in the resulting string.

Warning
The libcurl library needs to be successfully initialized for URL encoding, except for an empty string (or a string containing only reserved characters).
Parameters
urlToEscape  A view of the string containing the URL to be encoded.
Returns
A copy of the encoded string.
Exceptions
Curl::Exception  if the libcurl library has not been initialized.
See also
curl_escape

References curlStringToString(), crawlservpp::Wrapper::Curl::get(), crawlservpp::Network::reservedCharacters, and crawlservpp::Wrapper::Curl::valid().

Referenced by writerInClass().
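The following hypothetical sketch illustrates the effect of escapeUrl(). The reserved characters listed above are kept, and every other byte that is not an unreserved character is percent-encoded. Unlike the class, the sketch encodes byte by byte instead of passing the segments between reserved characters to libcurl. The result should be the same, assuming that libcurl encodes all characters except the unreserved ones.

```cpp
#include <string>
#include <string_view>

// Characters that escapeUrl() is documented to leave unchanged.
constexpr std::string_view urlReservedCharacters{";/?:@=&#"};

// Hypothetical sketch: percent-encodes everything except unreserved and
// reserved characters.
inline std::string escapeUrlSketch(std::string_view url) {
    static constexpr char hexDigits[] = "0123456789ABCDEF";
    std::string result;

    result.reserve(url.size());

    for(const char ch : url) {
        const auto c = static_cast<unsigned char>(ch);
        const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';

        if(unreserved || urlReservedCharacters.find(ch) != std::string_view::npos) {
            result.push_back(ch); // keep the character as-is
        }
        else {
            result.push_back('%');
            result.push_back(hexDigits[c >> 4]);
            result.push_back(hexDigits[c & 0x0F]);
        }
    }

    return result;
}
```

In this sketch, a literal '%' is encoded as well, so a URL that has already been encoded would be encoded twice.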

◆ getContent()

void crawlservpp::Network::Curl::getContent ( std::string_view  url,
bool  usePost,
std::string &  contentTo,
const std::vector< std::uint32_t > &  errors 
)
inline

Uses the connection to get content by sending an HTTP request to the specified URL.

When using HTTP POST, the data to be sent is determined in the same way as for an HTTP GET request: it is taken from everything after the first question mark (?) in the given URL.

If no question mark is present, no additional data will be sent with the HTTP POST request.
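The following is a minimal sketch of how the POST data could be separated from the URL. It assumes that everything after the first question mark is used as the request body. The helper splitPostData() is hypothetical. With libcurl, such a body is commonly passed via the CURLOPT_POSTFIELDS option, but whether the class does exactly this is not stated here.

```cpp
#include <string>
#include <string_view>
#include <utility>

// Hypothetical helper: splits a URL at the first '?' into the request target
// and the data that would be sent as the body of an HTTP POST request.
// If there is no '?', the POST data is empty.
inline std::pair<std::string, std::string> splitPostData(std::string_view url) {
    const auto pos = url.find('?');

    if(pos == std::string_view::npos) {
        return {std::string(url), std::string{}};
    }

    return {std::string(url.substr(0, pos)), std::string(url.substr(pos + 1))};
}
```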

Before sending the request, the given URL will be encoded while keeping possible reserved characters intact.

The response code and content type of the reply will be saved, so that they can be retrieved via getResponseCode() and getContentType().

After a successful request, replies encoded in ISO-8859-1 will be converted to UTF-8 and invalid UTF-8 sequences will be removed.
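Every ISO-8859-1 byte corresponds to the Unicode code point of the same value, so the conversion to UTF-8 is straightforward. The following sketch shows the idea. The class itself uses Helper::Utf8::iso88591ToUtf8(), whose implementation may differ, and the subsequent removal of invalid UTF-8 sequences is not shown. The function name latin1ToUtf8() is hypothetical.

```cpp
#include <string>

// Hypothetical sketch: converts ISO-8859-1 (Latin-1) bytes to UTF-8.
// Bytes below 0x80 are copied unchanged; all others become a two-byte
// UTF-8 sequence (110xxxxx 10xxxxxx).
inline std::string latin1ToUtf8(const std::string &latin1) {
    std::string utf8;

    utf8.reserve(latin1.size() * 2);

    for(const char ch : latin1) {
        const auto c = static_cast<unsigned char>(ch);

        if(c < 0x80) {
            utf8.push_back(ch);
        }
        else {
            utf8.push_back(static_cast<char>(0xC0 | (c >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }

    return utf8;
}
```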

Parameters
url  A view of the string containing the URL to request.
usePost  States whether to use HTTP POST instead of HTTP GET for this request.
contentTo  Reference to a string in which the received content will be stored.
errors  Vector of HTTP error codes that will be handled by throwing an exception, unless the error code is also present in the X-ts header returned by the host.
Exceptions
Curl::Exception  if setting the necessary options failed, the HTTP request could not be sent, information about the reply could not be retrieved, or any of the specified HTTP error codes has been received.

References crawlservpp::Wrapper::Curl::get().

Referenced by getPublicIp(), crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
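The rule for the errors parameter can be sketched as a small decision function. This is a hypothetical helper, not the class's code. It assumes that the saved X-ts header value consists of a single, already trimmed numeric code.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical decision helper: returns true if a received HTTP response
// code should be turned into an exception. Assumes (unconfirmed) that the
// X-ts header value is a single, trimmed numeric code.
inline bool isHandledError(
        std::uint32_t responseCode,
        const std::vector<std::uint32_t> &errors,
        const std::string &xTsValue
) {
    const bool listed = std::find(errors.cbegin(), errors.cend(), responseCode) != errors.cend();

    if(!listed) {
        return false;
    }

    // If the host reported the same code in the X-ts header, do not throw.
    return xTsValue != std::to_string(responseCode);
}
```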

◆ getContentType()

std::string crawlservpp::Network::Curl::getContentType ( ) const
inline noexcept

Gets the content type of the HTTP reply received last.

Returns
A copy of the string containing the content type received during the last call to getContent().

Referenced by crawlservpp::Module::Crawler::Thread::onReset().

◆ getCurlCode()

CURLcode crawlservpp::Network::Curl::getCurlCode ( ) const
inline noexcept

Gets the libcurl return code received from the last API call.

Use this function to determine which error occurred after another call to this class has failed.

Returns
The received libcurl return code.
See also
libcurl error codes

Referenced by crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().

◆ getPublicIp()

std::string crawlservpp::Network::Curl::getPublicIp ( )
inline

Uses the connection to determine its public IP address.

Requests the public IP address of the connection from an external URL defined inside this function.

Returns
The public IP address of the connection as a string, or a string beginning with "N/A" if the IP address could not be determined. An error description may follow.

References getContent(), crawlservpp::Network::getPublicIpErrors, crawlservpp::Network::getPublicIpFrom, and crawlservpp::Main::Exception::view().

Referenced by crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().

◆ getResponseCode()

std::uint32_t crawlservpp::Network::Curl::getResponseCode ( ) const
inline noexcept

Gets the response code of the HTTP reply received last.

Returns
The HTTP code received during the last call to getContent.

Referenced by crawlservpp::Module::Crawler::Thread::onReset().

◆ header()

int crawlservpp::Network::Curl::header ( char *  data,
std::size_t  size,
std::size_t  nitems,
void *  thisPtr 
)
inline static protected

Static header function to handle incoming header data.

If thisPtr is not nullptr, the function will forward the incoming header data without change to the headerInClass function.

Parameters
data  Pointer to the incoming header data.
size  Always 1.
nitems  The size of the incoming header data.
thisPtr  Pointer to the instance of the Curl class.
Returns
The number of bytes processed by the in-class function.

References headerInClass().

Referenced by Curl(), and resetConnection().

◆ headerInClass()

int crawlservpp::Network::Curl::headerInClass ( char *  data,
std::size_t  size 
)
inline protected

In-class header function to handle incoming header data.

The function will check for an X-ts header and save its value.

Parameters
data  Pointer to the incoming data.
size  The size of the incoming header data.
Returns
The number of bytes processed, which will be identical to size.

References crawlservpp::Network::xTsHeaderName, and crawlservpp::Network::xTsHeaderNameLen.

Referenced by header().
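The following hypothetical sketch recognizes an X-ts header line and extracts its value. It assumes that header lines arrive in the usual "Name: value" form with a trailing CRLF. It compares the header name case-insensitively, because HTTP header names are case-insensitive. The class itself compares against crawlservpp::Network::xTsHeaderName, and whether it ignores case is not stated here.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical sketch: if the header line is an X-ts header, stores its value
// (without surrounding whitespace and the trailing CRLF) in valueTo and
// returns true. Otherwise, valueTo is left unchanged and false is returned.
inline bool parseXTsHeader(std::string_view line, std::string &valueTo) {
    constexpr std::string_view name{"x-ts:"};

    if(line.size() < name.size()) {
        return false;
    }

    // Compare the header name (including the colon) case-insensitively.
    for(std::size_t i = 0; i < name.size(); ++i) {
        if(std::tolower(static_cast<unsigned char>(line[i])) != name[i]) {
            return false;
        }
    }

    std::string_view value = line.substr(name.size());
    const auto isSpace = [](char c) { return c == ' ' || c == '\t' || c == '\r' || c == '\n'; };

    while(!value.empty() && isSpace(value.front())) {
        value.remove_prefix(1);
    }

    while(!value.empty() && isSpace(value.back())) {
        value.remove_suffix(1);
    }

    valueTo.assign(value.data(), value.size());

    return true;
}
```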

◆ operator=() [1/2]

Curl& crawlservpp::Network::Curl::operator= ( Curl & )
delete

Deleted copy assignment operator.

◆ operator=() [2/2]

Curl& crawlservpp::Network::Curl::operator= ( Curl &&  )
delete

Deleted move assignment operator.

◆ resetConnection()

void crawlservpp::Network::Curl::resetConnection ( std::uint64_t  sleepForMilliseconds,
const IsRunningCallback &  isRunningCallback 
)
inline

Resets the connection.

After cleaning up the connection, the function will wait for the specified time. Meanwhile, it regularly checks whether the application is still running, so that it does not considerably delay the application's shutdown.

It then re-applies the configuration previously passed to setConfigGlobal().

Warning
The configuration passed to setConfigCurrent() will be discarded.
Note
Warnings when re-setting the configuration will be discarded, because they will have already been reported when the configuration was originally set.
Parameters
sleepForMilliseconds  Time to wait, in milliseconds, before re-establishing the connection.
isRunningCallback  Constant reference to a callback function (or lambda) that returns whether the application is still running.
Exceptions
Curl::Exception  if any connection option could not be (re-)set.

References crawlservpp::Network::checkEveryMilliseconds, crawlservpp::Wrapper::Curl::clear(), crawlservpp::Wrapper::CurlList::clear(), crawlservpp::Wrapper::Curl::get(), header(), crawlservpp::Wrapper::Curl::init(), crawlservpp::Helper::DateTime::now(), setConfigGlobal(), and writer().

Referenced by crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().
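The interruptible wait described above can be sketched as follows. This is a minimal sketch that splits the wait into slices of at most checkEveryMs milliseconds. The class itself uses crawlservpp::Network::checkEveryMilliseconds and Helper::DateTime::now() instead. The function name sleepWhileRunning() is hypothetical.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

// Hypothetical sketch of an interruptible wait: sleeps for up to totalMs
// milliseconds in slices of at most checkEveryMs milliseconds, and stops
// early as soon as isRunning() returns false. Returns true if the full
// time has elapsed, false if the wait was interrupted.
inline bool sleepWhileRunning(
        std::uint64_t totalMs,
        std::uint64_t checkEveryMs,
        const std::function<bool()> &isRunning
) {
    using Clock = std::chrono::steady_clock;

    const auto end = Clock::now() + std::chrono::milliseconds(totalMs);

    while(isRunning()) {
        const auto now = Clock::now();

        if(now >= end) {
            return true;
        }

        const auto remaining = std::chrono::duration_cast<std::chrono::milliseconds>(end - now);

        std::this_thread::sleep_for(std::min(remaining, std::chrono::milliseconds(checkEveryMs)));
    }

    return false;
}
```

With this design, shutdown is delayed by at most one slice plus the time needed for a single isRunning() check.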

◆ setConfigCurrent()

void crawlservpp::Network::Curl::setConfigCurrent ( const Config currentConfig)
inline

Sets temporary network options for the connection according to the given configuration.

Only uses Config::cookiesOverwrite from the given configuration to add or manipulate cookies already set.

Parameters
currentConfig  The network configuration to be used.
Exceptions
Curl::Exception
See also
CURLOPT_COOKIELIST

References crawlservpp::Network::Config::Entries::cookiesOverwrite, and crawlservpp::Network::Config::networkConfig.

Referenced by crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().

◆ setConfigGlobal()

void crawlservpp::Network::Curl::setConfigGlobal ( const Config globalConfig,
bool  limited,
std::queue< std::string > &  warningsTo 
)
inline

Sets the network options for the connection according to the given configuration.

Warnings may include options that have been set but are not supported by the available version of the libcurl library.

Note
If limited is true, cookie settings, custom headers, HTTP version and error responses of the network configuration will be ignored.
Use this for secondary connections not related to the original website, e.g. to web archives.
Parameters
globalConfig  A network configuration.
limited  Indicates whether the settings will have only a limited effect (see the note above).
warningsTo  Reference to a queue of strings that will be filled with warnings, if any occur.
Exceptions
Curl::Exception  if any of the options could not be set.
See also
setConfigCurrent, Config

References crawlservpp::Wrapper::CurlList::append(), crawlservpp::Network::authTypeTlsSrp, crawlservpp::Network::Config::Entries::connectionsMax, crawlservpp::Helper::FileSystem::contains(), crawlservpp::Network::Config::Entries::contentLengthIgnore, crawlservpp::Network::Config::Entries::cookies, crawlservpp::Network::Config::Entries::cookiesLoad, crawlservpp::Network::Config::Entries::cookiesSave, crawlservpp::Network::Config::Entries::cookiesSession, crawlservpp::Network::Config::Entries::cookiesSet, CURL_HET_DEFAULT, CURL_HTTP_VERSION_2_0, CURL_HTTP_VERSION_2_PRIOR_KNOWLEDGE, CURL_HTTP_VERSION_2TLS, CURL_HTTP_VERSION_3, CURL_VERSION_BROTLI, CURL_VERSION_HTTP2, CURL_VERSION_HTTP3, CURL_VERSION_LIBZ, CURL_VERSION_TLSAUTH_SRP, CURL_VERSION_ZSTD, CURLOPT_DNS_SHUFFLE_ADDRESSES, CURLOPT_DOH_URL, CURLOPT_HAPPY_EYEBALLS_TIMEOUT_MS, CURLOPT_PRE_PROXY, CURLOPT_PROXY_SSL_VERIFYHOST, CURLOPT_PROXY_SSL_VERIFYPEER, CURLOPT_PROXY_TLSAUTH_PASSWORD, CURLOPT_PROXY_TLSAUTH_TYPE, CURLOPT_PROXY_TLSAUTH_USERNAME, CURLOPT_TCP_FASTOPEN, crawlservpp::Struct::NetworkSettings::defaultProxy, crawlservpp::Network::Config::Entries::dnsCacheTimeOut, crawlservpp::Network::Config::Entries::dnsDoH, crawlservpp::Network::Config::Entries::dnsInterface, crawlservpp::Network::Config::Entries::dnsResolves, crawlservpp::Network::Config::Entries::dnsServers, crawlservpp::Network::Config::Entries::dnsShuffle, crawlservpp::Network::Config::Entries::encodingBr, crawlservpp::Network::Config::Entries::encodingDeflate, crawlservpp::Network::Config::Entries::encodingGZip, crawlservpp::Network::Config::Entries::encodingIdentity, crawlservpp::Network::Config::Entries::encodingTransfer, crawlservpp::Network::Config::Entries::encodingZstd, crawlservpp::Helper::FileSystem::getPathSeparator(), crawlservpp::Network::Config::Entries::headers, crawlservpp::Network::Config::Entries::http200Aliases, crawlservpp::Network::Config::Entries::httpVersion, crawlservpp::Network::httpVersion1, 
crawlservpp::Network::httpVersion11, crawlservpp::Network::httpVersion2, crawlservpp::Network::httpVersion2Only, crawlservpp::Network::httpVersion2Tls, crawlservpp::Network::httpVersion3Only, crawlservpp::Network::httpVersionAny, crawlservpp::Network::Config::Entries::localInterface, crawlservpp::Network::Config::Entries::localPort, crawlservpp::Network::Config::Entries::localPortRange, crawlservpp::Network::Config::networkConfig, crawlservpp::Network::Config::Entries::noReUse, crawlservpp::Network::Config::Entries::protocol, crawlservpp::Network::Config::Entries::proxy, crawlservpp::Network::Config::Entries::proxyAuth, crawlservpp::Network::Config::Entries::proxyHeaders, crawlservpp::Network::Config::Entries::proxyPre, crawlservpp::Network::Config::Entries::proxyTlsSrpPassword, crawlservpp::Network::Config::Entries::proxyTlsSrpUser, crawlservpp::Network::Config::Entries::proxyTunnelling, crawlservpp::Network::Config::Entries::redirect, crawlservpp::Network::Config::Entries::redirectMax, crawlservpp::Network::Config::Entries::redirectPost301, crawlservpp::Network::Config::Entries::redirectPost302, crawlservpp::Network::Config::Entries::redirectPost303, crawlservpp::Network::Config::Entries::referer, crawlservpp::Network::Config::Entries::refererAutomatic, setCookies(), crawlservpp::Network::Config::Entries::speedDownLimit, crawlservpp::Network::Config::Entries::speedLowLimit, crawlservpp::Network::Config::Entries::speedLowTime, crawlservpp::Network::Config::Entries::speedUpLimit, crawlservpp::Network::Config::Entries::sslVerifyHost, crawlservpp::Network::Config::Entries::sslVerifyPeer, crawlservpp::Network::Config::Entries::sslVerifyProxyHost, crawlservpp::Network::Config::Entries::sslVerifyProxyPeer, crawlservpp::Network::Config::Entries::sslVerifyStatus, crawlservpp::Network::Config::Entries::tcpFastOpen, crawlservpp::Network::Config::Entries::tcpKeepAlive, crawlservpp::Network::Config::Entries::tcpKeepAliveIdle, 
crawlservpp::Network::Config::Entries::tcpKeepAliveInterval, crawlservpp::Network::Config::Entries::tcpNagle, crawlservpp::Network::Config::Entries::timeOut, crawlservpp::Network::Config::Entries::timeOutHappyEyeballs, crawlservpp::Network::Config::Entries::timeOutRequest, crawlservpp::Network::Config::Entries::tlsSrpPassword, crawlservpp::Network::Config::Entries::tlsSrpUser, crawlservpp::Network::Config::Entries::userAgent, crawlservpp::Wrapper::Curl::valid(), crawlservpp::Network::Config::Entries::verbose, crawlservpp::Network::versionBrotli, crawlservpp::Network::versionDnsShuffle, crawlservpp::Network::versionDoH, crawlservpp::Network::versionHappyEyeballs, crawlservpp::Network::versionHttp2, crawlservpp::Network::versionHttp2Only, crawlservpp::Network::versionHttp2Tls, crawlservpp::Network::versionHttp3Only, crawlservpp::Network::versionPreProxy, crawlservpp::Network::versionProxySslVerify, crawlservpp::Network::versionProxyTlsAuth, crawlservpp::Network::versionTcpFastOpen, and crawlservpp::Network::versionZstd.

Referenced by crawlservpp::Module::Extractor::Thread::onReset(), crawlservpp::Module::Crawler::Thread::onReset(), and resetConnection().
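The following sketch shows how options could be checked against the running libcurl version. libcurl encodes its version as 0xXXYYZZ (major, minor, patch), and the running version can be obtained from curl_version_info(CURLVERSION_NOW)->version_num. The helpers below are hypothetical. They take that number as a plain parameter, so the sketch compiles without libcurl.

```cpp
#include <queue>
#include <string>

// Hypothetical helper: formats a version number encoded as 0xXXYYZZ.
inline std::string versionToString(unsigned int version) {
    return std::to_string((version >> 16) & 0xFF) + "."
        + std::to_string((version >> 8) & 0xFF) + "."
        + std::to_string(version & 0xFF);
}

// Hypothetical sketch: if an option requires a newer libcurl than the one
// running, a warning is queued instead of failing, and false is returned.
inline bool checkOptionSupported(
        unsigned int runningVersion,
        unsigned int requiredVersion,
        const std::string &optionName,
        std::queue<std::string> &warningsTo
) {
    if(runningVersion >= requiredVersion) {
        return true;
    }

    warningsTo.emplace(
        optionName + " requires libcurl " + versionToString(requiredVersion) + " or newer"
    );

    return false;
}
```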

◆ setCookies()

void crawlservpp::Network::Curl::setCookies ( const std::string &  cookies)
inline

Sets custom cookies.

These cookies will be sent along with all subsequent HTTP requests as long as the connection is not reset.

If a reference to an empty string is given, the function will unset cookies previously set through this function.

This function works independently from the internal libcurl cookie engine.

Warning
Custom cookies set by this function will be lost as soon as the connection is reset.
Note
A string view cannot be used, because the underlying API requires a null-terminated string.
Parameters
cookies  Const reference to a string containing the cookies to send, in the same format as in the corresponding HTTP header, i.e. "name1=content1; name2=content2;" etc.
Exceptions
Curl::Exception  if the cookies could not be set.
See also
unsetCookies, resetConnection, CURLOPT_COOKIE

Referenced by crawlservpp::Module::Extractor::Thread::onReset(), crawlservpp::Module::Crawler::Thread::onReset(), and setConfigGlobal().
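The following hypothetical helper builds a string in the format expected by setCookies() from name/value pairs. It omits the trailing semicolon shown in the example above. An empty list yields an empty string, which, as described above, would unset previously set custom cookies.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: joins name/value pairs into the format of the
// HTTP Cookie header, i.e. "name1=content1; name2=content2".
inline std::string buildCookieString(
        const std::vector<std::pair<std::string, std::string>> &cookies
) {
    std::string result;

    for(const auto &[name, value] : cookies) {
        if(!result.empty()) {
            result += "; ";
        }

        result += name + "=" + value;
    }

    return result;
}
```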

◆ setHeaders()

void crawlservpp::Network::Curl::setHeaders ( const std::vector< std::string > &  customHeaders)
inline

Sets custom HTTP headers.

These headers will be sent along with all subsequent HTTP requests as long as the connection is not reset.

Warning
Custom headers set by this function will be lost as soon as the connection is reset.
Parameters
customHeaders  A vector of strings providing the custom HTTP headers to be set.
Exceptions
Curl::Exception  if the headers could not be set.
See also
unsetHeaders, resetConnection, CURLOPT_HTTPHEADER

References crawlservpp::Wrapper::CurlList::append(), and crawlservpp::Wrapper::CurlList::clear().

Referenced by crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().

◆ setVerbose()

void crawlservpp::Network::Curl::setVerbose ( bool  isVerbose)
inline

Forces libcurl into or out of verbose mode.

In verbose mode, extensive connection information will be written to stdout.

Warning
This function will override any configuration. It should be used for debugging purposes only.
Parameters
isVerbose  If true, libcurl will be forced into verbose mode. If false, libcurl will be forced out of verbose mode.
Exceptions
Curl::Exception  if the verbose mode could not be set.
See also
CURLOPT_VERBOSE

◆ unescape()

std::string crawlservpp::Network::Curl::unescape ( const std::string &  escapedString,
bool  usePlusForSpace 
)
inline

URL decodes the given string.

Note
A string view cannot be used, because the underlying API requires a null-terminated string.
Warning
The libcurl library needs to be successfully initialized for URL decoding, except for an empty string.
Parameters
escapedString  Const reference to the string to be decoded.
usePlusForSpace  States whether pluses should be decoded to spaces.
Returns
A copy of the decoded string.
Exceptions
Curl::Exception  if the libcurl library has not been initialized.
See also
curl_unescape

References curlStringToString(), crawlservpp::Wrapper::Curl::get(), and crawlservpp::Wrapper::Curl::valid().
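For illustration, here is a stand-alone sketch of URL decoding with the usePlusForSpace option. It does not call libcurl, and the function name is hypothetical. Malformed percent sequences are copied unchanged, which may differ from what the class does.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch: "%XX" sequences become the corresponding byte and,
// if usePlusForSpace is true, '+' becomes a space. Malformed '%' sequences
// are copied unchanged.
inline std::string percentDecode(const std::string &input, bool usePlusForSpace) {
    // Returns the value of a hex digit, or -1 if c is not a hex digit.
    const auto hexValue = [](char c) -> int {
        if(c >= '0' && c <= '9') return c - '0';
        if(c >= 'A' && c <= 'F') return c - 'A' + 10;
        if(c >= 'a' && c <= 'f') return c - 'a' + 10;
        return -1;
    };

    std::string result;

    result.reserve(input.size());

    for(std::size_t i = 0; i < input.size(); ++i) {
        const char c = input[i];

        if(c == '%' && i + 2 < input.size()) {
            const int high = hexValue(input[i + 1]);
            const int low = hexValue(input[i + 2]);

            if(high >= 0 && low >= 0) {
                result.push_back(static_cast<char>(high * 16 + low));
                i += 2; // skip the two hex digits
                continue;
            }
        }

        result.push_back(c == '+' && usePlusForSpace ? ' ' : c);
    }

    return result;
}
```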

◆ unsetCookies()

void crawlservpp::Network::Curl::unsetCookies ( )
inline

Unsets custom cookies previously set.

All cookies set by setCookies() will be discarded.

This function works independently from the internal libcurl cookie engine.

Exceptions
Curl::Exception  if the cookies could not be unset.
See also
CURLOPT_COOKIE

Referenced by crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().

◆ unsetHeaders()

void crawlservpp::Network::Curl::unsetHeaders ( )
inline

Unsets custom HTTP headers previously set.

All HTTP headers set by setHeaders() will be discarded.

Exceptions
Curl::Exception  if the headers could not be unset.
See also
CURLOPT_HTTPHEADER

References crawlservpp::Wrapper::CurlList::clear().

Referenced by crawlservpp::Module::Extractor::Thread::onReset(), and crawlservpp::Module::Crawler::Thread::onReset().

◆ writer()

int crawlservpp::Network::Curl::writer ( char *  data,
std::size_t  size,
std::size_t  nmemb,
void *  thisPtr 
)
inline static protected

Static writer function to handle incoming network data.

If thisPtr is not nullptr, the function will forward the incoming data without change to the writerInClass function.

Parameters
data  Pointer to the incoming data.
size  Always 1.
nmemb  The size of the incoming data.
thisPtr  Pointer to the instance of the Curl class.
Returns
The number of bytes processed by the in-class function.

References writerInClass().

Referenced by Curl(), and resetConnection().
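The static writer and writerInClass() follow a common pattern for C callbacks. libcurl only accepts a plain function pointer, so the instance is passed through the void * argument, and the static function forwards the data to a member function. With libcurl, the function and the pointer are registered via CURLOPT_WRITEFUNCTION and CURLOPT_WRITEDATA. The class ContentCollector below is a hypothetical, libcurl-free sketch of this pattern. Unlike the documented functions, it returns std::size_t, as in libcurl's callback signature, and multiplies size by nmemb.

```cpp
#include <cstddef>
#include <string>

// Hypothetical class illustrating static-to-member callback forwarding.
class ContentCollector {
public:
    // Same shape as a libcurl write callback (ptr, size, nmemb, userdata).
    static std::size_t writer(char *data, std::size_t size, std::size_t nmemb, void *thisPtr) {
        if(thisPtr == nullptr) {
            // Returning less than size * nmemb signals an error to libcurl.
            return 0;
        }

        return static_cast<ContentCollector *>(thisPtr)->writerInClass(data, size * nmemb);
    }

    const std::string &content() const noexcept {
        return this->contentBuffer;
    }

private:
    // Appends the incoming chunk and reports all bytes as processed.
    std::size_t writerInClass(const char *data, std::size_t length) {
        this->contentBuffer.append(data, length);

        return length;
    }

    std::string contentBuffer;
};
```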

◆ writerInClass()

int crawlservpp::Network::Curl::writerInClass ( char *  data,
std::size_t  size 
)
inline protected

In-class writer function to handle incoming network data.

The function will append the data to the currently processed content.

Parameters
data  Pointer to the incoming data.
size  The size of the incoming data.
Returns
The number of bytes processed, which will be identical to size.

References crawlservpp::Data::Compression::Gzip::decompress(), escapeUrl(), crawlservpp::Wrapper::Curl::get(), crawlservpp::Wrapper::CurlList::get(), crawlservpp::Network::gzipMagicNumber, crawlservpp::Helper::Utf8::iso88591ToUtf8(), and crawlservpp::Helper::Utf8::repairUtf8().

Referenced by writer().


The documentation for this class was generated from the following file: