crawlserv++  [under development]
Application for crawling and analyzing textual content of websites.
crawlservpp::Helper::Strings Namespace Reference

Namespace for global string helper functions.

Constants

constexpr std::array utfWhitespaces
 UTF-8 whitespaces used by utfTidy().
 
constexpr auto checkHexLength {3}
 Length of a two-digit hexadecimal number including the preceding percentage sign.
 
constexpr auto randCharSet
 Characters to be chosen from for random string generation performed by generateRandom().
 

Replacing

void replaceAll (std::string &strInOut, std::string_view needle, std::string_view replacement)
 Replaces all occurrences of a substring within a string with another string.
 

Conversion

bool stringToBool (std::string inputString)
 Converts a string into a boolean value.
 

Number Format Checking

bool isDec (std::string_view inputString)
 Checks whether a string contains only decimal digits and at most one dot (.).
 
bool isHex (std::string_view inputString)
 Checks whether a string contains only hexadecimal digits.
 

Trimming

void trim (std::string &stringToTrim)
 Removes whitespaces around a string.
 

Joining

std::string join (const std::vector< std::string > &strings, char delimiter, bool ignoreEmpty)
 Concatenates all elements of a vector into a single string.
 
std::string join (const std::vector< std::string > &strings, std::string_view delimiter, bool ignoreEmpty)
 Concatenates all elements of a vector into a single string.
 
std::string join (std::queue< std::string > &strings, char delimiter, bool ignoreEmpty)
 Concatenates all elements of a queue into a single string.
 
std::string join (std::queue< std::string > &strings, std::string_view delimiter, bool ignoreEmpty)
 Concatenates all elements of a queue into a single string.
 
void join (const std::vector< std::string > &strings, char delimiter, bool ignoreEmpty, std::string &appendTo)
 Concatenates all elements of a vector and appends them to a string.
 
void join (const std::vector< std::string > &strings, std::string_view delimiter, bool ignoreEmpty, std::string &appendTo)
 Concatenates all elements of a vector and appends them to a string.
 
void join (std::queue< std::string > &strings, char delimiter, bool ignoreEmpty, std::string &appendTo)
 Concatenates all elements of a queue and appends them to a string.
 
void join (std::queue< std::string > &strings, std::string_view delimiter, bool ignoreEmpty, std::string &appendTo)
 Concatenates all elements of a queue and appends them to a string.
 

Splitting

std::vector< std::string > split (const std::string &str, char delimiter)
 Splits a string into a vector of strings using the given delimiter.
 
std::vector< std::string > split (std::string_view str, std::string_view delimiter)
 Splits a string into a vector of strings using the given delimiter.
 
std::queue< std::string > splitToQueue (std::string_view str, char delimiter, bool removeEmpty)
 Splits a string into a queue of strings using the given delimiter.
 
std::queue< std::string > splitToQueue (std::string_view str, std::string_view delimiter, bool removeEmpty)
 Splits a string into a queue of strings using the given delimiter.
 

Sorting

void sortAndRemoveDuplicates (std::vector< std::string > &vectorOfStrings, bool caseSensitive)
 Sorts the given vector of strings and removes duplicates.
 

Escape Characters

char getFirstOrEscapeChar (std::string_view from)
 Gets the first character or an escaped character from the beginning of the given string.
 

Encoding

void encodePercentage (std::string &stringToEncode)
 Encodes percentage signs that are not followed by a two-digit hexadecimal number with %25.
 

Tidying

void utfTidy (std::string &stringToTidy)
 Removes new lines and unnecessary spaces, including UTF-8 whitespaces.
 

Name Checking

bool checkDomainName (std::string_view name)
 Checks whether the given string is a valid domain name.
 
bool checkSQLName (std::string_view name)
 Checks whether the given string is a valid name for MySQL tables and fields.
 

Random String Generation

std::string generateRandom (std::size_t length)
 Generates a random alphanumeric string of the given length.
 

Detailed Description

Namespace for global string helper functions.

Function Documentation

◆ checkDomainName()

bool crawlservpp::Helper::Strings::checkDomainName ( std::string_view  name)
inline

Checks whether the given string is a valid domain name.

Note
Checks only for characters that would interfere with internal MySQL statements: /, \, and '.
Parameters
name: View of the string to be checked for a valid domain name.
Returns
True, if the string might be used as a domain name. False otherwise.

Referenced by crawlservpp::Main::Server::tick().
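
The note above says that only three characters are rejected. A minimal sketch of such a check could look as follows. The name checkDomainNameSketch is hypothetical and this is not the crawlserv++ implementation; how the real function treats an empty string is not stated here.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stand-in for illustration only (not the library's code):
// accepts any name that contains none of '/', '\' and '\''.
inline bool checkDomainNameSketch(std::string_view name) {
    return name.find_first_of("/\\'") == std::string_view::npos;
}
```

Under this sketch, a name such as "example.com" passes, while any name containing a slash, a backslash, or a single quote is rejected.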

◆ checkSQLName()

bool crawlservpp::Helper::Strings::checkSQLName ( std::string_view  name)
inline

Checks whether the given string is a valid name for MySQL tables and fields.

Parameters
name: View of the string to be checked for a valid MySQL table or field name.
Returns
True if the string might be used as a MySQL table or field name. False otherwise.

Referenced by crawlservpp::Main::Database::clearTable(), crawlservpp::Main::Database::isTargetTable(), crawlservpp::Module::Config::option(), crawlservpp::Main::Database::readColumnAsStrings(), crawlservpp::Main::Database::readTableAsStrings(), and crawlservpp::Main::Server::tick().

◆ encodePercentage()

void crawlservpp::Helper::Strings::encodePercentage ( std::string &  stringToEncode)
inline

Encodes percentage signs that are not followed by a two-digit hexadecimal number with %25.

Parameters
stringToEncode: Reference to the string in which the percentage signs will be encoded in-situ.

References checkHexLength, and isHex().

Referenced by crawlservpp::Parsing::URI::escapeUri().
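
To illustrate the rule described above, here is a hedged sketch that inserts "25" after every percentage sign that is not followed by two hexadecimal digits. The name encodePercentageSketch is hypothetical, and the real function (which uses checkHexLength and isHex()) may differ in detail.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical sketch, not the crawlserv++ implementation.
inline void encodePercentageSketch(std::string& s) {
    std::size_t pos{0};

    while ((pos = s.find('%', pos)) != std::string::npos) {
        // A valid escape needs two more characters, both hexadecimal digits.
        const bool followedByHex{
            pos + 2 < s.size()
            && std::isxdigit(static_cast<unsigned char>(s[pos + 1])) != 0
            && std::isxdigit(static_cast<unsigned char>(s[pos + 2])) != 0
        };

        if (!followedByHex) {
            s.insert(pos + 1, "25"); // '%' becomes "%25"
        }

        ++pos; // continue after the current percentage sign
    }
}
```

In this sketch, "a%20b" stays unchanged, while "100%" becomes "100%25".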

◆ generateRandom()

std::string crawlservpp::Helper::Strings::generateRandom ( std::size_t  length)
inline

Generates a random alphanumeric string of the given length.

Parameters
length: Length of the string to be generated.
Returns
The generated alphanumeric string of the given length.

References randCharSet.

Referenced by crawlservpp::Main::WebServer::getIP(), and crawlservpp::Main::Server::tick().
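
One way to generate such a string is to draw uniformly from a fixed character set. The sketch below assumes std::mt19937 seeded from std::random_device and a local character set. Both are assumptions, since the random source used by crawlserv++ is not documented here, and the name generateRandomSketch is hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <random>
#include <string>
#include <string_view>

// Hypothetical sketch, not the crawlserv++ implementation.
inline std::string generateRandomSketch(std::size_t length) {
    // Local character set used only by this sketch.
    static constexpr std::string_view charSet{
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    };

    // Assumption: a Mersenne Twister seeded once per thread.
    thread_local std::mt19937 engine{std::random_device{}()};
    std::uniform_int_distribution<std::size_t> pick{0, charSet.size() - 1};

    std::string result;

    result.reserve(length);

    for (std::size_t n{0}; n < length; ++n) {
        result.push_back(charSet[pick(engine)]);
    }

    return result;
}
```

A generator like this is not suitable for security-sensitive tokens unless the engine is replaced by a cryptographically secure source.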

◆ getFirstOrEscapeChar()

char crawlservpp::Helper::Strings::getFirstOrEscapeChar ( std::string_view  from)
inline

Gets the first character or an escaped character from the beginning of the given string.

Supported escape sequences: \n, \t, and \\

Note
Only ASCII characters are supported. The returned character will be invalid if the given string starts with a multi-byte UTF-8 character.
Parameters
from: A view of the string to extract the character from.
Returns
The escaped character, if the first two characters of the given string equal one of the supported escape sequences. The first character of the given string otherwise.

Referenced by crawlservpp::Module::Config::option().
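
The behaviour described above can be sketched as follows. The name getFirstOrEscapeCharSketch is hypothetical, and returning '\0' for an empty string is an assumption of this sketch, not documented behaviour.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical sketch, not the crawlserv++ implementation.
inline char getFirstOrEscapeCharSketch(std::string_view from) {
    if (from.empty()) {
        return '\0'; // assumption: the source does not say what happens here
    }

    if (from.size() > 1 && from[0] == '\\') {
        switch (from[1]) {
        case 'n':
            return '\n';
        case 't':
            return '\t';
        case '\\':
            return '\\';
        default:
            break; // unsupported sequence: fall through to the first character
        }
    }

    return from[0];
}
```

This is useful, for example, when a configuration option stores a delimiter as the two-character text \t and a single tab character is needed.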

◆ isDec()

bool crawlservpp::Helper::Strings::isDec ( std::string_view  inputString)
inline

Checks whether a string contains only decimal digits and at most one dot (.).

Parameters
inputString: A view into the string to check for decimal digits.
Returns
True, if the string only contains decimal digits (0-9), at most one dot (.) and no whitespaces. False otherwise.

Referenced by crawlservpp::Data::ImportExport::OpenDocument::cell().

◆ isHex()

bool crawlservpp::Helper::Strings::isHex ( std::string_view  inputString)
inline

Checks whether a string contains only hexadecimal digits.

Case-insensitive.

Parameters
inputString: A view into the string to check for hexadecimal digits.
Returns
True, if the string only contains hexadecimal digits (0-9, A-F) and no whitespaces. False otherwise.

Referenced by encodePercentage().
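
The two number format checks, isDec() and isHex(), can be sketched as simple character scans. The names isDecSketch and isHexSketch are hypothetical. Both sketches accept an empty string, which is an assumption because the documentation does not cover that case.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string_view>

// Hypothetical sketch: only decimal digits and at most one dot.
inline bool isDecSketch(std::string_view inputString) {
    bool dotSeen{false};

    for (const char c : inputString) {
        if (c == '.') {
            if (dotSeen) {
                return false; // second dot
            }

            dotSeen = true;
        }
        else if (std::isdigit(static_cast<unsigned char>(c)) == 0) {
            return false; // neither digit nor dot (includes whitespace)
        }
    }

    return true;
}

// Hypothetical sketch: only hexadecimal digits, case-insensitive.
inline bool isHexSketch(std::string_view inputString) {
    return std::all_of(
        inputString.begin(),
        inputString.end(),
        [](char c) {
            return std::isxdigit(static_cast<unsigned char>(c)) != 0;
        }
    );
}
```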

◆ join() [1/8]

std::string crawlservpp::Helper::Strings::join ( const std::vector< std::string > &  strings,
char  delimiter,
bool  ignoreEmpty 
)
inline

Concatenates all elements of a vector into a single string.

Parameters
strings: Constant reference to a vector containing the strings to be concatenated.
delimiter: A character to be inserted between the concatenated strings.
ignoreEmpty: Ignore empty strings when concatenating the given elements.
Returns
The string containing the concatenated elements, separated by the given delimiter, or an empty string if no elements have been concatenated.

Referenced by crawlservpp::Data::ImportExport::Text::exportList(), crawlservpp::Module::Parser::Thread::onReset(), and crawlservpp::Module::Extractor::Thread::onReset().
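
As a rough illustration of how ignoreEmpty affects the result, the following sketch joins a vector with a character delimiter. The name joinSketch is hypothetical and the code is not the crawlserv++ implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch, not the crawlserv++ implementation.
inline std::string joinSketch(
        const std::vector<std::string>& strings,
        char delimiter,
        bool ignoreEmpty
) {
    std::string result;
    bool first{true};

    for (const auto& element : strings) {
        if (ignoreEmpty && element.empty()) {
            continue; // skipped elements do not produce a delimiter
        }

        if (!first) {
            result.push_back(delimiter);
        }

        result += element;
        first = false;
    }

    return result;
}
```

With the elements "a", "" and "b" and the delimiter ',', this sketch yields "a,b" when ignoreEmpty is true and "a,,b" when it is false.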

◆ join() [2/8]

std::string crawlservpp::Helper::Strings::join ( const std::vector< std::string > &  strings,
std::string_view  delimiter,
bool  ignoreEmpty 
)
inline

Concatenates all elements of a vector into a single string.

Parameters
strings: Constant reference to a vector containing the strings to be concatenated.
delimiter: View of a string to be inserted between the concatenated strings.
ignoreEmpty: Ignore empty strings when concatenating the given elements.
Returns
The string containing the concatenated elements, separated by the given delimiter, or an empty string if no elements have been concatenated.

◆ join() [3/8]

std::string crawlservpp::Helper::Strings::join ( std::queue< std::string > &  strings,
char  delimiter,
bool  ignoreEmpty 
)
inline

Concatenates all elements of a queue into a single string.

Note
The queue will be completely emptied in the process, even if elements will be ignored.
Parameters
strings: Reference to a queue containing the strings to be concatenated.
delimiter: A character to be inserted between the concatenated strings.
ignoreEmpty: Ignore empty strings when concatenating the given elements.
Returns
The string containing the concatenated elements, separated by the given delimiter, or an empty string if no elements have been concatenated.
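
The note above matters in practice: the queue is consumed even when elements are skipped. A minimal sketch of that behaviour, using the hypothetical name joinQueueSketch (not the crawlserv++ implementation), could be:

```cpp
#include <cassert>
#include <queue>
#include <string>

// Hypothetical sketch, not the crawlserv++ implementation.
inline std::string joinQueueSketch(
        std::queue<std::string>& strings,
        char delimiter,
        bool ignoreEmpty
) {
    std::string result;
    bool first{true};

    while (!strings.empty()) {
        if (!(ignoreEmpty && strings.front().empty())) {
            if (!first) {
                result.push_back(delimiter);
            }

            result += strings.front();
            first = false;
        }

        strings.pop(); // every element is removed, ignored or not
    }

    return result;
}
```

Callers that still need the elements afterwards would have to copy the queue before joining.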

◆ join() [4/8]

std::string crawlservpp::Helper::Strings::join ( std::queue< std::string > &  strings,
std::string_view  delimiter,
bool  ignoreEmpty 
)
inline

Concatenates all elements of a queue into a single string.

Note
The queue will be completely emptied in the process, even if elements will be ignored.
Parameters
strings: Reference to a queue containing the strings to be concatenated.
delimiter: View of a string to be inserted between the concatenated strings.
ignoreEmpty: Ignore empty strings when concatenating the given elements.
Returns
The string containing the concatenated elements, separated by the given delimiter, or an empty string if no elements have been concatenated.

◆ join() [5/8]

void crawlservpp::Helper::Strings::join ( const std::vector< std::string > &  strings,
char  delimiter,
bool  ignoreEmpty,
std::string &  appendTo 
)
inline

Concatenates all elements of a vector and appends them to a string.

Parameters
strings: Constant reference to a vector containing the strings to be concatenated.
delimiter: A character to be inserted between the concatenated strings.
ignoreEmpty: Ignore empty strings when concatenating the given elements.
appendTo: The string to which the concatenated elements, separated by the given delimiter, will be appended. It will remain unchanged if no elements have been concatenated.

◆ join() [6/8]

void crawlservpp::Helper::Strings::join ( const std::vector< std::string > &  strings,
std::string_view  delimiter,
bool  ignoreEmpty,
std::string &  appendTo 
)
inline

Concatenates all elements of a vector and appends them to a string.

Parameters
strings: Constant reference to a vector containing the strings to be concatenated.
delimiter: A view of the string to be inserted between the concatenated strings.
ignoreEmpty: Ignore empty strings when concatenating the given elements.
appendTo: The string to which the concatenated elements, separated by the given delimiter, will be appended. It will remain unchanged if no elements have been concatenated.

◆ join() [7/8]

void crawlservpp::Helper::Strings::join ( std::queue< std::string > &  strings,
char  delimiter,
bool  ignoreEmpty,
std::string &  appendTo 
)
inline

Concatenates all elements of a queue and appends them to a string.

Note
The queue will be completely emptied in the process, even if elements will be ignored.
Parameters
strings: Reference to a queue containing the strings to be concatenated.
delimiter: A character to be inserted between the concatenated strings.
ignoreEmpty: Ignore empty strings when concatenating the given elements.
appendTo: The string to which the concatenated elements, separated by the given delimiter, will be appended. It will remain unchanged if no elements have been concatenated.

◆ join() [8/8]

void crawlservpp::Helper::Strings::join ( std::queue< std::string > &  strings,
std::string_view  delimiter,
bool  ignoreEmpty,
std::string &  appendTo 
)
inline

Concatenates all elements of a queue and appends them to a string.

Note
The queue will be completely emptied in the process, even if elements will be ignored.
Parameters
strings: Reference to a queue containing the strings to be concatenated.
delimiter: A view of the string to be inserted between the concatenated strings.
ignoreEmpty: Ignore empty strings when concatenating the given elements.
appendTo: The string to which the concatenated elements, separated by the given delimiter, will be appended. It will remain unchanged if no elements have been concatenated.

◆ replaceAll()

void crawlservpp::Helper::Strings::replaceAll ( std::string &  strInOut,
std::string_view  needle,
std::string_view  replacement 
)
inline

Replaces all occurrences of a substring within a string with another string.
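
A common way to replace every occurrence of a needle in place is a find-and-replace loop that resumes after each inserted replacement. The sketch below uses the hypothetical name replaceAllSketch and is not the crawlserv++ implementation. Doing nothing for an empty needle is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical sketch, not the crawlserv++ implementation.
inline void replaceAllSketch(
        std::string& strInOut,
        std::string_view needle,
        std::string_view replacement
) {
    if (needle.empty()) {
        return; // assumption: an empty needle would otherwise never advance
    }

    std::size_t pos{0};

    while ((pos = strInOut.find(needle, pos)) != std::string::npos) {
        strInOut.replace(pos, needle.size(), replacement);

        // Skip the replacement so that it is never searched again.
        pos += replacement.size();
    }
}
```

Skipping the inserted text keeps the loop finite even when the replacement contains the needle, for example when replacing "a" with "aa".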

◆ sortAndRemoveDuplicates()

void crawlservpp::Helper::Strings::sortAndRemoveDuplicates ( std::vector< std::string > &  vectorOfStrings,
bool  caseSensitive 
)
inline

Sorts the given vector of strings and removes duplicates.

Note
Only ASCII characters are supported when sorting the strings. Non-ASCII characters may result in a wrong sorting order.
Parameters
vectorOfStrings: Reference to the vector of strings, which will be sorted and from which duplicates will be removed in-situ.
caseSensitive: True, if the removal should be performed case-sensitively. False otherwise.

Referenced by crawlservpp::Module::Crawler::Thread::onReset().
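
The usual idiom for this task is to sort, then erase the range returned by std::unique. The following sketch uses the hypothetical names sortAndRemoveDuplicatesSketch and toLowerAscii. For the case-insensitive mode it assumes a stable sort, so that the first of several case variants is the one that survives. The source does not state which variant the real function keeps.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: ASCII-only lowercase copy of a string.
inline std::string toLowerAscii(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });

    return s;
}

// Hypothetical sketch, not the crawlserv++ implementation.
inline void sortAndRemoveDuplicatesSketch(
        std::vector<std::string>& v,
        bool caseSensitive
) {
    if (caseSensitive) {
        std::sort(v.begin(), v.end());
        v.erase(std::unique(v.begin(), v.end()), v.end());

        return;
    }

    // Assumption: stable sort keeps the original order of case variants.
    std::stable_sort(v.begin(), v.end(), [](const auto& a, const auto& b) {
        return toLowerAscii(a) < toLowerAscii(b);
    });

    v.erase(
        std::unique(v.begin(), v.end(), [](const auto& a, const auto& b) {
            return toLowerAscii(a) == toLowerAscii(b);
        }),
        v.end()
    );
}
```

The lowercase copies made in the comparators keep the sketch short but are inefficient for large vectors.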

◆ split() [1/2]

std::vector< std::string > crawlservpp::Helper::Strings::split ( const std::string &  str,
char  delimiter 
)
inline

Splits a string into a vector of strings using the given delimiter.

Parameters
str: A const reference to the string to be split up.
delimiter: The character around which the string will be split.
Returns
A new vector containing the resulting elements.

References split().

Referenced by crawlservpp::Data::Lemmatizer::clear(), crawlservpp::Main::Database::connect(), and crawlservpp::Helper::Portability::enumLocales().

◆ split() [2/2]

std::vector< std::string > crawlservpp::Helper::Strings::split ( std::string_view  str,
std::string_view  delimiter 
)
inline

Splits a string into a vector of strings using the given delimiter.

Parameters
str: A view of the string to be split up.
delimiter: A view of the string around which the string will be split.
Returns
A new vector containing the resulting elements.

Referenced by split(), and splitToQueue().
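
Splitting around a string delimiter can be sketched with repeated calls to find(). The name splitSketch is hypothetical and the code is not the crawlserv++ implementation. Keeping empty elements and returning the whole input for an empty delimiter are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical sketch, not the crawlserv++ implementation.
inline std::vector<std::string> splitSketch(
        std::string_view str,
        std::string_view delimiter
) {
    std::vector<std::string> result;

    if (delimiter.empty()) {
        result.emplace_back(str); // assumption: nothing to split around

        return result;
    }

    std::size_t begin{0};

    while (true) {
        const auto end{str.find(delimiter, begin)};

        if (end == std::string_view::npos) {
            result.emplace_back(str.substr(begin)); // last element

            break;
        }

        result.emplace_back(str.substr(begin, end - begin));

        begin = end + delimiter.size();
    }

    return result;
}
```

In this sketch, splitting "a, b, c" around ", " yields the three elements "a", "b" and "c".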

◆ splitToQueue() [1/2]

std::queue< std::string > crawlservpp::Helper::Strings::splitToQueue ( std::string_view  str,
char  delimiter,
bool  removeEmpty 
)
inline

Splits a string into a queue of strings using the given delimiter.

Parameters
str: A view of the string to be split up.
delimiter: The character around which the string will be split.
removeEmpty: Set whether to ignore empty strings and not add them to the resulting queue.
Returns
A new queue containing the resulting elements.

References split().

Referenced by crawlservpp::Wrapper::TidyDoc::cleanAndRepair(), crawlservpp::Wrapper::TidyDoc::getOutput(), crawlservpp::Data::ImportExport::Text::importList(), and crawlservpp::Wrapper::TidyDoc::parse().
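
A typical use of the queue variant is processing text line by line while skipping blank lines. The sketch below shows one way the removeEmpty flag could work. The name splitToQueueSketch is hypothetical and the code is not the crawlserv++ implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <string_view>

// Hypothetical sketch, not the crawlserv++ implementation.
inline std::queue<std::string> splitToQueueSketch(
        std::string_view str,
        char delimiter,
        bool removeEmpty
) {
    std::queue<std::string> result;
    std::size_t begin{0};

    while (begin <= str.size()) {
        auto end{str.find(delimiter, begin)};

        if (end == std::string_view::npos) {
            end = str.size(); // last element runs to the end of the input
        }

        const auto part{str.substr(begin, end - begin)};

        if (!(removeEmpty && part.empty())) {
            result.emplace(part);
        }

        begin = end + 1;
    }

    return result;
}
```

With the input "a\n\nb" and the delimiter '\n', this sketch produces two elements when removeEmpty is true and three when it is false.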

◆ splitToQueue() [2/2]

std::queue< std::string > crawlservpp::Helper::Strings::splitToQueue ( std::string_view  str,
std::string_view  delimiter,
bool  removeEmpty 
)
inline

Splits a string into a queue of strings using the given delimiter.

Parameters
str: A view of the string to be split up.
delimiter: A view of the string around which the string will be split.
removeEmpty: Set whether to ignore empty strings and not add them to the resulting queue.
Returns
A new queue containing the resulting elements.

◆ stringToBool()

bool crawlservpp::Helper::Strings::stringToBool ( std::string  inputString)
inline

Converts a string into a boolean value.

Only case-insensitive variations of "true" will be converted into true.

Note
In order for the conversion to be case-insensitive, a copy of the given string will be made.
Parameters
inputString: The string to be converted into a boolean value.
Returns
True, if the given string represents true. False otherwise.
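
The by-value parameter mentioned in the note allows the copy to be lowercased before comparison. A minimal sketch of that idea follows. The name stringToBoolSketch is hypothetical, and comparing against the exact text "true" (without trimming whitespace) is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical sketch, not the crawlserv++ implementation.
inline bool stringToBoolSketch(std::string inputString) {
    // The parameter is a copy, so it can be lowercased in place.
    std::transform(
        inputString.begin(),
        inputString.end(),
        inputString.begin(),
        [](unsigned char c) {
            return static_cast<char>(std::tolower(c));
        }
    );

    return inputString == "true";
}
```

Under this sketch, values such as "1" or "yes" are converted into false.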

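◆ trim()

void crawlservpp::Helper::Strings::trim ( std::string &  stringToTrim)

Removes whitespaces around a string.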

◆ utfTidy()

void crawlservpp::Helper::Strings::utfTidy ( std::string &  stringToTidy)
inline

Removes new lines and unnecessary spaces, including UTF-8 whitespaces.

Parameters
stringToTidy: Reference to the string from which new lines and unnecessary spaces will be removed in-situ.

References replaceAll(), trim(), and utfWhitespaces.

Referenced by crawlservpp::Module::Parser::Thread::onReset(), and crawlservpp::Module::Extractor::Thread::onReset().
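
The general idea of tidying can be sketched in four steps: map multi-byte UTF-8 whitespaces to plain spaces, map line breaks to spaces, collapse runs of spaces, and trim. The sketch below uses the hypothetical name utfTidySketch and covers only three of the whitespaces listed in utfWhitespaces, written as explicit UTF-8 byte sequences. It is not the crawlserv++ implementation, and its exact rules for what counts as an unnecessary space are an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical sketch, not the crawlserv++ implementation.
inline void utfTidySketch(std::string& s) {
    // Subset for illustration: U+00A0, U+2003 and U+3000 as UTF-8 bytes.
    constexpr std::string_view whitespaces[]{
        "\xC2\xA0", "\xE2\x80\x83", "\xE3\x80\x80"
    };

    // 1. Replace each multi-byte whitespace with a plain space.
    for (const auto whitespace : whitespaces) {
        std::size_t pos{0};

        while ((pos = s.find(whitespace, pos)) != std::string::npos) {
            s.replace(pos, whitespace.size(), " ");

            ++pos;
        }
    }

    // 2. Replace line breaks with spaces.
    for (char& c : s) {
        if (c == '\n' || c == '\r') {
            c = ' ';
        }
    }

    // 3. Collapse runs of spaces into a single space.
    s.erase(
        std::unique(s.begin(), s.end(), [](char a, char b) {
            return a == ' ' && b == ' ';
        }),
        s.end()
    );

    // 4. Trim leading and trailing spaces.
    const auto first{s.find_first_not_of(' ')};

    if (first == std::string::npos) {
        s.clear();

        return;
    }

    s = s.substr(first, s.find_last_not_of(' ') - first + 1);
}
```

Writing the whitespaces as byte escapes avoids depending on the source file encoding of the example.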

Variable Documentation

◆ checkHexLength

constexpr auto crawlservpp::Helper::Strings::checkHexLength {3}
inline

Length of a two-digit hexadecimal number including the preceding percentage sign.

Referenced by encodePercentage().

◆ randCharSet

constexpr auto crawlservpp::Helper::Strings::randCharSet
inline
Initial value:
{
"01234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"sv
}

Characters to be chosen from for random string generation performed by generateRandom().

Referenced by generateRandom().

◆ utfWhitespaces

constexpr std::array crawlservpp::Helper::Strings::utfWhitespaces
inline
Initial value:
{
"\u0085"sv,
"\u00a0"sv,
"\u1680"sv,
"\u2000"sv,
"\u2001"sv,
"\u2002"sv,
"\u2003"sv,
"\u2004"sv,
"\u2005"sv,
"\u2006"sv,
"\u2007"sv,
"\u2008"sv,
"\u2009"sv,
"\u200a"sv,
"\u2028"sv,
"\u2029"sv,
"\u202f"sv,
"\u205f"sv,
"\u2060"sv,
"\u3000"sv,
}

UTF-8 whitespaces used by utfTidy().

Referenced by utfTidy().