crawlserv++  [under development]
Application for crawling and analyzing textual content of websites.
crawlservpp::Parsing Namespace Reference

Namespace for classes parsing HTML, URIs, and XML.

Classes

class  HTML
 Parses and cleans HTML markup.

class  URI
 Parser for RFC 3986 URIs that can also analyze their relationships with each other.

class  XML
 Parses HTML markup into clean XML.


Constants

constexpr std::string_view tidyEncoding {"utf8"}
 The character encoding used by the tidy-html5 API.

constexpr auto maxEscapedCharLength {6}
 Maximum length of a URL-escaped character.

constexpr auto xmlBegin {"<?xml "sv}
 The beginning of XML markup.

constexpr std::array xmlTags {"<?i>"sv}
 Array containing additional XML markup tags to be removed.

constexpr auto cDataBegin {"<![CDATA["sv}
 The beginning of a CDATA section.

constexpr auto cDataEnd {"]]>"sv}
 The end of a CDATA section.

constexpr auto conditionalBegin {"<![if "sv}
 The beginning of a conditional comment.

constexpr auto conditionalEnd {"<![endif]>"sv}
 The end of a conditional comment.

constexpr auto conditionalInsert {"--"sv}
 Characters to be inserted (or replaced) to make conditional comments valid.

constexpr auto conditionalInsertOffsetBegin {2}
 Offset at which to insert characters at the beginning of a conditional comment to make it valid.

constexpr auto conditionalInsertOffsetEnd {9}
 Offset at which to insert characters at the end of a conditional comment to make it valid.

constexpr auto conditionalInsertOffsetStrayEnd {2}
 Offset at which to insert characters into a stray end tag left over from a conditional comment.

constexpr auto commentCharsToReplace {"--"sv}
 Characters to be replaced inside comments.

constexpr auto commentCharsReplaceBy {"=="sv}
 Characters used as replacement inside comments.

constexpr auto invalidBegin {"<? "sv}
 The beginning of an invalid comment.

constexpr auto invalidEnd {" ?>"sv}
 The end of an invalid comment.

constexpr auto invalidInsertBegin {"!--"sv}
 Characters to be inserted at the beginning to make invalid comments valid.

constexpr auto invalidInsertEnd {"--"sv}
 Characters to be inserted at the end to make invalid comments valid.

constexpr auto invalidInsertOffsetBegin {1}
 Offset at which to insert characters at the beginning of an invalid comment to make it valid.

constexpr auto invalidInsertOffsetEnd {2}
 Offset at which to insert characters at the end of an invalid comment to make it valid.

constexpr auto numDebugCharacters {50}
 The maximum number of characters to be shown in error messages.

constexpr auto xmlInstructionBegin {"<?xml:"sv}
 The beginning of an XML processing instruction.

constexpr auto xmlInstructionEnd {">"sv}
 The end of an XML processing instruction.
 

Detailed Description

Namespace for classes parsing HTML, URIs, and XML.

Variable Documentation

◆ cDataBegin

constexpr auto crawlservpp::Parsing::cDataBegin {"<![CDATA["sv}
inline

The beginning of a CDATA section.

Referenced by crawlservpp::Parsing::XML::clear().

◆ cDataEnd

constexpr auto crawlservpp::Parsing::cDataEnd {"]]>"sv}
inline

The end of a CDATA section.

Referenced by crawlservpp::Parsing::XML::clear().
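To illustrate how the two delimiters pair up, here is a minimal sketch that extracts the content of the first CDATA section in a string. The helper `extractCData` is hypothetical and not part of crawlserv++; the constants are redeclared locally with the documented values because the project headers are not included.

```cpp
#include <optional>
#include <string>
#include <string_view>

using namespace std::string_view_literals;

// Local copies of the documented constants.
inline constexpr auto cDataBegin{"<![CDATA["sv};
inline constexpr auto cDataEnd{"]]>"sv};

// Hypothetical helper: returns the content of the first CDATA section
// in `markup`, or std::nullopt if no complete section is found.
std::optional<std::string> extractCData(std::string_view markup) {
    const auto begin{markup.find(cDataBegin)};

    if(begin == std::string_view::npos) {
        return std::nullopt; // no CDATA section at all
    }

    const auto contentBegin{begin + cDataBegin.size()};
    const auto end{markup.find(cDataEnd, contentBegin)};

    if(end == std::string_view::npos) {
        return std::nullopt; // unterminated CDATA section
    }

    return std::string{markup.substr(contentBegin, end - contentBegin)};
}
```

Since the content of a CDATA section is not parsed as markup, characters such as `<` inside it are returned unchanged.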

◆ commentCharsReplaceBy

constexpr auto crawlservpp::Parsing::commentCharsReplaceBy {"=="sv}
inline

Characters used as replacement inside comments.

Referenced by crawlservpp::Parsing::XML::clear().

◆ commentCharsToReplace

constexpr auto crawlservpp::Parsing::commentCharsToReplace {"--"sv}
inline

Characters to be replaced inside comments.

Referenced by crawlservpp::Parsing::XML::clear().
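XML 1.0 does not allow the string `--` inside a comment, which is presumably why these two constants exist. The following minimal sketch applies the replacement to the body of a single comment (the text between `<!--` and `-->`). The helper `sanitizeCommentBody` is hypothetical, and the constants are redeclared locally with the documented values.

```cpp
#include <string>
#include <string_view>

using namespace std::string_view_literals;

// Local copies of the documented constants.
inline constexpr auto commentCharsToReplace{"--"sv};
inline constexpr auto commentCharsReplaceBy{"=="sv};

// Hypothetical helper: replaces every occurrence of "--" in the body of
// a comment (without its "<!--" and "-->" delimiters) by "==".
std::string sanitizeCommentBody(std::string body) {
    auto pos{body.find(commentCharsToReplace)};

    while(pos != std::string::npos) {
        body.replace(pos, commentCharsToReplace.size(), commentCharsReplaceBy);

        // continue searching behind the inserted replacement
        pos = body.find(commentCharsToReplace, pos + commentCharsReplaceBy.size());
    }

    return body;
}
```

Both strings have the same length, so the replacement never shifts text outside the comment body.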

◆ conditionalBegin

constexpr auto crawlservpp::Parsing::conditionalBegin {"<![if "sv}
inline

The beginning of a conditional comment.

Referenced by crawlservpp::Parsing::XML::clear().

◆ conditionalEnd

constexpr auto crawlservpp::Parsing::conditionalEnd {"<![endif]>"sv}
inline

The end of a conditional comment.

Referenced by crawlservpp::Parsing::XML::clear().

◆ conditionalInsert

constexpr auto crawlservpp::Parsing::conditionalInsert {"--"sv}
inline

Characters to be inserted (or replaced) to make conditional comments valid.

Referenced by crawlservpp::Parsing::XML::clear().

◆ conditionalInsertOffsetBegin

constexpr auto crawlservpp::Parsing::conditionalInsertOffsetBegin {2}
inline

Offset at which to insert characters at the beginning of a conditional comment to make it valid.

Referenced by crawlservpp::Parsing::XML::clear().

◆ conditionalInsertOffsetEnd

constexpr auto crawlservpp::Parsing::conditionalInsertOffsetEnd {9}
inline

Offset at which to insert characters at the end of a conditional comment to make it valid.

Referenced by crawlservpp::Parsing::XML::clear().

◆ conditionalInsertOffsetStrayEnd

constexpr auto crawlservpp::Parsing::conditionalInsertOffsetStrayEnd {2}
inline

Offset at which to insert characters into a stray end tag left over from a conditional comment.

Referenced by crawlservpp::Parsing::XML::clear().
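The documented values fit together as follows: inserting `--` at offset 2 of `<![if ` yields `<!--[if `, and inserting it at offset 9 of `<![endif]>` (right before its closing `>`) yields `<![endif]-->`, so the whole construct becomes an ordinary comment. The minimal sketch below assumes that this is how the offsets are applied; the helper `fixConditionalComments` is hypothetical, the constants are redeclared locally, and the stray-end-tag case (`conditionalInsertOffsetStrayEnd`) is not handled.

```cpp
#include <string>
#include <string_view>

using namespace std::string_view_literals;

// Local copies of the documented constants.
inline constexpr auto conditionalBegin{"<![if "sv};
inline constexpr auto conditionalEnd{"<![endif]>"sv};
inline constexpr auto conditionalInsert{"--"sv};
inline constexpr auto conditionalInsertOffsetBegin{2};
inline constexpr auto conditionalInsertOffsetEnd{9};

// Hypothetical helper: turns each "<![if ...]> ... <![endif]>" block
// into a regular comment "<!--[if ...]> ... <![endif]-->".
void fixConditionalComments(std::string& content) {
    auto pos{content.find(conditionalBegin)};

    while(pos != std::string::npos) {
        const auto end{content.find(conditionalEnd, pos + conditionalBegin.size())};

        if(end == std::string::npos) {
            break; // no matching end tag: leave the rest untouched
        }

        // patch the end tag first, so that `pos` stays valid
        content.insert(end + conditionalInsertOffsetEnd, conditionalInsert);
        content.insert(pos + conditionalInsertOffsetBegin, conditionalInsert);

        // the end tag moved by one insert and grew by another,
        // so continue searching right behind it
        pos = content.find(
            conditionalBegin,
            end + conditionalEnd.size() + 2 * conditionalInsert.size()
        );
    }
}
```

Patching the end tag before the beginning avoids having to recompute the end position after the first insertion.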

◆ invalidBegin

constexpr auto crawlservpp::Parsing::invalidBegin {"<? "sv}
inline

The beginning of an invalid comment.

Referenced by crawlservpp::Parsing::XML::clear().

◆ invalidEnd

constexpr auto crawlservpp::Parsing::invalidEnd {" ?>"sv}
inline

The end of an invalid comment.

Referenced by crawlservpp::Parsing::XML::clear().

◆ invalidInsertBegin

constexpr auto crawlservpp::Parsing::invalidInsertBegin {"!--"sv}
inline

Characters to be inserted at the beginning to make invalid comments valid.

Referenced by crawlservpp::Parsing::XML::clear().

◆ invalidInsertEnd

constexpr auto crawlservpp::Parsing::invalidInsertEnd {"--"sv}
inline

Characters to be inserted at the end to make invalid comments valid.

Referenced by crawlservpp::Parsing::XML::clear().

◆ invalidInsertOffsetBegin

constexpr auto crawlservpp::Parsing::invalidInsertOffsetBegin {1}
inline

Offset at which to insert characters at the beginning of an invalid comment to make it valid.

Referenced by crawlservpp::Parsing::XML::clear().

◆ invalidInsertOffsetEnd

constexpr auto crawlservpp::Parsing::invalidInsertOffsetEnd {2}
inline

Offset at which to insert characters at the end of an invalid comment to make it valid.
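Applied to the documented delimiters, inserting `!--` at offset 1 of `<? ` gives `<!--? `, and inserting `--` at offset 2 of ` ?>` gives ` ?-->`, so `<? ... ?>` turns into a regular comment. The following is a minimal sketch under that reading; the helper `fixInvalidComments` is hypothetical and the constants are redeclared locally with the documented values.

```cpp
#include <string>
#include <string_view>

using namespace std::string_view_literals;

// Local copies of the documented constants.
inline constexpr auto invalidBegin{"<? "sv};
inline constexpr auto invalidEnd{" ?>"sv};
inline constexpr auto invalidInsertBegin{"!--"sv};
inline constexpr auto invalidInsertEnd{"--"sv};
inline constexpr auto invalidInsertOffsetBegin{1};
inline constexpr auto invalidInsertOffsetEnd{2};

// Hypothetical helper: converts each "<? ... ?>" into "<!--? ... ?-->".
void fixInvalidComments(std::string& content) {
    auto pos{content.find(invalidBegin)};

    while(pos != std::string::npos) {
        const auto end{content.find(invalidEnd, pos + invalidBegin.size())};

        if(end == std::string::npos) {
            break; // unterminated: leave the rest untouched
        }

        // patch the end first, so that `pos` stays valid
        content.insert(end + invalidInsertOffsetEnd, invalidInsertEnd);
        content.insert(pos + invalidInsertOffsetBegin, invalidInsertBegin);

        // continue searching behind the patched end delimiter
        pos = content.find(
            invalidBegin,
            end + invalidInsertBegin.size() + invalidEnd.size() + invalidInsertEnd.size()
        );
    }
}
```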

◆ maxEscapedCharLength

constexpr auto crawlservpp::Parsing::maxEscapedCharLength {6}
inline

Maximum length of a URL-escaped character.

Referenced by crawlservpp::Parsing::URI::escape().
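The value 6 matches the worst case of percent-encoding when line breaks are normalized: a single line-break character becomes the six characters `%0D%0A`. The uriparser library asks for an output buffer of six times the input length for `uriEscapeA()` in that mode, which is presumably where the constant comes from, though this is an inference. The sketch below does not call uriparser; it is a hypothetical hand-written encoder (`escapeSketch`) that keeps the RFC 3986 unreserved characters and reserves its output buffer using `maxEscapedCharLength`.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Local copy of the documented constant.
inline constexpr auto maxEscapedCharLength{6};

// Hypothetical percent-encoder: keeps RFC 3986 unreserved characters,
// encodes every other byte as %XX and, if normalizeBreaks is set,
// turns "\r\n", "\r" and "\n" into "%0D%0A" (six characters, the worst case).
std::string escapeSketch(std::string_view in, bool normalizeBreaks) {
    constexpr std::string_view hex{"0123456789ABCDEF"};

    std::string out;

    // worst case: every input character expands to maxEscapedCharLength characters
    out.reserve(in.size() * maxEscapedCharLength);

    for(std::size_t i{}; i < in.size(); ++i) {
        const auto c{static_cast<unsigned char>(in[i])};

        if(normalizeBreaks && (c == '\r' || c == '\n')) {
            out += "%0D%0A";

            // treat "\r\n" as a single line break
            if(c == '\r' && i + 1 < in.size() && in[i + 1] == '\n') {
                ++i;
            }
        }
        else if(
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
            || c == '-' || c == '.' || c == '_' || c == '~'
        ) {
            out += static_cast<char>(c); // unreserved: copy as-is
        }
        else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }

    return out;
}
```

Without line-break normalization, each byte expands to at most three characters, so the reserved capacity is generous in that mode.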

◆ numDebugCharacters

constexpr auto crawlservpp::Parsing::numDebugCharacters {50}
inline

The maximum number of characters to be shown in error messages.

Referenced by crawlservpp::Parsing::XML::clear().
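A limit like this keeps error messages readable when the offending markup is long. The minimal sketch below returns at most `numDebugCharacters` characters of the input, starting at the error position; the helper `debugExcerpt` and its `[...]` marker are hypothetical, and the constant is redeclared locally.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Local copy of the documented constant.
inline constexpr auto numDebugCharacters{50};

// Hypothetical helper: returns at most numDebugCharacters characters of
// `content`, starting at `pos`, with "[...]" appended if text was cut off.
std::string debugExcerpt(std::string_view content, std::size_t pos) {
    if(pos >= content.size()) {
        return {}; // nothing to show
    }

    const auto excerpt{content.substr(pos, numDebugCharacters)};
    std::string result{excerpt};

    if(pos + excerpt.size() < content.size()) {
        result += "[...]";
    }

    return result;
}
```

The sketch counts bytes, so with UTF-8 input it may cut a multi-byte character in half.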

◆ tidyEncoding

constexpr std::string_view crawlservpp::Parsing::tidyEncoding {"utf8"}
inline

The character encoding used by the tidy-html5 API.

Referenced by crawlservpp::Parsing::HTML::tidyAndConvert().

◆ xmlBegin

constexpr auto crawlservpp::Parsing::xmlBegin {"<?xml "sv}
inline

The beginning of XML markup.

Referenced by crawlservpp::Parsing::XML::parse().
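A check against this constant can tell whether the input already begins with an XML declaration such as `<?xml version="1.0"?>`. The trailing space matters: it distinguishes a declaration from the `<?xml:` tags documented below. The following is a minimal sketch; the helper `startsWithXmlDeclaration` is hypothetical, and skipping leading whitespace is an assumption of this sketch, not documented behavior.

```cpp
#include <string_view>

using namespace std::string_view_literals;

// Local copy of the documented constant.
inline constexpr auto xmlBegin{"<?xml "sv};

// Hypothetical helper: checks whether `content` begins with an XML
// declaration, ignoring leading whitespace (an assumption of this sketch).
bool startsWithXmlDeclaration(std::string_view content) {
    const auto first{content.find_first_not_of(" \t\r\n")};

    if(first == std::string_view::npos) {
        return false; // empty or whitespace only
    }

    return content.substr(first, xmlBegin.size()) == xmlBegin;
}
```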

◆ xmlInstructionBegin

constexpr auto crawlservpp::Parsing::xmlInstructionBegin {"<?xml:"sv}
inline

The beginning of an XML processing instruction.

Referenced by crawlservpp::Parsing::XML::clear().

◆ xmlInstructionEnd

constexpr auto crawlservpp::Parsing::xmlInstructionEnd {">"sv}
inline

The end of an XML processing instruction.

Referenced by crawlservpp::Parsing::XML::clear().
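Tags of this form, for instance `<?xml:namespace prefix = o ns="urn:schemas-microsoft-com:office:office" />` as found in HTML exported from Microsoft Word, do not end in `?>` and are therefore not well-formed processing instructions. The minimal sketch below removes them, assuming removal is what the two constants are used for; the helper `removeXmlInstructions` is hypothetical and the constants are redeclared locally.

```cpp
#include <string>
#include <string_view>

using namespace std::string_view_literals;

// Local copies of the documented constants.
inline constexpr auto xmlInstructionBegin{"<?xml:"sv};
inline constexpr auto xmlInstructionEnd{">"sv};

// Hypothetical helper: erases every "<?xml:...>" tag from `content`.
void removeXmlInstructions(std::string& content) {
    auto pos{content.find(xmlInstructionBegin)};

    while(pos != std::string::npos) {
        const auto end{content.find(xmlInstructionEnd, pos + xmlInstructionBegin.size())};

        if(end == std::string::npos) {
            break; // unterminated tag: leave the rest untouched
        }

        // erase from the beginning of the tag up to and including its ">"
        content.erase(pos, end + xmlInstructionEnd.size() - pos);

        pos = content.find(xmlInstructionBegin, pos);
    }
}
```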

◆ xmlTags

constexpr std::array crawlservpp::Parsing::xmlTags {"<?i>"sv}
inline

Array containing additional XML markup tags to be removed.

Referenced by crawlservpp::Parsing::XML::clear().
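Because `xmlTags` is a `std::array` (its element type and size deduced from the initializer), further tags could be added to it without changing the code that iterates over it. A minimal sketch that erases every occurrence of each listed tag; the helper `removeXmlTags` is hypothetical and the constant is redeclared locally.

```cpp
#include <array>
#include <string>
#include <string_view>

using namespace std::string_view_literals;

// Local copy of the documented constant,
// deduced as std::array<std::string_view, 1>.
inline constexpr std::array xmlTags{"<?i>"sv};

// Hypothetical helper: erases all occurrences of each tag in xmlTags.
void removeXmlTags(std::string& content) {
    for(const auto tag : xmlTags) {
        auto pos{content.find(tag)};

        while(pos != std::string::npos) {
            content.erase(pos, tag.size());

            pos = content.find(tag, pos);
        }
    }
}
```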