crawlserv++
[under development]
Application for crawling and analyzing textual content of websites.
Namespace for classes parsing HTML, URIs, and XML.
Classes | |
| class | HTML |
| Parses and cleans HTML markup. | |
| class | URI |
| Parser for RFC 3986 URIs that can also analyze their relationships with each other. | |
| class | XML |
| Parses HTML markup into clean XML. | |
Constants | |
| constexpr std::string_view | tidyEncoding {"utf8"} |
| The character encoding used by the tidy-html5 API. | |
| constexpr auto | maxEscapedCharLength {6} |
| Maximum length of a URL-escaped character. | |
| constexpr auto | xmlBegin {"<?xml "sv} |
| The beginning of XML markup. | |
| constexpr std::array | xmlTags {"<?i>"sv} |
| Array containing additional XML markup tags to be removed. | |
| constexpr auto | cDataBegin {"<![CDATA["sv} |
| The beginning of a CDATA element. | |
| constexpr auto | cDataEnd {"]]>"sv} |
| The end of a CDATA element. | |
| constexpr auto | conditionalBegin {"<![if "sv} |
| The beginning of a conditional comment. | |
| constexpr auto | conditionalEnd {"<![endif]>"sv} |
| The end of a conditional comment. | |
| constexpr auto | conditionalInsert {"--"sv} |
| Characters to be inserted/replaced to make conditional comments valid. | |
| constexpr auto | conditionalInsertOffsetBegin {2} |
| Offset at which to insert at the beginning to make conditional comments valid. | |
| constexpr auto | conditionalInsertOffsetEnd {9} |
| Offset at which to insert at the end to make conditional comments valid. | |
| constexpr auto | conditionalInsertOffsetStrayEnd {2} |
| Offset at which to insert into a stray end tag left over from a conditional comment. | |
| constexpr auto | commentCharsToReplace {"--"sv} |
| Characters to be replaced inside comments. | |
| constexpr auto | commentCharsReplaceBy {"=="sv} |
| Characters used as replacement inside comments. | |
| constexpr auto | invalidBegin {"<? "sv} |
| The beginning of an invalid comment. | |
| constexpr auto | invalidEnd {" ?>"sv} |
| The end of an invalid comment. | |
| constexpr auto | invalidInsertBegin {"!--"sv} |
| Characters to be inserted at the beginning to make invalid comments valid. | |
| constexpr auto | invalidInsertEnd {"--"sv} |
| Characters to be inserted at the end to make invalid comments valid. | |
| constexpr auto | invalidInsertOffsetBegin {1} |
| Offset at which to insert at the beginning to make invalid comments valid. | |
| constexpr auto | invalidInsertOffsetEnd {2} |
| Offset at which to insert at the end to make invalid comments valid. | |
| constexpr auto | numDebugCharacters {50} |
| The maximum number of characters to be shown in error messages. | |
| constexpr auto | xmlInstructionBegin {"<?xml:"sv} |
| The beginning of an XML processing instruction. | |
| constexpr auto | xmlInstructionEnd {">"sv} |
| The end of an XML processing instruction. | |
Namespace for classes parsing HTML, URIs, and XML.
| constexpr auto cDataBegin {"<![CDATA["sv} | inline |
The beginning of a CDATA element.
Referenced by crawlservpp::Parsing::XML::clear().
| constexpr auto cDataEnd {"]]>"sv} | inline |
The end of a CDATA element.
Referenced by crawlservpp::Parsing::XML::clear().
| constexpr auto commentCharsReplaceBy {"=="sv} | inline |
Characters used as replacement inside comments.
Referenced by crawlservpp::Parsing::XML::clear().
| constexpr auto commentCharsToReplace {"--"sv} | inline |
Characters to be replaced inside comments.
Referenced by crawlservpp::Parsing::XML::clear().
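The two comment constants suggest a simple substitution: a double hyphen is not allowed inside an XML comment, so each "--" in a comment body is swapped for "==". The following is a minimal sketch of that idea, not the code of XML::clear(); the helper name sanitizeCommentBody is hypothetical, and it assumes the caller has already isolated the text between "<!--" and "-->".

```cpp
#include <cstddef>
#include <string>
#include <string_view>

using namespace std::string_view_literals;

// Values copied from the constants documented on this page.
constexpr auto commentCharsToReplace{"--"sv};
constexpr auto commentCharsReplaceBy{"=="sv};

// Hypothetical helper (not part of crawlserv++): replaces every
// occurrence of "--" inside the body of a comment by "==".
std::string sanitizeCommentBody(std::string body) {
    std::size_t pos{0};

    while((pos = body.find(commentCharsToReplace, pos)) != std::string::npos) {
        body.replace(pos, commentCharsToReplace.length(), commentCharsReplaceBy);

        // continue searching after the replacement
        pos += commentCharsReplaceBy.length();
    }

    return body;
}
```

Because both strings have the same length, the replacement never shifts the rest of the content, which keeps any positions computed earlier valid.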
| constexpr auto conditionalBegin {"<![if "sv} | inline |
The beginning of a conditional comment.
Referenced by crawlservpp::Parsing::XML::clear().
| constexpr auto conditionalEnd {"<![endif]>"sv} | inline |
The end of a conditional comment.
Referenced by crawlservpp::Parsing::XML::clear().
| constexpr auto conditionalInsert {"--"sv} | inline |
Characters to be inserted/replaced to make conditional comments valid.
Referenced by crawlservpp::Parsing::XML::clear().
| constexpr auto conditionalInsertOffsetBegin {2} | inline |
Offset at which to insert at the beginning to make conditional comments valid.
Referenced by crawlservpp::Parsing::XML::clear().
| constexpr auto conditionalInsertOffsetEnd {9} | inline |
Offset at which to insert at the end to make conditional comments valid.
Referenced by crawlservpp::Parsing::XML::clear().
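Taken together, conditionalBegin, conditionalEnd, conditionalInsert, and the two offsets imply that "<![if " becomes "<!--[if " and "<![endif]>" becomes "<![endif]-->", which turns the whole conditional block into one regular comment. The sketch below illustrates that reading of the constants; the helper fixConditionalComments is hypothetical, and the actual logic in XML::clear() (including its handling of stray end tags) may well differ.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

using namespace std::string_view_literals;

// Values copied from the constants documented on this page.
constexpr auto conditionalBegin{"<![if "sv};
constexpr auto conditionalEnd{"<![endif]>"sv};
constexpr auto conditionalInsert{"--"sv};
constexpr auto conditionalInsertOffsetBegin{2};
constexpr auto conditionalInsertOffsetEnd{9};

// Hypothetical helper (not part of crawlserv++): wraps each conditional
// block into a regular comment by inserting "--" at the given offsets.
std::string fixConditionalComments(std::string content) {
    std::size_t pos{0};

    while((pos = content.find(conditionalBegin, pos)) != std::string::npos) {
        // "<![if " becomes "<!--[if "
        content.insert(pos + conditionalInsertOffsetBegin, conditionalInsert);

        pos += conditionalBegin.length() + conditionalInsert.length();

        const auto end{content.find(conditionalEnd, pos)};

        if(end == std::string::npos) {
            break; // no matching end tag: stop here
        }

        // "<![endif]>" becomes "<![endif]-->"
        content.insert(end + conditionalInsertOffsetEnd, conditionalInsert);

        pos = end + conditionalEnd.length() + conditionalInsert.length();
    }

    return content;
}
```

For example, the input `<![if IE]>text<![endif]>` would come out as `<!--[if IE]>text<![endif]-->`. This sketch leaves an opening tag without a matching end tag half-converted, which a real implementation would need to handle.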
| constexpr auto conditionalInsertOffsetStrayEnd {2} | inline |
Offset at which to insert into stray end tag left from conditional comment.
Referenced by crawlservpp::Parsing::XML::clear().
| constexpr auto invalidBegin {"<? "sv} | inline |
The beginning of an invalid comment.
Referenced by crawlservpp::Parsing::XML::clear().
| constexpr auto invalidEnd {" ?>"sv} | inline |
The end of an invalid comment.
Referenced by crawlservpp::Parsing::XML::clear().
| constexpr auto invalidInsertBegin {"!--"sv} | inline |
Characters to be inserted at the beginning to make invalid comments valid.
Referenced by crawlservpp::Parsing::XML::clear().
| constexpr auto invalidInsertEnd {"--"sv} | inline |
Characters to be inserted at the end to make invalid comments valid.
Referenced by crawlservpp::Parsing::XML::clear().
| constexpr auto invalidInsertOffsetBegin {1} | inline |
Offset at which to insert at the beginning to make invalid comments valid.
Referenced by crawlservpp::Parsing::XML::clear().
| constexpr auto invalidInsertOffsetEnd {2} | inline |
Offset at which to insert at the end to make invalid comments valid.
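The invalid-comment constants can be read the same way: inserting "!--" at offset 1 of "<? " gives "<!--? ", and inserting "--" at offset 2 of " ?>" gives " ?-->", so the whole span becomes a valid comment. The following sketch shows one way to apply them; fixInvalidComments is a hypothetical helper, and it is an assumption that XML::clear() works along these lines.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

using namespace std::string_view_literals;

// Values copied from the constants documented on this page.
constexpr auto invalidBegin{"<? "sv};
constexpr auto invalidEnd{" ?>"sv};
constexpr auto invalidInsertBegin{"!--"sv};
constexpr auto invalidInsertEnd{"--"sv};
constexpr auto invalidInsertOffsetBegin{1};
constexpr auto invalidInsertOffsetEnd{2};

// Hypothetical helper (not part of crawlserv++): converts spans of the
// form "<? ... ?>" into regular comments of the form "<!--? ... ?-->".
std::string fixInvalidComments(std::string content) {
    std::size_t pos{0};

    while((pos = content.find(invalidBegin, pos)) != std::string::npos) {
        const auto end{content.find(invalidEnd, pos + invalidBegin.length())};

        if(end == std::string::npos) {
            break; // unterminated span: leave it untouched
        }

        // insert at the end first, so that 'pos' remains valid
        content.insert(end + invalidInsertOffsetEnd, invalidInsertEnd);
        content.insert(pos + invalidInsertOffsetBegin, invalidInsertBegin);

        // skip past the converted span, including both insertions
        pos = end
                + invalidEnd.length()
                + invalidInsertBegin.length()
                + invalidInsertEnd.length();
    }

    return content;
}
```

For example, `<a><? note ?></a>` would become `<a><!--? note ?--></a>`. Inserting at the end before inserting at the beginning avoids having to recompute the end position.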
| constexpr auto maxEscapedCharLength {6} | inline |
Maximum length of a URL-escaped character.
Referenced by crawlservpp::Parsing::URI::escape().
| constexpr auto numDebugCharacters {50} | inline |
The maximum number of characters to be shown in error messages.
Referenced by crawlservpp::Parsing::XML::clear().
| constexpr std::string_view tidyEncoding {"utf8"} | inline |
The character encoding used by the tidy-html5 API.
Referenced by crawlservpp::Parsing::HTML::tidyAndConvert().
| constexpr auto xmlBegin {"<?xml "sv} | inline |
The beginning of XML markup.
Referenced by crawlservpp::Parsing::XML::parse().
| constexpr auto xmlInstructionBegin {"<?xml:"sv} | inline |
The beginning of an XML processing instruction.
Referenced by crawlservpp::Parsing::XML::clear().
| constexpr auto xmlInstructionEnd {">"sv} | inline |
The end of an XML processing instruction.
Referenced by crawlservpp::Parsing::XML::clear().
| constexpr std::array xmlTags {"<?i>"sv} | inline |
Array containing additional XML markup tags to be removed.
Referenced by crawlservpp::Parsing::XML::clear().