crawlserv++
[under development]
Application for crawling and analyzing textual content of websites.
#include <cstddef>
#include <string>

Namespaces

| Namespace | Description |
|---|---|
| crawlservpp::Data::Stemmer | Namespace for linguistic stemmers. |

Functions

| Return type | Function | Description |
|---|---|---|
| void | crawlservpp::Data::Stemmer::stemGerman(std::string &token) | Stems a token in German, modifying it in place. |

Variables

| Type | Variable | Value | Description |
|---|---|---|---|
| constexpr auto | crawlservpp::Data::Stemmer::minLengthStrip2 | 6 | Minimum length of a token to strip two letters from its end or its beginning. |
| constexpr auto | crawlservpp::Data::Stemmer::minLengthStrip1 | 4 | Minimum length of a token to strip one letter from its end. |
| constexpr auto | crawlservpp::Data::Stemmer::binInv | 0xff | Literal for binary inversion. |
| constexpr auto | crawlservpp::Data::Stemmer::toLowerCase | 32 | Number to add to an uppercase ASCII letter to make it lowercase ('A' is 65, 'a' is 97). |
| constexpr auto | crawlservpp::Data::Stemmer::utf8mb2 | 0xC3 | First byte of the 2-byte UTF-8 sequences for the umlauts and sharp s (ß). |
| constexpr auto | crawlservpp::Data::Stemmer::utf8mb3 | 0xE1 | First byte of the 3-byte UTF-8 sequence for capital sharp s (ẞ, U+1E9E). |
| constexpr auto | crawlservpp::Data::Stemmer::umlautA2sm | 0xA4 | Second byte of UTF-8 umlaut ä (U+00E4). |
| constexpr auto | crawlservpp::Data::Stemmer::umlautA2l | 0x84 | Second byte of UTF-8 umlaut Ä (U+00C4). |
| constexpr auto | crawlservpp::Data::Stemmer::umlautO2sm | 0xB6 | Second byte of UTF-8 umlaut ö (U+00F6). |
| constexpr auto | crawlservpp::Data::Stemmer::umlautO2l | 0x96 | Second byte of UTF-8 umlaut Ö (U+00D6). |
| constexpr auto | crawlservpp::Data::Stemmer::umlautU2sm | 0xBC | Second byte of UTF-8 umlaut ü (U+00FC). |
| constexpr auto | crawlservpp::Data::Stemmer::umlautU2l | 0x9C | Second byte of UTF-8 umlaut Ü (U+00DC). |
| constexpr auto | crawlservpp::Data::Stemmer::sharpS2sm | 0x9F | Second byte of UTF-8 sharp s ß (U+00DF). |
| constexpr auto | crawlservpp::Data::Stemmer::sharpS2l | 0xBA | Second byte of UTF-8 capital sharp s ẞ (U+1E9E). |
| constexpr auto | crawlservpp::Data::Stemmer::sharpS3l | 0x9E | Third byte of UTF-8 capital sharp s ẞ (U+1E9E). |