crawlserv++
[under development]
Application for crawling and analyzing textual content of websites.
#include "../Main/Exception.hpp"
#include "../_extern/utf8/source/utf8.h"
#include <cstddef>
#include <string>
#include <string_view>
Classes
class | crawlservpp::Helper::Utf8::Exception
Class for UTF-8 exceptions.
Namespaces
crawlservpp::Helper::Utf8
Namespace for global UTF-8 encoding functions.
Functions
bool | crawlservpp::Helper::Utf8::isLastCharValidUtf8 (const std::string &stringToCheck)
Checks the last character (i.e. up to four bytes at the end) of the given string for valid UTF-8.
Constants
constexpr auto | crawlservpp::Helper::Utf8::utf8MemoryFactor {2}
Factor for guessing the maximum amount of memory used for UTF-8 compared to ISO-8859-1.
constexpr auto | crawlservpp::Helper::Utf8::bitmaskTopBit {0x80}
Bit mask to extract the first bit of a multibyte character.
constexpr auto | crawlservpp::Helper::Utf8::bitmaskTopTwoBits {0xc0}
Bit mask to extract the top two bits of a multibyte character.
constexpr auto | crawlservpp::Helper::Utf8::shiftSixBits {6}
Shift six bits.
constexpr auto | crawlservpp::Helper::Utf8::bitmaskLastSixBits0b000001 {0x3F}
Bit mask to check the last six bits for 0b000001.
constexpr auto | crawlservpp::Helper::Utf8::oneByte {1}
One byte.
constexpr auto | crawlservpp::Helper::Utf8::twoBytes {2}
Two bytes.
constexpr auto | crawlservpp::Helper::Utf8::threeBytes {3}
Three bytes.
constexpr auto | crawlservpp::Helper::Utf8::fourBytes {4}
Four bytes.
Conversion
std::string | crawlservpp::Helper::Utf8::iso88591ToUtf8 (std::string_view strIn)
Converts a string from ISO-8859-1 to UTF-8.
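The following is a minimal sketch of how an ISO-8859-1 to UTF-8 conversion can work, using only the standard library. It is not the crawlserv++ implementation. The function name latin1ToUtf8Sketch is hypothetical, and the literal values 2, 6, 0xC0, 0x80 and 0x3F are written out here only for illustration. The sketch also shows why a memory factor of 2 is a safe upper bound: every ISO-8859-1 byte maps to at most two UTF-8 bytes.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper for illustration; not part of crawlserv++.
// Converts ISO-8859-1 (Latin-1) input to UTF-8.
std::string latin1ToUtf8Sketch(std::string_view in) {
    std::string out;

    // Each Latin-1 byte becomes one or two UTF-8 bytes,
    // so twice the input size is always enough.
    out.reserve(in.size() * 2);

    for (const char c : in) {
        const auto byte = static_cast<unsigned char>(c);

        if (byte < 0x80) {
            // ASCII range: identical in both encodings.
            out.push_back(static_cast<char>(byte));
        } else {
            // U+0080..U+00FF: two-byte sequence 110xxxxx 10xxxxxx.
            out.push_back(static_cast<char>(0xC0 | (byte >> 6)));
            out.push_back(static_cast<char>(0x80 | (byte & 0x3F)));
        }
    }

    return out;
}
```

For example, the Latin-1 byte 0xE9 (é) becomes the two bytes 0xC3 0xA9, while plain ASCII text is copied unchanged.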
Validation
bool | crawlservpp::Helper::Utf8::isValidUtf8 (std::string_view stringToCheck, std::string &errTo)
Checks whether a string contains valid UTF-8.
bool | crawlservpp::Helper::Utf8::isLastCharValidUtf8 (std::string_view stringToCheck)
bool | crawlservpp::Helper::Utf8::isSingleUtf8Char (std::string_view stringToCheck)
Returns whether the given string contains exactly one UTF-8 code point.
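A check of the last character, i.e. of up to four bytes at the end of a string, could be sketched as follows. This is a structural check only, under stated assumptions: the name lastCharLooksValidSketch is hypothetical, an empty string is treated as valid (the listing does not say how the real function handles it), and overlong encodings and surrogates are not rejected. The actual crawlserv++ function may behave differently.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper for illustration; not part of crawlserv++.
// Checks whether the string ends in a structurally complete UTF-8 sequence.
bool lastCharLooksValidSketch(std::string_view s) {
    if (s.empty()) {
        return true;  // assumption: nothing to reject in an empty string
    }

    constexpr std::size_t maxBytes = 4;  // longest UTF-8 sequence
    std::size_t trailing = 0;            // continuation bytes seen so far
    std::size_t pos = s.size();

    // Walk backwards until a non-continuation byte is found.
    while (pos > 0 && trailing < maxBytes) {
        const auto byte = static_cast<unsigned char>(s[pos - 1]);

        if ((byte & 0xC0) != 0x80) {
            // Not 10xxxxxx: either ASCII or a lead byte.
            std::size_t expected = 0;

            if ((byte & 0x80) == 0x00) expected = 1;       // 0xxxxxxx
            else if ((byte & 0xE0) == 0xC0) expected = 2;  // 110xxxxx
            else if ((byte & 0xF0) == 0xE0) expected = 3;  // 1110xxxx
            else if ((byte & 0xF8) == 0xF0) expected = 4;  // 11110xxx
            else return false;                             // invalid lead byte

            // The sequence is complete only if its length matches.
            return expected == trailing + 1;
        }

        ++trailing;
        --pos;
    }

    // Only continuation bytes were found within the last four bytes.
    return false;
}
```

A check like this is useful when text arrives in chunks, because a multibyte character cut off at a chunk boundary shows up as a lead byte with too few continuation bytes.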
Repair
bool | crawlservpp::Helper::Utf8::repairUtf8 (std::string_view strIn, std::string &strOut)
Replaces invalid UTF-8 characters in the given string and returns whether invalid characters occurred.
Length | |
std::size_t | crawlservpp::Helper::Utf8::length (std::string_view str) |
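The listing gives no description for length. If it is meant to count code points rather than bytes (an assumption), the idea can be sketched by counting every byte that is not a continuation byte. The name countCodePointsSketch is hypothetical, the input is assumed to be valid UTF-8 already, and this is not the crawlserv++ implementation.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper for illustration; not part of crawlserv++.
// Counts code points in a string that is assumed to be valid UTF-8.
std::size_t countCodePointsSketch(std::string_view s) {
    std::size_t count = 0;

    for (const char c : s) {
        // Continuation bytes have the form 10xxxxxx; every other byte
        // starts a new code point.
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) {
            ++count;
        }
    }

    return count;
}
```

With this sketch, the three-byte euro sign counts as one code point, so a string holding "a", the euro sign and "z" has a byte size of 5 but a code point count of 3.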