crawlserv++  [under development]
Application for crawling and analyzing textual content of websites.
Utf8.hpp File Reference
#include "../Main/Exception.hpp"
#include "../_extern/utf8/source/utf8.h"
#include <cstddef>
#include <string>
#include <string_view>

Classes

class  crawlservpp::Helper::Utf8::Exception
 Class for UTF-8 exceptions.
 

Namespaces

 crawlservpp::Helper::Utf8
 Namespace for global UTF-8 encoding functions.
 

Functions

bool crawlservpp::Helper::Utf8::isLastCharValidUtf8 (const std::string &stringToCheck)
 Checks the last character (i.e. up to four bytes at the end) of the given string for valid UTF-8.
 

Constants

constexpr auto crawlservpp::Helper::Utf8::utf8MemoryFactor {2}
 Factor for guessing the maximum amount of memory used for UTF-8 compared to ISO-8859-1.
 
constexpr auto crawlservpp::Helper::Utf8::bitmaskTopBit {0x80}
 Bit mask to extract the first bit of a multibyte character.
 
constexpr auto crawlservpp::Helper::Utf8::bitmaskTopTwoBits {0xc0}
 Bit mask to extract the top two bits of a multibyte character.
 
constexpr auto crawlservpp::Helper::Utf8::shiftSixBits {6}
 Shift six bits.
 
constexpr auto crawlservpp::Helper::Utf8::bitmaskLastSixBits0b000001 {0x3F}
 Bit mask to check the last six bits for 0b000001.
 
constexpr auto crawlservpp::Helper::Utf8::oneByte {1}
 One byte.
 
constexpr auto crawlservpp::Helper::Utf8::twoBytes {2}
 Two bytes.
 
constexpr auto crawlservpp::Helper::Utf8::threeBytes {3}
 Three bytes.
 
constexpr auto crawlservpp::Helper::Utf8::fourBytes {4}
 Four bytes.
 

Conversion

std::string crawlservpp::Helper::Utf8::iso88591ToUtf8 (std::string_view strIn)
 Converts a string from ISO-8859-1 to UTF-8.
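
 The conversion can be illustrated with a minimal, standalone sketch. It is not the implementation in Utf8.hpp; the function name latin1ToUtf8Sketch is hypothetical, and the literal values merely mirror the constants listed above (a memory factor of 2, a six-bit shift, and a 0x3F mask). Bytes below 0x80 are copied unchanged, and every other byte is split into a two-byte UTF-8 sequence.

```cpp
#include <string>
#include <string_view>

// Hypothetical stand-in for illustration only, not the library's code.
std::string latin1ToUtf8Sketch(std::string_view in) {
    std::string out;

    // An ISO-8859-1 byte needs at most two UTF-8 bytes, hence the factor 2.
    out.reserve(in.size() * 2);

    for (const char c : in) {
        const auto byte = static_cast<unsigned char>(c);

        if (byte < 0x80) {
            // ASCII range: identical in both encodings.
            out.push_back(static_cast<char>(byte));
        } else {
            // 0x80..0xFF: lead byte 110000xx, continuation byte 10xxxxxx.
            out.push_back(static_cast<char>(0xC0 | (byte >> 6)));
            out.push_back(static_cast<char>(0x80 | (byte & 0x3F)));
        }
    }

    return out;
}
```

 With this sketch, the ISO-8859-1 byte 0xE9 (the letter e with an acute accent) becomes the two bytes 0xC3 0xA9.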
 

Validation

bool crawlservpp::Helper::Utf8::isValidUtf8 (std::string_view stringToCheck, std::string &errTo)
 Checks whether a string contains valid UTF-8.
 
bool crawlservpp::Helper::Utf8::isLastCharValidUtf8 (std::string_view stringToCheck)
 
bool crawlservpp::Helper::Utf8::isSingleUtf8Char (std::string_view stringToCheck)
 Returns whether the given string contains exactly one UTF-8 code point.
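
 What "exactly one UTF-8 code point" means at the byte level can be sketched as follows. This is an illustrative, structural check only, under the assumption that the lead byte determines the sequence length; it does not reject overlong encodings, surrogates, or values above U+10FFFF, and it may differ from what isSingleUtf8Char actually does. The names sequenceLengthSketch and isSingleCodePointSketch are hypothetical.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: sequence length implied by the lead byte,
// or 0 if the byte cannot start a UTF-8 sequence.
constexpr std::size_t sequenceLengthSketch(unsigned char lead) {
    if (lead < 0x80) { return 1; }            // 0xxxxxxx
    if ((lead & 0xE0) == 0xC0) { return 2; }  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) { return 3; }  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) { return 4; }  // 11110xxx
    return 0;                                 // continuation or invalid byte
}

// Hypothetical structural check: one lead byte followed by exactly
// the number of continuation bytes (10xxxxxx) that it announces.
constexpr bool isSingleCodePointSketch(std::string_view s) {
    if (s.empty()) { return false; }

    const std::size_t expected{
        sequenceLengthSketch(static_cast<unsigned char>(s[0]))
    };

    if (expected == 0 || s.size() != expected) { return false; }

    for (std::size_t i{1}; i < s.size(); ++i) {
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) {
            return false;
        }
    }

    return true;
}
```

 Because both functions are constexpr, the check can also be exercised at compile time, for example with static_assert(isSingleCodePointSketch("\xC3\xA9")).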
 

Repair

bool crawlservpp::Helper::Utf8::repairUtf8 (std::string_view strIn, std::string &strOut)
 Replaces invalid UTF-8 characters in the given string and returns whether invalid characters occurred.
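
 The contract described here (write a repaired copy to the output string, return true if anything had to be replaced) can be sketched without any external library. The sketch below is an assumption-laden illustration, not the code behind repairUtf8: the name repairSketch is hypothetical, the choice of U+FFFD as the replacement character is an assumption, and only structural errors (bad lead bytes, missing or wrong continuation bytes) are detected.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for illustration only, not the library's code.
// Assumes that `out` does not alias the memory viewed by `in`.
bool repairSketch(std::string_view in, std::string& out) {
    // Assumption: invalid bytes are replaced by U+FFFD, encoded as UTF-8.
    constexpr std::string_view replacement{"\xEF\xBF\xBD"};

    bool repaired{false};
    std::size_t i{0};

    out.clear();
    out.reserve(in.size());

    while (i < in.size()) {
        const auto lead = static_cast<unsigned char>(in[i]);
        std::size_t len{0};

        // Sequence length announced by the lead byte (0 means invalid).
        if (lead < 0x80) { len = 1; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; }

        // The sequence must fit and consist of continuation bytes.
        bool ok{len != 0 && i + len <= in.size()};

        for (std::size_t k{1}; ok && k < len; ++k) {
            ok = (static_cast<unsigned char>(in[i + k]) & 0xC0) == 0x80;
        }

        if (ok) {
            out.append(in.substr(i, len));
            i += len;
        } else {
            // Replace the offending byte and resume at the next one.
            out.append(replacement);
            repaired = true;
            ++i;
        }
    }

    return repaired;
}
```

 A caller would typically keep the original string when the function returns false and switch to the output string otherwise.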
 

Length

std::size_t crawlservpp::Helper::Utf8::length (std::string_view str)
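
 No description is given for this function. Assuming it returns the number of UTF-8 code points rather than the number of bytes, the idea can be sketched by counting every byte that is not a continuation byte (10xxxxxx). The name codePointCountSketch is hypothetical, and the sketch presumes well-formed input.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for illustration only, not the library's code.
// Counts code points in well-formed UTF-8 by skipping continuation bytes.
constexpr std::size_t codePointCountSketch(std::string_view s) {
    std::size_t count{0};

    for (const char c : s) {
        // Continuation bytes have the top two bits set to 10.
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) {
            ++count;
        }
    }

    return count;
}
```

 For the five-byte string "caf\xC3\xA9" the sketch yields 4, whereas std::string_view::size() reports 5.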