doxygen
Various UTF8 related helper functions.
#include <cstdint>
#include <string>
Functions
std::string convertUTF8ToLower(const std::string &input)
Converts the input string to lower case, also taking into account non-ASCII characters that have a lower case variant.
std::string convertUTF8ToUpper(const std::string &input)
Converts the input string to upper case, also taking into account non-ASCII characters that have an upper case variant.
std::string getUTF8CharAt(const std::string &input, size_t pos)
Returns the UTF8 character found at byte position pos in the input string.
uint32_t getUnicodeForUTF8CharAt(const std::string &input, size_t pos)
Returns the 32-bit Unicode code point of the character at byte position pos in the UTF8 encoded input.
uint8_t getUTF8CharNumBytes(char firstByte)
Returns the number of bytes making up a single UTF8 character given the first byte in the sequence.
const char *writeUTF8Char(TextStream &t, const char *s)
Writes the UTF8 character pointed to by s to stream t and returns a pointer to the next character.
bool lastUTF8CharIsMultibyte(const std::string &input)
Returns true iff the last character in input is a multibyte character.
bool isUTF8CharUpperCase(const std::string &input, size_t pos)
Returns true iff the input string at byte position pos holds an upper case character.
int isUTF8NonBreakableSpace(const char *input)
Checks whether the first character pointed to by input is a non-breakable whitespace character.
bool isUTF8PunctuationCharacter(uint32_t unicode)
Checks whether the given Unicode character represents a punctuation character.
Various UTF8 related helper functions.
See https://en.wikipedia.org/wiki/UTF-8 for details on UTF8 encoding.
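As a rough illustration of the encoding rules these helpers rely on, the sketch below derives the sequence length from the first byte and then decodes a code point at a byte position. The names sketchNumBytes and sketchDecodeAt are hypothetical and are not the functions documented here; returning 0 on errors is an assumption made for this example, and overlong encodings are not rejected.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: number of bytes in a UTF-8 sequence, judged from its
// first byte alone. Returns 0 for a byte that cannot start a sequence.
inline std::uint8_t sketchNumBytes(char firstByte)
{
  const unsigned char c = static_cast<unsigned char>(firstByte);
  if (c < 0x80)           return 1; // 0xxxxxxx: ASCII
  if ((c & 0xE0) == 0xC0) return 2; // 110xxxxx
  if ((c & 0xF0) == 0xE0) return 3; // 1110xxxx
  if ((c & 0xF8) == 0xF0) return 4; // 11110xxx
  return 0;                         // continuation byte or invalid lead byte
}

// Hypothetical helper: decode the code point starting at byte position pos.
// Returns 0 (an assumption of this sketch) when pos is out of range, the
// sequence is truncated, or a continuation byte is missing.
inline std::uint32_t sketchDecodeAt(const std::string &input, std::size_t pos)
{
  if (pos >= input.size()) return 0;
  const std::uint8_t n = sketchNumBytes(input[pos]);
  if (n == 0 || pos + n > input.size()) return 0;
  const unsigned char lead = static_cast<unsigned char>(input[pos]);
  if (n == 1) return lead;
  // Keep the payload bits of the lead byte: 5, 4, or 3 bits for n = 2, 3, 4.
  std::uint32_t cp = lead & (0xFFu >> (n + 1));
  for (std::size_t i = 1; i < n; ++i)
  {
    const unsigned char c = static_cast<unsigned char>(input[pos + i]);
    if ((c & 0xC0) != 0x80) return 0; // not a continuation byte
    cp = (cp << 6) | (c & 0x3Fu);
  }
  return cp;
}
```

Both helpers work on byte offsets, which matches the pos parameters described above: a caller that wants the next character advances by the returned byte count, not by one.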
std::string getUTF8CharAt(const std::string &input, size_t pos)
Returns the UTF8 character found at byte position pos in the input string.
The resulting string can be a multibyte sequence.
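To make the byte-position semantics concrete, here is a minimal sketch of extracting one character as a substring. sketchCharAt is a hypothetical stand-in, not the documented function, and returning an empty string for an out-of-range or mid-sequence position is an assumption of this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in: copy the whole UTF-8 sequence that starts at byte
// position pos. Returns an empty string when pos is out of range, does not
// point at a lead byte, or the sequence is cut short.
inline std::string sketchCharAt(const std::string &input, std::size_t pos)
{
  if (pos >= input.size()) return std::string();
  const unsigned char c = static_cast<unsigned char>(input[pos]);
  std::size_t n = 0;
  if (c < 0x80)                n = 1; // ASCII
  else if ((c & 0xE0) == 0xC0) n = 2; // two-byte sequence
  else if ((c & 0xF0) == 0xE0) n = 3; // three-byte sequence
  else if ((c & 0xF8) == 0xF0) n = 4; // four-byte sequence
  if (n == 0 || pos + n > input.size()) return std::string();
  return input.substr(pos, n);
}
```

Because pos counts bytes, iterating over characters means advancing pos by the size of each returned string.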
bool isUTF8CharUpperCase(const std::string &input, size_t pos)
Returns true iff the input string at byte position pos holds an upper case character.
int isUTF8NonBreakableSpace(const char *input)
Checks whether the first character pointed to by input is a non-breakable whitespace character.
Returns the byte size of the character if there is a match, or 0 if not.
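The "byte size or 0" return convention can be sketched as follows. sketchNonBreakableSpace is a hypothetical stand-in that recognizes only U+00A0 NO-BREAK SPACE; which characters the documented function accepts is not stated here, so that choice and the null-pointer handling are assumptions.

```cpp
// Hypothetical stand-in covering only U+00A0 NO-BREAK SPACE, which UTF-8
// encodes as the two bytes 0xC2 0xA0. Returns 2, the encoded size, on a
// match and 0 otherwise. The input is assumed to be NUL-terminated.
inline int sketchNonBreakableSpace(const char *input)
{
  if (input == nullptr) return 0;
  const unsigned char b0 = static_cast<unsigned char>(input[0]);
  if (b0 != 0xC2) return 0; // also covers the terminating NUL
  const unsigned char b1 = static_cast<unsigned char>(input[1]);
  return b1 == 0xA0 ? 2 : 0;
}
```

A nonzero result doubles as the number of bytes to skip, so a scanner can test and advance in one step.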
bool lastUTF8CharIsMultibyte(const std::string &input)
Returns true iff the last character in input is a multibyte character.
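One way such a check could work is to inspect only the final byte. The sketch below uses the hypothetical name sketchLastCharIsMultibyte and assumes the input is valid UTF8; it is not necessarily how the documented function is implemented.

```cpp
#include <string>

// Hypothetical stand-in: in UTF-8 every byte of a multibyte sequence has its
// high bit set, while a single-byte (ASCII) character never does. Assuming
// valid UTF-8, the last character is multibyte exactly when the last byte
// is 0x80 or above.
inline bool sketchLastCharIsMultibyte(const std::string &input)
{
  if (input.empty()) return false;
  return (static_cast<unsigned char>(input.back()) & 0x80) != 0;
}
```

No backward scan to the lead byte is needed for a yes/no answer, which keeps the check constant time.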