utf8.h File Reference

Various UTF8 related helper functions.

#include <cstdint>
#include <string>

Functions

std::string convertUTF8ToLower (const std::string &input)
 Converts the input string into a lower case version, also taking into account non-ASCII characters that have a lower case variant.
 
std::string convertUTF8ToUpper (const std::string &input)
 Converts the input string into an upper case version, also taking into account non-ASCII characters that have an upper case variant.
 
std::string getUTF8CharAt (const std::string &input, size_t pos)
 Returns the UTF8 character found at byte position pos in the input string.
 
uint32_t getUnicodeForUTF8CharAt (const std::string &input, size_t pos)
 Returns the 32-bit Unicode value of the character at byte position pos in the UTF8 encoded input.
 
uint8_t getUTF8CharNumBytes (char firstByte)
 Returns the number of bytes making up a single UTF8 character given the first byte in the sequence.
 
const char * writeUTF8Char (TextStream &t, const char *s)
 Writes the UTF8 character pointed to by s to stream t and returns a pointer to the next character.
 
bool lastUTF8CharIsMultibyte (const std::string &input)
 Returns true iff the last character in input is a multibyte character.
 
bool isUTF8CharUpperCase (const std::string &input, size_t pos)
 Returns true iff the input string at byte position pos holds an upper case character.
 
int isUTF8NonBreakableSpace (const char *input)
 Check if the first character pointed at by input is a non-breakable whitespace character.
 
bool isUTF8PunctuationCharacter (uint32_t unicode)
 Check if the given Unicode character represents a punctuation character.
 

Detailed Description

Various UTF8 related helper functions.

See https://en.wikipedia.org/wiki/UTF-8 for details on UTF8 encoding.
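
As a reminder of the encoding these helpers operate on, the length of a UTF8 sequence can be read from the high bits of its first byte. The sketch below illustrates that rule only; sketchUTF8NumBytes is a hypothetical name, it is not the implementation of getUTF8CharNumBytes(), and returning 0 for a continuation or invalid lead byte is an assumption made for this example.

```cpp
#include <cstdint>

// Hypothetical helper, not part of utf8.h: derives the sequence length
// from the high bits of the leading byte.
inline uint8_t sketchUTF8NumBytes(char firstByte)
{
  const unsigned char c = static_cast<unsigned char>(firstByte);
  if (c < 0x80)           return 1; // 0xxxxxxx: plain ASCII
  if ((c & 0xE0) == 0xC0) return 2; // 110xxxxx: two-byte sequence
  if ((c & 0xF0) == 0xE0) return 3; // 1110xxxx: three-byte sequence
  if ((c & 0xF8) == 0xF0) return 4; // 11110xxx: four-byte sequence
  return 0;                         // assumption: continuation or invalid lead byte
}
```

The cast to unsigned char matters because char may be signed, in which case bytes of 0x80 and above would otherwise compare as negative values.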

Function Documentation

◆ getUTF8CharAt()

std::string getUTF8CharAt ( const std::string &  input,
size_t  pos 
)

Returns the UTF8 character found at byte position pos in the input string.

The resulting string can be a multibyte sequence.
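
The behavior described above could be approximated as follows. This is a minimal sketch under stated assumptions, not the actual implementation: sketchCharAt is a hypothetical name, pos is assumed to point at the first byte of a character, and returning an empty string for an out-of-range pos is a choice made here that the documentation does not confirm.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for illustration only, not part of utf8.h.
inline std::string sketchCharAt(const std::string &input, std::size_t pos)
{
  if (pos >= input.size()) return std::string(); // assumption: empty result when out of range
  const unsigned char c = static_cast<unsigned char>(input[pos]);
  std::size_t n = 1;                       // ASCII, or a byte we cannot classify
  if      ((c & 0xE0) == 0xC0) n = 2;      // 110xxxxx
  else if ((c & 0xF0) == 0xE0) n = 3;      // 1110xxxx
  else if ((c & 0xF8) == 0xF0) n = 4;      // 11110xxx
  return input.substr(pos, n);             // substr clamps n to the remaining length
}
```

Because pos is a byte offset rather than a character index, a caller walking a string would advance by the size of the returned string after each call.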

◆ isUTF8CharUpperCase()

bool isUTF8CharUpperCase ( const std::string &  input,
size_t  pos 
)

Returns true iff the input string at byte position pos holds an upper case character.

◆ isUTF8NonBreakableSpace()

int isUTF8NonBreakableSpace ( const char *  input)

Check if the first character pointed at by input is a non-breakable whitespace character.

Returns the byte size of the character if there is a match, or 0 if not.
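
One way such a check might look is sketched below. It assumes that only U+00A0 (NO-BREAK SPACE, encoded as the bytes 0xC2 0xA0) counts as a match and that input is NUL-terminated; both are assumptions of this example, and sketchNonBreakableSpace is a hypothetical name rather than the documented function.

```cpp
// Hypothetical sketch, not part of utf8.h: recognizes only U+00A0,
// which UTF8 encodes as the two bytes 0xC2 0xA0.
inline int sketchNonBreakableSpace(const char *input)
{
  const unsigned char *p = reinterpret_cast<const unsigned char *>(input);
  // The && short-circuits, so p[1] is read only when p[0] is not the
  // terminator of a NUL-terminated string.
  return (p[0] == 0xC2 && p[1] == 0xA0) ? 2 : 0;
}
```

Returning the byte size instead of a bool lets a caller test for a match and skip over the character in one step, for example `p += n` when `n` is non-zero.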

◆ lastUTF8CharIsMultibyte()

bool lastUTF8CharIsMultibyte ( const std::string &  input)

Returns true iff the last character in input is a multibyte character.
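
In valid UTF8, every byte of a multibyte character has its high bit set, so the test can be expressed on the final byte alone. The following is a sketch of that idea, assuming well-formed input and assuming an empty string yields false; sketchLastCharIsMultibyte is a hypothetical name and the real function may be implemented differently.

```cpp
#include <string>

// Hypothetical sketch, not part of utf8.h.
inline bool sketchLastCharIsMultibyte(const std::string &input)
{
  // Assumption: an empty string has no last character, so the answer is false.
  // A set high bit on the final byte means it is a continuation byte,
  // i.e. the tail of a multibyte sequence in well-formed UTF8.
  return !input.empty() &&
         (static_cast<unsigned char>(input.back()) & 0x80) != 0;
}
```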