|
| static size_t | utf8_bytes (const void *sequence) |
| | Return the number of bytes occupied by the UTF-8 character starting at sequence.
|
| |
| static size_t | utf8_bytes (uint32_t codepoint) |
| | Return the number of bytes needed to encode the codepoint in UTF-8.
|
| |
| static size_t | isutf8 (const void *utf8_start, const void *end_of_buffer) |
| | Examine the UTF-8 sequence to determine whether or not it is valid UTF-8.
|
| |
| static uint32_t | utf8_to_codepoint (const void *utf8_start, const void *end_of_buffer, size_t &bytes_consumed) |
| | Convert the UTF-8 sequence into a Unicode codepoint (i.e. decode the UTF-8).
|
| |
| static size_t | codepoint_to_utf8 (void *utf8_start, const void *end_of_buffer, uint32_t codepoint) |
| | Encode the Unicode codepoint in UTF-8 into buffer utf8_start, not exceeding end_of_buffer.
|
| |
| static void | tocasefold (std::vector< uint32_t > &casefolded, uint32_t codepoint) |
| | Strip all accents, non-alphanumerics, and then casefold.
|
| |
| static const uint32_t * | tocasefold (uint32_t codepoint) |
| | Return a pointer to a zero-terminated array of codepoints that is the casefolded, normalised form of the codepoint.
|
| |
| static int | isalpha (uint32_t codepoint) |
| | Unicode version of isalpha().
|
| |
| static int | isalnum (uint32_t codepoint) |
| | Unicode version of isalnum().
|
| |
| static int | isupper (uint32_t codepoint) |
| | Unicode version of isupper().
|
| |
| static int | islower (uint32_t codepoint) |
| | Unicode version of islower().
|
| |
| static int | iscntrl (uint32_t codepoint) |
| | Unicode version of iscntrl().
|
| |
| static int | isdigit (uint32_t codepoint) |
| | Unicode version of isdigit().
|
| |
| static int | isgraph (uint32_t codepoint) |
| | Unicode version of isgraph().
|
| |
| static int | ispunct (uint32_t codepoint) |
| | Unicode version of ispunct().
|
| |
| static int | isspace (uint32_t codepoint) |
| | Unicode version of isspace(), following both the "C" isspace() and the Unicode definition.
|
| |
| static int | isuspace (uint32_t codepoint) |
| | Unicode version of isspace(), following the Unicode definition.
|
| |
| static int | isxdigit (uint32_t codepoint) |
| | Unicode version of isxdigit().
|
| |
| static int | ismark (uint32_t codepoint) |
| | Check to see if the codepoint is a mark.
|
| |
| static int | issymbol (uint32_t codepoint) |
| | Check to see if the codepoint is a symbol.
|
| |
| static int | isxmlnamestartchar (uint32_t codepoint) |
| | Check to see if the codepoint is a valid character with which to start an XML tag name.
|
| |
| static int | isxmlnamechar (uint32_t codepoint) |
| | Check to see if the codepoint is a valid character to follow a NameStartChar in an XML tag name.
|
| |
|
| static void | unittest (void) |
| | Unit test this class.
|
| |
Implementation of the ctype methods on Unicode codepoints.
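The two utf8_bytes() overloads in the member summary map onto the standard UTF-8 length rules. The following is a minimal sketch of those rules, not the JASS implementation: the names sketch_utf8_bytes_for_codepoint and sketch_utf8_bytes_at are hypothetical, and returning 0 for out-of-range or malformed input is an assumption made for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for utf8_bytes(uint32_t codepoint):
// the number of bytes UTF-8 needs to encode the codepoint.
// Returning 0 above U+10FFFF is an assumption of this sketch.
inline std::size_t sketch_utf8_bytes_for_codepoint(std::uint32_t codepoint)
    {
    if (codepoint < 0x80)
        return 1;                   // 7 bits fit in one byte
    if (codepoint < 0x800)
        return 2;                   // 11 bits fit in two bytes
    if (codepoint < 0x10000)
        return 3;                   // 16 bits fit in three bytes
    if (codepoint <= 0x10FFFF)
        return 4;                   // remainder of the Unicode range
    return 0;
    }

// Hypothetical stand-in for utf8_bytes(const void *sequence):
// the sequence length implied by the lead byte alone.
// Returning 0 for a continuation or invalid lead byte is an assumption.
inline std::size_t sketch_utf8_bytes_at(const void *sequence)
    {
    const unsigned char lead = *static_cast<const unsigned char *>(sequence);

    if (lead < 0x80)
        return 1;                   // 0xxxxxxx
    if ((lead & 0xE0) == 0xC0)
        return 2;                   // 110xxxxx
    if ((lead & 0xF0) == 0xE0)
        return 3;                   // 1110xxxx
    if ((lead & 0xF8) == 0xF0)
        return 4;                   // 11110xxx
    return 0;
    }
```

The lead-byte version inspects only the first byte; checking that the following bytes are valid continuation bytes is the kind of work isutf8() is described as doing.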
| static int JASS::unicode::ispunct (uint32_t codepoint) | inline, static |
Unicode version of ispunct().
The character belongs to one of the Unicode general categories Pd, Ps, Pe, Pc, Po, Pi, or Pf.
- Parameters
-
| codepoint | [in] The Unicode codepoint to check. |
- Returns
- true if a punctuation character, else false.
| static int JASS::unicode::isspace (uint32_t codepoint) | inline, static |
Unicode version of isspace(), following both the "C" isspace() and the Unicode definition.
The character is one of the C0 whitespace controls (tab, linefeed, vertical tab, form feed, and carriage return), is in Unicode general category Zs, Zl, or Zp, or is NEL (U+0085).
- Parameters
-
| codepoint | [in] The Unicode codepoint to check. |
- Returns
- true if a space character, else false.
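The set described above is small enough to enumerate by hand. The sketch below is an illustration only, not the JASS implementation (which may well be table-driven); the name sketch_isspace is hypothetical, and the Zs members listed are those of recent Unicode versions (U+180E was in Zs in older versions and is not included here).

```cpp
#include <cstdint>

// Hypothetical stand-in for isspace(uint32_t codepoint).
// Returns non-zero for C0 whitespace, NEL, and categories Zs, Zl and Zp.
inline int sketch_isspace(std::uint32_t codepoint)
    {
    // C0: tab, linefeed, vertical tab, form feed, carriage return
    if (codepoint >= 0x0009 && codepoint <= 0x000D)
        return 1;

    // Zs: U+2000 (en quad) through U+200A (hair space)
    if (codepoint >= 0x2000 && codepoint <= 0x200A)
        return 1;

    switch (codepoint)
        {
        case 0x0085:        // NEL (next line)
        case 0x0020:        // Zs: space
        case 0x00A0:        // Zs: no-break space
        case 0x1680:        // Zs: ogham space mark
        case 0x202F:        // Zs: narrow no-break space
        case 0x205F:        // Zs: medium mathematical space
        case 0x3000:        // Zs: ideographic space
        case 0x2028:        // Zl: line separator
        case 0x2029:        // Zp: paragraph separator
            return 1;
        default:
            return 0;
        }
    }
```

Note that U+200B (zero width space) is in category Cf, not Zs, so this sketch reports it as not a space.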
| static void JASS::unicode::tocasefold (std::vector< uint32_t > &casefolded, uint32_t codepoint) | inline, static |
Strip all accents, non-alphanumerics, and then casefold.
This is the JASS character normalisation method. It converts the codepoint to Unicode "NFKD", strips all non-alphanumerics, then performs Unicode casefolding "C+F" (the common plus full mappings). Because both Unicode decomposition and casefolding are involved, the result can be considerably longer than a single codepoint. The worst case is the single codepoint U+FDFA, which becomes 18 codepoints once normalised. Two codepoints, U+FDFA and U+FDFB, expand into strings that contain spaces; it is the caller's responsibility to handle these if they need handling.
- Parameters
-
| casefolded | [out] The normalised Unicode codepoint string is appended to this parameter. |
| codepoint | [in] The codepoint to normalise. |
| static const uint32_t* JASS::unicode::tocasefold (uint32_t codepoint) | inline, static |
Return a pointer to a zero-terminated array of codepoints that is the casefolded, normalised form of the codepoint.
This is the JASS character normalisation method. It converts the codepoint to Unicode "NFKD", strips all non-alphanumerics, then performs Unicode casefolding "C+F" (the common plus full mappings). Because both Unicode decomposition and casefolding are involved, the result can be considerably longer than a single codepoint. The worst case is the single codepoint U+FDFA, which becomes 18 codepoints once normalised. Two codepoints, U+FDFA and U+FDFB, expand into strings that contain spaces; it is the caller's responsibility to handle these if they need handling.
- Parameters
-
| codepoint | [in] The codepoint to normalise. |
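A caller of the pointer-returning overload has to walk the array until it reaches the terminating 0. The sketch below shows one way to do that, appending the result to a vector in the same way the vector overload is described as doing. The helper name append_zero_terminated is hypothetical, and whether JASS implements one overload in terms of the other is not stated here.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: append a zero-terminated array of codepoints
// (the shape returned by the pointer overload of tocasefold) to a vector.
// The terminating 0 is not copied.
inline void append_zero_terminated(std::vector<std::uint32_t> &destination, const std::uint32_t *folded)
    {
    for (const std::uint32_t *current = folded; *current != 0; current++)
        destination.push_back(*current);
    }
```

Because the data is appended rather than assigned, a caller normalising a whole string can pass the same vector for every codepoint and clear it only between strings.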