Reference¶

This section is intentionally reference-first.

Each named API family gets its own entry with:

Conventions¶

utf8_string_view and utf8_string measure offsets in UTF-8 code units, which means byte offsets.
utf16_string_view and utf16_string measure offsets in UTF-16 code units.
utf32_string_view and utf32_string measure offsets in UTF-32 code points, which are also the code units of that encoding.
size() always reports code units, not Unicode scalar values and not grapheme clusters.
npos is static_cast<size_type>(-1) on all string and view types.

A UTF-8 character boundary is the start of a UTF-8 scalar encoding.
A UTF-16 character boundary is any index that is not inside a surrogate pair.
A UTF-32 character boundary is any index in [0, size()].
Character-aware search families such as find(Char, pos), find(View, pos), rfind(...), find_first_of(View, pos), and find_last_of(View, pos) first align the starting offset to a valid character boundary.
Boundary-sensitive accessors such as char_at, substr, grapheme_at, grapheme_substr, and split_once_at reject invalid boundaries explicitly with std::nullopt.
Owning-string mutators that splice raw ranges, such as insert, erase, reverse(pos, count), and partial case transforms, reject invalid boundaries with std::out_of_range.

Grapheme-aware APIs use the default Unicode grapheme cluster rules.
Grapheme offsets are still reported in raw UTF-8 bytes, UTF-16 code units, or UTF-32 code points, depending on the type.

View factories report validation failures through std::expected.
View accessors that may fail because of boundary issues return std::optional.
Owning-string mutators that cannot represent failure with a return type typically throw std::out_of_range for invalid indices or invalid substring boundaries.
Unicode case mapping, normalization, and transcoding functions in the public API assume the source object is already validated, so they do not report encoding errors again.

When UTF8_RANGES_ENABLE_ICU=1 is enabled, locale-aware casing overloads accept locale_id.
_locale is the compile-time checked literal form of that token.
is_available_locale(...) is a non-throwing probe for exact ICU locale availability.
Locale-aware casing overloads may still succeed for locales that are not explicitly available, because ICU may canonicalize or fall back to a more general locale.

"Code units" means UTF-8 bytes for UTF-8 types, UTF-16 code units for UTF-16 types, and UTF-32 code points for UTF-32 types.
"Scalars" means Unicode scalar values.
"Matches" means the number of delimiter or replacement matches actually processed.

Family	Types
Characters	`utf8_char`, `utf16_char`, `utf32_char`
Borrowed validated text	`utf8_string_view`, `utf16_string_view`, `utf32_string_view`
Owning validated text	`basic_utf8_string<Allocator>`, `basic_utf16_string<Allocator>`, `basic_utf32_string<Allocator>`, `utf8_string`, `utf16_string`, `utf32_string`, `pmr::utf8_string`, `pmr::utf16_string`, `pmr::utf32_string`
Helper ranges	`views::utf8_view`, `views::utf16_view`, `views::utf32_view`, reversed views, lossy views, grapheme cluster views
Literals	`_u8c`, `_u16c`, `_u32c`, `_utf8_sv`, `_utf16_sv`, `_utf32_sv`, `_utf8_s`, `_utf16_s`, `_utf32_s`, `_grapheme_utf8`, `_grapheme_utf16`, `_grapheme_utf32`
Optional ICU locale support	`locale_id`, `_locale`, `is_available_locale(...)`