String Views¶

utf8_string_view, utf16_string_view, and utf32_string_view are borrowed validated text views.

They expose most of the library's read-only Unicode surface: validated iteration, boundary-aware access, raw code-unit search, character-aware search, grapheme-aware search, split/trim views, and owning transformations.

Unlike the helper adapters in unicode_ranges::views, these types are not themselves direct subclasses of std::ranges::view_interface. They are validated string-view classes with a std::basic_string_view-like non-owning model and a string-oriented API surface. The lazy range adapters returned by members such as chars(), graphemes(), split(...), and matches(...) are the actual view types.

When a signature block uses Char, View, or Predicate, it refers to the encoding-specific type family in that section:

UTF-8: utf8_char, utf8_string_view, details::utf8_char_predicate
UTF-16: utf16_char, utf16_string_view, details::utf16_char_predicate
UTF-32: utf32_char, utf32_string_view, details::utf32_char_predicate

Unless a section explicitly narrows the discussion, the UTF-8, UTF-16, and UTF-32 view APIs are structurally parallel.

To keep the larger synopsis blocks readable, some sections spell out the UTF-8 and UTF-16 overload families explicitly and rely on this rule for UTF-32. Unless a section says otherwise, replace char8_t / utf8_char / utf8_string_view / basic_utf8_string or char16_t / utf16_char / utf16_string_view / basic_utf16_string with the corresponding UTF-32 names and you get the same API family.

Construction And Raw View Access¶

Synopsis¶

class utf8_string_view {
public:
    utf8_string_view() = default;

    static constexpr std::expected<utf8_string_view, utf8_error>
    from_bytes(std::u8string_view bytes) noexcept;

    static constexpr utf8_string_view
    from_bytes_unchecked(std::u8string_view bytes) noexcept;

    constexpr std::u8string_view base() const noexcept;
    constexpr std::u8string_view as_view() const noexcept;
    constexpr operator std::u8string_view() const noexcept;
};

class utf16_string_view {
public:
    utf16_string_view() = default;

    static constexpr std::expected<utf16_string_view, utf16_error>
    from_code_units(std::u16string_view code_units) noexcept;

    static constexpr utf16_string_view
    from_code_units_unchecked(std::u16string_view code_units) noexcept;

    constexpr std::u16string_view base() const noexcept;
    constexpr std::u16string_view as_view() const noexcept;
    constexpr operator std::u16string_view() const noexcept;
};

class utf32_string_view {
public:
    utf32_string_view() = default;

    static constexpr std::expected<utf32_string_view, utf32_error>
    from_code_points(std::u32string_view code_points) noexcept;

    static constexpr utf32_string_view
    from_code_points_unchecked(std::u32string_view code_points) noexcept;

    constexpr std::u32string_view base() const noexcept;
    constexpr std::u32string_view as_view() const noexcept;
    constexpr operator std::u32string_view() const noexcept;
};

Behavior¶

Checked factories validate the source encoding.
Unchecked factories assume the input is already valid.
base(), as_view(), and the implicit conversion expose the corresponding standard-library std::basic_string_view.
utf8_error, utf16_error, and utf32_error are operation-specific aliases of unicode_error; read first_invalid_element_index for the failing byte, code-unit, or code-point index.
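
To make the checked/unchecked distinction concrete, the following is a minimal standard-library-only sketch of the kind of linear scan a checked UTF-8 factory has to perform. It is not the library's implementation: first_invalid_utf8_index and no_error are hypothetical names, the input is a plain std::string_view of bytes rather than std::u8string_view, and reporting the index of the first byte of the ill-formed sequence is an assumption about how first_invalid_element_index might be defined.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical sentinel meaning "no ill-formed sequence was found".
inline constexpr std::size_t no_error = static_cast<std::size_t>(-1);

// Returns the index of the first byte that starts an ill-formed UTF-8
// sequence, or no_error when the whole input is well-formed.
constexpr std::size_t first_invalid_utf8_index(std::string_view bytes) noexcept
{
    std::size_t i = 0;
    while (i < bytes.size()) {
        const auto b0 = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        char32_t min = 0;  // smallest scalar allowed for this length (rejects overlong forms)
        if (b0 < 0x80)                { len = 1; cp = b0;        min = 0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
        else return i;  // stray continuation byte or invalid lead byte

        if (bytes.size() - i < len) return i;  // sequence is truncated
        for (std::size_t k = 1; k < len; ++k) {
            const auto b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80) return i;  // expected a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min) return i;                      // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return i;  // surrogate code point
        if (cp > 0x10FFFF) return i;                 // beyond the Unicode range
        i += len;
    }
    return no_error;
}
```

A checked factory can run a scan like this once and then hand out a view whose members never need to re-validate; an unchecked factory skips the scan and relies on the caller's guarantee instead.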

Return value¶

Checked factories return std::unexpected(...) on invalid UTF data.
Unchecked factories and raw-view accessors return the view directly.

Complexity¶

Checked factories are linear in the number of code units.
Unchecked factories and raw-view accessors are constant.

Exceptions And `noexcept`¶

None.

All listed overloads are noexcept.

Example¶

#include "unicode_ranges_all.hpp"

#include <print>

using namespace unicode_ranges;
using namespace unicode_ranges::literals;

int main()
{
    constexpr auto text = "é🇷🇴!"_utf8_sv;

    std::println("{}", text);                   // é🇷🇴!
    std::println("{}", text.size());            // 12 UTF-8 code units
    std::println("{}", text.char_count());      // 5 Unicode scalars
    std::println("{}", text.grapheme_count());  // 3 graphemes
    std::println("{}", text.find("!"_u8c));     // 11
    std::println("{}", text.find("🇷"_u8c));    // 3

    std::println("{}", text.chars());          // [e, ́, 🇷, 🇴, !]
    std::println("{::s}", text.graphemes());   // [é, 🇷🇴, !]
}

Comparison, Streaming, Hashing, And Formatting¶

Synopsis¶

friend constexpr bool operator==(const utf8_string_view&, const utf8_string_view&) noexcept;
friend constexpr auto operator<=>(const utf8_string_view&, const utf8_string_view&) noexcept;
std::ostream& operator<<(std::ostream&, utf8_string_view);
template<> struct std::hash<utf8_string_view>;
template<> struct std::formatter<utf8_string_view, char>;

friend constexpr bool operator==(const utf16_string_view&, const utf16_string_view&) noexcept;
friend constexpr auto operator<=>(const utf16_string_view&, const utf16_string_view&) noexcept;
std::ostream& operator<<(std::ostream&, utf16_string_view);
template<> struct std::hash<utf16_string_view>;
template<> struct std::formatter<utf16_string_view, char>;

friend constexpr bool operator==(const utf32_string_view&, const utf32_string_view&) noexcept;
friend constexpr auto operator<=>(const utf32_string_view&, const utf32_string_view&) noexcept;
std::ostream& operator<<(std::ostream&, utf32_string_view);
template<> struct std::hash<utf32_string_view>;
template<> struct std::formatter<utf32_string_view, char>;

Behavior¶

Equality and ordering compare encoded contents lexicographically.
UTF-8 streams directly to std::ostream.
UTF-16 and UTF-32 convert each scalar to UTF-8 when written to std::ostream.
The std::formatter specializations format textual output.
The std::hash specializations hash the underlying standard-library string view.
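
One consequence of comparing encoded contents is worth illustrating. The sketch below uses only the standard string views, and it assumes that "lexicographically" means code unit by code unit, as it does for std::basic_string_view. Under that assumption, UTF-8 and UTF-32 order agrees with code-point order, while UTF-16 can order a supplementary-plane scalar before a high BMP scalar because surrogates (0xD800 to 0xDFFF) compare below 0xE000 to 0xFFFF.

```cpp
#include <string_view>

// U+FF5E FULLWIDTH TILDE (BMP) and U+1F600 GRINNING FACE (supplementary plane),
// written with escapes so the code units are explicit.
constexpr std::string_view    tilde_utf8  = "\xEF\xBD\x9E";
constexpr std::string_view    face_utf8   = "\xF0\x9F\x98\x80";
constexpr std::u16string_view tilde_utf16 = u"\uFF5E";      // one unit: 0xFF5E
constexpr std::u16string_view face_utf16  = u"\U0001F600";  // two units: 0xD83D 0xDE00
constexpr std::u32string_view tilde_utf32 = U"\uFF5E";
constexpr std::u32string_view face_utf32  = U"\U0001F600";

// UTF-8 and UTF-32: code-unit order matches code-point order.
static_assert(tilde_utf8 < face_utf8);
static_assert(tilde_utf32 < face_utf32);

// UTF-16: the leading surrogate 0xD83D is less than 0xFF5E, so the order flips.
static_assert(face_utf16 < tilde_utf16);
```

If the same text must sort identically across encodings, it is safer to compare in a single encoding or by decoded scalars rather than to rely on operator<=> across different view types.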

Return value¶

Equality operators return bool.
Ordering operators return the corresponding ordering category.
Stream insertion returns the output stream.
Hashing and formatting integrate with the standard-library customization points for the validated view types.

Complexity¶

Comparison is linear in the compared prefix.
Streaming, hashing, and formatting are linear in the amount of text processed.

Exceptions And `noexcept`¶

Comparison and hashing do not throw.
Streaming may report stream errors through the stream state.
UTF-16 and UTF-32 formatting may allocate internally while transcoding.
Comparison and hashing are non-throwing.
Streaming and formatting are not noexcept.

Iteration Families¶

Synopsis¶

constexpr auto chars() const& noexcept;
constexpr auto chars() && noexcept(/* conditional */); // owning strings only
constexpr auto reversed_chars() const& noexcept;
constexpr auto reversed_chars() && noexcept(/* conditional */); // owning strings only
constexpr auto graphemes() const& noexcept;
constexpr auto graphemes() && noexcept(/* conditional */); // owning strings only
constexpr auto char_indices() const& noexcept;
constexpr auto char_indices() && noexcept(/* conditional */); // owning strings only
constexpr auto grapheme_indices() const& noexcept;
constexpr auto grapheme_indices() && noexcept(/* conditional */); // owning strings only

Behavior¶

chars() iterates validated Unicode scalar values.
reversed_chars() iterates the same scalar values from the end.
graphemes() iterates default Unicode grapheme clusters.
char_indices() yields std::pair objects of the form (offset, Char).
grapheme_indices() yields std::pair objects of the form (offset, View).
View and owning-lvalue receivers return ranges that borrow from the receiver.
Owning-rvalue receivers return move-only owning views that keep the moved string alive and do not model std::ranges::borrowed_range.
These owning views are intentionally move-only. In C++26-or-newer compiler modes that define __cpp_deleted_function, deleted constructors and assignments include a diagnostic reason.
All five returned range types are lazy views derived from std::ranges::view_interface.
These five core iteration families expose forward iterators, so they are multi-pass and may be traversed more than once as long as the underlying source view remains alive.
For UTF-32, chars(), reversed_chars(), and char_indices() are stronger: they are sized common random-access views because UTF-32 stores one scalar per code unit.
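
The difference between the UTF-8 and UTF-32 iteration categories follows from how scalars are located. As a rough illustration, the sketch below decodes already-validated UTF-8 with the standard library only. decoded_scalar, decode_utf8_at, and nth_char_offset are hypothetical helpers, not library names, and they use std::string_view bytes instead of the library's types.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper type: one decoded scalar plus its encoded length.
struct decoded_scalar {
    char32_t    value;   // Unicode scalar value
    std::size_t length;  // number of UTF-8 code units it occupies
};

// Decodes the scalar that starts at `offset`.
// Precondition: `text` is valid UTF-8 and `offset` is a character boundary
// strictly less than text.size().
constexpr decoded_scalar decode_utf8_at(std::string_view text, std::size_t offset) noexcept
{
    const auto b0 = static_cast<unsigned char>(text[offset]);
    std::size_t length = 1;
    char32_t value = b0;
    if (b0 >= 0xF0)      { length = 4; value = b0 & 0x07; }
    else if (b0 >= 0xE0) { length = 3; value = b0 & 0x0F; }
    else if (b0 >= 0xC0) { length = 2; value = b0 & 0x1F; }
    for (std::size_t k = 1; k < length; ++k)
        value = (value << 6) | (static_cast<unsigned char>(text[offset + k]) & 0x3F);
    return {value, length};
}

// Code-unit offset of the n-th scalar (0-based), or text.size() if the text
// has fewer scalars. The only way to get there is to walk from the front.
constexpr std::size_t nth_char_offset(std::string_view text, std::size_t n) noexcept
{
    std::size_t offset = 0;
    while (offset < text.size() && n > 0) {
        offset += decode_utf8_at(text, offset).length;
        --n;
    }
    return offset;
}

// "é🇷🇴!" with a decomposed é: e, U+0301, U+1F1F7, U+1F1F4, '!'.
constexpr std::string_view sample = "e\xCC\x81\xF0\x9F\x87\xB7\xF0\x9F\x87\xB4!";
```

Pairing each offset visited by such a walk with the decoded value gives the (offset, Char) shape described for char_indices(). In UTF-32 the n-th scalar is simply element n, which is why random access is possible there.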

Return value¶

Returns a lightweight range or view object.

Complexity¶

Constructing the range is constant.
Iterating the whole range is linear in the number of scalars or grapheme clusters.

Exceptions And `noexcept`¶

None.

View and owning-lvalue receiver overloads are noexcept.
Owning-rvalue receiver overloads are conditionally noexcept; with the default owning string types, they are noexcept.

Example¶

#include "unicode_ranges_all.hpp"

#include <print>

using namespace unicode_ranges;
using namespace unicode_ranges::literals;

int main()
{
    const utf8_string text = "mañana 👩‍💻"_utf8_s;

    std::println("{}", text);                 // mañana 👩‍💻
    std::println("{}", text.chars());         // [m, a, ñ, a, n, a,  , 👩, ‍, 💻]
    std::println("{::s}", text.graphemes());  // [m, a, ñ, a, n, a,  , 👩‍💻]
}

Size, Copy, Emptiness, ASCII, And Counts¶

Synopsis¶

constexpr size_type size() const noexcept;
// utf8_string_view
constexpr size_type copy(char8_t* dest, size_type count, size_type pos = 0) const;
// utf16_string_view
constexpr size_type copy(char16_t* dest, size_type count, size_type pos = 0) const;
// utf32_string_view
constexpr size_type copy(char32_t* dest, size_type count, size_type pos = 0) const;
constexpr bool empty() const noexcept;
constexpr bool is_ascii() const noexcept;
constexpr size_type char_count() const noexcept;
constexpr size_type grapheme_count() const noexcept;

Behavior¶

size() counts code units.
copy(dest, count, pos) copies raw code units into caller-owned storage and does not append a null terminator.
If count == npos or the requested range extends past the end, copy() copies through the end of the view, that is, up to offset size().
UTF-8 and UTF-16 copies require the copied substring to begin and end on validated character boundaries.
empty() tests for zero code units.
is_ascii() returns true only if all code units are ASCII.
char_count() counts Unicode scalar values.
grapheme_count() counts default Unicode grapheme clusters.
For UTF-32, size() and char_count() are the same value.
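
The gap between size() and char_count() for UTF-8 can be reproduced with a few lines of standard C++. This is only a sketch under the assumption that the input is already valid UTF-8; count_scalars_utf8 and all_ascii are hypothetical stand-ins, not the library's members.

```cpp
#include <cstddef>
#include <string_view>

// Counts Unicode scalars in valid UTF-8 by counting every byte that is not a
// continuation byte (continuation bytes have the bit pattern 10xxxxxx).
constexpr std::size_t count_scalars_utf8(std::string_view text) noexcept
{
    std::size_t count = 0;
    for (const char c : text)
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) ++count;
    return count;
}

// True only if every code unit is in the ASCII range 0x00..0x7F.
constexpr bool all_ascii(std::string_view text) noexcept
{
    for (const char c : text)
        if (static_cast<unsigned char>(c) >= 0x80) return false;
    return true;
}

// "é🇷🇴!" with a decomposed é: e, U+0301, U+1F1F7, U+1F1F4, '!'.
constexpr std::string_view sample = "e\xCC\x81\xF0\x9F\x87\xB7\xF0\x9F\x87\xB4!";

static_assert(sample.size() == 12);              // code units
static_assert(count_scalars_utf8(sample) == 5);  // scalars
```

Both helpers visit every code unit once, which matches the linear complexity listed for is_ascii() and for char_count() on UTF-8. Grapheme counting cannot be sketched this simply because it needs the Unicode segmentation rules.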

Return value¶

copy() returns the number of raw code units copied.
The other members return the requested observer or count.

Complexity¶

size() and empty() are constant.
copy() is linear in the number of copied code units.
is_ascii() and grapheme_count() are linear in the view length.
char_count() is constant for UTF-32 and linear for UTF-8/UTF-16.

Exceptions And `noexcept`¶

copy() throws std::out_of_range when pos > size() or when the requested UTF-8/UTF-16 substring would split a character.
Other listed members do not throw.
All listed members except copy() are noexcept.
copy() is not noexcept.

Normalization Queries¶

Synopsis¶

constexpr bool is_normalized(normalization_form form) const;
constexpr bool is_nfc() const;
constexpr bool is_nfd() const;
constexpr bool is_nfkc() const;
constexpr bool is_nfkd() const;

Behavior¶

These members normalize the view contents into the requested form and compare the result against the original view.

Return value¶

Returns true when the view is already in the requested normalization form.

Complexity¶

Linear to super-linear in the input length, depending on the amount of Unicode decomposition and composition required.

Exceptions And `noexcept`¶

May throw allocator or container exceptions while materializing the normalized copy.

Not noexcept.

`contains`¶

Synopsis¶

constexpr bool contains(utf8_char ch) const noexcept;
constexpr bool contains(utf8_string_view sv) const noexcept;
constexpr bool contains(std::span<const utf8_char> chars) const noexcept;
template <details::utf8_char_predicate Pred>
constexpr bool contains(Pred pred) const noexcept;

constexpr bool contains(utf16_char ch) const noexcept;
constexpr bool contains(utf16_string_view sv) const noexcept;
constexpr bool contains(std::span<const utf16_char> chars) const noexcept;
template <details::utf16_char_predicate Pred>
constexpr bool contains(Pred pred) const noexcept;

Behavior¶

Character, view, span, and predicate overloads are character-aware.
In this family, the character-set overload is spelled as std::span<const Char>.
The std::span overload treats the span as a character set rather than as one contiguous substring.

Overload differences¶

The examples below use constexpr auto text = "😄🇷🇴✨"_utf8_sv;.

Overload	Meaning	Example
`contains(Char ch)`	exact validated character search	`text.contains("✨"_u8c)`
`contains(View sv)`	exact validated substring search	`text.contains("🇷🇴"_utf8_sv)`
`contains(std::span<const Char> chars)`	character-set membership: succeeds if any one character in the text equals any one element of the span	`text.contains(std::array{"🔥"_u8c, "✨"_u8c})`
`contains(Pred pred)`	predicate match on validated characters	`text.contains([](utf8_char ch) { return !ch.is_ascii(); })`

The std::span overload is special because it is not substring matching. std::array{"🇷"_u8c, "🇴"_u8c} does not mean "find the grapheme 🇷🇴" and it does not require adjacent characters. It means "match a single character that is either 🇷 or 🇴".
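
The difference between substring matching and character-set matching can be shown with standard UTF-32 string views, where one code unit is one scalar. The helpers contains_substring and contains_any_of are hypothetical names for this sketch; they only mirror the semantics described above and say nothing about how the library implements them.

```cpp
#include <string_view>

// Substring semantics: `needle` must appear as one contiguous run.
constexpr bool contains_substring(std::u32string_view text, std::u32string_view needle) noexcept
{
    return text.find(needle) != std::u32string_view::npos;
}

// Character-set semantics: succeeds if any single code point of `text`
// equals any single code point of `set`; order and adjacency are irrelevant.
constexpr bool contains_any_of(std::u32string_view text, std::u32string_view set) noexcept
{
    return text.find_first_of(set) != std::u32string_view::npos;
}

// 😄🇷🇴✨ as U+1F604, U+1F1F7, U+1F1F4, U+2728.
constexpr std::u32string_view text = U"\U0001F604\U0001F1F7\U0001F1F4\u2728";

// {🇴, 🇷} in reversed order still matches as a set, but not as a substring.
static_assert(contains_any_of(text, U"\U0001F1F4\U0001F1F7"));
static_assert(!contains_substring(text, U"\U0001F1F4\U0001F1F7"));
```

The reversed pair makes the point: a set only asks whether one member occurs somewhere, while a substring asks for the exact sequence.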

Inspiration¶

The character and view overloads are close in spirit to C++ std::basic_string_view::contains and Rust's str search surface. The span and predicate overloads extend that familiar shape with character-set and predicate-based matching.

Return value¶

Equivalent to find(...) != npos.

Complexity¶

Linear in the view length.

Exceptions And `noexcept`¶

None.

All overloads are noexcept.

Grapheme-Aware Search¶

Synopsis¶

constexpr bool contains_grapheme(utf8_char ch) const noexcept;
constexpr bool contains_grapheme(utf8_string_view sv) const noexcept;
constexpr size_type find_grapheme(utf8_char ch, size_type pos = 0) const noexcept;
constexpr size_type find_grapheme(utf8_string_view sv, size_type pos = 0) const noexcept;
constexpr size_type rfind_grapheme(utf8_char ch, size_type pos = npos) const noexcept;
constexpr size_type rfind_grapheme(utf8_string_view sv, size_type pos = npos) const noexcept;

constexpr bool contains_grapheme(utf16_char ch) const noexcept;
constexpr bool contains_grapheme(utf16_string_view sv) const noexcept;
constexpr size_type find_grapheme(utf16_char ch, size_type pos = 0) const noexcept;
constexpr size_type find_grapheme(utf16_string_view sv, size_type pos = 0) const noexcept;
constexpr size_type rfind_grapheme(utf16_char ch, size_type pos = npos) const noexcept;
constexpr size_type rfind_grapheme(utf16_string_view sv, size_type pos = npos) const noexcept;

Behavior¶

These overloads only report matches that begin on grapheme boundaries.

Return value¶

contains_grapheme() returns bool.
find_grapheme() and rfind_grapheme() return the matching offset in code units (UTF-8 bytes or UTF-16 code units), or npos.

Complexity¶

Linear in the view length.

Exceptions And `noexcept`¶

None.

All overloads are noexcept.

`find` And `rfind`¶

Synopsis¶

constexpr size_type find(char8_t ch, size_type pos = 0) const noexcept;
constexpr size_type find(utf8_char ch, size_type pos = 0) const noexcept;
constexpr size_type find(utf8_string_view sv, size_type pos = 0) const noexcept;
constexpr size_type find(std::span<const utf8_char> chars, size_type pos = 0) const noexcept;
template <details::utf8_char_predicate Pred>
constexpr size_type find(Pred pred, size_type pos = 0) const noexcept;

constexpr size_type rfind(char8_t ch, size_type pos = npos) const noexcept;
constexpr size_type rfind(utf8_char ch, size_type pos = npos) const noexcept;
constexpr size_type rfind(utf8_string_view sv, size_type pos = npos) const noexcept;
constexpr size_type rfind(std::span<const utf8_char> chars, size_type pos = npos) const noexcept;
template <details::utf8_char_predicate Pred>
constexpr size_type rfind(Pred pred, size_type pos = npos) const noexcept;

constexpr size_type find(char16_t ch, size_type pos = 0) const noexcept;
constexpr size_type find(utf16_char ch, size_type pos = 0) const noexcept;
constexpr size_type find(utf16_string_view sv, size_type pos = 0) const noexcept;
constexpr size_type find(std::span<const utf16_char> chars, size_type pos = 0) const noexcept;
template <details::utf16_char_predicate Pred>
constexpr size_type find(Pred pred, size_type pos = 0) const noexcept;

constexpr size_type rfind(char16_t ch, size_type pos = npos) const noexcept;
constexpr size_type rfind(utf16_char ch, size_type pos = npos) const noexcept;
constexpr size_type rfind(utf16_string_view sv, size_type pos = npos) const noexcept;
constexpr size_type rfind(std::span<const utf16_char> chars, size_type pos = npos) const noexcept;
template <details::utf16_char_predicate Pred>
constexpr size_type rfind(Pred pred, size_type pos = npos) const noexcept;

Behavior¶

The raw char8_t and char16_t overloads search code units directly.
The Char, View, span, and predicate overloads are character-aware.
Character-aware forward searches align pos upward to the next valid character boundary.
Character-aware reverse searches align pos downward to the previous valid character boundary.
The span overload treats the span as a character set.

Overload differences¶

The examples below use constexpr auto text = "😄-🇷🇴-✨"_utf8_sv;.

Overload	Meaning	Example
`find(char8_t ch, pos)`	raw code-unit search; usually most useful for ASCII punctuation or diagnostics	`text.find(u8'-') == 4`
`find(Char ch, pos)`	exact validated character search	`text.find("✨"_u8c) == 14`
`find(View sv, pos)`	exact validated substring search	`text.find("🇷🇴"_utf8_sv) == 5`
`find(std::span<const Char> chars, pos)`	first character belonging to a character set	`text.find(std::array{"🔥"_u8c, "✨"_u8c}) == 14`
`find(Pred pred, pos)`	first validated character satisfying a predicate	`text.find([](utf8_char ch) { return ch.is_ascii_punctuation(); }) == 4`

The same distinctions apply to rfind(...), but searching from the end.

Inspiration¶

The raw-code-unit and exact-substring forms are intentionally familiar to users of C++ std::basic_string_view::find and rfind. The predicate-oriented forms are closer to Rust's str pattern-heavy search APIs.

Return value¶

Returns the matching code-unit offset, or npos.

Complexity¶

Linear in the number of remaining code units or scalars.

Exceptions And `noexcept`¶

None.

All overloads are noexcept.

Example¶

#include "unicode_ranges_all.hpp"

#include <print>

using namespace unicode_ranges;
using namespace unicode_ranges::literals;

int main()
{
    constexpr auto view = "café café"_utf8_sv;
    auto owned = "été en été"_utf8_s;

    std::println("{}", view);                  // café café
    std::println("{}", view.find("é"_u8c));    // 3
    std::println("{}", view.rfind("é"_u8c));   // 9

    std::println("{}",
        owned.replace_all("é"_u8c, "e"_u8c)); // ete en ete
}

`find_first_of`, `find_first_not_of`, `find_last_of`, `find_last_not_of`¶

Synopsis¶

constexpr size_type find_first_of(char8_t ch, size_type pos = 0) const noexcept;
constexpr size_type find_first_of(utf8_char ch, size_type pos = 0) const noexcept;
constexpr size_type find_first_of(utf8_string_view sv, size_type pos = 0) const noexcept;

constexpr size_type find_first_not_of(char8_t ch, size_type pos = 0) const noexcept;
constexpr size_type find_first_not_of(utf8_char ch, size_type pos = 0) const noexcept;
constexpr size_type find_first_not_of(utf8_string_view sv, size_type pos = 0) const noexcept;

constexpr size_type find_last_of(char8_t ch, size_type pos = npos) const noexcept;
constexpr size_type find_last_of(utf8_char ch, size_type pos = npos) const noexcept;
constexpr size_type find_last_of(utf8_string_view sv, size_type pos = npos) const noexcept;

constexpr size_type find_last_not_of(char8_t ch, size_type pos = npos) const noexcept;
constexpr size_type find_last_not_of(utf8_char ch, size_type pos = npos) const noexcept;
constexpr size_type find_last_not_of(utf8_string_view sv, size_type pos = npos) const noexcept;

constexpr size_type find_first_of(char16_t ch, size_type pos = 0) const noexcept;
constexpr size_type find_first_of(utf16_char ch, size_type pos = 0) const noexcept;
constexpr size_type find_first_of(utf16_string_view sv, size_type pos = 0) const noexcept;

constexpr size_type find_first_not_of(char16_t ch, size_type pos = 0) const noexcept;
constexpr size_type find_first_not_of(utf16_char ch, size_type pos = 0) const noexcept;
constexpr size_type find_first_not_of(utf16_string_view sv, size_type pos = 0) const noexcept;

constexpr size_type find_last_of(char16_t ch, size_type pos = npos) const noexcept;
constexpr size_type find_last_of(utf16_char ch, size_type pos = npos) const noexcept;
constexpr size_type find_last_of(utf16_string_view sv, size_type pos = npos) const noexcept;

constexpr size_type find_last_not_of(char16_t ch, size_type pos = npos) const noexcept;
constexpr size_type find_last_not_of(utf16_char ch, size_type pos = npos) const noexcept;
constexpr size_type find_last_not_of(utf16_string_view sv, size_type pos = npos) const noexcept;

Behavior¶

Raw code-unit overloads examine code units directly.
Char and View overloads are character-aware.
View overloads treat the view as a set of characters that may match or fail the character being examined.

Overload differences¶

The examples below use constexpr auto text = "😄🇷🇴✨"_utf8_sv;.

Overload	Meaning	Example
`find_first_of(Char ch)`	first exact character match	`text.find_first_of("✨"_u8c)`
`find_first_of(View sv)`	first character that is contained in `sv`	`text.find_first_of("🇷✨"_utf8_sv)`
`find_first_not_of(View sv)`	first character that is not contained in `sv`	`text.find_first_not_of("😄"_utf8_sv)`
`find_last_of(View sv)`	last character contained in `sv`	`text.find_last_of("🇷✨"_utf8_sv)`
`find_last_not_of(View sv)`	last character not contained in `sv`	`text.find_last_not_of("✨"_utf8_sv)`

This family is intentionally character-set oriented. A View argument here is not a substring delimiter; it is a bag of candidate characters, much like the classic find_first_of family on the C++ standard string types.

Inspiration¶

This family is directly modeled after C++ std::basic_string_view::find_first_of and related members.

Return value¶

Returns the matching code-unit offset, or npos.

Complexity¶

Linear in the view length.

Exceptions And `noexcept`¶

None.

All overloads are noexcept.

Boundary Queries¶

Synopsis¶

constexpr bool is_char_boundary(size_type index) const noexcept;
constexpr bool is_grapheme_boundary(size_type index) const noexcept;
constexpr size_type ceil_char_boundary(size_type pos) const noexcept;
constexpr size_type floor_char_boundary(size_type pos) const noexcept;
constexpr size_type ceil_grapheme_boundary(size_type pos) const noexcept;
constexpr size_type floor_grapheme_boundary(size_type pos) const noexcept;

Behavior¶

Character-boundary members use encoding-level scalar boundaries.
Grapheme-boundary members use Unicode grapheme segmentation.
ceil_* returns the first boundary at or after pos.
floor_* returns the last boundary at or before pos.
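
For UTF-8, the character-boundary half of this family reduces to inspecting a single byte. The following sketch uses the standard library only and assumes valid UTF-8 input. The function names carry a _utf8 suffix to mark them as hypothetical free-function stand-ins, and clamping positions past the end to size() is an assumption rather than documented behavior. Grapheme boundaries are not covered because they require the Unicode segmentation rules.

```cpp
#include <cstddef>
#include <string_view>

// A position is a character boundary if it is the end of the text or if the
// byte there is not a continuation byte (10xxxxxx). One byte is inspected.
constexpr bool is_char_boundary_utf8(std::string_view text, std::size_t index) noexcept
{
    if (index >= text.size()) return index == text.size();
    return (static_cast<unsigned char>(text[index]) & 0xC0) != 0x80;
}

// First boundary at or after `pos` (assumption: positions past the end clamp to size()).
constexpr std::size_t ceil_char_boundary_utf8(std::string_view text, std::size_t pos) noexcept
{
    if (pos >= text.size()) return text.size();
    while (!is_char_boundary_utf8(text, pos)) ++pos;
    return pos;
}

// Last boundary at or before `pos` (same clamping assumption).
constexpr std::size_t floor_char_boundary_utf8(std::string_view text, std::size_t pos) noexcept
{
    if (pos >= text.size()) return text.size();
    while (pos > 0 && !is_char_boundary_utf8(text, pos)) --pos;
    return pos;
}

// "é🇷🇴!" with a decomposed é: boundaries at 0, 1, 3, 7, 11, and 12.
constexpr std::string_view sample = "e\xCC\x81\xF0\x9F\x87\xB7\xF0\x9F\x87\xB4!";
```

Because a UTF-8 scalar is at most four bytes long, each loop runs at most three steps on valid input, which is consistent with cost that depends only on the distance to the nearest boundary.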

Return value¶

Returns a boolean for predicate queries and a boundary offset for ceil_* / floor_*.

Complexity¶

is_char_boundary() is constant.
The other members are linear in the distance to the nearest boundary.

Exceptions And `noexcept`¶

None.

All listed members are noexcept.

Example¶

#include "unicode_ranges_all.hpp"

#include <print>

using namespace unicode_ranges;
using namespace unicode_ranges::literals;

int main()
{
    constexpr auto text = "é🇷🇴!"_utf8_sv;

    std::println("{}", text.is_char_boundary(1));         // true
    std::println("{}", text.is_grapheme_boundary(1));     // false
    std::println("{}", text.ceil_grapheme_boundary(7));   // 11
    std::println("{}", text.floor_grapheme_boundary(7));  // 3

    std::println("{}", text.chars());          // [e, ́, 🇷, 🇴, !]
    std::println("{::s}", text.graphemes());   // [é, 🇷🇴, !]
}

Direct Access And Substrings¶

Synopsis¶

constexpr std::optional<utf8_char> char_at(size_type index) const noexcept;
constexpr utf8_char char_at_unchecked(size_type index) const noexcept;
constexpr std::optional<utf8_string_view> grapheme_at(size_type index) const& noexcept;
constexpr std::optional<utf8_string_view> grapheme_at(size_type index) && = delete;      // owning strings only
constexpr std::optional<utf8_string_view> grapheme_at(size_type index) const&& = delete; // owning strings only
constexpr std::optional<utf8_char> front() const noexcept;
constexpr utf8_char front_unchecked() const noexcept;
constexpr std::optional<utf8_char> back() const noexcept;
constexpr utf8_char back_unchecked() const noexcept;

// View receivers
constexpr std::optional<utf8_string_view> substr(size_type pos, size_type count = npos) const& noexcept;
constexpr utf8_string_view substr_unchecked(size_type pos, size_type count = npos) const& noexcept;
constexpr std::optional<utf8_string_view> grapheme_substr(size_type pos, size_type count = npos) const& noexcept;

// Owning string receivers
constexpr std::optional<utf8_string> substr(size_type pos, size_type count = npos) const&;
constexpr std::optional<utf8_string> substr(size_type pos, size_type count = npos) && noexcept(/* conditional */);
constexpr utf8_string substr_unchecked(size_type pos, size_type count = npos) const&;
constexpr utf8_string substr_unchecked(size_type pos, size_type count = npos) && noexcept(/* conditional */);
constexpr std::optional<utf8_string> grapheme_substr(size_type pos, size_type count = npos) const&;
constexpr std::optional<utf8_string> grapheme_substr(size_type pos, size_type count = npos) && noexcept(/* conditional */);

constexpr std::optional<utf16_char> char_at(size_type index) const noexcept;
constexpr utf16_char char_at_unchecked(size_type index) const noexcept;
constexpr std::optional<utf16_char> front() const noexcept;
constexpr utf16_char front_unchecked() const noexcept;
constexpr std::optional<utf16_char> back() const noexcept;
constexpr utf16_char back_unchecked() const noexcept;

// UTF-16 and UTF-32 expose the same ref-qualified access and substring families.

Behavior¶

Checked accessors return std::nullopt for empty views, invalid indices, or invalid boundaries.
char_at_unchecked(), front_unchecked(), and back_unchecked() assume their preconditions hold.
substr() requires both ends of the slice to be character boundaries. If count != npos, it must fit within the remaining extent or the call returns std::nullopt.
substr_unchecked() interprets pos and count like substr(). If count == npos, it selects the remaining tail; otherwise it assumes the requested bounds already form a valid UTF substring.
grapheme_at() and grapheme_substr() require grapheme boundaries. If count != npos, grapheme_substr() also requires the requested extent to stay within the remaining text.
grapheme_at() returns a borrowed view into the receiver. On owning strings, the && and const&& overloads are deleted so a temporary owning string cannot produce a dangling subview.
substr(), substr_unchecked(), and grapheme_substr() are ownership-preserving: view receivers return views, owning receivers return owning strings.
Owning rvalue substr(), substr_unchecked(), and grapheme_substr() are for disposable strings. They adjust the existing owned buffer where possible and do not create an additional owning copy.
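
The checked substr() rules for a view receiver can be approximated with std::optional and std::string_view. This is a sketch under assumptions, not the library's code: checked_substr and on_char_boundary are hypothetical names, the input is assumed to be valid UTF-8, and accepting pos == size() as an empty tail is a guess about the edge case.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// True if `index` is the end of the text or does not point at a UTF-8 continuation byte.
constexpr bool on_char_boundary(std::string_view text, std::size_t index) noexcept
{
    if (index >= text.size()) return index == text.size();
    return (static_cast<unsigned char>(text[index]) & 0xC0) != 0x80;
}

// Returns the slice [pos, pos + count) only if it lies within the text and
// both ends fall on character boundaries; otherwise returns std::nullopt.
constexpr std::optional<std::string_view>
checked_substr(std::string_view text, std::size_t pos,
               std::size_t count = std::string_view::npos) noexcept
{
    if (pos > text.size()) return std::nullopt;
    const std::size_t remaining = text.size() - pos;
    if (count == std::string_view::npos) count = remaining;  // select the tail
    else if (count > remaining) return std::nullopt;         // explicit count must fit
    if (!on_char_boundary(text, pos) || !on_char_boundary(text, pos + count))
        return std::nullopt;                                 // would split a character
    return std::string_view(text.data() + pos, count);
}

// "é🇷🇴!" with a decomposed é: boundaries at 0, 1, 3, 7, 11, and 12.
constexpr std::string_view sample = "e\xCC\x81\xF0\x9F\x87\xB7\xF0\x9F\x87\xB4!";
```

Note the contrast with std::basic_string_view::substr, which silently clamps an oversized count; here an explicit count that does not fit is treated as a failed request, following the rule stated above.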

Return value¶

Checked accessors return the requested character or slice when the request is valid, otherwise std::nullopt.
substr_unchecked() returns the requested borrowed or owning slice directly.

Complexity¶

Checked element access is constant to linear in the size of the selected character.
Grapheme checks are linear in nearby segmentation work.
Owning lvalue substr(), substr_unchecked(), and grapheme_substr() copy the selected slice.
Owning rvalue substr() and grapheme_substr() are linear in the amount of boundary work plus the cost of adjusting the existing storage.
substr_unchecked() skips the checked boundary-validation work.

Exceptions And `noexcept`¶

View accessors do not throw.
Owning lvalue substr(), substr_unchecked(), and grapheme_substr() may throw allocator or container exceptions.
Owning rvalue substr(), substr_unchecked(), and grapheme_substr() do not allocate in the bound-adjustment path; their noexcept status follows the owning string move constructor.
View accessors are noexcept.
Deleted owning-rvalue grapheme_at() signatures cannot be called.
Owning lvalue substr(), substr_unchecked(), and grapheme_substr() are not noexcept.
Owning rvalue substr(), substr_unchecked(), and grapheme_substr() are conditionally noexcept; with the default owning string types, they are noexcept.

Prefix And Suffix Tests¶

Synopsis¶

constexpr bool starts_with(char ch) const noexcept;   // utf8 only
constexpr bool starts_with(char8_t ch) const noexcept;
constexpr bool starts_with(utf8_char ch) const noexcept;
constexpr bool starts_with(utf8_string_view sv) const noexcept;
constexpr bool starts_with(std::span<const utf8_char> chars) const noexcept;
template <details::utf8_char_predicate Pred>
constexpr bool starts_with(Pred pred) const noexcept(/* conditional */);

constexpr bool ends_with(char ch) const noexcept;     // utf8 only
constexpr bool ends_with(char8_t ch) const noexcept;
constexpr bool ends_with(utf8_char ch) const noexcept;
constexpr bool ends_with(utf8_string_view sv) const noexcept;
constexpr bool ends_with(std::span<const utf8_char> chars) const noexcept;

constexpr bool starts_with(char16_t ch) const noexcept;
constexpr bool starts_with(utf16_char ch) const noexcept;
constexpr bool starts_with(utf16_string_view sv) const noexcept;
constexpr bool starts_with(std::span<const utf16_char> chars) const noexcept;
template <details::utf16_char_predicate Pred>
constexpr bool starts_with(Pred pred) const noexcept(/* conditional */);

constexpr bool ends_with(char16_t ch) const noexcept;
constexpr bool ends_with(utf16_char ch) const noexcept;
constexpr bool ends_with(utf16_string_view sv) const noexcept;
constexpr bool ends_with(std::span<const utf16_char> chars) const noexcept;

Behavior¶

Character overloads compare the first or last validated character.
View overloads compare encoded prefixes or suffixes.
Span overloads treat the span as a set of characters.
Predicate overloads test the first character only and are conditionally noexcept.

Overload differences¶

The examples below use constexpr auto text = "😄🇷🇴✨"_utf8_sv;.

Overload	Meaning	Example
`starts_with(Char ch)`	compare the first validated character	`text.starts_with("😄"_u8c)`
`starts_with(View sv)`	compare an exact encoded prefix	`text.starts_with("😄🇷🇴"_utf8_sv)`
`starts_with(std::span<const Char> chars)`	test whether the first character belongs to a character set	`text.starts_with(std::array{"😄"_u8c, "✨"_u8c})`
`starts_with(Pred pred)`	test the first character with a predicate	`text.starts_with([](utf8_char ch) { return !ch.is_ascii(); })`

The same distinctions apply to ends_with(...), but against the last character or suffix.
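
The set and predicate forms only ever look at one character, which is why they can answer in constant time and why an empty view has nothing to match. The sketch below models that with standard UTF-32 views. starts_with_any_of, ends_with_any_of, starts_with_if, and is_non_ascii are hypothetical names, and the conditional noexcept spelling is just one plausible way to express "noexcept if the predicate is".

```cpp
#include <string_view>

// Set form: is the first code point a member of `set`? Empty text never matches.
constexpr bool starts_with_any_of(std::u32string_view text, std::u32string_view set) noexcept
{
    return !text.empty() && set.find(text.front()) != std::u32string_view::npos;
}

// Same test against the last code point.
constexpr bool ends_with_any_of(std::u32string_view text, std::u32string_view set) noexcept
{
    return !text.empty() && set.find(text.back()) != std::u32string_view::npos;
}

// Predicate form: noexcept only when calling the predicate is noexcept.
template <class Pred>
constexpr bool starts_with_if(std::u32string_view text, Pred pred)
    noexcept(noexcept(pred(char32_t{})))
{
    return !text.empty() && pred(text.front());
}

// Example predicate used with starts_with_if.
constexpr bool is_non_ascii(char32_t c) noexcept { return c > 0x7F; }

// 😄🇷🇴✨ as U+1F604, U+1F1F7, U+1F1F4, U+2728.
constexpr std::u32string_view text = U"\U0001F604\U0001F1F7\U0001F1F4\u2728";
```

With these definitions, the set {😄, ✨} matches both the first and the last character of the sample text, while the same calls on an empty view return false.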

Inspiration¶

The single-character and exact-prefix overloads are close to C++ std::basic_string_view::starts_with and Rust's str prefix/suffix APIs.

Return value¶

Returns true when the tested character, prefix, or suffix matches.
Returns false when the view is empty and no character is available to test.

Complexity¶

Constant for single-character and predicate overloads, linear in the compared prefix or suffix for view overloads.

Exceptions And `noexcept`¶

Only predicate overloads can throw, and only if the predicate throws.

Non-predicate overloads are noexcept.
Predicate overloads are conditionally noexcept.

Split Views¶

Synopsis¶

constexpr auto split(Char ch) const& noexcept;
constexpr auto split(Char ch) && noexcept(/* conditional */); // owning strings only
constexpr auto split(View sv) const& noexcept;
constexpr auto split(View sv) && noexcept(/* conditional */); // owning strings only
template <Predicate Pred> constexpr auto split(Pred pred) const& noexcept;
template <Predicate Pred> constexpr auto split(Pred pred) && noexcept(/* conditional */); // owning strings only

constexpr auto split_trimmed(Char ch) const& noexcept;
constexpr auto split_trimmed(View sv) const& noexcept;
template <Predicate Pred> constexpr auto split_trimmed(Pred pred) const& noexcept;

constexpr auto split_whitespace() const& noexcept;
constexpr auto split_ascii_whitespace() const& noexcept;

constexpr auto rsplit(Char ch) const& noexcept;
constexpr auto rsplit(View sv) const& noexcept;
template <Predicate Pred> constexpr auto rsplit(Pred pred) const& noexcept;

constexpr auto split_terminator(Char ch) const& noexcept;
constexpr auto split_terminator(View sv) const& noexcept;
template <Predicate Pred> constexpr auto split_terminator(Pred pred) const& noexcept;

constexpr auto rsplit_terminator(Char ch) const& noexcept;
constexpr auto rsplit_terminator(View sv) const& noexcept;
template <Predicate Pred> constexpr auto rsplit_terminator(Pred pred) const& noexcept;

constexpr auto splitn(size_type count, Char ch) const& noexcept;
constexpr auto splitn(size_type count, View sv) const& noexcept;
template <Predicate Pred> constexpr auto splitn(size_type count, Pred pred) const& noexcept;

constexpr auto rsplitn(size_type count, Char ch) const& noexcept;
constexpr auto rsplitn(size_type count, View sv) const& noexcept;
template <Predicate Pred> constexpr auto rsplitn(size_type count, Pred pred) const& noexcept;

constexpr auto split_inclusive(Char ch) const& noexcept;
constexpr auto split_inclusive(View sv) const& noexcept;
template <Predicate Pred> constexpr auto split_inclusive(Pred pred) const& noexcept;

// Every listed split-view family exposes the same owning-string && pattern as split(...).

Behavior¶

These members return lazy split ranges over subviews of the source text.
The returned split types are std::ranges::view_interface-based views rather than eagerly materialized containers.
split_trimmed trims whitespace around each field.
split_whitespace uses Unicode whitespace; split_ascii_whitespace uses ASCII whitespace only.
split_terminator suppresses a trailing empty field caused only by a trailing delimiter.
split_inclusive keeps the delimiter in the yielded field.
splitn and rsplitn stop after count fields.
On owning rvalues, split-view APIs return move-only owning views that keep the moved string alive for iteration.
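The difference between split and split_terminator only shows up when the text ends with a delimiter. The sketch below is an eager, standard-library model of that rule on plain std::string_view bytes. split_model and split_terminator_model are hypothetical names, not library APIs; the real members are lazy and work on validated characters. How the library treats an empty input is not modeled here.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Eager stand-in for split(Char): every delimiter ends a field, so a trailing
// delimiter produces a trailing empty field.
std::vector<std::string_view> split_model(std::string_view text, char delim)
{
    std::vector<std::string_view> fields;
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = text.find(delim, start);
        if (pos == std::string_view::npos) {
            fields.push_back(text.substr(start)); // last (possibly empty) field
            return fields;
        }
        fields.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
}

// Eager stand-in for split_terminator(Char): same fields, except that the
// empty field caused only by a trailing delimiter is dropped.
std::vector<std::string_view> split_terminator_model(std::string_view text, char delim)
{
    auto fields = split_model(text, delim);
    if (!text.empty() && text.back() == delim) {
        fields.pop_back();
    }
    return fields;
}
```

With this model, "a,b," yields three fields under split_model ("a", "b", "") and two under split_terminator_model ("a", "b"); without a trailing delimiter both produce the same fields.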

Overload differences¶

The examples below use constexpr auto text = "😄 | 🇷🇴 | ✨"_utf8_sv;.

Overload	Meaning	Example
`split(Char ch)`	split on an exact validated character delimiter	`text.split("|"_u8c)`
`split(View sv)`	split on an exact validated substring delimiter	`text.split(" | "_utf8_sv)`
`split(Pred pred)`	split whenever a validated character satisfies the predicate	`text.split([](utf8_char ch) { return ch.is_ascii_whitespace(); })`

The same pattern applies to split_trimmed, split_terminator, split_inclusive, splitn, rsplit, rsplitn, and rsplit_terminator.

Inspiration¶

This family leans heavily on Rust's str split APIs such as split, split_once, split_inclusive, split_terminator, split_whitespace, and splitn.

Return value¶

Returns a lazy range; iteration performs the actual split. View and owning-lvalue receivers produce borrowed subviews into the receiver. Owning-rvalue receivers produce a move-only, non-borrowed owning view that owns the source string.

Complexity¶

Constructing the range is constant.
Iterating the full split result is linear in the source length.

Performance Notes¶

ASCII whitespace scans avoid constructing character objects on UTF-8 input; non-ASCII bytes are treated as non-delimiters for split_ascii_whitespace.
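A byte-level scan is safe on UTF-8 because every byte of a multi-byte sequence is 0x80 or above, so an ASCII whitespace byte can never appear inside an encoded character. The following is a rough standard-library sketch of that idea, not the library's implementation. split_ascii_whitespace_model and is_ascii_ws are hypothetical names, and the whitespace set (space, tab, line feed, form feed, carriage return) and the skipping of empty fields are assumptions borrowed from Rust's split_ascii_whitespace.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Assumed ASCII whitespace set; the library's exact set may differ.
constexpr bool is_ascii_ws(unsigned char b) noexcept
{
    return b == ' ' || b == '\t' || b == '\n' || b == '\f' || b == '\r';
}

// Scans raw UTF-8 bytes without decoding. Bytes >= 0x80 never match
// is_ascii_ws, so multi-byte characters always stay inside one field.
std::vector<std::string_view> split_ascii_whitespace_model(std::string_view utf8)
{
    std::vector<std::string_view> fields;
    std::size_t i = 0;
    while (i < utf8.size()) {
        // Skip a run of delimiters.
        while (i < utf8.size() && is_ascii_ws(static_cast<unsigned char>(utf8[i]))) {
            ++i;
        }
        const std::size_t start = i;
        // Consume a run of non-delimiters.
        while (i < utf8.size() && !is_ascii_ws(static_cast<unsigned char>(utf8[i]))) {
            ++i;
        }
        if (i > start) {
            fields.push_back(utf8.substr(start, i - start));
        }
    }
    return fields;
}
```

In this model U+00A0 NO-BREAK SPACE (bytes 0xC2 0xA0) is not a delimiter, which matches the ASCII-only rule described above.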

Exceptions And `noexcept`¶

Construction does not allocate. Predicate objects may throw later when the returned range is iterated.

View and owning-lvalue receiver overloads are noexcept.
Owning-rvalue receiver overloads are conditionally noexcept; with the default owning string types, they are noexcept.

Example¶

#include "unicode_ranges_all.hpp"

#include <print>

using namespace unicode_ranges;
using namespace unicode_ranges::literals;

int main()
{
    constexpr auto line = " café | thé | apă "_utf8_sv;
    constexpr auto framed = "***café***"_utf8_sv;

    for (auto part : line.split_trimmed("|"_utf8_sv))
    {
        std::println("[{}]", part);
    }
    // [café]
    // [thé]
    // [apă]

    std::println("{}", framed.trim_matches("*"_u8c)); // café
}

Match Views And One-Shot Splits¶

Synopsis¶

// character_set_range<R, Char> means a viewable forward range of validated Char values.
constexpr auto matches(Char ch) const& noexcept;
constexpr auto matches(Char ch) && noexcept(/* conditional */); // owning strings
constexpr auto matches(View sv) const& noexcept;
constexpr auto matches(View sv) && noexcept(/* conditional */); // owning strings
template <std::ranges::viewable_range R> requires character_set_range<R, Char>
constexpr auto matches(R&& chars) const& noexcept(/* conditional */);
template <std::ranges::viewable_range R> requires character_set_range<R, Char>
constexpr auto matches(R&& chars) && noexcept(/* conditional */); // owning strings
template <Predicate Pred> constexpr auto matches(Pred pred) const& noexcept;
template <Predicate Pred> constexpr auto matches(Pred pred) && noexcept(/* conditional */); // owning strings

constexpr auto match_indices(Char ch) const& noexcept;
constexpr auto match_indices(Char ch) && noexcept(/* conditional */); // owning strings
constexpr auto match_indices(View sv) const& noexcept;
constexpr auto match_indices(View sv) && noexcept(/* conditional */); // owning strings
template <std::ranges::viewable_range R> requires character_set_range<R, Char>
constexpr auto match_indices(R&& chars) const& noexcept(/* conditional */);
template <std::ranges::viewable_range R> requires character_set_range<R, Char>
constexpr auto match_indices(R&& chars) && noexcept(/* conditional */); // owning strings
template <Predicate Pred> constexpr auto match_indices(Pred pred) const& noexcept;
template <Predicate Pred> constexpr auto match_indices(Pred pred) && noexcept(/* conditional */); // owning strings

constexpr auto rmatches(Char ch) const& noexcept;
constexpr auto rmatches(Char ch) && noexcept(/* conditional */); // owning strings
constexpr auto rmatches(View sv) const& noexcept;
constexpr auto rmatches(View sv) && noexcept(/* conditional */); // owning strings
template <std::ranges::viewable_range R> requires character_set_range<R, Char>
constexpr auto rmatches(R&& chars) const& noexcept(/* conditional */);
template <std::ranges::viewable_range R> requires character_set_range<R, Char>
constexpr auto rmatches(R&& chars) && noexcept(/* conditional */); // owning strings
template <Predicate Pred> constexpr auto rmatches(Pred pred) const& noexcept;
template <Predicate Pred> constexpr auto rmatches(Pred pred) && noexcept(/* conditional */); // owning strings

constexpr auto rmatch_indices(Char ch) const& noexcept;
constexpr auto rmatch_indices(Char ch) && noexcept(/* conditional */); // owning strings
constexpr auto rmatch_indices(View sv) const& noexcept;
constexpr auto rmatch_indices(View sv) && noexcept(/* conditional */); // owning strings
template <std::ranges::viewable_range R> requires character_set_range<R, Char>
constexpr auto rmatch_indices(R&& chars) const& noexcept(/* conditional */);
template <std::ranges::viewable_range R> requires character_set_range<R, Char>
constexpr auto rmatch_indices(R&& chars) && noexcept(/* conditional */); // owning strings
template <Predicate Pred> constexpr auto rmatch_indices(Pred pred) const& noexcept;
template <Predicate Pred> constexpr auto rmatch_indices(Pred pred) && noexcept(/* conditional */); // owning strings

constexpr split_once_result<View> split_once(Char ch) const& noexcept;
constexpr split_once_result<View> split_once(View sv) const& noexcept;
constexpr split_once_result<View> split_once(std::span<const Char> chars) const& noexcept;
template <Predicate Pred>
constexpr split_once_result<View> split_once(Pred pred) const& noexcept;
template <typename... Args>
constexpr split_once_result<View> split_once(Args&&...) && = delete;      // owning strings only
template <typename... Args>
constexpr split_once_result<View> split_once(Args&&...) const&& = delete; // owning strings only

constexpr split_once_result<View> rsplit_once(Char ch) const& noexcept;
constexpr split_once_result<View> rsplit_once(View sv) const& noexcept;
constexpr split_once_result<View> rsplit_once(std::span<const Char> chars) const& noexcept;
template <Predicate Pred>
constexpr split_once_result<View> rsplit_once(Pred pred) const& noexcept;
template <typename... Args>
constexpr split_once_result<View> rsplit_once(Args&&...) && = delete;      // owning strings only
template <typename... Args>
constexpr split_once_result<View> rsplit_once(Args&&...) const&& = delete; // owning strings only

constexpr split_once_at_result<View> split_once_at(size_type delim) const& noexcept;
constexpr split_once_at_result<View> split_once_at(size_type delim) && = delete;      // owning strings only
constexpr split_once_at_result<View> split_once_at(size_type delim) const&& = delete; // owning strings only
constexpr split_once_at_unchecked_result<View> split_once_at_unchecked(size_type delim) const& noexcept;
constexpr split_once_at_unchecked_result<View> split_once_at_unchecked(size_type delim) && = delete;      // owning strings only
constexpr split_once_at_unchecked_result<View> split_once_at_unchecked(size_type delim) const&& = delete; // owning strings only

Behavior¶

matches and rmatches yield matching subviews, not character values. This is true even when the pattern is a single Char, a character-set range, or a predicate that matches one character.
match_indices yields forward-ordered (offset, subview) pairs.
rmatch_indices yields reverse-ordered (offset, subview) pairs.
Match-family range overloads treat the input as an "any of these characters" set. The input must be a viewable forward range of validated Char values; the stored delimiter set must become a copyable forward view after adaptation.
On owning strings, rvalue-qualified match views are move-only owning views that keep the source string alive.
split_once and rsplit_once split around the first or last match.
split_once_at validates that delim is a character boundary.
split_once_at_unchecked assumes the supplied offset is already a valid boundary; violating that precondition is undefined behavior.
The range-returning members in this section are lazy std::ranges::view_interface-based views. Owning-string receivers and temporary delimiter-set ranges may make the returned view non-borrowed.
One-shot split APIs return pair-like result objects with left() and right() borrowed subviews. On owning strings, their && and const&& overloads are deleted so a temporary owning string cannot produce dangling result elements.
split_once_result<View>, split_once_at_result<View>, and split_once_at_unchecked_result<View> provide operator bool, has_value(), tuple-like access for structured bindings, and range iteration.
These result types model sized contiguous views. A successful checked result has range size 2; a failed checked result has range size 0. An unchecked result always has range size 2 and has_value() == true.
Iterating a successful one-shot split result yields exactly two views: the left side, then the right side. Iterating a failed result yields no elements.
In C++26-or-newer compiler modes that define __cpp_deleted_function, these deleted overloads include a diagnostic reason; otherwise they compile as ordinary deleted overloads.
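The boundary validation that split_once_at performs can be pictured with a small UTF-8 check: an offset is a character boundary when it does not point at a continuation byte (bit pattern 10xxxxxx). This is only a sketch under stated assumptions. is_utf8_boundary_model is a hypothetical helper, and treating offset 0 and offset == size() as boundaries follows Rust's is_char_boundary rather than anything this page states.

```cpp
#include <cstddef>
#include <string_view>

// Returns true when `offset` does not fall inside a multi-byte UTF-8 sequence.
// Assumes `bytes` already holds valid UTF-8.
constexpr bool is_utf8_boundary_model(std::string_view bytes, std::size_t offset) noexcept
{
    if (offset > bytes.size()) {
        return false; // past the end: never a boundary
    }
    if (offset == 0 || offset == bytes.size()) {
        return true; // assumed: both ends count as boundaries
    }
    // Continuation bytes look like 10xxxxxx; anything else starts a character.
    return (static_cast<unsigned char>(bytes[offset]) & 0xC0u) != 0x80u;
}

// "café" in UTF-8: 'c' 'a' 'f' 0xC3 0xA9
constexpr std::string_view cafe_bytes = "caf\xC3\xA9";
static_assert(is_utf8_boundary_model(cafe_bytes, 3));  // lead byte of 'é'
static_assert(!is_utf8_boundary_model(cafe_bytes, 4)); // continuation byte
static_assert(is_utf8_boundary_model(cafe_bytes, 5));  // end of text
static_assert(!is_utf8_boundary_model(cafe_bytes, 6)); // out of range
```

A checked API can return a failed result when such a test fails; an unchecked API skips the test, which is why a bad offset there is undefined behavior.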

Overload differences¶

The examples below use constexpr auto text = "😄=🇷🇴=✨"_utf8_sv;.

Overload	Meaning	Example
`split_once(Char ch)`	split at the first exact character delimiter	`text.split_once("="_u8c)`
`split_once(View sv)`	split at the first exact substring delimiter	`text.split_once("=🇷🇴="_utf8_sv)`
`split_once(std::span<const Char> chars)`	split at the first character that belongs to a character set	`text.split_once(std::array{"="_u8c, "✨"_u8c})`
`split_once(Pred pred)`	split at the first character satisfying the predicate	`text.split_once([](utf8_char ch) { return ch.is_ascii_punctuation(); })`

The same distinctions apply to rsplit_once(...), but from the end.

The match-family range overloads use the same character-set idea lazily:

auto marks = text.match_indices(std::array{ "="_u8c, "✨"_u8c });

Temporary non-view ranges, including std::array{...} delimiter sets, are owned by the returned view. Lvalue ranges are referenced using normal range lifetime rules, and raw std::initializer_list delimiter sets are intentionally unsupported.

For one-shot split APIs, bind the owning string to a named variable first, or call through a view when the source storage is known to outlive the result. The same compiled example also shows the rvalue-aware trim overloads, including trim_prefix and trim_suffix:

#include "unicode_ranges_all.hpp"

#include <cassert>
#include <ranges>
#include <utility>

using namespace unicode_ranges;
using namespace unicode_ranges::literals;

int main()
{
    utf8_string text = u8"  café  "_utf8_s;

    auto copied = text.trim_whitespace();
    assert(copied == u8"café"_utf8_sv);
    assert(text == u8"  café  "_utf8_sv);

    auto disposable = u8"  café  "_utf8_s;
    auto trimmed = std::move(disposable).trim_whitespace();
    assert(trimmed == u8"café"_utf8_sv);

    auto framed = u8"<<<payload>>>"_utf8_s;
    auto payload = std::move(framed).trim_prefix(u8"<<<"_utf8_sv).trim_suffix(u8">>>"_utf8_sv);
    assert(payload == u8"payload"_utf8_sv);

    utf8_string key_value = u8"name=value"_utf8_s;
    auto split = key_value.split_once(u8"="_u8c);
    assert(split.has_value());
    assert(split.left() == u8"name"_utf8_sv);
    assert(split.right() == u8"value"_utf8_sv);
    assert(std::ranges::size(split) == 2);
}

Inspiration¶

The pair-like result surface is deliberately close to Rust's str::split_once and str::rsplit_once.

In C++20 and C++23, bind the result before destructuring:

if (auto split = text.split_once(u8"="_u8c)) {
    auto [left, right] = split;
}

C++26 structured binding conditions can destructure directly in the condition on compilers that implement that language feature.

Return value¶

Match APIs return lazy ranges.
split_once and rsplit_once return a false split_once_result<View> when no match exists. In that case, left() and right() both view the full input.
split_once_at returns a false split_once_at_result<View> when delim is not a character boundary. In that case, left() and right() both view the full input.
split_once_at_unchecked returns a true split_once_at_unchecked_result<View> and performs the split without validation.
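The success and failure shapes described above can be modeled with a few lines of standard C++. This is a simplified stand-in, not the library's result type: split_once_model and split_once_model_result are hypothetical names, the model works on raw std::string_view bytes with a single char delimiter, and it omits the tuple-like and range interfaces.

```cpp
#include <string_view>

// Hypothetical stand-in for split_once_result<View>.
struct split_once_model_result {
    std::string_view left;
    std::string_view right;
    bool found;

    constexpr explicit operator bool() const noexcept { return found; }
};

// Splits around the first occurrence of `delim`. On failure, both sides
// view the full input, mirroring the return-value rule described above.
constexpr split_once_model_result split_once_model(std::string_view text, char delim)
{
    const auto pos = text.find(delim);
    if (pos == std::string_view::npos) {
        return {text, text, false};
    }
    return {text.substr(0, pos), text.substr(pos + 1), true};
}

static_assert(split_once_model("name=value", '=').left == "name");
static_assert(split_once_model("name=value", '=').right == "value");
static_assert(!split_once_model("novalue", '='));
static_assert(split_once_model("novalue", '=').left == "novalue");
```

Because a failed result still carries usable views, callers should test the result (or has_value() in the real API) before treating left and right as distinct halves.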

Complexity¶

Linear in the view length.

Performance Notes¶

Character-set matchers cache ASCII membership, so ASCII-heavy match sets can scan without decoding every character.
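One way to picture an ASCII membership cache is a 128-entry bitset that is built once from the delimiter set and then consulted per byte. The sketch below is illustrative only and makes no claim about the library's internals: ascii_set_model and match_offsets_model are hypothetical names, and the model handles ASCII delimiters only.

```cpp
#include <bitset>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical precomputed membership table for ASCII delimiters.
class ascii_set_model {
public:
    explicit ascii_set_model(std::string_view ascii_chars)
    {
        for (const char c : ascii_chars) {
            const auto byte = static_cast<unsigned char>(c);
            if (byte < 128) {
                members_.set(byte);
            }
        }
    }

    bool contains(unsigned char byte) const
    {
        return byte < 128 && members_[byte];
    }

private:
    std::bitset<128> members_;
};

// Scans UTF-8 bytes and records the offset of every ASCII delimiter.
// Bytes >= 0x80 are skipped without decoding the character they belong to.
std::vector<std::size_t> match_offsets_model(std::string_view utf8, const ascii_set_model& set)
{
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        if (set.contains(static_cast<unsigned char>(utf8[i]))) {
            offsets.push_back(i);
        }
    }
    return offsets;
}
```

A set that also contains non-ASCII characters would still need a decoding path for bytes at or above 0x80; the table only shortcuts the ASCII part.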

Exceptions And `noexcept`¶

None, unless a predicate object throws when invoked.

Match-family Char, View, and predicate overloads are noexcept.
Match-family character-set range overloads are conditionally noexcept based on adapting the delimiter range.
Match-family owning-rvalue overloads are conditionally noexcept; with the default owning string types, they are noexcept.
One-shot split APIs are noexcept for valid receiver categories.
Deleted owning-rvalue one-shot split signatures cannot be called.

Trim Families¶

Synopsis¶

// View receivers
constexpr View trim_prefix(Char ch) const& noexcept;
constexpr View trim_prefix(View sv) const& noexcept;
constexpr View trim_suffix(Char ch) const& noexcept;
constexpr View trim_suffix(View sv) const& noexcept;

constexpr View trim_start_matches(Char ch) const& noexcept;
constexpr View trim_start_matches(View sv) const& noexcept;
constexpr View trim_start_matches(std::span<const Char> chars) const& noexcept;
template <Predicate Pred> constexpr View trim_start_matches(Pred pred) const& noexcept(/* conditional */);

constexpr View trim_end_matches(Char ch) const& noexcept;
constexpr View trim_end_matches(View sv) const& noexcept;
constexpr View trim_end_matches(std::span<const Char> chars) const& noexcept;
template <Predicate Pred> constexpr View trim_end_matches(Pred pred) const& noexcept(/* conditional */);

constexpr View trim_matches(Char ch) const& noexcept;
constexpr View trim_matches(View sv) const& noexcept;
constexpr View trim_matches(std::span<const Char> chars) const& noexcept;
template <Predicate Pred> constexpr View trim_matches(Pred pred) const& noexcept(/* conditional */);

constexpr View trim_whitespace_start() const& noexcept;
constexpr View trim_whitespace_end() const& noexcept;
constexpr View trim_whitespace() const& noexcept;
constexpr View trim_ascii_whitespace_start() const& noexcept;
constexpr View trim_ascii_whitespace_end() const& noexcept;
constexpr View trim_ascii_whitespace() const& noexcept;

// Owning string receivers return the matching utf*_string instead of View.
constexpr String trim_prefix(...) const&;
constexpr String trim_prefix(...) && noexcept(/* conditional */);
constexpr String trim_suffix(...) const&;
constexpr String trim_suffix(...) && noexcept(/* conditional */);
constexpr String trim_start_matches(...) const&;
constexpr String trim_start_matches(...) && noexcept(/* conditional */);
constexpr String trim_end_matches(...) const&;
constexpr String trim_end_matches(...) && noexcept(/* conditional */);
constexpr String trim_matches(...) const&;
constexpr String trim_matches(...) && noexcept(/* conditional */);
constexpr String trim_whitespace_start() const&;
constexpr String trim_whitespace_start() && noexcept(/* conditional */);
constexpr String trim_whitespace_end() const&;
constexpr String trim_whitespace_end() && noexcept(/* conditional */);
constexpr String trim_whitespace() const&;
constexpr String trim_whitespace() && noexcept(/* conditional */);
constexpr String trim_ascii_whitespace_start() const&;
constexpr String trim_ascii_whitespace_start() && noexcept(/* conditional */);
constexpr String trim_ascii_whitespace_end() const&;
constexpr String trim_ascii_whitespace_end() && noexcept(/* conditional */);
constexpr String trim_ascii_whitespace() const&;
constexpr String trim_ascii_whitespace() && noexcept(/* conditional */);

Behavior¶

trim_prefix and trim_suffix remove exactly one matching prefix or suffix and keep the original view when no removal happens.
trim_*_matches remove repeated matches from one or both ends.
trim_whitespace_* and trim_whitespace() use Unicode whitespace semantics.
trim_ascii_whitespace_* and trim_ascii_whitespace() use ASCII whitespace only.
View receivers return borrowed subviews and never allocate.
Owning lvalue receivers return owning strings and leave the receiver unchanged.
Owning rvalue receivers return owning strings and adjust the existing buffer where possible. This is the preferred form when the source string is disposable.

Overload differences¶

The examples below use constexpr auto framed = "✨✨😄🇷🇴✨✨"_utf8_sv;.

Overload	Meaning	Example
`trim_prefix(View sv)`	remove one exact prefix occurrence or return the original view unchanged	`framed.trim_prefix("✨✨"_utf8_sv)`
`trim_start_matches(Char ch)`	repeatedly remove one exact character from the start	`framed.trim_start_matches("✨"_u8c)`
`trim_start_matches(View sv)`	repeatedly remove one exact substring from the start	`framed.trim_start_matches("✨"_utf8_sv)`
`trim_start_matches(std::span<const Char> chars)`	repeatedly remove any leading character that belongs to a set	`framed.trim_start_matches(std::array{"✨"_u8c, "😄"_u8c})`
`trim_start_matches(Pred pred)`	repeatedly remove leading characters satisfying a predicate	`framed.trim_start_matches([](utf8_char ch) { return !ch.is_ascii(); })`

The same distinctions apply to trim_end_matches(...) and trim_matches(...).

For the span overload, adjacency does not matter. std::array{"✨"_u8c, "😄"_u8c} means "keep trimming while the next edge character is either ✨ or 😄". It does not mean "trim the substring ✨😄".
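The three trimming rules can be contrasted with a small standard-library model. The helpers below are hypothetical and operate on ASCII std::string_view data, with a string standing in for the character set; they are meant only to illustrate "one prefix", "repeated substring", and "any character from a set", not to reproduce the library's implementation.

```cpp
#include <string_view>

// trim_prefix-like: remove at most one occurrence of `prefix`.
constexpr std::string_view trim_prefix_model(std::string_view text, std::string_view prefix)
{
    if (text.substr(0, prefix.size()) == prefix) {
        text.remove_prefix(prefix.size());
    }
    return text;
}

// trim_start_matches(View)-like: remove the exact substring repeatedly.
constexpr std::string_view trim_start_substring_model(std::string_view text, std::string_view pattern)
{
    if (pattern.empty()) {
        return text; // avoid looping forever on an empty pattern
    }
    while (text.substr(0, pattern.size()) == pattern) {
        text.remove_prefix(pattern.size());
    }
    return text;
}

// trim_start_matches(span)-like: remove leading characters while each one
// belongs to `set`, regardless of order or adjacency.
constexpr std::string_view trim_start_any_of_model(std::string_view text, std::string_view set)
{
    const auto first = text.find_first_not_of(set);
    text.remove_prefix(first == std::string_view::npos ? text.size() : first);
    return text;
}

static_assert(trim_prefix_model("xxabc", "x") == "xabc");
static_assert(trim_start_substring_model("xxabc", "x") == "abc");
static_assert(trim_start_substring_model("aabb-x", "ab") == "aabb-x"); // "ab" is not a prefix
static_assert(trim_start_any_of_model("aabb-x", "ab") == "-x");        // 'a' and 'b' as a set
```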

Inspiration¶

This family is strongly inspired by Rust's str APIs such as trim_matches, trim_start_matches, trim_end_matches, trim_prefix, and trim_suffix.

Return value¶

Returns a borrowed subview for view receivers, or an owning string for owning receivers.

Owning rvalue results do not refer to the moved-from object.

Complexity¶

Linear in the number of leading or trailing characters examined.

Exceptions And `noexcept`¶

View receivers do not throw, unless a predicate object throws when invoked.
Owning lvalue receivers may throw allocator or container exceptions because they produce an owning copy.
Owning rvalue receivers do not allocate in the bound-adjustment path; predicate overloads can still throw if the predicate throws.
View receiver overloads are noexcept except predicate forms, which are conditionally noexcept.
Owning lvalue receiver overloads are not noexcept.
Owning rvalue receiver overloads are conditionally noexcept; with the default owning string types, the non-predicate forms and non-throwing predicate forms are noexcept.

Owning Transformations¶

Synopsis¶

// utf8_string_view
template <typename Allocator = std::allocator<char8_t>>
constexpr basic_utf8_string<Allocator> to_utf8(const Allocator& alloc = Allocator()) const&;
constexpr auto to_utf8() &&;                                    // owning strings only; returns the same owning UTF-8 string type
template <typename Allocator>
constexpr basic_utf8_string<Allocator> to_utf8(const Allocator& alloc) &&; // owning strings only

template <typename Allocator = std::allocator<char16_t>>
constexpr basic_utf16_string<Allocator> to_utf16(const Allocator& alloc = Allocator()) const;

template <typename Allocator = std::allocator<char8_t>>
constexpr basic_utf8_string<Allocator> to_ascii_lowercase(const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char8_t>>
constexpr basic_utf8_string<Allocator> to_ascii_lowercase(size_type pos, size_type count, const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char8_t>>
constexpr basic_utf8_string<Allocator> to_ascii_uppercase(const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char8_t>>
constexpr basic_utf8_string<Allocator> to_ascii_uppercase(size_type pos, size_type count, const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char8_t>>
constexpr basic_utf8_string<Allocator> to_lowercase(const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char8_t>>
constexpr basic_utf8_string<Allocator> to_lowercase(size_type pos, size_type count, const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char8_t>>
constexpr basic_utf8_string<Allocator> to_uppercase(const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char8_t>>
constexpr basic_utf8_string<Allocator> to_uppercase(size_type pos, size_type count, const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char8_t>>
constexpr basic_utf8_string<Allocator> normalize(normalization_form form, const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char8_t>>
constexpr basic_utf8_string<Allocator> to_nfc(const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char8_t>>
constexpr basic_utf8_string<Allocator> to_nfd(const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char8_t>>
constexpr basic_utf8_string<Allocator> to_nfkc(const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char8_t>>
constexpr basic_utf8_string<Allocator> to_nfkd(const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char8_t>>
constexpr basic_utf8_string<Allocator> case_fold(const Allocator& alloc = Allocator()) const;

// utf16_string_view
template <typename Allocator = std::allocator<char16_t>>
constexpr basic_utf16_string<Allocator> to_utf16(const Allocator& alloc = Allocator()) const&;
constexpr auto to_utf16() &&;                                       // owning strings only; returns the same owning UTF-16 string type
template <typename Allocator>
constexpr basic_utf16_string<Allocator> to_utf16(const Allocator& alloc) &&; // owning strings only

template <typename Allocator = std::allocator<char8_t>>
constexpr basic_utf8_string<Allocator> to_utf8(const Allocator& alloc = Allocator()) const;

template <typename Allocator = std::allocator<char16_t>>
constexpr basic_utf16_string<Allocator> to_ascii_lowercase(const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char16_t>>
constexpr basic_utf16_string<Allocator> to_ascii_lowercase(size_type pos, size_type count, const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char16_t>>
constexpr basic_utf16_string<Allocator> to_ascii_uppercase(const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char16_t>>
constexpr basic_utf16_string<Allocator> to_ascii_uppercase(size_type pos, size_type count, const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char16_t>>
constexpr basic_utf16_string<Allocator> to_lowercase(const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char16_t>>
constexpr basic_utf16_string<Allocator> to_lowercase(size_type pos, size_type count, const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char16_t>>
constexpr basic_utf16_string<Allocator> to_uppercase(const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char16_t>>
constexpr basic_utf16_string<Allocator> to_uppercase(size_type pos, size_type count, const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char16_t>>
constexpr basic_utf16_string<Allocator> normalize(normalization_form form, const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char16_t>>
constexpr basic_utf16_string<Allocator> to_nfc(const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char16_t>>
constexpr basic_utf16_string<Allocator> to_nfd(const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char16_t>>
constexpr basic_utf16_string<Allocator> to_nfkc(const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char16_t>>
constexpr basic_utf16_string<Allocator> to_nfkd(const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char16_t>>
constexpr basic_utf16_string<Allocator> case_fold(const Allocator& alloc = Allocator()) const;

The UTF-32 view surface exposes the same transformation families with basic_utf32_string return types, plus the same cross-encoding to_utf8(...), to_utf16(...), and same-encoding to_utf32(...) entry points.

Behavior¶

All transformation members return owning validated strings.
Same-encoding to_utf8(), to_utf16(), and to_utf32() materialize ownership. On borrowed views and const& owning strings they copy; on owning rvalues without an explicit allocator they move the existing owned buffer.
Explicit allocator overloads construct the result with the supplied allocator. They preserve allocator semantics rather than promising to steal the existing buffer unconditionally.
Partial case-mapping overloads require both ends of the selected range to be character boundaries.
normalize(...) is whole-string only.
case_fold() implements Unicode case folding for caseless matching workflows.

Return value¶

Returns a new owning string in the target encoding or transformed form.

Complexity¶

Linear in the number of processed code units, plus any additional work required by Unicode case expansion or normalization.

Exceptions And `noexcept`¶

Partial case transforms may throw std::out_of_range for invalid boundaries.
All owning transforms may throw allocator or container exceptions.

Not noexcept.

Example¶

#include "unicode_ranges_all.hpp"

#include <print>

using namespace unicode_ranges;
using namespace unicode_ranges::literals;

int main()
{
    constexpr auto text = "straße café"_utf8_sv;

    std::println("{}", text.to_ascii_uppercase());       // STRAßE CAFé
    std::println("{}", text.to_uppercase());             // STRASSE CAFÉ
    std::println("{}", "CAFÉ Ω"_utf8_sv.to_lowercase()); // café ω
}
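The reason to_ascii_uppercase leaves ß and é untouched can be seen from a byte-level model: only bytes in the range 'a' to 'z' are rewritten, and every byte of a multi-byte UTF-8 sequence lies outside that range, so the result stays valid UTF-8. The helper below is a hypothetical standard-library sketch of that behavior, not the library's code.

```cpp
#include <string>
#include <string_view>

// ASCII-only uppercase over UTF-8 bytes. Non-ASCII characters are copied
// unchanged because none of their bytes fall in 'a'..'z'.
std::string to_ascii_uppercase_model(std::string_view utf8)
{
    std::string out(utf8);
    for (char& c : out) {
        if (c >= 'a' && c <= 'z') {
            c = static_cast<char>(c - 'a' + 'A');
        }
    }
    return out;
}
```

Full Unicode uppercasing, as in to_uppercase(), cannot be done byte by byte: ß expands to SS and é maps to a different multi-byte character, so the output length can change.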

Optional ICU Locale-Aware Overloads¶

When the library is built with UTF8_RANGES_ENABLE_ICU=1, the string-view types also expose these overloads:

template <typename Allocator = std::allocator<char8_t>>
basic_utf8_string<Allocator> to_lowercase(locale_id locale, const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char8_t>>
basic_utf8_string<Allocator> to_lowercase(size_type pos, size_type count, locale_id locale, const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char8_t>>
basic_utf8_string<Allocator> to_uppercase(locale_id locale, const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char8_t>>
basic_utf8_string<Allocator> to_uppercase(size_type pos, size_type count, locale_id locale, const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char8_t>>
basic_utf8_string<Allocator> to_titlecase(locale_id locale, const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char8_t>>
basic_utf8_string<Allocator> case_fold(locale_id locale, const Allocator& alloc = Allocator()) const;

template <typename Allocator = std::allocator<char16_t>>
basic_utf16_string<Allocator> to_lowercase(locale_id locale, const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char16_t>>
basic_utf16_string<Allocator> to_lowercase(size_type pos, size_type count, locale_id locale, const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char16_t>>
basic_utf16_string<Allocator> to_uppercase(locale_id locale, const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char16_t>>
basic_utf16_string<Allocator> to_uppercase(size_type pos, size_type count, locale_id locale, const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char16_t>>
basic_utf16_string<Allocator> to_titlecase(locale_id locale, const Allocator& alloc = Allocator()) const;
template <typename Allocator = std::allocator<char16_t>>
basic_utf16_string<Allocator> case_fold(locale_id locale, const Allocator& alloc = Allocator()) const;

The UTF-32 view type exposes the same ICU-gated locale-aware overload families with basic_utf32_string return types.

These overloads do not exist in the dependency-free default build.
to_lowercase(locale) and to_uppercase(locale) delegate to ICU locale-sensitive case mapping.
to_titlecase(locale) delegates to ICU locale-sensitive titlecasing.
to_titlecase(locale) is whole-string only because titlecasing depends on break-iterator context; the library does not expose partial locale-aware titlecasing overloads.
case_fold(locale) delegates to ICU case folding. In practice, the only fold-specific tailoring ICU exposes here is the Turkic special-I mode, so most locales behave the same as the default case_fold().
locale_id is a raw null-terminated token. A null token such as locale_id{nullptr} is rejected with std::invalid_argument.
Non-null locale names are passed through to ICU, which may canonicalize them or fall back to a more general locale instead of failing.
ICU normalization or casing failures throw std::runtime_error.
If you need an exact availability check before calling a locale-aware overload, use is_available_locale(locale).

#include "unicode_ranges_all.hpp"

#include <print>

using namespace unicode_ranges;
using namespace unicode_ranges::literals;

int main()
{
#if UTF8_RANGES_HAS_ICU
    std::println("{}", u8"Iİ"_utf8_sv.case_fold());                      // ii̇
    std::println("{}", u8"Iİ"_utf8_sv.case_fold("tr"_locale));           // ıi
    std::println("{}", u8"Iİ"_utf8_sv.to_lowercase("tr"_locale));        // ıi
    std::println("{}", u8"iı"_utf8_sv.to_uppercase("tr"_locale));        // İI
    std::println("{}", u8"istanbul izmir"_utf8_sv.to_titlecase("tr"_locale)); // İstanbul İzmir
    std::println("{}", U"istanbul izmir"_utf32_sv.to_titlecase("tr"_locale)); // İstanbul İzmir
    std::println("{}", u8"I"_utf8_sv.eq_ignore_case(u8"ı"_utf8_sv, "tr"_locale)); // true
    std::println("{}", is_available_locale("tr"_locale));                // true
#else
    std::println("Enable ICU-backed locale casing to use _locale.");
#endif
}
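
The difference between the default fold and the Turkic special-I fold in the example above can be illustrated with a small standalone sketch. It is not the library's or ICU's implementation: toy_case_fold, toy_eq_ignore_case, and the turkic flag are hypothetical names, the input is plain UTF-32 in a std::u32string_view, and only ASCII letters plus the dotted and dotless I scalars are handled. The four I mappings follow the C, F, and T entries of Unicode's CaseFolding.txt.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Toy case fold over UTF-32 scalars (hypothetical helper, not library API).
// Only ASCII letters and the dotted/dotless I scalars are handled;
// every other scalar passes through unchanged.
inline std::u32string toy_case_fold(std::u32string_view text, bool turkic)
{
    std::u32string out;
    for (const char32_t c : text) {
        if (c == U'I') {
            // Default: I -> i. Turkic: I -> dotless i (U+0131).
            out += turkic ? U'\u0131' : U'i';
        } else if (c == U'\u0130') {
            // Default (full fold): U+0130 -> i + combining dot above (U+0307).
            // Turkic: U+0130 -> i.
            out += U'i';
            if (!turkic) out += U'\u0307';
        } else if (c >= U'A' && c <= U'Z') {
            out += static_cast<char32_t>(c + 32);
        } else {
            out += c;
        }
    }
    return out;
}

// Fold-only equality. This toy materializes both folded strings for brevity.
inline bool toy_eq_ignore_case(std::u32string_view a, std::u32string_view b, bool turkic)
{
    return toy_case_fold(a, turkic) == toy_case_fold(b, turkic);
}

// Small self-check mirroring the "I" + U+0130 input used in the example above.
inline void toy_case_fold_self_check()
{
    assert(toy_case_fold(U"I\u0130", false) == U"ii\u0307");
    assert(toy_case_fold(U"I\u0130", true) == U"\u0131i");
    assert(toy_eq_ignore_case(U"I", U"\u0131", true));
    assert(!toy_eq_ignore_case(U"I", U"\u0131", false));
}
```

Under the default fold, "I" and "ı" stay different because "ı" has no default fold mapping. Under the Turkic fold, both map to "ı", which is why the locale-aware eq_ignore_case call in the example reports true.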

Case-Insensitive Comparison¶

Synopsis¶

constexpr bool eq_ignore_case(utf8_string_view sv) const noexcept;
constexpr bool starts_with_ignore_case(utf8_string_view sv) const noexcept;
constexpr bool ends_with_ignore_case(utf8_string_view sv) const noexcept;
constexpr std::weak_ordering compare_ignore_case(utf8_string_view sv) const noexcept;

constexpr bool eq_ignore_case(utf16_string_view sv) const noexcept;
constexpr bool starts_with_ignore_case(utf16_string_view sv) const noexcept;
constexpr bool ends_with_ignore_case(utf16_string_view sv) const noexcept;
constexpr std::weak_ordering compare_ignore_case(utf16_string_view sv) const noexcept;

The UTF-32 view type exposes the same four non-allocating helpers with utf32_string_view.

Behavior¶

These members compare Unicode case-folded scalar sequences.
They do not allocate.
They do not normalize. Canonically equivalent text such as "é" and "e\u0301" still compares different unless the caller normalizes first.
This is deliberate: the library keeps normalization explicit instead of folding canonical equivalence into the default case-insensitive comparison path.
starts_with_ignore_case(...) and ends_with_ignore_case(...) operate on the folded sequences, so expansions such as ß -> ss are handled correctly.
compare_ignore_case(...) is lexicographic comparison of the folded scalar sequence. It is not locale collation.
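
One way to picture a non-allocating comparison over folded sequences is a cursor that folds one scalar at a time and keeps any expansion in a small fixed buffer. The sketch below is an illustration under stated assumptions, not the library's implementation: fold_cursor, toy_fold, toy_compare_ignore_case, and toy_starts_with_ignore_case are hypothetical names, the inputs are plain UTF-32, the fold table covers only ASCII letters and ß, and the result is an int rather than std::weak_ordering so the sketch stays standalone.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Result of folding one scalar: up to three scalars, no heap allocation.
struct folded {
    char32_t units[3];
    std::size_t size;
};

// Toy fold table (assumption): ASCII A-Z and U+00DF only.
inline folded toy_fold(char32_t c)
{
    if (c >= U'A' && c <= U'Z') return {{static_cast<char32_t>(c + 32)}, 1};
    if (c == U'\u00DF') return {{U's', U's'}, 2}; // sharp s expands to "ss"
    return {{c}, 1};
}

// Streams the folded scalars of a UTF-32 view one at a time.
class fold_cursor {
public:
    explicit fold_cursor(std::u32string_view text) : text_(text) {}

    // Writes the next folded scalar to `out`; returns false at the end.
    bool next(char32_t& out)
    {
        if (used_ == pending_.size) {
            if (pos_ == text_.size()) return false;
            pending_ = toy_fold(text_[pos_++]);
            used_ = 0;
        }
        out = pending_.units[used_++];
        return true;
    }

private:
    std::u32string_view text_;
    std::size_t pos_ = 0;
    folded pending_{{}, 0};
    std::size_t used_ = 0;
};

// Lexicographic comparison of the folded streams: <0, 0, or >0.
inline int toy_compare_ignore_case(std::u32string_view a, std::u32string_view b)
{
    fold_cursor ca(a), cb(b);
    for (;;) {
        char32_t x = 0, y = 0;
        const bool ha = ca.next(x);
        const bool hb = cb.next(y);
        if (!ha || !hb) return static_cast<int>(ha) - static_cast<int>(hb);
        if (x != y) return x < y ? -1 : 1;
    }
}

// Prefix test on the folded streams, so "stras" matches the fold of "Stra\u00DFe".
inline bool toy_starts_with_ignore_case(std::u32string_view text, std::u32string_view prefix)
{
    fold_cursor ct(text), cp(prefix);
    for (;;) {
        char32_t p = 0, t = 0;
        if (!cp.next(p)) return true;
        if (!ct.next(t) || t != p) return false;
    }
}

inline void toy_compare_self_check()
{
    assert(toy_compare_ignore_case(U"Stra\u00DFe", U"STRASSE") == 0);
    assert(toy_starts_with_ignore_case(U"Stra\u00DFe", U"stras"));
    // No normalization: precomposed U+00E9 differs from "e" + U+0301.
    assert(toy_compare_ignore_case(U"\u00E9", U"e\u0301") != 0);
}
```

Because the prefix test runs on the folded stream rather than on source characters, a prefix may end in the middle of an expansion, which is how "stras" can match a text whose fifth source character is ß.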

Return value¶

eq_ignore_case(...), starts_with_ignore_case(...), and ends_with_ignore_case(...) return true when the folded comparison succeeds.
compare_ignore_case(...) returns std::weak_ordering:
equivalent when the folded sequences compare equal
less / greater for lexicographic ordering on the folded scalar stream

Complexity¶

Linear in the number of code units read from both operands.

Exceptions And `noexcept`¶

These members do not throw and are declared noexcept.

Example¶

#include "unicode_ranges_all.hpp"

#include <print>

using namespace unicode_ranges;
using namespace unicode_ranges::literals;

int main()
{
    constexpr auto text = u8"Straße"_utf8_sv;
    constexpr auto text32 = U"Straße"_utf32_sv;

    std::println("{}", text.eq_ignore_case(u8"STRASSE"_utf8_sv)); // true
    std::println("{}", text.starts_with_ignore_case(u8"stras"_utf8_sv)); // true
    std::println("{}", text.ends_with_ignore_case(u8"SSE"_utf8_sv)); // true
    std::println("{}", text.compare_ignore_case(u8"strasse"_utf8_sv) == std::weak_ordering::equivalent); // true
    std::println("{}", !u8"é"_utf8_sv.eq_ignore_case(u8"e\u0301"_utf8_sv)); // true
    std::println("{}", text32.eq_ignore_case(U"STRASSE"_utf32_sv)); // true
}

Optional ICU Locale-Aware Overloads¶

When the library is built with UTF8_RANGES_ENABLE_ICU=1, the string-view types also expose:

bool eq_ignore_case(utf8_string_view sv, locale_id locale) const;
bool starts_with_ignore_case(utf8_string_view sv, locale_id locale) const;
bool ends_with_ignore_case(utf8_string_view sv, locale_id locale) const;
std::weak_ordering compare_ignore_case(utf8_string_view sv, locale_id locale) const;

bool eq_ignore_case(utf16_string_view sv, locale_id locale) const;
bool starts_with_ignore_case(utf16_string_view sv, locale_id locale) const;
bool ends_with_ignore_case(utf16_string_view sv, locale_id locale) const;
std::weak_ordering compare_ignore_case(utf16_string_view sv, locale_id locale) const;

The UTF-32 view type exposes the same ICU-gated locale-aware comparison overloads with utf32_string_view.

These overloads keep the same fold-only, non-normalizing semantics.
They do not materialize a temporary folded string.
The locale only affects ICU fold options. In practice, the meaningful difference is the Turkic special-I mode.
They are not noexcept: locale_id{nullptr} is rejected with std::invalid_argument, and ICU locale-handling failures surface as std::runtime_error.

View-Based Replacement Families¶

Synopsis¶

constexpr basic_utf8_string<> replace_at(size_type pos, size_type count, utf8_char other) const;
constexpr basic_utf8_string<> replace_at(size_type pos, size_type count, utf8_string_view other) const;
constexpr basic_utf8_string<> replace_at(size_type pos, utf8_char other) const;
constexpr basic_utf8_string<> replace_at(size_type pos, utf8_string_view other) const;

constexpr basic_utf8_string<> replace_all(utf8_char from, utf8_char to) const;
constexpr basic_utf8_string<> replace_all(utf8_char from, utf8_string_view to) const;
constexpr basic_utf8_string<> replace_all(utf8_string_view from, utf8_char to) const;
constexpr basic_utf8_string<> replace_all(utf8_string_view from, utf8_string_view to) const;
constexpr basic_utf8_string<> replace_all(std::span<const utf8_char> from, utf8_char to) const;
constexpr basic_utf8_string<> replace_all(std::span<const utf8_char> from, utf8_string_view to) const;
template <details::utf8_char_predicate Pred>
constexpr basic_utf8_string<> replace_all(Pred pred, utf8_char to) const;
template <details::utf8_char_predicate Pred>
constexpr basic_utf8_string<> replace_all(Pred pred, utf8_string_view to) const;

constexpr basic_utf8_string<> replace_n(size_type count, utf8_char from, utf8_char to) const;
constexpr basic_utf8_string<> replace_n(size_type count, utf8_char from, utf8_string_view to) const;
constexpr basic_utf8_string<> replace_n(size_type count, utf8_string_view from, utf8_char to) const;
constexpr basic_utf8_string<> replace_n(size_type count, utf8_string_view from, utf8_string_view to) const;
constexpr basic_utf8_string<> replace_n(size_type count, std::span<const utf8_char> from, utf8_char to) const;
constexpr basic_utf8_string<> replace_n(size_type count, std::span<const utf8_char> from, utf8_string_view to) const;
template <details::utf8_char_predicate Pred>
constexpr basic_utf8_string<> replace_n(size_type count, Pred pred, utf8_char to) const;
template <details::utf8_char_predicate Pred>
constexpr basic_utf8_string<> replace_n(size_type count, Pred pred, utf8_string_view to) const;

// Each family also has allocator-taking overloads.

The UTF-16 and UTF-32 view surfaces expose the same overload families with utf16_char / utf32_char, utf16_string_view / utf32_string_view, and basic_utf16_string / basic_utf32_string.

Behavior¶

Character and view overloads replace exact validated characters or exact validated substrings.
replace_at replaces one boundary-aligned substring, addressed by code-unit offset (bytes in UTF-8, 16-bit units in UTF-16, code points in UTF-32), and returns an owning string.
Span overloads treat the span as a character set.
Predicate overloads replace every character for which the predicate returns true.
replace_n stops after at most count replacements.

Overload differences¶

The examples below assume this view: const auto text = "😄🇷🇴✨"_utf8_sv;

Overload	Meaning	Example
`replace_at(pos, count, View to)`	replace one boundary-aligned substring by position	`text.replace_at(0, 4, "wow"_utf8_sv)`
`replace_at(pos, Char to)`	replace the single character starting at `pos`	`text.replace_at(0, u8"!"_u8c)`
`replace_all(Char from, Char to)`	replace an exact character everywhere	`text.replace_all("✨"_u8c, "🔥"_u8c)`
`replace_all(View from, View to)`	replace an exact validated substring everywhere	`text.replace_all("🇷🇴"_utf8_sv, "🎉"_utf8_sv)`
`replace_all(std::span<const Char> from, Char to)`	replace every character that belongs to a set	`text.replace_all(std::array{"😄"_u8c, "✨"_u8c}, "🎉"_u8c)`
`replace_all(Pred pred, View to)`	replace every character satisfying a predicate	`text.replace_all([](utf8_char ch) { return !ch.is_ascii(); }, "⭐"_utf8_sv)`

The same overload differences apply to replace_n(...), except that it stops after at most count replacements.

Again, the span overload is character-set based. std::array{"🇷"_u8c, "🇴"_u8c} replaces either regional-indicator character independently; it does not wait for the adjacent pair 🇷🇴.
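
The contrast between exact-substring matching and character-set matching can be sketched on plain UTF-32 strings. This is a hedged illustration, not the library's code: toy_replace_n_substring, toy_replace_n_set, and no_limit are hypothetical names, the set is passed as a std::u32string_view for brevity instead of a span of characters, and an empty from is treated as "nothing to replace", which is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

constexpr std::size_t no_limit = static_cast<std::size_t>(-1);

// Replaces up to `limit` exact occurrences of `from`, scanning left to right.
inline std::u32string toy_replace_n_substring(std::u32string_view text, std::size_t limit,
                                              std::u32string_view from, std::u32string_view to)
{
    std::u32string out;
    std::size_t pos = 0;
    std::size_t done = 0;
    while (done < limit && !from.empty()) {
        const std::size_t hit = text.find(from, pos);
        if (hit == std::u32string_view::npos) break;
        out.append(text.substr(pos, hit - pos)); // unchanged part before the match
        out.append(to);
        pos = hit + from.size();
        ++done;
    }
    out.append(text.substr(pos)); // tail after the last replacement
    return out;
}

// Replaces up to `limit` scalars that belong to `set`, each one independently.
inline std::u32string toy_replace_n_set(std::u32string_view text, std::size_t limit,
                                        std::u32string_view set, std::u32string_view to)
{
    std::u32string out;
    std::size_t done = 0;
    for (const char32_t c : text) {
        if (done < limit && set.find(c) != std::u32string_view::npos) {
            out.append(to);
            ++done;
        } else {
            out.push_back(c);
        }
    }
    return out;
}

inline void toy_replace_self_check()
{
    // U+1F604, then regional indicators R (U+1F1F7) and O (U+1F1F4), then U+2728.
    const std::u32string_view text = U"\U0001F604\U0001F1F7\U0001F1F4\u2728";
    const std::u32string_view pair = U"\U0001F1F7\U0001F1F4";
    const std::u32string_view party = U"\U0001F389";

    // Substring form: the adjacent pair is matched once, as a unit.
    assert(toy_replace_n_substring(text, no_limit, pair, party)
           == U"\U0001F604\U0001F389\u2728");
    // Set form: each regional indicator is replaced on its own.
    assert(toy_replace_n_set(text, no_limit, pair, party)
           == U"\U0001F604\U0001F389\U0001F389\u2728");
    // A limit of 1 stops after the first replacement.
    assert(toy_replace_n_set(text, 1, pair, party)
           == U"\U0001F604\U0001F389\U0001F1F4\u2728");
}
```

The set form produces two replacements for the flag because it never looks at neighbouring characters, while the substring form produces one.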

Inspiration¶

The overall shape is familiar to users of C++ std::basic_string::replace and Rust's str and String replacement APIs, while extending them with character-set and predicate overloads.

Return value¶

Returns a new owning string in the same encoding as the source view.

Complexity¶

Linear in the source length plus the size of the produced output.

Exceptions And `noexcept`¶

replace_at throws std::out_of_range when pos is out of range or the affected range is not a valid UTF substring. All replacement families may throw allocator or container exceptions.

Not noexcept.
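
For UTF-8, the range check behind the std::out_of_range case can be pictured as two tests: the range must lie inside the buffer, and both ends must fall on character boundaries. The sketch below works on raw bytes in a std::string_view and is an assumption-laden illustration rather than the library's implementation: is_utf8_boundary, toy_is_replaceable_range, and toy_replace_at are hypothetical names, the input is assumed to be valid UTF-8 already, and a count that overruns the buffer is rejected here rather than clamped, which the documentation above does not specify.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// In valid UTF-8, an offset is a character boundary when it is the end of the
// buffer or the byte there is not a continuation byte (10xxxxxx).
inline bool is_utf8_boundary(std::string_view bytes, std::size_t i)
{
    if (i > bytes.size()) return false;
    if (i == bytes.size()) return true;
    return (static_cast<unsigned char>(bytes[i]) & 0xC0) != 0x80;
}

// True when [pos, pos + count) is in range and boundary-aligned at both ends.
inline bool toy_is_replaceable_range(std::string_view bytes, std::size_t pos, std::size_t count)
{
    if (pos > bytes.size() || count > bytes.size() - pos) return false;
    return is_utf8_boundary(bytes, pos) && is_utf8_boundary(bytes, pos + count);
}

// Returns a new owning string with the range replaced by `to`.
inline std::string toy_replace_at(std::string_view bytes, std::size_t pos, std::size_t count,
                                  std::string_view to)
{
    if (!toy_is_replaceable_range(bytes, pos, count)) {
        throw std::out_of_range("toy_replace_at: range is out of bounds or splits a character");
    }
    std::string out;
    out.reserve(bytes.size() - count + to.size());
    out.append(bytes.substr(0, pos));
    out.append(to);
    out.append(bytes.substr(pos + count));
    return out;
}

inline void toy_replace_at_self_check()
{
    // "\xF0\x9F\x98\x84" is the four-byte UTF-8 encoding of U+1F604.
    const std::string_view text = "\xF0\x9F\x98\x84!";
    assert(toy_replace_at(text, 0, 4, "wow") == "wow!");
    assert(!toy_is_replaceable_range(text, 1, 3)); // starts inside the emoji
    assert(!toy_is_replaceable_range(text, 0, 2)); // ends inside the emoji
    assert(!toy_is_replaceable_range(text, 6, 0)); // pos past the end
}
```

The same idea carries over to UTF-16, where the boundary test would reject an offset that lands on a low surrogate, and becomes trivial in UTF-32, where every in-range offset is a boundary.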
