Text Operations¶
Search¶
Both borrowed and owning text types expose STL-style search APIs, including:
- contains
- find
- rfind
- find_first_of / find_first_not_of
- find_last_of / find_last_not_of
- starts_with
- ends_with
These work with text, characters, predicates, and span-based character sets depending on the overload.
    #include "unicode_ranges_all.hpp"
    #include <print>
    using namespace unicode_ranges;
    using namespace unicode_ranges::literals;
    int main()
    {
        constexpr auto view = "café café"_utf8_sv;
        auto owned = "été en été"_utf8_s;
        std::println("{}", view); // café café
        std::println("{}", view.find("é"_u8c)); // 3
        std::println("{}", view.rfind("é"_u8c)); // 9
        std::println("{}",
            owned.replace_all("é"_u8c, "e"_u8c)); // ete en ete
    }
Split and match families¶
String views expose a broad split/match surface:
- split, rsplit
- split_terminator, rsplit_terminator
- splitn, rsplitn
- split_inclusive
- split_trimmed
- matches, match_indices, rmatches, rmatch_indices
- split_once, rsplit_once
- split_whitespace, split_ascii_whitespace
Delimiter behavior is intentionally explicit. See the split sections in the string view reference.
    #include "unicode_ranges_all.hpp"
    #include <print>
    using namespace unicode_ranges;
    using namespace unicode_ranges::literals;
    int main()
    {
        constexpr auto line = " café | thé | apă "_utf8_sv;
        constexpr auto framed = "***café***"_utf8_sv;
        for (auto part : line.split_trimmed("|"_utf8_sv))
        {
            std::println("[{}]", part);
        }
        // [café]
        // [thé]
        // [apă]
        std::println("{}", framed.trim_matches("*"_u8c)); // café
    }
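To make the split-then-trim behavior concrete, here is a minimal standalone sketch that does not use unicode_ranges. The names trim_ascii_sketch and split_trimmed_sketch are hypothetical, and the sketch assumes that only ASCII whitespace is trimmed and that empty pieces are kept; the library's split_trimmed may differ on both points. Searching for the delimiter byte by byte is safe because a valid UTF-8 delimiter can only match at code point boundaries in valid UTF-8 text.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper (not a library API): strips ASCII whitespace
// from both ends of a view.
std::string_view trim_ascii_sketch(std::string_view s)
{
    constexpr std::string_view whitespace = " \t\n\r\f\v";
    const std::size_t first = s.find_first_not_of(whitespace);
    if (first == std::string_view::npos)
    {
        return {}; // the piece is empty or whitespace only
    }
    const std::size_t last = s.find_last_not_of(whitespace);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper (not a library API): splits on `delimiter` and trims
// each piece. Assumption: empty pieces are kept.
std::vector<std::string_view> split_trimmed_sketch(std::string_view text,
                                                   std::string_view delimiter)
{
    std::vector<std::string_view> parts;
    if (delimiter.empty())
    {
        // Avoid an endless loop: an empty delimiter matches everywhere.
        parts.push_back(trim_ascii_sketch(text));
        return parts;
    }
    std::size_t start = 0;
    for (;;)
    {
        const std::size_t hit = text.find(delimiter, start);
        if (hit == std::string_view::npos)
        {
            parts.push_back(trim_ascii_sketch(text.substr(start)));
            return parts;
        }
        parts.push_back(trim_ascii_sketch(text.substr(start, hit - start)));
        start = hit + delimiter.size();
    }
}
```

Applied to the line from the example above with "|" as the delimiter, the sketch returns the three pieces café, thé, and apă.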
The split APIs also compose cleanly with standard range pipelines:
    #include "unicode_ranges_all.hpp"
    #include <print>
    #include <ranges>
    using namespace unicode_ranges;
    using namespace unicode_ranges::literals;
    int main()
    {
        auto phrase = "Be the change you want to see in the world"_utf8_s;
        phrase = phrase.split_ascii_whitespace()
            | std::views::transform(&utf8_string_view::chars)
            | std::views::join_with("_"_u8c)
            | std::ranges::to<utf8_string>();
        std::println("{}", phrase); // Be_the_change_you_want_to_see_in_the_world
    }
Trim and prefix/suffix operations¶
Available operations include:
- strip_prefix, strip_suffix, strip_circumfix
- trim_prefix, trim_suffix
- trim_start_matches, trim_end_matches, trim_matches
- trim_start, trim_end, trim
- trim_ascii_start, trim_ascii_end, trim_ascii
The matcher-based trim APIs accept characters, text, predicates, and character sets.
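As a rough illustration of the predicate form, the following standalone sketch trims matching bytes from both ends of a std::string_view. The name trim_matches_sketch is hypothetical, and the sketch is deliberately narrower than the library: it tests single bytes, so it is only suitable for ASCII matchers, whereas the library's matchers operate on Unicode characters.

```cpp
#include <string_view>

// Hypothetical helper (not a library API): removes leading and trailing
// bytes for which `matches` returns true. Assumption: the predicate only
// accepts ASCII bytes, so multi-byte UTF-8 sequences are never cut apart.
template <class Pred>
std::string_view trim_matches_sketch(std::string_view s, Pred matches)
{
    while (!s.empty() && matches(s.front()))
    {
        s.remove_prefix(1);
    }
    while (!s.empty() && matches(s.back()))
    {
        s.remove_suffix(1);
    }
    return s;
}
```

Calling it on "***café***" with a predicate that matches '*' leaves café, which mirrors the trim_matches call in the split example.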
Boundary and access APIs¶
Important boundary-aware APIs include:
- is_char_boundary
- ceil_char_boundary
- floor_char_boundary
- is_grapheme_boundary
- ceil_grapheme_boundary
- floor_grapheme_boundary
- char_at
- grapheme_at
- grapheme_substr
- substr
These are essential whenever offsets are expressed in code units but user-visible semantics depend on characters or graphemes.
    #include "unicode_ranges_all.hpp"
    #include <print>
    using namespace unicode_ranges;
    using namespace unicode_ranges::literals;
    int main()
    {
        // As the chars() output shows, "é" here is the decomposed pair
        // U+0065 U+0301, which occupies 3 bytes in UTF-8.
        constexpr auto text = "é🇷🇴!"_utf8_sv;
        std::println("{}", text.is_char_boundary(1)); // true
        std::println("{}", text.is_grapheme_boundary(1)); // false
        std::println("{}", text.ceil_grapheme_boundary(7)); // 11
        std::println("{}", text.floor_grapheme_boundary(7)); // 3
        std::println("{}", text.chars()); // [e, ́, 🇷, 🇴, !]
        std::println("{::s}", text.graphemes()); // [é, 🇷🇴, !]
    }
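The character-boundary checks can be understood from the UTF-8 encoding alone: a byte offset is a character boundary when it is 0, equal to the length, or points at a byte that is not a continuation byte (bit pattern 10xxxxxx). The following standalone sketch shows that idea with hypothetical helper names; it is not the library's implementation and assumes the input is valid UTF-8. Grapheme boundaries need the Unicode segmentation rules and are not covered here.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: true if `pos` does not fall inside a code point.
bool is_char_boundary_sketch(std::string_view utf8, std::size_t pos)
{
    if (pos == 0 || pos == utf8.size())
    {
        return true;
    }
    if (pos > utf8.size())
    {
        return false;
    }
    // Continuation bytes look like 0b10xxxxxx; anything else starts a code point.
    return (static_cast<unsigned char>(utf8[pos]) & 0xC0) != 0x80;
}

// Hypothetical helper: nearest boundary at or before `pos`.
std::size_t floor_char_boundary_sketch(std::string_view utf8, std::size_t pos)
{
    if (pos >= utf8.size())
    {
        return utf8.size();
    }
    while (!is_char_boundary_sketch(utf8, pos))
    {
        --pos; // terminates because offset 0 is always a boundary
    }
    return pos;
}

// Hypothetical helper: nearest boundary at or after `pos`.
std::size_t ceil_char_boundary_sketch(std::string_view utf8, std::size_t pos)
{
    if (pos >= utf8.size())
    {
        return utf8.size();
    }
    while (!is_char_boundary_sketch(utf8, pos))
    {
        ++pos; // terminates because utf8.size() is always a boundary
    }
    return pos;
}
```

For the 12-byte text from the example above, offset 1 is a character boundary (the combining accent starts there), while offset 5 lies inside the first regional indicator, so the floor is 3 and the ceiling is 7.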
Reverse and replace on owning strings¶
Owning strings add mutating APIs such as:
- insert
- pop_back
- erase
- reverse()
- reverse(pos, count = npos)
- reverse_graphemes()
- reverse_graphemes(pos, count = npos)
- replace(...)
- replace_all(...)
- replace_n(...)
Case-transformation APIs also support partial overloads on owning strings:
- to_ascii_lowercase(pos, count)
- to_ascii_uppercase(pos, count)
- to_lowercase(pos, count)
- to_uppercase(pos, count)
    #include "unicode_ranges_all.hpp"
    #include <print>
    using namespace unicode_ranges;
    using namespace unicode_ranges::literals;
    int main()
    {
        // "é" is the decomposed pair U+0065 U+0301, so reversing by character
        // separates the accent from its base letter.
        auto chars = "é🇷🇴!"_utf8_s;
        auto graphemes = chars;
        chars.reverse();
        graphemes.reverse_graphemes();
        std::println("{}", chars); // !🇴🇷́e
        std::println("{}", graphemes); // !🇷🇴é
    }
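The difference between the two reversals follows from the unit being reversed. As a standalone sketch of character-wise reversal (a hypothetical helper, not the library's implementation, assuming valid UTF-8 input), the function below walks the string backwards one code point at a time and copies each code point's bytes in their original order. Combining marks end up in front of their base letters, which is the effect that reverse_graphemes() avoids.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: reverses a valid UTF-8 string code point by code point.
// The bytes inside each code point keep their order, so the result is still
// valid UTF-8, but grapheme clusters are not preserved.
std::string reverse_code_points_sketch(std::string_view utf8)
{
    std::string out;
    out.reserve(utf8.size());
    std::size_t end = utf8.size();
    while (end > 0)
    {
        std::size_t start = end - 1;
        // Step back over continuation bytes (0b10xxxxxx) to the lead byte.
        while (start > 0
               && (static_cast<unsigned char>(utf8[start]) & 0xC0) == 0x80)
        {
            --start;
        }
        out.append(utf8.data() + start, end - start);
        end = start;
    }
    return out;
}
```

Reversing the decomposed sequence e, U+0301, ! this way yields !, U+0301, e: the accent now follows the exclamation mark instead of the letter.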
Return-unit semantics¶
The most important rules to keep in mind:
- UTF-8 view/string search offsets are byte offsets unless the API name says otherwise.
- UTF-16 view/string search offsets are code-unit offsets unless the API name says otherwise.
Character- and grapheme-oriented APIs are named explicitly and should be preferred when the distinction matters.
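To see why the unit matters, the following standalone sketch converts a UTF-8 byte offset into a character (code point) index by counting the bytes that start a code point. The helper name code_point_index_sketch is hypothetical and not part of the library; the sketch assumes valid UTF-8 input.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: number of code points that start before `byte_offset`.
// A byte starts a code point unless it is a continuation byte (0b10xxxxxx).
std::size_t code_point_index_sketch(std::string_view utf8, std::size_t byte_offset)
{
    std::size_t index = 0;
    for (std::size_t i = 0; i < byte_offset && i < utf8.size(); ++i)
    {
        if ((static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80)
        {
            ++index;
        }
    }
    return index;
}
```

For "café café" with precomposed é, the byte offset 9 reported by rfind in the search example corresponds to character index 8, because the first é occupies two bytes.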
Grapheme-aware operations¶
Default Unicode grapheme segmentation is supported through:
- graphemes()
- grapheme_indices()
- grapheme_count()
- grapheme-aware searching and substring APIs
These use default Unicode grapheme-cluster rules rather than locale-specific tailoring.