Views, Literals, and Formatting¶
This page covers the helper view types in unicode_ranges::views, the compile-time validated literals in unicode_ranges::literals, and the shared formatting model used by the library-defined UTF-8, UTF-16, and UTF-32 types.
views::utf8_view, views::utf16_view, And views::utf32_view¶
Synopsis¶
class utf8_view : public std::ranges::view_interface<utf8_view> {
public:
    constexpr std::u8string_view base() const noexcept;
    constexpr iterator begin() const noexcept;
    constexpr std::default_sentinel_t end() const noexcept;
    constexpr std::size_t reserve_hint() const noexcept;
};

class utf16_view : public std::ranges::view_interface<utf16_view> {
public:
    constexpr std::u16string_view base() const noexcept;
    constexpr iterator begin() const noexcept;
    constexpr std::default_sentinel_t end() const noexcept;
    constexpr std::size_t reserve_hint() const noexcept;
};

class utf32_view : public std::ranges::view_interface<utf32_view> {
public:
    constexpr std::u32string_view base() const noexcept;
    constexpr iterator begin() const noexcept;
    constexpr iterator end() const noexcept;
    constexpr std::size_t size() const noexcept;
    constexpr std::size_t reserve_hint() const noexcept;
};
Behavior¶
- These views adapt already-validated code-unit sequences into ranges of utf8_char, utf16_char, or utf32_char.
- The views inherit std::ranges::view_interface and model lazy borrowed forward views.
- They are normally obtained from validated text via chars(), for example "😄🇷🇴✨"_utf8_sv.chars(), u"😄🇷🇴✨"_utf16_sv.chars(), or U"😄🇷🇴✨"_utf32_sv.chars().
- The views are cheap to copy.
- reserve_hint() reports the number of source code units, which is a safe upper bound for the number of yielded characters.
- views::utf32_view is a borrowed sized common random-access view because UTF-32 is fixed-width.
Return value¶
Construction returns the helper view directly.
Complexity¶
- Constructing the view is constant.
- Iterating the full view is linear in the source length.
Exceptions¶
None.
noexcept¶
All listed members are noexcept.
Example¶
#include "unicode_ranges_all.hpp"
#include <array>
#include <print>
#include <span>
#include <string_view>
using namespace unicode_ranges;
int main()
{
constexpr std::array<char, 3> broken_utf8{ 'A', static_cast<char>(0xFF), 'B' };
constexpr std::array<char16_t, 3> broken_utf16{ u'A', static_cast<char16_t>(0xD800u), u'B' };
const auto utf8_lossy = std::span{ broken_utf8 } | views::lossy_utf8;
const auto utf16_lossy = std::span{ broken_utf16 } | views::lossy_utf16;
std::println("{}", utf8_lossy); // [A, οΏ½, B]
std::println("{}", utf16_lossy); // [A, οΏ½, B]
}
views::reversed_utf8_view, views::reversed_utf16_view, And views::reversed_utf32_view¶
Synopsis¶
class reversed_utf8_view : public std::ranges::view_interface<reversed_utf8_view> {
public:
    constexpr iterator begin() const noexcept;
    constexpr std::default_sentinel_t end() const noexcept;
    constexpr std::size_t reserve_hint() const noexcept;
};

class reversed_utf16_view : public std::ranges::view_interface<reversed_utf16_view> {
public:
    constexpr iterator begin() const noexcept;
    constexpr std::default_sentinel_t end() const noexcept;
    constexpr std::size_t reserve_hint() const noexcept;
};

using reversed_utf32_view = std::ranges::reverse_view<utf32_view>;
Behavior¶
- These helper views inherit std::ranges::view_interface.
- They are lazy borrowed forward views over the same underlying storage.
- They are normally obtained from validated text via reversed_chars().
- They iterate validated characters from the end without first materializing a reversed string.
- views::reversed_utf32_view is just std::ranges::reverse_view<views::utf32_view>, so it is also sized, common, and random-access.
Complexity¶
- Constructing the view is constant.
- Iterating the full view is linear in the source length.
Exceptions¶
None.
noexcept¶
All listed members are noexcept.
Example¶
#include "unicode_ranges_all.hpp"
#include <print>

using namespace unicode_ranges;
using namespace unicode_ranges::literals;

int main()
{
    constexpr auto utf8_text = "😄🇷🇴✨"_utf8_sv;
    constexpr auto utf16_text = u"😄🇷🇴✨"_utf16_sv;
    constexpr auto utf32_text = U"😄🇷🇴✨"_utf32_sv;

    const auto utf8_chars = utf8_text.chars();
    const auto utf8_reversed = utf8_text.reversed_chars();
    const auto utf8_graphemes = utf8_text.graphemes();

    const auto utf16_chars = utf16_text.chars();
    const auto utf16_reversed = utf16_text.reversed_chars();
    const auto utf16_graphemes = utf16_text.graphemes();

    const auto utf32_chars = utf32_text.chars();
    const auto utf32_reversed = utf32_text.reversed_chars();
    const auto utf32_graphemes = utf32_text.graphemes();

    std::println("{}", utf8_chars);         // [😄, 🇷, 🇴, ✨]
    std::println("{}", utf8_reversed);      // [✨, 🇴, 🇷, 😄]
    std::println("{::s}", utf8_graphemes);  // [😄, 🇷🇴, ✨]

    std::println("{}", utf16_chars);        // [😄, 🇷, 🇴, ✨]
    std::println("{}", utf16_reversed);     // [✨, 🇴, 🇷, 😄]
    std::println("{::s}", utf16_graphemes); // [😄, 🇷🇴, ✨]

    std::println("{}", utf32_chars);        // [😄, 🇷, 🇴, ✨]
    std::println("{}", utf32_reversed);     // [✨, 🇴, 🇷, 😄]
    std::println("{::s}", utf32_graphemes); // [😄, 🇷🇴, ✨]

    std::println("{}", "✨"_u8c);                // ✨
    std::println("{}", U"🇷🇴"_grapheme_utf32); // 🇷🇴
}
views::grapheme_cluster_view<CharT>¶
Synopsis¶
template <typename CharT>
class grapheme_cluster_view : public std::ranges::view_interface<grapheme_cluster_view<CharT>> {
public:
    using cluster_type = std::conditional_t<
        std::same_as<CharT, char8_t>,
        utf8_string_view,
        std::conditional_t<std::same_as<CharT, char16_t>, utf16_string_view, utf32_string_view>>;

    constexpr iterator begin() const noexcept;
    constexpr std::default_sentinel_t end() const noexcept;
    constexpr std::size_t reserve_hint() const noexcept;
};
Behavior¶
- grapheme_cluster_view<char8_t> yields utf8_string_view grapheme clusters.
- grapheme_cluster_view<char16_t> yields utf16_string_view grapheme clusters.
- grapheme_cluster_view<char32_t> yields utf32_string_view grapheme clusters.
- The view inherits std::ranges::view_interface.
- It is normally obtained from validated text via graphemes().
- It is a lazy borrowed forward view and computes grapheme boundaries on demand during iteration.
Complexity¶
- Constructing the view is constant.
- Iterating the full view is linear in the number of code units plus the segmentation work required by Unicode grapheme rules.
Exceptions¶
None.
noexcept¶
All listed members are noexcept.
views::lossy_utf8_view, views::lossy_utf16_view, And views::lossy_utf32_view¶
Synopsis¶
template <typename CharT>
class lossy_utf8_view : public std::ranges::view_interface<lossy_utf8_view<CharT>> {
public:
    lossy_utf8_view() = default;
    constexpr lossy_utf8_view(std::basic_string_view<CharT> base) noexcept;
    constexpr iterator begin() const noexcept;
    constexpr std::default_sentinel_t end() const noexcept;
    constexpr std::size_t reserve_hint() const noexcept;
};

template <typename CharT>
class lossy_utf16_view : public std::ranges::view_interface<lossy_utf16_view<CharT>> {
public:
    lossy_utf16_view() = default;
    constexpr lossy_utf16_view(std::basic_string_view<CharT> base) noexcept;
    constexpr iterator begin() const noexcept;
    constexpr std::default_sentinel_t end() const noexcept;
    constexpr std::size_t reserve_hint() const noexcept;
};

template <typename CharT>
class lossy_utf32_view : public std::ranges::view_interface<lossy_utf32_view<CharT>> {
public:
    lossy_utf32_view() = default;
    constexpr lossy_utf32_view(std::basic_string_view<CharT> base) noexcept;
    constexpr iterator begin() const noexcept;
    constexpr iterator end() const noexcept;
    constexpr std::size_t size() const noexcept;
    constexpr std::size_t reserve_hint() const noexcept;
};

struct lossy_utf8_fn : std::ranges::range_adaptor_closure<lossy_utf8_fn> {
    template<lossy_utf8_viewable_range R>
    constexpr auto operator()(R&& range) const noexcept;
};

struct lossy_utf16_fn : std::ranges::range_adaptor_closure<lossy_utf16_fn> {
    template<lossy_utf16_viewable_range R>
    constexpr auto operator()(R&& range) const noexcept;
};

inline constexpr lossy_utf8_fn lossy_utf8{};
inline constexpr lossy_utf16_fn lossy_utf16{};
inline constexpr lossy_utf32_fn lossy_utf32{};
Behavior¶
- Lossy views adapt possibly-invalid UTF input into a character range.
- Invalid input is replaced with replacement_character.
- lossy_utf8_view replaces each malformed UTF-8 subsequence with a single replacement character, then resumes decoding at the next byte that was not part of that malformed subsequence.
- Valid subsequences are yielded unchanged.
- The view types inherit std::ranges::view_interface and behave as lazy borrowed forward views.
- The closure objects are std::ranges::range_adaptor_closure-style adapters, which makes the lossy views pipe-friendly.
- lossy_utf32_view<CharT> is a borrowed sized common random-access view because each input element yields exactly one output scalar.
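The replace-and-resume behavior described for lossy_utf8_view can be sketched as a single decoding step that reports both the yielded scalar and the number of bytes consumed. The code below uses the standard library only, and every name in it is hypothetical. It implements one simplified policy (a lead byte plus the continuation bytes read before the error form one malformed subsequence). The library's exact grouping of malformed bytes may differ.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical names throughout; this is not the unicode_ranges implementation.
struct lossy_step {
    char32_t scalar;     // decoded scalar, or U+FFFD for malformed input
    std::size_t length;  // number of input bytes consumed (always >= 1)
};

// Decodes one step from the front of a non-empty byte sequence.
constexpr lossy_step decode_lossy_step(std::string_view rest) noexcept
{
    constexpr char32_t replacement = 0xFFFD;
    const unsigned char lead = static_cast<unsigned char>(rest[0]);
    if (lead < 0x80) {
        return { lead, 1 };  // ASCII
    }

    std::size_t needed = 0;  // continuation bytes expected after the lead byte
    char32_t value = 0;
    char32_t minimum = 0;    // smallest scalar that legitimately needs this length
    if ((lead & 0xE0) == 0xC0)      { needed = 1; value = lead & 0x1F; minimum = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { needed = 2; value = lead & 0x0F; minimum = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { needed = 3; value = lead & 0x07; minimum = 0x10000; }
    else { return { replacement, 1 }; }  // stray continuation or invalid lead byte

    std::size_t consumed = 1;
    for (std::size_t i = 1; i <= needed; ++i) {
        if (i >= rest.size()) { return { replacement, consumed }; }  // truncated
        const unsigned char unit = static_cast<unsigned char>(rest[i]);
        if ((unit & 0xC0) != 0x80) { return { replacement, consumed }; }
        value = (value << 6) | (unit & 0x3F);
        ++consumed;
    }

    const bool surrogate = value >= 0xD800 && value <= 0xDFFF;
    if (value < minimum || value > 0x10FFFF || surrogate) {
        return { replacement, consumed };  // overlong, out of range, or surrogate
    }
    return { value, consumed };
}

// Counts the scalars a lossy traversal would yield.
constexpr std::size_t count_lossy(std::string_view bytes) noexcept
{
    std::size_t count = 0;
    while (!bytes.empty()) {
        bytes.remove_prefix(decode_lossy_step(bytes).length);
        ++count;
    }
    return count;
}

// 'A', an invalid byte, 'B': the invalid byte becomes one U+FFFD.
static_assert(decode_lossy_step("\xFF" "B").scalar == 0xFFFD);
static_assert(decode_lossy_step("\xFF" "B").length == 1);
static_assert(count_lossy("A\xFF" "B") == 3);
```

Because every step consumes at least one byte, the traversal always terminates and stays linear in the source length.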
Complexity¶
Linear in the source length.
Exceptions¶
None.
noexcept¶
All listed members are noexcept.
Compile-Time Validated Literals¶
Synopsis¶
using namespace unicode_ranges::literals;
consteval utf8_char operator ""_u8c();
consteval utf16_char operator ""_u16c();
consteval utf32_char operator ""_u32c();
consteval utf8_string_view operator ""_utf8_sv();
consteval utf16_string_view operator ""_utf16_sv();
consteval utf32_string_view operator ""_utf32_sv();
constexpr utf8_string operator ""_utf8_s();
constexpr utf16_string operator ""_utf16_s();
constexpr utf32_string operator ""_utf32_s();
consteval utf8_string_view operator ""_grapheme_utf8();
consteval utf16_string_view operator ""_grapheme_utf16();
consteval utf32_string_view operator ""_grapheme_utf32();
Behavior¶
- _u8c, _u16c, and _u32c require exactly one valid character in the corresponding encoding.
- _utf8_sv, _utf16_sv, and _utf32_sv require fully valid UTF literals.
- _utf8_s, _utf16_s, and _utf32_s build owning strings from validated literals.
- _grapheme_utf8, _grapheme_utf16, and _grapheme_utf32 require exactly one grapheme cluster.
Return value¶
Returns the corresponding validated character, view, or owning string.
Exceptions¶
Invalid literal contents are rejected during constant evaluation.
Example¶
#include "unicode_ranges_all.hpp"
#include <print>

using namespace unicode_ranges;
using namespace unicode_ranges::literals;

int main()
{
    constexpr auto text = u8"\U0001F604\U0001F1F7\U0001F1F4\u2728"_utf8_sv;
    constexpr auto sparkle = u8"\u2728"_u8c;
    constexpr auto flag = u8"\U0001F1F7\U0001F1F4"_grapheme_utf8;
    constexpr auto rocket = U"\U0001F680"_u32c;
    constexpr auto text32 = U"\U0001F604\U0001F1F7\U0001F1F4\u2728"_utf32_sv;
    auto owned = U"\U0001F604\u2728"_utf32_s;

    std::println("{}", text);                    // 😄🇷🇴✨
    std::println("{}", text.find(sparkle));      // 12
    std::println("{}", text.chars());            // [😄, 🇷, 🇴, ✨]
    std::println("{::s}", text.graphemes());     // [😄, 🇷🇴, ✨]
    std::println("{}", flag);                    // 🇷🇴
    std::println("{}", rocket);                  // 🚀
    std::println("{}", text32);                  // 😄🇷🇴✨
    std::println("{}", owned.replace_all(U"\u2728"_u32c, U"\U0001F525"_u32c)); // 😄🔥
}
Optional ICU Locale Tokens¶
Synopsis¶
#if UTF8_RANGES_HAS_ICU
struct locale_id
{
    const char* name = nullptr;
};

using namespace unicode_ranges::literals;

consteval locale_id operator ""_locale(const char* name, std::size_t size);
bool is_available_locale(locale_id locale) noexcept;
#endif
Behavior¶
- This API family exists only when the library is built with UTF8_RANGES_ENABLE_ICU=1.
- locale_id is a non-owning null-terminated locale token for ICU-backed casing operations.
- _locale validates string literals at compile time and rejects embedded NUL bytes.
- Raw locale_id{ ... } values must point to storage that stays alive for the duration of the call.
- is_available_locale(...) is a non-throwing probe against the current ICU data set.
- Locale-aware casing overloads may still succeed for locales that are not explicitly available, because ICU may canonicalize or fall back to a more general locale.
Return value¶
- _locale returns a locale_id.
- is_available_locale(...) returns true when the current ICU data set explicitly exposes the locale and false otherwise.
Complexity¶
- _locale runs only during constant evaluation.
- is_available_locale(...) is linear in the number of locales exposed by the current ICU data set.
Exceptions¶
- _locale rejects embedded NUL bytes during constant evaluation.
- is_available_locale(...) does not throw.
noexcept¶
- is_available_locale(...) is noexcept.
Example¶
#include "unicode_ranges_all.hpp"
#include <print>

using namespace unicode_ranges;
using namespace unicode_ranges::literals;

int main()
{
#if UTF8_RANGES_HAS_ICU
    std::println("{}", u8"I\u0130"_utf8_sv.case_fold());                      // ii̇
    std::println("{}", u8"I\u0130"_utf8_sv.case_fold("tr"_locale));           // ıi
    std::println("{}", u8"I\u0130"_utf8_sv.to_lowercase("tr"_locale));        // ıi
    std::println("{}", u8"i\u0131"_utf8_sv.to_uppercase("tr"_locale));        // İI
    std::println("{}", u8"istanbul izmir"_utf8_sv.to_titlecase("tr"_locale)); // İstanbul İzmir
    std::println("{}", U"istanbul izmir"_utf32_sv.to_titlecase("tr"_locale)); // İstanbul İzmir
    std::println("{}", u8"I"_utf8_sv.eq_ignore_case(u8"\u0131"_utf8_sv, "tr"_locale)); // true
    std::println("{}", is_available_locale("tr"_locale));                     // true
#else
    std::println("Enable ICU-backed locale casing to use _locale.");
#endif
}
Formatting And Printing¶
Synopsis¶
template<> struct std::formatter<utf8_char, char>;
template<> struct std::formatter<utf8_char, wchar_t>;
template<> struct std::formatter<utf16_char, char>;
template<> struct std::formatter<utf16_char, wchar_t>;
template<> struct std::formatter<utf32_char, char>;
template<> struct std::formatter<utf32_char, wchar_t>;
template<> struct std::formatter<utf8_string_view, char>;
template<> struct std::formatter<utf16_string_view, char>;
template<> struct std::formatter<utf32_string_view, char>;
template<typename Allocator> struct std::formatter<basic_utf8_string<Allocator>, char>;
template<typename Allocator> struct std::formatter<basic_utf16_string<Allocator>, char>;
template<typename Allocator> struct std::formatter<basic_utf32_string<Allocator>, char>;
Behavior¶
- Characters format as text by default.
- Character std::formatter specializations also support numeric presentations d, b, B, o, x, and X, which print as_scalar().
- String and string-view formatters print textual content.
- On standard libraries with C++23 range-format support for custom views, range formatting composes with the library formatters, which is why examples such as std::println("{}", text.chars()) can work directly.
Two practical printing rules used throughout this documentation:
- std::println("{}", text.chars()) prints a range of validated characters.
- std::println("{::s}", text.graphemes()) applies string formatting to each grapheme cluster, which is usually the cleanest textual representation.
Compatibility note:
- the library-defined character, string-view, and owning-string types have direct formatter support
- direct formatting of helper views such as utf8_view and grapheme_cluster_view<char8_t> depends on the standard library's implementation of C++23 range formatting
- this currently works with the MSVC STL and with libc++
- libstdc++ 14 does not currently format these custom helper views directly, so the GCC docs-example CI job is allowed to fail without blocking the overall workflow
Complexity¶
Linear in the amount of formatted text.
Exceptions¶
Formatter parsing may throw std::format_error for unsupported presentation types.
noexcept¶
Not noexcept.
Borrowed-Range Status¶
Synopsis¶
template <> inline constexpr bool std::ranges::enable_borrowed_range<views::utf8_view> = true;
template <> inline constexpr bool std::ranges::enable_borrowed_range<views::reversed_utf8_view> = true;
template <typename CharT> inline constexpr bool std::ranges::enable_borrowed_range<views::lossy_utf8_view<CharT>> = true;
template <> inline constexpr bool std::ranges::enable_borrowed_range<views::utf16_view> = true;
template <> inline constexpr bool std::ranges::enable_borrowed_range<views::reversed_utf16_view> = true;
template <typename CharT> inline constexpr bool std::ranges::enable_borrowed_range<views::lossy_utf16_view<CharT>> = true;
template <typename CharT> inline constexpr bool std::ranges::enable_borrowed_range<views::grapheme_cluster_view<CharT>> = true;
Behavior¶
These specializations tell the ranges library that the helper views may safely borrow from the underlying storage instead of forcing owning semantics. Rvalue owning strings return separate move-only owning view wrappers; those wrappers intentionally own the source string and are not borrowed ranges.
In other words:
- the helper adapters on this page are real range views, implemented as std::ranges::view_interface subclasses
- they are lazy rather than eagerly materialized
- and they remain borrowed ranges, so iterators and subviews may refer back to the original storage