The simple question again: given a std::string, determine which of its characters are digits, symbols, whitespace, etc., with respect to the user's language and regional settings (locale).
I managed to split the string into individual characters using the Boost.Locale boundary analysis tool:
std::string text = u8"生きるか死ぬか";
boost::locale::boundary::segment_index<std::string::const_iterator> characters(
    boost::locale::boundary::character,
    text.begin(), text.end(),
    boost::locale::generator()("ja_JP.UTF-8"));
for (const auto& ch : characters) {
    // each 'ch' is a single character in the Japanese language
}
However, from there I don't see any way to determine whether ch is a digit, a symbol, or anything else.
There are Boost string classification algorithms, but they don't seem to work with whatever *segment_index::iterator is.
Nor can I apply std::isalpha with a std::locale, because I'm unsure whether it is possible to convert the Boost segment into a char or wchar_t.
Is there any neat way to classify the characters?
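To make the conversion step concrete, here is a minimal sketch using only the standard library. It assumes the text of each segment is available as a UTF-8 std::string (I believe a Boost segment exposes this, e.g. via ch.str(), but that is not verified here). The names decode_utf8, classify, classify_segment and CharClass are hypothetical helpers, not part of Boost or the standard library. The sketch looks only at the first code point of a segment, which is a simplification because a single user-perceived character can span several code points.

```cpp
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical helper: decode UTF-8 into Unicode code points.
// Assumes well-formed input; no validation is performed.
std::u32string decode_utf8(const std::string& utf8) {
    std::u32string out;
    for (std::size_t i = 0; i < utf8.size();) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::size_t len = 1;
        char32_t cp = lead;
        if (lead >= 0xF0)      { len = 4; cp = lead & 0x07; }
        else if (lead >= 0xE0) { len = 3; cp = lead & 0x0F; }
        else if (lead >= 0xC0) { len = 2; cp = lead & 0x1F; }
        // Fold in the continuation bytes, six payload bits each.
        for (std::size_t k = 1; k < len && i + k < utf8.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

enum class CharClass { Digit, Alpha, Space, Punct, Other };

// Classify one code point with the std::locale overloads from <locale>.
// Assumption: wchar_t can hold the code point. That is the case where
// wchar_t is 32 bits wide (e.g. Linux); where it is 16 bits wide
// (e.g. Windows), code points above U+FFFF do not fit.
CharClass classify(char32_t cp, const std::locale& loc) {
    const wchar_t wc = static_cast<wchar_t>(cp);
    if (std::isdigit(wc, loc)) return CharClass::Digit;
    if (std::isalpha(wc, loc)) return CharClass::Alpha;
    if (std::isspace(wc, loc)) return CharClass::Space;
    if (std::ispunct(wc, loc)) return CharClass::Punct;
    return CharClass::Other;
}

// Classify a segment by its first code point (a simplification).
CharClass classify_segment(const std::string& segment, const std::locale& loc) {
    const std::u32string cps = decode_utf8(segment);
    return cps.empty() ? CharClass::Other : classify(cps.front(), loc);
}
```

Inside the loop above, this would be called as something like classify_segment(ch.str(), loc). Which locale object gives useful answers for non-ASCII text depends on the platform and the installed locales, and has not been checked here for Japanese.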