<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
   <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
   <meta name="AUTHOR" content="pme@gcc.gnu.org (Phil Edwards)">
   <meta name="KEYWORDS" content="HOWTO, libstdc++, GCC, g++, libg++, STL">
   <meta name="DESCRIPTION" content="HOWTO for the libstdc++ chapter 22.">
   <meta name="GENERATOR" content="vi and eight fingers">
   <title>libstdc++-v3 HOWTO: Chapter 22</title>
<link rel="StyleSheet" href="../lib3styles.css">
</head>
<body>

<h1 class="centered"><a name="top">Chapter 22: Localization</a></h1>

<p>Chapter 22 deals with the C++ localization facilities.
</p>
<!-- I wanted to write that sentence in something requiring an exotic font,
     like Cyrillic or Kanji.  Probably more work than such cuteness is worth,
     but I still think it'd be funny.
 -->


<!-- ####################################################### -->
<hr>
<h1>Contents</h1>
<ul>
   <li><a href="#1">class locale</a>
   <li><a href="#2">class codecvt</a>
   <li><a href="#3">class ctype</a>
   <li><a href="#4">class messages</a>
   <li><a href="#5">Bjarne Stroustrup on Locales</a>
   <li><a href="#6">Nathan Myers on Locales</a>
   <li><a href="#7">Correct Transformations</a>
</ul>

<!-- ####################################################### -->

<hr>
<h2><a name="1">class locale</a></h2>
<p>Notes made during the implementation of locales can be found
   <a href="locale.html">here</a>.
</p>

<hr>
<h2><a name="2">class codecvt</a></h2>
<p>Notes made during the implementation of codecvt can be found
   <a href="codecvt.html">here</a>.
</p>

<p>The following is the abstract from the implementation notes:
</p>
<blockquote>
   The standard class codecvt attempts to address conversions between
   different character encoding schemes. In particular, the standard
   attempts to detail conversions between the implementation-defined
   wide characters (hereafter referred to as wchar_t) and the standard
   type char that is so beloved in classic "C" (which can
   now be referred to as narrow characters.)  This document attempts
   to describe how the GNU libstdc++-v3 implementation deals with the
   conversion between wide and narrow characters, and also presents a
   framework for dealing with the huge number of other encodings that
   iconv can convert, including Unicode and UTF8. Design issues and
   requirements are addressed, and examples of correct usage for both
   the required specializations for wide and narrow characters and the
   implementation-provided extended functionality are given.
</blockquote>
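<p>As a rough illustration of the wide/narrow conversion the abstract
   describes, the sketch below widens a narrow string through the standard
   <code>codecvt&lt;wchar_t, char, mbstate_t&gt;</code> facet. The helper
   name <code>widen_with_codecvt</code> is hypothetical and is not taken
   from the implementation notes; it only assumes that the locale passed in
   can decode the bytes it is given, and it treats every result other than
   <code>ok</code> as a failure.
</p>

```cpp
#include <cwchar>     // std::mbstate_t
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of libstdc++): widen a narrow string using
// the codecvt<wchar_t, char, mbstate_t> facet of the given locale.
std::wstring widen_with_codecvt(const std::string& narrow,
                                const std::locale& loc)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const facet_type& cvt = std::use_facet<facet_type>(loc);

    if (narrow.empty())
        return std::wstring();

    // Every wide character consumes at least one input char, so one
    // wchar_t per input char is enough room for the output.
    std::wstring wide(narrow.size(), L'\0');
    std::mbstate_t state = std::mbstate_t();
    const char* from_next = 0;
    wchar_t* to_next = 0;

    const std::codecvt_base::result r =
        cvt.in(state,
               narrow.data(), narrow.data() + narrow.size(), from_next,
               &wide[0], &wide[0] + wide.size(), to_next);

    // This sketch gives up on partial, error, and noconv alike.
    if (r != std::codecvt_base::ok)
        throw std::runtime_error("codecvt::in could not convert the input");

    // Trim the buffer to the number of wide characters actually written.
    wide.resize(to_next - &wide[0]);
    return wide;
}
```

<p>With the classic locale, <code>widen_with_codecvt("abc",
   std::locale::classic())</code> is expected to yield <code>L"abc"</code>.
</p>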

<hr>
<h2><a name="3">class ctype</a></h2>
<p>Notes made during the implementation of ctype can be found
   <a href="ctype.html">here</a>.
</p>

<hr>
<h2><a name="4">class messages</a></h2>
<p>Notes made during the implementation of messages can be found
   <a href="messages.html">here</a>.
</p>

<hr>
<h2><a name="5">Bjarne Stroustrup on Locales</a></h2>
<p>Dr. Bjarne Stroustrup has released a
   <a href="http://www.research.att.com/~bs/3rd_loc0.html">pointer</a>
   to Appendix D of his book,
   <a href="http://www.research.att.com/~bs/3rd.html">The C++
   Programming Language (3rd Edition)</a>.  It is a detailed
   description of locales and how to use them.
</p>
<p>He also writes:
</p>
<blockquote><em>
   Please note that I still consider this detailed description of
   locales beyond the needs of most C++ programmers.  It is written
   with experienced programmers in mind and novices will do best to
   avoid it.
</em></blockquote>

<hr>
<h2><a name="6">Nathan Myers on Locales</a></h2>
<p>An article entitled "The Standard C++ Locale" was
   published in Dr. Dobb's Journal and can be found
   <a href="http://www.cantrip.org/locale.html">here</a>.
</p>

<hr>
<h2><a name="7">Correct Transformations</a></h2>
<!-- Jumping directly to here from chapter 21. -->
<p>A very common question on newsgroups and mailing lists is, "How
   do I do &lt;foo&gt; to a character string?" where &lt;foo&gt; is
   a task such as changing all the letters to uppercase, to lowercase,
   testing for digits, etc.  A skilled and conscientious programmer
   will follow the question with another, "And how do I make the
   code portable?"
</p>
<p>(Poor innocent programmer, you have no idea the depths of trouble
   you are getting yourself into.  'Twould be best for your sanity if
   you dropped the whole idea and took up basket weaving instead.  No?
   Fine, you asked for it...)
</p>
<p>The task of changing the case of a letter or classifying a character
   as numeric, graphical, etc., depends on the cultural context of the
   program at runtime.  So, first you must take the portability question
   into account.  Once you have localized the program to a particular
   natural language, only then can you perform the specific task.
   Unfortunately, specializing a function for a human language is not
   as simple as declaring
   <code> extern "Danish" int tolower (int); </code>.
</p>
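<p>The dependence of classification on the locale can be sketched as
   follows. Both helper names, <code>locale_or_classic</code> and
   <code>count_digits</code>, are hypothetical and exist only for this
   illustration. The sketch assumes that constructing an unsupported named
   locale throws <code>std::runtime_error</code>, which is what the standard
   specifies for <code>std::locale</code>'s constructor.
</p>

```cpp
#include <cstddef>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: build the named locale, falling back to the classic
// "C" locale when the platform does not provide the requested one.
std::locale locale_or_classic(const char* name)
{
    try {
        return std::locale(name);
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// Hypothetical helper: count the characters that loc classifies as digits.
// The answer is decided by the locale's ctype facet, not by the C library's
// global locale.
std::size_t count_digits(const std::string& s, const std::locale& loc)
{
    std::size_t n = 0;
    for (std::string::const_iterator i = s.begin(); i != s.end(); ++i)
        if (std::isdigit(*i, loc))
            ++n;
    return n;
}
```

<p>Which locale names are available depends entirely on the platform, so a
   fallback of this kind is one way to keep such code portable.
</p>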
<p>The C++ code to do all this proceeds in the same way.  First, a locale
   is created.  Then member functions of that locale's facets are called to
   perform minor tasks.  Continuing the example from Chapter 21, we wish
   to use the following convenience functions:
</p>
<pre>
   namespace std {
     template &lt;class charT&gt;
       charT
       toupper (charT c, const locale&amp; loc);
     template &lt;class charT&gt;
       charT
       tolower (charT c, const locale&amp; loc);
   }</pre>
<p>
   Each of these functions extracts the appropriate "facet" from the
   locale <em>loc</em> and calls the appropriate member function of that
   facet, passing <em>c</em> as its argument.  The resulting character
   is returned.
</p>
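<p>Spelled out by hand, the facet lookup amounts to something like the
   sketch below. The name <code>toupper_via_facet</code> is hypothetical;
   the facet involved is <code>std::ctype&lt;charT&gt;</code>, and
   <code>std::use_facet</code> throws <code>std::bad_cast</code> if the
   locale does not contain it.
</p>

```cpp
#include <locale>

// Hypothetical stand-in for std::toupper(c, loc), written out step by step.
template <class charT>
charT toupper_via_facet(charT c, const std::locale& loc)
{
    // Look up the character-classification facet for this character type...
    const std::ctype<charT>& ct = std::use_facet<std::ctype<charT> >(loc);
    // ...and let that facet perform the conversion.
    return ct.toupper(c);
}
```

<p>The same pattern with <code>ct.tolower(c)</code> gives the lowercase
   counterpart.
</p>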
<p>For the C/POSIX locale, the results are the same as calling the
   classic C <code>toupper/tolower</code> functions that were used in previous
   examples.  For other locales, the code should Do The Right Thing.
</p>
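<p>That equivalence can be checked with a small sketch such as the one
   below. The function name <code>classic_locale_matches_c</code> is
   hypothetical, the check is limited to the 7-bit ASCII range, and it
   assumes the program has not changed the global C locale with
   <code>setlocale</code>.
</p>

```cpp
#include <cctype>
#include <locale>

// Hypothetical check: compare the locale-aware functions, using the classic
// "C" locale, against the one-argument C functions for every ASCII code.
bool classic_locale_matches_c()
{
    const std::locale& c_loc = std::locale::classic();
    for (int i = 0; i < 128; ++i) {
        const char ch = static_cast<char>(i);
        // Two-argument form comes from <locale>, one-argument from <cctype>.
        if (std::toupper(ch, c_loc) != static_cast<char>(std::toupper(i)))
            return false;
        if (std::tolower(ch, c_loc) != static_cast<char>(std::tolower(i)))
            return false;
    }
    return true;
}
```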
<p>Of course, these functions take a second argument, while
   <code>std::transform</code> calls its operation argument with a single
   parameter.  So we write simple wrapper structs to handle that.
</p>
<p>The next-to-final version of the code started in Chapter 21 looks like:
</p>
<pre>
   #include &lt;iterator&gt;    // for back_inserter
   #include &lt;locale&gt;
   #include &lt;string&gt;
   #include &lt;algorithm&gt;
   #include &lt;cctype&gt;      // old &lt;ctype.h&gt;

   // Wraps the two-argument std::toupper so that std::transform can
   // call it with a single character.
   struct Toupper
   {
     Toupper(std::locale const&amp; l) : loc(l) {}
     char operator() (char c) const { return std::toupper(c,loc); }
   private:
     std::locale loc;   // held by value; a locale is cheap to copy
   };

   // The same wrapper for the two-argument std::tolower.
   struct Tolower
   {
     Tolower(std::locale const&amp; l) : loc(l) {}
     char operator() (char c) const { return std::tolower(c,loc); }
   private:
     std::locale loc;   // held by value; a locale is cheap to copy
   };

   int main ()
   {
      std::string  s("Some Kind Of Initial Input Goes Here");
      std::locale loc_c("C");
      Toupper up(loc_c);
      Tolower down(loc_c);

      // Change everything into upper case.
      std::transform(s.begin(), s.end(), s.begin(), up);

      // Change everything into lower case.
      std::transform(s.begin(), s.end(), s.begin(), down);

      // Change everything back into upper case, but store the
      // result in a different string.
      std::string  capital_s;
      std::transform(s.begin(), s.end(), std::back_inserter(capital_s), up);
   }</pre>
<p>The final version of the code uses <code>bind2nd</code> to eliminate
   the wrapper structs, but the resulting code is tricky.  I have not
   shown it here because no compilers currently available to me will
   handle it.
</p>
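<p>This is not the <code>bind2nd</code> version mentioned above. As an
   alternative that also avoids the wrapper structs, the sketch below uses
   a lambda, which assumes a compiler supporting C++11 or later
   (<code>bind2nd</code> itself was deprecated in C++11 and removed in
   C++17). The helper names <code>to_upper_copy</code> and
   <code>to_lower_copy</code> are hypothetical.
</p>

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical helper: return an uppercase copy of s, binding the locale
// with a lambda instead of a hand-written wrapper struct.
std::string to_upper_copy(std::string s, const std::locale& loc)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [&loc](char c) { return std::toupper(c, loc); });
    return s;
}

// Hypothetical helper: the lowercase counterpart.
std::string to_lower_copy(std::string s, const std::locale& loc)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [&loc](char c) { return std::tolower(c, loc); });
    return s;
}
```

<p>The lambda captures the locale by reference, so the locale must outlive
   the call to <code>std::transform</code>, which it does here because it is
   a function parameter.
</p>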


<!-- ####################################################### -->

<hr>
<p class="fineprint"><em>
See <a href="../17_intro/license.html">license.html</a> for copying conditions.
Comments and suggestions are welcome, and may be sent to
<a href="mailto:libstdc++@gcc.gnu.org">the libstdc++ mailing list</a>.
</em></p>


</body>
</html>