1 files changed, 728 insertions, 0 deletions
diff --git a/libstdc++-v3/doc/html/manual/facets.html b/libstdc++-v3/doc/html/manual/facets.html
new file mode 100644
index 000000000..cfe89bc0d
--- /dev/null
+++ b/libstdc++-v3/doc/html/manual/facets.html
@@ -0,0 +1,728 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Facets</title><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"/><meta name="keywords" content="&#10;      ISO C++&#10;    , &#10;      library&#10;    "/><link rel="home" href="../spine.html" title="The GNU C++ Library"/><link rel="up" href="localization.html" title="Chapter 8.  Localization"/><link rel="prev" href="localization.html" title="Chapter 8.  Localization"/><link rel="next" href="containers.html" title="Chapter 9.  Containers"/></head><body><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Facets</th></tr><tr><td align="left"><a accesskey="p" href="localization.html">Prev</a> </td><th width="60%" align="center">Chapter 8. 
+  Localization
+  
+</th><td align="right"> <a accesskey="n" href="containers.html">Next</a></td></tr></table><hr/></div><div class="section" title="Facets"><div class="titlepage"><div><div><h2 class="title"><a id="std.localization.facet"/>Facets</h2></div></div></div><div class="section" title="ctype"><div class="titlepage"><div><div><h3 class="title"><a id="std.localization.facet.ctype"/>ctype</h3></div></div></div><div class="section" title="Implementation"><div class="titlepage"><div><div><h4 class="title"><a id="facet.ctype.impl"/>Implementation</h4></div></div></div><div class="section" title="Specializations"><div class="titlepage"><div><div><h5 class="title"><a id="id476560"/>Specializations</h5></div></div></div><p>
+For the required specialization codecvt&lt;wchar_t, char, mbstate_t&gt;,
+conversions are made between the internal character set (always UCS4
+on GNU/Linux) and whatever the currently selected locale for the
+LC_CTYPE category implements.
+</p><p>
+The two required specializations are implemented as follows:
+</p><p>
+<code class="code">
+ctype&lt;char&gt;
+</code>
+</p><p>
+This is a simple specialization. Implementing this was a piece of cake.
+</p><p>
+<code class="code">
+ctype&lt;wchar_t&gt;
+</code>
+</p><p>
+This specialization, by specifying all the template parameters, pretty
+much ties the hands of implementors. As such, the implementation is
+straightforward, involving mbsrtowcs for conversions from char
+to wchar_t and wcsrtombs for conversions from wchar_t to char.
+</p><p>
+Neither of these two required specializations deals with Unicode
+characters.
+</p></div></div><div class="section" title="Future"><div class="titlepage"><div><div><h4 class="title"><a id="facet.ctype.future"/>Future</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist"><li class="listitem"><p>
+   How to deal with the global locale issue?
+   </p></li><li class="listitem"><p>
+   How to deal with different types than char, wchar_t? </p></li><li class="listitem"><p>
+   Overlap between codecvt/ctype: narrow/widen
+   </p></li><li class="listitem"><p>
+       Mask typedef in codecvt_base, argument types in codecvt. What
+       is known about this type?
+   </p></li><li class="listitem"><p>
+   Why mask* argument in codecvt?
+   </p></li><li class="listitem"><p>
+       Can this be made (more) generic? Is there a simple way to
+       straighten out the configure-time mess that is a by-product of
+       this class?
+   </p></li><li class="listitem"><p>
+       Get the ctype&lt;wchar_t&gt;::mask stuff under control. Need to
+       make some kind of static table, and not do lookup every time
+       somebody hits the do_is... functions. Too bad we can't just
+       redefine mask for ctype&lt;wchar_t&gt;.
+   </p></li><li class="listitem"><p>
+       Rename abstract base class. See if just smash-overriding is a
+       better approach. Clarify, add sanity to naming.
+     </p></li></ul></div></div><div class="bibliography" title="Bibliography"><div class="titlepage"><div><div><h4 class="title"><a id="facet.ctype.biblio"/>Bibliography</h4></div></div></div><div class="biblioentry"><a id="id476684"/><p><span class="citetitle"><em class="citetitle">
+      The GNU C Library
+    </em>. </span><span class="author"><span class="firstname">Roland</span> <span class="surname">McGrath</span>. </span><span class="author"><span class="firstname">Ulrich</span> <span class="surname">Drepper</span>. </span><span class="copyright">Copyright © 2007 FSF. </span><span class="pagenums">Chapters 6  Character Set Handling and 7 Locales and Internationalization. </span></p></div><div class="biblioentry"><a id="id476724"/><p><span class="citetitle"><em class="citetitle">
+      Correspondence
+    </em>. </span><span class="author"><span class="firstname">Ulrich</span> <span class="surname">Drepper</span>. </span><span class="copyright">Copyright © 2002 . </span></p></div><div class="biblioentry"><a id="id476750"/><p><span class="citetitle"><em class="citetitle">
+      ISO/IEC 14882:1998 Programming languages - C++
+    </em>. </span><span class="copyright">Copyright © 1998 ISO. </span></p></div><div class="biblioentry"><a id="id476769"/><p><span class="citetitle"><em class="citetitle">
+      ISO/IEC 9899:1999 Programming languages - C
+    </em>. </span><span class="copyright">Copyright © 1999 ISO. </span></p></div><div class="biblioentry"><a id="id476788"/><p><span class="biblioid">
+      . </span><span class="citetitle"><em class="citetitle">
+	The Open Group Base Specifications, Issue 6 (IEEE Std. 1003.1-2004)
+      </em>. </span><span class="copyright">Copyright © 1999 
+      The Open Group/The Institute of Electrical and Electronics Engineers, Inc.. </span></p></div><div class="biblioentry"><a id="id476817"/><p><span class="citetitle"><em class="citetitle">
+      The C++ Programming Language, Special Edition
+    </em>. </span><span class="author"><span class="firstname">Bjarne</span> <span class="surname">Stroustrup</span>. </span><span class="copyright">Copyright © 2000 Addison Wesley, Inc.. </span><span class="pagenums">Appendix D. </span><span class="publisher"><span class="publishername">
+	Addison Wesley
+      . </span></span></p></div><div class="biblioentry"><a id="id476856"/><p><span class="citetitle"><em class="citetitle">
+      Standard C++ IOStreams and Locales
+    </em>. </span><span class="subtitle">
+      Advanced Programmer's Guide and Reference
+    . </span><span class="author"><span class="firstname">Angelika</span> <span class="surname">Langer</span>. </span><span class="author"><span class="firstname">Klaus</span> <span class="surname">Kreft</span>. </span><span class="copyright">Copyright © 2000 Addison Wesley Longman, Inc.. </span><span class="publisher"><span class="publishername">
+	Addison Wesley Longman
+      . </span></span></p></div></div></div><div class="section" title="codecvt"><div class="titlepage"><div><div><h3 class="title"><a id="std.localization.facet.codecvt"/>codecvt</h3></div></div></div><p>
+The standard class codecvt attempts to address conversions between
+different character encoding schemes. In particular, the standard
+attempts to detail conversions between the implementation-defined wide
+characters (hereafter referred to as wchar_t) and the standard type
+char that is so beloved in classic <span class="quote">“<span class="quote">C</span>”</span> (which can now be
+referred to as narrow characters). This document attempts to describe
+how the GNU libstdc++ implementation deals with the conversion between
+wide and narrow characters, and also presents a framework for dealing
+with the huge number of other encodings that iconv can convert,
+including Unicode and UTF8. Design issues and requirements are
+addressed, and examples of correct usage for both the required
+specializations for wide and narrow characters and the
+implementation-provided extended functionality are given.
+</p><div class="section" title="Requirements"><div class="titlepage"><div><div><h4 class="title"><a id="facet.codecvt.req"/>Requirements</h4></div></div></div><p>
+Around page 425 of the C++ Standard, this charming heading comes into view:
+</p><div class="blockquote"><blockquote class="blockquote"><p>
+22.2.1.5 - Template class codecvt
+</p></blockquote></div><p>
+The text around the codecvt definition gives some clues:
+</p><div class="blockquote"><blockquote class="blockquote"><p>
+<span class="emphasis"><em>
+-1- The class codecvt&lt;internT,externT,stateT&gt; is for use when
+converting from one codeset to another, such as from wide characters
+to multibyte characters, between wide character encodings such as
+Unicode and EUC.
+</em></span>
+</p></blockquote></div><p>
+Hmm. So, in some unspecified way, Unicode encodings and
+translations between other character sets should be handled by this
+class.
+</p><div class="blockquote"><blockquote class="blockquote"><p>
+<span class="emphasis"><em>
+-2- The stateT argument selects the pair of codesets being mapped between.
+</em></span>
+</p></blockquote></div><p>
+Ah ha! Another clue...
+</p><div class="blockquote"><blockquote class="blockquote"><p>
+<span class="emphasis"><em>
+-3- The instantiations required in the Table ??
+(lib.locale.category), namely codecvt&lt;wchar_t,char,mbstate_t&gt; and
+codecvt&lt;char,char,mbstate_t&gt;, convert the implementation-defined
+native character set. codecvt&lt;char,char,mbstate_t&gt; implements a
+degenerate conversion; it does not convert at
+all. codecvt&lt;wchar_t,char,mbstate_t&gt; converts between the native
+character sets for tiny and wide characters. Instantiations on
+mbstate_t perform conversion between encodings known to the library
+implementor.  Other encodings can be converted by specializing on a
+user-defined stateT type. The stateT object can contain any state that
+is useful to communicate to or from the specialized do_convert member.
+</em></span>
+</p></blockquote></div><p>
+At this point, a couple of points become clear:
+</p><p>
+One: The standard clearly implies that attempts to add non-required
+(yet useful and widely used) conversions need to do so through the
+third template parameter, stateT.</p><p>
+Two: The required conversions, by specifying mbstate_t as the third
+template parameter, imply an implementation strategy that is mostly
+(or wholly) based on the underlying C library, and the functions
+mbsrtowcs and wcsrtombs in particular.</p></div><div class="section" title="Design"><div class="titlepage"><div><div><h4 class="title"><a id="facet.codecvt.design"/>Design</h4></div></div></div><div class="section" title="wchar_t Size"><div class="titlepage"><div><div><h5 class="title"><a id="codecvt.design.wchar_t_size"/><span class="type">wchar_t</span> Size</h5></div></div></div><p>
+      The simple implementation detail of wchar_t's size seems to
+      repeatedly confound people. Many systems use a two-byte,
+      unsigned integral type to represent wide characters, and use an
+      internal encoding of Unicode or UCS2. (See AIX, Microsoft NT,
+      Java, others.) Other systems use a four-byte, unsigned integral
+      type to represent wide characters, and use an internal encoding
+      of UCS4. (GNU/Linux systems using glibc, in particular.) The C
+      programming language (and thus C++) does not mandate any
+      particular size for the type wchar_t.
+    </p><p>
+      Thus, portable C++ code cannot assume a byte size (or endianness) either.
+    </p></div><div class="section" title="Support for Unicode"><div class="titlepage"><div><div><h5 class="title"><a id="codecvt.design.unicode"/>Support for Unicode</h5></div></div></div><p>
+    Probably the most frequently asked question about code conversion
+    is: "So dudes, what's the deal with Unicode strings?"
+    The dude part is optional, but apparently the usefulness of
+    Unicode strings is pretty widely appreciated. Sadly, this specific
+    encoding (and other useful encodings like UTF8, UCS4, ISO 8859-10,
+    etc.) is not mentioned in the C++ standard.
+  </p><p>
+    A couple of comments:
+  </p><p>
+    The thought that all one needs to convert between two arbitrary
+    codesets is two types and some kind of state argument is
+    unfortunate. In particular, encodings may be stateless. The naming
+    of the third parameter as stateT is unfortunate, as what is really
+    needed is some kind of generalized type that accounts for the
+    issues that abstract encodings will need. The minimum information
+    that is required includes:
+  </p><div class="itemizedlist"><ul class="itemizedlist"><li class="listitem"><p>
+	Identifiers for each of the codesets involved in the
+	conversion. For example, using the iconv family of functions
+	from the Single Unix Specification (what used to be called
+	X/Open) hosted on the GNU/Linux operating system allows
+	bi-directional mapping between far more than the following
+	tantalizing possibilities:
+      </p><p>
+	(An edited list taken from <code class="code">`iconv --list`</code> on a
+	Red Hat 6.2/Intel system:
+      </p><div class="blockquote"><blockquote class="blockquote"><pre class="programlisting">
+8859_1, 8859_9, 10646-1:1993, 10646-1:1993/UCS4, ARABIC, ARABIC7,
+ASCII, EUC-CN, EUC-JP, EUC-KR, EUC-TW, GREEK-CCITT, GREEK, GREEK7-OLD,
+GREEK7, GREEK8, HEBREW, ISO-8859-1, ISO-8859-2, ISO-8859-3,
+ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8,
+ISO-8859-9, ISO-8859-10, ISO-8859-11, ISO-8859-13, ISO-8859-14,
+ISO-8859-15, ISO-10646, ISO-10646/UCS2, ISO-10646/UCS4,
+ISO-10646/UTF-8, ISO-10646/UTF8, SHIFT-JIS, SHIFT_JIS, UCS-2, UCS-4,
+UCS2, UCS4, UNICODE, UNICODEBIG, UNICODELITTLE, US-ASCII, US, UTF-8,
+UTF-16, UTF8, UTF16).
+</pre></blockquote></div><p>
+For iconv-based implementations, string literals naming each of the
+encodings (e.g. "UCS-2" and "UTF-8") are necessary.
+Other, non-iconv implementations
+may instead require a table of enumerated values or some other
+mechanism.
+</p></li><li class="listitem"><p>
+ Maximum length of the identifying string literal.
+</p></li><li class="listitem"><p>
+ Some encodings require explicit endian-ness. As such, some kind
+  of endian marker or other byte-order marker will be necessary. See
+  "Footnotes for C/C++ developers" in Haible for more information on
+  UCS-2/Unicode endian issues. (Summary: big endian seems most likely;
+  however, implementations, most notably Microsoft's, vary.)
+</p></li><li class="listitem"><p>
+ Types representing the conversion state, for conversions involving
+  the machinery in the "C" library, or the conversion descriptor, for
+  conversions using iconv (such as the type iconv_t). Note that the
+  conversion descriptor encodes more information than a simple encoding
+  state type.
+</p></li><li class="listitem"><p>
+ Conversion descriptors for both directions of encoding. (i.e., both
+  UCS-2 to UTF-8 and UTF-8 to UCS-2.)
+</p></li><li class="listitem"><p>
+ Something to indicate whether the requested conversion is valid.
+</p></li><li class="listitem"><p>
+ Something to indicate whether the conversion descriptors are valid.
+</p></li><li class="listitem"><p>
+ Some way to enforce strict type checking on the internal and
+  external types. As part of this, the size of the internal and
+  external types will need to be known.
+</p></li></ul></div></div><div class="section" title="Other Issues"><div class="titlepage"><div><div><h5 class="title"><a id="codecvt.design.issues"/>Other Issues</h5></div></div></div><p>
+In addition, multi-threaded and multi-locale environments also impact
+the design and requirements for code conversions. In particular, they
+affect the required specialization codecvt&lt;wchar_t, char, mbstate_t&gt;
+when implemented using standard "C" functions.
+</p><p>
+Three problems arise, one big, one of medium importance, and one small.
+</p><p>
+First, the small: mbsrtowcs and wcsrtombs may not be multithread-safe
+on all systems required by the GNU tools. For GNU/Linux and glibc,
+this is not an issue.
+</p><p>
+Of medium concern, in the grand scope of things, is that the functions
+used to implement this specialization work on null-terminated
+strings. Buffers, especially file buffers, may not be null-terminated,
+thus giving conversions that end prematurely or are otherwise
+incorrect. Yikes!
+</p><p>
+The last, and fundamental, problem is the assumption of a global
+locale for all the "C" functions referenced above. For something like
+C++ iostreams (where codecvt is explicitly used) the notion of
+multiple locales is fundamental. In practice, most users may not run
+into this limitation. However, as a quality of implementation issue,
+the GNU C++ library would like to offer a solution that allows
+multiple locales and/or simultaneous usage with computationally
+correct results. In short, libstdc++ is trying to offer, as an
+option, a high-quality implementation, damn the additional complexity!
+</p><p>
+For the required specialization codecvt&lt;wchar_t, char, mbstate_t&gt;,
+conversions are made between the internal character set (always UCS4
+on GNU/Linux) and whatever the currently selected locale for the
+LC_CTYPE category implements.
+</p></div></div><div class="section" title="Implementation"><div class="titlepage"><div><div><h4 class="title"><a id="facet.codecvt.impl"/>Implementation</h4></div></div></div><p>
+The two required specializations are implemented as follows:
+</p><p>
+<code class="code">
+codecvt&lt;char, char, mbstate_t&gt;
+</code>
+</p><p>
+This is a degenerate (i.e., does nothing) specialization. Implementing
+this was a piece of cake.
+</p><p>
+<code class="code">
+codecvt&lt;wchar_t, char, mbstate_t&gt;
+</code>
+</p><p>
+This specialization, by specifying all the template parameters, pretty
+much ties the hands of implementors. As such, the implementation is
+straightforward, involving mbsrtowcs for conversions from char
+to wchar_t and wcsrtombs for conversions from wchar_t to char.
+</p><p>
+Neither of these two required specializations deals with Unicode
+characters. As such, libstdc++ implements a partial specialization
+of the codecvt class with an iconv wrapper class, encoding_state, as the
+third template parameter.
+</p><p>
+This implementation should be standards conformant. First of all, the
+standard explicitly points out that instantiations on the third
+template parameter, stateT, are the proper way to implement
+non-required conversions. Second of all, the standard says (in Chapter
+17) that partial specializations of required classes are a-ok. Third
+of all, the requirements for the stateT type elsewhere in the standard
+(see 21.1.2 traits typedefs) only indicate that this type be copy
+constructible.
+</p><p>
+As such, the type encoding_state is defined as a non-templatized, POD
+type to be used as the third type of a codecvt instantiation. This
+type is just a wrapper class for iconv, and provides an easy interface
+to iconv functionality.
+</p><p>
+There are two constructors for encoding_state:
+</p><p>
+<code class="code">
+encoding_state() : __in_desc(0), __out_desc(0)
+</code>
+</p><p>
+This default constructor sets the internal encoding to some default
+(currently UCS4) and the external encoding to whatever is returned by
+nl_langinfo(CODESET).
+</p><p>
+<code class="code">
+encoding_state(const char* __int, const char* __ext)
+</code>
+</p><p>
+This constructor takes as parameters string literals that indicate the
+desired internal and external encoding. There are no defaults for
+either argument.
+</p><p>
+One of the issues with iconv is that the string literals identifying
+conversions are not standardized. Because of this, the thought of
+mandating and/or enforcing some set of pre-determined valid
+identifiers seems iffy: thus, a more practical (and non-migraine
+inducing) strategy was implemented: end-users can specify any string
+(subject to a pre-determined length qualifier, currently 32 bytes) for
+encodings. It is up to the user to make sure that these strings are
+valid on the target system.
+</p><p>
+<code class="code">
+void
+_M_init()
+</code>
+</p><p>
+Strangely enough, this member function attempts to open conversion
+descriptors for a given encoding_state object. If the encoding
+names are not valid, the conversion descriptors returned will
+not be valid and the resulting calls to the codecvt conversion
+functions will return an error.
+</p><p>
+<code class="code">
+bool
+_M_good()
+</code>
+</p><p>
+Provides a way to see if the given encoding_state object has been
+properly initialized. If the string literals describing the desired
+internal and external encoding are not valid, initialization will
+fail, and this will return false. If the internal and external
+encodings are valid, but iconv_open could not allocate conversion
+descriptors, this will also return false. Otherwise, the object is
+ready to convert and will return true.
+</p><p>
+<code class="code">
+encoding_state(const encoding_state&amp;)
+</code>
+</p><p>
+As iconv allocates memory and sets up conversion descriptors, the copy
+constructor can only copy the member data pertaining to the internal
+and external code conversions, and not the conversion descriptors
+themselves.
+</p><p>
+Definitions for all the required codecvt member functions are provided
+for this specialization, and usage of codecvt&lt;internal character type,
+external character type, encoding_state&gt; is consistent with other
+codecvt usage.
+</p></div><div class="section" title="Use"><div class="titlepage"><div><div><h4 class="title"><a id="facet.codecvt.use"/>Use</h4></div></div></div><p>A conversion involving a string literal.</p><pre class="programlisting">
+  typedef codecvt_base::result                  result;
+  typedef unsigned short                        unicode_t;
+  typedef unicode_t                             int_type;
+  typedef char                                  ext_type;
+  typedef encoding_state                          state_type;
+  typedef codecvt&lt;int_type, ext_type, state_type&gt; unicode_codecvt;
+
+  const ext_type*       e_lit = "black pearl jasmine tea";
+  int                   size = strlen(e_lit);
+  int_type              i_lit_base[24] =
+  { 25088, 27648, 24832, 25344, 27392, 8192, 28672, 25856, 24832, 29184,
+    27648, 8192, 27136, 24832, 29440, 27904, 26880, 28160, 25856, 8192, 29696,
+    25856, 24832, 2560
+  };
+  const int_type*       i_lit = i_lit_base;
+  const ext_type*       efrom_next;
+  const int_type*       ifrom_next;
+  ext_type*             e_arr = new ext_type[size + 1];
+  ext_type*             eto_next;
+  int_type*             i_arr = new int_type[size + 1];
+  int_type*             ito_next;
+
+  // construct a locale object with the specialized facet.
+  locale                loc(locale::classic(), new unicode_codecvt);
+  // sanity check the constructed locale has the specialized facet.
+  VERIFY( has_facet&lt;unicode_codecvt&gt;(loc) );
+  const unicode_codecvt&amp; cvt = use_facet&lt;unicode_codecvt&gt;(loc);
+  // convert between const char* and unicode strings
+  unicode_codecvt::state_type state01("UNICODE", "ISO_8859-1");
+  initialize_state(state01);
+  result r1 = cvt.in(state01, e_lit, e_lit + size, efrom_next,
+		     i_arr, i_arr + size, ito_next);
+  VERIFY( r1 == codecvt_base::ok );
+  VERIFY( !int_traits::compare(i_arr, i_lit, size) );
+  VERIFY( efrom_next == e_lit + size );
+  VERIFY( ito_next == i_arr + size );
+</pre></div><div class="section" title="Future"><div class="titlepage"><div><div><h4 class="title"><a id="facet.codecvt.future"/>Future</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist"><li class="listitem"><p>
+   a. things that are sketchy, or remain unimplemented:
+      do_encoding, max_length and length member functions
+      are only weakly implemented. I have no idea how to do
+      this correctly, and in a generic manner.  Nathan?
+</p></li><li class="listitem"><p>
+   b. conversions involving std::string
+  </p><div class="itemizedlist"><ul class="itemizedlist"><li class="listitem"><p>
+      how should operators != and == work for strings of
+      different/same encoding?
+      </p></li><li class="listitem"><p>
+      what is equal? A byte by byte comparison or an
+      encoding then byte comparison?
+      </p></li><li class="listitem"><p>
+      conversions between narrow, wide, and unicode strings
+      </p></li></ul></div></li><li class="listitem"><p>
+   c. conversions involving std::filebuf and std::ostream
+</p><div class="itemizedlist"><ul class="itemizedlist"><li class="listitem"><p>
+      how to initialize the state object in a
+      standards-conformant manner?
+      </p></li><li class="listitem"><p>
+      how to synchronize the "C" and "C++"
+      conversion information?
+      </p></li><li class="listitem"><p>
+      wchar_t/char internal buffers and conversions between
+      internal/external buffers?
+      </p></li></ul></div></li></ul></div></div><div class="bibliography" title="Bibliography"><div class="titlepage"><div><div><h4 class="title"><a id="facet.codecvt.biblio"/>Bibliography</h4></div></div></div><div class="biblioentry"><a id="id477506"/><p><span class="citetitle"><em class="citetitle">
+      The GNU C Library
+    </em>. </span><span class="author"><span class="firstname">Roland</span> <span class="surname">McGrath</span>. </span><span class="author"><span class="firstname">Ulrich</span> <span class="surname">Drepper</span>. </span><span class="copyright">Copyright © 2007 FSF. </span><span class="pagenums">
+      Chapters 6 Character Set Handling and 7 Locales and Internationalization
+    . </span></p></div><div class="biblioentry"><a id="id477546"/><p><span class="citetitle"><em class="citetitle">
+      Correspondence
+    </em>. </span><span class="author"><span class="firstname">Ulrich</span> <span class="surname">Drepper</span>. </span><span class="copyright">Copyright © 2002 . </span></p></div><div class="biblioentry"><a id="id477571"/><p><span class="citetitle"><em class="citetitle">
+      ISO/IEC 14882:1998 Programming languages - C++
+    </em>. </span><span class="copyright">Copyright © 1998 ISO. </span></p></div><div class="biblioentry"><a id="id477590"/><p><span class="citetitle"><em class="citetitle">
+      ISO/IEC 9899:1999 Programming languages - C
+    </em>. </span><span class="copyright">Copyright © 1999 ISO. </span></p></div><div class="biblioentry"><a id="id477609"/><p><span class="biblioid">
+    . </span><span class="citetitle"><em class="citetitle">
+      System Interface Definitions, Issue 7 (IEEE Std. 1003.1-2008)
+    </em>. </span><span class="copyright">Copyright © 2008 
+	The Open Group/The Institute of Electrical and Electronics
+	Engineers, Inc.
+      . </span></p></div><div class="biblioentry"><a id="id477639"/><p><span class="citetitle"><em class="citetitle">
+      The C++ Programming Language, Special Edition
+    </em>. </span><span class="author"><span class="firstname">Bjarne</span> <span class="surname">Stroustrup</span>. </span><span class="copyright">Copyright © 2000 Addison Wesley, Inc.. </span><span class="pagenums">Appendix D. </span><span class="publisher"><span class="publishername">
+	Addison Wesley
+      . </span></span></p></div><div class="biblioentry"><a id="id477677"/><p><span class="citetitle"><em class="citetitle">
+      Standard C++ IOStreams and Locales
+    </em>. </span><span class="subtitle">
+      Advanced Programmer's Guide and Reference
+    . </span><span class="author"><span class="firstname">Angelika</span> <span class="surname">Langer</span>. </span><span class="author"><span class="firstname">Klaus</span> <span class="surname">Kreft</span>. </span><span class="copyright">Copyright © 2000 Addison Wesley Longman, Inc.. </span><span class="publisher"><span class="publishername">
+	Addison Wesley Longman
+      . </span></span></p></div><div class="biblioentry"><a id="id477724"/><p><span class="biblioid">
+    . </span><span class="citetitle"><em class="citetitle">
+      A brief description of Normative Addendum 1
+    </em>. </span><span class="author"><span class="firstname">Clive</span> <span class="surname">Feather</span>. </span><span class="pagenums">Extended Character Sets. </span></p></div><div class="biblioentry"><a id="id477754"/><p><span class="biblioid">
+	. </span><span class="citetitle"><em class="citetitle">
+	  The Unicode HOWTO
+	</em>. </span><span class="author"><span class="firstname">Bruno</span> <span class="surname">Haible</span>. </span></p></div><div class="biblioentry"><a id="id477779"/><p><span class="biblioid">
+    . </span><span class="citetitle"><em class="citetitle">
+      UTF-8 and Unicode FAQ for Unix/Linux
+    </em>. </span><span class="author"><span class="firstname">Markus</span> <span class="surname">Kuhn</span>. </span></p></div></div></div><div class="section" title="messages"><div class="titlepage"><div><div><h3 class="title"><a id="manual.localization.facet.messages"/>messages</h3></div></div></div><p>
+The std::messages facet implements message retrieval functionality
+equivalent to Java's java.text.MessageFormat, using either GNU gettext
+or IEEE 1003.1-200 functions.
+</p><div class="section" title="Requirements"><div class="titlepage"><div><div><h4 class="title"><a id="facet.messages.req"/>Requirements</h4></div></div></div><p>
+The std::messages facet is probably the most vaguely defined facet in
+the standard library. It's assumed that this facility was built into
+the standard library in order to convert string literals from one
+locale to another. For instance, converting the "C" locale's
+<code class="code">const char* c = "please"</code> to a German-localized <code class="code">"bitte"</code>
+during program execution.
+</p><div class="blockquote"><blockquote class="blockquote"><p>
+22.2.7.1 - Template class messages [lib.locale.messages]
+</p></blockquote></div><p>
+This class has three public member functions, which directly
+correspond to three protected virtual member functions.
+</p><p>
+The public member functions are:
+</p><p>
+<code class="code">catalog open(const string&amp;, const locale&amp;) const</code>
+</p><p>
+<code class="code">string_type get(catalog, int, int, const string_type&amp;) const</code>
+</p><p>
+<code class="code">void close(catalog) const</code>
+</p><p>
+While the virtual functions are:
+</p><p>
+<code class="code">catalog do_open(const string&amp;, const locale&amp;) const</code>
+</p><div class="blockquote"><blockquote class="blockquote"><p>
+<span class="emphasis"><em>
+-1- Returns: A value that may be passed to get() to retrieve a
+message, from the message catalog identified by the string name
+according to an implementation-defined mapping. The result can be used
+until it is passed to close().  Returns a value less than 0 if no such
+catalog can be opened.
+</em></span>
+</p></blockquote></div><p>
+<code class="code">string_type do_get(catalog, int, int, const string_type&amp;) const</code>
+</p><div class="blockquote"><blockquote class="blockquote"><p>
+<span class="emphasis"><em>
+-3- Requires: A catalog cat obtained from open() and not yet closed.
+-4- Returns: A message identified by arguments set, msgid, and dfault,
+according to an implementation-defined mapping. If no such message can
+be found, returns dfault.
+</em></span>
+</p></blockquote></div><p>
+<code class="code">void do_close(catalog) const</code>
+</p><div class="blockquote"><blockquote class="blockquote"><p>
+<span class="emphasis"><em>
+-5- Requires: A catalog cat obtained from open() and not yet closed.
+-6- Effects: Releases unspecified resources associated with cat.
+-7- Notes: The limit on such resources, if any, is implementation-defined.
+</em></span>
+</p></blockquote></div></div><div class="section" title="Design"><div class="titlepage"><div><div><h4 class="title"><a id="facet.messages.design"/>Design</h4></div></div></div><p>
+A couple of notes on the standard.
+</p><p>
+First, why is <code class="code">messages_base::catalog</code> specified as a typedef
+to int? This makes sense for implementations that use
+<code class="code">catopen</code>, but not for others. Fortunately, it is not heavily
+used, so it is only a minor irritant.
+</p><p>
+Second, by making the member functions <code class="code">const</code>, it is
+impossible to save state in them. Thus, storing away information used
+in the 'open' member function for use in 'get' is impossible. This is
+unfortunate.
+</p><p>
+The 'open' member function in particular seems to be oddly
+designed. The signature seems quite peculiar. Why specify a <code class="code">const
+string&amp; </code> argument, for instance, instead of just <code class="code">const
+char*</code>? Or, why specify a <code class="code">const locale&amp;</code> argument that is
+to be used in the 'get' member function? How, exactly, is this locale
+argument useful? What was the intent? It might make sense if a locale
+argument were associated with a given default message string in the
+'open' member function, for instance. On reflection, the intent
+remains quite unclear.
+</p><p>
+Lastly, it seems odd that messages, which explicitly require code
+conversion, don't use the codecvt facet. Because the messages facet
+has only one template parameter, it is assumed that ctype, and not
+codecvt, is to be used to convert between character sets.
+</p><p>
+It is implicitly assumed that the default message string passed to
+'get' is written for the "C" locale. Thus, all source code is assumed
+to be written in English, so translations are always from "en_US" to
+other, explicitly named locales.
+</p></div><div class="section" title="Implementation"><div class="titlepage"><div><div><h4 class="title"><a id="facet.messages.impl"/>Implementation</h4></div></div></div><div class="section" title="Models"><div class="titlepage"><div><div><h5 class="title"><a id="messages.impl.models"/>Models</h5></div></div></div><p>
+    This is a relatively simple class, on the face of it. The standard
+    specifies very little in concrete terms, so generic
+    implementations that are conforming yet do very little are the
+    norm. Adding functionality that would be useful to programmers and
+    comparable to Java's java.text.MessageFormat takes a bit of work,
+    and is highly dependent on the capabilities of the underlying
+    operating system.
+  </p><p>
+    Three different mechanisms have been provided, selectable via
+    configure flags:
+  </p><div class="itemizedlist"><ul class="itemizedlist"><li class="listitem"><p>
+       generic
+     </p><p>
+       This model does very little, and is what is used by default.
+     </p></li><li class="listitem"><p>
+       gnu
+     </p><p>
+       The gnu model is complete and fully tested. It's based on the
+       GNU gettext package, which is part of glibc. It uses the
+       functions <code class="code">textdomain, bindtextdomain, gettext</code> to
+       implement full functionality. Creating message catalogs is a
+       relatively straightforward process that is outlined
+       below and fully documented in gettext's distributed
+       documentation.
+     </p></li><li class="listitem"><p>
+       ieee_1003.1-200x
+     </p><p>
+       This is a complete, though untested, implementation based on
+       the IEEE standard. The functions <code class="code">catopen, catgets,
+       catclose</code> are used to retrieve locale-specific messages
+       given the appropriate message catalogs that have been
+       constructed for their use. Note that the script <code class="code">
+       po2msg.sed</code>, which is part of the gettext distribution, can
+       convert gettext catalogs into catalogs that
+       <code class="code">catopen</code> can use.
+   </p></li></ul></div><p>
+A new, standards-conformant non-virtual member function signature was
+added for 'open' so that a directory could be specified with a given
+message catalog. This simplifies calling conventions for the gnu
+model.
+</p></div><div class="section" title="The GNU Model"><div class="titlepage"><div><div><h5 class="title"><a id="messages.impl.gnu"/>The GNU Model</h5></div></div></div><p>
+    The messages facet, because it is retrieving and converting
+    between character sets, depends on the ctype and perhaps the
+    codecvt facet in a given locale. In addition, underlying "C"
+    library locale support is necessary for more than just the
+    <code class="code">LC_MESSAGES</code> mask: <code class="code">LC_CTYPE</code> is also
+    necessary. To avoid any unpleasantness, all bits of the "C" mask
+    (i.e. <code class="code">LC_ALL</code>) are set before retrieving messages.
+  </p><p>
+    Making the message catalogs can be tricky at first, but becomes
+    quite simple with practice. For complete information, see the gettext
+    documentation. Here's an idea of what is required:
+  </p><div class="itemizedlist"><ul class="itemizedlist"><li class="listitem"><p>
+       Make a source file with the required string literals that need
+       to be translated. See <code class="code">intl/string_literals.cc</code> for
+       an example.
+     </p></li><li class="listitem"><p>
+       Make the initial catalog (see "4 Making the PO Template File" from
+       the gettext docs).</p><p>
+   <code class="code"> xgettext --c++ --debug string_literals.cc -o libstdc++.pot </code>
+   </p></li><li class="listitem"><p>Make language and country-specific locale catalogs.</p><p>
+   <code class="code">cp libstdc++.pot fr_FR.po</code>
+   </p><p>
+   <code class="code">cp libstdc++.pot de_DE.po</code>
+   </p></li><li class="listitem"><p>
+       Edit localized catalogs in emacs so that strings are
+       translated.
+     </p><p>
+   <code class="code">emacs fr_FR.po</code>
+   </p></li><li class="listitem"><p>Make the binary mo files.</p><p>
+   <code class="code">msgfmt fr_FR.po -o fr_FR.mo</code>
+   </p><p>
+   <code class="code">msgfmt de_DE.po -o de_DE.mo</code>
+   </p></li><li class="listitem"><p>Copy the binary files into the correct directory structure.</p><p>
+   <code class="code">cp fr_FR.mo (dir)/fr_FR/LC_MESSAGES/libstdc++.mo</code>
+   </p><p>
+   <code class="code">cp de_DE.mo (dir)/de_DE/LC_MESSAGES/libstdc++.mo</code>
+   </p></li><li class="listitem"><p>Use the new message catalogs.</p><p>
+   <code class="code">locale loc_de("de_DE");</code>
+   </p><p>
+   <code class="code">
+   use_facet&lt;messages&lt;char&gt; &gt;(loc_de).open("libstdc++", locale(), dir);
+   </code>
+   </p></li></ul></div></div></div><div class="section" title="Use"><div class="titlepage"><div><div><h4 class="title"><a id="facet.messages.use"/>Use</h4></div></div></div><p>
+   The following is a simple example using the GNU model of message conversion.
+ </p><pre class="programlisting">
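+#include &lt;iostream&gt;
+#include &lt;locale&gt;
+#include &lt;string&gt;
+using namespace std;
+
+void test01()
+{
+  typedef messages&lt;char&gt;::catalog catalog;
+  // Directory holding the (lang)/LC_MESSAGES/libstdc++.mo files; this
+  // path is from one particular build tree and must be adjusted.
+  const char* dir =
+  "/mnt/egcs/build/i686-pc-linux-gnu/libstdc++/po/share/locale";
+  // Throws runtime_error if the de_DE locale is not installed.
+  const locale loc_de("de_DE");
+  const messages&lt;char&gt;&amp; mssg_de = use_facet&lt;messages&lt;char&gt; &gt;(loc_de);
+
+  // The three-argument open is the libstdc++ extension described above.
+  catalog cat_de = mssg_de.open("libstdc++", loc_de, dir);
+  // The English string is both the lookup key and the fallback result.
+  string s01 = mssg_de.get(cat_de, 0, 0, "please");
+  string s02 = mssg_de.get(cat_de, 0, 0, "thank you");
+  cout &lt;&lt; "please in German: " &lt;&lt; s01 &lt;&lt; '\n';
+  cout &lt;&lt; "thank you in German: " &lt;&lt; s02 &lt;&lt; '\n';
+  mssg_de.close(cat_de);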
+}
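+</pre><p>
+   The forwarding from the public members to the protected virtual
+   functions, described in the introduction, can also be tried without
+   gettext or any installed catalogs. The following is a minimal sketch
+   and not part of the library: the class <code class="code">table_messages</code>,
+   the catalog name "demo", and the helper <code class="code">translate</code> are
+   hypothetical names chosen for illustration. It overrides the three
+   virtual functions with an in-memory table and reaches them through
+   the standard public members.
+ </p><pre class="programlisting">
+#include &lt;cstddef&gt;
+#include &lt;locale&gt;
+#include &lt;map&gt;
+#include &lt;string&gt;
+
+// Hypothetical facet: answers lookups from an in-memory table.
+class table_messages : public std::messages&lt;char&gt;
+{
+public:
+  explicit table_messages(std::size_t refs = 0)
+  : std::messages&lt;char&gt;(refs)
+  {
+    table_["please"] = "bitte";
+    table_["thank you"] = "danke";
+  }
+
+protected:
+  // Reached through open(); only the made-up name "demo" is known.
+  virtual catalog
+  do_open(const std::basic_string&lt;char&gt;&amp; name, const std::locale&amp;) const
+  { return name == "demo" ? 1 : -1; }
+
+  // Reached through get(); the default string doubles as the key.
+  virtual string_type
+  do_get(catalog cat, int, int, const string_type&amp; dfault) const
+  {
+    if (cat &lt; 0)
+      return dfault;
+    std::map&lt;string_type, string_type&gt;::const_iterator it = table_.find(dfault);
+    return it == table_.end() ? dfault : it-&gt;second;
+  }
+
+  // Reached through close(); nothing to release here.
+  virtual void
+  do_close(catalog) const { }
+
+private:
+  std::map&lt;string_type, string_type&gt; table_;
+};
+
+// Looks up text through the public interface of the installed facet.
+std::string translate(const std::string&amp; text)
+{
+  // The locale takes ownership of the facet, since refs == 0.
+  const std::locale loc(std::locale::classic(), new table_messages);
+  const std::messages&lt;char&gt;&amp; m = std::use_facet&lt;std::messages&lt;char&gt; &gt;(loc);
+  std::messages&lt;char&gt;::catalog cat = m.open("demo", loc);
+  const std::string result = m.get(cat, 0, 0, text);
+  m.close(cat);
+  return result;
+}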
+</pre></div><div class="section" title="Future"><div class="titlepage"><div><div><h4 class="title"><a id="facet.messages.future"/>Future</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist"><li class="listitem"><p>
+    Things that are sketchy, or remain unimplemented:
+  </p><div class="itemizedlist"><ul class="itemizedlist"><li class="listitem"><p>
+	  _M_convert_from_char, _M_convert_to_char are in flux,
+	  depending on how the library ends up doing character set
+	  conversions. It might not be possible to do a real character
+	  set based conversion, due to the fact that the template
+	  parameter for messages is not enough to instantiate the
+	  codecvt facet (1 supplied, need at least 2 but would prefer
+	  3).
+	</p></li><li class="listitem"><p>
+	  There are issues with gettext needing the global locale set
+	  to extract a message. This dependence on the global locale
+	  makes the current "gnu" model not MT-safe. Future versions
+	  of glibc, i.e. glibc 2.3.x, will fix this, and the C++ library
+	  bits are already in place.
+	</p></li></ul></div></li><li class="listitem"><p>
+    Development versions of the GNU "C" library, glibc 2.3, will allow
+    a more efficient, MT-safe implementation of std::messages, and will
+    allow the removal of the _M_name_messages data member. If this is
+    done, it will change the library ABI. The C++ parts to support
+    glibc 2.3 have already been coded, but are not in use: once this
+    version of the "C" library is released, the marked parts of the
+    messages implementation can be switched over to the new "C"
+    library functionality.
+  </p></li><li class="listitem"><p>
+    At some point in the near future, std::numpunct will probably use
+    std::messages facilities to implement truename/falsename
+    correctly. This is currently not done, but entries in
+    libstdc++.pot have already been made for "true" and "false" string
+    literals, so all that remains is the std::numpunct coding and the
+    configure/make hassles to make the installed library search its
+    own catalog. Currently the libstdc++.mo catalog is only searched
+    for the testsuite cases involving messages members.
+  </p></li><li class="listitem"><p> The following member functions:</p><p>
+   <code class="code">
+	catalog
+	open(const basic_string&lt;char&gt;&amp; __s, const locale&amp; __loc) const
+   </code>
+   </p><p>
+   <code class="code">
+   catalog
+   open(const basic_string&lt;char&gt;&amp;, const locale&amp;, const char*) const;
+   </code>
+   </p><p>
+   do not actually return a "value less than 0 if no such catalog
+   can be opened", as required by the standard, in the "gnu"
+   model. As of this writing, it is unknown how to query whether
+   a specified message catalog exists using the gettext
+   package.
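+   </p><p>
+   Portable code can still test the result of 'open', since other
+   models or libraries may report failure as the standard requires. The
+   following is a minimal sketch; the helper name
+   <code class="code">lookup_or_default</code> and the catalog name passed to it are
+   assumptions made for illustration.
+   </p><pre class="programlisting">
+#include &lt;locale&gt;
+#include &lt;string&gt;
+
+// Hypothetical helper: returns the translation of text from the named
+// catalog, or text itself when the catalog cannot be opened.
+std::string lookup_or_default(const std::locale&amp; loc,
+                              const std::string&amp; catalog_name,
+                              const std::string&amp; text)
+{
+  typedef std::messages&lt;char&gt; messages_type;
+  const messages_type&amp; m = std::use_facet&lt;messages_type&gt;(loc);
+
+  const messages_type::catalog cat = m.open(catalog_name, loc);
+  if (cat &lt; 0)
+    return text;  // The model in use reported the failure.
+
+  // A model that cannot report failure still returns the default here.
+  const std::string result = m.get(cat, 0, 0, text);
+  m.close(cat);
+  return result;
+}
+</pre><p>
+   Called with <code class="code">locale::classic()</code> and a catalog that does not
+   exist, this returns the default string unchanged, whether or not
+   'open' reported the failure.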
+   </p></li></ul></div></div><div class="bibliography" title="Bibliography"><div class="titlepage"><div><div><h4 class="title"><a id="facet.messages.biblio"/>Bibliography</h4></div></div></div><div class="biblioentry"><a id="id478453"/><p><span class="citetitle"><em class="citetitle">
+      The GNU C Library
+    </em>. </span><span class="author"><span class="firstname">Roland</span> <span class="surname">McGrath</span>. </span><span class="author"><span class="firstname">Ulrich</span> <span class="surname">Drepper</span>. </span><span class="copyright">Copyright © 2007 FSF. </span><span class="pagenums">Chapters 6 Character Set Handling, and 7 Locales and Internationalization
+    . </span></p></div><div class="biblioentry"><a id="id478493"/><p><span class="citetitle"><em class="citetitle">
+      Correspondence
+    </em>. </span><span class="author"><span class="firstname">Ulrich</span> <span class="surname">Drepper</span>. </span><span class="copyright">Copyright © 2002 . </span></p></div><div class="biblioentry"><a id="id478519"/><p><span class="citetitle"><em class="citetitle">
+      ISO/IEC 14882:1998 Programming languages - C++
+    </em>. </span><span class="copyright">Copyright © 1998 ISO. </span></p></div><div class="biblioentry"><a id="id478538"/><p><span class="citetitle"><em class="citetitle">
+      ISO/IEC 9899:1999 Programming languages - C
+    </em>. </span><span class="copyright">Copyright © 1999 ISO. </span></p></div><div class="biblioentry"><a id="id478557"/><p><span class="citetitle"><em class="citetitle">
+      System Interface Definitions, Issue 7 (IEEE Std. 1003.1-2008)
+    </em>. </span><span class="copyright">Copyright © 2008 
+	The Open Group/The Institute of Electrical and Electronics
+	Engineers, Inc.
+      . </span></p></div><div class="biblioentry"><a id="id478586"/><p><span class="citetitle"><em class="citetitle">
+      The C++ Programming Language, Special Edition
+    </em>. </span><span class="author"><span class="firstname">Bjarne</span> <span class="surname">Stroustrup</span>. </span><span class="copyright">Copyright © 2000 Addison Wesley, Inc. </span><span class="pagenums">Appendix D. </span><span class="publisher"><span class="publishername">
+	Addison Wesley
+      . </span></span></p></div><div class="biblioentry"><a id="id478624"/><p><span class="citetitle"><em class="citetitle">
+      Standard C++ IOStreams and Locales
+    </em>. </span><span class="subtitle">
+      Advanced Programmer's Guide and Reference
+    . </span><span class="author"><span class="firstname">Angelika</span> <span class="surname">Langer</span>. </span><span class="author"><span class="firstname">Klaus</span> <span class="surname">Kreft</span>. </span><span class="copyright">Copyright © 2000 Addison Wesley Longman, Inc. </span><span class="publisher"><span class="publishername">
+	Addison Wesley Longman
+      . </span></span></p></div><div class="biblioentry"><a id="id478672"/><p><span class="citetitle"><em class="citetitle">
+	API Specifications, Java Platform
+      </em>. </span><span class="pagenums">java.util.Properties, java.text.MessageFormat,
+java.util.Locale, java.util.ResourceBundle
+    . </span></p></div><div class="biblioentry"><a id="id478694"/><p><span class="citetitle"><em class="citetitle">
+      GNU gettext tools, version 0.10.38, Native Language Support
+      Library and Tools.
+    </em>. </span></p></div></div></div></div><div class="navfooter"><hr/><table width="100%" summary="Navigation footer"><tr><td align="left"><a accesskey="p" href="localization.html">Prev</a> </td><td align="center"><a accesskey="u" href="localization.html">Up</a></td><td align="right"> <a accesskey="n" href="containers.html">Next</a></td></tr><tr><td align="left" valign="top">Chapter 8. 
+  Localization
+  
+ </td><td align="center"><a accesskey="h" href="../spine.html">Home</a></td><td align="right" valign="top"> Chapter 9. 
+  Containers
+  
+</td></tr></table></div></body></html>