From 554fd8c5195424bdbcabf5de30fdc183aba391bd Mon Sep 17 00:00:00 2001 From: upstream source tree Date: Sun, 15 Mar 2015 20:14:05 -0400 Subject: obtained gcc-4.6.4.tar.bz2 from upstream website; verified gcc-4.6.4.tar.bz2.sig; imported gcc-4.6.4 source tree from verified upstream tarball. downloading a git-generated archive based on the 'upstream' tag should provide you with a source tree that is binary identical to the one extracted from the above tarball. if you have obtained the source via the command 'git clone', however, do note that line-endings of files in your working directory might differ from line-endings of the respective files in the upstream repository. --- libstdc++-v3/doc/xml/manual/strings.xml | 486 ++++++++++++++++++++++++++++++++ 1 file changed, 486 insertions(+) create mode 100644 libstdc++-v3/doc/xml/manual/strings.xml (limited to 'libstdc++-v3/doc/xml/manual/strings.xml') diff --git a/libstdc++-v3/doc/xml/manual/strings.xml b/libstdc++-v3/doc/xml/manual/strings.xml new file mode 100644 index 000000000..4d9fc64f7 --- /dev/null +++ b/libstdc++-v3/doc/xml/manual/strings.xml @@ -0,0 +1,486 @@ + + + + + Strings + <indexterm><primary>Strings</primary></indexterm> + + + + ISO C++ + + + library + + + + + + + +
String Classes + + +
Simple Transformations + + + Here are Standard, simple, and portable ways to perform common + transformations on a string instance, such as + "convert to all upper case." The word transformations + is especially apt, because the standard template function + transform<> is used. + + + This code will go through some iterations. Here's a simple + version: + + + #include <string> + #include <algorithm> + #include <cctype> // old <ctype.h> + + struct ToLower + { + char operator() (char c) const { return std::tolower(c); } + }; + + struct ToUpper + { + char operator() (char c) const { return std::toupper(c); } + }; + + int main() + { + std::string s ("Some Kind Of Initial Input Goes Here"); + + // Change everything into upper case + std::transform (s.begin(), s.end(), s.begin(), ToUpper()); + + // Change everything into lower case + std::transform (s.begin(), s.end(), s.begin(), ToLower()); + + // Change everything back into upper case, but store the + // result in a different string + std::string capital_s; + capital_s.resize(s.size()); + std::transform (s.begin(), s.end(), capital_s.begin(), ToUpper()); + } + + + Note that these calls all + involve the global C locale through the use of the C functions + toupper/tolower. This is absolutely guaranteed to work -- + but only if the string contains only characters + from the basic source character set, and there are only + 96 of those. Which means that not even all English text can be + represented (certain British spellings, proper names, and so forth). + So, if all your input forevermore consists of only those 96 + characters (hahahahahaha), then you're done. + + Note that the + ToUpper and ToLower function objects + are needed because toupper and tolower + are overloaded names (declared in <cctype> and + <locale>) so the template-arguments for + transform<> cannot be deduced, as explained in + this + message. + + At minimum, you can write short wrappers like + + + char toLower (char c) + { + return std::tolower(c); + } + (Thanks to James Kanze for assistance and suggestions on all of this.) + + Another common operation is trimming off excess whitespace. Much + like transformations, this task is trivial with the use of string's + find family. These examples are broken into multiple + statements for readability: + + + std::string str (" \t blah blah blah \n "); + + // trim leading whitespace + string::size_type notwhite = str.find_first_not_of(" \t\n"); + str.erase(0,notwhite); + + // trim trailing whitespace + notwhite = str.find_last_not_of(" \t\n"); + str.erase(notwhite+1); + Obviously, the calls to find could be inserted directly + into the calls to erase, in case your compiler does not + optimize named temporaries out of existence. + + +
+
Case Sensitivity + + + + + The well-known-and-if-it-isn't-well-known-it-ought-to-be + Guru of the Week + discussions held on Usenet covered this topic in January of 1998. + Briefly, the challenge was, write a 'ci_string' class which + is identical to the standard 'string' class, but is + case-insensitive in the same way as the (common but nonstandard) + C function stricmp(). + + + ci_string s( "AbCdE" ); + + // case insensitive + assert( s == "abcde" ); + assert( s == "ABCDE" ); + + // still case-preserving, of course + assert( strcmp( s.c_str(), "AbCdE" ) == 0 ); + assert( strcmp( s.c_str(), "abcde" ) != 0 ); + + The solution is surprisingly easy. The original answer was + posted on Usenet, and a revised version appears in Herb Sutter's + book Exceptional C++ and on his website as GotW 29. + + See? Told you it was easy! + + Added June 2000: The May 2000 issue of C++ + Report contains a fascinating article by + Matt Austern (yes, the Matt Austern) on why + case-insensitive comparisons are not as easy as they seem, and + why creating a class is the wrong way to go + about it in production code. (The GotW answer mentions one of + the principle difficulties; his article mentions more.) + + Basically, this is "easy" only if you ignore some things, + things which may be too important to your program to ignore. (I chose + to ignore them when originally writing this entry, and am surprised + that nobody ever called me on it...) The GotW question and answer + remain useful instructional tools, however. + + Added September 2000: James Kanze provided a link to a + Unicode + Technical Report discussing case handling, which provides some + very good information. + + +
+
Arbitrary Character Types + + + + + The std::basic_string is tantalizingly general, in that + it is parameterized on the type of the characters which it holds. + In theory, you could whip up a Unicode character class and instantiate + std::basic_string<my_unicode_char>, or assuming + that integers are wider than characters on your platform, maybe just + declare variables of type std::basic_string<int>. + + That's the theory. Remember however that basic_string has additional + type parameters, which take default arguments based on the character + type (called CharT here): + + + template <typename CharT, + typename Traits = char_traits<CharT>, + typename Alloc = allocator<CharT> > + class basic_string { .... }; + Now, allocator<CharT> will probably Do The Right + Thing by default, unless you need to implement your own allocator + for your characters. + + But char_traits takes more work. The char_traits + template is declared but not defined. + That means there is only + + + template <typename CharT> + struct char_traits + { + static void foo (type1 x, type2 y); + ... + }; + and functions such as char_traits<CharT>::foo() are not + actually defined anywhere for the general case. The C++ standard + permits this, because writing such a definition to fit all possible + CharT's cannot be done. + + The C++ standard also requires that char_traits be specialized for + instantiations of char and wchar_t, and it + is these template specializations that permit entities like + basic_string<char,char_traits<char>> to work. + + If you want to use character types other than char and wchar_t, + such as unsigned char and int, you will + need suitable specializations for them. For a time, in earlier + versions of GCC, there was a mostly-correct implementation that + let programmers be lazy but it broke under many situations, so it + was removed. GCC 3.4 introduced a new implementation that mostly + works and can be specialized even for int and other + built-in types. + + If you want to use your own special character class, then you have + a lot + of work to do, especially if you with to use i18n features + (facets require traits information but don't have a traits argument). + + Another example of how to specialize char_traits was given on the + mailing list and at a later date was put into the file + include/ext/pod_char_traits.h. We agree + that the way it's used with basic_string (scroll down to main()) + doesn't look nice, but that's because the + nice-looking first attempt turned out to not + be conforming C++, due to the rule that CharT must be a POD. + (See how tricky this is?) + + +
+ +
Tokenizing + + + + The Standard C (and C++) function strtok() leaves a lot to + be desired in terms of user-friendliness. It's unintuitive, it + destroys the character string on which it operates, and it requires + you to handle all the memory problems. But it does let the client + code decide what to use to break the string into pieces; it allows + you to choose the "whitespace," so to speak. + + A C++ implementation lets us keep the good things and fix those + annoyances. The implementation here is more intuitive (you only + call it once, not in a loop with varying argument), it does not + affect the original string at all, and all the memory allocation + is handled for you. + + It's called stringtok, and it's a template function. Sources are + as below, in a less-portable form than it could be, to keep this + example simple (for example, see the comments on what kind of + string it will accept). + + + +#include <string> +template <typename Container> +void +stringtok(Container &container, string const &in, + const char * const delimiters = " \t\n") +{ + const string::size_type len = in.length(); + string::size_type i = 0; + + while (i < len) + { + // Eat leading whitespace + i = in.find_first_not_of(delimiters, i); + if (i == string::npos) + return; // Nothing left but white space + + // Find the end of the token + string::size_type j = in.find_first_of(delimiters, i); + + // Push token + if (j == string::npos) + { + container.push_back(in.substr(i)); + return; + } + else + container.push_back(in.substr(i, j-i)); + + // Set up for next loop + i = j + 1; + } +} + + + + + The author uses a more general (but less readable) form of it for + parsing command strings and the like. If you compiled and ran this + code using it: + + + + + std::list<string> ls; + stringtok (ls, " this \t is\t\n a test "); + for (std::list<string>const_iterator i = ls.begin(); + i != ls.end(); ++i) + { + std::cerr << ':' << (*i) << ":\n"; + } + You would see this as output: + + + :this: + :is: + :a: + :test: + with all the whitespace removed. The original s is still + available for use, ls will clean up after itself, and + ls.size() will return how many tokens there were. + + As always, there is a price paid here, in that stringtok is not + as fast as strtok. The other benefits usually outweigh that, however. + + + Added February 2001: Mark Wilden pointed out that the + standard std::getline() function can be used with standard + istringstreams to perform + tokenizing as well. Build an istringstream from the input text, + and then use std::getline with varying delimiters (the three-argument + signature) to extract tokens into a string. + + + +
+
Shrink to Fit + + + + From GCC 3.4 calling s.reserve(res) on a + string s with res < s.capacity() will + reduce the string's capacity to std::max(s.size(), res). + + This behaviour is suggested, but not required by the standard. Prior + to GCC 3.4 the following alternative can be used instead + + + std::string(str.data(), str.size()).swap(str); + + This is similar to the idiom for reducing + a vector's memory usage + (see this FAQ + entry) but the regular copy constructor cannot be used + because libstdc++'s string is Copy-On-Write. + + In C++0x mode you can call + s.shrink_to_fit() to achieve the same effect as + s.reserve(s.size()). + + + +
+ +
CString (MFC) + + + + + A common lament seen in various newsgroups deals with the Standard + string class as opposed to the Microsoft Foundation Class called + CString. Often programmers realize that a standard portable + answer is better than a proprietary nonportable one, but in porting + their application from a Win32 platform, they discover that they + are relying on special functions offered by the CString class. + + Things are not as bad as they seem. In + this + message, Joe Buck points out a few very important things: + + + The Standard string supports all the operations + that CString does, with three exceptions. + + Two of those exceptions (whitespace trimming and case + conversion) are trivial to implement. In fact, we do so + on this page. + + The third is CString::Format, which allows formatting + in the style of sprintf. This deserves some mention: + + + + The old libg++ library had a function called form(), which did much + the same thing. But for a Standard solution, you should use the + stringstream classes. These are the bridge between the iostream + hierarchy and the string class, and they operate with regular + streams seamlessly because they inherit from the iostream + hierarchy. An quick example: + + + #include <iostream> + #include <string> + #include <sstream> + + string f (string& incoming) // incoming is "foo N" + { + istringstream incoming_stream(incoming); + string the_word; + int the_number; + + incoming_stream >> the_word // extract "foo" + >> the_number; // extract N + + ostringstream output_stream; + output_stream << "The word was " << the_word + << " and 3*N was " << (3*the_number); + + return output_stream.str(); + } + A serious problem with CString is a design bug in its memory + allocation. Specifically, quoting from that same message: + + + CString suffers from a common programming error that results in + poor performance. Consider the following code: + + CString n_copies_of (const CString& foo, unsigned n) + { + CString tmp; + for (unsigned i = 0; i < n; i++) + tmp += foo; + return tmp; + } + + This function is O(n^2), not O(n). The reason is that each += + causes a reallocation and copy of the existing string. Microsoft + applications are full of this kind of thing (quadratic performance + on tasks that can be done in linear time) -- on the other hand, + we should be thankful, as it's created such a big market for high-end + ix86 hardware. :-) + + If you replace CString with string in the above function, the + performance is O(n). + + Joe Buck also pointed out some other things to keep in mind when + comparing CString and the Standard string class: + + + CString permits access to its internal representation; coders + who exploited that may have problems moving to string. + + Microsoft ships the source to CString (in the files + MFC\SRC\Str{core,ex}.cpp), so you could fix the allocation + bug and rebuild your MFC libraries. + Note: It looks like the CString shipped + with VC++6.0 has fixed this, although it may in fact have been + one of the VC++ SPs that did it. + + string operations like this have O(n) complexity + if the implementors do it correctly. The libstdc++ + implementors did it correctly. Other vendors might not. + + While chapters of the SGI STL are used in libstdc++, their + string class is not. The SGI string is essentially + vector<char> and does not do any reference + counting like libstdc++'s does. (It is O(n), though.) + So if you're thinking about SGI's string or rope classes, + you're now looking at four possibilities: CString, the + libstdc++ string, the SGI string, and the SGI rope, and this + is all before any allocator or traits customizations! (More + choices than you can shake a stick at -- want fries with that?) + + + +
+
+ + + +
-- cgit v1.2.3