Diffstat (limited to 'libstdc++-v3/doc/xml/manual/containers.xml')
-rw-r--r-- | libstdc++-v3/doc/xml/manual/containers.xml | 462 |
1 files changed, 462 insertions, 0 deletions
diff --git a/libstdc++-v3/doc/xml/manual/containers.xml b/libstdc++-v3/doc/xml/manual/containers.xml new file mode 100644 index 000000000..377b1a2ee --- /dev/null +++ b/libstdc++-v3/doc/xml/manual/containers.xml @@ -0,0 +1,462 @@ +<chapter xmlns="http://docbook.org/ns/docbook" version="5.0" + xml:id="std.containers" xreflabel="Containers"> +<?dbhtml filename="containers.html"?> + +<info><title> + Containers + <indexterm><primary>Containers</primary></indexterm> +</title> + <keywordset> + <keyword> + ISO C++ + </keyword> + <keyword> + library + </keyword> + </keywordset> +</info> + + + +<!-- Sect1 01 : Sequences --> +<section xml:id="std.containers.sequences" xreflabel="Sequences"><info><title>Sequences</title></info> +<?dbhtml filename="sequences.html"?> + + +<section xml:id="containers.sequences.list" xreflabel="list"><info><title>list</title></info> +<?dbhtml filename="list.html"?> + + <section xml:id="sequences.list.size" xreflabel="list::size() is O(n)"><info><title>list::size() is O(n)</title></info> + + <para> + Yes it is, and that's okay. This is a decision that we preserved + when we imported SGI's STL implementation. The following is + quoted from <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.sgi.com/tech/stl/FAQ.html">their FAQ</link>: + </para> + <blockquote> + <para> + The size() member function, for list and slist, takes time + proportional to the number of elements in the list. This was a + deliberate tradeoff. The only way to get a constant-time + size() for linked lists would be to maintain an extra member + variable containing the list's size. This would require taking + extra time to update that variable (it would make splice() a + linear time operation, for example), and it would also make the + list larger. 
Many list algorithms don't require that extra + word (algorithms that do require it might do better with + vectors than with lists), and, when it is necessary to maintain + an explicit size count, it's something that users can do + themselves. + </para> + <para> + This choice is permitted by the C++ standard. The standard says + that size() <quote>should</quote> be constant time, and + <quote>should</quote> does not mean the same thing as + <quote>shall</quote>. This is the officially recommended ISO + wording for saying that an implementation is supposed to do + something unless there is a good reason not to. + </para> + <para> + One implication of linear time size(): you should never write + </para> + <programlisting> + if (L.size() == 0) + ... + </programlisting> + + <para> + Instead, you should write + </para> + + <programlisting> + if (L.empty()) + ... + </programlisting> + </blockquote> + </section> +</section> + +<section xml:id="containers.sequences.vector" xreflabel="vector"><info><title>vector</title></info> +<?dbhtml filename="vector.html"?> + + <para> + </para> + <section xml:id="sequences.vector.management" xreflabel="Space Overhead Management"><info><title>Space Overhead Management</title></info> + + <para> + In <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/ml/libstdc++/2002-04/msg00105.html">this + message to the list</link>, Daniel Kostecky announced work on an + alternate form of <code>std::vector</code> that would support + hints on the number of elements to be over-allocated. The design + was also described, along with possible implementation choices. + </para> + <para> + The first two alpha releases were announced <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/ml/libstdc++/2002-07/msg00048.html">here</link> + and <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/ml/libstdc++/2002-07/msg00111.html">here</link>. 
+ </para> + + </section></section> +</section> + +<!-- Sect1 02 : Associative --> +<section xml:id="std.containers.associative" xreflabel="Associative"><info><title>Associative</title></info> +<?dbhtml filename="associative.html"?> + + + <section xml:id="containers.associative.insert_hints" xreflabel="Insertion Hints"><info><title>Insertion Hints</title></info> + + <para> + Section [23.1.2], Table 69, of the C++ standard lists this + function for all of the associative containers (map, set, etc.): + </para> + <programlisting> + a.insert(p,t); + </programlisting> + <para> + where 'p' is an iterator into the container 'a', and 't' is the + item to insert. The standard says that <quote><code>t</code> is + inserted as close as possible to the position just prior to + <code>p</code>.</quote> (Library DR #233 addresses this topic, + referring to <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1780.html">N1780</link>.) + Since version 4.2 GCC implements the resolution to DR 233, so + that insertions happen as close as possible to the hint. For + earlier releases the hint was only used as described below. + </para> + <para> + Here we'll describe how the hinting works in the libstdc++ + implementation, and what you need to do in order to take + advantage of it. (Insertions can change from logarithmic + complexity to amortized constant time, if the hint is properly + used.) Because the current implementation is based on the + SGI STL one, these points may also hold true for other library + implementations, as the HP/SGI code is used in a lot of + places. + </para> + <para> + In the following text, the phrases <emphasis>greater + than</emphasis> and <emphasis>less than</emphasis> refer to the + results of the strict weak ordering imposed on the container by + its comparison object, which defaults to (basically) + <quote><</quote>. 
Using those phrases is semantically sloppy, + but I didn't want to get bogged down in syntax. I assume that if + you are intelligent enough to use your own comparison objects, + you are also intelligent enough to assign <quote>greater</quote> + and <quote>lesser</quote> their new meanings in the next + paragraph. *grin* + </para> + <para> + If the <code>hint</code> parameter ('p' above) is equivalent to: + </para> + <itemizedlist> + <listitem> + <para> + <code>begin()</code>, then the item being inserted should + have a key less than all the other keys in the container. + The item will be inserted at the beginning of the container, + becoming the new entry at <code>begin()</code>. + </para> + </listitem> + <listitem> + <para> + <code>end()</code>, then the item being inserted should have + a key greater than all the other keys in the container. The + item will be inserted at the end of the container, becoming + the new entry before <code>end()</code>. + </para> + </listitem> + <listitem> + <para> + neither <code>begin()</code> nor <code>end()</code>, then: + Let <code>h</code> be the entry in the container pointed to + by <code>hint</code>, that is, <code>h = *hint</code>. Then + the item being inserted should have a key less than that of + <code>h</code>, and greater than that of the item preceding + <code>h</code>. The new item will be inserted between + <code>h</code> and <code>h</code>'s predecessor. + </para> + </listitem> + </itemizedlist> + <para> + For <code>multimap</code> and <code>multiset</code>, the + restrictions are slightly looser: <quote>greater than</quote> + should be replaced by <quote>not less than</quote> and <quote>less + than</quote> should be replaced by <quote>not greater + than.</quote> (Why not replace greater with + greater-than-or-equal-to? You probably could in your head, but + the mathematicians will tell you that it isn't the same thing.) 
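+ </para>
+ <para>
+ As a rough illustration of the rules above, the following sketch
+ uses two hypothetical helper functions (the names are made up for
+ this example and are not part of libstdc++).
+ <code>build_from_sorted</code> assumes that its input is already
+ in ascending order, so every key falls under the
+ <code>end()</code> case; <code>build_with_bad_hint</code> passes
+ a hint that satisfies none of the cases.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper: builds a set from keys that are assumed to be in
// ascending order, passing end() as the hint for every insertion.  Each
// new key is greater than all keys already present, which is the end()
// case from the list above.
std::set<int> build_from_sorted(const std::vector<int>& sorted_keys)
{
  std::set<int> s;
  for (std::size_t i = 0; i < sorted_keys.size(); ++i)
    s.insert(s.end(), sorted_keys[i]);
  return s;
}

// Hypothetical helper: passes a hint that does not match.  The key 20
// does not belong before begin() (which refers to 10), so the hint
// cannot help, but the element is still stored in sorted order.
std::set<int> build_with_bad_hint()
{
  std::set<int> s;
  s.insert(10);
  s.insert(30);
  s.insert(s.begin(), 20);
  return s;
}
```

+ In the second function the element still ends up in its sorted
+ position; a hint that does not fit should cost only the potential
+ speedup, never correctness.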
+ </para> + <para> + If the conditions are not met, then the hint is not used, and the + insertion proceeds as if you had called <code> a.insert(t) + </code> instead. (<emphasis>Note </emphasis> that GCC releases + prior to 3.0.2 had a bug in the case with <code>hint == + begin()</code> for the <code>map</code> and <code>set</code> + classes. You should not use a hint argument in those releases.) + </para> + <para> + This behavior goes well with other containers' + <code>insert()</code> functions which take an iterator: if used, + the new item will be inserted before the iterator passed as an + argument, same as the other containers. + </para> + <para> + <emphasis>Note </emphasis> also that the hint in this + implementation is a one-shot. The older insertion-with-hint + routines check the immediately surrounding entries to ensure that + the new item would in fact belong there. If the hint does not + point to the correct place, then no further local searching is + done; the search begins from scratch in logarithmic time. + </para> + </section> + + + <section xml:id="containers.associative.bitset" xreflabel="bitset"><info><title>bitset</title></info> + <?dbhtml filename="bitset.html"?> + + <section xml:id="associative.bitset.size_variable" xreflabel="Variable"><info><title>Size Variable</title></info> + + <para> + No, you cannot write code of the form + </para> + <!-- Careful, the leading spaces in PRE show up directly. --> + <programlisting> + #include <bitset> + + void foo (size_t n) + { + std::bitset<n> bits; + .... + } + </programlisting> + <para> + because <code>n</code> must be known at compile time. Your + compiler is correct; it is not a bug. That's the way templates + work. (Yes, it <emphasis>is</emphasis> a feature.) + </para> + <para> + There are a couple of ways to handle this kind of thing. Please + consider all of them before passing judgement. 
They include, in + no particular order: + </para> + <itemizedlist> + <listitem><para>A very large N in <code>bitset<N></code>.</para></listitem> + <listitem><para>A container<bool>.</para></listitem> + <listitem><para>Extremely weird solutions.</para></listitem> + </itemizedlist> + <para> + <emphasis>A very large N in + <code>bitset<N></code>. </emphasis> It has been + pointed out a few times in newsgroups that N bits only takes up + (N/8) bytes on most systems, and division by a factor of eight is + pretty impressive when speaking of memory. Half a megabyte given + over to a bitset (recall that there is zero space overhead for + housekeeping info; it is known at compile time exactly how large + the set is) will hold over four million bits. If you're using + those bits as status flags (e.g., + <quote>changed</quote>/<quote>unchanged</quote> flags), that's a + <emphasis>lot</emphasis> of state. + </para> + <para> + You can then keep track of the <quote>maximum bit used</quote> + during some testing runs on representative data, make note of how + many of those bits really need to be there, and then reduce N to + a smaller number. Leave some extra space, of course. (If you + plan to write code like the incorrect example above, where the + bitset is a local variable, then you may have to talk your + compiler into allowing that much stack space; there may be zero + space overhead, but it's all allocated inside the object.) + </para> + <para> + <emphasis>A container<bool>. </emphasis> The + Committee made provision for the space savings possible with that + (N/8) usage previously mentioned, so that you don't have to do + wasteful things like <code>Container<char></code> or + <code>Container<short int></code>. Specifically, + <code>vector<bool></code> is required to be specialized for + that space savings. + </para> + <para> + The problem is that <code>vector<bool></code> doesn't + behave like a normal vector anymore. 
There have been + journal articles which discuss the problems (the ones by Herb + Sutter in the May and July/August 1999 issues of C++ Report cover + it well). Future revisions of the ISO C++ Standard will change + the requirement for <code>vector<bool></code> + specialization. In the meantime, <code>deque<bool></code> + is recommended (although its behavior is sane, you probably will + not get the space savings, but the allocation scheme is different + than that of vector). + </para> + <para> + <emphasis>Extremely weird solutions. </emphasis> If + you have access to the compiler and linker at runtime, you can do + something insane, like figuring out just how many bits you need, + then writing a temporary source code file. That file contains an + instantiation of <code>bitset</code> for the required number of + bits, inside some wrapper functions with unchanging signatures. + Have your program then call the compiler on that file using + Position Independent Code, then open the newly-created object + file and load those wrapper functions. You'll have an + instantiation of <code>bitset<N></code> for the exact + <code>N</code> that you need at the time. Don't forget to delete + the temporary files. (Yes, this <emphasis>can</emphasis> be, and + <emphasis>has been</emphasis>, done.) + </para> + <!-- I wonder if this next paragraph will get me in trouble... --> + <para> + This would be the approach of either a visionary genius or a + raving lunatic, depending on your programming and management + style. Probably the latter. + </para> + <para> + Which of the above techniques you use, if any, is up to you and + your intended application. Some time/space profiling is + indicated if it really matters (don't just guess). And, if you + manage to do anything along the lines of the third category, the + author would love to hear from you... 
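+ </para>
+ <para>
+ The first two approaches might be sketched as follows. This is
+ only an illustration under stated assumptions: the bound
+ <code>max_flags</code> and both function names are hypothetical,
+ and each function simply sets every other bit among the first
+ <code>n</code> and reports how many bits are set.

```cpp
#include <algorithm>
#include <bitset>
#include <cstddef>
#include <vector>

// Hypothetical upper bound, picked for illustration only.  In practice
// it would come from measuring the "maximum bit used" on real data.
const std::size_t max_flags = 4096;

// Approach 1: a generously sized bitset whose size is fixed at compile
// time.  Only the first n bits (at most max_flags) are touched.
std::size_t count_set_fixed(std::size_t n)
{
  std::bitset<max_flags> bits;
  for (std::size_t i = 0; i < n && i < max_flags; i += 2)
    bits.set(i);
  return bits.count();
}

// Approach 2: a container of bool whose size is chosen at run time.
std::size_t count_set_dynamic(std::size_t n)
{
  std::vector<bool> bits(n, false);
  for (std::size_t i = 0; i < n; i += 2)
    bits[i] = true;
  return static_cast<std::size_t>(std::count(bits.begin(), bits.end(), true));
}
```

+ The fixed-size version silently ignores anything beyond
+ <code>max_flags</code>, which is why some headroom matters; the
+ run-time version has no such limit but relies on the
+ <code>vector<bool></code> specialization discussed above.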
+ </para> + <para> + Also note that the implementation of bitset used in libstdc++ has + <link linkend="manual.ext.containers.sgi">some extensions</link>. + </para> + + </section> + <section xml:id="associative.bitset.type_string" xreflabel="Type String"><info><title>Type String</title></info> + + <para> + </para> + <para> + Bitsets do not take char* nor const char* arguments in their + constructors. This is something of an accident, but you can read + about the problem: follow the library's <quote>Links</quote> from + the homepage, and from the C++ information <quote>defect + reflector</quote> link, select the library issues list. Issue + number 116 describes the problem. + </para> + <para> + For now you can simply make a temporary string object using the + constructor expression: + </para> + <programlisting> + std::bitset<5> b ( std::string("10110") ); + </programlisting> + + <para> + instead of + </para> + + <programlisting> + std::bitset<5> b ( "10110" ); // invalid + </programlisting> + </section> + </section> + +</section> + +<!-- Sect1 03 : Interacting with C --> +<section xml:id="std.containers.c" xreflabel="Interacting with C"><info><title>Interacting with C</title></info> +<?dbhtml filename="containers_and_c.html"?> + + + <section xml:id="containers.c.vs_array" xreflabel="Containers vs. Arrays"><info><title>Containers vs. Arrays</title></info> + + <para> + You're writing some code and can't decide whether to use builtin + arrays or some kind of container. There are compelling reasons + to use one of the container classes, but you're afraid that + you'll eventually run into difficulties, change everything back + to arrays, and then have to change all the code that uses those + data types to keep up with the change. + </para> + <para> + If your code makes use of the standard algorithms, this isn't as + scary as it sounds. 
The algorithms don't know, nor care, about + the kind of <quote>container</quote> on which they work, since + the algorithms are only given endpoints to work with. For the + container classes, these are iterators (usually + <code>begin()</code> and <code>end()</code>, but not always). + For builtin arrays, these are the address of the first element + and the <link linkend="iterators.predefined.end">past-the-end</link> element. + </para> + <para> + Some very simple wrapper functions can hide all of that from the + rest of the code. For example, a pair of functions called + <code>beginof</code> can be written, one that takes an array, + another that takes a vector. The first returns a pointer to the + first element, and the second returns the vector's + <code>begin()</code> iterator. + </para> + <para> + The functions should be made template functions, and should also + be declared inline. As pointed out in the comments in the code + below, this can lead to <code>beginof</code> being optimized out + of existence, so you pay absolutely nothing in terms of increased + code size or execution time. + </para> + <para> + The result is that if all your algorithm calls look like + </para> + <programlisting> + std::transform(beginof(foo), endof(foo), beginof(foo), SomeFunction); + </programlisting> + <para> + then the type of foo can change from an array of ints to a vector + of ints to a deque of ints and back again, without ever changing + any client code. 
+ </para> + +<programlisting> +#include <vector> + +// beginof +template<typename T> + inline typename std::vector<T>::iterator + beginof(std::vector<T> &v) + { return v.begin(); } + +template<typename T, unsigned int sz> + inline T* + beginof(T (&array)[sz]) { return array; } + +// endof +template<typename T> + inline typename std::vector<T>::iterator + endof(std::vector<T> &v) + { return v.end(); } + +template<typename T, unsigned int sz> + inline T* + endof(T (&array)[sz]) { return array + sz; } + +// lengthof +template<typename T> + inline typename std::vector<T>::size_type + lengthof(std::vector<T> &v) + { return v.size(); } + +template<typename T, unsigned int sz> + inline unsigned int + lengthof(T (&)[sz]) { return sz; } +</programlisting> + + <para> + Astute readers will notice two things at once: first, that the + container class is still a <code>vector<T></code> instead + of a more general <code>Container<T></code>. This would + mean that three functions for <code>deque</code> would have to be + added, another three for <code>list</code>, and so on. This is + due to problems with getting template resolution correct; I find + it easier just to give the extra three lines and avoid confusion. + </para> + <para> + Second, the line + </para> + <programlisting> + inline unsigned int lengthof (T (&)[sz]) { return sz; } + </programlisting> + <para> + looks just weird! Hint: unused parameters can be left nameless. + </para> + </section> + +</section> + +</chapter>