5e623d0be4
2009-04-15 Benjamin Kosnik <bkoz@redhat.com> * doc/xml/manual/status_cxx1998.xml: Update to new table style. * doc/xml/gnu/gpl-3.0.xml: Add or adjust dbhtml markup. * doc/xml/gnu/fdl-1.2.xml: Same. * doc/xml/manual/numerics.xml: Same. * doc/xml/manual/concurrency.xml: Same. * doc/xml/manual/intro.xml: Same. * doc/xml/manual/status_cxxtr1.xml: Same. * doc/xml/manual/containers.xml: Same. * doc/xml/manual/io.xml: Same. * doc/xml/manual/utilities.xml: Same. * doc/xml/manual/support.xml: Same. * doc/xml/manual/using.xml: Same. * doc/xml/manual/localization.xml: Same. * doc/xml/manual/locale.xml: Same. * doc/xml/manual/extensions.xml: Same. * doc/xml/manual/appendix_contributing.xml: Same. * doc/xml/manual/diagnostics.xml: Same. * doc/xml/manual/status_cxx200x.xml: Same. From-SVN: r146139
451 lines
18 KiB
XML
<?xml version='1.0'?>
<!DOCTYPE part PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"
[ ]>
|
|
|
|
<part id="manual.containers" xreflabel="Containers">
|
|
<?dbhtml filename="containers.html"?>
|
|
|
|
<partinfo>
|
|
<keywordset>
|
|
<keyword>
|
|
ISO C++
|
|
</keyword>
|
|
<keyword>
|
|
library
|
|
</keyword>
|
|
</keywordset>
|
|
</partinfo>
|
|
|
|
<title>
|
|
Containers
|
|
<indexterm><primary>Containers</primary></indexterm>
|
|
</title>
|
|
|
|
<!-- Chapter 01 : Sequences -->
|
|
<chapter id="manual.containers.sequences" xreflabel="Sequences">
|
|
<?dbhtml filename="sequences.html"?>
|
|
<title>Sequences</title>
|
|
|
|
<sect1 id="containers.sequences.list" xreflabel="list">
|
|
<?dbhtml filename="list.html"?>
|
|
<title>list</title>
|
|
<sect2 id="sequences.list.size" xreflabel="list::size() is O(n)">
|
|
<title>list::size() is O(n)</title>
|
|
<para>
|
|
Yes it is, and that's okay. This is a decision that we preserved
|
|
when we imported SGI's STL implementation. The following is
|
|
quoted from <ulink
url="http://www.sgi.com/tech/stl/FAQ.html">their FAQ</ulink>:
|
|
</para>
|
|
<blockquote>
|
|
<para>
|
|
The size() member function, for list and slist, takes time
|
|
proportional to the number of elements in the list. This was a
|
|
deliberate tradeoff. The only way to get a constant-time
|
|
size() for linked lists would be to maintain an extra member
|
|
variable containing the list's size. This would require taking
|
|
extra time to update that variable (it would make splice() a
|
|
linear time operation, for example), and it would also make the
|
|
list larger. Many list algorithms don't require that extra
|
|
word (algorithms that do require it might do better with
|
|
vectors than with lists), and, when it is necessary to maintain
|
|
an explicit size count, it's something that users can do
|
|
themselves.
|
|
</para>
|
|
<para>
|
|
This choice is permitted by the C++ standard. The standard says
|
|
that size() <quote>should</quote> be constant time, and
|
|
<quote>should</quote> does not mean the same thing as
|
|
<quote>shall</quote>. This is the officially recommended ISO
|
|
wording for saying that an implementation is supposed to do
|
|
something unless there is a good reason not to.
|
|
</para>
|
|
<para>
|
|
One implication of linear time size(): you should never write
|
|
</para>
|
|
<programlisting>
if (L.size() == 0)
    ...
</programlisting>
|
|
|
|
<para>
|
|
Instead, you should write
|
|
</para>
|
|
|
|
<programlisting>
if (L.empty())
    ...
</programlisting>
|
|
</blockquote>
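<para>
The FAQ's remark that users can maintain an explicit size count
themselves can be sketched as follows. This is only an illustration:
<code>counted_list</code> is a hypothetical wrapper, not part of
libstdc++ or the SGI STL, and it forwards just enough of the
<code>std::list</code> interface to show the idea.
</para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical wrapper (not a libstdc++ class): pairs a std::list with an
// explicitly maintained element count, so the count never needs a traversal.
template <typename T>
class counted_list
{
public:
  counted_list() : items_(), count_(0) { }

  void push_back(const T& value)
  {
    items_.push_back(value);
    ++count_;
  }

  void pop_front()
  {
    assert(!items_.empty());   // empty() is the cheap emptiness test
    items_.pop_front();
    --count_;
  }

  bool empty() const { return items_.empty(); }

  // Returns the stored count instead of walking the list.
  std::size_t size() const { return count_; }

  // Read-only access to the underlying list.
  const std::list<T>& items() const { return items_; }

private:
  std::list<T> items_;
  std::size_t count_;
};
```
]]></programlisting>
<para>
The price is the one the FAQ names: every mutating operation must
update the count, and a wrapper like this could not splice a range
out of another list without counting the elements in that range.
</para>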
|
|
</sect2>
|
|
</sect1>
|
|
|
|
<sect1 id="containers.sequences.vector" xreflabel="vector">
|
|
<?dbhtml filename="vector.html"?>
|
|
<title>vector</title>
|
|
<sect2 id="sequences.vector.management" xreflabel="Space Overhead Management">
|
|
<title>Space Overhead Management</title>
|
|
<para>
|
|
In <ulink
url="http://gcc.gnu.org/ml/libstdc++/2002-04/msg00105.html">this
message to the list</ulink>, Daniel Kostecky announced work on an
|
|
alternate form of <code>std::vector</code> that would support
|
|
hints on the number of elements to be over-allocated. The design
|
|
was also described, along with possible implementation choices.
|
|
</para>
|
|
<para>
|
|
The first two alpha releases were announced <ulink
url="http://gcc.gnu.org/ml/libstdc++/2002-07/msg00048.html">here</ulink>
and <ulink
url="http://gcc.gnu.org/ml/libstdc++/2002-07/msg00111.html">here</ulink>.
The releases themselves are available at
<ulink url="http://www.kotelna.sk/dk/sw/caphint/">http://www.kotelna.sk/dk/sw/caphint/</ulink>.
|
|
</para>
|
|
|
|
</sect2></sect1>
|
|
</chapter>
|
|
|
|
<!-- Chapter 02 : Associative -->
|
|
<chapter id="manual.containers.associative" xreflabel="Associative">
|
|
<?dbhtml filename="associative.html"?>
|
|
<title>Associative</title>
|
|
|
|
<sect1 id="containers.associative.insert_hints" xreflabel="Insertion Hints">
|
|
<title>Insertion Hints</title>
|
|
<para>
|
|
Section [23.1.2], Table 69, of the C++ standard lists this
|
|
function for all of the associative containers (map, set, etc):
|
|
</para>
|
|
<programlisting>
a.insert(p,t);
</programlisting>
|
|
<para>
|
|
where 'p' is an iterator into the container 'a', and 't' is the
|
|
item to insert. The standard says that <quote><code>t</code> is
|
|
inserted as close as possible to the position just prior to
|
|
<code>p</code>.</quote> (Library DR #233 addresses this topic,
referring to <ulink
url='http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1780.html'>N1780</ulink>.)
Since version 4.2, GCC implements the resolution to DR 233, so
that insertions happen as close as possible to the hint. For
earlier releases, the hint was only used as described below.
|
|
</para>
|
|
<para>
|
|
Here we'll describe how the hinting works in the libstdc++
|
|
implementation, and what you need to do in order to take
|
|
advantage of it. (Insertions can change from logarithmic
|
|
complexity to amortized constant time, if the hint is properly
|
|
used.) Because the current implementation is based on the
SGI STL one, these points may also hold true for other library
implementations, as the HP/SGI code is used in a lot of
places.
|
|
</para>
|
|
<para>
|
|
In the following text, the phrases <emphasis>greater
|
|
than</emphasis> and <emphasis>less than</emphasis> refer to the
|
|
results of the strict weak ordering imposed on the container by
|
|
its comparison object, which defaults to (basically)
|
|
<quote>&lt;</quote>. Using those phrases is semantically sloppy,
|
|
but I didn't want to get bogged down in syntax. I assume that if
|
|
you are intelligent enough to use your own comparison objects,
|
|
you are also intelligent enough to assign <quote>greater</quote>
|
|
and <quote>lesser</quote> their new meanings in the next
|
|
paragraph. *grin*
|
|
</para>
|
|
<para>
|
|
If the <code>hint</code> parameter ('p' above) is equivalent to:
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
<code>begin()</code>, then the item being inserted should
|
|
have a key less than all the other keys in the container.
|
|
The item will be inserted at the beginning of the container,
|
|
becoming the new entry at <code>begin()</code>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<code>end()</code>, then the item being inserted should have
|
|
a key greater than all the other keys in the container. The
|
|
item will be inserted at the end of the container, becoming
the new last entry, immediately before <code>end()</code>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
neither <code>begin()</code> nor <code>end()</code>, then:
|
|
Let <code>h</code> be the entry in the container pointed to
|
|
by <code>hint</code>, that is, <code>h = *hint</code>. Then
|
|
the item being inserted should have a key less than that of
|
|
<code>h</code>, and greater than that of the item preceding
|
|
<code>h</code>. The new item will be inserted between
|
|
<code>h</code> and <code>h</code>'s predecessor.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
<para>
|
|
For <code>multimap</code> and <code>multiset</code>, the
|
|
restrictions are slightly looser: <quote>greater than</quote>
|
|
should be replaced by <quote>not less than</quote> and <quote>less
|
|
than</quote> should be replaced by <quote>not greater
|
|
than.</quote> (Why not replace greater with
|
|
greater-than-or-equal-to? You probably could in your head, but
|
|
the mathematicians will tell you that it isn't the same thing.)
|
|
</para>
|
|
<para>
|
|
If the conditions are not met, then the hint is not used, and the
|
|
insertion proceeds as if you had called <code>a.insert(t)</code>
instead. (<emphasis>Note</emphasis> that GCC releases
|
|
prior to 3.0.2 had a bug in the case with <code>hint ==
|
|
begin()</code> for the <code>map</code> and <code>set</code>
|
|
classes. You should not use a hint argument in those releases.)
|
|
</para>
|
|
<para>
|
|
This behavior is consistent with other containers'
<code>insert()</code> functions that take an iterator: when the
hint is used, the new item is inserted before the iterator passed
as an argument, just as in the other containers.
|
|
</para>
|
|
<para>
|
|
<emphasis>Note </emphasis> also that the hint in this
|
|
implementation is a one-shot. The older insertion-with-hint
|
|
routines check the immediately surrounding entries to ensure that
|
|
the new item would in fact belong there. If the hint does not
|
|
point to the correct place, then no further local searching is
|
|
done; the search begins from scratch in logarithmic time.
|
|
</para>
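<para>
As a rough sketch of how the hint can be used: when keys arrive
already sorted in ascending order, passing <code>end()</code> as the
hint meets the conditions listed above on every call. The helper
names <code>fill_ascending</code> and
<code>fill_with_poor_hint</code> are made up for this example.
</para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <set>

// Inserts 0, 1, ..., n-1 in ascending order. Each new key is greater than
// every key already present, so end() is always a usable hint.
inline std::set<int> fill_ascending(int n)
{
  assert(n >= 0);
  std::set<int> s;
  for (int i = 0; i < n; ++i)
    s.insert(s.end(), i);
  return s;
}

// Same keys, but begin() is the wrong hint for every key after the first.
// The result is still correct; only the insertion cost changes.
inline std::set<int> fill_with_poor_hint(int n)
{
  assert(n >= 0);
  std::set<int> s;
  for (int i = 0; i < n; ++i)
    s.insert(s.begin(), i);
  return s;
}
```
]]></programlisting>
<para>
Both functions build the same set. A hint that does not fit never
changes where the item ends up, only how long the insertion takes.
</para>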
|
|
</sect1>
|
|
|
|
|
|
<sect1 id="containers.associative.bitset" xreflabel="bitset">
|
|
<?dbhtml filename="bitset.html"?>
|
|
<title>bitset</title>
|
|
<sect2 id="associative.bitset.size_variable" xreflabel="Variable">
|
|
<title>Size Variable</title>
|
|
<para>
|
|
No, you cannot write code of the form
|
|
</para>
|
|
<!-- Careful, the leading spaces in PRE show up directly. -->
|
|
<programlisting>
#include &lt;bitset&gt;

void foo (size_t n)
{
    std::bitset&lt;n&gt;   bits;
    ....
}
</programlisting>
|
|
<para>
|
|
because <code>n</code> must be known at compile time. Your
|
|
compiler is correct; it is not a bug. That's the way templates
|
|
work. (Yes, it <emphasis>is</emphasis> a feature.)
|
|
</para>
|
|
<para>
|
|
There are a couple of ways to handle this kind of thing. Please
|
|
consider all of them before passing judgement. They include, in
|
|
no particular order:
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem><para>A very large N in <code>bitset&lt;N&gt;</code>.</para></listitem>
<listitem><para>A container&lt;bool&gt;.</para></listitem>
|
|
<listitem><para>Extremely weird solutions.</para></listitem>
|
|
</itemizedlist>
|
|
<para>
|
|
<emphasis>A very large N in
<code>bitset&lt;N&gt;</code>. </emphasis> It has been
pointed out a few times in newsgroups that N bits take up only
(N/8) bytes on most systems, and division by a factor of eight is
pretty impressive when speaking of memory. Half a megabyte given
|
|
over to a bitset (recall that there is zero space overhead for
|
|
housekeeping info; it is known at compile time exactly how large
|
|
the set is) will hold over four million bits. If you're using
|
|
those bits as status flags (e.g.,
|
|
<quote>changed</quote>/<quote>unchanged</quote> flags), that's a
|
|
<emphasis>lot</emphasis> of state.
|
|
</para>
|
|
<para>
|
|
You can then keep track of the <quote>maximum bit used</quote>
|
|
during some testing runs on representative data, make note of how
|
|
many of those bits really need to be there, and then reduce N to
|
|
a smaller number. Leave some extra space, of course. (If you
|
|
plan to write code like the incorrect example above, where the
|
|
bitset is a local variable, then you may have to talk your
|
|
compiler into allowing that much stack space; there may be zero
|
|
space overhead, but it's all allocated inside the object.)
|
|
</para>
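<para>
One way to keep track of the <quote>maximum bit used</quote> is
sketched below. The class <code>tracked_flags</code> is a
hypothetical helper written for this section, not something
libstdc++ provides; it simply records the highest position ever set
so that N can be tuned after test runs.
</para>
<programlisting><![CDATA[
```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical helper: a fixed-size bitset that remembers the highest bit
// position ever passed to set().
template <std::size_t N>
class tracked_flags
{
public:
  tracked_flags() : bits_(), highest_(0), used_(false) { }

  // std::bitset::set throws std::out_of_range when pos >= N.
  void set(std::size_t pos)
  {
    bits_.set(pos);
    if (!used_ || pos > highest_)
      highest_ = pos;
    used_ = true;
  }

  bool test(std::size_t pos) const { return bits_.test(pos); }

  // True once at least one bit has been set.
  bool used() const { return used_; }

  // Highest position passed to set(); only meaningful once used() is true.
  std::size_t highest() const
  {
    assert(used_);
    return highest_;
  }

private:
  std::bitset<N> bits_;
  std::size_t highest_;
  bool used_;
};
```
]]></programlisting>
<para>
Because all of the storage lives inside the object, a large instance
is better given static storage than declared as an ordinary local
variable.
</para>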
|
|
<para>
|
|
<emphasis>A container&lt;bool&gt;. </emphasis> The
Committee made provision for the space savings possible with that
(N/8) usage previously mentioned, so that you don't have to do
wasteful things like <code>Container&lt;char&gt;</code> or
<code>Container&lt;short int&gt;</code>. Specifically,
<code>vector&lt;bool&gt;</code> is required to be specialized for
that space savings.
|
|
</para>
|
|
<para>
|
|
The problem is that <code>vector&lt;bool&gt;</code> doesn't
behave like a normal vector anymore. There have been recent
journal articles which discuss the problems (the ones by Herb
Sutter in the May and July/August 1999 issues of C++ Report cover
it well). Future revisions of the ISO C++ Standard will change
the requirement for <code>vector&lt;bool&gt;</code>
specialization. In the meantime, <code>deque&lt;bool&gt;</code>
is recommended: its behavior is sane, although you probably will
not get the space savings, and its allocation scheme differs from
that of vector.
|
|
</para>
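<para>
A minimal sketch of the container-of-bool alternative, with the
number of flags chosen at run time. The function names are
illustrative only.
</para>
<programlisting><![CDATA[
```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Creates n flags, all initially false; n need not be a compile-time constant.
inline std::vector<bool> make_flags(std::size_t n)
{
  return std::vector<bool>(n, false);
}

// Sets the flag at position pos, which must be a valid index.
inline void set_flag(std::vector<bool>& flags, std::size_t pos)
{
  assert(pos < flags.size());
  flags[pos] = true;
}

// Counts the flags that are currently set.
inline std::size_t count_set(const std::vector<bool>& flags)
{
  return static_cast<std::size_t>(
      std::count(flags.begin(), flags.end(), true));
}
```
]]></programlisting>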
|
|
<para>
|
|
<emphasis>Extremely weird solutions. </emphasis> If
|
|
you have access to the compiler and linker at runtime, you can do
|
|
something insane, like figuring out just how many bits you need,
|
|
then writing a temporary source code file. That file contains an
|
|
instantiation of <code>bitset</code> for the required number of
|
|
bits, inside some wrapper functions with unchanging signatures.
|
|
Have your program then call the compiler on that file using
|
|
Position Independent Code, then open the newly-created object
|
|
file and load those wrapper functions. You'll have an
|
|
instantiation of <code>bitset&lt;N&gt;</code> for the exact
|
|
<code>N</code> that you need at the time. Don't forget to delete
|
|
the temporary files. (Yes, this <emphasis>can</emphasis> be, and
|
|
<emphasis>has been</emphasis>, done.)
|
|
</para>
|
|
<!-- I wonder if this next paragraph will get me in trouble... -->
|
|
<para>
|
|
This would be the approach of either a visionary genius or a
|
|
raving lunatic, depending on your programming and management
|
|
style. Probably the latter.
|
|
</para>
|
|
<para>
|
|
Which of the above techniques you use, if any, is up to you and
|
|
your intended application. Some time/space profiling is
|
|
indicated if it really matters (don't just guess). And, if you
|
|
manage to do anything along the lines of the third category, the
|
|
author would love to hear from you...
|
|
</para>
|
|
<para>
|
|
Also note that the implementation of bitset used in libstdc++ has
|
|
<ulink url="../ext/sgiexts.html#ch23">some extensions</ulink>.
|
|
</para>
|
|
|
|
</sect2>
|
|
<sect2 id="associative.bitset.type_string" xreflabel="Type String">
|
|
<title>Type String</title>
|
|
<para>
|
|
Bitsets take neither char* nor const char* arguments in their
constructors. This is something of an accident, but you can read
|
|
about the problem: follow the library's <quote>Links</quote> from
|
|
the homepage, and from the C++ information <quote>defect
|
|
reflector</quote> link, select the library issues list. Issue
|
|
number 116 describes the problem.
|
|
</para>
|
|
<para>
|
|
For now you can simply make a temporary string object using the
|
|
constructor expression:
|
|
</para>
|
|
<programlisting>
std::bitset&lt;5&gt; b ( std::string("10110") );
</programlisting>
|
|
|
|
<para>
|
|
instead of
|
|
</para>
|
|
|
|
<programlisting>
std::bitset&lt;5&gt; b ( "10110" );    // invalid
</programlisting>
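<para>
If the text starts out as a C string, the temporary can be hidden in
a small helper. The following is a sketch only;
<code>five_bits_from</code> is a hypothetical name chosen for this
example.
</para>
<programlisting><![CDATA[
```cpp
#include <bitset>
#include <cassert>
#include <string>

// Hypothetical helper: builds a five-bit set from a C string by way of a
// temporary std::string. The bitset constructor throws std::invalid_argument
// if the text contains any character other than '0' or '1'.
inline std::bitset<5> five_bits_from(const char* text)
{
  assert(text != 0);
  return std::bitset<5>(std::string(text));
}
```
]]></programlisting>
<para>
The leftmost character of the string corresponds to the highest
bit, so <code>five_bits_from("10110")</code> yields a set whose
<code>to_ulong()</code> value is 22.
</para>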
|
|
</sect2>
|
|
</sect1>
|
|
|
|
</chapter>
|
|
|
|
<!-- Chapter 03 : Interacting with C -->
|
|
<chapter id="manual.containers.c" xreflabel="Interacting with C">
|
|
<?dbhtml filename="containers_and_c.html"?>
|
|
<title>Interacting with C</title>
|
|
|
|
<sect1 id="containers.c.vs_array" xreflabel="Containers vs. Arrays">
|
|
<title>Containers vs. Arrays</title>
|
|
<para>
|
|
You're writing some code and can't decide whether to use builtin
|
|
arrays or some kind of container. There are compelling reasons
|
|
to use one of the container classes, but you're afraid that
|
|
you'll eventually run into difficulties, change everything back
|
|
to arrays, and then have to change all the code that uses those
|
|
data types to keep up with the change.
|
|
</para>
|
|
<para>
|
|
If your code makes use of the standard algorithms, this isn't as
|
|
scary as it sounds. The algorithms don't know, nor care, about
|
|
the kind of <quote>container</quote> on which they work, since
|
|
the algorithms are only given endpoints to work with. For the
|
|
container classes, these are iterators (usually
|
|
<code>begin()</code> and <code>end()</code>, but not always).
|
|
For builtin arrays, these are the address of the first element
|
|
and the <ulink
url="../24_iterators/howto.html#2">past-the-end</ulink> element.
|
|
</para>
|
|
<para>
|
|
Some very simple wrapper functions can hide all of that from the
|
|
rest of the code. For example, a pair of functions called
|
|
<code>beginof</code> can be written, one that takes an array,
|
|
another that takes a vector. The first returns a pointer to the
|
|
first element, and the second returns the vector's
|
|
<code>begin()</code> iterator.
|
|
</para>
|
|
<para>
|
|
The functions should be made template functions, and should also
|
|
be declared inline. As pointed out in the comments in the code
|
|
below, this can lead to <code>beginof</code> being optimized out
|
|
of existence, so you pay absolutely nothing in terms of increased
|
|
code size or execution time.
|
|
</para>
|
|
<para>
|
|
The result is that if all your algorithm calls look like
|
|
</para>
|
|
<programlisting>
std::transform(beginof(foo), endof(foo), beginof(foo), SomeFunction);
</programlisting>
|
|
<para>
|
|
then the type of foo can change from an array of ints to a vector
|
|
of ints to a deque of ints and back again, without ever changing
|
|
any client code.
|
|
</para>
|
|
<para>
|
|
This author has a collection of such functions, called
|
|
<quote>*of</quote> because they all extend the builtin
|
|
<quote>sizeof</quote>. It started with some Usenet discussions
|
|
on a transparent way to find the length of an array. A
|
|
simplified and much-reduced version for easier reading is <ulink
url="wrappers_h.txt">given here</ulink>.
|
|
</para>
|
|
<para>
|
|
Astute readers will notice two things at once: first, that the
|
|
container class is still a <code>vector&lt;T&gt;</code> instead
of a more general <code>Container&lt;T&gt;</code>. This would
|
|
mean that three functions for <code>deque</code> would have to be
|
|
added, another three for <code>list</code>, and so on. This is
|
|
due to problems with getting template resolution correct; I find
|
|
it easier just to give the extra three lines and avoid confusion.
|
|
</para>
|
|
<para>
|
|
Second, the line
|
|
</para>
|
|
<programlisting>
inline unsigned int lengthof (T (&amp;)[sz]) { return sz; }
</programlisting>
|
|
<para>
|
|
looks just weird! Hint: unused parameters can be left nameless.
|
|
</para>
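<para>
The wrapper functions described in this section might look roughly
like the following. This is a sketch written for illustration, not
the contents of the file linked above; <code>twice</code> and
<code>double_all</code> are made-up stand-ins for
<code>SomeFunction</code> and for client code.
</para>
<programlisting><![CDATA[
```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Array versions: the element count sz is deduced from the array reference.
template <typename T, std::size_t sz>
inline T* beginof(T (&array)[sz]) { return array; }

template <typename T, std::size_t sz>
inline T* endof(T (&array)[sz]) { return array + sz; }

template <typename T, std::size_t sz>
inline std::size_t lengthof(T (&)[sz]) { return sz; }

// vector versions: forward to the container's own members.
template <typename T>
inline typename std::vector<T>::iterator
beginof(std::vector<T>& v) { return v.begin(); }

template <typename T>
inline typename std::vector<T>::iterator
endof(std::vector<T>& v) { return v.end(); }

template <typename T>
inline std::size_t lengthof(const std::vector<T>& v) { return v.size(); }

// Made-up stand-in for SomeFunction.
inline int twice(int x) { return 2 * x; }

// Client code that never mentions whether foo is an array or a vector.
template <typename Seq>
inline void double_all(Seq& foo)
{
  // The three wrappers should agree with one another.
  assert(static_cast<std::size_t>(endof(foo) - beginof(foo)) == lengthof(foo));
  std::transform(beginof(foo), endof(foo), beginof(foo), twice);
}
```
]]></programlisting>
<para>
With these definitions, <code>double_all</code> compiles unchanged
whether its argument is an array of ints or a
<code>std::vector&lt;int&gt;</code>.
</para>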
|
|
</sect1>
|
|
|
|
</chapter>
|
|
|
|
</part>
|