2006-06-15 01:09:51 +02:00
|
|
|
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
|
|
|
|
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
|
|
|
|
|
|
|
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
|
|
|
|
<head>
|
|
|
|
<meta name="generator" content=
|
|
|
|
"HTML Tidy for Linux/x86 (vers 12 April 2005), see www.w3.org" />
|
|
|
|
|
|
|
|
<title>Motivation</title>
|
|
|
|
<meta http-equiv="Content-Type" content=
|
|
|
|
"text/html; charset=us-ascii" />
|
|
|
|
</head>
|
|
|
|
|
|
|
|
<body>
|
|
|
|
<div id="page">
|
|
|
|
<h1>Motivation</h1>
|
|
|
|
|
|
|
|
<p>Many fine associative-container libraries were already
|
|
|
|
written, most notably, the STL's associative containers. Why
|
|
|
|
then write another library? This section shows some possible
|
|
|
|
advantages of this library, when considering the challenges in
|
|
|
|
<a href="introduction.html">Introduction</a>. Many of these
|
|
|
|
points stem from the fact that the STL introduced
|
|
|
|
associative-containers in a two-step process (first
|
|
|
|
standardizing tree-based containers, only then adding
|
|
|
|
hash-based containers, which are fundamentally different), did
|
|
|
|
not standardize priority queues as containers, and (in our
|
|
|
|
opinion) overloads the iterator concept.</p>
|
|
|
|
|
|
|
|
<h2><a name="assoc" id="assoc">Associative Containers</a></h2>
|
|
|
|
|
|
|
|
<h3><a name="assoc_policies" id="assoc_policies">More
|
|
|
|
Configuration Choices</a></h3>
|
|
|
|
|
|
|
|
<p>Associative containers require a relatively large number of
|
|
|
|
policies to function efficiently in various settings. In some
|
|
|
|
cases this is needed for making their common operations more
|
|
|
|
efficient, and in other cases this allows them to support a
|
|
|
|
larger set of operations</p>
|
|
|
|
|
|
|
|
<ol>
|
|
|
|
<li>Hash-based containers, for example, support look-up and
|
|
|
|
insertion methods (<i>e.g.</i>, <tt>find</tt> and
|
|
|
|
<tt>insert</tt>). In order to locate elements quickly, they
|
|
|
|
are supplied a hash functor, which instruct how to transform
|
|
|
|
a key object into some size type; <i>e.g.</i>, a hash functor
|
|
|
|
might transform <tt>"hello"</tt> into <tt>1123002298</tt>. A
|
|
|
|
hash table, though, requires transforming each key object
|
|
|
|
into some size-type type in some specific domain;
|
|
|
|
<i>e.g.</i>, a hash table with a 128-long table might
|
|
|
|
transform <tt>"hello"</tt> into position 63. The policy by
|
|
|
|
which the hash value is transformed into a position within
|
|
|
|
the table can dramatically affect performance (see <a href=
|
|
|
|
"hash_based_containers.html#hash_policies">Design::Associative
|
|
|
|
Containers::Hash-Based Containers::Hash Policies</a>).
|
|
|
|
Hash-based containers also do not resize naturally (as
|
|
|
|
opposed to tree-based containers, for example). The
|
|
|
|
appropriate resize policy is unfortunately intertwined with
|
|
|
|
the policy that transforms hash value into a position within
|
|
|
|
the table (see <a href=
|
|
|
|
"hash_based_containers.html#resize_policies">Design::Associative
|
|
|
|
Containers::Hash-Based Containers::Resize Policies</a>).
|
|
|
|
|
|
|
|
<p><a href=
|
|
|
|
"assoc_performance_tests.html#hash_based">Associative-Container
|
|
|
|
Performance Tests::Hash-Based Containers</a> quantifies
|
|
|
|
some of these points.</p>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>Tree-based containers, for example, also support look-up
|
|
|
|
and insertion methods, and are primarily useful when
|
|
|
|
maintaining order between elements is important. In some
|
|
|
|
cases, though, one can utilize their balancing algorithms for
|
|
|
|
completely different purposes.
|
|
|
|
|
|
|
|
<p>Figure <a href="#node_invariants">Metadata for
|
|
|
|
order-statistics and interval intersections</a>-A, for
|
|
|
|
example, shows a tree whose each node contains two entries:
|
|
|
|
a floating-point key, and some size-type <i>metadata</i>
|
|
|
|
(in bold beneath it) that is the number of nodes in the
|
|
|
|
sub-tree. (<i>E.g.</i>, the root has key 0.99, and has 5
|
|
|
|
nodes (including itself) in its sub-tree.) A container based
|
|
|
|
on this data structure can obviously answer efficiently
|
|
|
|
whether 0.3 is in the container object, but it can also
|
|
|
|
answer what is the order of 0.3 among all those in the
|
|
|
|
container object [<a href=
|
|
|
|
"references.html#clrs2001">clrs2001</a>] (see <a href=
|
|
|
|
"assoc_examples.html#tree_like_based">Associative Container
|
|
|
|
Examples::Tree-Like-Based Containers (Trees and
|
|
|
|
Tries)</a>).</p>
|
|
|
|
|
|
|
|
<p>As another example, Figure <a href=
|
|
|
|
"#node_invariants">Metadata for order-statistics and
|
|
|
|
interval intersections</a>-B shows a tree whose each node
|
|
|
|
contains two entries: a half-open geometric line interval,
|
|
|
|
and a number <i>metadata</i> (in bold beneath it) that is
|
|
|
|
the largest endpoint of all intervals in its sub-tree.
|
|
|
|
(<i>E.g.</i>, the root describes the interval <i>[20,
|
|
|
|
36)</i>, and the largest endpoint in its sub-tree is 99.) A
|
|
|
|
container based on this data structure can obviously answer
|
|
|
|
efficiently whether <i>[3, 41)</i> is in the container
|
|
|
|
object, but it can also answer efficiently whether the
|
|
|
|
container object has intervals that intersect <i>[3,
|
|
|
|
41)</i> (see <a href=
|
|
|
|
"assoc_examples.html#tree_like_based">Associative Container
|
|
|
|
Examples::Tree-Like-Based Containers (Trees and
|
|
|
|
Tries)</a>). These types of queries are very useful in
|
|
|
|
geometric algorithms and lease-management algorithms.</p>
|
|
|
|
|
|
|
|
<p>It is important to note, however, that as the trees are
|
|
|
|
modified, their internal structure changes. To maintain
|
|
|
|
these invariants, one must supply some policy that is aware
|
|
|
|
of these changes (see <a href=
|
|
|
|
"tree_based_containers.html#invariants">Design::Associative
|
|
|
|
Containers::Tree-Based Containers::Node Invariants</a>);
|
|
|
|
without this, it would be better to use a linked list (in
|
|
|
|
itself very efficient for these purposes).</p>
|
|
|
|
|
|
|
|
<p><a href=
|
|
|
|
"assoc_performance_tests.html#tree_like_based">Associative-Container
|
|
|
|
Performance Tests::Tree-Like-Based Containers</a>
|
|
|
|
quantifies some of these points.</p>
|
|
|
|
</li>
|
|
|
|
</ol>
|
|
|
|
|
|
|
|
<h6 class="c1"><a name="node_invariants" id=
|
|
|
|
"node_invariants"><img src="node_invariants.png" alt=
|
|
|
|
"no image" /></a></h6>
|
|
|
|
|
|
|
|
<h6 class="c1">Metadata for order-statistics and interval
|
|
|
|
intersections.</h6>
|
|
|
|
|
|
|
|
<h3><a name="assoc_ds_genericity" id="assoc_ds_genericity">More
|
|
|
|
Data Structures and Traits</a></h3>
|
|
|
|
|
|
|
|
<p>The STL contains associative containers based on red-black
|
|
|
|
trees and collision-chaining hash tables. These are obviously
|
|
|
|
very useful, but they are not ideal for all types of
|
|
|
|
settings.</p>
|
|
|
|
|
|
|
|
<p>Figure <a href=
|
|
|
|
"#different_underlying_data_structures">Different underlying
|
|
|
|
data structures</a> shows different underlying data structures
|
|
|
|
(the ones currently supported in <tt>pb_ds</tt>). A shows a
|
|
|
|
collision-chaining hash-table, B shows a probing hash-table, C
|
|
|
|
shows a red-black tree, D shows a splay tree, E shows a tree
|
|
|
|
based on an ordered vector(implicit in the order of the
|
|
|
|
elements), F shows a PATRICIA trie, and G shows a list-based
|
|
|
|
container with update policies.</p>
|
|
|
|
|
|
|
|
<p>Each of these data structures has some performance benefits,
|
|
|
|
in terms of speed, size or both (see <a href=
|
|
|
|
"assoc_performance_tests.html">Associative-Container
|
|
|
|
Performance Tests</a> and <a href=
|
|
|
|
"assoc_performance_tests.html#dss_family_choice">Associative-Container
|
|
|
|
Performance Tests::Observations::Underlying Data-Structure
|
|
|
|
Families</a>). For now, though, note that <i>e.g.</i>,
|
|
|
|
vector-based trees and probing hash tables manipulate memory
|
|
|
|
more efficiently than red-black trees and collision-chaining
|
|
|
|
hash tables, and that list-based associative containers are
|
|
|
|
very useful for constructing "multimaps" (see <a href=
|
|
|
|
"#assoc_mapping_semantics">Alternative to Multiple Equivalent
|
|
|
|
Keys</a>, <a href=
|
|
|
|
"assoc_performance_tests.html#multimaps">Associative Container
|
|
|
|
Performance Tests::Multimaps</a>, and <a href=
|
|
|
|
"assoc_performance_tests.html#msc">Associative Container
|
|
|
|
Performance Tests::Observations::Mapping-Semantics
|
|
|
|
Considerations</a>).</p>
|
|
|
|
|
|
|
|
<h6 class="c1"><a name="different_underlying_data_structures"
|
|
|
|
id="different_underlying_data_structures"><img src=
|
|
|
|
"different_underlying_dss.png" alt="no image" /></a></h6>
|
|
|
|
|
|
|
|
<h6 class="c1">Different underlying data structures.</h6>
|
|
|
|
|
|
|
|
<p>Now consider a function manipulating a generic associative
|
|
|
|
container, <i>e.g.</i>,</p>
|
|
|
|
<pre>
|
|
|
|
<b>template</b><
|
|
|
|
<b>class</b> Cntnr>
|
|
|
|
<b>int</b>
|
|
|
|
some_op_sequence
|
|
|
|
(Cntnr &r_cnt)
|
|
|
|
{
|
|
|
|
...
|
|
|
|
}
|
|
|
|
</pre>
|
|
|
|
|
|
|
|
<p>Ideally, the underlying data structure of <tt>Cntnr</tt>
|
|
|
|
would not affect what can be done with <tt>r_cnt</tt>.
|
|
|
|
Unfortunately, this is not the case.</p>
|
|
|
|
|
|
|
|
<p>For example, if <tt>Cntnr</tt> is <tt>std::map</tt>, then
|
|
|
|
the function can use <tt>std::for_each(r_cnt.find(foo),
|
|
|
|
r_cnt.find(bar), foobar)</tt> in order to apply <tt>foobar</tt>
|
|
|
|
to all elements between <tt>foo</tt> and <tt>bar</tt>. If
|
|
|
|
<tt>Cntnr</tt> is a hash-based container, then this call's
|
|
|
|
results are undefined.</p>
|
|
|
|
|
|
|
|
<p>Also, if <tt>Cntnr</tt> is tree-based, the type and object
|
|
|
|
of the comparison functor can be accessed. If <tt>Cntnr</tt> is
|
|
|
|
hash based, these queries are nonsensical.</p>
|
|
|
|
|
|
|
|
<p>There are various other differences based on the container's
|
|
|
|
underlying data structure. For one, they can be constructed by,
|
|
|
|
and queried for, different policies. Furthermore:</p>
|
|
|
|
|
|
|
|
<ol>
|
|
|
|
<li>Containers based on C, D, E and F store elements in a
|
|
|
|
meaningful order; the others store elements in a meaningless
|
|
|
|
(and probably time-varying) order. By implication, only
|
|
|
|
containers based on C, D, E and F can support erase
|
|
|
|
operations taking an iterator and returning an iterator to
|
|
|
|
the following element without performance loss (see <a href=
|
|
|
|
"#assoc_ers_methods">Slightly Different Methods::Methods
|
|
|
|
Related to Erase</a>).</li>
|
|
|
|
|
|
|
|
<li>Containers based on C, D, E, and F can be split and
|
|
|
|
joined efficiently, while the others cannot. Containers based
|
|
|
|
on C and D, furthermore, can guarantee that this is
|
|
|
|
exception-free; containers based on E cannot guarantee
|
|
|
|
this.</li>
|
|
|
|
|
|
|
|
<li>Containers based on all but E can guarantee that erasing
|
|
|
|
an element is exception free; containers based on E cannot
|
|
|
|
guarantee this. Containers based on all but B and E can
|
|
|
|
guarantee that modifying an object of their type does not
|
|
|
|
invalidate iterators or references to their elements, while
|
|
|
|
containers based on B and E cannot. Containers based on C, D,
|
|
|
|
and E can furthermore make a stronger guarantee, namely that
|
|
|
|
modifying an object of their type does not affect the order
|
|
|
|
of iterators.</li>
|
|
|
|
</ol>
|
|
|
|
|
|
|
|
<p>A unified tag and traits system (as used for the STL's
|
|
|
|
iterators, for example) can ease generic manipulation of
|
|
|
|
associative containers based on different underlying
|
|
|
|
data structures (see <a href=
|
|
|
|
"tutorial.html#assoc_ds_gen">Tutorial::Associative
|
|
|
|
Containers::Determining Containers' Attributes</a> and <a href=
|
|
|
|
"ds_gen.html#container_traits">Design::Associative
|
|
|
|
Containers::Data-Structure Genericity::Data-Structure Tags and
|
|
|
|
Traits</a>).</p>
|
|
|
|
|
|
|
|
<h3><a name="assoc_diff_it" id="assoc_diff_it">Differentiating
|
|
|
|
between Iterator Types</a></h3>
|
|
|
|
|
|
|
|
<p>Iterators are centric to the STL's design, because of the
|
|
|
|
container/algorithm/iterator decomposition that allows an
|
|
|
|
algorithm to operate on a range through iterators of some
|
|
|
|
sequence (<i>e.g.</i>, one originating from a container).
|
|
|
|
Iterators, then, are useful because they allow going over a
|
|
|
|
<u>sequence</u>. The STL also uses iterators for accessing a
|
|
|
|
<u>specific</u> element - <i>e.g.</i>, when an associative
|
|
|
|
container returns one through <tt>find</tt>. The STL, however,
|
|
|
|
consistently uses the same types of iterators for both
|
|
|
|
purposes: going over a range, and accessing a specific found
|
|
|
|
element. Before the introduction of hash-based containers to
|
|
|
|
the STL, this made sense (with the exception of priority
|
|
|
|
queues, which are discussed in <a href="#pq">Priority
|
|
|
|
Queues</a>).</p>
|
|
|
|
|
|
|
|
<p>Using the STL's associative containers together with
|
|
|
|
non-order-preserving associative containers (and also because
|
|
|
|
of priority-queues container), there is a possible need for
|
|
|
|
different types of iterators for self-organizing containers -
|
|
|
|
the iterator concept seems overloaded to mean two different
|
|
|
|
things (in some cases). The following subsections explain this;
|
|
|
|
<a href="tutorial.html#assoc_find_range">Tutorial::Associative
|
|
|
|
Containers::Point-Type and Range-Type Methods</a> explains an
|
|
|
|
alternative design which does not complicate the use of
|
|
|
|
order-preserving containers, but is better for unordered
|
|
|
|
containers; <a href=
|
|
|
|
"ds_gen.html#find_range">Design::Associative
|
|
|
|
Containers::Data-Structure Genericity::Point-Type and
|
|
|
|
Range-Type Methods</a> explains the design further.</p>
|
|
|
|
|
|
|
|
<h4><a name="assoc_find_it_range_it" id=
|
|
|
|
"assoc_find_it_range_it">Using Point-Type Iterators for
|
|
|
|
Range-Type Operations</a></h4>
|
|
|
|
|
|
|
|
<p>Suppose <tt>cntnr</tt> is some associative container, and
|
|
|
|
say <tt>c</tt> is an object of type <tt>cntnr</tt>. Then what
|
|
|
|
will be the outcome of</p>
|
|
|
|
<pre>
|
|
|
|
std::for_each(c.find(1), c.find(5), foo);
|
|
|
|
</pre>
|
|
|
|
|
|
|
|
<p>If <tt>cntnr</tt> is a tree-based container object, then an
|
|
|
|
in-order walk will apply <tt>foo</tt> to the relevant elements,
|
|
|
|
<i>e.g.</i>, as in Figure <a href="#range_it_in_hts">Range
|
|
|
|
iteration in different data structures</a> -A. If <tt>c</tt> is
|
|
|
|
a hash-based container, then the order of elements between any
|
|
|
|
two elements is undefined (and probably time-varying); there is
|
|
|
|
no guarantee that the elements traversed will coincide with the
|
|
|
|
<i>logical</i> elements between 1 and 5, <i>e.g.</i>, as in
|
|
|
|
Figure <a href="#range_it_in_hts">Range iteration in different
|
|
|
|
data structures</a>-B.</p>
|
|
|
|
|
|
|
|
<h6 class="c1"><a name="range_it_in_hts" id=
|
|
|
|
"range_it_in_hts"><img src="point_iterators_range_ops_1.png"
|
|
|
|
alt="no image" /></a></h6>
|
|
|
|
|
|
|
|
<h6 class="c1">Range iteration in different
|
|
|
|
data structures.</h6>
|
|
|
|
|
|
|
|
<p>In our opinion, this problem is not caused just because
|
|
|
|
red-black trees are order preserving while collision-chaining
|
|
|
|
hash tables are (generally) not - it is more fundamental. Most
|
|
|
|
of the STL's containers order sequences in a well-defined
|
|
|
|
manner that is determined by their <u>interface</u>: calling
|
|
|
|
<tt>insert</tt> on a tree-based container modifies its sequence
|
|
|
|
in a predictable way, as does calling <tt>push_back</tt> on a
|
|
|
|
list or a vector. Conversely, collision-chaining hash tables,
|
|
|
|
probing hash tables, priority queues, and list-based containers
|
|
|
|
(which are very useful for "multimaps") are self-organizing
|
|
|
|
data structures; the effect of each operation modifies their
|
|
|
|
sequences in a manner that is (practically) determined by their
|
|
|
|
<u>implementation</u>.</p>
|
|
|
|
|
|
|
|
<p>Consequently, applying an algorithm to a sequence obtained
|
|
|
|
from most containers <u>may or may not</u> make sense, but
|
|
|
|
applying it to a sub-sequence of a self-organizing container
|
|
|
|
<u>does not</u>.</p>
|
|
|
|
|
|
|
|
<h4><a name="assoc_range_it_for_find_it" id=
|
|
|
|
"assoc_range_it_for_find_it">The Cost of Enabling Range
|
|
|
|
Capabilities to Point-Type Iterators</a></h4>
|
|
|
|
|
|
|
|
<p>Suppose <tt>c</tt> is some collision-chaining hash-based
|
|
|
|
container object, and one calls <tt>c.find(3)</tt>. Then what
|
|
|
|
composes the returned iterator?</p>
|
|
|
|
|
|
|
|
<p>Figure <a href="#find_its_in_hash_tables">Point-type
|
|
|
|
iterators in hash tables</a>-A shows the simplest (and most
|
|
|
|
efficient) implementation of a collision-chaining hash table.
|
|
|
|
The little box marked <tt>point_iterator</tt> shows an object
|
|
|
|
that contains a pointer to the element's node. Note that this
|
|
|
|
"iterator" has no way to move to the next element (<i>i.e.</i>,
|
|
|
|
it cannot support <tt><b>operator</b>++</tt>). Conversely, the
|
|
|
|
little box marked <tt>iterator</tt> stores both a pointer to
|
|
|
|
the element, as well as some other information (<i>e.g.</i>,
|
|
|
|
the bucket number of the element). the second iterator, then,
|
|
|
|
is "heavier" than the first one- it requires more time and
|
|
|
|
space. If we were to use a different container to
|
|
|
|
cross-reference into this hash-table using these iterators - it
|
|
|
|
would take much more space. As noted in <a href=
|
|
|
|
"#assoc_find_it_range_it">Using Point-Type Iterators for
|
|
|
|
Range-Type Operations</a>, nothing much can be done by
|
|
|
|
incrementing these iterators, so why is this extra information
|
|
|
|
needed?</p>
|
|
|
|
|
|
|
|
<p>Alternatively, one might create a collision-chaining
|
|
|
|
hash-table where the lists might be linked, forming a
|
|
|
|
monolithic total-element list, as in Figure <a href=
|
|
|
|
"#find_its_in_hash_tables">Point-type iterators in hash
|
|
|
|
tables</a>-B (this seems similar to the Dinkumware design
|
|
|
|
[<a href="references.html#dinkumware_stl">dinkumware_stl</a>]).
|
|
|
|
Here the iterators are as light as can be, but the hash-table's
|
|
|
|
operations are more complicated.</p>
|
|
|
|
|
|
|
|
<h6 class="c1"><a name="find_its_in_hash_tables" id=
|
|
|
|
"find_its_in_hash_tables"><img src=
|
|
|
|
"point_iterators_range_ops_2.png" alt="no image" /></a></h6>
|
|
|
|
|
|
|
|
<h6 class="c1">Point-type iterators in hash tables.</h6>
|
|
|
|
|
|
|
|
<p>It should be noted that containers based on
|
|
|
|
collision-chaining hash-tables are not the only ones with this
|
|
|
|
type of behavior; many other self-organizing data structures
|
|
|
|
display it as well.</p>
|
|
|
|
|
|
|
|
<h4><a name="assoc_inv_guar" id="assoc_inv_guar">Invalidation
|
|
|
|
Guarantees</a></h4>
|
|
|
|
|
|
|
|
<p>Consider the following snippet:</p>
|
|
|
|
<pre>
|
|
|
|
it = c.find(3);
|
|
|
|
|
|
|
|
c.erase(5);
|
|
|
|
</pre>
|
|
|
|
|
|
|
|
<p>Following the call to <tt>erase</tt>, what is the validity
|
|
|
|
of <tt>it</tt>: can it be de-referenced? can it be
|
|
|
|
incremented?</p>
|
|
|
|
|
|
|
|
<p>The answer depends on the underlying data structure of the
|
|
|
|
container. Figure <a href=
|
|
|
|
"#invalidation_guarantee_erase">Effect of erase in different
|
|
|
|
underlying data structures</a> shows three cases: A1 and A2
|
|
|
|
show a red-black tree; B1 and B2 show a probing hash-table; C1
|
|
|
|
and C2 show a collision-chaining hash table.</p>
|
|
|
|
|
|
|
|
<h6 class="c1"><a name="invalidation_guarantee_erase" id=
|
|
|
|
"invalidation_guarantee_erase"><img src=
|
|
|
|
"invalidation_guarantee_erase.png" alt="no image" /></a></h6>
|
|
|
|
|
|
|
|
<h6 class="c1">Effect of erase in different underlying
|
|
|
|
data structures.</h6>
|
|
|
|
|
|
|
|
<ol>
|
|
|
|
<li>Erasing 5 from A1 yields A2. Clearly, an iterator to 3
|
|
|
|
can be de-referenced and incremented. The sequence of
|
|
|
|
iterators changed, but in a way that is well-defined by the
|
|
|
|
<u>interface</u>.</li>
|
|
|
|
|
|
|
|
<li>Erasing 5 from B1 yields B2. Clearly, an iterator to 3 is
|
|
|
|
not valid at all - it cannot be de-referenced or incremented;
|
|
|
|
the order of iterators changed in a way that is (practically)
|
|
|
|
determined by the <u>implementation</u> and not by the
|
|
|
|
<u>interface</u>.</li>
|
|
|
|
|
|
|
|
<li>Erasing 5 from C1 yields C2. Here the situation is more
|
|
|
|
complicated. On the one hand, there is no problem in
|
|
|
|
de-referencing <tt>it</tt>. On the other hand, the order of
|
|
|
|
iterators changed in a way that is (practically) determined
|
|
|
|
by the <u>implementation</u> and not by the
|
|
|
|
<u>interface</u>.</li>
|
|
|
|
</ol>
|
|
|
|
|
|
|
|
<p>So in classic STL, it is not always possible to express
|
|
|
|
whether <tt>it</tt> is valid or not. This is true also for
|
|
|
|
<tt>insert</tt>, <i>e.g.</i>. Again, the iterator concept seems
|
|
|
|
overloaded.</p>
|
|
|
|
|
|
|
|
<h3><a name="assoc_methods" id="assoc_methods">Slightly
|
|
|
|
Different Methods</a></h3>
|
|
|
|
|
|
|
|
<p>[<a href="references.html#meyers02both">meyers02both</a>]
|
|
|
|
points out that a class's methods should comprise only
|
|
|
|
operations which depend on the class's internal structure;
|
|
|
|
other operations are best designed as external functions.
|
|
|
|
Possibly, therefore, the STL's associative containers lack some
|
|
|
|
useful methods, and provide some other methods which would be
|
|
|
|
better left out (<i>e.g.</i>, [<a href=
|
|
|
|
"references.html#sgi_stl">sgi_stl</a>] ).</p>
|
|
|
|
|
|
|
|
<h4><a name="assoc_ers_methods" id="assoc_ers_methods">Methods
|
|
|
|
Related to Erase</a></h4>
|
|
|
|
|
|
|
|
<ol>
|
|
|
|
<li>Order-preserving STL associative containers provide the
|
|
|
|
method
|
|
|
|
<pre>
|
|
|
|
iterator
|
|
|
|
erase
|
|
|
|
(iterator it)
|
|
|
|
</pre>which takes an iterator, erases the corresponding element,
|
|
|
|
and returns an iterator to the following element. Also hash-based
|
|
|
|
STL associative containers provide this method. This <u>seemingly
|
|
|
|
increases</u> genericity between associative containers, since, <i>
|
|
|
|
e.g.</i>, it is possible to use
|
|
|
|
<pre>
|
|
|
|
<b>typename</b> C::iterator it = c.begin();
|
|
|
|
<b>typename</b> C::iterator e_it = c.end();
|
|
|
|
|
|
|
|
<b>while</b>(it != e_it)
|
|
|
|
it = pred(*it)? c.erase(it) : ++it;
|
|
|
|
</pre>in order to erase from a container object <tt>
|
|
|
|
c</tt> all element which match a predicate <tt>pred</tt>.
|
|
|
|
However, in a different sense this actually
|
|
|
|
<u>decreases</u> genericity: an integral implication of
|
|
|
|
this method is that tree-based associative containers'
|
|
|
|
memory use is linear in the total number of elements they
|
|
|
|
store, while hash-based containers' memory use is unbounded
|
|
|
|
in the total number of elements they store. Assume a
|
|
|
|
hash-based container is allowed to decrease its size when
|
|
|
|
an element is erased. Then the elements might be rehashed,
|
|
|
|
which means that there is no "next" element - it is simply
|
|
|
|
undefined. Consequently, it is possible to infer from the
|
|
|
|
fact that STL hash-based containers provide this method
|
|
|
|
that they cannot downsize when elements are erased
|
|
|
|
(<a href="assoc_performance_tests.html#hash_based">Performance
|
|
|
|
Tests::Hash-Based Container Tests</a> quantifies this.) As
|
|
|
|
a consequence, different code is needed to manipulate
|
|
|
|
different containers, assuming that memory should be
|
|
|
|
conserved. <tt>pb_ds</tt>'s non-order preserving
|
|
|
|
associative containers omit this method.
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>All of <tt>pb_ds</tt>'s associative containers include a
|
|
|
|
conditional-erase method
|
|
|
|
<pre>
|
|
|
|
<b>template</b><
|
|
|
|
<b>class</b> Pred>
|
|
|
|
size_type
|
|
|
|
erase_if
|
|
|
|
(Pred pred)
|
|
|
|
</pre>which erases all elements matching a predicate. This is
|
|
|
|
probably the only way to ensure linear-time multiple-item erase
|
|
|
|
which can actually downsize a container.
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>STL associative containers provide methods for
|
|
|
|
multiple-item erase of the form
|
|
|
|
<pre>
|
|
|
|
size_type
|
|
|
|
erase
|
|
|
|
(It b,
|
|
|
|
It e)
|
|
|
|
</pre>erasing a range of elements given by a pair of iterators. For
|
|
|
|
tree-based or trie-based containers, this can implemented more
|
|
|
|
efficiently as a (small) sequence of split and join operations. For
|
|
|
|
other, unordered, containers, this method isn't much better than an
|
|
|
|
external loop. Moreover, if <tt>c</tt> is a hash-based container,
|
|
|
|
then, <i>e.g.</i>, <tt>c.erase(c.find(2), c.find(5))</tt> is almost
|
|
|
|
certain to do something different than erasing all elements whose
|
|
|
|
keys are between 2 and 5, and is likely to produce other undefined
|
|
|
|
behavior.
|
|
|
|
</li>
|
|
|
|
</ol>
|
|
|
|
|
|
|
|
<h4><a name="assoc_split_join_methods" id=
|
|
|
|
"assoc_split_join_methods">Methods Related to Split and
|
|
|
|
Join</a></h4>
|
|
|
|
|
|
|
|
<p>It is well-known that tree-based and trie-based container
|
|
|
|
objects can be efficiently split or joined [<a href=
|
|
|
|
"references.html#clrs2001">clrs2001</a>]. Externally splitting
|
|
|
|
or joining trees is super-linear, and, furthermore, can throw
|
|
|
|
exceptions. Split and join methods, consequently, seem good
|
|
|
|
choices for tree-based container methods, especially, since as
|
|
|
|
noted just before, they are efficient replacements for erasing
|
|
|
|
sub-sequences. <a href=
|
|
|
|
"assoc_performance_tests.html#tree_like_based">Performance
|
|
|
|
Tests::Tree-Like-Based Container Tests</a> quantifies this.</p>
|
|
|
|
|
|
|
|
<h4><a name="assoc_insert_methods" id=
|
|
|
|
"assoc_insert_methods">Methods Related to Insert</a></h4>
|
|
|
|
|
|
|
|
<p>STL associative containers provide methods of the form</p>
|
|
|
|
<pre>
|
|
|
|
<b>template</b><
|
|
|
|
<b>class</b> It>
|
|
|
|
size_type
|
|
|
|
insert
|
|
|
|
(It b,
|
|
|
|
It e);
|
|
|
|
</pre>for inserting a range of elements given by a pair of
|
|
|
|
iterators. At best, this can be implemented as an external loop,
|
|
|
|
or, even more efficiently, as a join operation (for the case of
|
|
|
|
tree-based or trie-based containers). Moreover, these methods seem
|
|
|
|
similar to constructors taking a range given by a pair of
|
|
|
|
iterators; the constructors, however, are transactional, whereas
|
|
|
|
the insert methods are not; this is possibly confusing.
|
|
|
|
|
|
|
|
<h4><a name="assoc_equiv_comp_methods" id=
|
|
|
|
"assoc_equiv_comp_methods">Functions Related to
|
|
|
|
Comparison</a></h4>
|
|
|
|
|
|
|
|
<p>Associative containers are parametrized by policies
|
|
|
|
allowing to test key equivalence; <i>e.g.</i> a hash-based
|
|
|
|
container can do this through its equivalence functor, and a
|
|
|
|
tree-based container can do this through its comparison
|
|
|
|
functor. In addition, some STL associative containers have
|
|
|
|
global function operators, <i>e.g.</i>,
|
|
|
|
<tt><b>operator</b>==</tt> and <tt><b>operator</b><=</tt>,
|
|
|
|
that allow comparing entire associative containers.</p>
|
|
|
|
|
|
|
|
<p>In our opinion, these functions are better left out. To
|
|
|
|
begin with, they do not significantly improve over an external
|
|
|
|
loop. More importantly, however, they are possibly misleading -
|
|
|
|
<tt><b>operator</b>==</tt>, for example, usually checks for
|
|
|
|
equivalence, or interchangeability, but the associative
|
|
|
|
container cannot check for values' equivalence, only keys'
|
|
|
|
equivalence; also, are two containers considered equivalent if
|
|
|
|
they store the same values in different order? this is an
|
|
|
|
arbitrary decision.</p>
|
|
|
|
|
|
|
|
<h3><a name="assoc_mapping_semantics" id=
|
|
|
|
"assoc_mapping_semantics">Alternative to Multiple Equivalent
|
|
|
|
Keys</a></h3>
|
|
|
|
|
|
|
|
<p>Maps (or sets) allow mapping (or storing) unique-key values.
|
|
|
|
The STL, however, also supplies associative containers which
|
|
|
|
map (or store) multiple values with equivalent keys:
|
|
|
|
<tt>std::multimap</tt>, <tt>std::multiset</tt>,
|
|
|
|
<tt>std::tr1::unordered_multimap</tt>, and
|
|
|
|
<tt>unordered_multiset</tt>. We first discuss how these might
|
|
|
|
be used, then why we think it is best to avoid them.</p>
|
|
|
|
|
|
|
|
<p>Suppose one builds a simple bank-account application that
|
|
|
|
records for each client (identified by an <tt>std::string</tt>)
|
|
|
|
and account-id (marked by an <tt><b>unsigned long</b></tt>) -
|
|
|
|
the balance in the account (described by a
|
|
|
|
<tt><b>float</b></tt>). Suppose further that ordering this
|
|
|
|
information is not useful, so a hash-based container is
|
|
|
|
preferable to a tree based container. Then one can use</p>
|
|
|
|
<pre>
|
|
|
|
std::tr1::unordered_map<std::pair<std::string, <b>unsigned long</b>>, <b>float</b>, ...>
|
|
|
|
</pre>which <u>hashes every combination of client and
|
|
|
|
account-id</u>. This might work well, except for the fact that it
|
|
|
|
is now impossible to efficiently list all of the accounts of a
|
|
|
|
specific client (this would practically require iterating over all
|
|
|
|
entries). Instead, one can use
|
|
|
|
<pre>
|
|
|
|
std::tr1::unordered_multimap<std::pair<std::string, <tt><b>unsigned long</b></tt>>, <b>float</b>, ...>
|
|
|
|
</pre>which <u>hashes every client</u>, and <u>decides equivalence
|
|
|
|
based on client</u> only. This will ensure that all accounts
|
|
|
|
belonging to a specific user are stored consecutively.
|
|
|
|
|
|
|
|
<p>Also, suppose one wants an integers' priority queue
|
|
|
|
(<i>i.e.,</i> a container that supports <tt>push</tt>,
|
|
|
|
<tt>pop</tt>, and <tt>top</tt> operations, the last of which
|
|
|
|
returns the largest <tt><b>int</b></tt>) that also supports
|
|
|
|
operations such as <tt>find</tt> and <tt>lower_bound</tt>. A
|
|
|
|
reasonable solution is to build an adapter over
|
|
|
|
<tt>std::set<<b>int</b>></tt>. In this adapter,
|
|
|
|
<i>e.g.</i>, <tt>push</tt> will just call the tree-based
|
|
|
|
associative container's <tt>insert</tt> method; <tt>pop</tt>
|
|
|
|
will call its <tt>end</tt> method, and use it to return the
|
|
|
|
preceding element (which must be the largest). Then this might
|
|
|
|
work well, except that the container object cannot hold
|
|
|
|
multiple instances of the same integer (<tt>push(4)</tt>,
|
|
|
|
<i>e.g.</i>, will be a no-op if <tt>4</tt> is already in the
|
|
|
|
container object). If multiple keys are necessary, then one
|
|
|
|
might build the adapter over an
|
|
|
|
<tt>std::multiset<<b>int</b>></tt>.</p>
|
|
|
|
|
|
|
|
<p class="c1">STL non-unique-mapping containers, then, are
|
|
|
|
useful when (1) a key can be decomposed in to a primary key and
|
|
|
|
a secondary key, (2) a key is needed multiple times, or (3) any
|
|
|
|
combination of (1) and (2).</p>
|
|
|
|
|
|
|
|
<p>Figure <a href="#embedded_lists_1">Non-unique mapping
|
|
|
|
containers in the STL's design</a> shows how the STL's design
|
|
|
|
works internally; in this figure nodes shaded equally represent
|
|
|
|
equivalent-key values. Equivalent keys are stored consecutively
|
|
|
|
using the properties of the underlying data structure: binary
|
|
|
|
search trees (Figure <a href="#embedded_lists_1">Non-unique
|
|
|
|
mapping containers in the STL's design</a>-A) store
|
|
|
|
equivalent-key values consecutively (in the sense of an
|
|
|
|
in-order walk) naturally; collision-chaining hash tables
|
|
|
|
(Figure <a href="#embedded_lists_1">Non-unique mapping
|
|
|
|
containers in the STL's design</a>-B) store equivalent-key
|
|
|
|
values in the same bucket, the bucket can be arranged so that
|
|
|
|
equivalent-key values are consecutive.</p>
|
|
|
|
|
|
|
|
<h6 class="c1"><a name="embedded_lists_1" id=
|
|
|
|
"embedded_lists_1"><img src="embedded_lists_1.png" alt=
|
|
|
|
"no image" /></a></h6>
|
|
|
|
|
|
|
|
<h6 class="c1">Non-unique mapping containers in the STL's
|
|
|
|
design.</h6>
|
|
|
|
|
|
|
|
<p>Put differently, STL non-unique mapping
|
|
|
|
associative-containers are associative containers that map
|
|
|
|
primary keys to linked lists that are embedded into the
|
|
|
|
container. Figure <a href="#embedded_lists_2">Effect of
|
|
|
|
embedded lists in STL multimaps</a> shows again the two
|
|
|
|
containers from Figure <a href="#embedded_lists_1">Non-unique
|
|
|
|
mapping containers in the STL's design</a>, this time with the
|
|
|
|
embedded linked lists of the grayed nodes marked
|
|
|
|
explicitly.</p>
|
|
|
|
|
|
|
|
<h6 class="c1"><a name="embedded_lists_2" id=
|
|
|
|
"embedded_lists_2"><img src="embedded_lists_2.png" alt=
|
|
|
|
"no image" /></a></h6>
|
|
|
|
|
|
|
|
<h6 class="c1">Effect of embedded lists in STL multimaps.</h6>
|
|
|
|
|
|
|
|
<p>These embedded linked lists have several disadvantages.</p>
|
|
|
|
|
|
|
|
<ol>
|
|
|
|
<li>The underlying data structure embeds the linked lists
|
|
|
|
according to its own consideration, which means that the
|
|
|
|
search path for a value might include several different
|
|
|
|
equivalent-key values. For example, the search path for the
|
|
|
|
the black node in either of Figures <a href=
|
|
|
|
"#embedded_lists_1">Non-unique mapping containers in the
|
|
|
|
STL's design</a> A or B, includes more than a single gray
|
|
|
|
node.</li>
|
|
|
|
|
|
|
|
<li>The links of the linked lists are the underlying
|
|
|
|
data structures' nodes, which typically are quite structured.
|
|
|
|
<i>E.g.</i>, in the case of tree-based containers (Figure
|
|
|
|
<a href="#embedded_lists_2">Effect of embedded lists in STL
|
|
|
|
multimaps</a>-B), each "link" is actually a node with three
|
|
|
|
pointers (one to a parent and two to children), and a
|
|
|
|
relatively-complicated iteration algorithm. The linked lists,
|
|
|
|
therefore, can take up quite a lot of memory, and iterating
|
|
|
|
over all values equal to a given key (<i>e.g.</i>, through
|
|
|
|
the return value of the STL's <tt>equal_range</tt>) can be
|
|
|
|
expensive.</li>
|
|
|
|
|
|
|
|
<li>The primary key is stored multiply; this uses more
|
|
|
|
memory.</li>
|
|
|
|
|
|
|
|
<li>Finally, the interface of this design excludes several
|
|
|
|
useful underlying data structures. <i>E.g.</i>, of all the
|
|
|
|
unordered self-organizing data structures, practically only
|
|
|
|
collision-chaining hash tables can (efficiently) guarantee
|
|
|
|
that equivalent-key values are stored consecutively.</li>
|
|
|
|
</ol>
|
|
|
|
|
|
|
|
<p>The above reasons hold even when the ratio of secondary keys
|
|
|
|
to primary keys (or average number of identical keys) is small,
|
|
|
|
but when it is large, there are more severe problems:</p>
|
|
|
|
|
|
|
|
<ol>
|
|
|
|
<li>The underlying data structures order the links inside
|
|
|
|
each embedded linked-lists according to their internal
|
|
|
|
considerations, which effectively means that each of the
|
|
|
|
links is unordered. Irrespective of the underlying
|
|
|
|
data structure, searching for a specific value can degrade to
|
|
|
|
linear complexity.</li>
|
|
|
|
|
|
|
|
<li>Similarly to the above point, it is impossible to apply
|
|
|
|
to the secondary keys considerations that apply to primary
|
|
|
|
keys. For example, it is not possible to maintain secondary
|
|
|
|
keys by sorted order.</li>
|
|
|
|
|
|
|
|
<li>While the interface "understands" that all equivalent-key
|
|
|
|
values constitute a distinct list (<i>e.g.</i>, through
|
|
|
|
<tt>equal_range</tt>), the underlying data structure
|
|
|
|
typically does not. This means, <i>e.g.</i>, that operations
|
|
|
|
such as erasing from a tree-based container all values whose
|
|
|
|
keys are equivalent to a a given key can be super-linear in
|
|
|
|
the size of the tree; this is also true also for several
|
|
|
|
other operations that target a specific list.</li>
|
|
|
|
</ol>
|
|
|
|
|
|
|
|
<p>In <tt>pb_ds</tt>, therefore, all associative containers map
|
|
|
|
(or store) unique-key values. One can (1) map primary keys to
|
|
|
|
secondary associative-containers (<i>i.e.</i>, containers of
|
|
|
|
secondary keys) or non-associative containers (2) map identical
|
|
|
|
keys to a size-type representing the number of times they
|
|
|
|
occur, or (3) any combination of (1) and (2). Instead of
|
|
|
|
allowing multiple equivalent-key values, <tt>pb_ds</tt>
|
|
|
|
supplies associative containers based on underlying
|
|
|
|
data structures that are suitable as secondary
|
|
|
|
associative-containers (see <a href=
|
|
|
|
"assoc_performance_tests.html#msc">Associative-Container
|
|
|
|
Performance Tests::Observations::Mapping-Semantics
|
|
|
|
Considerations</a>).</p>
|
|
|
|
|
|
|
|
<p>Figures <a href="#embedded_lists_3">Non-unique mapping
|
|
|
|
containers in <tt>pb_ds</tt></a> A and B show the equivalent
|
|
|
|
structures in <tt>pb_ds</tt>'s design, to those in Figures
|
|
|
|
<a href="#embedded_lists_1">Non-unique mapping containers in
|
|
|
|
the STL's design</a> A and B, respectively. Each shaded box
|
|
|
|
represents some size-type or secondary
|
|
|
|
associative-container.</p>
|
|
|
|
|
|
|
|
<h6 class="c1"><a name="embedded_lists_3" id=
|
|
|
|
"embedded_lists_3"><img src="embedded_lists_3.png" alt=
|
|
|
|
"no image" /></a></h6>
|
|
|
|
|
|
|
|
<h6 class="c1">Non-unique mapping containers in the
|
|
|
|
<tt>pb_ds</tt>.</h6>
|
|
|
|
|
|
|
|
<p>In the first example above, then, one would use an
|
|
|
|
associative container mapping each user to an associative
|
|
|
|
container which maps each application id to a start time (see
|
|
|
|
<a href=
|
2008-09-22 22:56:08 +02:00
|
|
|
"http://gcc.gnu.org/viewcvs/*checkout*/trunk/libstdc%2B%2B-v3/testsuite/ext/pb_ds/example/basic_multimap.cc"><tt>basic_multimap.cc</tt></a>);
|
2006-06-15 01:09:51 +02:00
|
|
|
in the second example, one would use an associative container
|
|
|
|
mapping each <tt><b>int</b></tt> to some size-type indicating
|
|
|
|
the number of times it logically occurs (see <a href=
|
2008-09-22 22:56:08 +02:00
|
|
|
"http://gcc.gnu.org/viewcvs/*checkout*/trunk/libstdc%2B%2B-v3/testsuite/ext/pb_ds/example/basic_multiset.cc"><tt>basic_multiset.cc</tt></a>).</p>
|
2006-06-15 01:09:51 +02:00
|
|
|
|
|
|
|
<p><a href=
|
|
|
|
"assoc_performance_tests.html#multimaps">Associative-Container
|
|
|
|
Performance Tests::Multimaps</a> quantifies some of these
|
|
|
|
points, and <a href=
|
|
|
|
"assoc_performance_tests.html#msc">Associative-Container
|
|
|
|
Performance Tests::Observations::Mapping-Semantics
|
|
|
|
Considerations</a> shows some simple calculations.</p>
|
|
|
|
|
|
|
|
<p><a href="assoc_examples.html#mmaps">Associative-Container
|
|
|
|
Examples::Multimaps</a> shows some simple examples of using
|
|
|
|
"multimaps".</p>
|
|
|
|
|
|
|
|
<p><a href="lu_based_containers.html">Design::Associative
|
|
|
|
Containers::List-Based Containers</a> discusses types of
|
|
|
|
containers especially suited as secondary
|
|
|
|
associative-containers.</p>
|
|
|
|
|
|
|
|
<h2><a name="pq" id="pq">Priority Queues</a></h2>
|
|
|
|
|
|
|
|
<h3><a name="pq_more_ops" id="pq_more_ops">Slightly Different
|
|
|
|
Methods</a></h3>
|
|
|
|
|
|
|
|
<p>Priority queues are containers that allow efficiently
|
|
|
|
inserting values and accessing the maximal value (in the sense
|
|
|
|
of the container's comparison functor); <i>i.e.</i>, their
|
|
|
|
interface supports <tt>push</tt> and <tt>pop</tt>. The STL's
|
|
|
|
priority queues indeed support these methods, but they support
|
|
|
|
little else. For algorithmic and software-engineering purposes,
|
|
|
|
other methods are needed:</p>
|
|
|
|
|
|
|
|
<ol>
|
|
|
|
<li>Many graph algorithms [<a href=
|
|
|
|
"references.html#clrs2001">clrs2001</a>] require increasing a
|
|
|
|
value in a priority queue (again, in the sense of the
|
|
|
|
container's comparison functor), or joining two
|
|
|
|
priority-queue objects.</li>
|
|
|
|
|
|
|
|
<li>It is sometimes necessary to erase an arbitrary value in
|
|
|
|
a priority queue. For example, consider the <tt>select</tt>
|
|
|
|
function for monitoring file descriptors:
|
|
|
|
<pre>
|
|
|
|
<b>int</b>
|
|
|
|
select
|
|
|
|
(<b>int</b> nfds,
|
|
|
|
fd_set *readfds,
|
|
|
|
fd_set *writefds,
|
|
|
|
fd_set *errorfds,
|
|
|
|
<b>struct</b> timeval *timeout);
|
|
|
|
</pre>then, as the <tt>select</tt> manual page [<a href=
|
|
|
|
"references.html#select_man">select_man</a>] states:
|
|
|
|
|
|
|
|
<p><q>The nfds argument specifies the range of file
|
|
|
|
descriptors to be tested. The select() function tests file
|
|
|
|
descriptors in the range of 0 to nfds-1.</q></p>
|
|
|
|
|
|
|
|
<p>It stands to reason, therefore, that we might wish to
|
|
|
|
maintain a minimal value for <tt>nfds</tt>, and priority
|
|
|
|
queues immediately come to mind. Note, though, that when a
|
|
|
|
socket is closed, the minimal file description might
|
|
|
|
change; in the absence of an efficient means to erase an
|
|
|
|
arbitrary value from a priority queue, we might as well
|
|
|
|
avoid its use altogether.</p>
|
|
|
|
|
|
|
|
<p><a href="pq_examples.html#xref">Priority-Queue
|
|
|
|
Examples::Cross-Referencing</a> shows examples for these
|
|
|
|
types of operations.</p>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>STL containers typically support iterators. It is
|
|
|
|
somewhat unusual for <tt>std::priority_queue</tt> to omit
|
|
|
|
them (see, <i>e.g.</i>, [<a href=
|
|
|
|
"references.html#meyers01stl">meyers01stl</a>]). One might
|
|
|
|
ask why do priority queues need to support iterators, since
|
|
|
|
they are self-organizing containers with a different purpose
|
|
|
|
than abstracting sequences. There are several reasons:
|
|
|
|
|
|
|
|
<ol>
|
|
|
|
<li>Iterators (even in self-organizing containers) are
|
|
|
|
useful for many purposes, <i>e.g.</i>, cross-referencing
|
|
|
|
containers, serialization, and debugging code that uses
|
|
|
|
these containers.</li>
|
|
|
|
|
|
|
|
<li>The STL's hash-based containers support iterators,
|
|
|
|
even though they too are self-organizing containers with
|
|
|
|
a different purpose than abstracting sequences.</li>
|
|
|
|
|
|
|
|
<li>In STL-like containers, it is natural to specify the
|
|
|
|
interface of operations for modifying a value or erasing
|
|
|
|
a value (discussed previously) in terms of a iterators.
|
|
|
|
This is discussed further in <a href=
|
|
|
|
"pq_design.html#pq_it">Design::Priority
|
|
|
|
Queues::Iterators</a>. It should be noted that the STL's
|
|
|
|
containers also use iterators for accessing and
|
|
|
|
manipulating a specific value. <i>E.g.</i>, in hash-based
|
|
|
|
containers, one checks the existence of a key by
|
|
|
|
comparing the iterator returned by <tt>find</tt> to the
|
|
|
|
iterator returned by <tt>end</tt>, and not by comparing a
|
|
|
|
pointer returned by <tt>find</tt> to <tt>NULL</tt>.</li>
|
|
|
|
</ol>
|
|
|
|
</li>
|
|
|
|
</ol>
|
|
|
|
|
|
|
|
<p><a href="pq_performance_tests.html">Performance
|
|
|
|
Tests::Priority Queues</a> quantifies some of these points.</p>
|
|
|
|
|
|
|
|
<h3><a name="pq_ds_genericity" id="pq_ds_genericity">More Data
|
|
|
|
Structures and Traits</a></h3>
|
|
|
|
|
|
|
|
<p>There are three main implementations of priority queues: the
|
|
|
|
first employs a binary heap, typically one which uses a
|
|
|
|
sequence; the second uses a tree (or forest of trees), which is
|
|
|
|
typically less structured than an associative container's tree;
|
|
|
|
the third simply uses an associative container. These are
|
|
|
|
shown, respectively, in Figures <a href=
|
|
|
|
"#pq_different_underlying_dss">Underlying Priority-Queue
|
|
|
|
Data-Structures</a> A1 and A2, B, and C.</p>
|
|
|
|
|
|
|
|
<h6 class="c1"><a name="pq_different_underlying_dss" id=
|
|
|
|
"pq_different_underlying_dss"><img src=
|
|
|
|
"pq_different_underlying_dss.png" alt="no image" /></a></h6>
|
|
|
|
|
|
|
|
<h6 class="c1">Underlying Priority-Queue Data-Structures.</h6>
|
|
|
|
|
|
|
|
<p>No single implementation can completely replace any of the
|
|
|
|
others. Some have better <tt>push</tt> and <tt>pop</tt>
|
|
|
|
amortized performance, some have better bounded (worst case)
|
|
|
|
response time than others, some optimize a single method at the
|
|
|
|
expense of others, <i>etc.</i>. In general the "best"
|
|
|
|
implementation is dictated by the problem (see <a href=
|
|
|
|
"pq_performance_tests.html#pq_observations">Performance
|
|
|
|
Tests::Priority Queues::Observations</a>).</p>
|
|
|
|
|
|
|
|
<p>As with associative containers (see <a href=
|
|
|
|
"#assoc_ds_genericity">Associative Containers::Traits for
|
|
|
|
Underlying Data-Structures</a>), the more implementations
|
|
|
|
co-exist, the more necessary a traits mechanism is for handling
|
|
|
|
generic containers safely and efficiently. This is especially
|
|
|
|
important for priority queues, since the invalidation
|
|
|
|
guarantees of one of the most useful data structures - binary
|
|
|
|
heaps - is markedly different than those of most of the
|
|
|
|
others.</p>
|
|
|
|
|
|
|
|
<p><a href="pq_design.html#pq_traits">Design::Priority
|
|
|
|
Queues::Traits</a> discusses this further.</p>
|
|
|
|
|
|
|
|
<h3><a name="pq_binary_heap" id="pq_binary_heap">Binary Heap
|
|
|
|
Implementation</a></h3>
|
|
|
|
|
|
|
|
<p>Binary heaps are one of the most useful underlying
|
|
|
|
data structures for priority queues. They are very efficient in
|
|
|
|
terms of memory (since they don't require per-value structure
|
|
|
|
metadata), and have the best amortized <tt>push</tt> and
|
|
|
|
<tt>pop</tt> performance for primitive types (<i>e.g.</i>,
|
|
|
|
<tt><b>int</b></tt>s).</p>
|
|
|
|
|
|
|
|
<p>The STL's <tt>priority_queue</tt> implements this data
|
|
|
|
structure as an adapter over a sequence, typically
|
|
|
|
<tt>std::vector</tt> or <tt>std::deque</tt>, which correspond
|
|
|
|
to Figures <a href="#pq_different_underlying_dss">Underlying
|
|
|
|
Priority-Queue Data-Structures</a> A1 and A2, respectively.</p>
|
|
|
|
|
|
|
|
<p>This is indeed an elegant example of the adapter concept and
|
|
|
|
the algorithm/container/iterator decomposition (see [<a href=
|
|
|
|
"references.html#nelson96stlpq">nelson96stlpql</a>]). There are
|
|
|
|
possibly reasons, however, why a binary-heap priority queue
|
|
|
|
would be better implemented as a container instead of a
|
|
|
|
sequence adapter:</p>
|
|
|
|
|
|
|
|
<ol>
|
|
|
|
<li><tt>std::priority_queue</tt> cannot erase values from its
|
|
|
|
adapted sequence (irrespective of the sequence type). This
|
|
|
|
means that the memory use of an <tt>std::priority_queue</tt>
|
|
|
|
object is always proportional to the maximal number of values
|
|
|
|
it ever contained, and not to the number of values that it
|
|
|
|
currently contains (see <a href=
|
|
|
|
"priority_queue_text_pop_mem_usage_test.html">Priority Queue
|
|
|
|
Text <tt>pop</tt> Memory Use Test</a>); this implementation
|
|
|
|
of binary heaps acts very differently than other underlying
|
|
|
|
data structures (<i>e.g.</i>, pairing heaps).</li>
|
|
|
|
|
|
|
|
<li>Some combinations of adapted sequences and value types
|
|
|
|
are very inefficient or just don't make sense. If one uses
|
|
|
|
<tt>std::priority_queue<std::vector<std::string>
|
|
|
|
> ></tt>, for example, then not only will each
|
|
|
|
operation perform a logarithmic number of
|
|
|
|
<tt>std::string</tt> assignments, but, furthermore, any
|
|
|
|
operation (including <tt>pop</tt>) can render the container
|
|
|
|
useless due to exceptions. Conversely, if one uses
|
|
|
|
<tt>std::priority_queue<std::deque<<b>int</b>> >
|
|
|
|
></tt>, then each operation uses incurs a logarithmic
|
|
|
|
number of indirect accesses (through pointers) unnecessarily.
|
|
|
|
It might be better to let the container make a conservative
|
|
|
|
deduction whether to use the structure in Figures <a href=
|
|
|
|
"#pq_different_underlying_dss">Underlying Priority-Queue
|
|
|
|
Data-Structures</a> A1 or A2.</li>
|
|
|
|
|
|
|
|
<li>There does not seem to be a systematic way to determine
|
|
|
|
what exactly can be done with the priority queue.
|
|
|
|
|
|
|
|
<ol>
|
|
|
|
<li>If <tt>p</tt> is a priority queue adapting an
|
|
|
|
<tt>std::vector</tt>, then it is possible to iterate over
|
|
|
|
all values by using <tt>&p.top()</tt> and
|
|
|
|
<tt>&p.top() + p.size()</tt>, but this will not work
|
|
|
|
if <tt>p</tt> is adapting an <tt>std::deque</tt>; in any
|
|
|
|
case, one cannot use <tt>p.begin()</tt> and
|
|
|
|
<tt>p.end()</tt>. If a different sequence is adapted, it
|
|
|
|
is even more difficult to determine what can be
|
|
|
|
done.</li>
|
|
|
|
|
|
|
|
<li>If <tt>p</tt> is a priority queue adapting an
|
|
|
|
<tt>std::deque</tt>, then the reference return by
|
|
|
|
<tt>p.top()</tt> will remain valid until it is popped,
|
|
|
|
but if <tt>p</tt> adapts an <tt>std::vector</tt>, the
|
|
|
|
next <tt>push</tt> will invalidate it. If a different
|
|
|
|
sequence is adapted, it is even more difficult to
|
|
|
|
determine what can be done.</li>
|
|
|
|
</ol>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>Sequence-based binary heaps can still implement
|
|
|
|
linear-time <tt>erase</tt> and <tt>modify</tt> operations.
|
|
|
|
This means that if one needs, <i>e.g.</i>, to erase a small
|
|
|
|
(say logarithmic) number of values, then one might still
|
|
|
|
choose this underlying data structure. Using
|
|
|
|
<tt>std::priority_queue</tt>, however, this will generally
|
|
|
|
change the order of growth of the entire sequence of
|
|
|
|
operations.</li>
|
|
|
|
</ol>
|
|
|
|
</div>
|
|
|
|
</body>
|
|
|
|
</html>
|