[multiple changes]

2004-02-11  Stefan Olsson  <stefan@xapa.se>

	* docs/html/ext/mt_allocator.html: New.

2004-02-11  Benjamin Kosnik  <bkoz@redhat.com>

	* docs/html/20_util/allocator.html: New file, consolidate
	allocator information here. Revamp.
	* docs/html/documentation.html: Change links.
	* docs/html/20_util/howto.html: Same.
	* docs/html/ext/howto.html: Same.

From-SVN: r77687
This commit is contained in:
Benjamin Kosnik 2004-02-12 01:11:48 +00:00
parent 9288d11204
commit 1c86f39d32
6 changed files with 900 additions and 249 deletions


@ -1,3 +1,15 @@
2004-02-11 Stefan Olsson <stefan@xapa.se>
* docs/html/ext/mt_allocator.html: New.
2004-02-11 Benjamin Kosnik <bkoz@redhat.com>
* docs/html/20_util/allocator.html: New file, consolidate
allocator information here. Revamp.
* docs/html/documentation.html: Change links.
* docs/html/20_util/howto.html: Same.
* docs/html/ext/howto.html: Same.
2004-02-11 Paolo Carlini <pcarlini@suse.de>
PR libstdc++/13731 (first part: write)


@ -0,0 +1,468 @@
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta name="AUTHOR" content="pme@gcc.gnu.org (Phil Edwards) and bkoz@gcc.gnu.org (Benjamin Kosnik)" />
<meta name="KEYWORDS" content="c++, libstdc++, g++, allocator, memory" />
<meta name="DESCRIPTION" content="Allocators and allocation" />
<meta name="GENERATOR" content="emacs and ten fingers" />
<title>Allocators and allocation</title>
<link rel="StyleSheet" href="../lib3styles.css" type="text/css" />
<link rel="Copyright" href="../17_intro/license.html" type="text/html" />
</head>
<body>
<h1 class="centered"><a name="top">Allocators and allocation</a></h1>
<p class="fineprint"><em>
The latest version of this document is always available at
<a href="http://gcc.gnu.org/onlinedocs/libstdc++/20_util/allocator.html">
http://gcc.gnu.org/onlinedocs/libstdc++/20_util/allocator.html</a>.
</em></p>
<p><em>
To the <a href="http://gcc.gnu.org/libstdc++/">libstdc++-v3 homepage</a>.
</em></p>
<!-- ####################################################### -->
<hr />
<p> The C++ Standard encapsulates memory management characteristics
for strings, container classes, and parts of iostreams in a
template class called <code>std::allocator</code>.
</p>
<h3 class="left">
<a name="standard requirements">Standard requirements</a>
</h3>
<p>The C++ standard only gives a few directives in this area:
</p>
<ul>
<li>When you add elements to a container, and the container must allocate
more memory to hold them, the container makes the request via its
<code>Allocator</code> template parameter. This includes adding
chars to the string class, which acts as a regular STL container
in this respect.
</li>
<li>The default <code>Allocator</code> of every container-of-T is
<code>std::allocator&lt;T&gt;</code>.
</li>
<li>The interface of the <code>allocator&lt;T&gt;</code> class is
extremely simple. It has about 20 public declarations (nested
typedefs, member functions, etc), but the two which concern us most
are:
<pre>
T* allocate (size_type n, const void* hint = 0);
void deallocate (T* p, size_type n);</pre>
(This is a simplification; the real signatures use nested typedefs.)
The <code>&quot;n&quot;</code> argument in both of those functions is a
<em>count</em> of the number of T's to allocate space for,
<em>not their total size</em>.
</li>
<li>&quot;The storage is obtained by calling
<code>::operator new(size_t)</code>, but it is unspecified when or
how often this function is called. The use of <code>hint</code>
is unspecified, but intended as an aid to locality if an
implementation so desires.&quot; [20.4.1.1]/6
</li>
</ul>
<p> Complete details can be found in the C++ standard; look in
[20.4 Memory].
</p>
<h3 class="left">
<a name="probs possibilities">Problems and Possibilities</a>
</h3>
<p>The easiest way of fulfilling the requirements is to call operator new
each time a container needs memory, and to call operator delete each
time the container releases memory. <strong>BUT</strong>
<a href="http://gcc.gnu.org/ml/libstdc++/2001-05/msg00105.html">this
method is horribly slow</a>.
</p>
<p>Or we can keep old memory around, and reuse it in a pool to save time.
The old libstdc++-v2 used a memory pool, and so do we. As of 3.0,
<a href="http://gcc.gnu.org/ml/libstdc++/2001-05/msg00136.html">it's
on by default</a>. The pool is shared among all the containers in the
program: when your program's std::vector&lt;int&gt; gets cut in half
and frees a bunch of its storage, that memory can be reused by the
private std::list&lt;WonkyWidget&gt; brought in from a KDE library
that you linked against. And we don't have to call operators new and
delete to pass the memory on, either, which is a speed bonus.
<strong>BUT</strong>...
</p>
<p>What about threads? No problem: in a threadsafe environment, the
memory pool is manipulated atomically, so you can grow a container in
one thread and shrink it in another, etc. <strong>BUT</strong> what
if threads in libstdc++-v3 aren't set up properly?
<a href="../faq/index.html#5_6">That's been answered already</a>.
</p>
<p><strong>BUT</strong> what if you want to use your own allocator? What
if you plan on using a runtime-loadable version of malloc() which uses
shared telepathic anonymous mmap'd sections serializable over a
network, so that memory requests <em>should</em> go through malloc?
And what if you need to debug it?
</p>
<h3 class="left">
<a name="stdallocator">Implementation details of <code>std::allocator</code></a>
</h3>
<p> The implementation of <code> std::allocator</code> has continued
to evolve through successive releases. Here's a brief history.
</p>
<h5 class="left">
<a name="30allocator"> 3.0, 3.1, 3.2, 3.3 </a>
</h5>
<p> During this period, all allocators were written to the SGI
style, and all STL containers expected this interface. This
interface had a traits class called <code>_Alloc_traits</code> that
attempted to provide more information for compile-time allocation
selection and optimization. This traits class had an allocator
wrapper, <code>__simple_alloc&lt;T,A&gt;</code>, which wrapped
another allocator, A, itself an allocator
for instances of T. But wait, there's more:
<code>__allocator&lt;T,A&gt;</code> is another adapter. Many of
the provided allocator classes were SGI style: such classes can be
changed to a conforming interface with this wrapper:
<code>__allocator&lt;T, __alloc&gt;</code> is thus the same as
<code>allocator&lt;T&gt;</code>.
</p>
<p> The class <code>std::allocator</code> used the typedef
<code>__alloc</code> to select an underlying allocator that
satisfied memory allocation requests. The selection of this
underlying allocator was not user-configurable.
</p>
<h5 class="left">
<a name="34allocator"> 3.4 </a>
</h5>
<p> For this and later releases, the only allocator interface that
is supported is the standard C++ interface. As such, all STL
containers have been adjusted, and all external allocators have
been modified to support this change. Because of this,
<code>__simple_alloc, __allocator, __alloc, </code> and <code>
_Alloc_traits</code> have all been removed.
</p>
<p> The class <code>std::allocator</code> just has typedef,
constructor, and rebind members. It inherits from one of the
high-speed extension allocators, covered below. Thus, all
allocation and deallocation depends on the base class.
</p>
<p> The base class that <code>std::allocator</code> is derived from
is not user-configurable.
</p>
<h5 class="left">
<a name="benchmarks"> How the default allocation strategy is selected.</a>
</h5>
<p> It's difficult to pick an allocation strategy that will provide
maximum utility, without excessively penalizing some behavior. In
fact, it's difficult just deciding which typical actions to measure
for speed.
</p>
<p> Three synthetic benchmarks have been created that provide data
that is used to compare different C++ allocators. These tests are:
</p>
<ul>
<li>Insertion. Over multiple iterations, various STL container
objects have elements inserted to some maximum amount. A variety
of allocators are tested.
Test source <a
href="http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc%2b%2b-v3/testsuite/performance/20_util/allocator/insert.cc?only_with_tag=MAIN">here.</a>
</li>
<li>Insertion, clear, and re-insertion in a multi-threaded
environment. Over multiple iterations, several threads are
started that insert elements into an STL container, then assign a
null instance of the same type to clear memory, and then
re-insert the same number of elements. Several STL containers and
multiple allocators are tested. This test shows the ability of
the allocator to reclaim memory on a per-thread basis, as well as
measuring thread contention for memory resources.
Test source
<a href="http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc%2b%2b-v3/testsuite/performance/20_util/allocator/insert_insert.cc">
here.</a>
</li>
<li>A threaded producer/consumer model.
Test source
<a href="http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc%2b%2b-v3/testsuite/performance/20_util/allocator/producer_consumer.cc">
here.</a>
</li>
</ul>
<h5 class="left">
<a name="forcenew"> Disabling memory caching.</a>
</h5>
<p> In use, <code>std::allocator</code> may allocate and deallocate
using implementation-specified strategies and heuristics. Because of
this, every call to an allocator object's <code> allocate</code>
member function may not actually call the global operator new. The
same is true of calls to the <code>
deallocate</code> member function.
</p>
<p> This can be confusing.
</p>
<p> In particular, this can make debugging memory errors more
difficult, especially when using third party tools like valgrind or
debug versions of <code> new</code>.
</p>
<p> There are various ways to solve this problem. One would be to
use a custom allocator that just called operators <code> new
</code> and <code> delete</code> directly, for every
allocation. (See include/ext/new_allocator.h, for instance.)
However, that option would involve changing source code to use a
non-default allocator. Another option is to force the default
allocator to remove caching and pools, and to directly allocate
with every call of <code> allocate</code> and directly deallocate
with every call of <code> deallocate</code>, regardless of
efficiency. As it turns out, this last option is available,
although the exact mechanism has evolved with time.
</p>
<p> For GCC releases from 2.95 through the 3.1 series, defining
<code>__USE_MALLOC</code> on the gcc command line would change the
default allocation strategy to instead use <code> malloc</code> and
<code> free</code>. See
<a href="../23_containers/howto.html#3">this note</a>
for details as to why this was something needing improvement.
</p>
<p>Starting with GCC 3.2, and continued in the 3.3 series, to
globally disable memory caching within the library for the
default allocator, merely set GLIBCPP_FORCE_NEW (at this time,
with any value) in the system's environment before running the
program. If your program crashes with GLIBCPP_FORCE_NEW in the
environment, it likely means that you linked against objects
built against the older library. Code to support this extension
is fully compatible with 3.2 code if GLIBCPP_FORCE_NEW is not in
the environment.
</p>
<p> The 3.4 code base continues to use this
mechanism; only the environment variable has been renamed, to
GLIBCXX_FORCE_NEW.
</p>
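For example (the program name below is a stand-in for your own binary; the variable's value does not matter, only its presence in the environment):

```shell
# GCC 3.4 and later:
GLIBCXX_FORCE_NEW=1 ./my_program

# GCC 3.2 and 3.3:
GLIBCPP_FORCE_NEW=1 ./my_program
```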
<h3 class="left">
<a name="ext allocators">Other allocators</a>
</h3>
<p> Several other allocators are provided as part of this
implementation. The location of the extension allocators and their
names have changed, but in all cases, functionality is
equivalent. Starting with gcc-3.4, all extension allocators are
standard style. Before this point, SGI style was the norm. Because of
this, the number of template arguments also changed. Here's a simple
chart to track the changes.
</p>
<table title="extension allocators" border="1">
<tr>
<th>Allocator (3.4)</th>
<th>Header (3.4)</th>
<th>Allocator (3.[0-3])</th>
<th>Header (3.[0-3])</th>
</tr>
<tr>
<td>__gnu_cxx::new_allocator&lt;T&gt;</td>
<td>&lt;ext/new_allocator.h&gt;</td>
<td>std::__new_alloc</td>
<td>&lt;memory&gt;</td>
</tr>
<tr>
<td>__gnu_cxx::malloc_allocator&lt;T&gt;</td>
<td>&lt;ext/malloc_allocator.h&gt;</td>
<td>std::__malloc_alloc_template&lt;int&gt;</td>
<td>&lt;memory&gt;</td>
</tr>
<tr>
<td>__gnu_cxx::debug_allocator&lt;T&gt;</td>
<td>&lt;ext/debug_allocator.h&gt;</td>
<td>std::__debug_alloc&lt;T&gt;</td>
<td>&lt;memory&gt;</td>
</tr>
<tr>
<td>__gnu_cxx::__pool_alloc&lt;bool, int&gt;</td>
<td>&lt;ext/pool_allocator.h&gt;</td>
<td>std::__default_alloc_template&lt;bool,int&gt;</td>
<td>&lt;memory&gt;</td>
</tr>
<tr>
<td>__gnu_cxx::__mt_alloc&lt;T&gt;</td>
<td>&lt;ext/mt_allocator.h&gt;</td>
<td></td>
<td></td>
</tr>
</table>
<p>More details on each of these allocators follows. </p>
<ul>
<li><code>new_allocator</code>
<p>Simply wraps <code>::operator new</code>
and <code>::operator delete</code>.
</p>
</li>
<li><code>malloc_allocator</code>
<p>Simply wraps
<code>malloc</code> and <code>free</code>. There is also a hook
for an out-of-memory handler (for new/delete this is taken care of
elsewhere).
</p>
</li>
<li><code>debug_allocator</code>
<p> A wrapper around an
arbitrary allocator A. It passes on slightly increased size
requests to A, and uses the extra memory to store size information.
When a pointer is passed to <code>deallocate()</code>, the stored
size is checked, and assert() is used to guarantee they match.
</p>
</li>
<li><code>__pool_alloc</code>
<p> A high-performance, single pool allocator. The reusable
memory is shared among identical instantiations of this type.
It calls through <code>::operator new</code> to obtain new memory
when its lists run out. If a client container requests a block
larger than a certain threshold size, then the pool is bypassed,
and the allocate/deallocate request is passed to
<code>::operator new</code> directly. </p>
<p> This class takes a boolean template parameter, called
<code>thr</code>, and an integer template parameter, called
<code>inst</code>.
</p>
<p>The <code>inst</code> number is used to track additional memory
pools. The point of the number is to allow multiple
instantiations of the classes without changing the semantics at
all. All three of
</p>
<pre>
typedef __pool_alloc&lt;true,0&gt; normal;
typedef __pool_alloc&lt;true,1&gt; private;
typedef __pool_alloc&lt;true,42&gt; also_private;</pre>
<p>behave exactly the same way. However, the memory pool for each type
(and remember that different instantiations result in different types)
remains separate.
</p>
<p>The library uses <strong>0</strong> in all its instantiations. If you
wish to keep separate free lists for a particular purpose, use a
different number.
</p>
<p>The <code>thr</code> boolean determines whether the pool should
be manipulated atomically or not. When thr=true, the allocator
is threadsafe; when thr=false, it is slightly faster but unsafe
for multiple threads.
</p>
<p>(Note that the GCC thread abstraction layer allows us to provide safe
zero-overhead stubs for the threading routines, if threads were
disabled at configuration time.)
</p>
</li>
<li><code>__mt_alloc</code>
<p>A high-performance
fixed-size allocator. It has its own documentation, found <a
href="../ext/mt_allocator.html">here</a>.
</p>
</li>
</ul>
<h3 class="left">
<a name="using custom allocators">Using a specific allocator</a>
</h3>
<p>You can specify different memory management schemes on a
per-container basis, by overriding the default
<code>Allocator</code> template parameter. For example, an easy
(but non-portable) method of specifying that only malloc/free
should be used instead of the default node allocator is:
</p>
<pre>
std::list &lt;int, __gnu_cxx::malloc_allocator&lt;int&gt; &gt; malloc_list;</pre>
Likewise, a debugging form of whichever allocator is currently in use:
<pre>
std::deque &lt;int, __gnu_cxx::debug_allocator&lt;std::allocator&lt;int&gt; &gt; &gt; debug_deque;</pre>
<h3 class="left">
<a name="custom allocators">Writing custom allocators</a>
</h3>
<p> A portable C++ allocator must present an interface much
like the one specified for <code>
std::allocator</code>. Additional member functions are
permissible, but none of the standard members may be omitted.
</p>
<p> Probably the best place to start would be to copy one of the
extension allocators already shipped with libstdc++: say, <code>
new_allocator </code>.
</p>
<h3 class="left">
<a name="biblio">Bibliography / Further Reading</a>
</h3>
<p>
ISO/IEC 14882:1998 Programming languages - C++ [20.4 Memory]
</p>
<p>
Austern, Matt, C/C++ Users Journal.
<a href="http://www.cuj.com/documents/s=8000/cujcexp1812austern/">The Standard Librarian: What Are Allocators Good
For?</a>
</p>
<p>
Berger, Emery,
<a href="http://www.cs.umass.edu/~emery/hoard/"> The Hoard memory allocator </a>
</p>
<p>
Berger, Emery with Ben Zorn & Kathryn McKinley, OOPSLA 2002
<a href="http://www.cs.umass.edu/~emery/pubs/berger-oopsla2002.pdf">Reconsidering Custom Memory Allocation</a>
</p>
<p>
Kreft, Klaus and Angelika Langer, C++ Report, June 1998
<a href="http://www.langer.camelot.de/Articles/C++Report/Allocators/Allocators.html">Allocator Types</a>
</p>
<p>
Stroustrup, Bjarne, 19.4 Allocators, The C++ Programming
Language, Special Edition, Addison Wesley, Inc. 2000
</p>
<p>
Yen, Felix, <a href="http://home.earthlink.net/~brimar/yalloc/">Yalloc: A Recycling C++ Allocator</a>
</p>
<hr />
<p>Return <a href="#top">to the top of the page</a> or
<a href="http://gcc.gnu.org/libstdc++/">to the libstdc++ homepage</a>.
</p>
<!-- ####################################################### -->
<hr />
<p class="fineprint"><em>
See <a href="../17_intro/license.html">license.html</a> for copying conditions.
Comments and suggestions are welcome, and may be sent to
<a href="mailto:libstdc++@gcc.gnu.org">the libstdc++ mailing list</a>.
</em></p>
</body>
</html>


@ -219,7 +219,7 @@
<hr />
<h2><a name="5">Memory allocators</a></h2>
<p>The available free store (&quot;heap&quot;) management classes are
described <a href="../ext/howto.html">here</a>.
described <a href="allocator.html">here</a>.
</p>
<p>Return <a href="#top">to top of page</a> or
<a href="../faq/index.html">to the FAQ</a>.


@ -126,8 +126,8 @@
<li><a href="18_support/howto.html#2">Implementation properties</a></li>
<li><a href="18_support/howto.html#3">Start and Termination</a></li>
<li><a href="18_support/howto.html#4">Verbose <code>terminate</code></a></li>
<li><a href="18_support/howto.html#6">Dynamic memory management</a></li>
<li><a href="18_support/howto.html#7">RTTI, the ABI, and demangling</a></li>
<li><a href="18_support/howto.html#5">Dynamic memory management</a></li>
<li><a href="18_support/howto.html#6">RTTI, the ABI, and demangling</a></li>
</ul>
</li>
@ -145,7 +145,7 @@
<li><a href="20_util/howto.html#2"><code>auto_ptr</code> inside container classes</a></li>
<li><a href="20_util/howto.html#3">Functors</a></li>
<li><a href="20_util/howto.html#4">Pairs</a></li>
<li><a href="20_util/howto.html#5">Memory allocators</a></li>
<li><a href="20_util/allocator.html">Allocators and allocation</a></li>
</ul>
</li>
@ -226,8 +226,7 @@
<ul>
<li><a href="ext/howto.html#1">Ropes and trees and hashes, oh my!</a></li>
<li><a href="ext/howto.html#2">Added members and types</a></li>
<li><a href="ext/howto.html#3">Allocators (versions 3.0, 3.1, 3.2, 3.3)</a></li>
<li><a href="ext/howto.html#6">Allocators (version 3.4)</a></li>
<li><a href="ext/mt_allocator.html"><code>__mt_alloc</code> </a></li>
<li><a href="ext/howto.html#4">Compile-time checks</a></li>
<li><a href="ext/howto.html#5">LWG Issues</a></li>
<li><a href="ext/../18_support/howto.html#5">Demangling</a></li>


@ -48,8 +48,7 @@
<ul>
<li><a href="#1">Ropes and trees and hashes, oh my!</a></li>
<li><a href="#2">Added members and types</a></li>
<li><a href="#3">Allocators (versions 3.0, 3.1, 3.2, 3.3)</a></li>
<li><a href="#6">Allocators (version 3.4)</a></li>
<li><a href="mt_allocator.html"><code>__mt_alloc</code> </a></li>
<li><a href="#4">Compile-time checks</a></li>
<li><a href="#5">LWG Issues</a></li>
<li><a href="../18_support/howto.html#5">Demangling</a></li>
@ -140,247 +139,6 @@
<a href="../faq/index.html">to the FAQ</a>.
</p>
<hr />
<h2><a name="3">Allocators (versions 3.0, 3.1, 3.2, 3.3)</a></h2>
<p>Thread-safety, space efficiency, high speed, portability... this is a
mess. Where to begin?
</p>
<h3>The Rules</h3>
<p>The C++ standard only gives a few directives in this area:
</p>
<ul>
<li>When you add elements to a container, and the container must allocate
more memory to hold them, the container makes the request via its
<code>Allocator</code> template parameter. This includes adding
char's to the string class, which acts as a regular STL container
in this respect.
</li>
<li>The default <code>Allocator</code> of every container-of-T is
<code>std::allocator&lt;T&gt;</code>.
</li>
<li>The interface of the <code>allocator&lt;T&gt;</code> class is
extremely simple. It has about 20 public declarations (nested
typedefs, member functions, etc), but the two which concern us most
are:
<pre>
T* allocate (size_type n, const void* hint = 0);
void deallocate (T* p, size_type n);</pre>
(This is a simplification; the real signatures use nested typedefs.)
The <code>&quot;n&quot;</code> argument in both of those functions is a
<em>count</em> of the number of T's to allocate space for,
<em>not their total size</em>.
</li>
<li>&quot;The storage is obtained by calling
<code>::operator new(size_t)</code>, but it is unspecified when or
how often this function is called. The use of <code>hint</code>
is unspecified, but intended as an aid to locality if an
implementation so desires.&quot; [20.4.1.1]/6
</li>
</ul>
<h3>Problems and Possibilities</h3>
<p>The easiest way of fulfilling the requirements is to call operator new
each time a container needs memory, and to call operator delete each
time the container releases memory. <strong>BUT</strong>
<a href="http://gcc.gnu.org/ml/libstdc++/2001-05/msg00105.html">this
method is horribly slow</a>.
</p>
<p>Or we can keep old memory around, and reuse it in a pool to save time.
The old libstdc++-v2 used a memory pool, and so do we. As of 3.0,
<a href="http://gcc.gnu.org/ml/libstdc++/2001-05/msg00136.html">it's
on by default</a>. The pool is shared among all the containers in the
program: when your program's std::vector&lt;int&gt; gets cut in half
and frees a bunch of its storage, that memory can be reused by the
private std::list&lt;WonkyWidget&gt; brought in from a KDE library
that you linked against. And we don't have to call operators new and
delete to pass the memory on, either, which is a speed bonus.
<strong>BUT</strong>...
</p>
<p>What about threads? No problem: in a threadsafe environment, the
memory pool is manipulated atomically, so you can grow a container in
one thread and shrink it in another, etc. <strong>BUT</strong> what
if threads in libstdc++-v3 aren't set up properly?
<a href="../faq/index.html#5_6">That's been answered already</a>.
</p>
<p><strong>BUT</strong> what if you want to use your own allocator? What
if you plan on using a runtime-loadable version of malloc() which uses
shared telepathic anonymous mmap'd sections serializable over a
network, so that memory requests <em>should</em> go through malloc?
And what if you need to debug it?
</p>
<p>Well then:
</p>
<h3>Available allocators in namespace std</h3>
<p>First I'll describe the situation as it exists for the code which
was released in GCC 3.1 and 3.2. Then I'll describe the differences
for 3.0. The allocator classes also have source documentation,
which is described <a href="../documentation.html#4">here</a> (you
will need to retrieve the maintainer-level docs, as almost none of
these entities are in the ISO standard).
</p>
<p>As a general rule of thumb, users are not allowed to use names which
begin with an underscore. This means that to be portable between
compilers, none of the following may be used in your program directly.
(If you decide to be unportable, then you're free to do what you want,
but it's not our fault if stuff breaks.) They are presented here for
information for maintainers and contributors in addition to users.
</p>
<p>These classes are always available:
</p>
<ul>
<li><code>__new_alloc</code> simply wraps <code>::operator new</code>
and <code>::operator delete</code>.
</li>
<li><code>__malloc_alloc_template&lt;int inst&gt;</code> simply wraps
<code>malloc</code> and <code>free</code>. There is also a hook
for an out-of-memory handler (for new/delete this is taken care of
elsewhere). The <code>inst</code> parameter is described below.
This class was called <code>malloc_alloc</code> in earlier versions.
</li>
<li><code>allocator&lt;T&gt;</code> has already been described; it is
The Standard Allocator for instances of T. It uses the internal
<code>__alloc</code> typedef (see below) to satisfy its requests.
</li>
<li><code>__simple_alloc&lt;T,A&gt;</code> is a wrapper around another
allocator, A, which itself is an allocator for instances of T.
This is primarily used in an internal &quot;allocator traits&quot;
class which helps encapsulate the different styles of allocators.
</li>
<li><code>__debug_alloc&lt;A&gt;</code> is also a wrapper around an
arbitrary allocator A. It passes on slightly increased size
requests to A, and uses the extra memory to store size information.
When a pointer is passed to <code>deallocate()</code>, the stored
size is checked, and assert() is used to guarantee they match.
</li>
<li><code>__allocator&lt;T,A&gt;</code> is an adaptor. Many of these
allocator classes have a consistent yet non-standard interface.
Such classes can be changed to a conforming interface with this
wrapper: <code>__allocator&lt;T, __alloc&gt;</code> is thus the
same as <code>allocator&lt;T&gt;</code>.
</li>
</ul>
<p>Normally,
<code> __default_alloc_template&lt;bool thr, int inst&gt; </code>
is also available. This is the high-speed pool, called the default
node allocator. The reusable memory is shared among identical
instantiations of
this type. It calls through <code>__new_alloc</code> to obtain
new memory when its lists run out. If a client container requests a
block larger than a certain threshold size, then the pool is bypassed,
and the allocate/deallocate request is passed to
<code>__new_alloc</code> directly.
</p>
<p>Its <code>inst</code> parameter is described below. The
<code>thr</code> boolean determines whether the pool should be
manipulated atomically or not. Two typedefs are provided:
<code>__alloc</code> is defined as this node allocator with thr=true,
and therefore is threadsafe, while <code>__single_client_alloc</code>
defines thr=false, and is slightly faster but unsafe for multiple
threads.
</p>
<p>(Note that the GCC thread abstraction layer allows us to provide safe
zero-overhead stubs for the threading routines, if threads were
disabled at configuration time. In this situation,
<code>__alloc</code> should not be noticeably slower than
<code>__single_client_alloc</code>.)
</p>
<p>[Another threadsafe allocator where each thread keeps its own free
list, so that no locking is needed, might be described here.]
</p>
<h3>A cannon to swat a fly:<code> __USE_MALLOC</code></h3>
<p>If you've already read <a href="../23_containers/howto.html#3">this
advice</a> but still think you remember how to use this macro from
SGI STL days: we have removed it in gcc 3.3. See the next section
for the new way to get the same effect.
</p>
<h3>Globally disabling memory caching:<code> GLIBCXX_FORCE_NEW</code></h3>
<p>Starting with gcc 3.3, if you want to globally disable memory
caching within the library for the default allocator (i.e.
the one you get for all library objects when you do not specify
which one to use), merely set GLIBCXX_FORCE_NEW (at this time,
with any value) into your environment before running the
program. You will obtain a similar effect without having to
recompile your entire program and the entire library (the new
operator in gcc is a light wrapper around malloc). If your
program crashes with GLIBCXX_FORCE_NEW in the environment,
it likely means that you linked against objects built against
the older library. Code to support this extension is fully
compatible with 3.2 code if GLIBCXX_FORCE_NEW is not in the
environment. Prior to GCC 3.4, this variable was spelt
GLIBCPP_FORCE_NEW.
</p>
<h3>Writing your own allocators</h3>
<p>Depending on your application (a specific program, a generic library,
etc), allocator classes tend to be one of two styles: &quot;SGI&quot;
or &quot;standard&quot;. See the comments in stl_alloc.h for more
information on this crucial difference.
</p>
<p>At the bottom of that header is a helper type,
<code>_Alloc_traits</code>, and various specializations of it. This
allows the container classes to make possible compile-time
optimizations based on features of the allocator. You should provide
a specialization of this type for your allocator (doing so takes only
two or three statements).
</p>
<h3>Using non-default allocators</h3>
<p>You can specify different memory management schemes on a per-container
basis, by overriding the default <code>Allocator</code> template
parameter. For example, an easy
(but nonportable)
method of specifying that only malloc/free should be used instead of
the default node allocator is:
</p>
<pre>
std::list &lt;my_type, std::__malloc_alloc_template&lt;0&gt; &gt; my_malloc_based_list;</pre>
Likewise, a debugging form of whichever allocator is currently in use:
<pre>
std::deque &lt;my_type, std::__debug_alloc&lt;std::__alloc&gt; &gt; debug_deque;</pre>
<h3><code>inst</code></h3>
<p>The <code>__malloc_alloc_template</code> and
<code>__default_alloc_template</code> classes take an integer parameter,
called inst here. This number is completely unused.
</p>
<p>The point of the number is to allow multiple instantiations of the
classes without changing the semantics at all. All three of
</p>
<pre>
typedef __default_alloc_template&lt;true,0&gt; normal;
typedef __default_alloc_template&lt;true,1&gt; private;
typedef __default_alloc_template&lt;true,42&gt; also_private;</pre>
<p>behave exactly the same way. However, the memory pool for each type
(and remember that different instantiations result in different types)
remains separate.
</p>
<p>The library uses <strong>0</strong> in all its instantiations. If you
wish to keep separate free lists for a particular purpose, use a
different number.
</p>
<h3>3.0.x</h3>
<p>For 3.0.x, many of the names were incorrectly <em>not</em> prefixed
with underscores. So symbols such as &quot;std::single_client_alloc&quot;
are present. Be very careful to not depend on these names any more
than you would depend on implementation-only names.
</p>
<p>Certain macros like <code>_NOTHREADS</code> and <code>__STL_THREADS</code>
can affect the 3.0.x allocators. Do not use them. Those macros have
been completely removed for 3.1.
</p>
<p>Return <a href="#top">to top of page</a> or
<a href="../faq/index.html">to the FAQ</a>.
</p>
<hr />
<h2><a name="6">Allocators (version 3.4)</a></h2>
<p>Changes are coming...
</p>
<p>If you plan on writing your own allocators,
<a href="../documentation.html#4">source documentation</a> is
available. You'll need to get the &quot;maintainers&quot; collection
in order to see the helper classes and extra notes.
</p>
<p>Return <a href="#top">to top of page</a> or
<a href="../faq/index.html">to the FAQ</a>.
</p>
<hr />
<h2><a name="4">Compile-time checks</a></h2>
<p>Currently libstdc++-v3 uses the concept checkers from the Boost


@ -0,0 +1,414 @@
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta name="AUTHOR" content="Stefan Olsson <stefan@xapa.se>" />
<meta name="KEYWORDS" content="c++, libstdc++, g++, allocator, memory" />
<meta name="DESCRIPTION" content="Allocators and allocation" />
<meta name="GENERATOR" content="emacs and ten fingers" />
<title>A fixed-size, multi-thread optimized allocator</title>
<link rel="StyleSheet" href="../lib3styles.css" type="text/css" />
<link rel="Copyright" href="../17_intro/license.html" type="text/html" />
</head>
<body>
<h1 class="centered"><a name="top">A fixed-size, multi-thread optimized allocator</a></h1>
<p class="fineprint"><em>
The latest version of this document is always available at
<a href="http://gcc.gnu.org/onlinedocs/libstdc++/ext/mt_allocator.html">
http://gcc.gnu.org/onlinedocs/libstdc++/ext/mt_allocator.html</a>.
</em></p>
<p><em>
To the <a href="http://gcc.gnu.org/libstdc++/">libstdc++-v3 homepage</a>.
</em></p>
<!-- ####################################################### -->
<hr />
<h3 class="left">
<a name="intro">Introduction</a>
</h3>
<p> The mt allocator [hereinafter referred to simply as "the
    allocator"] is a fixed-size (power of two) allocator that was
    initially developed specifically to suit the needs of multi-threaded
    applications [hereinafter referred to as MT applications]. Over time
    the allocator has evolved and been improved in many ways; one of them
    being that it now also does a good job in single-threaded applications
    [hereinafter referred to as ST applications]. (Note: in this
    document, "single threaded" also covers applications that are
    compiled with gcc without thread support enabled. This is
    accomplished using ifdefs on __GTHREADS.)
  </p>
<p>
    The aim of this document is to describe - from an application point of
    view - the "inner workings" of the allocator.
</p>
<h3 class="left">
<a name="init">Initialization</a>
</h3>
<p>
The static variables (pointers to freelists, tuning parameters etc)
are initialized to their default values at file scope, i.e.:
</p>
<pre>
    template&lt;typename _Tp&gt; size_t
    __mt_alloc&lt;_Tp&gt;::_S_freelist_headroom = 10;
</pre>
<p>
    The very first allocate() call will always call the _S_init() function.
    In order to make sure that this function is called exactly once we use a
    __gthread_once call (with _S_once_mt and _S_init as arguments) in MT
    applications and check a static bool (_S_initialized) in ST applications.
</p>
<p>
    The _S_init() function:
    - If the GLIBCXX_FORCE_NEW environment variable is set, it sets the bool
      _S_force_new to true and then returns. This causes subsequent calls to
      allocate() to return memory directly from a new() call, and deallocate()
      to only do a delete() call.
</p>
<p>
- If the GLIBCXX_FORCE_NEW environment variable is not set, both ST and MT
applications will:
    - Calculate the number of bins needed. Each bin handles allocations of one
      specific power-of-two size in bytes. I.e., by default the allocator will
      deal with requests of up to 128 bytes (or whatever the value of
      _S_max_bytes is when _S_init() is called), which means that there will be
      bins of the following sizes (in bytes): 1, 2, 4, 8, 16, 32, 64, 128.
- Create the _S_binmap array. All requests are rounded up to the next
"large enough" bin. I.e., a request for 29 bytes will cause a block from
the "32 byte bin" to be returned to the application. The purpose of
_S_binmap is to speed up the process of finding out which bin to use.
I.e., the value of _S_binmap[ 29 ] is initialized to 5 (bin 5 = 32 bytes).
</p>
<p>
- Create the _S_bin array. This array consists of bin_records. There will be
as many bin_records in this array as the number of bins that we calculated
earlier. I.e., if _S_max_bytes = 128 there will be 8 entries.
Each bin_record is then initialized:
    - bin_record->first = An array of pointers to block_records. There is
      one block_record pointer per thread (in a ST application there is only
      1 thread, in a MT application there are _S_max_threads).
      Each entry holds the pointer to the first free block for that thread
      in this bin. I.e., to find the first free block of size 32 for thread
      number 3 we would look up _S_bin[ 5 ].first[ 3 ].
- bin_record->last = See above, the only difference being that this points
to the last record on the same freelist.
    The block_record pointer members created above are now initialized to
    their initial values, i.e. _S_bin[ n ].first[ t ] = NULL for each bin n
    and thread t.
</p>
<p>
    - Additionally a MT application will:
    - Create a list of free thread ids. The pointer to the first entry
      is stored in _S_thread_freelist_first. The reason for this approach is
      that the __gthread_self() call does not return a small integer in the
      range of allowed threads, but rather an opaque process/thread handle.
      So instead we create a list of thread_records. This list is
      _S_max_threads long and each entry holds a size_t thread_id
      initialized to 1, 2, 3, and so on up to _S_max_threads.
      Each time a thread calls allocate() or deallocate() we call
      _S_get_thread_id(), which looks at the value of _S_thread_key, a
      thread local storage pointer. If this is NULL we know that this is a
      newly created thread, so we pop the first entry from this list and
      save the pointer to this record in the _S_thread_key variable. On
      subsequent calls we get the pointer to the thread_record back and use
      thread_record->thread_id as identification. I.e., the first thread
      that calls allocate() will get the first record in this list, thus be
      thread number 1, and will then find the pointer to its first free
      32-byte block in _S_bin[ 5 ].first[ 1 ].
      When we create the _S_thread_key we also define a destructor
      (_S_thread_key_destr), which means that when a thread dies its
      thread_record is returned to the front of this list and the thread id
      can be reused if a new thread is created.
      This list is protected by a mutex (_S_thread_freelist_mutex), which is
      only locked when records are removed from or added to the list.
</p>
<p>
    - Initialize the free and used counters of each bin_record:
    - bin_record->free = An array of size_t. This keeps track of the number
      of blocks on a specific thread's freelist in each bin. I.e., if a thread
      has 12 32-byte blocks on its freelist and allocates one of these, this
      counter is decreased to 11.
    - bin_record->used = An array of size_t. This keeps track of the number
      of blocks currently in use of this size by this thread. I.e., if a thread
      has made 678 requests (and no deallocations...) of 32-byte blocks this
      counter will read 678.
      The arrays created above are now initialized with their initial values,
      i.e. _S_bin[ n ].free[ t ] = 0 for each bin n and thread t.
</p>
<p>
- Initialize the mutex of each bin_record:
    The bin_record->mutex is used to protect the global freelist. This concept
    of a global freelist is explained in more detail in the section
    "A multi threaded example", but basically this mutex is locked whenever
    a block of memory is retrieved from or returned to the global freelist
    for this specific bin. This only occurs when a number of blocks are
    grabbed from the global list to a thread-specific list or when a thread
    decides to return some blocks to the global freelist.
</p>
<h3 class="left">
<a name="st_example">A single threaded example (and a primer for the multi threaded example!)</a>
</h3>
<p>
    Let's start by describing how the data on a freelist is laid out in memory.
    These are the first two blocks on the freelist for thread id 3 in bin 3 (8 bytes):
</p>
<pre>
+----------------+
| next* ---------|--+ (_S_bin[ 3 ].first[ 3 ] points here)
| | |
| | |
| | |
+----------------+ |
| thread_id = 3 | |
| | |
| | |
| | |
+----------------+ |
| DATA | | (A pointer to here is what is returned to the
|                |  |    the application when needed)
| | |
| | |
| | |
| | |
| | |
| | |
+----------------+ |
+----------------+ |
| next* |<-+ (If next == NULL it's the last one on the list and
| | then the _S_bin[ 3 ].last[ 3 ] pointer points to
| | here as well)
| |
+----------------+
| thread_id = 3 |
| |
| |
| |
+----------------+
| DATA |
| |
| |
| |
| |
| |
| |
| |
+----------------+
</pre>
<p>
    With this in mind we simplify things a bit for a while and say that there is
    only one thread (a ST application). In this case all operations are made to
    what is referred to as the global pool - thread id 0 (no thread may be
    assigned this id since ids span from 1 to _S_max_threads in an MT application).
</p>
<p>
When the application requests memory (calling allocate()) we first look at the
requested size and if this is > _S_max_bytes we call new() directly and return.
</p>
<p>
If the requested size is within limits we start by finding out from which
bin we should serve this request by looking in _S_binmap.
</p>
<p>
A quick look at _S_bin[ bin ].first[ 0 ] tells us if there are any blocks of
this size on the freelist (0). If this is not NULL - fine, just remove the
block that _S_bin[ bin ].first[ 0 ] points to from the list,
    update _S_bin[ bin ].first[ 0 ] and return a pointer to that block's data.
</p>
<p>
    If the freelist is empty (the pointer is NULL) we must get memory from the
    system and build a freelist within this memory. All requests for new memory
    are made in chunks of _S_chunk_size. Knowing the size of a block_record and
    the number of bytes that this bin stores, we then calculate how many blocks
    we can create within this chunk, build the list, remove the first block,
    update the pointers (_S_bin[ bin ].first[ 0 ] and _S_bin[ bin ].last[ 0 ])
    and return a pointer to that block's data.
</p>
<p>
    Deallocation is equally simple; the pointer is cast back to a block_record
    pointer, we look up which bin to use based on the size, add the block to the
    end of the global freelist (with the next pointer set to NULL) and update
    the pointers as needed (_S_bin[ bin ].first[ 0 ] and _S_bin[ bin ].last[ 0 ]).
</p>
<h3 class="left">
<a name="mt_example">A multi threaded example</a>
</h3>
<p>
In the ST example we never used the thread_id variable present in each block.
    Let's start by explaining the purpose of this in an MT application.
</p>
<p>
    The concept of "ownership" was introduced since many MT applications
    allocate and deallocate memory in shared containers from different
    threads (such as a cache shared amongst all threads). This introduces
    a problem if the allocator only returns memory to the current thread's
    freelist (i.e., there might be one thread doing all the allocation, and
    thus obtaining ever more memory from the system, and another thread
    whose freelist gets longer and longer - this would in the end
    consume all available memory).
</p>
<p>
    Each time a block is moved from the global list (where ownership is
    irrelevant) to a thread's freelist (or when a new freelist is built
    from a chunk directly onto a thread's freelist, or when a deallocation
    occurs on a block which was not allocated by the same thread id as the
    one doing the deallocation) the thread id is set to the current one.
</p>
<p>
    What's the use? Well, when a deallocation occurs we can now look at
    the thread id, find out if the block was allocated by another thread,
    and decrease the used counter of that thread instead, thus keeping the
    free and used counters correct. And keeping the free and used counters
    correct is very important, since the relationship between these two
    variables decides whether memory should be returned to the global pool
    or not when a deallocation occurs.
</p>
<p>
When the application requests memory (calling allocate()) we first
look at the requested size and if this is > _S_max_bytes we call new()
directly and return.
</p>
<p>
If the requested size is within limits we start by finding out from which
bin we should serve this request by looking in _S_binmap.
</p>
<p>
A call to _S_get_thread_id() returns the thread id for the calling thread
(and if no value has been set in _S_thread_key, a new id is assigned and
returned).
</p>
<p>
    A quick look at _S_bin[ bin ].first[ thread_id ] tells us if there are
    any blocks of this size on the current thread's freelist. If this is
    not NULL - fine, just remove the block that _S_bin[ bin ].first[
    thread_id ] points to from the list, update _S_bin[ bin ].first[
    thread_id ], update the free and used counters and return a pointer to
    that block's data.
</p>
<p>
    If the freelist is empty (the pointer is NULL) we start by looking at
    the global freelist (0). If there are blocks available on the global
    freelist we lock this bin's mutex and move up to block_count blocks
    (the number of blocks of this bin's size that will fit into a chunk of
    _S_chunk_size bytes) - or until the end of the list, whichever comes
    first - to the current thread's freelist, at the same time changing
    the thread_id ownership and updating the counters and pointers. When
    the bin's mutex has been unlocked, we remove the block that
    _S_bin[ bin ].first[ thread_id ] points to from the list, update
    _S_bin[ bin ].first[ thread_id ], update the free and used counters,
    and return a pointer to that block's data.
</p>
<p>
    The reason that the number of blocks moved to the current thread's
freelist is limited to block_count is to minimize the chance that a
subsequent deallocate() call will return the excess blocks to the
global freelist (based on the _S_freelist_headroom calculation, see
below).
</p>
<p>
    However, if there isn't any memory in the global pool we need to get
    memory from the system - this is done in exactly the same way as in a
    single threaded application, with one major difference: the list built
    in the newly allocated memory (of _S_chunk_size size) is added to the
    current thread's freelist instead of to the global one.
</p>
<p>
    The basic process of a deallocation call is simple: always add the
    block to the end of the current thread's freelist and update the
    counters and pointers (as described earlier, with the specific check of
    ownership that causes the used counter of the thread that originally
    allocated the block to be decreased instead of the current thread's
    counter).
</p>
<p>
    This is where the free and used counters come into service. Each time a
    deallocate() call is made, the length of the current thread's
    freelist is compared to the amount of memory in use by this thread.
</p>
<p>
    Let's go back to the example of an application that has one thread
    that does all the allocations and one that does the deallocations. Both
    these threads use, say, 516 32-byte blocks that were allocated during
    thread creation, for example. Their used counters will both say 516 at
    this point. The allocating thread now grabs 1000 32-byte blocks and puts
    them in a shared container. The used counter for this thread is now
    1516.
</p>
<p>
    The deallocating thread now deallocates 500 of these blocks. For each
    deallocation made, the used counter of the allocating thread is
    decreased and the freelist of the deallocating thread gets longer and
    longer. But the calculation made in deallocate() limits the length
    of the freelist in the deallocating thread to _S_freelist_headroom %
    of its used counter. In this case, when the freelist (given that
    _S_freelist_headroom is at its default value of 10%) exceeds about 52
    blocks (516/10), blocks will be returned to the global pool where the
    allocating thread may pick them up and reuse them.
</p>
<p>
    In order to reduce lock contention (since this requires the bin's
    mutex to be locked) this operation is also made in chunks of blocks
    (just like when chunks of blocks are moved from the global freelist to
    a thread's freelist, as mentioned above). The "formula" used can probably
be improved to further reduce the risk of blocks being "bounced back
and forth" between freelists.
</p>
<hr />
<p>Return <a href="#top">to the top of the page</a> or
<a href="http://gcc.gnu.org/libstdc++/">to the libstdc++ homepage</a>.
</p>
<!-- ####################################################### -->
<hr />
<p class="fineprint"><em>
See <a href="../17_intro/license.html">license.html</a> for copying conditions.
Comments and suggestions are welcome, and may be sent to
<a href="mailto:libstdc++@gcc.gnu.org">the libstdc++ mailing list</a>.
</em></p>
</body>
</html>