Memory

Memory
Prev	Chapter 6. + Utilities + +	Next

+ Memory contains three general areas. First, function and operator + calls via new and delete + operator or member function calls. Second, allocation via + allocator. And finally, smart pointer and + intelligent pointer abstractions. +

Allocators

+ Memory management for Standard Library entities is encapsulated in a + class template called allocator. The + allocator abstraction is used throughout the + library in string, container classes, + algorithms, and parts of iostreams. This class, and base classes of + it, are the superset of available free store (“heap”) + management classes. +

Requirements

+ The C++ standard only gives a few directives in this area: +

+ When you add elements to a container, and the container must + allocate more memory to hold them, the container makes the + request via its Allocator template + parameter, which is usually aliased to + allocator_type. This includes adding chars + to the string class, which acts as a regular STL container in + this respect. +
+ The default Allocator argument of every + container-of-T is allocator<T>. +
+ The interface of the allocator<T> class is + extremely simple. It has about 20 public declarations (nested + typedefs, member functions, etc), but the two which concern us most + are: +
```
+	 T*    allocate   (size_type n, const void* hint = 0);
+	 void  deallocate (T* p, size_type n);
+       
```
+ The n arguments in both those + functions is a count of the number of + T's to allocate space for, not their + total size. + (This is a simplification; the real signatures use nested typedefs.) +
+ The storage is obtained by calling ::operator + new, but it is unspecified when or how + often this function is called. The use of the + hint is unspecified, but intended as an + aid to locality if an implementation so + desires. [20.4.1.1]/6 +

+ Complete details can be found in the C++ standard, look in + [20.4 Memory]. +

+ The easiest way of fulfilling the requirements is to call + operator new each time a container needs + memory, and to call operator delete each time + the container releases memory. This method may be slower + than caching the allocations and re-using previously-allocated + memory, but has the advantage of working correctly across a wide + variety of hardware and operating systems, including large + clusters. The __gnu_cxx::new_allocator + implements the simple operator new and operator delete semantics, + while __gnu_cxx::malloc_allocator + implements much the same thing, only with the C language functions + std::malloc and free. +

+ Another approach is to use intelligence within the allocator + class to cache allocations. This extra machinery can take a variety + of forms: a bitmap index, an index into an exponentially increasing + power-of-two-sized buckets, or simpler fixed-size pooling cache. + The cache is shared among all the containers in the program: when + your program's std::vector<int> gets + cut in half and frees a bunch of its storage, that memory can be + reused by the private + std::list<WonkyWidget> brought in from + a KDE library that you linked against. And operators + new and delete are not + always called to pass the memory on, either, which is a speed + bonus. Examples of allocators that use these techniques are + __gnu_cxx::bitmap_allocator, + __gnu_cxx::pool_allocator, and + __gnu_cxx::__mt_alloc. +

+ Depending on the implementation techniques used, the underlying + operating system, and compilation environment, scaling caching + allocators can be tricky. In particular, order-of-destruction and + order-of-creation for memory pools may be difficult to pin down + with certainty, which may create problems when used with plugins + or loading and unloading shared objects in memory. As such, using + caching allocators on systems that do not support + abi::__cxa_atexit is not recommended. +

Implementation

Interface Design

+ The only allocator interface that + is supported is the standard C++ interface. As such, all STL + containers have been adjusted, and all external allocators have + been modified to support this change. +

+ The class allocator just has typedef, + constructor, and rebind members. It inherits from one of the + high-speed extension allocators, covered below. Thus, all + allocation and deallocation depends on the base class. +

+ The base class that allocator is derived from + may not be user-configurable. +

Selecting Default Allocation Policy

+ It's difficult to pick an allocation strategy that will provide + maximum utility, without excessively penalizing some behavior. In + fact, it's difficult just deciding which typical actions to measure + for speed. +

+ Three synthetic benchmarks have been created that provide data + that is used to compare different C++ allocators. These tests are: +

+ Insertion. +
+ Over multiple iterations, various STL container + objects have elements inserted to some maximum amount. A variety + of allocators are tested. + Test source for sequence + and associative + containers. +
+ Insertion and erasure in a multi-threaded environment. +
+ This test shows the ability of the allocator to reclaim memory + on a per-thread basis, as well as measuring thread contention + for memory resources. + Test source + here. +
+ A threaded producer/consumer model. +
+ Test source for + sequence + and + associative + containers. +

+ The current default choice for + allocator is + __gnu_cxx::new_allocator. +

Disabling Memory Caching

+ In use, allocator may allocate and + deallocate using implementation-specified strategies and + heuristics. Because of this, every call to an allocator object's + allocate member function may not actually + call the global operator new. This situation is also duplicated + for calls to the deallocate member + function. +

+ This can be confusing. +

+ In particular, this can make debugging memory errors more + difficult, especially when using third party tools like valgrind or + debug versions of new. +

+ There are various ways to solve this problem. One would be to use + a custom allocator that just called operators + new and delete + directly, for every allocation. (See + include/ext/new_allocator.h, for instance.) + However, that option would involve changing source code to use + a non-default allocator. Another option is to force the + default allocator to remove caching and pools, and to directly + allocate with every call of allocate and + directly deallocate with every call of + deallocate, regardless of efficiency. As it + turns out, this last option is also available. +

+ To globally disable memory caching within the library for the + default allocator, merely set + GLIBCXX_FORCE_NEW (with any value) in the + system's environment before running the program. If your program + crashes with GLIBCXX_FORCE_NEW in the + environment, it likely means that you linked against objects + built against the older library (objects which might still using the + cached allocations...). +

Using a Specific Allocator

+ You can specify different memory management schemes on a + per-container basis, by overriding the default + Allocator template parameter. For example, an easy + (but non-portable) method of specifying that only malloc or free + should be used instead of the default node allocator is: +

+    std::list <int, __gnu_cxx::malloc_allocator<int> >  malloc_list;

+ Likewise, a debugging form of whichever allocator is currently in use: +

+    std::deque <int, __gnu_cxx::debug_allocator<std::allocator<int> > >  debug_deque;
+

Custom Allocators

+ Writing a portable C++ allocator would dictate that the interface + would look much like the one specified for + allocator. Additional member functions, but + not subtractions, would be permissible. +

+ Probably the best place to start would be to copy one of the + extension allocators: say a simple one like + new_allocator. +

Extension Allocators

+ Several other allocators are provided as part of this + implementation. The location of the extension allocators and their + names have changed, but in all cases, functionality is + equivalent. Starting with gcc-3.4, all extension allocators are + standard style. Before this point, SGI style was the norm. Because of + this, the number of template arguments also changed. Here's a simple + chart to track the changes. +

+ More details on each of these extension allocators follows. +

+ new_allocator +
+ Simply wraps ::operator new + and ::operator delete. +
+ malloc_allocator +
+ Simply wraps malloc and + free. There is also a hook for an + out-of-memory handler (for + new/delete this is + taken care of elsewhere). +
+ array_allocator +
+ Allows allocations of known and fixed sizes using existing + global or external storage allocated via construction of + std::tr1::array objects. By using this + allocator, fixed size containers (including + std::string) can be used without + instances calling ::operator new and + ::operator delete. This capability + allows the use of STL abstractions without runtime + complications or overhead, even in situations such as program + startup. For usage examples, please consult the testsuite. +
+ debug_allocator +
+ A wrapper around an arbitrary allocator A. It passes on + slightly increased size requests to A, and uses the extra + memory to store size information. When a pointer is passed + to deallocate(), the stored size is + checked, and assert() is used to + guarantee they match. +
+ throw_allocator +
+ Includes memory tracking and marking abilities as well as hooks for + throwing exceptions at configurable intervals (including random, + all, none). +
+ __pool_alloc +
+ A high-performance, single pool allocator. The reusable + memory is shared among identical instantiations of this type. + It calls through ::operator new to + obtain new memory when its lists run out. If a client + container requests a block larger than a certain threshold + size, then the pool is bypassed, and the allocate/deallocate + request is passed to ::operator new + directly. +
+ Older versions of this class take a boolean template + parameter, called thr, and an integer template + parameter, called inst. +
+ The inst number is used to track additional memory + pools. The point of the number is to allow multiple + instantiations of the classes without changing the semantics at + all. All three of +
```
+    typedef  __pool_alloc<true,0>    normal;
+    typedef  __pool_alloc<true,1>    private;
+    typedef  __pool_alloc<true,42>   also_private;
+   
```
+ behave exactly the same way. However, the memory pool for each type + (and remember that different instantiations result in different types) + remains separate. +
+ The library uses 0 in all its instantiations. If you + wish to keep separate free lists for a particular purpose, use a + different number. +
The thr boolean determines whether the + pool should be manipulated atomically or not. When + thr = true, the allocator + is thread-safe, while thr = + false, is slightly faster but unsafe for + multiple threads. +
+ For thread-enabled configurations, the pool is locked with a + single big lock. In some situations, this implementation detail + may result in severe performance degradation. +
+ (Note that the GCC thread abstraction layer allows us to provide + safe zero-overhead stubs for the threading routines, if threads + were disabled at configuration time.) +
+ __mt_alloc +
+ A high-performance fixed-size allocator with + exponentially-increasing allocations. It has its own + documentation, found here. +
+ bitmap_allocator +
+ A high-performance allocator that uses a bit-map to keep track + of the used and unused memory locations. It has its own + documentation, found here. +

Bibliography

+ ISO/IEC 14882:1998 Programming languages - C++ + . + isoc++_1998 + 20.4 Memory.

+ . + The Standard Librarian: What Are Allocators Good For? + . Matt Austern. + C/C++ Users Journal + .

+ . + The Hoard Memory Allocator + . Emery Berger.

+ . + Allocator Types + . Klaus Kreft. Angelika Langer. + C/C++ Users Journal + .

Yalloc: A Recycling C++ Allocator. Felix Yen.

auto_ptr

Limitations

Explaining all of the fun and delicious things that can + happen with misuse of the auto_ptr class + template (called AP here) would take some + time. Suffice it to say that the use of AP + safely in the presence of copying has some subtleties. +

+ The AP class is a really + nifty idea for a smart pointer, but it is one of the dumbest of + all the smart pointers -- and that's fine. +

+ AP is not meant to be a supersmart solution to all resource + leaks everywhere. Neither is it meant to be an effective form + of garbage collection (although it can help, a little bit). + And it can notbe used for arrays! +

+ AP is meant to prevent nasty leaks in the + presence of exceptions. That's all. This + code is AP-friendly: +

+    // Not a recommend naming scheme, but good for web-based FAQs.
+    typedef std::auto_ptr<MyClass>  APMC;
+
+    extern function_taking_MyClass_pointer (MyClass*);
+    extern some_throwable_function ();
+
+    void func (int data)
+    {
+	APMC  ap (new MyClass(data));
+
+	some_throwable_function();   // this will throw an exception
+
+	function_taking_MyClass_pointer (ap.get());
+    }
+

When an exception gets thrown, the instance of MyClass that's + been created on the heap will be delete'd as the stack is + unwound past func(). +

Changing that code as follows is not AP-friendly: +

+	APMC  ap (new MyClass[22]);
+

You will get the same problems as you would without the use + of AP: +

+	char*  array = new char[10];       // array new...
+	...
+	delete array;                      // ...but single-object delete
+

+ AP cannot tell whether the pointer you've passed at creation points + to one or many things. If it points to many things, you are about + to die. AP is trivial to write, however, so you could write your + own auto_array_ptr for that situation (in fact, this has + been done many times; check the mailing lists, Usenet, Boost, etc). +

Use in Containers

All of the containers + described in the standard library require their contained types + to have, among other things, a copy constructor like this: +

+    struct My_Type
+    {
+	My_Type (My_Type const&);
+    };
+

+ Note the const keyword; the object being copied shouldn't change. + The template class auto_ptr (called AP here) does not + meet this requirement. Creating a new AP by copying an existing + one transfers ownership of the pointed-to object, which means that + the AP being copied must change, which in turn means that the + copy ctors of AP do not take const objects. +

+ The resulting rule is simple: Never ever use a + container of auto_ptr objects. The standard says that + “undefined” behavior is the result, but it is + guaranteed to be messy. +

+ To prevent you from doing this to yourself, the + concept checks built + in to this implementation will issue an error if you try to + compile code like this: +

+    #include <vector>
+    #include <memory>
+
+    void f()
+    {
+	std::vector< std::auto_ptr<int> >   vec_ap_int;
+    }
+

+Should you try this with the checks enabled, you will see an error. +

Implementation

Class Hierarchy

+A shared_ptr<T> contains a pointer of +type T* and an object of type +__shared_count. The shared_count contains a +pointer of type _Sp_counted_base* which points to the +object that maintains the reference-counts and destroys the managed +resource. +

_Sp_counted_base<Lp>: +The base of the hierarchy is parameterized on the lock policy (see below.) +_Sp_counted_base doesn't depend on the type of pointer being managed, +it only maintains the reference counts and calls virtual functions when +the counts drop to zero. The managed object is destroyed when the last +strong reference is dropped, but the _Sp_counted_base itself must exist +until the last weak reference is dropped. +
_Sp_counted_base_impl<Ptr, Deleter, Lp>: +Inherits from _Sp_counted_base and stores a pointer of type Ptr +and a deleter of type Deleter. _Sp_deleter is +used when the user doesn't supply a custom deleter. Unlike Boost's, this +default deleter is not "checked" because GCC already issues a warning if +delete is used with an incomplete type. +This is the only derived type used by shared_ptr<Ptr> +and it is never used by shared_ptr, which uses one of +the following types, depending on how the shared_ptr is constructed. +
_Sp_counted_ptr<Ptr, Lp>: +Inherits from _Sp_counted_base and stores a pointer of type Ptr, +which is passed to delete when the last reference is dropped. +This is the simplest form and is used when there is no custom deleter or +allocator. +
_Sp_counted_deleter<Ptr, Deleter, Alloc>: +Inherits from _Sp_counted_ptr and adds support for custom deleter and +allocator. Empty Base Optimization is used for the allocator. This class +is used even when the user only provides a custom deleter, in which case +allocator is used as the allocator. +
_Sp_counted_ptr_inplace<Tp, Alloc, Lp>: +Used by allocate_shared and make_shared. +Contains aligned storage to hold an object of type Tp, +which is constructed in-place with placement new. +Has a variadic template constructor allowing any number of arguments to +be forwarded to Tp's constructor. +Unlike the other _Sp_counted_* classes, this one is parameterized on the +type of object, not the type of pointer; this is purely a convenience +that simplifies the implementation slightly. +

Thread Safety

+C++0x-only features are: rvalue-ref/move support, allocator support, +aliasing constructor, make_shared & allocate_shared. Additionally, +the constructors taking auto_ptr parameters are +deprecated in C++0x mode. +

+The +Thread +Safety section of the Boost shared_ptr documentation says "shared_ptr +objects offer the same level of thread safety as built-in types." +The implementation must ensure that concurrent updates to separate shared_ptr +instances are correct even when those instances share a reference count e.g. +

+shared_ptr<A> a(new A);
+shared_ptr<A> b(a);
+
+// Thread 1     // Thread 2
+   a.reset();      b.reset();
+

+The dynamically-allocated object must be destroyed by exactly one of the +threads. Weak references make things even more interesting. +The shared state used to implement shared_ptr must be transparent to the +user and invariants must be preserved at all times. +The key pieces of shared state are the strong and weak reference counts. +Updates to these need to be atomic and visible to all threads to ensure +correct cleanup of the managed resource (which is, after all, shared_ptr's +job!) +On multi-processor systems memory synchronisation may be needed so that +reference-count updates and the destruction of the managed resource are +race-free. +

+The function _Sp_counted_base::_M_add_ref_lock(), called when +obtaining a shared_ptr from a weak_ptr, has to test if the managed +resource still exists and either increment the reference count or throw +bad_weak_ptr. +In a multi-threaded program there is a potential race condition if the last +reference is dropped (and the managed resource destroyed) between testing +the reference count and incrementing it, which could result in a shared_ptr +pointing to invalid memory. +

+The Boost shared_ptr (as used in GCC) features a clever lock-free +algorithm to avoid the race condition, but this relies on the +processor supporting an atomic Compare-And-Swap +instruction. For other platforms there are fall-backs using mutex +locks. Boost (as of version 1.35) includes several different +implementations and the preprocessor selects one based on the +compiler, standard library, platform etc. For the version of +shared_ptr in libstdc++ the compiler and library are fixed, which +makes things much simpler: we have an atomic CAS or we don't, see Lock +Policy below for details. +

Selecting Lock Policy

+There is a single _Sp_counted_base class, +which is a template parameterized on the enum +__gnu_cxx::_Lock_policy. The entire family of classes is +parameterized on the lock policy, right up to +__shared_ptr, __weak_ptr and +__enable_shared_from_this. The actual +std::shared_ptr class inherits from +__shared_ptr with the lock policy parameter +selected automatically based on the thread model and platform that +libstdc++ is configured for, so that the best available template +specialization will be used. This design is necessary because it would +not be conforming for shared_ptr to have an +extra template parameter, even if it had a default value. The +available policies are: +

+ _S_Atomic +
+Selected when GCC supports a builtin atomic compare-and-swap operation +on the target processor (see Atomic +Builtins.) The reference counts are maintained using a lock-free +algorithm and GCC's atomic builtins, which provide the required memory +synchronisation. +
+ _S_Mutex +
+The _Sp_counted_base specialization for this policy contains a mutex, +which is locked in add_ref_lock(). This policy is used when GCC's atomic +builtins aren't available so explicit memory barriers are needed in places. +
+ _S_Single +
+This policy uses a non-reentrant add_ref_lock() with no locking. It is +used when libstdc++ is built without --enable-threads. +

+ For all three policies, reference count increments and + decrements are done via the functions in + ext/atomicity.h, which detect if the program + is multi-threaded. If only one thread of execution exists in + the program then less expensive non-atomic operations are used. +