Chapter 20. Allocators

Chapter 20. Allocators
Prev	Part III. + Extensions + +	Next

+ The mt allocator [hereinafter referred to simply as "the allocator"] + is a fixed size (power of two) allocator that was initially + developed specifically to suit the needs of multi threaded + applications [hereinafter referred to as an MT application]. Over + time the allocator has evolved and been improved in many ways, in + particular it now also does a good job in single threaded + applications [hereinafter referred to as a ST application]. (Note: + In this document, when referring to single threaded applications + this also includes applications that are compiled with gcc without + thread support enabled. This is accomplished using ifdef's on + __GTHREADS). This allocator is tunable, very flexible, and capable + of high-performance. +

+ The aim of this document is to describe - from an application point of + view - the "inner workings" of the allocator. +

Design Issues

Overview

There are three general components to the allocator: a datum +describing the characteristics of the memory pool, a policy class +containing this pool that links instantiation types to common or +individual pools, and a class inheriting from the policy class that is +the actual allocator. +

The datum describing pools characteristics is +

+  template<bool _Thread>
+    class __pool
+

This class is parametrized on thread support, and is explicitly +specialized for both multiple threads (with bool==true) +and single threads (via bool==false.) It is possible to +use a custom pool datum instead of the default class that is provided. +

There are two distinct policy classes, each of which can be used +with either type of underlying pool datum. +

+  template<bool _Thread>
+    struct __common_pool_policy
+
+  template<typename _Tp, bool _Thread>
+    struct __per_type_pool_policy
+

The first policy, __common_pool_policy, implements a +common pool. This means that allocators that are instantiated with +different types, say char and long will both +use the same pool. This is the default policy. +

The second policy, __per_type_pool_policy, implements +a separate pool for each instantiating type. Thus, char +and long will use separate pools. This allows per-type +tuning, for instance. +

Putting this all together, the actual allocator class is +

+  template<typename _Tp, typename _Poolp = __default_policy>
+    class __mt_alloc : public __mt_alloc_base<_Tp>,  _Poolp
+

This class has the interface required for standard library allocator +classes, namely member functions allocate and +deallocate, plus others. +

Implementation

Tunable Parameters

Certain allocation parameters can be modified, or tuned. There +exists a nested struct __pool_base::_Tune that contains all +these parameters, which include settings for +

Alignment
Maximum bytes before calling ::operator new directly
Minimum bytes
Size of underlying global allocations
Maximum number of supported threads
Migration of deallocations to the global free list
Shunt for global new and delete

Adjusting parameters for a given instance of an allocator can only +happen before any allocations take place, when the allocator itself is +initialized. For instance: +

+#include <ext/mt_allocator.h>
+
+struct pod
+{
+  int i;
+  int j;
+};
+
+int main()
+{
+  typedef pod value_type;
+  typedef __gnu_cxx::__mt_alloc<value_type> allocator_type;
+  typedef __gnu_cxx::__pool_base::_Tune tune_type;
+
+  tune_type t_default;
+  tune_type t_opt(16, 5120, 32, 5120, 20, 10, false);
+  tune_type t_single(16, 5120, 32, 5120, 1, 10, false);
+
+  tune_type t;
+  t = allocator_type::_M_get_options();
+  allocator_type::_M_set_options(t_opt);
+  t = allocator_type::_M_get_options();
+
+  allocator_type a;
+  allocator_type::pointer p1 = a.allocate(128);
+  allocator_type::pointer p2 = a.allocate(5128);
+
+  a.deallocate(p1, 128);
+  a.deallocate(p2, 5128);
+
+  return 0;
+}
+

++----------------+
+| next* ---------|--+  (_S_bin[ 3 ].first[ 3 ] points here)
+|                |  |
+|                |  |
+|                |  |
++----------------+  |
+| thread_id = 3  |  |
+|                |  |
+|                |  |
+|                |  |
++----------------+  |
+| DATA           |  |  (A pointer to here is what is returned to the
+|                |  |   the application when needed)
+|                |  |
+|                |  |
+|                |  |
+|                |  |
+|                |  |
+|                |  |
++----------------+  |
++----------------+  |
+| next*          |<-+  (If next == NULL it's the last one on the list)
+|                |
+|                |
+|                |
++----------------+
+| thread_id = 3  |
+|                |
+|                |
+|                |
++----------------+
+| DATA           |
+|                |
+|                |
+|                |
+|                |
+|                |
+|                |
+|                |
++----------------+
+

+With this in mind we simplify things a bit for a while and say that there is +only one thread (a ST application). In this case all operations are made to +what is referred to as the global pool - thread id 0 (No thread may be +assigned this id since they span from 1 to _S_max_threads in a MT application). +

+When the application requests memory (calling allocate()) we first look at the +requested size and if this is > _S_max_bytes we call new() directly and return. +

+If the requested size is within limits we start by finding out from which +bin we should serve this request by looking in _S_binmap. +

+A quick look at _S_bin[ bin ].first[ 0 ] tells us if there are any blocks of +this size on the freelist (0). If this is not NULL - fine, just remove the +block that _S_bin[ bin ].first[ 0 ] points to from the list, +update _S_bin[ bin ].first[ 0 ] and return a pointer to that blocks data. +

+If the freelist is empty (the pointer is NULL) we must get memory from the +system and build us a freelist within this memory. All requests for new memory +is made in chunks of _S_chunk_size. Knowing the size of a block_record and +the bytes that this bin stores we then calculate how many blocks we can create +within this chunk, build the list, remove the first block, update the pointer +(_S_bin[ bin ].first[ 0 ]) and return a pointer to that blocks data. +

+Deallocation is equally simple; the pointer is casted back to a block_record +pointer, lookup which bin to use based on the size, add the block to the front +of the global freelist and update the pointer as needed +(_S_bin[ bin ].first[ 0 ]). +

+The decision to add deallocated blocks to the front of the freelist was made +after a set of performance measurements that showed that this is roughly 10% +faster than maintaining a set of "last pointers" as well. +