diff options
author | upstream source tree <ports@midipix.org> | 2015-03-15 20:14:05 -0400 |
---|---|---|
committer | upstream source tree <ports@midipix.org> | 2015-03-15 20:14:05 -0400 |
commit | 554fd8c5195424bdbcabf5de30fdc183aba391bd (patch) | |
tree | 976dc5ab7fddf506dadce60ae936f43f58787092 /libstdc++-v3/doc/html/manual/bk01pt03ch19s07.html | |
download | cbb-gcc-4.6.4-554fd8c5195424bdbcabf5de30fdc183aba391bd.tar.bz2 cbb-gcc-4.6.4-554fd8c5195424bdbcabf5de30fdc183aba391bd.tar.xz |
obtained gcc-4.6.4.tar.bz2 from upstream website;upstream
verified gcc-4.6.4.tar.bz2.sig;
imported gcc-4.6.4 source tree from verified upstream tarball.
downloading a git-generated archive based on the 'upstream' tag
should provide you with a source tree that is binary identical
to the one extracted from the above tarball.
if you have obtained the source via the command 'git clone',
however, do note that line-endings of files in your working
directory might differ from line-endings of the respective
files in the upstream repository.
Diffstat (limited to 'libstdc++-v3/doc/html/manual/bk01pt03ch19s07.html')
-rw-r--r-- | libstdc++-v3/doc/html/manual/bk01pt03ch19s07.html | 558 |
1 files changed, 558 insertions, 0 deletions
diff --git a/libstdc++-v3/doc/html/manual/bk01pt03ch19s07.html b/libstdc++-v3/doc/html/manual/bk01pt03ch19s07.html new file mode 100644 index 000000000..8c134e75f --- /dev/null +++ b/libstdc++-v3/doc/html/manual/bk01pt03ch19s07.html @@ -0,0 +1,558 @@ +<?xml version="1.0" encoding="UTF-8" standalone="no"?> +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"> +<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Diagnostics</title><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"/><meta name="keywords" content=" C++ , library , profile "/><meta name="keywords" content=" ISO C++ , library "/><link rel="home" href="../spine.html" title="The GNU C++ Library"/><link rel="up" href="profile_mode.html" title="Chapter 19. Profile Mode"/><link rel="prev" href="bk01pt03ch19s06.html" title="Developer Information"/><link rel="next" href="ext_allocators.html" title="Chapter 20. Allocators"/></head><body><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Diagnostics</th></tr><tr><td align="left"><a accesskey="p" href="bk01pt03ch19s06.html">Prev</a> </td><th width="60%" align="center">Chapter 19. Profile Mode</th><td align="right"> <a accesskey="n" href="ext_allocators.html">Next</a></td></tr></table><hr/></div><div class="section" title="Diagnostics"><div class="titlepage"><div><div><h2 class="title"><a id="manual.ext.profile_mode.diagnostics"/>Diagnostics</h2></div></div></div><p> + The table below presents all the diagnostics we intend to implement. + Each diagnostic has a corresponding compile time switch + <code class="code">-D_GLIBCXX_PROFILE_<diagnostic></code>. + Groups of related diagnostics can be turned on with a single switch. + For instance, <code class="code">-D_GLIBCXX_PROFILE_LOCALITY</code> is equivalent to + <code class="code">-D_GLIBCXX_PROFILE_SOFTWARE_PREFETCH + -D_GLIBCXX_PROFILE_RBTREE_LOCALITY</code>. + </p><p> + The benefit, cost, expected frequency and accuracy of each diagnostic + was given a grade from 1 to 10, where 10 is highest. + A high benefit means that, if the diagnostic is accurate, the expected + performance improvement is high. + A high cost means that turning this diagnostic on leads to high slowdown. + A high frequency means that we expect this to occur relatively often. + A high accuracy means that the diagnostic is unlikely to be wrong. + These grades are not perfect. They are just meant to guide users with + specific needs or time budgets. + </p><div class="table"><a id="id487386"/><p class="title"><strong>Table 19.2. Profile Diagnostics</strong></p><div class="table-contents"><table summary="Profile Diagnostics" border="1"><colgroup><col style="text-align: left" class="c1"/><col style="text-align: left" class="c2"/><col style="text-align: left" class="c3"/><col style="text-align: left" class="c4"/><col style="text-align: left" class="c5"/><col style="text-align: left" class="c6"/><col style="text-align: left" class="c7"/></colgroup><thead><tr><th style="text-align: left">Group</th><th style="text-align: left">Flag</th><th style="text-align: left">Benefit</th><th style="text-align: left">Cost</th><th style="text-align: left">Freq.</th><th style="text-align: left">Implemented</th><td class="auto-generated"> </td></tr></thead><tbody><tr><td style="text-align: left"><a class="link" href="bk01pt03ch19s07.html#manual.ext.profile_mode.analysis.containers" title="Containers"> + CONTAINERS</a></td><td style="text-align: left"><a class="link" href="bk01pt03ch19s07.html#manual.ext.profile_mode.analysis.hashtable_too_small" title="Hashtable Too Small"> + HASHTABLE_TOO_SMALL</a></td><td style="text-align: left">10</td><td style="text-align: left">1</td><td style="text-align: left"> </td><td style="text-align: left">10</td><td style="text-align: left">yes</td></tr><tr><td style="text-align: left"> </td><td style="text-align: left"><a class="link" href="bk01pt03ch19s07.html#manual.ext.profile_mode.analysis.hashtable_too_large" title="Hashtable Too Large"> + HASHTABLE_TOO_LARGE</a></td><td style="text-align: left">5</td><td style="text-align: left">1</td><td style="text-align: left"> </td><td style="text-align: left">10</td><td style="text-align: left">yes</td></tr><tr><td style="text-align: left"> </td><td style="text-align: left"><a class="link" href="bk01pt03ch19s07.html#manual.ext.profile_mode.analysis.inefficient_hash" title="Inefficient Hash"> + INEFFICIENT_HASH</a></td><td style="text-align: left">7</td><td style="text-align: left">3</td><td style="text-align: left"> </td><td style="text-align: left">10</td><td style="text-align: left">yes</td></tr><tr><td style="text-align: left"> </td><td style="text-align: left"><a class="link" href="bk01pt03ch19s07.html#manual.ext.profile_mode.analysis.vector_too_small" title="Vector Too Small"> + VECTOR_TOO_SMALL</a></td><td style="text-align: left">8</td><td style="text-align: left">1</td><td style="text-align: left"> </td><td style="text-align: left">10</td><td style="text-align: left">yes</td></tr><tr><td style="text-align: left"> </td><td style="text-align: left"><a class="link" href="bk01pt03ch19s07.html#manual.ext.profile_mode.analysis.vector_too_large" title="Vector Too Large"> + VECTOR_TOO_LARGE</a></td><td style="text-align: left">5</td><td style="text-align: left">1</td><td style="text-align: left"> </td><td style="text-align: left">10</td><td style="text-align: left">yes</td></tr><tr><td style="text-align: left"> </td><td style="text-align: left"><a class="link" href="bk01pt03ch19s07.html#manual.ext.profile_mode.analysis.vector_to_hashtable" title="Vector to Hashtable"> + VECTOR_TO_HASHTABLE</a></td><td style="text-align: left">7</td><td style="text-align: left">7</td><td style="text-align: left"> </td><td style="text-align: left">10</td><td style="text-align: left">no</td></tr><tr><td style="text-align: left"> </td><td style="text-align: left"><a class="link" href="bk01pt03ch19s07.html#manual.ext.profile_mode.analysis.hashtable_to_vector" title="Hashtable to Vector"> + HASHTABLE_TO_VECTOR</a></td><td style="text-align: left">7</td><td style="text-align: left">7</td><td style="text-align: left"> </td><td style="text-align: left">10</td><td style="text-align: left">no</td></tr><tr><td style="text-align: left"> </td><td style="text-align: left"><a class="link" href="bk01pt03ch19s07.html#manual.ext.profile_mode.analysis.vector_to_list" title="Vector to List"> + VECTOR_TO_LIST</a></td><td style="text-align: left">8</td><td style="text-align: left">5</td><td style="text-align: left"> </td><td style="text-align: left">10</td><td style="text-align: left">yes</td></tr><tr><td style="text-align: left"> </td><td style="text-align: left"><a class="link" href="bk01pt03ch19s07.html#manual.ext.profile_mode.analysis.list_to_vector" title="List to Vector"> + LIST_TO_VECTOR</a></td><td style="text-align: left">10</td><td style="text-align: left">5</td><td style="text-align: left"> </td><td style="text-align: left">10</td><td style="text-align: left">no</td></tr><tr><td style="text-align: left"> </td><td style="text-align: left"><a class="link" href="bk01pt03ch19s07.html#manual.ext.profile_mode.analysis.assoc_ord_to_unord" title="Ordered to Unordered Associative Container"> + ORDERED_TO_UNORDERED</a></td><td style="text-align: left">10</td><td style="text-align: left">5</td><td style="text-align: left"> </td><td style="text-align: left">10</td><td style="text-align: left">only map/unordered_map</td></tr><tr><td style="text-align: left"><a class="link" href="bk01pt03ch19s07.html#manual.ext.profile_mode.analysis.algorithms" title="Algorithms"> + ALGORITHMS</a></td><td style="text-align: left"><a class="link" href="bk01pt03ch19s07.html#manual.ext.profile_mode.analysis.algorithms.sort" title="Sort Algorithm Performance"> + SORT</a></td><td style="text-align: left">7</td><td style="text-align: left">8</td><td style="text-align: left"> </td><td style="text-align: left">7</td><td style="text-align: left">no</td></tr><tr><td style="text-align: left"><a class="link" href="bk01pt03ch19s07.html#manual.ext.profile_mode.analysis.locality" title="Data Locality"> + LOCALITY</a></td><td style="text-align: left"><a class="link" href="bk01pt03ch19s07.html#manual.ext.profile_mode.analysis.locality.sw_prefetch" title="Need Software Prefetch"> + SOFTWARE_PREFETCH</a></td><td style="text-align: left">8</td><td style="text-align: left">8</td><td style="text-align: left"> </td><td style="text-align: left">5</td><td style="text-align: left">no</td></tr><tr><td style="text-align: left"> </td><td style="text-align: left"><a class="link" href="bk01pt03ch19s07.html#manual.ext.profile_mode.analysis.locality.linked" title="Linked Structure Locality"> + RBTREE_LOCALITY</a></td><td style="text-align: left">4</td><td style="text-align: left">8</td><td style="text-align: left"> </td><td style="text-align: left">5</td><td style="text-align: left">no</td></tr><tr><td style="text-align: left"> </td><td style="text-align: left"><a class="link" href="bk01pt03ch19s07.html#manual.ext.profile_mode.analysis.mthread.false_share" title="False Sharing"> + FALSE_SHARING</a></td><td style="text-align: left">8</td><td style="text-align: left">10</td><td style="text-align: left"> </td><td style="text-align: left">10</td><td style="text-align: left">no</td></tr></tbody></table></div></div><br class="table-break"/><div class="section" title="Diagnostic Template"><div class="titlepage"><div><div><h3 class="title"><a id="manual.ext.profile_mode.analysis.template"/>Diagnostic Template</h3></div></div></div><div class="itemizedlist"><ul class="itemizedlist"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> + <code class="code">_GLIBCXX_PROFILE_<diagnostic></code>. + </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> What problem will it diagnose? + </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span>. + What is the fundamental reason why this is a problem</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span> + Percentage reduction in execution time. When reduction is more than + a constant factor, describe the reduction rate formula. + </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span> + What would the advise look like?</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span> + What stdlibc++ components need to be instrumented?</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> + How do we decide when to issue the advice?</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> + How do we measure benefits? Math goes here.</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> +</p><pre class="programlisting"> +program code +... +advice sample +</pre><p> +</p></li></ul></div></div><div class="section" title="Containers"><div class="titlepage"><div><div><h3 class="title"><a id="manual.ext.profile_mode.analysis.containers"/>Containers</h3></div></div></div><p> +<span class="emphasis"><em>Switch:</em></span> + <code class="code">_GLIBCXX_PROFILE_CONTAINERS</code>. +</p><div class="section" title="Hashtable Too Small"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.hashtable_too_small"/>Hashtable Too Small</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> + <code class="code">_GLIBCXX_PROFILE_HASHTABLE_TOO_SMALL</code>. + </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect hashtables with many + rehash operations, small construction size and large destruction size. + </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> Rehash is very expensive. + Read content, follow chains within bucket, evaluate hash function, place at + new location in different order.</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span> 36%. + Code similar to example below. + </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span> + Set initial size to N at construction site S. + </p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span> + <code class="code">unordered_set, unordered_map</code> constructor, destructor, rehash. + </p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> + For each dynamic instance of <code class="code">unordered_[multi]set|map</code>, + record initial size and call context of the constructor. + Record size increase, if any, after each relevant operation such as insert. + Record the estimated rehash cost.</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> + Number of individual rehash operations * cost per rehash.</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> +</p><pre class="programlisting"> +1 unordered_set<int> us; +2 for (int k = 0; k < 1000000; ++k) { +3 us.insert(k); +4 } + +foo.cc:1: advice: Changing initial unordered_set size from 10 to 1000000 saves 1025530 rehash operations. +</pre><p> +</p></li></ul></div></div><div class="section" title="Hashtable Too Large"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.hashtable_too_large"/>Hashtable Too Large</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> + <code class="code">_GLIBCXX_PROFILE_HASHTABLE_TOO_LARGE</code>. + </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect hashtables which are + never filled up because fewer elements than reserved are ever + inserted. + </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> Save memory, which + is good in itself and may also improve memory reference performance through + fewer cache and TLB misses.</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span> unknown. + </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span> + Set initial size to N at construction site S. + </p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span> + <code class="code">unordered_set, unordered_map</code> constructor, destructor, rehash. + </p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> + For each dynamic instance of <code class="code">unordered_[multi]set|map</code>, + record initial size and call context of the constructor, and correlate it + with its size at destruction time. + </p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> + Number of iteration operations + memory saved.</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> +</p><pre class="programlisting"> +1 vector<unordered_set<int>> v(100000, unordered_set<int>(100)) ; +2 for (int k = 0; k < 100000; ++k) { +3 for (int j = 0; j < 10; ++j) { +4 v[k].insert(k + j); +5 } +6 } + +foo.cc:1: advice: Changing initial unordered_set size from 100 to 10 saves N +bytes of memory and M iteration steps. +</pre><p> +</p></li></ul></div></div><div class="section" title="Inefficient Hash"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.inefficient_hash"/>Inefficient Hash</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> + <code class="code">_GLIBCXX_PROFILE_INEFFICIENT_HASH</code>. + </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect hashtables with polarized + distribution. + </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> A non-uniform + distribution may lead to long chains, thus possibly increasing complexity + by a factor up to the number of elements. + </p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span> factor up + to container size. + </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span> Change hash function + for container built at site S. Distribution score = N. Access score = S. + Longest chain = C, in bucket B. + </p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span> + <code class="code">unordered_set, unordered_map</code> constructor, destructor, [], + insert, iterator. + </p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> + Count the exact number of link traversals. + </p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> + Total number of links traversed.</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> +</p><pre class="programlisting"> +class dumb_hash { + public: + size_t operator() (int i) const { return 0; } +}; +... + unordered_set<int, dumb_hash> hs; + ... + for (int i = 0; i < COUNT; ++i) { + hs.find(i); + } +</pre><p> +</p></li></ul></div></div><div class="section" title="Vector Too Small"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.vector_too_small"/>Vector Too Small</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> + <code class="code">_GLIBCXX_PROFILE_VECTOR_TOO_SMALL</code>. + </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span>Detect vectors with many + resize operations, small construction size and large destruction size.. + </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span>Resizing can be expensive. + Copying large amounts of data takes time. Resizing many small vectors may + have allocation overhead and affect locality.</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>%. + </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span> + Set initial size to N at construction site S.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span><code class="code">vector</code>. + </p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> + For each dynamic instance of <code class="code">vector</code>, + record initial size and call context of the constructor. + Record size increase, if any, after each relevant operation such as + <code class="code">push_back</code>. Record the estimated resize cost. + </p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> + Total number of words copied * time to copy a word.</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> +</p><pre class="programlisting"> +1 vector<int> v; +2 for (int k = 0; k < 1000000; ++k) { +3 v.push_back(k); +4 } + +foo.cc:1: advice: Changing initial vector size from 10 to 1000000 saves +copying 4000000 bytes and 20 memory allocations and deallocations. +</pre><p> +</p></li></ul></div></div><div class="section" title="Vector Too Large"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.vector_too_large"/>Vector Too Large</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> + <code class="code">_GLIBCXX_PROFILE_VECTOR_TOO_LARGE</code> + </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span>Detect vectors which are + never filled up because fewer elements than reserved are ever + inserted. + </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span>Save memory, which + is good in itself and may also improve memory reference performance through + fewer cache and TLB misses.</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>%. + </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span> + Set initial size to N at construction site S.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span><code class="code">vector</code>. + </p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> + For each dynamic instance of <code class="code">vector</code>, + record initial size and call context of the constructor, and correlate it + with its size at destruction time.</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> + Total amount of memory saved.</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> +</p><pre class="programlisting"> +1 vector<vector<int>> v(100000, vector<int>(100)) ; +2 for (int k = 0; k < 100000; ++k) { +3 for (int j = 0; j < 10; ++j) { +4 v[k].insert(k + j); +5 } +6 } + +foo.cc:1: advice: Changing initial vector size from 100 to 10 saves N +bytes of memory and may reduce the number of cache and TLB misses. +</pre><p> +</p></li></ul></div></div><div class="section" title="Vector to Hashtable"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.vector_to_hashtable"/>Vector to Hashtable</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> + <code class="code">_GLIBCXX_PROFILE_VECTOR_TO_HASHTABLE</code>. + </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect uses of + <code class="code">vector</code> that can be substituted with <code class="code">unordered_set</code> + to reduce execution time. + </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> + Linear search in a vector is very expensive, whereas searching in a hashtable + is very quick.</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>factor up + to container size. + </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span>Replace + <code class="code">vector</code> with <code class="code">unordered_set</code> at site S. + </p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span><code class="code">vector</code> + operations and access methods.</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> + For each dynamic instance of <code class="code">vector</code>, + record call context of the constructor. Issue the advice only if the + only methods called on this <code class="code">vector</code> are <code class="code">push_back</code>, + <code class="code">insert</code> and <code class="code">find</code>. + </p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> + Cost(vector::push_back) + cost(vector::insert) + cost(find, vector) - + cost(unordered_set::insert) + cost(unordered_set::find). + </p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> +</p><pre class="programlisting"> +1 vector<int> v; +... +2 for (int i = 0; i < 1000; ++i) { +3 find(v.begin(), v.end(), i); +4 } + +foo.cc:1: advice: Changing "vector" to "unordered_set" will save about 500,000 +comparisons. +</pre><p> +</p></li></ul></div></div><div class="section" title="Hashtable to Vector"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.hashtable_to_vector"/>Hashtable to Vector</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> + <code class="code">_GLIBCXX_PROFILE_HASHTABLE_TO_VECTOR</code>. + </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect uses of + <code class="code">unordered_set</code> that can be substituted with <code class="code">vector</code> + to reduce execution time. + </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> + Hashtable iterator is slower than vector iterator.</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>95%. + </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span>Replace + <code class="code">unordered_set</code> with <code class="code">vector</code> at site S. + </p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span><code class="code">unordered_set</code> + operations and access methods.</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> + For each dynamic instance of <code class="code">unordered_set</code>, + record call context of the constructor. Issue the advice only if the + number of <code class="code">find</code>, <code class="code">insert</code> and <code class="code">[]</code> + operations on this <code class="code">unordered_set</code> are small relative to the + number of elements, and methods <code class="code">begin</code> or <code class="code">end</code> + are invoked (suggesting iteration).</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> + Number of .</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> +</p><pre class="programlisting"> +1 unordered_set<int> us; +... +2 int s = 0; +3 for (unordered_set<int>::iterator it = us.begin(); it != us.end(); ++it) { +4 s += *it; +5 } + +foo.cc:1: advice: Changing "unordered_set" to "vector" will save about N +indirections and may achieve better data locality. +</pre><p> +</p></li></ul></div></div><div class="section" title="Vector to List"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.vector_to_list"/>Vector to List</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> + <code class="code">_GLIBCXX_PROFILE_VECTOR_TO_LIST</code>. + </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect cases where + <code class="code">vector</code> could be substituted with <code class="code">list</code> for + better performance. + </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> + Inserting in the middle of a vector is expensive compared to inserting in a + list. + </p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>factor up to + container size. + </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span>Replace vector with list + at site S.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span><code class="code">vector</code> + operations and access methods.</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> + For each dynamic instance of <code class="code">vector</code>, + record the call context of the constructor. Record the overhead of each + <code class="code">insert</code> operation based on current size and insert position. + Report instance with high insertion overhead. + </p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> + (Sum(cost(vector::method)) - Sum(cost(list::method)), for + method in [push_back, insert, erase]) + + (Cost(iterate vector) - Cost(iterate list))</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> +</p><pre class="programlisting"> +1 vector<int> v; +2 for (int i = 0; i < 10000; ++i) { +3 v.insert(v.begin(), i); +4 } + +foo.cc:1: advice: Changing "vector" to "list" will save about 5,000,000 +operations. +</pre><p> +</p></li></ul></div></div><div class="section" title="List to Vector"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.list_to_vector"/>List to Vector</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> + <code class="code">_GLIBCXX_PROFILE_LIST_TO_VECTOR</code>. + </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect cases where + <code class="code">list</code> could be substituted with <code class="code">vector</code> for + better performance. + </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> + Iterating through a vector is faster than through a list. + </p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>64%. + </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span>Replace list with vector + at site S.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span><code class="code">vector</code> + operations and access methods.</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> + Issue the advice if there are no <code class="code">insert</code> operations. + </p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> + (Sum(cost(vector::method)) - Sum(cost(list::method)), for + method in [push_back, insert, erase]) + + (Cost(iterate vector) - Cost(iterate list))</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> +</p><pre class="programlisting"> +1 list<int> l; +... +2 int sum = 0; +3 for (list<int>::iterator it = l.begin(); it != l.end(); ++it) { +4 sum += *it; +5 } + +foo.cc:1: advice: Changing "list" to "vector" will save about 1000000 indirect +memory references. +</pre><p> +</p></li></ul></div></div><div class="section" title="List to Forward List (Slist)"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.list_to_slist"/>List to Forward List (Slist)</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> + <code class="code">_GLIBCXX_PROFILE_LIST_TO_SLIST</code>. + </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect cases where + <code class="code">list</code> could be substituted with <code class="code">forward_list</code> for + better performance. + </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> + The memory footprint of a forward_list is smaller than that of a list. + This has beneficial effects on memory subsystem, e.g., fewer cache misses. + </p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>40%. + Note that the reduction is only noticeable if the size of the forward_list + node is in fact larger than that of the list node. For memory allocators + with size classes, you will only notice an effect when the two node sizes + belong to different allocator size classes. + </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span>Replace list with + forward_list at site S.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span><code class="code">list</code> + operations and iteration methods.</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> + Issue the advice if there are no <code class="code">backwards</code> traversals + or insertion before a given node. + </p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> + Always true.</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> +</p><pre class="programlisting"> +1 list<int> l; +... +2 int sum = 0; +3 for (list<int>::iterator it = l.begin(); it != l.end(); ++it) { +4 sum += *it; +5 } + +foo.cc:1: advice: Change "list" to "forward_list". +</pre><p> +</p></li></ul></div></div><div class="section" title="Ordered to Unordered Associative Container"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.assoc_ord_to_unord"/>Ordered to Unordered Associative Container</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> + <code class="code">_GLIBCXX_PROFILE_ORDERED_TO_UNORDERED</code>. + </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect cases where ordered + associative containers can be replaced with unordered ones. + </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> + Insert and search are quicker in a hashtable than in + a red-black tree.</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>52%. + </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span> + Replace set with unordered_set at site S.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span> + <code class="code">set</code>, <code class="code">multiset</code>, <code class="code">map</code>, + <code class="code">multimap</code> methods.</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> + Issue the advice only if we are not using operator <code class="code">++</code> on any + iterator on a particular <code class="code">[multi]set|map</code>. + </p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> + (Sum(cost(hashtable::method)) - Sum(cost(rbtree::method)), for + method in [insert, erase, find]) + + (Cost(iterate hashtable) - Cost(iterate rbtree))</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> +</p><pre class="programlisting"> +1 set<int> s; +2 for (int i = 0; i < 100000; ++i) { +3 s.insert(i); +4 } +5 int sum = 0; +6 for (int i = 0; i < 100000; ++i) { +7 sum += *s.find(i); +8 } +</pre><p> +</p></li></ul></div></div></div><div class="section" title="Algorithms"><div class="titlepage"><div><div><h3 class="title"><a id="manual.ext.profile_mode.analysis.algorithms"/>Algorithms</h3></div></div></div><p><span class="emphasis"><em>Switch:</em></span> + <code class="code">_GLIBCXX_PROFILE_ALGORITHMS</code>. + </p><div class="section" title="Sort Algorithm Performance"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.algorithms.sort"/>Sort Algorithm Performance</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> + <code class="code">_GLIBCXX_PROFILE_SORT</code>. + </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Give measure of sort algorithm + performance based on actual input. For instance, advise Radix Sort over + Quick Sort for a particular call context. + </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> + See papers: + <a class="link" href="http://portal.acm.org/citation.cfm?doid=1065944.1065981"> + A framework for adaptive algorithm selection in STAPL</a> and + <a class="link" href="http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=4228227"> + Optimizing Sorting with Machine Learning Algorithms</a>. + </p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>60%. + </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span> Change sort algorithm + at site S from X Sort to Y Sort.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span> <code class="code">sort</code> + algorithm.</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> + Issue the advice if the cost model tells us that another sort algorithm + would do better on this input. Requires us to know what algorithm we + are using in our sort implementation in release mode.</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> + Runtime(algo) for algo in [radix, quick, merge, ...]</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> +</p><pre class="programlisting"> +</pre><p> +</p></li></ul></div></div></div><div class="section" title="Data Locality"><div class="titlepage"><div><div><h3 class="title"><a id="manual.ext.profile_mode.analysis.locality"/>Data Locality</h3></div></div></div><p><span class="emphasis"><em>Switch:</em></span> + <code class="code">_GLIBCXX_PROFILE_LOCALITY</code>. + </p><div class="section" title="Need Software Prefetch"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.locality.sw_prefetch"/>Need Software Prefetch</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> + <code class="code">_GLIBCXX_PROFILE_SOFTWARE_PREFETCH</code>. + </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Discover sequences of indirect + memory accesses that are not regular, thus cannot be predicted by + hardware prefetchers. + </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> + Indirect references are hard to predict and are very expensive when they + miss in caches.</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>25%. + </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span> Insert prefetch + instruction.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span> Vector iterator and + access operator []. + </p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> + First, get cache line size and page size from system. + Then record iterator dereference sequences for which the value is a pointer. + For each sequence within a container, issue a warning if successive pointer + addresses are not within cache lines and do not form a linear pattern + (otherwise they may be prefetched by hardware). + If they also step across page boundaries, make the warning stronger. + </p><p>The same analysis applies to containers other than vector. + However, we cannot give the same advice for linked structures, such as list, + as there is no random access to the n-th element. The user may still be + able to benefit from this information, for instance by employing frays (user + level light weight threads) to hide the latency of chasing pointers. + </p><p> + This analysis is a little oversimplified. A better cost model could be + created by understanding the capability of the hardware prefetcher. + This model could be trained automatically by running a set of synthetic + cases. + </p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> + Total distance between pointer values of successive elements in vectors + of pointers.</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> +</p><pre class="programlisting"> +1 int zero = 0; +2 vector<int*> v(10000000, &zero); +3 for (int k = 0; k < 10000000; ++k) { +4 v[random() % 10000000] = new int(k); +5 } +6 for (int j = 0; j < 10000000; ++j) { +7 count += (*v[j] == 0 ? 0 : 1); +8 } + +foo.cc:7: advice: Insert prefetch instruction. +</pre><p> +</p></li></ul></div></div><div class="section" title="Linked Structure Locality"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.locality.linked"/>Linked Structure Locality</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> + <code class="code">_GLIBCXX_PROFILE_RBTREE_LOCALITY</code>. + </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Give measure of locality of + objects stored in linked structures (lists, red-black trees and hashtables) + with respect to their actual traversal patterns. + </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span>Allocation can be tuned + to a specific traversal pattern, to result in better data locality. + See paper: + <a class="link" href="http://www.springerlink.com/content/8085744l00x72662/"> + Custom Memory Allocation for Free</a>. + </p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>30%. + </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span> + High scatter score N for container built at site S. + Consider changing allocation sequence or choosing a structure conscious + allocator.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span> Methods of all + containers using linked structures.</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> + First, get cache line size and page size from system. + Then record the number of successive elements that are on different line + or page, for each traversal method such as <code class="code">find</code>. Give advice + only if the ratio between this number and the number of total node hops + is above a threshold.</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> + Sum(same_cache_line(this,previous))</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> +</p><pre class="programlisting"> + 1 set<int> s; + 2 for (int i = 0; i < 10000000; ++i) { + 3 s.insert(i); + 4 } + 5 set<int> s1, s2; + 6 for (int i = 0; i < 10000000; ++i) { + 7 s1.insert(i); + 8 s2.insert(i); + 9 } +... + // Fast, better locality. +10 for (set<int>::iterator it = s.begin(); it != s.end(); ++it) { +11 sum += *it; +12 } + // Slow, elements are further apart. +13 for (set<int>::iterator it = s1.begin(); it != s1.end(); ++it) { +14 sum += *it; +15 } + +foo.cc:5: advice: High scatter score NNN for set built here. Consider changing +the allocation sequence or switching to a structure conscious allocator. +</pre><p> +</p></li></ul></div></div></div><div class="section" title="Multithreaded Data Access"><div class="titlepage"><div><div><h3 class="title"><a id="manual.ext.profile_mode.analysis.mthread"/>Multithreaded Data Access</h3></div></div></div><p> + The diagnostics in this group are not meant to be implemented short term. + They require compiler support to know when container elements are written + to. Instrumentation can only tell us when elements are referenced. + </p><p><span class="emphasis"><em>Switch:</em></span> + <code class="code">_GLIBCXX_PROFILE_MULTITHREADED</code>. + </p><div class="section" title="Data Dependence Violations at Container Level"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.mthread.ddtest"/>Data Dependence Violations at Container Level</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> + <code class="code">_GLIBCXX_PROFILE_DDTEST</code>. + </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect container elements + that are referenced from multiple threads in the parallel region or + across parallel regions. + </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> + Sharing data between threads requires communication and perhaps locking, + which may be expensive. + </p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>?%. + </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span> Change data + distribution or parallel algorithm.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span> Container access methods + and iterators. + </p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> + Keep a shadow for each container. Record iterator dereferences and + container member accesses. Issue advice for elements referenced by + multiple threads. + See paper: <a class="link" href="http://portal.acm.org/citation.cfm?id=207110.207148"> + The LRPD test: speculative run-time parallelization of loops with + privatization and reduction parallelization</a>. + </p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> + Number of accesses to elements referenced from multiple threads + </p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> +</p><pre class="programlisting"> +</pre><p> +</p></li></ul></div></div><div class="section" title="False Sharing"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.mthread.false_share"/>False Sharing</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> + <code class="code">_GLIBCXX_PROFILE_FALSE_SHARING</code>. + </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect elements in the + same container which share a cache line, are written by at least one + thread, and accessed by different threads. + </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> Under these assumptions, + cache protocols require + communication to invalidate lines, which may be expensive. + </p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>68%. + </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span> Reorganize container + or use padding to avoid false sharing.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span> Container access methods + and iterators. + </p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> + First, get the cache line size. + For each shared container, record all the associated iterator dereferences + and member access methods with the thread id. Compare the address lists + across threads to detect references in two different threads to the same + cache line. Issue a warning only if the ratio to total references is + significant. Do the same for iterator dereference values if they are + pointers.</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> + Number of accesses to same cache line from different threads. + </p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> +</p><pre class="programlisting"> +1 vector<int> v(2, 0); +2 #pragma omp parallel for shared(v, SIZE) schedule(static, 1) +3 for (i = 0; i < SIZE; ++i) { +4 v[i % 2] += i; +5 } + +OMP_NUM_THREADS=2 ./a.out +foo.cc:1: advice: Change container structure or padding to avoid false +sharing in multithreaded access at foo.cc:4. Detected N shared cache lines. +</pre><p> +</p></li></ul></div></div></div><div class="section" title="Statistics"><div class="titlepage"><div><div><h3 class="title"><a id="manual.ext.profile_mode.analysis.statistics"/>Statistics</h3></div></div></div><p> +<span class="emphasis"><em>Switch:</em></span> + <code class="code">_GLIBCXX_PROFILE_STATISTICS</code>. +</p><p> + In some cases the cost model may not tell us anything because the costs + appear to offset the benefits. Consider the choice between a vector and + a list. When there are both inserts and iteration, an automatic advice + may not be issued. However, the programmer may still be able to make use + of this information in a different way. +</p><p> + This diagnostic will not issue any advice, but it will print statistics for + each container construction site. The statistics will contain the cost + of each operation actually performed on the container. +</p></div></div><div class="navfooter"><hr/><table width="100%" summary="Navigation footer"><tr><td align="left"><a accesskey="p" href="bk01pt03ch19s06.html">Prev</a> </td><td align="center"><a accesskey="u" href="profile_mode.html">Up</a></td><td align="right"> <a accesskey="n" href="ext_allocators.html">Next</a></td></tr><tr><td align="left" valign="top">Developer Information </td><td align="center"><a accesskey="h" href="../spine.html">Home</a></td><td align="right" valign="top"> Chapter 20. Allocators</td></tr></table></div></body></html> |