Design

Table 19.1. Profile Code Location

Code LocationUse
libstdc++-v3/include/std/*Preprocessor code to redirect to profile extension headers.
libstdc++-v3/include/profile/*Profile extension public headers (map, vector, ...).
libstdc++-v3/include/profile/impl/*Profile extension internals. Implementation files are only included from impl/profiler.h, which is the only file included from the public headers.

In order to get our instrumented library version included instead of the release one, we use the same wrapper model as the debug mode. We subclass entities from the release version. Wherever _GLIBCXX_PROFILE is defined, the release namespace is std::__norm, whereas the profile namespace is std::__profile. Using plain std translates into std::__profile.

Whenever possible, we try to wrap at the public interface level, e.g., in unordered_set rather than in hashtable, in order not to depend on implementation.

Mixing object files built with and without the profile mode must not affect the program execution. However, there are no guarantees to the accuracy of diagnostics when using even a single object not built with -D_GLIBCXX_PROFILE. Currently, mixing the profile mode with debug and parallel extensions is not allowed. Mixing them at compile time will result in preprocessor errors. Mixing them at link time is undefined.

Instead of instrumenting every public entry and exit point, we chose to add instrumentation on demand, as needed by individual diagnostics. The main reason is that some diagnostics require us to extract bits of internal state that are particular only to that diagnostic. We plan to formalize this later, after we learn more about the requirements of several diagnostics.

All the instrumentation points can be switched on and off using -D[_NO]_GLIBCXX_PROFILE_<diagnostic> options. With all the instrumentation calls off, there should be negligible overhead over the release version. This property is needed to support diagnostics based on timing of internal operations. For such diagnostics, we anticipate turning most of the instrumentation off in order to prevent profiling overhead from polluting time measurements, and thus diagnostics.

All the instrumentation on/off compile time switches live in include/profile/profiler.h.

For practical reasons, the instrumentation library processes the trace partially rather than dumping it to disk in raw form. Each event is processed when it occurs. It is usually attached a cost and it is aggregated into the database of a specific diagnostic class. The cost model is based largely on the standard performance guarantees, but in some cases we use knowledge about GCC's standard library implementation.

Information is indexed by (1) call stack and (2) instance id or address to be able to understand and summarize precise creation-use-destruction dynamic chains. Although the analysis is sensitive to dynamic instances, the reports are only sensitive to call context. Whenever a dynamic instance is destroyed, we accumulate its effect to the corresponding entry for the call stack of its constructor location.

For details, see paper presented at CGO 2009.

Final analysis takes place offline, and it is based entirely on the generated trace and debugging info in the application binary. See section Diagnostics for a list of analysis types that we plan to support.

The input to the analysis is a table indexed by profile type and call stack. The data type for each entry depends on the profile type.

While it is likely that cost models become complex as we get into more sophisticated analysis, we will try to follow a simple set of rules at the beginning.

There are two types of reports. First, if we recognize a pattern for which we have a substitute that is likely to give better performance, we print the advice and estimated performance gain. The advice is usually associated to a code position and possibly a call stack.

Second, we report performance characteristics for which we do not have a clear solution for improvement. For instance, we can point to the user the top 10 multimap locations which have the worst data locality in actual traversals. Although this does not offer a solution, it helps the user focus on the key problems and ignore the uninteresting ones.

First, we want to make sure we preserve the behavior of the release mode. You can just type "make check-profile", which builds and runs the whole test suite in profile mode.

Second, we want to test the correctness of each diagnostic. We created a profile directory in the test suite. Each diagnostic must come with at least two tests, one for false positives and one for false negatives.