Thursday, March 18, 2010

AMD CodeAnalyst Performance Analyzer for Windows®

Sharing some experience with AMD CodeAnalyst profiler.

This profiler from AMD is free, simple, lightweight and implemented as a Visual Studio plugin (apparently there is Linux version, but I did not try it yet). 
Default profiling mode is time-sampling. 

I took a  few screenshots:

From Blog



From Blog




From Blog


Pros and Cons:

1. Easy to use, you can start your session in a few minutes (+)
2. Hot-spot detection is easy, chart based, results are generally believable (+)
3. Profiler is lean and works fast, profiled program is not slowed down. (+)
3. AMD CodeAnalyst is free (+)
4. Different screens (chart and data) sometimes give contradicting results, which is a bit confusing. (-)
    In the screenshots above 1 and 3 indicate different proportion in hot-spot data (maybe I don't understand something or bug in visualization).
5. I failed to get call stack for hot-spots even though configuration promised to collect it. (-)
    This can be a big issue for large projects with multiple code paths.
    Similar tools like Intel Parallel Studio can show call stacks and allows easy call stack analysis.


Saturday, January 9, 2010

BitMagic library version 3.6.3 released

This release fixes critical bug in 64-bit mode (#define BM64OPT) and improves performance of bit-counting related operations.

Parallel superscalar bit-counting

While working on version 3.6.3 collective Internet wisdom brought yet another interesting bit-counting algorithm, which turns to be efficient for 64-bit systems with superscalar execution (a form of parallelizm, where linear code with data-independent instructions is actually executed in parallel).

This code below implements population count for 4 64-bit integers. (Tested with GCC 4.4.1 64-bit Linux). This becomes part of BM 3.6.3 (when SourceForge recovers from its (I hope temporary) down).

inline
unsigned bitcount64_4way(bm::id64_t x, bm::id64_t y,
                    bm::id64_t u, bm::id64_t v)
{
    const bm::id64_t m1 = 0x5555555555555555;
    const bm::id64_t m2 = 0x3333333333333333;
    const bm::id64_t m3 = 0x0F0F0F0F0F0F0F0F;
    const bm::id64_t m4 = 0x000000FF000000FF;

    x = x - ((x >> 1) & m1);
    y = y - ((y >> 1) & m1);
    u = u - ((u >> 1) & m1);
    v = v - ((v >> 1) & m1);
    x = (x & m2) + ((x >> 2) & m2);
    y = (y & m2) + ((y >> 2) & m2);
    u = (u & m2) + ((u >> 2) & m2);
    v = (v & m2) + ((v >> 2) & m2);
    x = x + y;
    u = u + v;
    x = (x & m3) + ((x >> 4) & m3);
    u = (u & m3) + ((u >> 4) & m3);
    x = x + u;
    x = x + (x >> 8);
    x = x + (x >> 16);
    x = x & m4;
    x = x + (x >> 32);
    return x & 0x000001FF;
}

SourceForge down :-(

Thursday, January 7, 2010