New features:
- added configurations for MacOS-X
- optimizations for SSE 4.2 in 64-bit mode
now it is fine to use:
#define BM64OPT
#define BMSSE42OPT
This two options produce SIMD vectorized code for SSE 4.2 which uses 64-bit aware bit-counting
- new algorithm for picking random subset out of bvector<>
see sample10.cpp
Saturday, December 19, 2009
Wednesday, December 16, 2009
SSE2 vs CUDA in parallel bit stream transposition / similarity. Part 2.
We made a second attempt to work with CUDA to better understand if it looks like a suitable technology for database and data-mining acceleration.
With a very valuable help from our readers we prepared a second version of GPU benchmark and tried to analyze performance and power consumption.
CUDA vs SSE. Part 2.
Friday, December 11, 2009
Win7 vs Ubuntu 9.10 (32-bit vs 64-bit)
This benchmark is designed to illustrate platform-compiler choice for BitMagic derived computational algorithms.
Test total numbers (shorter - faster)
Test Platform: Nehalem Core i5, 2.67GHz (Turbo Boost enabled)
BitMagic Library 3.6.2 (not yet released)
Windows 7 32-bit used default 32-bit compile using MSVC8. SSE4 optimization disabled.
Ubuntu 9.10 64-bit GCC 4.4.1 with compile options ( -march=core2 -m64 -msse4 )
BitMagic 64-bit optimizations + SSE4 optimizations:
#define BMSSE42OPT
#define BM64OPT
(this combination will be supported in BitMagic Library 3.6.2).
All measurements in seconds (shorter - faster).
As you can see, 32-bit default compile using stock compiler looses in almost every test to 64-bit SSE4.2 version, tuned for the platform.
Yes, this experiment intentionally staged for Windows7+MSVC to loose (by using unfavorable compilation and optimization settings: stock x86 32-bit software suffers from register starvation, ofter ignores presence of SIMD SSE unit. Having said that such resource under-utilization often happens in real-life and it is often ignored as insignificant. From this data we can see that sometimes insignificant things can add-up to measurable values.
Test total numbers (shorter - faster)
From Blog |
Test Platform: Nehalem Core i5, 2.67GHz (Turbo Boost enabled)
BitMagic Library 3.6.2 (not yet released)
Windows 7 32-bit used default 32-bit compile using MSVC8. SSE4 optimization disabled.
Ubuntu 9.10 64-bit GCC 4.4.1 with compile options ( -march=core2 -m64 -msse4 )
BitMagic 64-bit optimizations + SSE4 optimizations:
#define BMSSE42OPT
#define BM64OPT
(this combination will be supported in BitMagic Library 3.6.2).
All measurements in seconds (shorter - faster).
From Blog |
As you can see, 32-bit default compile using stock compiler looses in almost every test to 64-bit SSE4.2 version, tuned for the platform.
Yes, this experiment intentionally staged for Windows7+MSVC to loose (by using unfavorable compilation and optimization settings: stock x86 32-bit software suffers from register starvation, ofter ignores presence of SIMD SSE unit. Having said that such resource under-utilization often happens in real-life and it is often ignored as insignificant. From this data we can see that sometimes insignificant things can add-up to measurable values.
Subscribe to:
Posts (Atom)