We all know that GPUs can be incredibly fast at some single-precision floating-point algorithms.
The developers of the BitMagic Library, Igor Tolstoy and Anatoliy Kuznetsov, attempted to implement a parallel bit-stream transposition and similarity algorithm (a lot of bit-shifting, population counting and general-purpose integer logic) to better understand how CUDA and GPGPU computing compare with current (and future) Intel vectorization architectures such as SSE2.
Benchmarks, CUDA and SSE source code, some speculation about Larrabee, and the implications for data mining and large-scale databases are here:
http://bmagic.sourceforge.net/bmcudasse2.html
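
For readers who want a feel for the kind of integer work involved, here is a minimal scalar C++ sketch of the two ingredients mentioned above: transposing a block of words into bit-planes, and counting differing bits with XOR plus population count. This is not the BitMagic, CUDA or SSE2 code from the linked article. The function names (`popcount32`, `transpose_bits`, `hamming_distance`), the 32x32 block size, and the choice of Hamming distance as the similarity measure are all assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: portable population count (Kernighan's method).
// Each iteration clears the lowest set bit, so the loop runs once per set bit.
inline unsigned popcount32(std::uint32_t x) {
    unsigned n = 0;
    while (x) {
        x &= x - 1;
        ++n;
    }
    return n;
}

// Hypothetical helper: naive bit-plane transposition of a 32x32 bit matrix.
// After the call, bit j of out[i] equals bit i of in[j].
inline std::array<std::uint32_t, 32>
transpose_bits(const std::array<std::uint32_t, 32>& in) {
    std::array<std::uint32_t, 32> out{};  // zero-initialized
    for (unsigned i = 0; i < 32; ++i) {
        for (unsigned j = 0; j < 32; ++j) {
            out[i] |= ((in[j] >> i) & 1u) << j;
        }
    }
    return out;
}

// Assumed similarity measure: Hamming distance between two blocks of n words,
// i.e. the number of bit positions in which they differ.
inline unsigned hamming_distance(const std::uint32_t* a,
                                 const std::uint32_t* b,
                                 std::size_t n) {
    unsigned d = 0;
    for (std::size_t k = 0; k < n; ++k) {
        d += popcount32(a[k] ^ b[k]);
    }
    return d;
}
```

Every step here is a shift, mask, OR, XOR or bit count on independent words, which is why this style of workload is a natural candidate for SIMD lanes or GPU threads. A vectorized version would process many words per instruction rather than one bit at a time.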
Monday, August 17, 2009