CUDPP, the CUDA Data Parallel Primitives Library, provides data-parallel algorithm primitives such as parallel prefix-sum (“scan”), parallel sort, and parallel reduction. Primitives such as these are important building blocks for a wide variety of data-parallel algorithms, including sorting, stream compaction, and building data structures such as trees and summed-area tables. CUDPP runs on processors that support CUDA.
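To make the "building block" idea concrete, here is a minimal sequential sketch in C++ of what an exclusive scan computes and how stream compaction can be built on top of it. This is not CUDPP code and does not use the CUDPP API. The function names `exclusive_scan_ref` and `compact_ref` are hypothetical, chosen for illustration only, and the sketch assumes `flags` has the same length as `in`. A GPU implementation would compute the same results in parallel.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not part of CUDPP).
// Exclusive prefix sum: out[i] = in[0] + ... + in[i-1], with out[0] = 0.
std::vector<int> exclusive_scan_ref(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;   // sum of all elements before position i
        running += in[i];
    }
    return out;
}

// Hypothetical reference helper (not part of CUDPP).
// Stream compaction: keep in[i] wherever flags[i] is nonzero, preserving order.
// Assumes flags.size() == in.size().
std::vector<int> compact_ref(const std::vector<int>& in,
                             const std::vector<int>& flags) {
    // Normalize the flags to 0/1 so that their scan yields output positions.
    std::vector<int> keep(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        keep[i] = (flags[i] != 0) ? 1 : 0;
    }

    // offsets[i] is the output index of element i, if it is kept.
    std::vector<int> offsets = exclusive_scan_ref(keep);

    // Number of kept elements = last offset + last keep flag.
    std::size_t total = 0;
    if (!in.empty()) {
        total = static_cast<std::size_t>(offsets.back() + keep.back());
    }

    // Scatter the kept elements. Each write targets a distinct slot,
    // which is why this step parallelizes well on a GPU.
    std::vector<int> out(total);
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (keep[i]) {
            out[static_cast<std::size_t>(offsets[i])] = in[i];
        }
    }
    return out;
}
```

The scan turns a per-element keep/discard decision into an output address for every kept element, so the final scatter needs no coordination between elements. That is the reason scan appears as a primitive underneath compaction and similar algorithms.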
For detailed information, see the CUDPP Documentation. A good place to start is the simpleCUDPP Example.