CUDPP 2.0
CUDA Data-Parallel Primitives Library
CUDPP Change Log

Release 2.0
9 August 2011
- New thread-safe public interface: requires creating a CUDPP instance with cudppCreate and passing it to all functions
- Added cudppReduce parallel reductions
- Added cudppTridiagonal parallel tridiagonal linear system solver
- Added cudppHashTable parallel hash table data structure
- Added 64-bit type support (double, long long, and unsigned long long), implemented in cudppReduce, cudppScan, cudppMultiscan, cudppSegmentedScan, cudppCompact, cudppRadixSort, cudppTridiagonal
- Fixed various bugs in cudppSegmentedScan
- Replaced radix sort implementation with thrust::sort() due to performance advantages and simplicity. There are regressions in sort performance for smaller-sized arrays, which will be addressed in the next release.
- Reverse sorting now supported (use CUDPP_OPTION_BACKWARD option when creating the sort plan)
- cudppRadixSort now supports char, uchar, int, uint, float, double, longlong, and ulonglong keys.
- Removed all support for device emulation
- Removed all dependencies on CUTIL; removed common/ subdirectory, added minimal set of app utilities in apps/common/ subdirectory
- Improved coverage of cudpp_testrig

Release: 1.1.1 (Bugfix release)
29 April 2010
- Fix scan, segmented scan, and radix sort correctness on Fermi (sm_20) architecture GPUs (proper use of "volatile" keyword)
- Some initial small optimizations for radix sort and scan on Fermi (sm_20) architecture
- Fix emulation mode radix sort of very small arrays
- Fix radix sort on 64-bit OSes by using __launch_bounds__ in CUDA 3.0
- Minor efficiency improvement to radix sort test in cudpp_testrig
- Fixed incorrect identity for min operator
- Fixes for Unix and Mac OS X Snow Leopard builds
- Fixes for 64-bit Windows builds
- Bibliography updates
- Minor documentation fixes

Release: 1.1
1 July 2009
- Switched to pure BSD license.
- Added new radix sort implementation under cudppSort() (based on Satish et al. IPDPS '09 paper).
  All previous sorts have been removed.
- Added cudppRand() pseudorandom number generation (based on Tzeng and Wei I3D '08 paper).
- Added support for backward segmented scan.
- Fixed satGL example to run in a native window on OS X, rather than an X11 window.
- Removed Visual Studio 7.1 (2003) project files. CUDA 2.1 and later dropped support for VS7.1.
- Miscellaneous bug fixes.
- In docs, added list of publications that use CUDPP, including both text and bibtex citation format.
- In docs, updated list of publications of algorithms included in CUDPP.
- Miscellaneous documentation improvements.

Release: 1.0 alpha
20 April 2008
- Implemented new public interface based on plan objects
- Changed the interface for cudppCompact so that it takes a user-defined isValid flags array rather than computing it based on the values in the input array.
- Fixed various bugs in cudppCompact, and made backwards compact (aka reverse-and-compact) work correctly.
- Fixed emudebug memory fault with global radix sorts of non-block-aligned sizes (change was in setFlagBit)
- Added segmented scan; works with add, multiply, min, max operators
- Added support for inclusive scans and segmented scans
- Scan now supports operators add, multiply, min, and max
- Replaced tree-based "Blelloch" scan algorithm with new "warp scan" algorithm. About 10% faster on current GPUs and far simpler code (no need for bank conflict avoidance code, hacky #defines in inner loop, etc.).
- Added typename keyword in front of "TypeToVector::Result" so that CUDPP compiles with CUDA 2.0.

Release: cudpp_gems3-2
19 November 2007
- Fixed performance regression in cudppScan introduced by CUDA 1.1. Improves performance of scan in CUDA 1.1 by up to 16%.
- Scans with MAX operator now function correctly for signed floats (float MAX identity is now -FLT_MAX, was FLT_MIN)
- Added a 64-bit (Windows XP) configuration to cudpp.vcproj (64-bit Windows currently only minimally tested)
- Changed Makefile to add "64" to the libcudpp names in 64-bit Linux.
- Added changelog

Release: cudpp_gems3-1
5 November 2007
- Initial CUDPP public beta release
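
Note on the MAX identity fix listed under cudpp_gems3-2: the change from FLT_MIN to -FLT_MAX can be illustrated with a small sequential sketch. This is not CUDPP code. exclusiveMaxScan is a hypothetical CPU reference written for this note, and the choice of an exclusive scan is an assumption made so that the identity shows up directly in the output. FLT_MIN is the smallest positive normalized float, so as an identity it overrides every negative input. -FLT_MAX is the most negative finite float, so no finite input is hidden by it.

```cpp
#include <algorithm>
#include <cfloat>
#include <cstddef>
#include <vector>

// Hypothetical sequential reference for an exclusive max-scan
// (illustration only; not the CUDPP GPU implementation).
// out[i] = max(identity, in[0], ..., in[i-1])
std::vector<float> exclusiveMaxScan(const std::vector<float>& in, float identity)
{
    std::vector<float> out(in.size());
    float running = identity;  // value seen "before" the first element
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;                    // exclusive: emit before combining
        running = std::max(running, in[i]);  // fold in the current element
    }
    return out;
}
```

For the all-negative input {-3, -1, -2}, calling exclusiveMaxScan with FLT_MIN yields FLT_MIN in every output slot, because the tiny positive identity is larger than all inputs. Calling it with -FLT_MAX yields {-FLT_MAX, -3, -1}, the expected running maximum.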