CUDPP  2.1
CUDA Data-Parallel Primitives Library
CUDPP Change Log
Release 2.1
9 October 2013
- Added cudppCompress lossless data compression algorithms which implement
the Move-to-Front transform, Burrows-Wheeler Transform, and
Huffman encoding
- Added cudppListRank parallel list ranking
- Renamed cudppSort to cudppRadixSort
- Added cudppMergeSort parallel merge sort
- Added cudppStringSort parallel string sort
- Moved source code to GitHub:
- Moved documentation pages from cudpp.h to
- Added CUDPP_GENCODE_* CMake options to make CUDA target architecture compilation more flexible
Release 2.0
9 August 2011
- New thread-safe public interface -- requires creating a CUDPP instance
with cudppCreate, and passing it to all functions
- Added cudppReduce parallel reductions
- Added cudppTridiagonal parallel tridiagonal linear system solver
- Added cudppHashTable parallel hash table data structure
- Added 64-bit type support (double, long long, and unsigned long long),
implemented in cudppReduce, cudppScan, cudppMultiscan, cudppSegmentedScan,
- Fixed various bugs in cudppSegmentedScan
- Replaced radix sort implementation with thrust::sort() due to performance
advantages and simplicity. There are regressions in sort performance for
smaller-sized arrays, which will be addressed in the next release.
- Reverse sorting now supported (use CUDPP_OPTION_BACKWARD option when
creating the sort plan)
- cudppRadixSort now supports char, uchar, int, uint, float, double, longlong,
and ulonglong keys.
- Removed all support for device emulation
- Removed all dependencies on CUTIL; removed common/ subdirectory, added
minimal set of app utilities in apps/common/ subdirectory
- Improved coverage of cudpp_testrig
Release: 1.1.1 (Bugfix release)
29 April 2010
- Fix scan, segmented scan, and radix sort correctness on Fermi (sm_20)
architecture GPUs (proper use of "volatile" keyword)
- Some initial small optimizations for radix sort and scan on Fermi
(sm_20) architecture
- Fix emulation mode radix sort of very small arrays
- Fix radix sort on 64-bit OSes by using __launch_bounds__ in CUDA 3.0
- Minor efficiency improvement to radix sort test in cudpp_testrig
- Fixed incorrect identity for min operator
- Fixes for Unix and Mac OS X Snow Leopard builds
- Fixes for 64-bit Windows builds
- Bibliography updates
- Minor documentation fixes
Release: 1.1
1 July 2009
- Switched to pure BSD license.
- Added new radix sort implementation under cudppSort() (based on Satish et al.
IPDPS '09 paper). All previous sorts have been removed.
- Added cudppRand() pseudorandom number generation (based on Tzeng and Wei, I3D '08)
- Added support for backward segmented scan.
- Fixed satGL example to run in a native window on OS X, rather than an X11 window.
- Removed Visual Studio 7.1 (2003) project files. CUDA 2.1 and later dropped
support for VS7.1.
- Miscellaneous bug fixes.
- In docs, added list of publications that use CUDPP, including both text and
bibtex citation format.
- In docs, updated list of publications of algorithms included in CUDPP.
- Miscellaneous documentation improvements.
Release: 1.0 alpha
20 April 2008
- Implemented new public interface based on plan objects
- Changed the interface for cudppCompact so that it takes a user-defined isValid
flags array rather than computing it based on the values in the input array.
- Fixed various bugs in cudppCompact, and made backwards compact (aka
reverse-and-compact) work correctly.
- Fixed emudebug memory fault with global radix sorts of non-block-aligned sizes
(change was in setFlagBit)
- Added segmented scan - works with add, multiply, min, max operators
- Added support for inclusive scans and segmented scans
- Scan now supports operators add, multiply, min, and max
- Replaced tree-based "Blelloch" scan algorithm with new "warp scan" algorithm.
About 10% faster on current GPUs and far simpler code (no need for bank conflict
avoidance code, hacky #defines in inner loop, etc.).
- Added typename keyword in front of "TypeToVector::Result" so that CUDPP compiles
with CUDA 2.0.
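
The typename fix in the last entry above can be illustrated with a minimal sketch. The TypeToVector trait and the splat helper below are hypothetical stand-ins written for this example, not CUDPP's actual definitions; only the member name Result is taken from the entry. Because Result depends on the template parameter T, the compiler has to be told that it names a type.

```cpp
// Hypothetical stand-in for a trait that maps an element type to a
// four-component vector type. Only the member name "Result" comes from
// the change log entry; everything else is assumed for illustration.
template <typename T>
struct TypeToVector
{
    struct Result { T x, y, z, w; };
};

// Hypothetical helper that fills all four components with one value.
template <typename T>
typename TypeToVector<T>::Result splat(T value)
{
    // TypeToVector<T>::Result is a dependent name: without `typename`,
    // the compiler may not treat it as a type inside a template.
    typename TypeToVector<T>::Result v = { value, value, value, value };
    return v;
}
```

Calling splat(2.0f) under these assumptions returns a Result whose four members are all 2.0f. Compilers differ in how strictly they enforce the keyword, which is one plausible reason code that built with an older toolchain stops compiling with a newer one.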
Release: cudpp_gems3-2
19 November 2007
- Fixed performance regression in cudppScan introduced by CUDA 1.1. Improves
performance of scan in CUDA 1.1 by up to 16%.
- Scans with MAX operator now function correctly for signed floats (float MAX
identity is now -FLT_MAX, was FLT_MIN)
- Added a 64-bit (Windows XP) configuration to cudpp.vcproj (64-bit Windows
currently only minimally tested)
- Changed Makefile to add "64" to the libcudpp names in 64-bit Linux.
- Added changelog
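
The float MAX identity fix in this release can be illustrated on the host. The following is a minimal CPU sketch, not CUDPP code; exclusiveMaxScan is a hypothetical helper introduced here. FLT_MIN is the smallest positive normalized float, so seeding a max-scan with it hides every negative input, whereas -FLT_MAX is less than or equal to every finite float.

```cpp
#include <algorithm>
#include <cfloat>
#include <cstddef>
#include <vector>

// Hypothetical helper: sequential exclusive max-scan seeded with `identity`.
// out[i] = max(identity, in[0], ..., in[i-1]); out[0] is the identity itself.
std::vector<float> exclusiveMaxScan(const std::vector<float>& in, float identity)
{
    std::vector<float> out(in.size());
    float running = identity;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;                    // max of everything before element i
        running = std::max(running, in[i]);  // fold element i into the running max
    }
    return out;
}
```

For the all-negative input {-3, -1, -2}, seeding with -FLT_MAX gives -1 as the last output (the maximum of -3 and -1), while seeding with FLT_MIN gives FLT_MIN, a tiny positive value that never appeared in the input.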
Release: cudpp_gems3-1
5 November 2007
- Initial CUDPP public beta release