CUDPP: CUDPP Change Log

CUDPP Change Log

Release: 1.1.1 (Bugfix release)
29 April 2010
- Fix scan, segmented scan, and radix sort correctness on Fermi (sm_20) 
  architecture GPUs(proper use of "volatile" keyword)
- Some initial small optimizations for radix sort and scan on Fermi 
  (sm_20) architecture
- Fix emulation mode radix sort of very small arrays
- Fix radix sort on 64-bit OSes by using __launch_bounds__ in CUDA 3.0
- Minor efficiency improvement to radix sort test in cudpp_testrig
- Fixed incorrect identity for min operator
- Fixes for unix and Mac OS X Snow Leopard builds
- Fixes for 64-bit windows builds
- Bibliography updates
- Minor documentation fixes

Release: 1.1
1 July 2009
- Switched to pure BSD license.
- Added new radix sort implementation under cudppSort() (based on Satish et al. 
  IPDPS '09 paper).  All previous sorts have been removed.
- Added cudppRand() pseudorandom number generation (based on Tzeng and Wei I3D 08 
  paper).
- Added support for backward segmented scan.
- Fixed satGL example to run in a native window on OS X, rather than an X11 window.
- Removed Visual Studio 7.1 (2003) project files.  CUDA 2.1 and later dropped 
  support for VS7.1.
- Miscellaneous bug fixes.
- In docs, Added list of publications that use CUDPP, including both text and 
  bibtex citation format.
- In docs, Updated list of publications of algorithms includedin CUDPP.
- Miscellaneous Documentation improvements.

Release: 1.0 alpha
20 April 2008
- Implemented new public interface based on plan objects
- Changed the interface for cudppCompact so that it takes a user-defined isValid 
  flags array rather than 
  computing it based on the values in the input array.
- Fixed various bugs in cudppCompact, and made backwards compact (aka reverse-and-
  compact) work correctly.
- Fixed emudebug memory fault with global radix sorts of non-block-aligned sizes 
  (change was in setFlagBit)
- Added segmented scan - works with add, multiply, min, max operators
- Added support for inclusive scans and segmented scans
- Scan now supports operators add, multiply, min, and max
- Replaced tree-based "Blelloch" scan algorithm with new "warp scan" algorithm.  
  About 10% faster on current GPUs and far simpler code (no need for bank conflict 
  avoidance code, hacky #defines in inner loop, etc.).
- Added typename keyword in front of "TypeToVector::Result" so that CUDPP compiles 
  with CUDA 2.0.

Release: cudpp_gems3-2
19 November 2007
- Fixed performance regression in cudppScan introduced by CUDA 1.1.  Improves 
  performance of scan in CUDA 1.1 by up to 16%.
- Scans with MAX operator now function correctly for signed floats (float MAX 
  identity is now -FLT_MAX, was FLT_MIN)
- Added a 64-bit (windows XP) configuration to cudpp.vcproj (64-bit windows 
  currently only minimally tested)
- Changed Makefile to add "64" to the libcudpp names in 64-bit linux.
- Added changelog

Release: cudpp_gems3-1
5 November 2007
- Initial CUDPP public beta release