CUDPP Change Log


Release 2.2

31 Aug 2014

- Release czar: Leyuan Wang. Thanks!

- Added cudppSuffixArray, a parallel skew algorithm for computing suffix arrays

- Replaced cudppStringSort with cudppSuffixArray in burrowsWheelerTransform

  (part of cudppCompress) to achieve better performance

- Fixed bugs in cudppMoveToFrontTransform where previously only inputs with

  values smaller than 15 worked

- Fixed bugs so that cudppCompress can compress text containing any unsigned char value in the range [1...255]

- Changed test files for cudppCompress and cudppMoveToFrontTransform to

  target the new BWT method

- Added -skiplargetests for MTF tests in order to avoid launch-timed-out errors

- Fixed bugs to make cudppStringSort compatible with compute capability less than 2.0

- Makefile fixes for OS X with clang compilation
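
As a rough illustration of why a suffix array can stand in for a string sort in the Burrows-Wheeler transform, here is a minimal CPU sketch in Python. It is not CUDPP code: the function names are hypothetical, the suffix array is built by naive sorting rather than the parallel skew algorithm, and the 0 sentinel byte is an assumption of this sketch.

```python
def suffix_array(data):
    """Naive suffix array: sort suffix start positions by the suffix itself.

    Quadratic-ish reference only; NOT the skew algorithm named in the changelog.
    """
    return sorted(range(len(data)), key=lambda i: data[i:])


def bwt_from_suffix_array(text):
    """Burrows-Wheeler transform of `text` plus a trailing 0 sentinel.

    Assumes `text` contains no 0 bytes, so the sentinel sorts first.
    """
    data = text + b"\x00"
    sa = suffix_array(data)
    # For each sorted suffix, emit the byte that precedes it (cyclically).
    # data[-1] is the sentinel, which covers the suffix starting at 0.
    return bytes(data[i - 1] for i in sa)
```

Once the suffix array is known, the transform is a single gather over the input, so the sort is the only expensive step.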


Release 2.1

9 October 2013

- Release czar: Edmund Yan. Thanks!

- Added cudppCompress lossless data compression algorithms which implement

  the Move-to-Front transform, Burrows-Wheeler Transform, and

  Huffman encoding

- Added cudppMoveToFrontTransform and cudppBurrowsWheelerTransform

- Added cudppListRank parallel list ranking

- Renamed cudppSort to cudppRadixSort

- Added cudppMergeSort parallel merge sort

- Added cudppStringSort parallel string sort

- Moved source code to GitHub: http://cudpp.github.io

- Moved documentation pages from cudpp.h to README.md

- Added CUDPP_GENCODE_* CMake options to make CUDA target architecture compilation more flexible
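
For readers unfamiliar with the Move-to-Front transform mentioned above, a minimal serial sketch in Python follows. The function names are hypothetical and this is a textbook formulation over a 256-symbol byte alphabet, not the parallel implementation in CUDPP.

```python
def mtf_encode(data):
    """Replace each byte by its current position in a recency list."""
    table = list(range(256))          # initial symbol order: 0, 1, ..., 255
    out = []
    for b in data:
        idx = table.index(b)          # position of the symbol right now
        out.append(idx)
        table.insert(0, table.pop(idx))  # move the symbol to the front
    return out


def mtf_decode(indices):
    """Invert mtf_encode by replaying the same list updates."""
    table = list(range(256))
    out = bytearray()
    for idx in indices:
        b = table.pop(idx)
        out.append(b)
        table.insert(0, b)
    return bytes(out)
```

Runs of repeated symbols, which the Burrows-Wheeler transform tends to produce, become runs of zeros and therefore compress well under Huffman encoding.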


Release 2.0

9 August 2011

- New thread-safe public interface -- requires creating a CUDPP instance

  with cudppCreate, and passing it to all functions

- Added cudppReduce parallel reductions

- Added cudppTridiagonal parallel tridiagonal linear system solver

- Added cudppHashTable parallel hash table data structure

- Added 64-bit type support (double, long long, and unsigned long long),

  implemented in cudppReduce, cudppScan, cudppMultiscan, cudppSegmentedScan,

  cudppCompact, cudppRadixSort, cudppTridiagonal

- Fixed various bugs in cudppSegmentedScan

- Replaced radix sort implementation with thrust::sort() due to performance

  advantages and simplicity. There are regressions in sort performance for

  smaller-sized arrays, which will be addressed in the next release.

- Reverse sorting now supported (use CUDPP_OPTION_BACKWARD option when

  creating the sort plan)

- cudppRadixSort now supports char, uchar, int, uint, float, double, longlong,

  and ulonglong keys.

- Removed all support for device emulation

- Removed all dependencies on CUTIL; removed common/ subdirectory, added

  minimal set of app utilities in apps/common/ subdirectory

- Improved coverage of cudpp_testrig
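
To make concrete what kind of problem a tridiagonal solver such as cudppTridiagonal addresses, here is a serial reference sketch in Python using the classic Thomas algorithm. This is an assumption-laden illustration only: the function name and argument layout are hypothetical, the changelog does not say which parallel method CUDPP uses, and the sketch does no pivoting (it assumes a well-conditioned, e.g. diagonally dominant, system).

```python
def solve_tridiagonal(a, b, c, d):
    """Solve a tridiagonal system with the serial Thomas algorithm.

    a: sub-diagonal (a[0] unused), b: main diagonal,
    c: super-diagonal (c[-1] unused), d: right-hand side.
    """
    n = len(b)
    cp = [0.0] * n                    # modified super-diagonal
    dp = [0.0] * n                    # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward elimination
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    # Back substitution
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

A serial version like this can be useful as a correctness reference when checking the output of a parallel solver.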


Release: 1.1.1 (Bugfix release)

29 April 2010

- Fix scan, segmented scan, and radix sort correctness on Fermi (sm_20)

  architecture GPUs (proper use of "volatile" keyword)

- Some initial small optimizations for radix sort and scan on Fermi

  (sm_20) architecture

- Fix emulation mode radix sort of very small arrays

- Fix radix sort on 64-bit OSes by using __launch_bounds__ in CUDA 3.0

- Minor efficiency improvement to radix sort test in cudpp_testrig

- Fixed incorrect identity for min operator

- Fixes for Unix and Mac OS X Snow Leopard builds

- Fixes for 64-bit Windows builds

- Bibliography updates

- Minor documentation fixes


Release: 1.1

1 July 2009

- Switched to pure BSD license.

- Added new radix sort implementation under cudppSort() (based on Satish et al.

  IPDPS '09 paper).  All previous sorts have been removed.

- Added cudppRand() pseudorandom number generation (based on Tzeng and Wei I3D '08

  paper).

- Added support for backward segmented scan.

- Fixed satGL example to run in a native window on OS X, rather than an X11 window.

- Removed Visual Studio 7.1 (2003) project files.  CUDA 2.1 and later dropped

  support for VS7.1.

- Miscellaneous bug fixes.

- In docs, added list of publications that use CUDPP, including both text and

  BibTeX citation format.

- In docs, updated list of publications of algorithms included in CUDPP.

- Miscellaneous documentation improvements.


Release: 1.0 alpha

20 April 2008

- Implemented new public interface based on plan objects

- Changed the interface for cudppCompact so that it takes a user-defined isValid

  flags array rather than computing it based on the values in the input array.

- Fixed various bugs in cudppCompact, and made backwards compact (aka

  reverse-and-compact) work correctly.

- Fixed emudebug memory fault with global radix sorts of non-block-aligned sizes

  (change was in setFlagBit)

- Added segmented scan - works with add, multiply, min, max operators

- Added support for inclusive scans and segmented scans

- Scan now supports operators add, multiply, min, and max

- Replaced tree-based "Blelloch" scan algorithm with new "warp scan" algorithm.

  About 10% faster on current GPUs and far simpler code (no need for bank conflict

  avoidance code, hacky #defines in inner loop, etc.).

- Added typename keyword in front of "TypeToVector::Result" so that CUDPP compiles

  with CUDA 2.0.
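
The flag-based compact interface described above can be sketched on the CPU as follows. This is a hypothetical Python reference, not the CUDPP implementation: the names are illustrative, and building compaction on an exclusive add-scan of the flags is the standard formulation, assumed here rather than taken from the changelog.

```python
def exclusive_scan(values):
    """Exclusive add-scan: out[i] is the sum of values[0..i-1]."""
    out, running = [], 0
    for v in values:
        out.append(running)
        running += v
    return out


def compact(values, is_valid):
    """Keep values[i] wherever is_valid[i] is truthy, preserving order."""
    flags = [1 if f else 0 for f in is_valid]
    positions = exclusive_scan(flags)      # output slot for each kept element
    count = positions[-1] + flags[-1] if flags else 0
    out = [None] * count
    for v, f, p in zip(values, flags, positions):
        if f:
            out[p] = v                     # scatter to the scanned position
    return out
```

Because validity comes from the caller's flags rather than from the data, any input value can be kept or dropped; for example, `compact([0, 5, 0], [1, 0, 1])` keeps both zeros.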


Release: cudpp_gems3-2

19 November 2007

- Fixed performance regression in cudppScan introduced by CUDA 1.1.  Improves

  performance of scan in CUDA 1.1 by up to 16%.

- Scans with MAX operator now function correctly for signed floats (float MAX

  identity is now -FLT_MAX, was FLT_MIN)

- Added a 64-bit (Windows XP) configuration to cudpp.vcproj (64-bit Windows

  currently only minimally tested)

- Changed Makefile to add "64" to the libcudpp names in 64-bit Linux.

- Added changelog
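
The MAX identity fix above is easy to see in a small CPU sketch. FLT_MIN is the smallest positive normalized float, not the most negative one, so using it as the starting value of a max-scan hides every negative input. The Python below is a hypothetical illustration (the helper names are not from CUDPP); the two constants are built from their IEEE 754 single-precision bit patterns.

```python
import struct


def f32(bits):
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


FLT_MAX = f32(0x7F7FFFFF)   # largest finite float
FLT_MIN = f32(0x00800000)   # smallest positive normalized float (> 0)


def max_scan_inclusive(values, identity):
    """Inclusive max-scan seeded with the given identity element."""
    out, running = [], identity
    for v in values:
        running = max(running, v)
        out.append(running)
    return out
```

With the input `[-3.0, -1.5, -7.0]`, seeding with `FLT_MIN` yields `FLT_MIN` in every position, while seeding with `-FLT_MAX` yields the expected `[-3.0, -1.5, -1.5]`.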


Release: cudpp_gems3-1

5 November 2007

- Initial CUDPP public beta release