CUDPP
2.2
CUDA Data-Parallel Primitives Library
CUDPP application-level scan routines.
#include <cstdlib>
#include <cstdio>
#include <assert.h>
#include "cuda_util.h"
#include "cudpp.h"
#include "cudpp_util.h"
#include "cudpp_plan.h"
#include "cudpp_manager.h"
#include "kernel/segmented_scan_kernel.cuh"
#include "kernel/vector_kernel.cuh"
Functions | |
Segmented Scan Functions | |
template<typename T , class Op , bool isBackward, bool isExclusive, bool doShiftFlagsLeft> | |
void | segmentedScanArrayRecursive (T *d_out, const T *d_idata, const unsigned int *d_iflags, T **d_blockSums, unsigned int **d_blockFlags, unsigned int **d_blockIndices, int numElements, int level, bool sm12OrBetterHw) |
Perform a recursive segmented scan on arrays of arbitrary size. | |
void | allocSegmentedScanStorage (CUDPPSegmentedScanPlan *plan) |
Allocate intermediate block sums, block flags and block indices arrays in a CUDPPSegmentedScanPlan class. | |
void | freeSegmentedScanStorage (CUDPPSegmentedScanPlan *plan) |
Deallocate intermediate block sums, block flags and block indices arrays in a CUDPPSegmentedScanPlan class. | |
template<typename T , bool isBackward, bool isExclusive> | |
void | cudppSegmentedScanDispatchOperator (void *d_out, const void *d_in, const unsigned int *d_iflags, int numElements, const CUDPPSegmentedScanPlan *plan) |
template<bool isBackward, bool isExclusive> | |
void | cudppSegmentedScanDispatchType (void *d_out, const void *d_in, const unsigned int *d_iflags, int numElements, const CUDPPSegmentedScanPlan *plan) |
void | cudppSegmentedScanDispatch (void *d_out, const void *d_in, const unsigned int *d_iflags, int numElements, const CUDPPSegmentedScanPlan *plan) |
Dispatch function to perform a segmented scan on an array with the specified configuration. | |
void segmentedScanArrayRecursive | ( | T * | d_out, |
const T * | d_idata, | ||
const unsigned int * | d_iflags, | ||
T ** | d_blockSums, | ||
unsigned int ** | d_blockFlags, | ||
unsigned int ** | d_blockIndices, | ||
int | numElements, | ||
int | level, | ||
bool | sm12OrBetterHw | ||
) |
Perform a recursive segmented scan on arrays of arbitrary size.
This is the CPU-side workhorse function of the segmented scan engine. This function invokes the CUDA kernels which perform the segmented scan on individual blocks.
Scans of large arrays must be split (possibly recursively) into a hierarchy of block scans, where each block is scanned by a single CUDA thread block. At each recursive level, segmentedScanArrayRecursive first invokes a kernel to scan all blocks of that level; if the level has more than one block, it then calls itself recursively. On returning from each recursive level, the total sum of each block from the level below is added to all elements of the first segment of the corresponding block in this level.
Template parameter T is the data type of the input data. Template parameter Op is the binary operator of the segmented scan. Template parameter isBackward specifies the scan direction: backward if true (not implemented), forward if false. Template parameter isExclusive specifies whether the segmented scan is exclusive (true) or inclusive (false).
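The effect of T, Op and isExclusive on a forward segmented scan can be illustrated with a sequential host-side reference. This is a minimal sketch, not CUDPP code: the names segmentedScanReference and AddOp are hypothetical, and the convention that an operator type exposes identity() and apply() is an assumption made only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical operator type for this sketch (not a CUDPP class).
template <typename T>
struct AddOp {
    static T identity() { return T(0); }
    static T apply(T a, T b) { return a + b; }
};

// Sequential forward segmented scan on the host.
// A nonzero flag marks the first element of a segment.
template <typename T, class Op, bool isExclusive>
std::vector<T> segmentedScanReference(const std::vector<T>& in,
                                      const std::vector<unsigned int>& flags)
{
    assert(in.size() == flags.size());
    std::vector<T> out(in.size());
    T running = Op::identity();
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (flags[i] != 0) {
            running = Op::identity();  // a new segment starts here
        }
        if (isExclusive) {
            out[i] = running;  // exclusive: result excludes in[i]
        }
        running = Op::apply(running, in[i]);
        if (!isExclusive) {
            out[i] = running;  // inclusive: result includes in[i]
        }
    }
    return out;
}
```

For the input {1, 2, 3, 4, 5, 6} with flags {1, 0, 0, 1, 0, 0} and addition, the inclusive variant yields {1, 3, 6, 4, 9, 15} and the exclusive variant yields {0, 1, 3, 0, 4, 9}.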
[out] | d_out | The output array for the segmented scan results |
[in] | d_idata | The input array to be scanned |
[in] | d_iflags | The input flags vector which specifies the segments. The first element of a segment is marked by a 1 in the corresponding position in the d_iflags vector. All other elements of d_iflags are 0. |
[out] | d_blockSums | Array of arrays of per-block sums (one array per recursive level, allocated by allocSegmentedScanStorage()) |
[out] | d_blockFlags | Array of arrays of per-block OR-reductions of flags (one array per recursive level, allocated by allocSegmentedScanStorage()) |
[out] | d_blockIndices | Array of arrays of per-block min-reductions of indices (one array per recursive level, allocated by allocSegmentedScanStorage()). The index for a position i in a block is calculated as follows: if d_iflags[i] is set, it is the 1-based index of that position (e.g. if d_iflags[10] is set, the index is 11); otherwise it is INT_MAX (the identity element of the min operator). |
[in] | numElements | The number of elements in the array to scan |
[in] | level | The current recursive level of the scan |
[in] | sm12OrBetterHw | True if running on sm_12 or higher GPU, false otherwise |
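The roles of the per-block sums, flags and indices described above can be sketched with a two-level inclusive segmented sum on the host. This is an illustration under stated assumptions, not the CUDPP kernels: the function name twoLevelSegmentedSum and the blockSize parameter are hypothetical, indices are taken relative to the start of each block, and the second level is a plain loop where the library would recurse.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Two-level inclusive segmented sum. blockSize is a hypothetical stand-in
// for the number of elements scanned by one thread block.
inline std::vector<int> twoLevelSegmentedSum(const std::vector<int>& in,
                                             const std::vector<unsigned int>& flags,
                                             std::size_t blockSize)
{
    assert(in.size() == flags.size() && blockSize > 0);
    const std::size_t n = in.size();
    const std::size_t numBlocks = (n + blockSize - 1) / blockSize;
    std::vector<int> out(n);
    std::vector<int> blockSums(numBlocks, 0);
    std::vector<unsigned int> blockFlags(numBlocks, 0u);
    std::vector<int> blockIndices(numBlocks, INT_MAX);

    // Level 0: scan each block independently and record its three summaries.
    for (std::size_t b = 0; b < numBlocks; ++b) {
        const std::size_t begin = b * blockSize;
        const std::size_t end = std::min(begin + blockSize, n);
        int running = 0;
        for (std::size_t i = begin; i < end; ++i) {
            if (flags[i] != 0) {
                running = 0;
                blockFlags[b] |= 1u;  // OR-reduction of flags
                // min-reduction of 1-based indices of flagged positions
                blockIndices[b] = std::min(blockIndices[b],
                                           static_cast<int>(i - begin + 1));
            }
            running += in[i];
            out[i] = running;
        }
        blockSums[b] = running;  // sum of the block's last, still open segment
    }

    // Level 1: carry[b] is the part of the open segment that precedes block b.
    // A flag inside block b-1 cuts off everything before that block.
    std::vector<int> carry(numBlocks, 0);
    for (std::size_t b = 1; b < numBlocks; ++b) {
        carry[b] = (blockFlags[b - 1] ? 0 : carry[b - 1]) + blockSums[b - 1];
    }

    // Fix-up: add the carry only to the first segment of each block, i.e. to
    // positions whose 1-based index is below the block's minimum flag index.
    for (std::size_t b = 1; b < numBlocks; ++b) {
        const std::size_t begin = b * blockSize;
        const std::size_t end = std::min(begin + blockSize, n);
        for (std::size_t i = begin; i < end; ++i) {
            if (static_cast<int>(i - begin + 1) < blockIndices[b]) {
                out[i] += carry[b];
            }
        }
    }
    return out;
}
```

A block with no flag keeps INT_MAX as its index, so every element of that block receives the carry. A block whose first element is flagged has index 1 and receives none.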
void allocSegmentedScanStorage | ( | CUDPPSegmentedScanPlan * | plan | ) |
Allocate intermediate block sums, block flags and block indices arrays in a CUDPPSegmentedScanPlan class.
Segmented scans of large arrays must be split (possibly recursively) into a hierarchy of block segmented scans, where each block is scanned by a single CUDA thread block. At each recursive level of the scan, we need an array in which to store the total sums of all blocks in that level. Each level also needs two more arrays: one containing the OR-reductions of the flags of all blocks at that level, and one containing the min-reductions of the indices of all blocks at that level. This function computes the amount of storage needed and allocates it.
[in] | plan | Pointer to CUDPPSegmentedScanPlan object containing segmented scan options and number of elements, which is used to compute storage requirements. |
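The storage computation described above amounts to counting blocks level by level until a single block remains. The following is a rough sketch of that sizing logic only. The function name blockCountsPerLevel and the elementsPerBlock parameter are hypothetical, and the actual per-block element count used by the library is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the number of blocks at each recursive level that needs
// intermediate storage. Each entry is the length of the sums, flags and
// indices arrays for that level. elementsPerBlock is an assumed parameter.
inline std::vector<std::size_t> blockCountsPerLevel(std::size_t numElements,
                                                    std::size_t elementsPerBlock)
{
    assert(elementsPerBlock > 1);  // with 1 the block count would never shrink
    std::vector<std::size_t> counts;
    std::size_t n = numElements;
    for (;;) {
        const std::size_t numBlocks = (n + elementsPerBlock - 1) / elementsPerBlock;
        if (numBlocks <= 1) {
            break;  // a single block needs no further level
        }
        counts.push_back(numBlocks);
        n = numBlocks;  // the next level scans one value per block
    }
    return counts;
}
```

With 1,000,000 elements and an assumed 256 elements per block, this yields two levels of 3907 and 16 blocks. An array that fits in a single block yields no levels.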
void freeSegmentedScanStorage | ( | CUDPPSegmentedScanPlan * | plan | ) |
Deallocate intermediate block sums, block flags and block indices arrays in a CUDPPSegmentedScanPlan class.
These arrays must have been allocated by allocSegmentedScanStorage(), which is called by the constructor of CUDPPSegmentedScanPlan.
[in] | plan | CUDPPSegmentedScanPlan class initialized by its constructor. |
void cudppSegmentedScanDispatch | ( | void * | d_out, |
const void * | d_in, | ||
const unsigned int * | d_iflags, | ||
int | numElements, | ||
const CUDPPSegmentedScanPlan * | plan | ||
) |
Dispatch function to perform a segmented scan on an array with the specified configuration.
This is the dispatch routine which calls segmentedScanArrayRecursive() with appropriate template parameters and arguments to achieve the scan as specified in plan.
[in] | numElements | The number of elements to scan |
[in] | plan | Segmented Scan configuration (plan), initialized by CUDPPSegmentedScanPlan constructor |
[in] | d_in | The input array |
[in] | d_iflags | The input flags array |
[out] | d_out | The output array of segmented scan results |
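The dispatch chain listed above turns runtime plan options into template parameters one step at a time. The pattern can be sketched on the host as follows. Everything in this example is a hypothetical stand-in: SketchPlan, SketchDatatype and the function names are not CUDPP types or enums, and the worker is a simple sequential segmented sum rather than a kernel launch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for illustration; NOT the CUDPP plan or its enums.
enum SketchDatatype { kSketchInt, kSketchFloat };
struct SketchPlan {
    SketchDatatype datatype;
    bool exclusive;
};

// Typed worker: forward segmented sum over host arrays.
template <typename T, bool isExclusive>
void segmentedSumSketch(T* out, const T* in, const unsigned int* flags, std::size_t n)
{
    T running = T(0);
    for (std::size_t i = 0; i < n; ++i) {
        if (flags[i] != 0) running = T(0);  // segment start
        if (isExclusive) out[i] = running;
        running = running + in[i];
        if (!isExclusive) out[i] = running;
    }
}

// Second step: runtime datatype -> template parameter T, casting void pointers.
template <bool isExclusive>
void dispatchTypeSketch(void* out, const void* in, const unsigned int* flags,
                        std::size_t n, const SketchPlan& plan)
{
    switch (plan.datatype) {
    case kSketchInt:
        segmentedSumSketch<int, isExclusive>(static_cast<int*>(out),
                                             static_cast<const int*>(in), flags, n);
        break;
    case kSketchFloat:
        segmentedSumSketch<float, isExclusive>(static_cast<float*>(out),
                                               static_cast<const float*>(in), flags, n);
        break;
    }
}

// First step: runtime option -> template parameter isExclusive.
inline void dispatchSketch(void* out, const void* in, const unsigned int* flags,
                           std::size_t n, const SketchPlan& plan)
{
    assert(out != 0 && in != 0 && flags != 0);
    if (plan.exclusive) {
        dispatchTypeSketch<true>(out, in, flags, n, plan);
    } else {
        dispatchTypeSketch<false>(out, in, flags, n, plan);
    }
}
```

Resolving each option through a branch or switch once, outside the inner loop, lets the compiler generate a specialized worker per combination instead of testing the options per element.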