CUDPP  2.3
CUDA Data-Parallel Primitives Library
segmented_scan_app.cu File Reference

CUDPP application-level scan routines. More...

#include <cstdlib>
#include <cstdio>
#include <assert.h>
#include "cuda_util.h"
#include "cudpp.h"
#include "cudpp_util.h"
#include "cudpp_plan.h"
#include "cudpp_manager.h"
#include "kernel/segmented_scan_kernel.cuh"
#include "kernel/vector_kernel.cuh"

Functions

Segmented Scan Functions
template<typename T , class Op , bool isBackward, bool isExclusive, bool doShiftFlagsLeft>
void segmentedScanArrayRecursive (T *d_out, const T *d_idata, const unsigned int *d_iflags, T **d_blockSums, unsigned int **d_blockFlags, unsigned int **d_blockIndices, int numElements, int level, bool sm12OrBetterHw)
 Perform recursive scan on arbitrary size arrays. More...
 
void allocSegmentedScanStorage (CUDPPSegmentedScanPlan *plan)
 Allocate intermediate block sums, block flags and block indices arrays in a CUDPPSegmentedScanPlan class. More...
 
void freeSegmentedScanStorage (CUDPPSegmentedScanPlan *plan)
 Deallocate intermediate block sums, block flags and block indices arrays in a CUDPPSegmentedScanPlan class. More...
 
template<typename T , bool isBackward, bool isExclusive>
void cudppSegmentedScanDispatchOperator (void *d_out, const void *d_in, const unsigned int *d_iflags, int numElements, const CUDPPSegmentedScanPlan *plan)
 
template<bool isBackward, bool isExclusive>
void cudppSegmentedScanDispatchType (void *d_out, const void *d_in, const unsigned int *d_iflags, int numElements, const CUDPPSegmentedScanPlan *plan)
 
void cudppSegmentedScanDispatch (void *d_out, const void *d_in, const unsigned int *d_iflags, int numElements, const CUDPPSegmentedScanPlan *plan)
 Dispatch function to perform a scan (prefix sum) on an array with the specified configuration. More...
 

Detailed Description

CUDPP application-level scan routines.

segmented_scan_app.cu

Function Documentation

template<typename T , class Op , bool isBackward, bool isExclusive, bool doShiftFlagsLeft>
void segmentedScanArrayRecursive ( T *  d_out,
const T *  d_idata,
const unsigned int *  d_iflags,
T **  d_blockSums,
unsigned int **  d_blockFlags,
unsigned int **  d_blockIndices,
int  numElements,
int  level,
bool  sm12OrBetterHw 
)

Perform recursive scan on arbitrary size arrays.

This is the CPU-side workhorse function of the segmented scan engine. This function invokes the CUDA kernels which perform the segmented scan on individual blocks.

Scans of large arrays must be split (possibly recursively) into a hierarchy of block scans, where each block is scanned by a single CUDA thread block. At each recursive level of the segmentedScanArrayRecursive first invokes a kernel to scan all blocks of that level, and if the level has more than one block, it calls itself recursively. On returning from each recursive level, the total sum of each block from the level below is added to all elements of the first segment of the corresponding block in this level.

Template parameter T is the data type of the input data. Template parameter op is the binary operator of the segmented scan. Template parameter isBackward specifies whether the direction is backward (not implemented). It is forward if it is false. Template parameter isExclusive specifies whether the segmented scan is exclusive (true) or inclusive (false).

Parameters
[out]d_outThe output array for the segmented scan results
[in]d_idataThe input array to be scanned
[in]d_iflagsThe input flags vector which specifies the segments. The first element of a segment is marked by a 1 in the corresponding position in d_iflags vector. All other elements of d_iflags is 0.
[out]d_blockSumsArray of arrays of per-block sums (one array per recursive level, allocated by allocScanStorage())
[out]d_blockFlagsArray of arrays of per-block OR-reductions of flags (one array per recursive level, allocated by allocScanStorage())
[out]d_blockIndicesArray of arrays of per-block min-reductions of indices (one array per recursive level, allocated by allocSegmentedScanStorage()). An index for a particular position i in a block is calculated as - if d_iflags[i] is set then it is the 1-based index of that position (i.e if d_iflags[10] is set then index is 11) otherwise the index is INT_MAX (the identity element of a min operator)
[in]numElementsThe number of elements in the array to scan
[in]levelThe current recursive level of the scan
[in]sm12OrBetterHwTrue if running on sm_12 or higher GPU, false otherwise
void allocSegmentedScanStorage ( CUDPPSegmentedScanPlan plan)

Allocate intermediate block sums, block flags and block indices arrays in a CUDPPSegmentedScanPlan class.

Segmented scans of large arrays must be split (possibly recursively) into a hierarchy of block segmented scans, where each block is scanned by a single CUDA thread block. At each recursive level of the scan, we need an array in which to store the total sums of all blocks in that level. Also at this level we have two more arrays - one which contains the OR-reductions of flags of all blocks at that level and the second which contains the min-reductions of indices of all blocks at that levels This function computes the amount of storage needed and allocates it.

Parameters
[in]planPointer to CUDPPSegmentedScanPlan object containing segmented scan options and number of elements, which is used to compute storage requirements.
void freeSegmentedScanStorage ( CUDPPSegmentedScanPlan plan)

Deallocate intermediate block sums, block flags and block indices arrays in a CUDPPSegmentedScanPlan class.

These arrays must have been allocated by allocSegmentedScanStorage(), which is called by the constructor of CUDPPSegmentedScanPlan.

Parameters
[in]planCUDPPSegmentedScanPlan class initialized by its constructor.
void cudppSegmentedScanDispatch ( void *  d_out,
const void *  d_in,
const unsigned int *  d_iflags,
int  numElements,
const CUDPPSegmentedScanPlan plan 
)

Dispatch function to perform a scan (prefix sum) on an array with the specified configuration.

This is the dispatch routine which calls segmentedScanArrayRecursive() with appropriate template parameters and arguments to achieve the scan as specified in plan.

Parameters
[in]numElementsThe number of elements to scan
[in]planSegmented Scan configuration (plan), initialized by CUDPPSegmentedScanPlan constructor
[in]d_inThe input array
[in]d_iflagsThe input flags array
[out]d_outThe output array of segmented scan results