CUDPP  2.3
CUDA Data-Parallel Primitives Library
reduce_app.cu File Reference

CUDPP application-level reduction routines.

#include <stdio.h>
#include "cuda_util.h"
#include "cudpp_plan.h"
#include "cudpp_util.h"
#include "kernel/reduce_kernel.cuh"

Functions

Reduce Functions
template<class T , class Oper >
void reduceBlocks (T *d_odata, const T *d_idata, size_t numElements, const CUDPPReducePlan *plan)
 Per-block reduction function.
 
template<class Oper , class T >
void reduceArray (T *d_odata, const T *d_idata, size_t numElements, const CUDPPReducePlan *plan)
 Array reduction function.
 
void allocReduceStorage (CUDPPReducePlan *plan)
 Allocate intermediate arrays used by reductions.
 
void freeReduceStorage (CUDPPReducePlan *plan)
 Deallocate intermediate block sums arrays in a CUDPPReducePlan object.
 
void cudppReduceDispatch (void *d_odata, const void *d_idata, size_t numElements, const CUDPPReducePlan *plan)
 Dispatch function to perform a parallel reduction on an array with the specified configuration.
 

Detailed Description

CUDPP application-level reduction routines.

reduce_app.cu

Function Documentation

template<class T , class Oper >
void reduceBlocks ( T *  d_odata,
const T *  d_idata,
size_t  numElements,
const CUDPPReducePlan *  plan 
)

Per-block reduction function.

This function dispatches the reduction kernel appropriate for the block size.

Parameters
[out]  d_odata  The output data pointer. Each block writes a single output element.
[in]  d_idata  The input data pointer.
[in]  numElements  The number of elements to be reduced.
[in]  plan  A pointer to the plan structure for the reduction.
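
The dispatch described above can be illustrated with a CPU-only sketch. It is not the CUDPP implementation: the names OperAdd, reduceBlocksFixed, and reduceBlocksSketch, and the set of supported block sizes, are assumptions made for illustration. The sketch shows one common pattern, a switch that maps a runtime block size onto a routine specialized on that size at compile time, with each block producing a single output element.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical additive operator (not CUDPP's own operator class).
template <class T>
struct OperAdd {
    T identity() const { return T(0); }
    T operator()(T a, T b) const { return a + b; }
};

// CPU stand-in for a kernel specialized on a compile-time block size:
// each "block" reduces up to blockSize consecutive elements to one output.
template <class T, class Oper, unsigned int blockSize>
void reduceBlocksFixed(T *out, const T *in, std::size_t numElements)
{
    Oper op;
    const std::size_t numBlocks = (numElements + blockSize - 1) / blockSize;
    for (std::size_t b = 0; b < numBlocks; ++b) {
        const std::size_t begin = b * blockSize;
        const std::size_t end =
            std::min<std::size_t>(numElements, (b + 1) * blockSize);
        T acc = op.identity();
        for (std::size_t i = begin; i < end; ++i)
            acc = op(acc, in[i]);
        out[b] = acc;  // one output element per block
    }
}

// Maps a runtime block size onto a compile-time specialization.
// Returns false when the block size has no specialization.
template <class T, class Oper>
bool reduceBlocksSketch(T *out, const T *in, std::size_t numElements,
                        unsigned int blockSize)
{
    switch (blockSize) {
    case 256: reduceBlocksFixed<T, Oper, 256>(out, in, numElements); return true;
    case 128: reduceBlocksFixed<T, Oper, 128>(out, in, numElements); return true;
    case 64:  reduceBlocksFixed<T, Oper, 64>(out, in, numElements);  return true;
    case 32:  reduceBlocksFixed<T, Oper, 32>(out, in, numElements);  return true;
    default:  return false;
    }
}
```

In a real CUDA implementation each case would launch a kernel templated on the block size, which lets the compiler unroll the in-block reduction loop.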
template<class Oper , class T >
void reduceArray ( T *  d_odata,
const T *  d_idata,
size_t  numElements,
const CUDPPReducePlan *  plan 
)

Array reduction function.

Performs multi-level reduction on large arrays using reduceBlocks().

Parameters
[out]  d_odata  The output data pointer. This is a pointer to a single element.
[in]  d_idata  The input data pointer.
[in]  numElements  The number of elements to be reduced.
[in]  plan  A pointer to the plan structure for the reduction.
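
A multi-level reduction of this kind can be sketched on the CPU as follows. This is an illustration under stated assumptions, not CUDPP's code: reducePass and reduceArraySketch are hypothetical names, the operator is fixed to addition, and the loop repeats passes until one element remains (a generalization; how many levels the library actually uses is not stated here).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One pass: split the input into chunks of elementsPerBlock elements and
// reduce each chunk to a single partial result (hypothetical helper).
std::vector<long long> reducePass(const std::vector<long long> &in,
                                  std::size_t elementsPerBlock)
{
    const std::size_t numBlocks =
        (in.size() + elementsPerBlock - 1) / elementsPerBlock;
    std::vector<long long> partials(numBlocks, 0);
    for (std::size_t b = 0; b < numBlocks; ++b) {
        const std::size_t begin = b * elementsPerBlock;
        const std::size_t end = std::min(in.size(), begin + elementsPerBlock);
        for (std::size_t i = begin; i < end; ++i)
            partials[b] += in[i];
    }
    return partials;
}

// Repeats per-block passes until a single element is left.
long long reduceArraySketch(std::vector<long long> data,
                            std::size_t elementsPerBlock)
{
    if (data.empty())
        return 0;  // identity of addition
    // Each pass must shrink the array, so a block holds at least 2 elements.
    elementsPerBlock = std::max<std::size_t>(elementsPerBlock, 2);
    while (data.size() > 1)
        data = reducePass(data, elementsPerBlock);
    return data[0];
}
```

On a GPU, each pass would correspond to one call of the per-block routine, with the partial results held in intermediate device memory between passes.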
void allocReduceStorage ( CUDPPReducePlan *  plan)

Allocate intermediate arrays used by reductions.

Reductions of large arrays must be split into multiple blocks, where each block is reduced by a single CUDA thread block. Each block writes its partial result to global memory, where the partial results are reduced to a single element in a second pass.

Parameters
[in,out]  plan  Pointer to CUDPPReducePlan object containing options and number of elements, which is used to compute storage requirements, and within which intermediate storage is allocated.
Todo:
should this flag an error?
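
The storage computation implied by the description above might look like the following sketch. Everything in it is an assumption for illustration: the struct ReduceStorageSketch, the function computeReduceStorage, and its parameters (elements per block, a cap on the number of blocks, element size) are hypothetical and do not reflect the fields of CUDPPReducePlan.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical result type: how many blocks the first pass uses and how
// many bytes of intermediate storage the partial results need.
struct ReduceStorageSketch {
    std::size_t numBlocks;
    std::size_t blockSumsBytes;
};

ReduceStorageSketch computeReduceStorage(std::size_t numElements,
                                         std::size_t elementsPerBlock,
                                         std::size_t maxBlocks,
                                         std::size_t elementSize)
{
    ReduceStorageSketch s = {0, 0};
    if (numElements == 0 || elementsPerBlock == 0 || maxBlocks == 0)
        return s;  // nothing to reduce, or an unusable configuration

    // Ceiling division, capped at the assumed maximum number of blocks.
    const std::size_t needed =
        (numElements + elementsPerBlock - 1) / elementsPerBlock;
    s.numBlocks = std::min(needed, maxBlocks);

    // A single block writes the final result directly, so intermediate
    // storage is only needed when there is more than one block.
    if (s.numBlocks > 1)
        s.blockSumsBytes = s.numBlocks * elementSize;
    return s;
}
```

Under these assumptions, the resulting byte count is what a device allocation call such as cudaMalloc would be given, and the matching deallocation routine would release it.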
void freeReduceStorage ( CUDPPReducePlan *  plan)

Deallocate intermediate block sums arrays in a CUDPPReducePlan object.

These arrays must have been allocated by allocReduceStorage(), which is called by the constructor of CUDPPReducePlan.

Parameters
[in,out]  plan  Pointer to CUDPPReducePlan object initialized by allocReduceStorage().
void cudppReduceDispatch ( void *  d_odata,
const void *  d_idata,
size_t  numElements,
const CUDPPReducePlan *  plan 
)

Dispatch function to perform a parallel reduction on an array with the specified configuration.

This is the dispatch routine, which calls reduceArray() with the appropriate template parameters and arguments to perform the reduction specified in plan.

Parameters
[out]  d_odata  The output location for the reduction result
[in]  d_idata  The input array
[in]  numElements  The number of elements to reduce
[in]  plan  Pointer to CUDPPReducePlan object containing reduce options and intermediate storage
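
The step from untyped void pointers to a templated routine can be sketched as below. This is a CPU-only illustration, not the library's dispatch code: the enums SketchType and SketchOp, the functors AddOp and MaxOp, and the functions reduceTyped, dispatchOp, and reduceDispatchSketch are hypothetical stand-ins for the options carried by the plan and for reduceArray().

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical option enums standing in for the plan's configuration.
enum class SketchType { Int, Float };
enum class SketchOp { Add, Max };

struct AddOp {
    template <class T> T operator()(T a, T b) const { return a + b; }
};
struct MaxOp {
    template <class T> T operator()(T a, T b) const { return std::max(a, b); }
};

// Typed reduction; plays the role of the templated array reduction.
template <class Oper, class T>
void reduceTyped(T *out, const T *in, std::size_t n)
{
    if (n == 0)
        return;  // leave the output untouched for empty input
    Oper op;
    T acc = in[0];
    for (std::size_t i = 1; i < n; ++i)
        acc = op(acc, in[i]);
    *out = acc;
}

// Second level: the element type is known, select the operator.
template <class T>
bool dispatchOp(void *out, const void *in, std::size_t n, SketchOp op)
{
    T *o = static_cast<T *>(out);
    const T *i = static_cast<const T *>(in);
    switch (op) {
    case SketchOp::Add: reduceTyped<AddOp>(o, i, n); return true;
    case SketchOp::Max: reduceTyped<MaxOp>(o, i, n); return true;
    }
    return false;
}

// First level: select the element type, then cast the void pointers.
bool reduceDispatchSketch(void *out, const void *in, std::size_t n,
                          SketchType type, SketchOp op)
{
    switch (type) {
    case SketchType::Int:   return dispatchOp<int>(out, in, n, op);
    case SketchType::Float: return dispatchOp<float>(out, in, n, op);
    }
    return false;
}
```

Keeping the casts in one place means the typed routines never see void pointers, and an unsupported combination can be reported through the return value instead of silently doing nothing.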