CUDPP application-level reduction routines.

#include <stdio.h>
#include "cuda_util.h"
#include "cudpp_plan.h"
#include "cudpp_util.h"
#include "kernel/reduce_kernel.cuh"

Functions
Reduce Functions
template<class T , class Oper >
void	reduceBlocks (T *d_odata, const T *d_idata, size_t numElements, const CUDPPReducePlan *plan)
	Per-block reduction function.
template<class Oper , class T >
void	reduceArray (T *d_odata, const T *d_idata, size_t numElements, const CUDPPReducePlan *plan)
	Array reduction function.
void	allocReduceStorage (CUDPPReducePlan *plan)
	Allocate intermediate arrays used by reductions.
void	freeReduceStorage (CUDPPReducePlan *plan)
	Deallocate intermediate block sums arrays in a CUDPPReducePlan object.
void	cudppReduceDispatch (void *d_odata, const void *d_idata, size_t numElements, const CUDPPReducePlan *plan)
	Dispatch function to perform a parallel reduction on an array with the specified configuration.

Detailed Description

CUDPP application-level reduction routines.

reduce_app.cu

Function Documentation

template<class T , class Oper >

void reduceBlocks	(	T *	d_odata,
		const T *	d_idata,
		size_t	numElements,
		const CUDPPReducePlan *	plan
	)

Per-block reduction function.

This function dispatches the appropriate reduction kernel given the size of the blocks.

Parameters:

[out]	d_odata	The output data pointer. Each block writes a single output element.
[in]	d_idata	The input data pointer.
[in]	numElements	The number of elements to be reduced.
[in]	plan	A pointer to the plan structure for the reduction.
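
The "one output element per block" contract can be illustrated on the host. The following is only a minimal sketch, not the CUDPP kernel: reduceBlocksSketch, SketchAdd and the elementsPerBlock parameter are hypothetical names introduced here, and the sequential loop over blocks stands in for CUDA thread blocks that would run in parallel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical operator functor; name and interface are illustrative only.
template <class T>
struct SketchAdd {
    T identity() const { return T(0); }
    T operator()(T a, T b) const { return a + b; }
};

// Host-side emulation of a per-block reduction: each "block" of up to
// elementsPerBlock inputs is folded into exactly one output element.
template <class T, class Oper>
std::vector<T> reduceBlocksSketch(const std::vector<T>& in,
                                  std::size_t elementsPerBlock,
                                  Oper op = Oper())
{
    assert(elementsPerBlock > 0);
    // Round up so a trailing, partially filled block is still processed.
    std::size_t numBlocks = (in.size() + elementsPerBlock - 1) / elementsPerBlock;
    std::vector<T> out(numBlocks);
    for (std::size_t b = 0; b < numBlocks; ++b) {
        std::size_t begin = b * elementsPerBlock;
        std::size_t end = std::min(begin + elementsPerBlock, in.size());
        T acc = op.identity();
        for (std::size_t i = begin; i < end; ++i) {
            acc = op(acc, in[i]);
        }
        out[b] = acc;  // one output element per block
    }
    return out;
}
```

For example, reducing five integers with two elements per block yields three partial results, the last of which covers a single element.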

template<class Oper , class T >

void reduceArray	(	T *	d_odata,
		const T *	d_idata,
		size_t	numElements,
		const CUDPPReducePlan *	plan
	)

Array reduction function.

Performs multi-level reduction on large arrays using reduceBlocks().

Parameters:

[out]	d_odata	The output data pointer. This is a pointer to a single element.
[in]	d_idata	The input data pointer.
[in]	numElements	The number of elements to be reduced.
[in]	plan	A pointer to the plan structure for the reduction.
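
The multi-level idea can be sketched on the host as repeated passes over shrinking arrays of partial results. This is an illustration under assumptions, not the CUDPP implementation: reducePass, reduceArraySketch and chunkSize are hypothetical names, and the real routine may use a fixed number of levels rather than looping until one element remains.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>  // std::plus, used in the usage note below
#include <vector>

// One pass: fold each chunk of up to chunkSize elements into one partial result.
template <class T, class Oper>
std::vector<T> reducePass(const std::vector<T>& in, std::size_t chunkSize, Oper op)
{
    std::vector<T> out;
    for (std::size_t begin = 0; begin < in.size(); begin += chunkSize) {
        std::size_t end = std::min(begin + chunkSize, in.size());
        T acc = in[begin];
        for (std::size_t i = begin + 1; i < end; ++i) {
            acc = op(acc, in[i]);
        }
        out.push_back(acc);
    }
    return out;
}

// Multi-level reduction: keep reducing the partial results until one remains.
// chunkSize must be at least 2, otherwise a pass would not shrink the array.
template <class Oper, class T>
T reduceArraySketch(const std::vector<T>& in, std::size_t chunkSize, Oper op)
{
    assert(!in.empty() && chunkSize >= 2);
    std::vector<T> level = in;
    while (level.size() > 1) {
        level = reducePass(level, chunkSize, op);
    }
    return level[0];
}
```

A call such as reduceArraySketch(values, 4, std::plus<int>()) sums the vector; any associative binary operator can be passed in place of std::plus.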

void allocReduceStorage ( CUDPPReducePlan * plan )

Allocate intermediate arrays used by reductions.

Reductions of large arrays must be split into multiple blocks, where each block is reduced by a single CUDA thread block. Each block writes its partial result to global memory, where the partial results are reduced to a single element in a second pass.

Parameters:

[in,out] plan Pointer to CUDPPReducePlan object containing options and number of elements, which is used to compute storage requirements, and within which intermediate storage is allocated.

Todo: should this flag an error?
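
The storage requirement described above follows from the block count. The helpers below are a rough sketch of that arithmetic only: partialResultCount, partialStorageBytes and elementsPerBlock are hypothetical, and the actual sizing rule used by allocReduceStorage() may differ (for example, it may cap the number of blocks).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: how many partial results a first pass would produce
// when numElements inputs are split into blocks of elementsPerBlock.
// Returns 0 when one block covers the whole input (no second pass needed).
inline std::size_t partialResultCount(std::size_t numElements,
                                      std::size_t elementsPerBlock)
{
    assert(elementsPerBlock > 0);
    std::size_t numBlocks = (numElements + elementsPerBlock - 1) / elementsPerBlock;
    return numBlocks > 1 ? numBlocks : 0;
}

// Bytes of intermediate storage needed to hold those partial results.
inline std::size_t partialStorageBytes(std::size_t numElements,
                                       std::size_t elementsPerBlock,
                                       std::size_t elementSize)
{
    return partialResultCount(numElements, elementsPerBlock) * elementSize;
}
```

Under these assumptions, 1000 four-byte elements with 256 elements per block need four partial results, or 16 bytes of intermediate storage.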

void freeReduceStorage ( CUDPPReducePlan * plan )

Deallocate intermediate block sums arrays in a CUDPPReducePlan object.

These arrays must have been allocated by allocReduceStorage(), which is called by the CUDPPReducePlan constructor.

Parameters:

[in,out] plan Pointer to CUDPPReducePlan object initialized by allocReduceStorage().

void cudppReduceDispatch	(	void *	d_odata,
		const void *	d_idata,
		size_t	numElements,
		const CUDPPReducePlan *	plan
	)

Dispatch function to perform a parallel reduction on an array with the specified configuration.

This is the dispatch routine which calls reduceArray() with appropriate template parameters and arguments to achieve the reduction as specified in plan.

Parameters:

[out]	d_odata	The output data pointer. This is a pointer to the single reduction result.
[in]	d_idata	The input array
[in]	numElements	The number of elements to reduce
[in]	plan	Pointer to CUDPPReducePlan object containing reduce options and intermediate storage
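
The dispatch pattern, turning untyped pointers plus runtime options into a typed template call, can be sketched as two nested switches. Everything named here (SketchType, SketchOp, SketchPlan, AddOp, MaxOp, reduceTyped, dispatchOp, reduceDispatchSketch) is hypothetical and stands in for the CUDPP definitions, which are not shown in this page; the typed worker runs on the host rather than launching a kernel.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the plan's options; not the CUDPP definitions.
enum SketchType { SKETCH_INT, SKETCH_FLOAT };
enum SketchOp { SKETCH_ADD, SKETCH_MAX };
struct SketchPlan { SketchType type; SketchOp op; };

struct AddOp { template <class T> T operator()(T a, T b) const { return a + b; } };
struct MaxOp { template <class T> T operator()(T a, T b) const { return a > b ? a : b; } };

// Typed worker: folds numElements inputs into *out (host-side stand-in).
template <class Oper, class T>
void reduceTyped(T* out, const T* in, std::size_t numElements)
{
    assert(numElements > 0);
    Oper op;
    T acc = in[0];
    for (std::size_t i = 1; i < numElements; ++i) {
        acc = op(acc, in[i]);
    }
    *out = acc;
}

// Second dispatch level: choose the operator for a known element type.
template <class T>
void dispatchOp(void* out, const void* in, std::size_t n, const SketchPlan* plan)
{
    T* typedOut = static_cast<T*>(out);
    const T* typedIn = static_cast<const T*>(in);
    switch (plan->op) {
    case SKETCH_ADD: reduceTyped<AddOp>(typedOut, typedIn, n); break;
    case SKETCH_MAX: reduceTyped<MaxOp>(typedOut, typedIn, n); break;
    }
}

// First dispatch level: recover the element type from the plan.
inline void reduceDispatchSketch(void* out, const void* in, std::size_t n,
                                 const SketchPlan* plan)
{
    switch (plan->type) {
    case SKETCH_INT:   dispatchOp<int>(out, in, n, plan); break;
    case SKETCH_FLOAT: dispatchOp<float>(out, in, n, plan); break;
    }
}
```

Keeping the switches in one place means callers only ever see the untyped entry point, while each type and operator combination is instantiated once at compile time.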
