Compact Functions
void	calculatCompactLaunchParams (const unsigned int numElements, unsigned int &numThreads, unsigned int &numBlocks, unsigned int &numEltsPerBlock)
	Calculate launch parameters for compactArray().
template<class T >
void	compactArray (T *d_out, size_t *d_numValidElements, const T *d_in, const unsigned int *d_isValid, size_t numElements, const CUDPPCompactPlan *plan)
	Compact the non-zero elements of an array.
void	allocCompactStorage (CUDPPCompactPlan *plan)
	Allocate intermediate arrays used by cudppCompact().
void	freeCompactStorage (CUDPPCompactPlan *plan)
	Deallocate intermediate storage used by cudppCompact().
void	cudppCompactDispatch (void *d_out, size_t *d_numValidElements, const void *d_in, const unsigned int *d_isValid, size_t numElements, const CUDPPCompactPlan *plan)
	Dispatch compactArray for the specified datatype.
RadixSort Functions
typedef unsigned int	uint
template<uint nbits, uint startbit, bool flip, bool unflip>
void	radixSortStep (uint *keys, uint *values, const CUDPPRadixSortPlan *plan, uint numElements)
	Perform one step of the radix sort. Sorts by nbits key bits per step, starting at startbit.
template<bool flip>
void	radixSortSingleBlock (uint *keys, uint *values, uint numElements)
	Single-block optimization for sorts of fewer than 4 * CTA_SIZE elements.
void	radixSort (uint *keys, uint *values, const CUDPPRadixSortPlan *plan, size_t numElements, bool flipBits, int keyBits)
	Main radix sort function.
void	radixSortFloatKeys (float *keys, uint *values, const CUDPPRadixSortPlan *plan, size_t numElements, bool negativeKeys, int keyBits)
	Wrapper to call main radix sort function. For float configuration.
template<uint nbits, uint startbit, bool flip, bool unflip>
void	radixSortStepKeysOnly (uint *keys, const CUDPPRadixSortPlan *plan, uint numElements)
	Perform one step of the radix sort. Sorts by nbits key bits per step, starting at startbit.
template<bool flip>
void	radixSortSingleBlockKeysOnly (uint *keys, uint numElements)
	Optimization for sorts of fewer than 4 * CTA_SIZE elements (keys only).
void	radixSortKeysOnly (uint *keys, const CUDPPRadixSortPlan *plan, bool flipBits, size_t numElements, int keyBits)
	Main radix sort function. For keys only configuration.
void	radixSortFloatKeysOnly (float *keys, const CUDPPRadixSortPlan *plan, bool negativeKeys, size_t numElements, int keyBits)
	Wrapper to call main radix sort function. For floats and keys only.
void	initDeviceParameters (CUDPPRadixSortPlan *plan)
void	allocRadixSortStorage (CUDPPRadixSortPlan *plan)
	From the programmer-specified sort configuration, creates internal memory for performing the sort.
void	freeRadixSortStorage (CUDPPRadixSortPlan *plan)
	Deallocates intermediate memory from allocRadixSortStorage.
void	cudppRadixSortDispatch (void *keys, void *values, size_t numElements, int keyBits, const CUDPPRadixSortPlan *plan)
	Dispatch function to perform a sort on an array with a specified configuration.
Scan Functions
template<class T , bool isBackward, bool isExclusive, CUDPPOperator op>
void	scanArrayRecursive (T *d_out, const T *d_in, T **d_blockSums, size_t numElements, size_t numRows, const size_t *rowPitches, int level)
	Perform recursive scan on arbitrary size arrays.
void	allocScanStorage (CUDPPScanPlan *plan)
	Allocate intermediate arrays used by scan.
void	freeScanStorage (CUDPPScanPlan *plan)
	Deallocate intermediate block sums arrays in a CUDPPScanPlan object.
void	cudppScanDispatch (void *d_out, const void *d_in, size_t numElements, size_t numRows, const CUDPPScanPlan *plan)
	Dispatch function to perform a scan (prefix sum) on an array with the specified configuration.
Segmented Scan Functions
template<class T , CUDPPOperator op, bool isBackward, bool isExclusive, bool doShiftFlagsLeft>
void	segmentedScanArrayRecursive (T *d_out, const T *d_idata, const unsigned int *d_iflags, T **d_blockSums, unsigned int **d_blockFlags, unsigned int **d_blockIndices, int numElements, int level)
	Perform recursive scan on arbitrary size arrays.
void	allocSegmentedScanStorage (CUDPPSegmentedScanPlan *plan)
	Allocate intermediate block sums, block flags and block indices arrays in a CUDPPSegmentedScanPlan class.
void	freeSegmentedScanStorage (CUDPPSegmentedScanPlan *plan)
	Deallocate intermediate block sums, block flags and block indices arrays in a CUDPPSegmentedScanPlan class.
void	cudppSegmentedScanDispatch (void *d_out, const void *d_idata, const unsigned int *d_iflags, int numElements, const CUDPPSegmentedScanPlan *plan)
	Dispatch function to perform a scan (prefix sum) on an array with the specified configuration.
Sparse Matrix-Vector Multiply Functions
template<class T >
void	sparseMatrixVectorMultiply (T *d_y, const T *d_x, const CUDPPSparseMatrixVectorMultiplyPlan *plan)
	Perform matrix-vector multiply for sparse matrices and vectors of arbitrary size.
void	allocSparseMatrixVectorMultiplyStorage (CUDPPSparseMatrixVectorMultiplyPlan *plan, const void *A, const unsigned int *rowindx, const unsigned int *indx)
	Allocate intermediate product, flags and rowFindx (index of the last element of each row) array.
void	freeSparseMatrixVectorMultiplyStorage (CUDPPSparseMatrixVectorMultiplyPlan *plan)
	Deallocate intermediate product, flags and rowFindx (index of the last element of each row) array.
void	cudppSparseMatrixVectorMultiplyDispatch (void *d_y, const void *d_x, const CUDPPSparseMatrixVectorMultiplyPlan *plan)
	Dispatch function to perform a sparse matrix-vector multiply with the specified configuration.

Detailed Description

The CUDPP Application-Level API contains functions that run on the host CPU and invoke GPU routines in the CUDPP Kernel-Level API. Application-Level API functions are used by CUDPP Public Interface functions to implement CUDPP's core functionality.

Function Documentation

void calculatCompactLaunchParams	(	const unsigned int	numElements,
		unsigned int &	numThreads,
		unsigned int &	numBlocks,
		unsigned int &	numEltsPerBlock
	)

Calculate launch parameters for compactArray().

Calculates the block size and number of blocks from the total number of elements and the maximum threads per block. Called by compactArray().

The number of blocks is computed by dividing the number of input elements by the product of the number of threads in each CTA and the number of elements each thread processes. numThreads and numEltsPerBlock follow directly from that. Note that when numElements is not an exact multiple of SCAN_ELTS_PER_THREAD * CTA_SIZE, some threads do nothing, or one thread processes fewer than SCAN_ELTS_PER_THREAD elements.

Parameters:

[in]	numElements	Number of elements to compact
[out]	numThreads	Number of threads in each block
[out]	numBlocks	Number of blocks
[out]	numEltsPerBlock	Number of elements processed per block
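
The arithmetic described above can be sketched on the host as follows. This is only an illustration: the function name sketchCompactLaunchParams and the constant values chosen for CTA_SIZE and SCAN_ELTS_PER_THREAD are assumptions, not the values or code used inside CUDPP.

```cpp
#include <algorithm>
#include <cassert>

// Assumed illustrative values; the real constants are defined inside CUDPP.
constexpr unsigned int CTA_SIZE = 128;
constexpr unsigned int SCAN_ELTS_PER_THREAD = 8;

// Hypothetical re-implementation of the launch-parameter arithmetic.
void sketchCompactLaunchParams(unsigned int numElements,
                               unsigned int &numThreads,
                               unsigned int &numBlocks,
                               unsigned int &numEltsPerBlock)
{
    assert(numElements > 0);
    const unsigned int eltsPerFullBlock = SCAN_ELTS_PER_THREAD * CTA_SIZE;

    // Ceiling division: a partially filled last block still has to be launched.
    numBlocks = std::max(1u, (numElements + eltsPerFullBlock - 1) / eltsPerFullBlock);

    // A single block only needs enough threads to cover the input;
    // with several blocks, every block uses the full CTA.
    numThreads = (numBlocks > 1)
        ? CTA_SIZE
        : (numElements + SCAN_ELTS_PER_THREAD - 1) / SCAN_ELTS_PER_THREAD;

    numEltsPerBlock = numThreads * SCAN_ELTS_PER_THREAD;
}
```

With these assumed constants, 1000 elements fit in one block of 125 threads, while 2049 elements need three blocks of 128 threads, the last of which is mostly idle.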

template<class T >

void compactArray	(	T *	d_out,
		size_t *	d_numValidElements,
		const T *	d_in,
		const unsigned int *	d_isValid,
		size_t	numElements,
		const CUDPPCompactPlan *	plan
	)

Compact the non-zero elements of an array.

Given an input array d_in, compactArray() outputs a compacted version that omits the null (zero) elements. It also outputs the number of non-zero elements in the compacted array. Called by cudppCompactDispatch().

The algorithm involves two steps; most of the complexity is hidden in scan, invoked with cudppScanDispatch().

1. scanArray() performs a prefix sum on d_isValid to compute output indices.
2. compactData() takes d_in and an intermediate array of output indices as input and writes the values with valid flags in d_isValid into d_out using the output indices.

Parameters:

[out]	d_out	Array of compacted non-null elements
[out]	d_numValidElements	Pointer to a size_t to store the number of non-null elements
[in]	d_in	Input array
[in]	d_isValid	Array of flags, 1 for each non-null element, 0 for each null element. Same length as d_in
[in]	numElements	Number of elements in input array
[in]	plan	Pointer to the plan object used for this compact
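
The two steps (scan the flags, then scatter) can be illustrated with a sequential CPU reference. The function compactReference below is a hypothetical helper written for this page, not part of CUDPP, and it assumes the flags are exactly 0 or 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for scan-then-scatter compaction.
template <class T>
std::vector<T> compactReference(const std::vector<T> &in,
                                const std::vector<unsigned int> &isValid,
                                std::size_t &numValidElements)
{
    assert(in.size() == isValid.size());
    const std::size_t n = in.size();

    // Step 1: exclusive prefix sum of the flags. indices[i] is the output
    // slot of element i if that element is valid.
    std::vector<std::size_t> indices(n);
    std::size_t running = 0;
    for (std::size_t i = 0; i < n; ++i) {
        indices[i] = running;
        running += isValid[i];
    }
    numValidElements = running;

    // Step 2: scatter each valid element to its computed slot.
    std::vector<T> out(numValidElements);
    for (std::size_t i = 0; i < n; ++i) {
        if (isValid[i]) {
            out[indices[i]] = in[i];
        }
    }
    return out;
}
```

Because the scatter targets come from a prefix sum, the valid elements keep their relative order in the output.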

void allocCompactStorage ( CUDPPCompactPlan * plan )

Allocate intermediate arrays used by cudppCompact().

In addition to the internal CUDPPScanPlan contained in CUDPPCompactPlan, CUDPPCompact also needs a temporary device array of output indices, which is allocated by this function.

Parameters:

plan	Pointer to CUDPPCompactPlan object within which intermediate storage is allocated.

void freeCompactStorage ( CUDPPCompactPlan * plan )

Deallocate intermediate storage used by cudppCompact().

Deallocates the output indices array allocated by allocCompactStorage().

Parameters:

plan	Pointer to CUDPPCompactPlan object initialized by allocCompactStorage().

void cudppCompactDispatch	(	void *	d_out,
		size_t *	d_numValidElements,
		const void *	d_in,
		const unsigned int *	d_isValid,
		size_t	numElements,
		const CUDPPCompactPlan *	plan
	)

Dispatch compactArray for the specified datatype.

A thin wrapper on top of compactArray which calls compactArray() for the data type specified in config. This is the app-level interface to compact used by cudppCompact().

Parameters:

[out]	d_out	Compacted array of non-zero elements
[out]	d_numValidElements	Pointer to a size_t to store the number of non-zero elements
[in]	d_in	Input array
[in]	d_isValid	Array of boolean valid flags with same length as d_in
[in]	numElements	Number of elements to compact
[in]	plan	Pointer to plan object for this compact

template<uint nbits, uint startbit, bool flip, bool unflip>

void radixSortStep	(	uint *	keys,
		uint *	values,
		const CUDPPRadixSortPlan *	plan,
		uint	numElements
	)

Perform one step of the radix sort. Sorts by nbits key bits per step, starting at startbit.

Uses cudppScanDispatch() for the prefix sum of radix counters.

Parameters:

[in,out]	keys	Keys to be sorted.
[in,out]	values	Associated values to be sorted (through keys).
[in]	plan	Configuration information for RadixSort.
[in]	numElements	Number of elements in the sort.
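
To make "sorts by nbits key bits per step, starting at startbit" concrete, here is a sequential sketch of a single radix pass: histogram the selected digit, prefix-sum the counters, then scatter stably. The name radixPassReference is hypothetical, and the GPU implementation is organized differently (per-block sorting and counters); only the digit selection and the role of the prefix sum are meant to carry over.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using uint = unsigned int;

// Hypothetical CPU analogue of one radix pass; not the CUDPP kernel sequence.
template <uint nbits, uint startbit>
void radixPassReference(std::vector<uint> &keys, std::vector<uint> &values)
{
    assert(keys.size() == values.size());
    constexpr uint numBuckets = 1u << nbits;
    constexpr uint mask = numBuckets - 1;

    // Count how many keys fall into each digit bucket.
    std::vector<std::size_t> offsets(numBuckets, 0);
    for (uint k : keys) {
        ++offsets[(k >> startbit) & mask];
    }

    // Exclusive prefix sum turns the counts into starting offsets.
    std::size_t running = 0;
    for (std::size_t &o : offsets) {
        const std::size_t count = o;
        o = running;
        running += count;
    }

    // Stable scatter: equal digits keep their previous relative order,
    // which is what lets successive passes build a full sort.
    std::vector<uint> outKeys(keys.size()), outValues(values.size());
    for (std::size_t i = 0; i < keys.size(); ++i) {
        const std::size_t dst = offsets[(keys[i] >> startbit) & mask]++;
        outKeys[dst] = keys[i];
        outValues[dst] = values[i];
    }
    keys.swap(outKeys);
    values.swap(outValues);
}
```

Running the pass for startbit = 0, nbits, 2 * nbits, and so on, up to the number of key bits, yields a complete least-significant-digit radix sort.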

template<bool flip>

void radixSortSingleBlock	(	uint *	keys,
		uint *	values,
		uint	numElements
	)

Single-block optimization for sorts of fewer than 4 * CTA_SIZE elements.

Parameters:

[in,out]	keys	Keys to be sorted.
[in,out]	values	Associated values to be sorted (through keys).
[in]	numElements	Number of elements in the sort.

void radixSort	(	uint *	keys,
		uint *	values,
		const CUDPPRadixSortPlan *	plan,
		size_t	numElements,
		bool	flipBits,
		int	keyBits
	)

Main radix sort function.

Main radix sort function. Sorts in place in the keys and values arrays, but uses the other device arrays as temporary storage. All pointer parameters are device pointers. Uses cudppScan() for the prefix sum of radix counters.

Parameters:

[in,out]	keys	Keys to be sorted.
[in,out]	values	Associated values to be sorted (through keys).
[in]	plan	Configuration information for RadixSort.
[in]	numElements	Number of elements in the sort.
[in]	flipBits	True if the key datatype is float and may contain negative numbers, which enables the special float sorting operations.
[in]	keyBits	Number of interesting bits in the key

void radixSortFloatKeys	(	float *	keys,
		uint *	values,
		const CUDPPRadixSortPlan *	plan,
		size_t	numElements,
		bool	negativeKeys,
		int	keyBits
	)

Wrapper to call main radix sort function. For float configuration.

Calls the main radix sort function. For float configuration.

Parameters:

[in,out]	keys	Keys to be sorted.
[in,out]	values	Associated values to be sorted (through keys).
[in]	plan	Configuration information for RadixSort.
[in]	numElements	Number of elements in the sort.
[in]	negativeKeys	True if the keys may contain negative numbers.
[in]	keyBits	Number of interesting bits in the key
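
Float keys can be sorted by an unsigned-integer radix sort if their bit patterns are first remapped so that unsigned order matches float order. The sketch below shows the commonly used mapping; it is an assumption that the flip and unflip template flags and the negativeKeys parameter correspond to this transformation, and the helper names floatFlip and floatUnflip are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Map a float to a 32-bit key whose unsigned order matches float order.
inline std::uint32_t floatFlip(float f)
{
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    // Negative values: invert every bit (reverses their order).
    // Non-negative values: set the sign bit (places them above negatives).
    const std::uint32_t mask = (u & 0x80000000u) ? 0xFFFFFFFFu : 0x80000000u;
    return u ^ mask;
}

// Inverse of floatFlip.
inline float floatUnflip(std::uint32_t u)
{
    // After flipping, originally non-negative values have the top bit set.
    const std::uint32_t mask = (u & 0x80000000u) ? 0x80000000u : 0xFFFFFFFFu;
    u ^= mask;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}
```

If no key is negative, plain IEEE 754 bit patterns already sort correctly as unsigned integers, which would explain why the flip is controlled by a flag rather than always applied.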

template<uint nbits, uint startbit, bool flip, bool unflip>

void radixSortStepKeysOnly	(	uint *	keys,
		const CUDPPRadixSortPlan *	plan,
		uint	numElements
	)

Perform one step of the radix sort. Sorts by nbits key bits per step, starting at startbit.

Parameters:

[in,out]	keys	Keys to be sorted.
[in]	plan	Configuration information for RadixSort.
[in]	numElements	Number of elements in the sort.

template<bool flip>

void radixSortSingleBlockKeysOnly	(	uint *	keys,
		uint	numElements
	)

Optimization for sorts of fewer than 4 * CTA_SIZE elements (keys only).

Parameters:

[in,out]	keys	Keys to be sorted.
[in]	numElements	Number of elements in the sort.

void radixSortKeysOnly	(	uint *	keys,
		const CUDPPRadixSortPlan *	plan,
		bool	flipBits,
		size_t	numElements,
		int	keyBits
	)

Main radix sort function. For keys only configuration.

Main radix sort function. Sorts in place in the keys array, but uses the other device arrays as temporary storage. All pointer parameters are device pointers. Uses scan for the prefix sum of radix counters.

Parameters:

[in,out]	keys	Keys to be sorted.
[in]	plan	Configuration information for RadixSort.
[in]	flipBits	True if the key datatype is float and may contain negative numbers, which enables the special float sorting operations.
[in]	numElements	Number of elements in the sort.
[in]	keyBits	Number of interesting bits in the key

void radixSortFloatKeysOnly	(	float *	keys,
		const CUDPPRadixSortPlan *	plan,
		bool	negativeKeys,
		size_t	numElements,
		int	keyBits
	)

Wrapper to call main radix sort function. For floats and keys only.

Calls the radixSortKeysOnly function setting parameters for floats.

Parameters:

[in,out]	keys	Keys to be sorted.
[in]	plan	Configuration information for RadixSort.
[in]	negativeKeys	True if flipBits should be set to true in radixSortKeysOnly().
[in]	numElements	Number of elements in the sort.
[in]	keyBits	Number of interesting bits in the key

void allocRadixSortStorage ( CUDPPRadixSortPlan * plan )

From the programmer-specified sort configuration, creates internal memory for performing the sort.

Parameters:

[in] plan Pointer to CUDPPRadixSortPlan object

void freeRadixSortStorage ( CUDPPRadixSortPlan * plan )

Deallocates intermediate memory from allocRadixSortStorage.

Parameters:

[in] plan Pointer to CUDPPRadixSortPlan object

void cudppRadixSortDispatch	(	void *	keys,
		void *	values,
		size_t	numElements,
		int	keyBits,
		const CUDPPRadixSortPlan *	plan
	)

Dispatch function to perform a sort on an array with a specified configuration.

This is the dispatch routine which calls radixSort...() with appropriate template parameters and arguments as specified by the plan.

Parameters:

[in,out]	keys	Keys to be sorted.
[in,out]	values	Associated values to be sorted (through keys).
[in]	numElements	Number of elements in the sort.
[in]	keyBits	Number of interesting bits in the key
[in]	plan	Configuration information for RadixSort.

template<class T , bool isBackward, bool isExclusive, CUDPPOperator op>

void scanArrayRecursive	(	T *	d_out,
		const T *	d_in,
		T **	d_blockSums,
		size_t	numElements,
		size_t	numRows,
		const size_t *	rowPitches,
		int	level
	)

Perform recursive scan on arbitrary size arrays.

This is the CPU-side workhorse function of the scan engine. This function invokes the CUDA kernels which perform the scan on individual blocks.

Scans of large arrays must be split (possibly recursively) into a hierarchy of block scans, where each block is scanned by a single CUDA thread block. At each recursive level, scanArrayRecursive first invokes a kernel to scan all blocks of that level, and if the level has more than one block, it calls itself recursively. On returning from each recursive level, the total sum of each block from the level below is added to all elements of the corresponding block in this level. See "Parallel Prefix Sum (Scan) in CUDA" for more information (see References).

Template parameter T is the datatype; isBackward specifies backward or forward scan; isExclusive specifies exclusive or inclusive scan, and op specifies the binary associative operator to be used.

Parameters:

[out]	d_out	The output array for the scan results
[in]	d_in	The input array to be scanned
[out]	d_blockSums	Array of arrays of per-block sums (one array per recursive level, allocated by allocScanStorage())
[in]	numElements	The number of elements in the array to scan
[in]	numRows	The number of rows in the array to scan
[in]	rowPitches	Array of row pitches (one pitch per recursive level, allocated by allocScanStorage())
[in]	level	The current recursive level of the scan
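
The recursion described above can be mimicked sequentially. The sketch below performs an exclusive forward sum scan of a single row; the name scanRecursiveReference and the tiny block size kBlockSize are assumptions chosen so that the recursion is easy to trace, and each loop over a block stands in for one CUDA thread block.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical block size; one "block" stands in for one CUDA thread block.
constexpr std::size_t kBlockSize = 4;

// Exclusive forward sum scan that recurses over the per-block totals.
void scanRecursiveReference(std::vector<int> &out, const std::vector<int> &in)
{
    const std::size_t n = in.size();
    out.assign(n, 0);
    if (n == 0) {
        return;
    }
    const std::size_t numBlocks = (n + kBlockSize - 1) / kBlockSize;
    std::vector<int> blockSums(numBlocks, 0);

    // Scan every block independently and record its total.
    for (std::size_t b = 0; b < numBlocks; ++b) {
        int running = 0;
        const std::size_t end = std::min(n, (b + 1) * kBlockSize);
        for (std::size_t i = b * kBlockSize; i < end; ++i) {
            out[i] = running;
            running += in[i];
        }
        blockSums[b] = running;
    }

    if (numBlocks > 1) {
        // Recurse on the block totals, then add each scanned total to
        // every element of the corresponding block.
        std::vector<int> scannedSums;
        scanRecursiveReference(scannedSums, blockSums);
        for (std::size_t i = 0; i < n; ++i) {
            out[i] += scannedSums[i / kBlockSize];
        }
    }
}
```

With 20 input elements and a block size of 4, the sketch recurses twice: 5 block sums at the first level, then 2 at the second.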

void allocScanStorage ( CUDPPScanPlan * plan )

Allocate intermediate arrays used by scan.

Scans of large arrays must be split (possibly recursively) into a hierarchy of block scans, where each block is scanned by a single CUDA thread block. At each recursive level of the scan, we need an array in which to store the total sums of all blocks in that level. This function computes the amount of storage needed and allocates it.

Parameters:

plan	Pointer to CUDPPScanPlan object containing options and number of elements, which is used to compute storage requirements, and within which intermediate storage is allocated.
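
The storage computation amounts to repeatedly dividing the element count by the number of elements one block handles until a single block remains. A minimal sketch, assuming a hypothetical helper blockSumLevelSizes and a caller-supplied eltsPerBlock (the value CUDPP actually uses is not stated here):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of block sums needed at each recursive level.
// An empty result means the whole input fits in one block.
std::vector<std::size_t> blockSumLevelSizes(std::size_t numElements,
                                            std::size_t eltsPerBlock)
{
    assert(eltsPerBlock > 1);
    std::vector<std::size_t> sizes;
    std::size_t n = numElements;
    do {
        // Ceiling division gives the number of blocks at this level.
        const std::size_t numBlocks = (n + eltsPerBlock - 1) / eltsPerBlock;
        if (numBlocks > 1) {
            sizes.push_back(numBlocks);  // one sum per block must be stored
        }
        n = numBlocks;  // the block sums become the next level's input
    } while (n > 1);
    return sizes;
}
```

Each entry of the returned vector is the length of one intermediate array, so the number of entries is the recursion depth.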

void freeScanStorage ( CUDPPScanPlan * plan )

Deallocate intermediate block sums arrays in a CUDPPScanPlan object.

These arrays must have been allocated by allocScanStorage(), which is called by the constructor of CUDPPScanPlan.

Parameters:

plan	Pointer to CUDPPScanPlan object initialized by allocScanStorage().

void cudppScanDispatch	(	void *	d_out,
		const void *	d_in,
		size_t	numElements,
		size_t	numRows,
		const CUDPPScanPlan *	plan
	)

Dispatch function to perform a scan (prefix sum) on an array with the specified configuration.

This is the dispatch routine which calls scanArrayRecursive() with appropriate template parameters and arguments to achieve the scan as specified in plan.

Parameters:

[out]	d_out	The output array of scan results
[in]	d_in	The input array
[in]	numElements	The number of elements to scan
[in]	numRows	The number of rows to scan in parallel
[in]	plan	Pointer to CUDPPScanPlan object containing scan options and intermediate storage
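
A dispatch routine of this kind turns runtime options into compile-time template arguments. The following sketch shows the pattern only: SketchScanOptions, scanTyped and scanDispatchSketch are hypothetical names, the element type is fixed to int, the operator is fixed to addition, and a sequential loop stands in for the GPU scan.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the options stored in a scan plan.
struct SketchScanOptions {
    bool backward;
    bool exclusive;
};

// Sequential stand-in for the templated scan engine (addition only).
template <class T, bool isBackward, bool isExclusive>
void scanTyped(T *out, const T *in, std::size_t n)
{
    T running = T(0);
    for (std::size_t step = 0; step < n; ++step) {
        const std::size_t i = isBackward ? n - 1 - step : step;
        if (isExclusive) {
            out[i] = running;
            running += in[i];
        } else {
            running += in[i];
            out[i] = running;
        }
    }
}

// Runtime options select exactly one compile-time instantiation.
void scanDispatchSketch(void *out, const void *in, std::size_t n,
                        const SketchScanOptions &opts)
{
    assert(n == 0 || (out != nullptr && in != nullptr));
    int *o = static_cast<int *>(out);
    const int *i = static_cast<const int *>(in);
    if (opts.backward) {
        if (opts.exclusive) scanTyped<int, true, true>(o, i, n);
        else                scanTyped<int, true, false>(o, i, n);
    } else {
        if (opts.exclusive) scanTyped<int, false, true>(o, i, n);
        else                scanTyped<int, false, false>(o, i, n);
    }
}
```

Resolving the options once at the dispatch level keeps branches on direction and exclusivity out of the inner loops, at the cost of one instantiation per combination.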

template<class T , CUDPPOperator op, bool isBackward, bool isExclusive, bool doShiftFlagsLeft>

void segmentedScanArrayRecursive	(	T *	d_out,
		const T *	d_idata,
		const unsigned int *	d_iflags,
		T **	d_blockSums,
		unsigned int **	d_blockFlags,
		unsigned int **	d_blockIndices,
		int	numElements,
		int	level
	)

Perform recursive scan on arbitrary size arrays.

This is the CPU-side workhorse function of the segmented scan engine. This function invokes the CUDA kernels which perform the segmented scan on individual blocks.

Scans of large arrays must be split (possibly recursively) into a hierarchy of block scans, where each block is scanned by a single CUDA thread block. At each recursive level, segmentedScanArrayRecursive first invokes a kernel to scan all blocks of that level, and if the level has more than one block, it calls itself recursively. On returning from each recursive level, the total sum of each block from the level below is added to all elements of the first segment of the corresponding block in this level.

Template parameter T is the data type of the input data. Template parameter op is the binary operator of the segmented scan. Template parameter isBackward specifies whether the direction is backward (not implemented). It is forward if it is false. Template parameter isExclusive specifies whether the segmented scan is exclusive (true) or inclusive (false).

Parameters:

[out]	d_out	The output array for the segmented scan results
[in]	d_idata	The input array to be scanned
[in]	d_iflags	The input flags vector which specifies the segments. The first element of a segment is marked by a 1 in the corresponding position in the d_iflags vector. All other elements of d_iflags are 0.
[out]	d_blockSums	Array of arrays of per-block sums (one array per recursive level, allocated by allocSegmentedScanStorage())
[out]	d_blockFlags	Array of arrays of per-block OR-reductions of flags (one array per recursive level, allocated by allocSegmentedScanStorage())
[out]	d_blockIndices	Array of arrays of per-block min-reductions of indices (one array per recursive level, allocated by allocSegmentedScanStorage()). The index for a position `i` in a block is computed as follows: if `d_iflags`[i] is set, it is the 1-based index of that position (i.e., if `d_iflags`[10] is set, the index is `11`); otherwise it is `INT_MAX` (the identity element of the min operator)
[in]	numElements	The number of elements in the array to scan
[in]	level	The current recursive level of the scan
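
For reference, the result a forward segmented sum scan is expected to produce can be computed sequentially. The helper segmentedScanReference is hypothetical and covers addition only; it follows the flag convention described for d_iflags, where a 1 marks the first element of a segment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sequential reference for a forward segmented sum scan.
std::vector<int> segmentedScanReference(const std::vector<int> &data,
                                        const std::vector<unsigned int> &flags,
                                        bool exclusive)
{
    assert(data.size() == flags.size());
    std::vector<int> out(data.size());
    int running = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (flags[i]) {
            running = 0;  // a head flag restarts the running sum
        }
        if (exclusive) {
            out[i] = running;
            running += data[i];
        } else {
            running += data[i];
            out[i] = running;
        }
    }
    return out;
}
```

For data {1, 2, 3, 4, 5, 6} with segment heads at positions 0 and 3, the inclusive result is {1, 3, 6, 4, 9, 15}: the sum restarts at position 3 instead of continuing from 6.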

void allocSegmentedScanStorage ( CUDPPSegmentedScanPlan * plan )

Allocate intermediate block sums, block flags and block indices arrays in a CUDPPSegmentedScanPlan class.

Segmented scans of large arrays must be split (possibly recursively) into a hierarchy of block segmented scans, where each block is scanned by a single CUDA thread block. At each recursive level of the scan, we need an array in which to store the total sums of all blocks in that level. Each level also needs two more arrays: one containing the OR-reductions of the flags of all blocks at that level, and one containing the min-reductions of the indices of all blocks at that level. This function computes the amount of storage needed and allocates it.

Parameters:

[in] plan Pointer to CUDPPSegmentedScanPlan object containing segmented scan options and number of elements, which is used to compute storage requirements.

void freeSegmentedScanStorage ( CUDPPSegmentedScanPlan * plan )

Deallocate intermediate block sums, block flags and block indices arrays in a CUDPPSegmentedScanPlan class.

These arrays must have been allocated by allocSegmentedScanStorage(), which is called by the constructor of CUDPPSegmentedScanPlan.

Parameters:

[in] plan CUDPPSegmentedScanPlan class initialized by its constructor.

void cudppSegmentedScanDispatch	(	void *	d_out,
		const void *	d_idata,
		const unsigned int *	d_iflags,
		int	numElements,
		const CUDPPSegmentedScanPlan *	plan
	)

Dispatch function to perform a scan (prefix sum) on an array with the specified configuration.

This is the dispatch routine which calls segmentedScanArrayRecursive() with appropriate template parameters and arguments to achieve the scan as specified in plan.

Parameters:

[in]	numElements	The number of elements to scan
[in]	plan	Segmented Scan configuration (plan), initialized by CUDPPSegmentedScanPlan constructor
[in]	d_idata	The input array
[in]	d_iflags	The input flags array
[out]	d_out	The output array of segmented scan results

template<class T >

void sparseMatrixVectorMultiply	(	T *	d_y,
		const T *	d_x,
		const CUDPPSparseMatrixVectorMultiplyPlan *	plan
	)

Perform matrix-vector multiply for sparse matrices and vectors of arbitrary size.

This function performs the sparse matrix-vector multiply by executing four steps.

1. The sparseMatrixVectorFetchAndMultiply() kernel multiplies each element e in CUDPPSparseMatrixVectorMultiplyPlan::m_d_A with the corresponding element of d_x (i.e., the element whose row equals the column index of e in CUDPPSparseMatrixVectorMultiplyPlan::m_d_A) and stores the product in CUDPPSparseMatrixVectorMultiplyPlan::m_d_prod. It also sets all elements of CUDPPSparseMatrixVectorMultiplyPlan::m_d_flags to 0.

2. The sparseMatrixVectorSetFlags() kernel iterates over each element in CUDPPSparseMatrixVectorMultiplyPlan::m_d_rowIndex and sets the corresponding position (indicated by CUDPPSparseMatrixVectorMultiplyPlan::m_d_rowIndex) in CUDPPSparseMatrixVectorMultiplyPlan::m_d_flags to 1.

3. Perform a segmented scan on CUDPPSparseMatrixVectorMultiplyPlan::m_d_prod with CUDPPSparseMatrixVectorMultiplyPlan::m_d_flags as the flag vector. The output is stored in CUDPPSparseMatrixVectorMultiplyPlan::m_d_prod.

4. The yGather() kernel goes over each element in CUDPPSparseMatrixVectorMultiplyPlan::m_d_rowFinalIndex, picks the corresponding element (indicated by CUDPPSparseMatrixVectorMultiplyPlan::m_d_rowFinalIndex) from CUDPPSparseMatrixVectorMultiplyPlan::m_d_prod and stores it in d_y.

Parameters:

[out]	d_y	The output array for the sparse matrix-vector multiply (y vector)
[in]	d_x	The input x vector
[in]	plan	Pointer to the CUDPPSparseMatrixVectorMultiplyPlan object which stores the configuration and pointers to temporary buffers needed by this routine
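
The four steps can be traced with a sequential CPU sketch. The function spmvReference is hypothetical; it assumes that rowindx and rowFinalIndex hold 0-based positions of the first and last stored element of each row, and that no row is empty. The indexing convention CUDPP uses internally may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for the four-step sparse matrix-vector multiply.
template <class T>
std::vector<T> spmvReference(const std::vector<T> &A,
                             const std::vector<unsigned int> &indx,
                             const std::vector<unsigned int> &rowindx,
                             const std::vector<unsigned int> &rowFinalIndex,
                             const std::vector<T> &x)
{
    assert(A.size() == indx.size());
    assert(rowindx.size() == rowFinalIndex.size());
    const std::size_t nnz = A.size();

    // Step 1: multiply each stored element by the x entry selected by its
    // column index, and clear all flags.
    std::vector<T> prod(nnz);
    std::vector<unsigned int> flags(nnz, 0u);
    for (std::size_t i = 0; i < nnz; ++i) {
        prod[i] = A[i] * x[indx[i]];
    }

    // Step 2: flag the first stored element of every row.
    for (unsigned int first : rowindx) {
        flags[first] = 1u;
    }

    // Step 3: inclusive segmented sum scan over prod, in place.
    for (std::size_t i = 1; i < nnz; ++i) {
        if (!flags[i]) {
            prod[i] += prod[i - 1];
        }
    }

    // Step 4: the last element of each row segment now holds the row sum.
    std::vector<T> y(rowFinalIndex.size());
    for (std::size_t r = 0; r < y.size(); ++r) {
        y[r] = prod[rowFinalIndex[r]];
    }
    return y;
}
```

For the 3x3 matrix with rows (1, 0, 2), (0, 3, 0), (4, 0, 5) and x = (1, 2, 3), the products are {1, 6, 6, 4, 15}, the scanned products are {1, 7, 6, 4, 19}, and gathering positions 1, 2 and 4 gives y = (7, 6, 19).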

void allocSparseMatrixVectorMultiplyStorage	(	CUDPPSparseMatrixVectorMultiplyPlan *	plan,
		const void *	A,
		const unsigned int *	rowindx,
		const unsigned int *	indx
	)

Allocate intermediate product, flags and rowFindx (index of the last element of each row) array.

Parameters:

[in]	plan	Pointer to CUDPPSparseMatrixVectorMultiplyPlan class containing sparse matrix-vector multiply options, number of non-zero elements and number of rows which is used to compute storage requirements
[in]	A	The matrix A
[in]	rowindx	The indices of elements in A which are the first element of their row
[in]	indx	The column number for each element in A

void freeSparseMatrixVectorMultiplyStorage ( CUDPPSparseMatrixVectorMultiplyPlan * plan )

Deallocate intermediate product, flags and rowFindx (index of the last element of each row) array.

These arrays must have been allocated by allocSparseMatrixVectorMultiplyStorage(), which is called by the constructor of CUDPPSparseMatrixVectorMultiplyPlan.

Parameters:

[in] plan Pointer to CUDPPSparseMatrixVectorMultiplyPlan plan initialized by its constructor.

void cudppSparseMatrixVectorMultiplyDispatch	(	void *	d_y,
		const void *	d_x,
		const CUDPPSparseMatrixVectorMultiplyPlan *	plan
	)

Dispatch function to perform a sparse matrix-vector multiply with the specified configuration.

This is the dispatch routine which calls sparseMatrixVectorMultiply() with appropriate template parameters and arguments.

Parameters:

[out]	d_y	The output vector for y = A*x
[in]	d_x	The x vector for y = A*x
[in]	plan	The sparse matrix plan and data
