CUDPP: Overview of CUDPP hash tables

Hash tables are useful for efficiently storing and retrieving sparse data sets. Unlike dense representations, the size of a hash table is generally proportional to the number of elements stored in it rather than the size of the space of possible elements. Hash tables should also have a small cost to insert items and a small cost to retrieve items. CUDPP hash tables have these properties.

CUDPP includes three different hash table types:

CUDPP_BASIC_HASH_TABLE	Stores a single value per key. The input is expected to be a set of key-value pairs, where the keys are all unique.
CUDPP_COMPACTING_HASH_TABLE	Assigns each key a unique identifier and allows O(1) translation between the key and the unique IDs. Input is a set of keys that may, or may not, be repeated.
CUDPP_MULTIVALUE_HASH_TABLE	Allows you to store multiple values for each key. Multiple values for the same key are represented by different key-value pairs in the input.

Using CUDPP hash tables

CUDPP supports four major routines for hash tables: creating a hash table (cudppHashTable), inserting items (typically key/value pairs) into a hash table (cudppHashInsert), retrieving values from a hash table given their keys (cudppHashRetrieve), and destroying the hash table (cudppDestroyHashTable). Each of these routines works with each of the 3 types of hash tables above.

A typical use of a hash table might look like this (see the sample application cudpp_hash_testrig for a complete example):

Create and populate two arrays in GPU device memory, one of keys, one of values.
Configure a CUDPPHashTableConfig data structure with:
- the type of hash table (CUDPPHashTableType);
- kInputSize, the number of items to be inserted into the hash table; and
- the space usage multiplier space_usage; the hash table will store kInputSize elements but require kInputSize * space_usage elements of storage. Smaller space_usage factors use less space overall but take longer to build. The cudpp_hash_testrig example tests with five space_usage factors from 1.05 to 2.0.
Initialize the table using cudppHashTable.
Insert the keys and arrays into the hash table using cudppHashInsert.
Create two arrays in GPU device memory, one populated with a list of retrieval keys, and one empty to retrieve the value associated with those keys.
Retrieve those values with cudppHashRetrieve.
When you're done, destroy the hash table with cudppDestroyHashTable.

Other software used in CUDPP's hash tables

CUDPP's hash table library relies on a good random number generator to ensure good hash function generation. CUDPP uses the Mersenne Twister implementation provided by Makoto Matsumoto, available here (and included in the CUDPP distribution). You may try using your system's rand() function and srand() functions, but keep in mind that Windows' generator produces numbers in a very small range.

The compacting hash table and multivalue hash table implementation use CUDPP's scan and sort functionality.

Hash table space limitations

The maximum size of the hash table implementations are primarily limited by the size of available memory. The figures below indicate the size of the hash table as a function of:

N, the number of elements in the hash table;
K, the number of unique keys in the input (for the multivalue and compacting hash table); and
M, the space multiplier for the hash table (for every 1 element to be stored in the hash table, we allocate space for M elements). M defaults to 1.25.

Basic hash table:

Other than some 32-bit values (like an error flag and the number of items in the stash), the only memory allocation is for the actual hash table itself:

Hash table: 2 * (M * N + 101) uint32s

101 is the size of the stash (in uint32s). We multiply by 2 because we store both keys and values.

Multi-value hash table:

In the multivalue hash table, memory is held until the hash table is deleted; it was set up this way to prevent mucking up the timings with memory management.

Hash table: 2 * (M * K + 101) uint32s
Values of all of the keys: N uint32s
Information about each key's values: 2 * K uint32s
Compacted list of the keys: K uint32s
Scratch memory, CUDPP scan: Enough to scan N items. Note that scan has its own limitation on size (~67M elements).
Scratch memory, radix sort: 2N uint32s. Note that sort has its own limitation on size (~2B elements).
Scratch memory, additional: N uint32s

Each key is capped at UINT_MAX values. Although the code doesn't do it, you could in theory creatively re-use some of the memory or move allocations around to avoid having to hold onto more scratch space than you really need. The CUDPP scan memory, for example, is allocated when the hash table is initialized but only used when it's being built.

Compacting hash table:

Hash table: 2 * (M * N + 101) uint32s
Compacted unique keys: K uint32s
Scratch memory: 3 * (M * N + 101) uint32s
Scratch memory, CUDPP scan: Enough to scan N + 101 uint32s. Note that scan has its own limitation on size (~67M elements).