Magic Number Perfect Hash Function
In this post we’ll look at how to use magic numbers to create a perfect hash function for a set of Key Value pairs with Integer keys that is known a ahead of time.
We’ll assume that we are using 64-bit integers as the keys in our Key Value pairs.
The Problem
Our goal is to create a perfect hash function (no collisions) for a set of data that is known ahead of time and will not change. Put another way, we will insert the data into Hash Table at the start, and only ever perform lookups after that.
While standard Hash Table implementations can do this, the hash function they use needs to be general purpose. This isn’t ideal for when we care about performance.
We don’t want a general purpose hash function since we know all keys ahead of time. We want to avoid using an expensive and slow general purpose hash function.
Standard Hash Table
Let’s say we have the following Key Value pair (6019811509317997855, 20)
. Using a normal Hash Table the value retrieval process would look something like this behind the scenes:
- Hash the key using the general hash function.
- Use the resulting Hash as an index for an Array that contains all the values.
This example skips over the many ways the internals of a Hash Table can work and ways collisions are dealt with (Linear/Quadratic Probing, Double hashing, Separate Chaining, Cuckoo hashing, etc), but that isn’t important because we’ll avoid that with Magic Numbers.
Using Magic Numbers
Here is how the value retrieval process would work if we used a Magic Number:
- Multiply the Key by the Magic Number and shift the result
N
bits to the right. - Use this value as an index for an Array that contains all the values.
The new hash function now consists entirely of a single multiplication and shift operation.
However, the obvious question is what is the Magic Number and what is the value of N
?
The answer is that they are values such that the above procedure works for every key in our known set of data and will never produce a collision. The rest of this post will explain how to calculate these values for a known set of data.
Calculating Magic Numbers
Below is a set of data with 5 Key Value pairs. Note that this can work perfectly fine for thousands of keys, although will require more memory and take longer to find better magic numbers.
1
2
3
4
5
6019811509317997855 => 20
8863454925401798656 => 40
13735527195181205504 => 60
10620837929843658752 => 80
5503223162953909248 => 100
- Key => Value
First off, let’s see how many bits we need to represent the index into our Array of values. As there are 5 keys, we’ll need 3-bits. More generally, we’ll need the smallest power of 2 that is larger than the number of keys we have.
- 2^2 = 4, which is less than 5. Therefore 2 bits ISN’T enough
- 2^3 = 8, which is greater than or equal to 5. Therefore 3 bits IS enough.
Example
We said above that 3-bits is enough to represent all the indices into our Array of values. This means that said Array would be 2^3 = 8 elements in size.
So how do we go from the Key to a unique 3-bit value? Like this:
- Find a magic number such that the first 3-bits are unique for every (Key * Magic Number)
15567010318032385463
is an example of such a magic number. Below is the result of each Key multiplied by the Magic Number. The result is shown in binary with the overflow truncated:
1
2
3
4
5
6019811509317997855 * 15567010318032385463 = 0000100110111101110000001101110010101000110010001000010000101001
8863454925401798656 * 15567010318032385463 = 1100010010111001011010110001100000111000110010011000100000000000
13735527195181205504 * 15567010318032385463 = 0011010111111001111111010010000001001000001011010111100000000000
10620837929843658752 * 15567010318032385463 = 1111000010111010101110110100010100110111100010111111000000000000
5503223162953909248 * 15567010318032385463 = 0101111000100000101000111101011101101110110001110001110000000000
Notice the first 3-bits of each result. They are all unique. Because we only care about these 3-bits, we’ll shift the result all the way to the right (64 - 3 = 61) so we only have these 3-bits left.
After right shifting by 61 we are left with the following:
1
2
3
4
5
(6019811509317997855 * 15567010318032385463) >> 61 = 000 = 0
(8863454925401798656 * 15567010318032385463) >> 61 = 110 = 6
(13735527195181205504 * 15567010318032385463) >> 61 = 001 = 1
(10620837929843658752 * 15567010318032385463) >> 61 = 111 = 7
(5503223162953909248 * 15567010318032385463) >> 61 = 010 = 2
The resulting values (0, 6, 1, 7, 2) are the indices we would use in the Array of values.
E.g. for the above Key 10620837929843658752
its value would be stored at the 7th position of the value Array.
If we want to perform a insert or lookup for a certain Key, this is what the pseudocode would look like:
1
2
3
4
5
6
7
8
9
10
11
12
const MAGIC_NUMBER = 10620837929843658752;
const BITS_USED = 3;
const N = 64 - BITS_USED;
const VALUE_ARRAY = new Array(2 ** BITS_USED);
function InsertValue(KEY, VALUE) {
VALUE_ARRAY[(MAGIC_NUMBER * KEY) >> N] = VALUE;
}
function GetValueFromKey(KEY) {
return VALUE_ARRAY[(MAGIC_NUMBER * KEY) >> N];
}
Searching for a Magic Number
The final question is how the above magic number was found? The answer is simple. Brute Force search.
We need to generate and test random numbers until we find one that doesn’t produce collisions (duplicate indices after the multiplication and shift).
Important Note
For the above example only 5 keys were used and 3-bits is the minimum amount needed. If instead we had 5000 keys, we would need a minimum of 13-bits (2^13 = 8192). Because there are so many keys, it’s very unlikely to find a magic number that gives unique 13-bit values. To get around this, it’s better to start by using many more bits than required, and once a valid magic number is found, search for another that is 1-bit less.
Ideally we want to use the least bits possible since it reduces the size of the value Array we need to create. Every bit less we use means the value Array is half the size.
JavaScript Code
Some notes on the below code:
- Because there is no 64-bit integer in JavaScript we need to perform an AND operation after we perform the multiplication of the Key and Magic Number.
- On line 16 a random prime number is generated for the magic number. This isn’t required, and a pseudo random number can be used.
The code starts of by using 31 bits for the value Array, then progressively lowers it until a magic number using the minimum amount of bits is used, or until a certain amount of Iterations has passed.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
const crypto = require('crypto');
const BIG_VALUES = [
6019811509317997855n,
8863454925401798656n,
13735527195181205504n,
10620837929843658752n,
5503223162953909248n,
];
const MIN_NUMBER_OF_BITS = BIG_VALUES.length.toString(2).length;
const MAX_NUMBER_OF_BITS = 64;
const MAX_BIT_MASK = BigInt('0b' + (new Array(MAX_NUMBER_OF_BITS)).fill(1).join(''));
function FindMagicNumberV1(outputNumberBitSize) {
const magicNum = crypto.generatePrimeSync(MAX_NUMBER_OF_BITS, {bigint: true})
const mappedNumbers = new Set();
let isValid = true;
for (let i = 0; i < BIG_VALUES.length; ++i) {
let currentNumber = ((BIG_VALUES[i]*magicNum) & MAX_BIT_MASK) >> (64n - BigInt(outputNumberBitSize));
if (mappedNumbers.has(currentNumber)) {
isValid = false;
break;
}
mappedNumbers.add(currentNumber);
}
if (isValid) return {
magicNumber: magicNum,
bitsUsed: outputNumberBitSize,
};
return { bitsUsed: Number.MAX_SAFE_INTEGER };
}
let bestSoFar = {bitsUsed: 32};
for (let i = 0; i < 100_000_000; ++i) {
const result = FindMagicNumberV1(bestSoFar.bitsUsed - 1);
if (result.bitsUsed < bestSoFar.bitsUsed) {
bestSoFar = result;
console.log('Best Found:', result);
}
if (bestSoFar.bitsUsed === MIN_NUMBER_OF_BITS) break;
}
The above code outputs the following:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
Best Found: { magicNumber: 16098184039384438879n, bitsUsed: 31 }
Best Found: { magicNumber: 16325008153248595037n, bitsUsed: 30 }
Best Found: { magicNumber: 14368020153527046023n, bitsUsed: 29 }
Best Found: { magicNumber: 14621349544756933247n, bitsUsed: 28 }
Best Found: { magicNumber: 16540598574663265171n, bitsUsed: 27 }
Best Found: { magicNumber: 16228696490751924887n, bitsUsed: 26 }
Best Found: { magicNumber: 15100991816352779023n, bitsUsed: 25 }
Best Found: { magicNumber: 17324609008676439523n, bitsUsed: 24 }
Best Found: { magicNumber: 18256656583194581291n, bitsUsed: 23 }
Best Found: { magicNumber: 14520756546626378681n, bitsUsed: 22 }
Best Found: { magicNumber: 16048184102540498231n, bitsUsed: 21 }
Best Found: { magicNumber: 14838936919325713043n, bitsUsed: 20 }
Best Found: { magicNumber: 18217564146616633613n, bitsUsed: 19 }
Best Found: { magicNumber: 15466474175171765107n, bitsUsed: 18 }
Best Found: { magicNumber: 17960607931469292029n, bitsUsed: 17 }
Best Found: { magicNumber: 13931438973861982927n, bitsUsed: 16 }
Best Found: { magicNumber: 15085174118877062041n, bitsUsed: 15 }
Best Found: { magicNumber: 13841881654513396591n, bitsUsed: 14 }
Best Found: { magicNumber: 16802910703276585081n, bitsUsed: 13 }
Best Found: { magicNumber: 15454353913613611463n, bitsUsed: 12 }
Best Found: { magicNumber: 17444997694547404823n, bitsUsed: 11 }
Best Found: { magicNumber: 15415669514167943959n, bitsUsed: 10 }
Best Found: { magicNumber: 14189095924298317387n, bitsUsed: 9 }
Best Found: { magicNumber: 16644222002049844057n, bitsUsed: 8 }
Best Found: { magicNumber: 17759411055761750099n, bitsUsed: 7 }
Best Found: { magicNumber: 17934313906907920303n, bitsUsed: 6 }
Best Found: { magicNumber: 18241549279352968841n, bitsUsed: 5 }
Best Found: { magicNumber: 17266648670371355939n, bitsUsed: 4 }
Best Found: { magicNumber: 18006623335312784483n, bitsUsed: 3 }
Rust Multi-Threaded Code
The below code uses 12 Threads to find a magic number. Each thread will start at a specified number of bits, search for a magic number, and when found, lower the bit count by 1.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
use std::collections::BTreeSet;
use rand::{Rng, rngs::ThreadRng};
use std::thread;
static BIG_VALUES: [u64; 5] = [
6019811509317997855,
8863454925401798656,
13735527195181205504,
10620837929843658752,
5503223162953909248,
];
fn generate_magic_number(rng: &mut ThreadRng, bits_to_use: u8) -> u64 {
let magic_number: u64 = rng.gen::<u64>();
let mut mapped_numbers: BTreeSet<u64> = BTreeSet::new();
for i in 0..BIG_VALUES.len() {
let current_mapped_number = (BIG_VALUES[i] * magic_number) >> (64 - bits_to_use);
if mapped_numbers.contains(¤t_mapped_number) {
return 0;
}
mapped_numbers.insert(current_mapped_number);
}
return magic_number;
}
fn main() {
println!("");
let num_of_threads = 12;
let mut handles = vec![];
for _ in 0..num_of_threads {
let handle = thread::spawn(|| {
let mut rng = rand::thread_rng();
let min_bits: u8 = 3;
let mut bits: u8 = 3;
let mut result: u64;
for _ in 0..1_000_000_000 {
result = generate_magic_number(&mut rng, bits);
if result > 0 {
println!("Magic Number: ({}), Bits Used ({})", result, bits);
bits -= 1;
if bits < min_bits { break; }
}
}
});
handles.push(handle);
}
for handle in handles {
handle.join().unwrap();
}
println!("All Threads Completed.");
}
The above code outputs the following:
1
2
3
4
5
6
7
8
9
10
11
12
13
Magic Number: (7769583043006980941), Bits Used (3)
Magic Number: (16132918982763439544), Bits Used (3)
Magic Number: (16488419013824544368), Bits Used (3)
Magic Number: (12623179947970322489), Bits Used (3)
Magic Number: (12178518977004148095), Bits Used (3)
Magic Number: (15227566936866480215), Bits Used (3)
Magic Number: (3179281196491962013), Bits Used (3)
Magic Number: (17380414587344683173), Bits Used (3)
Magic Number: (9619816875933123805), Bits Used (3)
Magic Number: (5377167397578549298), Bits Used (3)
Magic Number: (2337733146222928516), Bits Used (3)
Magic Number: (10068657740713211807), Bits Used (3)
All Threads Completed.