Yes I totally invented this term 😛 What I mean by it is producing multiple hashes from a single key. Like this (if the syntax is unfamiliar to you read this):
1 |
auto [h1, h2, h3] = hashNT<3>("key"); |
Or like this (for non-template version which returns a vector):
1 |
auto hashes = hashN("key", 3); |
Why? One place where I needed such sorcery was my bloom filter implementation. The idea is simple: one key, multiple hashes, repeatable (multiple calls with the same key produce the same hashes). But how? STL only comes with one hashing function. True, but it comes with multiple random number generators which can be seeded with a hash!
The solution then is to hash once, seed the random number generator, and make multiple calls to the RNG, like this (hash.hpp):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
#pragma once #include <array> #include <vector> #include <random> #include <algorithm> #include <functional> template<typename T> auto hashN(const T& key, std::size_t N) -> std::vector<std::size_t> { std::minstd_rand0 rng(std::hash<T>{}(key)); std::vector<std::size_t> hashes(N); std::generate(std::begin(hashes), std::end(hashes), rng); return hashes; } template<std::size_t N, typename T> auto hashNT(const T& key) -> std::array<std::size_t, N> { std::minstd_rand0 rng(std::hash<T>{}(key)); std::array<std::size_t, N> hashes{}; std::generate(std::begin(hashes), std::end(hashes), rng); return hashes; } |
You can use it like this (multi_hash.cpp):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 |
#include <iostream> #include <string> #include "multi_hash.hpp" using namespace std; void arr() { string s1 = "Vorbrodt's C++ Blog"; string s2 = "Vorbrodt's C++ Blog"; string s3 = "https://vorbrodt.blog"; auto h1 = hashN(s1, 3); auto h2 = hashN(s2, 3); auto h3 = hashN(s3, 3); cout << "HashN('" << s1 << "'):" << endl; for(auto it : h1) cout << it << endl; cout << endl; cout << "HashN('" << s2 << "'):" << endl; for(auto it : h2) cout << it << endl; cout << endl; cout << "HashN('" << s3 << "'):" << endl; for(auto it : h3) cout << it << endl; cout << endl; } void temp() { string s1 = "Vorbrodt's C++ Blog"; string s2 = "Vorbrodt's C++ Blog"; string s3 = "https://vorbrodt.blog"; auto [s1h1, s1h2, s1h3] = hashNT<3>(s1); auto [s2h1, s2h2, s2h3] = hashNT<3>(s2); auto [s3h1, s3h2, s3h3] = hashNT<3>(s3); cout << "HashNT('" << s1 << "'):" << endl; cout << s1h1 << endl << s1h2 << endl << s1h3 << endl << endl; cout << "HashNT('" << s2 << "'):" << endl; cout << s2h1 << endl << s2h2 << endl << s2h3 << endl << endl; cout << "HashNT('" << s3 << "'):" << endl; cout << s3h1 << endl << s3h2 << endl << s3h3 << endl << endl; } int main() { arr(); temp(); } |
HashN(‘Vorbrodt’s C++ Blog’):
Program output.
1977331388
699200791
437177953
HashN(‘Vorbrodt’s C++ Blog’):
1977331388
699200791
437177953
HashN(‘https://vorbrodt.blog’):
1924360287
1619619789
1594567998
HashNT(‘Vorbrodt’s C++ Blog’):
1977331388
699200791
437177953
HashNT(‘Vorbrodt’s C++ Blog’):
1977331388
699200791
437177953
HashNT(‘https://vorbrodt.blog’):
1924360287
1619619789
1594567998
technically speaking, you dont really need the trailing ret value, right?
that’s correct. with c++17 you can just specify auto. I left it in place to be more explicit about my intent.
I really like your post.
Hey, long time no see more post from you 🙂
Yea, I’ve been very busy lately, besides… after cranking out 81 posts in such short period of time I honestly burned out a bit; it was a massive brain dump in such short period of time. But I’ll get back to it eventually 🙂
Hi. If you have a hash collision let’s say then you will use the same seed getting all three hash values being exactly the same . Then what’s the point of using multiple hash values for example in the case of a bloom filter? The point of using multiple hash functions is to reduce the amount of simultaneous collisions.
Key is hashed and result used to seed RNG. That’s so that A) same key will hash to same value each time and B) what is the likelihood that seeded RNG will emit 3 identical values in a row? If you hash something using my code and get three identical values you need to look at the RNG shipped with STL not the algorithm in my code 🤷♂️
Perhaps you missed the fact that the hash seeds RNG and the returned hash values are consecutively generated by RNG? Otherwise I don’t understand your comment nor how this code could possibly return an array of identical hashes for any given key…