Multi-hashing

Yes, I totally invented this term 😛 What I mean by it is producing multiple hashes from a single key. Like this (the syntax is a C++17 structured binding, in case it is unfamiliar to you):

auto [h1, h2, h3] = hashNT<3>("key");

Or like this (the version that takes the number of hashes at run time and returns a vector):

auto hashes = hashN("key", 3);

Why? One place where I needed such sorcery was my Bloom filter implementation. The idea is simple: one key, multiple hashes, repeatable (multiple calls with the same key produce the same hashes). But how? The STL only comes with one hashing function, std::hash. True, but it also comes with multiple random number generators, which can be seeded with a hash!

The solution, then, is to hash once, seed the random number generator with that hash, and make multiple calls to the RNG, like this (multi_hash.hpp):

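#pragma once

#include <array>
#include <vector>
#include <random>
#include <cstddef>
#include <algorithm>
#include <functional>

// Run-time version: returns N values derived from a single hash of the key.
template<typename T>
auto hashN(const T& key, std::size_t N) -> std::vector<std::size_t>
{
	// Hash the key once and use the result as the RNG seed,
	// so the same key always yields the same sequence.
	std::minstd_rand0 rng(std::hash<T>{}(key));
	std::vector<std::size_t> hashes(N);
	// Each call to the RNG produces the next "hash".
	std::generate(std::begin(hashes), std::end(hashes), rng);
	return hashes;
}

// Compile-time version: N is a template parameter and the result is
// a std::array, which makes it usable with structured bindings.
template<std::size_t N, typename T>
auto hashNT(const T& key) -> std::array<std::size_t, N>
{
	std::minstd_rand0 rng(std::hash<T>{}(key));
	std::array<std::size_t, N> hashes{};
	std::generate(std::begin(hashes), std::end(hashes), rng);
	return hashes;
}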
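
One detail worth spelling out: the values come from std::minstd_rand0, so each "hash" lies between 1 and 2147483646 even though it is stored in a std::size_t. To index a table or a bit array you would normally fold it into a smaller range. The sketch below is only an illustration under my own assumptions: bucketsNT and its buckets parameter are hypothetical names that are not part of the header above, and hashNT is repeated so the snippet compiles on its own. It passes the engine through std::ref instead of by value, which yields the same numbers.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <string>

// The engine's output range, as fixed by the standard for minstd_rand0.
static_assert(std::minstd_rand0::min() == 1, "smallest value the engine emits");
static_assert(std::minstd_rand0::max() == 2147483646, "largest value the engine emits");

// Same approach as hashNT in the header, repeated to keep this self-contained.
template<std::size_t N, typename T>
auto hashNT(const T& key) -> std::array<std::size_t, N>
{
	std::minstd_rand0 rng(std::hash<T>{}(key));
	std::array<std::size_t, N> hashes{};
	std::generate(std::begin(hashes), std::end(hashes), std::ref(rng));
	return hashes;
}

// Hypothetical helper (an assumption, not from the original post):
// reduce every hash to an index in [0, buckets).
template<std::size_t N, typename T>
auto bucketsNT(const T& key, std::size_t buckets) -> std::array<std::size_t, N>
{
	assert(buckets > 0); // modulo by zero would be undefined
	auto hashes = hashNT<N>(key);
	for(auto& h : hashes) h %= buckets;
	return hashes;
}
```

Calling bucketsNT<3>(std::string("key"), 64) would give three indices, each below 64, and the same three on every call within one program run.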

You can use it like this (multi_hash.cpp):

#include <iostream>
#include <string>
#include "multi_hash.hpp"

using namespace std;

void arr()
{
	string s1 = "Vorbrodt's C++ Blog";
	string s2 = "Vorbrodt's C++ Blog";
	string s3 = "https://vorbrodt.blog";

	auto h1 = hashN(s1, 3);
	auto h2 = hashN(s2, 3);
	auto h3 = hashN(s3, 3);

	cout << "HashN('" << s1 << "'):" << endl;
	for(auto it : h1) cout << it << endl;
	cout << endl;

	cout << "HashN('" << s2 << "'):" << endl;
	for(auto it : h2) cout << it << endl;
	cout << endl;

	cout << "HashN('" << s3 << "'):" << endl;
	for(auto it : h3) cout << it << endl;
	cout << endl;
}

void temp()
{
	string s1 = "Vorbrodt's C++ Blog";
	string s2 = "Vorbrodt's C++ Blog";
	string s3 = "https://vorbrodt.blog";

	auto [s1h1, s1h2, s1h3] = hashNT<3>(s1);
	auto [s2h1, s2h2, s2h3] = hashNT<3>(s2);
	auto [s3h1, s3h2, s3h3] = hashNT<3>(s3);

	cout << "HashNT('" << s1 << "'):" << endl;
	cout << s1h1 << endl << s1h2 << endl << s1h3 << endl << endl;

	cout << "HashNT('" << s2 << "'):" << endl;
	cout << s2h1 << endl << s2h2 << endl << s2h3 << endl << endl;

	cout << "HashNT('" << s3 << "'):" << endl;
	cout << s3h1 << endl << s3h2 << endl << s3h3 << endl << endl;
}

int main()
{
	arr();
	temp();
}

HashN('Vorbrodt's C++ Blog'):
1977331388
699200791
437177953

HashN('Vorbrodt's C++ Blog'):
1977331388
699200791
437177953

HashN('https://vorbrodt.blog'):
1924360287
1619619789
1594567998

HashNT('Vorbrodt's C++ Blog'):
1977331388
699200791
437177953

HashNT('Vorbrodt's C++ Blog'):
1977331388
699200791
437177953

HashNT('https://vorbrodt.blog'):
1924360287
1619619789
1594567998

Program output.
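
Since a Bloom filter was the motivation, here is a rough sketch of how hashNT could feed one. This is not my actual Bloom filter implementation, just a minimal illustration: the class name BloomFilter, its parameters Bits and K, and the member names are all made up for this example, and hashNT is repeated so the snippet stands on its own.

```cpp
#include <algorithm>
#include <array>
#include <bitset>
#include <cstddef>
#include <functional>
#include <random>
#include <string>

// Same approach as hashNT in multi_hash.hpp, repeated to keep this self-contained.
template<std::size_t N, typename T>
auto hashNT(const T& key) -> std::array<std::size_t, N>
{
	std::minstd_rand0 rng(std::hash<T>{}(key));
	std::array<std::size_t, N> hashes{};
	std::generate(std::begin(hashes), std::end(hashes), std::ref(rng));
	return hashes;
}

// Hypothetical minimal Bloom filter (names are assumptions for this sketch):
// Bits is the size of the bit array, K the number of hashes per key.
template<typename T, std::size_t Bits, std::size_t K>
class BloomFilter
{
public:
	// Set the K bits selected by the key's hashes.
	void add(const T& key)
	{
		for(auto h : hashNT<K>(key)) m_bits.set(h % Bits);
	}

	// false: the key was definitely never added.
	// true: the key was probably added (false positives are possible).
	bool maybeContains(const T& key) const
	{
		for(auto h : hashNT<K>(key))
			if(!m_bits.test(h % Bits)) return false;
		return true;
	}

private:
	std::bitset<Bits> m_bits;
};
```

With BloomFilter<std::string, 1024, 3> bf, calling bf.add(s) and then bf.maybeContains(s) returns true for the same s. One caveat of seeding from a single hash: two keys with the same std::hash value seed the RNG identically, so they always select the same bits.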

8 Replies to “Multi-hashing”

kobica says:

April 9, 2019 at 7:29 pm

Technically speaking, you don't really need the trailing return type, right?

1. Martin Vorbrodt says:
  
  April 10, 2019 at 11:14 am
  
  That’s correct. Since C++14 you can just specify auto and let the compiler deduce the return type. I left it in place to be more explicit about my intent.
  
bubble shooter says:

April 10, 2019 at 4:08 am

I really like your post.

Do Ngoc Tan says:

September 18, 2019 at 10:48 pm

Hey, long time no new posts from you 🙂

1. Martin Vorbrodt says:
  
  September 19, 2019 at 12:36 pm
  
  Yeah, I’ve been very busy lately. Besides, after cranking out 81 posts in such a short period of time I honestly burned out a bit; it was a massive brain dump. But I’ll get back to it eventually 🙂
  
Serhat Istin says:

August 26, 2022 at 3:53 pm

Hi. Let’s say you have a hash collision: then you will use the same seed, and all three hash values will be exactly the same. Then what’s the point of using multiple hash values, for example in the case of a Bloom filter? The point of using multiple hash functions is to reduce the number of simultaneous collisions.

1. Martin says:
  
  August 26, 2022 at 5:07 pm
  
  Key is hashed and result used to seed RNG. That’s so that A) same key will hash to same value each time and B) what is the likelihood that seeded RNG will emit 3 identical values in a row? If you hash something using my code and get three identical values you need to look at the RNG shipped with STL not the algorithm in my code 🤷‍♂️
  
2. Martin says:
  
  August 26, 2022 at 5:15 pm
  
  Perhaps you missed the fact that the hash seeds RNG and the returned hash values are consecutively generated by RNG? Otherwise I don’t understand your comment nor how this code could possibly return an array of identical hashes for any given key…
  