Multi-hashing

Yes I totally invented this term ๐Ÿ˜› What I mean by it is producing multiple hashes from a single key. Like this (if the syntax is unfamiliar to you read this):

auto [h1, h2, h3] = hashNT<3>("key");

Or like this (for non-template version which returns a vector):

auto hashes = hashN("key", 3);

Why? One place where I needed such sorcery was my bloom filter implementation. The idea is simple: one key, multiple hashes, repeatable (multiple calls with the same key produce the same hashes). But how? STL only comes with one hashing function. True, but it comes with multiple random number generators which can be seeded with a hash!

The solution then is to hash once, seed the random number generator, and make multiple calls to the RNG, like this (hash.hpp):

#pragma once

#include 
#include 
#include 
#include 
#include 

template
auto hashN(const T& key, std::size_t N) -> std::vector
{
	std::minstd_rand0 rng(std::hash{}(key));
	std::vector hashes(N);
	std::generate(std::begin(hashes), std::end(hashes), rng);
	return hashes;
}

template
auto hashNT(const T& key) -> std::array
{
	std::minstd_rand0 rng(std::hash{}(key));
	std::array hashes{};
	std::generate(std::begin(hashes), std::end(hashes), rng);
	return hashes;
}

You can use it like this (multi_hash.cpp):

#include 
#include 
#include "multi_hash.hpp"

using namespace std;

void arr()
{
	string s1 = "Vorbrodt's C++ Blog";
	string s2 = "Vorbrodt's C++ Blog";
	string s3 = "https://vorbrodt.blog";

	auto h1 = hashN(s1, 3);
	auto h2 = hashN(s2, 3);
	auto h3 = hashN(s3, 3);

	cout << "HashN('" << s1 << "'):" << endl;
	for(auto it : h1) cout << it << endl;
	cout << endl;

	cout << "HashN('" << s2 << "'):" << endl;
	for(auto it : h2) cout << it << endl;
	cout << endl;

	cout << "HashN('" << s3 << "'):" << endl;
	for(auto it : h3) cout << it << endl;
	cout << endl;
}

void temp()
{
	string s1 = "Vorbrodt's C++ Blog";
	string s2 = "Vorbrodt's C++ Blog";
	string s3 = "https://vorbrodt.blog";

	auto [s1h1, s1h2, s1h3] = hashNT<3>(s1);
	auto [s2h1, s2h2, s2h3] = hashNT<3>(s2);
	auto [s3h1, s3h2, s3h3] = hashNT<3>(s3);

	cout << "HashNT('" << s1 << "'):" << endl;
	cout << s1h1 << endl << s1h2 << endl << s1h3 << endl << endl;

	cout << "HashNT('" << s2 << "'):" << endl;
	cout << s2h1 << endl << s2h2 << endl << s2h3 << endl << endl;

	cout << "HashNT('" << s3 << "'):" << endl;
	cout << s3h1 << endl << s3h2 << endl << s3h3 << endl << endl;
}

int main()
{
	arr();
	temp();
}

HashN('Vorbrodt's C++ Blog'):
1977331388
699200791
437177953

HashN('Vorbrodt's C++ Blog'):
1977331388
699200791
437177953

HashN('https://vorbrodt.blog'):
1924360287
1619619789
1594567998

HashNT('Vorbrodt's C++ Blog'):
1977331388
699200791
437177953

HashNT('Vorbrodt's C++ Blog'):
1977331388
699200791
437177953

HashNT('https://vorbrodt.blog'):
1924360287
1619619789
1594567998

Program output.

#pragma

There are two useful

#pragma
directives I like to use in my code: one let’s the preprocessor know that you want to include a header fine only once, and another deals with structure packing.

Instead of using the header include guards, which are ugly as sin, use

#pragma once
at the beginning of your header files, like this:

// This is a header file...
#pragma once // now it will be included only once
...

For structure packing, use

#pragma pack
directive. It tells the compiler about the default field alignment. On Clang 8.0.0 the
sizeof
of this structure is 32 bytes:

struct S1
{
	char c;
	short s;
	int i;
	long l;
	float f;
	double d;
};

We can pack it down to 27 bytes by using this directive (it tells the compiler to align all member fields on one byte boundary; this is useful when designing efficient network protocols or data serialization):

#pragma pack(1)

struct S2
{
	char c;
	short s;
	int i;
	long l;
	float f;
	double d;
};

You can also show, at compile time, what the current packing alignment is with

#pragma pack(show)
. Current alignment can be pushed onto a stack, then reverted back, with
#pragma pack(push, 1)
followed by
#pragma pack(pop)
.

Complete listing (pragma.cpp):

#include 

using namespace std;

struct S1
{
	char c;
	short s;
	int i;
	long l;
	float f;
	double d;
};

#pragma pack(show)
#pragma pack(push, 1)
#pragma pack(show)

struct S2
{
	char c;
	short s;
	int i;
	long l;
	float f;
	double d;
};

#pragma pack(pop)

int main()
{
	cout << sizeof(S1) << ", " << sizeof(S2) << endl;
}

32, 27

Program output.

Exception safe assignment

Longer title: exception safe assignment operator of resource owning objects. Uff. Because the object owns a resource, how do we write an exception safe assignment operator which will have to free up the old and allocate the new resource. By exception safe I don’t mean that it will never throw, that’s not possible. Instead, I mean safe in the sense that it either succeeds OR in case of exceptions the state of assigned to object is exactly as it was prior to the assignment. Like this:

int main()
{
	S s1, s2;
	s1 = s2;
}

If assignment operator

s1 = s2
throws an exception, we want the state of
s1
and
s2
to be as it was in line #3.

The trick is two fold: 1) a copy constructor is needed, and 2)

noexcept
swap function. Like this:

S(const S& s)
: m_resource(new int)
{
	*m_resource = *s.m_resource;
}
// ...
void swap(S& s) noexcept
{
	std::swap(m_resource, s.m_resource);
}

Here the copy constructor allocates the new resource first, then copies its content; the swap function just swaps pointers to the resources, which is always a

noexcept
operation. Having implemented a copy constructor and swap function we can now implement every assignment operator to have a strong exception guarantee like this:

S& operator = (const S& s)
{
	S temp(s); // May throw
	swap(temp); // Will never throw
	return *this;
}

Here’s how it works: we first make a temporary copy, which does the resource allocation. At this stage exceptions can be thrown, but we have not yet modified the assigned to object. Only after the resource allocation succeeds do we perform the

noexcept
swap. The destructor of your temporary object will take care of cleaning up the currently owned resource (that’s RAII at its best).

Complete listing (assignment.cpp):

#include 
#include 

using namespace std;

struct S
{
	S()
	: m_resource(new int)
	{
		cout << "S()" << endl;
		*m_resource = 1;
	}

	S(const S& s)
	: m_resource(new int)
	{
		cout << "S(const S&)" << endl;
		*m_resource = *s.m_resource;
	}

	S(S&&) = delete;

	S& operator = (const S& s)
	{
		cout << "operator = (const S&)" << endl;
		S temp(s); // May throw
		swap(temp); // Will never throw
		return *this;
	}

	S& operator = (S&&) = delete;

	~S()
	{
		cout << "~S()" << endl;
		delete m_resource;
	}

	void swap(S& s) noexcept
	{
		std::swap(m_resource, s.m_resource);
	}

	int* m_resource;
};

int main()
{
	S s1, s2;
	s1 = s2;
}

S()
S()
operator = (const S&)
S(const S&)
~S()
~S()
~S()

Program output.

Hashing the C++ way

Modern C++ brought us

std::hash
template (read more about it here). In short: it’s a stateless function object that implements
operator()
which takes an instance of a type as parameter and returns its hash as
size_t
. It has specializations for all primitive types as well as some library types. You can also specialize it yourself for your own data types (don’t forget to put your specialization in
namespace std
). Let’s see how it works by hashing some ints, chars, floats, pointers, strings, and our own custom data type. Pay close attention to the hash values of ints and chars…

hash.cpp:

#include 
#include 
#include 

using namespace std;

struct H
{
	string s1, s2;
};

ostream& operator << (ostream& os, const H& h)
{
	os << h.s1 << "," << h.s2;
	return os;
}

namespace std
{
	template <> struct hash
	{
		size_t operator()(const H& h) const
		{
			return hash{}(h.s1) ^ (hash{}(h.s2) << 1);
		}
	};
}

int main()
{
	for(int i = 1; i <= 3; ++i)
		cout << "Hash of '" << i << "': " << hash{}(i) << endl;

	for(char c = 'A'; c <= 'C'; ++c)
		cout << "Hash of '" << c << "': " << hash{}(c) << endl;

	for(float f = 1.1; f < 1.4; f += 0.1)
		cout << "Hash of '" << f << "': " << hash{}(f) << endl;

	char* p = new char[3];
	char* q = p;
	for(; p < q + 3; ++p)
		cout << "Hash of '" << (int*)p << "': " << hash{}(p) << endl;

	string s1 = "Vorbrodt's C++ Blog";
	string s2 = "Vorbrodt's C++ Blog";
	string s3 = "https://vorbrodt.blog";
	cout << "Hash of '" << s1 << "': " << hash{}(s1) << endl;
	cout << "Hash of '" << s2 << "': " << hash{}(s2) << endl;
	cout << "Hash of '" << s3 << "': " << hash{}(s3) << endl;

	H h1{"Vorbrodt's C++ Blog", "https://vorbrodt.blog"};
	H h2{"Vorbrodt's C++ Blog", "https://vorbrodt.blog"};
	H h3{"https://vorbrodt.blog", "Vorbrodt's C++ Blog"};
	cout << "Hash of '" << h1 << "': " << hash{}(h1) << endl;
	cout << "Hash of '" << h2 << "': " << hash{}(h2) << endl;
	cout << "Hash of '" << h3 << "': " << hash{}(h3) << endl;
}

Hash of '1': 1
Hash of '2': 2
Hash of '3': 3


Hash of 'A': 65
Hash of 'B': 66
Hash of 'C': 67


Hash of '1.1': 1066192077
Hash of '1.2': 1067030938
Hash of '1.3': 1067869799


Hash of '0x7f95fdd000a0': 6424303057458324486
Hash of '0x7f95fdd000a1': 6736290418105006831
Hash of '0x7f95fdd000a2': 13890240933949840298


Hash of 'Vorbrodt's C++ Blog': 435643587581864924
Hash of 'Vorbrodt's C++ Blog': 435643587581864924
Hash of 'https://vorbrodt.blog': 13293888041758778516


Hash of 'Vorbrodt's C++ Blog,https://vorbrodt.blog': 8570762348687434484
Hash of 'Vorbrodt's C++ Blog,https://vorbrodt.blog': 8570762348687434484
Hash of 'https://vorbrodt.blog,Vorbrodt's C++ Blog': 13000220508453909292

Data alignment the C++ way

Before modern C++ the only way to align variables or structures on a given byte boundary was to inject padding; to align a

struct
to 16 bytes you had to do this:

struct Old
{
	int x;
	char padding[16 - sizeof(int)];
};

Not any more! Modern C++ introduced a keyword just for that:

alignas
(read more about it here). Now you can specify struct’s alignment like this:

struct alignas(16) New
{
	int x;
};

This can be of great help when dealing with constructive or destructive interference of L1 cache lines. You can also space local variables apart, as well as struct/class members. Here’s a complete example (alignas.cpp):

#include 

using namespace std;

int main()
{
	struct Old
	{
		int x;
		char padding[16 - sizeof(int)];
	};
	cout << "sizeof(Old): " << sizeof(Old) << endl << endl;

	struct alignas(16) New
	{
		int x;
	};
	cout << "sizeof(New): " << sizeof(New) << endl << endl;

	alignas(16) int x{}, y{};
	alignas(16) int z{};
	ptrdiff_t delta1 = (uint8_t*)&y - (uint8_t*)&x;
	ptrdiff_t delta2 = (uint8_t*)&z - (uint8_t*)&y;
	cout << "Address of 'x'      : " << &x << endl;
	cout << "Address of 'y'      : " << &y << endl;
	cout << "Address of 'z'      : " << &z << endl;
	cout << "Distance 'x' to 'y' : " << delta1 << endl;
	cout << "Distance 'y' to 'z' : " << delta2 << endl << endl;

	struct             Empty   {};
	struct alignas(64) Empty64 {};

	cout << "sizeof(Empty)  : " << sizeof(Empty)   << endl;
	cout << "sizeof(Empty64): " << sizeof(Empty64) << endl << endl;

	struct Full
	{
		alignas(32) char c;
		alignas(16) int x, y;
	};
	cout << "sizeof(Full): " << sizeof(Full) << endl << endl;
}

sizeof(Old): 16
sizeof(New): 16
Address of 'x'ย  ย  ย  : 0x7ffee4a448c0
Address of 'y'ย  ย  ย  : 0x7ffee4a448d0
Address of 'z'ย  ย  ย  : 0x7ffee4a448e0
Distance 'x' to 'y' : 16
Distance 'y' to 'z' : 16
sizeof(Empty)ย  : 1
sizeof(Empty64): 64
sizeof(Full): 64

Program output.

Simple file I/O

I was playing around with file I/O the C++ way and decided to create a file hashing program using

ifstream
and Botan crypto library. The program reads an entire file specified as the command line argument and takes the SHA1 hash of the content. It’s amazing what you can accomplish with well designed frameworks in very little code. Here’s the program (file_hash.cpp):

#include 
#include 
#include 
#include 
#include 
#include 

using namespace std;

const int READ_SIZE = 1024 * 1024;

int main(int argc, char** argv)
{
	if(argc != 2)
	{
		cerr << "USAGE: " << argv[0] << " " << endl;
		return -1;
	}

	try
	{
		using Buffer = vector;
		Buffer buffer(READ_SIZE);

		auto sha1 = Botan::HashFunction::create_or_throw("SHA-1");

		ifstream ifs;

		ifs.exceptions(ios::failbit | ios::badbit);
		ifs.open(argv[1]);
		ifs.seekg(0, ios_base::end);
		size_t bytes_left = ifs.tellg();
		ifs.seekg(ios_base::beg);

		while(bytes_left)
		{
			size_t bytes_to_read = bytes_left > READ_SIZE ? READ_SIZE : bytes_left;

			buffer.resize(bytes_to_read);
			ifs.read(buffer.data(), buffer.size());

			sha1->update((uint8_t*)buffer.data(), buffer.size());

			bytes_left -= bytes_to_read;
		}

		ifs.close();

		cout << Botan::hex_encode(sha1->final()) << " " << argv[1] << endl;
	}
	catch(exception& e)
	{
		cerr << e.what() << endl;
	}
}

Better bloom filter

Based on this implementation it supports multiple hashes for better positive hit ratio. Can be initializes with size in bits and number of hashes to perform, like this:

bloom_filter bloom(128, 5);

As always, complete implementation on GitHub: bloom.hpp.

#pragma once

#include 
#include 
#include 
#include 
#include 

namespace
{
	template>
	class mixer
	{
	public:
		mixer(std::size_t size, const Key& key)
		: m_size(size), m_random(Hash{}(key))
		{
			assert(size > 0);
		}

		std::size_t operator()() { return m_random() % m_size; }

	private:
		std::size_t m_size;
		std::mt19937 m_random;
	};
}

template >
class bloom_filter
{
public:
	bloom_filter(std::size_t size, std::size_t hashes)
	: m_size(size), m_hashes(hashes), m_bits(size)
	{
		assert(size > 0);
		assert(hashes > 0);
	}

	void add(const Key& key)
	{
		mixer m(m_size, key);
		for(std::size_t i = 0; i < m_hashes; ++i)
			m_bits[m()] = true;
	}

	bool contains(const Key& key) const
	{
		mixer m(m_size, key);
		for(std::size_t i = 0; i < m_hashes; ++i)
			if(!m_bits[m()]) return false;
		return true;
	}

private:
	std::size_t m_size;
	std::size_t m_hashes;
	std::vector m_bits;
};

Bloom Filters

From Wikipedia:

A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set.

In other words, given a set of elements, bloom filter can tell you that: A) given element is definitely not in the set, or B) given element is maybe in the set. It can give you a false positive: it can say that an element is in the set when it in fact is not. But it will never give you a false negative: it will never say that an element is not in the set when in fact it is.

In my past life I used bloom filters to check whether or not I should perform an expensive database index search ๐Ÿ™‚

In the following example I construct a bloom filter given a set of strings

set1
, I then verify that each element of
set1
is in the set according to the bloom filter. Finally I try a different set of elements
set2
, and test what the bloom filter says about those elements. Given big enough bloom filter I get 100% correct answers (that non of the elements in
set2
are present). Here’s the code (bloom.cpp):

#include 
#include 
#include "bloom.hpp"

using namespace std;

int main()
{
	string set1[] = {"Martin", "Vorbrodt", "C++", "Blog"};
	string set2[] = {"Not", "In", "The", "Set"};

	bloom_filter bloom(128);

	for(auto& s : set1)
		bloom.add(s);

	for(auto& s : set1)
		cout << "Contains \"" << s << "\" : " << bloom.contains(s) << endl;

	for(auto& s : set2)
		cout << "Contains \"" << s << "\" : " << bloom.contains(s) << endl;
}

Contains "Martin" : 1
Contains "Vorbrodt" : 1
Contains "C++" : 1
Contains "Blog" : 1
Contains "Not" : 0
Contains "In" : 0
Contains "The" : 0
Contains "Set" : 0

Program output.

My implementation of the bloom filter is primitive: it uses only one hashing function which increases false positives, but it is very short and clean and can be built upon; maybe at some point I'll write a full blown implementation; though there's plenty of examples online; I want this post to be an introduction to the topic.

In any case, here's my implementation of the bloom filter (bloom.hpp):

#pragma once

#include 
#include 
#include 

template>
class bloom_filter
{
public:
	bloom_filter(std::size_t size) : m_bits(size) {}

	void add(const key& data)
	{
		m_bits[hash{}(data) % m_bits.size()] = true;
	}

	bool contains(const key& data)
	{
		return m_bits[hash{}(data) % m_bits.size()];
	}

private:
	std::vector m_bits;
};

C++ Attributes

C++11 introduced standard attributes: a way to mark fragments of code with useful information for the developer or optimization information for the compiler. See a complete list of standard attributes here, Clang attributes here, and Microsoft attributes here. I will go over a few of them in this post.

  • [[nodiscard]]
    – when specified with a function declaration, tells the compiler to emit a warning if function’s return value is ignored; when specified with a struct emits a warning wherever the struct is returned and ignored.
  • [[fallthrough]]
    – suppresses compiler warning about a switch-case statement without a
    break
    ; in other words, a
    case
    statement that falls through into another
    case
    .
  • [[no_unique_address]]
    – tells the compiler to perform empty base optimization on marked data member.
  • [[deprecated]]
    – emits a warning when marked function, struct, namespace, or variable is used.
  • [[noreturn]]
    – tells the compiler that the marked function never returns; emits a warning if it does.
  • [[maybe_unused]]
    – suppressed a warning when marked variable, function, or argument is not used.
  • [[likely]]
    – tell the compiler this is the most likely path of execution; allows for optimizations.

Note: not all are supported on every compiler; I tested on LLVM 8.0.0 and GCC 8.2. Luckily the unsupported ones do not cause a compile error ๐Ÿ™‚

Below is an example code and a screenshot of compiler messages.

attributes.cpp:

[[nodiscard]] int no_discard(int i)
{
	switch(i)
	{
		case 1:
		[[fallthrough]];
		[[likely]] case 2: return i;
		default: return i;
	}
}

struct E {};

struct [[nodiscard]] S
{
	int i;
	[[no_unique_address]] E e;
};

S no_discard_struct()
{
	return {};
}

struct [[deprecated("Use Q instead")]] P {};

[[deprecated("Use bar() instead")]] void foo() {}

[[noreturn]] void never_returns()
{
	while(true);
}

int main()
{
	[[maybe_unused]] int unused = 1;
	P p;

	no_discard(0);
	no_discard_struct();
	foo();
	never_returns();
}

Compiler warnings; LLVM 8.0.0.

Initialization list exceptions and raw pointers

What to do when an exception is thrown on the initialization list when allocating memory for a raw pointer? The situation is easy if your class only has one raw pointer member, but it gets complicated with two or more. Here’s a code example that’s guaranteed to leak memory if the second

new int
throws an exception (because the destructor will not be called):

struct bad
{
	bad() : p1(new int), p2(new int)
	{
		*p1 = 1;
		*p2 = 2;
	}

	~bad()
	{
		delete p1;
		delete p2;
	}

	int* p1;
	int* p2;
};

There is no way to free the memory allocated to

p1
if
p2(new int)
throws! Let’s build on my previous example and see what happens if we use a function try-catch block on the constructor:

struct still_bad
{
	still_bad() try : p1(new int), p2(new int)
	{
		*p1 = 1;
		*p2 = 2;
	}
	catch(...)
	{
		if(p1) delete p1; // ILLEGAL~!!!
		if(p2) delete p2; // ILLEGAL~!!!
		/*
		 * From: https://en.cppreference.com/w/cpp/language/function-try-block
		 *
		 * The behavior is undefined if the catch-clause of a function-try-block used on a
		 * constructor or a destructor accesses a base or a non-static member of the object.
		 */
	}

	~still_bad()
	{
		delete p1;
		delete p2;
	}

	int* p1 = nullptr;
	int* p2 = nullptr;
};

Still no good! Because accessing

p1
and
p2
in the catch block leads to undefined behavior. See here.

The only way to guarantee correct behavior is to use smart pointers. This works because 1) the initialization list allocates in pre-defined order (the order of member declaration) and 2) the destructors of already created members will be called. Here’s the correct way of allocating multiple pointers:

struct good
{
	good() : p1(make_unique()), p2(make_unique())
	{
		*p1 = 1;
		*p2 = 2;
	}

	unique_ptr p1;
	unique_ptr p2;
};

This is guaranteed to do proper cleanup if the second

make_unique<int>()
throws
std::bad_alloc
๐Ÿ™‚

Complete listing (bad_pointer.cpp):

#include 
#include 

using namespace std;

struct bad
{
	bad() : p1(new int), p2(new int)
	{
		*p1 = 1;
		*p2 = 2;
	}

	~bad()
	{
		delete p1;
		delete p2;
	}

	int* p1;
	int* p2;
};

struct still_bad
{
	still_bad() try : p1(new int), p2(new int)
	{
		*p1 = 1;
		*p2 = 2;
	}
	catch(...)
	{
		if(p1) delete p1; // ILLEGAL~!!!
		if(p2) delete p2; // ILLEGAL~!!!
		/*
		 * From: https://en.cppreference.com/w/cpp/language/function-try-block
		 *
		 * The behavior is undefined if the catch-clause of a function-try-block used on a
		 * constructor or a destructor accesses a base or a non-static member of the object.
		 */
	}

	~still_bad()
	{
		delete p1;
		delete p2;
	}

	int* p1 = nullptr;
	int* p2 = nullptr;
};

struct good
{
	good() : p1(make_unique()), p2(make_unique())
	{
		*p1 = 1;
		*p2 = 2;
	}

	unique_ptr p1;
	unique_ptr p2;
};

int main()
{
	bad b;
	still_bad sb;
	good g;
}

Function try-catch blocks

Syntactic sugar or a useful feature? More than just sweet sweet sugar baby! This little known feature is a nice way of wrapping an entire function in a

try
catch
block. So instead of writing this:

void eat_it()
{
	try
	{
		throw runtime_error("System error from eat_it()");
	}
	catch(exception& e)
	{
		cout << "Swallowing: " << e.what() << endl;
	}
}

You can write this:

void eat_it_sugar() try
{
	throw runtime_error("System error from eat_it_sugar()");
}
catch(exception& e)
{
	cout << "Swallowing: " << e.what() << endl;
}

The meaning of the two functions is identical. Notice here I'm swallowing the exception instead of propagating it out. I could call

throw
to re-throw it, or in both cases I could throw a different exception inside the
catch
block.

The caveat is with function try-catch blocks around constructors: they have to re-throw the same or different exception. If you don't re-throw explicitly the compiler will do it for you. It is also useful for catching exceptions emitted during the initialization of member variables, and throwing something else (or re-throwing the same). Like this:

struct P
{
	P() { throw logic_error("Logic error from P::P()"); }
};

struct Q
{
	Q() try : m_P()
	{
		cout << "Never printed!" << endl;
	}
	catch(exception& e)
	{
		cout << "Inside Q::Q() caught: " << e.what() << endl;
		throw runtime_error("Runtime error from Q::Q()");
	}

	P m_P;
};

Complete listing (try_block.cpp):

#include 
#include 

using namespace std;

struct P
{
	P() { throw logic_error("Logic error from P::P()"); }
};

struct Q
{
	Q() try : m_P()
	{
		cout << "Never printed!" << endl;
	}
	catch(exception& e)
	{
		cout << "Inside Q::Q() caught: " << e.what() << endl;
		throw runtime_error("Runtime error from Q::Q()");
	}

	P m_P;
};

void eat_it()
{
	try
	{
		throw runtime_error("System error from eat_it()");
	}
	catch(exception& e)
	{
		cout << "Swallowing: " << e.what() << endl;
	}
}

void eat_it_sugar() try
{
	throw runtime_error("System error from eat_it_sugar()");
}
catch(exception& e)
{
	cout << "Swallowing: " << e.what() << endl;
}

int main() try
{
	eat_it();
	eat_it_sugar();
	Q q;
}
catch(exception& e)
{
	cout << "Inside main() caught: " << e.what() << endl;
}

Swallowing: System error from eat_it()
Swallowing: System error from eat_it_sugar()
Inside Q::Q() caught: Logic error from P::P()
Inside main() caught: Runtime error from Q::Q()

Program output.

The #1 rule of cryptography

The #1 rule of cryptography: Don’t invent your own!

OK wiseman, now what? You want to add crypto to your program but you don’t want to code it all yourself. I’ll show you three libraries that make it possible. The choice will be yours as to which one to use.

For this example I wanted to write a simple function that accepts a

std::string
message and returns hex encoded SHA-1 hash. I picked the following libraries: Crypto++, WolfSSL, and Botan. All three made it pretty easy, and I don’t want to get into the business of picking winners and losers, but… Botan mad it a breeze and I think it will be my choice going forward ๐Ÿ™‚

crypto.cpp:

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

using namespace std;

string Hash_CryptoPP(const string& msg)
{
	CryptoPP::SHA1 hash;
	std::string digest(hash.DigestSize(), '*');
	stringstream output;

	hash.Update((const CryptoPP::byte*)msg.data(), msg.size());
	hash.Final((CryptoPP::byte*)&digest[0]);

	CryptoPP::HexEncoder encoder(new CryptoPP::FileSink(output));
	CryptoPP::StringSource(digest, true, new CryptoPP::Redirector(encoder));

	return output.str();
}

string Hash_WolfSSL(const string& msg)
{
	Sha sha;
	::byte shaSum[SHA_DIGEST_SIZE];
	stringstream output;

	wc_InitSha(&sha);
	wc_ShaUpdate(&sha, (::byte*)msg.data(), msg.length());
	wc_ShaFinal(&sha, shaSum);

	string digest(shaSum, shaSum + SHA_DIGEST_SIZE);
	CryptoPP::HexEncoder encoder(new CryptoPP::FileSink(output));
	CryptoPP::StringSource(digest, true, new CryptoPP::Redirector(encoder));

	return output.str();
}

string Hash_Botan(const string& msg)
{
	auto hash = Botan::HashFunction::create("SHA-1");
	hash->update((uint8_t*)msg.data(), msg.length());
	return Botan::hex_encode(hash->final());
}

int main()
{
	std::string msg = "Vorbrodt's C++ Blog @ https://vorbrodt.blog";

	cout << "Message: " << msg << endl;
	cout << "Digest : " << Hash_CryptoPP(msg) << endl << endl;

	cout << "Message: " << msg << endl;
	cout << "Digest : " << Hash_WolfSSL(msg) << endl << endl;

	cout << "Message: " << msg << endl;
	cout << "Digest : " << Hash_Botan(msg) << endl << endl;
}

Message: Vorbrodt's C++ Blog @ https://vorbrodt.blog
Digest : 24BCAC1359AA8B773D38D6A05B22BB43DAB5B8E5

Message: Vorbrodt's C++ Blog @ https://vorbrodt.blog
Digest : 24BCAC1359AA8B773D38D6A05B22BB43DAB5B8E5

Message: Vorbrodt's C++ Blog @ https://vorbrodt.blog
Digest : 24BCAC1359AA8B773D38D6A05B22BB43DAB5B8E5

Program output.

{fmt}

I found this cool little text formatting library with very clean interface and wanted to share it with you. I decided the best way to introduce it to you is not through an extensive tutorial but rather code which illustrates how to use it; so I wrote a program which does the same thing in twelve different ways using this library… plus few extra examples of text coloring, formatting, and alignment. Take a look at the code and the program output and it will all make sense.

fmt.cpp:

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

using namespace std;
using namespace fmt;

int main()
{
	// Format into a std::string
	auto msg1 = fmt::format("The answer is {}", 42);
	auto msg2 = "{0}{1}"_format("The answer is ", 42);

	// Print std::string
	fmt::print("{}\n", msg1);
	fmt::print("{}\n", msg2);

	// Format into memory buffer and print
	fmt::memory_buffer out;
	format_to(out, "The answer is {0}", "42");
	auto msg3 = string(out.begin(), out.end());
	fmt::print("{}\n", msg3);

	// Reverse order of parameters,
	// print to various outputs
	fmt::print("{1} {0}\n", 42, "The answer is");
	fmt::print(cout, "{1} {0}\n", 42, "The answer is");
	fmt::print(stdout, "{1} {0}\n", 42, "The answer is");

	// Named arguments
	fmt::print("{first} {second}\n", fmt::arg("first", "The answer is"), fmt::arg("second", 42));
	fmt::print("{second} {first}\n", "second"_a="The answer is", "first"_a=42);

	// printf style formatting
	fmt::printf("The answer is %.2f\n", 42.f);
	fmt::fprintf(cout, "The answer is %.2f\n", 42.f);
	fmt::fprintf(stdout, "The answer is %.2f\n", 42.f);

	// printf style formatting into a std::string
	auto msg4 = fmt::sprintf("The answer is %.2f\n", 42.f);
	fmt::printf("%s", msg4);

	// Text color and style manipulation
	fmt::print(fmt::emphasis::bold, "The text is bold\n");
	fmt::print(fmt::fg(fmt::color::red) | fmt::bg(fmt::color::green), "The color is red and green\n");

	// Date and time formatting
	std::time_t t = std::time(nullptr);
	fmt::print("The date and time is {:%Y-%m-%d %H:%M:%S}\n", *std::localtime(&t));

	// Alignment
	fmt::print("{:-<30}\n", "left aligned");
	fmt::print("{:->30}\n", "right aligned");
	fmt::print("{:-^30}\n", "centered");
}

The answer is 42
The answer is 42
The answer is 42
The answer is 42
The answer is 42
The answer is 42
The answer is 42
The answer is 42
The answer is 42.00
The answer is 42.00
The answer is 42.00
The answer is 42.00
The text is bold
The color is red and green
The date and time is 2019-03-31 09:03:45
left aligned——————
—————–right aligned
———–centered———–

Program output.
Linux screenshot.

SSO of std::string

What is short/small string optimization? It’s a way to squeeze some bytes into a

std::string
object without actually allocating them on the heap. It’s a hackery involving C++ unions and clever space management. Say
sizeof(std::string)
is, oh I don’t know, 24 bytes on Mac’s LLVM? The implementation manages to squeeze 22 characters into that (not including the terminating NULL) before having to allocate on the heap. Impressive. Less impressive is GCC’s implementation on Linux, with
sizeof(std::string)
being 32 bytes but only 15 can be optimized before going to the heap. I used to have this number for Visual Studio’s implementation but… see the rant above ๐Ÿ˜› The capacity of an empty string is the give away for how much you can fit in it before going to the heap ๐Ÿ˜‰

Check it out yourself on your favorite compiler with the code below!

sso.cpp:

#include 
#include 

using namespace std;

int main()
{
	auto size = sizeof(string);
	auto capacity = string().capacity();
	auto small = string(capacity, '*');
	auto big = string(capacity + 1, '*');

	cout << "sizeof  : " << size << endl;
	cout << "Capacity: " << capacity << endl;
	cout << "Small   : " << small.capacity() << endl;
	cout << "Big     : " << big.capacity() << endl;
}

sizeof  : 24
Capacity: 22
Small   : 22
Big     : 31

Program output (LLVM on Mac).

HTTP queries

Today I want to show you how to use cURLpp (C++ wrapper around libcURL) to make a simple HTTP query to ip-api.com in order to retrieve geolocation information of a given host or IP address. I chose cURLpp because it’s simple and easy to use; the example program would not have been any harder using libcURL C API but this is a C++ blog after-all ๐Ÿ™‚ I will be using Boost Property Tree library to deserialize the JSON geo-ip data. All of that is achieved in 15, give or take, lines of actual code… that’s the power of simple and well designed C++ libraries!

The program starts off by setting up a RAII object of type

curlpp::Cleanup
which initializes and cleans up cURLpp library. We then create a request object of type
curlpp::Easy
and an output
std::stringstream
where the received data will be placed. Next we setup some options like verbosity level, URL, port, the output stream, and we execute the query. Finally we parse the JSON data using
read_json
and iterate over the
ptree
structure to print it to the console.

geoip.cpp:

#include 
#include 
#include 
#include 
#include 
#include 
#include 

using namespace std;

int main()
{
	try
	{
		curlpp::Cleanup cURLppStartStop;

		curlpp::Easy request;
		std::stringstream response;

		request.setOpt(curlpp::options::Verbose(true));
		request.setOpt(curlpp::options::WriteStream(&response));
		request.setOpt(curlpp::options::Url("http://ip-api.com/json/vorbrodt.blog"));
		request.setOpt(curlpp::options::Port(80));

		request.perform();

		boost::property_tree::ptree data;
		boost::property_tree::read_json(response, data);

		for(auto& it : data)
			cout << it.first << " = " << it.second.data() << endl;
	}
	catch(exception& e)
	{
		cerr << e.what() << endl;
	}
}

* ย  Trying 69.195.146.130...
* TCP_NODELAY set
* Connected to ip-api.com (69.195.146.130) port 80 (#0)
> GET /json/vorbrodt.blog HTTP/1.1
Host: ip-api.com
Accept: */*

< HTTP/1.1 200 OK
< Access-Control-Allow-Origin: *
< Content-Type: application/json; charset=utf-8
< Date: Fri, 29 Mar 2019 23:17:53 GMT
< Content-Length: 284
<ย 
* Connection #0 to host ip-api.com left intact


as = AS46606 Unified Layer
city = Provo
country = United States
countryCode = US
isp = Unified Layer
lat = 40.2067
lon = -111.643
org = Unified Layer
query = 162.241.253.105
region = UT
regionName = Utah
status = success
timezone = America/Denver
zip = 84606

Program output.

C-style callbacks and lambda functions

You can use a non-capturing lambda function with C-style APIs that expect a function pointer. As long as the signatures of the callback and the lambda match, the lambda will be cast to a function pointer (or you could define a “positive lambda”, one with a

+
in front of it; this causes automatic conversion to a function pointer). This works because the compiler converts non-capturing lambdas to actual functions and stores them inside the compiled binary. Effectively a pointer to locally defined lambda is valid for the life of the program.

In the program below I define a callback with the following signature:

typedef void(*FuncPtr)(int arg)
and two C-style functions that use it:
void set_callback(FuncPtr fp)
and
void fire_callback(int arg)
. I then call
set_callback
with a positive lambda. The program works ๐Ÿ™‚

c_api_lambda.cpp:

#include 

using namespace std;

typedef void(*FuncPtr)(int arg);
FuncPtr callback;

void set_callback(FuncPtr fp) { callback = fp; }
void fire_callback(int arg) { callback(arg); }

int main()
{
	set_callback(+[](int arg) { cout << arg << endl; });
	fire_callback(42);
}

42

Program output.

Propagate exceptions across threads

What if you need to catch an exception in a worker thread and re-throw it in the main thread that’s waiting for the worker to finish?

std::future
works this way. If you spawn a future on a new thread using
std::async(std::launch::async, ...);
and that future’s worker throws an exception, when you later call
get()
on the future it will emit that exception.

You do it by wrapping the worker thread’s function in

try { /* CODE */ } catch(...) {}
and capturing the current exception pointer (
std::exception_ptr
) using
std::current_exception
. You can then re-throw the captured exception using the pointer and
std::rethrow_exception
. Below is an example that illustrates this technique. Just remember, if you have multiple worker threads make sure to have multiple
std::exception_ptr
instances; one per worker thread.

exceptions.cpp:

#include 
#include 
#include 
#include 
#include 

using namespace std;

int main(int, char**)
{
	exception_ptr ep = nullptr;

	thread t([&]()
	{
		try
		{
			stringstream str;
			str << this_thread::get_id();
			throw runtime_error(str.str().c_str());
		}
		catch(...)
		{
			ep = current_exception();
		}
	});
	t.join();

	if(ep != nullptr)
	{
		try
		{
			rethrow_exception(ep);
		}
		catch(exception& e)
		{
			cout << "Thread " << this_thread::get_id() << " caught exception from thread " << e.what() << endl;
		}
	}
}

Thread 0x1048d75c0 caught exception from thread 0x700001cf3000

Program output.

int main()

I have been spanked by certain commenter (who shall not remain anonymous ๐Ÿ˜‰ ) on here and FB about my style of naming unused

main
arguments and unnecessary
return 1;
at the end of every
main
function.

I have though about and I… concede the point of his argument ๐Ÿ™‚ From now on the style on this blog shall be as follows (if arguments to

main
are not needed):

int main()
{
    // CODE...
}

P.S. C++ standard allows for two valid signatures:

int main()
and
int main(int argc, char** argv)
, see here.

XML-RPC

XML-RPC is yet another method of implementing remote procedure calls. It used XML over HTTP to transmit data. In my past live working at TLO I used XML-RPC-C library to implement communication between cluster nodes and a cluster management system. I thought the library was well designed and easy to use so I wanted to introduce you to it.

Below is a simple client and server implementation using the XML-RPC-C library. The server implements one RPC that accepts one string parameter and returns one string. The client makes the call to the server saying hello and prints the reply. The code is easy to read and does not need any further explanation ๐Ÿ™‚

The client. xmlrpc_c.cpp:

#include 
#include 
#include 
#include 
#include 

using namespace std;

int main(int argc, char** argv)
{
	string serverUrl("http://localhost:8080/RPC2");
	string methodName("hello");

	xmlrpc_c::clientSimple client;
	xmlrpc_c::value result;

	client.call(serverUrl, methodName, "s", &result, "XMLRPC client says hello!");

	auto reply = xmlrpc_c::value_string(result);

	cout << static_cast(reply) << endl;

	return 1;
}

The server. xmlrpc_s.cpp:

#include 
#include 
#include 
#include 
#include 
#include 

using namespace std;

class hello : public xmlrpc_c::method
{
public:
	void execute(const xmlrpc_c::paramList& params, xmlrpc_c::value* retval)
	{
		string msg(params.getString(0));
		params.verifyEnd(1);

		cout << msg << endl;

		*retval = xmlrpc_c::value_string("XMLRPC server says hello!");
	}
};

int main(int argc, char** argv)
{
	xmlrpc_c::registry registry;
	registry.addMethod("hello", new hello);

	xmlrpc_c::serverAbyss server(xmlrpc_c::serverAbyss::constrOpt().registryP(&registry).portNumber(8080));
	server.run();

	return 1;
}

Base64 encoding

Base64 encoding: turning binary data into ASCII text for the purpose of saving it to text files like XML, or transmitting it over protocols like HTTP, or embedding it into web page files, and many other purposed. That’s the basic idea behind it. For every 3 bytes of input you get 4 bytes of output, so it’s not crazy inflated.

I was looking for a library that has built in, easy to use and clean base64 encoding and decoding functions but didn’t really find anything to my liking. So I looked for a reference implementation and found one at wikibooks.org. Their C++ implementation (released to public domain so I could freely use and modify it) was my starting point. I beautified the code and brought it closer to modern C++ ๐Ÿ™‚ So now you have a header only, clean base64 encode and decode functions you can use in your projects: base64.hpp.

I wrote a program that encodes and decodes an input string, checks the original against the decoded one, and also checks the encoded base64 text against reference base64 string taken from wiki. The implementation checks out and produces correct encoded strings and decoded data ๐Ÿ™‚ Below is the test program.

base64.cpp:

#include 
#include 
#include 
#include 
#include "base64.hpp"
#include "ascii_escape_code.hpp"

using namespace std;
using namespace ascii_escape_code;

int main(int argc, char** argv)
{
	string input
	{
		"Man is distinguished, not only by his reason, but by this singular passion from other animals, "
		"which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable "
		"generation of knowledge, exceeds the short vehemence of any carnal pleasure."
	};
	string reference
	{
		"TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlz"
		"IHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2Yg"
		"dGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu"
		"dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRo"
		"ZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4="
	};

	try
	{
		vector data(begin(input), end(input));

		cout << bold << bright_red << "Input: " << reset << italic << input << reset << endl << endl;
		cout << bold << bright_red << "Reference: " << reset << reference << endl << endl;

		auto encoded = base64::encode(data);
		cout << bold << bright_red << "Encoded: " << reset << encoded << endl << endl;

		if(encoded != reference) throw runtime_error("Oh snap! Encoded data does not match reference!");
		else cout << bright_red << "Encoded data matches reference :o)" << reset << endl << endl;

		auto decoded = base64::decode(encoded);
		string s2(begin(decoded), end(decoded));

		cout << bold << bright_red << "Decoded: " << reset << italic << s2 << reset << endl << endl;

		if(data != decoded) throw runtime_error("Oh snap! Input data does not match decoded!");
		else cout << bright_red << "Decoded data matches original :o)" << reset << endl << endl;
	}
	catch(exception& e)
	{
		cerr << slow_blink << bright_red << e.what() << reset << endl << endl;
	}

	return 1;
}

Input: Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure.

Reference:

TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ 1dCBieSB0aGlzIHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3 aGljaCBpcyBhIGx1c3Qgb2YgdGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmF uY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGludWVkIGFuZCBpbmRlZmF0aWd hYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRoZSBzaG9ydC B2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=

Encoded:

TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ 1dCBieSB0aGlzIHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3 aGljaCBpcyBhIGx1c3Qgb2YgdGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmF uY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGludWVkIGFuZCBpbmRlZmF0aWd hYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRoZSBzaG9ydC B2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=

Encoded data matches reference :o)

Decoded: Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure.

Decoded data matches original :o)

Program output.

And here is the encoder and decoder function implementation.

base64.hpp:

#pragma once

#include 
#include 
#include 
#include 

namespace base64
{
	inline static const char kEncodeLookup[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
	inline static const char kPadCharacter = '=';

	using byte = std::uint8_t;

	inline std::string encode(const std::vector& input)
	{
		std::string encoded;
		encoded.reserve(((input.size() / 3) + (input.size() % 3 > 0)) * 4);

		std::uint32_t temp{};
		auto it = input.begin();

		for(std::size_t i = 0; i < input.size() / 3; ++i)
		{
			temp  = (*it++) << 16;
			temp += (*it++) << 8;
			temp += (*it++);
			encoded.append(1, kEncodeLookup[(temp & 0x00FC0000) >> 18]);
			encoded.append(1, kEncodeLookup[(temp & 0x0003F000) >> 12]);
			encoded.append(1, kEncodeLookup[(temp & 0x00000FC0) >> 6 ]);
			encoded.append(1, kEncodeLookup[(temp & 0x0000003F)      ]);
		}

		switch(input.size() % 3)
		{
		case 1:
			temp = (*it++) << 16;
			encoded.append(1, kEncodeLookup[(temp & 0x00FC0000) >> 18]);
			encoded.append(1, kEncodeLookup[(temp & 0x0003F000) >> 12]);
			encoded.append(2, kPadCharacter);
			break;
		case 2:
			temp  = (*it++) << 16;
			temp += (*it++) << 8;
			encoded.append(1, kEncodeLookup[(temp & 0x00FC0000) >> 18]);
			encoded.append(1, kEncodeLookup[(temp & 0x0003F000) >> 12]);
			encoded.append(1, kEncodeLookup[(temp & 0x00000FC0) >> 6 ]);
			encoded.append(1, kPadCharacter);
			break;
		}

		return encoded;
	}

	std::vector decode(const std::string& input)
	{
		if(input.length() % 4)
			throw std::runtime_error("Invalid base64 length!");

		std::size_t padding{};

		if(input.length())
		{
			if(input[input.length() - 1] == kPadCharacter) padding++;
			if(input[input.length() - 2] == kPadCharacter) padding++;
		}

		std::vector decoded;
		decoded.reserve(((input.length() / 4) * 3) - padding);

		std::uint32_t temp{};
		auto it = input.begin();

		while(it < input.end())
		{
			for(std::size_t i = 0; i < 4; ++i)
			{
				temp <<= 6;
				if     (*it >= 0x41 && *it <= 0x5A) temp |= *it - 0x41;
				else if(*it >= 0x61 && *it <= 0x7A) temp |= *it - 0x47;
				else if(*it >= 0x30 && *it <= 0x39) temp |= *it + 0x04;
				else if(*it == 0x2B)                temp |= 0x3E;
				else if(*it == 0x2F)                temp |= 0x3F;
				else if(*it == kPadCharacter)
				{
					switch(input.end() - it)
					{
					case 1:
						decoded.push_back((temp >> 16) & 0x000000FF);
						decoded.push_back((temp >> 8 ) & 0x000000FF);
						return decoded;
					case 2:
						decoded.push_back((temp >> 10) & 0x000000FF);
						return decoded;
					default:
						throw std::runtime_error("Invalid padding in base64!");
					}
				}
				else throw std::runtime_error("Invalid character in base64!");

				++it;
			}

			decoded.push_back((temp >> 16) & 0x000000FF);
			decoded.push_back((temp >> 8 ) & 0x000000FF);
			decoded.push_back((temp      ) & 0x000000FF);
		}

		return decoded;
	}
}