Practical Modern C++

Visitor Pattern and Multiple-Dispatch Explained

Posted on May 10, 2022 by Martin

Brief bookclub talk I gave at work introducing the visitor design pattern and multiple-dispatch in C++.
Chapter 18 from Hands-On Design Patterns with C++ by Fedor G. Pikus.

Source code:
visitor.cpp

Adapter Pattern Explained

Posted on May 3, 2022 by Martin

Brief bookclub talk I gave at work introducing the adapter design pattern in C++.
Chapter 17 from Hands-On Design Patterns with C++ by Fedor G. Pikus.

Source code:
adapter.cpp

Decorator Pattern Explained

Posted on April 27, 2022 by Martin

Brief bookclub talk I gave at work introducing the decorator design pattern in C++.
Chapter 17 from Hands-On Design Patterns with C++ by Fedor G. Pikus.

Source code:
decorator.cpp

Policy-Based Design Explained

Posted on April 19, 2022 by Martin

Brief bookclub talk I gave at work introducing policy-based design in C++.
Chapter 16 from Hands-On Design Patterns with C++ by Fedor G. Pikus.

Singleton Pattern Explained

Posted on April 19, 2022April 27, 2022 by Martin

Brief bookclub talk I gave at work introducing the singleton design pattern in C++.
Chapter 15 from Hands-On Design Patterns with C++ by Fedor G. Pikus.

Source code:
singleton.hpp | singleton.cpp

How to implement a memory pool

Posted on June 4, 2021 by Martin

Last week I explained what a memory pool is and how to use it. This week my lesson was about implementing a simple memory pool capable of fixed size memory chunk allocations. The pool preallocates a block of memory on demand, carves out equal sized chunks from it, and returns a raw pointer each time you call pool.malloc().
Memory chunks are described using a simple struct chunk_t and stored in a singly-linked-list; the struct has an internal anonymous union for size optimization: the same memory region can either represent the next pointer of the list’s node or hold user data.

P.S. Watch in HD, though I did increased the font size per your requests 🙂

Code from the class:
lesson_memory_pool_how_to.cpp
Cleaned up implementation:
memory_pool.hpp | memory_pool.cpp

How to use memory pools

Posted on May 29, 2021 by Martin

In this week’s class I explained what memory pools are and how to use them to optimize frequent allocations and deallocations of objects. I also demonstrated how to overwrite operator new / operator new [] and operator delete / operator delete [] for a struct / class type.
The code for this class contains a simple benchmark which measures pool’s performance against the general-purpose allocator in C++ runtime. Notice how multi-threaded benchmark causes the pool to perform poorly (I suspect due to lock contention, though I did not yet investigate this). I didn’t even bother measuring array allocations; the memory pool benchmark causes massive fragmentation of the free chunks (by allocating all available chunks, shuffling them randomly, then freeing all) making continuous allocations virtually impossible without falling back onto runtime allocator.

Next week I plan on actually implementing a simple memory pool then again measure its performance against the runtime. Enjoy the video!

P.S. Remember to pick HD version of the video. It is also best to watch in full screen mode due to the font size used. I will try to remember to increase it during my next class. Apologies to anyone finding it hard to read the code.

Code from the class:
lesson_pool_allocator.cpp | lesson_pool_allocator_benchmark.cpp

How to detect memory leaks

Posted on May 27, 2021 by Martin

During my last week’s class I discussed a source-code method of finding memory leaks in C++. What do I mean by that? Source-code method is not a way of debugging a process like one could do with Valgrind on Linux or Deleaker on Windows (which I’ve used recently and really enjoyed its integration with Visual Studio). What I am referring to is a combination of #define MACRO magic and overwriting the void* operator new. A technique to keep track of all your new / new[] and delete / delete[] statements and finding A) leaks due to missing delete statements and B) mismatches between array and non array versions of said operators.
Simply by including a header file in your source code you enable this tracing capability. Though it has a limitation: unlike Deleaker or Valgrind it only works for plain vanilla new / new[] operator; my solution does not support any other form of it.

Code from the class:
newtrace.hpp (C++17 or earlier) | newtrace.cpp | README.txt
Using boost::stacktrace:
newtrace.st.hpp | newtrace.st.cpp

How to implement a producer-consumer queue

Posted on April 22, 2021 by Martin

During my most recent class I discussed implementations of simple, multi-threaded, producer-consumer queues: unbounded (can grow in size indefinitely) and bounded (can only grow to a certain size). In short: the unbounded queue will block consumers’ pop() calls if it happens to be empty until a producer pushes something onto it. Bounded queue will do the same and in addition it will block producers’ push() calls if the queue fills up to its maximum capacity, unblocking only after a consumer frees up a spot with pop(). Finally I reworked the simple thread pool from last week’s class to use said queues, greatly reducing its complexity.

In the past I have implemented both types of queues (you can find it in my archives), but the bounded one used less than optimal approach: semaphores for counting free and occupied spots. This was costly because it required three std::mutex and two std::condition_variable objects. The one I implemented during this class requires only a single std::mutex and two std::condition_variable. The unbounded queue requires only one of each.

Code from the class:
lesson_queue_how_to.cpp

How to implement a thread pool

Posted on April 15, 2021April 22, 2021 by Martin

By popular demand, and at the request of my coworkers, this week’s class was about implementing a simple thread pool using only standard C++ components. It would have taken 1h-30m if I didn’t completely forget how to use a condition variable. It’s 2h long!
Bad std::condition_variable.wait(...) predicate caused a deadlock and I spent 30 minutes trying to figure it out. If you’re interested in the step-by-step implementation only then fast forward from 1h-15m-15s to 1h-45m-0s. Otherwise watch me reason through it. I eventually realized that the predicate should have returned true if the queue contained any elements, instead it tried to answer if 1 element was added to it.
For me the lesson was…

If you want to master something, teach it.
Richard Feynman

Code from the class:
lesson_thread_pool_how_to.cpp

Light-weight RPC framework

Posted on April 11, 2021January 29, 2022 by Martin

During my latest mentoring session with coworkers I was asked to explain how modern RPC frameworks work under the hood. What better way to illustrate this than to build one from scratch using components I blogged about previously (event objects, serializer framework, and TCP sockets).

Although not part of the video recording (which I will share later in this post) I explained that RPC frameworks generally work in two ways:

The client and server code is created at runtime and can be built on demand. Good example of this is the XML-RPC-C library: it contains no RPC IDL nor associated compiler/code-generator. It is simpler of the two.
The RPC methods of the server and associated client code is defined in form of an IDL, then ran through a code-generator producing almost all of the necessary client and server code. gRPC and Thrift are prime examples of this: both come with code generators and are very sophisticated. This was the type of framework I was going to talk about.

I did not design an IDL nor did I build a parser and code-generator for my students; there wasn’t nearly enough time for that. What I wanted to illustrate were different components which would have been generated by a modern RPC framework and how they would fit together. Starting with the client interface and code created to connect and issue calls to the server, serialization of parameters, network transport, command parsing and dispatching on the server side, actual command logic (this part would not be generated by a framework), and finally everything needed to send responses back to the client.

The starting point of creating a simple calculator server capable of adding and subtracting numbers would have been the server’s interface declaration using some form of an IDL:

interface Calculator
{
    int Add(int, int);
    int Sub(int, int);
}

interface Calculator

{

int Add(int, int);

int Sub(int, int);

}

This would then be processed by the framework’s generator to produce actual C++ code (or any other language) that would allow the programmer to easily connect, issue calls, and receive responses from the server:

auto c = RPCClient(server_host_name, server_port);
auto a = c.Add(2, 2);
auto s = c.Sub(10, 5);

auto c = RPCClient(server_host_name, server_port);

auto a = c.Add(2, 2);

auto s = c.Sub(10, 5);

Creating instances of the RPC server would be equally easy, but would also require providing the core logic of the server procedures:

auto s = RPCServer(listening_port);
s.SetAddHandler([](int x, int y) { return x + y; });
s.SetSubHandler([](int x, int y) { return x - y; });
s.Run();

auto s = RPCServer(listening_port);

s.SetAddHandler([](int x, int y) { return x + y; });

s.SetSubHandler([](int x, int y) { return x - y; });

s.Run();

In order for this to work a lot more code would have been generated then compiled against the static parts of the RPC framework: serialization, network transport, compression, encryption, authentication code, etc.
During my class I implemented the parts that would have been generated based on the desired set of procedure calls (the IDL), and reused code I developed for this blog in the past in place of the framework’s static parts.

Keep in mind that some of the supporting code I have already prepared before recording the class and I had it open on another screen, but the IDL based code I was coming up with on the fly, aka freestyling it 🙂
I figured it’s worth a mention in case my delivery appears less than… perfect.

Reworked and cleaned up code from the class:
Protocol: lrpc_proto.hpp. Client: lrpc_c.cpp. Server: lrpc_s.cpp.
Serializer: lsf.hpp. Network: socket.hpp. Event: event.hpp.

Better serializer

Posted on March 31, 2021January 29, 2022 by Martin

In my last post I described a light-weight serialization framework I recently implemented (video where I go over it in detail). After receiving great feedback from r/cpp I have implemented several improvements that were suggested to me: better unpack transform signature, safer unpacking code, unpack casting iterator just to name a few.

The latest code is available on GitHub; the serializer source lsf.hpp, and usage example lsf.cpp.

Video where I describe the changes in greater detail:

Light-weight serialization framework

Posted on March 28, 2021January 29, 2022 by Martin

UPDATE: Better serializer

In my very first YouTube Channel video I discuss in detail the design of a serialization framework I came up with in response to a question I was recently asked. In this post I want to give a brief overview of how it works and how to get started using it.

The primary goal was to create a serialization framework that is uncoupled from the types it operates on, meaning no class nor struct needs to inherit any sort of

add_pack_transform<std::uint16_t>(
    [](std::uint16_t v, buffer_output_t& it) {
        pack_value(it, htons(v)); });

add_unpack_transform<std::uint16_t>(
    [](buffer_input_t& it, std::uint16_t* v) {
        *v = ntohs(unpack_value<std::uint16_t>(it)); });

add_pack_transform<std::uint16_t>(

[](std::uint16_t v, buffer_output_t& it) {

pack_value(it, htons(v)); });

add_unpack_transform<std::uint16_t>(

[](buffer_input_t& it, std::uint16_t* v) {

*v = ntohs(unpack_value<std::uint16_t>(it)); });

By default every type is packed by doing bitwise copy from its memory location into a

auto buf_1 = pack(std::uint16_t(0x1234));

auto tup_1 = unpack<std::uint16_t>(buf_1);

auto buf_1 = pack(std::uint16_t(0x1234));

auto tup_1 = unpack<std::uint16_t>(buf_1);

The code above packs a single

It is possible to perform type conversions inside the pack and unpack transforms:

add_pack_transform<std::size_t>([](std::size_t v, buffer_output_t& it) {
    pack_value(it, std::uint16_t(v)); });

add_unpack_transform<std::size_t>([](buffer_input_t& it, std::size_t* v) {
    *v = unpack_value<std::uint16_t>(it); });

add_pack_transform<std::size_t>([](std::size_t v, buffer_output_t& it) {

pack_value(it, std::uint16_t(v)); });

add_unpack_transform<std::size_t>([](buffer_input_t& it, std::size_t* v) {

*v = unpack_value<std::uint16_t>(it); });

Above code illustrates it: every

add_pack_transform<const char*>(
	[](const char* v, buffer_output_t& it)
	{
		auto len = std::strlen(v);
		pack_type(it, len);
		pack_bytes(it, v, len);
	});

add_pack_transform<std::string>(
	[](const std::string& v, buffer_output_t& it)
	{
		pack_type(it, v.length());
		pack_bytes(it, v.data(), v.length());
	});

add_unpack_transform<std::string>(
	[](buffer_input_t& it, std::string* v)
	{
		auto len = unpack_type<std::string::size_type>(it);
		new (v) std::string(len, std::string::value_type());
		unpack_bytes(it, v->data(), len);
	});

add_pack_transform<const char*>(

[](const char* v, buffer_output_t& it)

{

auto len = std::strlen(v);

pack_type(it, len);

pack_bytes(it, v, len);

});

add_pack_transform<std::string>(

[](const std::string& v, buffer_output_t& it)

{

pack_type(it, v.length());

pack_bytes(it, v.data(), v.length());

});

add_unpack_transform<std::string>(

[](buffer_input_t& it, std::string* v)

{

auto len = unpack_type<std::string::size_type>(it);

new (v) std::string(len, std::string::value_type());

unpack_bytes(it, v->data(), len);

});

Here I define how to serialize

Having defined transforms for simple types we can now add more complex types, let’s say

using strings_t = std::vector<std::string>;

add_pack_transform<strings_t>(
	[](const strings_t& vs, buffer_output_t& it)
	{
		pack_type(it, vs.size());
		std::for_each(std::begin(vs), std::end(vs),
			[&](const std::string& s) { pack_type(it, s); });
	});

add_unpack_transform<strings_t>(
	[](buffer_input_t& it, strings_t* vs)
	{
		auto size = unpack_type<strings_t::size_type>(it);
		new (vs) strings_t();
		vs->reserve(size);
		std::generate_n(std::back_inserter(*vs), size,
			[&]() { return unpack_type<std::string>(it); });
	});

using strings_t = std::vector<std::string>;

add_pack_transform<strings_t>(

[](const strings_t& vs, buffer_output_t& it)

{

pack_type(it, vs.size());

std::for_each(std::begin(vs), std::end(vs),

[&](const std::string& s) { pack_type(it, s); });

});

add_unpack_transform<strings_t>(

[](buffer_input_t& it, strings_t* vs)

{

auto size = unpack_type<strings_t::size_type>(it);

new (vs) strings_t();

vs->reserve(size);

std::generate_n(std::back_inserter(*vs), size,

[&]() { return unpack_type<std::string>(it); });

});

The packing code is trivial: pack the size of the vector followed by packing each entry one at a time. Notice here I am calling

One thing I have not yet mentioned is an optimization I introduced into the code: before types are packed the exact amount of memory is reserved inside the resulting

static auto k_default_pack_size_proc = pack_size_proc_t(
    [](const void*) { return sizeof(T); });

1 2	static auto k_default_pack_size_proc = pack_size_proc_t( [](const void*) { return sizeof(T); });

There is a void pointer parameter but in this case it is ignored. However it is required to compute sizes of more complex types like strings; that is because in order to store the pack size procs inside an internal data structure each one needs to be wrapped inside the default lambda. This lambda gets the memory location of each type as

add_pack_size_proc<const char*>(
    [](const char* v) {
        return pack_size(std::strlen(v)) + std::strlen(v); });

add_pack_size_proc<std::string>(
    [](const std::string& v) {
        return pack_size(v.length()) + v.length(); });

add_pack_size_proc<const char*>(

[](const char* v) {

return pack_size(std::strlen(v)) + std::strlen(v); });

add_pack_size_proc<std::string>(

[](const std::string& v) {

return pack_size(v.length()) + v.length(); });

Here I tell the framework how to compute the number of bytes needed to serialize strings.

add_pack_size_proc<strings_t>(
	[](const strings_t& v)
	{
		return pack_size(v.size()) +
			std::accumulate(std::begin(v), std::end(v), 0,
				[](auto sum, const std::string& v) { return sum + pack_size(v); });
	});

add_pack_size_proc<strings_t>(

[](const strings_t& v)

{

return pack_size(v.size()) +

std::accumulate(std::begin(v), std::end(v), 0,

[](auto sum, const std::string& v) { return sum + pack_size(v); });

});

I initially tagged each type with a unique value so that I could verify it during unpacking to catch errors like packing an int but trying to unpack a string. This type tagging became more complicated than I first anticipated so I got rid of it, though I may revisit it and add an additional packing and unpacking functions that perform this verification optionally.

As always, the code is available on my GitHub page.
The framework is a single header file: lsf.hpp. Example program: lsf.cpp.

Practical Modern C++

Category: YouTube