How to implement a memory pool

Last week I explained what a memory pool is and how to use it. This week my lesson was about implementing a simple memory pool capable of fixed size memory chunk allocations. The pool preallocates a block of memory on demand, carves out equal sized chunks from it, and returns a raw pointer each time you call pool.malloc().
Memory chunks are described using a simple struct chunk_t and stored in a singly-linked-list; the struct has an internal anonymous union for size optimization: the same memory region can either represent the next pointer of the list’s node or hold user data.

P.S. Watch in HD, though I did increased the font size per your requests 🙂


Code from the class:
lesson_memory_pool_how_to.cpp
Cleaned up implementation:
memory_pool.hpp | memory_pool.cpp


How to use memory pools

In this week’s class I explained what memory pools are and how to use them to optimize frequent allocations and deallocations of objects. I also demonstrated how to overwrite operator new / operator new [] and operator delete / operator delete [] for a struct / class type.
The code for this class contains a simple benchmark which measures pool’s performance against the general-purpose allocator in C++ runtime. Notice how multi-threaded benchmark causes the pool to perform poorly (I suspect due to lock contention, though I did not yet investigate this). I didn’t even bother measuring array allocations; the memory pool benchmark causes massive fragmentation of the free chunks (by allocating all available chunks, shuffling them randomly, then freeing all) making continuous allocations virtually impossible without falling back onto runtime allocator.

Next week I plan on actually implementing a simple memory pool then again measure its performance against the runtime. Enjoy the video!

P.S. Remember to pick HD version of the video. It is also best to watch in full screen mode due to the font size used. I will try to remember to increase it during my next class. Apologies to anyone finding it hard to read the code.


Code from the class:
lesson_pool_allocator.cpp | lesson_pool_allocator_benchmark.cpp


How to detect memory leaks

During my last week’s class I discussed a source-code method of finding memory leaks in C++. What do I mean by that? Source-code method is not a way of debugging a process like one could do with Valgrind on Linux or Deleaker on Windows (which I’ve used recently and really enjoyed its integration with Visual Studio). What I am referring to is a combination of #define MACRO magic and overwriting the void* operator new. A technique to keep track of all your new / new[] and delete / delete[] statements and finding A) leaks due to missing delete statements and B) mismatches between array and non array versions of said operators.
Simply by including a header file in your source code you enable this tracing capability. Though it has a limitation: unlike Deleaker or Valgrind it only works for plain vanilla new / new[] operator; my solution does not support any other form of it.


Code from the class:
newtrace.hpp (C++17 or earlier) | newtrace.cpp | README.txt
Using boost::stacktrace:
newtrace.st.hpp | newtrace.st.cpp


How to implement a producer-consumer queue

During my most recent class I discussed implementations of simple, multi-threaded, producer-consumer queues: unbounded (can grow in size indefinitely) and bounded (can only grow to a certain size). In short: the unbounded queue will block consumers’ pop() calls if it happens to be empty until a producer pushes something onto it. Bounded queue will do the same and in addition it will block producers’ push() calls if the queue fills up to its maximum capacity, unblocking only after a consumer frees up a spot with pop(). Finally I reworked the simple thread pool from last week’s class to use said queues, greatly reducing its complexity.

In the past I have implemented both types of queues (you can find it in my archives), but the bounded one used less than optimal approach: semaphores for counting free and occupied spots. This was costly because it required three std::mutex and two std::condition_variable objects. The one I implemented during this class requires only a single std::mutex and two std::condition_variable. The unbounded queue requires only one of each.


Code from the class:
lesson_queue_how_to.cpp


How to implement a thread pool

By popular demand, and at the request of my coworkers, this week’s class was about implementing a simple thread pool using only standard C++ components. It would have taken 1h-30m if I didn’t completely forget how to use a condition variable. It’s 2h long!
Bad std::condition_variable.wait(...) predicate caused a deadlock and I spent 30 minutes trying to figure it out. If you’re interested in the step-by-step implementation only then fast forward from 1h-15m-15s to 1h-45m-0s. Otherwise watch me reason through it. I eventually realized that the predicate should have returned true if the queue contained any elements, instead it tried to answer if 1 element was added to it.
For me the lesson was…

If you want to master something, teach it.

Richard Feynman

Code from the class:
lesson_thread_pool_how_to.cpp


Light-weight RPC framework

During my latest mentoring session with coworkers I was asked to explain how modern RPC frameworks work under the hood. What better way to illustrate this than to build one from scratch using components I blogged about previously (event objects, serializer framework, and TCP sockets).

Although not part of the video recording (which I will share later in this post) I explained that RPC frameworks generally work in two ways:

  • The client and server code is created at runtime and can be built on demand. Good example of this is the XML-RPC-C library: it contains no RPC IDL nor associated compiler/code-generator. It is simpler of the two.
  • The RPC methods of the server and associated client code is defined in form of an IDL, then ran through a code-generator producing almost all of the necessary client and server code. gRPC and Thrift are prime examples of this: both come with code generators and are very sophisticated. This was the type of framework I was going to talk about.

I did not design an IDL nor did I build a parser and code-generator for my students; there wasn’t nearly enough time for that. What I wanted to illustrate were different components which would have been generated by a modern RPC framework and how they would fit together. Starting with the client interface and code created to connect and issue calls to the server, serialization of parameters, network transport, command parsing and dispatching on the server side, actual command logic (this part would not be generated by a framework), and finally everything needed to send responses back to the client.

The starting point of creating a simple calculator server capable of adding and subtracting numbers would have been the server’s interface declaration using some form of an IDL:

This would then be processed by the framework’s generator to produce actual C++ code (or any other language) that would allow the programmer to easily connect, issue calls, and receive responses from the server:

Creating instances of the RPC server would be equally easy, but would also require providing the core logic of the server procedures:

In order for this to work a lot more code would have been generated then compiled against the static parts of the RPC framework: serialization, network transport, compression, encryption, authentication code, etc.
During my class I implemented the parts that would have been generated based on the desired set of procedure calls (the IDL), and reused code I developed for this blog in the past in place of the framework’s static parts.


Keep in mind that some of the supporting code I have already prepared before recording the class and I had it open on another screen, but the IDL based code I was coming up with on the fly, aka freestyling it 🙂
I figured it’s worth a mention in case my delivery appears less than… perfect.


Reworked and cleaned up code from the class:
Protocol: lrpc_proto.hpp. Client: lrpc_c.cpp. Server: lrpc_s.cpp.
Serializer: lsf.hpp. Network: socket.hpp. Event: event.hpp.


Better serializer

In my last post I described a light-weight serialization framework I recently implemented (video where I go over it in detail). After receiving great feedback from r/cpp I have implemented several improvements that were suggested to me: better unpack transform signature, safer unpacking code, unpack casting iterator just to name a few.

The latest code is available on GitHub; the serializer source lsf.hpp, and usage example lsf.cpp.

Video where I describe the changes in greater detail:

Light-weight serialization framework

UPDATE: Better serializer

In my very first YouTube Channel video I discuss in detail the design of a serialization framework I came up with in response to a question I was recently asked. In this post I want to give a brief overview of how it works and how to get started using it.

The primary goal was to create a serialization framework that is uncoupled from the types it operates on, meaning no class nor struct needs to inherit any sort of

By default every type is packed by doing bitwise copy from its memory location into a

The code above packs a single

It is possible to perform type conversions inside the pack and unpack transforms:

Above code illustrates it: every

Here I define how to serialize

Having defined transforms for simple types we can now add more complex types, let’s say

The packing code is trivial: pack the size of the vector followed by packing each entry one at a time. Notice here I am calling

One thing I have not yet mentioned is an optimization I introduced into the code: before types are packed the exact amount of memory is reserved inside the resulting

There is a void pointer parameter but in this case it is ignored. However it is required to compute sizes of more complex types like strings; that is because in order to store the pack size procs inside an internal data structure each one needs to be wrapped inside the default lambda. This lambda gets the memory location of each type as

Here I tell the framework how to compute the number of bytes needed to serialize strings.

I initially tagged each type with a unique value so that I could verify it during unpacking to catch errors like packing an int but trying to unpack a string. This type tagging became more complicated than I first anticipated so I got rid of it, though I may revisit it and add an additional packing and unpacking functions that perform this verification optionally.

As always, the code is available on my GitHub page.
The framework is a single header file: lsf.hpp. Example program: lsf.cpp.