How to implement a spinlock / event / semaphore

Multi-threading has been the theme of my previous two classes where I discussed thread-pools and producer-consumer queues. Let’s talk about the fundamental aspects of multi-threading and some of its most basic building blocks: spinlocks, automatic and manual events, and semaphores.
In this week’s class I will show you how to implement these building blocks using only standard C++ components: std::atomic_flag, std::mutex, std::condition_variable.


Code from the class:
lesson_spinlock_event_semaphore_how_to.cpp
Complete implementation:
mutex.hpp | event.hpp | semaphore.hpp


How to implement a producer-consumer queue

During my most recent class I discussed implementations of simple, multi-threaded, producer-consumer queues: unbounded (can grow in size indefinitely) and bounded (can only grow to a certain size). In short: the unbounded queue will block consumers’ pop() calls while it is empty, until a producer pushes something onto it. The bounded queue will do the same, and in addition it will block producers’ push() calls if the queue fills up to its maximum capacity, unblocking only after a consumer frees up a spot with pop(). Finally, I reworked the simple thread pool from last week’s class to use said queues, greatly reducing its complexity.

In the past I have implemented both types of queues (you can find them in my archives), but the bounded one used a less-than-optimal approach: semaphores for counting free and occupied spots. This was costly because it required three std::mutex and two std::condition_variable objects. The one I implemented during this class requires only a single std::mutex and two std::condition_variable objects. The unbounded queue requires only one of each.


Code from the class:
lesson_queue_how_to.cpp


How to implement a thread pool

By popular demand, and at the request of my coworkers, this week’s class was about implementing a simple thread pool using only standard C++ components. It would have taken 1h-30m had I not completely forgotten how to use a condition variable. It’s 2h long!
A bad std::condition_variable.wait(...) predicate caused a deadlock and I spent 30 minutes trying to figure it out. If you’re interested in the step-by-step implementation only, then fast forward from 1h-15m-15s to 1h-45m-0s. Otherwise watch me reason through it. I eventually realized that the predicate should have returned true if the queue contained any elements; instead it tried to answer whether exactly one element had just been added to it.
For me the lesson was…

If you want to master something, teach it.

Richard Feynman

Code from the class:
lesson_thread_pool_how_to.cpp


Light-weight RPC framework

During my latest mentoring session with coworkers I was asked to explain how modern RPC frameworks work under the hood. What better way to illustrate this than to build one from scratch using components I blogged about previously (event objects, serializer framework, and TCP sockets).

Although not part of the video recording (which I will share later in this post), I explained that RPC frameworks generally work in one of two ways:

  • The client and server code is created at runtime and can be built on demand. A good example of this is the XML-RPC-C library: it contains no RPC IDL nor an associated compiler/code-generator. It is the simpler of the two.
  • The RPC methods of the server and the associated client code are defined in the form of an IDL, then run through a code-generator producing almost all of the necessary client and server code. gRPC and Thrift are prime examples of this: both come with code generators and are very sophisticated. This was the type of framework I was going to talk about.

I did not design an IDL nor did I build a parser and code-generator for my students; there wasn’t nearly enough time for that. What I wanted to illustrate were different components which would have been generated by a modern RPC framework and how they would fit together. Starting with the client interface and code created to connect and issue calls to the server, serialization of parameters, network transport, command parsing and dispatching on the server side, actual command logic (this part would not be generated by a framework), and finally everything needed to send responses back to the client.

The starting point of creating a simple calculator server capable of adding and subtracting numbers would have been the server’s interface declaration using some form of an IDL:

This would then be processed by the framework’s generator to produce actual C++ code (or any other language) that would allow the programmer to easily connect, issue calls, and receive responses from the server:
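To give a feel for the shape of such generated code, here is a heavily simplified, hypothetical sketch. An in-memory call stands in for the serialization and TCP transport layers, and all the names are mine, not any real framework’s:

```cpp
#include <functional>
#include <map>
#include <string>

// A packed call: in a real framework the method name and arguments
// would be serialized into bytes and sent over a socket.
struct call_t
{
    std::string method;
    int a, b;
};

// Server side: the generator would emit the dispatch table; only the
// lambda bodies (the actual procedure logic) are written by hand.
class calculator_server
{
public:
    calculator_server()
    {
        m_procs["add"] = [](int a, int b) { return a + b; };
        m_procs["sub"] = [](int a, int b) { return a - b; };
    }

    int dispatch(const call_t& c) { return m_procs.at(c.method)(c.a, c.b); }

private:
    std::map<std::string, std::function<int(int, int)>> m_procs;
};

// Client side: the generator would emit a proxy whose methods pack the
// parameters and hand them to the transport (here: a direct call).
class calculator_client
{
public:
    explicit calculator_client(calculator_server& s) : m_server(s) {}

    int add(int a, int b) { return m_server.dispatch({ "add", a, b }); }
    int sub(int a, int b) { return m_server.dispatch({ "sub", a, b }); }

private:
    calculator_server& m_server;
};
```

The real thing replaces the direct dispatch call with serialization, a socket round-trip, and parsing, but the division of labor between generated and hand-written code stays the same.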

Creating instances of the RPC server would be equally easy, but would also require providing the core logic of the server procedures:

In order for this to work a lot more code would have been generated, then compiled against the static parts of the RPC framework: serialization, network transport, compression, encryption, authentication code, etc.
During my class I implemented the parts that would have been generated based on the desired set of procedure calls (the IDL), and reused code I developed for this blog in the past in place of the framework’s static parts.


Keep in mind that I had already prepared some of the supporting code before recording the class and had it open on another screen, but the IDL-based code I was coming up with on the fly, aka freestyling it 🙂
I figured it’s worth a mention in case my delivery appears less than… perfect.


Reworked and cleaned up code from the class:
Protocol: lrpc_proto.hpp. Client: lrpc_c.cpp. Server: lrpc_s.cpp.
Serializer: lsf.hpp. Network: socket.hpp. Event: event.hpp.


Case insensitive string, etc

The question of case insensitive strings has been, and continues to be, asked a lot, and equally many answers can be found all over the internet. The go-to solution is to create a char_traits policy class with eq, lt, and compare methods implemented using std::toupper before comparing characters, then instantiate std::basic_string with it: using istring = std::basic_string<char, char_itraits<char>>. This works well until you try to use your new type anywhere near code designed with std::string in mind. Then what?

I guess what I am trying to say is that I never found a complete solution to this, so I decided to create one myself. I wanted the case insensitive string to play nicely and seamlessly with its standard counterpart. I wanted the ability to pass it as a parameter anywhere std::string is accepted; to convert between istring and std::string in both directions; push it to output stream; read it from input stream; compare it using all six operators, ==, !=, <, <=, >, >=, with std::string whether it appeared on the left or right side of the operation; and to declare literals of its type: "std::string literal"s.
In other words, to have it be indistinguishable from std::string except when comparisons are needed, like this:

The starting point was the character traits policy class mentioned earlier, but instead of using it with std::basic_string directly, I inherited publicly from the std::basic_string template configured with the case insensitive traits policy; this pulled all its methods into the derived class scope, and I only had to pull in the base class constructors:

All this derived class needed now was a constructor which would allow it to be created from any other type of std::basic_string as long as the character type was the same; an implicit type cast operator to seamlessly convert it to std::string; and 4 comparison operators: == and <=> declared twice, with istring as the first or the second parameter. The constructor and comparison operators needed to be selectively enabled only for strings with a different character traits policy, otherwise they would cause ambiguity and compilation errors. The final step was declaring operator >>, operator <<, and operator""_is.
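The core of the idea, condensed into a sketch (char only, free operators instead of the full set, and no attempt at traits::find; see istring.hpp for the complete treatment):

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Case insensitive character traits policy (sketch for char;
// wchar_t would use std::towupper).
template<typename C>
struct char_itraits : std::char_traits<C>
{
    static C up(C c) { return C(std::toupper((unsigned char)c)); }
    static bool eq(C a, C b) { return up(a) == up(b); }
    static bool lt(C a, C b) { return up(a) < up(b); }
    static int compare(const C* a, const C* b, std::size_t n)
    {
        for(; n; ++a, ++b, --n)
        {
            if(up(*a) < up(*b)) return -1;
            if(up(*b) < up(*a)) return 1;
        }
        return 0;
    }
};

// Inherit publicly from basic_string configured with the traits above,
// pull in the base constructors, add conversions and comparisons.
class istring : public std::basic_string<char, char_itraits<char>>
{
    using base = std::basic_string<char, char_itraits<char>>;

public:
    using base::base;
    istring() = default;
    istring(const std::string& s) : base(s.data(), s.size()) {}
    operator std::string() const { return std::string(data(), size()); }
};

inline bool operator==(const istring& l, const std::string& r)
{
    return l.size() == r.size()
        && char_itraits<char>::compare(l.data(), r.data(), l.size()) == 0;
}

inline bool operator==(const std::string& l, const istring& r) { return r == l; }
```

Comparing two istring objects goes through the base class operators, which use the case insensitive traits; the free operators above cover the mixed istring/std::string cases in both directions.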

P.S. Everything I just described also applies to wchar_t aka std::wstring.


The implementation on my GitHub page: istring.hpp, example program: istring.cpp.


Welcome, Kobi!

I would like to extend a warm welcome to fellow C++ enthusiast, and blog contributor, Kobi! Many of you know him from his Twitter account or as the organizer of San Diego C++ Meetup.

We have been exchanging likes and occasional comments on Twitter ever since I started this blog. He has always been encouraging and supportive in my blogging journey and I am happy he accepted my invitation to share knowledge and wisdom on this page (his first post).

Kobi will have permanent presence here posting news and updates about the San Diego C++ Meetup.

Welcome!

std::deep_ptr

std::deep_ptr, aka a deep copying smart pointer, has not yet been introduced to the STL, though several C++ experts have proposed an implementation in the past. See n3339.pdf for one such proposal and a list of possible implementations.

In this post I want to share my implementation of a deep copying pointer I recently came up with. Its interface is virtually identical to std::unique_ptr with the exception of an additional cloner template parameter (a policy class) that specifies how deep copies are to be made. By default the pointer is copied by doing return (p ? new T{ *p } : nullptr); that is, by invoking the copy constructor of T.
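Stripped of the bells and whistles from deep_ptr.hpp, the shape of such a pointer looks roughly like this (member names are mine; the full implementation handles more cases):

```cpp
#include <utility>

// Default cloning policy: invoke T's copy constructor.
template<typename T>
struct default_cloner
{
    T* operator()(const T* p) const { return (p ? new T{ *p } : nullptr); }
};

template<typename T, typename Cloner = default_cloner<T>>
class deep_ptr
{
public:
    deep_ptr() = default;
    explicit deep_ptr(T* p) : m_p(p) {}

    // Copying clones the pointee instead of sharing or stealing it.
    deep_ptr(const deep_ptr& other) : m_p(other.m_c(other.m_p)), m_c(other.m_c) {}
    deep_ptr(deep_ptr&& other) noexcept : m_p(std::exchange(other.m_p, nullptr)) {}

    // Copy-and-swap handles both copy and move assignment.
    deep_ptr& operator=(deep_ptr other) noexcept
    {
        std::swap(m_p, other.m_p);
        std::swap(m_c, other.m_c);
        return *this;
    }

    ~deep_ptr() { delete m_p; }

    T* get() const { return m_p; }
    T& operator*() const { return *m_p; }
    T* operator->() const { return m_p; }
    explicit operator bool() const { return m_p != nullptr; }

private:
    T* m_p = nullptr;
    Cloner m_c{};
};
```

Note the move constructor default-constructs the cloner rather than moving it, which is fine for stateless policies like default_cloner.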

My example program illustrates how to create a cloning policy for polymorphic types; in other words how to make deep copies when deep_ptr<Base> actually points to some struct Derived : Base and types in a given inheritance hierarchy implement virtual S* clone():

Finally, and just like the std::unique_ptr, my pointer type can be used with a custom deleter:

del will be called on a pointer managed by p when it goes out of scope or when p is assigned to.


Example program on my GitHub account: deep_ptr.cpp. The implementation: deep_ptr.hpp.


Rounding floating point numbers and constexpr

A good friend asked me a while ago how to round a float or double to N digits of precision. Given 0.0123456789 and N = 7, how to get 0.0123457000 as the result. The only way to round numbers (inside a machine) I could think of was to multiply them by 10 ^ N (ten raised to the power of N), cast the result to an integer type, effectively stripping the fractional part, followed by a cast back to a floating point representation and division by the same 10 ^ N:

Or something to that effect. Translated to C++:
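My reconstruction of that runtime version (not necessarily the exact code from the post’s listing) is a one-liner around std::round:

```cpp
#include <cmath>

// Round v to n digits of precision: scale up, round to a whole
// number, scale back down.
inline double round_to(double v, int n)
{
    double scale = std::pow(10.0, n);
    return std::round(v * scale) / scale;
}
```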

Then he added the dreaded do it fast,… and just as I was about to tell him to take a long walk off a short pier I remembered something about constexpr and doing arithmetic at compile time.

Looking at the implementation above I decided to start with computing the power of 10 at compile time. My first approach was a recursive template struct power_of with a partial specialization, the recursion stop, for the power of zero case (and a helper power_of_v variable declaration):

This allowed me to write power_of_v<10, 3, float> to compute at compile time the 3rd power of 10 stored as type float.
I also created a recursive constexpr function since those too can be evaluated at compile time:

With it I could write power_of_f<float>(10, 3) and it would be the equivalent of using the recursive template struct above.
I decided to think big and used std::uint64_t as the base and unsigned char as the exponent type of the computation. Hopefully overflow was not going to be an issue…
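Both compile time approaches, sketched with those parameter types (the complete, concept-decorated version lives in round.hpp):

```cpp
#include <cstdint>

// Recursive template: B raised to the power E, stored as type T.
template<std::uint64_t B, unsigned char E, typename T>
struct power_of
{
    static constexpr T value = T(B) * power_of<B, E - 1, T>::value;
};

// Partial specialization, the recursion stop: B^0 == 1.
template<std::uint64_t B, typename T>
struct power_of<B, 0, T>
{
    static constexpr T value = T(1);
};

// Helper variable template.
template<std::uint64_t B, unsigned char E, typename T>
inline constexpr T power_of_v = power_of<B, E, T>::value;

// Equivalent recursive constexpr function.
template<typename T>
constexpr T power_of_f(std::uint64_t base, unsigned char exp)
{
    return exp == 0 ? T(1) : T(base) * power_of_f<T>(base, exp - 1);
}
```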

Next I moved on to rounding the floating point number to an integer type (after it has been multiplied by 10 ^ N) and quickly realized a simple type cast was not going to cut it. std::round does much more than that; it considers how close the number to be rounded is to its integer representation, e.g.: given 0.49, is it closer to 0 or to 1? It then rounds it to the closest whole number. Moreover, it considers the sign and rounds in a different direction depending on whether the number is positive or negative.

This meant I needed to determine (at compile time) whether the number is positive or negative:

Compute the absolute value of a given number:

And round the number based on the proximity to its integer representation while considering the sign:

Putting it all together and adding some sanity checks resulted in this floating point, compile time (as long as all parameters are constexpr), rounding function which produced the same exact results as its runtime equivalent:

The if statements are there to guard against number overflows. It is better to not round at all in such cases than to return garbage values, or at least this is my attitude regarding error handling in this case.

Here is a test program I created while testing my implementation; it allowed me to easily compare compile time results against the reference std::round based approach.

1: 0.0000000000
2: 0.0100000000
3: 0.0120000000
4: 0.0123000000
5: 0.0123500000
6: 0.0123460000
7: 0.0123457000
8: 0.0123456800
9: 0.0123456790
10: 0.0123456789

Test program output.

The example program on GitHub: round.cpp. The floating point rounding implementation (sprinkled with comments and decorated with template concepts for good measures): round.hpp.


Given the way float and double are represented on the level of bits it is impossible to round them exactly. You can only do so much and come so close to being exact. Playing with the example program’s value and type of variable v, as well as different parameters to the setprecision stream modifier, will illustrate what I mean…



UPDATE

Shout out to Ben from thoughts-on-coding.com for suggesting another runtime solution (code below)!

Fun with TCP sockets

The purpose of this post is not to teach you about POSIX sockets; there’s plenty of that already. Instead I want to share two simple classes I implemented last night which encapsulate two types of TCP sockets: a listening server socket and a client socket. These sockets, as well as my recent serializer code, will be the foundation of a light-weight RPC system I plan on designing and teaching to my colleagues at work and elsewhere.
I will try to illustrate what happens under the hood of frameworks like gRPC, Thrift, or a much simpler XML-RPC.

The server socket’s only job is to bind to a local port and listen for incoming connections. The client socket represents a connection to a server. It can be created either by the server socket using its private constructor, or by the user with its public constructor.

The difference between my implementation and what I commonly see online is that my design (not necessarily better than others) is event driven, meaning you do not poll the client socket to receive incoming data. Instead the receive function dispatches an event when data arrives; all you have to do is put the method in a while loop to keep receiving incoming packets. Here’s the event handler and the receive loop:

Here the socket’s event handler converts the incoming bytes to a string, and if not empty prints it to standard output. The loop blocks until more data arrives, dispatches the event when it does, then goes back to sleep. This is a part of a simple echo client and server I created to test the socket code; it will be shown further down this post.

The server’s handler for incoming connections is a bit more complicated; it defines what to do with the newly created connection socket, as well as what that socket should do when it receives data. You’ve been warned:

Upon receiving a new connection the handler prints out some basic info about the connecting machine, like host name, IP address, and the originating port number. It then tells this socket that, when data arrives on it from the client machine, it should convert it to a string, print it, and send the data right back to where it came from. It is not a true echo server, because the string is reversed before being sent to the client, but for the purpose of this exercise it will suffice. It only prints and sends the data back if it is not an empty string. Finally it looks for a special keyword ‘die’ that instructs the server to shut down.

The client on the other hand waits for incoming data from the server on the main thread, while a background thread reads user input line by line, checks that it isn’t an empty line, checks for the keyword ‘q’ which instructs it to disconnect from the server, and finally sends the data and waits for an event. This event is signaled only once the response comes back from the server, so that the keyboard input and the program’s output stay nicely separated on their own lines. Client loop below:


The socket code is located on my GitHub page: sockets.hpp, along with the echo client: echo_c.cpp and server: echo_s.cpp.

Better serializer

In my last post I described a light-weight serialization framework I recently implemented (video where I go over it in detail). After receiving great feedback from r/cpp I have implemented several improvements that were suggested to me: better unpack transform signature, safer unpacking code, unpack casting iterator just to name a few.

The latest code is available on GitHub; the serializer source lsf.hpp, and usage example lsf.cpp.

Video where I describe the changes in greater detail:

Light-weight serialization framework

UPDATE: Better serializer

In my very first YouTube Channel video I discuss in detail the design of a serialization framework I came up with in response to a question I was recently asked. In this post I want to give a brief overview of how it works and how to get started using it.

The primary goal was to create a serialization framework that is uncoupled from the types it operates on, meaning no class nor struct needs to inherit any sort of

By default every type is packed by doing bitwise copy from its memory location into a

The code above packs a single

It is possible to perform type conversions inside the pack and unpack transforms:

Above code illustrates it: every

Here I define how to serialize

Having defined transforms for simple types we can now add more complex types, let’s say

The packing code is trivial: pack the size of the vector followed by packing each entry one at a time. Notice here I am calling

One thing I have not yet mentioned is an optimization I introduced into the code: before types are packed the exact amount of memory is reserved inside the resulting

There is a void pointer parameter but in this case it is ignored. However it is required to compute sizes of more complex types like strings; that is because in order to store the pack size procs inside an internal data structure each one needs to be wrapped inside the default lambda. This lambda gets the memory location of each type as

Here I tell the framework how to compute the number of bytes needed to serialize strings.

I initially tagged each type with a unique value so that I could verify it during unpacking to catch errors like packing an int but trying to unpack a string. This type tagging became more complicated than I first anticipated so I got rid of it, though I may revisit it and add additional packing and unpacking functions that perform this verification optionally.

As always, the code is available on my GitHub page.
The framework is a single header file: lsf.hpp. Example program: lsf.cpp.


The future of C++

I hope the title of this post got your attention! What I really mean is the future direction of this blog…

In 2019 I created 85 short articles, in 2020 I blogged only twice. The last 12 months have pulled my attention away from this site and toward mentoring other engineers at work by hosting online C++ and OO classes. I have mentored tens of engineers over the span of 19 classes in the past 6 months, totaling around 25 hours of recorded material and thousands of lines of code.

I have always enjoyed sharing my knowledge by teaching, this is why this blog came into existence and the feedback I received from it has been fantastic (and humbling). The interactive classes I hold have been particularly rewarding to me and I decided to open them up to the world.

Going forward I want to hold live and interactive classes like I do at work, but open to anyone on the internet. Perhaps also 1 on 1 sessions to help others with their day to day coding challenges. All of this I will publish on my YouTube Channel where you can already find one of my classes I held earlier this week.

I hope you will subscribe and follow me there. I also want to ask you for topics and questions you would like to see me tackle as I reason through the design and implementation of my solutions.

Finally I wanted to add that I will continue to blog. Hopefully the world will soon return to normal and I can get back to what I enjoy the most: C++

Stay safe!

Make std::vector of T, efficiently

What I want to briefly talk about today is likely obvious to seasoned C++ programmers, but may be of benefit to newcomers to the language and the STL: that is, how to efficiently create a std::vector of any number of elements of type T, whatever T may be. The code is short and sweet, so let’s jump right into it:

In lines #3 and #4 we define the creation function template (inline because it’s likely to live in a header file and we don’t want to violate the ODR) that says make me N instances of T and use A as parameter(s) to T’s constructor (see parameter packs and fold expressions). Next we create an instance of std::vector<T> in line #6 and reserve enough space for N instances of T in line #7 (without actually constructing instances of T yet). Line #8 is self-explanatory, and the efficiency magic happens in line #9. Instead of creating a temporary instance of T and calling .push_back, we .emplace_back each instance of T, meaning we construct it in the memory previously reserved for the elements of the container in line #7. This way we avoid unnecessary copy or move constructor calls (which may not even be defined for type T).

Before C++11 and move semantics we would have been stuck with creating an instance of T, then copy constructing it into every element of std::vector<T>, like this:

And don’t even get me started on having to return std::vector<T> from a function and how costly that would have been (again, prior to the magic of move semantics)! We’re in a brave new world…

P.S. I hope you’re all doing well and staying safe in this COVID-19 world! The situation has certainly put a strain on me and a dent in my blogging frequency. I hope to return to my normal post-or-more-a-week blogging frequency soon!

Singleton Pattern

First things first: if you’re still reading this blog, thanks! I haven’t posted anything since last December; it has been a rough first half of the year: COVID-19, working from home, isolation, you know the deal, but I have not given up on blogging, just finding the time for it has been nearly impossible. I have a long list of topics I eventually want to cover but I can’t make any promises until this mad situation normalizes…

Alright then! A friend from work asked me about the Singleton Design Pattern and how to best implement it in C++. I have done it in the past but was not happy with that implementation; for example, it insisted that the singleton class have a default constructor, so I started coding something that would allow non-default initialization. The first thing I came up with was a singleton template base class using the Curiously Recurring Template Pattern. It worked, but it required the singleton class to declare its constructor private and declare the singleton base class a friend:

This approach allowed me to separate the singleton creation S::Create(17); from usage S::Instance()->foo(); but I was still bothered by the need for private constructor(s) and friendship, so I kept experimenting… I wanted a simpler solution, one that by the very nature of inheriting the singleton base class would automatically render the class non-instantiable by anyone other than the parent template:

The name of the singleton base class probably gave away the approach, but let me explain anyway: the abstract_singleton<AS> base injects a pure virtual method, preventing one from creating instances of class AS. The parent class later erases the abstraction by implementing the private pure virtual method before creating an instance of AS (it actually creates an instance of a private type derived from AS, the compiler takes care of the rest; this works because in C++ access and visibility of a member are two distinct concepts):
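A condensed sketch of that mechanism (the hook’s name and the member layout are my placeholders here; singleton.hpp has the real thing):

```cpp
#include <memory>
#include <utility>

// The base injects a private pure virtual method, rendering any class
// that inherits it abstract, and therefore not instantiable by the user.
template<typename AS>
class abstract_singleton
{
public:
    template<typename... Args>
    static void Create(Args&&... args)
    {
        s_instance = std::make_unique<concrete>(std::forward<Args>(args)...);
    }

    static AS* Instance() { return s_instance.get(); }

    virtual ~abstract_singleton() = default;

private:
    virtual void singleton_hook() = 0; // hypothetical name; makes AS abstract

    // Private type derived from AS: it erases the abstraction, so an
    // instance can be created, but only from within this template.
    struct concrete : AS
    {
        template<typename... Args>
        concrete(Args&&... args) : AS(std::forward<Args>(args)...) {}
        void singleton_hook() override {}
    };

    static inline std::unique_ptr<AS> s_instance;
};

// Hypothetical example singleton; no private constructors or friendship needed.
struct settings : abstract_singleton<settings>
{
    int value;
    explicit settings(int v) : value(v) {}
};
```

Creation and usage stay separate: settings::Create(17); once, then settings::Instance()->value anywhere.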

One can of course easily defeat the mechanism by which instance creation is restricted by implementing the void abstract_singleton() override {} in the class meant to be a singleton. I don’t think there is much that can be done about that, but if it’s not done on purpose the compiler will detect attempts of creating instances and will fail with cannot instantiate an abstract class error.

Here’s the example program (singleton.cpp):

And the complete listing (singleton.hpp):

std::unique_ptr, or how to express explicit ownership

When we talk about std::unique_ptr we must mention the idea of explicit resource ownership as well as the concept of a resource source and sink. By wrapping a pointer inside std::unique_ptr we state that whoever holds the std::unique_ptr owns the resource explicitly: has complete control over its lifetime. Prior to C++11 this was expressed using std::auto_ptr. Modern C++ deprecated std::auto_ptr and addressed its shortcomings by introducing the std::unique_ptr.

A source is a function that creates a resource then relinquishes its ownership; in other words, it gives the ownership to whoever called it, and from then on, the caller is responsible for releasing that resource.

A sink is a function which accepts a resource as a parameter and assumes ownership of it; in other words, it promises to release said resource once it’s done executing.

Transfer of ownership must be stated explicitly for named instances (variables of type std::unique_ptr) with std::move. Unnamed temporaries can be used to initialize a variable or passed as a function parameter without invoking std::move (this will be clearly illustrated in the code sample below).

Explicit ownership of a resource can be “upgraded” to shared ownership by std::move’ing the std::unique_ptr into a std::shared_ptr. This resets the std::unique_ptr back to nullptr and initializes the std::shared_ptr with a reference count of 1 (also illustrated in the code below).
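The source/sink vocabulary and the named-vs-unnamed rule can be sketched in a few lines (a toy int resource stands in for anything expensive):

```cpp
#include <memory>

// A source creates a resource and relinquishes its ownership to the caller.
std::unique_ptr<int> make_resource()
{
    return std::make_unique<int>(42); // unnamed temporary: no std::move needed
}

// A sink assumes ownership of the resource passed to it; the resource
// is released when the parameter p goes out of scope.
void consume_resource(std::unique_ptr<int> p)
{
    if(p) { /* use the resource */ }
}
```

Usage: `auto p = make_resource();` takes ownership; `consume_resource(std::move(p));` gives it up explicitly; and `std::shared_ptr<int> s = std::move(p);` performs the “upgrade” to shared ownership described above.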

Worth mentioning is the fact that std::unique_ptr can own arrays of objects allocated with operator new. A partial specialization of std::unique_ptr template for array types also overloads operator[] for easy array element access; it mirrors the syntax of native arrays. However there are limitations of storing arrays inside std::unique_ptr: when creating an array of primitive types an initial value for each element cannot be specified (and each element ends up with an undefined value); when creating an array of user-defined types (classes and structs) said type must have a default constructor. Moreover the size of the array is lost: there is no way to query the std::unique_ptr and find out how many elements the array holds. For those reasons I recommend you stick with std::vector: it’s elements can be initialized with a default or custom value and you can always call size() on it.

The example program below, using inline comments, further details how to transfer resource ownership and how to express the idea of source and sink functions.

Complete listing on GitHub: unique.cpp and T.hpp. Merry Christmas and Happy New Year!

When exactly does the std::shared_ptr take ownership?

In a post I wrote a few days ago titled “A word on std::shared_ptr” I was mistakenly arguing that, should the std::shared_ptr fail to allocate its control block (where the reference count and other information is stored), the passed raw pointer would not be deleted. Example:

auto p = std::shared_ptr<T>(new T);

Here, if the allocation and construction of T succeeded, the code would then enter the shared pointer’s constructor. Inside this constructor the allocation of the control block would take place. If that failed, I argued, the newly allocated instance of T would leak. That is not what actually happens!

After re-reading the C++ specification, and looking over the shared pointer’s implementation supplied with Visual Studio 2017, it is now clear to me that the allocated instance of T will in fact be deleted.

Since learning of this mistake, I have taken down that post and redirected the original URL here, where I take a more in-depth look at the std::shared_ptr.

I would like to thank the kind people on reddit for taking the time to set me straight! It was both a humbling and an embarrassing experience, but I am a better C++ programmer because of it.

In-depth look at C++ shared pointers

First, you should use std::make_shared (most of the time) to create shared pointers. It is optimized to perform only one allocation for both the reference count / control block and the object it holds.

It is also safer to use std::make_shared in the presence of exceptions:

some_function(std::shared_ptr<T>(new T), std::shared_ptr<T>(new T));

Prior to C++17, one of the possible orders of events could have been:
1) new T
2) new T
3) std::shared_ptr<T>(...)
4) std::shared_ptr<T>(...)

So in the worst case the code could leak an instance of T if the second allocation in step #2 failed. That is not the case with C++17, which has more stringent requirements on the order of function parameter evaluation: each parameter must be evaluated completely before the next one is handled. So the new order of events becomes:
1) 1st new T
2) 1st std::shared_ptr<T>(...)
3) 2nd new T
4) 2nd std::shared_ptr<T>(...)
or:
1) 2nd new T
2) 2nd std::shared_ptr<T>(...)
3) 1st new T
4) 1st std::shared_ptr<T>(...)

Shared pointers can also be used with arrays by providing a custom deleter, like this:

auto p = std::shared_ptr<T>(new T[N], [](T* ptr) { delete [] ptr; });

Unfortunately you cannot use std::make_shared for that, and it is much easier to use arrays with std::unique_ptr since std::shared_ptr does not provide an overloaded operator [], but more about that in my next post 😉

Disadvantages of using std::make_shared:

std::make_shared can only construct instances of T using T’s public constructors. If you need to create an instance using a private or protected constructor you have to do it from within a member or static member function of T. It is not possible to declare std::make_shared to be a friend of T (technically you can declare it to be a friend, but during invocation of std::make_shared it will fail a static_assert inside the std::is_constructible type trait class, so for all intents and purposes friendship is not possible here).

std::weak_ptr makes things even more complicated:

Finally, let’s take a look at when instances of objects held by std::shared_ptr are destroyed and their memory released. Under normal circumstances the destructor of T held by a shared pointer will be called when the last shared pointer is destroyed, and the memory where T lived will be released.

That is not the case if the shared pointer of T was constructed using std::make_shared and std::weak_ptr was made to point at it.

In order to function properly std::weak_ptr must hold a reference to the shared pointer’s control block so that it can: 1) answer the call to use_count() and 2) return a nullptr when lock() is called on it after the last shared pointer went out of scope. If the shared pointer’s control block and the instance of T live in the same memory block, that memory cannot be freed until all shared and weak pointers referencing it go away. The destructor of T will still be called when the last shared pointer goes away, but the memory will linger until the remaining weak pointers are gone.

I didn’t mention anything about passing shared pointers as function arguments or return values… but that’s a topic about higher level design of object ownership and lifetime; maybe I’ll write about it after I cover unique pointers.

Avoiding deadlocks the C++ way

When interviewing engineers for a C++ programming position the question of deadlocks often comes up (ask me how I know this 😉 ). What is it? And how to avoid it?

Often times a deadlock occurs due to a wrong order of acquiring locks: multiple threads need to access 2 or more shared resources, and each resource requires mutual exclusion. It goes something like this: thread 1 successfully acquires the lock for resource 1, then tries to acquire the lock for resource 2. Meanwhile, on another CPU core, around the same time, thread 2 successfully acquires the lock for resource 2, and then tries to acquire the lock for resource 1. Both threads are now stuck! Thread 1 holds resource 1 and waits for resource 2. Thread 2 holds resource 2 and waits for resource 1. Nasty business! A partial implementation illustrating this bug looks something like this:
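My reconstruction of the shape of that buggy code (function and variable names are placeholders, and DO_SOME_WORK is reduced to incrementing counters):

```cpp
#include <mutex>

std::mutex m1, m2;
int r1 = 0, r2 = 0; // two shared resources

// Thread 1's work: locks m1 first, m2 second.
void thread_1_work()
{
    m1.lock();
    m2.lock();
    ++r1; ++r2; // DO_SOME_WORK(...)
    m2.unlock();
    m1.unlock();
}

// Thread 2's work: locks m2 first, m1 second. Run concurrently with
// thread_1_work this can deadlock: each thread ends up holding one
// mutex while waiting forever for the other.
void thread_2_work()
{
    m2.lock();
    m1.lock();
    ++r1; ++r2; // DO_SOME_WORK(...)
    m1.unlock();
    m2.unlock();
}
```

Run on a single thread the functions are harmless; it is the concurrent interleaving that bites.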

Notice that thread t1 locks mutex m1 first, m2 second. Thread t2 does the opposite. Another thing I would like to point out in the above example is the explicit calls to lock() and unlock(). This is dangerous because 1) you may forget to call unlock() on a mutex you previously locked, and 2) in the presence of exceptions emitted from DO_SOME_WORK(...) the locks you acquired will not be automatically released. A perfect solution to both issues already exists: the RAII technique.

The way to fix everything that is wrong with the above code is to always lock the mutex’es in the same order, and to have a mutex-owning local object handle the unlocking, whether exiting the function normally or due to an exception. But not by explicitly writing the lock() calls in the right order; a more elegant, automatic solution is desired here. C++ has just the thing for you: std::lock (see here) and std::scoped_lock (and here). In short: std::lock will perform deadlock resolution magic even if thread 1 calls std::lock(mutex1, mutex2); while thread 2 calls std::lock(mutex2, mutex1);, but you will still need to call unlock() explicitly on the mutex’es if that is what you desire. Alternatively (and preferably) you will pass the mutex’es to std::scoped_lock, which uses std::lock internally to guarantee no deadlocks take place: std::scoped_lock guard(mutex1, mutex2);. A deadlock-free and exception-safe (in terms of properly unlocking the mutex’es) partial implementation looks something like this:

The order in which the mutex’es are passed to std::scoped_lock is irrelevant. Internally std::lock will do the right thing. In the presence of exceptions (which I am not catching in the code above, but you should 😉 ) the destructors of the local guard objects will release the locks held.

Complete listing below (and on the web at GitHub: deadlock.cpp):

Multi-hashing

Yes I totally invented this term 😛 What I mean by it is producing multiple hashes from a single key. Like this (if the syntax is unfamiliar to you read this):

Or like this (for non-template version which returns a vector):

Why? One place where I needed such sorcery was my bloom filter implementation. The idea is simple: one key, multiple hashes, repeatable (multiple calls with the same key produce the same hashes). But how? The STL only comes with one hashing function. True, but it also comes with multiple random number generators which can be seeded with a hash!

The solution then is to hash once, seed the random number generator, and make multiple calls to the RNG, like this (hash.hpp):
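The essence of hash.hpp, compressed into one non-template function (my shorthand; note that the values you get depend on your standard library’s std::hash, so they will not necessarily match the program output listed below):

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>
#include <string>
#include <vector>

// Hash the key once, seed a random number generator with the result,
// then draw n values: same key, same seed, same n hashes every time.
inline std::vector<std::uint64_t> hash_n(const std::string& key, std::size_t n)
{
    std::mt19937_64 rng(std::hash<std::string>{}(key));
    std::vector<std::uint64_t> hashes(n);
    for(auto& h : hashes) h = rng();
    return hashes;
}
```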

You can use it like this (multi_hash.cpp):

HashN(‘Vorbrodt’s C++ Blog’):
1977331388
699200791
437177953

HashN(‘Vorbrodt’s C++ Blog’):
1977331388
699200791
437177953

HashN(‘https://vorbrodt.blog’):
1924360287
1619619789
1594567998

HashNT(‘Vorbrodt’s C++ Blog’):
1977331388
699200791
437177953

HashNT(‘Vorbrodt’s C++ Blog’):
1977331388
699200791
437177953

HashNT(‘https://vorbrodt.blog’):
1924360287
1619619789
1594567998

Program output.