Some time back I reviewed code that had the following design and data structure:

Now, the above is a struct but just imagine it could be a class with private/public sections, a beefy interface and data members.

The author(s) of the code wanted to place it in some data structure that can help retrieve an “Object” using a given “ID”. So they picked the following approach:

Pretty standard data structure. A {key, value} pair where the “key” is the “ID” of the “Object” and the “value” is the “Object” itself. This is not the first time I see such pattern. It is actually quite common in various code trees.

While looking at this piece of code I wonder, can we do better? The “ID” is part of the “Object”, and the requirements are to be able to insert/find values using std::map(*). The “Object”s are ordered by “ID”. My question is – Why do we need to repeat “ID” as a “Key” ?

Before we move forward, let’s compute the current cost:

This prints “sizeof pair 48, object 40, string 32” which means we pay an extra 8 bytes for storing an integer as the “ID”/”Key”. The same integer that is already stored in “Object”.

So the immediate thought is “yes“. We can do better. What if we eliminate the need for a separate “Key” and store the “Object”s in a similar container, that provides us the same requirements/performance(O(x) wise) but with smaller memory footprints?

We have something like this. It is called std::set<> and it has very similar characteristics to std::map<>. std::map<> stores std::pair<T1,T2>. It contains 2 types. std::set<> stores a single type.

So let’s give it a shot:

And we would like to have the same idea. We’d like to lookup number “5”. But std::set<> does not have a subscript operator (“[]”). It does however have “find()” function instead:

There are couple things we need to straighten out before we open the champagne.

First, by default, std::set<> requires less (“<“) operator to be defined for the stored objects. The compiler does not know how to compare “Object”s. We need to teach it. We can pass a compare function, we can use lambda or, we can just implement a simple member function that will help us to compare Objects by IDs.

This is basically fixing the major missing part. Remember that with std::map<> we had simple “int” as a “Key” and the compiler is very capable of generating code to compare “int”s. But not for “Object”s.

There is some wart here that requires discussion. If we want to write something like:

We need to be able to convert an integer to an “Object”. We can do it using a conversion constructor:

Notice the comment. We cannot make it explicit. In most cases, conversion constructor should be marked explicit, but if you actually need the conversion, you would have to pay this trade off. Otherwise, you’d need to write something less “natural” e.g.:

In my opinion this is a trade off that we would need to pay. Unless there are other sane options. For example I could think about having this non-explicit constructor in private section and having a get() friend function accessing it but I’m not sure this worth the effort.

That’s it. I saved you 8 bytes per entry. Keep the change and thank me later!

(*) std::map is probably implemented as a Red-Black Tree with log(N) to find something and O(1) to (re)balance it

Edit: September 29 2022

I recently realized that I could use Heterogeneous lookup and solve the issue I described with the find function. Here is the example:

And now, I can caall my find function in a very natural way:

Bingo!

2 Replies to “Use case of utilizing std::set instead of std::map”

  1. Good stuff Kobi!

    BTW all the operator < or functor definitions can be replaced with lambdas in unevaluated context (courtesy of C++20):

Leave a Reply