Some time back I reviewed code that had the following design and data structure:
1 2 3 4 5 6 |
struct Object final { // some constructors, more data members, more member functions Object(const int id, std::string n) : id_(id), name_(std::move(n)) {} int id_{}; std::string name_; }; |
Now, the above is a struct but just imagine it could be a class with private/public sections, a beefy interface and data members.
The author(s) of the code wanted to place it in some data structure that can help retrieve an “Object” using a given “ID”. So they picked the following approach:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
// usage example with some helper functions void add_new(std::map<int, Object>& m, Object obj) { m.insert(std::make_pair(obj.id_, std::move(obj))); } void add_new(std::map<int, Object>& m, const int id, const std::string& name) { m.insert(std::make_pair(id, Object{id, name})); } // the simple data structure std::map<int, Object> mymap; add_new(mymap, Object{1, "1"s}); add_new(mymap, 5, "5"s); add_new(mymap, 3, "3"s); add_new(mymap, -7, "-7"s); add_new(mymap, Object{4, "4"s}); // usage of key to find/access Object const auto& n = mymap[5].name_; fmt::print("get 5's name {}\n", n); assert(n == "5"s); |
Pretty standard data structure. A {key, value} pair where the “key” is the “ID” of the “Object” and the “value” is the “Object” itself. This is not the first time I see such pattern. It is actually quite common in various code trees.
While looking at this piece of code I wonder, can we do better? The “ID” is part of the “Object”, and the requirements are to be able to insert/find values using std::map(*). The “Object”s are ordered by “ID”. My question is – Why do we need to repeat “ID” as a “Key” ?
Before we move forward, let’s compute the current cost:
1 2 3 4 |
// This prints: sizeof pair 48, object 40, string 32 fmt::print("sizeof pair {}, object {}, string {}\n", sizeof(std::pair<int, Object>), sizeof(Object), sizeof(std::string)); |
This prints “sizeof pair 48, object 40, string 32” which means we pay an extra 8 bytes for storing an integer as the “ID”/”Key”. The same integer that is already stored in “Object”.
So the immediate thought is “yes“. We can do better. What if we eliminate the need for a separate “Key” and store the “Object”s in a similar container, that provides us the same requirements/performance(O(x) wise) but with smaller memory footprints?
We have something like this. It is called std::set<> and it has very similar characteristics to std::map<>. std::map<> stores std::pair<T1,T2>. It contains 2 types. std::set<> stores a single type.
So let’s give it a shot:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
// more helper functions void add_new(std::set<Object>& s, Object obj) { s.insert(std::move(obj)); } void add_new(std::set<Object>& s, const int id, const std::string& name) { s.insert(Object{id, name}); } std::set<Object> myset; add_new(myset, Object{1, "1"s}); add_new(myset, 5, "5"s); add_new(myset, 3, "3"s); add_new(myset, -7, "-7"s); add_new(myset, Object{4, "4"s}); |
And we would like to have the same idea. We’d like to lookup number “5”. But std::set<> does not have a subscript operator (“[]”). It does however have “find()” function instead:
1 2 3 |
const auto& nset = myset.find(5)->name_; fmt::print("get 5's name {}\n", nset); assert(nset == "5"s); |
There are couple things we need to straighten out before we open the champagne.
First, by default, std::set<> requires less (“<“) operator to be defined for the stored objects. The compiler does not know how to compare “Object”s. We need to teach it. We can pass a compare function, we can use lambda or, we can just implement a simple member function that will help us to compare Objects by IDs.
1 2 3 4 5 6 7 8 9 10 |
struct Object final { // same as above // ... // new function definition [[nodiscard("why would you discard me?")]] bool operator<(const Object& o) const noexcept { return id_ < o.id_; } }; |
This is basically fixing the major missing part. Remember that with std::map<> we had simple “int” as a “Key” and the compiler is very capable of generating code to compare “int”s. But not for “Object”s.
There is some wart here that requires discussion. If we want to write something like:
1 |
const auto& nset = myset.find(5)->name_; |
We need to be able to convert an integer to an “Object”. We can do it using a conversion constructor:
1 2 3 4 5 6 7 |
struct Object final { // cannot make it explicit since we need it as a conversion operator Object(const int id) : id_(id) {} // same as before // ... }; |
Notice the comment. We cannot make it explicit. In most cases, conversion constructor should be marked explicit, but if you actually need the conversion, you would have to pay this trade off. Otherwise, you’d need to write something less “natural” e.g.:
1 |
const auto& nset = myset.find(Object{5})->name_; |
In my opinion this is a trade off that we would need to pay. Unless there are other sane options. For example I could think about having this non-explicit constructor in private section and having a get() friend function accessing it but I’m not sure this worth the effort.
That’s it. I saved you 8 bytes per entry. Keep the change and thank me later!
(*) std::map is probably implemented as a Red-Black Tree with log(N) to find something and O(1) to (re)balance it
Edit: September 29 2022
I recently realized that I could use Heterogeneous lookup and solve the issue I described with the find function. Here is the example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
struct Object final { // some constructors, more data members, more member functions explicit Object(const int id, std::string n) : id_(id), name_(std::move(n)) {} int id_{}; std::string name_; }; struct Compare final { using is_transparent = void; bool operator()(const Object& obj, const int id) const { return obj.id_ < id; } bool operator()(const int id, const Object& obj) const { return id < obj.id_; } bool operator()(const Object& l, const Object& r) const { return l.id_ < r.id_; } }; |
And now, I can caall my find function in a very natural way:
1 2 3 |
std::set<Object, Compare> myset; // .. as before const auto& nset = myset.find(5)->name_; // <-- check this out! |
Bingo!
Good stuff Kobi!
BTW all the operator < or functor definitions can be replaced with lambdas in unevaluated context (courtesy of C++20):