The 25th Meetup of the San Diego C++ Group was held on April 27th, 2021 and featured guest speaker Jerry Wiltse from the Conan Package Manager Team. The recording and slides of the presentation can be found here.
How to implement a spinlock / event / semaphore
Multi-threading has been the theme of my previous two classes where I discussed thread-pools and producer-consumer queues. Let’s talk about the fundamental aspects of multi-threading and some of its most basic building blocks: spinlocks, automatic and manual events, and semaphores.
In this week’s class I will show you how to implement these building blocks using only standard C++ components:
std::atomic_flag,
std::mutex,
std::condition_variable.
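For a taste of where the class starts, here is a minimal sketch of the simplest of these building blocks, a spinlock built on std::atomic_flag (the versions developed in class, including the events and the semaphore, are more complete):

#include <atomic>

class spinlock
{
public:
    void lock() noexcept
    {
        // busy-wait until this thread is the one that sets the flag
        while(m_flag.test_and_set(std::memory_order_acquire));
    }

    void unlock() noexcept
    {
        m_flag.clear(std::memory_order_release);
    }

private:
    std::atomic_flag m_flag = ATOMIC_FLAG_INIT;
};

Because it satisfies the BasicLockable requirements it can be dropped into std::lock_guard or std::scoped_lock just like std::mutex.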
Code from the class:
lesson_spinlock_event_semaphore_how_to.cpp
Complete implementation:
mutex.hpp | event.hpp | semaphore.hpp
How to implement a producer-consumer queue
During my most recent class I discussed implementations of simple, multi-threaded, producer-consumer queues: unbounded (can grow in size indefinitely) and bounded (can only grow to a certain size). In short: the unbounded queue will block consumers’ pop() calls while it is empty, until a producer pushes something onto it. The bounded queue does the same, and in addition blocks producers’ push() calls when the queue reaches its maximum capacity, unblocking only after a consumer frees up a spot with pop(). Finally, I reworked the simple thread pool from last week’s class to use these queues, greatly reducing its complexity.
In the past I have implemented both types of queues (you can find them in my archives), but the bounded one used a less-than-optimal approach: semaphores for counting free and occupied spots. That was costly because it required three std::mutex and two std::condition_variable objects. The one I implemented during this class requires only a single std::mutex and two std::condition_variable objects; the unbounded queue requires only one of each.
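The heart of the bounded queue is two wait conditions sharing one lock; a rough sketch (member names such as m_mutex, m_not_full, m_not_empty, m_queue, and m_capacity are illustrative, not the exact ones from the class):

void push(T value)
{
    std::unique_lock lock(m_mutex);
    m_not_full.wait(lock, [&]{ return m_queue.size() < m_capacity; });
    m_queue.push(std::move(value));
    m_not_empty.notify_one(); // wake a consumer waiting for data
}

T pop()
{
    std::unique_lock lock(m_mutex);
    m_not_empty.wait(lock, [&]{ return !m_queue.empty(); });
    T value = std::move(m_queue.front());
    m_queue.pop();
    m_not_full.notify_one(); // wake a producer waiting for a free spot
    return value;
}

The unbounded queue is the same minus m_not_full and the capacity check.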
Code from the class:
lesson_queue_how_to.cpp
How to implement a thread pool
By popular demand, and at the request of my coworkers, this week’s class was about implementing a simple thread pool using only standard C++ components. It would have taken an hour and a half had I not completely forgotten how to use a condition variable; instead it is two hours long!
A bad std::condition_variable::wait(...) predicate caused a deadlock, and I spent 30 minutes trying to figure it out. If you’re interested in the step-by-step implementation only, fast forward from 1h-15m-15s to 1h-45m-0s; otherwise watch me reason through it. I eventually realized that the predicate should have returned true if the queue contained any elements; instead it tried to answer whether one element had just been added to it.
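In essence (a simplified reconstruction, not the exact code from the recording), the difference was:

// Deadlock-prone: asks "was exactly one more element just added?", which may
// already be false by the time the woken thread re-evaluates the predicate.
cv.wait(lock, [&]{ return queue.size() == old_size + 1; });

// Correct: asks "is there anything for me to consume right now?"
cv.wait(lock, [&]{ return !queue.empty(); });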
For me the lesson was…
If you want to master something, teach it.
Richard Feynman
Code from the class:
lesson_thread_pool_how_to.cpp
Light-weight RPC framework
During my latest mentoring session with coworkers I was asked to explain how modern RPC frameworks work under the hood. What better way to illustrate this than to build one from scratch using components I blogged about previously (event objects, serializer framework, and TCP sockets).
Although not part of the video recording (which I will share later in this post) I explained that RPC frameworks generally work in two ways:
- The client and server code is created at runtime and can be built on demand. A good example of this is the XML-RPC-C library: it contains no RPC IDL nor an associated compiler/code-generator. It is the simpler of the two.
- The RPC methods of the server and the associated client code are defined in the form of an IDL, then run through a code generator that produces almost all of the necessary client and server code. gRPC and Thrift are prime examples of this: both come with code generators and are very sophisticated. This was the type of framework I was going to talk about.
I did not design an IDL nor did I build a parser and code-generator for my students; there wasn’t nearly enough time for that. What I wanted to illustrate were different components which would have been generated by a modern RPC framework and how they would fit together. Starting with the client interface and code created to connect and issue calls to the server, serialization of parameters, network transport, command parsing and dispatching on the server side, actual command logic (this part would not be generated by a framework), and finally everything needed to send responses back to the client.
The starting point of creating a simple calculator server capable of adding and subtracting numbers would have been the server’s interface declaration using some form of an IDL:
interface Calculator
{
    int Add(int, int);
    int Sub(int, int);
}
This would then be processed by the framework’s generator to produce actual C++ code (or any other language) that would allow the programmer to easily connect, issue calls, and receive responses from the server:
auto c = RPCClient(server_host_name, server_port);
auto a = c.Add(2, 2);
auto s = c.Sub(10, 5);
Creating instances of the RPC server would be equally easy, but would also require providing the core logic of the server procedures:
auto s = RPCServer(listening_port);
s.SetAddHandler([](int x, int y) { return x + y; });
s.SetSubHandler([](int x, int y) { return x - y; });
s.Run();
In order for this to work a lot more code would have been generated then compiled against the static parts of the RPC framework: serialization, network transport, compression, encryption, authentication code, etc.
During my class I implemented the parts that would have been generated based on the desired set of procedure calls (the IDL), and reused code I developed for this blog in the past in place of the framework’s static parts.
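Purely for illustration, a hand-written sketch of the kind of server-side dispatcher a framework would generate for the calculator above might look like this (connection, Command, read_request, send_response, and send_error are hypothetical names, not part of the code linked below):

void dispatch(connection& conn)
{
    // deserialize the command id and its two integer arguments off the wire
    auto [command, x, y] = read_request<int, int>(conn);

    switch(command)
    {
    case Command::Add: send_response(conn, m_add_handler(x, y)); break;
    case Command::Sub: send_response(conn, m_sub_handler(x, y)); break;
    default:           send_error(conn, "unknown command");      break;
    }
}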
Keep in mind that I had prepared some of the supporting code before recording the class and had it open on another screen, but the IDL-based code I was coming up with on the fly, a.k.a. freestyling it 🙂
I figured it’s worth a mention in case my delivery appears less than… perfect.
Reworked and cleaned up code from the class:
Protocol: lrpc_proto.hpp. Client: lrpc_c.cpp. Server: lrpc_s.cpp.
Serializer: lsf.hpp. Network: socket.hpp. Event: event.hpp.
Case insensitive string, etc
The question of case insensitive strings has been, and continues to be, asked a lot, and just as many answers can be found all over the internet. The go-to solution is to create a char_traits policy class with eq, lt, and compare methods implemented using std::toupper before comparing characters, then instantiate std::basic_string with it: using istring = std::basic_string<char, char_itraits<char>>;. This works well until you try to use your new type anywhere near code designed with std::string in mind. Then what?
I guess what I am trying to say is that I never found a complete solution to this, so I decided to create one myself. I wanted the case insensitive string to play nicely and seamlessly with its standard counterpart: to pass it as a parameter anywhere std::string is accepted; to convert between istring and std::string in both directions; to push it to an output stream and read it from an input stream; to compare it against std::string using all six operators (==, !=, <, <=, >, >=) whether it appears on the left or the right side of the operation; and to declare literals of its type, just like "std::string literal"s.
In other words, have it be indistinguishable from std::string except when comparisons are needed, like this:
auto f_std = [](const string& s) {};
auto f_is = [](const istring& s) {};

auto std1 = string{"abc"};
auto istr1 = istring{"ABC"};
auto istr2 = "istring literal"_is;

f_std(istr1);
f_is(std1);

auto result = (std1 == istr1);

std1 = istr1;
istr1 = std1;
The starting point was the character traits policy class mentioned earlier, but instead of using it to instantiate std::basic_string directly, I inherited publicly from the std::basic_string template configured with the case-insensitive traits policy; this pulled all of its methods into the derived class’s scope, and I only had to pull in the base class constructors:
template<typename CharT, typename Alloc = std::allocator<CharT>>
class basic_istring : public std::basic_string<CharT, char_itraits<CharT>, Alloc>
{
public:
    using base = std::basic_string<CharT, char_itraits<CharT>, Alloc>;
    using base::base; // use base class constructors
All this derived class needed now was: a constructor allowing it to be created from any other kind of std::basic_string as long as the character type is the same; an implicit type-cast operator to seamlessly convert it to std::string; and four comparison operators, == and <=>, each declared twice so istring can appear as either the first or the second parameter. The constructor and comparison operators had to be selectively enabled only for strings with a different character traits policy, otherwise they would cause ambiguity and compilation errors. The final step was declaring operator >>, operator <<, and operator""_is.
P.S. Everything I just described also applies to wchar_t aka std::wstring.
The implementation on my GitHub page: istring.hpp, example program: istring.cpp.
#pragma once

#include <istream>
#include <ostream>
#include <compare>
#include <string>
#include <locale>
#include <utility>
#include <algorithm>
#include <type_traits>

inline namespace detail
{
    template<typename CharT>
    inline auto char_ieq(CharT c1, CharT c2, const std::locale& loc = std::locale())
    { return std::toupper(c1, loc) == std::toupper(c2, loc); };

    template<typename CharT>
    inline auto char_ilt(CharT c1, CharT c2, const std::locale& loc = std::locale())
    { return std::toupper(c1, loc) < std::toupper(c2, loc); };

    template<typename CharT>
    inline auto string_icmp(const CharT* s1, std::size_t n1, const CharT* s2, std::size_t n2, const std::locale& loc = std::locale())
    {
        if(std::lexicographical_compare(s1, s1 + n1, s2, s2 + n2,
            [&](CharT c1, CharT c2) { return char_ilt(c1, c2, loc); })) return -1;
        if(std::lexicographical_compare(s2, s2 + n2, s1, s1 + n1,
            [&](CharT c1, CharT c2) { return char_ilt(c1, c2, loc); })) return 1;
        return 0;
    }

    template<typename CharT>
    struct char_itraits : std::char_traits<CharT>
    {
        static auto eq(CharT c1, CharT c2) { return char_ieq(c1, c2); }
        static auto lt(CharT c1, CharT c2) { return char_ilt(c1, c2); }
        static auto compare(const CharT* s1, const CharT* s2, std::size_t n)
        { return string_icmp(s1, n, s2, n); }
    };
}

template<typename CharT, typename Alloc = std::allocator<CharT>>
class basic_istring : public std::basic_string<CharT, char_itraits<CharT>, Alloc>
{
public:
    using base = std::basic_string<CharT, char_itraits<CharT>, Alloc>;
    using base::base;

    template<typename Traits2, typename Alloc2,
        std::enable_if_t<not std::is_same_v<char_itraits<CharT>, Traits2>, void>* = nullptr>
    basic_istring(const std::basic_string<CharT, Traits2, Alloc2>& str)
    : base(str.data(), str.length()) {}

    operator auto () const { return std::basic_string<CharT>(this->data(), this->length()); }

    template<typename Traits2, typename Alloc2>
    std::enable_if_t<not std::is_same_v<char_itraits<CharT>, Traits2>, bool>
    friend operator == (const basic_istring& lhs, std::basic_string<CharT, Traits2, Alloc2>& rhs)
    { return string_icmp(lhs.data(), lhs.length(), rhs.data(), rhs.length()) == 0; }

    template<typename Traits2, typename Alloc2>
    std::enable_if_t<not std::is_same_v<char_itraits<CharT>, Traits2>, std::strong_ordering>
    friend operator <=> (const basic_istring& lhs, std::basic_string<CharT, Traits2, Alloc2>& rhs)
    { return string_icmp(lhs.data(), lhs.length(), rhs.data(), rhs.length()) <=> 0; }

    template<typename Traits2, typename Alloc2>
    std::enable_if_t<not std::is_same_v<char_itraits<CharT>, Traits2>, bool>
    friend operator == (std::basic_string<CharT, Traits2, Alloc2>& lhs, const basic_istring& rhs)
    { return string_icmp(lhs.data(), lhs.length(), rhs.data(), rhs.length()) == 0; }

    template<typename Traits2, typename Alloc2>
    std::enable_if_t<not std::is_same_v<char_itraits<CharT>, Traits2>, std::strong_ordering>
    friend operator <=> (std::basic_string<CharT, Traits2, Alloc2>& lhs, const basic_istring& rhs)
    { return string_icmp(lhs.data(), lhs.length(), rhs.data(), rhs.length()) <=> 0; }
};

using istring = basic_istring<char>;
using iwstring = basic_istring<wchar_t>;

inline auto& operator >> (std::istream& is, istring& istr)
{
    std::string temp;
    is >> temp;
    istr = std::move(temp);
    return is;
}

inline auto& operator >> (std::wistream& wis, iwstring& iwstr)
{
    std::wstring temp;
    wis >> temp;
    iwstr = std::move(temp);
    return wis;
}

inline auto& operator << (std::ostream& os, const istring& istr)
{
    os << istr.c_str();
    return os;
}

inline auto& operator << (std::wostream& wos, const iwstring& iwstr)
{
    wos << iwstr.c_str();
    return wos;
}

inline auto operator ""_is(const char* istr, std::size_t len) { return istring(istr, len); }

inline auto operator ""_iws(const wchar_t* iwstr, std::size_t len) { return iwstring(iwstr, len); }
Welcome, Kobi!
I would like to extend a warm welcome to fellow C++ enthusiast, and blog contributor, Kobi! Many of you know him from his Twitter account or as the organizer of the San Diego C++ Meetup.
We have been exchanging likes and occasional comments on Twitter ever since I started this blog. He has always been encouraging and supportive of my blogging journey, and I am happy he accepted my invitation to share his knowledge and wisdom on this page (his first post).
Kobi will have a permanent presence here, posting news and updates about the San Diego C++ Meetup.
Welcome!
std::deep_ptr
std::deep_ptr, a.k.a. a deep-copying smart pointer, has not yet been introduced into the STL, though several C++ experts have proposed implementations in the past. See n3339.pdf for one such proposal and a list of possible implementations.
In this post I want to share my implementation of a deep copying pointer I recently came up with. Its interface is virtually identical to std::unique_ptr, with the exception of an additional cloner template parameter (a policy class) that specifies how deep copies are to be made. By default the pointee is copied by doing return (p ? new T{ *p } : nullptr);, that is, by invoking the copy constructor of T.
My example program illustrates how to create a cloning policy for polymorphic types; in other words how to make deep copies when deep_ptr<Base> actually points to some struct Derived : Base and types in a given inheritance hierarchy implement virtual S* clone():
auto clone = [](S* p) { return p->clone(); };
using clone_ptr = deep_ptr<S, decltype(clone)>;

auto p1 = clone_ptr(new S, clone);
auto p2 = p1;
Finally, and just like the std::unique_ptr, my pointer type can be used with a custom deleter:
auto del = [](S* p) { delete p; };
using del_ptr = deep_ptr<S, decltype(clone), decltype(del)>;

auto p = del_ptr(new S, clone, del);
del will be called on a pointer managed by p when it goes out of scope or when p is assigned to.
Example program on my GitHub account: deep_ptr.cpp. The implementation: deep_ptr.hpp.
#pragma once

#include <memory>
#include <utility>
#include <compare>
#include <type_traits>
#include <initializer_list>
#include <cstddef>

template<typename T>
struct default_cloner
{
    using pointer = T*;
    pointer operator () (pointer p) const { return (p ? new T{ *p } : nullptr); }
};

template<typename T, typename C = default_cloner<T>, typename D = std::default_delete<T>>
class deep_ptr
{
public:
    using pointer = T*;
    using element_type = T;
    using reference_type = T&;
    using cloner_type = C;
    using deleter_type = D;

    constexpr deep_ptr() noexcept = default;
    constexpr deep_ptr(std::nullptr_t) noexcept {}

    explicit deep_ptr(pointer p) noexcept : m_p{ p } {}

    deep_ptr(pointer p, cloner_type c) : m_c{ c }, m_p{ p } {}
    deep_ptr(pointer p, deleter_type d) : m_d{ d }, m_p{ p } {}
    deep_ptr(pointer p, cloner_type c, deleter_type d) : m_c{ c }, m_d{ d }, m_p{ p } {}

    deep_ptr(const deep_ptr& d) : m_c{ d.m_c }, m_d{ d.m_d }, m_p{ get_cloner()(d.m_p) } {}
    deep_ptr(deep_ptr&& d) noexcept : m_c{ std::move(d.m_c) }, m_d{ std::move(d.m_d) }, m_p{ std::exchange(d.m_p, nullptr) } {}

    template<typename U, typename V, typename W>
    deep_ptr(const deep_ptr<U, V, W>& d) : m_c{ d.m_c }, m_d{ d.m_d }, m_p{ get_cloner()(d.m_p) } {}

    template<typename U, typename V, typename W>
    deep_ptr(deep_ptr<U, V, W>&& d) noexcept : m_c{ std::move(d.m_c) }, m_d{ std::move(d.m_d) }, m_p{ std::exchange(d.m_p, nullptr) } {}

    ~deep_ptr() noexcept { get_deleter()(get()); }

    deep_ptr& operator = (deep_ptr r) noexcept { swap(r); return *this; }

    template<typename U, typename V, typename W>
    friend bool operator == (const deep_ptr<T, C, D>& x, const deep_ptr<U, V, W>& y) noexcept { return x.m_p == y.m_p; }

    template<typename U, typename V, typename W>
    friend auto operator <=> (const deep_ptr<T, C, D>& x, const deep_ptr<U, V, W>& y) noexcept { return x.m_p <=> y.m_p; }

    explicit operator bool () const noexcept { return m_p != nullptr; }

    reference_type operator * () const { return *m_p; }
    pointer operator -> () const noexcept { return m_p; }

    pointer get() const noexcept { return m_p; }

    deleter_type& get_deleter() noexcept { return m_d; }
    const deleter_type& get_deleter() const noexcept { return m_d; }

    cloner_type& get_cloner() noexcept { return m_c; }
    const cloner_type& get_cloner() const noexcept { return m_c; }

    pointer release() noexcept { return std::exchange(m_p, nullptr); }

    void reset(pointer p = pointer()) noexcept { get_deleter()(std::exchange(m_p, p)); }

    void swap(deep_ptr& o) noexcept
    {
        std::swap(m_c, o.m_c);
        std::swap(m_d, o.m_d);
        std::swap(m_p, o.m_p);
    }

private:
    template<typename, typename, typename> friend class deep_ptr;

    cloner_type m_c = cloner_type();
    deleter_type m_d = deleter_type();
    pointer m_p = pointer();
};

template<typename T, typename C, typename D>
inline void swap(deep_ptr<T, C, D>& x, deep_ptr<T, C, D>& y) { x.swap(y); }

template<typename T, typename V>
inline auto make_deep(std::initializer_list<V> l)
{
    using U = std::decay_t<T>;
    return deep_ptr<T>(new U(l));
}

template<typename T, typename... A>
inline auto make_deep(A&&... a)
{
    return deep_ptr<T>(new T(std::forward<A>(a)...));
}
Rounding floating point numbers and constexpr
A good friend asked me a while ago how to round a float or double to N digits of precision. Given 0.0123456789 and N = 7, how do you get 0.0123457000 as the result? The only way to round numbers (inside a machine) I could think of was to multiply them by 10 ^ N (ten raised to the power of N), round and cast the result to some type of int, effectively stripping the fractional part, then cast back to a floating point representation and divide by the same 10 ^ N:
V = 0.0123456789
N = 7
P = 10 ^ N = 10'000'000
T = round(V * P) = round(123456.789) = 123457
R = T / P = 0.0123457000
Or something to that effect. Translated to C++:
template<typename T>
auto round(T v, unsigned char d)
{
    auto p = std::pow(T(10), T(d));
    return std::round(v * p) / p;
}
Then he added the dreaded “and do it fast”… and just as I was about to tell him to take a long walk off a short pier, I remembered something about constexpr and doing arithmetic at compile time.
Looking at the implementation above I decided to start with computing the power of 10 at compile time. My first approach was a recursive template struct power_of with a partial specialization (the recursion stop) for the power-of-zero case, plus a helper power_of_v variable template:
template<std::uint64_t B, unsigned char E, typename T>
struct power_of
{
    static constexpr T value = T(B) * power_of<B, E - 1, T>::value;
};

template<std::uint64_t B, typename T>
struct power_of<B, 0, T>
{
    static constexpr T value = T(1);
};

template<std::uint64_t B, unsigned char E, typename T>
inline constexpr auto power_of_v = power_of<B, E, T>::value;
This allowed me to write power_of_v<10, 3, float> to compute, at compile time, the 3rd power of 10 stored as type float. I also created a recursive constexpr function, since those too can be evaluated at compile time:
template<typename T>
constexpr T power_of_f(std::uint64_t b, unsigned char e)
{
    return e == 0 ? T(1) : T(b) * power_of_f<T>(b, e - 1);
}
With it I could write power_of_f<float>(10, 3) and it would be the equivalent of using the recursive template struct above. I decided to think big and used std::uint64_t as the base and unsigned char as the exponent type of the computation. Hopefully overflow was not going to be an issue…
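Both forms can be checked with a static_assert, which also proves the computation really happens at compile time:

static_assert(power_of_v<10, 3, int> == 1000);
static_assert(power_of_f<int>(10, 3) == 1000);
static_assert(power_of_v<2, 10, std::uint64_t> == 1024);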
Next I moved on to rounding the floating point number to an integer type (after it has been multiplied by 10 ^ N) and quickly realized a simple type cast was not going to cut it. std::round does much more than that: it considers how close the value is to its integer representation (e.g., given 0.49, is it closer to 0 or to 1?) and rounds to the nearest whole number. Moreover, it considers the sign and rounds in a different direction depending on whether the number is positive or negative.
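For example, halfway cases are rounded away from zero, in the direction of the sign:

std::round(2.49);  // 2.0
std::round(2.5);   // 3.0
std::round(-2.49); // -2.0
std::round(-2.5);  // -3.0, not -2.0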
This meant I needed to determine (at compile time) whether the number is positive or negative:
template<typename T>
constexpr auto my_sign(T v)
{
    return v >= T(0) ? T(1) : T(-1);
}
Compute the absolute value of a given number:
template<typename T>
constexpr auto my_abs(T v)
{
    return v >= T(0) ? v : -v;
}
And round the number based on the proximity to its integer representation while considering the sign:
template<typename T>
constexpr auto my_rnd(T v)
{
    constexpr auto h = T(0.5) - std::numeric_limits<T>::epsilon();
    return (std::int64_t)(v + h * my_sign(v));
}
Putting it all together and adding some sanity checks resulted in this compile-time (as long as all arguments are constexpr) floating-point rounding function, which produces exactly the same results as its runtime equivalent:
template<unsigned char D, typename T>
constexpr auto constexpr_round(T v)
{
    constexpr auto p = power_of_v<10, D, T>;
    if(my_abs(v) > std::numeric_limits<T>::max() / p) return v;
    if(my_abs(v) * p > std::numeric_limits<std::int64_t>::max() - 1) return v;
    return my_rnd(v * p) / p;
}
The if statements are there to guard against numeric overflow. It is better not to round at all in such cases than to return garbage values; or at least that is my attitude toward error handling in this case.
Here is a test program I created while testing my implementation; it allowed me to easily compare compile time results against the reference std::round based approach.
#include <iostream>
#include <iomanip>
#include <array>
#include "round.hpp"

//#define my_round(V, D) runtime_round(V, D)
#define my_round(V, D) constexpr_round<D>(V)

int main()
{
    using namespace std;

    constexpr auto v = 0.0123456789;

    array<decltype(v), 10> a =
    {
        my_round(v, 1), my_round(v, 2),
        my_round(v, 3), my_round(v, 4),
        my_round(v, 5), my_round(v, 6),
        my_round(v, 7), my_round(v, 8),
        my_round(v, 9), my_round(v, 10)
    };

    cout << fixed << setprecision(10);

    for(int p = 1; auto v : a)
        cout << p++ << ":\t" << v << endl;
}
Test program output:
1: 0.0000000000
2: 0.0100000000
3: 0.0120000000
4: 0.0123000000
5: 0.0123500000
6: 0.0123460000
7: 0.0123457000
8: 0.0123456800
9: 0.0123456790
10: 0.0123456789
The example program on GitHub: round.cpp. The floating point rounding implementation (sprinkled with comments and decorated with template concepts for good measure): round.hpp.
Given the way float and double are represented at the bit level it is impossible to round them exactly; you can only do so much and come so close to being exact. Playing with the example program’s value and type of variable v, as well as different arguments to the setprecision stream manipulator, will illustrate what I mean…
#pragma once

#include <limits>
#include <type_traits>
#include <stdexcept>
#include <cmath>
#include <cstdint>

// solution 1
// can specify number of digits at run-time as the second parameter
// slowest due to 2 function calls
template<typename T>
requires std::is_floating_point_v<T>
auto runtime_round(T v, unsigned char d)
{
    auto p = std::pow(T(10), T(d));
    if(std::abs(v) > std::numeric_limits<T>::max() / p) // v * p would overflow
        throw std::overflow_error("rounding would overflow");
    return std::round(v * p) / p;
}

// solution 2
// if used only with other constexpr the result will be evaluated
// entirely at compile time meaning no runtime cost :)

// recursive template to compute B^E at compile time
// result is stored as a static variable 'value' of type T
template<std::uint64_t B, unsigned char E, typename T>
requires std::is_arithmetic_v<T>
struct power_of
{
    static constexpr T value = T(B) * power_of<B, E - 1, T>::value;
};

// terminating template for the recursion one above once E == 0
template<std::uint64_t B, typename T>
requires std::is_arithmetic_v<T>
struct power_of<B, 0, T>
{
    static constexpr T value = T(1);
};

template<std::uint64_t B, unsigned char E, typename T>
inline constexpr auto power_of_v = power_of<B, E, T>::value;

// recursive function template to calculate b^e
// if both parameters are constexpr it will evaluate at compile time
// otherwise it will evaluate at run time
// returns the result as type T
template<typename T>
requires std::is_arithmetic_v<T>
constexpr T power_of_f(std::uint64_t b, unsigned char e)
{
    return e == 0 ? T(1) : T(b) * power_of_f<T>(b, e - 1);
}

// given a value 'v' return +1 if v is >= 0, otherwise return -1
template<typename T>
requires std::is_arithmetic_v<T>
constexpr auto my_sign(T v)
{
    return v >= T(0) ? T(1) : T(-1);
}

// given a value 'v' return its absolute value
template<typename T>
requires std::is_arithmetic_v<T>
constexpr auto my_abs(T v)
{
    return v >= T(0) ? v : -v;
}

// round float/double/long double value 'v' to the nearest integer
// using compile time type conversions
template<typename T>
requires std::is_floating_point_v<T>
constexpr auto my_rnd(T v)
{
    constexpr auto h = T(0.5) - std::numeric_limits<T>::epsilon();
    return (std::int64_t)(v + h * my_sign(v));
}

// self explanatory :)
// though number of digits must be provided at compile time
// as the first template parameter 'D'
template<unsigned char D, typename T>
requires std::is_floating_point_v<T>
constexpr auto constexpr_round(T v)
{
    /* option 1 */ //constexpr auto p = power_of_f<T>(10, D);
    /* option 2 */ constexpr auto p = power_of_v<10, D, T>;
    if(my_abs(v) > std::numeric_limits<T>::max() / p) return v; // v * p would overflow
    if(my_abs(v) * p > std::numeric_limits<std::int64_t>::max() - 1) return v; // v * p would not fit in int64_t
    return my_rnd(v * p) / p;
}
UPDATE
Shout out to Ben from thoughts-on-coding.com for suggesting another runtime solution (code below)!
// solution 1.2
// faster runtime solution, powers of 10 are compiled in, 2 function calls
// approach suggested by Ben from https://thoughts-on-coding.com
template<typename T>
requires std::is_floating_point_v<T>
auto runtime_round2(T v, unsigned d)
{
    static const auto k_p = std::array<T, 20>
    {
        1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9,
        1e10, 1e11, 1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19
    };
    auto p = k_p.at(d);
    if(std::abs(v) > std::numeric_limits<T>::max() / p)
        return v; // v * p would overflow, do nothing...
    return std::round(v * p) / p;
}
Fun with TCP sockets
The purpose of this post is not to teach you about POSIX sockets; there’s plenty of that already. Instead I want to share two simple classes I implemented last night which encapsulate two types of TCP sockets: a listening server socket and a client socket. These sockets, as well as my recent serializer code, will be the foundation of a light-weight RPC system I plan on designing and teaching to my colleagues at work and elsewhere.
I will try to illustrate what happens under the hood of frameworks like gRPC, Thrift, or the much simpler XML-RPC…
The server socket’s only job is to bind to a local port and listen for incoming connections. The client socket represents a connection to a server. It can be created either by the server socket using its private constructor, or by the user using its public constructor.
auto server = server_socket(port);
// ...
auto client = client_socket(host, port);
The difference between my implementation and what I commonly see online is that my design (not necessarily better than others) is event driven, meaning you do not poll the client socket to receive incoming data. Instead, the receive function dispatches an event when data arrives; all you have to do is call the method in a while loop to keep receiving incoming packets. Here’s the event handler and the receive loop:
client.set_data_handler([&](client_socket& cs, socket_buffer_t data)
{
    auto msg = string((const char*)data.data(), data.size());
    if(!msg.empty()) cout << "< " << msg << endl;
    event.signal();
});
// ...
while(client.receive());
Here the socket’s event handler converts the incoming bytes to a string, and if not empty prints it to standard output. The loop blocks until more data arrives, dispatches the event when it does, then goes back to sleep. This is a part of a simple echo client and server I created to test the socket code; it will be shown further down this post.
The server’s handler for incoming connections is a bit more complicated; it defines what to do with the newly created connection socket, as well as what that socket should do when it receives data. You’ve been warned:
server.set_accept_handler(
    [&](server_socket& ss, client_socket cs, host_info_t info)
{
    cout << info.host << " (" << info.ip << ") : " << info.port << " connected" << endl;

    cs.set_data_handler([=, sref = ref(server)](client_socket& cs, socket_buffer_t data)
    {
        auto msg = string((const char*)data.data(), data.size());
        if(msg == "die")
        {
            sref.get().close();
        }
        else if(!msg.empty())
        {
            cout << info.host << " > " << msg << endl;
            cs.send({ rbegin(data), rend(data) });
        }
    });

    thread([=, cs = std::move(cs)]() mutable
    {
        while(cs.receive());
        cout << info.host << " (" << info.ip << ") : " << info.port << " disconnected" << endl;
    }).detach();
});
Upon receiving a new connection the handler prints out some basic info about the connecting machine, like host name, IP address, and the originating port number. It then tells this socket that, when data arrives on it from the client machine, it should convert it to a string, print it, and send the data right back to where it came from. It is not a true echo server, because the string is reversed before being sent to the client, but for the purpose of this exercise it will suffice. It only prints and sends the data back if it is not an empty string. Finally it looks for a special keyword ‘die’ that instructs the server to shut down.
The client, on the other hand, waits for incoming data from the server on the main thread, while a background thread reads user input line by line, skips empty lines, checks for the keyword ‘q’ which instructs it to disconnect from the server, and finally sends the data and waits for an event. This event is signaled only once the response comes back from the server, so that the keyboard input and the program’s output stay nicely separated on their own lines. Client loop below:
thread([&]()
{
    cout << "[enter] to send, ['q'] to exit, ['die'] to stop server" << endl;
    while(true)
    {
        cout << "> ";

        auto line = string();
        getline(cin, line);

        if(line.empty()) continue;

        if(line == "q")
        {
            client.close();
            break;
        }

        client.send(line.data(), line.length());
        event.wait();
    }
}).detach();
The socket code is located on my GitHub page: sockets.hpp, along with the echo client: echo_c.cpp and server: echo_s.cpp.
Better serializer
In my last post I described a light-weight serialization framework I recently implemented (video where I go over it in detail). After receiving great feedback from r/cpp I have implemented several improvements that were suggested to me: a better unpack transform signature, safer unpacking code, and an unpack casting iterator, just to name a few.
The latest code is available on GitHub; the serializer source lsf.hpp, and usage example lsf.cpp.
Video where I describe the changes in greater detail:
Light-weight serialization framework
UPDATE: Better serializer
In my very first YouTube Channel video I discuss in detail the design of a serialization framework I came up with in response to a question I was recently asked. In this post I want to give a brief overview of how it works and how to get started using it.
The primary goal was to create a serialization framework that is uncoupled from the types it operates on, meaning no class nor struct needs to inherit from any sort of serialization interface. Custom behavior is instead registered from the outside in the form of pack and unpack transforms, for example to convert integers to and from network byte order:
add_pack_transform<std::uint16_t>(
    [](std::uint16_t v, buffer_output_t& it)
    { pack_value(it, htons(v)); });

add_unpack_transform<std::uint16_t>(
    [](buffer_input_t& it, std::uint16_t* v)
    { *v = ntohs(unpack_value<std::uint16_t>(it)); });
By default every type is packed by doing a bitwise copy from its memory location into a buffer of bytes:
auto buf_1 = pack(std::uint16_t(0x1234));
auto tup_1 = unpack<std::uint16_t>(buf_1);
The code above packs a single std::uint16_t value into a buffer, then unpacks it back out; unpack returns the recovered values as a tuple, hence tup_1.
It is possible to perform type conversions inside the pack and unpack transforms:
add_pack_transform<std::size_t>([](std::size_t v, buffer_output_t& it)
    { pack_value(it, std::uint16_t(v)); });

add_unpack_transform<std::size_t>([](buffer_input_t& it, std::size_t* v)
    { *v = unpack_value<std::uint16_t>(it); });
The code above illustrates it: every std::size_t is packed as (and unpacked from) a std::uint16_t. Variable-length types like strings need a bit more work:
add_pack_transform<const char*>(
    [](const char* v, buffer_output_t& it)
    {
        auto len = std::strlen(v);
        pack_type(it, len);
        pack_bytes(it, v, len);
    });

add_pack_transform<std::string>(
    [](const std::string& v, buffer_output_t& it)
    {
        pack_type(it, v.length());
        pack_bytes(it, v.data(), v.length());
    });

add_unpack_transform<std::string>(
    [](buffer_input_t& it, std::string* v)
    {
        auto len = unpack_type<std::string::size_type>(it);
        new (v) std::string(len, std::string::value_type());
        unpack_bytes(it, v->data(), len);
    });
Here I define how to serialize C-style strings (const char*) and std::string: the string’s length is packed first, followed by its raw characters; unpacking reads the length back, constructs the string, then fills it with the bytes that follow.
Having defined transforms for simple types we can now add more complex ones, let’s say a vector of strings:
using strings_t = std::vector<std::string>;

add_pack_transform<strings_t>(
    [](const strings_t& vs, buffer_output_t& it)
    {
        pack_type(it, vs.size());
        std::for_each(std::begin(vs), std::end(vs),
            [&](const std::string& s) { pack_type(it, s); });
    });

add_unpack_transform<strings_t>(
    [](buffer_input_t& it, strings_t* vs)
    {
        auto size = unpack_type<strings_t::size_type>(it);
        new (vs) strings_t();
        vs->reserve(size);
        std::generate_n(std::back_inserter(*vs), size,
            [&]() { return unpack_type<std::string>(it); });
    });
The packing code is trivial: pack the size of the vector, followed by packing each entry one at a time. Notice here I am calling pack_type and unpack_type, which route each std::string through the transforms registered earlier.
One thing I have not yet mentioned is an optimization I introduced into the code: before types are packed, the exact amount of memory they will need is computed and reserved inside the resulting buffer. This is driven by per-type “pack size” procedures; the default one simply returns sizeof(T):
static auto k_default_pack_size_proc = pack_size_proc_t(
    [](const void*) { return sizeof(T); });
There is a void pointer parameter, but in this case it is ignored. However, it is required to compute sizes of more complex types like strings; that is because, in order to store the pack size procs inside an internal data structure, each one needs to be wrapped inside the default lambda. This lambda receives the memory location of each value as a void pointer, which is then cast back to the concrete type and handed to the user-provided procedure:
add_pack_size_proc<const char*>(
    [](const char* v) { return pack_size(std::strlen(v)) + std::strlen(v); });

add_pack_size_proc<std::string>(
    [](const std::string& v) { return pack_size(v.length()) + v.length(); });
Here I tell the framework how to compute the number of bytes needed to serialize strings, and below, how to do the same for a vector of strings:
add_pack_size_proc<strings_t>(
    [](const strings_t& v)
    {
        return pack_size(v.size()) + std::accumulate(std::begin(v), std::end(v), 0,
            [](auto sum, const std::string& v) { return sum + pack_size(v); });
    });
I initially tagged each type with a unique value so that I could verify it during unpacking and catch errors like packing an int but trying to unpack a string. This type tagging became more complicated than I first anticipated, so I got rid of it, though I may revisit it and add additional packing and unpacking functions that perform this verification optionally.
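If I do revisit it, the optional variants could be as simple as packing a one-byte tag ahead of the value and verifying it on the way out; a rough sketch (tag_of and the *_checked names are made up for illustration):

template<typename T>
void pack_checked(buffer_output_t& it, const T& v)
{
    pack_value(it, tag_of<T>()); // a unique std::uint8_t per registered type
    pack_type(it, v);
}

template<typename T>
T unpack_checked(buffer_input_t& it)
{
    if(unpack_value<std::uint8_t>(it) != tag_of<T>())
        throw std::runtime_error("type tag mismatch");
    return unpack_type<T>(it);
}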
As always, the code is available on my GitHub page.
The framework is a single header file: lsf.hpp. Example program: lsf.cpp.
The future of C++
I hope the title of this post got your attention! What I really mean is the future direction of this blog…
In 2019 I created 85 short articles, in 2020 I blogged only twice. The last 12 months have pulled my attention away from this site and toward mentoring other engineers at work by hosting online C++ and OO classes. I have mentored tens of engineers over the span of 19 classes in the past 6 months, totaling around 25 hours of recorded material and thousands of lines of code.
I have always enjoyed sharing my knowledge by teaching; this is why this blog came into existence, and the feedback I have received from it has been fantastic (and humbling). The interactive classes I hold have been particularly rewarding, so I decided to open them up to the world.
Going forward I want to hold live and interactive classes like I do at work, but open to anyone on the internet. Perhaps also 1 on 1 sessions to help others with their day to day coding challenges. All of this I will publish on my YouTube Channel where you can already find one of my classes I held earlier this week.
I hope you will subscribe and follow me there. I also want to ask you for topics and questions you would like to see me tackle as I reason through the design and implementation of my solutions.
Finally I wanted to add that I will continue to blog. Hopefully the world will soon return to normal and I can get back to what I enjoy the most: C++
Stay safe!
Make std::vector of T, efficiently
What I want to briefly talk about today is likely obvious to seasoned C++ programmers, but may be of benefit to newcomers to the language and the STL: how to efficiently create a std::vector of any number of elements of type T, whatever T may be. The code is short and sweet, so let’s jump right into it:
#include <vector>

template<typename T, typename... A>
inline auto make_N_of_T(std::size_t N, A&&... a)
{
    std::vector<T> v;
    v.reserve(N);
    for(std::size_t i = 0; i < N; ++i)
        v.emplace_back(std::forward<A>(a)...);
    return v;
}
The creation function template is declared inline because it is likely to live in a header file and we don’t want to violate the ODR. It says: make me N instances of T and use A as the parameter(s) to T’s constructor (see parameter packs and perfect forwarding). Inside, we create an instance of std::vector<T> and reserve enough space for N instances of T up front, without actually constructing any of them yet. Then comes the efficiency magic: instead of creating a temporary instance of T and calling .push_back, we .emplace_back each instance of T, constructing it directly in the memory previously reserved for the container’s elements. This way we avoid unnecessary copy or move constructions (which may not even be defined for type T).
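Usage then looks like this (widget is just a stand-in for whatever type you need many instances of):

struct widget
{
    widget(int id, std::string name) : m_id(id), m_name(std::move(name)) {}
    int m_id;
    std::string m_name;
};

// 1000 widgets, each constructed in place as widget{42, "default"}
auto widgets = make_N_of_T<widget>(1000, 42, "default");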
Before C++11 and move semantics we would have been stuck with creating an instance of T, then copy constructing it into every element of std::vector<T>, like this:
std::size_t N = 42;
vector<T> v(N, T{});
And don’t even get me started on having to return std::vector<T> from a function and how costly that would have been (again, prior to the magic of move semantics)! We’re in a brave new world…
P.S. I hope you’re all doing well and staying safe in this COVID-19 world! The situation has certainly put a strain on me and a dent in my blogging frequency. I hope to return to my normal post-or-more-a-week blogging frequency soon!
Singleton Pattern
First things first: if you’re still reading this blog, thanks! I haven’t posted anything since last December; it has been a rough first half of the year: COVID-19, working from home, isolation, you know the deal, but I have not given up on blogging, just finding the time for it has been nearly impossible. I have a long list of topics I eventually want to cover but I can’t make any promises until this mad situation normalizes…
Alright then! A friend from work asked me about the Singleton Design Pattern and how best to implement it in C++. I have done it in the past but was not happy with that implementation; it insisted that the singleton class have a default constructor, for example, so I started coding something that would allow non-default initialization. The first thing I came up with was a singleton template base class using the Curiously Recurring Template Pattern. It worked, but it required the derived class to declare its constructors private and to declare the singleton base class a friend:
class S final : public singleton<S>
{
public:
    ~S() { cout << "~S()" << endl; }

    void foo() { cout << "S::foo() x = " << _x << endl; }
    void bar() const { cout << "S::bar() x = " << _x << endl; }

private:
    // Constructor must be private to prevent creation of instances...
    // ...except by singleton<T> base class, which is our friend...
    friend class singleton<S>;
    S(int x) : _x(x) { cout << "S(" << _x << ")" << endl; }

    int _x = 0;
};
This approach allowed me to separate the singleton creation S::Create(17); from usage S::Instance()->foo(); but I was still bothered by the need for private constructor(s) and friendship, so I kept experimenting… I wanted a simpler solution, one that by the very nature of inheriting the singleton base class would automatically render the class non-instantiable by anyone other than the parent template:
class AS : public abstract_singleton<AS>
{
public:
    AS(int x) : _x(x) { cout << "AS(" << _x << ")" << endl; }
    ~AS() { cout << "~AS()" << endl; }

    void foo() { cout << "AS::foo() x = " << _x << endl; }
    void bar() const { cout << "AS::bar() x = " << _x << endl; }

private:
    int _x = 0;
};
The name of the singleton base class probably gave away the approach, but let me explain anyway: the abstract_singleton<AS> base injects a pure virtual method, preventing anyone from creating instances of class AS. The base class later erases the abstraction by implementing that pure virtual method just before creating the instance (it actually creates an instance of a private type derived from AS and lets the compiler take care of the rest; this works because in C++ access and visibility of a member are two distinct concepts):
struct Q : T
{
    using T::T;
    void __abstract_singleton__() override {}
};
One can of course easily defeat the mechanism by which instance creation is restricted, by implementing void __abstract_singleton__() override {} in the class meant to be a singleton. I don’t think there is much that can be done about that, but if it’s not done on purpose the compiler will detect attempts at creating instances and fail with a “cannot instantiate abstract class” error.
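For completeness, defeating it looks like this (hardly something you would do by accident):

class Cheater : public abstract_singleton<Cheater>
{
    void __abstract_singleton__() override {} // erase the abstraction yourself...
};

Cheater c1, c2; // ...and nothing stops you from creating instances directly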
Here’s the example program (singleton.cpp):
#include <iostream>
#include "singleton.hpp"

using namespace std;

SINGLETON_CLASS(A) {};
SINGLETON_STRUCT(B) {};
struct C final : public singleton<C> {};

ABSTRACT_SINGLETON_CLASS(AA) {};
ABSTRACT_SINGLETON_STRUCT(AB) {};
struct AC : public abstract_singleton<AC> {};

class S SINGLETON(S)
{
public:
    ~S() { cout << "~S()" << endl; }

    void foo() { cout << "S::foo() x = " << _x << endl; }
    void bar() const { cout << "S::bar() x = " << _x << endl; }

private:
    // Constructor must be private to prevent creation of instances...
    // ...except by singleton<T> base class, which is our friend...
    SINGLETON_FRIEND(S);
    S(int x) : _x(x) { cout << "S(" << _x << ")" << endl; }

    int _x = 0;
};

class AS ABSTRACT_SINGLETON(AS)
{
public:
    // No friendship needed if constructors are public...
    // ...but you still can't create instances of AS...
    // ...except by abstract_singleton<T> base class...
    // ...which internally erases the abstraction...
    //ABSTRACT_SINGLETON_FRIEND(AS);
    AS(int x) : _x(x) { cout << "AS(" << _x << ")" << endl; }
    ~AS() { cout << "~AS()" << endl; }

    void foo() { cout << "AS::foo() x = " << _x << endl; }
    void bar() const { cout << "AS::bar() x = " << _x << endl; }

private:
    int _x = 0;
};

int main()
{
    S::Create(17);
    //S s(17); // Compile-time error, can't create instances...
    try { S::Create(17); }
    catch(exception& e) { cout << e.what() << endl; }
    //*S::Instance() = *S::Instance(); // Compile-time error, can't copy/move singletons...
    S::Instance()->foo();
    S::Instance()->bar();

    AS::Create(20);
    //AS s(20); // Compile-time error, can't create instances...
    try { AS::Create(20); }
    catch(exception& e) { cout << e.what() << endl; }
    //*AS::Instance() = *AS::Instance(); // Compile-time error, can't copy/move singletons...
    AS::Instance()->foo();
    AS::Instance()->bar();

    cout << "Done!" << endl;
}
And the complete listing (singleton.hpp):
#include <mutex>
#include <memory>
#include <utility>
#include <stdexcept>

template<typename T>
class singleton
{
public:
    template<typename... Args>
    static void Create(Args&&... args)
    {
        static std::mutex s_lock;
        std::scoped_lock lock(s_lock);
        if(!s_instance) s_instance.reset(new T(std::forward<Args>(args)...));
        else throw std::logic_error("This singleton has already been created!");
    }

    static T* Instance() noexcept { return s_instance.get(); }

protected:
    singleton() = default;
    singleton(const singleton&) = delete;
    singleton(singleton&&) = delete;
    singleton& operator = (const singleton&) = delete;
    singleton& operator = (singleton&&) = delete;
    ~singleton() = default;

private:
    using storage_t = std::unique_ptr<T>;
    inline static storage_t s_instance = nullptr;
};

#define SINGLETON(T) final : public singleton<T>
#define SINGLETON_CLASS(C) class C SINGLETON(C)
#define SINGLETON_STRUCT(S) struct S SINGLETON(S)
#define SINGLETON_FRIEND(T) friend class singleton<T>

template<typename T>
class abstract_singleton
{
public:
    template<typename... Args>
    static void Create(Args&&... args)
    {
        static std::mutex s_lock;
        std::scoped_lock lock(s_lock);
        struct Q : T
        {
            using T::T;
            void __abstract_singleton__() override {}
        };
        if(!s_instance) s_instance.reset(new Q(std::forward<Args>(args)...));
        else throw std::logic_error("This abstract singleton has already been created!");
    }

    static T* Instance() noexcept { return s_instance.get(); }

protected:
    abstract_singleton() = default;
    abstract_singleton(const abstract_singleton&) = delete;
    abstract_singleton(abstract_singleton&&) = delete;
    abstract_singleton& operator = (const abstract_singleton&) = delete;
    abstract_singleton& operator = (abstract_singleton&&) = delete;
    virtual ~abstract_singleton() = default;

private:
    using storage_t = std::unique_ptr<T>;
    inline static storage_t s_instance = nullptr;

    virtual void __abstract_singleton__() = 0;
};

#define ABSTRACT_SINGLETON(T) : public abstract_singleton<T>
#define ABSTRACT_SINGLETON_CLASS(C) class C ABSTRACT_SINGLETON(C)
#define ABSTRACT_SINGLETON_STRUCT(S) struct S ABSTRACT_SINGLETON(S)
#define ABSTRACT_SINGLETON_FRIEND(T) friend class abstract_singleton<T>
std::unique_ptr, or how to express explicit ownership
When we talk about std::unique_ptr we must mention the idea of explicit resource ownership as well as the concept of a resource source and sink. By wrapping a pointer inside std::unique_ptr we state that whoever holds the std::unique_ptr owns the resource explicitly: has complete control over its lifetime. Prior to C++11 this was expressed using std::auto_ptr. Modern C++ deprecated std::auto_ptr and addressed its shortcomings by introducing the std::unique_ptr.
A source is a function that creates a resource then relinquishes its ownership; in other words, it gives the ownership to whoever called it, and from then on, the caller is responsible for releasing that resource.
A sink is a function which accepts a resource as a parameter and assumes ownership of it; in other words, it promises to release said resource once it’s done executing.
Transfer of ownership must be stated explicitly for named instances (variables of type std::unique_ptr) with std::move. Unnamed temporaries can be passed into a variable or as a function parameter without invoking std::move (this will be clearly illustrated in the code sample below).
Explicit ownership of a resource can be “upgraded” to shared ownership by std::move’ing the std::unique_ptr into a std::shared_ptr. This resets the std::unique_ptr back to nullptr and initializes the std::shared_ptr with a reference count of 1 (also illustrated in the code below).
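A minimal illustration of that upgrade:

auto u = std::make_unique<T>();      // explicit, sole ownership
std::shared_ptr<T> s = std::move(u); // ownership transferred to the shared_ptr
assert(!u && s.use_count() == 1);    // "u" is now null, "s" is the only owner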
Worth mentioning is the fact that std::unique_ptr can own arrays of objects allocated with operator new[]. A partial specialization of the std::unique_ptr template for array types also overloads operator[] for easy array element access; it mirrors the syntax of native arrays. However, there are limitations to storing arrays inside std::unique_ptr: when creating an array of primitive types you cannot specify an initial value for the elements (std::make_unique value-initializes them, so they start out as zeros, while a raw new int[N] leaves them indeterminate); when creating an array of user-defined types (classes and structs) said type must have a default constructor. Moreover, the size of the array is lost: there is no way to query the std::unique_ptr and find out how many elements the array holds. For those reasons I recommend you stick with std::vector: its elements can be initialized with a default or custom value and you can always call size() on it.
The example program below, using inline comments, further details how to transfer resource ownership and how to express the idea of source and sink functions.
Complete listing on GitHub: unique.cpp and T.hpp. Merry Christmas and Happy New Year!
#include <memory>
#include <vector>
#include <cassert>
#include "T.hpp"

using namespace std;

using T_u_ptr = unique_ptr<T>;
using T_s_ptr = shared_ptr<T>;

// Creates and gives away explicit ownership...
T_u_ptr source() { return make_unique<T>(); }

// Assumes explicit ownership, "t" gets deallocated at the end of this function
void sink(T_u_ptr t) { t->foo(); }

// Does NOT assume explicit ownership because we're taking by reference
void NOT_sink(T_u_ptr& t) { t->foo(); }

// Assumes ownership, but then hands it back if the caller captures the return value,
// otherwise releases the resource
T_u_ptr sink_or_pass_thru(T_u_ptr t) { t->foo(); return t; }

// Just a function that takes a shared_ptr...
void shared(T_s_ptr t) { t->foo(); }

int main()
{
    auto t1 = source();
    sink(move(t1)); // We have to std::move it, because the copy-constructor of unique_ptr = delete;
                    // by using std::move we're forcing the use of the move constructor (if one exists).
                    // This would have worked without std::move when using std::auto_ptr (now deprecated),
                    // and it would have stolen the ownership without warning us!!!
    assert(!t1);    // "t1" is now pointing to null because of the std::move above

    auto t2 = source();
    NOT_sink(t2);   // "t2" still pointing to the resource after this call
    assert(t2);
    sink(move(t2)); // and now "t2" is gone...
    assert(!t2);

    sink(source()); // No need for explicit std::move, the temporary is captured as an r-value reference
                    // so the unique_ptr's move constructor is automatically invoked

    auto t3 = source();
    auto t4 = sink_or_pass_thru(move(t3)); // Effectively moves the ownership from "t3" to "t4"
    assert(!t3 && t4);

    sink_or_pass_thru(source()); // Takes ownership, but deletes the resource since nobody captures the return value

    T_s_ptr t5 = source(); // Create and "upgrade" from unique to shared ownership
    T_s_ptr t6 = move(t4); // unique_ptr's must be explicitly std::move'ed into shared_ptr's
    assert(!t4 && t5 && t6);

    shared(t6); // No transfer of ownership, just using a shared resource here...

    // PRIMITIVE ARRAYS...
    constexpr const int N = 3;
    auto a1 = make_unique<int[]>(N);   // Allocates N value-initialized (zeroed) int's; size of array is lost, a custom initial value cannot be specified
    auto a2 = vector<int>(N, int{42}); // Allocates N int's initialized to 42; size is known, values are well defined
    auto a3 = move(a1);                // Transfer ownership from "a1" to "a3"
    assert(!a1 && a3);
    a3[N - 1] = 1;                     // Access the last int of the array

    // ARRAYS...
    auto a4 = make_unique<T[]>(N); // Create an array of N T's; size is lost, T must have a default constructor
    auto a5 = vector<T>(N, T{42}); // Create a vector of N T's; size is known, initialized with a custom T
    auto a6 = move(a4);            // Transfer ownership from "a4" to "a6"
    assert(!a4 && a6);
    a6[N - 1].foo();               // Access the last T of the array
}
When exactly does the std::shared_ptr take ownership?
In a post I wrote a few days ago titled “A word on std::shared_ptr” I mistakenly argued that, should the std::shared_ptr fail to allocate its control block (where the reference count and other information is stored), the passed raw pointer would not be deleted. Example:
auto p = std::shared_ptr<T>(new T);
Here, if the allocation and construction of T succeeded, the code would then enter the shared pointer’s constructor. Inside this constructor the allocation of the control block would take place. If that failed, I argued, the newly allocated instance of T would leak. That is not what actually happens!
After re-reading the C++ specification, and looking over the shared pointer’s implementation supplied with Visual Studio 2017, it is now clear to me that the allocated instance of T will in fact be deleted.
Since learning of this mistake, I have taken down that post and redirected the original URL here, where I take a more in-depth look at the std::shared_ptr.
I would like to thank the kind people on reddit for taking the time to set me straight! It was both a humbling and an embarrassing experience, but I am a better C++ programmer because of it.
In-depth look at C++ shared pointers
First, you should use std::make_shared (most of the time) to create shared pointers. It is optimized to perform only one allocation for both the reference count / control block and the object it holds.
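In other words, prefer the first form below; the second performs two separate allocations, one for T and one for the control block:

auto p1 = std::make_shared<T>();     // one allocation: T and control block together
auto p2 = std::shared_ptr<T>(new T); // two allocations: T first, control block second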
It is also safer to use std::make_shared in the presence of exceptions:
some_function(std::shared_ptr<T>(new T), std::shared_ptr<T>(new T));
Prior to C++17, one of the possible orders of events could have been:
1) new T
2) new T
3) std::shared_ptr<T>(...)
4) std::shared_ptr<T>(...)
So in the worst case the code could leak an instance of T if the second allocation in step #2 failed. That is not the case with C++17, which has more stringent requirements on the order of function argument evaluation: each argument must be evaluated completely before the next one is handled. So the new order of events becomes:
1) 1st new T
2) 1st std::shared_ptr<T>(...)
3) 2nd new T
4) 2nd std::shared_ptr<T>(...)
or:
1) 2nd new T
2) 2nd std::shared_ptr<T>(...)
3) 1st new T
4) 1st std::shared_ptr<T>(...)
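The simplest way to sidestep the problem entirely, in any standard version, is to keep naked new out of the call expression:

// Nothing can leak here regardless of the evaluation order.
some_function(std::make_shared<T>(), std::make_shared<T>());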
Shared pointers can also be used with arrays by providing a custom deleter, like this:
auto p = std::shared_ptr<T>(new T[N], [](T* ptr) { delete [] ptr; });
Unfortunately you cannot use std::make_shared for that, and it is much easier to use arrays with std::unique_ptr, since std::shared_ptr does not provide an overloaded operator[]. But more about that in my next post 😉
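For comparison, a small sketch (reusing T, its foo() member, and a count N as in the first listing above) of how element access differs between the two:
auto up = std::make_unique<T[]>(N);                                // unique_ptr<T[]> provides operator[]
auto sp = std::shared_ptr<T>(new T[N], [](T* p) { delete [] p; }); // shared_ptr<T> does not
up[0].foo();
sp.get()[0].foo(); // have to go through the raw pointer instead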
Disadvantages of using std::make_shared:
std::make_shared can only construct instances of T using T’s public constructors. If you need to create an instance using a private or protected constructor, you have to do it from within a member or static member function of T. It is not possible to make std::make_shared a friend of T (technically you can declare it a friend, but the invocation of std::make_shared will then fail a static_assert inside the std::is_constructible type trait, so for all intents and purposes friendship does not help here).
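A common workaround, shown here as a sketch with a hypothetical Widget class, is to create the shared pointer from inside a static member function, at the cost of giving up the single-allocation optimization:
class Widget
{
public:
    static std::shared_ptr<Widget> create()
    {
        // std::make_shared<Widget>() would not compile here either, because the
        // actual construction happens inside the library, not inside Widget's scope
        return std::shared_ptr<Widget>(new Widget);
    }

private:
    Widget() = default; // private constructor
};

auto w = Widget::create();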
std::weak_ptr makes things even more complicated:
Finally, let’s take a look at when instances of objects held by std::shared_ptr are destroyed and their memory released. Under normal circumstances the destructor of T held by a shared pointer will be called when the last shared pointer is destroyed, and the memory where T lived will be released.
The memory release works differently if the shared pointer of T was constructed using std::make_shared and a std::weak_ptr was made to point at it.
In order to function properly std::weak_ptr must hold a reference to the shared pointer’s control block so that it can: 1) answer the call to use_count() and 2) return a nullptr when lock() is called on it after the last shared pointer went out of scope. If the shared pointer’s control block and the instance of T lived in the same memory block, that memory can not be freed until all shared and weak pointers referencing it go away. Now, the destructor of T will be called when the last shared pointer goes away, but the memory will linger until the remaining weak pointers are gone.
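A small sketch of that timeline (assuming a type T whose destructor you can observe, for example by printing from it):
auto sp = std::make_shared<T>(); // single memory block: control block + T
std::weak_ptr<T> wp = sp;        // wp references the same control block
sp.reset();                      // ~T() runs here, but the memory block cannot be
                                 // freed yet because wp still needs the control block
assert(wp.expired() && wp.lock() == nullptr);
wp.reset();                      // last reference to the control block is gone,
                                 // only now is the whole memory block released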
I didn’t mention anything about passing shared pointers as function arguments or return values… but that’s a topic about higher level design of object ownership and lifetime; maybe I’ll write about it after I cover unique pointers.
Avoiding deadlocks the C++ way
When interviewing engineers for a C++ programming position, the question of deadlocks often comes up (ask me how I know this 😉). What is it? And how do you avoid it?
Oftentimes a deadlock occurs due to a wrong order of acquiring locks: multiple threads need to access two or more shared resources, and each resource requires mutual exclusion. It goes something like this: thread 1 successfully acquires the lock for resource 1, then tries to acquire the lock for resource 2. Meanwhile, on another CPU core, thread 2 has successfully acquired the lock for resource 2 and now tries to acquire the lock for resource 1. Both threads are now stuck! Thread 1 holds resource 1 and waits for resource 2. Thread 2 holds resource 2 and waits for resource 1. Nasty business! A partial implementation illustrating this bug looks something like this:
mutex m1, m2;

thread t1([&]() {
    while(true)
    {
        m1.lock();
        DO_SOME_WORK("Thread 1");
        m2.lock();
        DO_SOME_WORK("Thread 1");
        m1.unlock();
        m2.unlock();
    }
});

thread t2([&]() {
    while(true)
    {
        m2.lock();
        DO_SOME_WORK("Thread 2");
        m1.lock();
        DO_SOME_WORK("Thread 2");
        m1.unlock();
        m2.unlock();
    }
});
Notice that thread t1 locks mutex m1 first and m2 second; thread t2 does the opposite. Another thing I would like to point out in the above example is the explicit calls to lock() and unlock(). This is dangerous because 1) you may forget to call unlock() on a mutex you previously locked, and 2) in the presence of exceptions emitted from DO_SOME_WORK(...) the locks you acquired will not be automatically released. A perfect solution to both issues already exists: the RAII technique.
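For a single mutex the standard RAII wrapper is std::lock_guard; a minimal sketch of how it addresses both problems:
{
    std::lock_guard guard(m1); // locks m1 here
    DO_SOME_WORK("Thread 1");  // even if this throws...
}                              // ...m1 is unlocked when guard's destructor runs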
The way to fix all that is wrong with the above code is to always lock the mutexes in the same order, and to have a mutex-owning local object handle the unlocking, whether the function exits normally or due to an exception. But the locking should not be done by carefully hand-writing the lock() calls in the right order; a more elegant, automatic solution is desired here. C++ has just the thing for you: std::lock (see here) and std::scoped_lock (and here). In short: std::lock will perform its deadlock-avoidance magic even if thread 1 calls std::lock(mutex1, mutex2); while thread 2 calls std::lock(mutex2, mutex1);, but you will still need to call unlock() explicitly on the mutexes afterwards, if that is what you desire. Alternatively (and preferably) you can pass the mutexes to std::scoped_lock, which uses std::lock internally to guarantee no deadlocks take place: std::scoped_lock guard(mutex1, mutex2);. A deadlock-free and exception-safe (in terms of properly unlocking the mutexes) partial implementation looks something like this:
mutex m1, m2;

thread t1([&]() {
    while(true)
    {
        scoped_lock guard(m1, m2);
        DO_SOME_WORK("Thread 1");
    }
});

thread t2([&]() {
    while(true)
    {
        scoped_lock guard(m2, m1);
        DO_SOME_WORK("Thread 2");
    }
});
The order in which the mutexes are passed to std::scoped_lock is irrelevant; internally std::lock will do the right thing. In the presence of exceptions (which I am not catching in the code above, but you should 😉) the destructors of the local guard objects will release the locks held.
Complete listing below (and on the web at GitHub: deadlock.cpp):
#include <iostream>
#include <mutex>
#include <thread>
#include <chrono>
#include <cstdlib>
#include <ctime>

using namespace std;
using namespace chrono;

void DO_SOME_WORK(const char* msg)
{
    {
        static mutex cout_lock;
        auto t = system_clock::to_time_t(system_clock::now());
        lock_guard guard(cout_lock);
        cout << msg << " @ " << ctime(&t);
    }
    this_thread::sleep_for(milliseconds(rand() % 1));
}

void BAD()
{
    mutex m1, m2;

    thread t1([&]() {
        while(true)
        {
            m1.lock();
            DO_SOME_WORK("Thread 1");
            m2.lock();
            DO_SOME_WORK("Thread 1");
            m1.unlock();
            m2.unlock();
        }
    });

    thread t2([&]() {
        while(true)
        {
            m2.lock();
            DO_SOME_WORK("Thread 2");
            m1.lock();
            DO_SOME_WORK("Thread 2");
            m1.unlock();
            m2.unlock();
        }
    });

    t1.join();
    t2.join();
}

void GOOD()
{
    mutex m1, m2;

    thread t1([&]() {
        while(true)
        {
            scoped_lock guard(m1, m2);
            DO_SOME_WORK("Thread 1");
        }
    });

    thread t2([&]() {
        while(true)
        {
            scoped_lock guard(m2, m1);
            DO_SOME_WORK("Thread 2");
        }
    });

    t1.join();
    t2.join();
}

int main()
{
    srand(time(NULL));

    //BAD();
    GOOD();
}
Multi-hashing
Yes, I totally invented this term 😛 What I mean by it is producing multiple hashes from a single key. Like this (if the syntax is unfamiliar to you, read this):
auto [h1, h2, h3] = hashNT<3>("key");
Or like this (the non-template version, which returns a vector):
auto hashes = hashN("key", 3);
Why? One place where I needed such sorcery was my bloom filter implementation. The idea is simple: one key, multiple hashes, repeatable (multiple calls with the same key produce the same hashes). But how? The STL only comes with one hashing function. True, but it also comes with multiple random number generators, which can be seeded with a hash!
The solution then is to hash once, seed the random number generator, and make multiple calls to the RNG, like this (hash.hpp):
#pragma once

#include <array>
#include <vector>
#include <random>
#include <algorithm>
#include <functional>

template<typename T>
auto hashN(const T& key, std::size_t N) -> std::vector<std::size_t>
{
    std::minstd_rand0 rng(std::hash<T>{}(key));
    std::vector<std::size_t> hashes(N);
    std::generate(std::begin(hashes), std::end(hashes), rng);
    return hashes;
}

template<std::size_t N, typename T>
auto hashNT(const T& key) -> std::array<std::size_t, N>
{
    std::minstd_rand0 rng(std::hash<T>{}(key));
    std::array<std::size_t, N> hashes{};
    std::generate(std::begin(hashes), std::end(hashes), rng);
    return hashes;
}
You can use it like this (multi_hash.cpp):
#include <iostream>
#include <string>
#include "multi_hash.hpp"

using namespace std;

void arr()
{
    string s1 = "Vorbrodt's C++ Blog";
    string s2 = "Vorbrodt's C++ Blog";
    string s3 = "https://vorbrodt.blog";

    auto h1 = hashN(s1, 3);
    auto h2 = hashN(s2, 3);
    auto h3 = hashN(s3, 3);

    cout << "HashN('" << s1 << "'):" << endl;
    for(auto it : h1) cout << it << endl;
    cout << endl;

    cout << "HashN('" << s2 << "'):" << endl;
    for(auto it : h2) cout << it << endl;
    cout << endl;

    cout << "HashN('" << s3 << "'):" << endl;
    for(auto it : h3) cout << it << endl;
    cout << endl;
}

void temp()
{
    string s1 = "Vorbrodt's C++ Blog";
    string s2 = "Vorbrodt's C++ Blog";
    string s3 = "https://vorbrodt.blog";

    auto [s1h1, s1h2, s1h3] = hashNT<3>(s1);
    auto [s2h1, s2h2, s2h3] = hashNT<3>(s2);
    auto [s3h1, s3h2, s3h3] = hashNT<3>(s3);

    cout << "HashNT('" << s1 << "'):" << endl;
    cout << s1h1 << endl << s1h2 << endl << s1h3 << endl << endl;

    cout << "HashNT('" << s2 << "'):" << endl;
    cout << s2h1 << endl << s2h2 << endl << s2h3 << endl << endl;

    cout << "HashNT('" << s3 << "'):" << endl;
    cout << s3h1 << endl << s3h2 << endl << s3h3 << endl << endl;
}

int main()
{
    arr();
    temp();
}
Program output.
HashN(‘Vorbrodt’s C++ Blog’):
1977331388
699200791
437177953
HashN(‘Vorbrodt’s C++ Blog’):
1977331388
699200791
437177953
HashN(‘https://vorbrodt.blog’):
1924360287
1619619789
1594567998
HashNT(‘Vorbrodt’s C++ Blog’):
1977331388
699200791
437177953
HashNT(‘Vorbrodt’s C++ Blog’):
1977331388
699200791
437177953
HashNT(‘https://vorbrodt.blog’):
1924360287
1619619789
1594567998