Token Bucket: or how to throttle

For a while now I had been wondering how one could throttle network traffic, disk reads/writes, etc. A Google search quickly brought me to the Token Bucket algorithm as well as a few C++ implementations here and here. I was a bit puzzled at first by how the algorithm works when looking at the code; the two implementations I found operate on atomic integers, but the algorithm operates on, well, time. After some head scratching and time at a whiteboard it made sense. Here’s how I understand it:

Imagine you’re walking in a straight line at a constant speed and you are dragging a piece of rope behind you. Every time you want to do something you first pull the rope toward you a little, such that the length of it you’re dragging behind becomes shorter. You repeat this every time you want to do something (that something is what you’re trying to throttle, btw). At the same time, if you choose to do nothing, you release a little bit of rope so that what you’re dragging gets longer, up to the maximum length of the rope. Another way to think of it is that the rope, if not pulled on, unwinds at a constant rate up to its maximum length. If however you pull on the rope too much you will eventually bring it all in, and now you will have to wait for it to unwind a little before you can pull it again.
Now imagine that instead of walking down a straight path you’re actually moving through time, and it should all make sense: pulling on the rope a little is like consuming a token; the length of the rope is the token bucket capacity; and the rate at which the rope unwinds up to its maximum length is the rate at which the token bucket refills if no tokens are consumed. You can also pull all of the rope in at once, and that’s the sudden burst the algorithm allows for, after which the rate is limited to how fast the rope unwinds back behind you, aka how quickly the bucket refills with tokens. I really hope that explanation makes sense to you!

Some comments about the implementations I found: both use three std::atomic variables where only one is actually needed (unless you want the ability to reliably change bucket capacity and token rate after constructing an instance in a multi-threaded environment, which my implementation supports); the implementations I linked to above would only need to keep the time variable atomic. Both also operate on integers, and I felt this could be abstracted better using std::chrono. Finally, there’s no need for any atomics at all if only one thread is consuming tokens, so I decided to create a separate class for that case (not shown below).
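To make the rope analogy concrete, here is a minimal single-threaded sketch of the algorithm (the class name and interface are mine, not the linked implementations’), using std::chrono and a single time variable as described above:

```cpp
#include <chrono>
#include <cstddef>

// Single-threaded token bucket sketch: the lone "time" member plays the
// role of the rope; consuming tokens pulls it forward, idling lets the
// passage of real time unwind it back, up to the bucket's capacity.
class token_bucket
{
    using clock = std::chrono::steady_clock;

    std::chrono::nanoseconds time_per_token; // time it takes to refill 1 token
    std::chrono::nanoseconds time_per_burst; // time to refill the whole bucket
    clock::time_point time;                  // the "rope": consumed capacity

public:
    token_bucket(std::size_t tokens_per_second, std::size_t bucket_capacity)
    : time_per_token(std::chrono::nanoseconds(std::chrono::seconds(1)) / tokens_per_second),
      time_per_burst(time_per_token * bucket_capacity),
      time(clock::now() - time_per_burst) // start with a full bucket
    {}

    bool try_consume(std::size_t tokens = 1)
    {
        auto const now = clock::now();
        // The rope never unwinds past its full length (bucket capacity)...
        auto const min_time = now - time_per_burst;
        if (time < min_time) time = min_time;
        // ...and consuming tokens pulls it in; too far means we must wait.
        auto const new_time = time + time_per_token * tokens;
        if (new_time > now) return false; // bucket empty: request denied
        time = new_time;
        return true;
    }
};
```

A thread-safe version would store `time` as a `std::atomic` and update it with a compare-exchange loop, which is exactly why only that one variable needs to be atomic.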

Complete source code:
token_bucket.hpp | throttle.cpp



inline – not what it used to be

UPDATE:
Thank you Peter Sommerlad for pointing out that the non-violation of ODR was and remains the primary purpose of inline, not the optimization hint. I’m updating the post to reflect this.

Today I want to talk about the C++ keyword inline: what it used to mean prior to C++17 and what it means today. Let’s start with the history.

A function declared inline could be defined in a header file (its body present in the .h file), then included in multiple compilation units (.cpp files), and the linker would not complain about seeing multiple definitions of the same symbol. This was a way of stating that the ODR was not being violated by the programmer. Without inline one had to provide the signature of a function in a header file and its implementation in a source file. Alternatively, an inline function could be defined multiple times across multiple source files and everything would be hunky-dory as long as the definitions were identical; otherwise… undefined behavior.
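For example, a function defined directly in a header links cleanly from many translation units only when marked inline (a hypothetical header of mine, not from the post):

```cpp
// util.h
#pragma once

// Defined right in the header and included by many .cpp files: without
// 'inline' the linker would see multiple definitions of the same symbol.
inline int twice(int v) { return v * 2; }
```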

inline also used to serve, for standalone and member functions (class methods declared and defined inside the body of a class or struct were implicitly inline), as a hint to the compiler to inline the function call: instead of emitting assembly that would push parameters onto the stack and jump to the function’s address, the compiler would emit the compiled function body in place, skipping the jump and the stack pushes/pops. This allowed for faster running code, sometimes at the cost of executable size (if the same function’s assembly was emitted in many places across the executable).
A good example of a potential performance gain is a tight loop making calls/jumps to a function; the call overhead in each iteration could have a significant impact on performance, and inline helped mitigate that.

I mentioned earlier that inline was a hint, meaning that declaring a function as inline did not guarantee it would be assembled in place; compilers had the ultimate say in the matter and were free to ignore inline each and every time. The workaround to this powerlessness over the mighty compiler was to #define the function as a macro instead. Preprocessor macros are evaluated and replaced with actual code prior to compilation, effectively guaranteeing that every call to the function (macro) is replaced with its body in the source code.
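The pair the next paragraph compares looked something like this (reconstructed; the exact bodies are my guess):

```cpp
// The compiler may honor or ignore the inline hint on Add...
inline int Add(int x, int y) { return x + y; }

// ...but the preprocessor always pastes MUL's body in place of the call.
#define MUL(x, y) ((x) * (y))
```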

The compiler could refuse to inline Add but it had no choice but to compile MUL in place. Note the parentheses around x and y in the macro; they are there in case x and y are complex expressions that need to be fully evaluated before the final multiplication takes place. Without the parentheses this macro call would be very problematic: MUL(1 + 2, 3 + 4); would expand to 1 + 2 * 3 + 4, which, due to operator precedence, is clearly not what’s expected at the time of the macro call.

Enter the grand inline unification!

Since C++17 the multiple-definitions meaning applies equally to both functions and variables (while inline also remains an optimization hint for functions).

If we wanted to have a global variable shared across multiple compilation units (.cpp files) we had to first declare an extern variable in a header file:

extern int x; // Inside .h file

Then define it (and provide storage for it) in a source file:

int x = 98; // Inside .cpp file

A header-only workaround prior to C++17 was to use the Meyers singleton approach:
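That workaround looked roughly like this (my reconstruction): an inline function with a static local variable, returning a reference to the single shared instance:

```cpp
// Inside .h file: every .cpp file calling x() gets a reference to the
// same int, because the static local is initialized exactly once.
inline int& x()
{
    static int value = 98;
    return value;
}
```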

Starting with C++17 the same can be accomplished by simply declaring and defining the variable as inline in a header file:

inline int x = 17; // Inside .h file only

Now the header file can be included by many source files and the linker will intelligently, despite seeing multiple symbols, pick only one and disregard all others, guaranteeing the same variable at the same memory location is accessed or modified regardless of which compilation unit it happens in.

The same holds true for static member variables of a class or struct. In the past we had to do the following in a header file:
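The header portion would look like this (reconstructed):

```cpp
// Inside .h file
struct S
{
    static int x; // declaration only; storage must be provided in one .cpp file
};
```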

And inside a source file:

int S::x = 98; // Inside .cpp file

C++17 requires only a header file to achieve the same result:
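With C++17 the declaration and the definition collapse into the header (reconstructed):

```cpp
// Inside .h file only
struct S
{
    static inline int x = 98; // definition and storage, no .cpp file needed
};
```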

Worth noting: constexpr functions are implicitly inline, as are constexpr static data members (since C++17), and function templates can likewise be defined in multiple translation units without violating the ODR. You can read more about all the gory details here and here.

My Best C++11, 14 and 17 Features

From the 37th San Diego C++ Meetup:

Recently I was hosted twice on the CppCast podcast with Rob and Jason. I mentioned how using C++17 makes it easy to develop Modern C++ code. I will go over various features that have proven to work well. We built software with no memory issues, no UB, etc. The name of the talk is: “My best C++11,14 and 17 features that helped me write a better code”. (Beginners to Intermediate)

Yacob (Kobi) Cohen-Arazi

36th San Diego C++ Meetup

As always, good times at the SDCPPM (also on Twitter)! Learned something new last night and met great people! Thanks again for organizing this, Kobi!

For those new to this Meetup: we meet once a month for a few hours to discuss C++ features, techniques, questions asked on the net, and more. Sometimes we have guest speakers give presentations on various topics (Conan Package Manager, C++20 Concepts to name a few). Join us! Join to listen in, chat and exchange your experience coding in C++, or be a speaker and teach us something cool! All backgrounds and experience levels welcome!

Agenda

  1. Welcome slides – goals we would like to achieve in San Diego C++ Meetup
  2. C++ quiz (Beginners to Intermediate)
  3. Supporting creators – introducing Patreon (All levels)
  4. Book of the Month – Practical C++ Design (2nd edition), From Programming to Architecture by Adam B. Singer (All levels)
  5. Stackoverflow discussion – Is there a way to pass auto as an argument in C++? (Beginners to Intermediate)
  6. r/cpp_questions – One function – multiple datatypes? (Beginners)
  7. Discuss a few slides from the walletfox website (Beginners to Intermediate)
  8. My best C++11,14 and 17 features that helped me write a better code (Beginners to Intermediate)

32nd San Diego C++ Meetup

Agenda:

1. Welcome slides – goals we would like to achieve in San Diego C++ Meetup
2. C++ quiz
3. Do you know what your std::remove(_if) does?
4. Stackoverflow questions and discussions
4a. How does the C++ compiler evaluate recursive constexpr functions so quickly?
4b. How to check if int32_t number can fit in int8_t and int16_t?
5. Quora question discussion – Arrays, references and pointers
6. Book – Large-Scale C++ Volume I: Process and Architecture by John Lakos
7. Discuss a few slides from the walletfox website
8. Talk – My Quest for a Quick QuickSort

Event link on Meetup.


= delete; // not just for special member functions

During the 29th San Diego C++ Meetup fellow C++ enthusiast and contributor to my blog, Kobi, brought up something interesting about deleted functions and I wanted to share this little gem with you…

To my surprise, the = delete; postfix can be applied not only to special member functions like constructors or assignment operators, but to any free-standing or member function!

Why would you want such sorcery, you ask? Imagine you have a function like this:

void foo(void*) {}

On its own foo can be called with a pointer to any type without the need for an explicit cast:

foo(new int); // LEGAL C++

If we want to disable the implicit pointer conversions we can do so by deleting all other overloads:

template<typename T> void foo(T*) = delete;

Or the short and sweet C++20 syntax:

void foo(auto*) = delete;

To cast a wider net and delete all overloads regardless of the type and number of parameters:

template<typename ...Ts> void foo(Ts&&...) = delete;

Kobi found this on stack overflow of course 🙂


Example program:
delete.cpp


C++20 Concepts

C++20 introduced the Concepts library and the corresponding language extensions to template metaprogramming. This post will be a brief introduction to the topic for people already well versed in C++ templates.

What is a concept? Besides being a new keyword in C++20, it is a mechanism to describe constraints or requirements on a typename T; it is a way of restricting which types a template class or function can work with. Imagine a simple template function that adds two numbers:

template<typename T> auto add(T a, T b) { return a + b; }

The way it is implemented doesn’t stop us from calling it with std::string as the parameters’ type. With concepts we can now restrict this function template to work only with integral types for example.

But first let’s define the two most basic concepts: one which will accept, or evaluate to true for, all types, and another which will reject, or evaluate to false for, all types:

template<typename T> concept always_true = true;
template<typename T> concept always_false = false;

Using those concepts we can now define two template functions: one which will accept, i.e. compile with, any type as its parameter, and one which will reject, i.e. fail to compile with, any parameter type:

template<typename T> requires always_true<T> void good(T) {} // ALWAYS compiles
template<typename T> requires always_false<T> void bad(T) {} // NEVER compiles

Let’s now rewrite the function that adds two numbers using a standard concept std::integral found in the <concepts> header file:

template<typename T> requires std::integral<T> auto add(T a, T b) { return a + b; }

Now this template function will only work with integral types. But that’s not all! There are two other ways C++20 allows us to express the same definition. We can replace typename with the name of the concept and drop the requires keyword:

template<std::integral T> auto add(T a, T b) { return a + b; }

Or go with the C++20 abbreviated function template syntax where auto is used as a function’s parameter type together with the (optional) name of the concept we wish to use:

auto add(std::integral auto a, std::integral auto b) { return a + b; }

I don’t know about you but I really like this short new syntax!

Concepts can be easily combined. Imagine we have two concepts we wish to combine into a third one. Here’s a simple example of how to do it:

template<typename T> concept concept_1 = true;
template<typename T> concept concept_2 = false;
template<typename T> concept concept_3 = concept_1<T> and concept_2<T>;

Alternatively, a function or class template can be declared to require multiple concepts (which requires additional parentheses):

template<typename T> requires(concept_1<T> and concept_2<T>) void foo(T) {}

What follows the requires keyword must be an expression that evaluates to either true or false at compile time, so we are not limited to just concepts, for example:

template<typename T> requires(std::integral<T> and sizeof(T) >= 4) void foo(T) {}

The above function has been restricted to working only with integral types that are at least 4 bytes (32 bits on typical platforms) wide.

Let’s look at a more complex example and analyze it line by line:

In line #1 we define a concept called can_add and declare an optional variable a of type T. You may be wondering why the requires keyword appears multiple times. What follows requires and sits within the curly braces {} is referred to as a compound requirement. Compound requirements can contain other requirements within them, each terminated with a semicolon ;. If an expression inside is not prefixed by the requires keyword it must merely be valid C++ code. What follows directly after a nested requires, however, must also evaluate to true at compile time. Therefore line #3 means that std::integral<T> must evaluate to true; if we removed requires from line #3 it would only mean that std::integral<T> is valid C++ code, without it being evaluated further. Similarly, line #4 tells us that sizeof(T) must be greater than or equal to sizeof(int); without the requires keyword it would only check whether sizeof(T) >= sizeof(int) is a valid C++ expression. Line #5 means several things: a + a must be a valid expression, a + a must not throw any exceptions, and the result of a + a must be of type T (or a type implicitly convertible to T). Notice that a + a is surrounded by curly braces which must contain exactly one expression, without a trailing semicolon ;.

We can apply the can_add concept to a template function as follows:

template<typename T> requires can_add<T> T add(T x, T y) noexcept { return x + y; }

Template function add can now only be invoked with types that satisfy the can_add concept.

So far I have limited the examples to standalone template functions, but C++20 concepts can be applied to template classes, template member functions, and even variable templates.

Here’s an example of a template struct S with a template member function void func(U); the struct can only be instantiated with integral types and the member function can only be called with floating point types as the parameter:

See the source code below for more examples.


Example program:
concepts.cpp


C++ Lambda Story, a book review

C++ Lambda Story
Everything you need to know about Lambda Expressions in Modern C++!
by Bartłomiej Filipek


I thought I knew all (or most) there was to know about lambda expressions in modern C++ until I befriended a fellow coder (and countryman), Bartłomiej Filipek, found his book, and, after reading it, realized there were gaps in my knowledge and understanding of lambdas, one of the major features of modern C++.

I don’t want to give away too much of the book’s content so I will make this review brief. Bartłomiej (Bartek) takes the reader through the entire history of callable objects in C++ starting with C++98 and function objects (or functors, objects which implement the function call operator), explains the usefulness as well as limitations of functors, then introduces us to lambda expressions as they appeared in C++11.

In the first and longest chapter of the book Bartek builds the foundation of knowledge by going over, in great detail, the syntax, types of lambdas, capture modes, return types, conversion to function pointers, IIFE (Immediately Invoked Function Expression), lambdas in containers, and even inheriting from a lambda!
This gives the reader a complete overview of the feature as it was first introduced in C++11, and prepares him/her for the following chapters, where Bartek covers the lambda’s evolution through C++14, C++17, and all the way to the most recent iteration of the language, C++20.

I found the book to be well organized into small bite-sized nuggets of knowledge that could be read, reread, understood and absorbed in a matter of minutes (similar to how Scott Meyers presents information in Effective C++).

C++ Lambda Story is not a book for people who are just starting to learn C++. In other words, it is not a C++ book one should read first. The content within its 140 or so pages covers the intermediate and advanced levels of what the language has to offer. But it is a must for a seasoned C++ programmer, or anyone who already has a good grasp of the language.

You can find the book in black and white print on Amazon, a full color version, or a digital version on LeanPub.

To find out more about the author visit his website C++ Stories.

How to synchronize data access, part 2

Yesterday I wrote about How to synchronize data access; you should read it before continuing with this post, as it explains in detail the technique I will expand upon here. I’ll wait…

Alright! Now that you understand how a temporary RAII object can lock a mutex associated with an instance of an object, effectively synchronizing across multiple threads every method call, let’s talk about reader/writer locks.
R/W locks allow for two levels of synchronization: shared and exclusive. A shared lock allows multiple threads to simultaneously access an object under the assumption that said threads will perform only read operations (or, to be more exact, operations which do not change the externally observable state of the object). An exclusive lock, on the other hand, can be held by only one thread, at which point that thread is allowed to modify the object’s state and any data associated with it.

In order to implement this functionality I had to create two types of locker objects (the RAII holders of the mutex). Both lockers hold a reference to a std::shared_mutex, but one uses std::shared_lock while the other uses std::unique_lock to acquire ownership of the mutex. This approach is still transparent to the user, with the following exception: a non-const instance of the shared_synchronized<T> class must use std::as_const in order to acquire shared ownership ( const shared_synchronized<T>, shared_synchronized<const T>, and const shared_synchronized<const T> will always acquire a shared lock; note that std::as_const performs a const_cast).


Implementation:
synchronized.hpp | synchronized.cpp


How to synchronize data access

I came across a post titled C++ Locking Wrapper shared by Meeting C++ on Twitter and it reminded me of Synchronized Data Structures and boost::synchronized_value so I decided to implement my own version as a learning exercise and possible topic of a future video on my YT channel.

The idea is simple: synchronize across multiple threads every method call to an instance of an object; do it in the most transparent and unobtrusive way possible. Here’s a simple example illustrating the idea:

The trick comes down to two things really: 1) Wrap a value of type T and its lock (in my case std::mutex) in a containing class and 2) Overload operator->() to return a temporary RAII object responsible for guarding the value.
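A minimal sketch of those two pieces (names are mine, not the published implementation):

```cpp
#include <mutex>
#include <utility>
#include <vector> // for the usage example below

template<typename T>
class synchronized
{
    // 1) the value and its lock live together
    T value;
    std::mutex mtx;

    // the RAII temporary: owns the mutex for the duration of one call
    struct locker
    {
        T* ptr;
        std::unique_lock<std::mutex> lock;
        T* operator->() { return ptr; }
    };

public:
    template<typename... Args>
    explicit synchronized(Args&&... args)
    : value(std::forward<Args>(args)...) {}

    // 2) operator-> returns the temporary; C++ chains the arrows until a raw
    // pointer is reached, so obj->method() locks, calls, then unlocks when
    // the temporary dies at the end of the full expression
    locker operator->() { return { &value, std::unique_lock(mtx) }; }
};
```

For example, with `synchronized<std::vector<int>> v;` the call `v->push_back(1);` acquires the mutex, pushes, and releases the mutex, all in one statement.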

I will add that this is perhaps a heavy-handed approach to guarding data since I do not know how to make it transparent while allowing reader/writer locks, or synchronization of only some select methods. Perhaps some type system hackery with const and volatile methods could help here…


Implementation:
synchronized.hpp | synchronized.cpp


How to implement a memory pool

Last week I explained what a memory pool is and how to use it. This week my lesson was about implementing a simple memory pool capable of fixed-size memory chunk allocations. The pool preallocates a block of memory on demand, carves out equal-sized chunks from it, and returns a raw pointer each time you call pool.malloc().
Memory chunks are described using a simple struct chunk_t and stored in a singly-linked list; the struct has an internal anonymous union for size optimization: the same memory region can either represent the next pointer of the list’s node or hold user data.
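The node layout can be sketched like this (the chunk size constant is a made-up example, not the post’s value):

```cpp
#include <cstddef>

inline constexpr std::size_t chunk_size = 64; // example size

struct chunk_t
{
    union // the same bytes serve both roles, so a node costs no extra memory
    {
        chunk_t* next;                   // free-list link while in the pool
        std::byte user_data[chunk_size]; // payload once handed to the caller
    };
};

static_assert(sizeof(chunk_t) == chunk_size); // holds while chunk_size >= sizeof(void*)
```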

P.S. Watch in HD, though I did increase the font size per your requests 🙂


Code from the class:
lesson_memory_pool_how_to.cpp
Cleaned up implementation:
memory_pool.hpp | memory_pool.cpp


How to use memory pools

In this week’s class I explained what memory pools are and how to use them to optimize frequent allocations and deallocations of objects. I also demonstrated how to overload operator new / operator new[] and operator delete / operator delete[] for a struct / class type.
The code for this class contains a simple benchmark which measures the pool’s performance against the general-purpose allocator in the C++ runtime. Notice how the multi-threaded benchmark causes the pool to perform poorly (I suspect due to lock contention, though I have not yet investigated this). I didn’t even bother measuring array allocations; the memory pool benchmark causes massive fragmentation of the free chunks (by allocating all available chunks, shuffling them randomly, then freeing them all), making contiguous allocations virtually impossible without falling back on the runtime allocator.
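Hooking a type into a pool looks roughly like this (pool_malloc / pool_free are stand-ins for a real pool’s interface; here they just forward to the runtime allocator):

```cpp
#include <cstdlib>
#include <new>

// Stand-ins for a real pool's interface.
void* pool_malloc(std::size_t size) { return std::malloc(size); }
void  pool_free(void* p)            { std::free(p); }

struct widget
{
    int id = 0;

    // Every new/delete of a widget now routes through the pool.
    static void* operator new(std::size_t size)      { return pool_malloc(size); }
    static void  operator delete(void* p) noexcept   { pool_free(p); }
    static void* operator new[](std::size_t size)    { return pool_malloc(size); }
    static void  operator delete[](void* p) noexcept { pool_free(p); }
};
```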

Next week I plan on actually implementing a simple memory pool then again measure its performance against the runtime. Enjoy the video!

P.S. Remember to pick the HD version of the video. It is also best watched in full-screen mode due to the font size used. I will try to remember to increase it during my next class. Apologies to anyone finding it hard to read the code.


Code from the class:
lesson_pool_allocator.cpp | lesson_pool_allocator_benchmark.cpp


How to detect memory leaks

During last week’s class I discussed a source-code method of finding memory leaks in C++. What do I mean by that? A source-code method is not a way of debugging a running process, like one could with Valgrind on Linux or Deleaker on Windows (which I’ve used recently and really enjoyed its integration with Visual Studio). What I am referring to is a combination of #define macro magic and overloading the global operator new: a technique to keep track of all your new / new[] and delete / delete[] statements and find A) leaks due to missing delete statements and B) mismatches between the array and non-array versions of said operators.
Simply by including a header file in your source code you enable this tracing capability. It has a limitation though: unlike Deleaker or Valgrind it only works for the plain vanilla new / new[] operators; my solution does not support any other forms.
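The core of the technique fits in a few lines: replace the global allocation functions and keep a running count (a bare-bones sketch; the real newtrace.hpp also records file and line via the macro trick):

```cpp
#include <cstdlib>
#include <new>

// Net number of live allocations; nonzero at program exit means a leak.
static int live_allocations = 0;

void* operator new(std::size_t size)
{
    ++live_allocations;
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc{};
}

void operator delete(void* p) noexcept
{
    if (p) --live_allocations;
    std::free(p);
}

// Sized form forwards to the unsized one so counts stay consistent.
void operator delete(void* p, std::size_t) noexcept { operator delete(p); }
```

A full tracker would also replace the [] forms and store each pointer’s origin so mismatched delete calls can be flagged.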


Code from the class:
newtrace.hpp (C++17 or earlier) | newtrace.cpp | README.txt
Using boost::stacktrace:
newtrace.st.hpp | newtrace.st.cpp