For a while now I was wondering how one could throttle network traffic, disk reads/writes etc. Google search quickly brought me to the Token Bucket Algorithm as well as few C++ implementations here and here. I was a bit puzzled at first by how the algorithm works when looking at the code; the two implementations I found operate on atomic integers but the algorithm operates on, well, time. After some head scratching and time at a whiteboard it made sense. Here’s how I understand it:

Imagine you’re walking in a straight line at a constant speed and you are dragging a peace of rope behind you. Every time you want to do something you first pull the rope toward you a little such that the length of it you’re dragging behind becomes shorter. You repeat it every time you want to do something (that something is what you’re trying to throttle btw). At the same time, if you choose to do nothing, you release little bit of rope such that what you’re dragging gets longer, up to the maximum length of the rope. Another way to think of it is that the rope if not yanked on unwinds at a constant rate up to its maximum length. If however you pull on the rope too much you will eventually bring it all in and now you will have to wait for it to unwind a little before you can pull it again.
Now imagine that instead of walking down a straight path you’re actually moving through time and it should all make sense now: pulling on the rope a little is like consuming a token; the length of the rope is the token bucket capacity, and the rate at which the rope unwinds up to its maximum length is the rate at which the token bucket refills if no tokens are consumed. You can also pull all of the rope in at once, and that’s the sudden burst the algorithm allows for after which the rate is limited to how fast it unwinds back behind you aka how quickly the bucket refills with tokens. I really hope that explanation makes sense to you!

Some comments about the implementations I found: both use 3 std::atomic variables where only one is actually needed (unless you want the ability to change bucket capacity and token rate reliably after constructing an instance in a multi-threaded environment, which my implementation supports); the code I linked to above only needs to keep the time variable atomic. Both also operate on integers and I felt it could be abstracted better using std::chrono. Finally, there’s no need for any atomics if only one thread is consuming tokens so I decided to create a separate class for such case (not shown below).

Complete source code:
token_bucket.hpp | throttle.cpp



Leave a Reply