#pragma

There are two useful #pragma directives I like to use in my code: one lets the preprocessor know that you want to include a header file only once, and another deals with structure packing.

Instead of using the header include guards, which are ugly as sin, use #pragma once at the beginning of your header files, like this:

For structure packing, use the #pragma pack directive. It tells the compiler about the default field alignment. On Clang 8.0.0 the sizeof of this structure is 32 bytes:

We can pack it down to 27 bytes by using this directive (it tells the compiler to align all member fields on a one-byte boundary; this is useful when designing efficient network protocols or data serialization):

You can also show, at compile time, what the current packing alignment is with #pragma pack(show). Current alignment can be pushed onto a stack, then reverted back, with #pragma pack(push, 1) followed by #pragma pack(pop).
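To make the push/pop behavior concrete, here is a minimal sketch. The struct names and fields are my own assumptions and are not the structure from the listing, so the sizes differ from the 32 and 27 bytes quoted above. The unpacked size depends on the platform; the packed size is simply the sum of the member sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Default alignment: the compiler may insert padding after 'a' and 'c'.
struct Unpacked
{
    std::uint8_t  a;
    std::uint32_t b;
    std::uint8_t  c;
};

#pragma pack(push, 1) // save the current packing, then pack to 1 byte
struct Packed
{
    std::uint8_t  a;
    std::uint32_t b;
    std::uint8_t  c;
};
#pragma pack(pop)     // restore the previous packing

// Declared after the pop, so the default alignment applies again.
struct AfterPop
{
    std::uint8_t  a;
    std::uint32_t b;
    std::uint8_t  c;
};

static_assert(sizeof(Packed) == 6, "1 + 4 + 1 bytes, no padding");
static_assert(sizeof(AfterPop) == sizeof(Unpacked), "pop restored the default");
```

Wrapping only the structures that need it in push/pop keeps the packing from leaking into every header included afterwards.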

Complete listing (pragma.cpp):

32, 27

Program output.

Exception safe assignment

Longer title: exception safe assignment operator of resource owning objects. Uff. Because the object owns a resource, how do we write an exception safe assignment operator which has to free the old resource and allocate the new one? By exception safe I don’t mean that it will never throw; that’s not possible. Instead, I mean safe in the sense that it either succeeds OR, in case of an exception, the state of the assigned-to object is exactly as it was prior to the assignment. Like this:

If assignment operator s1 = s2 throws an exception, we want the state of s1 and s2 to be as it was in line #3.

The trick is twofold: 1) a copy constructor is needed, and 2) a noexcept swap function. Like this:

Here the copy constructor allocates the new resource first, then copies its content; the swap function just swaps pointers to the resources, which is always a noexcept operation. Having implemented a copy constructor and swap function we can now implement every assignment operator to have a strong exception guarantee like this:

Here’s how it works: we first make a temporary copy, which does the resource allocation. At this stage exceptions can be thrown, but we have not yet modified the assigned-to object. Only after the resource allocation succeeds do we perform the noexcept swap. The destructor of the temporary object will take care of cleaning up the previously owned resource (that’s RAII at its best).
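The copy-then-swap steps described above can be sketched as follows. The Buffer class and its members are hypothetical names chosen for illustration; this is not the S class from the listing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical resource-owning type: owns a heap-allocated int array.
class Buffer
{
public:
    explicit Buffer(std::size_t size, int fill = 0)
    : m_size(size), m_data(new int[size])
    {
        std::fill(m_data, m_data + m_size, fill);
    }

    // 1) Copy constructor: allocate first (may throw), then copy the content.
    Buffer(const Buffer& other)
    : m_size(other.m_size), m_data(new int[other.m_size])
    {
        std::copy(other.m_data, other.m_data + m_size, m_data);
    }

    ~Buffer() { delete[] m_data; }

    // 2) Swap only exchanges a size and a pointer, so it cannot throw.
    void swap(Buffer& other) noexcept
    {
        std::swap(m_size, other.m_size);
        std::swap(m_data, other.m_data);
    }

    // Strong guarantee: if the copy throws, *this has not been touched.
    Buffer& operator=(const Buffer& other)
    {
        Buffer temp(other); // may throw; *this is still intact
        swap(temp);         // noexcept; *this now owns the new resource
        return *this;       // temp's destructor frees the old resource
    }

    std::size_t size() const noexcept { return m_size; }
    int operator[](std::size_t i) const { return m_data[i]; }

private:
    std::size_t m_size;
    int* m_data;
};
```

A nice side effect of this shape is that self-assignment needs no special check: the object is copied, swapped with its copy, and the copy is destroyed.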

Complete listing (assignment.cpp):

S()
S()
operator = (const S&)
S(const S&)
~S()
~S()
~S()

Program output.

Hashing the C++ way

Modern C++ brought us the std::hash template (read more about it here). In short: it’s a stateless function object that implements operator() which takes an instance of a type as a parameter and returns its hash as size_t. It has specializations for all primitive types as well as some library types. You can also specialize it yourself for your own data types (don’t forget to put your specialization in namespace std). Let’s see how it works by hashing some ints, chars, floats, pointers, strings, and our own custom data type. Pay close attention to the hash values of ints and chars…
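As a rough sketch of what a specialization for your own type can look like: the Link struct, the hash_of helper, and the way the two member hashes are mixed are all my assumptions here, not necessarily what hash.cpp does.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical custom type used only for this illustration.
struct Link
{
    std::string title;
    std::string url;
};

namespace std
{
    // Specialization of std::hash for our own type; it lives in namespace std.
    template<>
    struct hash<Link>
    {
        std::size_t operator()(const Link& link) const noexcept
        {
            const std::size_t h1 = std::hash<std::string>{}(link.title);
            const std::size_t h2 = std::hash<std::string>{}(link.url);
            // One common way to mix two hashes (same shape as boost::hash_combine).
            return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
        }
    };
}

// Convenience wrapper: hash any type that has a std::hash specialization.
template<typename T>
std::size_t hash_of(const T& value)
{
    return std::hash<T>{}(value);
}
```

The actual values are implementation specific; the only thing you can rely on is that equal inputs produce equal hashes within one run of the program.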

hash.cpp:

Hash of ‘1’: 1
Hash of ‘2’: 2
Hash of ‘3’: 3


Hash of ‘A’: 65
Hash of ‘B’: 66
Hash of ‘C’: 67


Hash of ‘1.1’: 1066192077
Hash of ‘1.2’: 1067030938
Hash of ‘1.3’: 1067869799


Hash of ‘0x7f95fdd000a0’: 6424303057458324486
Hash of ‘0x7f95fdd000a1’: 6736290418105006831
Hash of ‘0x7f95fdd000a2’: 13890240933949840298


Hash of ‘Vorbrodt’s C++ Blog’: 435643587581864924
Hash of ‘Vorbrodt’s C++ Blog’: 435643587581864924
Hash of ‘https://vorbrodt.blog’: 13293888041758778516


Hash of ‘Vorbrodt’s C++ Blog,https://vorbrodt.blog’: 8570762348687434484
Hash of ‘Vorbrodt’s C++ Blog,https://vorbrodt.blog’: 8570762348687434484
Hash of ‘https://vorbrodt.blog,Vorbrodt’s C++ Blog’: 13000220508453909292

Data alignment the C++ way

Before modern C++ the only way to align variables or structures on a given byte boundary was to inject padding; to align a struct to 16 bytes you had to do this:

Not any more! Modern C++ introduced a keyword just for that: alignas (read more about it here). Now you can specify a struct’s alignment like this:
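Here is a small sketch contrasting the two styles. The struct names mirror the ones in the program output further down, but their members and the is_aligned helper are my assumptions. Note that manual padding only rounds up the size, while alignas also changes the alignment requirement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Old style: pad by hand so that sizeof comes out as 16 bytes.
struct Old
{
    int  x;
    char padding[16 - sizeof(int)];
};

// alignas style: the compiler handles alignment and tail padding.
struct alignas(16) New
{
    int x;
};

struct Empty {};
struct alignas(64) Empty64 {};

static_assert(alignof(New) == 16, "alignment requested with alignas");
static_assert(sizeof(New) == 16, "size is rounded up to the alignment");
static_assert(sizeof(Empty64) == 64, "even an empty struct fills its alignment");

// Returns true if the pointer sits on the given byte boundary.
inline bool is_aligned(const void* p, std::size_t boundary)
{
    return reinterpret_cast<std::uintptr_t>(p) % boundary == 0;
}
```

The same keyword works on individual variables, for example alignas(16) int x; which is how locals can be spaced apart.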

This can be of great help when dealing with constructive or destructive interference of L1 cache lines. You can also space local variables apart, as well as struct/class members. Here’s a complete example (alignas.cpp):

sizeof(Old): 16
sizeof(New): 16
Address of ‘x’      : 0x7ffee4a448c0
Address of ‘y’      : 0x7ffee4a448d0
Address of ‘z’      : 0x7ffee4a448e0
Distance ‘x’ to ‘y’ : 16
Distance ‘y’ to ‘z’ : 16
sizeof(Empty)  : 1
sizeof(Empty64): 64
sizeof(Full): 64

Program output.

Simple file I/O

I was playing around with file I/O the C++ way and decided to create a file hashing program using ifstream and the Botan crypto library. The program reads an entire file specified as the command line argument and takes the SHA-1 hash of the content. It’s amazing what you can accomplish with well designed frameworks in very little code. Here’s the program (file_hash.cpp):
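The file reading half of such a program needs only the standard library. Below is a minimal sketch of it; the read_file name is my own, and the hashing step is left out on purpose because it depends on Botan’s API, which I don’t want to guess at here.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Reads the whole file in binary mode and returns its bytes.
std::vector<char> read_file(const std::string& path)
{
    std::ifstream ifs(path, std::ios::binary);
    if (!ifs)
        throw std::runtime_error("cannot open " + path);

    // Construct the vector from the stream's begin and end iterators.
    return std::vector<char>(
        std::istreambuf_iterator<char>(ifs),
        std::istreambuf_iterator<char>());
}
```

The returned buffer is what you would then hand to the hash function of your crypto library of choice.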

Better bloom filter

Based on this implementation, it supports multiple hashes for a lower false positive rate. It can be initialized with the size in bits and the number of hashes to perform, like this: bloom_filter bloom(128, 5);
As always, complete implementation on GitHub: bloom.hpp.
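For a feel of what a multi-hash filter looks like, here is a minimal sketch with the same constructor shape. It is not the bloom.hpp from GitHub: the add and contains names, the restriction to strings, and the double hashing trick used to derive the k hashes are all my assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Sketch of a k-hash bloom filter for strings; 'bits' must be greater than 0.
class bloom_filter
{
public:
    bloom_filter(std::size_t bits, std::size_t hashes)
    : m_bits(bits, false), m_hashes(hashes) {}

    void add(const std::string& item)
    {
        for (std::size_t i = 0; i < m_hashes; ++i)
            m_bits[index(item, i)] = true;
    }

    // false: definitely not in the set; true: maybe in the set.
    bool contains(const std::string& item) const
    {
        for (std::size_t i = 0; i < m_hashes; ++i)
            if (!m_bits[index(item, i)])
                return false;
        return true;
    }

private:
    // Assumption: derive the i-th hash by double hashing, h1 + i * h2.
    std::size_t index(const std::string& item, std::size_t i) const
    {
        const std::size_t h1 = std::hash<std::string>{}(item);
        const std::size_t h2 = std::hash<std::string>{}(item + '#') | 1;
        return (h1 + i * h2) % m_bits.size();
    }

    std::vector<bool> m_bits;
    std::size_t m_hashes;
};
```

Every added item sets up to k bits, and a lookup reports "maybe" only if all k of them are set, which is why more hashes (with enough bits) cut down on false positives.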

Bloom Filters

From Wikipedia:

A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set.

In other words, given a set of elements, a bloom filter can tell you that: A) a given element is definitely not in the set, or B) a given element is maybe in the set. It can give you a false positive: it can say that an element is in the set when it in fact is not. But it will never give you a false negative: it will never say that an element is not in the set when in fact it is.

In my past life I used bloom filters to check whether or not I should perform an expensive database index search 🙂

In the following example I construct a bloom filter given a set of strings set1, then verify that each element of set1 is in the set according to the bloom filter. Finally I try a different set of elements set2, and test what the bloom filter says about those elements. Given a big enough bloom filter I get 100% correct answers (that none of the elements in set2 are present). Here’s the code (bloom.cpp):

Contains “Martin” : 1
Contains “Vorbrodt” : 1
Contains “C++” : 1
Contains “Blog” : 1
Contains “Not” : 0
Contains “In” : 0
Contains “The” : 0
Contains “Set” : 0

Program output.

My implementation of the bloom filter is primitive: it uses only one hashing function, which increases false positives, but it is very short and clean and can be built upon. Maybe at some point I’ll write a full-blown implementation, though there are plenty of examples online. I want this post to be an introduction to the topic.

In any case, here’s my implementation of the bloom filter (bloom.hpp):

C++ Attributes

C++11 introduced standard attributes: a way to mark fragments of code with useful information for the developer or optimization information for the compiler. See a complete list of standard attributes here, Clang attributes here, and Microsoft attributes here. I will go over a few of them in this post.

  • [[nodiscard]] – when specified with a function declaration, tells the compiler to emit a warning if the function’s return value is ignored; when specified with a struct, emits a warning wherever the struct is returned and ignored.
  • [[fallthrough]] – suppresses the compiler warning about a case in a switch statement without a break; in other words, a case that falls through into another case.
  • [[no_unique_address]] – tells the compiler it may apply the equivalent of the empty base optimization to the marked data member, so an empty member need not take up space.
  • [[deprecated]] – emits a warning when the marked function, struct, namespace, or variable is used.
  • [[noreturn]] – tells the compiler that the marked function never returns; compilers typically emit a warning if it can.
  • [[maybe_unused]] – suppresses a warning when the marked variable, function, or argument is not used.
  • [[likely]] – tells the compiler this is the most likely path of execution; allows for optimizations.

Note: not all are supported on every compiler; I tested on LLVM 8.0.0 and GCC 8.2. Luckily the unsupported ones do not cause a compile error 🙂
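Here is a short sketch of a few of these in action. I limit it to attributes available in C++17, and the function names are made up for the example; it is not the attributes.cpp listing.

```cpp
#include <cassert>
#include <stdexcept>

// The compiler should warn if a caller ignores this return value.
[[nodiscard]] inline int answer() { return 42; }

// Using this function should produce a deprecation warning.
[[deprecated("use answer() instead")]] inline int old_answer() { return 42; }

// This function never returns to its caller; it always throws.
[[noreturn]] inline void fail(const char* why) { throw std::runtime_error(why); }

// 'verbose' is intentionally unused; no unused-parameter warning.
inline int classify(int n, [[maybe_unused]] bool verbose)
{
    int score = 0;
    switch (n)
    {
        case 0:
            score += 10;
            [[fallthrough]]; // deliberate: case 0 also gets case 1's bonus
        case 1:
            score += 1;
            break;
        default:
            fail("unsupported value");
    }
    return score;
}
```

Writing answer(); on a line by itself, or calling old_answer() anywhere, is what should trigger the corresponding warnings.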

Below is example code and a screenshot of compiler messages.

attributes.cpp:


Compiler warnings; LLVM 8.0.0.

Initialization list exceptions and raw pointers

What to do when an exception is thrown on the initialization list when allocating memory for a raw pointer? The situation is easy if your class only has one raw pointer member, but it gets complicated with two or more. Here’s a code example that’s guaranteed to leak memory if the second new int throws an exception (because the destructor will not be called):

There is no way to free the memory allocated to p1 if p2(new int) throws! Let’s build on my previous example and see what happens if we use a function try-catch block on the constructor:

Still no good, because accessing p1 and p2 in the catch block leads to undefined behavior. See here.

The only way to guarantee correct behavior is to use smart pointers. This works because 1) the initialization list allocates in pre-defined order (the order of member declaration) and 2) the destructors of already created members will be called. Here’s the correct way of allocating multiple pointers:

This is guaranteed to do proper cleanup if the second make_unique<int>() throws std::bad_alloc 🙂
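To watch the cleanup happen, here is a sketch that swaps the plain int for a hypothetical Tracked type which counts live instances and can be told to throw from its constructor. Those names, and the fail_second switch, are assumptions made for the demonstration.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical resource that counts live instances and can be told to fail.
struct Tracked
{
    static int live;

    explicit Tracked(bool fail)
    {
        if (fail) throw std::runtime_error("allocation failed");
        ++live;
    }
    ~Tracked() { --live; }
};

int Tracked::live = 0;

struct Good
{
    // Members are initialized in declaration order: p1 first, then p2.
    explicit Good(bool fail_second)
    : p1(std::make_unique<Tracked>(false)),
      p2(std::make_unique<Tracked>(fail_second))
    {}

    std::unique_ptr<Tracked> p1;
    std::unique_ptr<Tracked> p2;
};
```

When the second make_unique throws, p1 is already a fully constructed member, so its destructor runs and the first Tracked is released; the live counter ends up back at zero.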

Complete listing (bad_pointer.cpp):

Function try-catch blocks

Syntactic sugar or a useful feature? More than just sweet sweet sugar baby! This little known feature is a nice way of wrapping an entire function in a try-catch block. So instead of writing this:

You can write this:

The meaning of the two functions is identical. Notice here I’m swallowing the exception instead of propagating it out. I could use throw; to re-throw it, or in both cases I could throw a different exception inside the catch block.

The caveat is with function try-catch blocks around constructors: they have to re-throw the same or different exception. If you don’t re-throw explicitly the compiler will do it for you. It is also useful for catching exceptions emitted during the initialization of member variables, and throwing something else (or re-throwing the same). Like this:
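A compact sketch of both forms follows. The names echo the ones in the program output below, but the bodies and messages are my own guesses rather than the try_block.cpp listing.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Function try block on an ordinary function: the exception is swallowed.
void eat_it() try
{
    throw std::runtime_error("error from eat_it()");
}
catch (const std::exception&)
{
    // Reaching the end of this handler returns normally to the caller.
}

struct P
{
    P() { throw std::logic_error("error from P::P()"); }
};

struct Q
{
    // Function try block on a constructor: it also covers member initializers.
    Q() try : m_p()
    {
    }
    catch (const std::logic_error& e)
    {
        // A constructor's handler cannot swallow; translate to another exception.
        throw std::runtime_error(std::string("Q::Q() failed: ") + e.what());
    }

    P m_p;
};
```

Calling eat_it() returns quietly, while constructing a Q surfaces a std::runtime_error in the caller even though the member originally threw a std::logic_error.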

Complete listing (try_block.cpp):

Swallowing: System error from eat_it()
Swallowing: System error from eat_it_sugar()
Inside Q::Q() caught: Logic error from P::P()
Inside main() caught: Runtime error from Q::Q()

Program output.

The #1 rule of cryptography

The #1 rule of cryptography: Don’t invent your own!

OK wiseman, now what? You want to add crypto to your program but you don’t want to code it all yourself. I’ll show you three libraries that make it possible. The choice will be yours as to which one to use.

For this example I wanted to write a simple function that accepts a std::string message and returns a hex encoded SHA-1 hash. I picked the following libraries: Crypto++, WolfSSL, and Botan. All three made it pretty easy, and I don’t want to get into the business of picking winners and losers, but… Botan made it a breeze and I think it will be my choice going forward 🙂

crypto.cpp:

Message: Vorbrodt’s C++ Blog @ https://vorbrodt.blog
Digest : 24BCAC1359AA8B773D38D6A05B22BB43DAB5B8E5

Message: Vorbrodt’s C++ Blog @ https://vorbrodt.blog
Digest : 24BCAC1359AA8B773D38D6A05B22BB43DAB5B8E5

Message: Vorbrodt’s C++ Blog @ https://vorbrodt.blog
Digest : 24BCAC1359AA8B773D38D6A05B22BB43DAB5B8E5

Program output.

{fmt}

I found this cool little text formatting library with a very clean interface and wanted to share it with you. I decided the best way to introduce it to you is not through an extensive tutorial but rather code which illustrates how to use it; so I wrote a program which does the same thing in twelve different ways using this library… plus a few extra examples of text coloring, formatting, and alignment. Take a look at the code and the program output and it will all make sense.

fmt.cpp:

The answer is 42
The answer is 42
The answer is 42
The answer is 42
The answer is 42
The answer is 42
The answer is 42
The answer is 42
The answer is 42.00
The answer is 42.00
The answer is 42.00
The answer is 42.00
The text is bold
The color is red and green
The date and time is 2019-03-31 09:03:45
left aligned——————
—————–right aligned
———–centered———–

Program output.
Linux screenshot.

SSO of std::string

What is short/small string optimization? It’s a way to squeeze some bytes into a std::string object without actually allocating them on the heap. It’s hackery involving C++ unions and clever space management. Say sizeof(std::string) is, oh I don’t know, 24 bytes on Mac’s LLVM? The implementation manages to squeeze 22 characters into that (not including the terminating null character) before having to allocate on the heap. Impressive. Less impressive is GCC’s implementation on Linux, with sizeof(std::string) being 32 bytes but only 15 characters can be stored before going to the heap. I used to have this number for Visual Studio’s implementation but… see the rant above 😛 The capacity of an empty string is the giveaway for how much you can fit in it before going to the heap 😉

Check it out yourself on your favorite compiler with the code below!
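If you want to go one step further than looking at the capacity, the sketch below checks whether a string’s buffer physically lives inside the std::string object. The stored_inline helper is my own invention, and it relies on the (common, but not mandated by the standard) layout where the small buffer is part of the object.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Returns true if the string's character buffer lives inside the std::string
// object itself (small string optimization) rather than somewhere else.
inline bool stored_inline(const std::string& s)
{
    const char* object_begin = reinterpret_cast<const char*>(&s);
    const char* object_end   = object_begin + sizeof(std::string);
    const char* data         = s.data();

    // std::less gives a well-defined order even for unrelated pointers.
    std::less<const char*> before;
    return !before(data, object_begin) && before(data, object_end);
}
```

Build one string with exactly std::string().capacity() characters and another with one character more, then pass both to stored_inline to see where the switch to the heap happens.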

sso.cpp:

sizeof  : 24
Capacity: 22
Small   : 22
Big     : 31

Program output (LLVM on Mac).

HTTP queries

Today I want to show you how to use cURLpp (C++ wrapper around libcURL) to make a simple HTTP query to ip-api.com in order to retrieve geolocation information of a given host or IP address. I chose cURLpp because it’s simple and easy to use; the example program would not have been any harder using libcURL C API but this is a C++ blog after all 🙂 I will be using the Boost Property Tree library to deserialize the JSON geo-ip data. All of that is achieved in 15, give or take, lines of actual code… that’s the power of simple and well designed C++ libraries!

The program starts off by setting up a RAII object of type curlpp::Cleanup which initializes and cleans up the cURLpp library. We then create a request object of type curlpp::Easy and an output std::stringstream where the received data will be placed. Next we set up some options like verbosity level, URL, port, the output stream, and we execute the query. Finally we parse the JSON data using read_json and iterate over the ptree structure to print it to the console.

geoip.cpp:

*   Trying 69.195.146.130…
* TCP_NODELAY set
* Connected to ip-api.com (69.195.146.130) port 80 (#0)
> GET /json/vorbrodt.blog HTTP/1.1
Host: ip-api.com
Accept: */*

< HTTP/1.1 200 OK
< Access-Control-Allow-Origin: *
< Content-Type: application/json; charset=utf-8
< Date: Fri, 29 Mar 2019 23:17:53 GMT
< Content-Length: 284
< 
* Connection #0 to host ip-api.com left intact


as = AS46606 Unified Layer
city = Provo
country = United States
countryCode = US
isp = Unified Layer
lat = 40.2067
lon = -111.643
org = Unified Layer
query = 162.241.253.105
region = UT
regionName = Utah
status = success
timezone = America/Denver
zip = 84606

Program output.

C-style callbacks and lambda functions

You can use a non-capturing lambda function with C-style APIs that expect a function pointer. As long as the signatures of the callback and the lambda match, the lambda will be cast to a function pointer (or you could define a “positive lambda”, one with a + in front of it; this causes automatic conversion to a function pointer). This works because the compiler converts non-capturing lambdas to actual functions and stores them inside the compiled binary. Effectively a pointer to locally defined lambda is valid for the life of the program.

In the program below I define a callback with the following signature: typedef void(*FuncPtr)(int arg) and two C-style functions that use it: void set_callback(FuncPtr fp) and void fire_callback(int arg). I then call set_callback with a positive lambda. The program works 🙂
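A minimal sketch of that setup is shown here. The FuncPtr, set_callback, and fire_callback signatures are the ones named above; the globals and the install_lambda_callback helper are assumptions added so the effect of the callback can be observed.

```cpp
#include <cassert>

// C-style callback API, following the signatures described above.
typedef void (*FuncPtr)(int arg);

static FuncPtr g_callback = nullptr;

void set_callback(FuncPtr fp) { g_callback = fp; }
void fire_callback(int arg) { if (g_callback) g_callback(arg); }

// Hypothetical global the callback writes to; a non-capturing lambda
// cannot capture locals, but it may use globals.
static int g_last_value = 0;

void install_lambda_callback()
{
    // The unary + forces conversion of the lambda to a plain function pointer.
    set_callback(+[](int arg) { g_last_value = arg; });
}
```

After install_lambda_callback() returns, the pointer stored by set_callback stays valid, so a later fire_callback(42) still reaches the lambda’s body.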

c_api_lambda.cpp:

42

Program output.

Propagate exceptions across threads

What if you need to catch an exception in a worker thread and re-throw it in the main thread that’s waiting for the worker to finish? std::future works this way. If you spawn a future on a new thread using std::async(std::launch::async, ...); and that future’s worker throws an exception, when you later call get() on the future it will emit that exception.

You do it by wrapping the worker thread’s function in try { /* CODE */ } catch(...) {} and capturing the current exception pointer ( std::exception_ptr) using std::current_exception. You can then re-throw the captured exception using the pointer and std::rethrow_exception. Below is an example that illustrates this technique. Just remember, if you have multiple worker threads make sure to have multiple std::exception_ptr instances; one per worker thread.
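The capture and re-throw steps can be sketched like this for a single worker. The run_worker_and_rethrow function and the error message are made up for the example.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <thread>

// Runs a worker thread that throws, then re-throws in the calling thread.
void run_worker_and_rethrow()
{
    std::exception_ptr error; // one exception_ptr per worker thread

    std::thread worker([&error]
    {
        try
        {
            throw std::runtime_error("error from the worker thread");
        }
        catch (...)
        {
            error = std::current_exception(); // capture, do not let it escape
        }
    });

    worker.join(); // after join() it is safe to read 'error'

    if (error)
        std::rethrow_exception(error);
}
```

The catch(...) in the worker matters: an exception that escapes a thread’s function ends in std::terminate, so the pointer has to be captured before the thread function returns.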

exceptions.cpp:

Thread 0x1048d75c0 caught exception from thread 0x700001cf3000

Program output.

int main()

I have been spanked by a certain commenter (who shall not remain anonymous 😉 ) on here and FB about my style of naming unused main arguments and unnecessary return 1; at the end of every main function.

I have thought about it and I… concede the point of his argument 🙂 From now on the style on this blog shall be as follows (if arguments to main are not needed):

P.S. The C++ standard allows for two valid signatures: int main() and int main(int argc, char** argv), see here.

XML-RPC

XML-RPC is yet another method of implementing remote procedure calls. It uses XML over HTTP to transmit data. In my past life working at TLO I used the XML-RPC-C library to implement communication between cluster nodes and a cluster management system. I thought the library was well designed and easy to use so I wanted to introduce you to it.

Below is a simple client and server implementation using the XML-RPC-C library. The server implements one RPC that accepts one string parameter and returns one string. The client makes the call to the server saying hello and prints the reply. The code is easy to read and does not need any further explanation 🙂

The client. xmlrpc_c.cpp:

The server. xmlrpc_s.cpp:

Base64 encoding

Base64 encoding: turning binary data into ASCII text for the purpose of saving it to text files like XML, or transmitting it over protocols like HTTP, or embedding it into web page files, and many other purposes. That’s the basic idea behind it. For every 3 bytes of input you get 4 bytes of output, so it’s not crazy inflated.
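To show where the 3-in, 4-out ratio comes from, here is a bare-bones encoder sketch using the standard alphabet with = padding. It is written from scratch for illustration and is not the code from base64.hpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Encodes the bytes of 'input' as base64 with '=' padding.
std::string base64_encode(const std::string& input)
{
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    std::string output;
    output.reserve((input.size() + 2) / 3 * 4);

    for (std::size_t i = 0; i < input.size(); i += 3)
    {
        const std::size_t remaining = input.size() - i;

        // Pack up to 3 input bytes into a 24-bit group.
        std::uint32_t group = static_cast<unsigned char>(input[i]) << 16;
        if (remaining > 1) group |= static_cast<unsigned char>(input[i + 1]) << 8;
        if (remaining > 2) group |= static_cast<unsigned char>(input[i + 2]);

        // Emit 4 output characters, 6 bits each; pad short groups with '='.
        output += table[(group >> 18) & 0x3F];
        output += table[(group >> 12) & 0x3F];
        output += remaining > 1 ? table[(group >> 6) & 0x3F] : '=';
        output += remaining > 2 ? table[group & 0x3F] : '=';
    }
    return output;
}
```

Each group of 24 input bits is cut into four 6-bit indexes into the 64 character table, which is the whole trick; decoding just reverses the lookup.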

I was looking for a library that has built in, easy to use and clean base64 encoding and decoding functions but didn’t really find anything to my liking. So I looked for a reference implementation and found one at wikibooks.org. Their C++ implementation (released to public domain so I could freely use and modify it) was my starting point. I beautified the code and brought it closer to modern C++ 🙂 So now you have a header only, clean base64 encode and decode functions you can use in your projects: base64.hpp.

I wrote a program that encodes and decodes an input string, checks the original against the decoded one, and also checks the encoded base64 text against reference base64 string taken from wiki. The implementation checks out and produces correct encoded strings and decoded data 🙂 Below is the test program.

base64.cpp:

Input: Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure.

Reference:

TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlzIHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2YgdGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGludWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRoZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=

Encoded:

TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlzIHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2YgdGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGludWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRoZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=

Encoded data matches reference :o)

Decoded: Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure.

Decoded data matches original :o)

Program output.

And here is the encoder and decoder function implementation.

base64.hpp:

Extremely Fast Compression Algorithm

LZ4. GitHub repository here. It is open-source, available on pretty much every platform, and widely used in the industry.

It was extremely easy to get started with it. The C API could not possibly be any simpler (I’m looking at you zlib 😛 ); you pass in 4 parameters to the compression and decompression functions: input buffer, input length, output buffer, and max output length. They return either the number of bytes produced on the output, or an error code. Just be careful when compressing random data (which you should not be doing anyways!): the output can be larger than the input!

Here’s a short example that compresses a vector of a thousand characters:

compression.cpp:

LZ4 compress, bytes in: 1000, bytes out: 14
LZ4 decompress, bytes in: 14, bytes out: 1000
Decompressed data matches original :o)

Program output.