AoS vs SoA performance

There is an interesting set of articles at Fluent{C++} discussing AoS vs SoA.
I decided to do my own performance test: to iterate and accumulate data over an array of 10,000,000 structures that describe a person:

struct Person
{
    string name;
    uint8_t age;
    uint32_t dob;
};

Versus a structure of arrays of the same data:

struct Persons
{
    vector<string> names;
    vector<uint8_t> ages;
    vector<uint32_t> dobs;
};

Below are the numbers measured on my 2012 MacBook Pro with a 2.3 GHz Intel Core i7. The code was compiled with maximum optimizations using the latest GCC, Apple Clang, and LLVM Clang compilers available for macOS:

GCC -Ofast -march=native -lstdc++
AoS duration 108.039 ms
SoA duration 42.228 ms


Apple Clang -Ofast -march=native -lc++
AoS duration 64.001 ms
SoA duration 24.916 ms


LLVM Clang -Ofast -march=native -lc++
AoS duration 67.579 ms
SoA duration 22.620 ms


Conclusion: locality of reference matters 🙂

Complete listing:

#include <iostream>
#include <iomanip>
#include <string>
#include <vector>
#include <numeric>
#include <chrono>
#include <cstdlib>

using namespace std;
using namespace chrono;

const int ELEMS = 10'000'000;

struct Person
{
    Person(const string& n, uint8_t a, uint32_t d)
    : name(n), age(a), dob(d) {}
    
    string name;
    uint8_t age;
    uint32_t dob;
};

using VP = vector<Person>;

void addPerson(VP& v, Person&& p) { v.push_back(move(p)); }

uint64_t averageNameLen(const VP& v)
{
    return accumulate(begin(v), end(v), (uint64_t)0,
        [](auto sum, auto& p) { return sum + p.name.length(); }) / v.size();
}

uint64_t averageAge(const VP& v)
{
    return accumulate(begin(v), end(v), (uint64_t)0,
        [](auto sum, auto& p) { return sum + p.age; }) / v.size();
}

uint64_t averageDob(const VP& v)
{
    return accumulate(begin(v), end(v), (uint64_t)0,
        [](auto sum, auto& p) { return sum + p.dob; }) / v.size();
}

struct Persons
{
    vector<string> names;
    vector<uint8_t> ages;
    vector<uint32_t> dobs;
    
    void addPerson(const string& n, uint8_t a, uint32_t d)
    {
        names.push_back(n);
        ages.push_back(a);
        dobs.push_back(d);
    }
    
    uint64_t averageNameLen() const
    {
        return accumulate(begin(names), end(names), (uint64_t)0,
            [](auto sum, auto& n) { return sum + n.length(); }) / names.size();
    }
    
    uint64_t averageAge() const
    {
        return accumulate(begin(ages), end(ages), (uint64_t)0) / ages.size();
    }
    
    uint64_t averageDob() const
    {
        return accumulate(begin(dobs), end(dobs), (uint64_t)0) / dobs.size();
    }
};

int main(int argc, char** argv)
{
    VP v1;
    v1.reserve(ELEMS);
    for(int i = 0; i < ELEMS; ++i)
        addPerson(v1, Person(string(string().capacity(), 'N'), i % 0xFF, i % 0xFFFF));
    
    auto start_time = high_resolution_clock::now();
    auto sum = averageNameLen(v1);
    sum += averageAge(v1);
    sum += averageDob(v1);
    auto end_time = high_resolution_clock::now();
    cout << fixed << setprecision(3);
    cout << "AoS duration " << duration_cast<microseconds>(end_time - start_time).count() / 1000.f << " ms" << endl;
    v1.clear();
    v1.shrink_to_fit();

    Persons p;
    p.names.reserve(ELEMS);
    p.ages.reserve(ELEMS);
    p.dobs.reserve(ELEMS);
    for(int i = 0; i < ELEMS; ++i)
        p.addPerson(string(string().capacity(), 'N'), rand() % 0xFF, rand() % 0xFFFF);
    
    start_time = high_resolution_clock::now();
    sum += p.averageNameLen();
    sum += p.averageAge();
    sum += p.averageDob();
    end_time = high_resolution_clock::now();
    cout << "SoA duration " << duration_cast<microseconds>(end_time - start_time).count() / 1000.f << " ms" << endl;
    
    return sum;
}


P.S. I had to do the sum += and return sum; hack. Otherwise the compilers kept optimizing away the averageXXX calls!

What happens when we “new”

Let us take a look at what the C++ compiler generates for us when we use the keywords new, new[], delete, and delete[] (I will skip over the std::nothrow versions in this post).

new

When you write this:

T* t1 = new T;

The compiler really generates this:

T* t2 = (T*)operator new(sizeof(T));
try
{
     new (t2) T;
}
catch(...)
{
     operator delete(t2);
     throw;
}

First, using operator new, the memory for T is allocated. If the allocation fails, it throws a std::bad_alloc exception. Next (line 4) the object of type T is constructed at the memory address t2 (this is called placement new). This, however, may fail due to T's constructor throwing an exception. The language guarantees that if that happens, the allocated memory will be freed. The code in line 6 will catch any exception emitted by T's constructor. Line 8 will then free the memory, and finally line 9 will re-throw the exception.

delete

This one-liner:

delete t1;

Becomes this:

t2->~T();
operator delete(t2);

In line 1 T's destructor is called, and in line 2 the memory is freed using operator delete. All is good until T's destructor throws an exception. Noticed the lack of a try-catch block? If T's destructor throws, line 2 will never be executed. This is a memory leak, and the reason why you should never throw exceptions from destructors; see Effective C++ by Scott Meyers, Item #8.

new[]

Things are about to get interesting. This innocent looking snippet of code:

#define HOW_MANY 3
T* t3 = new T[HOW_MANY];

Becomes (gasp), more or less 😉, this:

T* t4 = (T*)operator new(sizeof(size_t) + HOW_MANY * sizeof(T));
*((size_t*)t4) = HOW_MANY;
t4 = (T*)(((std::byte*)t4) + sizeof(size_t));
for(size_t i = 0; i < HOW_MANY; ++i)
{
    try
    {
        new (t4 + i) T;
    }
    catch(...)
    {
        for(size_t i2 = 0; i2 < i; ++i2)
            t4[i2].~T();
        t4 = (T*)(((std::byte*)t4) - sizeof(size_t));
        operator delete(t4);
        throw;
    }
}

So what happens when we allocate an array of objects on the heap? Where do we store the number of objects allocated, which needs to be referenced later when deleting the array? The compiler will generate code that allocates enough space for HOW_MANY instances of T, plus sizeof(size_t) bytes to store the number of objects created, and will place the object count at offset 0 of the allocated memory block. Moreover, the language guarantees that it will either allocate and construct ALL instances of T, or none. So what happens if halfway through constructing the objects in the array one of the constructors throws an exception? The compiler will generate proper cleanup code to deal with this scenario. This is what the above code illustrates. Let's go through it line by line.
Line 1 allocates enough space for HOW_MANY objects plus sizeof(size_t) bytes for the number of objects in the array.
Line 2 stores the number of objects at offset 0 of the memory block.
Line 3 advances the pointer t4 by sizeof(size_t) bytes to where the first T will be created.
Line 4 is the loop inside which we'll be creating the instances of T one by one.
Line 8 creates an instance of T at the right memory location inside the array using placement new (same as when constructing a single instance of T).
Line 10 catches any possible exception from T's constructor and begins the cleanup of the partially constructed array.
Line 12 defines the loop that will destroy all instances of T created up to the point of the exception being thrown.
Line 13 calls the destructor of T.
Line 14 moves the pointer t4 back by sizeof(size_t) bytes to the beginning of the memory block allocated for the array.
Line 15 frees the previously allocated memory block.
Line 16, after having done all the cleanup, re-throws the exception.

delete[]

The following array delete statement:

delete [] t3;

Is turned into something like this:

size_t how_many = *(size_t*)(((std::byte*)t4) - sizeof(size_t));
for(size_t i = 0; i < how_many; ++i)
    t4[i].~T();
t4 = (T*)(((std::byte*)t4) - sizeof(size_t));
operator delete(t4);

The compiler now has to generate code to first destroy all the instances of T stored in the array before deallocating the memory. After allocating the array of T's, the t4 pointer pointed at the first object in the array, not at the beginning of the memory block. That's why...
Line 1 subtracts sizeof(size_t) bytes from t4 and retrieves the number of objects in the array.
Line 2 starts the loop that will call every T's destructor.
Line 3 destroys each instance of T.
Line 4 moves the t4 pointer back by sizeof(size_t) bytes to point at the beginning of the memory block.
Line 5, after all T's have been destroyed, frees the memory block.
Due to the lack of a try-catch block, the same principle of not throwing exceptions from destructors applies here as well.

Complete listing:

#include <iostream>
#include <new>
#include <cstddef>

struct T
{
    T() { std::cout << "T() @ " << std::hex << std::showbase << this << std::endl; }
    ~T() { std::cout << "~T() @ " << std::hex << std::showbase << this << std::endl; }
};

int main(int argc, char** argv)
{
    // new
    // this:
    T* t1 = new T;
    // becomes this:
    T* t2 = (T*)operator new(sizeof(T));
    try
    {
        new (t2) T;
    }
    catch(...)
    {
        operator delete(t2);
        throw;
    }

    // delete
    // this:
    delete t1;
    // becomes this:
    t2->~T();
    operator delete(t2);

    // new []
    // this:
    #define HOW_MANY 3
    T* t3 = new T[HOW_MANY];
    // becomes this:
    T* t4 = (T*)operator new(sizeof(size_t) + HOW_MANY * sizeof(T));
    *((size_t*)t4) = HOW_MANY;
    t4 = (T*)(((std::byte*)t4) + sizeof(size_t));
    for(size_t i = 0; i < HOW_MANY; ++i)
    {
        try
        {
            new (t4 + i) T;
        }
        catch(...)
        {
            for(size_t i2 = 0; i2 < i; ++i2)
                t4[i2].~T();
            t4 = (T*)(((std::byte*)t4) - sizeof(size_t));
            operator delete(t4);
            throw;
        }
    }

    // delete []
    // this:
    delete [] t3;
    // becomes:
    size_t how_many = *(size_t*)(((std::byte*)t4) - sizeof(size_t));
    for(size_t i = 0; i < how_many; ++i)
        t4[i].~T();
    t4 = (T*)(((std::byte*)t4) - sizeof(size_t));
    operator delete(t4);

    return 1;
}

enum to string, take 2

Thanks to Sebastian Mestre (who commented on my previous post) I learned something new today 🙂 The X macro makes the enum to string code much… well, perhaps not cleaner, but shorter 😉 It also solves the problem of having to update the code in 2 places: 1st in the enum itself, 2nd in the enum to string map.
Without further ado I present to you the X macro. The complete implementation goes something like this:

#include <iostream>

#define MY_ENUM \
    X(V1) \
    X(V2) \
    X(V3)

#define X(name) name,

enum MyEnum
{
    MY_ENUM
};

#undef X

constexpr const char* MyEnumToString(MyEnum e) noexcept
{
    #define X(name) case(name): return #name;
    switch(e)
    {
        MY_ENUM
    }
    #undef X
}

int main(int argc, char** argv)
{
    std::cout << MyEnumToString(V1) << std::endl;
    std::cout << MyEnumToString(V2) << std::endl;
    std::cout << MyEnumToString(V3) << std::endl;
    return 1;
}

In this version we only have to update the MY_ENUM macro. The rest is taken care of by the preprocessor.


UPDATE

Here's the same approach that works with strongly typed enums:

#include <iostream>

#define MY_ENUM \
X(V1) \
X(V2) \
X(V3)

#define X(name) name,

#define MY_ENUM_NAME MyEnum

enum class MY_ENUM_NAME : char
{
    MY_ENUM
};

#undef X

constexpr const char* MyEnumToString(MyEnum e) noexcept
{
#define X(name) case(MY_ENUM_NAME::name): return #name;
    switch(e)
    {
            MY_ENUM
    }
#undef X
}

int main(int argc, char** argv)
{
    std::cout << MyEnumToString(MyEnum::V1) << std::endl;
    std::cout << MyEnumToString(MyEnum::V2) << std::endl;
    std::cout << MyEnumToString(MyEnum::V3) << std::endl;
    return 1;
}


UPDATE 2

Here's the same approach that works with strongly typed enums, custom enum values, and enum values outside of the defined ones:

#include <iostream>

#define MY_ENUM \
X(V1, -1) \
X(V2, -3) \
X(V3, -5)

#define X(name, value) name = value,

#define MY_ENUM_NAME MyEnum
#define MY_ENUM_TYPE int

enum class MY_ENUM_NAME : MY_ENUM_TYPE
{
	MY_ENUM
};

#undef X

constexpr auto MyEnumToString(MY_ENUM_NAME e) noexcept
{
#define X(name, value) case(MY_ENUM_NAME::name): return #name;
	switch(e)
	{
		MY_ENUM
	}
#undef X
	return "UNKNOWN";
}

int main(int argc, char** argv)
{
	std::cout << "value = " << (MY_ENUM_TYPE)MyEnum::V1 << ", name = " << MyEnumToString(MY_ENUM_NAME::V1) << std::endl;
	std::cout << "value = " << (MY_ENUM_TYPE)MyEnum::V2 << ", name = " << MyEnumToString(MY_ENUM_NAME::V2) << std::endl;
	std::cout << "value = " << (MY_ENUM_TYPE)MyEnum::V3 << ", name = " << MyEnumToString(MY_ENUM_NAME::V3) << std::endl;

	MY_ENUM_NAME unknown{-42};
	std::cout << "value = " << (MY_ENUM_TYPE)unknown << ", name = " << MyEnumToString(unknown) << std::endl;

	return 1;
}

enum to string

A fellow engineer at work asked me today how to convert an enum to a string value. This is what I came up with:

#include <iostream>
#include <map>
#include <cassert>

enum class MyEnum : int
{
    V1,
    V2,
    V3,
    SIZE
};

const char* MyEnumToString(MyEnum e)
{
    using MapType = std::map<MyEnum, const char*>;
    static const MapType kMap =
    {
        { MyEnum::V1, "V1" },
        { MyEnum::V2, "V2" },
        { MyEnum::V3, "V3" },
    };
    assert(kMap.size() == (size_t)MyEnum::SIZE);
    return kMap.at(e);
}

int main(int argc, char** argv)
{
    std::cout << MyEnumToString(MyEnum::V1) << std::endl;
    std::cout << MyEnumToString(MyEnum::V2) << std::endl;
    std::cout << MyEnumToString(MyEnum::V3) << std::endl;
    return 1;
}

The MyEnumToString function has a static map of MyEnum to const char*. The assert checks that MyEnum::SIZE is equal to the size of the map, so you don't forget to update the map when you add a new entry to MyEnum 😉

For larger enums, would unordered_map be a better choice for the mapping data structure in terms of performance? Probably.


UPDATE

Several people here and on Reddit suggested that the use of std::map or even std::unordered_map is overkill. I agree. It is a must, though, for non-contiguous enums (ones that do not start at 0 or do not increment by 1). Below is an updated MyEnumToString function which offers much faster conversion. It uses std::array to hold the string values.

const char* MyEnumToString(MyEnum e)
{
    using MapType = std::array<const char*, (size_t)MyEnum::SIZE>;
    static const MapType kMap =
    {
        "V1",
        "V2",
        "V3"
    };
    static_assert(kMap.size() == (size_t)MyEnum::SIZE);
    return kMap.at((MapType::size_type)e);
}

shrink_to_fit that doesn’t suck

Lesson learned the hard way: trust the implementation to do the right thing! Everything else has been struck through because my approach was flawed from the beginning. See the quoted comments at the bottom of this post.

Thanks everyone who commented! Looks like this blogging thing will be a great learning exercise for me 🙂

From https://en.cppreference.com/w/cpp/container/vector/shrink_to_fit

It is a non-binding request to reduce capacity() to size(). It depends on the implementation whether the request is fulfilled.

In other words, it can do absolutely nothing! Now what?
Let’s test programmatically what the implementation of STL is actually doing, and in case it does nothing let’s shrink the vector another way.

The test:

bool shrink_to_fit_test()
{
    std::vector<std::byte> test_vector(1);
    test_vector.reserve(2);
    test_vector.shrink_to_fit();
    return test_vector.capacity() == 1;
}

Self-explanatory I hope 🙂 We create a vector of 1 std::byte, reserve space for 2 entries, then call shrink_to_fit. Finally we test the capacity; if it's now 1 instead of 2, the call worked. Great! But what if it failed to shrink the vector?

shrink_to_fit that works:

template<typename T>
void shrink_to_fit(std::vector<T>& v)
{
    static bool shrinks = shrink_to_fit_test();
    if(shrinks)
        v.shrink_to_fit();
    else
        std::vector<T>(v).swap(v);
}

We first test (once, thanks to static) if shrink_to_fit works; if it does, we invoke it. Otherwise we fall back on the temporary-and-swap trick. This works because the temporary vector created from v will have capacity equal to v's size. See here: https://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Shrink-to-fit

Test of our solution:

int main(int argc, const char * argv[])
{
    std::vector<std::byte> v(5);
    v.reserve(10);
    std::cout << v.capacity() << std::endl;
    shrink_to_fit(v);
    std::cout << v.capacity() << std::endl;
    return 0;
}

Output of the program from my Xcode console:

10
5
Program ended with exit code: 0

Complete listing:

#include <iostream>
#include <vector>
#include <cstddef>

bool shrink_to_fit_test()
{
    std::vector<std::byte> test_vector(1);
    test_vector.reserve(2);
    test_vector.shrink_to_fit();
    return test_vector.capacity() == 1;
}

template<typename T>
void shrink_to_fit(std::vector<T>& v)
{
    static bool shrinks = shrink_to_fit_test();
    if(shrinks)
        v.shrink_to_fit();
    else
        std::vector<T>(v).swap(v);
}

int main(int argc, const char * argv[])
{
    std::vector<std::byte> v(5);
    v.reserve(10);
    std::cout << v.capacity() << std::endl;
    shrink_to_fit(v);
    std::cout << v.capacity() << std::endl;
    return 0;
}


UPDATE!

I shared my blog on Reddit CPP as well as comp.lang.c++ and received many comments about my

shrink_to_fit
approach:

Ouch 😉 Very constructive critique of my implementation. Thank you sakarri and Joerg! So I decided to rewrite the initial shrink_to_fit function, here it is:

template<typename T>
void shrink_to_fit(std::vector<T>& v)
{
    v.shrink_to_fit();
    if(v.capacity() != v.size())
        std::vector<T>(v).swap(v);
}

In this approach we skip the shrink_to_fit_test and go straight to the vector's call. If it doesn't do what we want, we do the temporary-and-swap trick.

Here's the complete listing again:

#include <iostream>
#include <vector>
#include <cstddef>

template<typename T>
void shrink_to_fit(std::vector<T>& v)
{
    v.shrink_to_fit();
    if(v.capacity() != v.size())
        std::vector<T>(v).swap(v);
}

int main(int argc, const char * argv[])
{
    std::vector<std::byte> v(5);
    v.reserve(10);
    std::cout << v.capacity() << std::endl;
    shrink_to_fit(v);
    std::cout << v.capacity() << std::endl;
    return 0;
}


Below are some of the comments I received:

Just figured I'd comment on your post about how shrink_to_fit sucks. It's your solution that is significantly inferior in every possible way, whereas the standard's approach is optimal. That may seem blunt but it's worth understanding why.
First, your shrink_to_fit_test is too pessimistic, since it assumes that an implementation either always performs the shrink or never performs it, when actually the situation is a lot more complex than that. The whole purpose of the standard leaving it up to the implementation is that for small vectors, forcing a shrink_to_fit potentially wastes a lot more memory than not doing anything at all. The reason is twofold: first, because the default allocator doesn't allocate memory in chunks of less than 16 bytes to begin with. In other words, a vector of 10 bytes takes up just as much memory as a vector of 1 byte, so forcing a shrink_to_fit in that situation doesn't save you anything. Furthermore, your implementation actually ends up potentially causing you to waste 4 kilobytes of RAM, because your solution forces a new memory allocation, and given how virtual memory works, that allocation can result in having to fetch a new page; so you've now wasted 4 kilobytes of physical RAM for nothing.
In situations where something like this seems fishy, it's best to review the literature on the issue to understand why the committee made the decision they did. Sometimes they do screw up, but that's not the initial conclusion you should jump to. Take a look at how GCC implements shrink_to_fit: it delegates the operation to the allocator, so the allocator can decide whether to carry it out, since the allocator knows more about the memory system than vector does. It also implements the operation in a way that can avoid any reallocation, period, and it does its best to provide the strong exception guarantee.
The standard didn't make this part of the STL non-binding so that the people who work incredibly hard to implement it could be lazy; the standard made it non-binding so that if there are potential optimizations that save more memory and consume fewer CPU cycles by avoiding it, the implementation would be free to leverage them. This is explicitly noted in the standard as follows:
Note: The request is non-binding to allow latitude for implementation-specific optimizations. —end note
Basically there is no situation where you would want to use your implementation of shrink_to_fit over the one provided by the standard library.

In particular, the January 27 posting "shrink_to_fit that doesn't suck" still provides only solutions (to a non-existing problem) that do suck...