The C++17 standard introduced execution policies for the standard algorithms; they allow parallel and SIMD optimizations. I wanted to see how much faster the Parallel STL can be on my quad-core system, but none of my compilers currently supports it. Luckily, Intel has implemented it and made it available to the world 🙂
On a side note, this post’s example uses two libraries: TBB, which is needed to compile the Parallel STL, and Catch2, which I use to create the benchmark. Both are freely available on GitHub. BTW, thanks to Benjamin from Thoughts on Coding for pointing me toward the Catch2 library. It’s great for creating unit tests and benchmarks.
Let’s benchmark the following operations using STL and PSTL: generating random numbers, sorting them, and finally verifying that they’re sorted. The performance increase on my quad-core 2012 MacBook Pro with a 2.3 GHz i7 is about 5.4x (10.6 s versus 1.97 s)! Nice!
Program output:
benchmark name    iters    elapsed ns     average
--------------------------------------------------
STL               1        10623612832    10.6236 s
PSTL              1        1967239761     1.96724 s
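The two elapsed times in the program output can be turned into a speedup figure by dividing one by the other. Here is a minimal sketch of that arithmetic; the `speedup` helper is a hypothetical name used only for illustration and is not part of the benchmark.

```cpp
#include <cstdint>

// Hypothetical helper (not in the original benchmark): ratio of the
// sequential elapsed time to the parallel elapsed time, both in nanoseconds.
// Assumes parallel_ns is non-zero.
double speedup(std::uint64_t sequential_ns, std::uint64_t parallel_ns)
{
    return static_cast<double>(sequential_ns) / static_cast<double>(parallel_ns);
}
```

With the numbers measured above, `speedup(10623612832ULL, 1967239761ULL)` comes out at roughly 5.4.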
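#define CATCH_CONFIG_MAIN
#include <catch2/catch.hpp>
#include <vector>
#include <random>
#include <algorithm>
#include <pstl/execution>
#include <pstl/algorithm>
using namespace std;
using namespace pstl;

const unsigned long long COUNT = 100'000'000;

TEST_CASE("STL vs PSTL", "[benchmark]")
{
    // One seed for both runs, so each benchmark constructs its engine from the same value.
    auto seed = random_device{}();
    vector<int> data(COUNT);

    // Sequential baseline: generate, sort, verify.
    BENCHMARK("STL")
    {
        generate(data.begin(), data.end(), mt19937{seed});
        sort(data.begin(), data.end());
        is_sorted(data.begin(), data.end()); // result is discarded; only the time matters here
    }

    // Same three steps with the parallel + vectorized execution policy.
    // Caveat: mt19937 is a stateful generator; under a parallel policy it may be
    // copied or invoked from several threads, so this step is not guaranteed to be
    // race-free or to produce the same sequence as the sequential run.
    BENCHMARK("PSTL")
    {
        generate(pstl::execution::par_unseq, data.begin(), data.end(), mt19937{seed});
        sort(pstl::execution::par_unseq, data.begin(), data.end());
        is_sorted(pstl::execution::par_unseq, data.begin(), data.end());
    }
}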
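The benchmark throws away the result of `is_sorted`, since it only cares about timing. To check that the three steps do what they should, here is a minimal sequential sketch on a small input that returns the result of the final check. It uses only the standard library; the `generate_sort_verify` name and its parameters are assumptions made for this example, not something from the benchmark itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not part of the original benchmark): runs the same
// three steps sequentially and returns the outcome of the verification step.
bool generate_sort_verify(std::mt19937::result_type seed, std::size_t count)
{
    std::vector<int> data(count);

    // std::generate takes the generator by value, so this temporary engine is
    // copied once and then advanced for every element it fills.
    std::generate(data.begin(), data.end(), std::mt19937{seed});

    std::sort(data.begin(), data.end());

    // Unlike the benchmark, the result of the check is returned, not discarded.
    return std::is_sorted(data.begin(), data.end());
}
```

Calling `generate_sort_verify(42, 1000)` should return `true`; a much smaller count than the benchmark’s 100'000'000 keeps such a check quick.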