The following program, despite looking correct, is a race condition and has unpredictable behaviour:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 |
#include <iostream> #include <mutex> #include <thread> using namespace std; const int COUNT = 10'000'000; mutex aLock; mutex bLock; struct S { unsigned int a:9; unsigned int b:7; } __attribute__((packed)); int main(int argc, char** argv) { S s{}; thread t1([&]() { scoped_lock lock(aLock); for(int i = 0; i < COUNT; ++i) { s.a = 0; s.a = 0b111111111; } }); thread t2([&]() { scoped_lock lock(bLock); for(int i = 0; i < COUNT; ++i) { s.b = 0; s.b = 0b1111111; } }); t1.join(); t2.join(); cout << "sizeof(S) = " << sizeof(S) << ", " << s.a << ", " << s.b << endl; return 1; } |
The proof is in the output of running this code several times:
sizeof(S) = 2, 511, 127
sizeof(S) = 2, 511, 127
sizeof(S) = 2, 0, 127
sizeof(S) = 2, 511, 0
sizeof(S) = 2, 511, 127
sizeof(S) = 2, 511, 127
sizeof(S) = 2, 511, 127
sizeof(S) = 2, 0, 127
sizeof(S) = 2, 0, 127
sizeof(S) = 2, 511, 127
The race condition and the problem here is that the C++ standard states that adjacent bit fields are a single object in memory and may be packed to share the same byte. The CPU cannot operate on single bits, it must load at least 1 byte at a time, and because of the overlap, loading bits
The fix is to use one mutex to protect all adjacent bit fields. I say all because you have no guarantee that the CPU will be able to load 1 byte at a time. It may be limited to working on 32-bit values at a time; depending on the architecture.
Corrected program:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 |
#include <iostream> #include <mutex> #include <thread> using namespace std; const int COUNT = 10'000'000; mutex abLock; struct S { unsigned int a:9; unsigned int b:7; } __attribute__((packed)); int main(int argc, char** argv) { S s{}; thread t1([&]() { scoped_lock lock(abLock); for(int i = 0; i < COUNT; ++i) { s.a = 0; s.a = 0b111111111; } }); thread t2([&]() { scoped_lock lock(abLock); for(int i = 0; i < COUNT; ++i) { s.b = 0; s.b = 0b1111111; } }); t1.join(); t2.join(); cout << "sizeof(S) = " << sizeof(S) << ", " << s.a << ", " << s.b << endl; return 1; } |
good point
Nice