Rounding floating point numbers and constexpr

Good friend asked me a while ago how to round float or double to N digits of precision. Given 0.0123456789 and N = 7 how to get 0.0123457000 as the result. The only way to round numbers (inside a machine) I could think of was to multiply them by 10 ^ N (ten raised to the power of N), cast the result to some type of int effectively stripping the fractional part, followed by a cast back to a floating point representation and division by same 10 ^ N:

V = 0.0123456789

N = 7

P = 10 ^ N = 10'000'000

T = V * P = 123456.789 = 123456

R = T / P ≈ 0.0123457000

Or something to that effect. Translated to C++:

template<typename T>
auto round(T v, unsigned char d)
{
	auto p = std::pow(T(10), T(d));
	return std::round(v * p) / p;
}

template<typename T>

auto round(T v, unsigned char d)

{

auto p = std::pow(T(10), T(d));

return std::round(v * p) / p;

}

Then he added the dreaded do it fast,… and just as I was about to tell him to take a long walk off a short pier I remembered something about constexpr and doing arithmetic at compile time.

Looking at the implementation above I decided to start with computing the power of 10 at compile time. My first approach was a recursive template struct power_of with a partial specialization, the recursion stop, for the power of zero case (and a helper power_of_v variable declaration):

template<std::uint64_t B, unsigned char E, typename T>
struct power_of
{
	static constexpr T value = T(B) * power_of<B, E - 1, T>::value;
};

template<std::uint64_t B, typename T>
struct power_of<B, 0, T>
{
	static constexpr T value = T(1);
};

template<std::uint64_t B, unsigned char E, typename T>
inline constexpr auto power_of_v = power_of<B, E, T>::value;

template<std::uint64_t B, unsigned char E, typename T>

struct power_of

{

static constexpr T value = T(B) * power_of<B, E - 1, T>::value;

};

template<std::uint64_t B, typename T>

struct power_of<B, 0, T>

{

static constexpr T value = T(1);

};

template<std::uint64_t B, unsigned char E, typename T>

inline constexpr auto power_of_v = power_of<B, E, T>::value;

This allowed me to write power_of_v<10, 3, float> to compute at compile time the 3rd power of 10 stored as type float.
I also created a recursive constexpr function since those too can be evaluated at compile time:

template<typename T>
constexpr T power_of_f(std::uint64_t b, unsigned char e)
{
	return e == 0 ? T(1) : T(b) * power_of_f<T>(b, e - 1);
}

template<typename T>

constexpr T power_of_f(std::uint64_t b, unsigned char e)

{

return e == 0 ? T(1) : T(b) * power_of_f<T>(b, e - 1);

}

With it I could write power_of_f<float>(10, 3) and it would be the equivalent of using the recursive template struct above.
I decided to think big and used std::uint64_t as the base and unsigned char as the exponent type of the computation. Hopefully overflow was not going to be an issue…

Next I moved onto rounding the floating point number to an integer type (after it has been multiplied by 10 ^ N) and quickly realized a simple type cast was not going to cut it. std::round does much more than that; it considers how close the number to be rounded is to the integer representation of it, ex.: given 0.49 is it closer to 0 or 1. It then rounds it to the closest whole number. Moreover, it considers the sign and rounds in different direction if the number is positive vs negative.

This meant I needed to determine (at compile time) whether the number is positive or negative:

template<typename T>
constexpr auto my_sign(T v)
{
	return v >= T(0) ? T(1) : T(-1);
}

template<typename T>

constexpr auto my_sign(T v)

{

return v >= T(0) ? T(1) : T(-1);

}

Compute the absolute value of a given number:

template<typename T>
constexpr auto my_abs(T v)
{
	return v >= T(0) ? v : -v;
}

template<typename T>

constexpr auto my_abs(T v)

{

return v >= T(0) ? v : -v;

}

And round the number based on the proximity to its integer representation while considering the sign:

template<typename T>
constexpr auto my_rnd(T v)
{
	constexpr auto h = T(0.5) - std::numeric_limits<T>::epsilon();
	return (std::int64_t)(v + h * my_sign(v));
}

template<typename T>

constexpr auto my_rnd(T v)

{

constexpr auto h = T(0.5) - std::numeric_limits<T>::epsilon();

return (std::int64_t)(v + h * my_sign(v));

}

Putting it all together and adding some sanity checks resulted in this floating point, compile time (as long as all parameters are constexpr), rounding function which produced the same exact results as its runtime equivalent:

template<unsigned char D, typename T>
constexpr auto constexpr_round(T v)
{
	constexpr auto p = power_of_v<10, D, T>;
	if(my_abs(v) > std::numeric_limits<T>::max() / p)
		return v;
	if(my_abs(v) * p > std::numeric_limits<std::int64_t>::max() - 1)
		return v;
	return my_rnd(v * p) / p;
}

template<unsigned char D, typename T>

constexpr auto constexpr_round(T v)

{

constexpr auto p = power_of_v<10, D, T>;

if(my_abs(v) > std::numeric_limits<T>::max() / p)

return v;

if(my_abs(v) * p > std::numeric_limits<std::int64_t>::max() - 1)

return v;

return my_rnd(v * p) / p;

}

The if statements are there to guard against number overflows. It is better to not round at all in such cases than to return garbage values, or at least this is my attitude regarding error handling in this case.

Here is a test program I created while testing my implementation; it allowed me to easily compare compile time results against the reference std::round based approach.

#include <iostream>
#include <iomanip>
#include <array>
#include "round.hpp"

//#define my_round(V, D) runtime_round(V, D)
#define my_round(V, D) constexpr_round<D>(V)

int main()
{
	using namespace std;

	constexpr auto v = 0.0123456789;
	array<decltype(v), 10> a =
	{
		my_round(v, 1),
		my_round(v, 2),
		my_round(v, 3),
		my_round(v, 4),
		my_round(v, 5),
		my_round(v, 6),
		my_round(v, 7),
		my_round(v, 8),
		my_round(v, 9),
		my_round(v, 10)
	};

	cout << fixed << setprecision(10);

	for(int p = 1; auto v : a)
		cout << p++ << ":\t" << v << endl;
}

#include <iostream>

#include <iomanip>

#include <array>

#include "round.hpp"

//#define my_round(V, D) runtime_round(V, D)

#define my_round(V, D) constexpr_round<D>(V)

int main()

{

using namespace std;

constexpr auto v = 0.0123456789;

array<decltype(v), 10> a =

{

my_round(v, 1),

my_round(v, 2),

my_round(v, 3),

my_round(v, 4),

my_round(v, 5),

my_round(v, 6),

my_round(v, 7),

my_round(v, 8),

my_round(v, 9),

my_round(v, 10)

};

cout << fixed << setprecision(10);

for(int p = 1; auto v : a)

cout << p++ << ":\t" << v << endl;

}

1: 0.0000000000
2: 0.0100000000
3: 0.0120000000
4: 0.0123000000
5: 0.0123500000
6: 0.0123460000
7: 0.0123457000
8: 0.0123456800
9: 0.0123456790
10: 0.0123456789
Test program output.

The example program on GitHub: round.cpp. The floating point rounding implementation (sprinkled with comments and decorated with template concepts for good measures): round.hpp.

Given the way float and double are represented on the level of bits it is impossible to round them exactly. You can only do so much and come so close to being exact. Playing with the example program’s value and type of variable v as well as different parameters to setprecision stream modifier will illustrate what I mean…

#pragma once

#include <limits>
#include <type_traits>
#include <stdexcept>
#include <cmath>
#include <cstdint>

// solution 1
// can specify number of digits at run-time as the second parameter
// slowest due to 2 function calls
template<typename T>
requires std::is_floating_point_v<T>
auto runtime_round(T v, unsigned char d)
{
	auto p = std::pow(T(10), T(d));
	if(std::abs(v) > std::numeric_limits<T>::max() / p) // v * p would overflow
		throw std::overflow_error("rounding would overflow");
	return std::round(v * p) / p;
}

// sloution 2
// if used only with other constexpr the result will be evaluated
// entirely at compile time meaning no runtime cost :)

// recursive template to compute B^E at compile time
// result is stored as a static variable 'value' of type T
template<std::uint64_t B, unsigned char E, typename T>
requires std::is_arithmetic_v<T>
struct power_of
{
	static constexpr T value = T(B) * power_of<B, E - 1, T>::value;
};

// terminating template for the recursion one above once E == 0
template<std::uint64_t B, typename T>
requires std::is_arithmetic_v<T>
struct power_of<B, 0, T>
{
	static constexpr T value = T(1);
};

template<std::uint64_t B, unsigned char E, typename T>
inline constexpr auto power_of_v = power_of<B, E, T>::value;

// recursive function template to calculate b^e
// if both parameters are constexpr it will evaluate at compile time
// otherwise it will evaluate at run time
// returns the result as type T
template<typename T>
requires std::is_arithmetic_v<T>
constexpr T power_of_f(std::uint64_t b, unsigned char e)
{
	return e == 0 ? T(1) : T(b) * power_of_f<T>(b, e - 1);
}

// given a value 'v' return +1 if v is >= 0, otherwise return -1
template<typename T>
requires std::is_arithmetic_v<T>
constexpr auto my_sign(T v)
{
	return v >= T(0) ? T(1) : T(-1);
}

// given a value 'v' return it's absolute value
template<typename T>
requires std::is_arithmetic_v<T>
constexpr auto my_abs(T v)
{
	return v >= T(0) ? v : -v;
}

// round float/double/long double value 'v' to the nearest integer
// using compile time type conversions
template<typename T>
requires std::is_floating_point_v<T>
constexpr auto my_rnd(T v)
{
	constexpr auto h = T(0.5) - std::numeric_limits<T>::epsilon();
	return (std::int64_t)(v + h * my_sign(v));
}

// self explanatory :)
// though number of digits must be provided at compile time
// as the first template parameter 'D'
template<unsigned char D, typename T>
requires std::is_floating_point_v<T>
constexpr auto constexpr_round(T v)
{
	/* option 1 */ //constexpr auto p = power_of_f<T>(10, D);
	/* option 2 */ constexpr auto p = power_of_v<10, D, T>;
	if(my_abs(v) > std::numeric_limits<T>::max() / p)
		return v; // v * p would overflow
	if(my_abs(v) * p > std::numeric_limits<std::int64_t>::max() - 1)
		return v; // v * p would not fit in int64_t
	return my_rnd(v * p) / p;
}

#pragma once

#include <limits>

#include <type_traits>

#include <stdexcept>

#include <cmath>

#include <cstdint>

// solution 1

// can specify number of digits at run-time as the second parameter

// slowest due to 2 function calls

template<typename T>

requires std::is_floating_point_v<T>

auto runtime_round(T v, unsigned char d)

{

auto p = std::pow(T(10), T(d));

if(std::abs(v) > std::numeric_limits<T>::max() / p) // v * p would overflow

throw std::overflow_error("rounding would overflow");

return std::round(v * p) / p;

}

// sloution 2

// if used only with other constexpr the result will be evaluated

// entirely at compile time meaning no runtime cost :)

// recursive template to compute B^E at compile time

// result is stored as a static variable 'value' of type T

template<std::uint64_t B, unsigned char E, typename T>

requires std::is_arithmetic_v<T>

struct power_of

{

static constexpr T value = T(B) * power_of<B, E - 1, T>::value;

};

// terminating template for the recursion one above once E == 0

template<std::uint64_t B, typename T>

requires std::is_arithmetic_v<T>

struct power_of<B, 0, T>

{

static constexpr T value = T(1);

};

template<std::uint64_t B, unsigned char E, typename T>

inline constexpr auto power_of_v = power_of<B, E, T>::value;

// recursive function template to calculate b^e

// if both parameters are constexpr it will evaluate at compile time

// otherwise it will evaluate at run time

// returns the result as type T

template<typename T>

requires std::is_arithmetic_v<T>

constexpr T power_of_f(std::uint64_t b, unsigned char e)

{

return e == 0 ? T(1) : T(b) * power_of_f<T>(b, e - 1);

}

// given a value 'v' return +1 if v is >= 0, otherwise return -1

template<typename T>

requires std::is_arithmetic_v<T>

constexpr auto my_sign(T v)

{

return v >= T(0) ? T(1) : T(-1);

}

// given a value 'v' return it's absolute value

template<typename T>

requires std::is_arithmetic_v<T>

constexpr auto my_abs(T v)

{

return v >= T(0) ? v : -v;

}

// round float/double/long double value 'v' to the nearest integer

// using compile time type conversions

template<typename T>

requires std::is_floating_point_v<T>

constexpr auto my_rnd(T v)

{

constexpr auto h = T(0.5) - std::numeric_limits<T>::epsilon();

return (std::int64_t)(v + h * my_sign(v));

}

// self explanatory :)

// though number of digits must be provided at compile time

// as the first template parameter 'D'

template<unsigned char D, typename T>

requires std::is_floating_point_v<T>

constexpr auto constexpr_round(T v)

{

/* option 1 */ //constexpr auto p = power_of_f<T>(10, D);

/* option 2 */ constexpr auto p = power_of_v<10, D, T>;

if(my_abs(v) > std::numeric_limits<T>::max() / p)

return v; // v * p would overflow

if(my_abs(v) * p > std::numeric_limits<std::int64_t>::max() - 1)

return v; // v * p would not fit in int64_t

return my_rnd(v * p) / p;

}

UPDATE

Shout out to Ben from thoughts-on-coding.com for suggesting another runtime solution (code below)!

// solution 1.2
// faster runtime solution, powers of 10 are compiled in, 2 function calls
// approach suggested by Ben from https://thoughts-on-coding.com
template<typename T>
requires std::is_floating_point_v<T>
auto runtime_round2(T v, unsigned d)
{
	static const auto k_p = std::array<T, 20>
	{
		 1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,  1e8,  1e9,
		1e10, 1e11, 1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19
	};
	auto p = k_p.at(d);
	if(std::abs(v) > std::numeric_limits<T>::max() / p)
		return v; // v * p would overflow, do nothing...
	return std::round(v * p) / p;
}

// solution 1.2

// faster runtime solution, powers of 10 are compiled in, 2 function calls

// approach suggested by Ben from https://thoughts-on-coding.com

template<typename T>

requires std::is_floating_point_v<T>

auto runtime_round2(T v, unsigned d)

{

static const auto k_p = std::array<T, 20>

{

1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9,

1e10, 1e11, 1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19

};

auto p = k_p.at(d);

if(std::abs(v) > std::numeric_limits<T>::max() / p)

return v; // v * p would overflow, do nothing...

return std::round(v * p) / p;

}

3 Replies to “Rounding floating point numbers and constexpr”

I also like a template free solution which seems comparable in performance during runtime because its code is less verbose but clearly also less flexible as the template version.

double Round(double x, int n)
{
    constexpr double v[] = {1, 10, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9, 1e10, 1e11, 1e12};
    return floor(x * v[n] + 0.5) / v[n];
}

double Round(double x, int n)

constexpr double v[] = {1, 10, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9, 1e10, 1e11, 1e12};

return floor(x * v[n] + 0.5) / v[n];

https://quick-bench.com/q/vXH9wAEHbmi8_hIxbnJFKyaqPSw

Martin Vorbrodt says:

April 13, 2021 at 1:14 pm

Thanks! I reworked it a bit per our conversation last week and updated the article.

Loading...

Jihadist says:

May 11, 2023 at 1:42 pm

Hello! Thanks for your solution. C++20 looks like an overhead here.
constexpr round_up can be implemented another way. Of course, you need constepxr pow and ceil/round. We can use gcem library https://github.com/kthohr/gcem. And it could be implemented very easy, or you can use solution from the first SO answer https://stackoverflow.com/questions/25925290/c-round-a-double-up-to-2-decimal-places.
P.S. If we use gcc, we do not use any third parties, gcc contains constexpr cmath. I provide example https://godbolt.org/z/haosEYe9b

Loading...

Practical Modern C++

Rounding floating point numbers and constexpr

Like this:

Related

3 Replies to “Rounding floating point numbers and constexpr”

Leave a ReplyCancel reply

Practical Modern C++

Rounding floating point numbers and constexpr

Share this:

Like this:

Related

3 Replies to “Rounding floating point numbers and constexpr”

Leave a ReplyCancel reply