Is assigning two doubles guaranteed to yield the same bitset patterns?

Question

There are several posts here about floating point numbers and their nature. It is clear that comparing floats and doubles must always be done cautiously. Asking for equality has also been discussed and the recommendation is clearly to stay away from it.

But what if there is a direct assignment:

double a = 5.4;
double b = a;

Assuming a is any non-NaN value, can a == b ever be false?

It seems that the answer is obviously no, yet I can't find any standard defining this behaviour in a C++ environment. IEEE-754 states that two floating point numbers with equal (non-NaN) bitset patterns are equal. Does this mean that I can continue comparing my doubles this way without having to worry about maintainability? Do I have to worry about other compilers / operating systems and their implementations regarding these lines? Or maybe about a compiler that optimizes some bits away and ruins their equality?

I wrote a little program that generates and compares non-NaN random doubles forever - until it finds a case where a == b yields false. Can I compile/run this code anywhere and anytime in the future without having to expect a halt? (ignoring endianness and assuming sign, exponent and mantissa bit sizes / positions stay the same).

#include <cstdint>   // std::uint64_t
#include <cstring>   // std::memcpy
#include <iostream>
#include <random>

// Bit-field layout is implementation-defined; this ordering assumes a
// little-endian target such as MSVC on x86.
struct double_content {
    std::uint64_t mantissa : 52;
    std::uint64_t exponent : 11;
    std::uint64_t sign : 1;
};
static_assert(sizeof(double) == sizeof(double_content), "must be equal");


// Overwrite n with the double described by the given sign, exponent and mantissa bits.
void set_double(double& n, std::uint64_t sign, std::uint64_t exponent, std::uint64_t mantissa) {
    double_content convert;
    std::memcpy(&convert, &n, sizeof(double));
    convert.sign = sign;
    convert.exponent = exponent;
    convert.mantissa = mantissa;
    std::memcpy(&n, &convert, sizeof(double_content));
}

// Print the three bit fields of n followed by its numeric value.
void print_double(const double& n) {
    double_content convert;
    std::memcpy(&convert, &n, sizeof(double));
    std::cout << "sign: " << convert.sign << ", exponent: " << convert.exponent << ", mantissa: " << convert.mantissa << " --- " << n << '\n';
}

int main() {
    std::random_device rd;
    std::mt19937_64 engine(rd());
    std::uniform_int_distribution<std::uint64_t> mantissa_distribution(0ull, (1ull << 52) - 1);
    std::uniform_int_distribution<std::uint64_t> exponent_distribution(0ull, (1ull << 11) - 1);
    std::uniform_int_distribution<std::uint64_t> sign_distribution(0ull, 1ull);

    double a = 0.0;
    double b = 0.0;

    bool found = false;

    while (!found){
        auto sign = sign_distribution(engine);
        auto exponent = exponent_distribution(engine);
        auto mantissa = mantissa_distribution(engine);

        //re-assign exponent for NaN cases
        if (mantissa) {
            while (exponent == (1ull << 11) - 1) {
                exponent = exponent_distribution(engine);
            }
        }
        //force -0.0 to be 0.0
        if (mantissa == 0u && exponent == 0u) {
            sign = 0u;
        }


        set_double(a, sign, exponent, mantissa);
        b = a;

        //here could be more (unmodifying) code to delay the next comparison

        if (b != a) { //not equal!
            print_double(a);
            print_double(b);
            found = true;
        }
    }
}

Using Visual Studio Community 2017 Version 15.9.5.

In your example with a literal probably equal. However, I once fixed a bug in code using the x87 FPU in 32-bit mode where the code compared an 80-bit precision in-register version of the "same number" to its 64-bit double representation. The comparison failed of course. — Zan Lynx, Mar 27 '19 at 14:34
Also, the general rule in floating point is that more precision is always better. If you want to compare floats always round both of them to whatever their useful precision is. Otherwise on future hardware with 256-bit floats or whatever, comparing one to a 64-bit float would be obviously wrong. — Zan Lynx, Mar 27 '19 at 14:38
Beware that it's undefined behavior to read from any member of a union unless it was the last one written to. Since you last write to `convert.bits`, the expression `n = convert.d;` is undefined behavior. Though I seem to remember that VC++ allows it as an extension. — François Andrieux, Mar 27 '19 at 14:40
Following François Andrieux comment, if you use `memcpy` you can copy the bit representation into a `content` in a portable way. — NathanOliver, Mar 27 '19 at 14:44
@Stack Danny No - that is still undefined behavior as it breaks the strict aliasing rules. `memcpy` is the only real portable way to do this. (And all modern compilers should optimise it away) — Mike Vine, Mar 27 '19 at 14:52
You'll get away with it as long as you use VS2017 and you don't expose the variables beyond module bounds, it no longer generates FPU code so you have little to fear from the optimizer. That's not necessarily the case for older versions or different compilers. — Hans Passant, Mar 27 '19 at 15:03

Max Langhof · Accepted Answer · 2019-08-05T09:57:58.380

The C++ standard clearly specifies in [basic.types]#3:

For any trivially copyable type T, if two pointers to T point to distinct T objects obj1 and obj2, where neither obj1 nor obj2 is a potentially-overlapping subobject, if the underlying bytes ([intro.memory]) making up obj1 are copied into obj2, obj2 shall subsequently hold the same value as obj1.

It gives this example:

T* t1p;
T* t2p;
// provided that t2p points to an initialized object ...
std::memcpy(t1p, t2p, sizeof(T));
// at this point, every subobject of trivially copyable type in *t1p contains
// the same value as the corresponding subobject in *t2p

The remaining question is what a value is. We find in [basic.fundamental]#12 (emphasis mine):

There are three floating-point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double. The value representation of floating-point types is implementation-defined.

Since the C++ standard has no further requirements on how floating point values are represented, this is all you will find as guarantee from the standard, as assignment is only required to preserve values ([expr.ass]#2):

In simple assignment (=), the object referred to by the left operand is modified by replacing its value with the result of the right operand.

As you correctly observed, IEEE-754 requires that non-NaN, non-zero floats compare equal if and only if they have the same bit pattern. So if your compiler uses IEEE-754-compliant floats, you should find that assignment of non-NaN, non-zero floating point numbers preserves bit patterns.

And indeed, your code

double a = 5.4;
double b = a;

should never allow (a == b) to return false. But as soon as you replace 5.4 with a more complicated expression, most of this nicety vanishes. It's not the exact subject of the article, but https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/ mentions several possible ways in which innocent looking code can yield different results (which breaks "identical to the bit pattern" assertions). In particular, you might be comparing an 80 bit intermediate result with a 64 bit rounded result, possibly yielding inequality.

"IEEE-754 requires that non-NaN floats compare equal if and only if they have the same bit pattern." - I don't think this is true; positive zero and negative zero are allowed to compare equal (I believe the IEEE754 standard recommends that languages' equality operator should report `true` for such a comparison). — M.M, Aug 05 '19 at 09:41
@M.M Yeah you're right. This is one of the exceptions made by the standard. See chapter 5.11 in e.g. http://irem.univ-reunion.fr/IMG/pdf/ieee-754-2008.pdf. — Max Langhof, Aug 05 '19 at 09:56

score 3 · Answer 2 · answered Mar 27 '19 at 19:04

There are some complications here. First, note that the title asks a different question than the question. The title asks:

Is assigning two doubles guaranteed to yield the same bitset patterns?

while the question asks:

can a == b ever be false?

The first of these asks whether different bits might occur from an assignment (which could be due to either the assignment not recording the same value as its right operand or due to the assignment using a different bit pattern that represents the same value), while the second asks whether, whatever bits are written by an assignment, the stored value must compare equal to the operand.

In full generality, the answer to the first question is no. Using IEEE-754 binary floating-point formats, there is a one-to-one map between non-zero numeric values and their encodings in bit patterns. However, this admits several cases where an assignment could produce a different bit pattern:

The right operand is the IEEE-754 −0 entity, but +0 is stored. This is not a proper IEEE-754 operation, but C++ is not required to conform to IEEE 754. Both −0 and +0 represent mathematical zero and would satisfy C++ requirements for assignment, so a C++ implementation could do this.
IEEE-754 decimal formats have one-to-many maps between numeric values and their encodings. By way of illustration, three hundred could be represented with bits whose direct meaning is 3•10² or bits whose direct meaning is 300•10⁰. Again, since these represent the same mathematical value, it would be permissible under the C++ standard to store one in the left operand of an assignment when the right operand is the other.
IEEE-754 includes many non-numeric entities called NaNs (for Not a Number), and a C++ implementation might store a NaN different from the right operand. This could include either replacing any NaN with a “canonical” NaN for the implementation or, upon assignment of a signaling NaN, indicating the signal in some way and then converting the signaling NaN to a quiet NaN and storing that.
Non-IEEE-754 formats may have similar issues.
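The zero and NaN cases show why "same value" and "same bit pattern" are different questions. The following is a rough sketch, assuming an implementation that uses IEEE-754 binary64 for double and is not compiled with fast-math style options; all function names are made up for illustration.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: object representation of a double as an integer.
std::uint64_t bits_of(double d) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "assumes a 64-bit double");
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof d);
    return u;
}

// +0.0 and -0.0 compare equal ...
bool zeros_compare_equal() { return 0.0 == -0.0; }

// ... although their encodings differ in the sign bit.
bool zeros_share_bits() { return bits_of(0.0) == bits_of(-0.0); }

// A NaN never compares equal to anything, including a copy of itself,
// which is why the question excludes NaN values.
bool nan_equals_its_copy() {
    double n = std::numeric_limits<double>::quiet_NaN();
    double m = n;
    return m == n;
}
```

Under those assumptions, `zeros_compare_equal()` is true while `zeros_share_bits()` is false, and `nan_equals_its_copy()` is false.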

Regarding the latter question, can a == b be false after a = b, where both a and b have type double, the answer is no. The C++ standard does require that an assignment replace the value of the left operand with the value of the right operand. So, after a = b, a must have the value of b, and therefore they are equal.

Note that the C++ standard does not impose any restrictions on the accuracy of floating-point operations (although I see this only stated in non-normative notes). So, theoretically, one might interpret assignment or comparison of floating-point values to be floating-point operations and say that they do not need to be accurate, so the assignment could change the value or the comparison could return an inaccurate result. I do not believe this is a reasonable interpretation of the standard; the lack of restrictions on floating-point accuracy is intended to allow latitude in expression evaluation and library routines, not simple assignment or comparison.

One should note the above applies specifically to a double object that is assigned from a simple double operand. This should not lull readers into complacency. Several similar but different situations can result in failure of what might seem intuitive mathematically, such as:

After float x = 3.4;, the expression x == 3.4 will generally evaluate as false, since 3.4 is a double and has to be converted to a float for the assignment. That conversion reduces precision and alters the value.
After double x = 3.4 + 1.2;, the expression x == 3.4 + 1.2 is permitted by the C++ standard to evaluate to false. This is because the standard permits floating-point expressions to be evaluated with more precision than the nominal type requires. Thus, 3.4 + 1.2 might be evaluated with the precision of long double. When the result is assigned to x, the standard requires that the excess precision be “discarded,” so the value is converted to a double. As with the float example above, this conversion may change the value. Then the comparison x == 3.4 + 1.2 may compare a double value in x to what is essentially a long double value produced by 3.4 + 1.2.
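Both situations can be written down as small functions. This is only a sketch with hypothetical function names: the first result assumes IEEE-754 float and double, and the second depends on how the implementation evaluates floating-point expressions, which `FLT_EVAL_METHOD` from `<cfloat>` reports (0 means each operation is evaluated in its own type, 2 means everything is evaluated as long double).

```cpp
#include <cfloat>

// After float x = 3.4; the comparison x == 3.4 converts x back to double,
// but the precision lost in the narrowing conversion is not restored.
bool float_matches_double_literal() {
    float x = 3.4;       // the double literal is rounded to float here
    return x == 3.4;     // compares the widened float against the double 3.4
}

// Reports how the implementation evaluates floating-point expressions.
int eval_method() { return FLT_EVAL_METHOD; }

// May legitimately return false when intermediate results carry excess
// precision (for example x87 code with FLT_EVAL_METHOD == 2).
bool sum_matches_recomputed_sum() {
    double x = 3.4 + 1.2;
    return x == 3.4 + 1.2;
}
```

On a typical x86-64 build using SSE arithmetic, `eval_method()` is 0 and the second comparison holds, while `float_matches_double_literal()` is false; on other targets the second result may differ.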

I thought that the two questions are exchangeable as IEEE-754 says that equal == same bit pattern. Thanks for the answer, so as long as it stays at a simple assignment I should be fine to compare them using `==`. — Stack Danny, Mar 28 '19 at 07:30
I'm pretty certain that if the architecture has FPU registers with higher precision and the compiler selects one for a variable, it can assign 80 bits or more from a literal. If a second variable is in 64-bit RAM, it will NOT be equal even if both use the same literal in code. — Zan Lynx, Mar 28 '19 at 15:53
