Floating Point Determinism


Hi, I’m Glenn Fiedler and welcome to Networking for Game Programmers.

Lately I’ve been doing some research into networking game physics simulations via deterministic lockstep methods.

The basic idea is that instead of synchronizing the state of physics objects directly by sending the positions, orientations, velocities etc. over the network, one could synchronize the simulation implicitly by sending just the player inputs.

This is a very attractive synchronization strategy because the amount of network traffic depends on the size of the player inputs instead of the amount of physics state in the world. In fact, this strategy has been used for many years in RTS games for precisely this reason; with thousands and thousands of units on the map, they simply have too much state to send over the network.

Perhaps you have a complex physics simulation with lots of rigid body state, or a cloth or soft body simulation which needs to stay perfectly in sync across two machines because it is gameplay affecting, but you cannot afford to send all the state. It is clear that the only possible solution in this situation is to attempt a deterministic networking strategy.

But we run into a problem. Physics simulations use floating point calculations, and for one reason or another it is considered very difficult to get exactly the same result from floating point calculations on two different machines. People even report different results on the same machine from run to run, and between debug and release builds. Other folks say that AMDs give different results to Intel machines, and that SSE results are different from x87. What exactly is going on? Are floating point calculations deterministic or not?

Unfortunately, the answer is not a simple “yes” or “no” but “yes, if…”

Here is what I have discovered so far:

  • If your physics simulation is itself deterministic, with a bit of work you should be able to get it to play back a replay of recorded inputs on the same machine and get the exact same result.

  • It is possible to get deterministic results for floating calculations across multiple computers provided you use an executable built with the same compiler, run on machines with the same architecture, and perform some platform-specific tricks.

  • It is incredibly naive to write arbitrary floating point code in C or C++ and expect it to give exactly the same result across different compilers or architectures, or even the same results across debug and release builds.

  • However with a good deal of work you may be able to coax exactly the same floating point results out of different compilers or different machine architectures by using your compilers “strict” IEEE 754 compliant mode and restricting the set of floating point operations you use. This typically results in significantly lower floating point performance.

If you would like to debate these points or add your own nuance, please contact me! I consider this question by no means settled and am very interested in other peoples experiences with deterministic floating point simulations and exactly reproducible floating point calculations. Please contact me especially if you have managed to get binary exact results across different architectures and compilers in real world situations.

Here are the resources I have discovered in my search so far…

As long as you stick to a single compiler, and a single CPU instruction set, it is possible to make floating point fully deterministic. The specifics vary by platform (i e, different between x86, x64 and PPC).

You have to make sure that the internal precision is set to 64 bits (not 80, because only Intel implements that), and that the rounding mode is consistent. Furthermore, you have to check this after calls to external DLLs, because many DLLs (Direct3D, printer drivers, sound libraries, etc) will change the precision or rounding mode without setting it back.

The ISA is IEEE compliant. If your x87 implementation isn’t IEEE, it’s not x87.

Also, you can’t use SSE or SSE2 for floating point, because it’s too under-specified to be deterministic.

Jon Watte, GameDev.net forums http://www.gamedev.net/community/forums/topic.asp?topic_id=499435

At app startup time we call:

_controlfp(_PC_24, _MCW_PC)
_controlfp(_RC_NEAR, _MCW_RC)

Also, every tick we assert that these fpu settings are still set:

gpAssert( (_controlfp(0, 0) & _MCW_PC) == _PC_24 );
gpAssert( (_controlfp(0, 0) & _MCW_RC) == _RC_NEAR );

There are some MS API functions that can change the fpu model on you so you need to manually enforce the fpu mode after those calls to ensure the fpu stays the same across machines. The assert is there to catch if anyone has buggered the fpu mode.

FYI We have the compiler floating point model set to Fast /fp:fast ( but its not a requirement )

We have never had a problem with the IEEE standard across any PC cpu AMD and Intel with this approach. None of our SupCom or Demigod customers have had problems with their machines either, and we are talking over 1 million customers here (supcom1 + expansion pack). We would have heard if there was a problem with the fpu not having the same results as replays or multiplayer mode wouldn’t work at all.

We did however have problems when using some physics APIs because their code did not have determinism or reproducibility in mind. For example some physics APIS have solvers that take X number of iterations when solving where X can be lower with faster CPUs.

Elijah, Gas Powered Games http://www.box2d.org/forum/viewtopic.php?f=3&t=1800

This is madness! Why don’t we make all hardware work the same? Well, we could, if we didn’t care about performance. We could say “hey Mr. Hardware Guy, forget about your crazy fused multiply-add instructions and just give us a basic IEEE implementation”, and “hey Compiler Dude, please don’t bother trying to optimize our code”. That way our programs would run consistently slowly everywhere :-)

Shawn Hargreaves, MSDN Blog http://blogs.msdn.com/shawnhar/archive/2009/03/25/is-floating-point-math-deterministic.aspx

Ken Miller, Pandemic Studios http://www.box2d.org/forum/viewtopic.php?f=4&t=175

Branimir Karadžić, Pandemic Studios http://www.google.com/buzz/100111796601236342885/8hDZ655S6x3/Floating-Point-Determinism-Gaffer-on-Games

Algebraic compiler optimizations “Complex” instructions like multiply-accumulate or sine x86-specific pain not available on any other platform; not that ~100% of non-embedded devices is a small market share for a pain.

The good news is that most pain comes from item 3 which can be more or less solved automatically. For the purpose of decision making (”should we invest energy into FP consistency or is it futile?”), I’d say that it’s not futile and if you can cite actual benefits you’d get from consistency, then it’s worth the (continuous) effort.

Summary: use SSE2 or SSE, and if you can’t, configure the FP CSR to use 64b intermediates and avoid 32b floats. Even the latter solution works passably in practice, as long as everybody is aware of it.

Yossi Kreinin, Personal Blog http://www.yosefk.com/blog/consistency-how-to-defeat-the-purpose-of-ieee-floating-point.html

The long answer to these questions and more can be found in what is probably the best reference on floating point, David Goldberg’s What Every Computer Scientist Should Know About Floating Point Arithmetic. Skip to the section on the IEEE standard for the key details.

Finally, if you are doing the same sequence of floating point calculations on the same initial inputs, then things should be replayable exactly just fine. The exact sequence can change depending on your compiler/os/standard library, so you might get some small errors this way.

Where you usually run into problems in floating point is if you have a numerically unstable method and you start with FP inputs that are approximately the same but not quite. If your method’s stable, you should be able to guarantee reproducibility within some tolerance. If you want more detail than this, then take a look at Goldberg’s FP article linked above or pick up an intro text on numerical analysis.

Todd Gamblin, Stack Overflow http://stackoverflow.com/questions/968435/what-could-cause-a-deterministic-process-to-generate-floating-point-errors

Günter Obiltschnig, Cross-Platform Issues with Floating-Point arithmetics in C++ http://www.appinf.com/download/FPIssues.pdf

STREFLOP Library http://nicolas.brodu.numerimoire.net/en/programmation/streflop/index.html

• Accuracy - Produce results that are “close” to the correct value

• Reproducibility - Produce consistent results from one run to the next. From one set of build options to another. From one compiler to another. From one platform to another.

• Performance – Produce the most efficient code possible.

These options usually conflict! Judicious use of compiler options lets you control the tradeoffs.

Intel C++ Compiler: Floating Point Consistency http://www.nccs.nasa.gov/images/FloatingPoint%5Fconsistency.pdf.

Intel C++ Compiler Manual http://cache-www.intel.com/cd/00/00/34/76/347605_347605.pdf

Microsoft Visual C++ Floating-Point Optimization http://msdn.microsoft.com/en-us/library/aa289157(VS.71).aspx#floapoint_topic4

Apple Developer Support http://developer.apple.com/hardwaredrivers/ve/sse.html

Intel Software Network Support http://software.intel.com/en-us/forums/showthread.php?t=48339

D. Monniaux on IEEE 754 mailing list http://grouper.ieee.org/groups/754/email/msg03864.html

David Hough on 754 IEEE mailing list http://grouper.ieee.org/groups/754/email/msg03867.html

Nick Maclaren on 754 IEEE mailing list http://grouper.ieee.org/groups/754/email/msg03872.html

Nick Maclaren on 754 IEEE mailing list http://grouper.ieee.org/groups/754/email/msg03862.html

Wikipedia Page on IEEE 754-2008 standard http://en.wikipedia.org/wiki/IEEE_754-2008#Reproducibility

A simpler solution for current personal computers is simply to force the compiler to use the SSE unit for computations on IEEE-754 types; however, most embedded systems using IA32 microprocessors or microcontrollers do not use processors equipped with this unit.

David Monniaux, The pitfalls of verifying floating-point computations http://hal.archives-ouvertes.fr/docs/00/28/14/29/PDF/floating-point-article.pdf

Even under the 1985 version of IEEE-754, if two implementations of the standard executed an operation on the same data, under the same rounding mode and default exception handling, the result of the operation would be identical. The new standard tries to go further to describe when a program will produce identical floating point results on different implementations. The operations described in the standard are all reproducible operations.

The recommended operations, such as library functions or reduction operators are not reproducible, because they are not required in all implementations. Likewise dependence on the underflow and inexact flags is not reproducible because two different methods of treating underflow are allowed to preserve conformance between IEEE-754(1985) and IEEE-754(2008). The rounding modes are reproducible attributes. Optional attributes are not reproducible.

The use of value-changing optimizations is to be avoided for reproducibility. This includes use of the associative and disributative laws, and automatic generation of fused multiply-add operations when the programmer did not explicitly use that operator.

Peter Markstein, The New IEEE Standard for Floating Point Arithmetic http://drops.dagstuhl.de/opus/volltexte/2008/1448/pdf/08021.MarksteinPeter.ExtAbstract.1448.pdf

Many programmers may not realize that even a program that uses only the numeric formats and operations prescribed by the IEEE standard can compute different results on different systems. In fact, the authors of the standard intended to allow different implementations to obtain different results. Their intent is evident in the definition of the term destination in the IEEE 754 standard: “A destination may be either explicitly designated by the user or implicitly supplied by the system (for example, intermediate results in subexpressions or arguments for procedures). Some languages place the results of intermediate calculations in destinations beyond the user’s control. Nonetheless, this standard defines the result of an operation in terms of that destination’s format and the operands’ values.” (IEEE 754-1985, p. 7) In other words, the IEEE standard requires that each result be rounded correctly to the precision of the destination into which it will be placed, but the standard does not require that the precision of that destination be determined by a user’s program. Thus, different systems may deliver their results to destinations with different precisions, causing the same program to produce different results (sometimes dramatically so), even though those systems all conform to the standard.

Differences Among IEEE 754 Implementations http://docs.sun.com/source/806-3568/ncg_goldberg.html#3098

Glenn Fiedler is the founder and CEO of Network Next.Network Next is fixing the internet for games by creating a marketplace for premium network transit.