Thursday, January 28, 2010

FPU Precision, DirectX, _controlfp_s

While debugging a mysterious issue today, I found that DirectX unilaterally lowers the FPU precision of the calling thread to single precision (in order to boost application performance). So Microsoft has decided that no DirectX application should rely on double precision! There is a D3DCREATE_FPU_PRESERVE flag to avoid this, but it is the poor programmer's responsibility to know about it. The problem I have with such an assumption is that this behavior may break code (as it did in my case) in some totally unrelated module without any obvious clue.
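To give a feel for how quietly this can break things, here is a rough, portable sketch. It does not touch the real FPU control word; it only approximates a 24-bit significand by storing the result in a float. The function names are made up for illustration.

```c
/* Approximate "single precision mode": the sum is rounded to a float,
   which has a 24-bit significand. This is only an emulation, not what
   the x87 control word actually does to every intermediate result. */
double add_single_precision(double a, double b)
{
    volatile float rounded = (float)(a + b);
    return (double)rounded;
}

/* The same addition with the full 53-bit significand of a double. */
double add_double_precision(double a, double b)
{
    volatile double sum = a + b;
    return sum;
}
```

With a = 16777216.0 (that is 2^24) and b = 1.0, the double version returns 16777217.0 while the emulated single precision version returns 16777216.0. The increment simply vanishes, and nothing warns you about it.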

Here is something funnier. Now that I knew why my code did not behave correctly (it needed higher FPU precision), the next step was to enable higher precision for the code that needed it. So nice of Microsoft to provide _controlfp_s to do this. But seriously, who has the time to read through the entire documentation at work? After all, I should not have had to deal with this in the first place. Anyway, here is the function signature.

errno_t _controlfp_s(
    unsigned int *currentControl,
    unsigned int newControl,
    unsigned int mask
);

So I expected to call the function with the desired FPU precision (newControl) and get back the current FPU precision (currentControl), which I could save and restore after the calculation. Something like the following.

unsigned int savedPrecision;
/* _PC_64 selects extended precision; _MCW_PC is the precision-control mask */
_controlfp_s(&savedPrecision, _PC_64, _MCW_PC);
/* Do calculation */
_controlfp_s(NULL, savedPrecision, _MCW_PC);

But that does not work correctly!! The saved precision is never restored; the FPU stays at the higher precision (_PC_64). Stumped!!

Read the documentation again and you will find that if you want to "actually get the current control word", you have to pass in 0 for the mask. Otherwise, currentControl is the value of the control word "after" setting the desired precision, so the value I saved was already _PC_64. A little weird to me. So instead of two steps, it is a three-step process.

unsigned int savedPrecision;
/* A mask of 0 changes nothing and just reads the current control word */
_controlfp_s(&savedPrecision, 0, 0);
/* Raise the precision for the calculation */
_controlfp_s(NULL, _PC_64, _MCW_PC);
/* Do calculation */
/* Put back whatever precision was in effect before */
_controlfp_s(NULL, savedPrecision, _MCW_PC);
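
If you want to convince yourself without a Windows box, here is a toy model of the behavior described above. To be clear, sim_controlfp_s is a made-up stand-in and not the real CRT function: it only mimics the "write the masked bits, then report the resulting word" semantics on a plain variable. The SIM_ constants mirror the values I believe MSVC's float.h uses for _MCW_PC, _PC_64 and _PC_24, so treat them as assumptions too.

```c
#include <stddef.h>

/* Assumed to match _MCW_PC, _PC_64 and _PC_24 from MSVC's <float.h>. */
#define SIM_MCW_PC 0x00030000u
#define SIM_PC_64  0x00000000u
#define SIM_PC_24  0x00020000u

/* Pretend control word; not the real FPU state. */
static unsigned int sim_word = SIM_PC_24;

/* Hypothetical model: apply the masked bits first, then report the
   control word as it is AFTER the change. */
int sim_controlfp_s(unsigned int *current, unsigned int new_control,
                    unsigned int mask)
{
    sim_word = (sim_word & ~mask) | (new_control & mask);
    if (current != NULL) {
        *current = sim_word;
    }
    return 0;
}

/* My first attempt: save and set in a single call. */
unsigned int two_step(void)
{
    unsigned int saved;
    sim_word = SIM_PC_24;                 /* as if DirectX lowered it */
    sim_controlfp_s(&saved, SIM_PC_64, SIM_MCW_PC);
    sim_controlfp_s(NULL, saved, SIM_MCW_PC);
    return sim_word & SIM_MCW_PC;         /* precision bits at the end */
}

/* The working version: read with a zero mask, set, then restore. */
unsigned int three_step(void)
{
    unsigned int saved;
    sim_word = SIM_PC_24;
    sim_controlfp_s(&saved, 0, 0);
    sim_controlfp_s(NULL, SIM_PC_64, SIM_MCW_PC);
    sim_controlfp_s(NULL, saved, SIM_MCW_PC);
    return sim_word & SIM_MCW_PC;
}
```

In this model two_step() ends with the precision bits stuck at SIM_PC_64, while three_step() ends back at SIM_PC_24, which is exactly the difference I ran into.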