FMA problems (Ryzen and others?)

Message boards : Number crunching : FMA problems (Ryzen and others?)

NEO83
Joined: 22 Oct 16
Posts: 5
Credit: 856,495
RAC: 4
Germany
Message 996 - Posted: 20 Mar 2017, 16:05:32 UTC
Last modified: 20 Mar 2017, 16:08:16 UTC

I got my first few FMA app tasks on my Ryzen... not good results.

http://gene.disi.unitn.it/test/results.php?userid=426&offset=0&show_names=0&state=6&appid=

All of them errored out after a few minutes...

I don't know if it is a problem with my CPU or a general problem. The AVX app works well so far, so please look into this; otherwise I'll have to stop crunching because too many WUs are failing.

valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 616
Credit: 34,514,943
RAC: 395
Italy
Message 997 - Posted: 20 Mar 2017, 16:46:55 UTC
Last modified: 21 Mar 2017, 14:31:03 UTC

I marked the Windows x64 FMA application version as 'beta', so only users who agreed to 'Run test applications' in their profile should now get it. I know that Ryzen has had some problems with FMA3 (although, AFAIK, we don't use FMA3 opcodes here; Daniel may be able to say more about that), see here: http://forum.hwbot.org/showthread.php?t=167605

So, if you want to test the Windows x64 FMA application, especially on other AMD or AMD Ryzen hardware, please check 'Run test applications' in your profile, run a few workunits, and give feedback here.

NEO83
Joined: 22 Oct 16
Posts: 5
Credit: 856,495
RAC: 4
Germany
Message 998 - Posted: 20 Mar 2017, 17:02:14 UTC

I enabled it for my AMD Bristol Ridge machine, but I'll have to wait for some FMA WUs because it still has a lot of AVX work queued.

[B@P] Daniel
Volunteer developer
Joined: 19 Oct 16
Posts: 90
Credit: 2,205,103
RAC: 0
Poland
Message 1000 - Posted: 20 Mar 2017, 22:49:45 UTC
Last modified: 20 Mar 2017, 23:17:54 UTC

The FMA app uses FMA3 instructions, which are supported by both AMD and Intel CPUs. FMA4 is supported by AMD only, so the FMA app does not use it.
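A quick way to see which of the two extensions a CPU reports is to query CPUID. The sketch below is only an illustration (not code from the TN-Grid app) and assumes GCC or Clang on x86-64, whose `<cpuid.h>` provides `__get_cpuid`: FMA3 is reported in CPUID leaf 1, ECX bit 12, and FMA4 in extended leaf 0x80000001, ECX bit 16. A complete check before running FMA3 code would also confirm that the OS has enabled AVX register state (OSXSAVE/XGETBV), which this sketch leaves out.

```cpp
#include <cassert>
#include <cpuid.h>  // GCC/Clang wrapper around the x86 CPUID instruction

// FMA3 (supported by both Intel and AMD): CPUID leaf 1, ECX bit 12.
bool has_fma3() {
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false;  // leaf unavailable
    return ((ecx >> 12) & 1u) != 0;
}

// FMA4 (AMD-only extension): extended CPUID leaf 0x80000001, ECX bit 16.
bool has_fma4() {
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx)) return false;  // leaf unavailable
    return ((ecx >> 16) & 1u) != 0;
}
```

Because the two features live in different CPUID leaves, a CPU can report either one, both, or neither; an app built for FMA3 only needs `has_fma3()` to be true.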

The error "Reason: Privileged Instruction" is interesting. It is reported when a user-space app tries to execute an instruction that is only allowed in kernel mode. Maybe Ryzen incorrectly treats some FMA3 instruction as privileged and raises this error? I suspect this is another FMA-related bug in Ryzen, so a microcode update would be needed.
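For reference, the numeric code that accompanies this error in the crash reports, 0xc0000096, is a Windows NTSTATUS value (STATUS_PRIVILEGED_INSTRUCTION). A CPU that simply lacked FMA3 would normally fail with a different status, STATUS_ILLEGAL_INSTRUCTION (0xC000001D), which is one reason the privileged-instruction error looks odd here. The sketch below splits an NTSTATUS into its documented bit fields and names a few common exception codes; `split_ntstatus` and `ntstatus_name` are illustrative helper names, not Windows APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Bit layout of a Windows NTSTATUS value:
//   bits 31-30 severity (0 success, 1 informational, 2 warning, 3 error)
//   bit  29    customer-defined flag
//   bit  28    reserved
//   bits 27-16 facility
//   bits 15-0  status code
struct NtStatusFields {
    unsigned severity;
    bool customer;
    unsigned facility;
    unsigned code;
};

NtStatusFields split_ntstatus(std::uint32_t status) {
    return {(status >> 30) & 0x3u, ((status >> 29) & 0x1u) != 0,
            (status >> 16) & 0xFFFu, status & 0xFFFFu};
}

// A few exception codes relevant to this thread (values as defined in ntstatus.h).
std::string ntstatus_name(std::uint32_t status) {
    switch (status) {
        case 0xC0000005u: return "STATUS_ACCESS_VIOLATION";
        case 0xC000001Du: return "STATUS_ILLEGAL_INSTRUCTION";
        case 0xC0000096u: return "STATUS_PRIVILEGED_INSTRUCTION";
        default: return "unknown";
    }
}
```

Splitting 0xc0000096 gives severity 3 (error), facility 0 and code 0x96.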

I have also read that a conflict between several antivirus programs (or similar software) may cause this. Do you run more than one such program?

Edit: please check whether there is a BIOS update for your motherboard. If there is, please install it, especially if its release notes say that it provides a microcode update.
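To confirm that a BIOS update actually changed the microcode, one can compare the microcode revision the OS reports before and after flashing. On Linux/x86, /proc/cpuinfo contains a `microcode` field for each core; the sketch below extracts the first one from text in that format. Windows exposes the revision differently and is not covered here, and `microcode_revision` is a hypothetical helper, not an existing tool.

```cpp
#include <cassert>
#include <istream>
#include <optional>
#include <sstream>
#include <string>

// Returns the value of the first "microcode" line in /proc/cpuinfo-style
// text ("key<whitespace>: value"), or std::nullopt if there is none.
std::optional<std::string> microcode_revision(std::istream& in) {
    std::string line;
    while (std::getline(in, line)) {
        auto colon = line.find(':');
        if (colon == std::string::npos) continue;
        std::string key = line.substr(0, colon);
        key.erase(key.find_last_not_of(" \t") + 1);  // strip padding before ':'
        if (key != "microcode") continue;
        std::string value = line.substr(colon + 1);
        value.erase(0, value.find_first_not_of(" \t"));  // strip padding after ':'
        return value;
    }
    return std::nullopt;
}
```

Usage on a Linux box: `std::ifstream f("/proc/cpuinfo"); auto rev = microcode_revision(f);` (needs `<fstream>`), run once before and once after the BIOS update.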

NEO83
Joined: 22 Oct 16
Posts: 5
Credit: 856,495
RAC: 4
Germany
Message 1001 - Posted: 21 Mar 2017, 4:54:57 UTC

I am waiting for a BIOS update; the latest one is already installed, and I don't use any antivirus software.

In the meantime I checked the FMA app on my Bristol Ridge machine: no problems there, so it's either Ryzen or just my system, I don't know... I will wait for the next BIOS update with a microcode update in it and will give feedback here.

koschi
Joined: 22 Oct 16
Posts: 25
Credit: 17,930,382
RAC: 39
Germany
Message 1005 - Posted: 21 Mar 2017, 14:09:09 UTC

On Linux (4.10) the FMA app works fine on my R7.

http://gene.disi.unitn.it/test/results.php?hostid=2506&offset=140&show_names=1&state=4&appid=

Seems a tiny bit (30s) slower than the AVX app though.

Krümel
Joined: 31 Oct 16
Posts: 19
Credit: 14,052,147
RAC: 44
Germany
Message 1008 - Posted: 22 Mar 2017, 16:10:19 UTC
Last modified: 22 Mar 2017, 16:13:03 UTC

Same here with a Ryzen 7 1700 and Windows 10.
FMA WUs won't work.

http://gene.disi.unitn.it/test/result.php?resultid=7952994

[B@P] Daniel
Volunteer developer
Joined: 19 Oct 16
Posts: 90
Credit: 2,205,103
RAC: 0
Poland
Message 1009 - Posted: 22 Mar 2017, 19:42:25 UTC

I tried disassembling the compiled binary, and things got interesting. All crash reports here mention the following error:

Privileged Instruction (0xc0000096) at address 0x00000000004f5458


When I checked the instruction at this address, I got "STI", which is a sensitive instruction according to https://support.microsoft.com/en-nz/help/114473/intel-privileged-and-sensitive-instructions. But when I disassembled the whole app, it turned out that 0x00000000004f5458 is not the start of any instruction: the valid instruction starts one byte earlier, at 0x00000000004f5457. The instruction there is "vmovsd", an AVX instruction, which maps to line 160 in pc.cpp. It looks like Ryzen jumped to an invalid address, one byte into that instruction, and executed whatever was there, which turned out to decode as an STI instruction.

Valterc, do you know if the Windows 64-bit FMA app works fine on other CPUs? I wonder whether this problem affects only Ryzen CPUs or all users with FMA-capable CPUs and 64-bit Windows.
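Why an STI would show up one byte into a vmovsd is easy to see from the VEX encoding. We don't know the exact bytes at 0x00000000004f5457; assuming the compiler used the common two-byte VEX form for a scalar load into xmm0-xmm7, the instruction would be `C5 FB 10 /r`, and its second byte, 0xFB, is exactly the one-byte opcode of STI. The sketch below decodes that payload byte; the example bytes (`vmovsd xmm0, qword ptr [rax]`) are hypothetical, chosen to match that assumption.

```cpp
#include <cassert>
#include <cstdint>

// Fields packed into the second byte of a two-byte VEX prefix (C5 xx).
// R and vvvv are stored inverted in the encoding, so they are flipped back here.
struct Vex2Fields {
    unsigned r;     // extension bit for ModRM.reg (register index bit 3)
    unsigned vvvv;  // extra source register; encoded as 1111b (0 here) when unused
    unsigned l;     // vector length: 0 = scalar/128-bit, 1 = 256-bit
    unsigned pp;    // implied legacy prefix: 0 none, 1 = 66, 2 = F3, 3 = F2
};

constexpr std::uint8_t kVex2Escape = 0xC5;  // first byte of a two-byte VEX prefix
constexpr std::uint8_t kStiOpcode = 0xFB;   // one-byte opcode of STI

Vex2Fields decode_vex2_payload(std::uint8_t b) {
    return {((b >> 7) & 1u) ^ 1u, ((b >> 3) & 0xFu) ^ 0xFu, (b >> 2) & 1u, b & 3u};
}

// Hypothetical encoding of "vmovsd xmm0, qword ptr [rax]":
// C5 (VEX escape), FB (payload), 10 (MOVSD load opcode), 00 (ModRM: xmm0, [rax]).
constexpr std::uint8_t kVmovsdExample[] = {0xC5, 0xFB, 0x10, 0x00};
```

Decoded from offset 0, the payload gives R = 0, vvvv unused, L = 0 and an implied F2 prefix, i.e. the scalar-double form of opcode 10, which is vmovsd. Decoded from offset 1, the first byte is 0xFB, which the CPU executes as STI; in user mode that raises a general-protection fault that Windows reports as a privileged instruction.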

valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 616
Credit: 34,514,943
RAC: 395
Italy
Message 1010 - Posted: 22 Mar 2017, 20:24:29 UTC - in response to Message 1009.
Last modified: 22 Mar 2017, 20:26:30 UTC

I can give more information tomorrow, but if you check my Windows hosts you will see one of them (an i7-4770K) crunching happily with the FMA app.

koschi
Joined: 22 Oct 16
Posts: 25
Credit: 17,930,382
RAC: 39
Germany
Message 1011 - Posted: 22 Mar 2017, 20:58:09 UTC
Last modified: 22 Mar 2017, 20:58:26 UTC

The previous Windows FMA app (the one still producing a large output file): was that provided by Daniel and built with MinGW/GCC?

omega01
Joined: 26 Feb 17
Posts: 1
Credit: 398,347
RAC: 0
Hungary
Message 1012 - Posted: 23 Mar 2017, 1:01:01 UTC

I tested the FMA app on my system with:
AMD FX(tm)-6300 Six-Core Processor [Family 21 Model 2 Stepping 0]
Microsoft Windows 8.1 Professional x64 Edition (06.03.9600.00)
No problems here. All WUs finished, and the run time is nearly the same as with the AVX app.

koschi
Joined: 22 Oct 16
Posts: 25
Credit: 17,930,382
RAC: 39
Germany
Message 1013 - Posted: 23 Mar 2017, 9:00:32 UTC
Last modified: 23 Mar 2017, 9:09:18 UTC

I suspect the MS compilers can't produce error-free FMA binaries for Ryzen, or something like that.

The FMA3 problem found in Ryzen can be triggered by running a program called "flops" under Windows (compiled with the MS compiler). It does not trigger under Linux (using GCC). From what valterc wrote in the "Optimizing" thread, I understood that you usually use MSVC to build the binary.
Part of the optimization was then achieved by Daniel compiling the old app with MinGW/GCC (plus further modifications later on, of course).

My question is, are you still using MSVC to build the new app?

Can anyone with a Windows Ryzen system run the old FMA app (big output file) on an old WU? I packaged it again (66 MB!!!); everything is included in the file below:

http://kerbodyne.com/boinc/tngrid_gene_vv.zip

You should be able to run the test like this, analogous to how I would do it under Linux:
pc.exe 1488394767724_wu-85_tile.txt out_file 0.05 1 701


If this still works, it would point at the MS compiler. Maybe Daniel can then just recompile the current code with MinGW/GCC, and Windows users will be able to run TN-Grid FMA apps on Ryzen again.

[B@P] Daniel
Send message
Joined: 19 Oct 16
Posts: 90
Credit: 2,205,103
RAC: 0
Poland
Message 1014 - Posted: 23 Mar 2017, 10:46:34 UTC
Last modified: 23 Mar 2017, 10:56:17 UTC

Both the Windows and Linux apps are compiled with gcc. The Linux app was compiled with gcc 4.8.5 and the Windows one with gcc 5.4.0, so its code is probably better optimized than the Linux one. There are also some system-specific changes, which may also play a role here.

The new app has new code to decompress the input file and to filter out some output results. The code which performs the actual calculations was not changed, so the previous version would most probably crash on Ryzen too.

The current Windows apps were compiled by me too; Valterc asked me to do this. Yesterday I downloaded the FMA app version from the TN-Grid server and verified that it is the same as the one I sent him.
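One way to avoid guessing which toolchain built a given binary is to have the app report it itself, e.g. in its stderr output. The sketch below is hypothetical (not something the TN-Grid app is known to do) and relies only on the compilers' predefined macros: GCC, including MinGW builds, defines `__GNUC__`, `__GNUC_MINOR__` and `__GNUC_PATCHLEVEL__`; Clang defines them too for compatibility, so it is tested first; MSVC defines `_MSC_VER` instead.

```cpp
#include <cassert>
#include <string>

// Returns a short description of the compiler that built this translation unit,
// based purely on predefined macros.
std::string compiler_id() {
#if defined(__clang__)
    return "clang " + std::to_string(__clang_major__) + "." +
           std::to_string(__clang_minor__) + "." + std::to_string(__clang_patchlevel__);
#elif defined(__GNUC__)
    return "gcc " + std::to_string(__GNUC__) + "." +
           std::to_string(__GNUC_MINOR__) + "." + std::to_string(__GNUC_PATCHLEVEL__);
#elif defined(_MSC_VER)
    return "msvc " + std::to_string(_MSC_VER);
#else
    return "unknown";
#endif
}
```

Under the builds described in this post, such a line would read "gcc 4.8.5" for the Linux app and "gcc 5.4.0" for the Windows app.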

koschi
Send message
Joined: 22 Oct 16
Posts: 25
Credit: 17,930,382
RAC: 39
Germany
Message 1015 - Posted: 23 Mar 2017, 12:29:14 UTC
Last modified: 23 Mar 2017, 12:29:37 UTC

Thanks for the info ;-)
Would have loved to blame it on Microsoft :-D

[B@P] Daniel
Volunteer developer
Joined: 19 Oct 16
Posts: 90
Credit: 2,205,103
RAC: 0
Poland
Message 1016 - Posted: 28 Mar 2017, 21:41:48 UTC

Any update on this? People on the hwbot forum say that ASUS released a new BIOS which resolved the problem for them. Did you have a chance to test it, or the one for your mainboard if available?

NEO83
Send message
Joined: 22 Oct 16
Posts: 5
Credit: 856,495
RAC: 4
Germany
Message 1018 - Posted: 29 Mar 2017, 14:30:10 UTC
Last modified: 29 Mar 2017, 14:30:30 UTC

My ASUS mainboard died a few days ago and I switched to an MSI board, but MSI removed the beta BIOS with the new AGESA update, so I am not able to try it, and ASUS had none available because of a serious bug in the AGESA update.

MSI said in their forum that anybody who shares links to the removed BIOS, or shares the BIOS itself, will be set to read-only for 14 days, so I think the bug must be really serious and we need to wait... I can try a few if you want?

Krümel
Send message
Joined: 31 Oct 16
Posts: 19
Credit: 14,052,147
RAC: 44
Germany
Message 1027 - Posted: 5 Apr 2017, 18:56:30 UTC

Thanks to an AGESA update in the new beta BIOS for my ASUS Prime B350-Plus, the FMA app is now working fine on my R7 1700.
It really was the FMA bug of the new Ryzen processor that caused the problem.

Krümel
Send message
Joined: 31 Oct 16
Posts: 19
Credit: 14,052,147
RAC: 44
Germany
Message 1028 - Posted: 6 Apr 2017, 5:57:15 UTC

OK, something is still strange.
Most FMA WUs worked properly, but 4 errored out with "Privileged Instruction".
The new AGESA still seems to be "beta". :)

[VENETO] boboviz
Joined: 12 Dec 13
Posts: 182
Credit: 4,633,870
RAC: 24
Italy
Message 1029 - Posted: 6 Apr 2017, 14:17:26 UTC - in response to Message 1027.

Thanks to an AGESA update in the new beta BIOS for my ASUS Prime B350-Plus, the FMA app is now working fine on my R7 1700.
It really was the FMA bug of the new Ryzen processor that caused the problem.


Did you install a BETA BIOS??? :-O

Krümel
Send message
Joined: 31 Oct 16
Posts: 19
Credit: 14,052,147
RAC: 44
Germany
Message 1030 - Posted: 7 Apr 2017, 18:58:01 UTC - in response to Message 1029.

I'm the risky kind of guy... ;)
Well, everything about Ryzen mainboards is kind of "beta". :D



Copyright © 2024 CNR-TN & UniTN