FMA problems (Ryzen and others?)
Message boards : Number crunching : FMA problems (Ryzen and others?)

NEO83
Joined: 22 Oct 16
Posts: 5
Credit: 857,965
RAC: 0
Germany
Message 1047 - Posted: 11 Apr 2017, 6:52:56 UTC

So I tried a few FMA WUs again after a BIOS update this morning on my MSI board, but I don't know whether it included an AGESA update.

http://gene.disi.unitn.it/test/results.php?userid=426

A few worked, but most of the FMA ones are broken again ... so I don't know if this is only a Ryzen problem.

Millenium
Joined: 26 Jul 17
Posts: 16
Credit: 2,460,009
RAC: 0
Italy
Message 1097 - Posted: 26 Jul 2017, 20:50:51 UTC

I have an AMD Ryzen 1700 and I'm trying this project. I still have to update the BIOS of my motherboard; I'll do that in the next few days. Anyway, as you can see, about half of the WUs I got errored out: one SSE2 and most of the FMA ones. Now I have some AVX ones, so let's see what happens.
Interestingly, so far this is the only project giving me this trouble, but maybe it is the only one using advanced instructions, so that would make sense. Anyway, I'll get new WUs once I update the BIOS.

ikeke1
Joined: 28 Jul 17
Posts: 1
Credit: 241,144
RAC: 0
Message 1100 - Posted: 29 Jul 2017, 9:50:31 UTC
Last modified: 29 Jul 2017, 10:25:08 UTC

All FMA WUs error out on my R7 1700 as well, for the same reason on all of them.

Latest prod BIOS, AGESA 1.0.0.6a update.

https://valid.x86.fr/sry3nt

http://gene.disi.unitn.it/test/result.php?resultid=16117256

*** Dump of thread ID 9428 (state: Waiting): ***

- Information -
Status: Wait Reason: UserRequest, , Kernel Time: 6250000.000000, User Time: 349218752.000000, Wait Time: 69220064.000000

- Unhandled Exception Record -
Reason: Privileged Instruction (0xc0000096) at address 0x00000000004f5458


http://gene.disi.unitn.it/test/result.php?resultid=16117138

*** Dump of thread ID 12016 (state: Waiting): ***

- Information -
Status: Wait Reason: UserRequest, , Kernel Time: 2343750.000000, User Time: 89218752.000000, Wait Time: 69171816.000000

- Unhandled Exception Record -
Reason: Privileged Instruction (0xc0000096) at address 0x00000000004f5458
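
For anyone comparing dumps across hosts, the exception records above could be tallied with a short script. This is only a rough sketch in Python: the function names `parse_exception_records` and `summarize` are made up for illustration, and the regular expression assumes the "Reason: ... (0x...) at address 0x..." layout seen in these two dumps. The two NTSTATUS names in the table are the standard Windows names for those codes.

```python
import re
from collections import Counter

# Standard Windows NTSTATUS names for the two codes seen in this thread.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000096: "STATUS_PRIVILEGED_INSTRUCTION",
}

# Assumed layout, taken from the dumps in this thread, e.g.:
#   Reason: Privileged Instruction (0xc0000096) at address 0x00000000004f5458
REASON_RE = re.compile(
    r"Reason:\s*(?P<reason>.+?)\s*\((?P<code>0x[0-9a-fA-F]+)\)"
    r"\s*at address\s*(?P<addr>0x[0-9a-fA-F]+)"
)


def parse_exception_records(stderr_text):
    """Return a list of (reason, code, address) tuples found in the text."""
    records = []
    for m in REASON_RE.finditer(stderr_text):
        records.append(
            (m.group("reason"), int(m.group("code"), 16), int(m.group("addr"), 16))
        )
    return records


def summarize(records):
    """Count how often each (status name, faulting address) pair occurs."""
    return Counter(
        (NTSTATUS_NAMES.get(code, hex(code)), hex(addr))
        for _, code, addr in records
    )
```

Feeding it the stderr text of several failed results would show whether they all stop at the same faulting address, as the two dumps above do.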

[B@P] Daniel
Volunteer developer
Joined: 19 Oct 16
Posts: 90
Credit: 2,205,103
RAC: 0
Poland
Message 1102 - Posted: 29 Jul 2017, 17:19:04 UTC
Last modified: 29 Jul 2017, 17:19:16 UTC

It looks like the TN-Grid app triggers some new bug on Ryzen CPUs that is not fixed yet. I have created a post on the AMD forum to let them know about it: https://community.amd.com/message/2814366

Millenium
Joined: 26 Jul 17
Posts: 16
Credit: 2,460,009
RAC: 0
Italy
Message 1110 - Posted: 18 Aug 2017, 19:50:14 UTC
Last modified: 18 Aug 2017, 20:06:33 UTC

I updated my BIOS, and things improved a lot.

AMD Ryzen 1700
ASUS Prime B350 Plus, updated to 806 BIOS version, AGESA is now 1.0.0.6a

The only WUs erroring out now are some of the SSE2 ones. Some fail at random during processing (but most complete successfully), and the errors are always similar:

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00000000004f5845 read attempt to address 0x05289720

Engaging BOINC Windows Runtime Debugger...


Other WUs have no problems: the FMA ones are perfectly fine and valid, and some SSE2 ones also complete successfully and validate.

Weird. From reading the thread and from my experience with the old BIOS, I expected the FMA ones to give me trouble, but with the updated BIOS they are working fine. Anyway, I'll crunch more and see what happens. In the next few days I'll also crunch some WUs on Linux.
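
Before moving to Linux, it may be worth checking which of the instruction-set extensions mentioned in this thread the kernel reports for the CPU. The following is a minimal sketch in Python, assuming an x86 Linux system where `/proc/cpuinfo` has a `flags` line; the helper names `reported_flags` and `extension_support` are hypothetical.

```python
# Extensions discussed in this thread, spelled as they appear in /proc/cpuinfo.
FLAGS_OF_INTEREST = ("sse2", "avx", "avx2", "fma")


def reported_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of the text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def extension_support(cpuinfo_text):
    """Map each extension of interest to True/False depending on the flags."""
    flags = reported_flags(cpuinfo_text)
    return {name: name in flags for name in FLAGS_OF_INTEREST}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not on Linux, or the file is unreadable
    for name, ok in extension_support(text).items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

Note that a reported `fma` flag only says the CPU advertises the extension. It says nothing about whether the FMA WUs will actually run without errors.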

Millenium
Joined: 26 Jul 17
Posts: 16
Credit: 2,460,009
RAC: 0
Italy
Message 1111 - Posted: 18 Aug 2017, 23:10:33 UTC

Update: some FMA WUs failed. What I noticed is that they all failed within a minute of each other, as if something in that minute made them fail.
I was just playing a game, nothing else. The WUs of the other projects are fine anyway, so it is something related to this project.
The error is different from the one in the failed SSE2 WUs:

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Privileged Instruction (0xc0000096) at address 0x00000000004f5458

Engaging BOINC Windows Runtime Debugger...


Now I have received a bunch of AVX ones, so let's see what happens with them.

Millenium
Joined: 26 Jul 17
Posts: 16
Credit: 2,460,009
RAC: 0
Italy
Message 1112 - Posted: 20 Aug 2017, 17:49:41 UTC

After more WUs I had some errors: not a lot overall, but most of the FMA ones failed. Meh, I wonder what causes them, and why only on this project.

Millenium
Joined: 26 Jul 17
Posts: 16
Credit: 2,460,009
RAC: 0
Italy
Message 1129 - Posted: 18 Oct 2017, 14:28:39 UTC

I have been crunching on a Gridcoin pool for a month or so, so the RAC you see on this account is of course meaningless. Anyway, the error rate on my Ryzen CPU is almost 0: right now, for example, my host page shows 581 valid WUs and only 1 error.

I have no idea what changed in these weeks. I did not change anything in my PC, and I only updated the BIOS back in August.

Any update from the admin of the project on this problem?

valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 623
Credit: 34,677,535
RAC: 1
Italy
Message 1131 - Posted: 18 Oct 2017, 14:44:29 UTC - in response to Message 1129.

I have been crunching on a Gridcoin pool for a month or so, so the RAC you see on this account is of course meaningless. Anyway, the error rate on my Ryzen CPU is almost 0: right now, for example, my host page shows 581 valid WUs and only 1 error.

I have no idea what changed in these weeks. I did not change anything in my PC, and I only updated the BIOS back in August.

Any update from the admin of the project on this problem?

We didn't change anything recently. Keep in mind that, if I remember correctly, the bug affects only some Ryzen CPUs with an unpatched BIOS, on Windows (and you have to accept BETA work in your profile in order to get the FMA application for Windows).

[VENETO] boboviz
Joined: 12 Dec 13
Posts: 183
Credit: 4,641,505
RAC: 0
Italy
Message 1134 - Posted: 19 Oct 2017, 10:21:51 UTC - in response to Message 1129.

I have no idea what changed in these weeks. I did not change anything in my PC, and I only updated the BIOS back in August.


Asus has released two more BIOS versions for your motherboard: 0808 and the most recent, 0902, with AGESA 1.0.0.6b.

mmonnin
Joined: 24 Oct 16
Posts: 14
Credit: 4,519,646
RAC: 0
United States
Message 1138 - Posted: 20 Oct 2017, 10:12:53 UTC

Fired up this project on my 1950X for the Formula BOINC event. I think it's the first project I've run on it that makes use of AVX2 or FMA. No invalids or errors yet.

Copyright © 2024 CNR-TN & UniTN