FMA application for windows_x86_64


Message boards : Number crunching : FMA application for windows_x86_64

Profile valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 624
Credit: 34,677,535
RAC: 1
Italy
Message 2143 - Posted: 24 Dec 2020, 12:18:29 UTC
Last modified: 14 Feb 2021, 17:42:23 UTC

I just deployed the FMA optimized version of the application for Windows x64. It had problems, some time ago, with the early releases of the AMD Ryzen CPUs (although, if I remember correctly, the problems were solved with BIOS upgrades).
The application is marked as "beta"; if you want to try it out, you need to enable "Run test applications" in your profile.

Profile [VENETO] boboviz
Joined: 12 Dec 13
Posts: 183
Credit: 4,641,505
RAC: 0
Italy
Message 2149 - Posted: 25 Dec 2020, 8:19:17 UTC - in response to Message 2143.

I just deployed the FMA optimized version of the application for Windows x64.


Well done, Valter!!
And... Happy Christmas

Profile valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 624
Credit: 34,677,535
RAC: 1
Italy
Message 2152 - Posted: 28 Dec 2020, 20:30:18 UTC - in response to Message 2149.
Last modified: 28 Dec 2020, 20:30:35 UTC

So far I've seen only one host that has problems with this application (no problems with SSE2 and AVX): https://gene.disi.unitn.it/test/show_host_detail.php?hostid=64398.
Error code 1 (I don't really know what it means...)

Profile Keith Myers
Joined: 26 Jun 20
Posts: 64
Credit: 15,299,594
RAC: 0
United States
Message 2153 - Posted: 29 Dec 2020, 6:17:24 UTC

Is there a debug parameter option for the application to produce better failure output?

Profile valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 624
Credit: 34,677,535
RAC: 1
Italy
Message 2155 - Posted: 29 Dec 2020, 9:55:19 UTC - in response to Message 2153.
Last modified: 29 Dec 2020, 16:17:06 UTC

Is there a debug parameter option for the application to produce better failure output?

Unfortunately not. This kind of problem is very low level, related to the specific computing architecture. Running the application from the command line might show some more information about it.
The only real solution would be loading the core dump into a debugger like gdb on the failing machine (this is on Linux; I don't know how to do this on Windows).

I was concerned about the error number itself; it seems very strange to me. I had expected to see some "Illegal instruction" errors (SIGILL, 0x04) in case some computers claimed to be able to execute FMA instructions but were not configured to do so properly, but that does not seem to be the case...
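For anyone checking a Linux host: the kernel lists the CPU feature flags it detected in /proc/cpuinfo, so a quick sketch like the one below shows whether the machine advertises sse2, avx and fma at all. It is only an assumption that this matches what the BOINC client reports to the server, and a present flag says nothing about whether the BIOS or the hardware actually handles FMA correctly, which is exactly the case described above.

```python
# Minimal sketch (Linux only): check which SIMD feature flags the kernel
# reports for the CPU. A missing "fma" flag means an FMA build should not
# run there; a present flag does NOT prove the BIOS/hardware is sound.
from pathlib import Path

WANTED = ("sse2", "avx", "fma")


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def simd_support(cpuinfo_text: str) -> dict[str, bool]:
    """Map each wanted flag to whether the kernel reports it."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in WANTED}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print(simd_support(path.read_text()))
    else:
        print("no /proc/cpuinfo here (not Linux?)")
```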

Shmya-2
Joined: 4 Jul 20
Posts: 4
Credit: 8,533,188
RAC: 0
Russia
Message 2156 - Posted: 30 Dec 2020, 4:50:25 UTC - in response to Message 2155.

For comparison, I'll run this fma app on my win10 machine.

Profile Keith Myers
Joined: 26 Jun 20
Posts: 64
Credit: 15,299,594
RAC: 0
United States
Message 2168 - Posted: 30 Dec 2020, 22:50:19 UTC - in response to Message 2156.

For comparison, I'll run this fma app on my win10 machine.

The best way would be to run the app with a task in a terminal, so you can capture the output for debugging.
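A minimal Python sketch of that approach, assuming Python 3 is available on the host. The command used in the demo is a hypothetical stand-in (a one-liner that exits with code 1); in practice you would substitute the application executable and the arguments BOINC passes to it. On Linux a crash caused by a signal shows up as a negative return code, so the helper names the signal (e.g. SIGILL); on Windows there are no signals, and a crash appears as a plain exit code instead.

```python
# Minimal sketch: run a command, capture its output and exit status, and
# translate a "killed by signal" status into a readable name.
# The command below is a placeholder, not the real TN-Grid application.
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """On POSIX, subprocess reports death by signal as a negative return code."""
    if returncode < 0:
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            return f"killed by unknown signal {-returncode}"
    return f"exit code {returncode}"


def run_and_capture(cmd: list[str]) -> tuple[str, str, str]:
    """Run cmd, returning (stdout, stderr, human-readable exit status)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.stdout, proc.stderr, describe_exit(proc.returncode)


if __name__ == "__main__":
    # Hypothetical stand-in for the real application: exits with code 1.
    out, err, status = run_and_capture([sys.executable, "-c", "import sys; sys.exit(1)"])
    print(status)  # exit code 1
```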

Mr P Hucker
Joined: 29 Sep 17
Posts: 37
Credit: 584,834
RAC: 0
United Kingdom
Message 2181 - Posted: 31 Jan 2021, 12:37:58 UTC - in response to Message 2143.
Last modified: 31 Jan 2021, 12:40:41 UTC

I just deployed the FMA optimized version of the application for Windows x64. It had problems, some time ago, with the early releases of the AMD Ryzen CPUs (although, if I remember correctly, the problems were solved with BIOS upgrades).
The application is marked as "beta"; if you want to try it out, you need to enable "Run test applications" in your profile.

I notice FMA and AVX running on my Ryzen 9 3900XT (brand new, with brand new motherboard with latest BIOS) and my i5 8600K. The FMA ones don't seem to be any faster than the AVX tasks. Are they doing more or different calculations?

Also, why, with both AVX and FMA, is the Ryzen so slow compared to the i5? It should be just over 3/4 of the speed per core, but it's only 1/2.

I don't know much about FMA and AVX, but from what I've read, FMA is a subset of AVX, which has me even more confused. Then there's FMA3 and FMA4, with Intel and AMD choosing one, then the other, then going back again...
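On the "more or different calculations" question, a small illustration may help. The CPU reports FMA as its own feature flag, separate from AVX (although FMA3 works on the same XMM/YMM registers), and an FMA instruction computes a*b + c with a single rounding instead of two. The sketch below emulates both variants in plain Python using exact fractions; it is not the project's code, and the thread does not say whether the FMA build's results actually differ from the AVX build's.

```python
# Minimal sketch: what "fused" means in fused multiply-add.
# Unfused: the product is rounded to double, then the sum is rounded again.
# Fused (emulated): compute a*b + c exactly with Fraction, then round once;
# float() of a Fraction is correctly rounded.
from fractions import Fraction


def unfused(a: float, b: float, c: float) -> float:
    return a * b + c  # two roundings


def fused(a: float, b: float, c: float) -> float:
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # one rounding of the exact result


a = 1.0 + 2.0**-52
b = 1.0 - 2.0**-52
c = -1.0
print(unfused(a, b, c))               # 0.0
print(fused(a, b, c) == -2.0**-104)   # True
```

Here the exact product is 1 - 2^-104, which rounds to 1.0 before the addition, so the two-step version loses the tiny remainder that the single-rounding version keeps.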

My Ryzen's completed tasks: http://gene.disi.unitn.it/test/results.php?hostid=63548&offset=0&show_names=0&state=4&appid=

My I5's completed tasks: http://gene.disi.unitn.it/test/results.php?hostid=65175&offset=0&show_names=0&state=4&appid=

Profile Keith Myers
Joined: 26 Jun 20
Posts: 64
Credit: 15,299,594
RAC: 0
United States
Message 2182 - Posted: 31 Jan 2021, 23:50:16 UTC - in response to Message 2181.

In Linux, my 3900X, 3950X and Epyc 7402P settled exclusively on FMA as the fastest app. It takes about two weeks of trying out AVX, SSE2 and FMA before declaring a winner.

Mr P Hucker
Joined: 29 Sep 17
Posts: 37
Credit: 584,834
RAC: 0
United Kingdom
Message 2184 - Posted: 1 Feb 2021, 11:38:10 UTC - in response to Message 2182.
Last modified: 1 Feb 2021, 11:38:31 UTC

In Linux, my 3900X, 3950X and Epyc 7402P settled exclusively on FMA as the fastest app. It takes about two weeks of trying out AVX, SSE2 and FMA before declaring a winner.

How come my two fast machines (see links in last post) are almost identical? I can't see how that would change in two weeks, the timings are indistinguishable. Unless it's an OS thing, I only use Windows 10.

Bryn Mawr
Joined: 23 Jun 20
Posts: 44
Credit: 14,260,228
RAC: 0
United Kingdom
Message 2185 - Posted: 1 Feb 2021, 12:38:20 UTC - in response to Message 2182.

In Linux, my 3900X, 3950X and Epyc 7402P settled exclusively on FMA as the fastest app. It takes about two weeks of trying out AVX, SSE2 and FMA before declaring a winner.


My 3600 gets FMA whereas my 3700X gets AVX, so I just leave them to it :-)

Profile valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 624
Credit: 34,677,535
RAC: 1
Italy
Message 2186 - Posted: 1 Feb 2021, 15:41:58 UTC - in response to Message 2184.
Last modified: 1 Feb 2021, 15:43:08 UTC

In Linux, my 3900X, 3950X and Epyc 7402P settled exclusively on FMA as the fastest app. It takes about two weeks of trying out AVX, SSE2 and FMA before declaring a winner.

How come my two fast machines (see links in last post) are almost identical? I can't see how that would change in two weeks, the timings are indistinguishable. Unless it's an OS thing, I only use Windows 10.

As it happens, the performance of the various versions is almost the same. If there is no clear winner, the server simply doesn't care which one it sends. Other important factors when comparing computers, for our application, are RAM speed and CPU cache size.

Mr P Hucker
Joined: 29 Sep 17
Posts: 37
Credit: 584,834
RAC: 0
United Kingdom
Message 2187 - Posted: 1 Feb 2021, 17:33:11 UTC - in response to Message 2186.

In Linux, my 3900X, 3950X and Epyc 7402P settled exclusively on FMA as the fastest app. It takes about two weeks of trying out AVX, SSE2 and FMA before declaring a winner.

How come my two fast machines (see links in last post) are almost identical? I can't see how that would change in two weeks, the timings are indistinguishable. Unless it's an OS thing, I only use Windows 10.

As it happens, the performance of the various versions is almost the same. If there is no clear winner, the server simply doesn't care which one it sends. Other important factors when comparing computers, for our application, are RAM speed and CPU cache size.

So the server actually calculates which is the most efficient version to send to each machine? That's cool :-)

So this "two weeks" being discussed is the server seeing which works best?

klepel
Joined: 13 Sep 17
Posts: 4
Credit: 89,201,015
RAC: 0
Peru
Message 2188 - Posted: 2 Feb 2021, 0:48:56 UTC - in response to Message 2186.

As it happens, the performance of the various versions is almost the same.

I am not so sure about that! I observed that the TN-Grid server, after a while, starts sending only one app type to a given computer and never sends the other apps from time to time to re-check whether another app type would work better.

I have also observed that when I bring a new computer online with a mix of TN-Grid WUs and Climateprediction.net WUs, the TN-Grid server starts sending only SSE2 WUs to that particular computer and never seems to check again whether another app type (FMA) works better for it.

That said, my computers (CPU) use the following apps:
AMD 1700x (RAM 3200, 16 GB, Linux): fma
AMD 2600 (RAM 3200, 16 GB, Win10): avx
AMD 2600 (RAM 3200, 16 GB, Win10): avx
AMD 2600 (RAM 3000, 8 GB, Win10): motherboard broken, but it was avx.
AMD 3700x (RAM 3600, 32 GB, Win10): sse2
AMD 3950x (RAM 3600, 32 GB, Linux): sse2 (computers with similar characteristics, http://gene.disi.unitn.it/test/results.php?hostid=64788 and http://gene.disi.unitn.it/test/results.php?hostid=60351, run fma and are 500 s faster)
This is why I asked for a working app_config to select a particular app type for TN-Grid. Unfortunately, I never worked it out!

floyd
Joined: 1 May 20
Posts: 1
Credit: 3,865,137
RAC: 0
Message 2189 - Posted: 2 Feb 2021, 12:14:58 UTC - in response to Message 2188.

As it happens, the performance of the various versions is almost the same.

I am not so sure about that! I observed that the TN-Grid server, after a while, starts sending only one app type to a given computer and never sends the other apps from time to time to re-check whether another app type would work better.

In my experience that happens when one application seems(!) to be much faster than the others. For example, my 1700X gets FMA tasks exclusively after a short start phase, because BOINC rated AVX and SSE2 at around 5.4 GFLOPS but FMA above 7. The reason is not that the application is really much faster, but that I experimented with the system configuration during the evaluation phase. Now, in regular operation, the measured speed of FMA is dropping towards that of the other applications. I wonder what will happen when they are about equal. Maybe BOINC will start trying other applications again.

I have also observed that when I bring a new computer online with a mix of TN-Grid WUs and Climateprediction.net WUs, the TN-Grid server starts sending only SSE2 WUs to that particular computer

I always seem to get a batch of the same type initially but other types soon after. On my computers, other than the 1700X, I still get a mix of applications after some weeks. BOINC hasn't found a favourite yet and it looks like it won't. Speed differences are small and the relative speeds keep changing.

AMD 3950x (RAM 3600, 32 GB, Linux): sse2 (computers with similar characteristics, http://gene.disi.unitn.it/test/results.php?hostid=64788 and http://gene.disi.unitn.it/test/results.php?hostid=60351, run fma and are 500 s faster)

See the application statistics for those hosts:
http://gene.disi.unitn.it/test/host_app_versions.php?hostid=64788
http://gene.disi.unitn.it/test/host_app_versions.php?hostid=60351
You'll notice that the speed difference between the fastest and the slowest application is about 2% or less. That's also my own experience. For the difference between those computers and yours there are more likely reasons; system load and clock speeds come to mind immediately.
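To put the "2% or less" figure and the 1700X ratings mentioned earlier on the same scale, the relative spread between the fastest and slowest version is simply (fastest - slowest) / slowest. A minimal sketch; the dictionary keys and values are illustrative inputs, and "above 7" is taken as 7.0.

```python
# Minimal sketch: relative spread between the fastest and slowest app
# versions, given per-version speed estimates (e.g. the GFLOPS figures
# shown on a host's application-details page).
def spread_percent(rates: dict[str, float]) -> float:
    """Percentage by which the fastest rate exceeds the slowest."""
    fastest = max(rates.values())
    slowest = min(rates.values())
    return (fastest - slowest) / slowest * 100.0


# 1700X example: AVX and SSE2 around 5.4, FMA "above 7" (taken as 7.0).
print(round(spread_percent({"sse2": 5.4, "avx": 5.4, "fma": 7.0}), 1))  # 29.6
```

So the early FMA rating on that host looked roughly 30% better than the others, far more than the ~2% spread seen on the hosts linked above, which fits the explanation that the rating was skewed by configuration changes during evaluation.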

This is why I asked for a working app_config to select a particular app type for TN-Grid. Unfortunately, I never worked it out!

You can't do that with app_config. You'd need to run an Anonymous Platform via app_info but that's tricky. I suggest you don't touch that without a better reason.

Mr P Hucker
Joined: 29 Sep 17
Posts: 37
Credit: 584,834
RAC: 0
United Kingdom
Message 2190 - Posted: 2 Feb 2021, 18:27:40 UTC - in response to Message 2188.
Last modified: 2 Feb 2021, 18:27:48 UTC

As it happens, the performance of the various versions is almost the same.

I am not so sure about that! I observed that the TN-Grid server, after a while, starts sending only one app type to a given computer and never sends the other apps from time to time to re-check whether another app type would work better.

Why would it need to? Unless your computer has changed, e.g. you upgraded it, which should cause a re-evaluation. I'm not sure how BOINC or the server would know you changed it, though.

Profile valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 624
Credit: 34,677,535
RAC: 1
Italy
Message 2192 - Posted: 3 Feb 2021, 13:08:25 UTC - in response to Message 2190.
Last modified: 3 Feb 2021, 13:08:49 UTC

The server software is supposed to try different apps to see which one is faster and more reliable. It needs at least N successfully validated results for each app (I don't know the exact number; it could be 10 or 20) before it can compute an APR (Average Processing Rate), which took the place of the DCF used some time ago. You can see the APR numbers for each app when you look at your computer details under your account. It's supposed to pick the fastest app and occasionally send one of the others to see how they do. (Documentation about this feature is not easy to find...)
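The policy described above can be sketched roughly as follows. This is a hypothetical illustration, not BOINC's scheduler code: MIN_SAMPLES stands in for the unknown N, EXPLORE_PROB for "occasionally", and the APR is assumed to be a plain running mean of per-task rates.

```python
# Hypothetical sketch of the selection policy described above, NOT the
# actual BOINC scheduler. Assumptions: APR = plain mean of per-task rates,
# an APR only counts after MIN_SAMPLES validated results, and a small
# share of tasks goes to a non-best version to keep re-measuring it.
import random
from dataclasses import dataclass
from typing import Optional

MIN_SAMPLES = 10     # hypothetical threshold ("could be 10 or 20")
EXPLORE_PROB = 0.1   # hypothetical share of "see how it goes" tasks


@dataclass
class VersionStats:
    n: int = 0
    mean_rate: float = 0.0  # e.g. GFLOPS

    def add_result(self, rate: float) -> None:
        # Incremental mean: no need to keep every past result.
        self.n += 1
        self.mean_rate += (rate - self.mean_rate) / self.n

    def apr(self) -> Optional[float]:
        return self.mean_rate if self.n >= MIN_SAMPLES else None


def choose_version(stats: dict[str, VersionStats], rng: random.Random) -> str:
    # Versions without a usable APR yet are tried first.
    untested = [name for name, s in stats.items() if s.apr() is None]
    if untested:
        return rng.choice(untested)
    best = max(stats, key=lambda name: stats[name].mean_rate)
    if len(stats) > 1 and rng.random() < EXPLORE_PROB:
        return rng.choice([name for name in stats if name != best])
    return best
```

In this sketch a version only loses its lead when new results move its running mean below another version's; the real thresholds and averaging used by the server may differ.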

Bryn Mawr
Joined: 23 Jun 20
Posts: 44
Credit: 14,260,228
RAC: 0
United Kingdom
Message 2193 - Posted: 3 Feb 2021, 13:48:33 UTC - in response to Message 2192.

The server software is supposed to try different apps to see which one is faster and more reliable. It needs at least N successfully validated results for each app (I don't know the exact number; it could be 10 or 20) before it can compute an APR (Average Processing Rate), which took the place of the DCF used some time ago. You can see the APR numbers for each app when you look at your computer details under your account. It's supposed to pick the fastest app and occasionally send one of the others to see how they do. (Documentation about this feature is not easy to find...)


Interesting, and it does show that it works: the machine I updated the other week was almost exclusively FMA (R5 2600) and has now swapped to almost exclusively AVX (R7 3700X). The odd thing is that my old, old machine, which started as an FX8370 and then inherited the R5 2600, shows that it was getting SSE2?

Profile Keith Myers
Joined: 26 Jun 20
Posts: 64
Credit: 15,299,594
RAC: 0
United States
Message 2199 - Posted: 12 Feb 2021, 19:38:02 UTC - in response to Message 2192.

The server software is supposed to try different apps to see which one is faster and more reliable. It needs at least N successfully validated results for each app (I don't know the exact number; it could be 10 or 20) before it can compute an APR (Average Processing Rate), which took the place of the DCF used some time ago. You can see the APR numbers for each app when you look at your computer details under your account. It's supposed to pick the fastest app and occasionally send one of the others to see how they do. (Documentation about this feature is not easy to find...)

You need 10 valid tasks to set an APR, but in actuality you need 11 valid tasks to cause the server software to recognize an APR change and display it in the client.

Aurum
Joined: 18 Jul 18
Posts: 97
Credit: 291,386,295
RAC: 0
United States
Message 2200 - Posted: 13 Feb 2021, 18:05:06 UTC

Is there any way to trigger the Instruction Set retest???

As mentioned earlier upgrading a CPU or Merging will confound that computer's performance history.






Copyright © 2024 CNR-TN & UniTN