Message boards : Number crunching : FMA application for windows_x86_64
| Author | Message | 
|---|---|
| I just deployed the FMA optimized version of the application for Windows x64. It had problems, some time ago, with the early releases of the AMD Ryzen cpus (although, if I remember correctly, the problems were solved with BIOS upgrades).  | |
| ID: 2143 ·  Reply
Quote | |
| I just deployed the FMA optimized version of the application for Windows x64. Well done, Valter!! And... Happy Christmas | |
| ID: 2149 ·  Reply
Quote | |
| Got only one host, so far, that has problems with this application (no problems with SSE2 and AVX), this one: https://gene.disi.unitn.it/test/show_host_detail.php?hostid=64398. | |
| ID: 2152 ·  Reply
Quote | |
| Is there a debug parameter option for the application to produce better failure output? | |
| ID: 2153 ·  Reply
Quote | |
| Is there a debug parameter option for the application to produce better failure output? Unfortunately not. This kind of problem is very low level, related to the specific computational architecture. Running the application from the command line might show some more info. The only real solution would be to load the core dump into a debugger like gdb on the failing machine (this is on Linux; I don't know how to do this on Windows). I was concerned about the error number itself; it sounds very strange to me. I had expected to see some "Illegal instruction" errors (SIGILL, 0x04) in case some computers claimed to be able to execute FMA instructions but were not correctly configured to do so, but this does not seem to be the case... | |
| ID: 2155 ·  Reply
Quote | |
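To make the "claims to support FMA" point concrete, here is a minimal sketch of how one could check which instruction sets the operating system reports for a host. It is not part of the project's application. The function names are made up for illustration, the reader only works on Linux (it parses `/proc/cpuinfo`), and it assumes the FMA build needs the `fma` (FMA3) and `avx` flags, which the thread does not confirm.

```python
def supported_app_variants(cpu_flags: str) -> list[str]:
    """Return which of the app variants named in this thread the flags allow.

    cpu_flags is a whitespace-separated flag string, e.g. the "flags" line
    from /proc/cpuinfo. The variant names (sse2, avx, fma) follow the thread.
    """
    flags = set(cpu_flags.lower().split())
    variants = []
    if "sse2" in flags:
        variants.append("sse2")
    if "avx" in flags:
        variants.append("avx")
    # Assumption: the FMA build uses FMA3, which Linux reports as "fma".
    # "fma4" is a different flag and is deliberately not matched here.
    if "fma" in flags and "avx" in flags:
        variants.append("fma")
    return variants


def read_linux_cpu_flags(path: str = "/proc/cpuinfo") -> str:
    """Return the flags of the first CPU listed in /proc/cpuinfo (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return line.split(":", 1)[1].strip()
    return ""


if __name__ == "__main__":
    print(supported_app_variants(read_linux_cpu_flags()))
```

This only shows what the CPU advertises. A host that advertises FMA but crashes anyway, as described above, would still need a debugger to diagnose.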
| For comparison, I'll run this fma app on my win10 machine. | |
| ID: 2156 ·  Reply
Quote | |
| For comparison, I'll run this fma app on my win10 machine. Best way to run it would be to run the app and a task in the terminal so you can capture the output for debugging. | |
| ID: 2168 ·  Reply
Quote | |
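A minimal sketch of the "run it in the terminal and capture the output" idea, for anyone who prefers a script. The command line passed in is up to you: the thread does not give the application's executable name or arguments, so none are assumed here, and `run_and_capture` is a hypothetical helper name.

```python
import subprocess
import sys


def run_and_capture(cmd: list[str], log_path: str) -> int:
    """Run cmd, write its combined stdout/stderr to log_path, return the exit code."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout so nothing is lost
        text=True,
    )
    with open(log_path, "w") as log:
        log.write(result.stdout)
        log.write(f"\n[exit code: {result.returncode}]\n")
    # On POSIX systems a negative return code means the process was killed
    # by that signal number (for example -4 for SIGILL).
    return result.returncode


if __name__ == "__main__":
    # Usage: python capture.py <program> [args...]
    sys.exit(run_and_capture(sys.argv[1:], "app_output.log"))
```

The exit code is written to the log as well, since the error number was the puzzling part of the failing host's results.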
| I just deployed the FMA optimized version of the application for Windows x64. It had problems, some time ago, with the early releases of the AMD Ryzen cpus (although, if I remember correctly, the problems were solved with BIOS upgrades). I notice FMA and AVX running on my Ryzen 9 3900XT (brand new, with brand new motherboard with latest BIOS) and my i5 8600K. The FMA ones don't seem to be any faster than the AVX tasks. Are they doing more or different calculations? Also, why, with both AVX and FMA, is the Ryzen so slow compared to the i5? It should be just over 3/4 of the speed per core, but it's only 1/2 the speed. I don't know much about FMA and AVX, but from what I've read, FMA is a subset of AVX, which has me even more confused. Then there's FMA3 and FMA4, and Intel and AMD choosing one then the other, then going back again.... My Ryzen's completed tasks: http://gene.disi.unitn.it/test/results.php?hostid=63548&offset=0&show_names=0&state=4&appid= My i5's completed tasks: http://gene.disi.unitn.it/test/results.php?hostid=65175&offset=0&show_names=0&state=4&appid= | |
| ID: 2181 ·  Reply
Quote | |
| In Linux, my 3900X, 3950X and Epyc 7402P settled on FMA exclusively as fastest app. Takes about two weeks trying out AVX, SSE2 and FMA before declaring a winner. | |
| ID: 2182 ·  Reply
Quote | |
| In Linux, my 3900X, 3950X and Epyc 7402P settled on FMA exclusively as fastest app. Takes about two weeks trying out AVX, SSE2 and FMA before declaring a winner. How come my two fast machines (see links in last post) are almost identical? I can't see how that would change in two weeks, the timings are indistinguishable. Unless it's an OS thing, I only use Windows 10. | |
| ID: 2184 ·  Reply
Quote | |
| In Linux, my 3900X, 3950X and Epyc 7402P settled on FMA exclusively as fastest app. Takes about two weeks trying out AVX, SSE2 and FMA before declaring a winner. My 3600 is fma whereas my 3700x is avx so I just leave them to it :-) | |
| ID: 2185 ·  Reply
Quote | |
| In Linux, my 3900X, 3950X and Epyc 7402P settled on FMA exclusively as fastest app. Takes about two weeks trying out AVX, SSE2 and FMA before declaring a winner. It happens that the performances of the various versions are almost the same. If there is no clear winner, the server simply doesn't care which one it sends. Other important factors when comparing computers, for our application, are the RAM speed and CPU cache size. | |
| ID: 2186 ·  Reply
Quote | |
| In Linux, my 3900X, 3950X and Epyc 7402P settled on FMA exclusively as fastest app. Takes about two weeks trying out AVX, SSE2 and FMA before declaring a winner. So the server actually calculates which is the most efficient version to send to each machine? That's cool :-) So this "two weeks" being discussed is the server seeing which works best? | |
| ID: 2187 ·  Reply
Quote | |
| It happens that the performances of the various versions are almost the same. I am not so sure about that! I observed: the tn-grid server starts to send out only one app type to a given computer after a while and does not send other apps from time to time to re-check whether another app type would work better. I have also observed, when I bring a new computer online with a mix of tn-grid WUs and Climateprediction.net WUs, the tn-grid server starts to send out only sse2 WUs to this particular computer and seems never to check again, if another App type (fma) works better for that computer. That said, my computers (CPU) use the following apps: AMD 1700x (RAM 3200, 16 GB, Linux): fma; AMD 2600 (RAM 3200, 16 GB, Win10): avx; AMD 2600 (RAM 3200, 16 GB, Win10): avx; AMD 2600 (RAM 3000, 8 GB, Win10): motherboard broken, but it was avx; AMD 3700x (RAM 3600, 32 GB, Win10): sse2; AMD 3950x (RAM 3600, 32 GB, Linux): sse2 (this computer here with similar characteristics: http://gene.disi.unitn.it/test/results.php?hostid=64788 and here: http://gene.disi.unitn.it/test/results.php?hostid=60351, fma, and they are 500 [s] faster) This is why I asked for a working app_config to select a particular App type for tn-grid. Unfortunately, I never worked it out! | |
| ID: 2188 ·  Reply
Quote | |
| It happens that the performances of the various versions are almost the same. In my experience that happens when one application seems(!) to be much faster than the others. For example my 1700X gets FMA tasks exclusively after a short start phase, because BOINC rated AVX and SSE2 around 5.4 GFLOPS but FMA above 7. The reason for that is not that the application is really much faster but that I experimented with system configuration during the evaluation phase. Now in regular operation the experienced speed of FMA drops towards the other applications. I wonder what will happen when they are about equal. Maybe BOINC will start trying other applications again. I have also observed, when I bring a new computer online with a mix of tn-grid WUs and Climateprediction.net WUs, the tn-grid server starts to send out only sse2 WUs to this particular computer I always seem to get a batch of the same type initially but other types soon after. On my computers, other than the 1700X, I still get a mix of applications after some weeks. BOINC hasn't found a favourite yet and it looks like it won't. Speed differences are small and the relative speeds keep changing. AMD 3950x (RAM 3600, 32 GB, Linux): sse2 (this computer here with similar characteristics: http://gene.disi.unitn.it/test/results.php?hostid=64788 and here: http://gene.disi.unitn.it/test/results.php?hostid=60351, fma, and they are 500 [s] faster) See the application statistics for those hosts: http://gene.disi.unitn.it/test/host_app_versions.php?hostid=64788 http://gene.disi.unitn.it/test/host_app_versions.php?hostid=60351 You'll notice that the speed difference between the fastest and the slowest application is about 2% or less. That's also my own experience. For the difference between those computers and yours there are more likely reasons. System load and clocks come to mind immediately. This is why I asked for a working app_config to select a particular App type for tn-grid. Unfortunately, I never worked it out! 
You can't do that with app_config. You'd need to run an Anonymous Platform via app_info but that's tricky. I suggest you don't touch that without a better reason. | |
| ID: 2189 ·  Reply
Quote | |
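As a small worked example of the speed comparison above: the sketch below computes the gap between the fastest and slowest application from their rated speeds. The figures are the rough GFLOPS values quoted in the post for the 1700X (with FMA taken as exactly 7.0, although the post says "above 7"). The function name is hypothetical, and measuring the gap relative to the slowest application is an assumption, since the post does not say which baseline its "about 2%" uses.

```python
def relative_spread(rates_gflops: dict[str, float]) -> float:
    """Gap between fastest and slowest app, as a fraction of the slowest."""
    fastest = max(rates_gflops.values())
    slowest = min(rates_gflops.values())
    return (fastest - slowest) / slowest


# Rough figures from the post; 7.0 stands in for "above 7".
rated = {"sse2": 5.4, "avx": 5.4, "fma": 7.0}
print(f"{relative_spread(rated):.1%}")
```

A gap that large would make the server favour one application, while a gap of 2% or less, as on the two hosts linked above, leaves no clear winner.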
| It happens that the performances of the various versions are almost the same. Why would it need to? Unless your computer has changed, e.g. you upgraded it. That should cause a re-evaluation, though I'm not sure how BOINC or the server would know you changed it. | |
| ID: 2190 ·  Reply
Quote | |
| The server software is supposed to try different apps to see which one is faster and more reliable. It needs at least N (don't know exactly this number, could be 10 or 20) successfully validated results for each app before it can get an APR (Average Processing Rate), this was the DCF some time ago. You can see the APR numbers for each app when you look at your computer details under your account. It's supposed to pick the fastest app and occasionally send one of the others to see how they go. (documentation about this feature is not so easy to find...) | |
| ID: 2192 ·  Reply
Quote | |
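The behaviour described above can be sketched roughly as follows. This is not the actual BOINC server code, only an illustration of the idea: the function name, the data layout, the threshold of 10 valid results and the 5% re-test probability are all assumptions made for the example.

```python
import random

# Assumption: the post says the threshold "could be 10 or 20".
MIN_VALID_RESULTS = 10


def pick_app_version(stats, min_valid=MIN_VALID_RESULTS, explore_prob=0.05, rng=random):
    """Choose which app version to send to a host.

    stats maps an app name to (valid_result_count, apr_gflops).
    """
    # Versions without enough validated results have no usable APR yet,
    # so they are tried first.
    untested = [name for name, (count, _) in stats.items() if count < min_valid]
    if untested:
        return rng.choice(untested)

    # Otherwise prefer the version with the highest APR...
    best = max(stats, key=lambda name: stats[name][1])

    # ...but occasionally send one of the others to see how it goes.
    others = [name for name in stats if name != best]
    if others and rng.random() < explore_prob:
        return rng.choice(others)
    return best


host_stats = {"sse2": (12, 5.4), "avx": (12, 5.4), "fma": (12, 7.0)}
print(pick_app_version(host_stats))
```

With nearly equal APR values the "best" version can flip between requests, which would match the hosts in this thread that keep receiving a mix of applications.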
| The server software is supposed to try different apps to see which one is faster and more reliable. It needs at least N (don't know exactly this number, could be 10 or 20) successfully validated results for each app before it can get an APR (Average Processing Rate), this was the DCF some time ago. You can see the APR numbers for each app when you look at your computer details under your account. It's supposed to pick the fastest app and occasionally send one of the others to see how they go. (documentation about this feature is not so easy to find...) Interesting, and it does show that it works: the machine I updated the other week was almost exclusively fma (r5 2600) and has now swapped to almost exclusively avx (r7 3700x). The odd thing is that my old old machine that started as an FX8370 then inherited the r5 2600 shows it was getting sse2? | |
| ID: 2193 ·  Reply
Quote | |
| The server software is supposed to try different apps to see which one is faster and more reliable. It needs at least N (don't know exactly this number, could be 10 or 20) successfully validated results for each app before it can get an APR (Average Processing Rate), this was the DCF some time ago. You can see the APR numbers for each app when you look at your computer details under your account. It's supposed to pick the fastest app and occasionally send one of the others to see how they go. (documentation about this feature is not so easy to find...) You need 10 valid tasks to set an APR, but in actuality you need 11 valid tasks to cause the server software to recognize an APR change and display it in the client. | |
| ID: 2199 ·  Reply
Quote | |
| Is there any way to trigger the Instruction Set retest??? | |
| ID: 2200 ·  Reply
Quote | |