Compiling for AVX-512

Aurum
Joined: 18 Jul 18
Posts: 50
Credit: 90,446,643
RAC: 651,546
United States
Message 2029 - Posted: 22 Oct 2020, 17:07:01 UTC

Has any consideration been given to compiling for AVX-512?

Intel® Advanced Vector Extensions 512 (Intel® AVX-512)

Aurum
Joined: 18 Jul 18
Posts: 50
Credit: 90,446,643
RAC: 651,546
United States
Message 2031 - Posted: 22 Oct 2020, 17:20:15 UTC

A free compiler may be available:
https://software.intel.com/content/www/us/en/develop/tools/parallel-studio-xe/choose-download/open-source-contributor.html

Keith Myers
Joined: 26 Jun 20
Posts: 33
Credit: 2,136,660
RAC: 8,320
United States
Message 2035 - Posted: 23 Oct 2020, 16:10:34 UTC

So few processors are capable of AVX-512, I wouldn't think it worth the effort.

Aurum
Joined: 18 Jul 18
Posts: 50
Credit: 90,446,643
RAC: 651,546
United States
Message 2037 - Posted: 23 Oct 2020, 19:10:45 UTC - in response to Message 2035.

So few processors are capable of AVX-512, I wouldn't think it worth the effort.

I'll make it worth someone's while to compile for AVX-512. All Cascade Lake and Skylake-X CPUs are AVX-512 capable:

https://en.wikipedia.org/wiki/Skylake_(microarchitecture)#High-end_desktop_processors_(Skylake-X)

https://ark.intel.com/content/www/us/en/ark/products/codename/124664/cascade-lake.html

Even more than I thought:
https://en.wikipedia.org/wiki/AVX-512#:~:text=AVX%2D512%20are%20512%2Dbit,i5%2D7640X%20and%20Core%20i7%2D

I'm running these now and all my future CPU upgrades will be AVX-512 capable: i9-10980XE, i9-9980XE, i9-7980XE & i9-9960X. They're vastly underutilized without AVX-512.

Keith Myers
Joined: 26 Jun 20
Posts: 33
Credit: 2,136,660
RAC: 8,320
United States
Message 2039 - Posted: 23 Oct 2020, 22:16:37 UTC

Hmmm, whatever.

AVX-512 is not universally supported on all cpus, not even Intel ones. Certainly not on any AMD ones. AVX-512 support takes up a bunch of processor real estate that only gets used occasionally and only in special cases where software invokes it.

I'll let you read this article from ExtremeTech and form your own opinion on whether the arguments presented have validity.

https://www.extremetech.com/computing/312673-linus-torvalds-i-hope-avx512-dies-a-painful-death

Do you actually get any performance advantage in an application when the processor has to downclock to stay within its power limits?

Aurum
Joined: 18 Jul 18
Posts: 50
Credit: 90,446,643
RAC: 651,546
United States
Message 2040 - Posted: 24 Oct 2020, 0:58:42 UTC

Like he said, AVX-512 is for HPC, and that's what we're doing. The proof's in the pudding. Until someone compiles for AVX-512 and we can test it, we won't know.
All of his pent-up hate seems related to broad-based commercial applications.

When I run Asteroids@home they always send me the right applications for my hardware. http://asteroidsathome.net/boinc/apps.php

So a project can tailor the applications and only send AVX-512 WUs to AVX-512 capable CPUs.
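
A minimal sketch of how that kind of per-host selection could work, assuming the server already knows each host's CPU feature flags. The sse2/avx/fma names mirror the builds mentioned in this thread; the avx512 entry, the preference order and the `pick_variant` helper are hypothetical, and this is not BOINC's actual scheduler code.

```python
# Hypothetical preference order, newest instruction set first.
# Each variant lists the CPU feature flags it needs (Linux /proc/cpuinfo names).
VARIANTS = [
    ("avx512", {"avx512f"}),
    ("fma", {"avx", "fma"}),
    ("avx", {"avx"}),
    ("sse2", {"sse2"}),
]


def pick_variant(host_flags):
    """Return the first variant whose required flags the host has, else None."""
    flags = set(host_flags)
    for name, required in VARIANTS:
        if required <= flags:  # subset test: every required flag is present
            return name
    return None


if __name__ == "__main__":
    print(pick_variant({"sse2", "avx", "fma", "avx512f"}))  # avx512
    print(pick_variant({"sse2", "avx"}))                    # avx
```

Whether "newest first" is also "fastest first" is exactly the open question here, so the ordering is something a project would have to measure rather than assume.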

crims0nparr0t is Indiana University & they're running a lot of WUs between TN-Grid and WCG. Bet they have more than a few AVX-512 capable CPUs. Take a look at the WCG greatest hits list and note UH UIT HPC (University of Houston) and others with HPC capabilities.
https://www.worldcommunitygrid.org/stat/viewStatsByMemberAT.do?sort=points

I hope I get a chance to put AVX-512 WUs to the test.

Keith Myers
Joined: 26 Jun 20
Posts: 33
Credit: 2,136,660
RAC: 8,320
United States
Message 2042 - Posted: 24 Oct 2020, 17:54:00 UTC - in response to Message 2040.

The code in BOINC already probes for the host's cpu capabilities, so it knows whether the processor supports AVX-512 instructions.
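
For illustration only, here is a rough sketch of that kind of probe on a Linux host, reading the feature flags from /proc/cpuinfo. This is not the BOINC client's code; the function names are made up, and `avx512f` (the AVX-512 Foundation flag) is used as the marker for AVX-512 support.

```python
def parse_cpu_flags(cpuinfo_text):
    """Collect the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supports_avx512(flags):
    # avx512f is the Foundation subset that the other AVX-512 extensions build on.
    return "avx512f" in flags


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            text = fh.read()
    except OSError:
        text = ""  # not Linux, or /proc is unavailable
    print("AVX-512F:", supports_avx512(parse_cpu_flags(text)))
```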

So it really is up to the project app developers to determine whether enough AVX-512 capable hosts are attached to their project, and whether the gain in production would be high enough, to make an AVX-512 app worth the effort to develop.

The CPU statistics XML export page should provide the resource to filter out all the capable Intel CPUs that can do AVX-512 if you can craft a decent search parameter. That would get you the total count of capable CPUs that were attached at some point. It won't show whether they are currently attached, though.

Aurum
Joined: 18 Jul 18
Posts: 50
Credit: 90,446,643
RAC: 651,546
United States
Message 2043 - Posted: 26 Oct 2020, 2:09:11 UTC
Last modified: 26 Oct 2020, 2:23:04 UTC

I found 236 AVX-512 capable CPUs plus 2 of mine that haven't been included yet:
i9-10980XE
i9-9960X
Intel Xeon Processor (Skylake, IBRS) [Family 6 Model 85 Stepping 4]
Intel(R) Core(TM) i9-7900X CPU @ 3.30GHz [Family 6 Model 85 Stepping 4]
Intel(R) Core(TM) i9-7960X CPU @ 2.80GHz [Family 6 Model 85 Stepping 4]
Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz [Family 6 Model 85 Stepping 4]
Intel(R) Xeon(R) CPU [Family 6 Model 85 Stepping 3]
Intel(R) Xeon(R) CPU 8164 @ 2.00GHz [Family 6 Model 85 Stepping 3]
Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz [Family 6 Model 85 Stepping 4]
Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz [Family 6 Model 85 Stepping 4]
Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz [Family 6 Model 85 Stepping 4]
Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz [Family 6 Model 26 Stepping 4]
Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz [Family 6 Model 85 Stepping 3]
Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz [Family 6 Model 85 Stepping 4]
Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz [Family 6 Model 85 Stepping 4]
Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz [Family 6 Model 85 Stepping 4]
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz [Family 6 Model 85 Stepping 4]
Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz [Family 6 Model 85 Stepping 4]
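
One way to pull such a list out of the CPU model strings is to match on the family/model numbers. This is a rough sketch, assuming the strings keep the `[Family 6 Model 85 Stepping N]` suffix seen above; Model 85 is the Skylake-X/SP and Cascade Lake core. The function name is made up, and a model filter alone would miss entries such as the Gold 6148 line reported as Model 26, or strings with no suffix at all.

```python
import re

# Matches the "[Family F Model M Stepping S]" suffix used in the list above.
SIG = re.compile(r"\[Family (\d+) Model (\d+) Stepping (\d+)\]")


def is_model_85(cpu_string):
    """True if the string reports Intel Family 6 Model 85."""
    m = SIG.search(cpu_string)
    return bool(m) and (int(m.group(1)), int(m.group(2))) == (6, 85)


if __name__ == "__main__":
    sample = [
        "Intel(R) Core(TM) i9-7900X CPU @ 3.30GHz [Family 6 Model 85 Stepping 4]",
        "Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz [Family 6 Model 26 Stepping 4]",
    ]
    for s in sample:
        print(is_model_85(s), s)
```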

The BOINC CPU Benchmarks are broken and undefined so meaningless to include. E.g., GFLOPs/computer = GFLOPs/core x Total # Threads.
Not everyone uses all threads for a given project. Mine are currently running half ARP & half TN-Grid.
The Linux BOINC CPU Benchmark broke between 7.9.3 & 7.16.6 and is awaiting a Github Issue response.

The CPU Models Statistics page includes all CPUs that ever made an appearance here. The Server Status page says 360 Users in the last 24 hours. What's a User??? A unique External CPID or a unique Host??? So do I count as 1 or 40???
http://gene.disi.unitn.it/test/cpu_list.php

Of the Top 20 Users six are aggregators representing hundreds or thousands of computers.
http://gene.disi.unitn.it/test/top_users.php

valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 489
Credit: 25,018,775
RAC: 12,995
Italy
Message 2044 - Posted: 26 Oct 2020, 10:14:36 UTC - in response to Message 2043.

What's a User?

It's a unique CPID (so, yes, aggregators are counted as a single user)

About AVX-512: I do not have any AVX-512 capable computers, so it would be very difficult for me to build and test a new application. Also, please keep in mind that the benefits are unknown; see: AVX-512 slower than AVX. For instance, FMA against AVX only provided a very small speed gain; that's one of the reasons I do not distribute FMA for Windows. Also, at the very beginning FMA was broken for Ryzen.

Anyway, if someone would like to build an AVX-512 application, I can provide some assistance.

Aurum
Joined: 18 Jul 18
Posts: 50
Credit: 90,446,643
RAC: 651,546
United States
Message 2063 - Posted: 9 Nov 2020, 0:42:12 UTC - in response to Message 2044.

"...benefits are unknown..."
Exactly, and we won't know until someone tries it.

"see: AVX-512 slower than AVX."
I tried to read this thread but it got very tedious. The only thing I learned was that Intel made a major mistake and produced some CPUs with a single AVX-512 FMA unit per core instead of a pair, so I updated my list.

"if someone would like to build up a AVX-512 application I can provide some assistance."
And if anyone does try it I'd be glad to test it :-)

Intel Core i9-7900X CPU @ 3.30GHz [Family 6 Model 85 Stepping 4] 2 x AVX-512
Intel Core i9-7960X CPU @ 2.80GHz [Family 6 Model 85 Stepping 4] 2 x AVX-512
Intel Core i9-7980XE CPU @ 2.60GHz [Family 6 Model 85 Stepping 4] 2 x AVX-512
i9-9960X 2 x AVX-512
i9-10980XE 2 x AVX-512
Intel Xeon CPU 8164 @ 2.00GHz [Family 6 Model 85 Stepping 3] 2 x AVX-512
Intel Xeon Gold 6130 CPU @ 2.10GHz [Family 6 Model 85 Stepping 4] 2 x AVX-512
Intel Xeon Gold 6140 CPU @ 2.30GHz [Family 6 Model 85 Stepping 4] 2 x AVX-512
Intel Xeon Gold 6148 CPU @ 2.40GHz [Family 6 Model 26 Stepping 4] 2 x AVX-512
Intel Xeon Platinum 8124M CPU @ 3.00GHz [Family 6 Model 85 Stepping 3] 2 x AVX-512
Intel Xeon Platinum 8124M CPU @ 3.00GHz [Family 6 Model 85 Stepping 4] 2 x AVX-512
Intel Xeon Platinum 8167M CPU @ 2.00GHz [Family 6 Model 85 Stepping 4] 2 x AVX-512
Intel Xeon Platinum 8168 CPU @ 2.70GHz [Family 6 Model 85 Stepping 4] 2 x AVX-512
Intel Xeon Platinum 8171M CPU @ 2.60GHz [Family 6 Model 85 Stepping 4] 2 x AVX-512
Intel Xeon Silver 4110 CPU @ 2.10GHz [Family 6 Model 85 Stepping 4] 1 x AVX-512
Intel Xeon Gold 5118 CPU @ 2.30GHz [Family 6 Model 85 Stepping 4] 1 x AVX-512

Aurum
Joined: 18 Jul 18
Posts: 50
Credit: 90,446,643
RAC: 651,546
United States
Message 2081 - Posted: 18 Nov 2020, 12:41:05 UTC - in response to Message 2044.

FMA against AVX only provided a very small speed gain, that's one of the reasons I do not distribute FMA for Windows...

Look at the Applications page. There are many more Windoze crunchers here than Linux. FMA WUs require both wingmen to be Linux users. It would be more efficient if FMA were compiled for Windoze as well.

Bryn Mawr
Joined: 23 Jun 20
Posts: 19
Credit: 2,069,491
RAC: 9,422
United Kingdom
Message 2082 - Posted: 18 Nov 2020, 14:28:15 UTC - in response to Message 2081.
Last modified: 18 Nov 2020, 14:31:07 UTC

FMA against AVX only provided a very small speed gain, that's one of the reasons I do not distribute FMA for Windows...

Look at the Applications page. There are many more Windoze crunchers here than Linux. FMA WUs require both wingmen to be Linux users. It would be more efficient if FMA were compiled for Windoze as well.


Not convinced by this statement. Almost all my WUs are fma and if I look at the wingmen, they very often receive avx or sse2 equivalents - see Workunit 25706025 or 25705750 as examples.

Jim1348
Joined: 29 Dec 16
Posts: 48
Credit: 9,092,224
RAC: 12,539
United States
Message 2083 - Posted: 18 Nov 2020, 14:42:04 UTC - in response to Message 2082.

Almost all my WUs are fma and if I look at the wingmen, they very often receive avx or sse2 equivalents - see Workunit 25706025 or 25705750 as examples.

I have never paid any attention to this, but now that you mention it, it seems that my Ubuntu 20.04.1 machine is being validated by everyone and everything.
http://gene.disi.unitn.it/test/results.php?hostid=55080&offset=0&show_names=0&state=4&appid=

Keith Myers
Joined: 26 Jun 20
Posts: 33
Credit: 2,136,660
RAC: 8,320
United States
Message 2084 - Posted: 18 Nov 2020, 17:57:47 UTC
Last modified: 18 Nov 2020, 17:58:56 UTC

The validator of the science application is agnostic about who or what produced the output file.

It only cares whether consensus is reached between the two samples, i.e. matching or closely matched results.

Jim1348
Joined: 29 Dec 16
Posts: 48
Credit: 9,092,224
RAC: 12,539
United States
Message 2085 - Posted: 18 Nov 2020, 20:32:20 UTC - in response to Message 2084.

It only cares whether consensus is reached between the two samples, i.e. matching or closely matched results.

Sure. But it is frequently the case on some projects that "similar" outputs don't match well enough for validation. I suppose it depends on the science, and how the validator is calibrated.
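
To make the "calibration" point concrete, here is a small sketch of a comparison with an adjustable tolerance. It is not this project's validator or BOINC's validator framework; `results_agree` and its `rel_tol` parameter are hypothetical, and results are assumed to be plain lists of floats.

```python
import math


def results_agree(a, b, rel_tol=0.0):
    """Compare two result vectors element by element.

    rel_tol=0.0 demands exact equality; a larger value accepts
    "closely matched" results that differ by rounding.
    """
    if len(a) != len(b):
        return False
    return all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=0.0) for x, y in zip(a, b)
    )


if __name__ == "__main__":
    host_a = [0.25, 1.0]
    host_b = [0.25, 1.0 + 1e-9]  # e.g. a different instruction set rounding differently
    print(results_agree(host_a, host_b))                # strict comparison
    print(results_agree(host_a, host_b, rel_tol=1e-6))  # tolerant comparison
```

With a strict setting, two builds that round differently would fail to reach consensus; with a loose one they validate each other, which is the trade-off being described.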

Aurum
Joined: 18 Jul 18
Posts: 50
Credit: 90,446,643
RAC: 651,546
United States
Message 2086 - Posted: 18 Nov 2020, 22:19:13 UTC

Oh. I thought the WU needed to match its program since all WUs say what instruction set they're for.
gene@home PC-IM (avx)
gene@home PC-IM (fma)
gene@home PC-IM (sse2)
So why include the instruction set if the program is agnostic?

Bryn Mawr
Joined: 23 Jun 20
Posts: 19
Credit: 2,069,491
RAC: 9,422
United Kingdom
Message 2088 - Posted: 18 Nov 2020, 23:30:10 UTC - in response to Message 2086.

Oh. I thought the WU needed to match its program since all WUs say what instruction set they're for.
gene@home PC-IM (avx)
gene@home PC-IM (fma)
gene@home PC-IM (sse2)
So why include the instruction set if the program is agnostic?


Diagnostic for troubleshooting when things go wrong.

valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 489
Credit: 25,018,775
RAC: 12,995
Italy
Message 2089 - Posted: 19 Nov 2020, 11:06:26 UTC - in response to Message 2088.
Last modified: 19 Nov 2020, 11:14:59 UTC

Some time ago, when we started deploying applications for different architectures, we made sure (and this wasn't easy) that the output file matched our "gold" one, which we defined to be the one obtained using plain Linux x64. Therefore, a Xeon FMA Linux output exactly matches any other output (like ARM vfpv3 or macOS). This way both the distribution and the validation of workunits are much easier to handle.
When adding new applications to the system, BOINC requires each to be given a unique name and description.

BTW, as I explained before, I cannot build and test an AVX-512 application (I don't have any AVX-512 capable computer here). Anyway, it is not difficult to compile the source code (gcc on Linux x64); the tricky part is compiling the needed BOINC libraries but, if requested, I can provide them.
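
As a rough idea of what "compiling for AVX-512" means on the gcc side, the sketch below only assembles command lines for the different builds. The gcc options are real ones, but the variant names, the `app.c` source file and the output names are placeholders, and the BOINC libraries and any other link flags the real application needs are left out.

```python
# Real GCC options; variant names and file names are placeholders.
GCC_FLAGS = {
    "sse2": ["-msse2"],
    "avx": ["-mavx"],
    "fma": ["-mavx", "-mfma"],
    "avx512": ["-march=skylake-avx512"],
}


def gcc_command(variant, source="app.c", output=None):
    """Build a gcc argument list for one variant (this does not run gcc)."""
    out = output or f"app_{variant}"
    return ["gcc", "-O3", *GCC_FLAGS[variant], source, "-o", out]


if __name__ == "__main__":
    for variant in GCC_FLAGS:
        print(" ".join(gcc_command(variant)))
```

The same source is used each time; only the target flags change, so a binary built with the avx512 flags should only be run on CPUs that report AVX-512 support.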

Keith Myers
Joined: 26 Jun 20
Posts: 33
Credit: 2,136,660
RAC: 8,320
United States
Message 2090 - Posted: 19 Nov 2020, 17:02:15 UTC - in response to Message 2086.

Oh. I thought the WU needed to match its program since all WUs say what instruction set they're for.
gene@home PC-IM (avx)
gene@home PC-IM (fma)
gene@home PC-IM (sse2)
So why include the instruction set if the program is agnostic?

All the instruction set in the name is doing is showing how the source code was compiled. Same source code, just compiled with a different set of parameters to target different hardware.

Nothing more.

One instruction set may work better or worse than another, depending on the hardware architecture and age of the CPU you are using.

My Ryzen and Threadripper perform the fastest on the FMA app.

Aurum
Joined: 18 Jul 18
Posts: 50
Credit: 90,446,643
RAC: 651,546
United States
Message 2091 - Posted: 19 Nov 2020, 17:43:30 UTC - in response to Message 2089.

"BTW, as I explained before, I cannot build and test a AVX512 application (I don't have any AVX512 capable computer here). Anyway, it is not difficult to compile the source code (gcc on Linux x64), the tricky thing is to compile the needed BOINC libraries but, if requested, I can provide them."

I haven't compiled anything since the nineties. Maybe I can get a free compiler from Intel for open source use. Is the gene_pcim code open source???


Copyright © 2021 CNR-TN & UniTN