Optimization

Krümel
Joined: 31 Oct 16
Posts: 19
Credit: 14,099,551
RAC: 0
Germany
Message 1224 - Posted: 12 Dec 2017, 19:04:39 UTC - in response to Message 1222.
Last modified: 12 Dec 2017, 19:05:03 UTC


I have 1x8GB 2666 MHz C16,...


OK, that's just single channel, which slows the system down as well.
With two RAM sticks you would run in dual channel, with much higher bandwidth.

Buro87 [Lombardia]
Joined: 23 Nov 16
Posts: 100
Credit: 4,000,541
RAC: 0
Italy
Message 1225 - Posted: 13 Dec 2017, 14:25:03 UTC - in response to Message 1223.

What is your RAM speed?
On my R7 there is a big difference between 2133 MHz and 3066 MHz (up to 30 minutes).


I have 1x8GB 2666 MHz C16; maybe I'll try to overclock it.

I have understood the "problem": if I set CPU usage to 50% (or turn off SMT), the 6 simultaneous WUs take around 2400 s to complete. Mystery solved.

I don't understand why WUs don't speed up when the CPU is at 3.8 GHz.

The TN-Grid app is very memory-intensive. One person from my team wrote that on his 14c/28t Xeon (I do not know the exact model, probably an E5-2683 v3) 4 TN-Grid WUs consumed all available memory bandwidth. Once you hit this limit, increasing the CPU speed will not help; the CPU will just wait for memory faster ;)

Edit: when you set CPU usage to 50%, each app instance can get data from memory faster (fewer instances compete for the same limited bandwidth), each instance can use more cache (which also helps with loading data), and CPU resources are not shared between two instances on the same core (SMT/HT does not double the speed; the gain is usually much smaller).

If you want to improve speeds, use the fastest memory you can, and overclock it if possible.


Thanks, Daniel.
How can I see memory bandwidth usage?
Just as a test, I ran the RAM at 2133 MHz and a WU (with 12 threads) took around 4300 s.
With the RAM at 2933 MHz, the WU time drops to 3700 s.
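
For anyone who wants to put those two timings side by side, here is a small sketch of the arithmetic. It uses only the numbers quoted above (2133 MHz at 4300 s, 2933 MHz at 3700 s); the function names are made up for this example, and "scaling efficiency" is just an informal ratio, not anything the TN-Grid app reports.

```python
def speedup(old_seconds, new_seconds):
    """How many times faster the new run is compared with the old one."""
    return old_seconds / new_seconds


def scaling_efficiency(old_mhz, new_mhz, old_seconds, new_seconds):
    """Fraction of the RAM clock increase that shows up as a runtime gain.

    1.0 would mean runtime scales perfectly with RAM clock,
    0.0 would mean RAM clock makes no difference at all.
    """
    clock_gain = new_mhz / old_mhz - 1.0
    runtime_gain = speedup(old_seconds, new_seconds) - 1.0
    return runtime_gain / clock_gain


if __name__ == "__main__":
    # Numbers taken from the post above.
    print(f"RAM clock ratio: {2933 / 2133:.3f}")
    print(f"WU speedup:      {speedup(4300, 3700):.3f}")
    print(f"Efficiency:      {scaling_efficiency(2133, 2933, 4300, 3700):.2f}")
```

So a RAM clock that is roughly 37.5% higher gave a WU that finished about 16% faster. The gain is real but well short of proportional, and RAM timings may also have differed between the two settings.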

Buro87 [Lombardia]
Joined: 23 Nov 16
Posts: 100
Credit: 4,000,541
RAC: 0
Italy
Message 1226 - Posted: 13 Dec 2017, 14:38:02 UTC - in response to Message 1224.


I have 1x8GB 2666 MHz C16,...


OK, that's just single channel, which slows the system down as well.
With two RAM sticks you would run in dual channel, with much higher bandwidth.



When RAM prices drop to a "fair price" I'll add another stick, but only for the dual-channel bandwidth, because TN-Grid WUs are not that big (Win10 + 12 WUs take only 3 GB of RAM).

[B@P] Daniel
Volunteer developer
Joined: 19 Oct 16
Posts: 90
Credit: 2,205,103
RAC: 0
Poland
Message 1227 - Posted: 14 Dec 2017, 7:01:35 UTC - in response to Message 1225.

What is your RAM speed?
On my R7 there is a big difference between 2133 MHz and 3066 MHz (up to 30 minutes).


I have 1x8GB 2666 MHz C16; maybe I'll try to overclock it.

I have understood the "problem": if I set CPU usage to 50% (or turn off SMT), the 6 simultaneous WUs take around 2400 s to complete. Mystery solved.

I don't understand why WUs don't speed up when the CPU is at 3.8 GHz.

The TN-Grid app is very memory-intensive. One person from my team wrote that on his 14c/28t Xeon (I do not know the exact model, probably an E5-2683 v3) 4 TN-Grid WUs consumed all available memory bandwidth. Once you hit this limit, increasing the CPU speed will not help; the CPU will just wait for memory faster ;)

Edit: when you set CPU usage to 50%, each app instance can get data from memory faster (fewer instances compete for the same limited bandwidth), each instance can use more cache (which also helps with loading data), and CPU resources are not shared between two instances on the same core (SMT/HT does not double the speed; the gain is usually much smaller).

If you want to improve speeds, use the fastest memory you can, and overclock it if possible.


Thanks, Daniel.
How can I see memory bandwidth usage?
Just as a test, I ran the RAM at 2133 MHz and a WU (with 12 threads) took around 4300 s.
With the RAM at 2933 MHz, the WU time drops to 3700 s.

There is no tool to see it. Intel support also claims that this cannot be measured. That person found it out by running a series of tests:
- when he decreased the CPU clock, CPU usage rose to 100%;
- when he decreased the memory clock, CPU usage dropped to 85%;
- when he added a 3rd memory stick, CPU usage increased from 90% to 100%.

Jay
Joined: 2 Jun 17
Posts: 4
Credit: 373,931
RAC: 0
Message 1229 - Posted: 17 Dec 2017, 23:13:18 UTC

How does one go about running a test on the same WU repeatedly so as to test SSE2 vs AVX vs FMA on a Windows machine?

Thanks

Jay
Joined: 2 Jun 17
Posts: 4
Credit: 373,931
RAC: 0
Message 1230 - Posted: 18 Dec 2017, 0:19:37 UTC - in response to Message 1229.

Can I run it from the command line?
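
One possible way to time repeated runs is a small wrapper script like the sketch below. It assumes the app can be started outside the BOINC client with the work unit's input files in the working directory, which nothing in this thread confirms. The command shown is only a placeholder (it launches an empty Python process), so swap in the real executable and arguments yourself.

```python
import subprocess
import sys
import time


def time_command(cmd, runs=3):
    """Run `cmd` `runs` times; return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the program exits with an error.
        subprocess.run(cmd, check=True)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Placeholder command: replace with the SSE2 / AVX / FMA build under test.
    cmd = [sys.executable, "-c", "pass"]
    times = time_command(cmd, runs=3)
    print("runs:", ", ".join(f"{t:.2f}s" for t in times))
    print(f"best: {min(times):.2f}s")
```

Running each build on the same input several times and comparing the best (or median) time keeps background noise from deciding the winner.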

kotenok2000
Joined: 18 Feb 20
Posts: 13
Credit: 260,344
RAC: 0
Russia
Message 3353 - Posted: 13 Feb 2024, 17:46:05 UTC

Is it okay if I compile the binary with
# AVX2+FMA, 64-bit
ARCH += -march=native -mtune=native -msse4.2 -mpopcnt -maes -mpclmul -mavx -mfma -mavx2 -m64 -flto

instead of
# AVX2+FMA, 64-bit
ARCH += -march=core2 -mtune=generic -msse4.2 -mpopcnt -maes -mpclmul -mavx -mfma -mavx2 -m64
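
A note on the flags above: a binary built with -mavx2 and -mfma will typically crash with an illegal-instruction error on a CPU that lacks those extensions, and -march=native additionally ties the build to the machine it was compiled on. The sketch below is one way to check a Linux host before running such a build. It reads /proc/cpuinfo, so it does not apply to Windows. The helper names are made up for this example, and the flag names are the ones Linux uses (for instance pclmulqdq for -mpclmul).

```python
# CPU features implied by the -m flags in the ARCH line above,
# written the way Linux spells them in /proc/cpuinfo.
REQUIRED = {"sse4_2", "popcnt", "aes", "pclmulqdq", "avx", "fma", "avx2"}


def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def missing_flags(cpuinfo_text, required=REQUIRED):
    """Return a sorted list of required flags that the CPU does not report."""
    return sorted(set(required) - cpu_flags(cpuinfo_text))


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        print("No /proc/cpuinfo here (not Linux?); cannot check.")
    else:
        print("Missing:", missing_flags(text) or "none")
```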

valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 623
Credit: 34,677,535
RAC: 98
Italy
Message 3356 - Posted: 14 Feb 2024, 9:41:25 UTC - in response to Message 3353.
Last modified: 14 Feb 2024, 9:42:20 UTC

I don't know the exact details of that command line (or of your hardware), but feel free to compile the application however you like (and to modify it, if you are brave enough). It counts as a success if the result you obtain is exactly the same as the original one.
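
A simple way to check "exactly the same" is to hash the two result files and compare the digests. This is only a sketch: it assumes you have the output of the stock app and the output of your own build for the same work unit saved as two files. The file names below are hypothetical.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_result(reference_path, candidate_path):
    """True only if both files are byte-for-byte identical."""
    return file_digest(reference_path) == file_digest(candidate_path)


if __name__ == "__main__":
    import sys

    # Usage: python compare.py result_stock.out result_custom.out  (hypothetical names)
    if len(sys.argv) == 3:
        print("identical" if same_result(sys.argv[1], sys.argv[2]) else "DIFFERENT")
```

Aggressive options such as -flto or a different -march can change floating-point results slightly, so it is worth running this on several work units, not just one.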



Copyright © 2024 CNR-TN & UniTN