Author |
Message |
|
I have 1x8GB 2666 MHz C16,...
OK, that's just single channel, which slows down the system as well.
With two RAM sticks you would run in dual channel, with much wider bandwidth. |
|
|
|
How is your RAM speed?
On my R7 it makes a big difference between 2133 MHz and 3066 MHz (up to 30 minutes).
I have 1x8GB 2666 MHz C16, maybe I'll try to overclock it.
I have understood the "problem": if I set CPU usage to 50% (or turn off SMT), the 6 simultaneous WUs take around 2400 s to complete: mystery solved.
I don't understand why WUs don't speed up when the CPU is at 3.8 GHz.
The TN-Grid app is very memory-intensive. One person from my team wrote that on his 14c/28t Xeon (I do not know the exact model, probably an E5-2683 v3) 4 TN-Grid WUs consumed all available memory bandwidth. So once you hit this limit, increasing CPU speed will not help; the CPU will just wait for memory faster ;)
Edit: when you set CPU usage to 50%, each app can get data from memory faster (fewer apps compete for the same limited bandwidth), every app instance can use more cache (which additionally helps with loading data), and CPU resources are not shared between two apps (SMT/HT does not double the speed; the gain is usually much smaller).
If you want to improve speeds, use the fastest possible memory, and overclock it if possible.
Thanks Daniel
How can I see memory bandwidth usage?
Just as a test I ran the RAM at 2133 MHz, and a WU (with 12 threads) takes around 4300 s.
With the RAM set to 2933 MHz, a WU drops to 3700 s. |
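To put the two timings above in proportion, here is a small arithmetic sketch. It uses only the figures quoted in the post (2133 MHz / 4300 s and 2933 MHz / 3700 s, 12 threads); the function and variable names are made up for illustration, and two data points are of course not a proper benchmark.

```python
# Figures quoted in the post above (12 threads, single 8 GB stick).
slow_mhz, slow_seconds = 2133, 4300
fast_mhz, fast_seconds = 2933, 3700


def relative_gain(before: float, after: float) -> float:
    """Return the fractional increase going from `before` to `after`."""
    return (after - before) / before


# How much faster the memory was clocked.
clock_gain = relative_gain(slow_mhz, fast_mhz)
# How much more work gets done per unit of time (ratio of runtimes minus one).
throughput_gain = slow_seconds / fast_seconds - 1.0
# How much shorter a single WU became.
time_saved = (slow_seconds - fast_seconds) / slow_seconds

print(f"memory clock:         +{clock_gain:.1%}")
print(f"WU throughput:        +{throughput_gain:.1%}")
print(f"runtime saved per WU:  {time_saved:.1%}")
```

So the runtime clearly follows the memory clock, but not one-to-one, which fits the picture of an app that is largely, though not purely, limited by memory.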
|
|
|
I have 1x8GB 2666 MHz C16,...
OK, that's just single channel, which slows down the system as well.
With two RAM sticks you would run in dual channel, with much wider bandwidth.
When RAM prices drop to a "fair price" I'll add another stick, purely for the dual-channel bandwidth, because TN-Grid WUs are not that big (Win10 + 12 WUs take only 3 GB of RAM). |
|
|
|
How can I see memory bandwidth usage?
Just as a test I ran the RAM at 2133 MHz, and a WU (with 12 threads) takes around 4300 s.
With the RAM set to 2933 MHz, a WU drops to 3700 s.
There is no tool to see it. Intel support also claims that this cannot be measured. That person found it out by performing a series of tests:
- when he decreased the CPU clock, CPU usage rose to 100%;
- when he decreased the memory clock, CPU usage dropped to 85%;
- when he added a 3rd memory stick, CPU usage increased from 90% to 100%.
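In the same indirect spirit as the tests above, one could probe how copy throughput scales when several processes hit memory at once. The following is only a rough sketch, not a real bandwidth meter: `copy_rate` and `aggregate_rate` are hypothetical helpers, the 64 MB buffer size is an assumption chosen to exceed typical CPU caches, and the workers are not strictly synchronized, so the numbers are approximate.

```python
import time
from multiprocessing import Pool


def copy_rate(size_mb: int = 64, repeats: int = 5) -> float:
    """Copy a buffer `repeats` times and return the copied MB per second."""
    src = bytearray(size_mb * 1024 * 1024)
    start = time.perf_counter()
    for _ in range(repeats):
        dst = bytes(src)  # full read of src plus a full write of the copy
    elapsed = time.perf_counter() - start
    return size_mb * repeats / elapsed


def aggregate_rate(workers: int, size_mb: int = 64) -> float:
    """Run `workers` copy loops in parallel and sum their individual rates."""
    with Pool(workers) as pool:
        rates = pool.map(copy_rate, [size_mb] * workers)
    return sum(rates)


if __name__ == "__main__":
    # If the summed rate stops growing as workers are added, the copies are
    # probably competing for the same memory bandwidth.
    for workers in (1, 2, 4):
        print(f"{workers} worker(s): ~{aggregate_rate(workers):.0f} MB/s copied")
```

If the total flattens out well before the number of workers reaches the number of cores, that would be consistent with the bandwidth limit described above; it does not show how much bandwidth the TN-Grid app itself uses.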
|
|
|
|
How does one go about running a test on the same WU repeatedly so as to test SSE2 vs AVX vs FMA on a Windows machine?
Thanks |
|
|
|
Can I run it from the command line? |
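For the repeated-run question, a generic timing harness might look like the sketch below. It assumes the application can be started standalone from the command line with the WU input files in its working directory, which this thread does not confirm; `time_command` is a hypothetical helper, and the stand-in command (the Python interpreter doing nothing) only keeps the example runnable. Replace it with one entry per build (SSE2, AVX, FMA) once the real executable names and arguments are known.

```python
import statistics
import subprocess
import sys
import time
from typing import List, Optional, Sequence


def time_command(cmd: Sequence[str], runs: int = 3,
                 cwd: Optional[str] = None) -> List[float]:
    """Run `cmd` `runs` times and return the wall-clock duration of each run."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the program exits with an error code.
        subprocess.run(cmd, cwd=cwd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Stand-in command only; the real executable names are not given here.
    builds = {
        "stand-in": [sys.executable, "-c", "pass"],
    }
    for name, cmd in builds.items():
        durations = time_command(cmd, runs=3)
        print(f"{name}: median {statistics.median(durations):.3f} s, "
              f"best {min(durations):.3f} s")
```

Each run should start from a fresh copy of the same input files, so that leftovers from a previous run (for example a checkpoint, if the app writes one) do not shorten the next one.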
|
|
|
Is it okay if I compile the binary with
# AVX2+FMA, 64-bit
ARCH += -march=native -mtune=native -msse4.2 -mpopcnt -maes -mpclmul -mavx -mfma -mavx2 -m64 -flto
instead of
# AVX2+FMA, 64-bit
ARCH += -march=core2 -mtune=generic -msse4.2 -mpopcnt -maes -mpclmul -mavx -mfma -mavx2 -m64 |
|
|
valterc (Project administrator, Project tester)
Joined: 30 Oct 13 Posts: 623 Credit: 34,676,744 RAC: 1,154
|
I don't know the details of that command line (or of your hardware) but, anyway, feel free to compile the application the way you want (and modify it, if you are brave enough). The build is successful if the result it produces is exactly the same as the original application's. |
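One way to check the "exactly the same result" condition is a byte-for-byte comparison of the output files from the original and the custom build on the same WU. The sketch below assumes each run leaves its result in a single file; the helper names are hypothetical and the demo uses throwaway temporary files instead of real TN-Grid output.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_result(reference_path: str, candidate_path: str) -> bool:
    """True only if both files have identical content."""
    return sha256_of(reference_path) == sha256_of(candidate_path)


if __name__ == "__main__":
    # Demo with temporary files standing in for the two result files.
    with tempfile.TemporaryDirectory() as folder:
        reference = os.path.join(folder, "reference.out")
        candidate = os.path.join(folder, "candidate.out")
        for path in (reference, candidate):
            with open(path, "wb") as handle:
                handle.write(b"identical payload\n")
        print("results identical:", same_result(reference, candidate))
```

Aggressive options can change floating-point rounding (FMA, for instance, rounds once where a separate multiply and add round twice), so a mismatch here would be a reason to go back to the original flags.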
|
|