Posts by hoppisaur
log in
1) Message boards : Number crunching : Optimization (Message 799)
Posted 24 Jan 2017 by hoppisaur

This app was limited by memory speed, so SSE version may be faster. Older version was executing more slow calculations (square roots, divisions) plus loops for AVX were executing less times because of longer vectors, so AVX and FMA were faster. Now with reduced number of these slow calculations and with unrolled loops it may be that SSE is faster. I could test it on SandyBridge CPUs only which have slow AVX division and square roots, so AVX app was slower there too. On newer CPUs with faster AVX and memory things may work differently and AVX/FMA versions may be faster. Will see, I hope people will post their results for new apps here.


I am seeing a little less than 34 min/workunit on win 7 64 with the newest AVX version. On an i3-4330 (Haswell) running two instances




Main page · Your account · Message boards


Copyright © 2022 CNR-TN & UniTN