Posts by Xavier Wallece

1) Message boards : Development : Doublevector always uses SSE path instead of AVX (Message 2115)
Posted 27 Nov 2020 by Xavier Wallece

I'm looking into it and see that it is running several sets. Each set runs for about 13 seconds. After that a new set begins.

I'm going to test the AVX/SSE DoubleVector code in the pc algorithm part,
but it is already heavily optimized.

I also understand why AVX does not always help: the parts of the algorithm it has to go through are small, so it cannot stretch its legs.
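A simplified model of that point (my assumption, not code from the pc sources): suppose a loop over n doubles is split into a SIMD body that handles `lanes` doubles per step plus a scalar remainder. In that case wider registers only pay off when n is large compared with the lane count. The names `LoopSplit`, `splitLoop` and `vectorizedFraction` are hypothetical.

```cpp
#include <cstddef>

// Hypothetical model: a loop over n doubles runs a SIMD body that handles
// `lanes` doubles per iteration, then a scalar tail for the leftovers.
// Lanes per register: SSE2 = 2, AVX = 4, AVX-512 = 8. `lanes` must be > 0.
struct LoopSplit {
    std::size_t vectorIterations;  // iterations of the SIMD body
    std::size_t tailElements;      // doubles left for the scalar remainder loop
};

LoopSplit splitLoop(std::size_t n, std::size_t lanes) {
    return LoopSplit{n / lanes, n % lanes};
}

// Fraction of the n elements that actually go through SIMD instructions.
double vectorizedFraction(std::size_t n, std::size_t lanes) {
    if (n == 0) {
        return 0.0;
    }
    return static_cast<double>((n / lanes) * lanes) / static_cast<double>(n);
}
```

For a 6-element loop, `splitLoop(6, 8)` gives 0 vector iterations and 6 scalar ones, so AVX-512 does nothing there. By contrast, `splitLoop(6, 2)` keeps the SSE2 path fully busy. The model ignores real overheads such as loads, shuffles and clock effects.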

2) Message boards : Number crunching : Compiling for AVX-512 (Message 2106)
Posted 25 Nov 2020 by Xavier Wallece

After reading this topic, I started reviewing the code.

I suggest you first compile the program as-is and run test_run2.sh, so that the file compare reports that the files are equal.

I suggest you make a copy of test_run2.sh (and bin/pc) and change the parameter value '2470' to something large, like 100000.
The program then runs longer, so you should see the difference better. The output files will not be correct, since the input file does not contain that many entries.

Make a backup of the file bin/pc; each make overwrites this file.

For a quick and dirty tryout you need to:

Look in the SIMD folder (under src); you will notice that there are 4 code paths: NEON, scalar, SSE and AVX.
Change the file AvxDoubleVectorTraits.hpp in the src/simd folder:

Change VectorSize from 32 to 64
and DataAlignment from 32 to 64.
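Both values go from 32 to 64 because an AVX register is 256 bits (32 bytes, 4 doubles) and an AVX-512 register is 512 bits (64 bytes, 8 doubles). Aligned loads such as `_mm512_load_pd` also expect 64-byte-aligned addresses. The sketch below only mirrors the two constant names from the post. The struct name `Avx512DoubleVectorTraitsSketch`, the `Lanes` member and the `isAligned` helper are hypothetical, and I have not checked the exact layout of AvxDoubleVectorTraits.hpp.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an AVX-512 version of AvxDoubleVectorTraits.
struct Avx512DoubleVectorTraitsSketch {
    static constexpr std::size_t VectorSize = 64;     // bytes per register (512 bits)
    static constexpr std::size_t DataAlignment = 64;  // needed by aligned loads/stores
    static constexpr std::size_t Lanes = VectorSize / sizeof(double);  // 8 doubles
};

static_assert(Avx512DoubleVectorTraitsSketch::Lanes == 8, "AVX-512 holds 8 doubles");

// A block of doubles aligned the way the traits demand.
struct alignas(Avx512DoubleVectorTraitsSketch::DataAlignment) AlignedBlock {
    double values[Avx512DoubleVectorTraitsSketch::Lanes];
};

// True if pointer p sits on an `alignment`-byte boundary.
bool isAligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```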

Replace AVX functions like _mm256_add_pd with the AVX-512 ones like _mm512_add_pd.
See Intel's reference guide to do so:

https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=AVX_512&text=_pd&expand=127
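As an illustration of the swap, here is one 8-double addition written three ways. It uses `_mm512_add_pd` when the compiler targets AVX512F, two `_mm256_add_pd` calls on AVX, and plain scalar code otherwise. The function name `add8` is hypothetical. It uses the unaligned loads/stores (`_mm512_loadu_pd`, `_mm256_loadu_pd` and their store counterparts) so it works on any buffer. Treat it as a sketch of the pattern, not the pc code.

```cpp
#include <cstddef>
#if defined(__AVX__) || defined(__AVX512F__)
#include <immintrin.h>
#endif

// Hypothetical helper: out[i] = a[i] + b[i] for 8 doubles.
void add8(const double* a, const double* b, double* out) {
#if defined(__AVX512F__)
    // One 512-bit register holds all 8 doubles.
    _mm512_storeu_pd(out, _mm512_add_pd(_mm512_loadu_pd(a), _mm512_loadu_pd(b)));
#elif defined(__AVX__)
    // AVX needs two 256-bit operations of 4 doubles each.
    for (std::size_t i = 0; i < 8; i += 4) {
        _mm256_storeu_pd(out + i,
                         _mm256_add_pd(_mm256_loadu_pd(a + i), _mm256_loadu_pd(b + i)));
    }
#else
    // Scalar fallback when neither instruction set is enabled at compile time.
    for (std::size_t i = 0; i < 8; ++i) {
        out[i] = a[i] + b[i];
    }
#endif
}
```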

Also do not forget to change the sum method: it adds only 4 values (AVX) instead of 8 (AVX-512).
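A horizontal sum over all 8 lanes could look like the sketch below. It splits the 512-bit register into two 256-bit halves, adds them, and then reduces the remaining 4 doubles the way an AVX sum would. The name `sum8` and this reduction order are my assumptions; I have not copied the existing sum method. The additions happen in a different order than a left-to-right loop, so the last bits of the result can differ slightly from the scalar path.

```cpp
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Hypothetical helper: sum of 8 doubles.
double sum8(const double* v) {
#if defined(__AVX512F__)
    __m512d x = _mm512_loadu_pd(v);
    // Fold 8 lanes into 4: low 256 bits + high 256 bits.
    __m256d s4 = _mm256_add_pd(_mm512_castpd512_pd256(x), _mm512_extractf64x4_pd(x, 1));
    // From here on it is the 4-lane (AVX-style) reduction: fold 4 lanes into 2.
    __m128d s2 = _mm_add_pd(_mm256_castpd256_pd128(s4), _mm256_extractf128_pd(s4, 1));
    // Fold 2 lanes into 1 and extract the low double.
    __m128d s1 = _mm_add_sd(s2, _mm_unpackhi_pd(s2, s2));
    return _mm_cvtsd_f64(s1);
#else
    // Scalar fallback: plain left-to-right sum of all 8 values.
    double s = 0.0;
    for (int i = 0; i < 8; ++i) {
        s += v[i];
    }
    return s;
#endif
}
```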

Also change DoubleVector.hpp (see http://gene.disi.unitn.it/test/forum_thread.php?id=302 ).
In 2017 AVX was not very performant on Intel Haswell, so someone changed DoubleVector back to SSE. This is the reason the AVX code is as fast as the SSE code: DoubleVector still maps to the SSE class.

line 71: typedef AvxDoubleVector DoubleVectorLong;
line 72: typedef SseDoubleVector DoubleVector;
line 73: #elif defined (__SSE2__)

Change line 72 to:
typedef AvxDoubleVector DoubleVector;
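The effect of that one-line change can be checked in isolation with stand-in types. In the sketch below, `SseDoubleVector` and `AvxDoubleVector` are hypothetical empty structs that only carry a lane count; they are not the real classes. The `#else` branch is simplified and also an assumption. With the edited line, an AVX build selects the 4-lane type for `DoubleVector` instead of the 2-lane SSE type.

```cpp
// Hypothetical stand-ins for the real vector classes: only the lane count matters here.
struct SseDoubleVector { static constexpr int Lanes = 2; };  // 128-bit, 2 doubles
struct AvxDoubleVector { static constexpr int Lanes = 4; };  // 256-bit, 4 doubles

#if defined(__AVX__)
typedef AvxDoubleVector DoubleVectorLong;
typedef AvxDoubleVector DoubleVector;  // was: typedef SseDoubleVector DoubleVector;
#else
// Simplified fallback (assumption): the real header has its own SSE2, NEON
// and scalar branches.
typedef SseDoubleVector DoubleVectorLong;
typedef SseDoubleVector DoubleVector;
#endif
```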

After you have done that, change the Makefile.

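It seems that the functions you will be using are AVX512F functions (see the Intel guide above), which are available on Intel Skylake-SP/Skylake-X and Ice Lake.
When you compile the program, the #pragma message output shows whether it compiles for SSE or AVX. If it compiles for SSE, add the "-mavx -mfma -mavx2" parameters to ARCH in the Makefile. For the _mm512 functions you will also need -mavx512f.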
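Besides the #pragma message output during the build, a small function can report which instruction-set macros the compiler actually defined. That makes it easy to check a build's flags after the fact. GCC and Clang define `__AVX512F__`, `__AVX2__`, `__FMA__`, `__AVX__` and `__SSE2__` for the corresponding `-m` flags. The function name `compiledSimdLevel` is hypothetical.

```cpp
#include <string>

// Hypothetical helper: describes the widest SIMD level this translation unit
// was compiled for. It reflects compiler flags, not what the CPU supports.
std::string compiledSimdLevel() {
#if defined(__AVX512F__)
    return "AVX512F";
#elif defined(__AVX2__) && defined(__FMA__)
    return "FMA+AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "scalar";
#endif
}
```

On a build made with `-mavx -mfma -mavx2`, this should return "FMA+AVX2". It only shows the flags, though. Which class is actually used is still decided by the typedef in DoubleVector.hpp.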

Compile the program and run test_run2.sh. If the results are equal, you can try to run it with a larger set.

Looking forward to seeing your results.

3) Message boards : Development : Doublevector always uses SSE path instead of AVX (Message 2105)
Posted 25 Nov 2020 by Xavier Wallece

Good day,

After looking into https://bitbucket.org/francesco-asnicar/pc-boinc/src/master/ and compiling the code myself on Ubuntu, I noticed that DoubleVector always uses the SSE code and not the AVX code.

File: DoubleVector.hpp
starting at line 60:
#ifdef __AVX__
#ifdef __FMA__
#ifdef __AVX2__
#pragma message "Using FMA+AVX2 instructions"
#else
#pragma message "Using FMA+AVX instructions"
#endif
#else
#pragma message "Using AVX instructions"
#endif

typedef AvxDoubleVector DoubleVectorLong;
typedef SseDoubleVector DoubleVector;
#elif defined (__SSE2__)

Changed line 72 to:
typedef AvxDoubleVector DoubleVector;

It seems that in 2017 AVX was not very performant on Haswell, and therefore it was disabled.

Running the compiled version as follows:
bin/pc input/tile2.txt output/output2.txt 0.05 1 100000 0

The AVX version's real time is 22 seconds; the SSE version's real time is 32 seconds (Ubuntu under Hyper-V).
This only helps if the 5th parameter (100000) is large enough. If it is small, as in the test_run.sh scripts, it makes almost no difference.
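For scale, the two timings above work out as follows. The 2x figure is only the lane-count ratio (4 doubles per AVX register vs 2 per SSE register). It is a rough upper bound that ignores memory bandwidth, clock behaviour and the non-vectorized parts of the run.

```latex
\text{speedup} = \frac{t_{\text{SSE}}}{t_{\text{AVX}}} = \frac{32\,\text{s}}{22\,\text{s}} \approx 1.45,
\qquad
1 - \frac{22}{32} \approx 0.31 \;(\approx 31\%\ \text{less wall time}),
\qquad
\text{lane-count bound} = \frac{4}{2} = 2
```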

Could someone verify that the compiled gene_pcim_v1.11_win64__avx.exe is running in SSE instead of AVX mode?