| Author | Message | 
|---|
        
        |  | 
        
        | Hello,
 Thank you, it's running well now.
 
 However, I had to change api_version to match the version I am running.
 
 Best
 
 Philippe
 | 
|
|  | 
        
        |  | 
        
        | Thank you for your hard work ! The improvement is fantastic !
 | 
|
|  | 
        
        |  | 
        
        | Hi all, I have just added an application for Linux ARM v7a vfpv4 (the address is the same: https://bitbucket.org/sirzooro/pc-boinc/downloads). It is about 30% faster than the original one. I tested it on my Odroid XU4 and it runs fine. Unfortunately NEON instructions do not support double-precision operations, so additional optimization through vectorization is not possible for ARM. Maybe some future generations of ARM CPUs will allow this.
 
   | 
|
|  | 
        
        |  | 
        
        | Hello, Sorry for the question, but I see no performance difference between AVX and FMA on an i7 4770K (HT off, Windows 7 Ultimate). Do you have any recommendation, or is this normal?
 (PrimeGrid LLR WUs use FMA3, I think, and that makes the CPU run at its highest performance.)
 Thank You.
 Philippe
 | 
|
|  | 
        
        |  | 
        
        | > I do not currently have BOINC libs compiled for 32-bit Linux or for Windows. I will prepare the Windows app later. Let me know if you need a 32-bit app for Windows or Linux too; I wonder whether anyone needs one.
 
 Yes, I have 5 hosts running 32-bit Windows.
 If you can create a 32-bit app, that would be very cool.
 Thanks for your optimization work.
 | 
|
|  | 
        
        | 
      valterc (Project administrator, Project tester)
 Joined: 30 Oct 13
 Posts: 629
 Credit: 34,725,842
 RAC: 618
 
       | 
        
        | >> I do not currently have BOINC libs compiled for 32-bit Linux or for Windows. I will prepare the Windows app later. Let me know if you need a 32-bit app for Windows or Linux too; I wonder whether anyone needs one.
 
 > Yes, I have 5 hosts running 32-bit Windows.
 > If you can create a 32-bit app, that would be very cool.
 > Thanks for your optimization work.
 
 Here you can find the BOINC API libraries (built from the latest source code) for 32-bit and 64-bit Linux: http://gene.disi.unitn.it/test/files/boinc_libs-x32-x64.7z
 | 
|
|  | 
        
        |  | 
        
        | >>> I do not currently have BOINC libs compiled for 32-bit Linux or for Windows. I will prepare the Windows app later. Let me know if you need a 32-bit app for Windows or Linux too; I wonder whether anyone needs one.
 
 >> Yes, I have 5 hosts running 32-bit Windows.
 >> If you can create a 32-bit app, that would be very cool.
 >> Thanks for your optimization work.
 
 > Here you can find the BOINC API libraries (built from the latest source code) for 32-bit and 64-bit Linux: http://gene.disi.unitn.it/test/files/boinc_libs-x32-x64.7z
 
 Thanks, but now I need the ones for Windows :).
 
 I have added 32-bit apps for Windows in 4 versions: without SIMD instructions (x87 FPU version), SSE2, AVX and FMA. They passed my small test, so they should give correct results. Let me know if they work for you.
 
   | 
|
|  | 
        
        |  | 
        
        | > Hello, Sorry for the question, but I see no performance difference between AVX and FMA on an i7 4770K (HT off, Windows 7 Ultimate). Do you have any recommendation, or is this normal?
 > (PrimeGrid LLR WUs use FMA3, I think, and that makes the CPU run at its highest performance.)
 > Thank you.
 > Philippe
 
 No need for an answer. I saw your benchmark tests.
 | 
|
|  | 
        
        |  | 
        
        | >> Hello, Sorry for the question, but I see no performance difference between AVX and FMA on an i7 4770K (HT off, Windows 7 Ultimate). Do you have any recommendation, or is this normal?
 >> (PrimeGrid LLR WUs use FMA3, I think, and that makes the CPU run at its highest performance.)
 >> Thank you.
 >> Philippe
 
 > No need for an answer. I saw your benchmark tests.
 
 You should see some difference, as in the benchmark results above. Did you run the different app versions manually on test data, or on BOINC tasks? If the latter, please keep in mind that tasks vary in length; some complete faster, some slower.
 
 One important note for 32-bit Windows apps: you need at least Windows 7 SP1 or Windows Server 2008 R2 SP1 if you want to use the AVX and FMA versions. On Windows XP you will be able to run the SSE2 and non-SIMD versions only. Here is the list of OSes which support AVX: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Operating_system_support.
 
   | 
|
|  | 
        
        |  | 
        
        | >>>> I do not currently have BOINC libs compiled for 32-bit Linux or for Windows. I will prepare the Windows app later. Let me know if you need a 32-bit app for Windows or Linux too; I wonder whether anyone needs one.
 
 >>> Yes, I have 5 hosts running 32-bit Windows.
 >>> If you can create a 32-bit app, that would be very cool.
 >>> Thanks for your optimization work.
 
 >> Here you can find the BOINC API libraries (built from the latest source code) for 32-bit and 64-bit Linux: http://gene.disi.unitn.it/test/files/boinc_libs-x32-x64.7z
 
 > Thanks, but now I need the ones for Windows :).
 >
 > I have added 32-bit apps for Windows in 4 versions: without SIMD instructions (x87 FPU version), SSE2, AVX and FMA. They passed my small test, so they should give correct results. Let me know if they work for you.
 
 I just installed the optimized version on my 32-bit Windows hosts and everything seems to work perfectly.
 Thanks again for your work ^_^
 | 
|
|  | 
        
        |  | 
        
        | >>> Hello, Sorry for the question, but I see no performance difference between AVX and FMA on an i7 4770K (HT off, Windows 7 Ultimate). Do you have any recommendation, or is this normal?
 >>> (PrimeGrid LLR WUs use FMA3, I think, and that makes the CPU run at its highest performance.)
 >>> Thank you.
 >>> Philippe
 
 >> No need for an answer. I saw your benchmark tests.
 
 > You should see some difference, as in the benchmark results above. Did you run the different app versions manually on test data, or on BOINC tasks? If the latter, please keep in mind that tasks vary in length; some complete faster, some slower.
 >
 > One important note for 32-bit Windows apps: you need at least Windows 7 SP1 or Windows Server 2008 R2 SP1 if you want to use the AVX and FMA versions. On Windows XP you will be able to run the SSE2 and non-SIMD versions only. Here is the list of OSes which support AVX: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Operating_system_support.
 
 Hello Daniel,
 
 Thank you for your answer.
 
 Machine ID: 462 (Windows 7 Ultimate 64-bit, i7 4770K, HT ON, 24 GB RAM ...)
 
 I usually run AVX on projects such as Asteroids@Home, and FMA3 on PrimeGrid.
 
 I installed the TN-Grid FMA optimization package last night, and run times are more than 10% longer than with AVX. Credits are also 1% to 5% higher, but I don't know whether these are new WUs.
 
 => I will reinstall the AVX optimization.
 
 Here are the details of some WUs processed lately:
 
 AVX:
 
 Task ID	Work unit ID	Sent	Time reported	Status	Run time (s)	CPU time (s)	Credit	Application
 4630473	2226855	29 Dec 2016, 13:36:19 UTC	29 Dec 2016, 19:44:58 UTC	Completed and validated	2,739.02	2,711.84	61.66	Gene Network Application, Anonymous platform (CPU)
 4630480	2226859	29 Dec 2016, 13:33:46 UTC	29 Dec 2016, 19:40:33 UTC	Completed and validated	2,636.53	2,616.22	58.15	Gene Network Application, Anonymous platform (CPU)
 4630029	2226634	29 Dec 2016, 13:31:43 UTC	29 Dec 2016, 19:35:37 UTC	Completed and validated	2,746.45	2,729.22	56.26	Gene Network Application, Anonymous platform (CPU)
 
 FMA:
 
 Task ID	Work unit ID	Sent	Time reported	Status	Run time (s)	CPU time (s)	Credit	Application
 4640989	2231990	29 Dec 2016, 20:54:25 UTC	30 Dec 2016, 3:41:04 UTC	Completed and validated	3,283.54	3,241.39	62.08	Gene Network Application, Anonymous platform (CPU)
 4639494	2231249	29 Dec 2016, 19:57:43 UTC	29 Dec 2016, 23:44:08 UTC	Completed and validated	3,330.47	3,295.07	64.60	Gene Network Application, Anonymous platform (CPU)
 4639751	2231377	29 Dec 2016, 19:57:43 UTC	29 Dec 2016, 23:42:03 UTC	Completed and validated	3,492.71	3,452.72	63.42	Gene Network Application, Anonymous platform (CPU)
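 
 As a quick check on these figures, the Python sketch below averages the run times of the three AVX and three FMA tasks listed (the first of the two time figures on each row, assumed here to be elapsed seconds) and compares them. The variable names are illustrative. Three tasks per version is a very small sample and work units vary in length, so the ratio is only indicative.
 
```python
from statistics import mean

# Run times in seconds, copied from the task rows listed in this post.
avx_run_times = [2739.02, 2636.53, 2746.45]
fma_run_times = [3283.54, 3330.47, 3492.71]

avx_mean = mean(avx_run_times)
fma_mean = mean(fma_run_times)

# A ratio above 1 means the FMA tasks took longer on average.
ratio = fma_mean / avx_mean

print(f"AVX mean run time: {avx_mean:.2f} s")
print(f"FMA mean run time: {fma_mean:.2f} s")
print(f"FMA / AVX ratio:   {ratio:.3f}")
```
 
 On these six tasks the FMA runs come out roughly 24% longer on average, which is consistent with the "more than 10%" impression, but it says nothing about whether the FMA work units were simply larger.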
 
 
 Thank You,
 
 Philippe
 
 EDIT : Thank you for your hard work and these optimizations !!!
 | 
|
|  | 
        
        |  | 
        
        | Thx Daniel for your work - very good optimization. First tests with 64-bit Linux on an AMD Phenom II X6 @ 3.4 GHz, optimized SSE2 app:
 Standard application:    5500-5700 s, 58-60 Cr
 With your optimized app: 3320-3400 s, 49-55 Cr
 
 Happy New Year to you all :-)
 | 
|
|  | 
        
        |  | 
        
        | > Happy New Year to you all :-)
 
 Happy new year!!!
 | 
|
|  | 
        
        |  | 
        
        | Hi Daniel, about this:
 
 > I found something that may be a problem. You use an undirected graph, so I thought I could reduce the number of iterations of the loop at pc.cpp:418 by testing (i,j) pairs for j > i only. However, after doing this the output file size changed from 47.8K to 67.6K. The original code, before my other changes, also generated a bigger file after applying this change. I checked the code briefly and do not see anything obvious that may cause this. Could you take a look at this?
 
 It is actually not possible to halve the number of iterations, because the algorithm chooses a pair of nodes i,j and tests whether the arc linking the two nodes should be removed. When l increases, the test is conditioned on a set of neighbours of size l of the first node. If the edge is not removed, there may still exist a set of neighbours of j of some size l that allows the removal of the edge. So it is important to test all possible ordered combinations of i,j.
 
 I hope this is clear enough. A few more details can be found here: http://www.jmlr.org/papers/volume8/kalisch07a/kalisch07a.pdf (page 5, Algorithm 1)
 
 Many thanks,
 Francesco
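 
 The point about ordered pairs can be illustrated with a minimal Python sketch of the skeleton phase as described in this post. It is not the project's pc.cpp: the four-node graph (0->1, 2->1, 1->3, 2->3), the hard-coded independence oracle and the names `INDEPENDENT`, `independent` and `pc_skeleton` are all assumptions made up for the illustration. In this toy graph the only set that separates nodes 0 and 3 is {1, 2}, and node 2 is a neighbour of 3 but not of 0, so the edge can only be removed when 3 is the first node of the pair.
 
```python
from itertools import combinations

# Toy conditional-independence oracle for the DAG 0->1, 2->1, 1->3, 2->3
# (hypothetical example, not project data). Each entry lists the
# conditioning sets under which the pair is independent.
INDEPENDENT = {
    frozenset({0, 2}): [frozenset()],
    frozenset({0, 3}): [frozenset({1, 2})],
}

def independent(i, j, cond):
    """Answer 'is i independent of j given cond?' from the toy table."""
    return frozenset(cond) in INDEPENDENT.get(frozenset({i, j}), [])

def pc_skeleton(n, only_upper_triangle):
    """Skeleton phase on n nodes; optionally test only pairs with j > i."""
    adj = {v: set(range(n)) - {v} for v in range(n)}
    l = 0
    # Continue while some node has enough other neighbours for a set of size l.
    while any(len(adj[i]) - 1 >= l for i in range(n)):
        for i in range(n):
            for j in range(n):
                if j not in adj[i]:
                    continue
                if only_upper_triangle and j < i:
                    continue
                # Conditioning sets are drawn from the neighbours of i only.
                for cond in combinations(sorted(adj[i] - {j}), l):
                    if independent(i, j, cond):
                        adj[i].discard(j)
                        adj[j].discard(i)
                        break
        l += 1
    return {frozenset({i, j}) for i in adj for j in adj[i]}

full = pc_skeleton(4, only_upper_triangle=False)
half = pc_skeleton(4, only_upper_triangle=True)
print(len(full), len(half))  # 4 edges versus 5 edges
```
 
 With all ordered pairs the edge 0-3 is removed and four edges remain; with j > i only, it survives and five edges remain. More surviving edges would also be consistent with the larger output file reported in the quoted question.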
 
   | 
|
|  | 
        
        |  | 
        
        | > Hi Daniel, about this:
 >
 >> I found something that may be a problem. You use an undirected graph, so I thought I could reduce the number of iterations of the loop at pc.cpp:418 by testing (i,j) pairs for j > i only. However, after doing this the output file size changed from 47.8K to 67.6K. The original code, before my other changes, also generated a bigger file after applying this change. I checked the code briefly and do not see anything obvious that may cause this. Could you take a look at this?
 >
 > It is actually not possible to halve the number of iterations, because the algorithm chooses a pair of nodes i,j and tests whether the arc linking the two nodes should be removed. When l increases, the test is conditioned on a set of neighbours of size l of the first node. If the edge is not removed, there may still exist a set of neighbours of j of some size l that allows the removal of the edge. So it is important to test all possible ordered combinations of i,j.
 >
 > I hope this is clear enough. A few more details can be found here: http://www.jmlr.org/papers/volume8/kalisch07a/kalisch07a.pdf (page 5, Algorithm 1)
 >
 > Many thanks,
 > Francesco
 
 Thanks for the explanation. I will take a closer look at the linked paper.
 
   | 
|
|  | 
        
        | 
      valterc (Project administrator, Project tester)
 
       | 
        
        | I'm thinking about strategies for deploying the new versions of the application. Some thoughts:
 - SSE2 should be the base version (I guess there are no computers left without SSE2)
 - AVX is okay; I don't know what to do with the FMA version
 - we will have versions for Windows 32/64-bit and Linux 32/64-bit; we are still missing a version for macOS x64
 - ARM: I'd like to support it in a standard way, but I don't know which platform is the most suitable (see https://boinc.berkeley.edu/trac/wiki/BoincPlatforms) or whether an app plan is needed (see https://boinc.berkeley.edu/trac/wiki/AppPlan)
 | 
|
|  | 
        
        |  | 
        
        | > I'm thinking about strategies for deploying the new versions of the application. Some thoughts:
 > - SSE2 should be the base version (I guess there are no computers left without SSE2)
 > - AVX is okay; I don't know what to do with the FMA version
 > - we will have versions for Windows 32/64-bit and Linux 32/64-bit; we are still missing a version for macOS x64
 > - ARM: I'd like to support it in a standard way, but I don't know which platform is the most suitable (see https://boinc.berkeley.edu/trac/wiki/BoincPlatforms) or whether an app plan is needed (see https://boinc.berkeley.edu/trac/wiki/AppPlan)
 
 Good news :) A few comments on this:
 - stats on the downloads page show that the 32-bit Windows non-SSE version of my app was downloaded 12 times, so there is some need for it. You could also decide to provide this version later if someone asks for it;
 - FMA should be OK too. It should be sent to hosts which support the FMA3 instruction set;
 - I am not sure if there is some cross-compiler ready. If Mac header files are available somewhere, you can try to build a cross-compiler (the crosstool package will be your friend);
 - you need at least two platforms, arm-unknown-linux-gnueabihf and aarch64-unknown-linux-gnu (for 32- and 64-bit ARM). There are 3 versions of the 32-bit ARM app, so plan classes will also be needed. The supported FPU instruction set should be sent to the server in a similar way as for x86 CPUs;
 - some projects send several app versions to a client to gather benchmarks and choose the fastest one. This is probably a standard BOINC server feature. It would be good to use it here to check whether the AVX app is faster than SSE2, since people reported mixed results for these apps. The FMA app was always faster than AVX, but it may be worthwhile to benchmark it against SSE2.
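 
 The constraints mentioned in this thread (a non-SIMD build that runs everywhere, SSE2 as the base SIMD version, AVX and FMA builds that also need OS support) can be sketched as a small eligibility check. This is only an illustration in Python, not BOINC plan class code: the function name `eligible_versions`, the version labels and the lowercase feature names are assumptions chosen for the example.
 
```python
def eligible_versions(cpu_features, os_supports_avx):
    """Return the app versions a host could run under the rules in this thread.

    cpu_features    -- set of lowercase feature names, e.g. {"sse2", "avx"}
    os_supports_avx -- True if the OS saves and restores AVX register state
    """
    versions = ["x87"]  # non-SIMD build, assumed to run on any x86 host
    if "sse2" in cpu_features:
        versions.append("sse2")
    # The FMA build is described in the thread as FMA+AVX, so it is only
    # offered when AVX itself is usable on the host.
    if os_supports_avx and "avx" in cpu_features:
        versions.append("avx")
        if "fma" in cpu_features:
            versions.append("fma")
    return versions

# A Haswell-class CPU on an OS without AVX support falls back to SSE2.
print(eligible_versions({"sse2", "avx", "fma"}, os_supports_avx=False))
print(eligible_versions({"sse2", "avx", "fma"}, os_supports_avx=True))
```
 
 On a real server the equivalent decision would be expressed through plan classes (see the AppPlan page linked in the quoted post), which this sketch does not attempt to reproduce.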
 
   | 
|
|  | 
        
        | 
      valterc (Project administrator, Project tester)
 
       | 
        
        | OK. I just added the new SSE2 versions for Windows and Linux x64, plus the normal and SSE2 versions for 32-bit Windows. Let's see if they work correctly before adding the other ones.
 [addendum] I found a comment on boinc_dev saying that:
 
 > Any processor with avx will also have pni, so you should expect both apps
 > to go to machines with AVX until the server can figure out which one is
 > faster on a given host (which is usually about 10 results if there's a
 > significant speed difference).  If there is no speed difference, then both
 > will be sent for a long time.
 So, with sse2, avx, fma, any modern computer will get the three applications and eventually decide which one is the best one...
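 
 The idea in the quoted comment, that a preference only emerges once enough results have been collected per version, can be sketched as follows. This is a simplified Python illustration and not the BOINC scheduler's actual logic: the function name `fastest_version` and the `min_results=10` threshold (taken loosely from "about 10 results") are assumptions.
 
```python
from statistics import mean

def fastest_version(samples, min_results=10):
    """Pick the app version with the lowest mean run time.

    samples maps a version name to a list of run times in seconds.
    Returns None while any version has fewer than min_results samples,
    i.e. while all versions would still be handed out.
    """
    if not samples or any(len(t) < min_results for t in samples.values()):
        return None
    return min(samples, key=lambda version: mean(samples[version]))

# Made-up run times, for illustration only.
history = {"sse2": [100.0] * 10, "avx": [95.0] * 10, "fma": [90.0] * 4}
print(fastest_version(history))   # still undecided: fma has only 4 results
history["fma"] += [90.0] * 6
print(fastest_version(history))   # now every version has 10 results
```
 
 A real scheduler would also need to account for work units of different lengths, which a plain mean of run times ignores.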
 | 
|
|  | 
        
        |  | 
        
        | > So, with sse2, avx, fma, any modern computer will get the three applications and eventually decide which one is the best one...
 
 Potential problem: on some machines WUs fail with errors under avx and fma, while sse2 seems to work with everything I've tried.
 I've found that fma can be slightly faster than the sse2 version on some machines, but the difference is small.
 | 
|
|  | 
        
        |  | 
        
        | >> So, with sse2, avx, fma, any modern computer will get the three applications and eventually decide which one is the best one...
 
 > Potential problem: on some machines WUs fail with errors under avx and fma, while sse2 seems to work with everything I've tried.
 > I've found that fma can be slightly faster than the sse2 version on some machines, but the difference is small.
 
 What OS do you use? AVX needs support on the OS side too. The list of supported OSes is here: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Operating_system_support
 The FMA version is in fact FMA+AVX, so it also needs such OS support.
 
   | 
|
|  |