Optimization

Profile Phil1966
Joined: 14 Jun 14
Posts: 20
Credit: 3,212,282
RAC: 0
France
Message 687 - Posted: 25 Dec 2016, 18:45:28 UTC - in response to Message 686.

Hello,

Thank you, it's running well now.

But api_version had to be changed to match the one I am running.

Best

Philippe

Profile Phil1966
Joined: 14 Jun 14
Posts: 20
Credit: 3,212,282
RAC: 0
France
Message 688 - Posted: 26 Dec 2016, 7:47:12 UTC

Thank you for your hard work !
The improvement is fantastic !

Profile Daniel
Volunteer developer
Joined: 19 Oct 16
Posts: 80
Credit: 2,204,110
RAC: 114
Poland
Message 689 - Posted: 26 Dec 2016, 13:14:33 UTC
Last modified: 26 Dec 2016, 13:19:52 UTC

Hi all,
I have just added an application for Linux ARMv7-A with VFPv4 (the address is the same: https://bitbucket.org/sirzooro/pc-boinc/downloads). It is about 30% faster than the original one. I tested it on my Odroid XU4 and it runs fine. Unfortunately, NEON instructions on ARMv7 do not support double-precision operations, so additional optimization through vectorization is not possible for ARM. Maybe some future generations of ARM CPUs will allow this.

Profile Phil1966
Joined: 14 Jun 14
Posts: 20
Credit: 3,212,282
RAC: 0
France
Message 690 - Posted: 28 Dec 2016, 14:01:22 UTC

Hello,
Sorry for the question ... but I see no performance difference between AVX and FMA on an i7 4770K HT OFF / W7 Ultimate. Do you have any recommendation, or is this normal?
(PrimeGrid LLR WUs use FMA3, I think, and that makes the CPU run at its highest performance.)
Thank you.
Philippe

Profile [AF>France>IDF]Lic
Joined: 19 May 14
Posts: 7
Credit: 2,346,610
RAC: 4,110
France
Message 692 - Posted: 29 Dec 2016, 9:42:47 UTC - in response to Message 653.


> At the moment I do not have BOINC libs compiled for 32-bit Linux or for Windows. I will prepare a Windows app later. Let me know if you need a 32-bit app for Windows or Linux too; I wonder whether anyone needs one.

Yes, I have 5 hosts running 32-bit Windows.
If you can create a 32-bit app, that would be very cool.
Thanks for your optimization work.

Profile valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 320
Credit: 16,292,963
RAC: 4,052
Italy
Message 693 - Posted: 29 Dec 2016, 10:08:47 UTC - in response to Message 692.
Last modified: 29 Dec 2016, 10:10:05 UTC


> > At the moment I do not have BOINC libs compiled for 32-bit Linux or for Windows. I will prepare a Windows app later. Let me know if you need a 32-bit app for Windows or Linux too; I wonder whether anyone needs one.
>
> Yes, I have 5 hosts running 32-bit Windows.
> If you can create a 32-bit app, that would be very cool.
> Thanks for your optimization work.

Here you can find the BOINC API libraries (built from the latest source code) for Linux x32 and x64: http://gene.disi.unitn.it/test/files/boinc_libs-x32-x64.7z

Profile Daniel
Volunteer developer
Joined: 19 Oct 16
Posts: 80
Credit: 2,204,110
RAC: 114
Poland
Message 694 - Posted: 29 Dec 2016, 18:37:39 UTC - in response to Message 693.
Last modified: 29 Dec 2016, 18:39:24 UTC


> > > At the moment I do not have BOINC libs compiled for 32-bit Linux or for Windows. I will prepare a Windows app later. Let me know if you need a 32-bit app for Windows or Linux too; I wonder whether anyone needs one.
> >
> > Yes, I have 5 hosts running 32-bit Windows.
> > If you can create a 32-bit app, that would be very cool.
> > Thanks for your optimization work.
>
> Here you can find the BOINC API libraries (built from the latest source code) for Linux x32 and x64: http://gene.disi.unitn.it/test/files/boinc_libs-x32-x64.7z

Thanks, but now I need the ones for Windows :).

I have added 32-bit apps for Windows in 4 versions: without SIMD instructions (x87 FPU version), SSE2, AVX and FMA. They passed my small test, so they should give correct results. Let me know if they work for you.

Profile Phil1966
Joined: 14 Jun 14
Posts: 20
Credit: 3,212,282
RAC: 0
France
Message 695 - Posted: 29 Dec 2016, 19:46:42 UTC - in response to Message 690.
Last modified: 29 Dec 2016, 19:48:43 UTC

> Hello,
> Sorry for the question ... but I see no performance difference between AVX and FMA on an i7 4770K HT OFF / W7 Ultimate. Do you have any recommendation, or is this normal?
> (PrimeGrid LLR WUs use FMA3, I think, and that makes the CPU run at its highest performance.)
> Thank you.
> Philippe

No need for an answer. I saw your benchmark tests.

Profile Daniel
Volunteer developer
Joined: 19 Oct 16
Posts: 80
Credit: 2,204,110
RAC: 114
Poland
Message 696 - Posted: 29 Dec 2016, 20:30:01 UTC - in response to Message 695.

> > Hello,
> > Sorry for the question ... but I see no performance difference between AVX and FMA on an i7 4770K HT OFF / W7 Ultimate. Do you have any recommendation, or is this normal?
> > (PrimeGrid LLR WUs use FMA3, I think, and that makes the CPU run at its highest performance.)
> > Thank you.
> > Philippe
>
> No need for an answer. I saw your benchmark tests.

You should see some difference, as in the benchmark results above. Did you run the different app versions manually on test data, or did you run them on BOINC tasks? If the latter, please keep in mind that tasks vary in length: some complete faster, some slower.

One important note for the 32-bit Windows apps: you need at least Windows 7 SP1 or Windows Server 2008 R2 to use the AVX and FMA versions. On Windows XP you will be able to run only the SSE and non-SIMD versions. Here is a list of OSes that support AVX: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Operating_system_support.

Profile [AF>France>IDF]Lic
Joined: 19 May 14
Posts: 7
Credit: 2,346,610
RAC: 4,110
France
Message 697 - Posted: 30 Dec 2016, 8:30:54 UTC - in response to Message 694.


> > > > At the moment I do not have BOINC libs compiled for 32-bit Linux or for Windows. I will prepare a Windows app later. Let me know if you need a 32-bit app for Windows or Linux too; I wonder whether anyone needs one.
> > >
> > > Yes, I have 5 hosts running 32-bit Windows.
> > > If you can create a 32-bit app, that would be very cool.
> > > Thanks for your optimization work.
> >
> > Here you can find the BOINC API libraries (built from the latest source code) for Linux x32 and x64: http://gene.disi.unitn.it/test/files/boinc_libs-x32-x64.7z
>
> Thanks, but now I need the ones for Windows :).
>
> I have added 32-bit apps for Windows in 4 versions: without SIMD instructions (x87 FPU version), SSE2, AVX and FMA. They passed my small test, so they should give correct results. Let me know if they work for you.

I just installed the optimized version on my 32-bit Windows hosts and everything seems to work perfectly.
Thanks again for your work ^_^

Profile Phil1966
Joined: 14 Jun 14
Posts: 20
Credit: 3,212,282
RAC: 0
France
Message 698 - Posted: 30 Dec 2016, 8:31:52 UTC - in response to Message 696.
Last modified: 30 Dec 2016, 9:06:07 UTC

> > > Hello,
> > > Sorry for the question ... but I see no performance difference between AVX and FMA on an i7 4770K HT OFF / W7 Ultimate. Do you have any recommendation, or is this normal?
> > > (PrimeGrid LLR WUs use FMA3, I think, and that makes the CPU run at its highest performance.)
> > > Thank you.
> > > Philippe
> >
> > No need for an answer. I saw your benchmark tests.
>
> You should see some difference, as in the benchmark results above. Did you run the different app versions manually on test data, or did you run them on BOINC tasks? If the latter, please keep in mind that tasks vary in length: some complete faster, some slower.
>
> One important note for the 32-bit Windows apps: you need at least Windows 7 SP1 or Windows Server 2008 R2 to use the AVX and FMA versions. On Windows XP you will be able to run only the SSE and non-SIMD versions. Here is a list of OSes that support AVX: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Operating_system_support.


Hello Daniel,

Thank you for your answer.

Machine ID: 462 (Windows 7 Ultimate 64-bit, i7 4770K, HT ON, 24 GB RAM ...)

I am used to running AVX on projects such as Asteroids@Home, and FMA3 on PrimeGrid.

I installed the TN Grid FMA optimization package last night, and run times are more than 10% longer than with AVX. Credits are also 1% to 5% higher, but I don't know whether these are new WUs.

=> I will reinstall the AVX optimization.

Here are the details of some WUs processed recently:

Task | Work unit | Sent | Time reported | Status | Run time (s) | CPU time (s) | Credit | Application

AVX:

4630473 | 2226855 | 29 Dec 2016, 13:36:19 UTC | 29 Dec 2016, 19:44:58 UTC | Completed and validated | 2,739.02 | 2,711.84 | 61.66 | Gene Network Application, anonymous platform (CPU)
4630480 | 2226859 | 29 Dec 2016, 13:33:46 UTC | 29 Dec 2016, 19:40:33 UTC | Completed and validated | 2,636.53 | 2,616.22 | 58.15 | Gene Network Application, anonymous platform (CPU)
4630029 | 2226634 | 29 Dec 2016, 13:31:43 UTC | 29 Dec 2016, 19:35:37 UTC | Completed and validated | 2,746.45 | 2,729.22 | 56.26 | Gene Network Application, anonymous platform (CPU)

FMA:

4640989 | 2231990 | 29 Dec 2016, 20:54:25 UTC | 30 Dec 2016, 3:41:04 UTC | Completed and validated | 3,283.54 | 3,241.39 | 62.08 | Gene Network Application, anonymous platform (CPU)
4639494 | 2231249 | 29 Dec 2016, 19:57:43 UTC | 29 Dec 2016, 23:44:08 UTC | Completed and validated | 3,330.47 | 3,295.07 | 64.60 | Gene Network Application, anonymous platform (CPU)
4639751 | 2231377 | 29 Dec 2016, 19:57:43 UTC | 29 Dec 2016, 23:42:03 UTC | Completed and validated | 3,492.71 | 3,452.72 | 63.42 | Gene Network Application, anonymous platform (CPU)
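
Because tasks vary in length, a raw comparison of run times can mislead. One rough way to compare the six results listed above is to normalize run time by granted credit. This is only a sketch: it assumes credit is roughly proportional to the work in a task, which nothing in this thread confirms, and the function names are illustrative.

```python
# (run time in seconds, credit) for the three AVX and three FMA tasks above.
avx = [(2739.02, 61.66), (2636.53, 58.15), (2746.45, 56.26)]
fma = [(3283.54, 62.08), (3330.47, 64.60), (3492.71, 63.42)]

def mean_run_time(tasks):
    """Average run time in seconds."""
    return sum(t for t, _ in tasks) / len(tasks)

def seconds_per_credit(tasks):
    """Total run time over total credit (assumes credit tracks work done)."""
    return sum(t for t, _ in tasks) / sum(c for _, c in tasks)

raw_ratio = mean_run_time(fma) / mean_run_time(avx)
normalized_ratio = seconds_per_credit(fma) / seconds_per_credit(avx)

print(f"FMA/AVX mean run time:      {raw_ratio:.2f}")
print(f"FMA/AVX seconds per credit: {normalized_ratio:.2f}")
```

On these six tasks the FMA runs are about 24% longer on average and about 15% longer per credit. Three tasks per version is far too small a sample to draw conclusions from; the point is only how to normalize.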


Thank You,

Philippe

EDIT : Thank you for your hard work and these optimizations !!!

Profile taurec
Joined: 15 Oct 15
Posts: 1
Credit: 3,180,279
RAC: 0
Germany
Message 702 - Posted: 31 Dec 2016, 12:11:52 UTC - in response to Message 698.

Thanks, Daniel, for your work - very good optimization.
First tests on 64-bit Linux with an AMD Phenom II X6 @ 3.4 GHz, optimized SSE2 app:
Standard application: 5500-5700 s, 58-60 Cr
With your optimized app: 3320-3400 s, 49-55 Cr

Happy New Year to you all :-)

Profile [VENETO] boboviz
Joined: 12 Dec 13
Posts: 130
Credit: 914,447
RAC: 1,443
Italy
Message 703 - Posted: 1 Jan 2017, 7:38:51 UTC - in response to Message 702.

> Happy New Year to you all :-)


Happy new year!!!

Profile FrancescoAsnicar [SSC11]
Project developer
Project tester
Project scientist
Joined: 14 Nov 13
Posts: 50
Credit: 6,660,682
RAC: 4,824
Italy
Message 737 - Posted: 11 Jan 2017, 12:30:52 UTC - in response to Message 648.

Hi Daniel,
about this:

> I found something that may be a problem. You use an undirected graph, so I thought I could reduce the number of iterations of the loop at pc.cpp:418 by testing (i,j) pairs for j > i only. However, after doing this the output file size changed from 47.8K to 67.6K. The original code, before my other changes, also generated a bigger file after applying this change. I checked the code briefly and do not see anything obvious that could cause this. Could you take a look at this?

It is actually not possible to halve the number of iterations, because the algorithm picks an ordered pair of nodes i,j and tests whether the edge linking the two nodes should be removed. When l increases, the test is conditioned on a set of l neighbours of the first node, i. If the edge is not removed, there may still exist a set of l neighbours of j that allows the edge to be removed. So it is important to test every ordered combination of i,j.

I hope this is clear enough. A few more details can be found here: http://www.jmlr.org/papers/volume8/kalisch07a/kalisch07a.pdf (page 5, Algorithm 1)
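
The effect described above can be reproduced on a toy example. The sketch below is not the project's pc.cpp: it replaces the statistical tests with a hard-coded independence oracle for a hypothetical four-node DAG (0->1, 2->1, 1->3, 2->3), and all names are illustrative. In that DAG the only set separating nodes 0 and 3 is {1, 2}, which are neighbours of 3 but not both neighbours of 0.

```python
from itertools import combinations

# Toy oracle for the hypothetical DAG 0->1, 2->1, 1->3, 2->3.
# Each entry is (pair of nodes, conditioning set that makes them independent).
INDEPENDENT = {
    (frozenset({0, 2}), frozenset()),
    (frozenset({0, 3}), frozenset({1, 2})),
}

def independent(i, j, cond):
    return (frozenset({i, j}), frozenset(cond)) in INDEPENDENT

def skeleton(n, ordered_pairs):
    """PC-style skeleton search; ordered_pairs(n) lists the (i, j) to test."""
    adj = {v: set(range(n)) - {v} for v in range(n)}
    for l in range(n - 1):  # conditioning-set sizes 0 .. n-2
        for i, j in ordered_pairs(n):
            if j not in adj[i]:
                continue
            # Condition only on neighbours of the FIRST node of the pair.
            for cond in combinations(sorted(adj[i] - {j}), l):
                if independent(i, j, cond):
                    adj[i].discard(j)
                    adj[j].discard(i)
                    break
    return {frozenset({i, j}) for i in adj for j in adj[i]}

def all_pairs(n):
    return [(i, j) for i in range(n) for j in range(n) if i != j]

def upper_pairs(n):
    return [(i, j) for i in range(n) for j in range(i + 1, n)]

full = skeleton(4, all_pairs)
halved = skeleton(4, upper_pairs)
print(sorted(tuple(sorted(e)) for e in full))
print(sorted(tuple(sorted(e)) for e in halved))
```

With all ordered pairs, the edge 0-3 is removed when the pair (3, 0) is tested at l = 2. With j > i only, the pair (0, 3) can never be conditioned on {1, 2}, so the edge survives and the result has one extra edge, which is consistent with the larger output file reported.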

Many thanks,
Francesco

Profile Daniel
Volunteer developer
Joined: 19 Oct 16
Posts: 80
Credit: 2,204,110
RAC: 114
Poland
Message 738 - Posted: 11 Jan 2017, 21:03:23 UTC - in response to Message 737.

> Hi Daniel,
> about this:
>
> > I found something that may be a problem. You use an undirected graph, so I thought I could reduce the number of iterations of the loop at pc.cpp:418 by testing (i,j) pairs for j > i only. However, after doing this the output file size changed from 47.8K to 67.6K. The original code, before my other changes, also generated a bigger file after applying this change. I checked the code briefly and do not see anything obvious that could cause this. Could you take a look at this?
>
> It is actually not possible to halve the number of iterations, because the algorithm picks an ordered pair of nodes i,j and tests whether the edge linking the two nodes should be removed. When l increases, the test is conditioned on a set of l neighbours of the first node, i. If the edge is not removed, there may still exist a set of l neighbours of j that allows the edge to be removed. So it is important to test every ordered combination of i,j.
>
> I hope this is clear enough. A few more details can be found here: http://www.jmlr.org/papers/volume8/kalisch07a/kalisch07a.pdf (page 5, Algorithm 1)
>
> Many thanks,
> Francesco

Thanks for the explanation. I will take a closer look at the linked paper.

Profile valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 320
Credit: 16,292,963
RAC: 4,052
Italy
Message 739 - Posted: 12 Jan 2017, 10:41:50 UTC - in response to Message 738.
Last modified: 12 Jan 2017, 10:42:12 UTC

I'm just thinking about strategies for deploying the new versions of the application. Some thoughts:
- SSE2 should be the base version (I guess there are no computers without SSE2 around any more)
- AVX is okay; I don't know what to do with the FMA version
- we will have versions for Win x32-x64 and Linux x32-x64; we are still missing a version for Mac OS x64
- ARM: I'd like to support it in a standard way, but I don't know which platform is the most suitable (see here: https://boinc.berkeley.edu/trac/wiki/BoincPlatforms) or whether an app plan is needed (see https://boinc.berkeley.edu/trac/wiki/AppPlan)

Profile Daniel
Volunteer developer
Joined: 19 Oct 16
Posts: 80
Credit: 2,204,110
RAC: 114
Poland
Message 740 - Posted: 12 Jan 2017, 11:37:27 UTC - in response to Message 739.

> I'm just thinking about strategies for deploying the new versions of the application. Some thoughts:
> - SSE2 should be the base version (I guess there are no computers without SSE2 around any more)
> - AVX is okay; I don't know what to do with the FMA version
> - we will have versions for Win x32-x64 and Linux x32-x64; we are still missing a version for Mac OS x64
> - ARM: I'd like to support it in a standard way, but I don't know which platform is the most suitable (see here: https://boinc.berkeley.edu/trac/wiki/BoincPlatforms) or whether an app plan is needed (see https://boinc.berkeley.edu/trac/wiki/AppPlan)

Good news :) A few comments on this:
- stats on the downloads page show that the 32-bit Windows non-SSE version of my app was downloaded 12 times, so there is some need for it. You could also decide to provide this version later if someone asks for it;
- FMA should be OK too. It should be sent to hosts that support the FMA3 instruction set;
- I am not sure whether a ready-made cross-compiler exists. If the Mac header files are available somewhere, you can try to build a cross-compiler (the crosstool package will be your friend);
- you need at least two platforms, arm-unknown-linux-gnueabihf and aarch64-unknown-linux-gnu (for 32- and 64-bit ARM). There are 3 versions of the 32-bit ARM app, so plan classes will also be needed. The supported FPU instruction set should be sent to the server in a similar way as for x86 CPUs;
- some projects send several app versions to the client to gather benchmarks and choose the fastest one. This is probably a standard BOINC server feature. It would be good to use it here to check whether the AVX app is faster than SSE2, since people have reported mixed results for these apps. The FMA app was always faster than AVX, but it may be worthwhile to benchmark it against SSE.

Profile valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 320
Credit: 16,292,963
RAC: 4,052
Italy
Message 741 - Posted: 12 Jan 2017, 14:47:49 UTC - in response to Message 740.
Last modified: 12 Jan 2017, 18:43:41 UTC

OK. I just added the new sse2 windows/linux x64 versions and normal+sse2 for win32. Let's see if it works correctly before adding the other ones.

[addendum] I found a comment on boinc_dev saying that:


> Any processor with avx will also have pni, so you should expect both apps
> to go to machines with AVX until the server can figure out which one is
> faster on a given host (which is usually about 10 results if there's a
> significant speed difference). If there is no speed difference, then both
> will be sent for a long time.

So, with sse2, avx, fma, any modern computer will get the three applications and eventually decide which one is the best one...

Profile Beyond
Joined: 2 Nov 16
Posts: 24
Credit: 5,510,169
RAC: 23,582
United States
Message 744 - Posted: 13 Jan 2017, 16:10:29 UTC - in response to Message 741.

> So, with sse2, avx, fma, any modern computer will get the three applications and eventually decide which one is the best one...

Potential problem: some machines error out the WUs with avx and fma, while sse2 seems to work with everything I've tried.
I've found that fma can be slightly faster than the sse2 version on some machines, but the difference is small.

Profile Daniel
Volunteer developer
Joined: 19 Oct 16
Posts: 80
Credit: 2,204,110
RAC: 114
Poland
Message 745 - Posted: 13 Jan 2017, 16:17:46 UTC - in response to Message 744.
Last modified: 13 Jan 2017, 16:20:42 UTC

> > So, with sse2, avx, fma, any modern computer will get the three applications and eventually decide which one is the best one...
>
> Potential problem: some machines error out the WUs with avx and fma, while sse2 seems to work with everything I've tried.
> I've found that fma can be slightly faster than the sse2 version on some machines, but the difference is small.

Which OS do you use? AVX needs support on the OS side too. The list of supported OSes is here: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Operating_system_support
The FMA version is in fact FMA+AVX, so it also needs the same OS support.
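
The rule above (AVX and FMA builds need both CPU and OS support, with SSE2 as the fallback) can be sketched as a small selection function. This is an illustration only, not how the BOINC client or server actually chooses a version. The function name and the `os_supports_avx` parameter are hypothetical, and the flag names follow the "flags" line of /proc/cpuinfo on Linux.

```python
def pick_variant(cpu_flags, os_supports_avx):
    """Return the most specialised build the host should be able to run.

    cpu_flags: iterable of CPU feature names, e.g. from /proc/cpuinfo.
    os_supports_avx: whether the OS saves and restores the AVX registers.
    """
    flags = {f.lower() for f in cpu_flags}
    avx_usable = "avx" in flags and os_supports_avx
    if avx_usable and "fma" in flags:
        return "fma"   # the FMA build also uses AVX
    if avx_usable:
        return "avx"
    if "sse2" in flags:
        return "sse2"
    return "x87"       # build without SIMD instructions

print(pick_variant({"sse2", "avx", "fma"}, os_supports_avx=True))
print(pick_variant({"sse2", "avx", "fma"}, os_supports_avx=False))
```

A CPU that has AVX and FMA3 but runs an OS without AVX support (for example Windows XP, as noted earlier in the thread) falls back to the SSE2 build here.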


Copyright © 2017 CNR-TN & UniTN