Posts by [B@P] Daniel
61) Message boards : Number crunching : Optimization (Message 761)
Posted 14 Jan 2017 by Profile [B@P] Daniel

Is this temporary or won't you consider Mac OS (X/macOS) ?

We will continue to support Mac OS (sse, avx and fma). We just need more time for building the applications (lack of hardware is the main problem).

valterc, I googled "linux mac cross compiler" and found a few interesting discussions on Stack Overflow on this topic. It looks like there is a cross compiler ready to use (see https://stackoverflow.com/a/10341443). There are even VMs with Apple's system, although I am not sure what their legal status is.
62) Message boards : Number crunching : Optimization (Message 753)
Posted 13 Jan 2017 by Profile [B@P] Daniel
I checked your computers and found a few validation errors, but no error reported by the app or the BOINC client to indicate that something was wrong. Maybe these are caused by the mysterious error valterc mentioned before? I have also seen this a few times, but on the SSE version. That was on a machine running 24/7 with BOINC configured with a long task switch time, so these WUs were crunched from start to end without interruption. One of them was also crunched by someone else with my app, and for that person it validated successfully, which makes it even more interesting.

The only 3 validation errors I had were all at once on 1 machine when there was a power glitch. The problem I had with the non sse2 versions were on certain CPU types. They errored all WUs immediately, but it was a while ago when I was testing and the WUs are no longer in the database. The sse2 version runs fine on every machine I've tried so far (15-20 boxes for myself and friends).

Yeah, power outages are a problem for many BOINC apps. They usually assume that a checkpoint write will always succeed, and do not take into account that it may be interrupted by a power outage or another sudden termination of the app.
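
One common way to make a checkpoint write survive a sudden termination is to write to a temporary file and rename it over the old checkpoint only once the data is on disk. The sketch below is only an illustration in Python; the function name and file names are made up, and it is not how the TN-Grid app (or any particular BOINC app) actually writes its checkpoints.

```python
import os
import tempfile


def write_checkpoint_atomically(path, data):
    """Replace `path` with `data` so a crash leaves either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory, so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # swap the new file in place of the old one
    except BaseException:
        # Clean up the partial temporary file; the old checkpoint is untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the power goes out before the rename, the previous checkpoint is still intact and the app can resume from it.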

You mentioned errors with the AVX app. Are they problems with starting/running the app, or validation errors? If the former, could you give me a link to an example failed WU? I would like to check whether it contains any helpful details.
63) Message boards : Number crunching : Optimization (Message 751)
Posted 13 Jan 2017 by Profile [B@P] Daniel
So, with sse2, avx, fma, any modern computer will get the three applications and eventually decide which one is the best one...

Potential problem: some machines error the WUs with avx and fma while sse2 seems to work with everything I've tried.
I've found that fma can be slightly faster than the sse2 version on some machines but the difference is small.

What OS do you use? AVX needs support on the OS side too. The list of supported OSes is here: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Operating_system_support

All machines are Win7-64.

Do you have SP1 installed? AVX support was added in it.

Yes and all patches and updates.

I checked your computers and found a few validation errors, but no error reported by the app or the BOINC client to indicate that something was wrong. Maybe these are caused by the mysterious error valterc mentioned before? I have also seen this a few times, but on the SSE version. That was on a machine running 24/7 with BOINC configured with a long task switch time, so these WUs were crunched from start to end without interruption. One of them was also crunched by someone else with my app, and for that person it validated successfully, which makes it even more interesting.
64) Message boards : Number crunching : Optimization (Message 748)
Posted 13 Jan 2017 by Profile [B@P] Daniel
So, with sse2, avx, fma, any modern computer will get the three applications and eventually decide which one is the best one...

Potential problem: some machines error the WUs with avx and fma while sse2 seems to work with everything I've tried.
I've found that fma can be slightly faster than the sse2 version on some machines but the difference is small.

What OS do you use? AVX needs support on the OS side too. The list of supported OSes is here: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Operating_system_support

All machines are Win7-64.

Do you have SP1 installed? AVX support was added in it.
65) Message boards : Number crunching : Optimization (Message 745)
Posted 13 Jan 2017 by Profile [B@P] Daniel
So, with sse2, avx, fma, any modern computer will get the three applications and eventually decide which one is the best one...

Potential problem: some machines error the WUs with avx and fma while sse2 seems to work with everything I've tried.
I've found that fma can be slightly faster than the sse2 version on some machines but the difference is small.

What OS do you use? AVX needs support on the OS side too. The list of supported OSes is here: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Operating_system_support
The FMA version is in fact FMA+AVX, so it needs the same OS support.
66) Message boards : Number crunching : Optimization (Message 740)
Posted 12 Jan 2017 by Profile [B@P] Daniel
I'm just thinking about strategies for deploying the new versions of the application. Some thoughts:
- SSE2 should be the base version (I guess there are no computers without SSE2 around any more)
- AVX is okay, I don't know what to do with the FMA version
- we will have versions for Win x32-x64, Linux x32-x64, we are still missing a version for Mac-OS x64
- ARM. I'd like to have it in a standard way, but I don't know which platform is the most suitable (see here: https://boinc.berkeley.edu/trac/wiki/BoincPlatforms) and whether an app plan is needed (see https://boinc.berkeley.edu/trac/wiki/AppPlan)

Good news :) A few comments on this:
- stats on the downloads page show that the 32-bit Windows non-SSE version of my app was downloaded 12 times, so there is some need for it. You could also provide this version later if someone asks for it;
- FMA should be OK too. It should be sent to hosts which support the FMA3 instruction set;
- I am not sure whether a ready-made cross compiler exists. If the Mac header files are available somewhere, you can try to build a cross compiler (the crosstool package will be your friend);
- you need at least two, arm-unknown-linux-gnueabihf and aarch64-unknown-linux-gnu (for 32-bit and 64-bit ARM). There are 3 versions of the 32-bit ARM app, so plan classes will also be needed. The supported FPU instruction set should be sent to the server in a similar way as for x86 CPUs;
- some projects send a few app versions to the client to gather benchmarks and choose the fastest one. This is probably a standard BOINC server feature. It would be good to use it here to check whether the AVX app is faster than SSE2, since people reported mixed results for these apps. The FMA app was always faster than AVX, but it may be worthwhile to benchmark it against SSE.
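
For anyone who wants to compare the versions by hand in the meantime, something like the following Python sketch could be used. It is only an assumption about how one might do it: the function names are made up, it says nothing about how the BOINC server picks versions, and the binary paths in the usage note are placeholders.

```python
import subprocess
import time


def best_wall_time(cmd, runs=3):
    """Run `cmd` several times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # check=True makes a crashing binary raise instead of looking "fast".
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def fastest_version(timings):
    """Return the name of the version with the smallest measured time."""
    return min(timings, key=timings.get)
```

Usage would look like `fastest_version({name: best_wall_time([path]) for name, path in binaries.items()})`, where `binaries` maps hypothetical labels such as "sse2", "avx" and "fma" to the unpacked executables, each run on the same test input.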
67) Message boards : Number crunching : Optimization (Message 738)
Posted 11 Jan 2017 by Profile [B@P] Daniel
Hi Daniel,
about this:

I found something that may be a problem. You use an undirected graph, so I thought I could reduce the number of iterations of the loop at pc.cpp:418 by testing (i,j) pairs for j > i only. However, after doing this the output file size changed from 47.8K to 67.6K. The original code, without my other changes, also generated the bigger file after applying this change. I checked the code briefly and do not see anything obvious that may cause this. Could you take a look at this?

it is actually not possible to halve the number of iterations, because the algorithm chooses a pair of nodes i,j and tests whether the arc linking the two nodes should be removed. When l increases, the test is conditioned on a set of neighbours of size l of the first node. If the edge is not removed, there may still exist a set of neighbours of j of size l that allows the removal of the edge. So it is important to test all possible combinations of i,j.

I hope it is clear enough. A few more details can be found here: http://www.jmlr.org/papers/volume8/kalisch07a/kalisch07a.pdf (page 5, Algorithm 1)

Many thanks,
Francesco

Thanks for the explanation. I will take a closer look at the linked paper.
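
The point about testing both (i,j) and (j,i) can be shown with a toy version of the skeleton phase. This is a simplified Python sketch and not the code from pc.cpp: the function name, the `ordered_pairs` switch and the hand-written independence oracle are all made up for illustration. In the toy oracle, the set that separates 0 and 1 contains node 2, which is still a neighbour of 1 but no longer a neighbour of 0.

```python
from itertools import combinations


def pc_skeleton(n, independent, ordered_pairs=True):
    """Toy PC skeleton search; returns the remaining undirected edges."""
    adj = {i: set(range(n)) - {i} for i in range(n)}
    level = 0
    # Continue while some node still has at least `level` other neighbours.
    while any(len(adj[i]) - 1 >= level for i in range(n)):
        for i in range(n):
            for j in sorted(adj[i]):
                if not ordered_pairs and j < i:
                    continue  # the "halved" loop that only tests j > i
                # Conditioning sets are drawn from the neighbours of i only.
                for cond in combinations(sorted(adj[i] - {j}), level):
                    if independent(i, j, frozenset(cond)):
                        adj[i].discard(j)
                        adj[j].discard(i)
                        break
        level += 1
    return {frozenset((i, j)) for i in range(n) for j in adj[i]}


def toy_oracle(i, j, cond):
    """Hand-written independence facts: 0 _|_ 2, and 0 _|_ 1 given {2}."""
    facts = {
        (frozenset((0, 2)), frozenset()),
        (frozenset((0, 1)), frozenset({2})),
    }
    return (frozenset((i, j)), cond) in facts


full = pc_skeleton(3, toy_oracle, ordered_pairs=True)
halved = pc_skeleton(3, toy_oracle, ordered_pairs=False)
```

With all ordered pairs only the edge 1-2 survives. With j > i only, the edge 0-1 is never tested against {2} and stays in the graph, which would be consistent with the bigger output file.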
68) Message boards : Number crunching : Gene application for GNU/Linux on ARM devices (Message 730)
Posted 6 Jan 2017 by Profile [B@P] Daniel
I have just read that AARCH64 CPUs have new NEON SIMD instructions with double precision support, so it should be possible to get an additional speed boost by using them. It is probably time to get an Odroid C2 and play with it a bit :)


As a C2 fanboy, I approve of this ;-)

If you have trouble obtaining one, I might also be able to grant you access to one of mine...

Hardkernel site lists 3 distributors in Poland, so I can buy one quite easily. I think I will order one this month :)
69) Message boards : Number crunching : Gene application for GNU/Linux on ARM devices (Message 726)
Posted 5 Jan 2017 by Profile [B@P] Daniel
I have just read that AARCH64 CPUs have new NEON SIMD instructions with double precision support, so it should be possible to get an additional speed boost by using them. It is probably time to get an Odroid C2 and play with it a bit :)
70) Message boards : Number crunching : Gene application for GNU/Linux on ARM devices (Message 715)
Posted 3 Jan 2017 by Profile [B@P] Daniel
Thanks a ton!

So far I only did the test_run.sh and replaced the binary on one system (no completed WUs yet). The benchmark looks very promising though, wow!


root@odroidc2-1:~/rpi-boinc-ap/pc-boinc# ./test_run.sh
Running test with bin/pcv7:

real 0m23.641s
user 0m21.550s
sys 0m0.100s
Running test with bin/pcv8:

real 0m10.500s
user 0m8.430s
sys 0m0.070s


edit:

the DL link was no longer valid, I used https://raw.githubusercontent.com/sorcrosc/rpi-boinc-ap/master/TN-Grid/bin/pc_armv8-a.tgz to get the v8 binary...

Nice numbers :) BTW, you can get an additional speed boost if you use my optimized code. I have created one binary for ARMv7; it is about 30% faster than the original code.
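
For reference, the ratio between the two runs quoted above can be worked out directly from the `real` and `user` times. The helper name below is just for illustration.

```python
def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Times copied from the test_run.sh output above (pcv7 vs pcv8).
real = speedup(23.641, 10.500)
user = speedup(21.550, 8.430)
print(f"real: {real:.2f}x, user: {user:.2f}x")  # real: 2.25x, user: 2.56x
```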
71) Message boards : Number crunching : Optimization (Message 696)
Posted 29 Dec 2016 by Profile [B@P] Daniel
Hello,
Sorry for the question ... but I see no performance difference between AVX and FMA on an i7 4770K HT OFF / W7 Ultimate. Do you have any recommendation or is this normal ?
(Primegrid LLR WU's are using FMA3 I think, and it makes the CPU run at its highest performance.)
Thank You.
Philippe

No need for an answer. I saw your benchmark tests.

You should see some difference, as in the benchmark results above. Did you run the different app versions manually on test data, or on BOINC tasks? If the latter, please keep in mind that they vary in length: some of them complete faster, some slower.

And one important note for 32-bit Windows apps: you need at least Windows 7 SP1 or Windows Server 2008 R2 SP1 if you want to use the AVX and FMA versions. On Windows XP you will be able to run the SSE and non-SIMD versions only. Here is the list of OSes which support AVX: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Operating_system_support.
72) Message boards : Number crunching : Optimization (Message 694)
Posted 29 Dec 2016 by Profile [B@P] Daniel

I do not have BOINC libs compiled for 32-bit Linux or for Windows at the moment. I will prepare the Windows app later. Let me know if you need a 32-bit app for Windows or Linux too; I wonder whether anyone would need it.


Yes, i have 5 hosts with Windows 32bit version.
If you can create 32bit app, it will be very cool.
Thanks for your optimization work


Here you may find the boinc api libraries (from the latest source code) for Linux x32 and x64 http://gene.disi.unitn.it/test/files/boinc_libs-x32-x64.7z

Thanks, but now I need ones for Windows :).

I have added 32-bit apps for Windows, in 4 versions: without SIMD instructions (x87 FPU version), SSE2, AVX and FMA. They passed my small test, so they should give correct results. Let me know if they work for you.
73) Message boards : Number crunching : Optimization (Message 689)
Posted 26 Dec 2016 by Profile [B@P] Daniel
Hi all,
I have just added an application for Linux ARM v7a vfpv4 (the address is the same: https://bitbucket.org/sirzooro/pc-boinc/downloads). It is about 30% faster than the original one. I tested it on my Odroid XU4 and it runs fine. Unfortunately NEON instructions do not support double precision operations, so additional optimization through vectorization is not possible for ARM. Maybe some future generations of ARM CPUs will allow this.
74) Message boards : Number crunching : Optimization (Message 686)
Posted 25 Dec 2016 by Profile [B@P] Daniel
Hello,

In order to download optimized WU's, should the "api_version" be updated to the one running on each computer ?

Have tried today, but all WU's ended in error after 1 sec.

Is the app_info.xml sufficient or is there another the file to add / update ?

You do not need to modify any file; you only need to stop BOINC, unpack the provided files into the project directory and start BOINC.

I checked your computers and it looks like the app could not start at all: there was the error "Couldn't start app: CreateProcess() failed - Access refused.". Please check your antivirus; it is probably blocking execution of this app.

BTW, there are no optimized WUs; only the app is optimized to crunch them faster.

Is it possible to have a kind of "automatic" selection of the adequate WU's ? ie like Asteroids@Home (sse2/sse3/AVX - Please no FMA3 :) )

It is possible: the application could check CPU and OS capabilities and select the best algorithm for them. However, this is more complicated; my goal was to release all versions as separate apps and let users choose the appropriate version(s).
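
Just to illustrate what such a selection could look like, here is a rough Python sketch for Linux x86 that reads the `flags` line of /proc/cpuinfo. The function names and the version labels are made up, the preference order (FMA over AVX over SSE2) is an assumption that should be confirmed by benchmarks, and OS support for AVX is passed in as a separate parameter rather than detected.

```python
def parse_cpuinfo_flags(cpuinfo_text):
    """Return the CPU feature flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def pick_app_version(cpu_flags, os_supports_avx=True):
    """Pick a (hypothetical) version label for the given feature flags."""
    flags = {flag.lower() for flag in cpu_flags}
    # The FMA build also uses AVX, so both need OS support for AVX state.
    if os_supports_avx and "avx" in flags:
        return "fma" if "fma" in flags else "avx"
    if "sse2" in flags:
        return "sse2"
    return "x87"
```

On a Linux box this could be called as `pick_app_version(parse_cpuinfo_flags(open("/proc/cpuinfo").read()))`.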

NB 64 WU's still reported as running but not on my computer any longer :/

Looks like a synchronization problem. It will either disappear soon, or these WUs will time out and will be sent again.
75) Message boards : Number crunching : Optimization (Message 678)
Posted 21 Dec 2016 by Profile [B@P] Daniel
My only slight problem is that one Windows machine has Symantec endpoint protection installed, and it's flagged pc.exe as a potential risk. It leaves it alone so I can continue to crunch, but it does not recognize it enough to know how to classify it. Every so often it will pop up a message about the file.

It should not complain about anything; this is a false positive. Just to be sure, I checked all binaries using the multi-scanner sites metadefender.com and virustotal.com. The first one checked them using 41 antivirus engines and found nothing. The second one checked with 55, and one of them (Baidu) also gave a false alert. BTW, both of these sites use the Symantec scanner, but it did not complain about anything.
76) Message boards : Number crunching : Optimization (Message 671)
Posted 20 Dec 2016 by Profile [B@P] Daniel
I have just uploaded Windows binaries to https://bitbucket.org/sirzooro/pc-boinc/downloads. In order to install them, please stop BOINC, extract the files to <BOINC_Data_Dir>\projects\gene.disi.unitn.it_test\ and start BOINC again. Their speed is comparable to the Linux ones. They have (Opti) appended to the displayed name, so you will immediately see that you are running them.

Path to <BOINC_Data_Dir> depends on Windows version:
Windows 2000/XP: C:\Documents and Settings\All Users\Application Data\BOINC\
Windows Vista/Windows 7/8/8.1: C:\ProgramData\BOINC\
Windows 10: C:\Users\All Users\BOINC\

This dir may be hidden by default. You can paste the path into the Windows Explorer address bar to go there directly.
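
If you prefer to work out the target directory from a script, a small Python sketch could look like this. It assumes the default data directory location under %PROGRAMDATA% (Vista and newer); the function names are made up, and if you chose a different data directory during installation you have to pass that one instead.

```python
import ntpath
import os

PROJECT_DIR_NAME = "gene.disi.unitn.it_test"  # project directory named in this post


def default_data_dir(environ=os.environ):
    """Guess the BOINC data dir from %PROGRAMDATA% (assumes a default install)."""
    program_data = environ.get("PROGRAMDATA", r"C:\ProgramData")
    return ntpath.join(program_data, "BOINC")


def project_dir(boinc_data_dir):
    """Directory into which the optimized app files should be extracted."""
    return ntpath.join(boinc_data_dir, "projects", PROJECT_DIR_NAME)
```

`ntpath` is used so the paths are built with Windows separators even when the script is tried out on another OS.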

I did some other benchmarks (same computer as before, Intel I7-4770k with hyper-threading enabled)


Impressive! This is on my AMD FX8320:

Thanks for these results! So it looks like every new instruction set used improved performance a bit. Not on all CPUs, but it is still worth testing which version is the fastest one. WUs sent by the server contain 50 tiles, so the actual time difference between versions will be about 50 times bigger.

I'm not able to obtain a result file to compare. Is there a trick?

Yes. By default the output file is created only when running under BOINC control. You can also pass the parameter "BOINC_STUB=1" to make, which enables this as well. An app compiled this way does not use the BOINC libs, so it cannot be used for normal crunching.

@valterc If you want to compile BOINC under MinGW, you will probably have to apply the patch from https://github.com/BOINC/boinc/issues/1739. For reference, I compiled it from Cygwin 64 using the following command, and then ran "make <all params> install". You can do this too and then copy the compiled libs to Linux. They will be fine for cross-compilation.

make -f Makefile.mingw CC="x86_64-w64-mingw32-gcc -m64" CXX="x86_64-w64-mingw32-g++ -m64" BOINC_PREFIX=./boinc64
77) Message boards : Number crunching : Optimization (Message 666)
Posted 20 Dec 2016 by Profile [B@P] Daniel
Try restarting the boinc-client service; I had to do this on my CentOS. After doing this it should start using the new app.
78) Message boards : Number crunching : Optimization (Message 662)
Posted 20 Dec 2016 by Profile [B@P] Daniel

Looking at your host I didn't find workunits marked as anonymous platform.
Did you copy all the files in the right place? Check app_info (not app_config)


BTW in the BOINC GUI manager I don't find a reference to a local app. Is this correct? What I see is a standard "Gene Application 0.09" application

The app_info file specifies a user-friendly name for the app, which is the same as for the original app. Please check the event log; somewhere near the beginning you should see a line for the TN-Grid app like "Found app_info.xml; using anonymous platform".

You can also check the task list on your account here; in the Application column you should see "Gene Network Application Unknown Platform (CPU)"

The flops parameter in the app_info file is not correct for the new app; I kept the old one. BOINC will increase the "percent done" value by 2%, and the remaining time will be adjusted as necessary.
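
If you would rather search the client log than scroll through the event log, a few lines of Python are enough. This is only a sketch: the function name is made up, the marker text is the one quoted above, and reading the log from a file called stdoutdae.txt in the BOINC data directory is an assumption about your setup.

```python
MARKER = "Found app_info.xml; using anonymous platform"


def anonymous_platform_lines(log_lines):
    """Return the log lines that report an app_info.xml being picked up."""
    return [line.rstrip("\n") for line in log_lines if MARKER in line]
```

Typical usage would be `anonymous_platform_lines(open("stdoutdae.txt", encoding="utf-8", errors="replace"))`. An empty result suggests the files were not unpacked into the right project directory or BOINC was not restarted.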
79) Message boards : Number crunching : Optimization (Message 658)
Posted 20 Dec 2016 by Profile [B@P] Daniel
I downloaded the optimized version
extracted in the gene.... directory both files (app_config.xml and pc)
I issued a read config command
but everything went in error

My rig is an Intel i7 5960 Linux mint 17.3 64bit Boinc 7.2.42

Is something else to do ???

Please paste the error message here; without it I can only guess what may be wrong.
80) Message boards : Number crunching : Optimization (Message 656)
Posted 20 Dec 2016 by Profile [B@P] Daniel
The apps are linked statically with glibc and libstdc++, so the chance of incompatibilities should be limited.

That explicit reference to input data in app_info is in fact not needed; I will remove it.

The apps are compiled with gcc 4.8.5. After switching to a newer version you can expect some extra speed gain, especially after enabling optimization for your CPU type. BTW, the app code can be optimized further, so it could be even faster. I am going to spend some extra time on it later.

FMA can be considered an extension for AVX, so it probably loads CPU even more.

@valterc, could you test other app versions and post results here? I wonder how they perform on your CPU.




Copyright © 2024 CNR-TN & UniTN