61)
Message boards :
Number crunching :
Optimization
(Message 761)
Posted 14 Jan 2017 by [B@P] Daniel
valters, I googled "linux mac cross compiler" and found a few interesting discussions on StackOverflow on this topic. It looks like there is a ready-to-use cross-compiler (see https://stackoverflow.com/a/10341443). There is even a VM with Apple's system, although I am not sure what its legal status is.
62)
Message boards :
Number crunching :
Optimization
(Message 753)
Posted 13 Jan 2017 by [B@P] Daniel
I checked your computers and found a few validation errors, but no errors reported by the app or the BOINC client. Maybe these are caused by the mysterious error valterc mentioned before? I also saw this a few times, but with the SSE version. That was on a machine running 24/7 with BOINC configured with a long task switch time, so these WUs were crunched from start to end without interruptions. One of them was also crunched by someone else using my app, and for that person it validated successfully, which makes it even more interesting.
Yeah, power outages are a problem for many BOINC apps. They usually assume they will always be able to write a checkpoint successfully, and do not take into account that this may be interrupted by a power outage or another sudden app termination.
You mentioned errors with the AVX app. Are they problems with starting/running the app, or validation errors? If the former, could you provide a link to an example failed WU? I would like to check whether there are some details which may be helpful.
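The usual defence against a checkpoint being cut off by a power outage is to write the new checkpoint to a temporary file, flush it to disk, and only then rename it over the old one, so a crash leaves either the old or the new checkpoint but never a half-written one. A minimal sketch in Python, assuming the checkpoint is a single file; the function names are hypothetical, and this is not how the BOINC API or the TN-Grid app actually write their checkpoints. Fully durable code on POSIX would also fsync the containing directory after the rename.

```python
import os
import tempfile
from typing import Optional


def write_checkpoint_atomically(path: str, data: bytes) -> None:
    """Replace the checkpoint at `path` so a crash leaves either the old
    or the new contents, never a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live in the same directory (same filesystem),
    # otherwise the final rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the OS
            os.fsync(tmp.fileno())  # ask the OS to push it to the disk
        # Swap the new file in. On POSIX this rename is atomic.
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave stray temporary files behind if anything failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_checkpoint(path: str) -> Optional[bytes]:
    """Return the last complete checkpoint, or None if there is none yet."""
    try:
        with open(path, "rb") as f:
            return f.read()
    except FileNotFoundError:
        return None
```

On restart, the app would call read_checkpoint() and resume from whatever complete state it finds, or start from scratch if it gets None.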
63)
Message boards :
Number crunching :
Optimization
(Message 751)
Posted 13 Jan 2017 by [B@P] Daniel
"So, with sse2, avx, fma, any modern computer will get the three applications and eventually decide which one is the best one..."
I checked your computers and found a few validation errors, but no errors reported by the app or the BOINC client. Maybe these are caused by the mysterious error valterc mentioned before? I also saw this a few times, but with the SSE version. That was on a machine running 24/7 with BOINC configured with a long task switch time, so these WUs were crunched from start to end without interruptions. One of them was also crunched by someone else using my app, and for that person it validated successfully, which makes it even more interesting.
64)
Message boards :
Number crunching :
Optimization
(Message 748)
Posted 13 Jan 2017 by [B@P] Daniel
"So, with sse2, avx, fma, any modern computer will get the three applications and eventually decide which one is the best one..."
Do you have SP1 installed? AVX support was added in SP1.
65)
Message boards :
Number crunching :
Optimization
(Message 745)
Posted 13 Jan 2017 by [B@P] Daniel
"So, with sse2, avx, fma, any modern computer will get the three applications and eventually decide which one is the best one..."
What OS do you use? AVX also needs support on the OS side. The list of supported OSes is here:
https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Operating_system_support
The FMA version is in fact FMA+AVX, so it also needs this OS support.
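The "needs OS support" rule can be expressed as a check on two CPU registers. A sketch, assuming the raw values have already been read natively (in C, from CPUID leaf 1 and from XGETBV with ECX=0); the bit positions are the ones Intel documents, while the function names are hypothetical. Python only carries the decision logic here:

```python
# CPUID leaf 1, ECX feature bits (Intel SDM).
ECX_FMA = 1 << 12
ECX_OSXSAVE = 1 << 27  # OS has enabled XSAVE, so XCR0 can be read
ECX_AVX = 1 << 28
# XCR0 bits: 1 = SSE (XMM) state, 2 = AVX (upper YMM) state.
XCR0_SSE_AVX = (1 << 1) | (1 << 2)


def avx_usable(cpuid1_ecx: int, xcr0: int) -> bool:
    """AVX needs CPU support *and* an OS that saves the YMM registers."""
    if not (cpuid1_ecx & ECX_AVX):
        return False  # the CPU has no AVX at all
    if not (cpuid1_ecx & ECX_OSXSAVE):
        return False  # XSAVE not enabled by the OS: YMM state is not saved
    return (xcr0 & XCR0_SSE_AVX) == XCR0_SSE_AVX


def fma_usable(cpuid1_ecx: int, xcr0: int) -> bool:
    """FMA3 works on YMM registers, so it inherits AVX's OS requirement."""
    return bool(cpuid1_ecx & ECX_FMA) and avx_usable(cpuid1_ecx, xcr0)
```

This is why a CPU that advertises AVX can still be unable to run the AVX or FMA builds: the OSXSAVE bit or the XCR0 bits stay clear when the OS does not support AVX.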
66)
Message boards :
Number crunching :
Optimization
(Message 740)
Posted 12 Jan 2017 by [B@P] Daniel
"I'm just thinking about strategies for deploying the new versions of the application. Some thoughts:"
Good news :) A few comments on this:
- the stats on the downloads page show that the 32-bit Windows non-SSE version of my app was downloaded 12 times, so there is some need for it. You can also decide to provide this version later if someone asks for it;
- FMA should be OK too. It should be sent to hosts which support the FMA3 instruction set;
- I am not sure whether there is a ready-made cross-compiler. If Mac header files are available somewhere, you can try to build a cross-compiler (the crosstool package will be your friend);
- you need at least two, arm-unknown-linux-gnueabihf and aarch64-unknown-linux-gnu (for 32- and 64-bit ARM). There are 3 versions of the 32-bit ARM app, so plan classes will also be needed. The supported FPU instruction set should be reported to the server in a similar way as for x86 CPUs;
- some projects send a few app versions to each client to gather benchmarks and choose the fastest one. This is probably a standard BOINC server feature. It would be good to use it here to check whether the AVX app is faster than SSE2, since people reported mixed results for these apps. The FMA app was always faster than AVX, but it may be worthwhile to benchmark it against SSE.
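The plan-class and benchmarking points above boil down to two steps: filter the versions by the CPU features each one requires, then, once every candidate has a few completed tasks, keep the one with the lowest average run time. A loose model with hypothetical version names and thresholds, not the BOINC scheduler's actual algorithm:

```python
from statistics import mean
from typing import Dict, List, Optional, Set

# Hypothetical versions and the CPU features each one requires,
# similar in spirit to BOINC plan classes.
REQUIRED_FEATURES: Dict[str, Set[str]] = {
    "x87": set(),
    "sse2": {"sse2"},
    "avx": {"sse2", "avx"},
    "fma": {"sse2", "avx", "fma"},
}


def eligible_versions(host_features: Set[str]) -> List[str]:
    """Versions whose feature requirements the host satisfies."""
    return [v for v, req in REQUIRED_FEATURES.items() if req <= host_features]


def fastest_version(runtimes: Dict[str, List[float]],
                    min_samples: int = 3) -> Optional[str]:
    """Version with the lowest mean task run time (seconds).

    Averaging over several tasks smooths out the natural variation in task
    length. Versions with fewer than `min_samples` results are still being
    tried out and cannot win yet; returns None if no version qualifies.
    """
    averages = {v: mean(ts) for v, ts in runtimes.items() if len(ts) >= min_samples}
    if not averages:
        return None
    return min(averages, key=lambda v: averages[v])
```

With made-up timings such as {"sse2": [100.0, 110.0, 105.0], "avx": [98.0, 102.0, 100.0]}, fastest_version() picks "avx". A production system would probably keep sending the other versions now and then, so that an unlucky early decision can be corrected.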
67)
Message boards :
Number crunching :
Optimization
(Message 738)
Posted 11 Jan 2017 by [B@P] Daniel
"Hi Daniel,"
Thanks for the explanation. I will take a closer look at the linked paper.
68)
Message boards :
Number crunching :
Gene application for GNU/Linux on ARM devices
(Message 730)
Posted 6 Jan 2017 by [B@P] Daniel
"I have just read that AArch64 CPUs have new NEON SIMD instructions with double-precision support, so it should be possible to get an additional speed boost by using them. Probably it is time to get an Odroid C2 and play with it a bit :)"
The Hardkernel site lists 3 distributors in Poland, so I can buy one quite easily. I think I will order one this month :)
69)
Message boards :
Number crunching :
Gene application for GNU/Linux on ARM devices
(Message 726)
Posted 5 Jan 2017 by [B@P] Daniel
I have just read that AArch64 CPUs have new NEON SIMD instructions with double-precision support, so it should be possible to get an additional speed boost by using them. Probably it is time to get an Odroid C2 and play with it a bit :)
70)
Message boards :
Number crunching :
Gene application for GNU/Linux on ARM devices
(Message 715)
Posted 3 Jan 2017 by [B@P] Daniel
Thanks a ton! Nice numbers :) BTW, you can get an additional speed boost if you use my optimized code. I have created a binary for ARMv7; it is about 30% faster than the original code.
71)
Message boards :
Number crunching :
Optimization
(Message 696)
Posted 29 Dec 2016 by [B@P] Daniel
Hello, you should see some difference, as in the benchmark results above. Did you run the different app versions manually on test data, or on BOINC tasks? If the latter, please keep in mind that tasks vary in length; some complete faster, some slower. And one important note for the 32-bit Windows apps: you need at least Windows 7 SP1 or Windows Server 2008 R2 SP1 to use the AVX and FMA versions. On Windows XP you will only be able to run the SSE and non-SIMD versions. Here is a list of OSes that support AVX: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Operating_system_support.
72)
Message boards :
Number crunching :
Optimization
(Message 694)
Posted 29 Dec 2016 by [B@P] Daniel
"Thanks, but now I need ones for Windows :)."
I have added 32-bit apps for Windows in 4 versions: without SIMD instructions (x87 FPU version), SSE2, AVX and FMA. They passed my small test, so they should give correct results. Let me know if they work for you.
73)
Message boards :
Number crunching :
Optimization
(Message 689)
Posted 26 Dec 2016 by [B@P] Daniel
Hi all, I have just added an application for Linux ARM v7a vfpv4 (the address is the same: https://bitbucket.org/sirzooro/pc-boinc/downloads). It is about 30% faster than the original one. I tested it on my Odroid XU4 and it runs fine. Unfortunately, NEON instructions do not support double-precision operations, so further optimization through vectorization is not possible on ARM. Maybe future generations of ARM CPUs will allow this.
74)
Message boards :
Number crunching :
Optimization
(Message 686)
Posted 25 Dec 2016 by [B@P] Daniel
Hello, you do not need to modify any file; you only need to stop BOINC, unpack the provided files into the project directory and start BOINC. I checked your computers and it looks like the app could not start at all: there was the error "Couldn't start app: CreateProcess() failed - Access refused.". Please check your antivirus; it is probably blocking execution of this app. BTW, there are no optimized WUs; only the app is optimized to crunch them faster.
"Is it possible to have a kind of 'automatic' selection of the adequate WUs? I.e. like Asteroids@Home (sse2/sse3/AVX - Please no FMA3 :) )"
It is possible: the application could check CPU and OS capabilities and select the best algorithm for them. However, this is more complicated; my goal was to release all versions as separate apps and let users decide on the appropriate version(s).
"NB 64 WUs still reported as running but not on my computer any longer :/"
Looks like some synchronization problem. It will either disappear soon, or these WUs will time out and be sent again.
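Automatic selection inside the app is usually done by runtime dispatch: detect what the CPU and OS can actually use, then call the matching code path. A sketch with hypothetical names, in which one Python function stands in for all four compiled variants; the best-first order of the table is an assumption, and since AVX was reported to be sometimes slower than SSE2, a real app might instead time each variant briefly at startup:

```python
from typing import Callable, List, Set, Tuple

Kernel = Callable[[List[float]], float]


def kernel_generic(x: List[float]) -> float:
    """Stand-in for the real computation (hypothetical)."""
    return sum(v * v for v in x)


# Best-first table of (required features, version name, implementation).
# In a real app each entry would point at code built for that instruction set.
DISPATCH_TABLE: List[Tuple[Set[str], str, Kernel]] = [
    ({"avx", "fma"}, "fma", kernel_generic),
    ({"avx"}, "avx", kernel_generic),
    ({"sse2"}, "sse2", kernel_generic),
    (set(), "x87", kernel_generic),
]


def select_kernel(usable_features: Set[str]) -> Tuple[str, Kernel]:
    """Return the first table entry whose features the host can use.

    `usable_features` must already include the OS check, e.g. contain
    "avx" only if the OS saves the YMM registers.
    """
    for required, name, fn in DISPATCH_TABLE:
        if required <= usable_features:
            return name, fn
    raise RuntimeError("no kernel available")  # unreachable: x87 needs nothing
```

The trade-off the post mentions is visible here: one binary has to contain every variant plus the detection code, whereas separate apps keep each build simple and leave the choice to the user or the server.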
75)
Message boards :
Number crunching :
Optimization
(Message 678)
Posted 21 Dec 2016 by [B@P] Daniel
"My only slight problem is that one Windows machine has Symantec Endpoint Protection installed, and it's flagged pc.exe as a potential risk. It leaves it alone so I can continue to crunch, but it does not recognize it enough to know how to classify it. Every so often it will pop up a message about the file."
It should not complain about anything; this is a false positive. Just to be sure, I checked all binaries using the metascanners metadefender.com and virustotal.com. The first one checked them with 41 antivirus engines and found nothing. The second one checked them with 55, and one of them (Baidu) also gave a false alert. BTW, both of these sites use the Symantec scanner, but it did not complain about anything.
76)
Message boards :
Number crunching :
Optimization
(Message 671)
Posted 20 Dec 2016 by [B@P] Daniel
I have just uploaded Windows binaries to https://bitbucket.org/sirzooro/pc-boinc/downloads. To install them, please stop BOINC, extract the files to <BOINC_Data_Dir>\projects\gene.disi.unitn.it_test\ and start BOINC again. Their speed is comparable with the Linux ones. They have (Opti) appended to the displayed name, so you will immediately see that you are running them.
The path to <BOINC_Data_Dir> depends on the Windows version:
Windows 2000/XP: C:\Documents and Settings\All Users\Application Data\BOINC\
Windows Vista/Windows 7/8/8.1: C:\ProgramData\BOINC\
Windows 10: C:\Users\All Users\BOINC\
This directory may be hidden by default. You can paste the path into the Windows Explorer address bar to go there directly.
"I did some other benchmarks (same computer as before, Intel i7-4770K with hyper-threading enabled)"
Impressive!
"This is on my AMD FX8320:"
Thanks for these results! So it looks like every new instruction set improved performance a bit. Not on all CPUs, but it is still worth testing which version is the fastest one. WUs sent by the server contain 50 tiles, so the actual time improvement between versions will be about 50 times bigger.
"I'm not able to obtain a result file to compare. Is there a trick?"
Yes. By default the output file is created only when running under BOINC control. You can also pass the param "BOINC_STUB=1" to make, which also enables it. An app compiled this way does not use the BOINC libs, so it cannot be used for normal crunching.
@valterc If you want to compile BOINC under MinGW, you will probably have to apply the patch from https://github.com/BOINC/boinc/issues/1739. For reference, I compiled it from Cygwin 64 using the following command, and then ran "make <all params> install":
make -f Makefile.mingw CC="x86_64-w64-mingw32-gcc -m64" CXX="x86_64-w64-mingw32-g++ -m64" BOINC_PREFIX=./boinc64
You can do this too and then copy the compiled libs to Linux. They will be fine for cross-compilation.
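The "about 50 times bigger" estimate is plain scaling, assuming every tile in a WU takes about as long as the single benchmarked tile. A sketch with hypothetical function names and made-up timings:

```python
TILES_PER_WU = 50  # from the post: each WU sent by the server has 50 tiles


def wu_time_saving(old_tile_s: float, new_tile_s: float,
                   tiles: int = TILES_PER_WU) -> float:
    """Seconds saved per WU, given per-tile times for two app versions.

    Assumes all tiles take about as long as the benchmarked one, so the
    per-tile difference simply multiplies by the number of tiles.
    """
    return (old_tile_s - new_tile_s) * tiles


def speedup_percent(old_tile_s: float, new_tile_s: float) -> float:
    """Relative speedup; unlike the absolute saving, it does not scale."""
    return (old_tile_s - new_tile_s) / old_tile_s * 100.0
```

For example, 12 s versus 10 s per tile becomes 100 s saved per WU, while the relative speedup stays at about 16.7% either way.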
77)
Message boards :
Number crunching :
Optimization
(Message 666)
Posted 20 Dec 2016 by [B@P] Daniel
Try restarting the boinc-client service; I had to do this on my CentOS. After that it should start using the new app.
78)
Message boards :
Number crunching :
Optimization
(Message 662)
Posted 20 Dec 2016 by [B@P] Daniel
The app_info file specifies the user-friendly name for the app, which is the same as for the original app. Please check the event log: near the beginning you should see a line for the TN-Grid app like "Found app_info.xml; using anonymous platform". You can also check the task list on your account here; in the Application column you should see "Gene Network Application Unknown Platform (CPU)".
The flops parameter in the app_info file is not correct for the new app; I kept the old one. BOINC will increase the "percent done" value in 2% steps, and the remaining time will be adjusted as necessary.
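Assuming, as the 50-tile figure from another post suggests, that each 2% step corresponds to one finished tile, the progress arithmetic looks like this. The remaining-time function is a deliberately simplified linear model with hypothetical names; as far as I know, the real BOINC client blends such an extrapolation with its flops-based estimate, so a wrong flops value should matter most early in a task:

```python
TILES_PER_WU = 50  # each WU sent by the server contains 50 tiles


def fraction_done(tiles_finished: int, total_tiles: int = TILES_PER_WU) -> float:
    """Progress as a fraction; with 50 tiles each finished tile adds 0.02 (2%)."""
    return tiles_finished / total_tiles


def naive_remaining_s(elapsed_s: float, fraction: float) -> float:
    """Linear extrapolation of the remaining run time.

    If 20% of the work took 600 s, the other 80% should take 2400 s.
    """
    if fraction <= 0.0:
        raise ValueError("no progress reported yet, cannot extrapolate")
    return elapsed_s * (1.0 - fraction) / fraction
```

Because this estimate depends only on elapsed time and progress, it corrects itself as tiles complete, regardless of the flops value in app_info.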
79)
Message boards :
Number crunching :
Optimization
(Message 658)
Posted 20 Dec 2016 by [B@P] Daniel
"I downloaded the optimized version"
Please paste the error message here; without it I can only guess what may be wrong.
80)
Message boards :
Number crunching :
Optimization
(Message 656)
Posted 20 Dec 2016 by [B@P] Daniel
The apps are statically linked with glibc and libstdc++, so the chance of incompatibilities should be limited. That explicit reference to input data in app_info is in fact not needed; I will remove it. The apps are compiled with gcc 4.8.5. After switching to a newer gcc version you can expect some extra speed gain, especially after enabling optimization for your CPU type. BTW, the app code can be optimized further, so it could be even faster. I am going to spend some extra time on it later. FMA can be considered an extension of AVX, so it probably loads the CPU even more. @valterc, could you test the other app versions and post the results here? I wonder how they perform on your CPU.