Message boards : Number crunching : Gene application for GNU/Linux on ARM devices
I have compiled the application for ARM devices running GNU/Linux (Raspberry Pi and co.). To try it, clone the repository:
git clone https://github.com/sorcrosc/rpi-boinc-ap
Then go into the gene_pc directory and run the script (stop BOINC computation first):
cd rpi-boinc-ap/gene_pc/
./test_run.sh
This will give you a timed short run of all three apps. Let me know if you see a noticeable difference.
ID: 544
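The post asks you to stop BOINC computation before running test_run.sh. A minimal sketch that automates this, assuming the standard boinccmd tool is installed and allowed to control the local client: --set_run_mode never pauses computation, and the sketch switches back to auto afterwards (an assumption; use whatever mode you normally run in). The function name and default directory are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: pause BOINC, run the timed test, then restore scheduling.
# Assumes boinccmd can reach the local client and that "auto" is the mode
# you normally run in.
run_test_paused() {
    testdir="${1:-rpi-boinc-ap/gene_pc}"        # directory containing test_run.sh
    boinccmd --set_run_mode never || return 1   # stop BOINC computation
    ( cd "$testdir" && ./test_run.sh )          # subshell keeps our cwd unchanged
    status=$?
    boinccmd --set_run_mode auto                # hand control back to BOINC
    return "$status"
}

# Usage, from the directory where you cloned the repository:
#   run_test_paused rpi-boinc-ap/gene_pc
```

Running the test in a subshell means the caller's working directory is untouched, and the exit status of test_run.sh is passed through after BOINC is resumed.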
Only small differences on an Odroid C2 @ 1.75 GHz...
ID: 619
Thank you for testing, koschi.
ID: 622
Yep, thanks. Shortly after posting, I stumbled over the thread and managed to compile the source. Unfortunately the test run times are in the 8-9 min range, and they tend to get worse when I specify matching -march and -mtune for the C2's A53 cores.
ID: 623
I used some -m flags, but they don't make much difference. See the build scripts in my repo.
ID: 624
Have you tried -march=native -mtune=native? They tell gcc to detect the CPU it is running on and enable all supported features. On x86_64 CPUs this sets more flags than just the correct -march and -mtune, so ARM may benefit from it too.
ID: 628
Run
gcc -march=native -E -v - </dev/null 2>&1 | grep cc1
to see which flags -march=native enables or disables. These can differ from the flags listed in /proc/cpuinfo. Don't use -march=native on models other than ARMv7-??: Cortex-M, Cortex-R and so on may have different flags that may not be recognized everywhere.
ID: 629
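The cc1 line printed by the command above is long; splitting it into words and keeping only the -m options makes it easier to compare with the flags you pass by hand. A small sketch; the function name is hypothetical, and it only filters text, so what you see is exactly what your gcc reports.

```shell
#!/bin/sh
# Hypothetical filter: read gcc's verbose output on stdin and print the -m
# options from the cc1 invocation, one per line.
native_mflags() {
    # keep the cc1 line, split it on spaces, keep words starting with -m
    grep cc1 | tr ' ' '\n' | grep -E '^-m'
}

# Usage on the board itself (requires gcc):
#   gcc -march=native -E -v - </dev/null 2>&1 | native_mflags
```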
Hi, yes, I was definitely using -march=native, -mtune=native, not sure...
root@odroidc2-1:~# gcc -march=native -E -v - </dev/null 2>&1 | grep cc1
I had used the compile script linux64_build.sh included in https://bitbucket.org/francesco-asnicar/pc-boinc/. It completed, but produced slow executables. I will have to adjust your ./linuxarmv7_build.sh for the new compiler: remove -mfpu (which the aarch64 GCC doesn't understand), and let's see what else. This isn't exactly my field of excellence ;-)
ID: 630
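One way to adapt a build script for several boards is to pick the architecture flags from uname -m, so the AArch64 branch simply never passes -mfpu. A sketch under assumptions: the function name is hypothetical, and the flag sets are illustrative (modeled loosely on the app names in this thread), not copied from linuxarmv7_build.sh or linux64_build.sh.

```shell
#!/bin/sh
# Hypothetical helper: print GCC architecture flags for a machine name as
# reported by `uname -m`. Flag sets are illustrative assumptions.
arch_cflags() {
    case "$1" in
        aarch64)
            # AArch64 GCC rejects -mfpu; FP and Advanced SIMD are part of armv8-a.
            printf '%s\n' "-march=armv8-a" ;;
        armv7l)
            # 32-bit ARMv7 with hard-float VFPv3, matching the armv7_vfpv3 build name.
            printf '%s\n' "-march=armv7-a -mfpu=vfpv3 -mfloat-abi=hard" ;;
        armv6l)
            # ARMv6 baseline with VFP (Raspberry Pi 1 class boards).
            printf '%s\n' "-march=armv6 -mfpu=vfp -mfloat-abi=hard" ;;
        *)
            printf 'unsupported machine: %s\n' "$1" >&2
            return 1 ;;
    esac
}

# Usage in a build script:
#   CFLAGS="$(arch_cflags "$(uname -m)")" || exit 1
```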
I have just added an armv8 application. It should work on 64-bit OSes only.
ID: 632
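A quick way to check whether a board can use the armv8 build is sketched below. It encodes an assumption about what "64-bit OS" means here: an AArch64 kernel (uname -m reports aarch64) plus a 64-bit userland (getconf LONG_BIT reports 64). A 64-bit kernel with a 32-bit userland fails this check even though uname -m says aarch64. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: succeed only on an aarch64 kernel with a 64-bit userland.
# Optional arguments replace the live values, which makes the check testable.
can_run_armv8() {
    machine="${1:-$(uname -m)}"          # kernel architecture
    long_bit="${2:-$(getconf LONG_BIT)}" # word size of the userland getconf
    [ "$machine" = "aarch64" ] && [ "$long_bit" = "64" ]
}

# Usage:
#   if can_run_armv8; then echo "armv8 build should work"; fi
```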
Thanks a ton!
Edit: the download link was no longer valid, so I used https://raw.githubusercontent.com/sorcrosc/rpi-boinc-ap/master/TN-Grid/bin/pc_armv8-a.tgz to get the v8 binary...
ID: 714
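When a download link breaks, it is worth checking that the file you got is really a gzip tarball before unpacking it; depending on the download tool and its options, a failed fetch can leave an error page under the .tgz name. A minimal sketch; the function name is hypothetical, and it uses only tar's -t (list), -x, -z and -C options.

```shell
#!/bin/sh
# Hypothetical helper: list a tarball's contents (failing early if it is not
# a valid gzip tar), then extract it into a destination directory.
inspect_and_extract() {
    tarball="$1"
    dest="${2:-.}"                   # default: current directory
    tar -tzf "$tarball" || return 1  # a truncated or non-tar file fails here
    mkdir -p "$dest"
    tar -xzf "$tarball" -C "$dest"
}

# Usage, with the link from the post:
#   wget https://raw.githubusercontent.com/sorcrosc/rpi-boinc-ap/master/TN-Grid/bin/pc_armv8-a.tgz
#   inspect_and_extract pc_armv8-a.tgz
```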
Nice numbers :) BTW, you can get an additional speed boost if you use my optimized code. I have created one binary for ARMv7; it is about 30% faster than the original code.
ID: 715
I'm already giving that a try on another C2, thanks ;-)
ID: 716
I'm replacing all versions with new ones based on your code, Daniel. Only the armv6 one is missing now.
ID: 717
OK, good, so I was using the old test data...
ID: 718
With the new ARMv7 app, run times on my RPi3 dropped from 27k to 23k seconds...
ID: 722
Here are all the tarballs with the new applications, based on Daniel's optimized code, with app_info included:
git clone https://github.com/sorcrosc/rpi-boinc-ap
Then cd to the bin directory and untar, one by one, the apps you want to test:
cd rpi-boinc-ap/TN-Grid/bin
tar -xzf pc_armv7_vfpv3.tgz
tar -xzf .....
Go to the parent directory and run the test script (stop BOINC computation first). It should take 5-10 minutes per app.
cd ..
./test_run.sh
Further info: I cross-compiled all the apps with the latest GCC 6.2 release from Linaro. They are fresh and ready to use in case the project admin wants to look into them ;)
armv8 with aarch64-linux-gnu
armv7 with arm-linux-gnueabihf
armv6 like arm-linux-gnueabihf, but I recompiled the toolchain through crosstool-ng because the released binaries are configured for armv7.
Maybe it isn't worth the pain, because I am the only one here who still uses the first old Raspberry Pi 1 :)
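The toolchain choice described above can be captured in a small lookup for a cross-build script. A sketch under assumptions: the function name and the source and output file names in the usage lines are hypothetical; only the target triplets come from the post. The armv6 entry returns the same arm-linux-gnueabihf- prefix, but per the post it has to come from a toolchain built for ARMv6 (e.g. with crosstool-ng), since the released Linaro binaries default to ARMv7.

```shell
#!/bin/sh
# Hypothetical lookup: map an app variant name to a cross-compiler prefix.
cross_prefix() {
    case "$1" in
        armv8*)
            printf '%s\n' "aarch64-linux-gnu-" ;;
        armv7*|armv6*)
            # armv6 needs an ARMv6-configured toolchain with the same triplet.
            printf '%s\n' "arm-linux-gnueabihf-" ;;
        *)
            printf 'unknown target: %s\n' "$1" >&2
            return 1 ;;
    esac
}

# Usage (toolchain bin directory on PATH; file names are hypothetical):
#   CXX="$(cross_prefix armv7_vfpv3)g++"
#   "$CXX" -o pc_armv7_vfpv3 hypothetical_source.cpp
```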
ID: 725
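The one-by-one untar step in the post above can be scripted when you want to test every app. A minimal sketch, assuming the pc_*.tgz naming used in rpi-boinc-ap/TN-Grid/bin; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract every pc_*.tgz in a directory into that directory.
extract_all_apps() {
    bindir="${1:-.}"
    found=0
    for archive in "$bindir"/pc_*.tgz; do
        [ -e "$archive" ] || continue          # the glob matched nothing
        tar -xzf "$archive" -C "$bindir" || return 1
        found=$((found + 1))
    done
    printf 'extracted %s archive(s)\n' "$found"
}

# Usage (stop BOINC computation first):
#   cd rpi-boinc-ap/TN-Grid/bin && extract_all_apps && cd .. && ./test_run.sh
```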
I have just read that AArch64 CPUs have new NEON SIMD instructions with double-precision support, so it should be possible to get an additional speed boost by using them. Probably it is time to get an Odroid C2 and play with it a bit :)
ID: 726
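Before relying on NEON (or, as discussed later in the thread, the AES instructions), you can check what the kernel reports for the CPU. On AArch64, the "asimd" flag in the Features line of /proc/cpuinfo is Advanced SIMD (NEON), which includes double-precision vector operations, and "aes" marks the AES instructions. A minimal sketch; the function name is hypothetical, and 32-bit ARM kernels use different flag names (for example "neon").

```shell
#!/bin/sh
# Hypothetical check: succeed if the first "Features" line of a cpuinfo file
# lists the given flag as a whole word.
has_cpu_feature() {
    feature="$1"
    cpuinfo="${2:-/proc/cpuinfo}"   # a saved copy can be passed for testing
    grep '^Features' "$cpuinfo" | head -n 1 | tr -s ' \t' '\n' | grep -qx "$feature"
}

# Usage:
#   has_cpu_feature asimd && echo "Advanced SIMD available"
#   has_cpu_feature aes || echo "no AES instructions"
```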
I like this :)
ID: 727
As a C2 fanboy, I approve of this ;-) If you have trouble obtaining one, I might also be able to grant you access to one of mine...
ID: 728
The Odroid C2 is a fantastic product. I love mine: well made, well supported, a solid performer. But there may be better AArch64 SBCs if you can only have one. My main objection to the C2 is the lack of AES instructions. It only has the following extensions: fp asimd crc32
As for TN-Grid, here are tests on a C2:
me@odroid-c2:~/TN-Grid$ ./test_run.sh
-> pc_armv6zk_vfp
real 5m2.251s
user 4m51.740s
sys 0m0.080s
-> pc_armv7_vfpv3
real 4m35.022s
user 4m32.840s
sys 0m0.080s
-> pc_armv7_vfpv4
real 4m39.926s
user 4m37.720s
sys 0m0.100s
-> pc_armv8-a
real 5m14.590s
user 5m12.330s
sys 0m0.100s
The version compiled for the armv7 vfpv3 architecture is a bit faster than the armv8 version (4m35s vs 5m15s real time).
A less expensive alternative that does have the AES instructions is the Pine64 board. It has the following extensions: fp asimd aes pmull sha1 sha2 crc32. My main issue with it is cooling, as it cannot run flat out without overheating. A simple stick-on heat sink from an RPi helps some, but not enough. It is also a bit slower than the C2. Performance on it looks like this:
ubuntu@pine64:~/boinc/samples/TN-Grid$ ./test_run.sh
-> pc_armv6zk_vfp
real 6m31.015s
user 6m22.620s
sys 0m0.150s
-> pc_armv7_vfpv3
real 6m8.191s
user 5m59.540s
sys 0m0.440s
-> pc_armv7_vfpv4
real 6m12.538s
user 6m4.400s
sys 0m0.100s
-> pc_armv8-a
real 6m59.002s
user 6m50.040s
sys 0m0.210s
with vfpv3 enjoying a slight advantage here as well.
ID: 729
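To compare runs like the ones above in numbers rather than by eye, the test_run.sh output can be reduced to the real (wall-clock) time per app and the slowdown relative to the fastest build. A minimal awk sketch, assuming the output format shown in the post ("-> name" followed by a "real XmY.YYYs" line, as printed by the shell's time keyword); the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical summary: read test_run.sh output on stdin, print each app's
# real time in seconds and its slowdown versus the fastest app.
summarize_runs() {
    awk '
        /^-> / { name = $2 }                 # "-> pc_armv7_vfpv3"
        /^real/ {
            t = $2
            sub(/s$/, "", t)                 # strip trailing "s"
            split(t, p, "m")                 # p[1] = minutes, p[2] = seconds
            secs[name] = p[1] * 60 + p[2]
            order[++n] = name
        }
        END {
            best = secs[order[1]]
            for (i = 2; i <= n; i++) if (secs[order[i]] < best) best = secs[order[i]]
            for (i = 1; i <= n; i++)
                printf "%-16s %8.1f s  +%.1f%%\n", order[i], secs[order[i]], (secs[order[i]] / best - 1) * 100
        }'
}

# Usage (time usually writes to stderr, hence 2>&1):
#   ./test_run.sh 2>&1 | summarize_runs
```

Fed the Odroid C2 output above, this reports pc_armv7_vfpv3 as the fastest (+0.0%) and pc_armv8-a at about +14.4%; on the Pine64 numbers, pc_armv8-a comes out at about +13.8%.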