Posts by koschi
1) Message boards : Number crunching : Optimization (Message 1068)
Posted 20 May 2017 by koschi
The server initially sends you all 3 flavors. After you have returned some results for each, it decides which one is best for your host and then usually sends you only WUs tagged for that app version.

Please note that those are the default applications, not the latest ones updated by Daniel. Check back at http://gene.disi.unitn.it/test/forum_thread.php?id=135&postid=1031#1031 and download a copy from his link. If you have an AVX- or FMA-capable CPU, head straight for those apps. Each archive contains an app_info.xml, so once it is activated you get WUs only for that application.
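To check which of the flavors a Linux x86 host can run at all, something like the following might help. It is only a sketch: it assumes the kernel lists the CPU features on a "flags" line in /proc/cpuinfo (as x86 Linux does), and the function name and sample text are made up for illustration.

```python
def supported_flavors(cpuinfo_text):
    """Return which of sse2/avx/fma appear on the 'flags' lines of cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    # Order is just slowest-to-newest instruction set, not a recommendation.
    return [name for name in ("sse2", "avx", "fma") if name in flags]


# Hypothetical excerpt; on a real host pass open("/proc/cpuinfo").read() instead.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 avx\n"
print(supported_flavors(sample))
```

This only tells you what the CPU advertises; which app is actually fastest still has to be measured.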
2) Message boards : Number crunching : Optimization (Message 1064)
Posted 15 May 2017 by koschi
Did you restart the BOINC client after placing the files? Otherwise they won't be recognized...
3) Message boards : Number crunching : Optimization (Message 1045)
Posted 10 Apr 2017 by koschi
Ryzen R7 @ 3.6GHz with Ubuntu
bin/pc_x86-64-avx-v1.2 input/tile2.txt output/output2.txt 0.05 1 2470 0

real 0m10.033s
user 0m8.016s
sys 0m0.028s

bin/pc_x86-64-fma-v1.2 input/tile2.txt output/output2.txt 0.05 1 2470 0

real 0m9.828s
user 0m7.824s
sys 0m0.012s

bin/pc_x86-64-sse2-v1.2 input/tile2.txt output/output2.txt 0.05 1 2470 0

real 0m11.075s
user 0m9.068s
sys 0m0.016s


Let's see what times that results in with FMA...
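For comparing the three runs above, the "real" times can be converted to seconds and ranked. A small sketch; the helper name and the dictionary layout are my own, only the three time strings come from the output above.

```python
import re

# Matches the "XmY.YYYs" format printed by the shell's time command.
TIME_RE = re.compile(r"(\d+)m([\d.]+)s")


def to_seconds(stamp):
    """Convert a string such as '0m10.033s' to seconds."""
    match = TIME_RE.fullmatch(stamp)
    if match is None:
        raise ValueError(f"unexpected time format: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# "real" times from the Ryzen R7 run above.
real_times = {"avx": "0m10.033s", "fma": "0m9.828s", "sse2": "0m11.075s"}
seconds = {name: to_seconds(stamp) for name, stamp in real_times.items()}

baseline = seconds["sse2"]
ranking = sorted(seconds, key=seconds.get)  # fastest first
for name in ranking:
    print(f"{name:5s} {seconds[name]:7.3f}s  {baseline / seconds[name]:.2f}x vs sse2")
```

Keep in mind this is a single short test run, so small differences between AVX and FMA may not carry over to full WUs.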
4) Message boards : Number crunching : FMA problems (Ryzen and others?) (Message 1015)
Posted 23 Mar 2017 by koschi
Thanks for the info ;-)
Would have loved to blame it on Microsoft :-D
5) Message boards : Number crunching : FMA problems (Ryzen and others?) (Message 1013)
Posted 23 Mar 2017 by koschi
I suspect the MS compilers can't produce error-free FMA binaries for Ryzen, or something like that.

The FMA3 problem found in Ryzen can be triggered by running a program called "flops" under Windows (compiled with the MS compiler). It does not trigger under Linux (using GCC). I understood from valterc's post in the "Optimizing" thread that you usually use MSVC to build the binary.
Part of the optimization was then achieved by Daniel compiling the old app with MinGW/GCC (plus, of course, further mods down the line).

My question is, are you still using MSVC to build the new app?

Is anyone with a Windows Ryzen system able to run the old FMA app (big output file) with an old WU? I packaged that again (66MB!!!); it's all included in the file below:

http://kerbodyne.com/boinc/tngrid_gene_vv.zip

You should be able to run the test like this, analogous to how I would do it under Linux:
pc.exe 1488394767724_wu-85_tile.txt out_file 0.05 1 701


If this still works, it would point to the MS compiler as the culprit. Maybe Daniel can then just recompile the current code with MinGW/GCC, and Windows users will be able to run TN-Grid FMA apps on Ryzen again.
6) Message boards : Number crunching : FMA problems (Ryzen and others?) (Message 1011)
Posted 22 Mar 2017 by koschi
Was the previous Windows FMA app (the one that still produced the large output file) provided by Daniel and built with MinGW/GCC?
7) Message boards : Number crunching : FMA problems (Ryzen and others?) (Message 1005)
Posted 21 Mar 2017 by koschi
On Linux (4.10) the FMA app works fine on my R7.

http://gene.disi.unitn.it/test/results.php?hostid=2506&offset=140&show_names=1&state=4&appid=

Seems a tiny bit (30s) slower than the AVX app though.
8) Message boards : Number crunching : gene_pcim v1.02 (Message 962)
Posted 14 Mar 2017 by koschi
Overhaul completed, congrats :-)

Worked fine on Ubuntu 16.04, no issues...

Will the crediting remain like this?
9) Message boards : Number crunching : Output file size (and plans for the future) (Message 908)
Posted 12 Feb 2017 by koschi
GPUGrid uploads ~150MB per large WU, but these run ~16 hours on a GTX 970 or 1060, so it's most likely less than what TNGrid comes up with over the same period on 8 threads.

If you can sort out the server-side capacity issues, then with a stable 100/10 Mbit connection I'd opt in to a large transfer queue as well...
10) Message boards : Number crunching : don't get any wus on android (Message 900)
Posted 10 Feb 2017 by koschi
This project does not provide an Android App, hence you get no WUs.
http://gene.disi.unitn.it/test/apps.php
11) Message boards : Number crunching : Optimization (Message 813)
Posted 25 Jan 2017 by koschi
Thanks Daniel, the progress and the explanations of how you achieved it are pretty amazing!
12) Message boards : Number crunching : Optimization (Message 798)
Posted 23 Jan 2017 by koschi
Yep thanks, out of laziness I was reusing my test_run.sh that I had adjusted to loop through all pc* in bin. With test_run2.sh the change is quite dramatic!

root@odroidc2-1:~/BOINC_dev/boinc/samples/pc-boinc# ./test_run2.sh
bin/pc_armv7a-vfpv4-v1.1 input/tile2.txt output/output2.txt 0.05 1 2470
Loading: 0.831
computeStandardDeviations: 0.003
computeCorrelations: 0.369
pcAlgorithm, l 0: 0.001
pcAlgorithm, l 1: 0.064
pcAlgorithm, l 2: 0.893
pcAlgorithm, l 3: 4.866
pcAlgorithm, l 4: 16.922
pcAlgorithm, l 5: 23.217
pcAlgorithm, l 6: 22.773
pcAlgorithm, l 7: 17.738
pcAlgorithm, l 8: 16.013
pcAlgorithm, l 9: 10.758
pcAlgorithm, l 10: 6.917
pcAlgorithm, l 11: 3.896
pcAlgorithm, l 12: 2.017
pcAlgorithm, l 13: 0.736
pcAlgorithm, l 14: 0.205
pcAlgorithm, l 15: 0.041
pcAlgorithm, l 16: 0.005
pcAlgorithm, l 17: 0.000
pcAlgorithm, l 18: 0.000

real 2m10.423s
user 2m8.150s
sys 0m0.120s
diff: output/output2.txt: No such file or directory
#######################################################################

bin/pc_armv8-v0.9 input/tile2.txt output/output2.txt 0.05 1 2470

real 3m48.623s
user 3m46.260s
sys 0m0.110s
diff: output/output2.txt: No such file or directory
#######################################################################

bin/pc_armv8-v1.1 input/tile2.txt output/output2.txt 0.05 1 2470
Loading: 0.466
computeStandardDeviations: 0.003
computeCorrelations: 0.384
pcAlgorithm, l 0: 0.001
pcAlgorithm, l 1: 0.047
pcAlgorithm, l 2: 1.054
pcAlgorithm, l 3: 4.910
pcAlgorithm, l 4: 12.164
pcAlgorithm, l 5: 18.240
pcAlgorithm, l 6: 17.246
pcAlgorithm, l 7: 13.092
pcAlgorithm, l 8: 11.164
pcAlgorithm, l 9: 7.474
pcAlgorithm, l 10: 4.813
pcAlgorithm, l 11: 2.743
pcAlgorithm, l 12: 1.423
pcAlgorithm, l 13: 0.520
pcAlgorithm, l 14: 0.146
pcAlgorithm, l 15: 0.030
pcAlgorithm, l 16: 0.004
pcAlgorithm, l 17: 0.000
pcAlgorithm, l 18: 0.000

real 1m37.931s
user 1m35.870s
sys 0m0.060s
diff: output/output2.txt: No such file or directory


A saving of 57%, or the app is 2.33x as fast as the ARMv8-v0.9 app.
Should complete a WU in ~3.5h, amazing!
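The two figures come straight from the "real" times of pc_armv8-v0.9 and pc_armv8-v1.1 above. As a quick sketch of the arithmetic (the function name is just for illustration):

```python
def saving_and_speedup(old_seconds, new_seconds):
    """Return (fraction of run time saved, speedup factor) of new vs old."""
    saving = 1.0 - new_seconds / old_seconds
    speedup = old_seconds / new_seconds
    return saving, speedup


old = 3 * 60 + 48.623  # pc_armv8-v0.9, real 3m48.623s
new = 1 * 60 + 37.931  # pc_armv8-v1.1, real 1m37.931s

saving, speedup = saving_and_speedup(old, new)
print(f"saving {saving:.1%}, speedup {speedup:.2f}x")
```

Note that the two numbers describe the same change: a 57% saving means the new run takes 43% of the old time, and 1 / 0.43 is roughly 2.33.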
13) Message boards : Number crunching : Optimization (Message 795)
Posted 23 Jan 2017 by koschi
Odroid C2 1.75GHz, 1104MHz RAM

root@odroidc2-1:~/BOINC_dev/boinc/samples/pc-boinc# ./test_run.sh
Running bin/pc_armv7a-vfpv4-v1.1 -
Loading: 0.601
computeStandardDeviations: 0.002
computeCorrelations: 1.436
pcAlgorithm, l 0: 0.031
pcAlgorithm, l 1: 2.451
pcAlgorithm, l 2: 0.894
pcAlgorithm, l 3: 0.096
pcAlgorithm, l 4: 0.041
pcAlgorithm, l 5: 0.013
pcAlgorithm, l 6: 0.003
pcAlgorithm, l 7: 0.000
pcAlgorithm, l 8: 0.000

real 0m7.615s
user 0m5.520s
sys 0m0.080s


Running bin/pc_armv8-v0.9 -

real 0m10.489s
user 0m8.430s
sys 0m0.070s



Should complete a WU in a bit over 5 hours. Not bad against the ARMv8 app I was running before (7.5-8h)...



Running bin/pc_armv8-a -
Loading: 0.376
computeStandardDeviations: 0.003
computeCorrelations: 1.442
pcAlgorithm, l 0: 0.023
pcAlgorithm, l 1: 1.815
pcAlgorithm, l 2: 0.856
pcAlgorithm, l 3: 0.084
pcAlgorithm, l 4: 0.030
pcAlgorithm, l 5: 0.010
pcAlgorithm, l 6: 0.002
pcAlgorithm, l 7: 0.000
pcAlgorithm, l 8: 0.000

real 0m6.667s
user 0m4.600s
sys 0m0.070s


A lovely 37% gain over the previous ARMv8 app :-D

The ARMv7 vfpv4 app works on my RPi 3.
14) Message boards : Number crunching : Optimization (Message 792)
Posted 23 Jan 2017 by koschi
Odroid C2 1.75GHz, 1104MHz RAM

root@odroidc2-1:~/BOINC_dev/boinc/samples/pc-boinc# ./test_run.sh
Running bin/pc_armv7a-vfpv4-v1.1 -
Loading: 0.601
computeStandardDeviations: 0.002
computeCorrelations: 1.436
pcAlgorithm, l 0: 0.031
pcAlgorithm, l 1: 2.451
pcAlgorithm, l 2: 0.894
pcAlgorithm, l 3: 0.096
pcAlgorithm, l 4: 0.041
pcAlgorithm, l 5: 0.013
pcAlgorithm, l 6: 0.003
pcAlgorithm, l 7: 0.000
pcAlgorithm, l 8: 0.000

real 0m7.615s
user 0m5.520s
sys 0m0.080s


Running bin/pc_armv8-v0.9 -

real 0m10.489s
user 0m8.430s
sys 0m0.070s



Should complete a WU in a bit over 5 hours. Not bad against the ARMv8 app I was running before (7.5-8h)...
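One way to arrive at such an estimate is to scale the previous WU run time by the ratio of the two benchmark times. This is only a sketch and assumes that full WUs scale linearly with the short test run, which is not guaranteed; the function name is made up.

```python
def project_hours(previous_hours, old_bench_seconds, new_bench_seconds):
    """Scale a known WU run time by the benchmark ratio new/old."""
    return previous_hours * new_bench_seconds / old_bench_seconds


old_bench = 10.489  # pc_armv8-v0.9, real time in seconds
new_bench = 7.615   # pc_armv7a-vfpv4-v1.1, real time in seconds

low = project_hours(7.5, old_bench, new_bench)
high = project_hours(8.0, old_bench, new_bench)
print(f"projected WU run time: {low:.1f}h to {high:.1f}h")
```

The real run times have to confirm this; the short test may spend its time differently than a full WU does.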
15) Message boards : Number crunching : Optimization (Message 782)
Posted 23 Jan 2017 by koschi
Thanks Daniel,
runtime is down from ~93 minutes to 53 minutes on my i7 3770.

Amazing work!
16) Message boards : Number crunching : Gene application for GNU/Linux on ARM devices (Message 732)
Posted 7 Jan 2017 by koschi
The C2 should have AES as per the S905 data sheet.
http://dn.odroid.com/S905/DataSheet/S905_Public_Datasheet_V1.1.4.pdf

Unfortunately, some cleanup was done in /proc/cpuinfo, so not all CPU features are exposed in that file any longer. I learned that here: http://forum.odroid.com/viewtopic.php?f=136&t=23101

I also reran test_run.sh with the new test data; armv8-a remains the fastest for me.
root@odroidc2-8:~/rpi-boinc-ap/TN-Grid# ./test_run.sh
-> pc_armv6zk_vfp
real 4m28.778s
user 4m26.660s
sys 0m0.120s

-> pc_armv7_vfpv3
real 4m11.586s
user 4m9.490s
sys 0m0.120s

-> pc_armv7_vfpv4
real 4m16.049s
user 4m13.950s
sys 0m0.120s

-> pc_armv8-a
real 3m58.114s
user 3m56.070s
sys 0m0.080s

-> pc_armv8-a_current (copied the pc from the project directory, just to cross check)
real 3m57.975s
user 3m55.880s
sys 0m0.120s


Odroid C2 @ 1.68GHz & 1104MHz RAM

If your C2 is not overclocked yet, have a look at http://forum.odroid.com/viewtopic.php?t=23044
5 of mine run at 1.75GHz, 3 at 1.68GHz, always worth a try ;-)


edit:

Odroid C2 @ 1.75GHz & 1104MHz RAM
-> pc_armv6zk_vfp
real 4m17.625s
user 4m15.540s
sys 0m0.090s

-> pc_armv7_vfpv3
real 4m1.116s
user 3m59.060s
sys 0m0.060s

-> pc_armv7_vfpv4
real 4m5.364s
user 4m3.260s
sys 0m0.100s

-> pc_armv8-a
real 3m47.986s
user 3m45.940s
sys 0m0.050s
17) Message boards : Number crunching : Gene application for GNU/Linux on ARM devices (Message 728)
Posted 6 Jan 2017 by koschi
I have just read that AARCH64 CPUs has new NEON SIMD instructions with double precision support, so it should be possible to get additional speed boost by using them. Probably it is time to get some Odroid C2 and play with it a bit :)


As a C2 fanboy, I approve of this ;-)

If you have trouble obtaining one, I might also be able to grant you access to one of mine...
18) Message boards : Number crunching : Gene application for GNU/Linux on ARM devices (Message 722)
Posted 4 Jan 2017 by koschi
With the new ARMv7 app, run times on my RPi3 dropped from 27k to 23k seconds...
19) Message boards : Number crunching : Gene application for GNU/Linux on ARM devices (Message 718)
Posted 3 Jan 2017 by koschi
Ok good, so I was using the old test data...

edit:
The WUs returned after 1pm UTC on 3 January were done purely with the 64-bit app:
http://gene.disi.unitn.it/test/results.php?hostid=3074&offset=0&show_names=0&state=4&appid=

Looks like a reduction from 5 to 4 hours...
20) Message boards : Number crunching : Gene application for GNU/Linux on ARM devices (Message 716)
Posted 3 Jan 2017 by koschi
I'm already giving that a try on another C2, thanks ;-)

Unfortunately, the 64-bit WU run times do not seem to improve that much. While that host's previous average was 18072 seconds/WU, the final run time with the 64-bit app should still be well over 14000 seconds.
The "benchmark" suggests a 55% run time reduction though.

Will report back once the units complete and validate.




Copyright © 2019 CNR-TN & UniTN