Posts by [B@P] Daniel
1) Message boards : Number crunching : sse2 vs avx (Message 1448)
Posted 18 Dec 2018 by [B@P] Daniel

This looks like another Ryzen bug. This time the CPU also jumps to an address in the middle of an instruction, which must end in a crash sooner or later.

I have reported this bug on the AMD forum. Here is a link to my post; it should become visible once a moderator approves it:
https://community.amd.com/message/2890585

The SSE2 problem for me (and Beyond) was only on the Ryzen 1700. The Ryzen 2700 is OK. Maybe you should amend your report?

Thanks for looking into this.

Interesting. I followed the link on the task info page to get information about the CPU and OS, so it looks like crashes sometimes occur on the 2700 too. Unfortunately, that task page was deleted today, so I cannot add a link here. Anyway, I will update my report to say that the bug happens mostly on the 1700.
2) Message boards : Number crunching : sse2 vs avx (Message 1445)
Posted 17 Dec 2018 by [B@P] Daniel
Had to abort SSE2 after 10 hours, still with 442 days remaining. There were no other CPU tasks running other than this project. Looking HERE, SSE2 and AVX had the same problem, but FMA and an anon succeeded. Wonder what the "anon" was.

"aborted by user" is not 'technically' an error, it's a user's choice. I agree that if I saw a workunit stuck at 5% with an estimated time for completion of days (even if the estimate were completely wrong) I would also be tempted to abort it. The 'problematic' behavior of the TCGA workunits doesn't depend on the version of the application.

The SSE2 problem on some Ryzen CPUs with the current application is a 'real' problem: the app crashes with an 'illegal instruction' error.

I just sent an e-mail to Daniel (the user who actually wrote the sse2 code) asking for hints.

Hello again, I have not been here for a long, long time :)

This looks like another Ryzen bug. This time the CPU also jumps to an address in the middle of an instruction, which must end in a crash sooner or later.

I have reported this bug on the AMD forum. Here is a link to my post; it should become visible once a moderator approves it:
https://community.amd.com/message/2890585
3) Message boards : Number crunching : Optimization (Message 1227)
Posted 14 Dec 2017 by [B@P] Daniel
How is your RAM speed?
On my R7 it makes a big difference between 2133 MHz and 3066 MHz (up to 30 minutes).


I have 1x8GB 2666 MHz CL16, maybe I'll try to overclock it.

I have understood the "problem": if I set CPU usage to 50% (or turn off SMT), the 6 simultaneous WUs take around 2400 s to complete. Mystery solved.

I don't understand why WUs don't speed up when the CPU is at 3.8 GHz.

The TN-Grid app is very memory-intensive. One person from my team wrote that on his 14c/28t Xeon (I do not know the exact model; it is probably an E5-2683 v3), 4 TN-Grid WUs consumed all available memory bandwidth. So once you hit this limit, increasing CPU speed will not help: the CPU will just wait for memory faster ;)

Edit: when you set CPU usage to 50%, each app instance can get data from memory faster (fewer apps compete for the same limited bandwidth), every instance can use more cache (which additionally helps with loading data), and CPU core resources are not shared between two apps (SMT/HT does not double the speed; the gain is usually much smaller).

If you want to improve speed, use the fastest possible memory, and overclock it if possible.
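A rough way to see why a faster CPU clock stops helping is a roofline-style bound: the running tasks together can go no faster than the CPU allows, and no faster than the shared memory bandwidth allows, so throughput is the smaller of the two. The sketch below uses made-up constants (work per GHz, bytes per unit of work, bandwidth values); they only illustrate the shape of the effect and are not TN-Grid measurements.

```python
# Roofline-style toy model. All constants are hypothetical, chosen for illustration.
WORK_PER_GHZ = 1.0     # work units/s one task does per GHz if memory were infinitely fast
BYTES_PER_UNIT = 4e9   # memory traffic needed per work unit (assumed)


def throughput(n_tasks: int, clock_ghz: float, mem_bw: float) -> float:
    """Total work units per second for n_tasks identical tasks sharing mem_bw bytes/s."""
    compute_limit = n_tasks * WORK_PER_GHZ * clock_ghz
    memory_limit = mem_bw / BYTES_PER_UNIT
    return min(compute_limit, memory_limit)


if __name__ == "__main__":
    for clock in (3.0, 3.8):
        for bw in (40e9, 60e9):
            print(f"{clock} GHz, {bw / 1e9:.0f} GB/s: {throughput(6, clock, bw):.1f} units/s")
```

With 6 tasks, going from 3.0 to 3.8 GHz leaves throughput at 10 units/s with 40 GB/s and at 15 units/s with 60 GB/s: once the memory term is the minimum, only faster memory helps.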


Thanks, Daniel

Thanks, Daniel.
How can I see memory bandwidth usage?
Just as a test, I tried running the RAM at 2133 MHz, and a WU (with 12 threads) takes around 4300 s.
With the RAM at 2933 MHz, the WU time drops to 3700 s.
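Assuming (a simplification) that the memory-bound part of a WU's run time scales inversely with the memory clock while the rest stays fixed, the two timings above are enough to split the 4300 s into those two parts. The function name is illustrative; only the four numbers come from the post.

```python
def fit_two_point(f1, t1, f2, t2):
    """Fit t(f) = a + b / f through two (memory clock MHz, WU time s) points.

    a: time that does not depend on the memory clock; b / f: the part that does.
    """
    b = (t1 - t2) / (1 / f1 - 1 / f2)
    a = t1 - b / f1
    return a, b


a, b = fit_two_point(2133, 4300, 2933, 3700)
print(f"fixed part ~{a:.0f} s, memory part at 2133 MHz ~{b / 2133:.0f} s")
print(f"memory clock +{2933 / 2133 - 1:.1%}, WU speed +{4300 / 3700 - 1:.1%}")
```

Under this model, roughly 2100 s of the 4300 s are independent of the memory clock and about 2200 s scale with it, so about half of the run time at 2133 MHz is memory-bound; a 37.5% higher memory clock gives about 16% faster WUs.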

There is no tool to see it. Intel support also claims that this cannot be measured. That person found this out by performing a series of tests:
- when he decreased the CPU clock, CPU usage rose to 100%;
- when he decreased the memory clock, CPU usage dropped to 85%;
- when he added a 3rd memory stick, CPU usage increased from 90% to 100%.
4) Message boards : Number crunching : Optimization (Message 1223)
Posted 12 Dec 2017 by [B@P] Daniel
How is your RAM speed?
On my R7 it makes a big difference between 2133 MHz and 3066 MHz (up to 30 minutes).


I have 1x8GB 2666 MHz CL16, maybe I'll try to overclock it.

I have understood the "problem": if I set CPU usage to 50% (or turn off SMT), the 6 simultaneous WUs take around 2400 s to complete. Mystery solved.

I don't understand why WUs don't speed up when the CPU is at 3.8 GHz.

The TN-Grid app is very memory-intensive. One person from my team wrote that on his 14c/28t Xeon (I do not know the exact model; it is probably an E5-2683 v3), 4 TN-Grid WUs consumed all available memory bandwidth. So once you hit this limit, increasing CPU speed will not help: the CPU will just wait for memory faster ;)

Edit: when you set CPU usage to 50%, each app instance can get data from memory faster (fewer apps compete for the same limited bandwidth), every instance can use more cache (which additionally helps with loading data), and CPU core resources are not shared between two apps (SMT/HT does not double the speed; the gain is usually much smaller).

If you want to improve speed, use the fastest possible memory, and overclock it if possible.
5) Message boards : Number crunching : Server Status - No Work Available (Message 1216)
Posted 7 Dec 2017 by [B@P] Daniel
The server status page is showing no work units available for download as of
Task data as of 6 Dec 2017, 23:46:47 UTC

I am assuming that would reflect all types of WUs for both Windows and Linux.

WU Generator shows "running" ??

Thanks
Bill F

I checked the server status page and saw an interesting thing:

Users registered in past 24 hours: 1
Computers registered in past 24 hours: 1717

Someone with a big computing cloud has joined. No wonder the work queue is empty.
6) Message boards : Wish List : Future requests (Message 1199)
Posted 16 Nov 2017 by [B@P] Daniel
Can we possibly make the latest apps from Daniel the official versions? Haven't they been out for about 6 months?

That's something I wanted to do after a system (operating system) upgrade; I still haven't found the time to do this (for instance, right now we are focusing on the work generator)... I'll try my best, maybe next week...

Thanks, might bug you again next week :-)

Next week is here :-P

Thanks ;-). Just came to do the reminder too!

Yep, this is starting to become 'my apologies thread'... Notice the 'maybe' in the previous sentence ;) I'll wait until we have all the needed versions (the one missing is the one for MacOS)

>> I'll wait until we have all the needed versions (the one missing is the one for MacOS)

Daniel?

Unfortunately, I do not have access to a machine with MacOS, so I cannot help here. In the past, the MacOS apps were created by the TN-Grid people.
7) Message boards : Number crunching : Optimization (Message 1157)
Posted 24 Oct 2017 by [B@P] Daniel
I saw many users are running v1.03.
Is it faster than (Opti v1.2) 1.02 developed by Daniel?

Sorry, my bad. v1.03 is only for Linux.

The official 1.0x apps are the same as the Opti v1.1 ones. The Opti v1.2 apps have not been officially added yet.
8) Message boards : News : Another experiment on E. coli (Message 1155)
Posted 24 Oct 2017 by [B@P] Daniel
Could I take a look at the code of your work generator and some sample input data? I wonder if I could optimize it a bit.

Thank you for this, I will contact Francesco (who is the main author of the program) and let you know about his comments. Beware that the core of the program is written in Python; I'm planning to rewrite it in C++.

No problem, I know Python too :)

BTW, have you tried using PyPy (https://pypy.org/) or something similar?
9) Message boards : News : Another experiment on E. coli (Message 1153)
Posted 24 Oct 2017 by [B@P] Daniel
Could I take a look at the code of your work generator and some sample input data? I wonder if I could optimize it a bit.
10) Message boards : Wish List : Future requests (Message 1144)
Posted 23 Oct 2017 by [B@P] Daniel
Right now the project is in beta, but in the future it may be stable and public.
So, these are the usual requests from volunteers for all BOINC projects:
1) A GPU client (preferably OpenCL, for all GPU cards)

We thought about this possibility when we started integrating the software with the BOINC API... Unfortunately, our algorithm is not very parallelizable, hence it is not suitable for GPU hardware. I don't think there will be a GPU version of our project.

I would appreciate it if, for future research, parallel algorithms were considered and, if possible, implemented using OpenCL.

Actually, we have a slightly different variant of the (gene@home) algorithm that *may* be suitable for parallelization. The main problem, right now, is that no one here has the necessary CUDA/OpenCL skills...


Small update from my side: I am working on an OpenCL app. It turned out that for L=0 and L=1, graph edges must be processed in parallel (this part is ready). For higher L the amount of per-edge work is larger, so the app will process edges serially and parallelize the work within each one of them. Also, for higher L the bandwidth of global GPU memory becomes the major performance bottleneck; I am looking for a solution to this problem. I will let you know when I have something ready for public testing.
11) Message boards : Number crunching : FMA problems (Ryzen and others?) (Message 1102)
Posted 29 Jul 2017 by [B@P] Daniel
It looks like the TN-Grid app triggers a new Ryzen CPU bug which is not fixed yet. I have created a post on the AMD forum to let them know about it: https://community.amd.com/message/2814366
12) Message boards : Number crunching : Optimization (Message 1070)
Posted 21 May 2017 by [B@P] Daniel
I asked rattorosso [Marche] to create apps for the remaining 3 ARM architectures: armv6_vfp, armv7_vfpv3 and aarch64. He sent them to me, and I uploaded them to the usual place: https://bitbucket.org/sirzooro/pc-boinc/downloads/. Feel free to download and test them too.

So... what is the difference between the default app which I download automatically (for example gene_pcim_v1.02_win64__fma) and the same FMA one from the link and the archive? If I saw right, both are version 1.02? Which one should I use for better performance?

Both internally specify the same version 1.02, but the version from this thread has extra optimizations, so it runs faster than the official one. Valterc is going to take it and release it as a new version of the official app at the end of May.
13) Message boards : Number crunching : Optimization (Message 1060)
Posted 4 May 2017 by [B@P] Daniel
Hi Daniel, I have a question for you.

On my i5-6400 (Win7 64-bit) I receive both AVX and SSE2 WUs.
If I want to install your optimization v1.2, which version do I need to copy into the project folder?
- TN-Grid.windows-x86-64-avx-v1.2
- TN-Grid.windows-x86-64-sse2-v1.2
Can I install both?

Your CPU also supports FMA instructions, so you can also try the FMA app version: http://www.cpu-world.com/CPUs/Core_i5/Intel-Core%20i5-6400.html. In general, the FMA app version should be faster than the AVX one, which is faster than the SSE one. However, on some CPUs the FMA version is, for some reason, a bit slower than the AVX one, so please try both.
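On Linux, the instruction sets mentioned here appear as the fma, avx and sse2 entries on the flags lines of /proc/cpuinfo. A minimal sketch of picking a variant in the order suggested above (FMA, then AVX, then SSE2); the function names are illustrative, and it does not replace benchmarking both FMA and AVX as advised. On Windows, a tool such as CPU-Z or the cpu-world page linked above shows the same information.

```python
from pathlib import Path
from typing import Optional

# Preference order from the post: FMA, then AVX, then SSE2.
PREFERENCE = ("fma", "avx", "sse2")


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the feature flags listed on 'flags' lines of Linux /proc/cpuinfo."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def best_variant(flags: set[str]) -> Optional[str]:
    """Return the most preferred app variant the CPU supports, or None."""
    for name in PREFERENCE:
        if name in flags:
            return name
    return None


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(best_variant(cpu_flags(cpuinfo.read_text())))
```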

It is possible to install several versions, but you would have to rename the pc.exe files and modify app_info.xml to specify all app versions with the proper plan classes. The files I prepared are configured to run a single app version only.
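A minimal sketch of generating an app_info.xml with one app_version per renamed executable, using Python's standard xml.etree.ElementTree. The app name gene_pcim comes from the official file name quoted earlier in this thread; the version number 102, the renamed executables (pc_sse2.exe and so on) and the plan class strings are assumptions that must be checked against the files you actually have.

```python
import xml.etree.ElementTree as ET

APP_NAME = "gene_pcim"   # from the official file name gene_pcim_v1.02_win64__fma
VERSION_NUM = "102"      # assumption: v1.02 -> 102

# Hypothetical renamed executables mapped to hypothetical plan classes.
VARIANTS = {
    "pc_sse2.exe": "sse2",
    "pc_avx.exe": "avx",
    "pc_fma.exe": "fma",
}


def build_app_info(app_name: str, version_num: str, variants: dict[str, str]) -> str:
    """Return app_info.xml text declaring one app_version per executable."""
    root = ET.Element("app_info")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    for exe in variants:
        file_info = ET.SubElement(root, "file_info")
        ET.SubElement(file_info, "name").text = exe
        ET.SubElement(file_info, "executable")
    for exe, plan_class in variants.items():
        version = ET.SubElement(root, "app_version")
        ET.SubElement(version, "app_name").text = app_name
        ET.SubElement(version, "version_num").text = version_num
        ET.SubElement(version, "plan_class").text = plan_class
        file_ref = ET.SubElement(version, "file_ref")
        ET.SubElement(file_ref, "file_name").text = exe
        ET.SubElement(file_ref, "main_program")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_app_info(APP_NAME, VERSION_NUM, VARIANTS))
```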
14) Message boards : Number crunching : Optimization (Message 1039)
Posted 9 Apr 2017 by [B@P] Daniel
Win10 x64
PC-IM v1.02 (SSE2): 1600-1700 s range with i7-5820K at 4.2 GHz
v1.2 SSE2: 1750-1755 s range

Linux x64, Ubuntu 16.10LTS, PC-IM v1.03 (FMA): 2200-2250 s average with Xeon 2696 v3
v1.2 (FMA): 1940-1980 s range

Same instruction set compared, but on different systems/OSes. So with v1.2, Windows took longer but Linux took less time.

Thanks for these numbers. I did my tests on Windows using the AVX version, and it was faster for me. I suspect that the new SSE version is slower, but I have to perform additional tests to confirm this. I will let you know when I have some results.

I did extra benchmarks using 10 blocks from a VV WU instead of 1 like before. On my Windows machine the new SSE app has results similar to the AVX app. I also benchmarked the 32-bit SSE app version, and that one was slower than the official SSE app. Maybe you downloaded the 32-bit app instead of the 64-bit one?
15) Message boards : Number crunching : Optimization (Message 1038)
Posted 9 Apr 2017 by [B@P] Daniel
Win10 x64
PC-IM v1.02 (SSE2): 1600-1700 s range with i7-5820K at 4.2 GHz
v1.2 SSE2: 1750-1755 s range

Linux x64, Ubuntu 16.10LTS, PC-IM v1.03 (FMA): 2200-2250 s average with Xeon 2696 v3
v1.2 (FMA): 1940-1980 s range

Same instruction set compared, but on different systems/OSes. So with v1.2, Windows took longer but Linux took less time.

Thanks for these numbers. I did my tests on Windows using the AVX version, and it was faster for me. I suspect that the new SSE version is slower, but I have to perform additional tests to confirm this. I will let you know when I have some results.
16) Message boards : Number crunching : Optimization (Message 1037)
Posted 9 Apr 2017 by [B@P] Daniel
OK, I did that, but now I get the message:

Message from server: Your app_info.xml file doesn't have a usable version of gene@home PC-IM.


EDIT:
Also, it won't download more work, since it says the computer has reached the daily quota of 1 task.

I suspect that the problem is caused by WUs which you downloaded using the official app and which are still considered in progress. Please try to delete app_info.xml, restart BOINC, wait until BOINC re-downloads all these WUs, then abort them all and install the optimized app again. Before aborting the tasks, please also suspend the project or set it to "no new tasks" to avoid downloading new WUs in place of the aborted ones.

BTW, the i7-4770 also supports AVX and FMA, so you can try these app versions too.
17) Message boards : Number crunching : Optimization (Message 1034)
Posted 9 Apr 2017 by [B@P] Daniel
A new app version is ready! It is available at the usual place: https://bitbucket.org/sirzooro/pc-boinc/downloads/. To install it, do the following steps:
- finish or abort all existing tasks (otherwise they will be aborted automatically after the install);
- stop BOINC;
- unpack the selected version to the project's directory (a path like C:\Users\All Users\BOINC\projects\gene.disi.unitn.it_test\ on Windows, and /var/lib/boinc-client/projects/gene.disi.unitn.it_test on Linux);
- start BOINC again.
After doing this, the app name should change to "Gene Network Application (Opti v1.2)". You should also see the message "Found app_info.xml; using anonymous platform" in the event log for the TN-Grid project.

I did all that, using the SSE2 version for Linux on my i7-4770, and got that message on reboot. But I am only getting errors.
http://gene.disi.unitn.it/test/results.php?hostid=6148

The error is "Permission denied". You need to execute the following commands as root in the project directory to set the appropriate permissions. If you cannot switch to the root account using "su -", add "sudo " before each command.

chmod 755 pc
chown boinc:boinc pc

After you do this, the app should start working. You do not need to restart BOINC again.
18) Message boards : Number crunching : Optimization (Message 1031)
Posted 9 Apr 2017 by [B@P] Daniel
A new app version is ready! It is available at the usual place: https://bitbucket.org/sirzooro/pc-boinc/downloads/. To install it, do the following steps:
- finish or abort all existing tasks (otherwise they will be aborted automatically after the install);
- stop BOINC;
- unpack the selected version to the project's directory (a path like C:\Users\All Users\BOINC\projects\gene.disi.unitn.it_test\ on Windows, and /var/lib/boinc-client/projects/gene.disi.unitn.it_test on Linux);
- start BOINC again.
After doing this, the app name should change to "Gene Network Application (Opti v1.2)". You should also see the message "Found app_info.xml; using anonymous platform" in the event log for the TN-Grid project.

This time I used Gray code (not Grey!) to optimize the app. This code is a number sequence with a special property: every two consecutive numbers differ by one bit only. This concept can be generalized in various ways. One of them is Gray code combinations, where every two consecutive subsets differ by one element only. Here is an example of the 3-combinations of a 5-element set, generated in Gray code order:

{1,2,3}, {1,2,4}, {1,3,4}, {2,3,4}, {2,3,5}, {1,3,5}, {1,2,5}, {1,4,5}, {2,4,5}, {3,4,5}
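The binary Gray code has a one-line formula, n XOR (n >> 1). For combinations, a simple recursive construction with the same one-element-change property is the revolving door order: list the k-subsets of {1..n-1}, then the (k-1)-subsets of {1..n-1} in reverse order with n added to each. This is a swapped-in illustration, not necessarily Ruskey's algorithm 5.8 used in the app, and its order differs from the example above: for n = 5, k = 3 it starts {1,2,3}, {1,3,4}, {2,3,4}, {1,2,4}.

```python
def gray(n: int) -> int:
    """n-th word of the binary reflected Gray code."""
    return n ^ (n >> 1)


def revolving_door(n: int, k: int) -> list[tuple[int, ...]]:
    """All k-subsets of {1..n}; consecutive subsets differ by exactly one element."""
    if not 0 <= k <= n:
        return []
    if k == 0:
        return [()]
    if k == n:
        return [tuple(range(1, n + 1))]
    # Subsets without n, then the (k-1)-subsets of {1..n-1} reversed, each plus n.
    without_n = revolving_door(n - 1, k)
    with_n = [c + (n,) for c in reversed(revolving_door(n - 1, k - 1))]
    return without_n + with_n


if __name__ == "__main__":
    print([format(gray(i), "03b") for i in range(8)])
    for combo in revolving_door(5, 3):
        print(*combo)
```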


The TN-Grid Gene app uses a combinations generator, so I decided to replace it with a new Gray code combinations generator and exploit its special property to recalculate only the values which depend on the changed element. By doing so I reduced the total calculation time. The savings depend on the maximum L value and increase with it:
- some old organism stored as "test" data, max L=8: time reduced from 0.559s to 0.534s (4.4%);
- current organism (VV), max L=12: time reduced from 2.092s to 1.815s (13.2%);
- other old organism stored as "test2" data (it was probably ECM), max L=18: time reduced from 14.401s to 9.254s (35.7%).
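The post does not spell out which values the app keeps per subset, so as a toy stand-in the sketch below keeps a running sum of per-element weights. Walking the Gray code sequence from the example above, each step removes one element and adds one, so the sum is updated with one subtraction and one addition instead of being recomputed from scratch. The function name and the weights are illustrative only.

```python
# The 3-combinations of {1..5} in the Gray code order from the example above.
SEQUENCE = [
    (1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4), (2, 3, 5),
    (1, 3, 5), (1, 2, 5), (1, 4, 5), (2, 4, 5), (3, 4, 5),
]


def incremental_sums(sequence, weight):
    """Yield (subset, value, updates) where value is the sum of weight[e] over the subset.

    The first value is computed in full; every later one is updated from the
    previous value using only the element that left and the element that joined.
    """
    current = sequence[0]
    value = sum(weight[e] for e in current)
    yield current, value, len(current)
    for nxt in sequence[1:]:
        removed = set(current) - set(nxt)
        added = set(nxt) - set(current)
        if len(removed) != 1 or len(added) != 1:
            raise ValueError(f"{current} -> {nxt} is not a one-element change")
        value += weight[added.pop()] - weight[removed.pop()]
        yield nxt, value, 1
        current = nxt


if __name__ == "__main__":
    weights = {1: 3, 2: 5, 3: 7, 4: 11, 5: 13}   # arbitrary test weights
    steps = list(incremental_sums(SEQUENCE, weights))
    print("updates with Gray order:", sum(u for _, _, u in steps))   # 3 + 9 * 1 = 12
    print("additions if recomputed:", sum(len(s) for s in SEQUENCE))  # 10 * 3 = 30
```

The saving in this toy grows with subset size, since a full recomputation costs k additions per subset while a Gray code step costs one update regardless of k.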

If you are interested in the algorithm details, you can check "Combinatorial Generation" by Frank Ruskey (page 129, algorithm 5.8), available at http://www.1stworks.com/ref/ruskeycombgen.pdf.

The new app also checks whether the CPU supports the required instruction set, and exits with an error message like "AVX instructions are not supported by your CPU!" if it does not.
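The post does not say how the app detects support (a native app would typically query the CPUID instruction). As a minimal sketch of the same check-and-exit behaviour, the hypothetical function below takes a set of lower-case flag names, such as those listed in Linux /proc/cpuinfo, and exits with a message in the same style.

```python
import sys


def require_instruction_set(name: str, cpu_flags: set[str]) -> None:
    """Exit with an error message if `name` (e.g. "AVX") is missing from cpu_flags.

    cpu_flags is assumed to use lower-case /proc/cpuinfo style names.
    """
    if name.lower() not in cpu_flags:
        sys.exit(f"{name.upper()} instructions are not supported by your CPU!")
```

For example, require_instruction_set("AVX", {"sse2", "avx"}) returns normally, while require_instruction_set("FMA", {"sse2", "avx"}) prints "FMA instructions are not supported by your CPU!" to stderr and exits with status 1.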
19) Message boards : Number crunching : Optimization (Message 1022)
Posted 2 Apr 2017 by [B@P] Daniel
Is a GPU version still under consideration? I get the impression that it would work, with all the programming talent that Daniel (and others) bring to the project, but there may not be enough work to support it.

Where are we on that?

Yes, I am still going to create it. But first I would like to release a new version of the CPU app; it is almost ready.
20) Message boards : Number crunching : FMA problems (Ryzen and others?) (Message 1016)
Posted 28 Mar 2017 by [B@P] Daniel
Any update on this? People on the hwbot forum say that ASUS released a new BIOS which resolved the problem for them. Did you have a chance to test it, or one for your mainboard if available?





Copyright © 2019 CNR-TN & UniTN