Posts by Daniel
1) Message boards : Number crunching : Optimization (Message 1227)
Posted 4 days ago by Profile Daniel
How is your RAM speed?
On my R7 it makes a big difference between 2133 MHz and 3066 MHz (up to 30 minutes).


I have 1x8GB 2666 MHz CL16; maybe I will try to overclock it.

I have understood the "problem": if I set CPU usage to 50% (or turn off SMT), the 6 simultaneous WUs take around 2400 s to complete. Mystery solved.

I don't understand why WUs don't speed up when the CPU is at 3.8 GHz.

The TN-Grid app is very memory-intensive. One person from my team wrote that on his Xeon 14c/28t (I do not know the exact model, probably it is an E5-2683 v3) 4 TN-Grid WUs consumed all available memory bandwidth. So when you hit this limit, increasing CPU speed will not help; the CPU will just wait for memory faster ;)

Edit: when you set CPU usage to 50%, the app will be able to get data from memory faster (fewer apps compete for the same limited bandwidth), every app instance can use more cache (which additionally helps with loading data), plus CPU resources will not be shared between two apps (SMT/HT does not double the speed; usually the gain is much smaller).

If you want to improve speeds, use the fastest memory possible, and overclock it if possible.


Thanks Daniel.
How can I see memory bandwidth usage?
Just as a test, I tried running the RAM at 2133 MHz, and a WU (with 12 threads) took around 4300 s.
With the RAM at 2933 MHz, the WU time drops to 3700 s.

There is no tool to see it. Intel support also claims that this cannot be measured. That person figured it out by performing a series of tests:
- when he decreased the CPU clock, CPU usage rose to 100%;
- when he decreased the memory clock, CPU usage dropped to 85%;
- when he added a 3rd memory stick, CPU usage increased from 90% to 100%.
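The bandwidth argument above can be illustrated with a toy model. This is a rough sketch under stated assumptions, not a measurement tool: it assumes computation and memory transfers overlap perfectly (so a task takes as long as the slower of the two) and that concurrent tasks share the memory bandwidth equally. The function name and all numbers are hypothetical.

```python
def estimated_runtime_s(work_cycles, cpu_hz, bytes_moved,
                        total_bandwidth_bps, concurrent_tasks):
    """Toy estimate of one task's wall-clock time in seconds.

    Assumes perfect overlap of compute and memory traffic (the slower one
    wins) and an equal share of memory bandwidth per concurrent task.
    """
    compute_s = work_cycles / cpu_hz
    memory_s = bytes_moved / (total_bandwidth_bps / concurrent_tasks)
    return max(compute_s, memory_s)


# Hypothetical workload, chosen so that it is memory-bound.
WORK_CYCLES = 1e12      # CPU cycles of pure computation per task
BYTES_MOVED = 2e13      # bytes each task streams from RAM
BANDWIDTH = 40e9        # bytes/s available to all tasks together

base = estimated_runtime_s(WORK_CYCLES, 3.0e9, BYTES_MOVED, BANDWIDTH, 6)
faster_cpu = estimated_runtime_s(WORK_CYCLES, 3.8e9, BYTES_MOVED, BANDWIDTH, 6)
fewer_tasks = estimated_runtime_s(WORK_CYCLES, 3.0e9, BYTES_MOVED, BANDWIDTH, 3)

print(f"6 tasks at 3.0 GHz: {base:.0f} s")         # 3000 s
print(f"6 tasks at 3.8 GHz: {faster_cpu:.0f} s")   # still 3000 s
print(f"3 tasks at 3.0 GHz: {fewer_tasks:.0f} s")  # 1500 s
```

Once a task is memory-bound in this model, a faster CPU changes nothing, while running fewer tasks at once gives each one more bandwidth, which is in line with the effect of setting CPU usage to 50% described in this thread. Applied to the measurements above (about 4300 s at 2133 MHz and 3700 s at 2933 MHz), runtime fell by roughly 14% while the memory clock rose by roughly 37%. If bandwidth scaled linearly with memory clock, a purely bandwidth-bound task would have dropped to about 3130 s, which suggests that other factors such as compute or memory latency also play a part.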
2) Message boards : Number crunching : Optimization (Message 1223)
Posted 6 days ago by Profile Daniel
How is your RAM speed?
On my R7 it makes a big difference between 2133 MHz and 3066 MHz (up to 30 minutes).


I have 1x8GB 2666 MHz CL16; maybe I will try to overclock it.

I have understood the "problem": if I set CPU usage to 50% (or turn off SMT), the 6 simultaneous WUs take around 2400 s to complete. Mystery solved.

I don't understand why WUs don't speed up when the CPU is at 3.8 GHz.

The TN-Grid app is very memory-intensive. One person from my team wrote that on his Xeon 14c/28t (I do not know the exact model, probably it is an E5-2683 v3) 4 TN-Grid WUs consumed all available memory bandwidth. So when you hit this limit, increasing CPU speed will not help; the CPU will just wait for memory faster ;)

Edit: when you set CPU usage to 50%, the app will be able to get data from memory faster (fewer apps compete for the same limited bandwidth), every app instance can use more cache (which additionally helps with loading data), plus CPU resources will not be shared between two apps (SMT/HT does not double the speed; usually the gain is much smaller).

If you want to improve speeds, use the fastest memory possible, and overclock it if possible.
3) Message boards : Number crunching : Server Status - No Work Available (Message 1216)
Posted 11 days ago by Profile Daniel
The server status page is showing no work units available for download as of
Task data as of 6 Dec 2017, 23:46:47 UTC

I am assuming that would reflect all types of WUs for both Windows and Linux.

WU Generator shows "running" ??

Thanks
Bill F

I checked the server status page and saw an interesting thing:

Users registered in past 24 hours: 1
Computers registered in past 24 hours: 1717

Someone with a big computing cloud has joined. No wonder the work queue is empty.
4) Message boards : Wish List : Future requests (Message 1199)
Posted 16 Nov 2017 by Profile Daniel
Can we possibly make the latest apps from Daniel the official versions? Haven't they been out for about 6 months?

That's something I wanted to do after a system upgrade (operating system), but I still haven't found the time (for instance, right now we are focusing on the work generator) to do this... I'll try my best, maybe next week...

Thanks, might bug you again next week :-)

Next week is here :-P

Thanks ;-). Just came to give the reminder too!

Yep, this is starting to become the 'my apologies' thread.... Notice the 'maybe' in the former sentence ;) I'll wait until we have all the needed versions (the one missing is the one for MacOS)

>> I'll wait until we have all the needed versions (the one missing is the one for MacOS)

Daniel?

Unfortunately I do not have access to a machine with MacOS, so I cannot help here. In the past, MacOS apps were created by TN-Grid people.
5) Message boards : Number crunching : Optimization (Message 1157)
Posted 24 Oct 2017 by Profile Daniel
I saw many users are running v1.03.
Is it faster than 1.02 (Opti v.1.2) developed by Daniel?


Sorry, my bad. v1.03 is only for Linux.

Official 1.0x apps are the same as the Opti v.1.1 ones. Opti v.1.2 apps have not been officially added yet.
6) Message boards : News : Another experiment on E. coli (Message 1155)
Posted 24 Oct 2017 by Profile Daniel
Could I take a look at the code of your work generator and some sample input data? I wonder if I could optimize it a bit.

Thank you for this. I will contact Francesco (who is the main author of the program) and let you know about his comments. Beware that the core of the program is written in Python; I'm planning to rewrite it in C++.

No problem, I know Python too :)

BTW, have you tried to use PyPy (https://pypy.org/) or something like this?
7) Message boards : News : Another experiment on E. coli (Message 1153)
Posted 24 Oct 2017 by Profile Daniel
Could I take a look at the code of your work generator and some sample input data? I wonder if I could optimize it a bit.
8) Message boards : Wish List : Future requests (Message 1144)
Posted 23 Oct 2017 by Profile Daniel
Now the project is in beta but, in the future, it may become stable and public.
So, these are the usual requests from volunteers for all BOINC projects:
1) A GPU client (preferably OpenCL, for all GPU cards)

We thought about this possibility when we started integrating the software with the BOINC API... Unfortunately our algorithm is not very parallelizable, hence it is not suitable for GPU hardware. I don't think there will be a GPU version of our project.


I would appreciate it if, for future research, any kind of parallel algorithm would be considered and, if possible, implemented using OpenCL.

Actually, we have a slightly different variant of the (gene@home) algorithm that *may* be suitable for parallelization. The main problem, right now, is that no one here has the necessary CUDA/OpenCL skills....


Small update from my side: I am working on an OpenCL app. It turned out that for L=0 and L=1, graph edges must be processed in parallel (this part is ready). For higher L the amount of per-edge work is larger, so the app will process edges serially and parallelize the work for each one of them. Also, for higher L the bandwidth of global GPU memory becomes the major performance bottleneck; I am looking for a solution to this problem. I will let you know when I have something ready for public testing.
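The scheduling decision described above (parallel over edges for L=0 and L=1, serial over edges with parallel per-edge work for higher L) can be sketched on the CPU in Python. This is only an illustration of the control flow, not the OpenCL app: `subtask`, `subtasks_per_edge` (2**L slices per edge) and the `edge_parallel_max_L` threshold are hypothetical, and Python threads will not actually speed up CPU-bound work because of the GIL. In OpenCL the parallel units would be work-items instead of pool threads.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def subtask(edge, i):
    """Hypothetical stand-in for one independent slice of per-edge work."""
    a, b = edge
    return (a * 31 + b * 17 + i) % 97


def subtasks_per_edge(L):
    """Hypothetical: per-edge work grows with L (2**L slices here)."""
    return 2 ** L


def process_edge(edge, L):
    """All the work for one edge, done sequentially."""
    return sum(subtask(edge, i) for i in range(subtasks_per_edge(L)))


def process_graph(edges, L, pool, edge_parallel_max_L=1):
    """Pick the parallelization strategy based on L."""
    if L <= edge_parallel_max_L:
        # Low L: little work per edge, so give whole edges to the workers.
        return list(pool.map(partial(process_edge, L=L), edges))
    results = []
    for edge in edges:
        # High L: visit edges one at a time and split each edge's work.
        slices = pool.map(partial(subtask, edge), range(subtasks_per_edge(L)))
        results.append(sum(slices))
    return results


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        for L in (0, 1, 4):
            print(L, process_graph(edges, L, pool))
```

Both strategies produce the same per-edge results; only the granularity of the parallel work changes, which is the point of switching strategy as L grows.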
9) Message boards : Number crunching : FMA problems (Ryzen and others?) (Message 1102)
Posted 29 Jul 2017 by Profile Daniel
It looks like the TN-Grid app triggers a new bug in Ryzen CPUs, which is not fixed yet. I have created a post on the AMD forum to let them know about it: https://community.amd.com/message/2814366
10) Message boards : Number crunching : Optimization (Message 1070)
Posted 21 May 2017 by Profile Daniel
I asked rattorosso [Marche] to create apps for the remaining 3 ARM architectures: armv6_vfp, armv7_vfpv3 and aarch64. He sent them to me, and I uploaded them to the usual place: https://bitbucket.org/sirzooro/pc-boinc/downloads/. Feel free to download and test them too.

So... what is the difference between the default app which I download automatically (for example gene_pcim_v1.02_win64__fma) and the FMA one from the link and the archive? If I saw right, both are version 1.02. Which one should I use for better performance?

Both internally specify the same version, 1.02, but the version from this thread has extra optimizations, so it runs faster than the official one. Valterc is going to take it and release it as a new version of the official app at the end of May.
11) Message boards : Number crunching : Optimization (Message 1060)
Posted 4 May 2017 by Profile Daniel
Hi Daniel, I have a question for you.

On my i5-6400 (Win7 64-bit) I receive both AVX and SSE2 WUs.
If I want to install your optimization v1.2, which version do I need to copy into the project folder?
- TN-Grid.windows-x86-64-avx-v1.2
- TN-Grid.windows-x86-64-sse2-v1.2
Can I install both?

Your CPU also supports FMA instructions, so you can also try the FMA app version: http://www.cpu-world.com/CPUs/Core_i5/Intel-Core%20i5-6400.html. In general the FMA app version should be faster than AVX, which is faster than SSE. However, on some CPUs the FMA version is, for some reason, a bit slower than the AVX one, so please try both.
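To see which of these builds a given CPU can run, one option on Linux is to read the `flags` line of /proc/cpuinfo. The sketch below is a hypothetical helper, not part of the TN-Grid apps; it assumes that an FMA build also needs AVX, and it uses the preference order from this post (FMA, then AVX, then SSE2). Since FMA is not always faster, the first two candidates are worth benchmarking.

```python
# Assumption: an FMA build also uses AVX, so it needs both flags.
REQUIREMENTS = {
    "fma": {"fma", "avx"},
    "avx": {"avx"},
    "sse2": {"sse2"},
}
# Preference order from the post: FMA usually fastest, then AVX, then SSE2.
PREFERENCE = ["fma", "avx", "sse2"]


def cpu_flags(cpuinfo_text):
    """Collect the feature flags from the text of Linux /proc/cpuinfo."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def usable_variants(flags):
    """App variants this CPU can run, best first."""
    return [v for v in PREFERENCE if REQUIREMENTS[v] <= flags]


sample = "processor\t: 0\nflags\t\t: fpu sse sse2 avx fma\n"
print(usable_variants(cpu_flags(sample)))  # ['fma', 'avx', 'sse2']
```

On a real Linux machine, pass the contents of /proc/cpuinfo instead of the sample string; on Windows, the cpu-world page linked above lists the same instruction sets.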

It is possible to install several versions, but you would have to rename the pc.exe files and modify app_info.xml to specify all app versions with the proper plan classes. The files prepared by me are configured to run a single app version only.
12) Message boards : Number crunching : Optimization (Message 1039)
Posted 9 Apr 2017 by Profile Daniel
Win10 X64
PC-IM v1.02 (sse2) 1600-1700 sec range with i7-5820k 4.2GHz
v1.2 SSE2 1750-1755 sec range

Linux x64 Ubuntu 16.10LTS PC-IM v1.03 (fma) 2200-2250 sec average with Xeon 2696 v3
v.1.2 (fma) 1940-1980 sec range

Same instruction set compared, but on different systems/OSes. So on Windows v1.2 took longer, but on Linux it was shorter.

Thanks for these numbers. I did my tests on Windows using the AVX version and it was faster for me. I suspect that the new SSE version is slower, but I have to perform additional tests to confirm this. I will let you know when I have some results.

I did extra benchmarks using 10 blocks from a VV WU instead of 1 like before. On my Windows machine the new SSE app has results similar to the AVX app. I also benchmarked the 32-bit SSE app version, and that one was slower than the official SSE app. Maybe you downloaded the 32-bit app instead of the 64-bit one?
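Comparing app builds with more input blocks per run, as described above, is easier with a small timing harness that repeats each run and reports the best and median times. This is a generic sketch; `fake_app` is a hypothetical stand-in for running one app build, and the repeat count is arbitrary.

```python
import statistics
import time


def benchmark(run_blocks, blocks, repeats=5):
    """Run run_blocks(blocks) `repeats` times; return (best, median) seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_blocks(blocks)
        samples.append(time.perf_counter() - start)
    return min(samples), statistics.median(samples)


def fake_app(blocks):
    """Hypothetical stand-in for one app build processing WU blocks."""
    return sum(sum(i * i for i in range(20_000)) for _ in blocks)


if __name__ == "__main__":
    one_block = benchmark(fake_app, [0])
    ten_blocks = benchmark(fake_app, list(range(10)))
    print("1 block:   best %.4f s, median %.4f s" % one_block)
    print("10 blocks: best %.4f s, median %.4f s" % ten_blocks)
```

Longer runs make fixed startup costs and timer noise a smaller fraction of the total, so small differences between builds become easier to trust.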
13) Message boards : Number crunching : Optimization (Message 1038)
Posted 9 Apr 2017 by Profile Daniel
Win10 X64
PC-IM v1.02 (sse2) 1600-1700 sec range with i7-5820k 4.2GHz
v1.2 SSE2 1750-1755 sec range

Linux x64 Ubuntu 16.10LTS PC-IM v1.03 (fma) 2200-2250 sec average with Xeon 2696 v3
v.1.2 (fma) 1940-1980 sec range

Same instruction set compared, but on different systems/OSes. So on Windows v1.2 took longer, but on Linux it was shorter.

Thanks for these numbers. I did my tests on Windows using the AVX version and it was faster for me. I suspect that the new SSE version is slower, but I have to perform additional tests to confirm this. I will let you know when I have some results.
14) Message boards : Number crunching : Optimization (Message 1037)
Posted 9 Apr 2017 by Profile Daniel
OK, I did that, but now I get the message:

Message from server: Your app_info.xml file doesn't have a usable version of gene@home PC-IM.


EDIT:
Also, it won't download more work, since it says the computer has reached the daily quota of 1 task.

I suspect that the problem is caused by WUs which you downloaded using the official app and which are still considered in progress. Please try to delete app_info.xml, restart BOINC, wait until BOINC re-downloads all these WUs, then abort them all and install the optimized app again. Before aborting the tasks, please also suspend the project or set it to "no new tasks" to avoid downloading new WUs in place of the aborted ones.

BTW, the i7-4770 also supports AVX and FMA, so you can try these app versions too.
15) Message boards : Number crunching : Optimization (Message 1034)
Posted 9 Apr 2017 by Profile Daniel
New app version is ready! It is available at the same place as usual: https://bitbucket.org/sirzooro/pc-boinc/downloads/. In order to install it, do the following steps:
- finish or abort all existing tasks (any remaining ones will be aborted automatically after the install);
- stop BOINC;
- unpack the selected version into the project's directory (a path like C:\Users\All Users\BOINC\projects\gene.disi.unitn.it_test\ on Windows, and /var/lib/boinc-client/projects/gene.disi.unitn.it_test on Linux);
- start BOINC again.
After doing this, the app name should change to "Gene Network Application (Opti v1.2)". You should also see the message "Found app_info.xml; using anonymous platform" in the event log for the TN-Grid project.

I did all that, using the SSE2 version for Linux on my i7-4770, and got that message on reboot. But I am getting only errors.
http://gene.disi.unitn.it/test/results.php?hostid=6148

The error is "Permission denied". You need to execute the following commands from the root account in the project dir to set the appropriate permissions. If you cannot switch to the root account using "su -", add "sudo " before each command.

chmod 755 pc
chown boinc:boinc pc


After you do this, the app should start working. You do not need to restart BOINC again.
16) Message boards : Number crunching : Optimization (Message 1031)
Posted 9 Apr 2017 by Profile Daniel
New app version is ready! It is available at the same place as usual: https://bitbucket.org/sirzooro/pc-boinc/downloads/. In order to install it, do the following steps:
- finish or abort all existing tasks (any remaining ones will be aborted automatically after the install);
- stop BOINC;
- unpack the selected version into the project's directory (a path like C:\Users\All Users\BOINC\projects\gene.disi.unitn.it_test\ on Windows, and /var/lib/boinc-client/projects/gene.disi.unitn.it_test on Linux);
- start BOINC again.
After doing this, the app name should change to "Gene Network Application (Opti v1.2)". You should also see the message "Found app_info.xml; using anonymous platform" in the event log for the TN-Grid project.

This time I used Gray code (not Grey!) to optimize the app. This code is a number sequence with a special property: every two consecutive numbers differ by one bit only. This concept can be generalized in various ways. One of them is Gray code combinations, where every two consecutive subsets differ by one element only. Here is an example of 3-combinations of a 5-element set, generated in Gray code order:

{1,2,3}, {1,2,4}, {1,3,4}, {2,3,4}, {2,3,5}, {1,3,5}, {1,2,5}, {1,4,5}, {2,4,5}, {3,4,5}


The TN-Grid Gene app uses a combinations generator, so I decided to replace it with the new Gray code combinations one, and exploit its special property to recalculate only the values which depend on the changed element. By doing so I reduced the total calculation time. The savings depend on the maximum L value and increase with it:
- some old organism stored as "test" data, max L=8: time reduced from 0.559s to 0.534s (4.4%);
- current organism (VV), max L=12: time reduced from 2.092s to 1.815s (13.2%);
- another old organism stored as "test2" data (it was probably ECM), max L=18: time reduced from 14.401s to 9.254s (35.7%).

If you are interested in the algorithm details, you can check "Combinatorial Generation" by Frank Ruskey (page 129, algorithm 5.8), available at http://www.1stworks.com/ref/ruskeycombgen.pdf.
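A minimal Python sketch of the idea follows. It uses the textbook revolving-door recursion (R(n, k) = R(n-1, k) followed by the reverse of R(n-1, k-1) with n appended), which is one way to get a Gray code order for combinations; it is not necessarily Ruskey's Algorithm 5.8, and its order differs from the example sequence above, although both have the same one-element-change property. The per-subset "score" (a sum of per-element weights) is a hypothetical stand-in for the values the real app recalculates.

```python
from itertools import combinations


def revolving_door(n, k):
    """All k-subsets of {1..n} as sorted tuples, in revolving-door order.

    R(n, k) = R(n-1, k) followed by reversed R(n-1, k-1) with n appended.
    Consecutive subsets differ by swapping exactly one element.
    """
    if k == 0:
        return [()]
    if k > n:
        return []
    if k == n:
        return [tuple(range(1, n + 1))]
    head = revolving_door(n - 1, k)
    tail = [c + (n,) for c in reversed(revolving_door(n - 1, k - 1))]
    return head + tail


def is_minimal_change_order(order, n, k):
    """True if every k-subset appears once and neighbours differ by one swap."""
    if sorted(order) != list(combinations(range(1, n + 1), k)):
        return False
    return all(len(set(a) ^ set(b)) == 2 for a, b in zip(order, order[1:]))


def swapped(prev, cur):
    """Return (removed element, added element) between two neighbours."""
    (out_el,) = set(prev) - set(cur)
    (in_el,) = set(cur) - set(prev)
    return out_el, in_el


def scores_incremental(n, k, weight):
    """Score every subset, touching only the swapped elements at each step.

    The score (sum of per-element weights) is a hypothetical stand-in for
    the per-element values the real app recalculates.
    """
    order = revolving_door(n, k)
    score = sum(weight[e] for e in order[0])
    scores = [score]
    for prev, cur in zip(order, order[1:]):
        out_el, in_el = swapped(prev, cur)
        score += weight[in_el] - weight[out_el]
        scores.append(score)
    return order, scores


if __name__ == "__main__":
    post_example = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4), (2, 3, 5),
                    (1, 3, 5), (1, 2, 5), (1, 4, 5), (2, 4, 5), (3, 4, 5)]
    print(is_minimal_change_order(post_example, 5, 3))  # True
    order, scores = scores_incremental(5, 3, {e: e * e for e in range(1, 6)})
    for subset, s in zip(order, scores):
        print(subset, s)
```

Each step costs one subtraction and one addition instead of re-summing all k elements; with larger subsets the avoided work grows, which fits the observation that the savings increase with the maximum L.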

The new app also checks if the CPU supports the required instruction set, and exits with an error message like "AVX instructions are not supported by your CPU!" if it does not.
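A startup check like this can be sketched as follows. The real app is C++ and presumably queries the CPU directly; this Python version only shows the logic, takes the CPU's feature flags as a set (for example parsed from Linux /proc/cpuinfo), and assumes exit code 1. The function names are hypothetical; the message text follows the example above.

```python
import sys


def missing_instruction_sets(required, cpu_flags):
    """Names in `required` (e.g. "AVX") whose lowercase flag is absent."""
    return [name for name in required if name.lower() not in cpu_flags]


def ensure_cpu_supports(required, cpu_flags):
    """Print an error and exit if any required instruction set is missing."""
    missing = missing_instruction_sets(required, cpu_flags)
    if missing:
        # Message wording follows the post; exit code 1 is an assumption.
        print(f"{missing[0]} instructions are not supported by your CPU!",
              file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    # Hypothetical flag set of an FMA-capable CPU running the FMA build.
    ensure_cpu_supports(["AVX", "FMA"], {"sse2", "avx", "fma"})
    print("CPU check passed")
```

Failing fast with a clear message is friendlier than letting the process hit an illegal-instruction crash partway through a WU.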
17) Message boards : Number crunching : Optimization (Message 1022)
Posted 2 Apr 2017 by Profile Daniel
Is a GPU version still under consideration? I get the impression that it would work, with all the programming talent that Daniel (and others) bring to the project, but there may not be enough work to support it.

Where are we on that?

Yes, I am still going to create it. But first I would like to release a new version of the CPU app; it is almost ready.
18) Message boards : Number crunching : FMA problems (Ryzen and others?) (Message 1016)
Posted 28 Mar 2017 by Profile Daniel
Any update on this? People on the hwbot forum say that ASUS released a new BIOS which resolved the problem for them. Did you have a chance to test it, or one for your mainboard if available?
19) Message boards : Number crunching : FMA problems (Ryzen and others?) (Message 1014)
Posted 23 Mar 2017 by Profile Daniel
Both Windows and Linux apps are compiled using gcc. The Linux app was compiled using gcc 4.8.5. The Windows one was compiled with gcc 5.4.0, so its code is probably better optimized than the Linux one. There are also some system-specific changes, which may also play a role here.

The new app has new code to decompress the input file and to filter out some output results. The code which performs the actual calculations was not changed, so the previous version would most probably crash on Ryzen too.

The current Windows apps were compiled by me too; Valterc asked me to do this. I downloaded the FMA app version from the TN-Grid server yesterday and verified that it is the same as the one I sent to him.
20) Message boards : Number crunching : FMA problems (Ryzen and others?) (Message 1009)
Posted 22 Mar 2017 by Profile Daniel
I tried to disassemble the compiled binary and things got interesting. All crash reports here mention the following error:

Privileged Instruction (0xc0000096) at address 0x00000000004f5458


When I checked the instruction at this address, I got "STI", which is a sensitive instruction according to https://support.microsoft.com/en-nz/help/114473/intel-privileged-and-sensitive-instructions. But when I disassembled the whole app, it turned out that address 0x00000000004f5458 is not the start of any instruction: a valid instruction starts one byte earlier, at 0x00000000004f5457. The instruction at that address is "vmovsd", an AVX instruction, which maps to line 160 in pc.cpp. It looks like the Ryzen CPU jumped to an invalid address and executed whatever instruction happened to decode there, which turned out to be STI.
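The "one byte earlier" check above amounts to finding which decoded instruction covers a faulting address. Below is a small sketch using Python's `bisect` module, assuming a linear disassembly has already produced sorted (start, length) pairs. Only the address 0x4F5457 comes from this post; the other addresses and all instruction lengths are made up for illustration.

```python
from bisect import bisect_right


def locate(instructions, addr):
    """Find which decoded instruction covers `addr`.

    `instructions` is a list of (start_address, length) pairs sorted by
    start, as produced by a linear disassembly of the code section.
    Returns (start_address, offset_into_instruction), or None if no
    instruction covers the address.
    """
    starts = [start for start, _ in instructions]
    i = bisect_right(starts, addr) - 1
    if i < 0:
        return None
    start, length = instructions[i]
    if addr >= start + length:
        return None
    return start, addr - start


# Hypothetical excerpt: only 0x4F5457 comes from the post; the other
# addresses and all lengths are made up for illustration.
DISASSEMBLY = [(0x4F5450, 7), (0x4F5457, 5), (0x4F545C, 4)]

if __name__ == "__main__":
    start, offset = locate(DISASSEMBLY, 0x4F5458)
    print(f"0x{start:X} + {offset}")  # 0x4F5457 + 1: not an instruction boundary
```

A non-zero offset means execution landed inside an instruction, so the CPU decoded its trailing bytes as something else. As a plausible illustration: if this vmovsd uses the common two-byte VEX encoding (C5 FB ...), the byte at offset 1 is 0xFB, which on its own decodes as STI, consistent with the disassembler output above.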

Valterc, do you know if the 64-bit Windows FMA app works fine on other CPUs? I wonder if this problem affects only Ryzen CPUs, or all users with FMA-capable CPUs and 64-bit Windows.




Copyright © 2017 CNR-TN & UniTN