Posts by [B@P] Daniel

1) Message boards : Science : SARS-CoV-2 virus (Message 1886)
Posted 8 Jul 2020 by

Scientists discovered that genes SLC6A20, LZTFL1, CCR9, FYCO1, CXCR6, and XCR1 are associated with severe COVID-19 cases:
https://www.physiciansbriefing.com/infectious-disease-8/coronavirus-1008/genomewide-level-associations-identified-for-severe-covid-19-758771.html
https://www.nejm.org/doi/full/10.1056/NEJMoa2020283

2) Message boards : Number crunching : SSE3 optimization and Android binary (Message 1844)
Posted 18 May 2020 by

[B@P] Daniel

You have to put an app_info.xml file with the name of your executable in the project's directory and then restart BOINC.
https://boinc.berkeley.edu/wiki/Anonymous_platform
let us know if everything goes fine!

Then, if your procedure worked, you could explain all the steps you took so that we will be able to compile for arm32 as well.
Did you compile the app with some extension? (e.g. NEON, PIE)

AArch64 apps for Android are always PIE and have NEON enabled.
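
For reference, here is a rough sketch of generating a minimal app_info.xml for the anonymous platform mechanism mentioned above. The app name, executable name and version number are placeholders I made up for illustration, not the project's actual values, so check them against what the project really uses. The element names follow the Anonymous_platform wiki page linked above.

```python
import xml.etree.ElementTree as ET


def build_app_info(app_name: str, exe_name: str, version_num: int) -> str:
    """Return the text of a minimal anonymous-platform app_info.xml."""
    root = ET.Element("app_info")

    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name

    # Declare the binary and mark it as executable.
    file_info = ET.SubElement(root, "file_info")
    ET.SubElement(file_info, "name").text = exe_name
    ET.SubElement(file_info, "executable")

    # Tie the binary to the app as its main program.
    app_version = ET.SubElement(root, "app_version")
    ET.SubElement(app_version, "app_name").text = app_name
    ET.SubElement(app_version, "version_num").text = str(version_num)
    file_ref = ET.SubElement(app_version, "file_ref")
    ET.SubElement(file_ref, "file_name").text = exe_name
    ET.SubElement(file_ref, "main_program")

    ET.indent(root)  # pretty-print; needs Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


# Placeholder values (assumptions), not the project's real names.
xml_text = build_app_info("gene_pcim", "gene_pcim_aarch64_test", 102)
print(xml_text)
```

The printed text would be saved as app_info.xml in the project's directory, next to the executable, before restarting BOINC.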

3) Message boards : Number crunching : SSE3 optimization and Android binary (Message 1832)
Posted 17 May 2020 by

[B@P] Daniel

It looks like the bzip2 library was built with -D_FORTIFY_SOURCE, which makes it reference glibc's checked functions (such as __printf_chk) that the C library on Android does not provide. You need to rebuild bzip2 with this flag disabled. Here is a related question on StackOverflow:
https://stackoverflow.com/questions/22977898/android-4-4-undefined-reference-to-printf-chk

4) Message boards : Number crunching : Please fix computer #57018 - only invalids! (Message 1802)
Posted 22 Apr 2020 by

[B@P] Daniel

You can limit the number of WUs sent to hosts which have recently returned a high number of invalid WUs. A limit of 1 WU per host and a scheduler request delay of 12h should reduce this problem a lot.

Do you remember the server config option for enabling this? Well, I will try to find out... (It is not that easy for me, working from home.)

I have never tried to set up my own server, so I do not know exactly how to do this. Out of curiosity I searched for it and found that you can set the <reliable_priority_on_over> and <reliable_priority_on_over_except_error> options in the project settings; they may do the trick.
Reference: https://boinc.berkeley.edu/trac/wiki/ProjectOptions#Acceleratingretries

I also found that you can blacklist a host by setting its max_results_day field to -1. You can use this to ban hosts which return only invalid results.
Reference: https://boinc.berkeley.edu/trac/wiki/BlackList
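
A small sketch of what that blacklisting update amounts to. This is only an illustration: it uses an in-memory SQLite table as a stand-in for the project's real database, the table is cut down to two columns, and the second host and the starting quota of 100 are made-up values.

```python
import sqlite3

# In-memory stand-in for the project's database (assumption for illustration);
# the real host table has many more columns.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE host (id INTEGER PRIMARY KEY, max_results_day INTEGER)")
db.executemany("INSERT INTO host VALUES (?, ?)", [(57018, 100), (57019, 100)])


def blacklist_host(conn: sqlite3.Connection, host_id: int) -> None:
    """Mark one host as blacklisted by setting max_results_day to -1."""
    conn.execute("UPDATE host SET max_results_day = -1 WHERE id = ?", (host_id,))
    conn.commit()


blacklist_host(db, 57018)
rows = db.execute("SELECT id, max_results_day FROM host ORDER BY id").fetchall()
print(rows)
```

Only the targeted host is changed; other hosts keep their quota.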

5) Message boards : Number crunching : Please fix computer #57018 - only invalids! (Message 1799)
Posted 22 Apr 2020 by

[B@P] Daniel

You can limit the number of WUs sent to hosts which have recently returned a high number of invalid WUs. A limit of 1 WU per host and a scheduler request delay of 12h should reduce this problem a lot.

6) Message boards : Number crunching : SSE3 optimization and Android binary (Message 1791)
Posted 19 Apr 2020 by

[B@P] Daniel

Yes, this is a known issue in BOINC. After installing all files, you need to manually copy the config.h file from the BOINC source root dir to InstallPath/include/boinc. Or, if you use files directly from the BOINC build dir, add /home/matteo/Software/boinc/ to the include paths.

You can also modify this parse.h to use #include "../config.h".

7) Message boards : Number crunching : SSE3 optimization and Android binary (Message 1780)
Posted 17 Apr 2020 by

[B@P] Daniel

I built Android apps in the past, here is link to my post with more details: http://gene.disi.unitn.it/test/forum_thread.php?id=158&postid=905#905

I did this by taking the cross-compiler from the Android NDK and plugging it into the existing Makefiles used for building the project app. I found these Makefiles. Here are the important parts of them:

From Makefile for 32-bit Android app:

ARCH += -march=armv7-a -mtune=cortex-a7 -mfpu=vfpv4 -mfloat-abi=softfp
LDFLAGS += -Wl,--fix-cortex-a8

PIE ?= 0
$(info Using PIE=$(PIE))
ifeq ($(PIE),1)
CFLAGS += -fPIE
LDFLAGS += -fPIE -pie
BOINC_DIR = ../../_boinc32pie/
else
LDFLAGS += -fno-PIE -no-pie
BOINC_DIR = ../../_boinc32nonpie/
endif

TOOLPATH = ../../$(TOOLDIR)
CFLAGS = --sysroot=c:/tn-grid/android/_arm32/sysroot/ -DANDROID -DDECLARE_TIMEZONE -Ic:/tn-grid/android/_arm32/include/c++/4.9.x/arm-linux-androideabi/
LDFLAGS = --sysroot=c:/tn-grid/android/_arm32/sysroot/
CC = ../../_arm32/bin/arm-linux-androideabi-gcc
CXX = ../../_arm32/bin/arm-linux-androideabi-g++

From Makefile for 64-bit Android app:

CFLAGS = --sysroot=c:/tn-grid/android/_arm64/sysroot/ -DANDROID -DANDROID_64 -DDECLARE_TIMEZONE -fPIE
LDFLAGS = --sysroot=c:/tn-grid/android/_arm64/sysroot -fPIE -pie
CC = ../../_arm64/bin/aarch64-linux-android-gcc
CXX = ../../_arm64/bin/aarch64-linux-android-g++
BOINC_DIR = ../../_boinc64/

I used these Makefiles from Cygwin. I hope that this will help you.

In the Makefile for the 32-bit app I had to use -mfloat-abi=softfp instead of -mfloat-abi=hard. This was required by Android. You can check whether -mfloat-abi=hard is possible now; otherwise the app will be slower than the corresponding ARM Linux app.

8) Message boards : Number crunching : TN-Grid on AMD GPUs (Message 1779)
Posted 17 Apr 2020 by

[B@P] Daniel

Thanks for the positive feedback. I have good and bad news.

Good news first:
the OpenCL version is running and produces roughly the same results. Roughly means 8% difference on the testing data included in the repo (that's 8% at the level of individual links). I have not yet tested on current production data.

Bad news:
The OpenCL port is in large parts pretty much a 1:1 copy of the CPU code. That implies it is still very, very slow. Memory access is all over the place and warp divergence must be huge. Long story short: It's about as fast as a single CPU core. I sort of expected that, 1:1 ports never end up looking good. I'm now looking at which factors dominate the slowdown so I can eliminate them (if possible).

I will keep you informed ...

I am the author of the optimized apps used by TN-Grid now. I also tried to port the app to the GPU, but faced the same problem: global memory access was too slow. If I remember correctly, I tried to port the app version which does not use Gray codes (the predecessor of the current version), as it looked more GPU-friendly to me. I was looking for a solution to this slow memory access and asked for an algorithm on StackOverflow. I got an answer, but I never actually tried to implement it because I was busy with other things at that time. Here is a link to that question; I hope it will be useful for you. Good luck!
https://stackoverflow.com/questions/46635137/how-to-generate-combinations-in-chunks
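
To give an idea of what "combinations in chunks" can look like, here is a minimal sketch. It is not the answer from that StackOverflow question and not code from the TN-Grid app; it is one standard way to do it, using lexicographic unranking so that each worker can jump straight to the start of its own chunk without enumerating the earlier ones. All function names are made up for this example.

```python
from itertools import combinations
from math import comb


def unrank_combination(n, k, rank):
    """Return the rank-th k-subset of range(n) in lexicographic order."""
    if not 0 <= rank < comb(n, k):
        raise ValueError("rank out of range")
    result = []
    x = 0
    for remaining in range(k, 0, -1):
        # Skip whole blocks of combinations that begin with a smaller element.
        while True:
            block = comb(n - x - 1, remaining - 1)
            if rank < block:
                break
            rank -= block
            x += 1
        result.append(x)
        x += 1
    return tuple(result)


def next_combination(c, n):
    """Return the lexicographic successor of c, or None if c is the last one."""
    k = len(c)
    c = list(c)
    i = k - 1
    # Find the rightmost position that can still be incremented.
    while i >= 0 and c[i] == n - k + i:
        i -= 1
    if i < 0:
        return None
    c[i] += 1
    for j in range(i + 1, k):
        c[j] = c[j - 1] + 1
    return tuple(c)


def chunk(n, k, start, size):
    """Yield up to `size` combinations beginning at rank `start`."""
    c = unrank_combination(n, k, start) if start < comb(n, k) else None
    while c is not None and size > 0:
        yield c
        c = next_combination(c, n)
        size -= 1


# Split all 3-subsets of 7 items into chunks of 8 and check nothing is lost.
n, k, chunk_size = 7, 3, 8
reassembled = []
for start in range(0, comb(n, k), chunk_size):
    reassembled.extend(chunk(n, k, start, chunk_size))
print(reassembled == list(combinations(range(n), k)))
```

On a GPU each work-item or work-group would take one `start` value, so the only shared state is the chunk index.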

9) Message boards : Number crunching : sse2 vs avx (Message 1448)
Posted 18 Dec 2018 by

[B@P] Daniel

This looks like another Ryzen bug. This time the CPU also jumps to an address in the middle of an instruction, which must end in a crash sooner or later.

I have reported this bug on the AMD forum. Here is a link to my post; it should become visible once a moderator approves it:
https://community.amd.com/message/2890585

The SSE2 problem for me (and Beyond) was only on the Ryzen 1700. The Ryzen 2700 is OK. Maybe you should amend your report?

Thanks for looking into this.

Interesting. I followed the link on the task info page to get info about the CPU and OS, so it looks like crashes sometimes occur on the 2700 too. Unfortunately that task page has been deleted by today, so I cannot add a link here. Anyway, I will update my report to say that the bug happens mostly on the 1700.

10) Message boards : Number crunching : sse2 vs avx (Message 1445)
Posted 17 Dec 2018 by

[B@P] Daniel

I had to abort an sse2 task after 10 hours, with 442 days remaining. There were no other CPU tasks running other than this project. Looking HERE, sse2 and avx had the same problem, but fma and an anon succeeded. I wonder what the "anon" was.

"aborted by user" is not 'technically' an error, it's a user's choice. I agree that if I saw a workunit stuck at 5% with an estimated time for completion of days (even if the estimate were completely wrong) I would also be tempted to abort it. The 'problematic' behavior of the TCGA workunits doesn't depend on the version of the application.

The sse2 problem of some Ryzen cpu with the current application is a 'real' problem: the app crashes with an 'illegal instruction' error.

I just sent an e-mail to Daniel (the user who actually wrote the sse2 code) asking for hints.

Hello again, I was not here for long long time :)

This looks like another Ryzen bug. This time the CPU also jumps to an address in the middle of an instruction, which must end in a crash sooner or later.

I have reported this bug on the AMD forum. Here is a link to my post; it should become visible once a moderator approves it:
https://community.amd.com/message/2890585

11) Message boards : Number crunching : Optimization (Message 1227)
Posted 14 Dec 2017 by

[B@P] Daniel

How is your RAM speed?
On my R7 it makes a big difference between 2133 MHz and 3066 MHz (up to 30 minutes)

I have 1x8GB 2666 MHz C16, maybe I will try to overclock it

I have understood the "problem": if I set CPU usage to 50% (or turn off SMT), the 6 simultaneous WUs take around 2400 s to complete: mystery solved

I don't understand why WUs don't speed up when the CPU is at 3.8 GHz

The TN-Grid app is very memory-intensive. One person from my team wrote that on his Xeon 14c/28t (I do not know the exact model, probably an E5-2683 v3) 4 TN-Grid WUs consumed all available memory bandwidth. So once you hit this limit, increasing CPU speed will not help; the CPU will just wait for memory faster ;)

Edit: when you set CPU usage to 50%, the app will be able to get data from memory faster (fewer apps will compete for the same limited bandwidth), every app instance can use more cache (which additionally helps with loading data), and CPU resources will not be shared between two apps (SMT/HT does not double the speed, usually the gain is much smaller).

If you want to improve speed, use the fastest possible memory, and overclock it if possible.

Thanks Daniel

Thanks Daniel
How can I see memory bandwidth usage?
Just as a test, I ran the RAM at 2133 MHz and a WU (with 12 threads) took around 4300 s
With the RAM at 2933 MHz, the WU time drops to 3700 s

There is no tool to see it. Intel support also claims that this cannot be measured. That person found this out by performing a series of tests:
- when he decreased the CPU clock, CPU usage rose to 100%;
- when he decreased the memory clock, CPU usage dropped to 85%;
- when he added a 3rd memory stick, CPU usage increased from 90% to 100%.
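
As a quick check on the RAM numbers quoted above (2133 MHz: about 4300 s per WU, 2933 MHz: about 3700 s), here is the arithmetic as a small sketch. It only restates those two measurements; it is not a benchmark, and the helper name is made up.

```python
def relative_change(old: float, new: float) -> float:
    """Fractional change from old to new (0.10 means +10%)."""
    return (new - old) / old


mem_change = relative_change(2133, 2933)   # memory clock, MHz
time_change = relative_change(4300, 3700)  # WU runtime, seconds
speedup = 4300 / 3700                      # throughput ratio

print(f"memory clock: {mem_change:+.1%}")
print(f"WU runtime:   {time_change:+.1%}")
print(f"speedup:      {speedup:.2f}x")
```

The runtime improves with the memory clock but less than proportionally, which fits an app that is limited by memory for a large part of its run time, though not all of it.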

12) Message boards : Number crunching : Optimization (Message 1223)
Posted 12 Dec 2017 by

[B@P] Daniel

How is your RAM speed?
On my R7 it makes a big difference between 2133 MHz and 3066 MHz (up to 30 minutes)

I have 1x8GB 2666 MHz C16, maybe I will try to overclock it

I have understood the "problem": if I set CPU usage to 50% (or turn off SMT), the 6 simultaneous WUs take around 2400 s to complete: mystery solved

I don't understand why WUs don't speed up when the CPU is at 3.8 GHz

The TN-Grid app is very memory-intensive. One person from my team wrote that on his Xeon 14c/28t (I do not know the exact model, probably an E5-2683 v3) 4 TN-Grid WUs consumed all available memory bandwidth. So once you hit this limit, increasing CPU speed will not help; the CPU will just wait for memory faster ;)

Edit: when you set CPU usage to 50%, the app will be able to get data from memory faster (fewer apps will compete for the same limited bandwidth), every app instance can use more cache (which additionally helps with loading data), and CPU resources will not be shared between two apps (SMT/HT does not double the speed, usually the gain is much smaller).

If you want to improve speed, use the fastest possible memory, and overclock it if possible.

13) Message boards : Number crunching : Server Status - No Work Available (Message 1216)
Posted 7 Dec 2017 by

[B@P] Daniel

The server status page is showing no work units available for download as of
Task data as of 6 Dec 2017, 23:46:47 UTC

I am assuming that would reflect all types of WU's for both Microsoft and Linux

WU Generator shows "running" ??

Thanks
Bill F

I checked the server status page and saw an interesting thing:

Users registered in past 24 hours: 1
Computers registered in past 24 hours: 1717

Someone with a big computing cloud has joined. No wonder the work queue is empty.

14) Message boards : Wish List : Future requests (Message 1199)
Posted 16 Nov 2017 by

[B@P] Daniel

Can we possibly make the latest apps from Daniel the official versions? Haven't they been out for about 6 months?

That's something I wanted to do after a system upgrade (operating system), but I still haven't found the time (for instance, right now, we are focusing on the work generator) to do this... I'll try my best, maybe next week...

Thanks, might bug you again next week :-)

Next week is here :-P

Thanks ;-). Just came to do the reminder too!

Yep, this is starting to become 'my apologies thread'.... Notice the 'maybe' in the former sentence ;) I'll wait until we have all the needed versions (the one missing is the one for MacOS)

>> I'll wait until we have all the needed versions (the one missing is the one for MacOS)

Daniel?

Unfortunately I do not have access to a machine with MacOS, so I cannot help here. In the past, the MacOS apps were created by TN-Grid people.

15) Message boards : Number crunching : Optimization (Message 1157)
Posted 24 Oct 2017 by

[B@P] Daniel

I saw many users are running v1.03.
Is it faster than 1.02 (Opti v.1.2) developed by Daniel?

sorry, my bad. the v1.03 is only for linux

Official 1.0x apps are the same as Opti v.1.1 ones. Opti v.1.2 apps are not officially added yet.

16) Message boards : News : Another experiment on E. coli (Message 1155)
Posted 24 Oct 2017 by

[B@P] Daniel

Could I take a look at the code of your work generator and some sample input data? I wonder if I could optimize it a bit.

Thank you for this, I will contact Francesco (who is the main author of the program) and let you know about his comments. Beware that the core of the program is written in Python, I'm planning to rewrite it in C++.

No problem, I know Python too :)

BTW, have you tried to use PyPy (https://pypy.org/) or something like this?

17) Message boards : News : Another experiment on E. coli (Message 1153)
Posted 24 Oct 2017 by

[B@P] Daniel

Could I take a look at the code of your work generator and some sample input data? I wonder if I could optimize it a bit.

18) Message boards : Wish List : Future requests (Message 1144)
Posted 23 Oct 2017 by

[B@P] Daniel

Now the project is beta but, in the future, may be stable and public
So, these are usually requests from volunteers for all boinc projects:
1) A GPU client (better OpenCL, for all gpu cards)

We thought about this possibility when we started integrating the software with the BOINC API... Unfortunately our algorithm is not very parallelizable, hence it is not suitable for GPU hardware. I don't think there will be a GPU version of our project.

I would appreciate it if, for future research, any kind of parallel algorithm would be considered and, if possible, implemented using OpenCL

Actually, we have a slightly different variant of the (gene@home) algorithm that *may* be suitable for parallelization. The main problem, right now, is that no one here has the necessary cuda/opencl skills....

Small update from my side: I am working on an OpenCL app. It turned out that for L=0 and L=1 graph edges must be processed in parallel (this part is ready). For higher L the amount of per-edge work is larger, so the app will process edges serially and parallelize the work within each of them. Also, for higher L the bandwidth of global GPU memory becomes the major performance bottleneck; I am looking for a solution to this problem. I will let you know when I have something ready for public testing.

19) Message boards : Number crunching : FMA problems (Ryzen and others?) (Message 1102)
Posted 29 Jul 2017 by

[B@P] Daniel

It looks like the TN-Grid app triggers a new bug on Ryzen CPUs which is not fixed yet. I have created a post on the AMD forum to let them know about it: https://community.amd.com/message/2814366

20) Message boards : Number crunching : Optimization (Message 1070)
Posted 21 May 2017 by

[B@P] Daniel

I asked rattorosso [Marche] to create apps for the remaining 3 ARM architectures: armv6_vfp, armv7_vfpv3 and aarch64. He sent them to me, and I uploaded them to the usual place: https://bitbucket.org/sirzooro/pc-boinc/downloads/. Feel free to download and test them too.

So... what is the difference between the default app which I download automatically (for example gene_pcim_v1.02_win64__fma) and the same FMA one from the link and the archive? If I saw right, both are version 1.02? Which one should I use for better performance?

Both internally specify the same version 1.02, but the version from this thread has extra optimizations, so it runs faster than the official one. Valterc is going to take it and release it as a new official version at the end of May.
