Posts by entity
1) Message boards : Science : Spinal muscular atrophy (SMA) from Mus musculus (Message 3352)
Posted 13 Feb 2024 by entity
Picked up about 59 so far. 4 validated, 14 pending validation, the rest in progress. Running about 31 minutes on a 1st gen EPYC processor.
2) Message boards : News : Summer break (Message 3276)
Posted 14 Aug 2023 by entity
I hope to be able to restart the system with a new project at the beginning of August.


Is there any news on the new project?
3) Message boards : Number crunching : OUT of tasks (Message 3254)
Posted 6 Jun 2023 by entity
We are averaging 397.5 genes per day (and still rising), which equates to 23,055 WUs (397.5 x 58) per day, not counting redundancy. Some of those genes were probably still at the older 600-chunk level, though, since the 397.5 is averaged over the last 10 days.
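For anyone who wants to sanity-check those numbers, here is a rough sketch (the 2x redundancy factor is just my assumption for illustration; the other figures are from the post above):

```python
# Rough estimate of daily task demand (figures from the post above;
# the redundancy factor of 2 is an assumption, not a project-confirmed value).
genes_per_day = 397.5
wus_per_gene = 58          # one gene expansion = a chain of 58 workunits
redundancy = 2             # assumed initial replication for validation

wus_per_day = genes_per_day * wus_per_gene
print(f"{wus_per_day:.0f} WUs/day without redundancy")              # ~23055
print(f"{wus_per_day * redundancy:.0f} tasks/day with redundancy")  # ~46110
```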
4) Message boards : Number crunching : OUT of tasks (Message 3241)
Posted 30 May 2023 by entity
TN-Grid limits the total number of workunits per host (regardless of its core count) to help maximize the total output of the project by spreading the work among as many hosts as possible, regardless of their core counts. Hosts with large core counts finish work more frequently, so they have a better chance to download work during a shortage.

That isn't my observation. On the 64-thread system I receive 384 WUs (6 x 64). After the change I still get 384 WUs. If I change the thread count to 128 I get 768 (6 x 128), and it is consistent across all of my systems.

This means that your host can queue 33% more work than before the change.

I agree, as long as we are only considering the number of "chunks" being downloaded (800 vs 600), but the number of WUs is the same as before. Supposedly, the problem we were trying to solve was related to the work generator and the number of workunits created per unit of time. That doesn't seem to have changed much after the change.

That's no problem (both for you and for the project) provided that your host *always* has *some* work in its queue. As every workunit is a piece in a chain of 58 workunits, workunits that just sit in a computer's queue hold back the completion of the entire chain. The more cores a host has, the more chains it can put on hold. The reason for limiting the total number of workunits is to limit the number of chains a host can put on hold, without reducing the host's throughput.

I am always going to have a number of WUs in the "Ready to Start" state so I can keep working through outages, whether at the project site or locally. That is a choice I make as a cruncher, but I try to limit it to less than a day for the reasons you describe.

BTW, your hosts are a nice example of why *not* to crunch on virtual cores:
Your 13-year-old AMD Phenom II 1090T X6 CPU can finish a longer VV workunit in 13,100 seconds,
while your 4-year-old AMD Ryzen 9 3900X CPU can finish the same workunit in 12,000 seconds. Of course, it can finish twice as many workunits in the same time as the older CPU (so its RAC is higher), but I guess it could do the same amount of work (or even a little more) if you limited the number of tasks to the number of cores (6). Depending on the extra cache misses the extra task per core inflicts, performance can be better when running a limited number of tasks simultaneously.

I choose to run on virtual cores because I have found it a pain in the backside to constantly go into the BIOS to turn HT/SMT off for different projects. The lower-thread-count systems don't get me enough extra throughput to justify the trouble. The bigger server gives me a better payback for running fewer threads, but I do that through the BOINC Manager ("Use 50% of the CPUs"). Yes, I know that by doing that I'm not truly eliminating virtual cores and work isn't always balanced across the sockets, but WUs run in about half the time. Turning off SMT wouldn't get me that much more throughput.

I would advocate for changing the WUs from 800 chunks to 1,200. I think that would make them run in about the same time as the HS work.
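A quick check of that 1,200-chunk idea, as a sketch, using the 46,000-chunks-per-gene total quoted in message 3214 further down this page:

```python
# Sketch: how a 1,200-chunk WU size would split the 46,000 chunks per gene
# quoted in message 3214 further down this page.
total_chunks = 46000
full, remainder = divmod(total_chunks, 1200)   # -> 38 full WUs, 400 chunks left over
print(full + 1, "WUs per gene, last one", remainder, "chunks")  # 39 WUs, last one 400
```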
5) Message boards : Number crunching : OUT of tasks (Message 3238)
Posted 30 May 2023 by entity
With the longer WUs, I'm still getting "Computer has reached limit of tasks in progress" on some of my machines with a 1-day cache, same as with the smaller WUs. This means the number of WUs downloaded is the same as before the change; WUs would have to get bigger before the number downloaded stops hitting that limit. 6 x the number of threads didn't even provide a half-day cache with the smaller WUs; now it provides maybe a little over half a day, but not one day.
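Rough arithmetic behind that estimate (the per-WU runtime is an assumption here; it varies a lot by CPU):

```python
# Rough cache-depth estimate under the "6 tasks per thread" in-progress limit.
# The per-task runtime is an assumption (it varies widely by CPU and WU length).
threads = 128
limit = 6 * threads            # max tasks in progress per host
runtime_hours = 2.0            # assumed runtime of one 800-chunk WU on one thread
queue_days = (limit * runtime_hours) / (threads * 24)
print(f"{limit} tasks in progress ~= {queue_days:.2f} days of work")   # ~0.50 days
```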
6) Message boards : Number crunching : OUT of tasks (Message 3232)
Posted 29 May 2023 by entity
Looking at the increase in projected run times, I would venture to guess that the WUs have been increased from 600 to 800 chunks, with the last one (#58) still having 400. The number of tasks in progress has been declining for several hours.
7) Message boards : Number crunching : OUT of tasks (Message 3214)
Posted 25 May 2023 by entity
There are 600 chunks in the "normal" workunits (1-76); the last one has 400 chunks (judging by the ratio of their runtimes).
If that's the case, there are 46000 chunks total (76*600=45600, 45600+400=46000).
Its prime factorization: 46000 = 2 * 2 * 2 * 2 * 5 * 5 * 5 * 23 = 2^4 * 5^3 * 23.
It's not practical to double the number of chunks, as in that case the last workunit would be half the length of the present ones.
I suggest having 1000 chunks in a workunit; in that case the workunits would be 2/3 longer than the present ones, and the last one would be the same length as the rest:
1000 * 46 = (2*2*2*5*5*5) * (2*23)
The other possibility is to have 920 chunks; in that case every gene expansion would be made up of 50 workunits:
920 * 50 = (2*2*2*5*23) * (2*5*5)

I could support this option, as it would also reduce the download/upload activity on the server (fewer WUs to manage).
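For completeness, here is a small sketch that reproduces the quoted factorization by listing the chunk sizes between 500 and 1,500 that divide 46,000 evenly (i.e. no short last WU):

```python
# Enumerate chunk-per-WU sizes that divide a gene's 46,000 chunks evenly,
# reproducing the quoted factorization 46000 = 2^4 * 5^3 * 23.
total_chunks = 46000
options = [(d, total_chunks // d) for d in range(500, 1501) if total_chunks % d == 0]
for chunk_size, wu_count in options:
    print(f"{chunk_size} chunks/WU -> {wu_count} WUs per gene")
# 500 -> 92, 575 -> 80, 920 -> 50, 1000 -> 46, 1150 -> 40
```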
8) Message boards : Science : The new Vitis vinifera project is underway (Message 3174)
Posted 16 May 2023 by entity
How many WUs are in a gene/isoform and is the last one short like the previous project?
9) Message boards : Number crunching : OUT of tasks (Message 3173)
Posted 16 May 2023 by entity
After moving to the new storage system, is the work generator still limited by the 600 WUs per 14 minutes?
10) Message boards : Number crunching : Curious (Message 3132)
Posted 6 Apr 2023 by entity
Looks like we will finish up this group of work in the next 12 to 24 hours. I hope Denis has a group to distribute while waiting on new work here.
I think we have 8 days' work left. (710 genes/isoforms left, the present rate is 87/day)

That would be for all the outstanding work to be returned to complete the genes/isoforms. I would suggest that sometime tomorrow we will transition from new work to resends only. Just a guess, of course.
11) Message boards : Number crunching : Curious (Message 3129)
Posted 6 Apr 2023 by entity
Looks like we will finish up this group of work in the next 12 to 24 hours. I hope Denis has a group to distribute while waiting on new work here.
12) Message boards : Number crunching : Curious (Message 3118)
Posted 29 Mar 2023 by entity
It is looking like the original count of genes/isoforms was/is wrong. I believe we still have at least 5, if not 7, more days of work left to crunch. The queued number seems to be the more accurate count. Work seems to be coming in descending alphabetical order based on gene name, and it takes about 1 to 1 1/2 days per letter of the alphabet; we are currently finishing up the 'F' series. The status page will show over 100% (probably 101% to 102%) by the time all the genes have been completed.
13) Message boards : Number crunching : Curious (Message 3111)
Posted 20 Mar 2023 by entity
Why is the number of Queued genes/isoforms greater than the number of genes/isoforms? Was something added to the project? Source: science status page
14) Message boards : News : Storage problem (again) (Message 3103)
Posted 7 Mar 2023 by entity
It is not a major problem, as only about 86 of 15,300 WUs were flagged as timed out. I don't see them when the filesystem isn't acting up. Hopefully, after the move, we won't see them anymore.
15) Message boards : News : Storage problem (again) (Message 3101)
Posted 6 Mar 2023 by entity
I hope this filesystem problem gets repaired soon, as I'm starting to see other troubling problems. To wit: supposedly, task 230186_Hs_T142754-TRIML1-wu96_1677560784479 was sent to one of my hosts on 28-Feb-2023 19:35:31 UTC. Looking through the client log, that task is not found in any form; at that point in the log there was an indication that a scheduler request had timed out. I was never sent the WU, but evidently the server thought it was sent. Consequently, 5 days later the task was listed as an error against my host as "Timed out -- no response". I'm seeing a rising number of these against all my hosts, and all my hosts return WUs within the 5-day return period. Several were flagged against my 128-thread EPYC server, which cannot get even one day of work before hitting the "limit of tasks in progress" message. All systems are up 24/7/365 (mostly; except brief reboots for security fixes).
16) Message boards : Number crunching : Completed work not uploading (Message 3094)
Posted 1 Mar 2023 by entity
I understand fully what you are saying, and you are correct. The server-cancelled WUs are not a problem, as they were work that was in my queue and not started when quorum was established for the WU. The other work, however, was work that was stuck in upload and missed the deadline, not because there was too much work queued (this project limits the amount of work queued to twice the number of threads) but because the client was in an extended backoff. If I had micromanaged the client, I would have noticed the extended backoff and forced an update, which would have met the deadline. I just wish they would fix that filesystem issue once and for all.
17) Message boards : Number crunching : Completed work not uploading (Message 3092)
Posted 28 Feb 2023 by entity
Not totally. I have 173 errors, of which half are jobs cancelled by the server and the other half are labeled "Timed out -- No response". These are jobs that couldn't be uploaded in time, due either to client back-off or to files on the server that remain locked and cannot be opened when the client retries the upload. The latter can only be fixed by a server admin.
18) Message boards : Number crunching : Completed work not uploading (Message 3088)
Posted 22 Feb 2023 by entity
I have moved to Denis since they now have lots of work. The last time I looked they had about 62,000 WUs unsent.
19) Message boards : Science : snoRNA interaction in healthy tissues (Message 2996)
Posted 11 Nov 2022 by entity
They produce about 100 times as much data as the other WUs (about 800 KB vs 8 KB). Is this what is causing the external filesystem issues?
20) Message boards : Number crunching : Shortest processing time ever (Message 2980)
Posted 6 Nov 2022 by entity
If you don't want to reduce the number of tasks queued on your host by limiting the number of CPU cores generally in BOINC, you can use the app_config.xml file to limit the number of simultaneous tasks for each project.
A little correction: this method (just like setting a global limit on the usable CPU cores in BOINC) limits the number of tasks queued. (I've set up my host like that to verify it).

The main message of this experiment is that one should not crunch (single-threaded apps) on the "virtual" (hyper-threaded) cores to achieve short runtimes. (Running too many of these tasks makes them wait for each other's FP operations; it can also easily fill up the last-level cache of the CPU, resulting in increased cache misses and much slower memory transfers.)

If you don't turn off hyperthreading in the BIOS, how do you guarantee only one thread per physical CPU core? The OS doesn't know the difference and will schedule threads on whatever is available. If you truly want to run only one thread per core, you have to turn off hyperthreading.
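For anyone who wants to try the app_config.xml route mentioned above, it looks roughly like this (a sketch only; the app short name is my guess, so check client_state.xml or the project directory for the exact name, and have the client re-read config files or restart it afterwards):

```xml
<!-- Sketch only: app_config.xml goes in the TN-Grid project directory under
     the BOINC data folder. The app short name below is an assumption; check
     client_state.xml for the exact name used by the project. -->
<app_config>
  <app>
    <!-- assumed app short name -->
    <name>gene_pcim</name>
    <!-- run at most this many tasks of this app at once -->
    <max_concurrent>6</max_concurrent>
  </app>
</app_config>
```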



