Posts by Retvari Zoltan

1) Message boards : Number crunching : OUT of tasks (Message 3393)
Posted 4 May 2024 by Retvari Zoltan

Number of tasks in progress reached its peak around 13.500, now it's steadily decreasing (now: 12.971).

2) Message boards : Number crunching : OUT of tasks (Message 3390)
Posted 27 Apr 2024 by Retvari Zoltan

Number of tasks in progress is steadily declining (now: 11410).
Meaning we're out of tasks until new genes are queued.

3) Message boards : News : Happy New Year! (Message 3347)
Posted 1 Feb 2024 by Retvari Zoltan

Perhaps it would be nice to update the science status page according to that.

4) Message boards : Number crunching : Shortest processing time ever (Message 3259)
Posted 17 Jun 2023 by Retvari Zoltan

I've assembled a Ryzen 9 7900X for a friend, and I'm testing it with TN-Grid.
The OS installation is quite tedious, as I have to
- start with Linux 18.04 to be able to install folding@home,
- then upgrade to 20.04
- then upgrade to 22.04
- then upgrade it to kernel 5.19.
This is a 12 core, 24 thread CPU, so according to my precept I was running 12 TN-Grid tasks simultaneously from the beginning (by setting "use at most 50% of the CPUs" in BOINC manager computing preferences).
The CPU runs at 5.3GHz (as far as I can tell in Linux).
Before the last upgrade (i.e. on Linux 18.04 and Linux 20.04 and Linux 22.04 kernel 5.15) the runtime was ~4000 seconds (~1h 6m 40s).
After the upgrade to kernel 5.19 the runtimes dropped to ~3300 seconds (55m).
Then I overclocked the (DDR5) RAM and the runtimes dropped further to ~3000s (50m).

RAM non-overclocked: 4800MHz CL40 [8,3ns] 1.1V; RAM overclocked: 5600MHz CL36 [6,43ns] 1.25V.

The access time of the overclocked RAM is 77% of the non-overclocked value, while the runtime with overclocked RAM is 91% of the runtime with non-overclocked RAM.
So the per-core performance of the Ryzen 9 7900X was initially around that of my i7-11700F. After the upgrade to kernel 5.19 it reached the per-core performance of my i5-12400, and with the RAM overclocked it's only 7% slower per core than my i9-12900F. But the Ryzen has only "performance" cores, 50% more of them than the i9-12900F, so it does 40% more work in total.
You can check the runtimes here.
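
For anyone who wants to recheck the ratios above, here is a small sketch of the arithmetic. The latency values are the bracketed figures from the post and the runtimes are the approximate ones reported; the 8 performance cores assumed for the i9-12900F and the helper name are my additions for illustration.

```python
# Figures taken from the post; all runtimes are approximate.
latency_stock_ns = 8.3      # 4800MHz CL40, as quoted
latency_oc_ns = 6.43        # 5600MHz CL36, as quoted
runtime_stock_s = 3300      # kernel 5.19, RAM not overclocked
runtime_oc_s = 3000         # kernel 5.19, RAM overclocked


def ratio_percent(new, old):
    """Return new/old as a rounded whole percentage."""
    return round(100 * new / old)


latency_ratio = ratio_percent(latency_oc_ns, latency_stock_ns)
runtime_ratio = ratio_percent(runtime_oc_s, runtime_stock_s)

# Assumption: the i9-12900F has 8 performance cores, so 12 Ryzen cores are 50% more.
ryzen_cores, i9_p_cores = 12, 8
per_core_speed = 0.93       # "only 7% slower per core"
total_vs_i9 = ryzen_cores / i9_p_cores * per_core_speed

print(f"access time: {latency_ratio}% of the non-overclocked value")
print(f"runtime: {runtime_ratio}% of the non-overclocked value")
print(f"total work relative to the i9-12900F: {total_vs_i9:.1f}x")
```

The access time improves far more than the runtime does, which suggests RAM latency is only one of several factors in the runtime.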

5) Message boards : Number crunching : OUT of tasks (Message 3257)
Posted 15 Jun 2023 by Retvari Zoltan

Something strange is happening: there are 2589 tasks ready to send, while there are 35884 tasks in progress.

6) Message boards : Number crunching : OUT of tasks (Message 3255)
Posted 6 Jun 2023 by Retvari Zoltan

That equals about 16 workunits per minute, or 1 workunit per 3.724 seconds.

7) Message boards : Number crunching : OUT of tasks (Message 3251)
Posted 5 Jun 2023 by Retvari Zoltan

The new top score is 37.065 in progress and 61 ready to send. (2023-06-03 5:24)
It's now 33376 in progress, 31 ready to send.

8) Message boards : Number crunching : OUT of tasks (Message 3248)
Posted 3 Jun 2023 by Retvari Zoltan

The number of tasks in progress is still increasing slowly.
There are 36.919 tasks in progress and 0 workunits ready to send, but a few minutes later there are 36.839/44 tasks.
The average runtime was 2.67 hours before the workunit size change; it rose to 3.4 hours in two days, then slowly rose to its estimated new value: 3.6 hours (2.67*8/6=3.56).
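
The estimate in parentheses is a linear scaling of the old average runtime. A minimal sketch, assuming runtime is proportional to the number of chunks and that the 8/6 factor corresponds to the change from 600 to 800 chunks mentioned elsewhere in the thread (the function name is made up for illustration):

```python
def scaled_runtime_hours(old_hours, old_chunks, new_chunks):
    """Scale an average runtime linearly with the workunit size."""
    return old_hours * new_chunks / old_chunks


estimate = scaled_runtime_hours(2.67, 600, 800)
print(f"estimated new average runtime: {estimate:.2f} hours")
```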

9) Message boards : Number crunching : OUT of tasks (Message 3245)
Posted 2 Jun 2023 by Retvari Zoltan

The number of tasks in progress is still increasing slowly. It's 35.016 at the moment, which is near the previous top (35.559). I wonder when we will reach the new top in the number of tasks in progress, and what that number will be. There are 80 workunits ready to send.

10) Message boards : Number crunching : OUT of tasks (Message 3243)
Posted 31 May 2023 by Retvari Zoltan

The number of tasks in progress is increasing slowly but steadily (~1000 per day; it's 32.318 atm), while the number ready to send is fluctuating between 80 and 120 workunits, so the work generator can just barely keep up the pace. Many people (in the northern hemisphere) will leave for summer vacation soon, so the available computing power will decrease, and I think the settings are fine for now. But I'm still curious how the project would perform when it's generating even larger (1150 chunks) workunits.

11) Message boards : Number crunching : OUT of tasks (Message 3242)
Posted 30 May 2023 by Retvari Zoltan

That isn't my observation. On the 64 thread system I receive 384 WUs (6 x 64). After the change I still get 384 WUs. If I change the thread count to 128 I will get 768 (6 x 128) and it is consistent across all of my systems.

That's my mistake. Perhaps I should consider it as a new idea then.

I choose to run on virtual cores as I have found it a pain in the back side to have to constantly go into the BIOS to turn off HT/SMT for different projects. The lower thread count systems don't get me enough extra throughput to justify the trouble.

Agreed. I don't recommend turning it off in the BIOS.

The bigger server gives me better pay back for running fewer threads but I do that through the BOINC Manager (Use 50% of the CPUS) and yes I know that by doing that I'm not truly eliminating virtual cores and work isn't always balanced across the sockets but WUs run in about 1/2 the time. Turning off SMT wouldn't get me that much more throughput.

Modern OSes select the cores wisely for power-hungry apps. I let them do it on their own. Windows is bad for running thousands of threads (my Windows 11 PC: 3300); that's one of the reasons for its degraded performance (compared to Linux).

For anyone interested in other methods than changing the BOINC manager options -> Computing preferences -> "Use at most 50% of the CPUs":

I put an app_config.xml in each project's directory that looks similar to this:

<app_config>
  <app>
    <name>gene_pcim</name>
    <max_concurrent>7</max_concurrent>
  </app>
</app_config>

You can figure out the app name from the project's webpage, or from the BOINC manager's log. (Nothing bad happens if you put an incorrect name here: BOINC manager will show the known app names in its log, and you can correct the names accordingly.)

The other method is to limit the project itself:

<app_config>
  <project_max_concurrent>7</project_max_concurrent>
</app_config>

This will also limit the maximum number of workunits in the queue.

When I crunch for multiple projects at the same time, it's tedious to set these files so that they add up to the number of cores I want to use. In this case I use the cc_config.xml method (the file is located in the BOINC directory):

<cc_config>
  <options>
    <ncpus>7</ncpus>
  </options>
</cc_config>

This will also limit the maximum number of workunits in the queue.
Originally this value is set to -1 (=all CPUs).

Don't forget to make BOINC manager re-read the configuration files after any changes you've made.
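
If you maintain these files on several hosts, the per-app variant can also be generated with a script. This is only a sketch using Python's standard library. The helper names are made up for illustration, the app name and the limit are the ones from the example above, and the project directory is deliberately not hard-coded because it differs per installation.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, max_concurrent):
    """Build an <app_config> tree that limits one application."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    return root


def to_pretty_xml(root):
    """Serialize with two-space indentation (ET.indent needs Python 3.9+)."""
    ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")


xml_text = to_pretty_xml(build_app_config("gene_pcim", 7))
print(xml_text)
# To use it, save xml_text as app_config.xml in the project's directory.
```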

I would advocate for changing the WUs to 1200 from 800. I think that would make them run about the same time as the HS work.

Agreed, my suggested number was 920 (50 workunits) or 1150 (40 workunits). (1200 would result in 38 + 1/3 workunits.)

12) Message boards : Number crunching : OUT of tasks (Message 3240)
Posted 30 May 2023 by Retvari Zoltan

With the longer WUs, I'm still getting "Computer has reached limit of tasks in progress" on some of my machines with 1 day cache.

TN-Grid limits the total number of workunits per host (regardless of its core count) to help maximize the total output of the project by spreading the work between as many hosts as possible. Hosts with large core counts finish work more frequently, thus they have a better chance to download work during a shortage.

Same as with the smaller WUs. This means the number of WUs downloaded is the same as before the change.

This means that your host can queue 33% more work than before the change.

WUs would have to get bigger before the number downloaded doesn't hit that limit. 6 x # of threads didn't even provide a half day cache with the smaller WUs, now it provides maybe a little over a half day but not one day.

That's no problem (both for you and for the project) provided that your host *always* has *some* work in its queue. As every workunit is a piece in a chain of 58 workunits, workunits that are just sitting in a computer's queue hold back the completion of the entire chain. The more cores a host has, the more chains it can put on hold. The reason for limiting the total number of workunits is to limit the number of chains a host can put on hold, without reducing the host's throughput.

BTW your hosts are a nice example to show why *not* to crunch on virtual cores:
Your 13-year-old AMD Phenom II 1090T X6 CPU can finish a longer VV workunit in 13.100 seconds,
while your 4-year-old AMD Ryzen 9 3900X CPU can finish the same workunit in 12.000 seconds. Of course, it can finish twice as many workunits in the same time as the older CPU (so its RAC is higher), but I guess it could do the same amount of work (or even a little more) if you limited the number of tasks to the number of cores (6). Depending on the extra cache misses the extra task per core inflicts, the performance when running a limited number of tasks simultaneously can be better.

13) Message boards : Number crunching : OUT of tasks (Message 3237)
Posted 30 May 2023 by Retvari Zoltan

Available Tasks are still at Zero. What is being done to address this state of affairs?

The number of workunits in progress is slowly rising, so the server might be able to fill up every host with work using the present settings (if comparing the longer results won't take up too much resources).
Give it a few days until all hosts return the "shorter" workunits, then fill up their queue with the "longer" ones, and then the longer results are uploaded and compared.

14) Message boards : Number crunching : OUT of tasks (Message 3235)
Posted 29 May 2023 by Retvari Zoltan

Let me answer:

To achieve 50% CPU usage in BOINC, do you use the "50% of the CPUs" preference?

Yes.

... I have a feeling that the current project "Vitis vinifera" appears to run under 3 hours on Windows (times given were before they increased the work unit size). The previous project ran for over 3 hours most of the time.

My Windows host with an i7-9700F CPU @4.5GHz can finish one longer "Vitis vinifera" in 1h 52m (~6.700s), the shorter ones took 1h 23m (~5.000s).
This CPU is not hyperthreaded (so I don't have to use the above setting) and has 8 cores; I run 7 TN-Grid tasks simultaneously.

15) Message boards : Number crunching : OUT of tasks (Message 3233)
Posted 29 May 2023 by Retvari Zoltan

Looking at the increase in projected run times, I would venture to guess that the WUs have been increased from 600 to 800 with the last one (#58) still having 400.

I confirm that. The awarded credits went up as well.

Number of tasks in progress has been declining for several hours.

This is the way I think we could give more time for the work generator to keep up with the pace of the crunchers. (As every function of the BOINC infrastructure runs on the same server, they take resources from each other. If we reduce the overhead of administering the workunits by decreasing their numbers, the host can spend more resources on generating new work, and comparing results.)

16) Message boards : Number crunching : OUT of tasks (Message 3227)
Posted 28 May 2023 by Retvari Zoltan

AMD Zen 2 cores are an equivalent of older Intel Skylake cores. (6th Gen up to 10th Gen Core i7 chips).

While the actual architectural improvements between the 6th and 8th Gen Intel cores are debated, there is a significant increase in computing performance (and in performance per Watt) between each generation (except for the 9th and the 10th Gen cores), so it's not appropriate to lump all the CPU generations from the 6th to the 10th together.

They [the AMD Zen 2 cores] are significantly slower than newer Intel 12th and 13th gen Core i7 chips.

While this is true, that's not the reason for their poor performance here (on TN-Grid).
The real reason for their poor performance is a misunderstanding of Hyper-Threading (Simultaneous Multi-Threading) that leads to overloading the execution units of the CPU cores.

If you are interested, here is the TLDR document:
http://www.cslab.ece.ntua.gr/courses/advcomparch/2007/material/readings/Intel%20Hyper-Threading%20Technology.pdf
The key concepts of this technology are the same for Intel, AMD, or any other CPU manufacturer.

The main point can be found on page 15, titled "Keys to Hyper-Threading Technology Performance"
At the bottom of this page:

Understand Hyper-Threading Technology Processor Resources
Each logical processor maintains a complete set of the architecture state. The architecture state consists of registers including the general-purpose registers, the control registers, the advanced programmable interrupt controller (APIC) registers and some machine-state registers. From a software perspective, once the architecture state is duplicated, the processor appears to be two processors. The number of transistors to store the architecture state is an extremely small fraction of the total. Logical processors share nearly all other resources on the physical processor, such as caches, execution units, branch predictors, control logic and buses.

That means: if you want your (single-threaded) science application to run as fast as it could, don't use more than 50% of your CPUs for this purpose.
The 12th and 13th gen Intel CPUs have E-Cores, which don't have the necessary resources to run TN-Grid (and similar scientific) applications, so the percentage of the usable "CPUs" (threads in reality) on these CPUs is even lower (34% on i9-12xxx, 25% on i9-13xxx).
Look at these two i9-13900k's:
https://gene.disi.unitn.it/test/results.php?hostid=86237&offset=0&show_names=0&state=4&appid= Run time: 1h 37m CPU time: 1h 8m, 1 error
https://gene.disi.unitn.it/test/results.php?hostid=86238&offset=0&show_names=0&state=4&appid= Run time: 1h 34m CPU time: 1h 6m, many errors
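
The percentages quoted above can be reproduced with a short calculation. This is a sketch under two assumptions of mine: the core layouts are 8 P-cores + 8 E-cores for the i9-12900 and 8 P-cores + 16 E-cores for the i9-13900 (only the P-cores being hyperthreaded), and the client truncates threads * percent / 100 to a whole number of CPUs. The function name is made up for illustration.

```python
def min_cpu_percent(total_threads, wanted_tasks):
    """Smallest whole percentage that still yields wanted_tasks CPUs.

    Assumption: the client truncates total_threads * percent / 100.
    """
    for percent in range(1, 101):
        if total_threads * percent // 100 >= wanted_tasks:
            return percent
    raise ValueError("wanted_tasks exceeds total_threads")


# Assumed layouts: (performance cores, efficiency cores).
cpus = {"i9-12900": (8, 8), "i9-13900": (8, 16)}
for name, (p_cores, e_cores) in cpus.items():
    threads = 2 * p_cores + e_cores   # two threads per P-core, one per E-core
    percent = min_cpu_percent(threads, p_cores)
    print(f"{name}: {threads} threads, one task per P-core needs {percent}%")
```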

17) Message boards : Number crunching : OUT of tasks (Message 3219)
Posted 26 May 2023 by Retvari Zoltan

Right now a single run of the work generator builds 77 workunits (154 tasks because of the validation requisites). Say that, for example, the result is computed, on an ideal computer, in one hour.

I'm glad that I have a better than ideal computer, as my i9-12900F can do one workunit in 35 minutes :)

So this will keep that ideal computer busy for 77 hours. If I, theoretically, pack all the tiles into a single workunit, this will keep that computer busy for 77 hours

I would crunch such workunits. It would take 45 hours for my better than ideal computer.

...exactly the same time but increasing the risks of computational errors.

That's probably true for "average" computers, but there are quite a few dedicated crunching boxes, so there could be a queue for those.
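
The 45-hour figure above follows directly from the two numbers in this exchange. A quick check, assuming the packed workunit takes exactly as long as the 77 separate ones:

```python
workunits_per_run = 77        # one run of the work generator, from the quote
minutes_per_workunit = 35     # reported for the i9-12900F

packed_hours = workunits_per_run * minutes_per_workunit / 60
print(f"one packed workunit: about {packed_hours:.0f} hours")
```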

18) Message boards : Number crunching : OUT of tasks (Message 3218)
Posted 26 May 2023 by Retvari Zoltan

Now I understand the work generator process.
Packing more chunks (slices, tiles, whatever) into a single workunit won't shorten the time needed for generating them, but the overhead of processing the workunits will be lower if each of them contains more chunks. Reducing this overhead could be enough to feed every host, depending on the ratio of this overhead to the generation process, so it's definitely worth a try to make the workunits larger. I'm not sure if a slight increase will be beneficial enough, so I suggest at least 920 chunks per wu, as the number of workunits has to go down significantly to achieve a significant drop in the total overhead.
Btw my hosts can get enough work with the present settings.

19) Message boards : Number crunching : OUT of tasks (Message 3212)
Posted 25 May 2023 by Retvari Zoltan

There are 600 chunks in the "normal" workunits (1-76), the last one has 400 chunks (judging by the ratio of their runtime).
If that's the case, there are 46000 chunks total (76*600=45600, 45600+400=46000).

46000 | 2
23000 | 2
11500 | 2
 5750 | 2
 2875 | 5
  575 | 5
  115 | 5
   23 | 23
    1 |

It's not practical to double the number of chunks, as in this case the last workunit will be the half of the present ones.
I suggest having 1000 chunks in a workunit, as in this case the workunits will be 2/3 longer than the present ones, and the last one will be the same length as the others.

1000*46 = (2*2*2*5*5*5)*(2*23)
 920*50 = (2*2*2*5*23)*(2*5*5)

The other possibility is to have 920 chunks; in this case every gene expansion would be made up of 50 workunits.
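
The candidate sizes above can also be found by brute force. A small sketch, assuming the 46000-chunk total estimated in this post is right; the 600..1200 search window and the function name are arbitrary choices for illustration:

```python
TOTAL_CHUNKS = 76 * 600 + 400   # 46000, as estimated in the post


def even_splits(total, smallest, largest):
    """Return (chunks_per_workunit, workunit_count) pairs that divide total exactly."""
    return [(size, total // size)
            for size in range(smallest, largest + 1)
            if total % size == 0]


for size, count in even_splits(TOTAL_CHUNKS, 600, 1200):
    print(f"{size} chunks x {count} workunits")
```

Any size in this list gives workunits of equal length, with no shorter "last one".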

20) Message boards : Number crunching : OUT of tasks (Message 3207)
Posted 23 May 2023 by Retvari Zoltan

I just modified wus*core from 8 to 6 (don't remember if I also need to restart the server), also reduced the deadline from 5 to 4 days.
Thanks. We'll see if it resolves the situation or not. There are 35 tasks ready to send at the moment.

There are 239 tasks ready to send at the moment.
I think in two days it will steadily decrease again to 0, as the now overfilled hosts will fall below the 6 wus/core threshold, and begin to replenish their queue faster than the generator could fill them (the hosts didn't get slower because their queue got shorter).
So to ultimately resolve this issue you'll have to increase the number of chunks in the workunits to make the hosts run out of tasks slower than the workunit generator could feed them. We'll see.
