OUT of tasks

Message boards : Number crunching : OUT of tasks

Retvari Zoltan
Joined: 31 Mar 20
Posts: 41
Credit: 50,716,120
RAC: 5,694
Hungary
Message 3207 - Posted: 23 May 2023, 15:58:56 UTC - in response to Message 3206.
Last modified: 23 May 2023, 16:04:50 UTC

I just modified wus*core from 8 to 6 (don't remember if I also need to restart the server), also reduced the deadline from 5 to 4 days.
Thanks. We'll see if it resolves the situation or not. There are 35 tasks ready to send at the moment.
There are 239 tasks ready to send at the moment.
I think in two days it will steadily decrease again to 0, as the now overfilled hosts will fall below the 6 wus/core threshold and begin to replenish their queue faster than the generator can fill it (the hosts didn't get slower because their queue got shorter).
So to ultimately resolve this issue you'll have to increase the number of chunks in the workunits to make the hosts run out of tasks slower than the workunit generator could feed them. We'll see.
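
The reasoning above can be sketched as a toy model: the "ready to send" count only stays above zero while the generator produces tasks at least as fast as the hosts fetch them. The two rates below are made-up numbers chosen for illustration, not measurements from the project; only the starting value of 239 comes from the post.

```python
def ready_to_send(initial, generated_per_hour, fetched_per_hour, hours):
    """Return the 'ready to send' count after each hour (never below zero)."""
    counts = []
    ready = initial
    for _ in range(hours):
        ready = max(0, ready + generated_per_hour - fetched_per_hour)
        counts.append(ready)
    return counts

# Hypothetical rates: hosts fetch 20 tasks/hour more than the generator produces.
history = ready_to_send(initial=239, generated_per_hour=100,
                        fetched_per_hour=120, hours=15)
print(history)
```

In this model the queue drains linearly and then stays at zero, however large the initial buffer was. A bigger buffer only delays the moment it empties.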

Technologov
Joined: 27 Jan 22
Posts: 36
Credit: 302,033,756
RAC: 11,828
Ukraine
Message 3208 - Posted: 24 May 2023, 2:28:52 UTC - in response to Message 3207.
Last modified: 24 May 2023, 2:29:38 UTC

We are at 0 (Zero) available tasks again.

I suggest running multiple work generators in parallel.

valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 620
Credit: 34,561,442
RAC: 3,302
Italy
Message 3209 - Posted: 24 May 2023, 8:39:21 UTC - in response to Message 3208.
Last modified: 24 May 2023, 10:22:13 UTC

We are at 0 (Zero) available tasks again.

I suggest running multiple work generators in parallel.

This would be good, but we don't have the resources. Only one instance of the work generator can run at a time (it needs a rewrite), and our current hardware would not be able to support more. We need new hardware and a (slightly) new work generator. We are working on it, but it is not that easy (finding money).

Technologov
Joined: 27 Jan 22
Posts: 36
Credit: 302,033,756
RAC: 11,828
Ukraine
Message 3210 - Posted: 25 May 2023, 0:59:23 UTC - in response to Message 3209.

Maybe try generating 2x bigger work units, that take 2x more CPU time to process ? +deadline must be 2x longer, obviously.

Assuming that "checkpointing" is implemented in this app... is it ? (if a home PC suddenly reboots)


The work generator will magically output 2x more work.

rsNeutrino
Joined: 12 Mar 23
Posts: 7
Credit: 1,128,326
RAC: 1,533
Germany
Message 3211 - Posted: 25 May 2023, 5:17:16 UTC - in response to Message 3210.

That's what I would suggest as well, and that's what Retvari Zoltan wrote:

...to double workunit length and halve the maximum number of cached workunits per core.

Of course, you could then vary the max cache and/or the deadline to hold the time to finish constant.

Retvari Zoltan
Joined: 31 Mar 20
Posts: 41
Credit: 50,716,120
RAC: 5,694
Hungary
Message 3212 - Posted: 25 May 2023, 17:51:37 UTC

There are 600 chunks in the "normal" workunits (1-76), the last one has 400 chunks (judging by the ratio of their runtime).
If that's the case, there are 46000 chunks total (76*600=45600, 45600+400=46000).

46000 = 2*2*2*2*5*5*5*23
It's not practical to double the number of chunks, as in that case the last workunit would be half the length of the present ones.
I suggest having 1000 chunks in a workunit, as in that case the workunits would be 2/3 longer than the present ones, and the last one would be the same length:
1000*46 = (2*2*2*5*5*5)*(2*23)
The other possibility is to have 920 chunks, in which case every gene expansion would be made up of 50 workunits:
920*50 = (2*2*2*5*23)*(2*5*5)
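
The factorisation can be double-checked with a few lines of Python. This is only a sketch: it assumes the 46000-chunk total estimated in the post (which is itself inferred from runtimes, not confirmed), and the 600 to 1200 search range is an arbitrary choice. The function name `even_splits` is made up for the example.

```python
TOTAL_CHUNKS = 76 * 600 + 400  # 46000, the total estimated in the post

def even_splits(total, low, high):
    """Return (chunks_per_wu, wu_count) pairs where chunks_per_wu divides total exactly."""
    return [(size, total // size)
            for size in range(low, high + 1)
            if total % size == 0]

# List every workunit size between 600 and 1200 chunks that leaves no short last workunit.
for size, count in even_splits(TOTAL_CHUNKS, 600, 1200):
    print(f"{size} chunks x {count} workunits")
```

Under that assumption, the only sizes in this range that split the total evenly are 920, 1000 and 1150 chunks (50, 46 and 40 workunits respectively).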

Technologov
Joined: 27 Jan 22
Posts: 36
Credit: 302,033,756
RAC: 11,828
Ukraine
Message 3213 - Posted: 25 May 2023, 19:03:07 UTC - in response to Message 3212.

Since I have some extra Linux server capacity, I can definitely accept more BOINC work. So whatever is easier for work-generator to generate ...

entity
Joined: 20 Jul 20
Posts: 20
Credit: 31,439,744
RAC: 53
United States
Message 3214 - Posted: 25 May 2023, 20:03:25 UTC - in response to Message 3212.

There are 600 chunks in the "normal" workunits (1-76), the last one has 400 chunks (judging by the ratio of their runtime).
If that's the case, there are 46000 chunks total (76*600=45600, 45600+400=46000).
46000 = 2*2*2*2*5*5*5*23
It's not practical to double the number of chunks, as in that case the last workunit would be half the length of the present ones.
I suggest having 1000 chunks in a workunit, as in that case the workunits would be 2/3 longer than the present ones, and the last one would be the same length:
1000*46 = (2*2*2*5*5*5)*(2*23)
The other possibility is to have 920 chunks, in which case every gene expansion would be made up of 50 workunits:
920*50 = (2*2*2*5*23)*(2*5*5)

I could support this option as it would also reduce the download/upload activity on the server as well (fewer WUs to manage).

Technologov
Joined: 27 Jan 22
Posts: 36
Credit: 302,033,756
RAC: 11,828
Ukraine
Message 3215 - Posted: 25 May 2023, 21:19:01 UTC - in response to Message 3214.
Last modified: 25 May 2023, 21:19:21 UTC

But if we do make bigger Work-units, please make sure to adjust "deadline" too.

Technologov
Joined: 27 Jan 22
Posts: 36
Credit: 302,033,756
RAC: 11,828
Ukraine
Message 3216 - Posted: 25 May 2023, 23:49:56 UTC - in response to Message 3215.
Last modified: 26 May 2023, 0:29:44 UTC

I propose to make work units made of 1150 chunks x 40 WUs per gene.

Those will take 2x longer than now. Currently I complete a work unit in 2 hours on my Core i9 9900K desktop. So 2x more = 4 hours per work unit, which is not much, assuming that checkpoints work correctly, between PC reboots. Double the deadline too.

This should double Work generation effectiveness, and load my servers with some good work ;)

This should allow us to complete V. Vinifera project in about 1.5 months !

valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 620
Credit: 34,561,442
RAC: 3,302
Italy
Message 3217 - Posted: 26 May 2023, 9:43:40 UTC - in response to Message 3216.
Last modified: 26 May 2023, 9:46:43 UTC

I might try this solution but I don't think it will work, I will try to explain why.
The speed of the work generator depends only on the number of genes in our dataset. The program itself, from the computational point of view, is rather simple: it builds 2000 random permutations of (in the case of Vitis) around 22000 numbers, then it makes tiles (slices) of them and (now) packs them 600 per workunit. It's a little more complex than this, but this gives you the idea.
The speed of the PC algorithm on a "tile" (the application that you run) depends on the "structure" of the dataset, and it is faster on Vitis-Vespucci than on the previous one (Human-FANTOM).
Right now a single run of the work generator builds 77 workunits (154 tasks because of the validation requirements). Say that, for example, each result is computed, on an ideal computer, in one hour. So this will keep that ideal computer busy for 77 hours. If I, theoretically, packed all the tiles into a single workunit, this would keep that computer busy for 77 hours: exactly the same time, but with an increased risk of computational errors. The running time of the work generator would be the same, say a few seconds faster because of creating just one file instead of 77.
Anyway, I may be wrong... I will slightly increase the tiles per workunit starting from the next batch (it will take a couple of days).
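
The point about packing can be illustrated with a small sketch. It assumes a total of 46000 tiles per generator run, which is the estimate made earlier in this thread rather than a figure confirmed by the project, and the helper name `pack` is made up for the example. However the tiles are grouped, the amount of work handed to the hosts is the same.

```python
def pack(total_tiles, tiles_per_wu):
    """Split total_tiles into workunits holding at most tiles_per_wu tiles each."""
    full, rest = divmod(total_tiles, tiles_per_wu)
    return [tiles_per_wu] * full + ([rest] if rest else [])

TOTAL_TILES = 46000  # assumed total per run (estimate from this thread)

current = pack(TOTAL_TILES, 600)          # current setting: 600 tiles per workunit
single = pack(TOTAL_TILES, TOTAL_TILES)   # theoretical: everything in one workunit

print(len(current), "workunits, last one holds", current[-1], "tiles")
print(len(single), "workunit holding", single[0], "tiles")
print("same total work:", sum(current) == sum(single))
```

Only the number of files and tasks changes between the two packings, not the number of tiles that have to be generated or crunched.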

Retvari Zoltan
Joined: 31 Mar 20
Posts: 41
Credit: 50,716,120
RAC: 5,694
Hungary
Message 3218 - Posted: 26 May 2023, 13:49:07 UTC - in response to Message 3217.

Now I understand the work generator process.
Packing more chunks (slices, tiles, whatever) into a single workunit won't shorten the time needed to generate them, but the overhead of processing the workunits will be lower if each workunit contains more chunks. Reducing this overhead could be enough to feed every host, depending on the ratio of this overhead to the generation process, so it's definitely worth a try to make the workunits larger. I'm not sure a slight increase will be beneficial enough, so I suggest at least 920 chunks per wu, as the number of workunits has to go down significantly to achieve a significant drop in the total overhead.
Btw my hosts can get enough work with the present settings.
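
A rough way to picture the overhead argument, assuming a fixed server-side cost per task. The 3-second figure is purely hypothetical, the 46000-chunk total is the estimate from earlier in the thread, and the function names are made up for the example. The factor of 2 reflects the 154 tasks per 77 workunits mentioned by the administrator.

```python
REPLICAS = 2  # each workunit is sent out twice for validation (77 workunits -> 154 tasks)

def tasks_per_run(total_chunks, chunks_per_wu, replicas=REPLICAS):
    """Number of tasks the server must handle for one generator run."""
    workunits = (total_chunks + chunks_per_wu - 1) // chunks_per_wu  # ceiling division
    return workunits * replicas

def overhead_seconds(total_chunks, chunks_per_wu, per_task_overhead_s):
    """Total per-task overhead under a fixed cost per task (hypothetical model)."""
    return tasks_per_run(total_chunks, chunks_per_wu) * per_task_overhead_s

for size in (600, 920):
    tasks = tasks_per_run(46000, size)
    cost = overhead_seconds(46000, size, per_task_overhead_s=3)
    print(f"{size} chunks/wu: {tasks} tasks, {cost} s of overhead")
```

In this model, going from 600 to 920 chunks per workunit cuts the task count from 154 to 100, so the per-task overhead drops by roughly a third. Whether that is enough depends on how large the real overhead is compared with the generation time.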

Retvari Zoltan
Joined: 31 Mar 20
Posts: 41
Credit: 50,716,120
RAC: 5,694
Hungary
Message 3219 - Posted: 26 May 2023, 15:03:55 UTC - in response to Message 3217.

Right now a single run of the work generator builds 77 workunits (154 tasks because of the validation requirements). Say that, for example, each result is computed, on an ideal computer, in one hour.
I'm glad that I have a better than ideal computer, as my i9-12900F can do one workunit in 35 minutes :)

So this will keep that ideal computer busy for 77 hours. If I, theoretically, packed all the tiles into a single workunit, this would keep that computer busy for 77 hours
I would crunch such workunits. It would take 45 hours for my better than ideal computer.

...exactly the same time but increasing the risks of computational errors.
That's probably true for "average" computers, but there are quite a few dedicated crunching boxes, so there could be a queue for those.
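
The 45-hour figure follows directly from the numbers quoted in this post. A minimal check, assuming all 77 workunits take the same 35 minutes and are run one after another on a single core:

```python
def hours_for_batch(workunits, minutes_per_workunit):
    """Hours needed to run the given number of workunits back to back."""
    return workunits * minutes_per_workunit / 60

print(hours_for_batch(77, 60))            # the "ideal computer": 77.0 hours
print(round(hours_for_batch(77, 35), 1))  # at 35 minutes per workunit
```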

Aurum
Joined: 18 Jul 18
Posts: 97
Credit: 291,316,944
RAC: 5,718
United States
Message 3220 - Posted: 27 May 2023, 13:15:36 UTC

As we enter summer in the northern hemisphere and TOU peak rates it does not seem like a donor-friendly thing to make the WUs run longer. Once my queue got filled it remained loaded so I don't see a problem.
My electric utility just blind-sided us by adding June to July-September and more hours. TOU season requires more babysitting. I'm contemplating shutting down until October.

Aurum
Joined: 18 Jul 18
Posts: 97
Credit: 291,316,944
RAC: 5,718
United States
Message 3221 - Posted: 27 May 2023, 15:59:16 UTC - in response to Message 3213.

Since I have some extra Linux server capacity, I can definitely accept more BOINC work. So whatever is easier for work-generator to generate ...


It appears that you're running on dual-CPU server motherboards. Many of your CPUs are Intel Xeon E5-2680 v4 @ 2.40GHz (56 processors). An E5-2680 v4 is a 14c/28t CPU. Do you actually run 56 WUs on a single computer?
My CPUs are 18c/36t and they now have from 71 to 176 WUs Ready to Start in addition to those running.
I suspect a shortage of WUs is uniquely your issue. Do you never get over 1,032 WUs per computer? Do you ever get over 256 WUs per server?
If you have idle threads you could help with cancer research by running MCM and SCC at WCG. Just a thought.

rsNeutrino
Joined: 12 Mar 23
Posts: 7
Credit: 1,128,326
RAC: 1,533
Germany
Message 3222 - Posted: 27 May 2023, 18:02:11 UTC - in response to Message 3221.

Maybe I can answer some of that:

I suspect a shortage of WUs is uniquely your issue.

As long as the number of tasks ready to send is not rising, someone's cache is not satisfied: their client is crunching faster than the cache can fill, so they will run out of TN tasks sooner or later. At the moment the bottleneck is therefore the work generator, not the combined power of our computers. Of course, users who want to avoid idle cores should let their machines work on multiple projects.


Source

Do you never get over 1,032 WUs per computer?

In general, the BOINC client has a hard limit of 1000 tasks; above that it refuses to request more.

Speedy
Joined: 13 Nov 21
Posts: 33
Credit: 1,008,345
RAC: 12
New Zealand
Message 3223 - Posted: 27 May 2023, 22:07:32 UTC - in response to Message 3219.
Last modified: 27 May 2023, 22:19:25 UTC

I'm glad that I have a better than ideal computer, as my i9-12900F can do one workunit in 35 minutes :)

Nice time for a work unit. I have a Ryzen 3900X and it takes over an hour to do a work unit. Perhaps this project prefers Intel. Using only your performance cores, assuming you can keep them all busy, you can return 13.76 results an hour. The maths to get that is: 8×1.72
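
The estimate can be reproduced as follows. This sketch assumes 8 performance cores each running one task at a time at 35 minutes per workunit, and ignores hyper-threading and the efficiency cores. Computed without intermediate rounding, the result comes out at about 13.7 per hour; the 13.76 above comes from rounding 60/35 up to 1.72 first.

```python
def results_per_hour(cores, minutes_per_workunit):
    """Workunits finished per hour when each core runs one task at a time."""
    return cores * 60 / minutes_per_workunit

print(round(results_per_hour(8, 35), 2))
```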

Technologov
Joined: 27 Jan 22
Posts: 36
Credit: 302,033,756
RAC: 11,828
Ukraine
Message 3224 - Posted: 28 May 2023, 0:38:05 UTC - in response to Message 3221.

I dedicate 50% threads to BOINC (WCG/SiDock/Rosetta/TN-Grid) and 50% to Stockfish Fishtest chess.

Technologov
Joined: 27 Jan 22
Posts: 36
Credit: 302,033,756
RAC: 11,828
Ukraine
Message 3225 - Posted: 28 May 2023, 0:40:40 UTC - in response to Message 3223.

AMD Zen 2 cores are an equivalent of older Intel Skylake cores. (6th Gen up to 10th Gen Core i7 chips). They are significantly slower than newer Intel 12th and 13th gen Core i7 chips.

Falconet
Joined: 21 Dec 16
Posts: 105
Credit: 3,082,818
RAC: 297
Portugal
Message 3226 - Posted: 28 May 2023, 10:19:14 UTC - in response to Message 3223.
Last modified: 28 May 2023, 10:19:29 UTC

I'm glad that I have a better than ideal computer, as my i9-12900F can do one workunit in 35 minutes :)

Nice time for a work unit. I have a Ryzen 3900X and it takes over an hour to do a work unit. Perhaps this project prefers Intel. Using only your performance cores, assuming you can keep them all busy, you can return 13.76 results an hour. The maths to get that is: 8×1.72


I don't know about Intel vs AMD but it does prefer Linux. 10-15% faster IIRC.

Copyright © 2024 CNR-TN & UniTN