OUT of tasks

Message boards : Number crunching : OUT of tasks

valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 623
Credit: 34,677,535
RAC: 2
Italy
Message 3176 - Posted: 16 May 2023, 16:45:43 UTC - in response to Message 3173.

> After moving to the new storage system, is the work generator still limited to 600 WUs per 14 minutes?

Maybe a little bit faster (better I/O), but the performance is similar.
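For a sense of scale, the quoted cap of 600 WUs per 14 minutes converts to hourly and daily rates as below. This is back-of-envelope arithmetic only; the `instances` parameter is a hypothetical that assumes several generator copies would each sustain the same cycle, which the thread does not establish.

```python
# Back-of-envelope conversion of the work generator's quoted cap.
WUS_PER_CYCLE = 600   # workunits per generation cycle (figure from the thread)
CYCLE_MINUTES = 14    # length of one cycle in minutes (figure from the thread)


def generation_rate(instances: int = 1) -> dict[str, float]:
    """Workunits per hour and per day.

    `instances` is hypothetical: it assumes every generator copy keeps the
    same 600-per-14-minutes pace, i.e. perfectly linear scaling.
    """
    per_minute = instances * WUS_PER_CYCLE / CYCLE_MINUTES
    return {"per_hour": per_minute * 60, "per_day": per_minute * 60 * 24}


if __name__ == "__main__":
    rate = generation_rate()
    print(f"{rate['per_hour']:,.0f} WUs/hour, {rate['per_day']:,.0f} WUs/day")
```

With a single generator this prints 2,571 WUs/hour and 61,714 WUs/day.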

Technologov
Joined: 27 Jan 22
Posts: 36
Credit: 302,393,914
RAC: 2
Ukraine
Message 3178 - Posted: 20 May 2023, 5:02:38 UTC - in response to Message 3176.

What would it take to generate work units faster? New software? More CPU cores? Faster cores?
The number of available work units is pretty low for both the TN-Grid and Rosetta projects.

valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 623
Credit: 34,677,535
RAC: 2
Italy
Message 3179 - Posted: 20 May 2023, 11:25:45 UTC - in response to Message 3178.

> What would it take to generate work units faster? New software? More CPU cores? Faster cores?
> The number of available work units is pretty low for both the TN-Grid and Rosetta projects.

A brand-new server :)
And a new, parallel version of the work generator.

Technologov
Joined: 27 Jan 22
Posts: 36
Credit: 302,393,914
RAC: 2
Ukraine
Message 3180 - Posted: 20 May 2023, 23:28:35 UTC - in response to Message 3179.

How difficult would it be to build a parallel version of the work generator?

valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 623
Credit: 34,677,535
RAC: 2
Italy
Message 3181 - Posted: 21 May 2023, 12:03:04 UTC - in response to Message 3180.

> How difficult would it be to build a parallel version of the work generator?

It shouldn't be too difficult. It wouldn't need real parallelism: a version that supports multiple instances running at the same time would be more than enough.
The problems are that it's rather complicated code that has to interact carefully with the local gene db, and it's written in Python (a language that I refused to learn ;)
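One common way to let several copies of a generator run at the same time without two of them working on the same gene is to give each copy a fixed, disjoint share of the genes. The sketch below assumes the full gene list is known up front; `owner_instance` and `genes_for_instance` are hypothetical names, this is not how the TN-Grid generator is written, and it does not address the careful interaction with the local gene db that the reply identifies as the hard part.

```python
import hashlib


def owner_instance(gene_name: str, n_instances: int) -> int:
    """Deterministically map a gene to one of n_instances generator copies.

    A cryptographic digest is used instead of Python's built-in hash(),
    which is randomized per process by default and would give each copy
    a different answer.
    """
    digest = hashlib.sha1(gene_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_instances


def genes_for_instance(genes: list[str], instance_id: int, n_instances: int) -> list[str]:
    """The genes that generator copy `instance_id` (0-based) is responsible for."""
    return [g for g in genes if owner_instance(g, n_instances) == instance_id]


if __name__ == "__main__":
    # Hypothetical gene names, apart from the one quoted in the thread.
    genes = ["vv-VIT-01s0150g00140"] + [f"gene-{i}" for i in range(20)]
    for i in range(3):
        print(i, genes_for_instance(genes, i, 3))
```

Each copy would be started with its own `instance_id`; because the assignment is deterministic, a restarted copy picks up the same share of genes.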

Technologov
Joined: 27 Jan 22
Posts: 36
Credit: 302,393,914
RAC: 2
Ukraine
Message 3182 - Posted: 21 May 2023, 15:24:49 UTC - in response to Message 3181.

How about gene-level parallelism? Basically one work-unit generator per gene? Would that work?

Do you have spare server capacity to generate in parallel like that?

Technologov
Joined: 27 Jan 22
Posts: 36
Credit: 302,393,914
RAC: 2
Ukraine
Message 3184 - Posted: 21 May 2023, 23:12:37 UTC

Since there are 22393 genes in the V. vinifera project, could we run 22393 work-unit generators in parallel on different servers?

valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 623
Credit: 34,677,535
RAC: 2
Italy
Message 3185 - Posted: 22 May 2023, 11:01:50 UTC - in response to Message 3184.

> Since there are 22393 genes in the V. vinifera project, could we run 22393 work-unit generators in parallel on different servers?

Just a dozen in parallel would be more than enough, without stressing the system (db and storage).

Technologov
Joined: 27 Jan 22
Posts: 36
Credit: 302,393,914
RAC: 2
Ukraine
Message 3187 - Posted: 22 May 2023, 15:38:58 UTC - in response to Message 3185.

I think it's a good time to start now, and to keep going until we have at least 10k work units available for download. This would allow us to finish the V. vinifera project in only a few months.

Speedy
Joined: 13 Nov 21
Posts: 33
Credit: 1,020,742
RAC: 0
New Zealand
Message 3192 - Posted: 22 May 2023, 21:40:23 UTC - in response to Message 3187.

> I think it's a good time to start now, and to keep going until we have at least 10k work units available for download. This would allow us to finish the V. vinifera project in only a few months.

It may allow us to finish the project in a few months, but I believe there isn't enough storage to create and store that many work units and results without putting too much pressure on the system TN-Grid is currently using for the project.

Technologov
Joined: 27 Jan 22
Posts: 36
Credit: 302,393,914
RAC: 2
Ukraine
Message 3193 - Posted: 22 May 2023, 21:40:51 UTC

We have completed 5% of the project in only 8 days, that's great!

That suggests 100/5 = 20, i.e. 20 blocks of 8 days each = 160 days, or about 5.5 months! That means we should complete this project *this year*, around November or December. But we can do even better if we don't starve our work servers (nodes) and generate data in parallel. Currently there are ZERO work units available.

This would allow us to double our efforts and finish this project, V. vinifera, in about 3 months.

Technologov
Joined: 27 Jan 22
Posts: 36
Credit: 302,393,914
RAC: 2
Ukraine
Message 3194 - Posted: 22 May 2023, 22:09:43 UTC - in response to Message 3192.

Nah, they upgraded the storage system recently with SSDs and a new OS/new software, so I think it would handle the load.

Retvari Zoltan
Joined: 31 Mar 20
Posts: 43
Credit: 51,206,467
RAC: 0
Hungary
Message 3195 - Posted: 23 May 2023, 0:19:40 UTC - in response to Message 3193.

> We have completed 5% of the project in only 8 days, that's great!

> That suggests 100/5 = 20, i.e. 20 blocks of 8 days each = 160 days, or about 5.5 months! That means we should complete this project *this year*, around November or December.
One 5% block is already done, so there are 19 blocks left, which would take 19 × 8 = 152 days.

> But we can do even better if we don't starve our work servers (nodes) and generate data in parallel. Currently there are ZERO work units available.
> This would allow us to double our efforts and finish this project, V. vinifera, in about 3 months.
Another solution to this problem is to double the workunit length and halve the maximum number of cached workunits per core. This solution does not involve rewriting the work generator.
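Both figures come from the same linear projection (a constant rate of 5% per 8 days); they differ only in whether the 8 days already spent are counted. A minimal sketch of that arithmetic follows, with a hypothetical `speedup` factor for the "double our efforts" scenario; it assumes the completion rate stays constant, which real projects rarely guarantee.

```python
def total_days(percent_done: float, days_elapsed: float) -> float:
    """Projected length of the whole project at the observed rate."""
    rate = percent_done / days_elapsed  # percent completed per day
    return 100.0 / rate


def days_remaining(percent_done: float, days_elapsed: float, speedup: float = 1.0) -> float:
    """Projected days still needed; speedup > 1 models faster future progress."""
    rate = percent_done / days_elapsed
    return (100.0 - percent_done) / (rate * speedup)


if __name__ == "__main__":
    print(total_days(5, 8))                 # 160.0: whole project
    print(days_remaining(5, 8))             # 152.0: what is left after day 8
    print(days_remaining(5, 8, speedup=2))  # 76.0: if throughput doubled
```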

Technologov
Joined: 27 Jan 22
Posts: 36
Credit: 302,393,914
RAC: 2
Ukraine
Message 3197 - Posted: 23 May 2023, 2:22:04 UTC - in response to Message 3195.

> Another solution to this problem is to double the workunit length

But is generating longer (or larger?) work units harder than generating small ones, or the same?
So they would take 6 hours of CPU time rather than 3 hours on average?

valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 623
Credit: 34,677,535
RAC: 2
Italy
Message 3198 - Posted: 23 May 2023, 9:25:31 UTC - in response to Message 3197.

> > Another solution to this problem is to double the workunit length

> But is generating longer (or larger?) work units harder than generating small ones, or the same?
> So they would take 6 hours of CPU time rather than 3 hours on average?

Right now there are 600 small computational chunks inside a workunit (the last one for each gene is smaller). I could easily increase or decrease that number; the computation time is proportional to it. The work generation time is almost independent of it, so it would take about the same time to split an "expansion" into, say, 77 (as now), 144 or 35 workunits.
The choice of the right number is a compromise between many factors. A very short workunit will overuse the network connection; a very long one is not good for people with slow computers or who can only crunch intermittently, and it increases the chance of wasting resources due to computational errors. The deadline should also be adjusted accordingly.
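The trade-off can be made concrete with a small sketch of how one expansion is cut into workunits of a fixed number of chunks, the last one taking the remainder. The per-expansion total of 46,000 chunks is a hypothetical value chosen only because it reproduces 77 workunits at 600 chunks each; the real figure isn't given in the thread. Since computation time is proportional to the chunk count, the chunk counts double as relative CPU-time estimates.

```python
def split_expansion(total_chunks: int, chunks_per_wu: int) -> list[int]:
    """Chunk counts of the workunits for one gene expansion.

    Every workunit gets chunks_per_wu chunks except the last, which takes
    whatever remains and may therefore be smaller.
    """
    if chunks_per_wu <= 0:
        raise ValueError("chunks_per_wu must be positive")
    n_full, remainder = divmod(total_chunks, chunks_per_wu)
    sizes = [chunks_per_wu] * n_full
    if remainder:
        sizes.append(remainder)
    return sizes


if __name__ == "__main__":
    TOTAL = 46_000  # hypothetical chunks per expansion
    for per_wu in (300, 600, 1200):
        sizes = split_expansion(TOTAL, per_wu)
        print(f"{per_wu:>5} chunks/WU -> {len(sizes):>3} workunits, last one {sizes[-1]} chunks")
```

Doubling the chunks per workunit roughly halves the number of workunits per expansion while doubling each one's CPU time, which is the lever discussed in the next few replies.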

Technologov
Joined: 27 Jan 22
Posts: 36
Credit: 302,393,914
RAC: 2
Ukraine
Message 3201 - Posted: 23 May 2023, 12:29:37 UTC - in response to Message 3198.
Last modified: 23 May 2023, 12:31:09 UTC

I don't like your definition of a work unit. A work unit is the unit that a BOINC worker downloads; what you call an "expansion" is what I'd call a "work unit".

"Work unit" = 1 unit of work downloaded by BOINC = 1 task

What you have there is not a work unit but more like a gene slice.

Anyway, we need more work units, and a bigger queue of available work units on the BOINC server side.

valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 623
Credit: 34,677,535
RAC: 2
Italy
Message 3202 - Posted: 23 May 2023, 13:10:15 UTC - in response to Message 3201.
Last modified: 23 May 2023, 13:18:27 UTC

> I don't like your definition of a work unit. A work unit is the unit that a BOINC worker downloads; what you call an "expansion" is what I'd call a "work unit".

> "Work unit" = 1 unit of work downloaded by BOINC = 1 task

> What you have there is not a work unit but more like a gene slice.

> Anyway, we need more work units, and a bigger queue of available work units on the BOINC server side.

That's exactly the definition of a workunit. What you actually download is a collection of 600 computational chunks (you also download the expression dataset, shared among the genes). Each chunk is a run of the PC algorithm on a tile of size 1000, made up of the "seed" gene and a random subset of the other genes.
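A tile as described here can be sketched as the seed gene plus 999 other genes sampled at random. This is only an illustration under assumptions: uniform sampling without replacement and the helper names are hypothetical, the placeholder gene list merely matches the 22393-gene count mentioned earlier in the thread, and the PC algorithm that is then run on each tile is not shown.

```python
import random

TILE_SIZE = 1000  # genes per tile, as stated in the reply


def make_tile(seed_gene: str, others: list[str], rng: random.Random) -> list[str]:
    """The seed gene followed by TILE_SIZE - 1 distinct genes drawn from `others`."""
    return [seed_gene] + rng.sample(others, TILE_SIZE - 1)


def make_workunit(seed_gene: str, all_genes: list[str], n_chunks: int = 600,
                  rng_seed: int = 0) -> list[list[str]]:
    """n_chunks independent tiles for one seed gene (one tile per chunk)."""
    others = [g for g in all_genes if g != seed_gene]  # never resample the seed
    rng = random.Random(rng_seed)
    return [make_tile(seed_gene, others, rng) for _ in range(n_chunks)]


if __name__ == "__main__":
    genes = [f"gene-{i}" for i in range(22393)]  # placeholder names
    wu = make_workunit(genes[0], genes, n_chunks=5)
    print(len(wu), "tiles of", len(wu[0]), "genes each")
```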
A single gene expansion is made up of 77 workunits. We wait until all of them come back and then build the gene expansion list.
You can also infer this by looking at the name of a workunit, e.g.:
236784_Vv_vv-VIT-01s0150g00140_wu-57_1684831345366
236784: an internal ID
Vv: the organism
vv-VIT-01s0150g00140: the gene name
wu-57: workunit 57 (out of 77)
1684831345366: a timestamp
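Splitting such a name into its fields takes one regular expression. The sketch below assumes every name follows the pattern of the example; the field names are hypothetical, and reading the last field as milliseconds since the Unix epoch is an inference from its 13 digits, not something stated in the thread.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import NamedTuple

# id _ organism _ gene _ wu-N _ timestamp (pattern inferred from one example)
WU_NAME = re.compile(
    r"^(?P<internal_id>\d+)_(?P<organism>[^_]+)_(?P<gene>.+)"
    r"_wu-(?P<wu_index>\d+)_(?P<timestamp>\d+)$"
)
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class WorkunitName(NamedTuple):
    internal_id: int
    organism: str
    gene: str
    wu_index: int
    timestamp: int

    def timestamp_utc(self) -> datetime:
        """Assumes the timestamp is in milliseconds since the Unix epoch."""
        return _EPOCH + timedelta(milliseconds=self.timestamp)


def parse_wu_name(name: str) -> WorkunitName:
    m = WU_NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognised workunit name: {name!r}")
    return WorkunitName(
        internal_id=int(m["internal_id"]),
        organism=m["organism"],
        gene=m["gene"],
        wu_index=int(m["wu_index"]),
        timestamp=int(m["timestamp"]),
    )


if __name__ == "__main__":
    wu = parse_wu_name("236784_Vv_vv-VIT-01s0150g00140_wu-57_1684831345366")
    print(wu.gene, wu.wu_index, wu.timestamp_utc().isoformat())
```

Under the millisecond assumption, the example name decodes to 23 May 2023, 08:42 UTC, a few hours before this message was posted.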

Retvari Zoltan
Joined: 31 Mar 20
Posts: 43
Credit: 51,206,467
RAC: 0
Hungary
Message 3204 - Posted: 23 May 2023, 13:47:16 UTC - in response to Message 3198.
Last modified: 23 May 2023, 13:51:35 UTC

> > > Another solution to this problem is to double the workunit length

> > But is generating longer (or larger?) work units harder than generating small ones, or the same?
> > So they would take 6 hours of CPU time rather than 3 hours on average?

> Right now there are 600 small computational chunks inside a workunit (the last one for each gene is smaller). I could easily increase or decrease that number; the computation time is proportional to it. The work generation time is almost independent of it, so it would take about the same time to split an "expansion" into, say, 77 (as now), 144 or 35 workunits.
In that case, please double the number of chunks and halve the maximum number of tasks allowed per core.
The latter is the key to resolving the "out of tasks" problem; the former is needed to keep the hosts busy for the same amount of time.
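The reasoning can be checked with a little arithmetic. The figures below are assumptions for illustration: a hypothetical 16-core host, a per-core cap of 8 tasks in progress (the value the project was using at the time), and roughly 3 CPU hours per task (the figure guessed earlier in the thread). Halving the cap while doubling the task length keeps the same amount of buffered CPU time on each host but halves the number of tasks it pulls out of the ready-to-send queue; and because generation time barely depends on workunit size, each generated workunit then carries twice the CPU time.

```python
def tasks_held(cores: int, wus_per_core: int) -> int:
    """Maximum tasks a host may keep in progress under a per-core cap."""
    return cores * wus_per_core


def buffered_cpu_hours(cores: int, wus_per_core: int, hours_per_wu: float) -> float:
    """CPU time those tasks represent."""
    return tasks_held(cores, wus_per_core) * hours_per_wu


def generated_cpu_hours_per_day(wus_per_day: float, hours_per_wu: float) -> float:
    """CPU time the generator supplies per day at a fixed workunit rate."""
    return wus_per_day * hours_per_wu


if __name__ == "__main__":
    CORES = 16                        # hypothetical host
    WUS_PER_DAY = 600 / 14 * 60 * 24  # quoted generator cap
    for label, cap, hours in (("current", 8, 3.0), ("proposed", 4, 6.0)):
        print(f"{label:>8}: {tasks_held(CORES, cap)} tasks held, "
              f"{buffered_cpu_hours(CORES, cap, hours):.0f} CPU h buffered, "
              f"{generated_cpu_hours_per_day(WUS_PER_DAY, hours):,.0f} CPU h generated/day")
```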

> The choice of the right number is a compromise between many factors. A very short workunit will overuse the network connection; a very long one is not good for people with slow computers or who can only crunch intermittently, and it increases the chance of wasting resources due to computational errors. The deadline should also be adjusted accordingly.
Doubling the workunit processing time is a safe bet, as it is currently half that of the previous project's workunits, which were just fine.

A side note:
On hyperthreaded hosts, everyone should set BOINC preferences -> Computing preferences -> "Use at most 50% of the processors". The (TN-Grid) performance of the host wouldn't decrease, as the CPU would "do the math" twice as fast.

valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 623
Credit: 34,677,535
RAC: 2
Italy
Message 3205 - Posted: 23 May 2023, 14:10:50 UTC - in response to Message 3204.
Last modified: 23 May 2023, 14:14:18 UTC

I just changed wus*core (workunits per core) from 8 to 6 (I don't remember whether I also need to restart the server), and also reduced the deadline from 5 to 4 days.

Retvari Zoltan
Joined: 31 Mar 20
Posts: 43
Credit: 51,206,467
RAC: 0
Hungary
Message 3206 - Posted: 23 May 2023, 14:33:40 UTC - in response to Message 3205.
Last modified: 23 May 2023, 14:34:38 UTC

> I just changed wus*core (workunits per core) from 8 to 6 (I don't remember whether I also need to restart the server), and also reduced the deadline from 5 to 4 days.
Thanks. We'll see whether it resolves the situation. There are 35 tasks ready to send at the moment.

Copyright © 2024 CNR-TN & UniTN