OUT of tasks

Profile valterc
Project administrator
Project tester
Send message
Joined: 30 Oct 13
Posts: 620
Credit: 34,552,153
RAC: 2,620
Italy
Message 3176 - Posted: 16 May 2023, 16:45:43 UTC - in response to Message 3173.

After moving to the new storage system, is the work generator still limited by the 600 WUs per 14 minutes?

Maybe a little bit faster (better I/O), but the performance is similar.

Technologov
Send message
Joined: 27 Jan 22
Posts: 36
Credit: 302,008,230
RAC: 10,129
Ukraine
Message 3178 - Posted: 20 May 2023, 5:02:38 UTC - in response to Message 3176.

What would it take to generate work units faster? New software? More CPU cores? Faster cores?
The amount of available work units is pretty low for both the TN-GRID and Rosetta projects.

Profile valterc
Project administrator
Project tester
Send message
Joined: 30 Oct 13
Posts: 620
Credit: 34,552,153
RAC: 2,620
Italy
Message 3179 - Posted: 20 May 2023, 11:25:45 UTC - in response to Message 3178.

What would it take to generate work units faster? New software? More CPU cores? Faster cores?
The amount of available work units is pretty low for both the TN-GRID and Rosetta projects.

A brand new server :)
And a new parallel version of the work generator

Technologov
Send message
Joined: 27 Jan 22
Posts: 36
Credit: 302,008,230
RAC: 10,129
Ukraine
Message 3180 - Posted: 20 May 2023, 23:28:35 UTC - in response to Message 3179.

How difficult is it to build a parallel version of the work generator?

Profile valterc
Project administrator
Project tester
Send message
Joined: 30 Oct 13
Posts: 620
Credit: 34,552,153
RAC: 2,620
Italy
Message 3181 - Posted: 21 May 2023, 12:03:04 UTC - in response to Message 3180.

How difficult is it to build a parallel version of the work generator?

It shouldn't be too difficult; we don't need real parallelism, it would be more than enough to have a version that supports multiple instances running at the same time.
The problem is that it's rather complicated code that has to interact carefully with the local gene db, and it's written in Python (a language that I have refused to learn ;)
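A minimal sketch of that multi-instance idea, not the project's actual generator code: each instance is started with its own index and the total instance count, and it only processes the genes that hash to its index, so several copies can run side by side without ever touching the same gene. The gene_list.txt file and the print statement below are hypothetical placeholders standing in for the real gene db access and workunit creation.

import hashlib
import sys

def assigned_to_me(gene_name: str, instance_id: int, num_instances: int) -> bool:
    """Stable hash-based partition: every gene belongs to exactly one instance."""
    h = int(hashlib.md5(gene_name.encode()).hexdigest(), 16)
    return h % num_instances == instance_id

def main(instance_id: int, num_instances: int) -> None:
    # Hypothetical flat file standing in for the local gene db.
    with open("gene_list.txt") as f:
        genes = [line.strip() for line in f if line.strip()]
    for gene in genes:
        if assigned_to_me(gene, instance_id, num_instances):
            # The existing single-gene splitting logic would run here;
            # printing stands in for the real workunit creation.
            print(f"instance {instance_id}: generating workunits for {gene}")

if __name__ == "__main__":
    # e.g. `python3 gen_instance.py 3 12` runs instance 3 of 12
    main(int(sys.argv[1]), int(sys.argv[2]))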

Technologov
Send message
Joined: 27 Jan 22
Posts: 36
Credit: 302,008,230
RAC: 10,129
Ukraine
Message 3182 - Posted: 21 May 2023, 15:24:49 UTC - in response to Message 3181.

How about gene-level parallelism? Basically one work-unit generator per gene? Would it work?

Do you have spare server capacity to generate in parallel like that?

Technologov
Send message
Joined: 27 Jan 22
Posts: 36
Credit: 302,008,230
RAC: 10,129
Ukraine
Message 3184 - Posted: 21 May 2023, 23:12:37 UTC

Since there are 22393 genes in the V. vinifera project, could we run 22393 work-unit generators in parallel, on different servers?

Profile valterc
Project administrator
Project tester
Send message
Joined: 30 Oct 13
Posts: 620
Credit: 34,552,153
RAC: 2,620
Italy
Message 3185 - Posted: 22 May 2023, 11:01:50 UTC - in response to Message 3184.

Since there are 22393 genes in the V. vinifera project, could we run 22393 work-unit generators in parallel, on different servers?

Just a dozen in parallel would be more than enough (without stressing the system: db and storage)
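For illustration, a dozen such instances could be launched on one machine roughly like this, assuming the per-instance script from the earlier sketch is saved as gen_instance.py (a hypothetical name); each instance works on a disjoint slice of the genes.

import subprocess

NUM_INSTANCES = 12  # "just a dozen in parallel"

# Start every instance with its own index and the total count, then wait for all of them.
procs = [
    subprocess.Popen(["python3", "gen_instance.py", str(i), str(NUM_INSTANCES)])
    for i in range(NUM_INSTANCES)
]
for p in procs:
    p.wait()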

Technologov
Send message
Joined: 27 Jan 22
Posts: 36
Credit: 302,008,230
RAC: 10,129
Ukraine
Message 3187 - Posted: 22 May 2023, 15:38:58 UTC - in response to Message 3185.

I think it's a good time to start now, and keep doing so until we have at least 10k work units available for download. This will allow us to finish the V. vinifera project in only a few months.

Speedy
Send message
Joined: 13 Nov 21
Posts: 33
Credit: 1,008,214
RAC: 0
New Zealand
Message 3192 - Posted: 22 May 2023, 21:40:23 UTC - in response to Message 3187.

I think it's a good time to start now, and keep doing so until we have at least 10k work units available for download. This will allow us to finish the V. vinifera project in only a few months.

It may allow us to finish the project in a few months, but I believe there is not enough storage to create and store that many work units and results without putting too much pressure on the system TN-Grid is currently using for the project.

Technologov
Send message
Joined: 27 Jan 22
Posts: 36
Credit: 302,008,230
RAC: 10,129
Ukraine
Message 3193 - Posted: 22 May 2023, 21:40:51 UTC

We have completed 5% of the project in only 8 days, that's great!

It suggests that 100/5 = 20, meaning 20x more to go at 8 days each = 160 days, or about 5.5 months! Meaning we should complete this project *this year*, around November or December. But we can do even better if we don't starve our work servers (nodes) and generate data in parallel. Currently there are ZERO work units available.

This would allow us to double our efforts and finish this project, V. vinifera, in about 3 months.

Technologov
Send message
Joined: 27 Jan 22
Posts: 36
Credit: 302,008,230
RAC: 10,129
Ukraine
Message 3194 - Posted: 22 May 2023, 22:09:43 UTC - in response to Message 3192.

Nah, they have upgraded the storage system recently with SSDs and a new OS / new software, so I think it would handle the load.

Retvari Zoltan
Send message
Joined: 31 Mar 20
Posts: 41
Credit: 50,709,146
RAC: 5,457
Hungary
Message 3195 - Posted: 23 May 2023, 0:19:40 UTC - in response to Message 3193.

We have completed 5% of the project in only 8 days, that's great!

It suggests that 100/5 = 20, meaning 20x more to go at 8 days each = 160 days, or about 5.5 months! Meaning we should complete this project *this year*, around November or December.
One 5% unit is already done, so there are 19 units left, which would take 152 days.

But we can do even better if we don't starve our work servers (nodes) and generate data in parallel. Currently there are ZERO work units available.
This would allow us to double our efforts and finish this project, V. vinifera, in about 3 months.
Another solution to this problem is to double the workunit length and halve the maximum number of cached workunits per core. This solution does not involve rewriting the work generator.
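A quick check of the arithmetic behind the two estimates above (160 vs 152 days), assuming the observed pace of 5% per 8 days holds:

DAYS_PER_SLICE = 8        # observed: one 5% slice took 8 days
TOTAL_SLICES = 100 / 5    # 20 slices of 5% in the whole project

naive_days = TOTAL_SLICES * DAYS_PER_SLICE            # counts the finished slice too
remaining_days = (TOTAL_SLICES - 1) * DAYS_PER_SLICE  # 19 slices still to go

print(naive_days)      # 160.0 days, about 5.5 months
print(remaining_days)  # 152.0 days, as noted above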

Technologov
Send message
Joined: 27 Jan 22
Posts: 36
Credit: 302,008,230
RAC: 10,129
Ukraine
Message 3197 - Posted: 23 May 2023, 2:22:04 UTC - in response to Message 3195.

> Another solution to this problem is to double the workunit length

But is generating longer (or larger?) work units harder than, or the same as, generating small work units?
So they would take up 6 hours of CPU time rather than 3 hours on average?

Profile valterc
Project administrator
Project tester
Send message
Joined: 30 Oct 13
Posts: 620
Credit: 34,552,153
RAC: 2,620
Italy
Message 3198 - Posted: 23 May 2023, 9:25:31 UTC - in response to Message 3197.

> Another solution to this problem is to double the workunit length

But is generating longer (or larger?) work units harder than, or the same as, generating small work units?
So they would take up 6 hours of CPU time rather than 3 hours on average?

There are, right now, 600 small computational chunks inside a workunit (the last one for every gene is smaller). I could easily increase or decrease that number; the computational time is proportional to it. The work generation time is almost independent of it, so it takes around the same time to split an "expansion" into, say, 77 (right now), 144, or 35 workunits.
The choice of the right number is a compromise between a lot of things. A very fast workunit will overuse the network connection, while a very long one is not good for people with slow computers or intermittent dedicated time, and it increases the chance of wasting resources due to computational errors. The deadline should also be adjusted accordingly.
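A back-of-the-envelope sketch of how the chunk count per workunit maps to the number of workunits per expansion. The total chunk count per expansion is an assumption here (roughly 77 workunits x 600 chunks, with the last workunit smaller); only the 600-chunk size and the 77-workunit split come from the post above.

import math

CHUNKS_PER_EXPANSION = 46_200  # assumed: ~77 workunits x 600 chunks

def workunits_per_expansion(chunks_per_wu: int) -> int:
    # The last workunit may be smaller, hence the ceiling.
    return math.ceil(CHUNKS_PER_EXPANSION / chunks_per_wu)

print(workunits_per_expansion(600))   # 77 workunits (current setting)
print(workunits_per_expansion(1200))  # 39 workunits, each roughly twice as long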

Technologov
Send message
Joined: 27 Jan 22
Posts: 36
Credit: 302,008,230
RAC: 10,129
Ukraine
Message 3201 - Posted: 23 May 2023, 12:29:37 UTC - in response to Message 3198.
Last modified: 23 May 2023, 12:31:09 UTC

I don't like your definition of a work unit. A work unit is one unit that the BOINC worker downloads; what you call an "expansion" is a "work unit" in my terms.

"Work unit" = 1 unit of work downloaded by BOINC = 1 task

What you have there is not a work unit but more like a gene slice.

Anyway, we need more work units and a bigger queue of available work units on the BOINC server side.

Profile valterc
Project administrator
Project tester
Send message
Joined: 30 Oct 13
Posts: 620
Credit: 34,552,153
RAC: 2,620
Italy
Message 3202 - Posted: 23 May 2023, 13:10:15 UTC - in response to Message 3201.
Last modified: 23 May 2023, 13:18:27 UTC

I don't like your definition of a work unit. A work unit is one unit that the BOINC worker downloads; what you call an "expansion" is a "work unit" in my terms.

"Work unit" = 1 unit of work downloaded by BOINC = 1 task

What you have there is not a work unit but more like a gene slice.

Anyway, we need more work units and a bigger queue of available work units on the BOINC server side.

That's exactly the definition of a workunit. What you actually download is a collection of 600 computational chunks (you also download the expression dataset, shared among the genes). Any chunk is a run of the PC algorithm on a tile of size 1000, made up of the "seed" gene and a random subset of the other genes.
A single gene expansion is made up of 77 workunits. We wait until all of them come back and build up the gene expansion list.
You can also infer this by looking at the name of a workunit, like:
236784_Vv_vv-VIT-01s0150g00140_wu-57_1684831345366
236784 an internal id
Vv the organism
vv-VIT-01s0150g00140 the gene name
wu-57 workunit 57 (out of 77)
1684831345366 a timestamp
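For illustration, such a name can be decoded with a few lines of Python, assuming the fields are always joined by underscores and the gene name itself contains none (as in the example above):

def parse_wu_name(name: str) -> dict:
    # Split the name into its five underscore-separated fields.
    internal_id, organism, gene, wu, timestamp = name.split("_")
    return {
        "internal_id": internal_id,
        "organism": organism,
        "gene": gene,
        "workunit": int(wu.split("-", 1)[1]),  # "wu-57" -> 57
        "timestamp": int(timestamp),
    }

print(parse_wu_name("236784_Vv_vv-VIT-01s0150g00140_wu-57_1684831345366"))
# {'internal_id': '236784', 'organism': 'Vv', 'gene': 'vv-VIT-01s0150g00140',
#  'workunit': 57, 'timestamp': 1684831345366}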

Retvari Zoltan
Send message
Joined: 31 Mar 20
Posts: 41
Credit: 50,709,146
RAC: 5,457
Hungary
Message 3204 - Posted: 23 May 2023, 13:47:16 UTC - in response to Message 3198.
Last modified: 23 May 2023, 13:51:35 UTC

> Another solution to this problem is to double the workunit length

But is generating longer (or larger?) work units harder than, or the same as, generating small work units?
So they would take up 6 hours of CPU time rather than 3 hours on average?

There are, right now, 600 small computational chunks inside a workunit (the last one for every gene is smaller). I could easily increase or decrease that number; the computational time is proportional to it. The work generation time is almost independent of it, so it takes around the same time to split an "expansion" into, say, 77 (right now), 144, or 35 workunits.
In this case, please double the number of chunks, and halve the maximum allowed tasks per core.
The latter is the key to resolving the "out of tasks" problem; the first part is needed to keep the hosts busy for the same amount of time.

The choice of the right number is a compromise between a lot of things. A very fast workunit will overuse the network connection, while a very long one is not good for people with slow computers or intermittent dedicated time, and it increases the chance of wasting resources due to computational errors. The deadline should also be adjusted accordingly.
Doubling the workunit processing time is a safe bet, as their processing time is half of that in the previous project, which was just fine.

A side note:
On hyperthreaded hosts everyone should set, in BOINC preferences -> computing preferences, "Use at most 50% of the processors"; the (TN-Grid) performance of the host wouldn't decrease, as it would make the CPU "do the math" twice as fast.

Profile valterc
Project administrator
Project tester
Send message
Joined: 30 Oct 13
Posts: 620
Credit: 34,552,153
RAC: 2,620
Italy
Message 3205 - Posted: 23 May 2023, 14:10:50 UTC - in response to Message 3204.
Last modified: 23 May 2023, 14:14:18 UTC

I just modified wus*core from 8 to 6 (I don't remember if I also need to restart the server), and also reduced the deadline from 5 to 4 days.

Retvari Zoltan
Send message
Joined: 31 Mar 20
Posts: 41
Credit: 50,709,146
RAC: 5,457
Hungary
Message 3206 - Posted: 23 May 2023, 14:33:40 UTC - in response to Message 3205.
Last modified: 23 May 2023, 14:34:38 UTC

I just modified wus*core from 8 to 6 (I don't remember if I also need to restart the server), and also reduced the deadline from 5 to 4 days.
Thanks. We'll see if it resolves the situation or not. There are 35 tasks ready to send at the moment.
