Message boards : Number crunching : OUT of tasks
Author | Message |
---|---|
> After moving to the new storage system, is the work generator still limited by the 600 WUs per 14 minutes?

Maybe it is a little bit faster (better I/O), but the performance is similar. | |
ID: 3176 | |
What would it take to generate work units faster? New software? More CPU cores? Faster cores? | |
ID: 3178 | |
> What would it take to generate work units faster? New software? More CPU cores? Faster cores?

A brand new server :) And a new parallel version of the work generator. | |
ID: 3179 | |
How difficult is it to build a parallel version of the work generator? | |
ID: 3180 | |
> How difficult is it to build a parallel version of the work generator?

It shouldn't be too difficult. It doesn't need real parallelism: a version that supports multiple instances running at the same time would be more than enough. The problems are that it's rather complicated code that has to interact carefully with the local gene db, and it's written in Python (a language that I refused to learn ;) | |
ID: 3181 | |
How about gene-level parallelism? Basically, one work-unit generator per gene. Would it work? | |
ID: 3182 | |
Since there are 22393 genes in the V. vinifera project, could we run 22393 work-unit generators in parallel, on different servers? | |
ID: 3184 | |
> Since there are 22393 genes in the V. vinifera project, could we run 22393 work-unit generators in parallel, on different servers?

Just a dozen in parallel would be more than enough (without stressing the system: db and storage). | |
ID: 3185 | |
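A rough sketch of what "a dozen in parallel" could look like, assuming each generator instance is simply handed its own slice of the gene list. This is not the project's work generator: `assign_genes` and the placeholder gene names are hypothetical, and the sketch ignores the coordination with the gene db and storage that the real code would need.

```python
from typing import Dict, List


def assign_genes(genes: List[str], n_instances: int) -> Dict[int, List[str]]:
    """Round-robin assignment: gene i goes to instance i % n_instances."""
    if n_instances < 1:
        raise ValueError("n_instances must be at least 1")
    shards: Dict[int, List[str]] = {i: [] for i in range(n_instances)}
    for index, gene in enumerate(genes):
        shards[index % n_instances].append(gene)
    return shards


# Placeholder gene names (hypothetical); real ones look like vv-VIT-01s0150g00140.
genes = [f"gene-{i:05d}" for i in range(22393)]
shards = assign_genes(genes, 12)
sizes = [len(shards[i]) for i in range(12)]
print(min(sizes), max(sizes))  # 1866 1867
```

Round-robin keeps the shards within one gene of each other, so no instance ends up with a noticeably longer queue than the rest.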
I think it's a good time to start now, and keep doing so until we have at least 10k work units available for download. This would allow us to finish the V. vinifera project in only a few months. | |
ID: 3187 | |
> I think it's a good time to start now, and keep doing so until we have at least 10k work units available for download. This would allow us to finish the V. vinifera project in only a few months.

It may allow us to finish the project in a few months. I believe there is not enough storage to make and store that many work units and results without putting too much pressure on the system TN-Grid is using for the project. | |
ID: 3192 | |
We have completed 5% of the project in only 8 days, that's great! | |
ID: 3193 | |
Nah, they recently upgraded the storage system with SSDs, a new OS and new software, so I think it would handle the load. | |
ID: 3194 | |
> We have completed 5% of the project in only 8 days, that's great!

One 5% unit is already done, so there are 19 units left; at 8 days each, that would take 152 days. But we can do even better if we don't starve our work servers (nodes) and generate data in parallel. Currently there are ZERO work units available.

Another solution to this problem is to double the workunit length and halve the maximum number of cached workunits per core. This solution does not involve rewriting the work generator. | |
ID: 3195 | |
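The 152-day figure is a straight-line extrapolation, which can be checked in a few lines of Python. The function name is made up for illustration, and the estimate assumes the completion rate stays constant, which it will not if hosts keep running out of work.

```python
def days_remaining(percent_done: float, days_elapsed: float) -> float:
    """Linear extrapolation of the time left, assuming a constant completion rate."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return days_elapsed * (100 - percent_done) / percent_done


print(days_remaining(5, 8))  # 152.0
```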
> Another solution to this problem is to double workunit length
ID: 3197 | |
> Another solution to this problem is to double workunit length

There are, right now, 600 small computational chunks inside a workunit (the last one for every gene is smaller). I could easily increase or decrease that number; the computational time is proportional to it. The work generation time is almost independent of it, so it will take around the same time to split an "expansion" into, say, 77 (right now), 144 or 35 workunits. The choice of the right number is a compromise between a lot of things. A very fast workunit will overuse the network connection; a very long one will not be good for people with slow computers or intermittent dedicated time, and it increases the chance of wasting resources due to computational errors. The deadline should also be adjusted accordingly. | |
ID: 3198 | |
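To illustrate how the number of chunks per workunit changes the number of workunits per expansion, here is a small sketch. The total of 46,000 chunks per expansion is an assumption, chosen only to be consistent with "77 workunits of 600 chunks, the last one smaller"; the real total is not given in the thread. `split_expansion` is a hypothetical helper, not part of the work generator.

```python
from typing import List


def split_expansion(total_chunks: int, chunks_per_wu: int) -> List[int]:
    """Return the chunk count of each workunit; only the last one may be smaller."""
    if total_chunks < 1 or chunks_per_wu < 1:
        raise ValueError("both arguments must be positive")
    full, rest = divmod(total_chunks, chunks_per_wu)
    return [chunks_per_wu] * full + ([rest] if rest else [])


TOTAL_CHUNKS = 46_000  # assumed value, not taken from the project

for chunks_per_wu in (300, 600, 1200):
    sizes = split_expansion(TOTAL_CHUNKS, chunks_per_wu)
    # Columns: chunks per workunit, number of workunits, size of the last workunit
    print(chunks_per_wu, len(sizes), sizes[-1])
# 300 154 100
# 600 77 400
# 1200 39 400
```

Since the computing time is proportional to the chunk count, doubling the chunks per workunit roughly doubles the run time of each task and halves the number of tasks per gene.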
I don't like your definition of a work unit. A work unit is the unit that a BOINC worker downloads; in other words, what you call an "expansion" is a "work unit" in my terms. | |
ID: 3201 | |
> I don't like your definition of a work unit. A work unit is the unit that a BOINC worker downloads; in other words, what you call an "expansion" is a "work unit" in my terms.

That's exactly the definition of workunit. What you actually download is a collection of 600 computational chunks (you also downloaded the expression dataset, shared among the genes). Each chunk is a run of the PC algorithm on a tile of size 1000, made up of the "seed" gene and a random subset of the other genes. A single gene expansion is made up of 77 workunits. We wait until all of them come back and then build up the gene expansion list. You can also infer this by looking at the name of a workunit, like:

236784_Vv_vv-VIT-01s0150g00140_wu-57_1684831345366

236784: an internal id
Vv: the organism
vv-VIT-01s0150g00140: the gene name
wu-57: workunit 57 (out of 77)
1684831345366: a timestamp | |
ID: 3202 | |
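The naming scheme described above can be taken apart mechanically. The following is only a sketch based on the single example name in this thread: `WorkunitName` and `parse_workunit_name` are hypothetical names, and the assumption that the internal id and organism never contain an underscore is mine, not the project's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkunitName:
    internal_id: int
    organism: str
    gene: str
    wu_index: int
    timestamp: int  # kept as the raw integer; its unit is not stated in the thread


def parse_workunit_name(name: str) -> WorkunitName:
    """Split '<id>_<organism>_<gene>_wu-<n>_<timestamp>' into its fields."""
    # Take two fields from the left and two from the right, so that a gene
    # name containing underscores would still end up in one piece.
    internal_id, organism, rest = name.split("_", 2)
    gene, wu_field, timestamp = rest.rsplit("_", 2)
    if not wu_field.startswith("wu-"):
        raise ValueError(f"unexpected workunit field: {wu_field!r}")
    return WorkunitName(
        internal_id=int(internal_id),
        organism=organism,
        gene=gene,
        wu_index=int(wu_field[len("wu-"):]),
        timestamp=int(timestamp),
    )


parsed = parse_workunit_name("236784_Vv_vv-VIT-01s0150g00140_wu-57_1684831345366")
print(parsed.gene, parsed.wu_index)  # vv-VIT-01s0150g00140 57
```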
In this case, please double the number of chunks, and halve the maximum allowed tasks per core.

> Another solution to this problem is to double workunit length

The latter is the key to resolving the "out of tasks" problem; the first part is needed to keep the hosts busy for the same amount of time.

> The choice of the right number is a compromise between a lot of things. A very fast workunit will overuse the network connection; a very long one will not be good for people with slow computers or intermittent dedicated time, and it increases the chance of wasting resources due to computational errors. The deadline should also be adjusted accordingly.

Doubling the workunit processing time is a safe bet, as the current processing time is half that of the previous project's workunits, which were just fine.

A side note: on hyperthreaded hosts everyone should set BOINC preferences -> computing preferences -> "Use at most 50% of the processors". The (TN-Grid) performance of the host wouldn't decrease, as it would make the CPU "do the math" twice as fast. | |
ID: 3204 | |
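The reasoning here is that the work cached per core is the task limit times the task length, so doubling one and halving the other leaves it unchanged while halving the number of tasks the server has to supply. A minimal check follows; the 1-hour task length and the 16-core host are made-up figures, and the starting limit of 8 tasks per core is the value mentioned elsewhere in this thread.

```python
def cached_hours_per_core(tasks_per_core: int, hours_per_task: int) -> int:
    """Hours of work one core holds in its cache."""
    return tasks_per_core * hours_per_task


def tasks_held(cores: int, tasks_per_core: int) -> int:
    """Number of tasks a fully stocked host takes from the server."""
    return cores * tasks_per_core


CORES = 16          # hypothetical host
HOURS_PER_TASK = 1  # hypothetical current task length

before = (cached_hours_per_core(8, HOURS_PER_TASK), tasks_held(CORES, 8))
after = (cached_hours_per_core(4, 2 * HOURS_PER_TASK), tasks_held(CORES, 4))
print(before, after)  # (8, 128) (8, 64)
```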
I just changed wus*core from 8 to 6 (I don't remember whether I also need to restart the server) and reduced the deadline from 5 to 4 days. | |
ID: 3205 | |
> I just changed wus*core from 8 to 6 (I don't remember whether I also need to restart the server) and reduced the deadline from 5 to 4 days.

Thanks. We'll see if it resolves the situation or not. There are 35 tasks ready to send at the moment. | |
ID: 3206 | |