Message boards : Number crunching : Wu stuck? (TCGA workunits)
Author | Message |
---|---|
Hi, | |
ID: 1385 · ![]() | |
The input file is made up of 'computational chunks'. Usually each chunk runs for more or less the same time, so it was easy to decide how many chunks to put in a workunit and to forecast the overall computation time. | |
ID: 1386 · ![]() | |
Thanks for the explanation and quick feedback, Valterc. | |
ID: 1387 · ![]() | |
Just had one that was stuck on 16+ hours and still at 17% on a relatively fast and previously reliable machine. Aborted it before I came over here to check... | |
ID: 1388 · ![]() | |
>> Have another one that's getting close to 75% and 7.6 hours on a very fast machine. That one's still running. | |
ID: 1389 · ![]() | |
I also have to abort my task unfortunately. | |
ID: 1390 · ![]() | |
Every workunit is built from a certain number of small computational pieces (chunks); in the TCGA experiments the number is 1000 or 200. A checkpoint is written at the end of every chunk. So if one chunk is one of the 'abnormal' ones that run for hours, there will unfortunately be no checkpoint for a long period of time. | |
ID: 1391 · ![]() | |
My hyper-threaded i7-6700K Windows 10 system has 1 validated TCGA task (148365_Hs_TCGA-AR_wu-124_1543429375131_2, 561.1 credits for 11:59:57 runtime), 1 pending validation (148368_Hs_TCGA-KLF6_wu-102_1543433840285_2, 9:10:54 runtime), 4 running and 5 in its work queue.
| |
ID: 1393 · ![]() | |
The TCGA workunits are problematic. If you happen to have one and find it apparently 'frozen', i.e. showing no progress after a long time, feel free to abort it. | |
ID: 1395 · ![]() | |
3 of those tasks have now completed:
| |
ID: 1396 · ![]() | |
The other task mentioned in my previous message was pre-empted for 9 hours and is now running again: Now completed with 30:52:42 runtime and 16 checkpoints made, validation pending. ____________ "The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer | |
ID: 1406 · ![]() | |
The outputs of the 'problematic' TCGA workunits were, in some cases, different depending on whether they were calculated on Windows or Linux. This caused some validation errors, so I wrote some code to grant credit even for the TCGA invalids. See this workunit as an example: http://gene.disi.unitn.it/test/workunit.php?wuid=17810593 | |
ID: 1408 · ![]() | |
Quote: The output of the 'problematic' TCGA workunits were, in some cases, different if calculated on Windows or Linux. There are some validation errors because of this so I wrote down some code in order to give credits even for the TCGA invalids. See this workunit as an example: http://gene.disi.unitn.it/test/workunit.php?wuid=17810593 (end of quote) Very nice and much appreciated. | |
ID: 1421 · ![]() | |
Crunched on my 1950X and waiting for validation: | |
ID: 1435 · ![]() | |
The last one is probably one of the longest I have ever seen (4,704.77 credits...). This one https://gene.disi.unitn.it/test/workunit.php?wuid=17810524 is probably the record so far. | |
ID: 1436 · ![]() | |
GPUGRID has two queues: a short one and a long one. Can you maybe do the same here? | |
ID: 1437 · ![]() | |
Quote: GPUGRID has two queues - short one and long one. Can you maybe do the same here? (end of quote) The TCGA batch behavior was unexpected. Workunits like those, which go without checkpoints for a very long time and have somewhat unpredictable running times, are not suited to BOINC. We have no plans to distribute very long workunits in the future, and certainly not workunits like the TCGA ones. Just wanted to point out that the TCGAz workunits behave correctly. | |
ID: 1438 · ![]() | |
My i7 has just completed the _8 task from workunit 148368_Hs_TCGA-KLF6_wu-154_1543433935389 with 57:17:44 runtime, 41:07:29 of it being between the 15th and 16th checkpoints. | |
ID: 1444 · ![]() | |
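
The checkpoint behaviour described in this thread (one checkpoint written at the end of every chunk) can be illustrated with a small sketch. This is not the project's code: the function names, the chunk durations and the assumption that a stopped task resumes from its last checkpoint are all hypothetical and only meant to show why a single 'abnormal' chunk leaves a long stretch with no checkpoint. The two runtimes are the ones reported above for the 57:17:44 task.

```python
from itertools import accumulate


def parse_hms(text):
    """Convert an 'H:MM:SS' runtime (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def checkpoint_times(chunk_seconds):
    """Elapsed time at each checkpoint, assuming one checkpoint per finished chunk."""
    return list(accumulate(chunk_seconds))


def work_lost_if_stopped(chunk_seconds, stop_at):
    """Seconds of computation redone if the task restarts from its last checkpoint."""
    last_checkpoint = 0
    for t in checkpoint_times(chunk_seconds):
        if t > stop_at:
            break
        last_checkpoint = t
    return stop_at - last_checkpoint


# Hypothetical workunit: three ordinary one-minute chunks around one
# 'abnormal' chunk as long as the 41:07:29 gap reported in the thread.
chunks = [60, 60, parse_hms("41:07:29"), 60]

# The longest stretch without a checkpoint is simply the longest chunk.
longest_gap = max(chunks)

# Share of the reported 57:17:44 runtime spent between two checkpoints.
share = parse_hms("41:07:29") / parse_hms("57:17:44")

print(checkpoint_times(chunks))
print(work_lost_if_stopped(chunks, stop_at=120 + 10 * 3600))
print(f"{share:.1%} of the reported runtime fell between two checkpoints")
```

Under these assumptions, a task stopped ten hours into the long chunk would have to repeat all ten hours, which is why a 'frozen' progress bar and a long checkpoint gap look the same from the volunteer's side.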