Message boards : Number crunching : New TCGA workunits (TCGAz)
Author | Message |
---|---|
Quote: "There are about 30 TCGA workunits still around, and those are certainly the very long ones. Theoretically they could run forever. The reason is that for a certain, very rare, type of input the algorithm's complexity is exponential. We usually manage to adjust the input dataset to avoid this, but in the current case there was an issue that we were able to fix only after the workunits were distributed. The results are of scientific value, of course, but without a checkpoint inside the critical section of the algorithm this is not the kind of computation to run inside the BOINC framework." It looks like this WU may have been cancelled by the server, yet it's still running on my system and probably on others. Here's the latest status: the top WU should finish within a day, but the bottom one is still at 12% and now over 605 hours (483 hours since the last checkpoint). It looks "dead". Here's a link to the WU: http://gene.disi.unitn.it/test/workunit.php?wuid=17810256 and here's a different WU that looks hopeless: http://gene.disi.unitn.it/test/workunit.php?wuid=17810093 | |
ID: 1459 | |
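The point about checkpoints can be illustrated with a minimal, generic sketch. It is not the project's TCGA application and does not use the BOINC API (real BOINC apps use the API's own checkpointing functions); the checkpoint file, the state layout, and the stand-in "work" (summing squares) are all hypothetical. The idea is to save progress and partial results together, atomically, inside the long-running loop, so that a restart resumes from the last checkpoint instead of from the beginning.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: dump to a temp file, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # replaces the old file in a single step


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"next_item": 0, "partial_sum": 0}
    with open(path) as f:
        return json.load(f)


def run(items, path, checkpoint_every=1000):
    """Process items in order, resuming from the last checkpoint if there is one.

    The hypothetical 'work' is summing squares. What matters is that the
    position (next_item) and the partial result are saved together, so a
    restart loses at most checkpoint_every items of work.
    """
    state = load_checkpoint(path)
    for i in range(state["next_item"], len(items)):
        state["partial_sum"] += items[i] * items[i]
        state["next_item"] = i + 1
        if state["next_item"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)  # final state, so a rerun does no extra work
    return state["partial_sum"]
```

The temporary-file-plus-`os.replace` step means a crash during a write leaves the previous checkpoint intact rather than a half-written file. A larger `checkpoint_every` means fewer disk writes but more lost work after a restart.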
There are 9 workunits belonging to the TCGA 'bad' patch still 'around'. Some of them were canceled automatically by the server because they hit the "errors: Too many total results" limit. At this point there is no reason to keep the workunits running. | |
ID: 1460 | |
Thanks for the update. 148368_Hs_TCGA-KLF6_wu-6_1543433660987 above finished this morning and validated. I'm aborting 148366_Hs_TCGA-BRCA2_wu-136_1543431411382 at 619 hours (still at 12%). I'd bet this has to be a record run time for a WU. | |
ID: 1461 | |
Well I guess then that my 141 hours and stuck at 73% for over 2 days, has a way to go. Still taking up a full core so letting it run. Would be nice to get many, many thousands of credit for it (ha, ha). Conan | |
ID: 1463 | |
I will give credits for every 'aborted by user' result inside a "Too many total results" workunit. I will calculate the average credit per hour of the involved PC, grant credits proportionally, and add a 20% bonus. For result 37630664 inside http://gene.disi.unitn.it/test/workunit.php?wuid=17810256, a run time of 2,230,059.45 s will result in about 10k credits:
37630664
run time: 2230059.45166
hostid: 3526 (n: 74)
average credit: 143.67223166234
average runtime: 36048.057297297
factor: 0.0039855748807055
credit: 8888.068933016
credit+bonus: 10665.682719619
Let me know if you agree with this (you need to abort all your still-running TCGA WUs). | |
ID: 1464 | |
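The figures listed for result 37630664 are consistent with a simple proportional formula, reconstructed here from the printed values (the actual script is not shown, so treat this as a sketch; the function name and signature are hypothetical): `factor` is the host's average credit divided by its average run time, i.e. credit per second; `credit` is the aborted result's run time multiplied by that factor; and the 20% bonus multiplies the credit by 1.2.

```python
def compensation_credit(run_time_s, avg_credit, avg_runtime_s, bonus=0.20):
    """Credit for an aborted result, scaled by the host's historical credit rate.

    factor = avg_credit / avg_runtime_s is the host's average credit per second
    of run time; the result's run time is multiplied by it, and a bonus
    fraction (20% by default) is added on top.
    """
    factor = avg_credit / avg_runtime_s
    credit = run_time_s * factor
    return factor, credit, credit * (1 + bonus)


# Inputs taken from the listing for result 37630664 (host 3526).
factor, credit, total = compensation_credit(2230059.45166, 143.67223166234, 36048.057297297)
print(f"factor: {factor:.16f}")
print(f"credit: {credit:.9f}")
print(f"credit+bonus: {total:.9f}")
```

With these inputs the function reproduces the listed factor, credit, and credit+bonus values up to floating-point rounding.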
Sounds fair to me. Thanks! | |
ID: 1465 | |
Quote: "Well I guess then that my 141 hours and stuck at 73% for over 2 days, has a way to go." Conan, it looks like you're running this WU: http://gene.disi.unitn.it/test/workunit.php?wuid=17810575 Judging from the speed of the one machine that completed it compared with yours, I'd say you're looking at around 10-11 days total for this WU. It should complete, though, as long as your computer doesn't do something untoward (such as reboot). | |
ID: 1466 | |
Quote: "Sounds fair to me. Thanks!" OK, I did the 'credit granting' for all the results that were "aborted by user" or "can't validate" belonging to the "Too many total results" workunits, like this one: http://gene.disi.unitn.it/test/workunit.php?wuid=17810097. The ones without credits are those without any recent host statistics (I cannot compute the values for them). | |
ID: 1467 | |
Thanks valterc! It's also quite possible that some of those "timed out - no response" are still running unless the server sends an abort message... | |
ID: 1470 | |
Yep, thanks. I also cancelled those workunits server-side. I'll check whether it worked. | |
ID: 1471 | |
It worked, and it looks like they're being awarded credit. Very nice! | |
ID: 1473 | |
Quote: "Well I guess then that my 141 hours and stuck at 73% for over 2 days, has a way to go." Beyond, yes, that is the one. It has passed 200 hours now, still at 73%, and the last checkpoint was at 24 hours. But it's still running and using a full core, so we will see how it goes. It must finish soon? Conan | |
ID: 1475 | |
Quote: "Conan, looks like you're running this WU: WU=17810575" I'm guessing 240-264 hours. Let's start a pool... ;-) I'll take 252 hours. | |
ID: 1476 | |
Quote: "Conan, looks like you're running this WU: WU=17810575" It is now at 232 hours, still at 73.00%. I will have a go at 270 hours, as my computers are not the quickest. Conan | |
ID: 1477 | |
Quote: "Conan, looks like you're running this WU: WU=17810575" Well, Beyond, it has passed 252 hours and is still running and still at 73.00%. Care for another guess? Conan | |
ID: 1478 | |
Quote: "Well Beyond, it has passed 252 hours and is still running and still at 73.00%." 264 hours, because 2+4=6 and 6+6=12 and 2x6=12 and 2x12x12-2x12=264. Figure out that reasoning... ;-) | |
ID: 1479 | |
Quote: "Well Beyond, it has passed 252 hours and is still running and still at 73.00%." Well, that reasoning is pretty good, but alas it is also out, as is my 270 hours (with no reasoning). It has now passed 277 hours but has moved to 83.5%, so we have progress. Maybe 200 hours? Conan | |
ID: 1480 | |
293 hours because it's prime. | |
ID: 1481 | |
Quote: "293 hours because it's prime." It didn't validate correctly; nevertheless, I just assigned credits. | |
ID: 1482 | |
You have got to be one of the nicest, most conscientious admins in BoincLand. | |
ID: 1484 | |