Message boards : Number crunching : New TCGA workunits (TCGAz)
Author | Message |
---|---|
The new TCGA workunits (with a modified dataset) that contain the TCGAz string should behave correctly. Please let them run. I started with just a few batches. | |
ID: 1409 | |
Thanks for the warning - some of these jobs seem to need considerably more CPU-time. 17 h for 32% in one case. Anyhow, if there's a chance they come to an end I'll let them run. | |
ID: 1410 | |
"Thanks for the warning - some of these jobs seem to need considerably more CPU-time. 17 h for 32% in one case. Anyhow, if there's a chance they come to an end I'll let them run." If you can, let them run. If you need to turn off your PC at the end of the day, abort them. If you turn off your PC in the middle of a long stretch between checkpoints, the task resumes from the previous checkpoint and you lose that time. One WU stayed at 52% for 9 h; total run time was 33 h. | |
ID: 1411 | |
"Thanks for the warning - some of these jobs seem to need considerably more CPU-time. 17 h for 32% in one case. Anyhow, if there's a chance they come to an end I'll let them run." I was referring to the old TCGA WUs. | |
ID: 1412 | |
"Thanks for the warning - some of these jobs seem to need considerably more CPU-time. 17 h for 32% in one case. Anyhow, if there's a chance they come to an end I'll let them run." No problem letting them run; they look good. One job finished after 1 day 3 hours 52 min 36 sec, completed and validated. Another job is now at 5% after one day, with 19 days to go. Let's see. :-) | |
ID: 1413 | |
OK, just to summarize: the TCGA workunits are the problematic ones, really long and with long periods of time (hours) without any checkpoint. Nevertheless we would be very happy if you let them run until completion. | |
ID: 1414 | |
"OK, just to summarize: the TCGA workunits are the problematic ones, really long and with long periods of time (hours) without any checkpoint. Nevertheless we would be very happy if you let them run until completion." No problem, my CPUs are working 24/7. I have no mercy. | |
ID: 1415 | |
Here's one job with no progress but growing time: | |
ID: 1416 | |
Well, I got this https://gene.disi.unitn.it/test/result.php?resultid=37659353 that spent almost 24 hours without any (visible) progress and eventually got validated. This one https://gene.disi.unitn.it/test/result.php?resultid=37623246 ran for about three days and still needs to be validated. The longest one, by now, is this one https://gene.disi.unitn.it/test/workunit.php?wuid=17810415 that got more than 2300 credits. | |
ID: 1417 | |
"...I'm constantly monitoring the situation, adding credits for 'invalid' and 'too late to validate' workunits." Thanks, that's what I love to read. If someone cares for the result, there's some reason in it. I'll let it run and we'll see what happens. Curiosity is a strong motive. | |
ID: 1418 | |
Here's the longest one I've seen yet: | |
ID: 1420 | |
"Here's the longest one I've seen yet:" Now at 183 hours, and 156 hours since the last checkpoint (which is troubling). Still at 12%. I assume that if the computer reboots or the WU gets interrupted for any reason it will restart. It appears that 6 computers are currently running this WU and no one has finished it. Is it viable? | |
ID: 1426 | |
From the few I have looked at, the SSE2 and AVX ones clearly have problems, but the FMA ones are succeeding. | |
ID: 1432 | |
"Here's the longest one I've seen yet:" Now at 282 hours, 238 hours since the last checkpoint. Still at 12%. Now there seem to be 7 computers actively running this WU (counting the 5 that are listed as "Timed out - no response", one of which is mine). Any possibility of adding some checkpoints to these TCGA WUs? A power glitch or any other interruption would wipe out nearly 10 days of work. | |
ID: 1439 | |
"Now at 183 hours, and 156 hours since the last checkpoint (which is troubling). Still at 12%." I have seen a few of those. Before they get that far, I abort them. I think you are unnecessarily conscientious; they are duds. | |
ID: 1440 | |
There are about 30 TCGA workunits still around and for sure those are the very long ones. Theoretically those could run forever. The reason is that for a certain, very rare, type of input the algorithm's complexity is exponential. We usually manage to adjust the input dataset in order to avoid this, but in the current case there was an issue that we were able to fix only after the workunits were distributed. The results are of scientific value, of course, but without a checkpoint inside the critical section of the algorithm this is not the kind of computation to do inside the BOINC framework. | |
ID: 1442 | |
"There are about 30 TCGA workunits still around and for sure those are the very long ones. Theoretically those could run forever. The reason is that for a certain, very rare, type of input the algorithm's complexity is exponential. We usually manage to adjust the input dataset in order to avoid this, but in the current case there was an issue that we were able to fix only after the workunits were distributed. The results are of scientific value, of course, but without a checkpoint inside the critical section of the algorithm this is not the kind of computation to do inside the BOINC framework." Thanks for the update and explanation. I have four of these WUs currently running on 3 machines. Thought I'd post a small BoincTasks screenshot. From clues gleaned by looking at the WU history, I expect a couple of these to finish within the next 2 days. A third one will probably take longer, and the one currently at 306 hours is totally mysterious: several faster machines are running it and it has never been completed. | |
ID: 1443 | |
Three of the above WUs finished as expected. The 4th is STILL running at 12% completion after 377 hours (and 312 hours since the last checkpoint). It looks like 8 machines are still running this one, some longer than I have: | |
ID: 1452 | |
Do not worry; if you need to abort them, do so. There are about 18 workunits of the TCGA batch still around (of a total of 1240); some of them are probably the most problematic ones. | |
ID: 1453 | |
"The 4th is STILL running at 12% completion after 377 hours (and 312 hours since the last checkpoint)." All projects have stuck work units; some more than others. If the Progress % is not making any progress after a few hours (24 hours is more than enough time), then it is stuck in a loop. | |
ID: 1454 | |
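
For anyone who wants to put numbers on the two rules of thumb in this thread (work done since the last checkpoint is lost on an interruption, and a Progress % that has not moved in 24 hours suggests a stuck task), here is a rough Python sketch. The `TaskSnapshot` class, its field names, and both helper functions are hypothetical illustrations, not part of BOINC or of the project's application. The figures are the ones reported above for the workunit sitting at 12%. One task mentioned earlier showed no visible progress for almost 24 hours and still validated, so the flag is a hint, not a verdict.

```python
from dataclasses import dataclass


@dataclass
class TaskSnapshot:
    """One observation of a running workunit (hypothetical field names)."""
    elapsed_hours: float           # total run time so far
    hours_since_checkpoint: float  # time since the last checkpoint was written
    progress_percent: float        # progress shown by the BOINC client


def hours_at_risk(task: TaskSnapshot) -> float:
    """Run time that would be lost if the task were interrupted right now."""
    return task.hours_since_checkpoint


def looks_stalled(earlier: TaskSnapshot, later: TaskSnapshot,
                  min_hours: float = 24.0) -> bool:
    """True if progress did not advance over a window of at least min_hours.

    The 24-hour default follows the rule of thumb quoted in the thread;
    it is an assumption, not a project-defined threshold.
    """
    window = later.elapsed_hours - earlier.elapsed_hours
    return window >= min_hours and later.progress_percent <= earlier.progress_percent


# Two reports of the same workunit, taken from the posts above.
first = TaskSnapshot(elapsed_hours=183.0, hours_since_checkpoint=156.0,
                     progress_percent=12.0)
latest = TaskSnapshot(elapsed_hours=377.0, hours_since_checkpoint=312.0,
                      progress_percent=12.0)

print("stalled:", looks_stalled(first, latest))
print("days at risk:", hours_at_risk(latest) / 24)
```

With the reported figures, 312 hours since the last checkpoint works out to 13 days of run time that a reboot or power glitch would throw away.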