Message boards : Number crunching : New TCGA workunits (TCGAz)
The new TCGA workunits (with a modified dataset) that contain the TCGAz string should behave correctly. Please let them run. I started with just a few batches.
ID: 1409
Thanks for the warning - some of these jobs seem to need considerably more CPU-time. 17 h for 32% in one case. Anyhow, if there's a chance they come to an end I'll let them run.
ID: 1410
Quote: "Thanks for the warning - some of these jobs seem to need considerably more CPU-time. 17 h for 32% in one case. Anyhow, if there's a chance they come to an end I'll let them run."
If you can, let them run. If you need to turn off your PC at the end of the day, abort them: if you are in the middle of a long stretch without a checkpoint and you turn off your PC, the task will resume from the previous checkpoint and you lose the time spent since then. One WU stayed at 52% for 9 h. Total run time: 33 h.
ID: 1411
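For readers unfamiliar with checkpointing, here is a minimal Python sketch of the behaviour described above. It is not the project's code: the file name `wu_checkpoint.json`, the step counter and the `stop_after` "power off" switch are all hypothetical. The task saves its position every `checkpoint_every` steps; when it is stopped and restarted, it resumes from the last saved step, so anything done after that checkpoint has to be done again.

```python
import json
import os
import tempfile


def load_checkpoint(path: str) -> int:
    """Return the last saved step, or 0 if no checkpoint has been written yet."""
    try:
        with open(path) as f:
            return json.load(f)["step"]
    except FileNotFoundError:
        return 0


def save_checkpoint(path: str, step: int) -> None:
    """Write the checkpoint atomically: temp file first, then rename it over the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"step": step}, f)
    os.replace(tmp_path, path)


def run(path: str, total_steps: int, checkpoint_every: int, stop_after=None):
    """Do `total_steps` units of (hypothetical) work, resuming from the last checkpoint.

    `stop_after` simulates switching the PC off after that many steps in this
    session; nothing is saved at that moment. Returns (start_step, reached_step).
    """
    start = step = load_checkpoint(path)
    while step < total_steps:
        if stop_after is not None and step - start >= stop_after:
            return start, step  # "power off": progress since the last checkpoint is lost
        step += 1  # one unit of real work would happen here
        if step % checkpoint_every == 0 or step == total_steps:
            save_checkpoint(path, step)
    return start, step


if __name__ == "__main__":
    cp = os.path.join(tempfile.mkdtemp(), "wu_checkpoint.json")
    print(run(cp, 100, 10, stop_after=25))  # (0, 25); last checkpoint was step 20
    print(run(cp, 100, 10))                 # (20, 100); steps 21-25 are done again
```

The write-then-rename step is a design choice worth keeping in any real implementation: if the machine dies while the checkpoint is being written, the old checkpoint is still intact instead of half-overwritten.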
Quote: "Thanks for the warning - some of these jobs seem to need considerably more CPU-time. 17 h for 32% in one case. Anyhow, if there's a chance they come to an end I'll let them run."
I was referring to the old TCGA WUs.
ID: 1412
Quote: "Thanks for the warning - some of these jobs seem to need considerably more CPU-time. 17 h for 32% in one case. Anyhow, if there's a chance they come to an end I'll let them run."
No problem letting them run; they're looking good. One job finished after 1 day 3 hours 52 min 36 sec, completed and validated. Another job is now at 5% after one day, so 19 days to go. Let's see. :-)
ID: 1413
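The "19 days to go" figure comes from a straight-line extrapolation: 5% in 24 hours implies about 480 hours in total, of which 456 (19 days) remain. A minimal sketch, assuming progress accrues at a constant rate, which is exactly the assumption the stuck TCGA WUs later in this thread break (`estimate_remaining_hours` is a hypothetical helper, not a BOINC function):

```python
def estimate_remaining_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Linear extrapolation: assumes the remaining work proceeds at the same
    average rate as the work done so far."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    total_hours = elapsed_hours / fraction_done
    return total_hours - elapsed_hours


if __name__ == "__main__":
    remaining = estimate_remaining_hours(24.0, 0.05)  # 5% after one day
    print(f"about {remaining / 24:.0f} days to go")   # about 19 days to go
```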
OK, just to summarize: the TCGA workunits are the problematic ones, really long and with long periods of time (hours) without any checkpoint. Nevertheless we would be very happy if you let them run until completion.
ID: 1414
Quote: "OK, just to summarize: the TCGA workunits are the problematic ones, really long and with long periods of time (hours) without any checkpoint. Nevertheless we would be very happy if you let them run until completion."
No problem, my CPUs are working 24/7. I have no mercy.
ID: 1415
Here's one job with no progress but growing run time:
ID: 1416
Well, I got this one (https://gene.disi.unitn.it/test/result.php?resultid=37659353), which spent almost 24 hours without any (visible) progress and was eventually validated. This one (https://gene.disi.unitn.it/test/result.php?resultid=37623246) ran for about three days and still needs to be validated. The longest one so far is this one (https://gene.disi.unitn.it/test/workunit.php?wuid=17810415), which earned more than 2300 credits.
ID: 1417
Quote: "...I'm constantly monitoring the situation, adding credits for 'invalid' and 'too late to validate' workunits."
Thanks, that's what I love to read. If someone cares about the result, there's some point to it. I'll let it run and we'll see what happens. Curiosity is a strong motive.
ID: 1418
Here's the longest one I've seen yet:
ID: 1420
Quote: "Here's the longest one I've seen yet:"
Now at 183 hours, and 156 hours since the last checkpoint (which is troubling). Still at 12%. I assume that if the computer reboots or the WU gets interrupted for any reason, it will restart. It appears that 6 computers are currently running this WU and none of them has finished it. Is it viable?
ID: 1426
From the few I have looked at, the SSE2 and AVX versions clearly have problems, but the FMA ones are succeeding.
ID: 1432
Quote: "Here's the longest one I've seen yet:"
Now at 282 hours, 238 hours since the last checkpoint. Still at 12%. There now seem to be 7 computers actively running this WU (counting the 5 that are listed as "Timed out - no response", one of which is mine). Any possibility of adding some checkpoints to these TCGA WUs? A power glitch or any other interruption would wipe out nearly 10 days of work.
ID: 1439
Quote: "Now at 183 hours, and 156 hours since the last checkpoint (which is troubling). Still at 12%."
I have seen a few of those. Before they get that far, I abort them. I think you are unnecessarily conscientious; they are duds.
ID: 1440
There are about 30 TCGA workunits still around, and those are certainly the very long ones. Theoretically they could run forever. The reason is that for a certain, very rare type of input the algorithm's complexity is exponential. We usually manage to adjust the input dataset to avoid this, but in this case there was an issue that we were only able to fix after the workunits had been distributed. The results are of scientific value, of course, but without a checkpoint inside the critical section of the algorithm, this is not the kind of computation to run inside the BOINC framework.
ID: 1442
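To illustrate why a rare input can make a task effectively endless, and what "a checkpoint inside the critical section" means: the thread does not describe the project's algorithm, so this sketch uses brute-force subset enumeration as a stand-in for an exponential-time search. The function `enumerate_subsets` and its `on_checkpoint` callback are hypothetical. With n items there are 2**n candidates, so each extra item doubles the work; the callback records the next candidate to visit from inside the loop, so a restarted run could resume there instead of starting over.

```python
from typing import Callable, List, Optional


def enumerate_subsets(
    n: int,
    start_mask: int = 0,
    checkpoint_every: int = 1 << 16,
    on_checkpoint: Optional[Callable[[int], None]] = None,
) -> int:
    """Visit every subset of n items, encoded as bitmasks start_mask .. 2**n - 1.

    Returns how many subsets this call visited. If on_checkpoint is given, it is
    called with the next mask to visit every `checkpoint_every` masks, i.e. from
    inside the exponential loop rather than only before or after it.
    """
    visited = 0
    for mask in range(start_mask, 1 << n):
        visited += 1  # the real per-candidate evaluation would happen here
        if on_checkpoint is not None and (mask + 1) % checkpoint_every == 0:
            on_checkpoint(mask + 1)  # resume point: every mask below it is done
    return visited


if __name__ == "__main__":
    for n in (10, 15, 20):
        print(n, enumerate_subsets(n))  # 1024, 32768, 1048576: doubling per item
    saved: List[int] = []
    enumerate_subsets(10, checkpoint_every=256, on_checkpoint=saved.append)
    print(saved)  # [256, 512, 768, 1024]
    print(enumerate_subsets(10, start_mask=saved[-2]))  # 256 left after resuming at 768
```

Without the in-loop callback, the only safe resume points are before and after the whole enumeration, which is the situation the TCGA WUs are in: an interruption anywhere inside loses everything since the loop started.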
Quote: "There are about 30 TCGA workunits still around, and those are certainly the very long ones. [...] without a checkpoint inside the critical section of the algorithm, this is not the kind of computation to run inside the BOINC framework."
Thanks for the update and explanation. I have four of these WUs currently running on 3 machines. Thought I'd post a small BoincTasks screenshot:
From clues gleaned by looking at the WU history, I expect a couple of these to finish within the next 2 days. A third one will probably take longer, and the one currently at 306 hours is a total mystery, as several faster machines are running it and it has never been completed.
ID: 1443
Three of the above WUs finished as expected. The 4th is STILL running at 12% completion after 377 hours (and 312 hours since the last checkpoint). It looks like 8 machines are still running this one, some for longer than I have:
ID: 1452
Don't worry: if you need to abort them, do so. About 18 workunits of the TCGA batch (out of 1240 in total) are still around, and some of them are probably the most problematic ones.
ID: 1453
Quote: "The 4th is STILL running at 12% completion after 377 hours (and 312 hours since the last checkpoint)."
All projects have stuck work units, some more than others. If the progress percentage hasn't moved after a few hours (24 hours is more than enough time), the task is stuck in a loop.
ID: 1454
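The rule of thumb in the last post can be written as a small check. This is a sketch of that heuristic, not a BOINC feature: `looks_stuck` and its sample format are hypothetical, and the 24-hour window is the poster's suggestion. Earlier in this thread a WU showed no visible progress for almost 24 hours and was still validated, so a flat progress bar is a hint, not proof. The example readings are the 183, 282 and 377-hour reports for the WU stuck at 12%.

```python
from typing import List, Tuple


def looks_stuck(samples: List[Tuple[float, float]], window_hours: float = 24.0) -> bool:
    """samples: (elapsed_hours, progress_fraction) readings, oldest first.

    Returns True if progress has not increased over the last `window_hours`.
    Returns False when there is not enough history to judge.
    """
    if not samples:
        return False
    now, latest = samples[-1]
    window_start = now - window_hours
    baseline = None
    for t, progress in samples:
        if t <= window_start:
            baseline = progress  # last reading taken before the window began
    return baseline is not None and latest <= baseline


if __name__ == "__main__":
    readings = [(183.0, 0.12), (282.0, 0.12), (377.0, 0.12)]
    print(looks_stuck(readings))  # True
```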