New TCGA workunits (TCGAz)

Profile Beyond
Joined: 2 Nov 16
Posts: 50
Credit: 44,372,499
RAC: 0
United States
Message 1459 - Posted: 29 Dec 2018, 5:17:43 UTC - in response to Message 1442.

There are about 30 TCGA workunits still around and for sure those are the very long ones. Theoretically those could run forever. The reason is that for a certain, very rare, type of input the algorithm's complexity is exponential. We usually manage to adjust the input dataset in order to avoid this, but in the current case there was an issue that we were able to fix only after the workunits were distributed. The results are of scientific value, of course, but without a checkpoint inside the critical section of the algorithm this is not the kind of computation to do inside the BOINC framework.

Anyway, I will wait for them for another couple of days, then I will abort them server side. I will figure out a way to give credits even in this case.

It looks like this WU has been cancelled by the server? Yet it's still running on my system and probably others. Here's the latest:



The top WU should finish within a day but the bottom one is still at 12% and now over 605 hours (483 hours since the last checkpoint). It looks "dead". Here's a link to the WU:

http://gene.disi.unitn.it/test/workunit.php?wuid=17810256

and here's a different WU that looks hopeless:

http://gene.disi.unitn.it/test/workunit.php?wuid=17810093

Profile valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 624
Credit: 34,677,535
RAC: 1
Italy
Message 1460 - Posted: 29 Dec 2018, 11:19:51 UTC - in response to Message 1459.
Last modified: 29 Dec 2018, 11:30:55 UTC

There are 9 workunits belonging to the TCGA 'bad' batch still 'around'. Some of them were canceled automatically by the server because they hit the "errors: Too many total results" limit. At this point there is no reason to keep the workunits running.
I'm thinking about giving some credits for those results (even if aborted), but I have to write some code first. I will do that once I'm back at work next week.

Profile Beyond
Joined: 2 Nov 16
Posts: 50
Credit: 44,372,499
RAC: 0
United States
Message 1461 - Posted: 29 Dec 2018, 18:38:38 UTC - in response to Message 1460.

Thanks for the update. 148368_Hs_TCGA-KLF6_wu-6_1543433660987 above finished this morning and validated. I'm aborting 148366_Hs_TCGA-BRCA2_wu-136_1543431411382 at 619 hours (still at 12%). I'd bet this has to be a record run time for a WU:

http://gene.disi.unitn.it/test/workunit.php?wuid=17810256

Profile Conan
Joined: 6 Sep 15
Posts: 13
Credit: 7,885,837
RAC: 0
Australia
Message 1463 - Posted: 3 Jan 2019, 5:51:07 UTC - in response to Message 1461.
Last modified: 3 Jan 2019, 5:52:51 UTC

Thanks for the update. 148368_Hs_TCGA-KLF6_wu-6_1543433660987 above finished this morning and validated. I'm aborting 148366_Hs_TCGA-BRCA2_wu-136_1543431411382 at 619 hours (still at 12%). I'd bet this has to be a record run time for a WU:

http://gene.disi.unitn.it/test/workunit.php?wuid=17810256


Well I guess then that my 141 hours and stuck at 73% for over 2 days, has a way to go.
Still taking up a full core so letting it run. Would be nice to get many, many thousands of credit for it (ha, ha).

Conan

Profile valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 624
Credit: 34,677,535
RAC: 1
Italy
Message 1464 - Posted: 3 Jan 2019, 11:32:58 UTC - in response to Message 1463.
Last modified: 3 Jan 2019, 12:13:43 UTC

I will give credits for every 'aborted by user' result inside a "Too many total results" workunit. I will calculate the average credit per unit of run time for the host involved, grant credits in proportion to the run time, and add a 20% bonus. For result 37630664 inside http://gene.disi.unitn.it/test/workunit.php?wuid=17810256, a run time of 2,230,059.45 sec will result in about 10k credits:

37630664 run time: 2230059.45166 hostid: 3526 (n: 74) average credit : 143.67223166234 average runtime: 36048.057297297 factor: 0.0039855748807055 credit: 8888.068933016 credit+bonus: 10665.682719619


Let me know if you agree with this (you need to abort all your still-running TCGA WUs).
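
As a rough sketch of the arithmetic described above (not the project's actual server code; the function and parameter names are hypothetical), the factor is the host's average credit divided by its average run time, and the grant is run time times factor plus the 20% bonus:

```python
def proportional_credit(run_time_s, avg_credit, avg_runtime_s, bonus=0.20):
    """Return (factor, credit, credit_with_bonus) for one aborted result.

    All names here are illustrative assumptions. The inputs correspond to
    the "run time", "average credit" and "average runtime" fields in the
    output line quoted in the post.
    """
    if avg_runtime_s <= 0:
        raise ValueError("average runtime must be positive")
    factor = avg_credit / avg_runtime_s   # credits per second on this host
    credit = run_time_s * factor          # credit proportional to run time
    return factor, credit, credit * (1.0 + bonus)


# Values taken from the output line for result 37630664 (host 3526).
factor, credit, total = proportional_credit(
    run_time_s=2230059.45166,
    avg_credit=143.67223166234,
    avg_runtime_s=36048.057297297,
)
print(f"factor: {factor:.10f} credit: {credit:.2f} credit+bonus: {total:.2f}")
```

With the numbers from the post this should reproduce the quoted factor, credit and credit+bonus values to within rounding.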

Profile Beyond
Joined: 2 Nov 16
Posts: 50
Credit: 44,372,499
RAC: 0
United States
Message 1465 - Posted: 3 Jan 2019, 17:43:40 UTC - in response to Message 1464.

Sounds fair to me. Thanks!

Profile Beyond
Joined: 2 Nov 16
Posts: 50
Credit: 44,372,499
RAC: 0
United States
Message 1466 - Posted: 3 Jan 2019, 17:56:30 UTC - in response to Message 1463.

Well I guess then that my 141 hours and stuck at 73% for over 2 days, has a way to go.
Still taking up a full core so letting it run. Would be nice to get many, many thousands of credit for it (ha, ha).

Conan

Conan, looks like you're running this WU:

http://gene.disi.unitn.it/test/workunit.php?wuid=17810575

Judging from the speed of the one machine that completed it vs. your machine I'd say you're looking at around 10-11 days total for this WU. It should complete though if your computer doesn't do something untoward (such as reboot).

Profile valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 624
Credit: 34,677,535
RAC: 1
Italy
Message 1467 - Posted: 4 Jan 2019, 12:19:21 UTC - in response to Message 1465.

Sounds fair to me. Thanks!

Ok, I did the 'credit granting' for all the results that were "aborted by user" or "can't validate" belonging to the "Too many total results" workunits, like this one http://gene.disi.unitn.it/test/workunit.php?wuid=17810097.
The ones without credits are those without any recent host statistics (so I cannot work out the values).
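
The selection rule described above could be sketched roughly as follows. This is only an illustration: the `Result` record, its field names and the sample rows are made up for the example and are not the project's database schema or real results.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

ELIGIBLE_OUTCOMES = {"aborted by user", "can't validate"}
TARGET_WU_ERROR = "Too many total results"


@dataclass
class Result:
    """Hypothetical record; field names are assumptions for this sketch."""
    result_id: int
    outcome: str
    wu_error: str
    host_avg_credit: Optional[float] = None      # None: no recent host statistics
    host_avg_runtime_s: Optional[float] = None


def split_for_granting(results: List[Result]) -> Tuple[List[int], List[int]]:
    """Return (ids that can be granted credit, eligible ids skipped for lack of stats)."""
    granted, skipped = [], []
    for r in results:
        # Only results with an eligible outcome in a "Too many total results" workunit.
        if r.wu_error != TARGET_WU_ERROR or r.outcome not in ELIGIBLE_OUTCOMES:
            continue
        has_stats = (
            r.host_avg_credit is not None
            and r.host_avg_runtime_s is not None
            and r.host_avg_runtime_s > 0
        )
        (granted if has_stats else skipped).append(r.result_id)
    return granted, skipped


# Made-up sample rows, for illustration only.
sample = [
    Result(101, "aborted by user", TARGET_WU_ERROR, 143.7, 36048.0),
    Result(102, "can't validate", TARGET_WU_ERROR),            # no host statistics
    Result(103, "timed out - no response", TARGET_WU_ERROR, 90.0, 30000.0),
    Result(104, "aborted by user", "", 90.0, 30000.0),         # different workunit state
]
granted, skipped = split_for_granting(sample)
print("granted:", granted, "skipped (no host stats):", skipped)
```

Keeping the "no host statistics" cases in a separate list makes it easy to see which eligible results were left without credit.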

Profile Beyond
Joined: 2 Nov 16
Posts: 50
Credit: 44,372,499
RAC: 0
United States
Message 1470 - Posted: 4 Jan 2019, 16:31:37 UTC - in response to Message 1467.

Thanks valterc! It's also quite possible that some of those "timed out - no response" are still running unless the server sends an abort message...

Profile valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 624
Credit: 34,677,535
RAC: 1
Italy
Message 1471 - Posted: 4 Jan 2019, 18:53:46 UTC - in response to Message 1470.

Yep, thanks, I also cancelled, server side, those workunits. Will check if it worked.

Profile Beyond
Joined: 2 Nov 16
Posts: 50
Credit: 44,372,499
RAC: 0
United States
Message 1473 - Posted: 5 Jan 2019, 15:35:09 UTC - in response to Message 1471.

It worked and it looks like they're being awarded credit. Very nice:

http://gene.disi.unitn.it/test/workunit.php?wuid=17810256

Profile Conan
Joined: 6 Sep 15
Posts: 13
Credit: 7,885,837
RAC: 0
Australia
Message 1475 - Posted: 5 Jan 2019, 19:55:34 UTC - in response to Message 1466.

Well I guess then that my 141 hours and stuck at 73% for over 2 days, has a way to go.
Still taking up a full core so letting it run. Would be nice to get many, many thousands of credit for it (ha, ha).

Conan

Conan, looks like you're running this WU:

WU=17810575

Judging from the speed of the one machine that completed it vs. your machine I'd say you're looking at around 10-11 days total for this WU. It should complete though if your computer doesn't do something untoward (such as reboot).


Beyond, yes that is the one.

It has passed 200 hours now still at 73% and last checkpoint was at 24 hours.

But still running and using a full core so we will see how it goes, must finish soon?

Conan

Profile Beyond
Joined: 2 Nov 16
Posts: 50
Credit: 44,372,499
RAC: 0
United States
Message 1476 - Posted: 6 Jan 2019, 15:46:33 UTC - in response to Message 1475.

Conan, looks like you're running this WU: WU=17810575
Judging from the speed of the one machine that completed it vs. your machine I'd say you're looking at around 10-11 days total for this WU. It should complete though if your computer doesn't do something untoward (such as reboot).

Beyond, yes that is the one.
It has passed 200 hours now still at 73% and last checkpoint was at 24 hours.
But still running and using a full core so we will see how it goes, must finish soon?

I'm guessing 240-264 hours. Let's start a pool... ;-)
I'll take 252 hours.

Profile Conan
Joined: 6 Sep 15
Posts: 13
Credit: 7,885,837
RAC: 0
Australia
Message 1477 - Posted: 7 Jan 2019, 1:34:10 UTC - in response to Message 1476.

It is now at 232 hours, still at 73.00%.

I will have a go at 270 hours as my computers are not the quickest.

Conan

Profile Conan
Joined: 6 Sep 15
Posts: 13
Credit: 7,885,837
RAC: 0
Australia
Message 1478 - Posted: 7 Jan 2019, 21:01:37 UTC - in response to Message 1477.

Well Beyond, it has passed 252 Hours and is still running and still at 73.00%.
Care for another guess?

Conan

Profile Beyond
Joined: 2 Nov 16
Posts: 50
Credit: 44,372,499
RAC: 0
United States
Message 1479 - Posted: 7 Jan 2019, 21:38:41 UTC - in response to Message 1478.

Well Beyond, it has passed 252 Hours and is still running and still at 73.00%.
Care for another guess?

Conan

264 hours because 2+4=6 and 6+6=12 and 2x6=12 and 2x12x12-2x12=264. Figure out that reasoning... ;-)

Profile Conan
Joined: 6 Sep 15
Posts: 13
Credit: 7,885,837
RAC: 0
Australia
Message 1480 - Posted: 8 Jan 2019, 21:55:48 UTC - in response to Message 1479.

Well Beyond, it has passed 252 Hours and is still running and still at 73.00%.
Care for another guess?

Conan

264 hours because 2+4=6 and 6+6=12 and 2x6=12 and 2x12x12-2x12=264. Figure out that reasoning... ;-)


Well that reasoning is pretty good but alas it is also out, as well as my 270 hours (with no reasoning).
It has now passed 277 hours but has moved to 83.5% so we have progress.

Maybe 200 hours?

Conan

Profile Beyond
Joined: 2 Nov 16
Posts: 50
Credit: 44,372,499
RAC: 0
United States
Message 1481 - Posted: 9 Jan 2019, 6:53:23 UTC - in response to Message 1480.
Last modified: 9 Jan 2019, 7:00:08 UTC

293 hours because it's prime.
Edit: just looked, think it was between 281 and 282 hours. 281 is prime too...
You received > 7,000 credits. Cool. Wonder why the first guy who finished it didn't get credits?

Profile valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 624
Credit: 34,677,535
RAC: 1
Italy
Message 1482 - Posted: 9 Jan 2019, 10:02:36 UTC - in response to Message 1481.

293 hours because it's prime.
Edit: just looked, think it was between 281 and 282 hours. 281 is prime too...
You received > 7,000 credits. Cool. Wonder why the first guy who finished it didn't get credits?

It didn't validate correctly; nevertheless, I just assigned credits.

Profile Beyond
Joined: 2 Nov 16
Posts: 50
Credit: 44,372,499
RAC: 0
United States
Message 1484 - Posted: 9 Jan 2019, 18:27:24 UTC - in response to Message 1482.

You have got to be one of the nicest, most conscientious admins in BoincLand.

Copyright © 2024 CNR-TN & UniTN