Tasks won't suspend
valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 616
Credit: 34,514,943
RAC: 340
Italy
Message 119 - Posted: 20 Dec 2013, 17:14:14 UTC
Last modified: 20 Dec 2013, 17:27:45 UTC

Another general problem is that tasks don't actually suspend after a user request. I have only checked this behavior on an x64 platform, but it seems to be platform independent. The BOINC client shows the application as 'suspended', but if you check the task manager you may see that it is still running.

Remember that there are several ways for a task to be 'suspended' (and we have to check them all):
- user side: switching off the computer, stopping the BOINC manager, pressing the suspend button for a single task or for the corresponding project
- BOINC client: running benchmarks, or just stopping the task for its own scheduling reasons
(There may also be some behavioral changes if a user selects the 'Leave application in memory while suspended' checkbox in the manager.)

I don't know exactly how the client tells an application to stop; it should be some shared-memory message. I also found this on the web:

If an application quits with a non-zero code (be it with exit,
boinc_exit, or boinc_finish), the task is marked as having a
computation error and it's reported back to the server.

If an application quits with boinc_finish(0), the task is marked as
successfully finished and the result is uploaded.

If an application quits with a zero code, but doesn't create the file
saying it finished successfully (which is what boinc_finish does), the
BOINC client will restart it. But if BOINC has to restart the
application more than a hundred times, it will be aborted with
ERR_TOO_MANY_EXITS.
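
The three cases quoted above can be written down as a small decision function. This is only a sketch of the rules as the quoted text states them, not code from the BOINC client: the names Outcome, classify_exit and kMaxUnfinishedExits are hypothetical, and the limit of 100 is taken from the "more than a hundred times" wording.

```cpp
// Hypothetical names throughout: this mirrors the quoted description,
// it is not taken from the BOINC client source.
enum class Outcome { ComputationError, Success, Restart, TooManyExits };

// "more than a hundred times" in the quoted text.
constexpr int kMaxUnfinishedExits = 100;

// exit_code:        process exit status
// finish_file:      true if boinc_finish() left its "finished" marker file
// unfinished_exits: zero-code exits without that file, counting this one
constexpr Outcome classify_exit(int exit_code, bool finish_file,
                                int unfinished_exits) {
    return exit_code != 0 ? Outcome::ComputationError
         : finish_file ? Outcome::Success
         : unfinished_exits > kMaxUnfinishedExits ? Outcome::TooManyExits
         : Outcome::Restart;
}
```

Read this way, a task that should continue later has to leave with a zero code and without calling boinc_finish, and it should not do so in a tight loop.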


Also, enabling the following log flags on the client side (inside the log_flags section of cc_config.xml) may be helpful:

app_msg_receive: Shared-memory messages received from applications.

app_msg_send: Shared-memory messages sent to applications.

benchmark_debug: Debugging information about CPU benchmarks (new in 5.8).

checkpoint_debug: Show when applications checkpoint.
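
For reference, a cc_config.xml fragment that switches on these four kinds of logging might look like the one below. The names app_msg_receive and app_msg_send are the tags that show up in the client log; benchmark_debug and checkpoint_debug are, as far as I know, the standard BOINC flag names for the other two descriptions. Treat it as a sketch and merge it into any existing file rather than overwriting it.

```xml
<cc_config>
  <log_flags>
    <!-- shared-memory messages received from applications -->
    <app_msg_receive>1</app_msg_receive>
    <!-- shared-memory messages sent to applications -->
    <app_msg_send>1</app_msg_send>
    <!-- debugging information about CPU benchmarks -->
    <benchmark_debug>1</benchmark_debug>
    <!-- show when applications checkpoint -->
    <checkpoint_debug>1</checkpoint_debug>
  </log_flags>
</cc_config>
```

The client has to re-read its configuration (or be restarted) before the new flags take effect.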

valterc
Message 121 - Posted: 20 Dec 2013, 17:20:24 UTC - in response to Message 119.
Last modified: 20 Dec 2013, 17:28:02 UTC

I enabled the logging and got the following (the messages keep repeating, of course):

12/20/2013 6:19:10 PM | TN-Grid Test Platform | [app_msg_receive] got msg from slot 3: 1.156547e+003 0.000000e+000 6.100000e-001

Clicked suspend
12/20/2013 6:23:11 PM | TN-Grid Test Platform | task Expansion_At_work1386239342.xml_pn19465_1 suspended by user
12/20/2013 6:23:11 PM | TN-Grid Test Platform | [app_msg_receive] got msg from slot 3: 1.391063e+003 0.000000e+000 7.300000e-001
12/20/2013 6:23:11 PM | | [app_msg_send] sent to Expansion_At_work1386239342.xml_pn19465_1
12/20/2013 6:23:12 PM | TN-Grid Test Platform | [app_msg_receive] got msg from slot 3: 1.393078e+003 0.000000e+000 7.400000e-001

Clicked resume
12/20/2013 6:24:03 PM | TN-Grid Test Platform | task Expansion_At_work1386239342.xml_pn19465_1 resumed by user
12/20/2013 6:24:03 PM | TN-Grid Test Platform | [app_msg_receive] got msg from slot 3: 1.444453e+003 0.000000e+000 7.600000e-001
12/20/2013 6:24:03 PM | | [app_msg_send] sent to Expansion_At_work1386239342.xml_pn19465_1
12/20/2013 6:24:03 PM | TN-Grid Test Platform | Resuming task Expansion_At_work1386239342.xml_pn19465_1 using gene version 1 in slot 3
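
Each [app_msg_receive] line carries three numbers in exponent notation, which in a raw paste can end up fused together with no separator. The sketch below splits such a payload and compares the first value of two messages. It assumes every number ends in 'e', a sign and exactly three exponent digits, and it assumes the first value is the task's CPU time in seconds (the usual layout of BOINC status messages, but not confirmed in this thread). split_payload and first_value_delta are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Split "1.391063e+0030.000000e+0007.300000e-001" (fused) or the same
// numbers separated by spaces into doubles. Assumes 3-digit exponents.
std::vector<double> split_payload(const std::string& s) {
    std::vector<double> out;
    std::size_t start = 0;
    while (start < s.size()) {
        while (start < s.size() && s[start] == ' ') ++start;  // skip spaces
        if (start >= s.size()) break;
        std::size_t e = s.find('e', start);
        if (e == std::string::npos || e + 5 > s.size()) break;  // malformed tail
        std::size_t end = e + 5;  // 'e' + sign + three exponent digits
        out.push_back(std::strtod(s.substr(start, end - start).c_str(), nullptr));
        start = end;
    }
    return out;
}

// Difference between the first value of two payloads; 0.0 if either is empty.
double first_value_delta(const std::string& before, const std::string& after) {
    std::vector<double> a = split_payload(before);
    std::vector<double> b = split_payload(after);
    if (a.empty() || b.empty()) return 0.0;
    return b[0] - a[0];
}
```

Applied to the log above, the first value goes from 1393.078 at 6:23:12 PM to 1444.453 at 6:24:03 PM, a difference of about 51.4 over 51 seconds of wall time. If that value really is CPU seconds, the task kept computing the whole time it was shown as suspended.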

valterc
Message 122 - Posted: 20 Dec 2013, 18:24:10 UTC - in response to Message 121.
Last modified: 20 Dec 2013, 18:25:15 UTC

I have some ideas. Did you use void boinc_get_status(BOINC_STATUS*)? See http://boinc.berkeley.edu/trac/wiki/StatusApi

BOINC_STATUS boinc_status;
boinc_get_status(&boinc_status);
if (boinc_status.quit_request || boinc_status.abort_request || boinc_status.suspended) {
    // do things
    // boinc_end_critical_section();
    // boinc_finish(0);
}

NadirS [SSC11]
Project developer
Project tester
Project scientist
Joined: 20 Nov 13
Posts: 13
Credit: 133,573
RAC: 0
Italy
Message 126 - Posted: 20 Dec 2013, 22:24:00 UTC - in response to Message 122.

At the moment we are not catching this state. We will look into it using this API.

luca@gene [SSC11]
Project developer
Project tester
Project scientist
Joined: 20 Nov 13
Posts: 3
Credit: 205,242
RAC: 0
Italy
Message 133 - Posted: 21 Dec 2013, 12:11:40 UTC - in response to Message 122.

I have some ideas. Did you use void boinc_get_status(BOINC_STATUS*)? See http://boinc.berkeley.edu/trac/wiki/StatusApi
BOINC_STATUS boinc_status;
boinc_get_status(&boinc_status);
if (boinc_status.quit_request || boinc_status.abort_request || boinc_status.suspended) {
    // do things
    // boinc_end_critical_section();
    // boinc_finish(0);
}

Should we treat the three different statuses in the same way?

Does boinc_finish(0) imply that the file is uploaded to the server?

Wouldn't it be better to use a different finish code?
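
One possible way to keep the three flags apart is sketched below. It uses a stand-in struct so that it compiles on its own; in the real application the flags would come from BOINC_STATUS as filled in by boinc_get_status. StatusFlags, Action and next_action are hypothetical names, and the choice of action for each flag is an assumption based on the exit-code rules quoted in the first post, not something confirmed in this thread.

```cpp
// Stand-in for the three BOINC_STATUS fields discussed here (hypothetical type).
struct StatusFlags {
    int suspended;
    int quit_request;
    int abort_request;
};

// Hypothetical action names, for illustration only.
enum class Action { KeepComputing, PauseAndPoll, CheckpointAndExit, AbortWithError };

// Pick one action per poll; abort wins over quit, quit wins over suspend.
constexpr Action next_action(const StatusFlags& s) {
    return s.abort_request ? Action::AbortWithError
         : s.quit_request ? Action::CheckpointAndExit
         : s.suspended ? Action::PauseAndPoll
         : Action::KeepComputing;
}
```

Under these assumptions, PauseAndPoll would mean sleeping briefly and reading the status again, without leaving the process. CheckpointAndExit would mean saving state and exiting with a zero code but without boinc_finish, so that the client restarts the task later. AbortWithError would mean exiting with a non-zero code. None of the three would call boinc_finish(0), since according to the quoted text that marks the task as successfully finished and uploads the result.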

valterc
Message 135 - Posted: 21 Dec 2013, 12:29:39 UTC - in response to Message 133.

I don't know... I have just five minutes before I'm forced to leave the office, and I will have a very limited network connection for the next week. Maybe we can try to arrange a meeting with the people from the server and application groups on 30 Dec in the afternoon. If you want to contact me, please use my name.familyname gmail.com address.

(Best wishes to everyone!)

marco giglio
Joined: 12 Nov 13
Posts: 20
Credit: 1,708
RAC: 0
Italy
Message 155 - Posted: 23 Dec 2013, 17:59:33 UTC

Wow, it seems that app 0.02 had the side effect of solving another issue!
Up to version 0.01 our app took 100% of the processor even if the user setting was different. It seems to me that the new version solves this issue: my processor is running at 60%, as desired.




Copyright © 2024 CNR-TN & UniTN