Completed, marked as invalid cases

Message boards : Number crunching : Completed, marked as invalid cases

Profile AnandBhat
Joined: 14 Feb 22
Posts: 6
Credit: 1,056,897
RAC: 14
Australia
Message 2640 - Posted: 26 Apr 2022, 7:10:35 UTC

I've noticed that about a dozen of my tasks have failed validation ("Completed, marked as invalid"). They ran on stable, non-overclocked clients. In every case, the task that was marked as invalid had been stopped and then restarted from a checkpoint. E.g.,
Task 72335483 - Start from checkpoint: 266
Task 73023296 - Start from checkpoint: 600

Is there a known problem where tasks that start from a checkpoint fail to produce valid results?

I also noticed this in another BOINC project (although the client that produced the invalid result from a saved checkpoint wasn't mine). Is checkpointing handled differently by each BOINC project, or is it standard functionality offered by the BOINC platform?

Profile valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 619
Credit: 34,528,135
RAC: 1,223
Italy
Message 2641 - Posted: 26 Apr 2022, 10:14:19 UTC - in response to Message 2640.
Last modified: 26 Apr 2022, 10:15:11 UTC

We actually do have a mysterious bug, present since the very beginning of the project, that may invalidate the result in some cases. We figured out that it is related to checkpointing, but it usually happens when a task is stopped and restarted at the very beginning, i.e. before the first checkpoint. So both of your cases are strange, the second one in particular: every workunit of the current type is made up of 294 chunks, and a checkpoint is written to disk only when a chunk has been computed, so there shouldn't be a checkpoint #600.

As for the second question, BOINC has no built-in way to handle checkpoints, only some suggestions. When writing an application, you have to figure out whether checkpointing is possible (in some cases it isn't, depending on the algorithm) and write the checkpoint code yourself (saving and restoring the computational "state").
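To illustrate what "saving and restoring the computational state" can look like, here is a minimal Python sketch of chunk-based checkpointing. It is not the project's actual code: the file layout, the function names (`process_chunk`, `save_checkpoint`, `load_checkpoint`, `run`) and the running-sum "state" are all assumptions made for illustration. Only the figure of 294 chunks comes from this thread. The sketch writes a checkpoint after each completed chunk and, on restart, ignores a checkpoint whose chunk index is out of range (such as the #600 mentioned above).

```python
import json
import os
import tempfile
from typing import Optional, Tuple

TOTAL_CHUNKS = 294  # chunk count quoted in this thread for the current workunit type


def process_chunk(index: int) -> int:
    """Hypothetical stand-in for the real per-chunk computation."""
    return index * index


def save_checkpoint(path: str, next_chunk: int, partial: int) -> None:
    """Write the state to a temporary file, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump({"next_chunk": next_chunk, "partial": partial}, handle)
        handle.flush()
        os.fsync(handle.fileno())
    # os.replace is atomic, so a reader never sees a half-written checkpoint.
    os.replace(tmp_path, path)


def load_checkpoint(path: str) -> Tuple[int, int]:
    """Return (next_chunk, partial); restart from zero if the file is missing or implausible."""
    try:
        with open(path) as handle:
            state = json.load(handle)
        next_chunk = int(state["next_chunk"])
        partial = int(state["partial"])
    except (OSError, ValueError, KeyError, TypeError):
        return 0, 0
    # A chunk index outside 0..TOTAL_CHUNKS cannot be a valid resume point.
    if not 0 <= next_chunk <= TOTAL_CHUNKS:
        return 0, 0
    return next_chunk, partial


def run(path: str, stop_after: Optional[int] = None) -> Optional[int]:
    """Process chunks from the last checkpoint; stop_after simulates an interruption."""
    next_chunk, partial = load_checkpoint(path)
    for index in range(next_chunk, TOTAL_CHUNKS):
        if stop_after is not None and index >= stop_after:
            return None  # interrupted before finishing
        partial += process_chunk(index)
        save_checkpoint(path, index + 1, partial)
    return partial
```

Calling `run("ckpt.json", stop_after=100)` and then `run("ckpt.json")` should give the same final value as an uninterrupted run. That is the property a validator effectively checks when it compares results from different hosts.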

Profile AnandBhat
Joined: 14 Feb 22
Posts: 6
Credit: 1,056,897
RAC: 14
Australia
Message 2642 - Posted: 26 Apr 2022, 12:25:04 UTC - in response to Message 2641.

Thanks, I'll keep a look out for any such cases in the future.

arcturus
Joined: 18 May 22
Posts: 17
Credit: 5,806,368
RAC: 0
United States
Message 2692 - Posted: 4 Jun 2022, 13:01:46 UTC

Approximately 5% of all my submitted WUs are coming back as invalid, with checkpoints all over the spectrum. The same hosts running another project come back at virtually 0% invalid. All hosts run around 12 hrs/day. Is this checkpoint bug somehow being aggravated by shutting my hosts down, and is this failure rate consistent with other hosts operating part time?


Copyright © 2024 CNR-TN & UniTN