Curious

Message boards : Number crunching : Curious

iFoggz
Joined: 16 Mar 17
Posts: 5
Credit: 2,338,439
RAC: 0
Message 2356 - Posted: 8 Aug 2021, 17:09:36 UTC

If your work unit is marked invalid, is there any information that shows for sure that it was invalid?

I checked the task details, but the ones for my invalid tasks looked the same as everyone else's.

The task details don't show any helpful data. For example, all 3 replications ended with the same number (though in the past I've seen that number differ on a few invalids). It would be nice to know if it's some instability in my cluster, etc.

ex:
Start @ Sat Jul 31 13:08:11 2021
|156224256|
Finish @ Sun Aug 1 12:33:07 2021

Just curious how that part works :)

valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 623
Credit: 34,677,535
RAC: 132
Italy
Message 2357 - Posted: 9 Aug 2021, 11:22:12 UTC - in response to Message 2356.
Last modified: 18 Aug 2021, 10:21:31 UTC

We use "redundancy" (two results from different computers must be exactly the same) to check whether a workunit is "valid" (a successful computation). If they differ, a third copy of the workunit is sent to another computer. In the end, if two results are identical they are declared "OK" and all the others are marked "invalid".
So an "invalid" is, briefly, a computation that reached its normal end but with something wrong in it, usually incorrect numeric calculations. There are many possible reasons for getting an "invalid", such as overheating or a faulty hardware component. In this case I suggest running a stress test, like Prime95, on your computer and checking the results.
There is also a known bug in our code that may flag a result as invalid; this may happen if you stop your calculation at the very beginning, before the first checkpoint.
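
The redundancy check described above can be sketched in a few lines of Python. This is only an illustration of the idea (two identical results form a quorum, everything else is marked invalid), not the project's actual validator; the function name `check_quorum`, the host labels and the result strings are all made up for the example.

```python
from collections import Counter

def check_quorum(results, quorum=2):
    """Toy model of redundancy-based validation (an assumption, not TN-Grid's code).

    results maps a host label to the output it returned for one workunit.
    Returns (canonical_output, statuses), or (None, {}) when no output has
    been returned by at least `quorum` hosts yet, meaning another copy of
    the workunit would have to be sent out.
    """
    if not results:
        return None, {}
    counts = Counter(results.values())
    output, votes = counts.most_common(1)[0]
    if votes < quorum:
        return None, {}
    statuses = {
        host: "valid" if value == output else "invalid"
        for host, value in results.items()
    }
    return output, statuses

# Two hosts disagree: no quorum, so a third copy is needed.
first_round = check_quorum({"host_a": "156224256", "host_b": "156224255"})

# The third result matches host_a: those two are valid, host_b is invalid.
second_round = check_quorum(
    {"host_a": "156224256", "host_b": "156224255", "host_c": "156224256"}
)
```

Note that a result marked "invalid" in this model simply lost the comparison; the comparison alone does not say which part of the computation went wrong.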

Speedy
Joined: 13 Nov 21
Posts: 33
Credit: 1,020,742
RAC: 0
New Zealand
Message 2612 - Posted: 2 Apr 2022, 21:54:37 UTC

I am interested to know whether or not the current project that is running Homo Sapiens (OneGenE - FANTOM-1) has an estimated completion date?

AnandBhat
Joined: 14 Feb 22
Posts: 6
Credit: 1,056,897
RAC: 0
Australia
Message 2613 - Posted: 3 Apr 2022, 6:07:56 UTC - in response to Message 2612.

From the science stats page,

H. sapiens (α=0.05, FANTOM-1)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| # genes/isoforms | Queued | Executed       | Last 10 days |
| 87554            | 61490  | 59986 (68.51%) | 88.80/day    |
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

This indicates there are (87554 - 59986) / 88.80 = 310.45 days remaining, which gives February 7th, 2023 as the estimated completion date at the current rate.
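
For anyone who wants to redo this estimate later with fresh numbers, here is a small Python sketch of the same arithmetic. The function name `estimate_completion` is just something I made up; the figures are the ones from the table above, and the start date is the date of this post.

```python
from datetime import date, timedelta

def estimate_completion(total, executed, per_day, today):
    """Return (days_left, eta) assuming the rate stays constant."""
    remaining = total - executed
    days_left = remaining / per_day
    # Only whole days are added to the calendar date.
    eta = today + timedelta(days=int(days_left))
    return days_left, eta

days_left, eta = estimate_completion(87554, 59986, 88.80, date(2022, 4, 3))
print(f"{days_left:.2f} days left, ETA {eta.isoformat()}")
# prints: 310.45 days left, ETA 2023-02-07
```

The estimate is only as good as the assumption that the "Last 10 days" rate holds, so it will move whenever that rate changes.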

Speedy
Joined: 13 Nov 21
Posts: 33
Credit: 1,020,742
RAC: 0
New Zealand
Message 2614 - Posted: 3 Apr 2022, 6:51:30 UTC - in response to Message 2613.

Thank you. I was aware that was there; I just wasn't sure how to work out the end date.

Buro87 [Lombardia]
Joined: 23 Nov 16
Posts: 100
Credit: 4,000,541
RAC: 0
Italy
Message 2632 - Posted: 19 Apr 2022, 7:16:05 UTC - in response to Message 2612.

I am interested to know whether or not the current project that is running Homo Sapiens (OneGenE - FANTOM-1) has an estimated completion date?

Valter just added an "ETA" column to the Science Status page :) https://gene.disi.unitn.it/test/gene_science.php

Speedy
Joined: 13 Nov 21
Posts: 33
Credit: 1,020,742
RAC: 0
New Zealand
Message 2634 - Posted: 19 Apr 2022, 22:57:10 UTC - in response to Message 2632.

Thank you, I am aware of this.

Aurum
Joined: 18 Jul 18
Posts: 97
Credit: 291,386,295
RAC: 0
United States
Message 2694 - Posted: 4 Jun 2022, 13:41:55 UTC

Recently the "Last 10 Days" value (I'm not sure what the units are, maybe genes/day) has dropped from 88 to 66. Server status indicates there are still a lot of computers working here. Time to complete WUs is still around 4 hours. I've long thought it'd be nice to have a long-term chart to understand the change in genes/day.
http://gene.disi.unitn.it/test/gene_science.php

Speedy
Joined: 13 Nov 21
Posts: 33
Credit: 1,020,742
RAC: 0
New Zealand
Message 2695 - Posted: 4 Jun 2022, 22:22:43 UTC - in response to Message 2694.

Maybe this link https://www.boincstats.com/stats/150/project/detail/overview will help you

Speedy
Joined: 13 Nov 21
Posts: 33
Credit: 1,020,742
RAC: 0
New Zealand
Message 2696 - Posted: 5 Jun 2022, 8:42:22 UTC

I haven't had any in a while but occasionally I will receive a task that runs for under 1 hour (Ryzen 9 3900X). I am guessing these are just shorter tasks. Has anybody else noticed the shorter tasks?

Aurum
Joined: 18 Jul 18
Posts: 97
Credit: 291,386,295
RAC: 0
United States
Message 2697 - Posted: 5 Jun 2022, 10:10:35 UTC

Sure, I've seen a number that ran around an hour. Right now I have plenty that are running around 2.5 hours.
I have 31 completed WUs that won't upload. They're all files under 10 KB and Retrying never sends them up. At some point they do seem to have gone up, but the list has been growing longer each day for the last week.
My Validation Pending has grown to over 5000 when normally it's well under 2000, which seems odd since they finish faster.
Just curious.

valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 623
Credit: 34,677,535
RAC: 132
Italy
Message 2698 - Posted: 5 Jun 2022, 13:37:15 UTC - in response to Message 2696.

I haven't had any in a while but occasionally I will receive a task that runs for under 1 hour (Ryzen 9 3900X). I am guessing these are just shorter tasks. Has anybody else noticed the shorter tasks?

The last workunit of every gene expansion batch is shorter than the others; it is the one containing "wu-294" in its name.
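
If you want to spot these in a task list, a quick Python sketch like the following would do. It assumes task names follow the pattern seen in upload messages in this thread (for example `208612_Hs_T059718-LRRC46_wu-122_1654480886442_1_0`); the regular expression, the function names and the second task name in the example are my own illustration, and 294 is taken from the statement above.

```python
import re

# Assumed naming pattern: "..._wu-<index>_<timestamp>_..."
WU_INDEX_RE = re.compile(r"_wu-(\d+)_")

def wu_index(task_name):
    """Return the workunit index embedded in a task name, or None."""
    match = WU_INDEX_RE.search(task_name)
    return int(match.group(1)) if match else None

def is_short_last_wu(task_name, last_index=294):
    """True when the task is the last (shorter) workunit of its batch."""
    return wu_index(task_name) == last_index

regular = "208612_Hs_T059718-LRRC46_wu-122_1654480886442_1_0"
# Hypothetical name, built only to show a match.
last = "208612_Hs_T059718-LRRC46_wu-294_1654480886442_1_0"
```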

valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 623
Credit: 34,677,535
RAC: 132
Italy
Message 2699 - Posted: 5 Jun 2022, 13:42:07 UTC - in response to Message 2697.

Sure, I've seen a number that ran around an hour. Right now I have plenty that are running around 2.5 hours.
I have 31 completed WUs that won't upload. They're all files under 10 KB and Retrying never sends them up. At some point they do seem to have gone up, but the list has been growing longer each day for the last week.
My Validation Pending has grown to over 5000 when normally it's well under 2000, which seems odd since they finish faster.
Just curious.

On the server status page there is a higher than usual number of "tasks in progress". I will check tomorrow, when I'm back in the office, whether there is something strange on the server.

Aurum
Joined: 18 Jul 18
Posts: 97
Credit: 291,386,295
RAC: 0
United States
Message 2700 - Posted: 5 Jun 2022, 16:47:16 UTC - in response to Message 2698.

I haven't had any in a while but occasionally I will receive a task that runs for under 1 hour (Ryzen 9 3900X). I am guessing these are just shorter tasks. Has anybody else noticed the shorter tasks?

The last workunit of every gene expansion batch is shorter than the others; it is the one containing "wu-294" in its name.

Bingo! I perused my list of running WUs and the one that's running will finish in under an hour.

Aurum
Joined: 18 Jul 18
Posts: 97
Credit: 291,386,295
RAC: 0
United States
Message 2701 - Posted: 5 Jun 2022, 17:02:08 UTC - in response to Message 2699.

On the server status page there is a higher than usual number of "tasks in progress". I will check tomorrow, when I'm back in the office, whether there is something strange on the server.

That does seem high. Does it include the Ready To Start WUs as well?
Every few days my Ready To Start WUs accumulate to almost 300 and I switch preferences to Resource Zero Mode. TN-GRID works really well in RZM and never seems to give me more than one extra WU waiting in the wings. But at Resource 100% it does not seem to honor the BOINC preference for how much work to buffer. All my computers are set to either 0.5 or 1.0 days, but you send more than that. I believe some projects limit the maximum number of WUs to twice the number of CPU threads, and GPUgrid limits it to twice the number of GPUs.
It's been a good while since I've noticed the server running out of available WUs. Nice work tuning it up.

Keith Myers
Joined: 26 Jun 20
Posts: 64
Credit: 15,299,594
RAC: 0
United States
Message 2703 - Posted: 6 Jun 2022, 6:36:12 UTC

I have two finished tasks that I haven't been able to get uploaded to the server all day for some reason.

Tried all the tricks that I know, but nothing has worked. Just keep getting retried. Counts are 11 and 14 attempts so far.

Dave J
Joined: 3 Mar 22
Posts: 2
Credit: 65,600
RAC: 0
United Kingdom
Message 2704 - Posted: 6 Jun 2022, 14:32:45 UTC - in response to Message 2703.
Last modified: 6 Jun 2022, 14:37:24 UTC

I have two finished tasks that I haven't been able to get uploaded to the server all day for some reason.

Tried all the tricks that I know, but nothing has worked. Just keep getting retried. Counts are 11 and 14 attempts so far.


I have one too. It is at 100% and then
Mon 06 Jun 2022 15:29:49 BST | TN-Grid Platform | Temporarily failed upload of 208612_Hs_T059718-LRRC46_wu-122_1654480886442_1_0: transient HTTP error


Interestingly, my tasks page is showing only six tasks waiting for validation. The validated ones that were listed there seem to have gone. Also, it is only showing one task in progress, as opposed to the 8 that are actually running.

valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 623
Credit: 34,677,535
RAC: 132
Italy
Message 2705 - Posted: 6 Jun 2022, 16:25:59 UTC - in response to Message 2704.

OK, I stopped and restarted the server, checked the databases with mysqlcheck, did a consistency check on the file system, deleted a bunch of zero-byte files inside the upload directory and looked for weird errors inside the logs.
I didn't find anything that looked particularly "strange", however.
I will monitor the situation and try something else tomorrow.

Keith Myers
Joined: 26 Jun 20
Posts: 64
Credit: 15,299,594
RAC: 0
United States
Message 2706 - Posted: 6 Jun 2022, 16:46:24 UTC - in response to Message 2705.
Last modified: 6 Jun 2022, 16:50:02 UTC

OK, I stopped and restarted the server, checked the databases with mysqlcheck, did a consistency check on the file system, deleted a bunch of zero-byte files inside the upload directory and looked for weird errors inside the logs.
I didn't find anything that looked particularly "strange", however.
I will monitor the situation and try something else tomorrow.

I still am unable to upload my two stalled tasks.

Set http_xfer_debug and got this.

Mon 06 Jun 2022 09:42:57 AM PDT | TN-Grid Platform | Temporarily failed upload of 208508_Hs_T004877-MARCH8_wu-86_1654367869573_1_0: transient HTTP error
Mon 06 Jun 2022 09:42:57 AM PDT | TN-Grid Platform | Backing off 05:14:00 on upload of 208508_Hs_T004877-MARCH8_wu-86_1654367869573_1_0
Mon 06 Jun 2022 09:42:58 AM PDT | | [http_xfer] [ID#0] HTTP: wrote 2415 bytes
Mon 06 Jun 2022 09:42:58 AM PDT | | [http_xfer] [ID#0] HTTP: wrote 2542 bytes
Mon 06 Jun 2022 09:42:58 AM PDT | | [http_xfer] [ID#0] HTTP: wrote 2808 bytes
Mon 06 Jun 2022 09:42:58 AM PDT | | [http_xfer] [ID#0] HTTP: wrote 3113 bytes
Mon 06 Jun 2022 09:42:58 AM PDT | | [http_xfer] [ID#0] HTTP: wrote 2888 bytes
Mon 06 Jun 2022 09:42:58 AM PDT | | [http_xfer] [ID#0] HTTP: wrote 1278 bytes
Mon 06 Jun 2022 09:42:58 AM PDT | | Internet access OK - project servers may be temporarily down.

And with http_debug got this:

Mon 06 Jun 2022 09:47:59 AM PDT | TN-Grid Platform | [http] [ID#4397] Info: Recv failure: Connection reset by peer
Mon 06 Jun 2022 09:47:59 AM PDT | TN-Grid Platform | [http] [ID#4397] Info: Closing connection 9062
Mon 06 Jun 2022 09:47:59 AM PDT | TN-Grid Platform | [http] HTTP error: Failure when receiving data from the peer
Mon 06 Jun 2022 09:47:59 AM PDT | | Project communication failed: attempting access to reference site
Mon 06 Jun 2022 09:47:59 AM PDT | | [http] HTTP_OP::init_get(): https://www.google.com/
Mon 06 Jun 2022 09:47:59 AM PDT | TN-Grid Platform | Temporarily failed upload of 208508_Hs_T004877-MARCH8_wu-86_1654367869573_1_0: transient HTTP error
Mon 06 Jun 2022 09:47:59 AM PDT | TN-Grid Platform | Backing off 05:16:40 on upload of 208508_Hs_T004877-MARCH8_wu-86_1654367869573_1_0
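
For anyone comparing notes on stalled uploads, the back-off lines in client logs like the ones above can be pulled out with a short Python script. This is only a sketch: the regular expression is written against the message wording shown in this post, the function name `upload_backoffs` is made up, and other client versions may word the message differently.

```python
import re

# Matches e.g. "Backing off 05:14:00 on upload of <file name>"
BACKOFF_RE = re.compile(r"Backing off (\d+):(\d{2}):(\d{2}) on upload of (\S+)")

def upload_backoffs(log_lines):
    """Map each stalled upload to its most recent back-off, in seconds."""
    latest = {}
    for line in log_lines:
        match = BACKOFF_RE.search(line)
        if match:
            hours, minutes, seconds = (int(x) for x in match.group(1, 2, 3))
            # Later lines overwrite earlier ones for the same file.
            latest[match.group(4)] = hours * 3600 + minutes * 60 + seconds
    return latest

sample = [
    "Mon 06 Jun 2022 09:42:57 AM PDT | TN-Grid Platform | Backing off 05:14:00 on upload of 208508_Hs_T004877-MARCH8_wu-86_1654367869573_1_0",
    "Mon 06 Jun 2022 09:47:59 AM PDT | TN-Grid Platform | Backing off 05:16:40 on upload of 208508_Hs_T004877-MARCH8_wu-86_1654367869573_1_0",
]
backoffs = upload_backoffs(sample)
```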

valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 623
Credit: 34,677,535
RAC: 132
Italy
Message 2707 - Posted: 6 Jun 2022, 17:49:26 UTC - in response to Message 2706.
Last modified: 6 Jun 2022, 17:50:24 UTC

On the server side I got something like this (Apache cgi:error):

(104)Connection reset by peer: [client x.x.x.x:40986] AH01225: Error reading request entity data

BTW, I also have one task that is unable to upload... I can't figure out why...


Copyright © 2024 CNR-TN & UniTN