No wus
Profile [VENETO] boboviz
Joined: 12 Dec 13
Posts: 130
Credit: 908,753
RAC: 1,429
Italy
Message 116 - Posted: 20 Dec 2013, 15:16:16 UTC
Last modified: 20 Dec 2013, 15:16:32 UTC

The server status is

Tasks ready to send: 58; tasks in progress: 1,942

but there are no WUs for Windows (or for Linux).

Profile valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 320
Credit: 16,278,261
RAC: 4,455
Italy
Message 117 - Posted: 20 Dec 2013, 15:39:47 UTC - in response to Message 116.

I just downloaded 5 WUs (20 Dec 2013, 15:19:13 UTC).

I don't have any clue about the work generator, or about the number of workunits in this batch. Maybe someone from the server group could comment on this.

Profile [VENETO] boboviz
Joined: 12 Dec 13
Posts: 130
Credit: 908,753
RAC: 1,429
Italy
Message 118 - Posted: 20 Dec 2013, 16:10:11 UTC - in response to Message 117.

I just downloaded 5 WUs (20 Dec 2013, 15:19:13 UTC).

I don't have any clue about the work generator, or about the number of workunits in this batch. Maybe someone from the server group could comment on this.


OK, now I can download too.

Profile [VENETO] boboviz
Joined: 12 Dec 13
Posts: 130
Credit: 908,753
RAC: 1,429
Italy
Message 120 - Posted: 20 Dec 2013, 17:15:37 UTC - in response to Message 118.

Again, 0 new tasks.
But the server status says 74 are ready.
I don't understand...

Profile danicampa90
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 12 Nov 13
Posts: 28
Credit: 552
RAC: 0
Italy
Message 123 - Posted: 20 Dec 2013, 20:00:47 UTC

This seems very strange.
It does not look like a configuration issue (unless someone was stopping the project at exactly that time). In theory, if it works for one workunit it should work for all the others.
The intermittent nature of the problem makes me think it could be a server-side performance problem. I will look into that soon.

Thanks for reporting the issue.

Profile danicampa90
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 12 Nov 13
Posts: 28
Credit: 552
RAC: 0
Italy
Message 124 - Posted: 20 Dec 2013, 21:45:55 UTC

I checked the CPU time on the server and it does not look like a performance problem.
I also quickly checked the feeder and scheduler logs, and in general they didn't show any errors.
Furthermore, judging from the workunit list, other users seem to be receiving workunits correctly.
I will look at the issue in more detail tomorrow.

Profile [VENETO] boboviz
Joined: 12 Dec 13
Posts: 130
Credit: 908,753
RAC: 1,429
Italy
Message 127 - Posted: 21 Dec 2013, 7:25:22 UTC - in response to Message 124.

The queue seems to be stable today.
I'm downloading correctly on 3 machines...

Profile [VENETO] boboviz
Joined: 12 Dec 13
Posts: 130
Credit: 908,753
RAC: 1,429
Italy
Message 130 - Posted: 21 Dec 2013, 10:51:36 UTC - in response to Message 127.

The queue seems to be stable today.
I'm downloading correctly on 3 machines...


Forget it. No WUs again (and 87 ready according to the server status)...

Profile valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 320
Credit: 16,278,261
RAC: 4,455
Italy
Message 131 - Posted: 21 Dec 2013, 11:29:36 UTC - in response to Message 130.
Last modified: 21 Dec 2013, 11:40:19 UTC

Thinking about this issue...

- we started BOINC using bin/start; I didn't add anything to crontab (and I don't know whether I need to...)
- we enforced some limits (a value of 5 means 5 x the number of cores).

5
5

- there are a lot of "HOST::parse(): unrecognized:" entries in scheduler.log (maybe posting this on the BOINC developer forum could help...)
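
To see which fields the scheduler is complaining about, a small tally over scheduler.log may help. This is only a sketch: it assumes nothing beyond the marker text "HOST::parse(): unrecognized:" appearing somewhere in a line, and it treats whatever follows the marker as the thing to count. The name tally_unrecognized is made up for illustration and is not part of BOINC.

```python
from collections import Counter

# Marker text as it appears in scheduler.log according to this thread.
MARKER = "HOST::parse(): unrecognized:"


def tally_unrecognized(lines):
    """Count the text that follows MARKER on every line that contains it.

    `lines` can be any iterable of strings, e.g. an open file object.
    Lines without the marker are ignored.
    """
    counts = Counter()
    for line in lines:
        _, found, rest = line.partition(MARKER)
        if found:
            counts[rest.strip()] += 1
    return counts
```

Typical use would be something like tally_unrecognized(open("scheduler.log")).most_common(10), which lists the ten most frequent unrecognized items; the log path is an assumption and depends on the project layout.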

Profile valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 320
Credit: 16,278,261
RAC: 4,455
Italy
Message 132 - Posted: 21 Dec 2013, 12:00:10 UTC - in response to Message 131.

Just got this

2013-12-21 12:53:41.4709 [PID=26557] Request: [USER#22] [HOST#27] [IP 188.216.239.183] client 7.0.28
2013-12-21 12:53:41.4979 [PID=26557] Sending reply to [HOST#27]: 0 results, delay req 121.20
2013-12-21 12:53:41.4992 [PID=26557] Scheduler ran 0.040 seconds

This computer (one of boboviz's) has 6 cores and already has 30 tasks in progress, so it got nothing because it hit the limit (5 x 6 = 30).


***but we have to check the errors in the log***
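
To make the limit arithmetic concrete, here is a rough sketch. It assumes the in-progress limit is applied as (per-core value) x (number of cores), as described in this thread, and that reply lines in the scheduler log look like the one quoted above. The names in_progress_cap, tasks_allowed and parse_reply are illustrative only, not BOINC functions.

```python
import re

# Matches reply lines such as:
#   ... Sending reply to [HOST#27]: 0 results, delay req 121.20
REPLY_RE = re.compile(
    r"Sending reply to \[HOST#(?P<host>\d+)\]: "
    r"(?P<results>\d+) results, delay req (?P<delay>[\d.]+)"
)


def in_progress_cap(per_core_limit, cores):
    """Maximum tasks a host may hold, assuming the limit scales with cores."""
    return per_core_limit * cores


def tasks_allowed(per_core_limit, cores, in_progress):
    """How many more tasks the host could receive before hitting the cap."""
    return max(0, in_progress_cap(per_core_limit, cores) - in_progress)


def parse_reply(line):
    """Return (host_id, results_sent, delay_seconds), or None if no match."""
    m = REPLY_RE.search(line)
    if m is None:
        return None
    return int(m["host"]), int(m["results"]), float(m["delay"])


# The host from the log above: 6 cores, limit 5 per core, 30 in progress.
print(tasks_allowed(5, 6, 30))  # 0
```

Running parse_reply over the scheduler log and comparing hosts that received 0 results against their in-progress counts would show whether every empty reply is explained by this cap or whether something else is involved.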

Profile [VENETO] boboviz
Send message
Joined: 12 Dec 13
Posts: 130
Credit: 908,753
RAC: 1,429
Italy
Message 134 - Posted: 21 Dec 2013, 12:15:25 UTC - in response to Message 132.
Last modified: 21 Dec 2013, 12:17:40 UTC

This computer (one of boboviz's) has 6 cores and already has 30 tasks in progress, so it got nothing because it hit the limit (5 x 6 = 30).


I've seen the limit, and I now have 30 WUs.
But I also get the "0 units" message when I have NO WUs on my PCs...
And after 1 or 2 hours my computers start downloading again.
Is there a "time limit" on downloads (only x WUs in 24 hours, for example), like on ralph@home?

Profile valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 320
Credit: 16,278,261
RAC: 4,455
Italy
Message 136 - Posted: 21 Dec 2013, 12:32:11 UTC - in response to Message 134.

<daily_result_quota>500</daily_result_quota>

but I have to investigate further ...

(Best wishes!)

Profile danicampa90
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 12 Nov 13
Posts: 28
Credit: 552
RAC: 0
Italy
Message 137 - Posted: 21 Dec 2013, 13:10:51 UTC

I examined the log, and it seems that at the same times when boboviz didn't get any WUs, requests from other users were served correctly, so it is probably a problem with the limits.

Should we try increasing them a bit?

I would also have a look at this:
120
5
5

and see what happens....

Valter, what do you think?

Profile danicampa90
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 12 Nov 13
Posts: 28
Credit: 552
RAC: 0
Italy
Message 138 - Posted: 21 Dec 2013, 13:24:02 UTC

I want to add an approximate calculation:
Currently a workunit takes at most about 25 minutes on average. This means that each core can compute about 57 workunits/day (1440 minutes / 25 minutes = 57.6).

If you have 2 computers with i7 processors you could theoretically reach the limit of 500/day (8 virtual cores * 2 pcs * 57 WU/day > 500 WU/day).

If that's the case, increasing the run time of the workunits to about an hour may mitigate this problem.

I was planning to increase the difficulty when we get the new application and the new input files from the preprocessing group.

If you agree, I can also increase it now (to about 1 to 1.5 hours).
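
The estimate above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes every core crunches around the clock, and it compares the combined demand of both PCs against the quota of 500, the same way the calculation above does. The function names are made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60
DAILY_QUOTA = 500  # value of daily_result_quota mentioned in this thread


def wus_per_core_per_day(minutes_per_wu):
    """Workunits one core can finish in a day if it never sits idle."""
    return MINUTES_PER_DAY / minutes_per_wu


def daily_demand(cores_per_pc, pcs, minutes_per_wu):
    """Workunits per day that all cores together could consume."""
    return cores_per_pc * pcs * wus_per_core_per_day(minutes_per_wu)


# Two PCs with 8 virtual cores each, for the current and proposed run times.
for minutes in (25, 60, 90):
    demand = daily_demand(8, 2, minutes)
    print(minutes, round(demand), demand > DAILY_QUOTA)
# 25 922 True
# 60 384 False
# 90 256 False
```

With 25-minute workunits the two machines could ask for roughly 920 workunits/day (912 if 57.6 is rounded down to 57, as above), well over 500. At 60 or 90 minutes the demand drops to 384 or 256, which stays under the quota.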



Profile valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 320
Credit: 16,278,261
RAC: 4,455
Italy
Message 142 - Posted: 21 Dec 2013, 17:03:50 UTC - in response to Message 138.

IMHO:
- stay with the current limits while we are in this alpha phase
- do not increase the WU length until we have an application that a) validates correctly after being checkpointed and b) behaves correctly when suspended or resumed

Profile danicampa90
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 12 Nov 13
Posts: 28
Credit: 552
RAC: 0
Italy
Message 144 - Posted: 22 Dec 2013, 16:17:39 UTC

We just updated the application, and deprecated the old one.
The new version should hopefully fix the problems we were having with checkpoints and suspensions.
After a short period of testing we will increase the workunit length (I planned this for tomorrow).

Profile [VENETO] boboviz
Joined: 12 Dec 13
Posts: 130
Credit: 908,753
RAC: 1,429
Italy
Message 145 - Posted: 22 Dec 2013, 20:24:14 UTC - in response to Message 136.
Last modified: 22 Dec 2013, 20:27:45 UTC

<daily_result_quota>500</daily_result_quota>

(Best wishes!)


It's quite frustrating.
In the last 24 hours I've downloaded 20 or 30 WUs with 8 cores free... In the other alpha/beta projects in which I participate (ralph, albert), when you hit the daily WU limit you get this message: "You have reached the daily quota of wus".
I have also tried to force the download with a script (scheduled every 5 minutes), without results.

Happy holidays to everyone!!

Profile [VENETO] boboviz
Joined: 12 Dec 13
Posts: 130
Credit: 908,753
RAC: 1,429
Italy
Message 146 - Posted: 22 Dec 2013, 20:26:18 UTC - in response to Message 144.

We just updated the application, and deprecated the old one.
The new version should hopefully fix the problems we were having with checkpoints and suspensions.
After a short period of testing we will increase the workunit length (I planned this for tomorrow).


Well done!!

Profile [VENETO] boboviz
Joined: 12 Dec 13
Posts: 130
Credit: 908,753
RAC: 1,429
Italy
Message 153 - Posted: 23 Dec 2013, 14:41:39 UTC - in response to Message 145.

It's quite frustrating.
In the last 24 hours I've downloaded 20 or 30 WUs with 8 cores free... In the other alpha/beta projects in which I participate (ralph, albert), when you hit the daily WU limit you get this message: "You have reached the daily quota of wus".
I have also tried to force the download with a script (scheduled every 5 minutes), without results.


One question: am I the only one with this problem?

Profile [VENETO] boboviz
Joined: 12 Dec 13
Posts: 130
Credit: 908,753
RAC: 1,429
Italy
Message 189 - Posted: 26 Dec 2013, 18:54:15 UTC - in response to Message 153.

With the new "long" version of the app, the problem has gone away...



Copyright © 2017 CNR-TN & UniTN