Author |
Message |
|
The server status shows:
Tasks ready to send: 58
Tasks in progress: 1,942
but there are no WUs for Windows (nor for Linux). |
|
|
valterc (Project administrator, Project tester)
Joined: 30 Oct 13 Posts: 623 Credit: 34,677,535 RAC: 2
|
I just downloaded 5 WUs (20 Dec 2013, 15:19:13 UTC).
I have no information about the work generator or the number of workunits in this batch. Maybe someone from the server group could comment on this. |
|
|
|
I just downloaded 5 WUs (20 Dec 2013, 15:19:13 UTC).
I have no information about the work generator or the number of workunits in this batch. Maybe someone from the server group could comment on this.
OK, now I can download too.
|
|
|
|
Again, 0 new tasks.
But the server status says 74 tasks are ready.
I don't understand... |
|
|
|
This seems very strange.
It does not seem to be a configuration issue (unless someone was stopping the project at exactly that time). In theory, if it works for one workunit it should work for all the others.
The intermittent nature of the problem makes me think that it could be a server-side performance problem. I will look into that soon.
Thanks for reporting the issue. |
|
|
|
I checked the CPU time on the server and it does not seem to be a performance problem.
I also quickly checked the feeder and scheduler logs, but in general they did not show any errors.
Furthermore, the workunit list shows that other users seem to be receiving workunits correctly.
I will check the issue in more detail tomorrow.
|
|
|
|
The queue seems to be stable today
I'm downloading correctly from 3 machines... |
|
|
|
The queue seems to be stable today
I'm downloading correctly from 3 machines...
Forget it. No WUs (and 87 in the server status)...
|
|
|
valterc (Project administrator, Project tester)
|
Thinking about this issue...
- we started BOINC using bin/start; I didn't add anything to crontab (and I don't know whether I have to...)
- we enforced some limits (5 means 5 x number of cores).
<max_wus_in_progress>5</max_wus_in_progress>
<max_wus_to_send>5</max_wus_to_send>
- there are a lot of "HOST::parse(): unrecognized:" entries in scheduler.log (maybe posting this on the BOINC developer forum could help...) |
|
|
valtercProject administrator Project tester Send message
Joined: 30 Oct 13 Posts: 623 Credit: 34,677,535 RAC: 2
|
Just got this
2013-12-21 12:53:41.4709 [PID=26557] Request: [USER#22] [HOST#27] [IP 188.216.239.183] client 7.0.28
2013-12-21 12:53:41.4979 [PID=26557] Sending reply to [HOST#27]: 0 results, delay req 121.20
2013-12-21 12:53:41.4992 [PID=26557] Scheduler ran 0.040 seconds
This computer (Boboviz's) has 6 cores and already had 30 tasks in progress; it got nothing because it hit the limit.
***but we have to check the errors in the log*** |
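The limit check described above can be sketched as follows. This is only an illustration, assuming (as stated in this thread) that max_wus_in_progress is multiplied by the number of cores; the function names are hypothetical and this is not the actual BOINC scheduler code.

```python
def in_progress_cap(ncpus: int, max_wus_in_progress: int = 5) -> int:
    """Per-host cap on tasks in progress, assuming the limit scales with cores."""
    return ncpus * max_wus_in_progress


def can_receive_work(ncpus: int, in_progress: int, max_wus_in_progress: int = 5) -> bool:
    """True if the host is still below its cap and may be sent more tasks."""
    return in_progress < in_progress_cap(ncpus, max_wus_in_progress)


# The host from the log excerpt: 6 cores, 30 tasks already in progress.
print(in_progress_cap(6))        # cap for a 6-core host with a limit of 5 per core
print(can_receive_work(6, 30))   # whether that host may get more work
```

With 6 cores and a per-core limit of 5, the cap is 30, so a host that already holds 30 tasks gets a reply with 0 results.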
|
|
|
This computer (Boboviz's) has 6 cores and already had 30 tasks in progress; it got nothing because it hit the limit.
I've seen the limit, and I now have 30 WUs.
But I also get the "0 unit" message when I have NO WUs on my PCs...
And after 1 or 2 hours my computers start downloading again.
Is there a "time limit" for downloads (only x WUs in 24 hours, for example), like ralph@home? |
|
|
valterc (Project administrator, Project tester)
|
<daily_result_quota>500</daily_result_quota>
but I have to investigate further ...
(Best wishes!) |
|
|
|
I examined the log, and it seems that at the same time that Boboviz didn't get any WUs, other requests (from other users) were served correctly, so it is probably a problem with the limits.
Could we try increasing them a bit?
I would also have a look at this:
<min_sendwork_interval>120</min_sendwork_interval>
<max_wus_in_progress>5</max_wus_in_progress>
<max_wus_to_send>5</max_wus_to_send>
and see what happens....
Valter, what do you think?
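A quick way to look for empty replies in scheduler.log is a small filter like the one below. It is a sketch that assumes the log lines have the same format as the excerpt posted earlier in this thread; the function name and the sample list are hypothetical.

```python
import re

# Matches lines such as:
# 2013-12-21 12:53:41.4979 [PID=26557] Sending reply to [HOST#27]: 0 results, delay req 121.20
REPLY_RE = re.compile(
    r"\[PID=(?P<pid>\d+)\]\s+Sending reply to \[HOST#(?P<host>\d+)\]:\s+"
    r"(?P<results>\d+) results, delay req\s+(?P<delay>[\d.]+)"
)


def empty_replies(lines):
    """Return (host_id, delay_seconds) for every reply that carried 0 results."""
    found = []
    for line in lines:
        match = REPLY_RE.search(line)
        if match and int(match.group("results")) == 0:
            found.append((int(match.group("host")), float(match.group("delay"))))
    return found


# Sample taken from the log excerpt in this thread.
SAMPLE = [
    "2013-12-21 12:53:41.4709 [PID=26557] Request: [USER#22] [HOST#27] [IP 188.216.239.183] client 7.0.28",
    "2013-12-21 12:53:41.4979 [PID=26557] Sending reply to [HOST#27]: 0 results, delay req 121.20",
    "2013-12-21 12:53:41.4992 [PID=26557] Scheduler ran 0.040 seconds",
]

for host, delay in empty_replies(SAMPLE):
    print(f"HOST#{host} got 0 results (delay {delay} s)")
```

Running it over the real log and comparing the host IDs against each host's tasks in progress would show whether the empty replies always coincide with hosts that are at their limit.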
|
|
|
|
I want to add an approximate calculation:
Currently the average time needed for a workunit is at most 25 minutes. This means that each core can compute about 57 workunits/day (1440 minutes / 25 minutes).
If you have 2 computers with i7 processors you could theoretically reach the limit of 500/day (8 virtual cores * 2 PCs * 57 WU/day = 912 WU/day > 500 WU/day).
If that's the case, increasing the difficulty of the workunits to about an hour may mitigate this problem.
I was planning to increase the difficulty when we get the new application and the new input files from the preprocessing group.
If you agree I can also increase it now (to about 1 to 1.5 hours).
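The estimate above can be reproduced with a few lines of Python. This sketch takes its figures from the post (25 minutes per workunit, 8 virtual cores, 2 PCs, a limit of 500/day) and assumes every core computes around the clock; the function names are made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60
DAILY_LIMIT = 500  # limit quoted in this thread


def wus_per_core_per_day(minutes_per_wu: int) -> int:
    """Whole workunits one core can finish in 24 hours."""
    return MINUTES_PER_DAY // minutes_per_wu


def daily_throughput(cores: int, hosts: int, minutes_per_wu: int) -> int:
    """Workunits per day for `hosts` machines with `cores` cores each."""
    return cores * hosts * wus_per_core_per_day(minutes_per_wu)


for minutes in (25, 60, 90):
    total = daily_throughput(cores=8, hosts=2, minutes_per_wu=minutes)
    print(minutes, total, "over limit" if total > DAILY_LIMIT else "within limit")
```

Under these assumptions, two 8-thread machines exceed 500/day with 25-minute workunits, while workunits of an hour or more keep them below it.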
|
|
|
valterc (Project administrator, Project tester)
|
IMHO
- stay with the current limits while in this alpha phase
- do not increase the WU length until we have an application that a) validates correctly if checkpointed and b) behaves correctly when suspended or resumed |
|
|
|
We just updated the application, and deprecated the old one.
The new version should hopefully fix the problems we were having with checkpoints and suspensions.
After a short period of testing we will increase the workunit length (I planned this for tomorrow).
|
|
|
|
<daily_result_quota>500</daily_result_quota>
(Best wishes!)
It's quite frustrating.
In the last 24 hours I've downloaded 20 or 30 WUs with 8 cores free... In other alpha/beta projects in which I participate (ralph, albert), hitting the daily WU limit gives this message: "You have reached the daily quota of wus".
I also tried to force the download with a script (scheduled every 5 minutes), without results.
Happy holidays to everyone! |
|
|
|
We just updated the application, and deprecated the old one.
The new version should hopefully fix the problems we were having with checkpoints and suspensions.
After a short period of testing we will increase the workunit length (I planned this for tomorrow).
Well done!!
|
|
|
|
It's quite frustrating.
In the last 24 hours I've downloaded 20 or 30 WUs with 8 cores free... In other alpha/beta projects in which I participate (ralph, albert), hitting the daily WU limit gives this message: "You have reached the daily quota of wus".
I also tried to force the download with a script (scheduled every 5 minutes), without results.
One question: am I the only one with this problem?
|
|
|
|
With the new "long" version of the app the problem has gone... |
|
|