Posts by Dj Ninja
1) Message boards : Number crunching : Unknown error number (0xffffffffc000001d) (Message 873)
Posted 7 Feb 2017 by Dj Ninja
Windows by default opens a few ports for file and printer sharing. Please check that you block them too.

That all ends up in the router's firewall.
2) Message boards : Number crunching : Unknown error number (0xffffffffc000001d) (Message 867)
Posted 6 Feb 2017 by Dj Ninja
All of these computers are behind firewalls and can't be reached by unrequested incoming connections. They are not HTTP servers or anything like that. This is why I have to drive to them when I have to make changes. I think this is safe enough.

As the standard application chooses the processor-specific module by itself, a check function could switch the application to SSE even on AVX-capable processors. This way the workunit would not crash. Of course, this can't be done for manually installed applications (nor is it necessary, because users should know what they are doing when playing with optimized applications).
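
The fallback described above can be sketched roughly as follows. This is only an illustration, not the project's actual selection code: the function name `select_variant`, its parameters, and the variant labels `"avx"` and `"sse2"` are all hypothetical.

```python
from typing import Optional


def select_variant(cpu_has_avx: bool,
                   os_enables_avx: bool,
                   manual_variant: Optional[str] = None) -> str:
    """Pick which build of the science application to run.

    All names here are hypothetical; this only illustrates the idea of
    falling back to SSE2 even on an AVX-capable processor.
    """
    # A manually installed application is left alone: the user chose it.
    if manual_variant is not None:
        return manual_variant
    # Use AVX only when both the processor and the operating system allow it.
    if cpu_has_avx and os_enables_avx:
        return "avx"
    # Otherwise fall back to the SSE2 build instead of crashing.
    return "sse2"


if __name__ == "__main__":
    # AVX-capable CPU, but the OS does not enable AVX: fall back to SSE2.
    print(select_variant(cpu_has_avx=True, os_enables_avx=False))
```

The point of the design is that the processor flag alone does not decide the outcome, and a manual install always wins.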

I updated the machine to SP1, but now I am getting errors saying that the workunits are committed to another platform, maybe due to the newest standard application change.

I'll be leaving TN Grid for now. Maybe in a couple of months it will be more stable and not changing constantly. If I had only a single machine to maintain, this might be tolerable, but I don't. I don't like projects which are modified frequently because of that.

Anyway, thanks for your support.
3) Message boards : Number crunching : Inconclusive workunits - checkpointing problem? (Message 859)
Posted 6 Feb 2017 by Dj Ninja
Okay, thank you for the explanation. I found the number quite high, much higher than what I've seen on other projects.
4) Message boards : Number crunching : Unknown error number (0xffffffffc000001d) (Message 858)
Posted 6 Feb 2017 by Dj Ninja
That might be. I'll try that out, just to confirm. Some of my machines are used for nothing but BOINC and are therefore not updated, to prevent problems with the updates (unplanned restarts, machines not coming up again).

Can the standard application get a check against this before it uses AVX? This would make it more stable.
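
A check of this kind usually has to look at more than the processor's AVX flag, because the operating system must also enable saving of the AVX registers. The sketch below shows only the decision logic, assuming the raw register values have already been read by native code (CPUID leaf 1 ECX, and XCR0 via XGETBV). The function name `avx_usable` is hypothetical and this is not the project's actual code. The bit positions are those documented for x86 processors.

```python
# Bit positions in CPUID leaf 1, register ECX.
CPUID1_ECX_OSXSAVE = 1 << 27  # OS has enabled XSAVE/XGETBV
CPUID1_ECX_AVX = 1 << 28      # processor implements AVX

# Bits in XCR0 (read with XGETBV, ECX = 0).
XCR0_SSE_STATE = 1 << 1  # OS saves the XMM registers
XCR0_AVX_STATE = 1 << 2  # OS saves the upper halves of the YMM registers


def avx_usable(cpuid1_ecx: int, xcr0: int) -> bool:
    """Return True only if both the CPU and the OS support AVX.

    cpuid1_ecx and xcr0 are assumed to come from native code; pass
    xcr0 = 0 when OSXSAVE is not set, since XGETBV cannot be executed then.
    """
    if not cpuid1_ecx & CPUID1_ECX_AVX:
        return False  # processor has no AVX at all
    if not cpuid1_ecx & CPUID1_ECX_OSXSAVE:
        return False  # OS has not enabled XSAVE, so XCR0 is unavailable
    needed = XCR0_SSE_STATE | XCR0_AVX_STATE
    return (xcr0 & needed) == needed  # OS must save both XMM and YMM state


if __name__ == "__main__":
    # AVX-capable CPU on an OS that does not enable AVX state: not usable.
    print(avx_usable(CPUID1_ECX_AVX, 0))
```

If the function returns False, the application could fall back to its SSE2 code path instead of executing an AVX instruction that raises an illegal-instruction error.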
5) Message boards : Number crunching : Inconclusive workunits - checkpointing problem? (Message 853)
Posted 6 Feb 2017 by Dj Ninja
I've got a couple of inconclusive workunits on multiple machines.
Here is one of them. When I looked into them I noticed that all of my wingmen's workunits had been restarted from checkpoints, while mine ran straight through without interruption.

Could there be a checkpointing problem which leads to different results when a workunit is restarted?
6) Message boards : Number crunching : Unknown error number (0xffffffffc000001d) (Message 851)
Posted 5 Feb 2017 by Dj Ninja
Sorry to tell you... but it crashes exactly like the previous AVX version.
7) Message boards : Number crunching : Unknown error number (0xffffffffc000001d) (Message 848)
Posted 4 Feb 2017 by Dj Ninja
This machine is not constantly monitored because of its location. I have to drive there when there is such a problem or to make major changes (no remote disk access).

When the science application fails (immediately after the WU is started), Windows throws up an error report, which has to be acknowledged by the user. If nobody does this, the WU will halt until someone does, blocking the working slot and leaving the CPU idle.

EDIT:
The machine is rock-stable, even on PrimeGrid LLR tests (which are really hard on the CPU), and has produced 107 good and 59 pending WUs since I installed the SSE2 app manually.
8) Message boards : Number crunching : Optimization (Message 846)
Posted 4 Feb 2017 by Dj Ninja
I think he had better try the SSE2 version.

I have an i5-3570, which is nearly an i7-3770 without HT, and your AVX (not AVX2) app crashes instantly on this machine.
9) Message boards : Number crunching : Unknown error number (0xffffffffc000001d) (Message 845)
Posted 4 Feb 2017 by Dj Ninja
http://gene.disi.unitn.it/test/result.php?resultid=6502952
http://gene.disi.unitn.it/test/result.php?resultid=6502909

There you have two of them. The standard app switched randomly between SSE2 and AVX, and all the AVX tasks failed, blocking the work unit slot due to a Windows error report.

I tried to use the new AVX optimized app on this machine too, which failed the same way. The manually installed new SSE2 optimized app runs without problems.
10) Message boards : Number crunching : Unknown error number (0xffffffffc000001d) (Message 840)
Posted 3 Feb 2017 by Dj Ninja
Hi!

I got these crashes with the standard app too.

It seems that there are different "versions" of AVX (not AVX2), or some AVX-capable processors don't support the full instruction set being used. This problem affects the new optimized AVX (not AVX2) app too; both crashed here on an i5-3570, which should be able to run AVX. The SSE2 app runs great on this machine.




Copyright © 2017 CNR-TN & UniTN