Unknown error number (0xffffffffc000001d)
Message boards : Number crunching : Unknown error number (0xffffffffc000001d)

Profile valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 623
Credit: 34,677,535
RAC: 1
Italy
Message 835 - Posted: 30 Jan 2017, 17:48:52 UTC
Last modified: 30 Jan 2017, 17:49:05 UTC

Got a couple of hosts getting this error: -1073741795 (0xffffffffc000001d) Unknown error number.
The first host is a Xeon E5-2650 running MS Windows Server 2008 "R2" Enterprise x64 (app is v0.10 (avx))
The second one is Xeon X5680 running MS Windows 10 Pro X64 (anonymous platform)

Any hints about this kind of error and about how to fix it?

Profile [B@P] Daniel
Volunteer developer
Joined: 19 Oct 16
Posts: 90
Credit: 2,205,103
RAC: 0
Poland
Message 837 - Posted: 30 Jan 2017, 18:37:26 UTC - in response to Message 835.
Last modified: 30 Jan 2017, 18:46:43 UTC

I recently investigated a similar case here: http://gene.disi.unitn.it/test/forum_thread.php?id=135&postid=817#817. Someone tried to run the AVX app on a non-AVX CPU. When I googled this error code (truncated to 32 bits, 0xc000001d) I found pages where people had the same problem when they tried to run SSE apps on non-SSE CPUs.

Edit: I found that AVX support can be disabled in Windows by running the command "bcdedit /set xsavedisable 1". Maybe the first person did this for some reason (overheating?).
The second CPU (Xeon X5680) only supports up to SSE 4.2, so maybe he/she tried to run the AVX or FMA app on it.
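
For anyone who wants to look up such a code: the truncation mentioned above can be sketched in a few lines of Python. This is only an illustration, not part of BOINC or the project app; the function names and the one-entry lookup table are made up for this example. The value 0xC000001D is the Windows NTSTATUS code STATUS_ILLEGAL_INSTRUCTION.

```python
# Sketch: convert the signed exit code shown in the task log into the
# 32-bit NTSTATUS value that can be searched for.
# The helper names and the lookup table are hypothetical.

# Only the entry relevant to this thread is filled in.
KNOWN_NTSTATUS = {
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
}


def to_ntstatus(exit_code: int) -> int:
    """Keep only the low 32 bits of a negative or sign-extended exit code."""
    return exit_code & 0xFFFFFFFF


def describe(exit_code: int) -> str:
    """Format the truncated code together with its name, if known."""
    status = to_ntstatus(exit_code)
    name = KNOWN_NTSTATUS.get(status, "unknown")
    return f"0x{status:08x} ({name})"


if __name__ == "__main__":
    # Both spellings from the first post map to the same 32-bit value.
    print(describe(-1073741795))
    print(describe(0xFFFFFFFFC000001D))
```

Both the decimal form (-1073741795) and the sign-extended 64-bit form (0xffffffffc000001d) from the first post reduce to the same 32-bit value.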
____________

Dj Ninja
Joined: 3 Feb 17
Posts: 13
Credit: 1,013,889
RAC: 0
Germany
Message 840 - Posted: 3 Feb 2017, 20:42:35 UTC

Hi!

I got these crashes with the standard app too.

It seems that there are different "versions" of AVX (not AVX2), or some AVX-capable processors don't support the full instruction set used. This problem affects the new optimized AVX (not AVX2) app too; both crashed here on an i5 3570, which should be able to run AVX. The SSE2 app runs great on this machine.

Profile [B@P] Daniel
Volunteer developer
Joined: 19 Oct 16
Posts: 90
Credit: 2,205,103
RAC: 0
Poland
Message 844 - Posted: 4 Feb 2017, 10:05:36 UTC - in response to Message 840.
Last modified: 4 Feb 2017, 10:06:50 UTC

Hi!

I got these crashes with the standard app too.

It seems that there are different "versions" of AVX (not AVX2), or some AVX-capable processors don't support the full instruction set used. This problem affects the new optimized AVX (not AVX2) app too; both crashed here on an i5 3570, which should be able to run AVX. The SSE2 app runs great on this machine.

Unfortunately your computers are hidden, so I cannot check details. Please send me a link to some AVX WU which crashed for you.
____________

Dj Ninja
Joined: 3 Feb 17
Posts: 13
Credit: 1,013,889
RAC: 0
Germany
Message 845 - Posted: 4 Feb 2017, 17:25:13 UTC
Last modified: 4 Feb 2017, 17:49:16 UTC

http://gene.disi.unitn.it/test/result.php?resultid=6502952
http://gene.disi.unitn.it/test/result.php?resultid=6502909

There you have two of them. The standard app switched randomly between SSE2 and AVX and all the AVX tasks failed, blocking the work unit slot due to a windows error report.

I tried to use the new AVX optimized app on this machine too, which failed the same way. The manually installed new SSE2 optimized app runs without problems.

Profile [B@P] Daniel
Volunteer developer
Joined: 19 Oct 16
Posts: 90
Credit: 2,205,103
RAC: 0
Poland
Message 847 - Posted: 4 Feb 2017, 18:04:49 UTC - in response to Message 845.
Last modified: 4 Feb 2017, 18:13:50 UTC

http://gene.disi.unitn.it/test/result.php?resultid=6502952
http://gene.disi.unitn.it/test/result.php?resultid=6502909

There you have two of them. The standard app switched randomly between SSE2 and AVX and all the AVX tasks failed, blocking the work unit slot due to a windows error report.

I tried to use the new AVX optimized app on this machine too, which failed the same way. The manually installed new SSE2 optimized app runs without problems.

Thanks. Your CPU is an Intel Ivy Bridge, so it should have working AVX. I checked these WUs. They ran for some time before crashing, so it looks like they were able to execute AVX code for a while. Apps on CPUs without AVX usually crash within a few seconds.
I noticed one thing: the second WU ran for over 11 hours before it finally crashed, which is strange. Do you have similar problems with apps from other projects? I suspect that your CPU may be overheating or that you have some other hardware issue, e.g. with memory. Please try to stress-test your PC; here is a list of software to do this: https://www.raymond.cc/blog/test-system-stability-by-putting-heavy-load-on-system-resources/. And here are memory testers: http://www.howtogeek.com/260813/how-to-test-your-computers-ram-for-problems/
____________

Dj Ninja
Joined: 3 Feb 17
Posts: 13
Credit: 1,013,889
RAC: 0
Germany
Message 848 - Posted: 4 Feb 2017, 18:13:39 UTC
Last modified: 4 Feb 2017, 18:17:30 UTC

This machine is not constantly monitored because of its location. I have to drive there when there is a problem like this or to make major changes (no remote disk access).

When the science application fails (immediately after the WU is started), Windows throws up an error report, which has to be acknowledged by the user. If nobody does this, the WU halts until someone does, blocking the work slot and leaving the CPU idle.

EDIT:
The machine is rock stable, even on PrimeGrid LLR tests (which are really hard on the CPU), and has produced 107 good and 59 pending WUs since I installed the SSE2 app manually.

Profile [B@P] Daniel
Volunteer developer
Joined: 19 Oct 16
Posts: 90
Credit: 2,205,103
RAC: 0
Poland
Message 850 - Posted: 4 Feb 2017, 21:18:30 UTC

Well, I am puzzled. It should work for you, but for some reason it crashes. I checked the compilation options and they should be fine; according to various pages, the enabled instruction sets should be supported by your CPU.

I have created an app specifically tailored for Ivy Bridge CPUs (compiled with -march=ivybridge -mtune=ivybridge). Please try it and let me know whether it works or still crashes.
https://bitbucket.org/sirzooro/pc-boinc/downloads/TN-Grid.windows-x86-64-ivybridge-v1.1.zip
____________

Dj Ninja
Joined: 3 Feb 17
Posts: 13
Credit: 1,013,889
RAC: 0
Germany
Message 851 - Posted: 5 Feb 2017, 13:59:49 UTC

Sorry to tell you... but it crashes exactly like the previous AVX version.

Profile [B@P] Daniel
Volunteer developer
Joined: 19 Oct 16
Posts: 90
Credit: 2,205,103
RAC: 0
Poland
Message 852 - Posted: 5 Feb 2017, 19:16:35 UTC - in response to Message 851.
Last modified: 5 Feb 2017, 19:26:34 UTC

Sorry to tell you... but it crashes exactly like the previous AVX version.

Strange. I suspect that my app uses some rarely used AVX instruction which your CPU does not recognize because of a bug in its microcode, so it reports an "illegal instruction" error. Other projects apparently do not use it, so they work fine. Please try updating the microcode in your CPU. This update should work for you: https://support.microsoft.com/pl-pl/help/3064209/june-2015-intel-cpu-microcode-update-for-windows. It can also be done from Linux: https://askubuntu.com/questions/545925/how-to-update-intel-microcode-properly/546056. You can also try updating the BIOS; CPU microcode updates may be distributed this way too.
____________

Woof
Joined: 16 Jan 17
Posts: 3
Credit: 650,991
RAC: 0
Message 854 - Posted: 6 Feb 2017, 6:06:04 UTC - in response to Message 852.
Last modified: 6 Feb 2017, 6:07:05 UTC


Strange. I suspect that my app uses some rarely used AVX instruction, which is not recognized by your CPU


Looking at that host's specs, he's on Win7 but it doesn't declare SP1, which is required for AVX support.

Profile [B@P] Daniel
Volunteer developer
Joined: 19 Oct 16
Posts: 90
Credit: 2,205,103
RAC: 0
Poland
Message 855 - Posted: 6 Feb 2017, 6:54:07 UTC - in response to Message 854.


Strange. I suspect that my app uses some rarely used AVX instruction, which is not recognized by your CPU


Looking at that host's specs, he's on Win7 but it doesn't declare SP1, which is required for AVX support.

You are right, I missed this detail. SP1 is required for AVX.
____________

Dj Ninja
Joined: 3 Feb 17
Posts: 13
Credit: 1,013,889
RAC: 0
Germany
Message 858 - Posted: 6 Feb 2017, 13:13:10 UTC

That might be. I'll try that out, just to confirm. Some of my machines are used for nothing else than boinc and therefore not updated to prevent problems with the updates (unplanned restarts, machines not coming up again).

Can the standard application get a check against this before it uses AVX? This will make it more stable.

Profile [B@P] Daniel
Volunteer developer
Joined: 19 Oct 16
Posts: 90
Credit: 2,205,103
RAC: 0
Poland
Message 862 - Posted: 6 Feb 2017, 18:05:18 UTC - in response to Message 858.

That might be. I'll try that out, just to confirm. Some of my machines are used for nothing else than boinc and therefore not updated to prevent problems with the updates (unplanned restarts, machines not coming up again).

This is very bad from a security perspective: new security holes are found every month, and computers connected to the Internet are constantly scanned for these holes in an attempt to turn them into zombies in some botnet. There is even malware which scans the local network and tries to infect computers there. Without an antivirus and a firewall that blocks all incoming traffic, such a computer will sooner or later be infected.

Can the standard application get a check against this before it uses AVX? This will make it more stable.

Yes, it is possible. The app could check this and print a user-friendly error. However, it will still exit with a failure status; an unsupported instruction set is usually not something the user can fix without upgrading the CPU. I will include such a check when I release a new app.
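
To show what such a check has to look at, here is a minimal sketch of the decision logic in Python. It is not the project's code: a real app would read these values in C with the CPUID and XGETBV instructions, while here they are passed in as plain integers, and the function name is made up for this example. The bit positions are the documented ones for CPUID leaf 1 (ECX) and XCR0. The point is that the CPU flag alone is not enough; the OS must also have enabled saving of the AVX register state, which is presumably what is missing on Win7 without SP1 or after "bcdedit /set xsavedisable 1".

```python
# Sketch of the "is AVX actually usable?" decision.
# Inputs are raw register values; obtaining them (CPUID, XGETBV) is
# outside this example. The function name is hypothetical.

CPUID1_ECX_OSXSAVE = 1 << 27  # OS has enabled XSAVE/XGETBV
CPUID1_ECX_AVX = 1 << 28      # CPU implements AVX
XCR0_SSE_STATE = 1 << 1       # OS saves XMM registers
XCR0_AVX_STATE = 1 << 2       # OS saves upper halves of YMM registers


def avx_usable(cpuid1_ecx: int, xcr0: int) -> bool:
    """Return True only if both the CPU and the OS support AVX.

    xcr0 is only meaningful when OSXSAVE is set; callers that cannot
    execute XGETBV should pass 0.
    """
    if not cpuid1_ecx & CPUID1_ECX_AVX:
        return False  # hardware has no AVX at all
    if not cpuid1_ecx & CPUID1_ECX_OSXSAVE:
        return False  # OS never enabled XSAVE, so YMM state is not saved
    needed = XCR0_SSE_STATE | XCR0_AVX_STATE
    return (xcr0 & needed) == needed


if __name__ == "__main__":
    ecx_avx_cpu = CPUID1_ECX_AVX | CPUID1_ECX_OSXSAVE
    print(avx_usable(ecx_avx_cpu, 0x7))     # CPU and OS both ready
    print(avx_usable(CPUID1_ECX_AVX, 0x0))  # AVX CPU, OS without XSAVE
```

An app could run this kind of test at startup and print a readable message instead of dying with an illegal instruction error.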
____________

Dj Ninja
Joined: 3 Feb 17
Posts: 13
Credit: 1,013,889
RAC: 0
Germany
Message 867 - Posted: 6 Feb 2017, 20:47:56 UTC

All of these computers are behind firewalls and can't be reached by unrequested incoming connections. They are no http servers or something like that. This is why I have to drive to them when I have to make changes. I think this is safe enough.

As the standard application chooses the processor-bounded module by itself, a check function could switch the application to SSE even on AVX-capable processors. This way the workunit will not crash. Of course, this can't be done for manually installed applications (which is not necessary because the user should know what he does when playing with optimized applications).

I updated the machine to SP1 but now I am getting errors that the workunits are committed to another platforms, maybe due to the newest standard application change.

I'll be leaving TN Grid for now, maybe it is more stable and not changed constantly in a couple of months. If I have a single machine only to maintain, then this might be tolerable, but I haven't. I don't like projects which are modified frequently because of that.

Anyway, thanks for your support.

Dave Peachey
Joined: 6 Nov 16
Posts: 7
Credit: 2,364,725
RAC: 0
United Kingdom
Message 868 - Posted: 6 Feb 2017, 20:55:15 UTC - in response to Message 867.
Last modified: 6 Feb 2017, 20:56:14 UTC

... now I am getting errors that the workunits are committed to another platforms, maybe due to the newest standard application change.

I think that particular problem has now been resolved - refer to valterc's response to the thread at http://gene.disi.unitn.it/test/forum_thread.php?id=155.

I was getting that problem myself but have just now been able to download new WUs again, so, if you're inclined, it might be worth your while to try again with the latest official version (v0.11) of the application and see what success you have.

Profile [B@P] Daniel
Volunteer developer
Joined: 19 Oct 16
Posts: 90
Credit: 2,205,103
RAC: 0
Poland
Message 870 - Posted: 6 Feb 2017, 21:58:35 UTC - in response to Message 867.
Last modified: 6 Feb 2017, 22:07:36 UTC

All of these computers are behind firewalls and can't be reached by unrequested incoming connections. They are no http servers or something like that. This is why I have to drive to them when I have to make changes. I think this is safe enough.

Windows by default opens a few ports for file and printer sharing; please check that you block them too.

As the standard application chooses the processor-bounded module by itself, a check function could switch the application to SSE even on AVX-capable processors. This way the workunit will not crash. Of course, this can't be done for manually installed applications (which is not necessary because the user should know what he does when playing with optimized applications).

The BOINC Client sends a list of CPU capabilities to the server, and the server uses it to select the app version to send. Additionally, it can try computing a few WUs with every version supported by the given CPU to find the fastest one. This could certainly be improved a bit, to check whether Win 7 has SP1 installed and not send AVX if it does not.
It is also possible to create an app which contains all 3 code versions and checks CPU capabilities at startup to select the appropriate one. However, creating such an app is more difficult, and performance tests of the different versions would also be more complicated. So for me, a simple sanity check in the app that the required instruction set is available is more reasonable.
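
The selection idea described above, including the missing SP1 case, could look roughly like the following Python sketch. It is not how the BOINC server or the TN-Grid app actually does it; the function names, the feature strings, and the variant names "sse2", "avx" and "fma" are assumptions for illustration. The only hard fact used is the build number: Windows 7 RTM is 6.1.7600 and SP1 is 6.1.7601. Other operating systems are not modelled and are simply assumed to be fine.

```python
# Sketch: choose a code variant from reported CPU features and OS version.
# All names here are hypothetical.
from typing import Iterable, Tuple

WIN7_SP1 = (6, 1, 7601)  # first Windows 7 build with AVX state support


def os_saves_avx_state(os_name: str, version: Tuple[int, int, int]) -> bool:
    """Return False for Windows builds older than Windows 7 SP1."""
    if os_name != "windows":
        return True  # assumption: non-Windows systems are not checked here
    return version >= WIN7_SP1


def pick_variant(cpu_features: Iterable[str], os_name: str,
                 version: Tuple[int, int, int]) -> str:
    """Pick the most capable variant that should run without crashing."""
    features = {f.lower() for f in cpu_features}
    avx_ok = "avx" in features and os_saves_avx_state(os_name, version)
    if avx_ok and "fma" in features:
        return "fma"
    if avx_ok:
        return "avx"
    if "sse2" in features:
        return "sse2"
    raise RuntimeError("no supported instruction set found")


if __name__ == "__main__":
    # AVX-capable CPU on Windows 7 without SP1: fall back to SSE2.
    print(pick_variant(["sse2", "avx"], "windows", (6, 1, 7600)))
    # Same CPU after installing SP1.
    print(pick_variant(["sse2", "avx"], "windows", (6, 1, 7601)))
```

With a fallback like this the WU would run with the SSE2 code instead of failing, at the cost of the extra complexity mentioned above.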

I updated the machine to SP1 but now I am getting errors that the workunits are committed to another platforms, maybe due to the newest standard application change.

This is resolved now, you can try again.

I'll be leaving TN Grid for now, maybe it is more stable and not changed constantly in a couple of months. If I have a single machine only to maintain, then this might be tolerable, but I haven't. I don't like projects which are modified frequently because of that.

Anyway, thanks for your support.

You can stick to the official app versions; the BOINC Client will take care of them. This usually works fine, except in rare situations like this missing SP1.
____________

Profile valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 623
Credit: 34,677,535
RAC: 1
Italy
Message 871 - Posted: 6 Feb 2017, 22:05:18 UTC

I could define a minimum os version for avx apps. This will for sure solve this problem. Will check this tomorrow.

Dj Ninja
Joined: 3 Feb 17
Posts: 13
Credit: 1,013,889
RAC: 0
Germany
Message 873 - Posted: 7 Feb 2017, 7:15:21 UTC

Windows by default opens a few ports for file and printer sharing; please check that you block them too.

That all ends up in the router's firewall.

Profile valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 623
Credit: 34,677,535
RAC: 1
Italy
Message 880 - Posted: 7 Feb 2017, 16:55:19 UTC - in response to Message 871.
Last modified: 7 Feb 2017, 16:59:57 UTC

I could define a minimum os version for avx apps. This will for sure solve this problem. Will check this tomorrow.

Well, it's not so easy (I would have to make another platform/plan_class etc.). This is something that should be checked by the BOINC client... I don't know why it would report unsupported CPU features.

Does anyone know how they handle this at Asteroids@home? (the only project I know of that uses "explicit" AVX apps)


Copyright © 2024 CNR-TN & UniTN