Help with invalid tasks and computation errors?

autouzi
Joined: 14 Jan 20
Posts: 4
Credit: 0
RAC: 0
United States
Message 1665 - Posted: 14 Jan 2020, 3:39:48 UTC

Would anybody who knows the terminology mind taking a look at my invalid tasks and errors to see why they are happening? I use GRC Pool, so you will need to use the link to my PC at the bottom of this post.

Primarily, I am interested in the computation errors and why they are happening. I have not been able to reproduce any instability in any other tests. I am also confused by the invalid tasks and how a task can be invalid without being a computation error.

Any help is appreciated!
Link to computer 56280
http://gene.disi.unitn.it/test/results.php?hostid=56280&offset=0&show_names=0&state=0&appid=

valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 623
Credit: 34,677,535
RAC: 1
Italy
Message 1667 - Posted: 14 Jan 2020, 10:05:49 UTC - in response to Message 1665.

Sometimes, for many different reasons, a computation ends 'correctly' (without reporting an error) but its results are wrong. This is why we use the 'redundancy' feature: one result is marked correct if it is bit-wise identical to another one.
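To make the redundancy idea concrete, here is a minimal sketch of that kind of check, not the project's actual validator. The function name `validate_by_redundancy` and the host labels are made up for illustration; the only rule taken from the post is that a result counts as correct when at least one other result is bit-wise identical to it.

```python
from collections import defaultdict
from typing import Dict


def validate_by_redundancy(results: Dict[str, bytes]) -> Dict[str, bool]:
    """Mark a result valid if another host returned identical bytes.

    `results` maps a (hypothetical) host label to the raw bytes of the
    output file that host returned for the same workunit.
    """
    # Group hosts by the exact byte content they returned.
    hosts_by_payload = defaultdict(list)
    for host, payload in results.items():
        hosts_by_payload[payload].append(host)

    # A result is valid only if at least two hosts agree bit for bit.
    return {
        host: len(hosts_by_payload[payload]) >= 2
        for host, payload in results.items()
    }


# Example: host_c finished without any error, but its output differs.
example = validate_by_redundancy(
    {"host_a": b"\x01\x02", "host_b": b"\x01\x02", "host_c": b"\x01\x03"}
)
```

In this example `host_c` would be flagged as invalid even though its task ended normally, which is how a task can be invalid without being a computation error.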

We also know we have a small bug in our code (very infrequent, and one we have not been able to catch). In some cases, when the computation of a task is stopped at the very beginning, before the first checkpoint, the output file becomes 'inconsistent', so the computation will produce an 'invalid' result (this can happen if you see 'Start from checkpoint: 1' in the log). Keeping a small workunit queue (thus avoiding BOINC going into 'rush' mode) will mitigate this problem.

Error 194 is sometimes an effect of the computer being unresponsive because of too much load; see http://wuprop.boinc-af.org/forum_thread.php?id=402

autouzi
Joined: 14 Jan 20
Posts: 4
Credit: 0
RAC: 0
United States
Message 1668 - Posted: 15 Jan 2020, 1:22:31 UTC - in response to Message 1667.

So fairly normal. Thank you for your response and all you do for this project! This is my favorite project available on GRC Pool because of the potential to help us better understand the complex subject of genetics.

Timber
Joined: 20 Jan 20
Posts: 5
Credit: 1,885,561
RAC: 0
Canada
Message 1672 - Posted: 21 Jan 2020, 17:57:16 UTC

3 failed (and errored) tasks so far on a Ryzen 7 1800x, running Windows 10.
An example of one of the errored tasks:
https://gene.disi.unitn.it/test/result.php?resultid=46767050
the machine this is happening on:
https://gene.disi.unitn.it/test/show_host_detail.php?hostid=56399
At least it's not a day of work lost. None of the tasks have shown as invalid, yet.

Jim1348
Joined: 29 Dec 16
Posts: 87
Credit: 21,013,002
RAC: 0
United States
Message 1673 - Posted: 21 Jan 2020, 20:36:37 UTC - in response to Message 1672.

Timber wrote: "3 failed (and errored) tasks so far on a Ryzen 7 1800x, running Windows 10."

It could be the segfault error. I see them on my Ryzen 1700 occasionally, but not on my Ryzen 2700. (And my Ryzen 1700 is one of the "fixed" versions, produced after they introduced the fix.)

M0CZY
Joined: 8 Nov 19
Posts: 4
Credit: 651
RAC: 0
United Kingdom
Message 1917 - Posted: 9 Aug 2020, 19:45:55 UTC

The application gene@home PC-IM v1.10 (sse2) doesn't work on my 32-bit Linux computer (Computer ID 54726).
It runs for about 15 seconds, then ends in Computation error (Exit status 193 (0xc1) EXIT_SIGNAL).
My /proc/cpuinfo file contains:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 13
model name : Intel(R) Pentium(R) M processor 2.26GHz
stepping : 8
microcode : 0x20
cpu MHz : 2267.000
cache size : 2048 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fdiv_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov clflush dts acpi mmx fxsr sse sse2 ss tm pbe nx bts cpuid est tm2 pti
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 4522.40
clflush size : 64
cache_alignment : 64
address sizes : 32 bits physical, 32 bits virtual
power management:

so it does support sse2.

A typical Stderr output looks like:
<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)</message>
<stderr_txt>
Start @ Sun Aug 9 20:24:17 2020
SIGILL: illegal instruction
Stack trace (7 frames):
../../projects/gene.disi.unitn.it_test/gene_pcim_v1.10_linux32__sse2[0x8072b8a]
linux-gate.so.1(__kernel_sigreturn+0x0)[0xb7eecd14]
../../projects/gene.disi.unitn.it_test/gene_pcim_v1.10_linux32__sse2[0x804c974]
../../projects/gene.disi.unitn.it_test/gene_pcim_v1.10_linux32__sse2[0x80528ac]
../../projects/gene.disi.unitn.it_test/gene_pcim_v1.10_linux32__sse2[0x804b239]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0xb7be6e91]
../../projects/gene.disi.unitn.it_test/gene_pcim_v1.10_linux32__sse2[0x804b4a3]
Exiting...
</stderr_txt>
]]>

I am unable to tell what I have done wrong.
The non-sse2 app seems to work, but is probably a lot slower than the sse2 version.

manalog
Joined: 5 Oct 15
Posts: 33
Credit: 1,098,442
RAC: 0
Italy
Message 1918 - Posted: 10 Aug 2020, 9:08:55 UTC - in response to Message 1917.

You are doing nothing wrong: the Linux 32-bit app is compiled with a 'core2' processor target, so it is not a 'pure' SSE2, Pentium 4 compatible app; rather, it probably incorporates some newer extensions. I have compiled an SSE2 version that runs on a P4 and should also run on yours if it supports SSE2. If not, I have also compiled an SSE version that should run even on a Pentium III. I am travelling now; I will post them on this forum for you before Friday ;)
Or you can compile it yourself: check my post 'sse3 optimization and android binary' on this forum and figure out how to do it.
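As a rough illustration of why an "sse2" flag alone may not be enough, here is a small sketch that reads the flags line of a /proc/cpuinfo dump and reports which extensions are missing. It assumes that a 'core2' build may use SSE3 (listed by Linux as "pni") and SSSE3 in addition to SSE and SSE2; whether this particular binary really uses them is not confirmed here. The names `parse_flags`, `missing_extensions` and `CORE2_EXTENSIONS` are made up for the example.

```python
# Extensions a 'core2' compiler target may rely on (assumption for this
# sketch). Linux lists SSE3 under the name "pni" in /proc/cpuinfo.
CORE2_EXTENSIONS = ("sse", "sse2", "pni", "ssse3")


def parse_flags(cpuinfo_text):
    """Return the set of CPU flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_extensions(flags, required=CORE2_EXTENSIONS):
    """List the required extensions that the CPU does not report."""
    return [name for name in required if name not in flags]


# Flags line as posted for the Pentium M earlier in this thread.
PENTIUM_M = (
    "model name : Intel(R) Pentium(R) M processor 2.26GHz\n"
    "flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca "
    "cmov clflush dts acpi mmx fxsr sse sse2 ss tm pbe nx bts cpuid est "
    "tm2 pti\n"
)

missing = missing_extensions(parse_flags(PENTIUM_M))
print(missing)  # ['pni', 'ssse3']
```

On a real machine you could pass the contents of /proc/cpuinfo instead of the sample string. A missing "pni" or "ssse3" would be consistent with the SIGILL (illegal instruction) seen in the stderr output, though it does not prove it.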

M0CZY
Joined: 8 Nov 19
Posts: 4
Credit: 651
RAC: 0
United Kingdom
Message 1921 - Posted: 14 Aug 2020, 16:30:44 UTC

Yes, I would very much like an sse2 app that is compiled to work correctly on my Pentium M powered computer.
Which thread shall I look in for the link to download the modified app?

manalog
Joined: 5 Oct 15
Posts: 33
Credit: 1,098,442
RAC: 0
Italy
Message 1922 - Posted: 15 Aug 2020, 20:40:08 UTC - in response to Message 1921.

I got these two versions:
Pentium 4 SSE2
Pentium 3 SSE
The first one should work also on your Pentium M.

If I remember correctly, these binaries were compiled with the "O2" GCC optimization level. Using different options (https://gene.disi.unitn.it/test/forum_thread.php?id=270&postid=1883) you can get a performance increase. I am on vacation now and only have these two binaries that I had already compiled; in the next few days I'll compile a version for Pentium 4 and Pentium M using these new optimizations ;)
By the way, the version I am sending you now should already give a great boost compared to the non-SIMD version.

Here is the Drive folder, where I will also put the new versions. Stay tuned!

https://drive.google.com/drive/folders/1l5MC2VGjXqyapUoxfo9Blw3I8_Faz4N4?usp=sharing

M0CZY
Joined: 8 Nov 19
Posts: 4
Credit: 651
RAC: 0
United Kingdom
Message 1924 - Posted: 16 Aug 2020, 16:34:59 UTC - in response to Message 1922.
Last modified: 16 Aug 2020, 16:38:35 UTC

I have installed the file "pc_pentium4_sse2" and the app_info.xml file, and set the boinc permissions correctly. I changed all instances of gene_pcim to pc_pentium4_sse2 in the file.

It doesn't work, giving these error messages:

| TN-Grid Platform | Message from server: Unknown app name in app_info.xml
| TN-Grid Platform | Message from server: Your app_info.xml file doesn't have a usable version of gene@home PC-IM.

I understand that the syntax in the app_info.xml file needs to be very precise, so I need help to correct my mistake.
This is the file I am trying to use, without any success. Where did I go wrong?
<app_info>
    <app>
        <name>pc_pentium4_sse2</name>
    </app>
    <file_info>
        <name>pc_pentium4_sse2</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>pc_pentium4_sse2</app_name>
        <version_num>100</version_num>
        <file_ref>
            <file_name>pc_pentium4_sse2</file_name>
            <main_program/>
        </file_ref>
    </app_version>
</app_info>
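One plausible reading of the "Unknown app name" message is that the server only recognises its own application name, so renaming every occurrence of gene_pcim (including the app name, not just the file name) would break the match. The sketch below is a hypothetical consistency check along those lines, not an official BOINC tool. It assumes the server-side name is "gene_pcim", which is inferred from the description above and not confirmed; `check_app_info` is a made-up helper.

```python
import xml.etree.ElementTree as ET


def check_app_info(xml_text, expected_app_name):
    """Return a list of likely problems in an app_info.xml document."""
    root = ET.fromstring(xml_text)
    problems = []

    # Names declared in <app> and <file_info> blocks.
    app_names = {(e.text or "").strip() for e in root.findall("./app/name")}
    file_names = {
        (e.text or "").strip() for e in root.findall("./file_info/name")
    }

    if expected_app_name not in app_names:
        problems.append("no <app> named %r" % expected_app_name)

    for version in root.findall("./app_version"):
        app_name = (version.findtext("app_name") or "").strip()
        if app_name != expected_app_name:
            problems.append("<app_version> refers to app %r" % app_name)
        # Every referenced file should have a matching <file_info>.
        for ref in version.findall("./file_ref/file_name"):
            file_name = (ref.text or "").strip()
            if file_name not in file_names:
                problems.append("no <file_info> for %r" % file_name)

    return problems


def make_app_info(app_name, binary_name):
    """Build a minimal app_info.xml string with the same layout as above."""
    return (
        "<app_info><app><name>%s</name></app>"
        "<file_info><name>%s</name><executable/></file_info>"
        "<app_version><app_name>%s</app_name>"
        "<version_num>100</version_num>"
        "<file_ref><file_name>%s</file_name><main_program/></file_ref>"
        "</app_version></app_info>"
    ) % (app_name, binary_name, app_name, binary_name)


# App name renamed along with the binary (as in the file posted above).
renamed = check_app_info(
    make_app_info("pc_pentium4_sse2", "pc_pentium4_sse2"), "gene_pcim"
)
# App name left alone, only the file name changed (assumed fix).
kept = check_app_info(
    make_app_info("gene_pcim", "pc_pentium4_sse2"), "gene_pcim"
)
```

With these assumptions, `renamed` lists two problems and `kept` is empty, which would suggest changing only the file names and leaving the app name as the server knows it.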

Keith Myers
Joined: 26 Jun 20
Posts: 64
Credit: 15,299,594
RAC: 0
United States
Message 1925 - Posted: 16 Aug 2020, 17:29:43 UTC - in response to Message 1924.

Instead of going through the hassle of creating an app_info for the new binary, why didn't you just rename the binary to the stock appname?

M0CZY
Avatar
Send message
Joined: 8 Nov 19
Posts: 4
Credit: 651
RAC: 0
United Kingdom
Message 1926 - Posted: 16 Aug 2020, 18:25:38 UTC - in response to Message 1925.

Keith Myers wrote: "Instead of going through the hassle of creating an app_info for the new binary, why didn't you just rename the binary to the stock appname?"

I didn't know you could do that. I thought that the client would be able to tell that the new app wasn't the correct app, and would reject it.

I have done some experimentation, and have got the new app working, with the app_info.xml file.

Keith Myers
Joined: 26 Jun 20
Posts: 64
Credit: 15,299,594
RAC: 0
United States
Message 1930 - Posted: 26 Aug 2020, 1:33:09 UTC - in response to Message 1926.
Last modified: 26 Aug 2020, 1:35:36 UTC

Keith Myers wrote: "Instead of going through the hassle of creating an app_info for the new binary, why didn't you just rename the binary to the stock appname?"

M0CZY wrote: "I didn't know you could do that. I thought that the client would be able to tell that the new app wasn't the correct app, and would reject it.

I have done some experimentation, and have got the new app working, with the app_info.xml file."

Ha ha LOL. I have worked with a SETI developer who created the special sauce application.

He never changed the application name through a dozen revisions, even when the code inside the app completely changed. The original descriptive name of the application bore no resemblance at all to the final application and what it would have been descriptively named.

The name you give an app doesn't matter in the slightest; what the code inside does is the important part.




Copyright © 2024 CNR-TN & UniTN