Validation problem (host #81) Client version v6.

Profile valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 616
Credit: 34,514,943
RAC: 340
Italy
Message 366 - Posted: 7 Apr 2014, 10:39:07 UTC
Last modified: 7 Apr 2014, 10:44:33 UTC

Host #81 (http://gene.disi.unitn.it/test/results.php?hostid=81) is producing nothing but validation errors. Although stderr and the run time seem to indicate valid results, the validator cannot even check the host's output files, simply because they are not on the server. As an example, WU 3866 has two results, one in progress and one returned by computer 81. The WU name is 16_At_fos-lgn_wu-1099_1396713783839 and, server side, both of these searches return nothing:

boincadm@gene:~/projects/test$ find . -name "16_At_fos-lgn_wu-1099_1396713783839*"
boincadm@gene:~/projects/test$ grep 16_At_fos-lgn_wu-1099_1396713783839 log_gene/*
boincadm@gene:~/projects/test$

I really don't know what is happening, but it is happening only with this host. One strange thing is that, server side, I cannot see the host's IP address. Also, the CPU describes itself as GenuineIntel Genuine Intel(R) CPU 000 @ 3.20GHz, which sounds a little strange to me...

Any hints?

Bok
Joined: 11 Feb 14
Posts: 10
Credit: 159,717
RAC: 0
United States
Message 367 - Posted: 7 Apr 2014, 11:21:19 UTC

I'm not entirely sure what is going on either; I see nothing obvious in any of the files on the machine. The chip itself is an engineering sample (i7-920), which is why it appears as GenuineIntel Genuine Intel(R) CPU 000 @ 3.20GHz.

It's running CentOS 6, but everything else appears ok.

Can I perhaps run one of the tests manually outside of BOINC to see what is going on?

Bok
Joined: 11 Feb 14
Posts: 10
Credit: 159,717
RAC: 0
United States
Message 368 - Posted: 7 Apr 2014, 15:03:05 UTC
Last modified: 7 Apr 2014, 15:05:39 UTC

Could it be the version of BOINC I'm using on that particular host (6.12.34)?

I added some debug flags to try to capture anything useful, but there is nothing too promising so far.

4/7/2014 7:05:53 AM | TN-Grid Test Platform | [task] Process for 16_At_fos-lgn_wu-1229_1396713946319_0 exited
4/7/2014 7:05:53 AM | TN-Grid Test Platform | [task] task_state=EXITED for 16_At_fos-lgn_wu-1229_1396713946319_0 from handle_exited_app
4/7/2014 7:05:53 AM | TN-Grid Test Platform | [task] process exited with status 0
4/7/2014 7:05:53 AM | TN-Grid Test Platform | Computation for task 16_At_fos-lgn_wu-1229_1396713946319_0 finished
4/7/2014 7:05:53 AM | TN-Grid Test Platform | [task] result state=FILES_UPLOADING for 16_At_fos-lgn_wu-1229_1396713946319_0 from CS::app_finished
4/7/2014 7:05:53 AM | TN-Grid Test Platform | [task] result state=FILES_UPLOADED for 16_At_fos-lgn_wu-1229_1396713946319_0 from CS::update_results
4/7/2014 7:05:53 AM | TN-Grid Test Platform | [task] ACTIVE_TASK::start(): forked process: pid 2201
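
Editor's sketch: when comparing a client log like the one above against the server logs, it can help to pull out only the state changes per task. The Python below is a minimal illustration, assuming the `date | project | message` layout seen in this excerpt; the function name `state_transitions` and the regular expressions are hypothetical helpers, not part of BOINC.

```python
import re

# Client event-log line layout as quoted above:
# "<date time AM/PM> | <project> | <message>" (assumed from this excerpt only).
LINE_RE = re.compile(r"^(\d+/\d+/\d+ \d+:\d+:\d+ [AP]M) \| ([^|]+) \| (.*)$")
# Both "task_state=X for NAME" and "result state=X for NAME" end in "state=X for NAME".
STATE_RE = re.compile(r"state=([A-Z_]+) for (\S+)")


def state_transitions(lines):
    """Return (timestamp, task, state) for every state change found in the lines."""
    transitions = []
    for line in lines:
        parsed = LINE_RE.match(line.strip())
        if parsed is None:
            continue  # not a client log line in the assumed layout
        timestamp, _project, message = parsed.groups()
        state = STATE_RE.search(message)
        if state is not None:
            transitions.append((timestamp, state.group(2), state.group(1)))
    return transitions


TASK = "16_At_fos-lgn_wu-1229_1396713946319_0"
SAMPLE_LOG = [
    f"4/7/2014 7:05:53 AM | TN-Grid Test Platform | [task] task_state=EXITED for {TASK} from handle_exited_app",
    f"4/7/2014 7:05:53 AM | TN-Grid Test Platform | [task] result state=FILES_UPLOADING for {TASK} from CS::app_finished",
    f"4/7/2014 7:05:53 AM | TN-Grid Test Platform | [task] result state=FILES_UPLOADED for {TASK} from CS::update_results",
]

for timestamp, task, state in state_transitions(SAMPLE_LOG):
    print(timestamp, task, state)
```

In this excerpt the client records FILES_UPLOADING and FILES_UPLOADED within the same second, so the task name and timestamp are the values to look for on the server side.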

Profile valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 616
Credit: 34,514,943
RAC: 340
Italy
Message 369 - Posted: 7 Apr 2014, 15:42:11 UTC - in response to Message 368.

That's getting weird... I just checked the file system for the file mentioned in your log and found nothing. In the server log files, the only reference I found is:

grep 16_At_fos-lgn_wu-1229_1396713946319_0 log_gene/*
log_gene/gene_network_validator.log:md5_file: can't open /home/boincadm/projects/test/upload/da/16_At_fos-lgn_wu-1229_1396713946319_0_0
log_gene/gene_network_validator.log:2014-04-07 17:00:34.5171 [CRITICAL] [RESULT#8247 16_At_fos-lgn_wu-1229_1396713946319_0] md5_file() failed for /home/boincadm/projects/test/upload/da/16_At_fos-lgn_wu-1229_1396713946319_0_0: fopen() failed
log_gene/gene_network_validator.log:2014-04-07 17:00:34.5171 [CRITICAL] check_set: init_result([RESULT#8247 16_At_fos-lgn_wu-1229_1396713946319_0]) failed: fopen() failed
log_gene/gene_network_validator.log:2014-04-07 17:00:34.5224 [RESULT#8247 16_At_fos-lgn_wu-1229_1396713946319_0] Invalid [HOST#81]

Which is 'normal', I guess: the validator couldn't open the file because it's not on the disk. Host #81 requests work from IP 75.189.xxx.161 (according to scheduler.log), but there is nothing in file_upload_handler.log, nor in /var/log/apache2/* (the following returns nothing):
grep 75.189 /var/log/apache2/* |grep upload |grep "6.12.34"

From the hosts page of the administrative interface I see that this host has NO internal IP address (it's blank):
Info: [BOINC|6.12.34]
Total credit: 0
Average credit: 0
Average update time: 11 Feb 2014, 19:00:20 UTC
IP address: (same the last 856 times)
External IP address: 75.189.xxx.161
Domain name: dbase2
...
% of time host connected: -100 %

So, from the server side, it seems that your host neither has a proper IP address nor ever connected to the server. I really don't know which side the error is on. I don't think your client version is the problem. The only thing I can suggest is to detach/reattach and see if something changes...
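
Editor's sketch: to see how many results are affected, the validator log lines quoted in this post can be scanned for results whose output file could not be opened. This is a minimal Python illustration, assuming the `[RESULT#id name]` and `md5_file() failed ... fopen() failed` wording shown above; `missing_upload_results` is a hypothetical helper, not a BOINC tool.

```python
import re

# "[RESULT#8247 16_At_fos-lgn_wu-1229_1396713946319_0]" -> id and result name.
RESULT_RE = re.compile(r"\[RESULT#(\d+) ([^\]]+)\]")


def missing_upload_results(log_lines):
    """Map result name -> result id for results whose upload file could not be opened."""
    missing = {}
    for line in log_lines:
        # Only the md5_file() line names the file; the check_set line repeats the error.
        if "md5_file() failed" not in line or "fopen() failed" not in line:
            continue
        match = RESULT_RE.search(line)
        if match is not None:
            missing[match.group(2)] = int(match.group(1))
    return missing


SAMPLE_VALIDATOR_LOG = [
    "2014-04-07 17:00:34.5171 [CRITICAL] [RESULT#8247 16_At_fos-lgn_wu-1229_1396713946319_0] "
    "md5_file() failed for /home/boincadm/projects/test/upload/da/"
    "16_At_fos-lgn_wu-1229_1396713946319_0_0: fopen() failed",
    "2014-04-07 17:00:34.5171 [CRITICAL] check_set: init_result("
    "[RESULT#8247 16_At_fos-lgn_wu-1229_1396713946319_0]) failed: fopen() failed",
    "2014-04-07 17:00:34.5224 [RESULT#8247 16_At_fos-lgn_wu-1229_1396713946319_0] Invalid [HOST#81]",
]

print(missing_upload_results(SAMPLE_VALIDATOR_LOG))
```

Each name returned this way is a result the validator marked invalid only because the file was absent, which is a different situation from a result whose content failed comparison.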

Bok
Joined: 11 Feb 14
Posts: 10
Credit: 159,717
RAC: 0
United States
Message 370 - Posted: 7 Apr 2014, 16:01:02 UTC

yes, that is strange.

eth0 Link encap:Ethernet HWaddr 00:24:8C:94:BF:B3
inet addr:192.168.1.91 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::224:8cff:fe94:bfb3/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:35683 errors:0 dropped:0 overruns:0 frame:0
TX packets:82796 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:14870785 (14.1 MiB) TX bytes:106354243 (101.4 MiB)

I've detached and re-attached. Let's see if that brings anything.

Bok
Joined: 11 Feb 14
Posts: 10
Credit: 159,717
RAC: 0
United States
Message 371 - Posted: 8 Apr 2014, 16:05:28 UTC

I got this from Slicker, who runs the Collatz project. Perhaps it's relevant here.

---
There are some config settings that must be set on the server to ignore file upload certificates.

<dont_generate_upload_certificates>1</dont_generate_upload_certificates>
<ignore_upload_certificates>1</ignore_upload_certificates>

If they aren't set on the server, the server will be incompatible with older clients. The BOINC developers broke this a year or two ago, but I don't know whether it is a default setting or why, if it is broken, it isn't just hard-coded to ignore them.
---

I can't actually update to version 7 on this machine, as it needs a higher version of GLIBC than is available.
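
Editor's sketch: a quick way to confirm that both options quoted above are present and set to 1 is to parse the project's config.xml. This Python snippet is a minimal illustration; the `<boinc><config>` wrapper in the sample is an assumption about the file layout, and `upload_certificate_flags` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The two option names are taken verbatim from the quoted message.
FLAGS = ("dont_generate_upload_certificates", "ignore_upload_certificates")


def upload_certificate_flags(config_xml_text):
    """Return {flag: True/False}; True only if the element exists and contains 1."""
    root = ET.fromstring(config_xml_text)
    status = {}
    for flag in FLAGS:
        # iter() searches the whole tree, so the exact nesting does not matter here.
        element = next(root.iter(flag), None)
        status[flag] = element is not None and (element.text or "").strip() == "1"
    return status


# Assumed layout for illustration only; a real config.xml holds many more options.
SAMPLE_CONFIG = """<boinc>
  <config>
    <dont_generate_upload_certificates>1</dont_generate_upload_certificates>
    <ignore_upload_certificates>1</ignore_upload_certificates>
  </config>
</boinc>"""

print(upload_certificate_flags(SAMPLE_CONFIG))
```

A flag that is missing or set to anything other than 1 is reported as False, which makes a partially applied configuration easy to spot.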

Profile valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 616
Credit: 34,514,943
RAC: 340
Italy
Message 372 - Posted: 8 Apr 2014, 16:27:01 UTC - in response to Message 371.
Last modified: 8 Apr 2014, 16:47:59 UTC

I've just checked our config.xml and both options are set... The thing that I really don't understand is that I cannot see any requests for an upload connection in my Apache logs. Just to be sure we check everything: do you use a proxy? I just found this: https://boinc.berkeley.edu/dev/forum_thread.php?id=6940

Bok
Joined: 11 Feb 14
Posts: 10
Credit: 159,717
RAC: 0
United States
Message 373 - Posted: 8 Apr 2014, 17:00:43 UTC

No proxy at all.

This machine is on my local network (192.168.1.*), going through a Linux firewall (192.168.1.1) and then to my cable modem (75.189.*.* address).

I have other Windows machines connecting via the same route, though, and they manage OK.

Perhaps post this to the BOINC dev mailing list?

Profile valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 616
Credit: 34,514,943
RAC: 340
Italy
Message 374 - Posted: 8 Apr 2014, 17:16:14 UTC - in response to Message 373.

I just installed a 6.12.34 client on a virtualized XP. If I am able to catch some work (we have planned to stop the work generator for a couple of days), maybe I can replicate the problem... otherwise I will ask for help on the BOINC project mailing list...

Profile valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 616
Credit: 34,514,943
RAC: 340
Italy
Message 375 - Posted: 9 Apr 2014, 10:00:06 UTC - in response to Message 374.

WTF! I got exactly the same errors as you... Time to ask for help on the BOINC mailing lists...

Profile [VENETO] boboviz
Joined: 12 Dec 13
Posts: 182
Credit: 4,633,870
RAC: 24
Italy
Message 376 - Posted: 9 Apr 2014, 10:10:08 UTC - in response to Message 374.

I just installed a 6.12.34 client on a virtualized XP.


https://www.microsoft.com/en-us/windows/enterprise/end-of-support.aspx

:-P

Profile valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 616
Credit: 34,514,943
RAC: 340
Italy
Message 382 - Posted: 16 Apr 2014, 9:13:45 UTC - in response to Message 376.

We found that the error described in this thread affected the results returned by *any* pre-v7 BOINC client (like Bok's v6.12.34 computer). It was a configuration error on our side. We have just fixed it, but please wait a few days, preferably a week, before asking for new jobs or attaching with a pre-v7 client.

Thank you.

Bok
Joined: 11 Feb 14
Posts: 10
Credit: 159,717
RAC: 0
United States
Message 383 - Posted: 16 Apr 2014, 17:28:08 UTC

Is the fix only for *new* WUs sent out? I had some units still on that machine and kicked them off, but it looks like they are getting the same errors.

Bok
Joined: 11 Feb 14
Posts: 10
Credit: 159,717
RAC: 0
United States
Message 385 - Posted: 17 Apr 2014, 10:46:03 UTC

looks like this computer has now started getting tasks validated :)

Profile valterc
Project administrator
Project tester
Joined: 30 Oct 13
Posts: 616
Credit: 34,514,943
RAC: 340
Italy
Message 386 - Posted: 17 Apr 2014, 10:58:08 UTC - in response to Message 385.

looks like this computer has now started getting tasks validated :)

Ok, I'm glad that this problem has been solved. Thank you again for your cooperation.



Copyright © 2024 CNR-TN & UniTN