1) Message boards : News : Storage problem (Message 2394)
Posted 4 hours ago by Profile valterc
We are still facing the "storage" problem. At times everything is painfully slow (I have temporarily disabled the work generator).
2) Message boards : Number crunching : Error reported by file upload server: can't lock file /storage/boinc/upload/ (Message 2393)
Posted 16 hours ago by Profile valterc
The University of Trento storage system, which we use, is having problems: sometimes it gets stuck and the whole BOINC system slows down... They are working on fixing it; there is nothing we can do in the meantime.
3) Message boards : News : Storage problem (Message 2391)
Posted 7 days ago by Profile valterc
Good news!!!
Really looking forward to that, as my computers are starting to get very hungry now! :-)

Btw., I can see that there were no stats exported to BoincStats today. Is that issue also related to the failing storage system?

I also had to stop all of the BOINC server's cron activities...
4) Message boards : News : Storage problem (Message 2386)
Posted 10 days ago by Profile valterc
Well, I really don't know what's behind the University storage system (hardware, configuration, etc.), we just use it :)
BTW, it seems that the system is slowly recovering. When the numbers on the project status page are close to zero, I will try to start the work generator again and see whether it works as usual.
5) Message boards : News : Storage problem (Message 2384)
Posted 11 days ago by Profile valterc
The University's storage system (which we use heavily) is having big performance issues, so everything is painfully slow. I stopped the work generator, which was not actually working because of this. No new tasks will be generated until the assimilator is able to move the finished results to the proper place. There is nothing else to do but wait until the problem is solved; no ETA, unfortunately.
6) Message boards : Number crunching : Work Generator Down (Message 2383)
Posted 11 days ago by Profile valterc
The University is having big issues with its storage system (we heavily use it in the BOINC pipeline), so everything was slowing down until our server became practically stuck. We are working to solve this problem...
7) Message boards : Science : AMD COVID-19 HPC Fund (Message 2380)
Posted 19 days ago by Profile valterc
A software technician from the University of Trento will shortly attend a dedicated training course on HIP/ROCm offered by AMD. This should eventually give us some of the skills we need to develop a GPU application.
8) Message boards : Number crunching : Formulaa Boinc Sprint (Message 2373)
Posted 11 Sep 2021 by Profile valterc
The current server is a virtualized 4-core AMD Opteron with 4 GB of RAM; it is not that easy to run a project on it. The new one, a Xeon Gold 6238R, is here. The project should be moved this October (hopefully). This won't, however, completely solve our (slow) work generation problem: we will gain some raw speed from the upgraded hardware, but the real change will be a new optimized and parallel generator (no ETA on this...).
The AMD funding is actually free access to some big computational resources (mainly devoted to machine learning). It is a very useful asset for our research but not related to the BOINC server.
9) Message boards : Number crunching : Curious (Message 2357)
Posted 9 Aug 2021 by Profile valterc
We use "redundancy" (two results from different computers must be exactly the same) in order to check whether a workunit is "valid" (successful computation). If they are not the same, a third copy of the workunit is sent to another computer. At the end, if two results are identical they are declared "OK" and all the others are marked "invalid".
So, an "invalid" is, briefly, a computation that reached its normal end but with something wrong in it, usually incorrect numeric calculations. There are many possible reasons for getting an "invalid", such as overheating or a faulty hardware component. In this case I suggest running a stress test, like Prime95, on your computer and checking the results.
There is also a known bug in our code that may flag a result as invalid; this may happen if you stop your calculation at the very beginning, before the first checkpoint.
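The redundancy check described above can be sketched roughly as follows. This is only an illustration of the idea, not the project's actual validator code: the function name validate_workunit, the quorum parameter, and the representation of a result as a plain string are all assumptions made for the example.

```python
from collections import Counter


def validate_workunit(results, quorum=2):
    """Hypothetical sketch of a quorum check (not the real validator).

    results maps a host id to the output that host returned; outputs are
    compared for exact equality. Returns (canonical_output, status_by_host),
    or (None, {}) when no output has been returned by `quorum` hosts yet.
    """
    counts = Counter(results.values())
    if not counts:
        return None, {}

    # Pick the output returned by the largest number of hosts.
    output, agreeing_hosts = counts.most_common(1)[0]
    if agreeing_hosts < quorum:
        # No agreement yet: another copy of the workunit must be sent out.
        return None, {}

    # Results matching the agreed output are "OK", all the others "invalid".
    status = {
        host: ("OK" if host_output == output else "invalid")
        for host, host_output in results.items()
    }
    return output, status
```

With two disagreeing results the function reports that no quorum has been reached, which corresponds to sending a third copy; once a third result matches one of the first two, those two are marked "OK" and the odd one out is marked "invalid".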
10) Message boards : Number crunching : Work Generator Down (Message 2349)
Posted 29 Jul 2021 by Profile valterc
Got a database error and the work generator stopped, hopefully just a harmless server glitch... Sorry for having caught this only after one day (you know, summer vacation...)
11) Message boards : Number crunching : Android applications on ARM (Message 2339)
Posted 24 Jun 2021 by Profile valterc
Hi, any update on this? I confirm that my old phone is still crunching 24 hours a day without any error. I think it's safe to add official Android support so that users with unrooted phones can participate too.

OK, I completely forgot about this... I will hang a post-it in the proper place...

Is this ever going to be done??

You are right... I will hang a bigger post-it...
12) Message boards : Science : AMD COVID-19 HPC Fund (Message 2336)
Posted 21 Jun 2021 by Profile valterc
Hi bozz4science, we don't plan to move the TN-Grid server's components to the AMD computational node, so you won't see it on the server's status page. We recently bought one server for this purpose. The AMD node will probably be stressed this September, focusing on ML techniques, when classes start at the University. Right now we are doing just small experiments on it, also using BOINC.
13) Message boards : Number crunching : FMA application for windows_x86_64 (Message 2334)
Posted 21 Jun 2021 by Profile valterc
On my FX-8370, FMA jobs fail.
I guess that in your case a MB BIOS upgrade might help.

ASUS Sabertooth 990FX R2.0, BIOS latest version 2901.

Well, I don't know how to solve your particular problem... In the meantime I have moved the FMA Win64 application to beta again. If anyone would like to give it a try, please enable beta applications in your TN-Grid preferences.
14) Message boards : Number crunching : FMA application for windows_x86_64 (Message 2331)
Posted 20 Jun 2021 by Profile valterc
On my FX-8370, FMA jobs fail.

That was the main reason I kept FMA for Windows as beta. I guess that in your case a MB BIOS upgrade might help.

I moved the FMA/Win64 app out of beta a few days ago to see whether the old problems were solved. I will probably switch it back to beta...
15) Message boards : Science : AMD COVID-19 HPC Fund (Message 2316)
Posted 24 May 2021 by Profile valterc
I don't want to make this an annoying habit to inquire on a monthly basis whether the promised hardware can yet be accessed or not, but I am starting to question AMD's commitment at this stage. Has the project lead already exerted a bit of pressure on this topic?
... Just curious.

Well, we actually got access to one of the promised PODs (computational nodes, as described in a previous post) at the beginning of April. It took a couple of weeks to configure the system (with the help of the guys at Penguin Computing). We are still learning how to use it the proper way (the queuing system etc.). I simply forgot to tell everybody here about this, sorry...
16) Message boards : Science : proterna_vae64lat dataset (Message 2312)
Posted 18 May 2021 by Profile valterc
I just inserted 51 genes (51*18 workunits) related to leukemia into the queue, to be "expanded" within the proterna_vae64lat dataset.
17) Message boards : Science : proterna_vae64lat dataset (Message 2303)
Posted 10 May 2021 by Profile valterc
There is a new data set floating around: proterna_vae64lat

It has been made by "crossing", using machine learning techniques, two different datasets collected from human tissues: protein and RNA expression levels. More (scientific) information will follow.

There will be just a small set of workunits available, at least at the beginning. You may recognize them from the workunit's name containing the string "Hs_PRT-genename". This kind of workunit should be very fast to compute (around half an hour on a decent computer).

Please let me know if you notice something strange with them.
18) Message boards : Development : Maybe avoid file I/O? Just read all input into memory? (Message 2298)
Posted 6 May 2021 by Profile valterc
Hi Steffen
The application reads two input files: the one with the list of tiles (600 tiles for the FANTOM dataset) and the expression dataset. Both of them should already be read at the very beginning, converted to a suitable format and kept in memory until the end (or a restart from the checkpoint). I neither wrote nor checked the logic inside the application, so I don't know if there is something wrong with it.

Someone needs to look at the source code and do some I/O tracing (unfortunately I'm very busy right now...)
19) Message boards : Number crunching : Bad hosts topic (Message 2289)
Posted 19 Apr 2021 by Profile valterc
This probably belongs to someone who doesn't even know what is happening here... Anyway, I just blacklisted it.
20) Message boards : News : Upgrading the server (Message 2275)
Posted 26 Mar 2021 by Profile valterc
Over 27,000 WUs queued and ready for BOINCers to DL. Looks like you upgraded the server and you're more than ready for the next Race Day.

Well, the WUs ready to send are just around 16,000... The server is still the old little one. We found some money and ordered a new server; no ETA right now. When we are ready to do the upgrade, we will certainly post a news item here.

