Posts by Bryn Mawr
1) Message boards : Number crunching : OUT of tasks (Message 3262)
Posted 28 Jun 2023 by Bryn Mawr


Nice. RTS (ready to send) tasks seem to be limited to 30000. With a 1.6 MB input file per task for the client, that should equal ~48 GB of server storage for these.


That’s likely to be the immediate queue for the scheduler to consume rather than the full queue that’s output by the work generator.
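
For anyone who wants to check the arithmetic in the quote, here is a minimal sketch. It assumes every ready-to-send task carries its own 1.6 MB input file and uses decimal units (1 GB = 1000 MB); if replicated tasks share one input file, the real figure would be lower. The names are made up for illustration.

```python
# Back-of-the-envelope check of the quoted storage estimate.
READY_TO_SEND_LIMIT = 30_000   # tasks, figure taken from the quoted post
INPUT_FILE_MB = 1.6            # MB per task, figure taken from the quoted post


def queue_storage_gb(tasks: int, mb_per_task: float) -> float:
    """Return total input-file storage in GB, assuming 1 GB = 1000 MB."""
    return tasks * mb_per_task / 1000


print(f"{queue_storage_gb(READY_TO_SEND_LIMIT, INPUT_FILE_MB):.0f} GB")
```

This covers only the ready-to-send queue, not whatever the work generator has produced further upstream.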
2) Message boards : Number crunching : Curious (Message 3126)
Posted 3 Apr 2023 by Bryn Mawr
Can you speed up "gene_work_generator" somehow?

The number of tasks up for grabs is always zero, meaning that volunteers have more computing capacity than your TN-Grid servers can generate work for.

This is especially important now that the WCG project is down and Rosetta is out of work units (again).


But that would just mean that TN-Grid would go dormant in (say) 8 days instead of 10.

Better the effort was put into bringing forward the next set of gene expansions than accelerating this set.
3) Message boards : Number crunching : SSE2 AVX FMA ? (Message 3075)
Posted 5 Feb 2023 by Bryn Mawr
I have recently resumed contact with this project.
I have downloaded -
SSE2
AVX
FMA
They are equally slow.
What is the point of all these different versions?


When you first start running tasks with a new app the system will send each of the variants to you and monitor which runs the fastest on your system. It will then send you just that variant (occasionally slipping in the odd one to make sure your system hasn’t changed).
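
Roughly, the behaviour described above could be sketched like this. To be clear, this is not the BOINC scheduler's code, just an illustration of the idea; the sample threshold, the recheck probability and the function name are all my own assumptions.

```python
import random
from statistics import mean

MIN_SAMPLES = 10      # hypothetical: runs needed before an average is trusted
RECHECK_PROB = 0.02   # hypothetical: chance of re-testing a slower variant


def pick_variant(runtimes: dict[str, list[float]], rng: random.Random) -> str:
    """Choose which app variant to send to a host next.

    runtimes maps a variant name (e.g. "sse2") to that host's completed
    run times in seconds.
    """
    # Trial phase: keep sending any variant that lacks enough samples.
    untested = [v for v, times in runtimes.items() if len(times) < MIN_SAMPLES]
    if untested:
        return rng.choice(untested)

    # Steady state: prefer the variant with the lowest average run time.
    fastest = min(runtimes, key=lambda v: mean(runtimes[v]))

    # Occasionally slip in another variant in case the host has changed.
    others = [v for v in runtimes if v != fastest]
    if others and rng.random() < RECHECK_PROB:
        return rng.choice(others)
    return fastest
```

With three variants that all have enough samples, the call returns the fastest one nearly every time and one of the others now and then, which matches the "odd one slipped in" you may see in your task list.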
4) Message boards : Number crunching : OUT of tasks (Message 2856)
Posted 29 Aug 2022 by Bryn Mawr
Where a project has a fixed limit of 600 tasks every 14 minutes and the existing volunteers process 600 tasks every 14 minutes how are you providing short term help? This is my confusion.

If a project is in a position to provide a surge of tasks to coincide with the challenge then wonderful, work gets done faster, results get back to the researchers sooner, the challenge members have fun with a ready supply of tasks and everything is hunky dory but I cannot see the benefit of this to either party.


I note your comment BUT as of now (and as the Challenge comes to an end at 9pm UTC tonight) the project has

Tasks ready to send 933
Tasks in progress 57191


so there seems to be a significant number of available tasks now, and generally on most days as well, and I guess that the regular volunteers have only been inconvenienced for maybe 2 or 3 days.

And in the meantime, if every day's full quota of available tasks has been crunched then the project gains from this, surely?

regards
Tim


Ignore my grumbling, I’m just a grumpy old git not getting my usual daily fix and missing it.
5) Message boards : Number crunching : OUT of tasks (Message 2855)
Posted 29 Aug 2022 by Bryn Mawr
Where a project has a fixed limit of 600 tasks every 14 minutes and the existing volunteers process 600 tasks every 14 minutes how are you providing short term help? This is my confusion.


Because the existing volunteer pool isn't consuming the full 600 in 14 minutes is my guess as there's usually no shortage. This 'excess' is quickly eaten up with a challenge as we've seen.

If the number of volunteers grows, unless the project distributes more work, clearly there's a diminishing benefit to allowing challenges.


If your guess was correct then the number of available work units would grow over time but it remains fairly stable in normal circumstances so I guess your guess is incorrect :-)
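
To put numbers on that argument, here is a toy model of the ready-to-send queue. It assumes a fixed batch of 600 tasks per generator interval (the figure quoted in this thread) and a constant demand per interval; the demand figures and the function name are invented for illustration, and the real server is certainly less tidy.

```python
def ready_queue(production: int, demand: int, intervals: int, start: int = 0) -> list[int]:
    """Track 'tasks ready to send' after each generator interval."""
    queue = start
    history = []
    for _ in range(intervals):
        queue += production            # generator adds one batch
        queue -= min(queue, demand)    # volunteers take what they can get
        history.append(queue)
    return history


print(ready_queue(600, 500, 5))             # demand below production
print(ready_queue(600, 600, 5))             # demand equal to production
print(ready_queue(600, 900, 5, start=900))  # challenge-style surge in demand
```

In the first case the queue grows without limit, which is not what the server status page shows. A stable queue only fits demand that is at or above production, and in that case extra challenge hosts cannot add throughput.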
6) Message boards : Number crunching : OUT of tasks (Message 2850)
Posted 28 Aug 2022 by Bryn Mawr
Why would anyone set up a challenge involving a project that is known to be borderline supplying WUs on a day to day basis?

Not only does it make the challenge a lottery, it also messes up the long term volunteers.


With respect, the Project Admin was asked, IN ADVANCE, as to whether such a Challenge could be handled by the project.

However, the admin said that the project only issues about 600 tasks every 14 minutes...

Even so, this Challenge only lasts for 3 days (until Sunday at 9pm UTC) so it is hardly a long term inconvenience AND any members doing this Challenge are supporting a "smaller" project that can do with some short term help?

regards
Tim


Where a project has a fixed limit of 600 tasks every 14 minutes and the existing volunteers process 600 tasks every 14 minutes how are you providing short term help? This is my confusion.

If a project is in a position to provide a surge of tasks to coincide with the challenge then wonderful, work gets done faster, results get back to the researchers sooner, the challenge members have fun with a ready supply of tasks and everything is hunky dory but I cannot see the benefit of this to either party.
7) Message boards : Number crunching : OUT of tasks (Message 2843)
Posted 27 Aug 2022 by Bryn Mawr
Tasks ready to send 0

I will be out of work in four hours.

There's a Formula BOINC challenge going on (three days) ...


Why would anyone set up a challenge involving a project that is known to be borderline supplying WUs on a day to day basis?

Not only does it make the challenge a lottery, it also messes up the long term volunteers.
8) Message boards : Number crunching : OUT of tasks (Message 2805)
Posted 3 Aug 2022 by Bryn Mawr
Does the Work Generator need a little adjustment? I've been getting a lot of Server Aborts. It appears that shortly before the Deadline another pair of WUs is sent out to new computers then when the WUs actually get submitted before the Deadline the duplicates get Server Aborted. E.g.,
http://gene.disi.unitn.it/test/workunit.php?wuid=36782082
http://gene.disi.unitn.it/test/workunit.php?wuid=36765727
http://gene.disi.unitn.it/test/workunit.php?wuid=36669384
Waiting until after the actual Deadline to send out replacements might be more efficient.


Reduce your cache so that you don’t have tasks waiting until near to the deadline.
9) Message boards : Number crunching : SSE2 with app_info? (Message 2782)
Posted 22 Jul 2022 by Bryn Mawr
A good point, but FMA is working for me well enough.
It is mainly west of the Mississippi River that they have the problems.

We may need to give up the Louisiana Purchase.


And France and Spain and Greece and England and …
10) Message boards : Number crunching : Bad hosts topic (Message 2674)
Posted 17 May 2022 by Bryn Mawr
I suspended all but 5 and went away. They're running much faster now. The CPU is 18c/36t so I'm going to work my way up to 18 WUs. Feels like the problem when WUs load too much into the L3 cache and choke the CPU traffic cop.

What feels so strange is that TN-Grid is the only project I'm running and yet this only affected a single computer. I sorted BoincTasks by WU name and other WUs with the same prefix on other computers are running at normal speed.

Does anyone know of a utility that monitors CPU cache utilization?


That makes sense, the Ryzen 5 series has twice the L3 cache of the 3 series.
11) Message boards : Number crunching : Bad hosts topic (Message 2671)
Posted 17 May 2022 by Bryn Mawr
I have a slug of WUs that are going to take over 3 days to run, e.g. http://gene.disi.unitn.it/test/workunit.php?wuid=35037615

My wingman completed it in a few hours as normal. It's all the WUs running on the same computer Rig-31 with a Xeon E5-2699 v3 with 4x8 GB RAM. I've rebooted and reduced the CPU utilization but that does not speed them up.

My wingman Technologov is also running Linux with an E5-2680 v4 (14c/28t) with 56 processors so must be a dual CPU server MB and 64 GB RAM.
http://gene.disi.unitn.it/test/show_host_detail.php?hostid=78642

I can't see any reason for this computer to run them so slow. But, I've been running BOINC 24x7 for years and I'm literally wearing out MBs. Some ooze oil from the PWM caps or stop communicating with one or more PCIe slots. This ASRock X99 Extreme4 may be at the end of its life and ready for the scrap heap.

Penny for your thoughts if you have a suggestion as to what might be my problem. TIA


Probably not your PC.

I also have the occasional WU that takes double the normal time or more where a wingman running similar kit takes the normal time. Why? I don’t know, I just accept it and carry on.

Latest example, my R9/3900 took 20,000 seconds, wingman’s R7/5800x took the normal 10,000 seconds.
12) Message boards : Number crunching : Server Status - No Work Available (Message 2608)
Posted 1 Apr 2022 by Bryn Mawr
We are immediately crunching whatever it produces

Thanks. That is what I wanted to know.


All the additional people from WCG are taking the work rate over the rate of production.
13) Message boards : Number crunching : Why I get different tasks for SSE2 and for AVX platform ? (Message 2530)
Posted 16 Feb 2022 by Bryn Mawr
Why restrict it? Easier to let the system choose the fastest and you’ll get more work done.

Right you are, just my idiosyncrasy.


Fair enough, we all have our own ways :-)
14) Message boards : Number crunching : Why I get different tasks for SSE2 and for AVX platform ? (Message 2527)
Posted 15 Feb 2022 by Bryn Mawr
This is the way I restrict all WUs to FMA:
http://gene.disi.unitn.it/test/forum_thread.php?id=304&postid=2209#2209
You need to save a copy of the FMA executable that you can copy to all your computers:
/var/lib/boinc-client/projects/gene.disi.unitn.it_test/gene_pcim_v1.10_linux64__fma
Beware that when you install this and then click "Read config files" BOINC will wipe out all your WUs in the TN-Grid project folder so you might as well Abort them first.
Also, if you're new to BOINC you may benefit from a couple of tweaks to your cc_config.xml file.
I also recommend using BoincTasks to manage so many computers.


Why restrict it? Easier to let the system choose the fastest and you’ll get more work done.
15) Message boards : Number crunching : Why I get different tasks for SSE2 and for AVX platform ? (Message 2508)
Posted 4 Feb 2022 by Bryn Mawr
I understand that it will recheck occasionally but I’ve never noticed it doing so on my machines.



You can see your current performance here: http://gene.disi.unitn.it/test/show_host_detail.php?hostid=62240
Click next to "Application details" and it will show this page: http://gene.disi.unitn.it/test/host_app_versions.php?hostid=62240

As you can see, FMA is currently faster on your PC.


Yes, and avx on my other machine with a near identical spec :-

http://gene.disi.unitn.it/test/host_app_versions.php?hostid=60043 :-)
16) Message boards : Number crunching : Why I get different tasks for SSE2 and for AVX platform ? (Message 2502)
Posted 3 Feb 2022 by Bryn Mawr
https://www.dropbox.com/s/gx3j3brt3gi6oqd/TN-Grid_AVX.png?dl=0

Here I see some tasks based on SSE, others on AVX. Why is this so?

My processors are all new ("Haswell" or newer), support SSE4.x, AVX2, and FMA.


When you first join the server will send you a mix of tasks using different optimisations and work out which is the quickest for each machine. It will then send you that type of task exclusively.

I understand that it will recheck occasionally but I’ve never noticed it doing so on my machines.
17) Message boards : Number crunching : Formulaa Boinc Sprint (Message 2374)
Posted 12 Sep 2021 by Bryn Mawr
The current server is a virtualized 4-core AMD Opteron with 4 GB RAM, so it is not that easy to run a project on it. The new one, a Xeon Gold 6238R, is here. The project should be moved this October (hopefully). This won't, however, completely solve our (slow) work generation problem. We will gain some raw speed because of the upgraded hardware, but the real change will be a new optimized and parallel generator (no ETA on this...).
The AMD funding is actually free access to some big computational resources (mainly devoted to machine learning). It is a very useful asset for our research but not related to the BOINC server.


Thank you, that shows the scale of the problem nicely.
18) Message boards : Number crunching : Formulaa Boinc Sprint (Message 2370)
Posted 11 Sep 2021 by Bryn Mawr
http://gene.disi.unitn.it/test/forum_thread.php?id=255&postid=1632

Ok, we don't have fancy graphs showing the workflow. The only informational page is https://gene.disi.unitn.it/test/gene_science.php, which shows some statistics about the currently running experiment.
BTW, the algorithm for generating workunits is rather complicated, i.e. it takes some time to generate a workunit and add it to the queue. With our current setup (hardware/software) we are able to create 294 workunits (588 results, because of the replication) every ~15 minutes, so I'll let you do the math ;). This is our theoretical limit; when it is reached the queue will start to dry up.


It would be nice if someone could answer the other half of the question - what would be required to resolve the problem, more hardware?
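
Doing the math from the quoted figures, as a minimal sketch: it assumes the generator runs around the clock at exactly one batch per 15 minutes, which the quote only gives as approximate. The names are my own.

```python
WORKUNITS_PER_BATCH = 294   # from the quoted post
REPLICATION = 2             # each workunit goes out as two results
BATCH_MINUTES = 15          # "~15 minutes" in the quote, taken as exact here


def daily_output(workunits_per_batch: int, replication: int, batch_minutes: int) -> tuple[int, int]:
    """Return (workunits per day, results per day) for a steady generator."""
    batches_per_day = 24 * 60 // batch_minutes
    workunits = workunits_per_batch * batches_per_day
    return workunits, workunits * replication


workunits, results = daily_output(WORKUNITS_PER_BATCH, REPLICATION, BATCH_MINUTES)
print(f"{workunits} workunits/day, {results} results/day")
```

So the ceiling is in the region of 56,000 tasks a day across all volunteers, and anything the combined hosts can crunch beyond that just empties the queue.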
19) Message boards : Number crunching : Work Generator Down (Message 2347)
Posted 28 Jul 2021 by Bryn Mawr
And no work units available - 16:30 28/07/21
20) Message boards : Development : Maybe avoid file I/O? Just read all input into memory? (Message 2299)
Posted 6 May 2021 by Bryn Mawr
Hi Steffen
The application reads two input files: the one with the list of tiles (600 tiles for the FANTOM dataset) and the expression dataset. Both of them should already be read at the very beginning, converted to a suitable format and kept in memory until the end (or a restart from the checkpoint). I didn't write, nor check, the logic inside the application, I don't know if there is something wrong with it.

There is the need to look at the source code and do some I/O tracing (unfortunately I'm very busy right now...)


Is the source code available? Maybe in GitHub?




Copyright © 2024 CNR-TN & UniTN