Posts by valterc

21) Message boards : News : Summer break (Message 3293)
Posted 24 Oct 2023 by

valterc

Perhaps you could make a new application and have the BOINC volunteers do part of the data analysis themselves?
I assume this analysis requires a degree in biology, and isn't something done by computer.

Well, the initial phase is done by a computer: data collection, some consistency checks and basic statistical analysis. It takes just a few days and doesn't need tremendous computational power. Then you need to look at the results from a very specific biological point of view, examining the genes you are interested in and their mutual relationships. If you browse the list of published papers, you may get some hints about the pipeline.
It is also worth noting that here every result (every so-called expansion) is potentially useful. In some other kinds of computation you search for something specific (like a prime number or the minimum-energy configuration of a protein), keep only the best result and discard all the rest.
In the meantime, we are also working on setting up a web service to "publish" all our results and make them available to the biological community in an easy (and fancy) way. For example, check this: http://vitis.onegenexp.eu

22) Message boards : News : Summer break (Message 3290)
Posted 24 Oct 2023 by

valterc

Hello all :)

Any ETA regarding the start of a new experiment? I'm also curious about what it will be about: should we expect more Vitis vinifera or maybe some Mus musculus?

Anyway, I'm looking forward to crunching again for this project :)

Best regards, Samuel

No news, unfortunately... The biologists are mostly busy analyzing the huge amount of work that we/you have already done.
As for the future, I guess the priority is Homo sapiens; grapes and mice are also possible.

23) Message boards : News : Summer break (Message 3284)
Posted 27 Sep 2023 by

valterc

Hi all,
No news yet from the researchers.

BTW, I have a new problem to solve. With the last batch of experiments we completed more than 250k gene expansions. The post-processed results are two files per expansion, so more than 500k files... and this hits the limit on the number of files we can store in a single directory on Google Drive. I have to rewrite a lot of scripts and move a lot of things around...
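A common workaround for a per-folder file limit is to shard the files into numbered sub-folders. A minimal sketch, assuming each expansion has a numeric id (such as the internal id that appears in workunit names) and using a hypothetical bucket of 10,000 expansions (20,000 files) per folder. The actual Google Drive limit and the project's real folder layout are not stated here, so `EXPANSIONS_PER_FOLDER` is an assumption to adjust.

```python
from collections import Counter
from pathlib import PurePosixPath

FILES_PER_EXPANSION = 2          # two post-processed files per expansion (from the post)
EXPANSIONS_PER_FOLDER = 10_000   # hypothetical bucket size, keep it below the real limit


def bucket_name(expansion_id: int, per_folder: int = EXPANSIONS_PER_FOLDER) -> str:
    """Ids 0..9999 map to '0000', 10000..19999 to '0001', and so on."""
    return f"{expansion_id // per_folder:04d}"


def shard_path(root: str, expansion_id: int, filename: str,
               per_folder: int = EXPANSIONS_PER_FOLDER) -> PurePosixPath:
    """Location of one result file inside its bucket folder."""
    return PurePosixPath(root) / bucket_name(expansion_id, per_folder) / filename


def files_per_folder(expansion_ids, per_folder: int = EXPANSIONS_PER_FOLDER) -> Counter:
    """How many files each bucket folder would receive."""
    counts = Counter()
    for i in expansion_ids:
        counts[bucket_name(i, per_folder)] += FILES_PER_EXPANSION
    return counts
```

For 250k expansions this yields 25 folders of at most 20,000 files each, and a file's location can be recomputed from the expansion id without listing any directory.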

24) Message boards : News : Summer break (Message 3279)
Posted 8 Sep 2023 by

valterc

Just a brief update:
- No new experiment ready yet (people are working on it, but no ETA)
- The latest batch on Vitis vinifera has been collected and validated, ready for further biological analysis and publication
- I am re-organizing all the (many years of) data that we collected, trying to optimize the whole system
- Some of us will be here https://grapedia.org/annual-meeting/

25) Message boards : News : Summer break (Message 3277)
Posted 18 Aug 2023 by

valterc

I hope to be able to restart the system with a new project at the beginning of August.

Is there any news on the new project?

Not yet; most of the people here are either on vacation or away at conferences/workshops, etc. I was probably too optimistic in my previous statement; I guess we will have some news in September.

26) Message boards : Number crunching : Division by Zero ERROR ! (Message 3275)
Posted 11 Aug 2023 by

valterc

Fixed, thanks for noticing it.

27) Message boards : News : Summer break (Message 3272)
Posted 11 Jul 2023 by

valterc

The plan is to drive around Bosnia, Croatia, Montenegro and Albania, moving from place to place and possibly stopping somewhere if we really like it...
I'm happy you enjoyed your volcano "climbing". When I was younger I climbed Stromboli, starting from sea level, and spent the night up there in a sleeping bag (it was a wonderful experience).

28) Message boards : News : Summer break (Message 3270)
Posted 10 Jul 2023 by

valterc

I just put the last Vitis vinifera genes into our queue. At the current pace it will take about a week, maybe less, to distribute them all. The new datasets are not ready yet... Also, I will be on vacation for the next two weeks, and it wouldn't be advisable to start new work without being able to easily oversee the system.
Therefore, in a few days the work generator will stop producing new tasks, and only resends will be floating around.
I hope to be able to restart the system with a new project at the beginning of August.

Thank you all!

29) Message boards : Number crunching : OUT of tasks (Message 3268)
Posted 6 Jul 2023 by

valterc

Morning,

Which is the next project? The current one will be over in a few days...

Thx¡
Javi F

I don't really know right now; there are two possibilities, both related to humans. The scientists are working on the input datasets, but during the summertime everything tends to slow down.

30) Message boards : News : The end of the FANTOM-1 experiment (Message 3266)
Posted 5 Jul 2023 by

valterc

As a follow-up, I would like to mention that all the results of the FANTOM-1 experiment have been collected and double-checked locally. A couple of gene expansions were "corrupted", probably as a consequence of our faulty file system. However, this issue has been fixed, and all the data is now available to the scientists. We are also working on a suitable way to publish the results, which will include a dedicated web application and scientific publications.
Summer is here, and the related vacations will slow this process down somewhat.

31) Message boards : Number crunching : OUT of tasks (Message 3217)
Posted 26 May 2023 by

valterc

I might try this solution, but I don't think it will work; let me try to explain why.
The speed of the work generator depends only on the number of genes in our dataset. The program itself is rather simple from a computational point of view: it builds 2000 random permutations of (in the case of Vitis) around 22000 numbers, then cuts them into tiles (slices) and (now) packs them 600 per workunit. It's a bit more complex than this, but it gives you the idea.
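The generation step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's actual work generator: genes are plain integer indices, each tile holds the seed gene plus 999 other genes (assuming the tile size of 1000 quoted elsewhere in the thread includes the seed), and the helper names `make_tiles`, `pack` and `count_workunits` are hypothetical.

```python
import math
import random


def make_tiles(n_genes, seed_gene, n_perm, tile_size, rng):
    """Yield tiles: the seed gene plus one slice of a random permutation of the other genes."""
    others = [g for g in range(n_genes) if g != seed_gene]
    per_tile = tile_size - 1  # assumption: one slot of each tile is taken by the seed gene
    for _ in range(n_perm):
        rng.shuffle(others)  # a fresh random permutation for every round
        for start in range(0, len(others), per_tile):
            yield [seed_gene] + others[start:start + per_tile]


def pack(tiles, tiles_per_wu):
    """Group consecutive tiles into workunits; the last workunit may be smaller."""
    wu = []
    for tile in tiles:
        wu.append(tile)
        if len(wu) == tiles_per_wu:
            yield wu
            wu = []
    if wu:
        yield wu


def count_workunits(n_genes, n_perm, tile_size, tiles_per_wu):
    """Number of workunits per expansion, without building the tiles."""
    tiles_per_perm = math.ceil((n_genes - 1) / (tile_size - 1))
    return math.ceil(n_perm * tiles_per_perm / tiles_per_wu)


if __name__ == "__main__":
    print(count_workunits(22393, 2000, 1000, 600))  # 77
```

With 22393 genes (the V. vinifera count quoted later in the thread), 2000 permutations, tiles of 1000 and 600 tiles per workunit, this gives ceil(22392 / 999) = 23 tiles per permutation, 46000 tiles in total and 77 workunits, the last one holding 400 tiles. That agrees with the 77 workunits per expansion mentioned in the thread, although the real generator may slice differently.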
The speed of the PC algorithm on a "tile" (the application that you run) depends on the "structure" of the dataset, and on Vitis-Vespucci it is faster than on the previous one (Human-FANTOM).
Right now a single run of the work generator builds 77 workunits (154 tasks, because of the validation requirements). Say, for example, that each workunit is computed in one hour on an ideal computer. That keeps the ideal computer busy for 77 hours. If, in theory, I packed all the tiles into a single workunit, it would keep that computer busy for 77 hours: exactly the same time, but with a higher risk of computational errors. The work generator would take the same time, maybe a few seconds less because it creates just one file instead of 77.
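The arithmetic above can be checked with a tiny model. Assumptions: 46,200 tiles (77 full workunits of 600, as in the example), a hypothetical 6 seconds per tile so that one workunit takes one hour on the ideal computer, and a quorum of 2 results per workunit for validation. The `Workload` and `workload` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    workunits: int
    tasks: int                  # each workunit is sent out `quorum` times for validation
    hours_per_full_task: float  # runtime of one full workunit on the ideal computer
    total_cpu_hours: float      # across all volunteers, including the replicas


def workload(total_tiles: int, tiles_per_wu: int, seconds_per_tile: int,
             quorum: int = 2) -> Workload:
    n_wu = -(-total_tiles // tiles_per_wu)  # ceiling division
    return Workload(
        workunits=n_wu,
        tasks=n_wu * quorum,
        hours_per_full_task=tiles_per_wu * seconds_per_tile / 3600,
        total_cpu_hours=total_tiles * seconds_per_tile * quorum / 3600,
    )


now = workload(46_200, 600, 6)        # 77 workunits, 154 tasks of 1 hour
single = workload(46_200, 46_200, 6)  # 1 workunit, 2 tasks of 77 hours
```

Packing everything into one workunit changes only how the same 154 CPU-hours are split up: 2 tasks of 77 hours instead of 154 tasks of one hour, with every error costing far more work.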
Anyway, I may be wrong... I will slightly increase the number of tiles per workunit starting from the next batch (it will take a couple of days).

32) Message boards : Number crunching : OUT of tasks (Message 3209)
Posted 24 May 2023 by

valterc

We are at 0 (Zero) available tasks again.

I suggest running multiple work generators in parallel

That would be good, but we don't have the resources. Only one instance of the work generator can run at a time (it needs a rewrite), and our current hardware could not support more anyway. We need new hardware and a (slightly new) work generator; we are working on it, but it is not that easy (finding money).

33) Message boards : Number crunching : OUT of tasks (Message 3205)
Posted 23 May 2023 by

valterc

I just changed wus*core from 8 to 6 (I don't remember whether I also need to restart the server) and reduced the deadline from 5 to 4 days.

34) Message boards : Number crunching : OUT of tasks (Message 3202)
Posted 23 May 2023 by

valterc

I don't like your definition of a work unit. A work unit is one unit that a BOINC worker downloads, aka what you call an "expansion" is a "work unit" in my terms.

"Work unit" = 1 unit of work downloaded by BOINC = 1 task

What you have there is not a work unit but more like a gene-slice

Anyway, we need more work units and a bigger queue of available work units on the BOINC server side

That's exactly the definition of a workunit. What you actually download is a collection of 600 computational chunks (plus the expression dataset, which is shared among the genes). Each chunk is a run of the PC algorithm on a tile of size 1000, made up of the "seed" gene and a random subset of the other genes.
A single gene expansion is made up of 77 workunits. We wait until all of them come back and then build the gene expansion list.
You can also infer this from the name of a workunit, for example:
236784_Vv_vv-VIT-01s0150g00140_wu-57_1684831345366
- 236784: an internal id
- Vv: the organism
- vv-VIT-01s0150g00140: the gene name
- wu-57: workunit 57 (out of 77)
- 1684831345366: a timestamp
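The naming scheme can be decoded mechanically. A minimal sketch, assuming the fields are separated by underscores, the gene name sits between the organism code and the `wu-N` field, and the last field is a Unix timestamp in milliseconds (1684831345366 decodes to 23 May 2023, 08:42 UTC, the day of this post). The `missing_workunits` helper illustrates the "wait until all of them come back" step; it is hypothetical, and since the post does not say whether workunit numbering starts at 0 or 1, the expected indices are passed in.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class WorkunitName:
    internal_id: int
    organism: str
    gene: str
    wu_index: int
    created: datetime


def parse_wu_name(name: str) -> WorkunitName:
    # Split the two leading fields from the left and the two trailing fields from
    # the right, so an underscore inside a gene name would not break parsing.
    internal_id, organism, rest = name.split("_", 2)
    gene, wu_field, stamp = rest.rsplit("_", 2)
    if not wu_field.startswith("wu-"):
        raise ValueError(f"unexpected workunit field: {wu_field!r}")
    return WorkunitName(
        internal_id=int(internal_id),
        organism=organism,
        gene=gene,
        wu_index=int(wu_field[3:]),
        # Assumption: 13-digit timestamp = milliseconds since the Unix epoch.
        created=datetime.fromtimestamp(int(stamp) / 1000, tz=timezone.utc),
    )


def missing_workunits(names, expected_indices):
    """Group returned workunits by gene and list the indices still outstanding."""
    seen = defaultdict(set)
    for n in names:
        wu = parse_wu_name(n)
        seen[wu.gene].add(wu.wu_index)
    return {gene: sorted(set(expected_indices) - idx) for gene, idx in seen.items()}
```

An expansion would be ready for post-processing once its entry in `missing_workunits(...)` is an empty list.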

35) Message boards : Science : The new Vitis vinifera project is underway (Message 3200)
Posted 23 May 2023 by

valterc

Sorry, I made a mistake in my previous statement... it's not 800 but 600.

36) Message boards : Number crunching : OUT of tasks (Message 3198)
Posted 23 May 2023 by

valterc

> Another solution to this problem is to double the workunit length

But is generating longer (or larger?) work units harder than, or the same as, generating small ones?
So they will take up 6 hours of CPU time rather than 3 hours on average?

Right now there are 600 small computational chunks inside a workunit (the last workunit for every gene is smaller). I could easily increase or decrease that number; the computation time is proportional to it. The work generation time is almost independent of it, so splitting an "expansion" into, say, 77 (right now), 144 or 35 workunits takes about the same time.
Choosing the right number is a compromise between many things. Very short workunits overuse the network connection; very long ones are bad for people with slow computers or intermittent crunching time, and they increase the chance of wasting resources due to computational errors. The deadline should also be adjusted accordingly.
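The trade-off can be made concrete with a small planning helper. All numbers are illustrative assumptions: 46,200 tiles per expansion (77 full workunits of 600), a hypothetical 18 seconds per tile so that a 600-tile workunit takes about 3 hours as in the question above, and a hypothetical rule that scales the deadline linearly from the 4-day deadline used with 600 tiles elsewhere in this thread, clamped to 2..14 days. `plan` is not part of BOINC or of the project's tooling.

```python
import math

BASE_TILES = 600        # tiles per workunit at the time of this thread
BASE_DEADLINE_DAYS = 4  # deadline used with that workunit size


def plan(total_tiles: int, tiles_per_wu: int, seconds_per_tile: float,
         min_days: int = 2, max_days: int = 14):
    """Return (number of workunits, hours per full workunit, deadline in days)."""
    n_wu = math.ceil(total_tiles / tiles_per_wu)
    hours = tiles_per_wu * seconds_per_tile / 3600
    # Hypothetical rule: the deadline grows with workunit length, within sane bounds.
    days = math.ceil(BASE_DEADLINE_DAYS * tiles_per_wu / BASE_TILES)
    return n_wu, hours, max(min_days, min(max_days, days))


for size in (300, 600, 1200):
    print(size, plan(46_200, size, 18))
# 300 (154, 1.5, 2)
# 600 (77, 3.0, 4)
# 1200 (39, 6.0, 8)
```

Doubling the tiles per workunit halves the number of workunits (and the scheduler and network traffic) but doubles each task's runtime and, under this rule, its deadline.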

37) Message boards : Science : Homo Sapiens (OneGenE - FANTOM-1) - End (Message 3186)
Posted 22 May 2023 by

valterc

You may see some Hs (FANTOM) workunits floating around again. After a preliminary check of the results, we realized we need to expand another dozen genes.

38) Message boards : Number crunching : OUT of tasks (Message 3185)
Posted 22 May 2023 by

valterc

Since there are 22393 genes in the V. vinifera project, could we run 22393 work-unit generators in parallel, on different servers?

Just a dozen in parallel would be more than enough, and would not stress the system (db and storage).

39) Message boards : Number crunching : OUT of tasks (Message 3181)
Posted 21 May 2023 by

valterc

How difficult is it to build a parallel version of the work generator?

It shouldn't be too difficult. It doesn't need real parallelism; a version that supports multiple instances running at the same time would be more than enough.
The problems are that it's rather complicated code that has to interact carefully with the local gene db, and it's written in Python (a language that I refused to learn ;)
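One common way to let several generator instances share a gene database is to have each instance atomically claim a gene before generating its workunits. The sketch below uses SQLite purely as a hypothetical stand-in: the real gene db, the `genes` table, its columns and the `pending`/`generating` statuses are all assumptions. `BEGIN IMMEDIATE` takes SQLite's write lock before the read, so two instances can never claim the same gene.

```python
import sqlite3


def connect(path: str) -> sqlite3.Connection:
    # isolation_level=None: transactions are opened and closed explicitly below.
    # timeout: how long to wait for another instance to release the write lock.
    return sqlite3.connect(path, isolation_level=None, timeout=30)


def init_db(conn: sqlite3.Connection, genes) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS genes ("
        " name TEXT PRIMARY KEY,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " owner TEXT)"
    )
    conn.executemany("INSERT OR IGNORE INTO genes(name) VALUES (?)", [(g,) for g in genes])


def claim_next_gene(conn: sqlite3.Connection, worker: str):
    """Mark one pending gene as 'generating' for this worker; return None when none are left."""
    conn.execute("BEGIN IMMEDIATE")  # acquire the write lock before selecting
    try:
        row = conn.execute(
            "SELECT name FROM genes WHERE status = 'pending' ORDER BY name LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE genes SET status = 'generating', owner = ? WHERE name = ?",
                (worker, row[0]),
            )
        conn.execute("COMMIT")
        return None if row is None else row[0]
    except BaseException:
        conn.execute("ROLLBACK")
        raise
```

Each instance would loop on `claim_next_gene` until it returns None; a crashed instance leaves its genes in `generating`, which a real version would need to detect and reset.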

40) Message boards : Number crunching : OUT of tasks (Message 3179)
Posted 20 May 2023 by

valterc

What would it take to generate work units faster? New software? More CPU cores? Faster cores?
Amount of available work units is pretty low for both TN-GRID and Rosetta projects.

A brand new server :)
And a new parallel version of the work generator
