Posts by marco giglio
1) Message boards : Development : Team creation (Message 256)
Posted 6 Jan 2014 by marco giglio
Hi. What do you mean?
BOINC already provides the ability to create a team.
2) Message boards : Web site : Suggestions for forum (Message 255)
Posted 6 Jan 2014 by marco giglio
Thank you!
Your suggestions seem good!
3) Message boards : Development : Disk space problem (Message 247)
Posted 4 Jan 2014 by marco giglio
OK, I ran some experiments on replacing the CSV input files with binary ones, and here are the results.

Files used for testing:
separate_At_AT114 ~ 70 MB
separate_At2_AT106 ~ 71 MB
At2_all_obs_script.csv ~ 242 MB

Size of the files when compressed using bzip2:
separate_At_AT114.bz2 ~ 24 MB
separate_At2_AT106.bz2 ~ 25 MB
At2_all_obs_script.csv.bz2 ~ 85 MB

I wrote a program which parses the CSV files and writes binary files in which the data are stored as floating-point values.
Each value in the CSV file occupies 8 bytes on average, while each float takes 4 bytes, so the binary files are about half the size of the CSVs:
separate_At_AT114.bin ~ 35 MB
separate_At2_AT106.bin ~ 36 MB
At2_all_obs_script.csv.bin ~ 121 MB
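A converter along these lines could look like the minimal Python sketch below. It is not the program used for the numbers above: it assumes a CSV with no header row and a fixed number of numeric columns, and stores the values as little-endian 4-byte floats; the function names are hypothetical. Note that 32-bit floats keep only about 7 significant digits, so more precise values would be rounded.

```python
import array
import csv
import io
import sys


def csv_to_float32_bytes(text: str) -> bytes:
    """Pack every numeric field of a headerless CSV as a little-endian
    32-bit float, row after row (row boundaries are not stored)."""
    values = array.array("f")  # C float: 4 bytes on common platforms
    for row in csv.reader(io.StringIO(text)):
        values.extend(float(field) for field in row if field.strip())
    if sys.byteorder == "big":
        values.byteswap()  # always write little-endian
    return values.tobytes()


def float32_bytes_to_rows(blob: bytes, n_cols: int) -> list[list[float]]:
    """Inverse of csv_to_float32_bytes; the column count must be known."""
    values = array.array("f")
    values.frombytes(blob)
    if sys.byteorder == "big":
        values.byteswap()
    return [list(values[i:i + n_cols]) for i in range(0, len(values), n_cols)]


blob = csv_to_float32_bytes("0.5,1.25,2.0\n3.5,4.0,5.75\n")
print(len(blob))                       # 24 (6 values x 4 bytes)
print(float32_bytes_to_rows(blob, 3))  # [[0.5, 1.25, 2.0], [3.5, 4.0, 5.75]]
```

Storing the column count in a small header would make each file self-describing; it is left out here to keep the sketch short.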

Good! Then I compressed these binary files with bzip2, and here is what I obtained:
separate_At_AT114.bin.bz2 ~ 23 MB
separate_At2_AT106.bin.bz2 ~ 24 MB
At2_all_obs_script.csv.bin.bz2 ~ 83 MB

If you compare the compressed binaries with the compressed CSVs, you can see there is only a small difference, so I don't think it is worth moving to binary input.

Another possibility is to exploit the correlation in the data.
The current CSV is written as
data1,data2,data3,data4
I don't know how correlated these data are, but they seem to be close to one another, so we could try writing the new input CSV as follows:
data1,data1-data2,data2-data3,data3-data4...
Doing so, we maximize the number of 0 characters, hence increasing the compression factor.
This is just an idea, and honestly I'm not very confident about it either, but it is a possibility...
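The row transformation proposed above can be sketched in Python as follows. It follows the data1,data1-data2,... layout literally and uses `decimal.Decimal`, so the differences are exact and the original values can be rebuilt without floating-point rounding. Whether the extra zeros actually improve the bzip2 ratio on the real input files would still have to be measured; the function names are hypothetical.

```python
from decimal import Decimal


def delta_encode_row(fields: list[str]) -> list[str]:
    """Keep the first value, then write each value as (previous - current)."""
    if not fields:
        return []
    values = [Decimal(f) for f in fields]
    encoded = [str(values[0])]
    for prev, cur in zip(values, values[1:]):
        encoded.append(str(prev - cur))
    return encoded


def delta_decode_row(fields: list[str]) -> list[str]:
    """Rebuild the original row: current = previous - delta."""
    if not fields:
        return []
    values = [Decimal(fields[0])]
    for delta in fields[1:]:
        values.append(values[-1] - Decimal(delta))
    return [str(v) for v in values]


def delta_encode_csv(text: str) -> str:
    """Apply delta_encode_row to every non-empty line of a headerless CSV."""
    rows = (line.split(",") for line in text.splitlines() if line.strip())
    return "".join(",".join(delta_encode_row(r)) + "\n" for r in rows)


row = ["10.5", "10.5", "10.25"]
print(delta_encode_row(row))                    # ['10.5', '0.0', '0.25']
print(delta_decode_row(delta_encode_row(row)))  # ['10.5', '10.5', '10.25']
```

Decoded values can differ from the input only in trailing zeros (for example 1.5 may come back as 1.50), which does not change the number itself.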
4) Message boards : Development : Disk space problem (Message 245)
Posted 4 Jan 2014 by marco giglio
As written some days ago in another post, I made some attempts at compressing the input files. With the current inputs, the best compression is of course achieved by lzma, but it takes far too much time to compress one file. bzip2 achieves almost the same compression ratio in less time (at the cost of a large amount of memory, if I'm not wrong). The compressed files are ~38% of the size of the original ones.
A modification to the application will be needed in order to decompress files on Windows (on Linux, bzip2 should already be available on all machines, and we can state the dependency on the website).
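A quick way to repeat the bzip2/lzma comparison, and to see what client-side decompression involves, is Python's standard `bz2` and `lzma` modules. This is only a measuring sketch: the compression levels are assumptions, timings depend on the machine, and the real application would need the equivalent libbzip2 calls in its own language rather than this code.

```python
import bz2
import lzma
import shutil
import time


def compare_compressors(data: bytes) -> dict[str, tuple[float, float]]:
    """Return {codec: (compressed size / original size, seconds)}."""
    if not data:
        raise ValueError("need non-empty input to compute a ratio")
    codecs = {
        "bzip2": lambda d: bz2.compress(d, compresslevel=9),
        "lzma": lambda d: lzma.compress(d, preset=9),
    }
    results = {}
    for name, compress in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        results[name] = (len(packed) / len(data), elapsed)
    return results


def decompress_bz2_file(src: str, dst: str) -> None:
    """Stream-decompress src (a .bz2 file) into dst without loading it all in memory."""
    with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, 1 << 20)  # copy in 1 MiB chunks
```

For a real input file, one could pass `pathlib.Path("separate_At_AT114").read_bytes()` to `compare_compressors` and compare the returned ratios and times.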

Another improvement I discussed with Daniele is the possibility of writing inputs in binary instead of CSV. That should decrease the size of the files; however, I also expect the compression factor of a binary file to be lower than the one achieved on the CSV file, so we should run some experiments on that.

The last improvement concerns stickiness. If I'm not wrong, right now we extract 4 to 6 WUs from a 70MB input file, each lasting ~2 hrs. We could increase the duration of each WU so that the input file is a dependency of a single WU (or 2 WUs if we prefer a shorter computation time). Doing so, we would no longer need sticky files.
However, it depends on the size of the input files. If the input files are ~200MB (boboviz and I computed a WU coming from such a big file), things are different, since both the computation time and the number of dependent WUs grow.
5) Message boards : Cafe : GPU issue (Message 237)
Posted 3 Jan 2014 by marco giglio
I tried both with the client started automatically by init and with it started manually as root (I never configured sudo on my PC).

By the way, I've lost interest in the matter. I tried GPU computation on Windows, but it takes a huge amount of time (more than 3 days) and the fan gets very annoying.
6) Message boards : Number crunching : Disk space (Message 210)
Posted 27 Dec 2013 by marco giglio
In the future we'll address the problem for sure!
However, here's some data.
The disk space is mostly needed to store the sticky files I mentioned before. I took a look in my BOINC folder: I have many of them, and many are probably obsolete.
I took 3 of them, whose sizes were 241, 70 and 68 MB, and using different compression algorithms I got compressed sizes of around 38% of the originals, which means that once we use compression you'll save almost 2/3 of the space.
From 4.3GB of used space we'll go to about 1.7GB.
It is still a lot, but it is something.
Moreover, we'll need to talk to the professor and the preprocessing group in order to understand the size of these files and to determine the WU length.
We may choose to have a single WU for each file, which is the worst choice in terms of bandwidth consumption but probably the best in terms of disk usage, since the file would no longer be sticky and would be removed at the end of the WU's computation.

EDIT: I ran some more tests, and it seems the best choice would be bzip2.
Its compression factor is almost as high as lzma's, but the compression time is much lower.
7) Message boards : Number crunching : Disk space (Message 207)
Posted 27 Dec 2013 by marco giglio
Also, some of the files are marked as sticky because in theory they are needed by more than one WU. The problem is that BOINC does not have a scheduler that is smart about sticky files.
The best behavior would be to send new WUs so as to minimize the number of files to be downloaded, i.e. WUs you can compute using the files you've already downloaded; this mechanism is not provided so far. This implies that you are probably storing many sticky files which you used just once or twice, but your client keeps them because they are sticky.
Right now we are addressing other, more critical issues, but I think we should consider whether marking those files as sticky is really necessary.
It depends on some other factors, such as the length we choose for the WUs and the variability of that length.
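The scheduling behavior described above (prefer WUs whose input files the host already holds) can be illustrated with a small Python sketch. This is not BOINC's scheduler code or API: `WorkUnit`, `pick_work_unit`, the file names and the sizes are all hypothetical, and a real scheduler would also have to weigh deadlines, host capabilities and so on.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class WorkUnit:
    name: str
    input_files: frozenset  # names of the input files this WU needs


def download_cost(wu: WorkUnit, cached: set, sizes: dict) -> int:
    """Bytes the host would still have to download to run this WU."""
    return sum(sizes[f] for f in wu.input_files - cached)


def pick_work_unit(candidates: Iterable[WorkUnit], cached: set,
                   sizes: dict) -> Optional[WorkUnit]:
    """Choose the WU that reuses the most already-downloaded data."""
    return min(candidates, key=lambda wu: download_cost(wu, cached, sizes),
               default=None)


# Hypothetical files and sizes, for illustration only.
sizes = {"obs_A.csv": 70_000_000, "obs_B.csv": 242_000_000, "lgn_1.xml": 10_000}
queue = [
    WorkUnit("wu_A_0", frozenset({"obs_A.csv", "lgn_1.xml"})),
    WorkUnit("wu_B_0", frozenset({"obs_B.csv", "lgn_1.xml"})),
]
print(pick_work_unit(queue, {"obs_B.csv"}, sizes).name)  # wu_B_0
```

With `obs_B.csv` already on the host, the sketch picks `wu_B_0`, since only the small LGN file is missing, even though `obs_B.csv` is the larger observation file.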
8) Message boards : Number crunching : Disk space (Message 205)
Posted 27 Dec 2013 by marco giglio
I think you can clean the project folder; there should be no problem.
Given the number of WUs you have processed, I think that amount is normal, but we'll investigate this matter in order to reduce the average disk space used.
Thank you for your cooperation
9) Message boards : Number crunching : No wus (Message 190)
Posted 26 Dec 2013 by marco giglio
Good! So we can conclude that it was due to your very high speed in computing them and your large number of requests to the scheduler.
10) Message boards : Development : validation issues in 0.02 (Message 188)
Posted 26 Dec 2013 by marco giglio
As Paolo stated, you should contact (and actually should already have contacted) the professors and the post-processing group in order to decide:
- how large a difference between two files can be while still considering both correct
- whether we should keep all the lines from both files (the union) or only the intersection of the two
Please let us know your progress.
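Both decisions map directly onto parameters of a tolerant validator. The Python sketch below is only an illustration: it assumes a hypothetical result format in which each line is `key,value,value,...`, and the default tolerances are placeholders until the professors and the post-processing group fix the real ones.

```python
import math


def parse_result(text: str) -> dict[str, list[float]]:
    """Map the first field of each line to the numeric fields after it."""
    rows = {}
    for line in text.splitlines():  # also drops \r from Windows line endings
        fields = [f.strip() for f in line.split(",")]
        if len(fields) >= 2:
            rows[fields[0]] = [float(x) for x in fields[1:]]
    return rows


def results_agree(a: str, b: str, rel_tol: float = 1e-5, abs_tol: float = 1e-9,
                  intersection_only: bool = False) -> bool:
    """True if every line present in both results matches within tolerance.

    With intersection_only=False, both results must also contain the same keys.
    """
    ra, rb = parse_result(a), parse_result(b)
    if not intersection_only and ra.keys() != rb.keys():
        return False
    common = ra.keys() & rb.keys()
    if not common:
        return False  # nothing comparable
    for key in common:
        xs, ys = ra[key], rb[key]
        if len(xs) != len(ys):
            return False
        if not all(math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
                   for x, y in zip(xs, ys)):
            return False
    return True
```

Here `rel_tol` and `abs_tol` answer the first question, and `intersection_only` answers the second.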
11) Message boards : News : Application version v0,02 (Message 174)
Posted 24 Dec 2013 by marco giglio
I'm running a 10h WU too!
The work generator only divides the work according to the number of PC rows; it does not consider the number of columns in each row, hence the complexity of a task can vary a bit.
The estimation algorithm, instead, also takes the number of columns into account.

In the future we should discuss with the pre-processing group in order to understand whether we can rely on input files of (roughly) fixed size or not. If the input files vary widely in size, it will probably be necessary to modify the work generator in order to have WUs of around 3-4 hrs.
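One way the work generator could account for columns as well as rows is to split on an estimated cost instead of a fixed row count. The Python sketch below assumes, purely for illustration, that the cost of a row grows linearly with its number of columns; `target_cost` would have to be calibrated so that one chunk takes roughly 3 to 4 hours, and the function name is hypothetical.

```python
def split_rows(column_counts: list[int], target_cost: int) -> list[list[int]]:
    """Greedily group consecutive row indices so that each chunk's total
    column count does not exceed target_cost (a single row that is larger
    than target_cost gets a chunk of its own)."""
    chunks: list[list[int]] = []
    current: list[int] = []
    cost = 0
    for row, cols in enumerate(column_counts):
        if current and cost + cols > target_cost:
            chunks.append(current)
            current, cost = [], 0
        current.append(row)
        cost += cols
    if current:
        chunks.append(current)
    return chunks


print(split_rows([5, 5, 5, 20, 1, 1], target_cost=10))
# [[0, 1], [2], [3], [4, 5]]
```

Rows with many columns end up in smaller chunks, so WU durations vary less than with a plain row count.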

Merry Christmas, everyone!
12) Message boards : News : We are (almost) ready to start a test phase (stage 1) (Message 173)
Posted 24 Dec 2013 by marco giglio
We are aware of this issue and we'll address it in the future.
Right now we are focusing on fixing more important stuff.
13) Message boards : Development : validation issues in 0.02 (Message 172)
Posted 24 Dec 2013 by marco giglio
Hi,

The current validator is checking files by md5. That's why, when you receive two files with different checksums, they are marked as different. I'm improving it by removing \r (carriage return) before comparing them. Thanks

Wait a sec: the md5sum validator is the one provided by boinc or is it something you have written?
Please keep in mind that there still seem to be some slight differences between the WUs computed on Linux and those computed on Windows, so we should use something like the method discussed in one of the meetings with the professors. An md5sum check, instead, would keep failing.
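The limitation is easy to see with a hash of the normalized file. The Python sketch below (the helper name is hypothetical) turns Windows line endings into Unix ones before computing the MD5, as the quoted message proposes: line endings alone no longer matter, but a difference in the seventh decimal digit still produces a different hash, which is why a tolerance-based comparison is needed.

```python
import hashlib


def normalized_md5(content: bytes) -> str:
    """MD5 of the file contents after turning CRLF line endings into LF."""
    return hashlib.md5(content.replace(b"\r\n", b"\n")).hexdigest()


linux = b"g1,0.1234560\n"
windows = b"g1,0.1234560\r\n"
windows_rounded = b"g1,0.1234561\r\n"
print(normalized_md5(linux) == normalized_md5(windows))          # True
print(normalized_md5(linux) == normalized_md5(windows_rounded))  # False
```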
14) Message boards : Web site : Problems... a lot of them! (Message 168)
Posted 24 Dec 2013 by marco giglio
Ah, OK, you're right. I didn't notice that. To be precise, no page is correctly aligned: some are shifted to the left, others to the right, but IMHO the alignment is never correct.

I also find the behavior of the menu on the left of the different pages under Description interesting.
The items move left and right, the arrow is misplaced...

There's also another issue related to the forum. I don't know why, but if you set your language to Italian, instead of the button "Invia" (send) you have to click on the button "Nessuna ripsosta" ("No answer"; the typo is really on the button).

I don't know whether it is a BOINC mistake (I don't think so) or whether someone messed something up (more probable), but could someone fix it?
15) Message boards : Web site : Problems... a lot of them! (Message 161)
Posted 24 Dec 2013 by marco giglio
They are fixed in mine
As for the images in the header:
I'm glad you guys have reduced their size in KB, but why don't you reduce their resolution instead?
Some images are wallpaper-sized!

The same goes for the images on the "People" page.
Some pictures are several MB in size, at high resolution. Please reduce their resolution, or mobile users will kill us all.

The statistics page is way too verbose and in some paragraphs the text does not fit the title
16) Message boards : Number crunching : Tasks won't suspend (Message 155)
Posted 23 Dec 2013 by marco giglio
Wow, it seems that app 0.02 had the side effect of solving another issue!
Up to version 0.01, our app took 100% of the processor even if the user setting was different. It seems to me that the new version solves this issue: my processor is running at 60%, as desired.
17) Message boards : Science : What are we currently testing? (Message 143)
Posted 21 Dec 2013 by marco giglio
As stated on the website, the purpose of this project is to expand Gene Regulatory Networks, hence the first part of the name is self-explanatory. In particular, at the moment we are working on Arabidopsis thaliana, a plant used as a model organism in many projects.
The PC-IM exploits our knowledge of a certain network (Local Gene Network) and a file which contains some data obtained by experiments (Observation File). In order to perform further processing and analysis we need to link each workunit to the LGN and observations from which it derives.
For this reason, each pair of input files is identified by an XML file stored on the server; 1386239275.xml is the name of this file.
Moreover, from a single XML file we obtain more than one WU, hence we need a progressive number (the last part of the WU's name) to distinguish among the different WUs related to a single input file.

I hope it is clear enough (but maybe it is not)
18) Message boards : Cafe : GPU issue (Message 140)
Posted 21 Dec 2013 by marco giglio
Do any of you run BOINC projects on an NVIDIA GPU using Optimus on Linux?
I'm having some issues doing that, since BOINC does not recognize my GPU. I posted on the BOINC forum, but nobody seems to have an answer. I thought maybe someone here could help...

Link to the discussion on the boinc forum:
http://boinc.berkeley.edu/dev/forum_thread.php?id=8793&postid=51788#51788
19) Message boards : Number crunching : Length of WU (Message 139)
Posted 21 Dec 2013 by marco giglio
I would say they take the same amount of time on my i3 @1.8 GHz
About the length: of course it should be increased in the final stage of the project (I would say to around 2 hours), but for now I think such short WUs are useful, since they reduce the time needed for all the tests of the validator, the credit system and so on.
20) Message boards : Web site : Comments about the web interface (and logos) (Message 46)
Posted 13 Dec 2013 by marco giglio
The CSS must be refined on the Scientific Description page (the elements on the left are misaligned).






Copyright © 2024 CNR-TN & UniTN