Author |
Message |
smoe (Volunteer developer)
Joined: 15 Sep 15 Posts: 4 Credit: 20,804,734 RAC: 0
|
Hello,
Some of you already know that there is an OpenWrt port of BOINC (both in current stable and in 21.02rc1), so you can run TN-GRID on your router. This needs a USB stick to accommodate the large files, but otherwise it works.
We tried to substitute the USB stick with an NFS directory. This also works. On the local network this is just fine, I tend to think.
But we had this tunneled through the internet and were surprised by the bandwidth it took: 5 GByte per hour, for 20 hours. Just reads, (mostly) no writes. In comparison, WorldCommunityGrid has basically no I/O whatsoever. I presume it is the repeated reading of the input data that causes this. And I agree that if one does not buffer multiple lines and has no index of where to fseek, then this is what one needs to do. No, I have not looked up in detail how it is implemented, admittedly, but TN-GRID would benefit a lot if the I/O were reduced, since you do not want to wait for your device to respond before the next correlation can run.
The reason I have not looked into the details is that I thought TN-GRID could just read the whole input data in. After all, a gig of mem is not that exceptional any more; most desktops have 16GB. And I could imagine that the scientific apps agree on some shared memory in /dev/shm for that, so the total amount of RAM needed, if I got the concept right, does not increase with additional WUs run in parallel. This would then no longer fit on a single router, but, hey, maybe the user can decide which app to run?
Just a thought. Would there be many on this list who would (worst case until we can share the mem) allow 1GB of mem to be used per WU? I would then be prepared to look into that.
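To make the shared-memory idea a bit more concrete, here is a minimal sketch, not anything that exists in TN-GRID today. It assumes one process has already converted the expression data to raw floats and written them to a file (a path under /dev/shm would be one option, that is an assumption), and that every WU then maps that file read-only. The names `writeFloats`, `mapFloats`, `unmapFloats` and `MappedFloats` are made up for illustration. With a read-only `MAP_SHARED` mapping of the same file, the processes share the same physical pages instead of each holding a private copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical handle for a read-only mapping of raw floats.
struct MappedFloats {
    const float* data = nullptr;
    std::size_t count = 0;   // number of floats
    std::size_t bytes = 0;   // mapped length, needed for munmap

    float at(std::size_t i) const {
        assert(data != nullptr && i < count);
        return data[i];
    }
};

// One-time producer: dump already-converted floats as raw binary.
bool writeFloats(const char* path, const float* v, std::size_t n) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    const bool ok = std::fwrite(v, sizeof(float), n, f) == n;
    return std::fclose(f) == 0 && ok;
}

// Consumer: map the file read-only. Returns an empty handle on failure.
MappedFloats mapFloats(const char* path) {
    MappedFloats m;
    const int fd = open(path, O_RDONLY);
    if (fd < 0) return m;
    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        const std::size_t len = static_cast<std::size_t>(st.st_size);
        void* p = mmap(nullptr, len, PROT_READ, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            m.data = static_cast<const float*>(p);
            m.bytes = len;
            m.count = len / sizeof(float);
        }
    }
    close(fd);  // the mapping stays valid after the descriptor is closed
    return m;
}

// Release the mapping and reset the handle.
void unmapFloats(MappedFloats& m) {
    if (m.data) munmap(const_cast<float*>(m.data), m.bytes);
    m = MappedFloats{};
}
```

Each WU would call `mapFloats` once at startup and index into `data` afterwards. The raw float layout is only valid between processes on the same machine, which is all that is needed here.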
Best,
Steffen |
|
|
valterc (Project administrator, Project tester)
Joined: 30 Oct 13 Posts: 629 Credit: 34,725,842 RAC: 717
|
Hi Steffen
The application reads two input files: the one with the list of tiles (600 tiles for the FANTOM dataset) and the expression dataset. Both of them should already be read at the very beginning, converted to a suitable format and kept in memory until the end (or a restart from the checkpoint). I neither wrote nor checked the logic inside the application, so I don't know if there is something wrong with it.
Someone needs to look at the source code and do some I/O tracing (unfortunately I'm very busy right now...) |
|
|
|
Is the source code available? Maybe on GitHub? |
|
|
|
Yes, it is available on Bitbucket:
https://bitbucket.org/francesco-asnicar/pc-boinc/src/master/ |
|
|
smoe (Volunteer developer)
Joined: 15 Sep 15 Posts: 4 Credit: 20,804,734 RAC: 0
|
I already had a look but did not immediately see it. |
|
|
smoe (Volunteer developer)
Joined: 15 Sep 15 Posts: 4 Credit: 20,804,734 RAC: 0
|
The loop in appMain has
// extract the subgraph from the complete genes file
readCGN(tiles[c], experiments, experimentsDim, g, hibridizationDim);
and that file is then getline()ed. Is this the big file? I presume that the internal representation of that file's content is much smaller than what is on the disk, so that not only the "fat memory" machines that can afford it would benefit from an optimisation. And, actually, hm, maybe there is not even much extra memory required over what is currently implemented, since you would store this as float arrays, not as strings.
I could actually well imagine that the relative boost on the older machines is higher, since they are weaker on I/O and have smaller buffers. There is also an atof performed on everything again and again, which would then be spared; I guess modern processors also cope better with that than the old ones. I mean, we are talking about reading through 5 GByte of string data per hour (finding the separators in there means every character is read by the CPU), and some fraction of these is then converted to floats.
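Roughly what I have in mind, as a minimal sketch and not the actual pc-boinc code: it assumes a whitespace-separated text matrix with one gene per line, and the names `ExpressionMatrix`, `parseExpressions`, `parseExpressionsFromString` and `extractRows` are made up for illustration. The text is read and converted with `strtof` a single time; every later tile lookup is served from a contiguous float array without touching the file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of the expression data: row-major floats.
struct ExpressionMatrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<float> values;  // rows * cols entries

    const float* row(std::size_t r) const {
        assert(r < rows);
        return values.data() + r * cols;
    }
};

// Read the text once; each field is converted to float exactly once.
// Assumes numeric, whitespace-separated fields, same count on every line.
ExpressionMatrix parseExpressions(std::istream& in) {
    ExpressionMatrix m;
    std::string line;
    while (std::getline(in, line)) {
        const char* p = line.c_str();
        char* end = nullptr;
        std::size_t n = 0;
        // strtof leaves end == p when no further number can be parsed.
        for (float v = std::strtof(p, &end); end != p; v = std::strtof(p, &end)) {
            m.values.push_back(v);
            p = end;
            ++n;
        }
        if (n == 0) continue;         // skip blank lines
        if (m.cols == 0) m.cols = n;  // first data line fixes the width
        assert(n == m.cols);
        ++m.rows;
    }
    return m;
}

// Convenience wrapper, handy for small tests.
ExpressionMatrix parseExpressionsFromString(const std::string& text) {
    std::istringstream in(text);
    return parseExpressions(in);
}

// Gather the rows belonging to one tile from memory; no file access involved.
std::vector<const float*> extractRows(const ExpressionMatrix& m,
                                      const std::vector<std::size_t>& geneIds) {
    std::vector<const float*> out;
    out.reserve(geneIds.size());
    for (std::size_t id : geneIds) out.push_back(m.row(id));
    return out;
}
```

The per-tile call would then take row indices instead of re-scanning the file. How the real file maps gene names to rows I have not checked, so that part is left out.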
The router with the slower USB stick has 10% idle time (divergence between CPU and wall clock time). I do not want to implement any change on this myself, but I promise some nice beers to be sent south if you could somehow address this.
Best,
Steffen |
|
|
smoe (Volunteer developer)
Joined: 15 Sep 15 Posts: 4 Credit: 20,804,734 RAC: 0
|
Update: I have implemented a version that fully buffers the FANTOM data. That takes 1.2GB of memory though, and it is only a minute faster after 300 minutes of run time. My hunch is that the machines with loads of memory do not benefit from it so much, since they can afford to keep the file in the cache. |
|
|