Message boards : Number crunching : Optimization
Author | Message |
---|---|
Sounds interesting. PrimeGrid already does something like this: they have two kinds of bonuses, one for apps that take a long time to complete (a few days) and a conjecture bonus. Some tasks have both of these bonuses. | |
ID: 635 · Reply Quote | |
BTW, I am working on an optimized app for TN-Grid. After applying various code optimizations and adding AVX support, it runs about 30% faster on my Sandy Bridge Xeons, and so far all WUs have passed validation. I think I will be able to release its code and compiled binaries to everyone before Christmas; I am still working on it. VERY interesting. The problem is: are the results of the optimized version "good" for the admin team? If Valterc or others say that it's OK, we will be ready to crunch!! :-) | |
ID: 636 · Reply Quote | |
Sounds interesting. PrimeGrid already does something like this: they have two kinds of bonuses, one for apps that take a long time to complete (a few days) and a conjecture bonus. Some tasks have both of these bonuses. Yes, I know; the problem is that they don't use CreditNew.
Are you doing the work on the Windows platform? (We know that the binary we have for Windows, built with Visual C, is way slower than the Linux one.) Anyway, we would be very happy to have a faster version. What I suggest is to keep the new application inside the "anonymous platform" framework for a while. If it validates against the standard version, everything should be fine. Thanks for your collaboration! [edit] If working on the Linux version, please take care of the kernel requirements, i.e. use a 3.0 kernel to make the build. | |
ID: 637 · Reply Quote | |
I am developing it on Windows under Cygwin, but the final version was for Linux. I plan to compile the Windows version as a standalone app using MinGW and run it via the BOINC wrapper. The code version without SSE/AVX support should compile under MSVC, but the SSE/AVX one most probably will not: MSVC probably does not support the gcc vector extensions which I use for SSE/AVX. I already have some classes which provide a similar API with the help of Intel intrinsics, but this part will need more work. I have eliminated the two top bottlenecks in the code: testAndRemove is reduced to isnan and a simple range check plus the existing code for edge removal; BoincFile::getLine works on big blocks of data, so file loading time is reduced from 10+ secs per tile to less than 0.2 sec. I wonder if the latter was even slower on Windows. I also read that on Windows the atof function is very slow, so you may need to find a faster replacement. I also wonder if MSVC is able to optimize pow(x,2) to x*x like gcc does. We will see how my version performs when you compile it using MSVC. ____________ | |
ID: 638 · Reply Quote | |
Just some naive questions (I'm not really an expert in cross-compiling): Would it be possible to use MinGW-w64 to compile native Windows binaries on Linux (avoiding MSVC altogether)? Why drop the BOINC API and use the application with a wrapper? | |
ID: 639 · Reply Quote | |
[edit] If working on the Linux version, please take care of the kernel requirements, i.e. use a 3.0 kernel to make the build. Why this particular version? I use CentOS 7 with kernel 3.10, is it OK too? Just some naive questions (I'm not really an expert in cross-compiling): Would it be possible to use MinGW-w64 to compile native Windows binaries on Linux (avoiding MSVC altogether)? Yes, assuming that you have such a cross-compiler and were able to build the BOINC libs for the MinGW target (built using the cross-compiler or natively on Windows). Why drop the BOINC API and use the application with a wrapper? Compilation of the BOINC libraries with the MinGW compiler under Cygwin is broken. I tried to fix it, but this was taking too much time, so I decided to give up and stub the BOINC functions used by the TN-Grid app to get a standalone app. Maybe it would be possible to compile BOINC from the MinGW shell; I did not try to do it. ____________ | |
ID: 640 · Reply Quote | |
Why this particular version? I use CentOS 7 with kernel 3.10, is it OK too? I guess it is. There are only a few users with pre-3.1 kernels (sometimes this may cause problems because of the old gcc shared libraries). We built the binary this way, using shared libraries; making it static will probably solve all the possible problems. | |
ID: 641 · Reply Quote | |
Just some naive questions (I'm not really an expert in cross-compiling): Would it be possible to use MinGW-w64 to compile native Windows binaries on Linux (avoiding MSVC altogether)? I have just spotted that I misunderstood your question and answered it incorrectly. Both MinGW and Cygwin run on Windows, so there is no Linux->Windows cross-compilation. Cygwin is a full-blown POSIX environment for Windows. Apps built for it require a special library which provides the necessary POSIX emulation layer. MinGW is "Minimalist GNU for Windows"; it allows building native Windows apps in a Unix-like development environment. ____________ | |
ID: 643 · Reply Quote | |
Just some naive questions (I'm not really an expert in cross-compiling): Would it be possible to use MinGW-w64 to compile native Windows binaries on Linux (avoiding MSVC altogether)? Actually I was thinking about the opposite direction, i.e. doing all the testing on Linux and then making the Windows exe there (see http://www.mingw.org/wiki/linuxcrossmingw). I was able to make a 32/64-bit Win exe (hello world) using it, but I didn't try to cross-compile the BOINC API. | |
ID: 646 · Reply Quote | |
Just some naive questions (I'm not really an expert in cross-compiling): Would it be possible to use MinGW-w64 to compile native Windows binaries on Linux (avoiding MSVC altogether)? It is worth trying. Yesterday I found a page on how to compile BOINC for Windows, https://boinc.berkeley.edu/trac/wiki/CompileAppWin, and it says that there is a special Makefile created for MinGW. So it looks like the BOINC team decided in the past to do it this way instead of fixing the autoconf scripts. No wonder that my attempts to compile it failed: I tried to use the configure script and the Makefiles generated by it. I did not try this special Makefile yet; I hope it will work as expected. I found something that may be a problem. You use an undirected graph, so I thought that I could reduce the number of iterations of the loop at pc.cpp:418 by testing (i,j) pairs for j > i only. However, after doing this the output file size changed from 47.8K to 67.6K. The original code before my changes also generated a bigger file after applying this change. I checked the code briefly and do not see anything obvious that may cause this. Could you take a look at this? I think that the current app version could be converted to a GPU version quite easily. It seems that for a given l value the calculations for each (i,j) pair are independent of each other; only the final graph edge removal would need special attention. I will try to create a prototype after I finish this AVX app. BTW, this topic was created for discussing a different thing. Could you move the posts related to my app to a new one, or rename this one and create a new one for your original question? ____________ | |
ID: 648 · Reply Quote | |
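To make the loop change discussed in the post above concrete, here is a small sketch of the two enumeration orders. It does not reproduce the loop at pc.cpp:418; the function names `orderedPairs` and `unorderedPairs` are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical illustration only; this is NOT the loop at pc.cpp:418.
using Pair = std::pair<std::size_t, std::size_t>;

// Every ordered pair (i, j) with i != j: n * (n - 1) pairs.
std::vector<Pair> orderedPairs(std::size_t n) {
    std::vector<Pair> out;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (i != j) {
                out.emplace_back(i, j);
            }
        }
    }
    return out;
}

// Only pairs with j > i (upper triangle): n * (n - 1) / 2 pairs.
std::vector<Pair> unorderedPairs(std::size_t n) {
    std::vector<Pair> out;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            out.emplace_back(i, j);
        }
    }
    return out;
}
```

The two loops give the same final graph only if the per-pair test is symmetric in i and j and the loop body has no side effects that depend on visiting order. This sketch does not explain the size difference reported above; it only shows which iterations the j > i change drops.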
I found something that may be a problem. You use an undirected graph, so I thought that I could reduce the number of iterations of the loop at pc.cpp:418 by testing (i,j) pairs for j > i only. However, after doing this the output file size changed from 47.8K to 67.6K. The original code before my changes also generated a bigger file after applying this change. I checked the code briefly and do not see anything obvious that may cause this. Could you take a look at this? I'll try to look at this; I will also contact the original authors of the code (maybe they have some clues). I think that the current app version could be converted to a GPU version quite easily. It seems that for a given l value the calculations for each (i,j) pair are independent of each other; only the final graph edge removal would need special attention. I will try to create a prototype after I finish this AVX app. Yes, the edge removal is the problem here. We also have another, slightly different version of the algorithm with the removal done only after each major iteration (I'll talk about this with the authors). BTW, this topic was created for discussing a different thing. Could you move the posts related to my app to a new one, or rename this one and create a new one for your original question? Done. BTW, thank you again for your efforts. | |
ID: 650 · Reply Quote | |
I think that the current app version could be converted to a GPU version quite easily. It seems that for a given l value the calculations for each (i,j) pair are independent of each other; only the final graph edge removal would need special attention. I will try to create a prototype after I finish this AVX app. This is also one of the approaches which I considered. I also thought about introducing extra synchronization for edge removal (probably with the double-checked locking pattern), but this may have a negative effect on performance. ____________ | |
ID: 651 · Reply Quote | |
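For readers unfamiliar with the double-checked locking pattern mentioned in the post above, here is a minimal sketch of how it might be applied to edge removal. This is an assumption-laden illustration, not code from the TN-Grid app: the `EdgeSet` class, its dense n-by-n flag layout and its method names are all hypothetical.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical sketch (NOT the TN-Grid data structure): a dense, symmetric
// edge-presence matrix with double-checked locking around removal.
class EdgeSet {
public:
    explicit EdgeSet(std::size_t n) : n_(n), present_(n * n), removed_(0) {
        for (auto& e : present_) {
            e.store(true, std::memory_order_relaxed);  // start fully connected
        }
    }

    bool hasEdge(std::size_t i, std::size_t j) const {
        return present_[i * n_ + j].load(std::memory_order_acquire);
    }

    // Removes the undirected edge {i, j}. Returns true only for the caller
    // that actually performed the removal.
    bool removeEdge(std::size_t i, std::size_t j) {
        std::atomic<bool>& e = present_[i * n_ + j];
        // First check, without the lock: the cheap path when already removed.
        if (!e.load(std::memory_order_acquire)) {
            return false;
        }
        std::lock_guard<std::mutex> lock(mutex_);
        // Second check, under the lock: another thread may have won the race.
        if (!e.load(std::memory_order_relaxed)) {
            return false;
        }
        e.store(false, std::memory_order_release);
        present_[j * n_ + i].store(false, std::memory_order_release);
        ++removed_;  // bookkeeping protected by mutex_
        return true;
    }

    std::size_t removedCount() {
        std::lock_guard<std::mutex> lock(mutex_);
        return removed_;
    }

private:
    std::size_t n_;
    std::vector<std::atomic<bool>> present_;
    std::mutex mutex_;
    std::size_t removed_;
};
```

The unlocked first check keeps the common "already removed" case cheap, while the lock serializes the actual removal and its bookkeeping. As the post notes, the locked path may still hurt performance when many threads remove edges at the same time.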
I have forked your project on BitBucket and uploaded the modified files (link: https://bitbucket.org/sirzooro/pc-boinc). All my changes are so far on the branch sse_avx_optimizations; you can take a look at them. | |
ID: 653 · Reply Quote | |
It turned out that on my Sandy Bridge CPUs the SSE version was the fastest one (I suspect that unaligned loads kill the performance of the AVX version). It needs about 1 hour per WU (the original version needed about 2.5 hours). :-O | |
ID: 654 · Reply Quote | |
Amazing work! I just tried a simple benchmark (just two tiles, the output results were obviously the same) on a 4770k (Haswell) and the results are impressive:
time bin/pc 5560_Ec_ecm-b0624-crcB_wu-1.input.twotiles 5560_Ec_ecm-b0624-crcB_wu-1.output.twotiles 0.05 1 2470
real 1m59.675s
user 1m57.815s
sys 0m0.040s
time bin/TN-Grid.linux-x86-64-fma 5560_Ec_ecm-b0624-crcB_wu-1.input.twotiles 5560_Ec_ecm-b0624-crcB_wu-1.output.twotiles.fma 0.05 1 2470
real 1m2.131s
user 1m0.218s
sys 0m0.008s
Ok. Let's go one step further, please try (using the anonymous platform mechanism) the optimized binaries (just Linux x64 for now), so we may see if there is something wrong (like gcc/kernel dependencies), please be aware that:
1- Using AVX (FMA?) extensions will push the cpu to the limits (keep an eye on temperatures)
2- The provided app_info contains an explicit reference to the input data we use with the EC experiment (it won't work if we change organism) | |
ID: 655 · Reply Quote | |
The apps are linked statically with glibc and libstdc++, so the chance of incompatibilities should be limited. | |
ID: 656 · Reply Quote | |
I downloaded the optimized version | |
ID: 657 · Reply Quote | |
I downloaded the optimized version Please paste the error message here; without it I can only guess what may be wrong. ____________ | |
ID: 658 · Reply Quote | |
I downloaded the optimized version Looking at your host, I didn't find any workunits marked as anonymous platform. Did you copy all the files to the right place? Check app_info (not app_config). | |
ID: 659 · Reply Quote | |
sorry, it's my bad | |
ID: 660 · Reply Quote | |