Message boards : Number crunching : OUT of tasks
AMD Zen 2 cores are an equivalent of older Intel Skylake cores. (6th Gen up to 10th Gen Core i7 chips).
While the actual architectural improvements between the 6th and 8th Gen Intel cores are debated, there is a significant increase in computing performance (and in performance per watt) between each generation (except for the 9th and the 10th Gen cores), so it's inadequate to lump all the CPU generations from the 6th to the 10th together.
They [the AMD Zen 2 cores] are significantly slower than newer Intel 12th and 13th gen Core i7 chips.
While this is true, that's not the reason for their poor performance here (on TN-Grid). The real reason for their poor performance is a misunderstanding of Hyper-Threading (Simultaneous Multi-Threading) that leads to overwhelming the execution units of the CPU cores. If you are interested, here is the TLDR document: http://www.cslab.ece.ntua.gr/courses/advcomparch/2007/material/readings/Intel%20Hyper-Threading%20Technology.pdf
The key concepts of this technology are the same for Intel, AMD, or any other CPU manufacturer. The main point can be found on page 15, titled "Keys to Hyper-Threading Technology Performance", at the bottom of this page: "Understand Hyper-Threading Technology Processor Resources". That means: if you want your (single-threaded) science application to run as fast as it can, don't use more than 50% of your CPUs for this purpose.
The 12th and 13th gen Intel CPUs also have E-cores, which don't have the necessary resources to run TN-Grid (and similar scientific) applications, so the percentage of usable "CPUs" (threads in reality) on these CPUs is even lower (34% on i9-12xxx, 25% on i9-13xxx).
Look at these two i9-13900K's:
https://gene.disi.unitn.it/test/results.php?hostid=86237&offset=0&show_names=0&state=4&appid= Run time: 1h 37m, CPU time: 1h 8m, 1 error
https://gene.disi.unitn.it/test/results.php?hostid=86238&offset=0&show_names=0&state=4&appid= Run time: 1h 34m, CPU time: 1h 6m, many errors
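The arithmetic behind those percentages can be sketched as follows. The P-core/E-core counts are assumptions taken from public spec sheets rather than from this thread, and `usable_fraction` is a made-up helper name; the post quotes ~34% for the i9-12xxx, where 8/24 gives 33%.

```python
# Sketch of the "usable CPUs" percentages quoted above: run one
# single-threaded task per physical P-core, leaving the SMT siblings
# and the E-cores idle.  Core counts are assumed from public spec sheets.

def usable_fraction(p_cores: int, e_cores: int, smt: int = 2) -> float:
    """Fraction of OS-visible 'CPUs' worth loading with science tasks."""
    threads = p_cores * smt + e_cores  # Intel E-cores have no SMT
    return p_cores / threads

print(f"{usable_fraction(8, 0):.0%}")   # classic 8c/16t part -> 50%
print(f"{usable_fraction(8, 8):.0%}")   # i9-12900K-like, 8P+8E -> 33%
print(f"{usable_fraction(8, 16):.0%}")  # i9-13900K-like, 8P+16E -> 25%
```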
ID: 3227 · Reply Quote
Thank you for all the information, very interesting.
Yes, I agree the bottom host has many "errors". Although they are classed as errors, I wouldn't call them that, because at least the first 20 are "abandoned", which to me just means "aborted by user": in one example I checked, both tasks had been returned.
ID: 3228 · Reply Quote
Looking at the increase in projected run times, I would venture to guess that the WUs have been increased from 600 to 800 chunks, with the last one (#58) still having 400. The number of tasks in progress has been declining for several hours.
ID: 3232 · Reply Quote
Looking at the increase in projected run times, I would venture to guess that the WUs have been increased from 600 to 800 chunks, with the last one (#58) still having 400.
I confirm that. The awarded credits went up as well.
The number of tasks in progress has been declining for several hours.
This is the way I think we could give more time for the work generator to keep up with the pace of the crunchers. (As every function of the BOINC infrastructure runs on the same server, they take resources from each other. If we reduce the overhead of administering the workunits by decreasing their number, the host can spend more resources on generating new work and comparing results.)
ID: 3233 · Reply Quote
I dedicate 50% of my threads to BOINC (WCG/SiDock/Rosetta/TN-Grid) and 50% to Stockfish Fishtest chess.
To achieve 50% CPU usage in BOINC, do you use the "50% of the CPUs" preference? I also sent you a PM.
I don't know about Intel vs AMD, but it does prefer Linux. 10-15% faster IIRC.
Yes, I completely agree, a lot of BOINC-related applications like Linux. Contrary to this, I have a feeling that the current project "Vitis vinifera" appears to run in under 3 hours on Windows (times given were before they increased the workunit size). The previous project ran for over 3 hours most of the time.
ID: 3234 · Reply Quote
Let me answer:
To achieve 50% CPU usage in BOINC, do you use the "50% of the CPUs" preference?
Yes.
... I have a feeling that the current project "Vitis vinifera" appears to run in under 3 hours on Windows (times given were before they increased the workunit size). The previous project ran for over 3 hours most of the time.
My Windows host with an i7-9700F CPU @ 4.5 GHz can finish one longer "Vitis vinifera" workunit in 1h 52m (~6,700 s); the shorter ones took 1h 23m (~5,000 s). This CPU is not hyperthreaded (so I don't have to use the above setting), has 8 cores, and I run 7 TN-Grid tasks simultaneously.
ID: 3235 · Reply Quote
Available Tasks are still at Zero. What is being done to address this state of affairs?
ID: 3236 · Reply Quote
Available Tasks are still at Zero. What is being done to address this state of affairs?
The number of workunits in progress is slowly rising, so the server might be able to fill up every host with work using the present settings (if comparing the longer results won't take up too many resources). Give it a few days until all hosts return the "shorter" workunits and fill up their queues with the "longer" ones, and the longer results are uploaded and compared.
ID: 3237 · Reply Quote
With the longer WUs, I'm still getting "Computer has reached limit of tasks in progress" on some of my machines with a 1 day cache, same as with the smaller WUs. This means the number of WUs downloaded is the same as before the change. WUs would have to get bigger before the number downloaded doesn't hit that limit. 6 x the number of threads didn't even provide a half-day cache with the smaller WUs; now it provides maybe a little over a half day, but not one day.
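For a rough sanity check of that half-day figure, one can multiply the per-thread queue depth by a single task's run time. The ~5,000 s and ~6,700 s run times are the ones quoted earlier in this thread for one particular host, so treat this as an illustration rather than a general rule:

```python
# With the per-host limit of 6 tasks per thread and every thread busy,
# the cache lasts roughly 6 x (one task's run time), independent of the
# thread count.  Run times are assumed from an earlier post in this thread.

def cache_hours(tasks_per_thread: int, runtime_s: float) -> float:
    """Approximate hours of queued work per thread."""
    return tasks_per_thread * runtime_s / 3600

print(f"short WUs: {cache_hours(6, 5000):.1f} h")   # ~8.3 h, under half a day
print(f"long WUs:  {cache_hours(6, 6700):.1f} h")   # ~11.2 h, near half a day
```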
ID: 3238 · Reply Quote
The longer work units are taking slightly less time to complete than the Hs work units we had before.
ID: 3239 · Reply Quote
With the longer WUs, I'm still getting "Computer has reached limit of tasks in progress" on some of my machines with a 1 day cache.
TN-Grid limits the total number of workunits per host (regardless of its core count) to help maximize the total output of the project by spreading the work between as many hosts as possible. Hosts with large core counts finish work more frequently, thus they have a better chance to download work during a shortage.
Same as with the smaller WUs. This means the number of WUs downloaded is the same as before the change.
This means that your host can queue 33% more work than before the change.
WUs would have to get bigger before the number downloaded doesn't hit that limit. 6 x the number of threads didn't even provide a half-day cache with the smaller WUs; now it provides maybe a little over a half day, but not one day.
That's no problem (both for you and for the project) provided that your host *always* has *some* work in its queue. As every workunit is a piece in a chain of 58 workunits, workunits that are just sitting in a computer's queue hold back the completion of the entire chain. The more cores a host has, the more chains it can put on hold. The reason for limiting the total number of workunits is to limit the number of chains a host can put on hold, without reducing the host's throughput.
BTW, your hosts are a nice example of why *not* to crunch on virtual cores: your 13-year-old AMD Phenom II 1090T X6 CPU can finish a longer VV workunit in 13,100 seconds, while your 4-year-old AMD Ryzen 9 3900X CPU can finish the same workunit in 12,000 seconds. Of course, the latter can finish twice as many workunits in the same time as the older CPU (so its RAC is higher), but I guess it could do the same amount of work (or even a little more) if you limited the number of tasks to the number of cores (12). Depending on the extra cache misses the extra task per core inflicts, performance can be better when running a limited number of tasks simultaneously.
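That throughput comparison can be made concrete with a little arithmetic. The break-even run time below is derived from the quoted ~12,000 s figure; any actual speedup from dropping to one task per core is an assumption that would need measuring:

```python
# Throughput sketch for the Ryzen 9 3900X example above: 24 concurrent
# tasks at ~12,000 s each, versus 12 tasks (one per core) at some faster,
# unmeasured per-task time.  The break-even point is exactly half the
# SMT-loaded run time; anything faster than that yields more throughput.

def wus_per_day(concurrent: int, runtime_s: float) -> float:
    """Workunits completed per day at a given concurrency and run time."""
    return concurrent * 86400 / runtime_s

smt_rate = wus_per_day(24, 12000)   # 172.8 WUs/day with all 24 threads
break_even_s = 12000 * 12 / 24      # 6,000 s per WU when running 12 tasks
print(smt_rate, wus_per_day(12, break_even_s))  # equal throughput
```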
ID: 3240 · Reply Quote
TN-Grid limits the total number of workunits per host (regardless of its core count) to help maximize the total output of the project by spreading the work between as many hosts as possible. Hosts with large core counts finish work more frequently, thus they have a better chance to download work during a shortage.
That isn't my observation. On the 64 thread system I receive 384 WUs (6 x 64). After the change I still get 384 WUs. If I change the thread count to 128 I will get 768 (6 x 128), and it is consistent across all of my systems.
This means that your host can queue 33% more work than before the change.
I agree as long as we are only considering the number of "chunks" being downloaded (800 vs 600), but the number of WUs is the same as before. Supposedly, the problem we were trying to solve was related to the work generator and the number of workunits created per unit of time. That doesn't seem to have changed much after the change.
That's no problem (both for you and for the project) provided that your host *always* has *some* work in its queue. As every workunit is a piece in a chain of 58 workunits, workunits that are just sitting in a computer's queue hold back the completion of the entire chain. The more cores a host has, the more chains it can put on hold. The reason for limiting the total number of workunits is to limit the number of chains a host can put on hold, without reducing the host's throughput.
I am always going to have a number of WUs in the "Ready to Start" state to allow me to continue working through outages, whether at the project site or locally. That is a choice I make as a cruncher, but I try to limit it to less than a day for the reasons you describe.
BTW, your hosts are a nice example of why *not* to crunch on virtual cores:
I choose to run on virtual cores as I have found it a pain in the backside to have to constantly go into the BIOS to turn off HT/SMT for different projects. The lower thread count systems don't get me enough extra throughput to justify the trouble. The bigger server gives me better payback for running fewer threads, but I do that through the BOINC Manager (Use 50% of the CPUs), and yes, I know that by doing that I'm not truly eliminating virtual cores and work isn't always balanced across the sockets, but WUs run in about half the time. Turning off SMT wouldn't get me that much more throughput.
I would advocate for changing the WUs to 1200 chunks from 800. I think that would make them run about the same time as the HS work.
ID: 3241 · Reply Quote
That isn't my observation. On the 64 thread system I receive 384 WUs (6 x 64). After the change I still get 384 WUs. If I change the thread count to 128 I will get 768 (6 x 128), and it is consistent across all of my systems.
That's my mistake. Perhaps I should consider it as a new idea then.
I choose to run on virtual cores as I have found it a pain in the backside to have to constantly go into the BIOS to turn off HT/SMT for different projects. The lower thread count systems don't get me enough extra throughput to justify the trouble.
Agreed. I don't recommend turning it off in the BIOS.
The bigger server gives me better payback for running fewer threads, but I do that through the BOINC Manager (Use 50% of the CPUs), and yes, I know that by doing that I'm not truly eliminating virtual cores and work isn't always balanced across the sockets, but WUs run in about half the time. Turning off SMT wouldn't get me that much more throughput.
Modern OSes select the cores wisely for power-hungry apps, so I let them do it on their own. Windows runs thousands of threads (my Windows 11 PC: 3,300); that's one of the reasons for its degraded performance compared to Linux.

For anyone interested in methods other than changing the BOINC manager's Options -> Computing preferences -> "Use at most 50% of the CPUs": I put an app_config.xml in each project's directory that looks similar to this:

<app_config>
  <app>
    <name>gene_pcim</name>
    <max_concurrent>7</max_concurrent>
  </app>
</app_config>

You can figure out the app name from the project's webpage, or from the BOINC manager's log. (Nothing bad happens when you put an incorrect name here; the BOINC manager will show the known app names in its log, so you can correct the name accordingly.)

The other method is to limit the project itself:

<app_config>
  <project_max_concurrent>7</project_max_concurrent>
</app_config>

This will also limit the maximum number of workunits in the queue.

When I crunch for multiple projects at the same time, it's tedious to set these files to add up to the number of cores I want to use; in that case I use the cc_config.xml method (the file is located in the BOINC directory):

<cc_config>
  <options>
    <ncpus>7</ncpus>
  </options>
</cc_config>

This will also limit the maximum number of workunits in the queue. Originally this value is set to -1 (= all CPUs). Don't forget to make the BOINC manager reread the configuration files after any changes you've made.

I would advocate for changing the WUs to 1200 chunks from 800. I think that would make them run about the same time as the HS work.
Agreed, my suggested number was 920 (50 workunits) or 1150 (40 workunits). (1200 would result in 38 + 1/3 workunits.)
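The chunk arithmetic behind those suggested sizes can be checked directly. The ~46,000 chunks-per-gene total is inferred from the figures in this thread (920 x 50 = 1150 x 40 = 46,000, and 57 x 800 + 400 = 46,000 matches the observed 58-workunit chains); it is not stated anywhere officially:

```python
# Workunit-count arithmetic for the suggested chunk sizes.  The 46,000
# chunks-per-gene total is an inference from the figures quoted above.

TOTAL_CHUNKS = 46_000

for chunk_size in (800, 920, 1150, 1200):
    wus = TOTAL_CHUNKS / chunk_size
    print(f"{chunk_size:>4} chunks/WU -> {wus:.2f} WUs per gene")
#  800 -> 57.50 (57 full WUs plus a final one of 400 chunks = 58, as observed)
#  920 -> 50.00
# 1150 -> 40.00
# 1200 -> 38.33 (the "38 + 1/3" mentioned above)
```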
ID: 3242 · Reply Quote
The number of tasks in progress is increasing slowly but steadily (~1,000 per day; it's 32,318 at the moment), while the number ready to send is fluctuating between 80 and 120 workunits, so the work generator can just barely keep up the pace. Many people (in the northern hemisphere) will leave for summer vacation soon, therefore the available computing power will decrease, so I think the settings are fine for now. But I'm still curious how the project would perform when generating even larger (1150-chunk) workunits.
ID: 3243 · Reply Quote
The new project will be done before northern people go on vacation... there are already less than 2 months left to finish, and it could be earlier if more workunits were provided :)
ID: 3244 · Reply Quote
The number of tasks in progress is still increasing slowly. It's 35,016 at the moment, which is near the previous top (35,559). I wonder when we will reach the new top in the number of tasks in progress, and what that number will be. There are 80 workunits ready to send.
ID: 3245 · Reply Quote
Such an amount of workunits ready to send is almost zero; as soon as a couple of hosts request more WUs, we go back to 0, and then hosts will start emptying their queues.
ID: 3246 · Reply Quote
The number of tasks in progress is still increasing slowly.
ID: 3248 · Reply Quote
Again, WUs ready to send are around 0; tasks in progress: 34,685.
ID: 3249 · Reply Quote