Shortest processing time ever

Message boards : Number crunching : Shortest processing time ever

Author Message
Retvari Zoltan
Joined: 31 Mar 20
Posts: 43
Credit: 51,206,467
RAC: 0
Hungary
Message 2968 - Posted: 2 Nov 2022, 11:39:44 UTC
Last modified: 2 Nov 2022, 11:43:11 UTC

3,817.81 seconds (1 hour 3 minutes 37.81 seconds).
This was done on my i9-12900F
- no CPU overclocking (besides the built-in "turbo mode")
- the RAM is DDR5 @ 5200MHz 40-40-40-80 in dual channel
- no other tasks were running (hence the single core turbo mode)
- processing is done by the FMA app
- the OS is Linux Ubuntu 20.04.5 LTS (5.15.0-52)

When there are 7 TN-Grid tasks + 1 Folding@home GPU task running on this host, the TN-Grid processing times are around 5,100 seconds.
This is a ~33.6% increase, which reflects the CPU clock difference between turbo mode and non-turbo mode (5.1GHz vs 3.8GHz; this CPU has a 65W TDP).
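
As a quick check of the arithmetic above, here is a minimal Python sketch that compares the runtime increase with the clock difference. The variable names are only illustrative; the numbers are the ones quoted in this post, and the loaded runtime is the approximate "around 5,100 seconds" figure.

```python
# Figures quoted in the post (the loaded runtime is approximate).
solo_runtime_s = 3817.81     # one task running alone, single-core turbo
loaded_runtime_s = 5100.0    # 7 TN-Grid tasks + 1 Folding@home GPU task
turbo_ghz = 5.1
non_turbo_ghz = 3.8

# Relative increase in runtime when the host is fully loaded.
runtime_increase = loaded_runtime_s / solo_runtime_s - 1.0

# Relative clock advantage of turbo mode over non-turbo mode.
clock_advantage = turbo_ghz / non_turbo_ghz - 1.0

print(f"runtime increase: {runtime_increase:.1%}")
print(f"clock advantage:  {clock_advantage:.1%}")
```

The two percentages come out within about one point of each other, which is consistent with the runtime being dominated by the CPU clock on this host.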

To achieve optimal processing times, I suggest limiting the number of usable CPU cores in BOINC to 50% on hyper-threaded (SMT, in AMD terminology) hosts, especially on Linux, or even lower on 12th gen (or newer) Intel CPUs with efficiency cores (i5-12600K, i7, i9). Depending on the CPU architecture and the RAM bandwidth, this may even slightly increase the RAC of a given host.
If you don't want to reduce the number of tasks queued on your host by limiting the number of CPU cores generally in BOINC, you can use the app_config.xml file to limit the number of simultaneous tasks for each project.
For example, put an app_config.xml file in the C:\ProgramData\BOINC\projects\gene.disi.unitn.it_test folder (/var/lib/boinc-client/projects/gene.disi.unitn.it_test on Linux)
with the following content (you should set your own number of CPU cores; I used 8 in this example):

<app_config>
    <project_max_concurrent>8</project_max_concurrent>
</app_config>

(repeat this process for all single-threaded CPU projects)
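
If the file has to be created for several projects, it can be generated rather than typed by hand. The following is only a sketch: build_app_config is a hypothetical helper name, and the only BOINC-specific parts are the app_config and project_max_concurrent elements from the example above.

```python
import xml.etree.ElementTree as ET


def build_app_config(max_concurrent: int) -> str:
    """Return the text of an app_config.xml limiting simultaneous tasks.

    build_app_config is an illustrative helper, not part of BOINC.
    """
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    root = ET.Element("app_config")
    ET.SubElement(root, "project_max_concurrent").text = str(max_concurrent)
    ET.indent(root, space="    ")  # pretty-printing; needs Python 3.9+
    return ET.tostring(root, encoding="unicode") + "\n"


if __name__ == "__main__":
    # Print the file content for a limit of 8 simultaneous tasks.
    print(build_app_config(8), end="")
```

The printed text would then be saved as app_config.xml in each project's folder; the client has to re-read its config files (or be restarted) before the limit takes effect.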

Keith Myers
Joined: 26 Jun 20
Posts: 64
Credit: 15,299,594
RAC: 0
United States
Message 2971 - Posted: 2 Nov 2022, 19:00:46 UTC

It will be interesting to see if the following tasks on that host crunch in the same shorter time.

Or whether it was just an outlier with a limited parameter set that led to a shorter crunching time.

Falconet
Joined: 21 Dec 16
Posts: 105
Credit: 3,092,711
RAC: 0
Portugal
Message 2972 - Posted: 2 Nov 2022, 19:03:32 UTC

It could just be a smaller task. I had one such today that ran for almost 8,000 seconds rather than the usual 20,000-21,000 seconds.

Retvari Zoltan
Joined: 31 Mar 20
Posts: 43
Credit: 51,206,467
RAC: 0
Hungary
Message 2977 - Posted: 5 Nov 2022, 18:37:38 UTC - in response to Message 2972.

It could just be a smaller task.

It was a "normal" task.

Retvari Zoltan
Joined: 31 Mar 20
Posts: 43
Credit: 51,206,467
RAC: 0
Hungary
Message 2978 - Posted: 5 Nov 2022, 18:54:11 UTC - in response to Message 2971.
Last modified: 5 Nov 2022, 18:56:13 UTC

it was just an outlier with a limited parameter set that led to a shorter crunching time.

The major factor that made this task that fast is that there was nothing else running on that host at that time, so the single-core turbo could kick in.

Previously I had some random restarts when Folding@home was running on that host.
After a week I figured out that it was PSU-related: there were not enough power cables for the CPU and the GPU, so I swapped the PSU.
Since then it seems to be running fine at 4.7GHz on all P-cores.
The processing time is around 4,100-4,300 seconds (7 TN-Grid tasks + 1 Folding@home task). (The E-cores are not that great for crunching.)
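
The 4.7GHz figure can be sanity-checked against the single-task record. The sketch below assumes that runtime is inversely proportional to clock speed, which is a simplification (it ignores cache and memory contention); scale_runtime is a hypothetical helper name.

```python
def scale_runtime(runtime_s: float, from_ghz: float, to_ghz: float) -> float:
    """Estimate the runtime at another clock speed.

    Assumes runtime is inversely proportional to the CPU clock.
    """
    return runtime_s * from_ghz / to_ghz


# The record run was 3,817.81 s at 5.1GHz; estimate the all-P-core 4.7GHz case.
estimate_s = scale_runtime(3817.81, from_ghz=5.1, to_ghz=4.7)
print(f"estimated runtime at 4.7GHz: {estimate_s:.0f} s")
```

The estimate lands at the lower end of the 4,100-4,300 second range reported above, so the clock difference alone accounts for most of the slowdown under load.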

Retvari Zoltan
Joined: 31 Mar 20
Posts: 43
Credit: 51,206,467
RAC: 0
Hungary
Message 2979 - Posted: 6 Nov 2022, 9:54:37 UTC - in response to Message 2968.
Last modified: 6 Nov 2022, 10:04:19 UTC

If you don't want to reduce the number of tasks queued on your host by limiting the number of CPU cores generally in BOINC, you can use the app_config.xml file to limit the number of simultaneous tasks for each project.
A little correction: this method (just like setting a global limit on the usable CPU cores in BOINC) limits the number of tasks queued. (I've set up my host like that to verify it).

The main message of this experiment is that one should not crunch (single-threaded apps) on the "virtual" (hyper-threaded) cores to achieve short runtimes. (Running too many of these tasks makes them wait for each other's FP operations; it could also easily fill up the last-level cache of the CPU, resulting in more cache misses and much slower memory transfers.)

entity
Joined: 20 Jul 20
Posts: 20
Credit: 31,475,949
RAC: 0
United States
Message 2980 - Posted: 6 Nov 2022, 15:06:04 UTC - in response to Message 2979.

If you don't want to reduce the number of tasks queued on your host by limiting the number of CPU cores generally in BOINC, you can use the app_config.xml file to limit the number of simultaneous tasks for each project.
A little correction: this method (just like setting a global limit on the usable CPU cores in BOINC) limits the number of tasks queued. (I've set up my host like that to verify it).

The main message of this experiment is that one should not crunch (single-threaded apps) on the "virtual" (hyper-threaded) cores to achieve short runtimes. (Running too many of these tasks makes them wait for each other's FP operations, it could also easily fill up the last level cache of the CPU, resulting in increased cache misses and much slower memory transfers.)

If you don't turn off hyperthreading in the BIOS, how do you guarantee only one thread per physical CPU core? The OS doesn't know the difference and will schedule threads on whatever is available. If you truly want to run only one thread on a core, you have to turn off hyperthreading.

Retvari Zoltan
Joined: 31 Mar 20
Posts: 43
Credit: 51,206,467
RAC: 0
Hungary
Message 2981 - Posted: 6 Nov 2022, 22:11:08 UTC - in response to Message 2980.

If you don't turn off hyperthreading in the BIOS, how do you guarantee only one thread per physical CPU core?
I wrote a little batch program (for Windows) to set process affinities in order to assign TN-Grid tasks to different CPU cores, but it made very little difference.

The OS doesn't know the difference and will schedule threads on whatever is available.
Every recent OS knows the difference between CPU cores and threads. (On Windows you can check it in Task Manager: the CPU tab shows how many cores and how many threads your CPU has.) Recent OSes even know the difference between the "performance cores" (P-cores) and the "efficiency cores" (E-cores) of recent Intel CPUs (the latter don't have HT).
The Ubuntu Linux 22.04 I use schedules each task on its own CPU core (and leaves it running on that core) until it runs out of cores. Windows uses a different approach (spreading a high-CPU-load process evenly between cores so that the cores heat up evenly, which probably makes tasks run slower on Windows), but it won't use the same core for two high-CPU-load processes until it runs out of cores.

If you truly want to only run one thread on a core you have to turn off hyperthreading.
That'll surely force the OS not to schedule tasks on the same core, but it's necessary only for old OSes.
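
For anyone who wants to try the affinity approach on Linux, here is a rough Python sketch of the first step: picking one logical CPU per physical core. It is not the Windows batch program mentioned above. The helper names are hypothetical; the sysfs path is the standard Linux CPU topology location, where each logical CPU lists the siblings that share its core.

```python
import glob


def parse_cpu_list(text: str) -> list[int]:
    """Parse a Linux CPU list such as "0-1" or "0,12" into sorted CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        elif part:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(sibling_lists: list[str]) -> list[int]:
    """Return the lowest-numbered logical CPU of every physical core.

    sibling_lists holds one thread_siblings_list string per logical CPU.
    """
    parsed = (parse_cpu_list(item) for item in sibling_lists)
    return sorted({cpus[0] for cpus in parsed if cpus})


def read_sibling_lists() -> list[str]:
    """Read thread_siblings_list for every logical CPU (Linux only)."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    lists = []
    for path in glob.glob(pattern):
        with open(path) as handle:
            lists.append(handle.read())
    return lists


if __name__ == "__main__":
    # On a non-Linux host the glob matches nothing and this prints [].
    print(one_cpu_per_core(read_sibling_lists()))
```

Each running task could then be pinned to one of the returned CPUs with os.sched_setaffinity(pid, {cpu}), which is available in Python's os module on Linux. As noted above, pinning made very little difference in practice, so this is mainly useful for experiments.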

Retvari Zoltan
Joined: 31 Mar 20
Posts: 43
Credit: 51,206,467
RAC: 0
Hungary
Message 3062 - Posted: 15 Jan 2023, 19:44:55 UTC

I've tested a PC with an i5-13600K CPU at 5.1GHz (RAM is dual-channel DDR4 at 3600MHz) under Linux.
This CPU has 6 P-cores and 8 E-cores (20 threads total).
First I ran 7 TN-Grid tasks by mistake, later 5 TN-Grid tasks plus one Folding@home GPU task (which uses a full CPU thread to feed the GPU).
The processing times have gone as low as 3,800 seconds (1 hour and 3-4 minutes) for the FMA app.
You can check the results here.
The processing times dropped in proportion to the increase in clock speed.
I've concluded that the 13th generation Intel CPUs have the same P-cores as the 12th gen CPUs, though the clock speeds got slightly higher (as did the power limits). If there's any new feature in the 13th gen P-cores, the TN-Grid client does not benefit from it.

Retvari Zoltan
Joined: 31 Mar 20
Posts: 43
Credit: 51,206,467
RAC: 0
Hungary
Message 3169 - Posted: 15 May 2023, 23:32:50 UTC

Vitis vinifera workunits: ~2110 sec (~35m 10s)
See here.

Conan
Joined: 6 Sep 15
Posts: 13
Credit: 7,885,837
RAC: 0
Australia
Message 3171 - Posted: 16 May 2023, 8:31:22 UTC

It depends on the computer a bit as well.

My AMD Ryzen 9 5900X using all cores and threads (24) takes just under 2 hours running Linux Fedora 37.

Same computer running Windows 10 takes 10 minutes longer at 2 hours 9 minutes.

My newer Ryzen 9 7900X running ECO mode (105W) with all cores and threads (24) running takes 1 hour.

Conan

Retvari Zoltan
Joined: 31 Mar 20
Posts: 43
Credit: 51,206,467
RAC: 0
Hungary
Message 3259 - Posted: 17 Jun 2023, 19:24:16 UTC
Last modified: 17 Jun 2023, 19:45:30 UTC

I've assembled a Ryzen 9 7900X for a friend, and I'm testing it with TN-Grid.
The OS installation is quite tedious, as I have to
- start with Ubuntu 18.04 to be able to install Folding@home,
- then upgrade to 20.04,
- then upgrade to 22.04,
- then upgrade it to kernel 5.19.
This is a 12-core, 24-thread CPU, so following my own advice I was running 12 TN-Grid tasks simultaneously from the beginning (by setting "use at most 50% of the CPUs" in the BOINC Manager computing preferences).
The CPU runs at 5.3GHz (as far as I can tell in Linux).
Before the last upgrade (i.e. on Ubuntu 18.04, 20.04, and 22.04 with kernel 5.15) the runtime was ~4000 seconds (~1h 6m 40s).
After the upgrade to kernel 5.19 the runtimes dropped to ~3300 seconds (55m).
Then I overclocked the (DDR5) RAM and the runtimes dropped further to ~3000s (50m).

RAM non-overclocked: 4800MHz CL40 [8.3ns] 1.1V
RAM overclocked: 5600MHz CL36 [6.43ns] 1.25V
The overclocked access time is 77% of the non-overclocked access time, while the overclocked runtime is 91% of the non-overclocked runtime.
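
The two ratios can be reproduced with a few lines of Python. One assumption to flag: the sketch treats the 4800 and 5600 figures as transfer rates in MT/s, as is usual for DDR5 kits, so the I/O clock is half of that. Under that convention the absolute CAS latencies come out twice as large as the bracketed values, but the ratio between them is the same. cas_latency_ns is a hypothetical helper name.

```python
def cas_latency_ns(data_rate_mts: float, cas_cycles: int) -> float:
    """CAS latency in nanoseconds.

    Assumes DDR memory: two transfers per clock, so clock (MHz) = MT/s / 2.
    """
    clock_mhz = data_rate_mts / 2
    return cas_cycles / clock_mhz * 1000


stock_ns = cas_latency_ns(4800, 40)        # non-overclocked: CL40
overclocked_ns = cas_latency_ns(5600, 36)  # overclocked: CL36

latency_ratio = overclocked_ns / stock_ns  # access time, overclocked vs stock
runtime_ratio = 3000 / 3300                # runtimes quoted in the post

print(f"CAS latency: {stock_ns:.2f} ns -> {overclocked_ns:.2f} ns")
print(f"latency ratio: {latency_ratio:.0%}, runtime ratio: {runtime_ratio:.0%}")
```

So a 23% shorter access time bought roughly a 9% shorter runtime, which suggests the app is only partly limited by memory latency on this host.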
So at first the Ryzen 9 7900X's per-core performance was around that of my i7-11700F. After the upgrade to kernel 5.19 it reached the per-core performance of my i5-12400, and with the overclocked RAM it is only 7% slower per core than my i9-12900F. However, the Ryzen has only "performance" cores, 50% more of them than the i9-12900F, so it does 40% more work in total.
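
The "40% more work" figure follows from the core counts and the per-core gap. A small sketch, assuming only the 8 P-cores of the i9-12900F are counted (as the "50% more" comparison implies) and showing both readings of "7% slower":

```python
ryzen_cores = 12   # Ryzen 9 7900X, one task per physical core
i9_p_cores = 8     # i9-12900F P-cores only (E-cores are ignored here)
slowdown = 0.07    # "7% slower per core"

# Reading 1: each Ryzen task takes 7% longer than on an i9 P-core.
ratio_longer_runtime = (ryzen_cores / (1 + slowdown)) / i9_p_cores

# Reading 2: each Ryzen core delivers 7% less throughput than an i9 P-core.
ratio_lower_throughput = (ryzen_cores * (1 - slowdown)) / i9_p_cores

print(f"total work vs i9: {ratio_longer_runtime:.2f}x or {ratio_lower_throughput:.2f}x")
```

Either way the result rounds to about 1.4x, i.e. roughly 40% more total work.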
You can check the runtimes here.

Technologov
Joined: 27 Jan 22
Posts: 36
Credit: 302,393,914
RAC: 1
Ukraine
Message 3260 - Posted: 17 Jun 2023, 19:54:14 UTC - in response to Message 3259.
Last modified: 17 Jun 2023, 19:55:11 UTC

I also recommend enabling ECO mode (65 W TDP limit) in the BIOS. (It is part of the AMD PBO settings.)

This gets you 80% of the performance at only 40% of the power budget, effectively doubling your power efficiency. It also makes your PC equivalent to a Ryzen 9 7900 (non-X).
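
The efficiency claim is simple arithmetic; a minimal sketch using the two percentages from this post. The 170 W stock TDP of the 7900X is added here as an assumption for comparison and does not come from the thread.

```python
relative_performance = 0.80   # ECO mode performance vs stock (from the post)
relative_power = 0.40         # ECO mode power budget vs stock (from the post)

# Performance per watt in ECO mode, relative to stock settings.
efficiency_gain = relative_performance / relative_power

# Assumption: the 7900X has a 170 W stock TDP, so a 65 W limit is about 38%.
tdp_fraction = 65 / 170

print(f"efficiency gain: {efficiency_gain:.1f}x")
print(f"65 W as a fraction of 170 W: {tdp_fraction:.0%}")
```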




Copyright © 2024 CNR-TN & UniTN