Exploring CPU Core to Core Latency and the Role that Locks Play

This article has been a long time coming. We first talked about core to core latency here on Chips and Cheese in our Rocket Lake article, and this follow-up is exclusively about that topic.

The core to core latency tests used by us and Anandtech measure contested lock latency, specifically using atomic compare-and-set instructions. These instructions are used in multithreaded applications to enforce ordering between threads, including to implement higher level mutual exclusion locks.

When a thread executes a locked instruction (such as an atomic compare-and-set), the core running it gets exclusive access to a block of data, usually a cache line. That guarantees the instruction will complete its memory operations without any other thread accessing the same memory. If another core then wants to access that data for any reason, the line must be transferred from the core that previously owned it. That’s how the core to core latency test works: two threads continuously try to execute atomic compare-and-set operations on the same data, forcing the cache line to bounce between the two cores running the threads.
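To make the mechanism concrete, here is a minimal sketch of such a ping-pong loop in C++. It is not the code used for the measurements in this article. The function name `measure_pingpong_ns` is hypothetical, the 64-byte alignment assumes a 64-byte cache line, and the sketch does not pin threads to specific cores, which a real test would do with a platform-specific affinity call. The timed region also includes thread startup, so only large iteration counts give a meaningful per-handoff figure.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical helper: bounce one cache line between two threads with
// compare-and-swap and return the average time per handoff in nanoseconds.
double measure_pingpong_ns(std::uint64_t iterations) {
    assert(iterations > 0);

    // Assumption: 64-byte cache lines, so the counter sits in its own line.
    alignas(64) std::atomic<std::uint64_t> line{0};

    // Each thread waits until the shared value equals `want`, then bumps it
    // by one. One thread handles even values, the other handles odd values,
    // so every successful CAS hands the line over to the other thread.
    auto worker = [&line, iterations](std::uint64_t first) {
        for (std::uint64_t want = first; want < 2 * iterations; want += 2) {
            std::uint64_t expected = want;
            while (!line.compare_exchange_weak(expected, want + 1,
                                               std::memory_order_acq_rel,
                                               std::memory_order_relaxed)) {
                expected = want;  // a failed CAS overwrote `expected`
            }
        }
    };

    auto start = std::chrono::steady_clock::now();
    std::thread other(worker, std::uint64_t{1});  // odd values
    worker(0);                                    // even values, this thread
    other.join();
    auto stop = std::chrono::steady_clock::now();

    // Both threads together performed 2 * iterations successful swaps.
    assert(line.load() == 2 * iterations);
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(2 * iterations);
}
```

Calling `measure_pingpong_ns(1'000'000)` from a program built with thread support returns an average in nanoseconds. Without pinning, the operating system decides which two cores run the threads, so results will vary from run to run.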

Of course, a CPU’s result in a core to core latency test does not by itself tell us how much that latency matters in real world applications. So we measured how often locks cause core to core transfers in War Thunder, Time Spy, Overwatch, and our own lock-based core to core latency test.

How often memory accesses resulted in a core to core transfer. “Retired” means the instruction’s results were committed at the end of the CPU’s pipeline. “XSNP” means a cross core snoop was required after an L3 hit. “HITM” means the line was found in modified state, requiring a core to core transfer.

In the applications we tested, contested locks are very rare; the only exception is our core to core latency test. In each of those applications, contested locks affected 0.01% of instructions or less. In fact, core to core transfers are rarer than L3 cache misses. Games typically have around 20-30 out of every 10,000 instructions suffering an L3 cache miss, which means that games are much more bound by memory latency than lock latency. If you picked an instruction at random, it’s 20-30 times more likely to miss L3 than to require a core to core transfer. The situation is even more skewed for very parallel productivity workloads like Cinebench, where L3 misses happen about 80x as often as core to core transfers.
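The 20-30x figure follows directly from putting both rates on the same scale. As a quick worked check, using only the numbers quoted above and treating 0.01% as the upper bound for core to core transfers:

```latex
\[
0.01\% = \frac{0.01}{100} = \frac{1}{10000},
\qquad
\frac{\text{L3 miss rate}}{\text{core to core transfer rate}}
\ge \frac{20/10000}{1/10000} = 20
\quad\text{up to}\quad
\frac{30/10000}{1/10000} = 30 .
\]
```

Because 0.01% is an upper bound on the transfer rate, the ratio is 20-30 at minimum, and higher in applications where contested locks are rarer still.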

Comparing the frequency of core to core transfers to that of L3 misses. If anything, this underestimates the number of L3 misses, because we’re only looking at memory loads and not at L3 misses from instruction read requests.

So in conclusion, a core to core latency test using locks isn’t very indicative of how a CPU will perform in real world usage, whether in games or in productivity workloads. Core to core latency is merely one part of a CPU’s overall performance, and it plays a small role compared to other factors like the performance of a CPU’s cache and memory hierarchy.

I would like to thank Clamchowder for writing all the tests and measurement applications used in this article and for helping me with the writing in this article.
