
Are there any changes that could be made to CPUs to make them perform better for concurrent runtimes like Rust? For instance, are there changes to branch prediction implementations or cache sizes that would help concurrent runtimes?

I have the impression that current CPU designs might be optimized more for procedural runtimes like C. If we were instead going to optimize for concurrent runtimes, how would CPUs look different?

For instance, branch prediction was implemented based on generalizations drawn from research papers analyzing procedural code. I'm wondering whether the concurrency abstraction adds a significant working set to the runtime that adversely impacts existing branch prediction algorithms. Predicting the branch in a for loop is one thing, but when the target of a branch is always some new portion of memory (graphics, text, etc.), it will always be a cache miss, and there will never be any branch history for it, because neither the cache nor the predictor has touched it yet.

This is probably a silly question, because the content, even though it may already be in RAM, will be branched to an order of magnitude less often than it is used once it has been loaded into cache. Still, there should be an observable temporal boundary to the contexts stored in the cache and branch predictors under a procedural runtime, and that boundary would manifest as an abstraction boundary in a more parallelized environment. So I'm wondering: have these boundaries been observed, and have any research papers analyzed them?
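To make the contrast concrete, here is a rough Rust sketch of what I mean; the function names and the handler table are made up purely for illustration, not taken from any real workload:

```rust
// Illustrative only: contrast a branch the predictor can learn with
// data-dependent indirect dispatch that it (and the cache) keeps missing.

fn sum_predictable(xs: &[u64]) -> u64 {
    // The loop-back branch is taken on nearly every iteration and the data
    // is streamed sequentially, so the predictor and prefetcher both do well.
    let mut total = 0;
    for &x in xs {
        total += x;
    }
    total
}

fn sum_dispatched(xs: &[u64], handlers: &[Box<dyn Fn(u64) -> u64>]) -> u64 {
    // Each iteration branches indirectly through a different handler, so the
    // branch-target buffer starts with no history for it, and whatever memory
    // each handler touches may well be cold in the cache.
    let mut total = 0;
    for (i, &x) in xs.iter().enumerate() {
        total += handlers[i % handlers.len()](x);
    }
    total
}
```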

Are CPU architectures biased towards procedural code over concurrent code, or are modern CPUs sufficiently general-purpose that a highly concurrent language doesn't suffer?

asked by paIncrease · edited by D.W.

2 Answers


It is probably more the case that modern computer architectures are designed so that compilers can generate high-quality code for them, within a budget of die area and power. Runtime libraries are just a specific instance of compiled code that needs to execute efficiently.

For a very long time, the target language for most architectures has been C. This reflects the modest demands that the language makes on its hardware and the fact that it has become an almost universal systems programming language (sorry, Rust and Go, you have a long way to go to beat C).

A consequence of this seems to be that new languages are often defined in terms of their C-equivalent semantics, precisely so that they avoid needing processor facilities that are likely to be absent from current computers.

The payoff for a processor that matches well with modern compilers is that code from those compilers runs well and the processor has at least a chance of being competitive. The cost of failure here dooms the processor before it can get started. Two cautionary examples are the iAPX 432 and the Itanium, both from Intel. Each had a very poor relationship with its compilers (Ada and C respectively), and the failure of the products turned into a blame game between silicon and software.

Peter Camilleri

Without a doubt, yes.

In particular, the communication model implied by C99 is shared memory. More advanced concurrent languages have richer communication models, such as message-passing channels (as in Rust).
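For concreteness, here is a minimal Rust sketch of the two models, using only standard-library types (Arc<Mutex<...>> for shared memory, std::sync::mpsc for a message-passing channel). It illustrates the programming models only; it says nothing about how either is compiled:

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

fn main() {
    // Shared-memory style: both threads access the same location, and the
    // hardware's cache-coherency protocol keeps their views consistent.
    let counter = Arc::new(Mutex::new(0u64));
    let c = Arc::clone(&counter);
    let t1 = thread::spawn(move || {
        *c.lock().unwrap() += 1;
    });

    // Message-passing style: the value is moved across a channel; no
    // location is concurrently written by both threads.
    let (tx, rx) = mpsc::channel();
    let t2 = thread::spawn(move || {
        tx.send(42u64).unwrap();
    });
    let received = rx.recv().unwrap();

    t1.join().unwrap();
    t2.join().unwrap();
    println!("counter = {}, received = {}", *counter.lock().unwrap(), received);
}
```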

Modern CPU architectures do have explicit hardware support for shared memory. In particular, cache coherency protocols like MESI are implemented in actual gates and wires. There is no real support for message passing between processes, even though the idea of message passing isn't alien to CPUs. Modern PCIe buses even emulate shared memory using message passing, whereas CPU processes have to emulate message passing using shared memory!
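To underline that last point, here is a deliberately naive sketch of a "channel" built only from shared-memory primitives (Mutex and Condvar). It is an illustration of the layering, not how std::sync::mpsc is actually implemented:

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};

// A toy unbounded channel: user-level "message passing" that bottoms out in
// loads and stores to shared memory, kept consistent by the coherency
// protocol. Illustrative only.
struct ToyChannel<T> {
    queue: Mutex<VecDeque<T>>,
    ready: Condvar,
}

impl<T> ToyChannel<T> {
    fn new() -> Arc<Self> {
        Arc::new(ToyChannel {
            queue: Mutex::new(VecDeque::new()),
            ready: Condvar::new(),
        })
    }

    fn send(&self, value: T) {
        // "Sending" is just a write to shared memory under a lock...
        self.queue.lock().unwrap().push_back(value);
        self.ready.notify_one();
    }

    fn recv(&self) -> T {
        // ...and "receiving" is a read of that same shared memory.
        let mut q = self.queue.lock().unwrap();
        loop {
            if let Some(v) = q.pop_front() {
                return v;
            }
            q = self.ready.wait(q).unwrap();
        }
    }
}
```

From the hardware's point of view, every send and recv here is nothing but coherent loads and stores (plus the atomics inside the lock); the message-passing abstraction exists only in software.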

MSalters