69

I wonder if it is possible to build compilers for dynamic languages like Ruby that have similar and comparable performance to C/C++. From what I understand about compilers, taking Ruby for instance, compiling Ruby code can't ever be efficient, because the way Ruby handles reflection, features such as automatic type conversion from integer to big integer, and the lack of static typing make building an efficient compiler for Ruby extremely difficult.

Is it possible to build a compiler that can compile Ruby, or any other dynamic language, to a binary that performs very close to C/C++? Is there a fundamental reason why JIT compilers such as PyPy or Rubinius will eventually match C/C++ in performance, or never will?

Note: I do understand that “performance” can be vague, so to clear that up, I mean: if you can do X in C/C++ with performance Y, can you do X in Ruby/Python with performance close to Y, where X is anything from device drivers and OS code to web applications?

Ichiro

10 Answers

72

To all those who said “yes” I’ll offer a counter-point that the answer is “no”, by design. Those languages will never be able to match the performance of statically compiled languages.

Kos offered the (very valid) point that dynamic languages have more information about the system at runtime which can be used to optimise code.

However, there’s another side of the coin: this additional information needs to be kept track of. On modern architectures, this is a performance killer.

William Edwards offers a nice overview of the argument.

In particular, the optimisations mentioned by Kos can’t be applied beyond a very limited scope unless you limit the expressive power of your language quite drastically, as mentioned by Devin. This is of course a viable trade-off, but for the sake of the discussion, you then end up with a static language, not a dynamic one. Such languages differ fundamentally from Python or Ruby as most people would understand them.

William cites some interesting IBM slides:

  • Every variable can be dynamically-typed: Need type checks
  • Every statement can potentially throw exceptions due to type mismatch and so on: Need exception checks
  • Every field and symbol can be added, deleted, and changed at runtime: Need access checks
  • The type of every object and its class hierarchy can be changed at runtime: Need class hierarchy checks

Some of those checks can be eliminated after analysis (N.B.: this analysis also takes time – at runtime).
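To see why these checks are unavoidable in full Python (or Ruby), consider the following small sketch. It is my own illustration, not anything from the slides: every assumption a compiler might bake into machine code can be invalidated by perfectly legal code at runtime.

```python
# A small illustration (mine, not from the IBM slides) of why each check
# is needed: every assumption a compiler might bake in can be invalidated
# by perfectly legal Python at runtime.

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

def norm_sq(p):
    return p.x * p.x + p.y * p.y   # looks like plain arithmetic...

p = Point(3, 4)
print(norm_sq(p))    # 25 -- a JIT might now specialise norm_sq for ints

p.x = "three"        # a field's type just changed: need type checks
try:
    norm_sq(p)       # the same statement now raises: need exception checks
except TypeError:
    pass

del p.y              # fields can be deleted at runtime: need access checks

class FancyPoint(Point):   # the class hierarchy itself can grow and
    pass                   # change at runtime: need class hierarchy checks
```

Each of these mutations forces the generated code to re-validate assumptions that a C++ compiler could settle once, at compile time.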

Furthermore, Kos argues that dynamic languages could even surpass C++ performance. The JIT can indeed analyse the program’s behaviour and apply suitable optimisations.

But C++ compilers can do the same! Modern compilers offer so-called profile-guided optimisation which, given suitable input, can model a program's runtime behaviour and apply the same optimisations that a JIT would apply.

Of course, this all hinges on the existence of realistic training data and furthermore the program cannot adapt its runtime characteristics if the usage pattern changes mid-run. JITs can theoretically handle this. I’d be interested to see how this fares in practice, since, in order to switch optimisations, the JIT would continually have to collect usage data which once again slows down execution.

In summary, I’m not convinced that runtime hot-spot optimisations outweigh the overhead of tracking runtime information in the long run, compared to static analysis and optimisation.

Konrad Rudolph
20

if you can do X in C/C++ with performance Y, can you do X in Ruby/Python with performance close to Y?

Yes. Take, as an example, PyPy. It is a collection of Python code that performs interpretation at close to C speed (not all that close, but not all that far away either). It does this by performing full-program analysis on the source code to assign each variable a static type (see the Annotator and Rtyper docs for details), and then, once armed with the same type information you give C, it can perform the same sorts of optimizations. At least in theory.

The tradeoff of course is that only a subset of Python code is accepted by RPython, and in general, even if that restriction is lifted, only a subset of Python code can do well: the subset that can be analyzed and given static types.
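To make that restriction concrete, here is a rough sketch of code the annotator can type versus code it cannot. This is my own illustration; the actual rules live in the RPython annotator documentation.

```python
# A rough sketch of the RPython restriction (illustrative; the real rules
# live in the RPython annotator documentation).

def dot(xs, ys):
    # Whole-program analysis can infer xs: list of int, ys: list of int,
    # total: int -- so this can compile down to a tight machine-code loop.
    total = 0
    for i in range(len(xs)):
        total += xs[i] * ys[i]
    return total

def describe(x):
    # Here the *type* of the result depends on a runtime value, so no
    # single static type can be assigned; code like this falls outside
    # the statically analyzable subset.
    if x > 0:
        return x        # an int
    return str(x)       # ...or a str
```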

If you restrict Python enough, optimizers can be built that take advantage of the restricted subset and compile it to efficient code. That is not really an interesting benefit; in fact, it's well known. But the whole point of using Python (or Ruby) in the first place was that we wanted to use interesting features that perhaps don't analyze well, and still get good performance! So the interesting question is actually...

Additionally, will JIT compilers such as PyPy or Rubinius ever match C/C++ in performance?

Nah.

By which I mean: sure, maybe as runs of the code accumulate you can gather enough typing information and find enough hotspots to compile all of the code all the way down to machine code. And maybe we can get this to perform better than C for some code. I don't think that's hugely controversial. But it still has to “warm up”, its performance is still somewhat less predictable, and it won't be as good as C or C++ for certain tasks that require consistently and predictably high performance.

The existing performance data for Java, which has both more type information than Python or Ruby, and a better-developed JIT compiler than Python or Ruby, still doesn't match up to C/C++. It is, however, in the same ballpark.

Devin

19

The short answer is: we don't know, ask again in 100 years. (We might still not know then; possibly we'll never know.)

In theory, this is possible. Take all the programs that have ever been written, manually translate them to the most efficient possible machine code, and write an interpreter that maps source code to machine code. This is possible since only a finite number of programs have ever been written (and as more programs get written, keep up the manual translations). It is also, of course, completely idiotic in practical terms.

Then again, in theory, high-level languages might be able to reach the performance of machine code, but they won't surpass it. This is still very theoretical, because in practical terms, we very rarely resort to writing machine code. This argument does not apply to comparing higher-level languages: it doesn't imply that C must be more efficient than Python, only that machine code cannot do worse than Python.

Coming from the other side, on purely experimental terms, we can see that most of the time, interpreted high-level languages perform worse than compiled low-level languages. We tend to write non-time-sensitive code in very high-level languages and time-critical inner loops in assembly, with languages like C and Python falling in between. While I don't have any statistics to back this up, I think this is indeed the best decision in most cases.

However, there are uncontested instances where high-level languages beat the code that one would realistically write: special-purpose programming environments. Programs like Matlab and Mathematica are often far better at solving certain kinds of mathematical problems than what mere mortals can write. The library functions may have been written in C or C++ (which is fuel for the “low-level languages are more efficient” camp), but that's none of my business if I'm writing Mathematica code: the library is a black box.

Is it theoretically possible that Python will get as close, or maybe even closer, to optimal performance than C? As seen above, yes, but we are very far from that today. Then again, compilers have made a lot of progress in the past decades, and that progress is not slowing down.

High-level languages tend to make more things automatic, so they have more work to perform and thus tend to be less efficient. On the other hand, they tend to carry more semantic information, so it can be easier to spot optimizations (if you're writing a Haskell compiler, you don't have to worry that another thread will modify a variable under your nose). One of several efforts to compare apples and oranges, that is, different programming languages, is the Computer Language Benchmarks Game (formerly known as the shootout). Fortran tends to shine at numerical tasks; but when it comes to manipulating structured data or high-rate thread communication, F# and Scala do well. Don't take these results as gospel: a lot of what they measure is how good the author of the test program in each language was.

An argument in favor of high-level languages is that performance on modern systems is not so strongly correlated with the number of instructions that are executed, and less so over time. Low-level languages are good matches for simple sequential machines. If a high-level language executes twice as many instructions, but manages to use the cache more intelligently so it does half as many cache misses, it may end up the winner.

On server and desktop platforms, CPUs have almost reached a plateau where they don't get any faster (mobile platforms are getting there too); this favors languages in which parallelism is easy to exploit. Many programs spend most of their time waiting for I/O responses; the time spent in computation matters little compared with the amount of I/O, and a language that allows the programmer to minimize communication is at an advantage.

All in all, while high-level languages start with a penalty, they have more room for improvement. How close can they get? Ask again in 100 years.

Final note: often, the comparison is not between the most efficient program that can be written in language A and the same in language B, nor between the most efficient program ever written in each language, but between the most efficient program that can be written by a human in a certain amount of time in each language. This introduces an element that cannot be analyzed mathematically, even in principle. In practical terms, it often means that the best performance is a compromise between how much low-level code you need to write to meet performance goals and how much low-level code you have time to write to meet release dates.

Gilles 'SO- stop being evil'
11

The basic difference between the C++ statement x = a + b and the Python statement x = a + b is that a C/C++ compiler can tell from this statement (and a little extra information that it has readily available about the types of x, a, and b) precisely what machine code needs to be executed. Whereas to tell what operations the Python statement is going to do, you need to solve the Halting Problem.

In C that statement will basically compile to one of a few types of machine addition (and the C compiler knows which one). In C++ it might compile that way, or it might compile to calling a statically known function, or (worst case) it might have to compile to a virtual method lookup and call, but even this has a fairly small machine code overhead. More importantly though, the C++ compiler can tell from the statically known types involved whether it can emit a single fast addition operation or whether it needs to use one of the slower options.

In Python, a compiler could theoretically do nearly as well if it knew that a and b were both ints. There's some additional boxing overhead, but if the types were statically known you could probably get rid of that too (while still presenting the interface that integers are objects with methods, a hierarchy of super-classes, etc.). The trouble is that a compiler for Python can't know this, because classes are defined at runtime, can be modified at runtime, and even the modules that do the defining and importing are resolved at runtime (and even which import statements are executed depends on things that can only be known at runtime). So the Python compiler would have to know what code has been executed (i.e. solve the Halting Problem) in order to know what the statement it is compiling will do.

So even with the most sophisticated analyses that are theoretically possible, you simply can't tell much about what a given Python statement is going to do ahead of time. This means that even if a sophisticated Python compiler were implemented, it would in almost all cases still have to emit machine code that follows the Python dictionary-lookup protocol to determine the class of an object and find its methods (traversing the MRO of the class hierarchy, which can also change dynamically at runtime and so is difficult to compile to a simple virtual method table), and basically do what the (slow) interpreters do. This is why there aren't really any sophisticated optimising compilers for dynamic languages. It's not merely hard to create one; the maximum possible payoff isn't as big as it is for languages like C/C++.

Note that this isn't based on what the code is doing, it's based on what the code could be doing. Even Python code that is a simple series of integer arithmetic operations has to be compiled as if it might be invoking arbitrary class operations. Static languages have greater restrictions on the possibilities for what the code could be doing, and consequently their compilers can make more assumptions.
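Here is a simplified sketch of the lookup a Python runtime must perform for a + b. The real protocol (described in the Python data model documentation) also covers __radd__, subclass priority, and C-level slots; this is only the general shape, not CPython's actual code.

```python
# A simplified sketch of what a + b obliges a Python runtime to do.
# The real protocol (see the data model docs) also handles __radd__,
# subclass priority and C-level slots; this is only the general shape.

def binary_add(a, b):
    for klass in type(a).__mro__:            # walk the (mutable!) MRO
        if '__add__' in klass.__dict__:      # a dictionary lookup per class
            result = klass.__dict__['__add__'](a, b)
            if result is not NotImplemented:
                return result
            break
    raise TypeError("unsupported operand types")

# Even 1 + 1 conceptually goes through this machinery, and because any
# user-defined class's __add__ may be redefined at runtime, the lookup
# cannot be compiled away without guards.
print(binary_add(1, 1))   # 2
```

In C++, the equivalent decision is made once, at compile time, from the declared types.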

JIT compilers gain on this by waiting until runtime to compile and optimise. This lets them emit code that works for what the code is doing rather than what it could be doing. Because of this, JIT compilers have a much larger potential payoff for dynamic languages than for static languages: for more static languages, much of what an optimiser would like to know can be known ahead of time, so you might as well optimise then, leaving less for a JIT compiler to do.

There are various JIT compilers for dynamic languages that claim to achieve execution speeds comparable to that of compiled and optimised C/C++. There are even optimisations that can be performed by a JIT compiler that cannot be done by an ahead-of-time compiler for any language, so theoretically JIT compilation (for some programs) could one day outperform the best possible static compiler. But as Devin rightly pointed out, the properties of JIT compilation (only the “hotspots” are fast, and only after a warm-up period) mean that JIT-compiled dynamic languages are unlikely to ever be suitable for all possible applications, even if they generally become as fast as, or faster than, statically compiled languages.

Ben
9

Just a quick pointer that outlines the worst case scenario for dynamic languages:

    Perl parsing is not computable

As a consequence, (full) Perl can never be compiled statically.


In general, as always, it depends. I am confident that if you try to emulate dynamic features in a statically compiled language, well-conceived interpreters or (partially) compiled variants can come near to, or even beat, the performance of statically compiled languages.

Another point to keep in mind is that dynamic languages solve a different problem than C. C is barely more than nice syntax for assembler, while dynamic languages offer rich abstractions. Runtime performance is often not the prime concern: time-to-market, for instance, depends on your developers being able to write complex, high-quality systems in short timeframes. Extensibility without recompilation, for instance with plugins, is another popular feature. Which language do you prefer in those cases?

Raphael
5

Can you build compilers for dynamic languages like Ruby to have similar and comparable performance to C/C++?

I think that the answer's "yes". I also believe that they can even surpass the current C/C++ architecture in terms of efficiency (even if slightly).

The reason is simple: There's more information in run-time than in compile-time.

Dynamic types are only a slight obstacle: if a function is always or almost always executed with the same argument types, then a JIT optimizer can generate a branch and machine code for that specific case. And there's so much more that can be done.
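As a toy illustration of that idea (my own sketch, not how V8 actually does it), the generated branch amounts to a cheap type guard in front of a specialised fast path:

```python
# A toy sketch (mine, not V8's) of the guard-plus-fast-path idea: after
# observing that a function keeps seeing the same argument types, a JIT
# emits a specialised branch guarded by a cheap type check.

def add_generic(a, b):
    return a + b            # full dynamic dispatch

def add_int_int(a, b):
    # What the specialised machine code conceptually computes once the
    # guard has established that both operands are plain ints.
    return a + b

def specialised_add(a, b):
    if type(a) is int and type(b) is int:   # the guard
        return add_int_int(a, b)            # fast path: no dynamic lookup
    return add_generic(a, b)                # fall back to the slow path

print(specialised_add(2, 3))       # hits the fast path
print(specialised_add("a", "b"))   # guard fails, falls back
```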

See Dynamic Languages Strike Back, a talk by Steve Yegge of Google (there's also a video version somewhere, I believe). He mentions some concrete JIT optimisation techniques from V8. Inspiring!

I'm looking forward to what we're going to have in the next 5 years!

Kos
5

In an attempt to offer a more objectively scientific answer to this question, I argue as follows. A dynamic language requires an interpreter, or runtime, to make decisions at run time. This interpreter, or runtime, is a computer program and, as such, was written in some programming language, either static or dynamic.

If the interpreter/runtime was written in a static language, then one could write a program in that static language which (a) performs the same function as the dynamic program it interprets and (b) performs at least as well. Hopefully this is self-evident; providing a rigorous proof of these claims would require additional (possibly considerable) effort.

Assuming these claims to be true, the only way out is to require that the interpreter/runtime be written in a dynamic language, as well. However, we run into the same issue as before: if the interpreter is dynamic, it requires an interpreter/runtime, which also must have been written in a programming language, dynamic or static.

Unless you assume that an instance of an interpreter is capable of interpreting itself at runtime (I hope this is self-evidently absurd), the only way to beat static languages is for each interpreter instance to be interpreted by a separate interpreter instance; this leads either to an infinite regress (I hope that this is self-evidently absurd) or a closed loop of interpreters (I hope this is also self-evidently absurd).

It seems, then, that even in theory, dynamic languages can perform no better than static languages, in general. When using models of realistic computers, it seems even more plausible; after all, a machine can only execute sequences of machine instructions, and all sequences of machine instructions can be statically compiled.

In practice, matching the performance of a dynamic language with a static language could require re-implementing the interpreter/runtime in a static language; however, that you can do that at all is the crux and point of this argument. It's a chicken and egg question and, provided you agree with the unproven (though, in my opinion, mostly self-evident) assumptions made above, we can actually answer it; we have to give the nod to the static, not dynamic, languages.

Another way to answer the question, in light of this discussion, is this: in the stored-program, control=data model of computing which lies at the heart of modern computing, the distinction between static and dynamic compilation is a false dichotomy; statically compiled languages must have a means of generating and executing arbitrary code at run time. It's fundamentally related to universal computation.

Patrick87
2

I did not have the time to read all answers in detail ... but I was amused.

There was a similar controversy in the sixties and early seventies (computer science history often repeats itself): can high-level languages be compiled to produce code as efficient as the machine code, well, say assembly code, produced manually by a programmer? Everyone knows a programmer is much smarter than any program and can come up with very smart optimisations (thinking actually mostly of what is now called peephole optimization). This is, of course, irony on my part.

There was even a concept of code expansion: the ratio of the size of the code produced by a compiler to the size of the code for the same program produced by a good programmer (as if there had been too many of these :-). Of course the idea was that this ratio was always greater than 1. The languages of the time were Cobol and Fortran IV, or Algol 60 for the intellectuals. I believe Lisp was not considered.

Well, there were some rumors that someone had produced a compiler that could sometimes reach an expansion ratio of 1 ... until it simply became the rule that compiled code was much better than hand-written code (and more reliable, too). People were worried about code size in those times (small memories), but the same goes for speed or energy consumption. I will not go into the reasons.

Weird, dynamic features of a language do not matter in themselves. What matters is how they are used, and whether they are used. Performance, in whatever unit (code size, speed, energy, ...), often depends on very small parts of programs. Hence there is a good chance that facilities that give expressive power will not really get in the way. With good programming practice, advanced facilities are used only in a disciplined way, to imagine new structures (that was the Lisp lesson).

The fact that a language does not have static typing has never meant that programs written in that language are not statically typed. On the other hand it might be that the type system a program uses is not yet sufficiently formalized for a type checker to exist now.

There have been, in the discussion, several references to worst-case analysis (“halting problem”, Perl parsing). But worst-case analysis is mostly irrelevant. What matters is what happens in most cases or in useful cases ... however defined, understood, or experienced. Here comes another story, directly related to program optimisation. It took place a long time ago in a major university in Texas, between a PhD student and his advisor (who was later elected to one of the national academies). As I recall, the student insisted on studying an analysis/optimisation problem the advisor had shown to be intractable. Soon they were no longer on speaking terms. But the student was right: the problem was tractable enough in most practical cases that the dissertation he produced became a reference work.

And to comment further on the statement that Perl parsing is not computable, whatever is meant by that sentence: there is a similar problem with ML, which is a remarkably well-formalized language. Type-checking complexity in ML is doubly exponential in the length of the program. That is a very precise and formal worst-case complexity result ... which does not matter at all. AFAIK, ML users are still waiting for a practical program that will blow up the type checker.

In many cases, as before, human time and competence are scarcer than computing power.

The real problem of the future will be to evolve our languages to integrate new knowledge, new programming forms, without having to rewrite all the legacy software that is still used.

If you look at mathematics, it is a very large body of knowledge. The languages used to express it, its notations and concepts, have evolved over the centuries. It is easy to write old theorems with the new concepts. We adapt the main proofs, but do not bother for lots of results.

But in the case of programming, we might have to rewrite all the proofs from scratch (programs are proofs). It may be that what we really need is very high level and evolvable programming languages. Optimizer designers will be happy to follow.

babou
2

People who apparently think this is theoretically possible, or will be in some far future, are completely wrong in my opinion. The point lies in the fact that dynamic languages provide and impose a totally different programming style. Actually, the difference is twofold, even if both aspects are interrelated:

  • Symbols (vars, or rather id<->datum bindings of all kinds) are untyped.
  • Structures (the data, everything that lives at runtime) are likewise untyped with respect to the types of their elements.

The second point provides genericity for free. Note that structures here are composite elements and collections, but also types themselves, and even (!) routines of all kinds (functions, actions, operations)... We could type structures by their element types, but due to the first point the check would happen at runtime anyway. We could have typed symbols and still leave structures untyped with respect to their element types (an array a would just be typed as an array, not as an array of ints), but even this little is not true in a dynamic language (a could just as well contain a string).

The best performance we can achieve in dynamic programming is, in my opinion, equivalent to the following: implement in C the model of a dynamic language, let's call it $L$, i.e. a kind of fictional machine, or a complete runtime library. Then program (in C directly) using only the model, with no plain C features. This means having:

  • a totally polymorphic (C) Element type, which includes $L$ type annotations (or a ref to the actual representation of $L$'s type in the model)
  • all symbols are of that type Element, they can hold elements of any $L$ type
  • all structures (again, including model routines) receive only Elements

It is clear to me that this alone is a huge performance penalty, and I haven't even touched on all the consequences (the myriad runtime checks of all kinds necessary to ensure program sensibility) well described in other posts.
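For brevity, here is the shape of that model sketched in Python rather than C. All names are illustrative; a real implementation, as the answer imagines, would make Element a tagged C struct or union.

```python
# A sketch of the boxed-Element model described above, written in Python
# for brevity; the answer imagines the same shape implemented in C, where
# Element would be a tagged struct/union. All names here are illustrative.

class Element:
    __slots__ = ("l_type", "value")
    def __init__(self, l_type, value):
        self.l_type = l_type    # the L-level type annotation
        self.value = value      # the payload, in whatever representation

def l_add(a, b):
    # Every operation takes and returns Elements: check the L types at
    # runtime, unbox, compute, re-box. Paying this check/unbox/rebox
    # cycle on every single operation is the penalty being described.
    if a.l_type != "int" or b.l_type != "int":
        raise TypeError("L-level type error")
    return Element("int", a.value + b.value)

x = l_add(Element("int", 3), Element("int", 4))
print(x.l_type, x.value)   # int 7
```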

spir
1

A couple of notes:

  • Not all high level languages are dynamic. Haskell is very high level, but is fully statically typed. Even systems programming languages like Rust, Nim, and D can express high-level abstractions succinctly and efficiently. In fact, they can be as concise as dynamic languages.

  • Highly optimizing ahead-of-time compilers for dynamic languages exist. Good Lisp implementations reach about half the speed of equivalent C.

  • JIT compilation can be a big win here. CloudFlare's Web Application Firewall generates Lua code that is executed by LuaJIT. LuaJIT heavily optimizes the execution paths actually taken (typically, the non-attack paths), with the result that the code runs much faster than code produced by a static compiler on the actual workload. Unlike a static compiler with profile-guided optimization, LuaJIT adapts to changes in execution paths at runtime.

  • Deoptimization is also crucial. Instead of JIT-compiled code needing to check whether a class has been monkeypatched, the act of monkeypatching triggers a hook in the runtime system that discards the machine code that depended on the old definition, as sketched below.
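Here is a toy sketch of that mechanism (illustrative only, not the API of any real VM):

```python
# A toy sketch of deoptimisation (illustrative only, not any real VM's
# API): instead of every call re-checking for monkeypatching, the act of
# patching a class invalidates the "compiled" code that relied on it.

compiled_code = {}   # class -> specialised implementation, if still valid

def get_compiled(cls, compile_fn):
    # Return (or lazily "compile") the specialised code for cls.
    if cls not in compiled_code:
        compiled_code[cls] = compile_fn(cls)
    return compiled_code[cls]

def monkeypatch(cls, name, fn):
    # The runtime hook: patching discards dependent machine code, so the
    # fast path never needs an explicit "was I monkeypatched?" check.
    setattr(cls, name, fn)
    compiled_code.pop(cls, None)   # deoptimise: fall back and recompile

class Greeter:
    def greet(self):
        return "hello"

fast = get_compiled(Greeter, lambda cls: cls.greet)   # "compile" once
print(fast(Greeter()))                                # hello

monkeypatch(Greeter, "greet", lambda self: "hi")      # triggers deopt
fast = get_compiled(Greeter, lambda cls: cls.greet)   # recompiled
print(fast(Greeter()))                                # hi
```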

Demi