#18: JIT: bytecode, interpreters and compilers
by Tomasz Nurkiewicz
Source code can be executed in two ways: language implementations generally either interpret it or compile it. To run an interpreted program you need one extra binary: an interpreter. Interpretation is simple: you read source code line by line and execute it. Compilation is much harder. A special program called a compiler reads your source code ahead of time (AOT) and translates it into machine code. After this translation your program is standalone. You don’t need a compiler to run it. Only you and your CPU.
Turns out this distinction is not that clear at all these days. Almost every language implementation performs compilation behind the scenes. And many languages that have a compiler produce code that needs an interpreter anyway. What?
OK, let’s take Python, standard CPython, as an example.
Python is clearly an interpreted language and there is no compiler involved.
Yet, you might have come across .pyc files.
These files contain Python compiled to so-called bytecode.
Bytecode is not machine code that you can run directly on your x86 CPU.
However, bytecode is sufficiently low-level to be fast and easy to interpret.
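If you want to peek at that bytecode yourself, here is a minimal sketch using the standard `compile()` built-in and the `dis` module. The file name `<example>` and the values of `a`, `b` and `c` are made up for illustration, and the exact instructions printed differ between CPython versions.

```python
import dis

# Compile the expression into a CPython code object; nothing runs yet.
code = compile("a + b * c", "<example>", "eval")

# co_code holds the raw bytecode bytes; their content depends on the CPython version.
print(type(code.co_code), len(code.co_code))

# Print a human-readable listing of the bytecode instructions.
dis.dis(code)

# The code object can then be evaluated with concrete (made-up) values.
print(eval(code, {"a": 1, "b": 2, "c": 3}))  # 7
```

The listing is roughly what ends up cached in a .pyc file, so CPython does not have to parse the source again on the next run.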
For example, let’s take the expression a + b * c.
An interpreter needs to understand that multiplication precedes addition.
Unless there are parentheses, which the interpreter must take into account as well.
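To see what "understanding precedence" means in practice, here is a small sketch that asks Python's own parser (the standard `ast` module) how it groups the expression. It only shows the parsing step that a source-level interpreter would have to do, not how any particular interpreter is built.

```python
import ast

# Parse without parentheses: addition ends up at the top of the tree,
# with the multiplication nested on its right-hand side.
top = ast.parse("a + b * c", mode="eval").body
print(type(top.op).__name__)        # Add
print(type(top.right.op).__name__)  # Mult

# Parentheses change the grouping: now multiplication is at the top.
grouped = ast.parse("(a + b) * c", mode="eval").body
print(type(grouped.op).__name__)       # Mult
print(type(grouped.left.op).__name__)  # Add
```

The nesting of the tree is exactly the information that bytecode later encodes simply by the order of its instructions.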
In bytecode, on the other hand, this expression looks like this:
LOAD a
LOAD b
LOAD c
MULTIPLY
ADD
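Why is this so easy to interpret? A minimal sketch of a stack-based interpreter for these five instructions is shown below. The instruction names come from the listing above; the `run` function, the way instructions are stored as strings, and the variable values are assumptions made for illustration, not how CPython or any real VM stores bytecode.

```python
def run(program, variables):
    """Execute a list of toy bytecode instructions on a value stack."""
    stack = []
    for instruction in program:
        op, *args = instruction.split()
        if op == "LOAD":
            # Push the value of the named variable.
            stack.append(variables[args[0]])
        elif op == "MULTIPLY":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown instruction: {instruction}")
    # The result of the expression is the only value left on the stack.
    return stack.pop()


program = ["LOAD a", "LOAD b", "LOAD c", "MULTIPLY", "ADD"]
print(run(program, {"a": 1, "b": 2, "c": 3}))  # 7
```

Notice there is no precedence logic and no parentheses handling anywhere. The order of the instructions already says that the multiplication happens first.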
Yeah, I know it looks low-level. And that’s the point! A bytecode interpreter is much faster than one that has to parse source code over and over. But it gets even better! At runtime, an interpreter may dynamically translate bytecode into… machine code. So this abstract assembly-like code is turned into real CPU instructions. At runtime, once the program has been running for sufficiently long. This is just-in-time (JIT) compilation. So an interpreter becomes a compiler. A compiler that’s often very effective because it knows the particular CPU architecture it is running on.
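The "interpret first, compile once the code gets hot" idea can be sketched in a few lines. This is a toy under loud assumptions: the `Runner` class, `HOT_THRESHOLD` and the helper names are all hypothetical, and the "compiled" form here is just a Python lambda standing in for real machine code. An actual JIT emits CPU instructions and uses much more sophisticated heuristics.

```python
HOT_THRESHOLD = 3  # hypothetical value; real JITs use tuned, much larger thresholds

PROGRAM = ["LOAD a", "LOAD b", "LOAD c", "MULTIPLY", "ADD"]
BINARY_OPS = {"ADD": "+", "MULTIPLY": "*"}


def interpret(program, variables):
    """Slow path: decode and dispatch every instruction on every call."""
    stack = []
    for instruction in program:
        op, *args = instruction.split()
        if op == "LOAD":
            stack.append(variables[args[0]])
        elif op in BINARY_OPS:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == "ADD" else left * right)
        else:
            raise ValueError(f"unknown instruction: {instruction}")
    return stack.pop()


def translate(program):
    """Stand-in for a JIT: turn the whole program into one Python function."""
    stack = []  # holds source fragments instead of runtime values
    names = []  # parameter names, in order of first use
    for instruction in program:
        op, *args = instruction.split()
        if op == "LOAD":
            stack.append(args[0])
            if args[0] not in names:
                names.append(args[0])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(f"({left} {BINARY_OPS[op]} {right})")
    # For PROGRAM this builds: lambda a, b, c: (a + (b * c))
    return eval(f"lambda {', '.join(names)}: {stack.pop()}")


class Runner:
    """Interprets a program until it is called often enough, then compiles it."""

    def __init__(self, program):
        self.program = program
        self.calls = 0
        self.compiled = None

    def __call__(self, **variables):
        if self.compiled is not None:
            return self.compiled(**variables)  # fast path
        self.calls += 1
        if self.calls >= HOT_THRESHOLD:
            self.compiled = translate(self.program)
        return interpret(self.program, variables)


run = Runner(PROGRAM)
for _ in range(5):
    result = run(a=1, b=2, c=3)
print(result, run.compiled is not None)  # 7 True
```

The first calls pay the per-instruction dispatch cost. After the threshold, the translation is paid once and every later call skips the decoding loop entirely.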
JIT rarely works on source code. But there are compilers that target bytecode directly. For example, most Java and .NET implementations. I believe that’s one of the reasons for the success of both of these platforms. There are dozens of languages targeting Java or .NET bytecode. But you only need one extremely fast interpreter (for example, the Java Virtual Machine or the .NET CLR). This interpreter contains a very mature JIT compiler. A compiler that produces fast, optimized code. An interpreter constantly watches your code at runtime and can make very smart decisions. For example, which methods to inline or which parts of the code are dead and can be discarded.
In the case of dynamically typed languages, an interpreter can also make a lot of educated guesses. For example, inferring types to save memory and avoid excessive type checks. Interestingly, one of the first uses of JIT was more than half a century ago! Ken Thompson compiled regular expressions at runtime into machine code to improve performance.
That’s all for today. Bye!
- A crash course in just-in-time (JIT) compilers
- If Python is interpreted, what are .pyc files?
- Advanced Vector Extensions
- “What’s new in PHP 8.0?” Nikita Popov (about JIT)
- JIT in Ruby