LuaJIT
LuaJIT is a just-in-time compiler for the Lua programming language. It is essentially a hard fork of Lua 5.1, although it does feature several backports from Lua 5.2.
Original author(s) | Mike Pall |
---|---|
Stable release | 2.0.5 / May 1, 2017 |
Repository | github |
Written in | C, Lua |
Operating system | Unix-like, macOS, Windows, iOS, Android, PlayStation |
Platform | x86, x86-64, PowerPC, ARM, MIPS[1] |
Type | Just-in-time compiler |
License | MIT License[2] |
Website | luajit |
History
The LuaJIT project was started in 2005 by developer Mike Pall and is released under the MIT open-source license.[3]
The second major release of the compiler, 2.0.0, brought major performance improvements.[4]
The latest stable release, 2.0.5, was released in 2017. Since then, the project has been maintained only by its contributors.[5]
Notable users
- CERN, for their Methodical Accelerator Design 'next-generation' software for describing and simulating particle accelerators[6]
- OpenResty, a fork of nginx with Lua scripting[7]
- Kong, a web API gateway[8]
- Cloudflare, who use LuaJIT in their web application firewall service[9]
Performance
LuaJIT is often the fastest Lua runtime,[10] and it is typically among the fastest implementations of any dynamic programming language.[11]
Code written for LuaJIT which uses features designed for the just-in-time compiler can slow down significantly when it runs in the interpreter. For example, while foreign-function interface (FFI) structs can be faster than LuaJIT's hash tables in compiled code, reading or writing these structs in non-hot code (e.g., code which is likely to be interpreted) is significantly slower. For this reason, while JIT-specific features can bring performance benefits, the theoretically slower hash tables may be the better choice in code that is unlikely to be compiled.
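As a minimal sketch of the trade-off described above, the following runs the same loop once over an FFI struct and once over a plain table. It assumes LuaJIT (the `ffi` module is not part of stock Lua). The struct name `point_t` and both function names are hypothetical and chosen only for illustration. No timing is shown, since relative speed depends on whether the loop is compiled or interpreted.

```lua
-- Assumes LuaJIT: require("ffi") fails on stock Lua.
local ffi = require("ffi")

-- Hypothetical struct, declared only for this illustration.
ffi.cdef[[
typedef struct { double x, y; } point_t;
]]

-- Version using an FFI struct: fast once the loop is JIT-compiled,
-- but each field access is comparatively slow in the interpreter.
local function sum_structs(n)
  local p = ffi.new("point_t")
  local total = 0
  for i = 1, n do
    p.x, p.y = i, i * 2
    total = total + p.x + p.y
  end
  return total
end

-- Version using an ordinary Lua table (hash part) for the same data.
local function sum_tables(n)
  local p = { x = 0, y = 0 }
  local total = 0
  for i = 1, n do
    p.x, p.y = i, i * 2
    total = total + p.x + p.y
  end
  return total
end

local struct_total = sum_structs(1000)
local table_total = sum_tables(1000)
print(struct_total, table_total)
```

Both functions compute the same result; only the storage for `p` differs, which makes the pair a convenient starting point for measuring the two approaches with the JIT enabled and disabled.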
Due to LuaJIT's tracing compiler scheme, code generated by LuaJIT is often recompiled and refined as the program's workload changes. As code gets 'hotter' and becomes more of a bottleneck for the program, LuaJIT will continuously attempt to refine the traced code to perform optimally for the workload. This behavior is used by Cloudflare, which relies on these linear 're-traces' to increase the resilience of its web application firewall to denial-of-service attacks.[12]
Major optimizations performed
- Allocation Sinking, which was introduced in LuaJIT 2.0, is a code sinking optimization which removes many unused or 'temporary' allocations from compiled code, by moving allocations to positions where they might escape to the Lua heap. Developers of LuaJIT programs can allocate many temporary objects, but these will remain on the stack or in registers until it is determined whether or not the temporary object may become long-lived.
- Stitching, which was added in LuaJIT 2.1.0-beta2, enables compiled Lua code to quickly de-optimize into the interpreter in order to call a Lua C function. Previously, when the compiler encountered a call to a non-FFI C function, it would abort compilation. With stitching, compilation can continue, with the interpreter's logic used to call the C function.
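The kind of code that allocation sinking targets can be sketched as follows. The names `add` and `walk` are hypothetical. Each loop iteration creates a short-lived table that is replaced on the next iteration. Whether a particular allocation is actually sunk is decided by the compiler and is not guaranteed by this example.

```lua
-- Returns a fresh table on every call: a typical "temporary" allocation.
local function add(a, b)
  return { x = a.x + b.x, y = a.y + b.y }
end

local function walk(n)
  local pos = { x = 0, y = 0 }
  local step = { x = 1, y = 2 }
  for _ = 1, n do
    -- The table created here is only live until the next iteration,
    -- so it is a candidate for being kept in registers instead of the heap.
    pos = add(pos, step)
  end
  -- Only the final value escapes the loop.
  return pos.x, pos.y
end

local x, y = walk(1000)
print(x, y)
```

The code is ordinary Lua and also runs on other Lua implementations; the optimization affects only how LuaJIT compiles the loop, not the result.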
Internal representation
LuaJIT uses two types of internal representation. A register-based bytecode is used for the interpreter, and a static single-assignment (SSA) form is used for the just-in-time compiler. Dumps of the bytecode and IR of traces are available using the -jdump command-line option.
LuaJIT's interpreter bytecode is portable across architectures and minor version bumps, and can be used for compression. Interpreter bytecode is not secure, however, and bytecode loading should only be enabled if the source is trusted. LuaJIT's SSA form is ephemeral and only used while recording and compiling a trace.
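A minimal sketch of dumping and reloading bytecode, using only the standard `string.dump` and `loadstring` functions. The function `greet` is hypothetical. The fallback to `load` is an assumption added so the snippet also runs on newer stock Lua versions, where `loadstring` was removed.

```lua
-- Hypothetical function to serialize; it has no upvalues.
local function greet(name)
  return "hello, " .. name
end

-- Serialize the function to a bytecode string.
local bytecode = string.dump(greet)

-- loadstring exists in Lua 5.1 and LuaJIT; newer Lua versions use load instead.
local load_chunk = loadstring or load

-- Only load bytecode from a trusted source: it is not verified for safety.
local restored = assert(load_chunk(bytecode))

local message = restored("world")
print(message)
```

The reloaded function behaves like the original, which is what allows bytecode to be shipped in place of source when the producer is trusted.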
LuaJIT has many "Not-yet implemented" facilities which cannot be JIT compiled. Whenever one of these is encountered, the trace will be aborted and nothing will be compiled.
```lua
for i = 1,100 do
  io.write("hello, world!")
  io.write("The current number is"..((i % 2 == 0 and "even") or "odd"))
end
```
As LuaJIT is a tracing just-in-time compiler, its compilation is trace-based. The only branching control flow a trace can contain is a conditional jump out of the trace (called a "guard"), which resumes execution in the interpreter or in an appropriate side trace. Traces can have a loop, but this is not required; LuaJIT may choose to begin a trace at the beginning of a function, especially if it is avoiding a compiler abort in a function higher in the calling stack.
LuaJIT does not support ahead-of-time compilation of traces.
```
---- TRACE 1 IR
-- Trace setup & constants redacted for brevity
0015 udt FLOAD nil #204
0016 p64 FLOAD 0015 udata.file -- Load STDOUT
0017 > p64 NE 0016 NULL -- Sanity check: Ensure STDOUT pointer is not null
--> loop-invariant code duplication
--> This is the first iteration of the loop, performed above the loop.
--> All loop-invariants are patched into the loop from here by the LuaJIT loop optimization.
0018 nil +17038428 +1 -- Call arguments
0019 nil 0018 +13
0020 nil 0019 0016
0021 int CALLS fwrite ([0x7f4b78fec858] +1 +13 0016)
0022 int BAND 0001 +1
0023 > int NE 0022 +0
0026 nil +16978892 +1 -- Call arguments
0027 nil 0026 +24
0028 nil 0027 0016
0029 int CALLS fwrite ([0x7f4b7900b3e8] +1 +24 0016)
0030 + int ADD 0001 +1
0031 > int LE 0030 +100
0032 ------ LOOP ------------
0033 int CALLS fwrite ([0x7f4b78fec858] +1 +13 0016)
-- ^^^ Write "hello, world!" (13 bytes) to STDOUT (0016)
0034 int BAND 0030 +1 -- Perform a modulo by bitwise-ANDing i to find i % 2
0035 > int NE 0034 +0 -- Guard: if the number is even, this check fails and execution leaves the trace.
0036 int CALLS fwrite ([0x7f4b7900b3e8] +1 +24 0016)
-- ^^^ Write "The current number isodd" (24 bytes) to STDOUT (0016)
0037 + int ADD 0030 +1
0038 > int LE 0037 +100
0039 int PHI 0030 0037 -- Note that 0030 and 0037 hold the loop-carried value of i (on loop entry and after one iteration).
---- TRACE 1 mcode 350
```
In this case, LuaJIT only compiles a trace for the loop path which prints "the current number is odd", because the odd iteration is the first to reach its 57th iteration. The other case, "the current number is even", is compiled as a side trace. Instead of falling back to the interpreter at the 0035 "not equal" guard, execution then falls back onto the side trace.
```
---- TRACE 2 IR
-- Heavily redacted for brevity
0004 > p64 BUFPUT 0003 "The current number i"~
0005 > fun EQ 0002 io.write
0008 > p64 NE 0007 NULL
0012 int CALLS fwrite ([0x7f4b78fdcee0] +1 +25 0007)
```
Extensions
LuaJIT adds several extensions to its base implementation, Lua 5.1, most of which do not break compatibility.[13]
- "BitOp" for bitwise operations on 32-bit integers (these operations are also compiled by the just-in-time compiler)[14]
- "CoCo", which allows the VM to be fully resumable across all contexts[15]
- A foreign function interface[16]
- Portable bytecode (across instruction sets, not across versions)
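As a brief illustration of the BitOp extension listed above, the following sketch calls a few functions from the `bit` module. It assumes LuaJIT, where `bit` is built in; stock Lua 5.1 would need the separate BitOp library installed. The variable names are hypothetical.

```lua
-- Assumes LuaJIT, where the bit module is built in.
local bit = require("bit")

-- Mask the low four bits of 0xff.
local masked = bit.band(0xff, 0x0f)

-- Shift 1 left by four bit positions.
local shifted = bit.lshift(1, 4)

-- Combine two flags with a bitwise OR.
local flags = bit.bor(1, 2)

-- Test the lowest bit of 7: an explicit bitwise form of 7 % 2.
local low_bit = bit.band(7, 1)

-- Format a number as eight hexadecimal digits.
local hex = bit.tohex(255)

print(masked, shifted, flags, low_bit, hex)
```

The `bit.band(7, 1)` call is the explicit source-level form of the `BAND` instruction that appears in the trace IR for `i % 2`.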
DynASM
Developer(s) | Mike Pall |
---|---|
Stable release | 2.0.5 / May 1, 2017 |
Preview release | 2.1.0 beta3 GC64 |
Repository | github |
Written in | Lua, C[17] |
Platform | x86, x86-64, PowerPC, ARM, MIPS |
Type | Preprocessor, Linker |
License | MIT License[2] |
Website | luajit |
DynASM is a lightweight preprocessor for C which was created for LuaJIT 1.0.0 to make developing the just-in-time compiler easier. DynASM replaces assembly code in C files with runtime writes to a 'code buffer', such that a developer may generate and then invoke code at runtime from a C program.
DynASM was phased out in LuaJIT 2.0.0 after a complete rewrite of the assembler, but remains in use by the LuaJIT contributors as a better assembly syntax for the LuaJIT interpreter.
DynASM includes a bare-bones C header file which is used at compile time for logic the preprocessor generates. The actual preprocessor is written in Lua.
Example
```c
|.type L, lua_State, esi // L.
|.type BASE, TValue, ebx // L->base.
|.type TOP, TValue, edi // L->top.
|.type CI, CallInfo, ecx // L->ci.
|.type LCL, LClosure, eax // L->ci->func->value.
|.type UPVAL, UpVal
|.macro copyslot, D, S, R1, R2, R3
| mov R1, S.value; mov R2, S.value.na[1]; mov R3, S.tt
| mov D.value, R1; mov D.value.na[1], R2; mov D.tt, R3
|.endmacro
|.macro copyslot, D, S; copyslot D, S, ecx, edx, eax; .endmacro
|.macro getLCL, reg
||if (!J->pt->is_vararg) {
| mov LCL:reg, BASE[-1].value
||} else {
| mov CI, L->ci
| mov TOP, CI->func
| mov LCL:reg, TOP->value
||}
|.endmacro
|.macro getLCL; getLCL eax; .endmacro
[...]
static void jit_op_getupval(jit_State *J, int dest, int uvidx)
{
| getLCL
| mov UPVAL:ecx, LCL->upvals[uvidx]
| mov TOP, UPVAL:ecx->v
| copyslot BASE[dest], TOP[0]
}
```
References
- "LuaJIT". LuaJIT. Retrieved 25 February 2022.
- "LuaJIT/COPYRIGHT at v2.1 · LuaJIT/LuaJIT". GitHub. 7 January 2022.
- https://luajit.org
- Pall, Mike. "Re: [ANN] llvm-lua 1.0". lua-users.org. Retrieved 25 February 2022.
- "Download".
- Deniau, Laurent. "Lua(Jit) for computing accelerator beam physics". CERN Document Server. CERN. Retrieved 25 February 2022.
- "OpenResty® - Official Site". openresty.org.
- "Kong/kong". GitHub. Kong. 25 February 2022. Retrieved 25 February 2022.
- "Helping to make Luajit faster". blog.cloudflare.com. 19 October 2017. Retrieved 25 February 2022.
- "LuaJIT Performance".
- "Laurence Tratt: The Impact of Meta-Tracing on VM Design and Implementation". tratt.net. Retrieved 2 March 2022.
- Pall, Mike. "Re: How does LuaJIT's trace compiler work? - luajit - FreeLists". www.freelists.org. Retrieved 2 March 2022.
- "Extensions". LuaJIT. Retrieved 25 February 2022.
- "BitOp Semantics". LuaJIT. Retrieved 25 February 2022.
- "Coco - True C Coroutines". LuaJIT. Retrieved 25 February 2022.
- "FFI Library". LuaJIT. Retrieved 25 February 2022.
- "DynASM Features". DynASM. Retrieved 25 February 2022.