Self-hosting (compilers)
In computer programming, self-hosting is the use of a program as part of the toolchain or operating system that produces new versions of that same program—for example, a compiler that can compile its own source code. Self-hosting software is commonplace on personal computers and larger systems. Other programs that are typically self-hosting include kernels, assemblers, command-line interpreters and revision control software.
If a system is so new that no software has been written for it, you have a bootstrapping problem: the new system either does not have an operating system, and/or does not have any compilers capable of creating executables for that machine. This creates a chicken-and-egg problem: you can't write programs in a specific programming language to run on the system because there is no compiler for that language, and/or has no operating system to run programs.
To solve this issue, software is developed on another system, which itself may or may not be self-hosting, and might not even be using the same programming language. If the work is done using the same programming language on a different system or processor, it might be using one that is a cross-compiler (a compiler that specifically generates code for a different processor and/or architecture). In that case, you make it do a cross-compile, compile itself to be made for that target, then once it can compile itself on the new system, it is no longer a cross-compiler.
In either case, the operating system or compiler and assorted tools, are developed using some programming language. Whether the work is done on a cross-compiler or using a different programming language, where the objective is to create a self-hosting system, the initial compiler (or operating system, if it does not have one) for the new target is compiled, and its output is placed on a storage device that the new system can read. Development continues this way until the new system can reliably host its own development. At this point, either the operating system can at least run the compiler and any other programs needed to create executable programs that can run on that operating system, or the compiler can now at least compile itself. Once it can do so, further development is done using the new operating system and/or compiler.
Writing new software development tools without using another host system is rare. In some cases, if the target machine has a compiler for another programming language, until a program is created that is just enough to create a minimal subset of the language necessary to compile that subset, the other language is used. At the same time, a second compiler, using that minimal subset, is written to be able to compile itself.
History
The first self-hosting compiler (excluding assemblers) was written for Lisp by Hart and Levin at MIT in 1962. They wrote a Lisp compiler in Lisp, testing it inside an existing Lisp interpreter. Once they had improved the compiler to the point where it could compile its own source code, it was self-hosting.[1]
The compiler as it exists on the standard compiler tape is a machine language program that was obtained by having the S-expression definition of the compiler work on itself through the interpreter.
— AI Memo 39[1]
This technique is usually only practicable when an interpreter already exists for the very same language that is to be compiled; though possible, it is extremely uncommon to humanly compile a compiler with itself.[2] The concept borrows directly from and is an example of the broader notion of running a program on itself as input, used also in various proofs in theoretical computer science, such as the proof that the halting problem is undecidable.
Examples
Ken Thompson started development on Unix in 1968 by writing and compiling programs on the GE-635 and carrying them over to the PDP-7 for testing. After the initial Unix kernel, a command interpreter, an editor, an assembler, and a few utilities were completed, the Unix operating system was self-hosting - programs could be written and tested on the PDP-7 itself.[3]
Douglas McIlroy wrote TMG (a compiler-compiler) in TMG on a piece of paper and "decided to give his piece of paper to his piece of paper," doing the computation himself, thus compiling a TMG compiler into assembly, which he typed up and assembled on Ken Thompson's PDP-7.[2]
Development of the GNU system relies largely on GCC (the GNU C Compiler) and GNU Emacs (a popular editor), making possible the self contained, maintained and sustained development of free software for the GNU Project.
Many programming languages have self-hosted implementations: compilers that are both in and for the same language. In some of these cases, the initial implementation was developed using bootstrapping, i.e. using another high-level language, assembler, or even machine language.
List of languages having self-hosting compilers
The following programming languages have self-hosting compilers:
- Ada
- ALGOL (Burroughs B5000)
- BASIC[4]
- BCPL
- C
- C++ (Visual C++, clang, gcc 4.8)
- C# (Microsoft Roslyn, Mono)
- ClojureScript[5]
- CoffeeScript
- Crystal
- Curry
- D
- Dart
- Delphi
- Dylan
- Eiffel
- Elixir
- F#
- FASM[6]
- Factor
- Forth
- Gambas
- Go
- Haskell[7]
- Idris
- Java
- Kotlin
- Lisp (Common Lisp)
- LiveScript
- Mercury
- Nemerle
- Nim
- Oberon
- Object Pascal (Free Pascal)
- OCaml
- Pascal (Free Pascal)
- Pyret[8]
- Python (PyPy)
- Raku (Rakudo)
- Rust
- Scala
- Scheme
- Smalltalk
- Standard ML (MLton)
- Tcl[9]
- TMG
- TypeScript
- Vala
- Virgil[10]
- Visual Basic .NET (Microsoft Roslyn, Mono)
See also
References
- Tim Hart and Mike Levin. "AI Memo 39-The new compiler" (PDF). Archived from the original (PDF) on 2020-12-13. Retrieved 2008-05-23.
- Ken Thompson. "VCF East 2019 -- Brian Kernighan interviews Ken Thompson". YouTube. Retrieved 2019-10-28.
- Dennis M. Ritchie. "The Development of the C Language". 1993.
- BASICO compiler bootstrapping example
- ClojureScript Next
- "flat assembler". Retrieved 7 January 2022.
The flat assembler is self-hosting and the complete source code is included.
- "Haskell Communities and Activities Report".
- https://www.pyret.org Archived 2018-04-10 at the Wayback Machine
- "Implement TCL in TCL". Archived from the original on 2017-06-04. Retrieved 2017-09-19.
- "Google Code Archive - Long-term storage for Google Code Project Hosting". Archived from the original on 2014-12-28. Retrieved 2015-05-27.