Bootstrap/Boot
In general bootstrapping (from the idiom "pull yourself up
by your bootstraps"), sometimes shortened to just booting,
refers to a clever process of automatically self-establishing a
relatively complex system starting from
something very small, without much external help. Nature itself provides
a beautiful example: a large plant capable of complex behavior (such as
reproduction) initially grows ("bootstraps") from just a very tiny seed.
As another example imagine something like a "civilization bootstrapping
kit" that contains only a few primitive tools along with instructions on
how to use those tools to mine ore, turn it into metal out of which one
makes more tools that are then used to obtain more material and so on,
up until having basically all modern technology and factories set up in
a relatively short time (civboot is a project like this). The term
bootstrapping is however especially relevant in relation to computer technology -- here it possesses two main
meanings:
- The process by which a computer starts and sets up the operating system after power on, which
often involves several stages of loading various modules, running
several bootloaders etc. This is traditionally called
booting (rebooting means restarting the
computer).
- Utilizing the principle of bootstrapping for making greatly
independent software, i.e. software that
doesn't depend on other software as it can set
itself up. This is usually what bootstrapping (the
longer term) means. This is also greatly related to self hosting, another principle
whose idea is to "implement technology using itself".
Bootstrapping: Making Dependency-Free Software
Bootstrapping -- as the general concept of letting a big thing grow
out of a small seed -- may aid us in building extremely free (as in freedom), portable, self-contained (and yes, for those
who care also more secure) technology by
reducing all its dependencies to a bare
minimum. If we are building a big computing environment (such as an
operating system), we should make sure that all the big things it
contains are made only with smaller things that are in turn built
using yet smaller things and so on down to some very tiny piece of code,
i.e. we shall make sure there is always a way to set this whole system
up from the ground up, from a very small amount of initial code/tools.
Being able to do this means our system is bootstrappable and it
will allow us for example to set our whole system up on a completely new
computing platform (e.g. a new CPU architecture) as long as we can set
up that tiny initial prerequisite code. This furthermore removes the
danger of dependencies that might kill our system and also allows
security freaks to inspect the whole process of setting the system up so
that they can trust it (because even free software that sometime in the
past touched a proprietary compiler can't generally be trusted -- see trusting trust). I.e. bootstrapping means
creating a very small amount of code that will self-establish our whole
computing environment by first compiling small compilers that will then
compile more complex compilers which will compile all the tools and
programs etc. This topic is discussed for example in connection with designing programming language compilers and operating
systems. For examples of bootstrapping see e.g. DuskOS (a collapse-ready
operating system that bootstraps itself from a tiny amount of code), T3X, onramp, GNU mes (bootstrapping system of
the GNU operating system) or comun (LRS
programming language, now self hosted and bootstrappable e.g. from a few
hundred lines of C).
Why concern ourselves with bootstrapping when we already have
our systems set up? Besides the obvious elegance of this whole
approach there are many other practical reasons -- as mentioned, some
are concerned about "security", some want portability, control and
independence -- another notable justification is that we may lose
our current technology due to societal collapse, which is not improbable as it keeps
happening throughout history over and over, so many people fear
(rightfully so) that if by whatever disaster we lose our current
computers, Internet etc., we will also lose with it all modern art,
data, software we so painfully developed, digitized books and so on; not
to mention the horrors that will follow if we're unable to quickly
reestablish our computer networks we are so dependent on. Setting up
what we currently have completely from scratch would be extremely
difficult, a task for centuries -- just take a while to consider all the
activity and knowledge that's required around the globe to create a
single computer with all its billions of lines of code worth of software
that makes it work. Knowledge of old technology gets lost -- to make
modern computers we first needed older, primitive computers, but now
that we only have modern computers no one remembers anymore how to make
the older computers -- modern computers are sustaining themselves but
once they're gone, we won't know how to make them again, i.e. if we lose
computers, we will also lose tools for making computers. This applies on
many levels (hardware, operating systems, programming languages and so
on).
Bootstrapping has to start with some initial prerequisite machine
dependent binary code that kickstarts the self-establishing process,
i.e. it's not possible to get rid of absolutely ALL binary code and have
a pure bootstrappable code that would run on every computer -- that
would require making a program that can natively run on any computer,
which can't be done -- but it is possible to get it to absolute minimum
-- let's say a few dozen bytes of machine code that can even be
hand-made on paper and can be easily inspected for "safety". This
initial binary code is called the bootstrapping binary seed. This
code can be as simple as a mere translator of some extremely simple
bytecode (that may consist of only a handful of instructions) to the
platform's assembly language. There even exists the extreme case of a
single instruction computer, but in practice it's not necessary to go
that far. The initial binary seed may then typically be used to translate a
precompiled bytecode of our system's compiler to native runnable code
and voila, we can now happily start compiling whatever we want.
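For illustration, here is a sketch in C of roughly how little logic such a bytecode needs (the opcodes, their encoding and the stack model here are completely made up for this example, they don't come from any real project); a real binary seed would implement the same kind of loop directly in machine code, and might translate the bytecode instead of interpreting it:

    /* hypothetical bytecode interpreter: a stack machine with a handful of
       made up opcodes -- roughly the amount of logic a binary seed has to
       implement in raw machine code */

    #include <stdio.h>

    enum { OP_HALT, OP_PUSH, OP_ADD, OP_SUB, OP_JNZ, OP_OUT };

    void run(const unsigned char *code)
    {
      int stack[256], sp = 0, pc = 0;

      for (;;)
        switch (code[pc++])
        {
          case OP_HALT: return;
          case OP_PUSH: stack[sp++] = code[pc++]; break;    /* push immediate byte */
          case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
          case OP_SUB:  sp--; stack[sp - 1] -= stack[sp]; break;
          case OP_JNZ:  if (stack[--sp]) pc = code[pc]; else pc++; break; /* jump to absolute address */
          case OP_OUT:  putchar(stack[--sp]); break;        /* pop and print character */
        }
    }

    int main(void)
    {
      const unsigned char program[] = /* prints "hi" */
        { OP_PUSH, 'h', OP_OUT, OP_PUSH, 'i', OP_OUT, OP_HALT };

      run(program);
      return 0;
    }
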
Forth is a language that has traditionally
been used for making bootstrapping environments -- its paradigm and
philosophy are ideal for bootstrapping as it's based on the concept of
building a computing environment practically from nothing just by
defining more and more new words using previously defined simpler words,
fitting the definition of bootstrapping perfectly. Dusk OS is a project demonstrating this. Similarly
simple languages such as Lisp or comun can work too (GNU Mes uses a combination of Scheme and C).
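To get a feel for this word-building principle without knowing any Forth, here is a tiny sketch of the same idea in C (the stack, the w_ prefixed functions and the example words are all invented just for this demonstration): more complex "words" are defined purely in terms of previously defined simpler ones.

    #include <stdio.h>

    int stack[64], sp = 0; /* the shared data stack all words operate on */

    void push(int x) { stack[sp++] = x; }
    int  pop(void)   { return stack[--sp]; }

    /* primitive words */
    void w_dup(void) { int a = pop(); push(a); push(a); }
    void w_add(void) { push(pop() + pop()); }
    void w_mul(void) { push(pop() * pop()); }

    /* new words built only out of already existing words */
    void w_square(void)      { w_dup(); w_mul(); }
    void w_squarePlus1(void) { w_square(); push(1); w_add(); }

    int main(void)
    {
      push(5);
      w_squarePlus1();
      printf("%d\n", pop()); /* prints 26 */
      return 0;
    }
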
How to do this then? To make a computing environment
that can bootstrap itself this approach is often used:
- Make a simple programming language L. You
can choose e.g. the mentioned Forth but you can
even make your own, just remember to keep it extremely simple --
simplicity of the base language is the key feature here. If you also
need a more complex language, write it in L. The language L will serve
as tool for writing software for your platform, i.e. it will provide
some comfort in programming (so that you don't have to write in
assembly) but mainly it will be an abstraction layer for the programs,
it will allow them to run on any hardware/platform. The language
therefore has to be portable; it should probably
abstract things like endianness, native
integer size, control structures etc., so as to work nicely on all CPUs, but it also mustn't have too much abstraction
(such as OOP) otherwise it will quickly get
complicated. The language can compile e.g. to some kind of very simple
bytecode that will be easy to translate to any
assembly. Make the bytecode very simple (and
document it well) as its complexity will later on determine the
complexity of the bootstrap binary seed. At first you'll have to
temporarily implement L in some already existing language, e.g. C. NOTE: in theory you could just make bytecode, without
making L, and just write your software in that bytecode, but the
bytecode has to focus on being simple to translate, i.e. it will
probably have only a few opcodes for example, which will be in conflict with
making it at least somewhat comfortable to program on your platform.
However one can try to make some compromise and it will save the
complexity of translating a language to bytecode, so it can be considered
(uxn seems to be doing this).
- Write L in itself, i.e. self
host it. This means you'll use L to write a compiler of L that outputs L's bytecode. Once you
do this, you have a completely independent language and can start using
this new compiler instead of the original compiler of L written in another language.
Now compile L with itself -- you'll get the bytecode of the L compiler. At
this point you can bootstrap L on any platform as long as you can
execute the L bytecode on it -- this is why it was crucial to make L and
its bytecode very simple. In theory it's enough to just interpret the
bytecode but it's better to translate it to the platform's native
machine code so that you get maximum efficiency (the nature of bytecode
should make it so that it isn't really more difficult to translate it
than to interpret it). If for example you want to bootstrap on an x86 CPU, you'll have to write a program (L compiler backend) that translates the bytecode to x86
assembly; if we suppose that at the time of bootstrapping you will only
have this x86 computer, you will have to write the translator in x86
assembly manually. If your bytecode really is simple and well made, it
shouldn't be hard though (you will mostly be replacing your bytecode
opcodes with the given platform's machine code opcodes; see also the
sketch below this list). Once you have the
x86 backend, you can completely bootstrap L's compiler on any x86
computer.
- Further help make L bootstrappable. This means
making it even easier to execute the L bytecode on any given platform --
you may for example write backends (the bytecode translators) for common
platforms like x86, ARM, RISC-V, C, Lisp and so on. You can also provide
tests that will help check newly written backends for correctness. At
this point you have L bootstrappable without any work on the platforms for which you provide backends
and on other platforms it will take just a tiny bit of work to write their own
translator.
- Write everything else in L. This means writing the
platform itself and software such as various tools and libraries. You
can potentially even use L to write a higher level language (e.g. C) for
yet more comfort in programming. Since everything here is written in L
and L can be bootstrapped, everything here can be bootstrapped as
well.
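To make the backend idea from the second and third step more concrete, here is a sketch of such a translator in C, reusing the made up toy bytecode from the interpreter sketch further above; the emitted text is simplified x86-style assembly meant only as illustration (the out_char routine and the surrounding program skeleton are assumed to be provided by a tiny hand-written runtime), a real backend would of course have to target its platform properly:

    #include <stdio.h>

    /* the same made up opcodes as in the interpreter sketch above */
    enum { OP_HALT, OP_PUSH, OP_ADD, OP_SUB, OP_JNZ, OP_OUT };

    void translate(const unsigned char *code, int length)
    {
      int pc = 0;

      while (pc < length)
      {
        printf("L%d:\n", pc); /* label each bytecode address so jumps have a target */

        switch (code[pc++])
        {
          case OP_HALT: puts("  ret"); break;
          case OP_PUSH: printf("  push %d\n", code[pc++]); break;
          case OP_ADD:  puts("  pop rbx\n  pop rax\n  add rax, rbx\n  push rax"); break;
          case OP_SUB:  puts("  pop rbx\n  pop rax\n  sub rax, rbx\n  push rax"); break;
          case OP_JNZ:  puts("  pop rax\n  test rax, rax");
                        printf("  jnz L%d\n", code[pc++]); break;
          case OP_OUT:  puts("  call out_char"); break; /* assumed runtime helper */
        }
      }
    }

    int main(void)
    {
      const unsigned char program[] = /* prints "hi", as in the interpreter sketch */
        { OP_PUSH, 'h', OP_OUT, OP_PUSH, 'i', OP_OUT, OP_HALT };

      translate(program, sizeof(program));
      return 0;
    }

Note that translating this way is about as easy as interpreting, which is exactly the property the bytecode should have.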
However, a possibly even better way may be the Forth-style incremental programming
way, which works like this (see also Macrofucker and portability for explanation of some of the
concepts):
- Start with a trivially simple language. It must be
one that's easy to implement from scratch on any computer without any
extra tools -- something maybe just a little bit more sophisticated than
Brainfuck. This language may even be a
machine specific assembly, let's say x86, that's using just a small subset of the simplest
instructions, as long as it's easy to replace these instructions with
other instructions on another hardware architecture. There should
basically be only as many commands as are needed to ensure Turing completeness and good performance
(i.e. while an increment instruction may be enough for Turing
completeness, we should probably also include an instruction performing
general addition, because adding two numbers in a loop using just the
increment instruction would be painfully slow). The goal here is of
course to build the foundations for the rest of our platform -- one
that's simple enough to be easily replaced.
- Build a more complex language on top of it. I.e.
now use this simple language ALONE to build a more complex, practically
usable language. Again, take inspiration from Forth -- you may for example
introduce something like procedures, macros or
words to your simple language, which will allow you to keep adding new
useful things such as arrays or more complex control structures. To add
the system of macros for example just write a preprocessor in the base language that will
take the new, macro-enabled language source code and convert it to the
plain base language (a tiny sketch of such a preprocessor is shown below
this list); with macros at your disposal you can now start
expanding the language more and more just by writing new macros. I.e.
expanding the base language should be done in small steps, incrementally
-- that is don't build C out of Brainfuck right away; instead first
build just a tiny bit more complex language on top of the initial
language, then a bit more complex one on top of that etc. -- in Forth
this happens by defining new words and expanding the language's
dictionary.
- Now build everything else with the complex
language. This is already straightforward (though time
consuming). First you may even build more language extensions and
development tools like a debugger or text
editor for example. The beauty of this approach is really that in order to
allow yourself to program on the system, you build the system
itself on the go, i.e. you are creating a development environment and
also a user environment for yourself, AND everything you make is
bootstrappable from the original simple language. This is a very
elegant, natural way -- you are setting up a complex system, building a
road which is subsequently easy to walk again from the start, i.e.
bootstrap. This is probably how it should ideally be done.
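As promised above, here is a minimal sketch in C of the kind of preprocessor mentioned in the second step; the word-based processing and the def ... end syntax are invented purely for this example and the sketch deliberately ignores things like nested macros or buffer overflows:

    /* trivial word-based macro expander: "def NAME ... end" defines a macro,
       any later occurrence of NAME is replaced by its body */

    #include <stdio.h>
    #include <string.h>

    #define MAX_MACROS 64
    #define MAX_NAME   32
    #define MAX_BODY   256

    char macroName[MAX_MACROS][MAX_NAME];
    char macroBody[MAX_MACROS][MAX_BODY];
    int macroCount = 0;

    const char *findMacro(const char *word)
    {
      for (int i = 0; i < macroCount; ++i)
        if (strcmp(word, macroName[i]) == 0)
          return macroBody[i];

      return NULL;
    }

    int main(void)
    {
      char word[MAX_NAME];

      while (scanf("%31s", word) == 1)
      {
        if (strcmp(word, "def") == 0) /* macro definition: def NAME ... end */
        {
          scanf("%31s", macroName[macroCount]);
          macroBody[macroCount][0] = 0;

          while (scanf("%31s", word) == 1 && strcmp(word, "end") != 0)
          {
            strcat(macroBody[macroCount], word);
            strcat(macroBody[macroCount], " ");
          }

          macroCount++;
        }
        else
        {
          const char *body = findMacro(word);

          if (body)                   /* expand known macro */
            printf("%s", body);
          else                        /* pass other words through unchanged */
            printf("%s ", word);
        }
      }

      putchar('\n');
      return 0;
    }

Feeding it e.g. "def twice dup add end 3 twice" outputs "3 dup add", i.e. the new macro got expanded into plain base language words.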
Booting: Computer Starting Up
Booting as in "starting a computer up" is also a kind of setting up a
system from the ground up -- we take it for granted but remember it
takes some work to get a computer from being
powered off and having all RAM empty to having an operating system
loaded, hardware checked and initialized, devices mounted etc.
Starting up a simple computer -- such as some MCU-based embedded open console that runs bare metal programs -- isn't as complicated as
booting up a mainstream PC with an operating system.
First let's take a look at the simple computer. It may work e.g. like
this: upon start the CPU initializes its registers
and simply starts executing instructions from some given memory address,
let's suppose 0 (you will find this in your CPU's data sheet). Here the
memory is often e.g. flash ROM to which we can externally upload a program from
another computer before we turn the CPU on -- in game consoles this can
often be done through USB. So we basically upload
the program (e.g. a game) we want to run, turn the console on and it
starts running it. However further steps are often added, for example
there may really be some small, permanently flashed initial boot program
at the initial execution address that will handle some things like
initializing hardware (screen, speaker, ...), setting up interrupts and so on (which otherwise would have
to always be done by the main program itself) and it can also offer some
functionality, for example a simple menu through which the user can
select to actually load a program from SD card to flash memory (thanks
to which we won't need an external computer to reload programs). In this
case we won't be uploading our main program to the initial execution
address but rather somewhere else -- the initial bootloader will jump to
this address once it's done its work.
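Greatly simplified, the small initial boot program on such a console might look something like the following C sketch (all the register addresses, their meaning and the main program's address are made up here just for illustration -- a real bootloader depends entirely on the concrete hardware and its data sheet):

    /* hypothetical initial boot program; the linker script is assumed to
       place bootMain at the CPU's initial execution address */

    #define SCREEN_CTRL   (*(volatile unsigned int *) 0x40000000) /* made up address */
    #define SOUND_CTRL    (*(volatile unsigned int *) 0x40000100) /* made up address */
    #define PROGRAM_START ((void (*)(void)) 0x00010000)           /* made up address */

    void bootMain(void)
    {
      SCREEN_CTRL = 1; /* initialize the display */
      SOUND_CTRL  = 1; /* initialize the speaker */

      /* a real bootloader might also set up interrupts, show a menu,
         copy a program from SD card to flash etc. */

      PROGRAM_START(); /* jump to the main program (e.g. a game) */
    }
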
Now for the PC (the "IBM compatibles"): here things are more
complicated due to the complexity of the whole platform, i.e. because we
have to load an operating system
first, of which there can be several, each of which may be loadable from
different storage devices (harddisk, USB stick, network, ...), also we have a more complex CPU that has to be set into a certain operation mode, we
have complex peripherals that need complex initializations etcetc.
Generally there's a huge bloated boot sequence
and PCs infamously take longer and longer to start up despite
skyrocketing hardware improvements -- that says something about the state of
technology. Anyway, it usually works like this:
{ I'm not terribly experienced with this, verify everything.
~drummyfish }
- Computer is turned on, the CPU starts executing at some initial
address (same as with the simple computer).
- From here the CPU jumps to an address at which the stage one bootloader is located (a bootloader is
just a program that does the booting and as this is the first one in a
line of potentially multiple bootloaders, it's called stage
one). This address is in the motherboard ROM and in
there typically BIOS (or
something similar that may be called e.g. UEFI,
depending on what standard it adheres to) is stored, i.e. BIOS is the
stage one bootloader. BIOS is the first software (we may also call it firmware) that gets run; it's put on the
motherboard by the manufacturer and isn't supposed to be rewritten by
the user, though some based people still rewrite it (ignoring the "read
only" label :D), often to replace it with something more free (e.g. libreboot). BIOS is the most basic software that
serves to make us able to use the computer at the most basic level
without having to flash programs externally, i.e. to let us use keyboard
and monitor, let us install an operating system from a CD drive etc. (It
also offers a basic environment for programs that want to run before the
operating system, but that's not important now.) BIOS is generally
different on each computer model, it normally allows us to set up what
(which device) the computer will try to load next -- for example we may
choose to boot from harddisk or USB flash drive or from a CD. There is
often some countdown during which if we don't intervene, the BIOS
automatically tries to load what's in its current settings. Let's
suppose it is set to boot from harddisk.
- BIOS performs the power on self test (POST) -- basically it
makes sure everything is OK, that hardware works etc. If so, it
continues on (otherwise it halts).
- BIOS loads the master boot
record (MBR, the first sector of the device) from harddisk
(or from another mass storage device, depending on its settings) into RAM and executes it, i.e. it passes control to it.
This will typically lead to loading the second stage
bootloader.
- The code loaded from the MBR is limited in size as it has to fit in one
HDD sector (which used to be only 512 bytes for
a long time; see also the sketch below this list), so this code is here usually just to load the bigger code
of the second stage bootloader from somewhere else and then again pass
control to it.
- Now the second stage bootloader starts -- this is a bootloader whose
job is normally to finally load the actual operating system. Unlike
BIOS this bootloader may quite easily be reinstalled by the user --
oftentimes installing an operating system will also install some
kind of second stage bootloader -- an example may be GRUB, which is typically installed with GNU/Linux systems. This kind of
bootloader may offer the user a choice of multiple operating systems,
and possibly have other settings. In any case here the OS kernel code is loaded and run.
- Voila, the kernel now starts running and here it's free to do its
own initializations and manage everything, i.e. Linux will start the PID 1 process, it will mount filesystems, run initial
scripts etcetc.
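One small concrete detail of the above: the BIOS traditionally recognizes a bootable MBR by the two byte signature 0x55 0xAA stored in the last two bytes (offsets 510 and 511) of the 512 byte sector. The following little C sketch checks this on a disk image (the file name is of course just an example):

    /* checks whether the first sector of a disk image carries the classic
       MBR boot signature (0x55 0xAA in its last two bytes) */

    #include <stdio.h>

    int main(void)
    {
      unsigned char sector[512];
      FILE *f = fopen("disk.img", "rb"); /* example file name */

      if (!f || fread(sector, 1, 512, f) != 512)
      {
        puts("couldn't read the first sector");
        return 1;
      }

      fclose(f);

      if (sector[510] == 0x55 && sector[511] == 0xaa)
        puts("boot signature found, BIOS would consider this bootable");
      else
        puts("no boot signature");

      return 0;
    }
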