less_retarded_wiki


CPU

WORK IN PROGRESS

Central processing unit (CPU, often just processor) is the main, central part of a computer, the one that carries out computation by following instructions of the main program, colloquially likened to the computer's "brain". CPU stands at the center of computer design because other parts (such as the main memory, hard disk and input/output devices like keyboard and monitor) are present to serve the CPU, their master. CPU is normally composed of ALU (arithmetic logic unit, the circuit performing calculations), CU (control unit, the circuit that directs the CPU's operation), a relatively small amount of memory (e.g. its registers, temporary buffers and cache; the main RAM memory is NOT part of a CPU!) and possibly also other components. A specific model of CPU is characterized by its instruction set (ISA, e.g. x86 or Arm, which we mostly divide into CISC and RISC), which in turn determines the machine code it will understand, then by its transistor count (nowadays billions), operation frequency or clock rate (the number of clock cycles per second, nowadays typically billions, which limits how many instructions per second it can execute; the frequency can also be increased with overclocking), number of cores (determining how many programs it can run in parallel) and also other parameters and "features" such as amount of cache memory, possible operation modes etc. Very commonly we also associate a CPU with a number of bits (called word size or something similar) that's often connected to the data bus width and the CPU's native integer size, i.e. for example a 16 bit CPU will likely consist of 16 bit integer registers, it will see the memory as a sequence of 16 bit words, its memory addresses may be limited to 16 bits etc. (note that the CPU can still handle even wider words by emulating them with the native words, but this will suffer performance penalties) -- nowadays most mainstream CPUs are 64 bit (to allow ungodly amounts of RAM), but 32 or even 16 and 8 bits is usually enough for good programs. CPU in form of a single small integrated circuit is called microprocessor. CPU is not to be confused with MCU (microcontroller unit), a small single chip computer which is composed of a CPU and other parts.

CPU is meant to perform general purpose computation, i.e. it can execute anything reasonably fast but won't reach near optimum speed at certain specialized tasks (e.g. processing HD video or rendering 3D graphics), which is why other specialized processing units such as GPUs (graphics processing unit) and sound cards exist. Because CPU is a general algorithm executing unit, it is made for running linear programs, i.e. a series of instructions that go one after another; even though today CPUs more often than not sport multiple cores and with it the capability of running several linear programs in parallel, their level of parallelism is still low, not nearly in the same league as a GPU for example. CPUs are nonetheless good enough for most tasks and nowadays reach astronomical speeds anyway, so a suckless/LRS program will likely choose to only rely on CPU, knowing it's safe to assume presence of this most essential part of a computer, and by that our program becomes more portable and future proof.

Designs of CPUs differ, some may aim to maximize performance while others prefer lower power consumption or low transistor count -- remember, a more complex CPU is more expensive because it requires more transistors! Of course it will also be harder to design, debug etc., so it may be better to keep it simple when designing a CPU. For this reason many CPUs, e.g. those in embedded microcontrollers, intentionally lack cache, microcode, multiple cores or even a complex instruction pipeline. Space technology for instance strongly prefers reliability over performance.

WATCH OUT: modern mainstream CPUs (practically the ones in desktops and spyphones) are shit, they are hugely consumerist, bloated (they literally include shit like GPUs and whole operating systems, e.g. Intel's ME runs Minix) and have built-in antifeatures such as backdoors (post 2010 basically all Intel and AMD CPUs, see Intel Management Engine and AMD PSP) that can't be disabled and that allow remote infiltration of your computer by the CPU manufacturer (on hardware level, no matter what operating system you run). You are much better off using a simple CPU if you can (older, embedded etc.).

Details

TODO: diagrams, modes, transistor count history ...

Let's take a look at how our average CPU operates. Indeed the techno world is diverse and so we mustn't assume that anything is set in stone, CPUs vary in many ways. We may also dumb down some concepts a bit, real world CPUs are remarkably overengineered and complicated as hell.

Firstly then the most pressing question: what is it that a CPU really does? In essence it just reads instructions from the memory (depending on specific computer architecture this may be RAM or ROM) and does whatever they dictate -- these instructions are super simple, often commands like "add two numbers", "write a number to memory" and so on. The instructions themselves are nothing more than binary data in memory and their format depends on each CPU, or more precisely its instruction set (basically a very low level language it understands) -- each CPU, or rather a CPU family, may generally have a different instruction set, so a program in one instruction set can't be executed by a CPU that doesn't understand this instruction set. The whole binary program for the CPU is called machine code and machine code corresponds to assembly language (basically a textual representation of the machine code, for better readability by humans) of the CPU (or better said its instruction set). So a CPU can be seen as a hardware interpreter of specific machine code, machine code depends on the instruction set and programmer can create machine code by writing a program in assembly language (which is different for each instruction set) and then using an assembler to translate the program to machine code. Nowadays mostly two instruction sets are used: x86 and Arm, but there are also other ones, AND it's still not so simple because each instruction set gets some kind of updates and/or has some extensions that may or may not be supported by a specific CPU, so it's a bit messy. For example IA-32 and x86_64 are two different versions of the x86 ISA, one 32 bit and one 64 bit.

The CPU has an internal state (we can picture it as a state machine), i.e. it has a few internal variables, called registers; these are NOT variables in RAM but rather in the CPU itself, there are only a few of them (let's say 32 for example) but they are stunningly fast, much faster than any other memory. What exactly these registers are, what they are called, how many bits they can hold and what their purpose is depends again on the instruction set architecture. However there are usually a few special registers, notably the program counter which holds the address of the currently executed instruction. After executing an instruction the program counter is incremented so that in the next step the next instruction will be executed, AND we can also modify the program counter (sometimes directly, sometimes by specialized instructions) to jump between instructions to implement branching, loops, function calls etc.

So at the beginning (when powered on) the CPU is set to some initial state, most notably it sets its program counter to some initial value (depending on each CPU, it may be e.g. 0) so that it points to the first instruction of the program. Then it performs the so called fetch, decode, execute cycle, i.e. it reads the instruction, decodes what it means and does what it says. In simpler CPUs this functionality is hard wired, however more complex CPUs (especially CISC) are programmed in so called microcode, code at an even lower level than machine code, in which the execution of machine code instructions is itself programmed -- microcode is something like "firmware for the CPU" (or a "CPU shader"?), it basically allows later updates and reprogramming of how the CPU internally works. However this is pretty overcomplicated and you shouldn't make crappy CPUs like this.

A CPU works in clock cycles, i.e. it is a sequential circuit which has so called clock input; on this input voltage periodically switches between high and low (1 and 0) and each change makes the CPU perform another operation cycle. How fast the clock changes is determined by the clock frequency (nowadays usually around 3 GHz) -- the faster the frequency, the faster the CPU will compute, but the more it will also heat up (so we can't just set it up arbitrarily high, but we can overclock it a bit if we are cooling it down). WATCH OUT: one clock cycle doesn't necessarily equal one executed instruction, i.e. frequency of 1 Hz doesn't have to mean the CPU will execute 1 instruction per second because executing an instruction may take several cycles (how many depends on each instruction and also other factors). The number saying how many cycles an instruction takes is called CPI (cycles per instruction) -- CPUs try to aim for CPI 1, i.e. they try to execute 1 instruction per cycle, but they can't always do it.

One way to approach CPI 1 is by optimizing the fetch, decode, execute cycle in hardware so that it's as BLAZINGLY fast as possible. This is typically done by utilizing an instruction pipeline -- a pipeline has several stages working in parallel so that as soon as one instruction is entering e.g. the decode stage, another one is already coming to the fetch stage (and the previous instruction is in execute stage), i.e. we don't have to wait for an instruction to be fully processed before starting to process the next one. This is practically the same principle as that of manufacturing lines in factories; if you have a long car manufacturing pipeline, you can make a factory produce let's say one car each hour, though it is impossible to make a single car from scratch in one hour (or imagine e.g. a university producing new PhDs each year despite no one being able to actually earn a PhD in a year). This is also why branching (jumping between instructions) is considered bad for program performance -- a jump to a different instruction forces the CPU to throw away the instructions it has already started preprocessing because they will not be executed (though CPUs again try to deal with this with so called branch prediction, but it can't work 100%). Some CPUs even have multiple pipelines, allowing for execution of multiple instructions at the same time -- however this can only be done sometimes (the later instruction must be independent of the earlier one, also the other pipelines may be simpler and able to only handle simple instructions).

In order for a CPU to be useful it has to be able to perform some input/output, i.e. it has to be able to retrieve data from the outside and present what it has computed. Notable ways of performing I/O are:

Through memory: here some parts of memory serve to pass data to the CPU and to retrieve computed results back. For example a keyboard may be mapped to memory so that when certain keys are pressed, the corresponding memory bits are set to 1 -- this way a CPU can simply read from memory and know if a key is pressed. Similarly a display may be mapped to memory so that when a CPU writes a value to a certain address, a pixel appears on the display. Note that this doesn't always have to PHYSICALLY pass through memory, there may be a special circuit that translates e.g. memory access in some address range to signals to hardware etc., but the CPU is using the same instructions it would use for interacting with memory.
Through GPIO pins: CPUs typically have pins that are reserved for general purpose input/output, i.e. we can electronically communicate through them with whatever device we physically connect to those pins. A CPU can set and read voltage to/from those pins e.g. with some special instructions. This may be convenient if we just want to e.g. light up some LED without having to somehow hook it to the main memory.
Interrupts: a CPU can be informed about an external event with an interrupt (see further on).

CPUs often also have a cache memory that speeds up communication with the main memory (RAM, ROM, ...), though simpler CPUs may live even without cache of course. Mainstream CPUs even have several levels of cache, called L1, L2 etc. Caches are basically transparent for the programmer, they don't have to deal with them, it's just something that makes memory access faster, however a programmer knowing how a cache works can write code so as to be friendlier to the cache and utilize it better.

Mainstream consoomer CPUs nowadays have multiple cores so that each core can essentially run a separate computation in parallel. The separate cores can be seen kind of like duplicate copies of the single core CPU with some connections between them (details again depend on each model), for example cores may share the cache memory, they will be able to communicate with each other etc. Of course this doesn't just magically make the whole CPU faster, it can now only run multiple computations at once, but someone has to make programs so as to make use of this -- typical use cases are e.g. multitasking operating systems which can run different programs (or rather processes) on each core (note that multitasking can be done even with a single core by rapidly switching between the processes, but that's slower), or multithreading programming languages which may run each thread on a separate core.

Interrupts are an important concept for the CPU and for low level programming, they play a role e.g. in saving power -- high level programmers often don't know what interrupts are, to them interrupts can be likened to "event callbacks". An interrupt occurs on some sort of event, for example upon a key press, when a timer ticks, when an error occurs etc. (An interrupt can also be raised by the CPU itself, this is how operating system syscalls are often implemented.) What kinds of interrupts there are depends on each CPU architecture (consult your datasheet) and one can usually configure which interrupts to enable and which "callbacks" to use for them -- this is often done through a so called vector table, a special area in memory that records addresses ("vectors") of routines (functions/subprograms) to be called on specified interrupts. When an interrupt happens, the current program execution is paused and the CPU automatically jumps to the subroutine for handling the interrupt -- after returning from the subroutine the main program execution continues. Interrupts are contrasted with polling, i.e. manually checking some state and handling things as part of the main program, e.g. executing an infinite loop in which we repeatedly check keyboard state until some key is pressed. However polling is inefficient, it wastes power by constantly performing computation just to wait -- interrupts on the other hand are a hard wired functionality that just performs a task when the event happens, without any overhead of polling. Furthermore interrupts can make programming easier (you save many condition checks and memory reads) and mainly interrupts allow the CPU to go into sleep mode and so save a lot of power. When a CPU doesn't have any computation to do, it can stop itself and go into waiting state, not executing any instructions -- however interrupts still work and when something happens, the CPU jumps back in to work. This is typically what the sleep/wait function in your programming language does -- it puts the CPU to sleep and sets a timer interrupt to wake it up after a given amount of time. As a programmer you should know that you should call this sleep/wait function in your main program loop to relieve the CPU -- if you don't, you will notice the CPU utilization (amount of time it is performing computations) will go to 100%, it will heat up and your computer will start spinning the fans and be noisy because you don't let it rest.

Frequently there are several modes of operation in a CPU which are typically meant for operating systems -- there will usually be some kind of privileged mode in which the CPU can do whatever it wants (this is the mode for the OS kernel) and a restricted mode in which there are "restrictions", e.g. on which areas of memory can be accessed or which instructions can be used (this will be used for user programs). Thanks to this a user program won't be able to crash the operating system, it will at worst crash itself. Most notably x86 CPUs have the real mode (addresses correspond to real, physical addresses) and protected mode (memory is virtualized, protected, addresses don't generally correspond to physical addresses).

A CPU may also have some coprocessors integrated, though sometimes coprocessors are really a separate chip. Coprocessors that may be inside the CPU include e.g. the FPU (floating point unit) or an encryption coprocessor. Again, this will make the CPU a lot more expensive.

TODOOOOOOO: ALU, virtual memory, IP cores, architectures (register, ...), ...

Notable CPUs

UNDER CONSTRUCTION

Here are listed some notable CPUs (or sometimes CPU families or cores).

{ I'm not so great with HW, suggest me improvements for this section please, thanks <3 ~drummyfish }

{ WTF, allthetropes has quite a big list of famous CPUs, isn't it a site about movies? https://allthetropes.org/wiki/Central_Processing_Unit. ~drummyfish }

TODO: add more, mark CPUs with ME, add features like MMX, FPU, ...

CPU	year	bits (/a)	ISA	~tr. c.	tr. size	freq.	pins	cores	other	notes
Intel 4004	1971	4 / 12	own	2.3 K	10 um	750 K	16	1		1st commercial microproc.
Intel 8008	1972	8 / 14	own	3.5 K	10 um	800 K	18	1
Intel 8080	1974	8 / 16	own	6 K	6 um	3 M	40	1
AMD Am9080	1975	8 / 16	own	6 K	6 um	4 M	40	1		reverse-eng. clone of i8080
MOS Technology 6502	1975	8 / 16	own	3.5 K	8 um	3 M	40	1		popular, cheap, Atari 2600, C64, ...
Zilog Z80	1976	8 / 16	own	8.5 K	4 um	10 M	40	1		popular
Intel 8086	1978	16 / 20	x86 (x86-16)	29 K	3 um	10 M	40	1		started x86 ISA
Motorola 68000	1979	32 / 24	own (CISC)	68 K			64	1		popular, e.g. Amiga, Mega Drive, ...
Intel (80)286	1982	16 / 24	x86 (x86-16)	130 K	1.5 um	25 M	68	1
Intel (80)386	1985	32	x86 (IA-32)	275 K	1 um	40 M	132	1
Intel (80)486	1989	32	x86 (IA-32)	1.6 M	600 nm	100 M	196	1	16 K cache, FPU	1st intel with cache and FPU
AMD Am386	1991	32	x86 (IA-32)	275 K	800 nm	40 M	132	1		clone of i386, lawsuit
Intel Pentium P5	1993	32	x86 (IA-32)	3 M	800 nm	60 M	273	1	16 K cache	starts Pentium line with many to follow
AMD K5	1996	32	x86 (IA-32)	4.3 M	500 nm	133 M	296	1	24 K cache	1st in-house AMD CPU, compet. of Pentium
Intel Pentium II	1997	32	x86 (IA-32)	7 M	180 nm	450 M	240	1	512 K L2 cache, MMX
ARM7TDMI	1994	32	ARM			100 M		1		ARM core, e.g. GBA, PS2, Nokia 6110 ...
AMD Athlon 1000 Thunderbird	2000	32	x86 (IA-32)	37 M	180 nm	1 G	453	1	~300 K cache	1st 1GHz+ CPU
RAD750	2001	32	PowerPC	10 M	150 nm	200 M	360	1	64 K cache	radiation hard., space (Curiosity, ...)
AMD Opteron	2003	64	x86 (x86-64)	105 M	130 nm	1.6 G	940	1	~1 M cache	1st 64 bit x86 CPU
Intel Pentium D 820	2005	64	x86 (x86-64)	230 M	90 nm	2.8 G	775	2	~2 M cache	1st desktop multi core CPU
Intel Core i5-2500K	2011	64	x86 (x86-64)	1 B	32 nm	3.3 G		4	~6 M cache, ME
PicoRV32	2015?	32	RISC-V (RV32IMC)			~700 M				simple, free hardware RISC-V core
Apple A9	2015	64	ARM (ARMv8)	2 B	14 nm	1.8 G		2	~7 M cache	iPhones
AMD Ryzen Threadrip. PRO 5995WX	2022	64	x86 (x86-64)	33 B	7 nm	4.5 G	4094	64	~300 M cache, PSP	high end bloat
Talos ES	2023	8	own (RISC)							simple but usable DIY free hardware CPU
