See also 3D modeling.
In computer graphics, 3D rendering is the process of computing images which represent a projected view of 3D objects through a virtual camera.
There are many methods and algorithms for doing so, differing in many aspects such as computational complexity, implementation complexity, realism of the result, representation of the 3D data, limitations of viewing and so on. If you are just interested in the realtime 3D rendering used in games nowadays, you are probably interested in GPU-accelerated 3D rasterization with APIs such as OpenGL and Vulkan.
LRS has a simple 3D rendering library called small3dlib.
As most existing 3D "frameworks" are harmful, an LRS programmer is likely to write his own 3D rendering system that suits his program best, therefore we should list some common methods of achieving 3D. Besides that, it's just pretty interesting to see what there is in store.
A very important realization of a graphics programmer is that 3D rendering is to a great extent about faking (especially the mainstream realtime 3D): it is an endeavor that seeks to produce something that looks somehow familiar to HUMAN sight specifically and so even though the methods are mathematical, the endeavor is really an art in the end, not dissimilar to that of a magician who searches for "smoke and mirrors" hacks to produce illusions for the audience. Reality is infinitely complex, so we use nothing but approximations and omissions that rely on assumptions about human sight such as "60 FPS looks like smooth movement to human eye", "infrared spectrum is invisible", "humans can't tell a mirror reflection is a bit off", "inner corners are usually darker than flat surfaces", "no shadow is completely black because light scatters in the atmosphere" etc. Really 3D graphics is nothing but searching for what looks good enough, and deciding this relies on a SUBJECTIVE judgement of a human (and sometimes of every individual). In theory, if we had infinitely powerful computers, we would just program in a few lines of electromagnetic equations and run the precise simulation of light propagating in a 3D environment to produce an absolutely realistic result, but though some methods try to come close to said approach, we simply won't ever have infinitely powerful computers. For this reason we have to resort to a somewhat uglier approach of identifying specific notable real-life phenomena individually (for example caustics, Fresnel, mirror reflections, refractions, subsurface scattering, metallicity, noise, motion blur and myriads of others) and addressing each one with special treatment, many times correcting and masking our imperfections (e.g. applying antialiasing because we dared to use a simplified model of light sampling, applying texture filtering because we dared to only use a finite amount of memory for our data, applying postprocessing etc.).
Rendering spectrum: The book Real-Time Rendering mentions that methods for 3D rendering can be seen as lying on a spectrum, one extreme of which is appearance reproduction and the other physics simulation. Methods closer to appearance reproduction simply focus on imitating the look that the actual 3D object would have in real life, without being concerned with how that look arises (i.e. closer to the "faking" approach mentioned above); these may e.g. use image data such as photographs and rely on light fields, photo textures etc. The physics simulation methods try to replicate the behavior of light in real life (their main goal is to solve the rendering equation, still only more or less approximately) and so, through internally imitating the same processes, arrive at visual results similar to those that arise in the real world: these methods rely on creating 3D geometry (e.g. made of triangles or voxels), computing light reflections and global illumination. This is often easier to program but more computationally demanding. Most methods lie somewhere in between these two extremes: for example billboards and particle systems may use a texture to represent an object while at the same time using 3D quads (very simple 3D models) to correctly deform the textures by perspective and solve their visibility. The classic polygonal 3D models are also usually somewhere in between: the 3D geometry and shading are trying to simulate the physics, but e.g. a photo texture mapped on such a 3D model is the opposite, appearance-based approach (PBR further tries to shift the use of textures more towards the physics simulation end).
With this said, let's now take a look at possible classifications of 3D rendering methods. As seen, there are many ways:
Finally a table of some common 3D rendering methods follows, including the most simple, most advanced and some unconventional ones. Note that here we talk about methods and techniques rather than algorithms, i.e. general approaches that are often modified and combined into a specific rendering algorithm. For example the traditional triangle rasterization is sometimes combined with raytracing to add e.g. realistic reflections. The methods may also be further enriched with features such as texturing, antialiasing and so on. The table below should help you choose the base 3D rendering method for your specific program.
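The methods may be tagged with the following:
- 2.5D: primitive 3D, often called pseudo 3D
- off: slow method, usually used for offline (non-realtime) rendering
- IO: image order (the color of each screen pixel is determined separately)
- OO: object order (objects are drawn onto the screen one after another)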
| method | notes |
|--------|-------|
| 3D raycasting | IO off, shoots rays from camera |
| 2D raycasting | IO 2.5D, e.g. Wolf3D |
| AI image synthesis | "just let AI magic do it" |
| beamtracing | IO off |
| billboarding | OO |
| BSP rendering | 2.5D, e.g. Doom |
| conetracing | IO off |
| "dungeon crawler" | OO 2.5D, e.g. Eye of the Beholder |
| edge list, scanline, span rasterization | IO, e.g. Quake 1 |
| ellipsoid rasterization | OO, e.g. Ecstatica |
| flat-shaded 1 point perspective | OO 2.5D, e.g. Skyroads |
| reverse raytracing (photon tracing) | OO off, inefficient |
| image based rendering | generally using images as 3D data |
| light fields | image-based, similar to holography |
| mode 7 | IO 2.5D, e.g. F-Zero |
| parallax scrolling | 2.5D, very primitive |
| pathtracing | IO off, Monte Carlo, high realism |
| portal rendering | 2.5D, e.g. Duke3D |
| prerendered view angles | 2.5D, e.g. Iridion II (GBA) |
| raymarching | IO off, e.g. with SDFs |
| raytracing | IO off, recursive 3D raycasting |
| segmented road | OO 2.5D, e.g. Outrun |
| shear warp rendering | IO, volumetric |
| splatting | OO, rendering with 2D blobs |
| texture slicing | OO, volumetric, layering textures |
| triangle rasterization | OO, traditional in GPUs |
| voxel space rendering | OO 2.5D, e.g. Comanche |
| wireframe rendering | OO, just lines |
TODO: Rescue On Fractalus!
TODO: find out how build engine/slab6 voxel rendering worked and possibly add it here (from http://advsys.net/ken/voxlap.htm seems to be based on raycasting)
TODO: VoxelQuest has some innovative voxel rendering, check it out (https://www.voxelquest.com/news/howdoesvoxelquestworknowaugust2015update)
If you're a complete noob and are asking what the essence of 3D is or just how to render simple 3D-ish pictures for your game without needing a PhD, here are the very basics. Yes, you can use some 3D engine such as Godot that has all the 3D rendering preprogrammed, but you'll surrender to bloat, you won't really know what's going on and your ability to tinker with the rendering or optimize it will be basically zero... AND you'll miss out on all the fun :) So here we just foreshadow some concepts you should start with if you want to program your own 3D rendering.
The absolute basic thing in 3D is probably perspective, or the concept which says that "things further away look smaller". This is basically the number one thing you need to know and with which you can make simple 3D pictures, even though there are many more effects and concepts that "make pictures look 3D" and which you can potentially study later (lighting, shadows, focus and blur, stereoscopy, parallax, visibility/obstruction etc.). { It's probably possible to make something akin to "3D" even without perspective, just with orthographic projection, but that's just getting to details now. Let's just suppose we need perspective. ~drummyfish }
If you don't have a rotating camera and other fancy things, perspective is actually mathematically very simple: you basically just divide the object's size by its distance from the viewer, i.e. its Z coordinate (you may divide by some multiple of the Z coordinate, e.g. by 2 * Z, to get a different field of view). The further away the object is, the bigger the number its size gets divided by, so the smaller it becomes. This "dividing by distance" ultimately applies to all distances, so in the end even the details on the object get scaled according to their individual distance, but as a first approximation you may just consider scaling objects as a whole. Just keep in mind you should only draw objects whose Z coordinate is above some threshold (usually called a near plane) so that you don't divide by 0! With this "dividing by distance" trick you can make an extremely simple "3D-ish" renderer that just draws sprites on the screen and scales them according to the perspective rules (e.g. some space simulator where the sprites are balls representing planets). There is one more thing you'll need to handle: visibility, i.e. nearer objects have to cover the objects further away. You can do this by simply sorting the objects by distance and drawing them back-to-front (painter's algorithm).
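The sprite renderer idea just described can be sketched roughly as follows. This is only a minimal illustration under assumptions: the Sprite type, the NEAR_Z and FOV_MUL constants and the function names are all made up for this example, and the actual drawing of a sprite is left out.

```c
#include <stdlib.h>

#define NEAR_Z 1   // hypothetical near plane distance (assumption)
#define FOV_MUL 8  // hypothetical multiplier that affects "field of view"

typedef struct { int x, y, z, size; } Sprite; // illustrative type

// Perspective: divide a world space length by the distance from the viewer.
int projectLength(int length, int z)
{
  return (FOV_MUL * length) / z;
}

// qsort comparator: bigger Z (further away) goes first => back-to-front.
static int compareFarFirst(const void *a, const void *b)
{
  const Sprite *s1 = a, *s2 = b;
  return (s2->z > s1->z) - (s2->z < s1->z);
}

// Sorts sprites back-to-front (painter's algorithm) and returns how many of
// them lie at or beyond the near plane; these end up first in the array and
// are the only ones that should be drawn, in this order.
int prepareSprites(Sprite *sprites, int count)
{
  qsort(sprites,count,sizeof(Sprite),compareFarFirst);

  int drawable = 0;

  while (drawable < count && sprites[drawable].z >= NEAR_Z)
    drawable++;

  return drawable;
}
```

A drawing loop would then go over the first few sprites returned by prepareSprites and draw each one at [projectLength(x,z),projectLength(y,z)] with size projectLength(size,z), so that nearer sprites get drawn over the further ones.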
Here is some "simple" C code that demonstrates perspective and draws a basic animated wireframe cuboid as ASCII in terminal:
#include <stdio.h>

#define SCREEN_W 50    // ASCII screen width
#define SCREEN_H 22    // ASCII screen height
#define LINE_POINTS 64 // how many points for drawing a line
#define FOV 8          // affects "field of view"
#define FRAMES 30      // how many animation frames to draw

char screen[SCREEN_W * SCREEN_H];

void showScreen(void)
{
  for (int y = 0; y < SCREEN_H; ++y)
  {
    for (int x = 0; x < SCREEN_W; ++x)
      putchar(screen[y * SCREEN_W + x]);

    putchar('\n');
  }
}

void clearScreen(void)
{
  for (int i = 0; i < SCREEN_W * SCREEN_H; ++i)
    screen[i] = ' ';
}

// Draws point to 2D ASCII screen, [0,0] means center.
void drawPoint2D(int x, int y, char c)
{
  x = SCREEN_W / 2 + x;
  y = SCREEN_H / 2 + y;

  // only draw inside the screen (y must be strictly below SCREEN_H)
  if (x >= 0 && x < SCREEN_W && y >= 0 && y < SCREEN_H)
    screen[y * SCREEN_W + x] = c;
}

// Divides coord. by distance taking "FOV" into account => perspective.
int perspective(int coord, int distance)
{
  return (FOV * coord) / distance;
}

void drawPoint3D(int x, int y, int z, char c)
{
  if (z <= 0)
    return; // at or behind the camera, don't draw

  drawPoint2D(perspective(x,z),perspective(y,z),c);
}

// Linearly interpolates between a and b, n goes from 0 to LINE_POINTS.
int interpolate(int a, int b, int n)
{
  return a + ((b - a) * n) / LINE_POINTS;
}

void drawLine3D(int x1, int y1, int z1, int x2, int y2, int z2, char c)
{
  for (int i = 0; i < LINE_POINTS; ++i) // draw a few points to form a line
    drawPoint3D(interpolate(x1,x2,i),interpolate(y1,y2,i),interpolate(z1,z2,i),c);
}

int main(void)
{
  int shiftX, shiftY, shiftZ;

  #define N 12 // side length
  #define C '*'

  // cuboid points:
  //          X                Y               Z
  #define PA -2 * N + shiftX,  N + shiftY,     N + shiftZ
  #define PB  2 * N + shiftX,  N + shiftY,     N + shiftZ
  #define PC  2 * N + shiftX,  N + shiftY, 2 * N + shiftZ
  #define PD -2 * N + shiftX,  N + shiftY, 2 * N + shiftZ
  #define PE -2 * N + shiftX, -N + shiftY,     N + shiftZ
  #define PF  2 * N + shiftX, -N + shiftY,     N + shiftZ
  #define PG  2 * N + shiftX, -N + shiftY, 2 * N + shiftZ
  #define PH -2 * N + shiftX, -N + shiftY, 2 * N + shiftZ

  for (int i = 0; i < FRAMES; ++i) // render animation
  {
    clearScreen();

    shiftX = -N + (i * 4 * N) / FRAMES; // animate
    shiftY = -N / 3 + (i * N) / FRAMES;
    shiftZ = 0;

    // bottom:
    drawLine3D(PA,PB,C); drawLine3D(PB,PC,C); drawLine3D(PC,PD,C); drawLine3D(PD,PA,C);

    // top:
    drawLine3D(PE,PF,C); drawLine3D(PF,PG,C); drawLine3D(PG,PH,C); drawLine3D(PH,PE,C);

    // sides:
    drawLine3D(PA,PE,C); drawLine3D(PB,PF,C); drawLine3D(PC,PG,C); drawLine3D(PD,PH,C);

    drawPoint3D(PA,'A'); drawPoint3D(PB,'B'); // corners
    drawPoint3D(PC,'C'); drawPoint3D(PD,'D');
    drawPoint3D(PE,'E'); drawPoint3D(PF,'F');
    drawPoint3D(PG,'G'); drawPoint3D(PH,'H');

    showScreen();

    puts("press key to animate");
    getchar();
  }

  return 0;
}
One frame of the animation should look like this:
E*******************************F
* * *** *
* ** *** *
* H***************G* *
* * * *
* * * *
* * * *
* * * *
* * * *
* * * *
* D***************C *
* ** *** *
* * * *
* * ** *
*** * * *
A*******************************B
press key to animate
PRO TIP: It will also help if you learn a bit about photography because 3D usually tries to simulate cameras and 3D programmers adopt many terms and concepts from photography. At least learn the very basics such as focal length, pinhole camera, the "exposure triangle" (shutter speed, aperture, ISO) etc. You should know how focal length is related to FOV, what the "f number" means, how to use exposure settings to increase or decrease things like motion blur and depth of field, what HDR means etc.
You may have come here just to learn about the typical realtime 3D rendering used in today's games because aside from research and niche areas this kind of 3D is what we normally deal with in practice. This is what this section is about.
These days "game 3D" means a GPU accelerated 3D rasterization done with rendering APIs such as OpenGL, Vulkan, Direct3D or Metal (the last two being proprietary and therefore shit) and higher level engines above them, e.g. Godot, OpenSceneGraph etc. The methods seem to be evolving to some kind of rasterization/pathtracing hybrid, but rasterization is still the basis.
This mainstream rendering uses an object order approach (it blits 3D objects onto the screen rather than determining each pixel's color separately) and works on the principle of triangle rasterization, i.e. 3D models are composed of triangles (or higher polygons which are however eventually broken down into triangles) and these triangles are projected onto the screen according to the position of the virtual camera and laws of perspective. Projecting the triangles means finding the 2D screen coordinates of each of the triangle's three vertices; once we have these coordinates, we draw (rasterize) the triangle to the screen just as a "normal" 2D triangle (well, with some asterisks).
Additionally things such as z-buffering (for determining correct overlap of triangles) and double buffering are used, which makes this approach very memory (RAM/VRAM) expensive. Of course mainstream computers have more than enough memory but smaller computers (e.g. embedded) may suffer and be unable to handle this kind of rendering. Thankfully it is possible to adapt and imitate this kind of rendering even on "small" computers, even those that don't have a GPU, i.e. with pure software rendering. For this we e.g. replace z-buffering with painter's algorithm (triangle sorting), drop features like perspective correction, MIP mapping etc. (of course quality of the output will go down).
Also additionally there's a lot of bloat added in such as complex screen space shaders, pathtracing (popularly known as raytracing), megatexturing, shadow rendering, postprocessing, compute shaders etc. This may make it difficult to get into "modern" 3D rendering. Remember to keep it simple.
On PCs the whole rendering process is hardware-accelerated with a GPU (graphics card). A GPU is special hardware capable of performing many operations in parallel (as opposed to a CPU which mostly computes sequentially with a low level of parallelism); this is ideal for graphics because we can for example perform mapping and drawing of many triangles at once, greatly increasing the speed of rendering (FPS). However this hugely increases the complexity of the whole rendering system: we have to have a special API and drivers for communication with the GPU and we have to upload data (3D models, textures, ...) to the GPU before we want to render them. Debugging gets a lot more difficult. So again, this is bloat, consider avoiding GPUs.
GPUs nowadays no longer focus just on graphics, they are kind of a general device that can be used for more than just 3D rendering (e.g. crypto mining, training AI etc.) and can no longer even perform 3D rendering completely by themselves; for this they have to be programmed. I.e. if we want to use a GPU for rendering, not only do we need a GPU but also some extra code. This code is provided by "systems" such as OpenGL or Vulkan which consist of an API (an interface we use from a programming language) and the underlying implementation in the form of a driver (e.g. Mesa3D). Any such rendering system has its own architecture and details of how it works, so we have to study it a bit if we want to use it.
The important part of a system such as OpenGL is its rendering pipeline. The pipeline is the "path" along which data go through the rendering process. Each rendering system and even potentially each of its versions may have a slightly different pipeline (but generally all mainstream pipelines somehow achieve rasterizing triangles, the difference is in the details of how they achieve it). The pipeline consists of stages that follow one after another (e.g. the mentioned mapping of vertices and drawing of triangles constitute separate stages). A very important fact is that some (not all) of these stages are programmable with so called shaders. A shader is a program written in a special language (e.g. GLSL for OpenGL) running on the GPU that processes the data in some stage of the pipeline (therefore we distinguish different types of shaders based on the part of the pipeline they reside in). In early GPUs stages were not programmable but they became so in order to give greater flexibility: shaders allow us to implement all kinds of effects that would otherwise be impossible.
Let's see what a typical pipeline might look like, similarly to something we might see e.g. in OpenGL. We normally simulate such a pipeline also in software renderers. Note that the details such as the coordinate system handedness and presence, order, naming or programmability of different stages will differ in any particular pipeline, this is just one possible scenario:
WORK IN PROGRESS
{ This turned out to be long as hell, sowwy. ~drummyfish }
This is an example of how two very simple 3D models would be rendered using the traditional triangle rasterization pipeline. Note that this is VERY simplified, it's just to give you an idea of the whole process, BUT if you understand this you will basically get an understanding of it all.
Keep in mind this all can be done just with fixed point, floating point is NOT required.
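To make the fixed point remark a bit more concrete, here is a tiny sketch of how fractional numbers can be handled with integers only. It is just an assumption-laden illustration: the FP_UNIT scale and the function names are invented here, and no care is taken about overflow of the intermediate products, which a real renderer would have to watch for.

```c
#define FP_UNIT 1024 // hypothetical scale: this integer value represents 1.0

// Multiplies two fixed point numbers (the intermediate product may overflow
// for big values, this sketch does not handle that).
int fpMul(int a, int b)
{
  return (a * b) / FP_UNIT;
}

// Divides two fixed point numbers, b must not be zero.
int fpDiv(int a, int b)
{
  return (a * FP_UNIT) / b;
}
```

For example fpMul(FP_UNIT / 2,3 * FP_UNIT) multiplies 0.5 by 3 and gives 1536, i.e. 1.5 in this representation.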
First we need to say what conventions we'll stick to:
  Y ^      _
    |    _/ Z
    |  _/
    |_/
    '-------> X
Now let's have a simple 3D model data of a quad. Quad is basically just a square made of four vertices and two triangles, it will look like this:
quadModel:

  v3________v2
    |     _/|
    |   _/  |
    | _/    |
    |/______|
  v0        v1
In a computer this is represented with two arrays: vertices and triangles. Our vertices here are (notice all Z coordinates are zero, i.e. it is a 3D model but it's flat):
quadVertices:
v0 = [-1, -1,  0]
v1 = [ 1, -1,  0]
v2 = [ 1,  1,  0]
v3 = [-1,  1,  0]
And our triangles are (they are indices to the vertex array, i.e. each triangle says which three vertices from the above array to connect to get the triangle):
quadTriangles:
t0 = [0,1,2]
t1 = [0,2,3]
Note the triangles here (from our point of view) go counterclockwise. This is called winding and is usually important because of so called backface culling: the order of vertices basically determines which is the front side of the triangle and which is the back side, and rendering systems often just draw the front sides for efficiency (back faces are understood to be on the inside of objects and invisible).
Now the vertex coordinates of the model above are in so called model space. These are the coordinates that are stored in the 3D model's file, it's the model's "default" state of coordinates. The concept of different spaces is also important because 3D rendering is a lot about just transforming coordinates between different spaces ("frames of reference"). We'll see that the coordinates in model space will later on be transformed to world space, view space, clip space, screen space etc.
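One common way of finding the winding of an already projected (2D) triangle is to look at the sign of its area. The following is only a sketch under the assumption that X goes right, Y goes up and counterclockwise means front facing; the function names are made up for this example.

```c
// Returns twice the signed area of a 2D triangle; the result is positive if
// the vertices go counterclockwise (with X going right and Y going up),
// negative if clockwise and zero if the triangle is degenerate.
int signedArea2(int x0, int y0, int x1, int y1, int x2, int y2)
{
  return (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0);
}

// Returns 1 if the triangle shows its front side, ASSUMING counterclockwise
// winding marks the front; a renderer with backface culling would skip
// triangles for which this returns 0.
int isFrontFacing(int x0, int y0, int x1, int y1, int x2, int y2)
{
  return signedArea2(x0,y0,x1,y1,x2,y2) > 0;
}
```

With the quad's X and Y coordinates [-1,-1], [1,-1], [1,1] and [-1,1], both triangles t0 = [0,1,2] and t1 = [0,2,3] come out as front facing, while swapping any two vertices of a triangle flips the result.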
OK, so next let's have 2 actual 3D model instances that use the above defined 3D model data. Notice the difference between 3D model DATA and 3D model INSTANCE: instance is simply one concrete, specific model that has its own place in the global 3D world (world space) and will be rendered, while data is just numbers in memory representing some 3D geometry etc. There can be several instances of the same 3D data (just like in OOP there can be multiple instances/objects of a class); this is very efficient because there can be just one 3D data (like a model of a car) and then many instances of it (like many cars in a virtual city). Our two model instances will be called quad0 and quad1.
Each model instance has its own transformation. Transformation says where the model is placed, how it's rotated, how it's scaled and so on. Different 3D engines may offer different kinds of transformations, some may support things like flips, skews and non-uniform scaling, but usually at least three basic transforms are supported: translation (AKA offset, shift, position), rotation and scale. The transforms can be combined, e.g. a model can be shifted, rotated and scaled at the same time. Here we'll just rotate quad0 by 45 degrees (pi/4 radians) around Y (vertical) axis and translate quad1 one unit to the right and 2 back:
quad0: rotated by pi/4 around the Y axis, not translated
quad1: translated by [1,0,2], not rotated
So now we have two model instances in our world. For rendering we'll also need a camera, the virtual window to our world. Camera is also a 3D object: it doesn't have 3D model data associated but it does have a transformation so that we can, naturally, adjust the view we want to render. Camera also has additional properties like field of view (FOV), aspect ratio (we'll just consider 1:1 here), near and far distances and so on. Our camera will just be shifted two units back (so that it can see the first quad that stays at position [0,0,0]):
camera: translated by [0,0,-2], not rotated
It is important to mention the near and far planes. Imagine a camera placed at some point in space, looking in a certain direction: the volume of space that it sees creates a kind of infinitely long pyramid (whose "steepness" depends on the camera field of view) with its tip at the point where the camera resides. Now for the purpose of rendering we define two planes, perpendicular to the camera's viewing direction, that are defined by the distance from the camera: the near plane (not surprisingly the nearer of the two) and the far plane. For example let's say our camera will have the near plane 1 unit in front of it and the far plane 5 units in front of it. These planes will CUT OFF anything that's in front of the near plane and beyond the far plane, so that only things that are BETWEEN the two planes will be rendered (you can notice this in games with render distance: if this is not cleverly masked, you'll see things in the distance suddenly cut off from the view). These two planes will therefore CUT OFF the viewing pyramid so that now it's a six sided, finite-volume shape that looks like a box with the front side smaller than the back side. This is called the view frustum. Nothing outside this frustum will be rendered, things will basically be sliced by the sides of this frustum.
You may ask WHY do we establish this frustum? Can't we just leave the near and far planes out and render "everything"? Well, firstly it's obvious that having a far cutoff view distance can help performance if you have a very complex model, but this is not the main reason why we have near and far planes. We basically have them for mathematical convenience: as we'll see for example, perspective mapping means roughly "dividing by distance from camera" and if something was to be exactly where the camera is, we'd be dividing by zero! Attempting to render things that are just very near or on the back side of the camera would also do very nasty stuff. So that's why we have the near plane. In theory we might kind of get away with not having a strict far plane but it basically creates a nice finite volume that will e.g. allow us to nicely map depth values for the z-buffer. Don't worry if this doesn't make much sense, this is just to say there ARE good reasons for this stuff.
Now let's summarize what we have with this top-down view of our world (the coordinates here are now world space):

                    Z
                    :
 - - - - - - - -  3 + - - - - - - - -  far plane
                    :
                    :
                  2 +-----*------  quad1
                    :
                    :
                  1 +
                    :  _/  quad0
                    :_/
  ..................*..................  X
       -2    -1  _/ :     1     2
            .   /   :       .
 - - - - - - '.=====:=====.' - - - - - -  near plane
               '.   :   .'
                 '. : .'
               -2  '*'  camera
                    :
                    :
NOW actually let's see how to in fact render this. The big picture overview is this:
As seen this involves doing many transformations between different spaces. We do these using linear algebra, i.e. with vectors and matrices. Key things here are:
You HAVE TO learn how to multiply vector with matrix and matrix with matrix (it's not hard) else you will understand nothing now.
BIG BRAIN MOMENT: homogeneous coordinates. Please DO NOT ragequit, it looks complicated as hell (it is a little bit) but it makes sense in the end, OK? We have to learn what homogeneous coordinates are because we need them to be able to do all the awesome matrix stuff described above. In essence: in 3D space we can perform linear transformations with 3x3 matrices. Linear operations are for example scaling and rotation, BUT some, most importantly translation (shifting an object, which we absolutely NEED), are not linear (but rather affine) so they cannot be performed by a 3x3 matrix. But it turns out that if we use a special kind of coordinates, we CAN do affine 3D transformations with 4x4 matrices, OK? These special coordinates are homogeneous coordinates, and they simply add one extra coordinate, w, to the original x, y and z, while it holds that multiplying all the x, y, z and w components by the same (non-zero) number does nothing with the point they represent. Let's show it like this:
If we have a 3D point [1,2,3], in homogeneous coordinates we can represent it as [1,2,3,1] or [2,4,6,2] or [3,6,9,3] and so on. That's easy no? So we will ONLY add an additional 1 at the end of our vertex coordinates and that's basically it.
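In code the conversion there and back may look for example like this. It is a minimal sketch with made up type and function names (Vec3, Vec4, toHomogeneous, fromHomogeneous), using floating point only for readability, and it assumes w is never zero.

```c
typedef struct { double x, y, z; } Vec3;    // illustrative type
typedef struct { double x, y, z, w; } Vec4; // illustrative type

// Converts a 3D point to homogeneous coordinates by appending w = 1.
Vec4 toHomogeneous(Vec3 p)
{
  Vec4 r = {p.x, p.y, p.z, 1};
  return r;
}

// Converts back to "normal" 3D coordinates by dividing by w (assumes w != 0).
Vec3 fromHomogeneous(Vec4 p)
{
  Vec3 r = {p.x / p.w, p.y / p.w, p.z / p.w};
  return r;
}
```

Passing [2,4,6,2] or [3,6,9,3] to fromHomogeneous gives the same point [1,2,3] in both cases.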
Let's start doing this now!
Firstly let us transform quad0 from model space to world space. For this we construct so called model matrix based on the transformation that the model instance has. Our quad0 is just rotated by pi/4 radians and for this the matrix will look like this (you don't have to know why, you usually just look up the format of the matrix somewhere, but you can derive it, it's EZ):
quad0 model matrix:

      | cos(A)  0  sin(A)  0 |    | 0.7  0  0.7  0 |
Mm0 = |   0     1    0     0 | ~= |  0   1   0   0 |
      |-sin(A)  0  cos(A)  0 |    |-0.7  0  0.7  0 |
      |   0     0    0     1 |    |  0   0   0   1 |

Let's see if this works, we'll try to multiply the first model vertex with this matrix (notice we add 1 at the end of the vertex, to convert it to homogeneous coordinates):

                             | 0.7  0  0.7  0 |
                             |  0   1   0   0 |
                             |-0.7  0  0.7  0 |
                             |  0   0   0   1 |

v0 * Mm0 = [-1, -1,  0,  1]  [-0.7 -1 -0.7  1]  <-- result

So from [-1,-1,0] we got [-0.7,-1,-0.7]; looking at the top-down view picture above this seems pretty correct (look at the coordinates of the first vertex). Try it also for the rest of the vertices. Now for the model matrix of quad1 (again, just look up what translation matrix looks like):

quad1 model matrix:

      | 1  0  0  0 |
Mm1 = | 0  1  0  0 |
      | 0  0  1  0 |
      | 1  0  2  1 |
Here you can even see that multiplying a vector by this will just add 1 to x and 2 to z, right? Again, try it.
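If you want to try it on a computer rather than on paper, multiplying a row vector by a 4x4 matrix may be sketched like this. The function name and the m[row][column] storage are assumptions of this example, and doubles are used just for readability.

```c
// Multiplies row vector v (4 components) by 4x4 matrix m, stored as
// m[row][column], and writes v * m to result (result must not alias v).
void vecMulMat(const double v[4], double m[4][4], double result[4])
{
  for (int col = 0; col < 4; ++col)
  {
    result[col] = 0;

    for (int row = 0; row < 4; ++row)
      result[col] += v[row] * m[row][col];
  }
}
```

Multiplying v0 = [-1,-1,0,1] by the quad1 model matrix (rows [1,0,0,0], [0,1,0,0], [0,0,1,0] and [1,0,2,1]) this way gives [0,-1,2,1], i.e. x increased by 1 and z by 2.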
NEXT, the view matrix (matrix that will transform everything so that it's "in front of the camera") will basically just do the opposite transformation of that which the camera has. Imagine you shift a camera 1 unit to the right: that's as if the camera stands still and everything shifts 1 unit to the left. Pretty logical. So our view matrix looks like this (notice it just pushes everything by 2 to the front):

view matrix:

     | 1  0  0  0 |
Mv = | 0  1  0  0 |
     | 0  0  1  0 |
     | 0  0  2  1 |
Then we'll need to apply perspective and get everything to the clip space. This will be done with so called projection matrix which will in essence make the x and y distances be divided by the z distance so that further away things will shrink and appear smaller. You can derive the projection matrix, its values depend on the field of view, near and far plane etc., here we'll just copy paste numbers into a "template" for the projection matrix, so here it is:

projection matrix (n = near plane distance = 1, f = far plane distance = 5, r = right projection plane distance = 1, t = top projection plane distance = 1):

     | n/r   0         0         0 |   | 1  0    0    0 |
Mp = |  0   n/t        0         0 | = | 0  1    0    0 |
     |  0    0    (f+n)/(f-n)    1 |   | 0  0   3/2   1 |
     |  0    0   -2*f*n/(f-n)    0 |   | 0  0  -5/2   0 |
This matrix will basically make the points so that their w coordinates will be such that when in the end we divide all components by it (to convert them back from homogeneous coordinates), we'll get the effect of the perspective (it's basically the "dividing by distance from the camera" that perspective does). That is what the homogeneous coordinates allow us to do. To visually demonstrate this, here is a small picture of how it reshapes the view frustum into the clip space box (it kind of "squishes" all in the back of the frustum pyramid and also squeezes everything to shape it into that little box of the clipping space, notice how the further away objects became closer together, that's the perspective):

  ___________________  far plane     _____________
  \     A  B  C     /               |    A B C    |
   \               /                |             |
    \   D  E  F   /                 |   D  E  F   |
     \           /                  |             |
      \ G  H  I /                   | G    H    I |
       \_______/  near plane        |_____________|
       :       :                    :             :
       :       :       screen       :             :
       :       :                    :             :
           * camera
At this point we have the matrices of the individual transforms, but as we've said, we can combine them into a single matrix. First let's combine the view matrix and projection matrix into a single view-projection matrix by multiplying the two matrices (WATCH OUT: the order of multiplication matters here! It defines in which order the transformations are applied):

view-projection matrix:

                | 1  0   0   0 |
Mvp = Mv * Mp = | 0  1   0   0 |
                | 0  0  3/2  1 |
                | 0  0  1/2  2 |
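Combining two matrices is again just a few loops. Below is a sketch (the function name and the m[row][column] storage are assumptions of this example) which can also be used to check that swapping the operands gives a different result.

```c
// Computes result = a * b for 4x4 matrices stored as m[row][column]. With
// row vectors (v * a * b) this means transformation a gets applied first,
// then b. The result matrix must not alias a or b.
void matMulMat(double a[4][4], double b[4][4], double result[4][4])
{
  for (int row = 0; row < 4; ++row)
    for (int col = 0; col < 4; ++col)
    {
      result[row][col] = 0;

      for (int i = 0; i < 4; ++i)
        result[row][col] += a[row][i] * b[i][col];
    }
}
```

Feeding it the view matrix (last row [0,0,2,1]) and the projection matrix (last two rows [0,0,3/2,1] and [0,0,-5/2,0]) in this order gives a matrix whose last row is [0,0,1/2,2], whereas in the opposite order the last row comes out as [0,0,-5/2,0], which is a different transformation.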
The rendering will begin with quad0, we'll combine its model matrix and the view-projection matrix into a single uber matrix that will just do the whole transformation for this model instance:

quad0 model-view-projection matrix:

                    | 0.7  0  21/20  0.7 |
Mm0vp = Mm0 * Mvp = |  0   1    0     0  |
                    |-0.7  0  21/20  0.7 |
                    |  0   0   1/2    2  |
Now we'll just transform all of the model's vertices by multiplying with this matrix, and then we'll convert back from the homogeneous coordinates to "normal" coordinates by dividing all components by w (AKA "perspective divide") like this:
v0: [-1, -1,  0,  1] (matrix multiplication) => [-0.7, -1, -0.55,  1.3] (w divide) => [-0.53, -0.76, -0.43]
v1: [ 1, -1,  0,  1] (matrix multiplication) => [ 0.7, -1,  1.55,  2.7] (w divide) => [ 0.26, -0.37,  0.57]
v2: [ 1,  1,  0,  1] (matrix multiplication) => [ 0.7,  1,  1.55,  2.7] (w divide) => [ 0.26,  0.37,  0.57]
v3: [-1,  1,  0,  1] (matrix multiplication) => [-0.7,  1, -0.55,  1.3] (w divide) => [-0.53,  0.76, -0.43]
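The whole per-vertex step (append w = 1, multiply by the combined matrix, divide by w) can be put into one small function, sketched here. The function name is made up and the check for w <= 0 is an assumption that holds for a projection matrix like ours, in which w ends up being the distance in front of the camera.

```c
// Transforms 3D vertex v by 4x4 matrix m (stored as m[row][column], row
// vector convention) and performs the perspective divide, writing the
// resulting 3 coordinates to out. Returns 0 (leaving out untouched) if the
// resulting w is not positive, i.e. the vertex is at or behind the camera.
int transformVertex(const double v[3], double m[4][4], double out[3])
{
  double h[4] = {v[0], v[1], v[2], 1}; // to homogeneous coordinates
  double r[4];

  for (int col = 0; col < 4; ++col) // vector times matrix
  {
    r[col] = 0;

    for (int row = 0; row < 4; ++row)
      r[col] += h[row] * m[row][col];
  }

  if (r[3] <= 0)
    return 0;

  for (int i = 0; i < 3; ++i) // perspective divide
    out[i] = r[i] / r[3];

  return 1;
}
```

Called with v1 = [1,-1,0] and the quad0 matrix with rows [0.7,0,1.05,0.7], [0,1,0,0], [-0.7,0,1.05,0.7] and [0,0,0.5,2] it gives approximately [0.26,-0.37,0.57].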
And let's also do this for quad1.
quad1 model-view-projection matrix:

                    | 1  0   0   0 |
Mm1vp = Mm1 * Mvp = | 0  1   0   0 |
                    | 0  0  3/2  1 |
                    | 1  0  7/2  4 |

and

v0: [-1, -1,  0,  1] (matrix multiplication) => [ 0, -1,  3.5,  4] (w divide) => [ 0,   -0.25,  0.87]
v1: [ 1, -1,  0,  1] (matrix multiplication) => [ 2, -1,  3.5,  4] (w divide) => [ 0.5, -0.25,  0.87]
v2: [ 1,  1,  0,  1] (matrix multiplication) => [ 2,  1,  3.5,  4] (w divide) => [ 0.5,  0.25,  0.87]
v3: [-1,  1,  0,  1] (matrix multiplication) => [ 0,  1,  3.5,  4] (w divide) => [ 0,    0.25,  0.87]

Hmmm mkay let's draw the transformed points to an X/Y grid:

                           Y
[-1,1] ____________________:____________________ [1,1]
      |                                         |
      |                                         |
      |      v3 +-.                             |
      |         |  ''-..                        |
      |         |       ''--..                  |
      |         |             ''+ v2            |
      |         |       v3 +----|----+ v2       |
      |         |          |    |    |          |
      |         |          |    |    |          |
      |         |          |    |    |          |
      |         |       v0 +----|----+ v1       |
      |         |             ..+ v1            |
      |         |       ..--''                  |
      |         |  ..-''                        |
      |      v0 +-'                             |
      |                                         |
      |____________________:____________________|  X
[-1,-1]                    :                     [1,-1]
HOLY SHIT IT'S 3D!!1! Magic! In the front we see quad0, rotated slightly around the vertical (Y) axis, behind it is quad1, non-rotated but smaller because it's further away. This looks very, very good! We're almost there.
Also notice that the points, now nicely projected onto a 2D X/Y plane, still have 3 coordinates, i.e. they retain the Z coordinate which now holds their depth, or kind of the distance from the camera projection plane. This depth is now in range from -1 (near plane) to 1 (far plane). The depth will be important in actually drawing pixels, to decide which are more in the front and therefore visible (this is the problem of visibility). The depth value can also be used for cool effects like the distance fog and so on.
The work up until now, i.e. transforming the vertices with matrices, is what vertex shaders do. Now comes the rasterization part: here we literally draw triangles (as in individual pixels) between the points we have now mapped on the screen. In systems such as OpenGL this is usually done automatically, so you don't have to care about rasterization; however you will have to write the algorithm if you're writing e.g. your own software renderer. Triangle rasterization isn't trivial: it has to be not only efficient but also deal with things such as not leaving holes between adjacent triangles, interpolating triangle attributes and so on. We won't dive deeper into this, let's just suppose we have a rasterization algorithm now. For example rasterizing the first triangle of quad0 may look like this:
(Figure: the first triangle of quad0 rasterized, i.e. the area between three of the projected vertices filled with pixels, drawn as '#' characters.)
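Even though the details are skipped here, a naive rasterizer is short enough to sketch. The following is just one common approach (testing pixel centers with so called edge functions inside the triangle's bounding box), not necessarily what any particular library does. The function names are made up, the vertices are assumed to be already converted to pixel coordinates, and the sketch does NOT implement a fill rule, so a pixel lying exactly on an edge shared by two triangles would be drawn twice.

```python
import math

def edge(a, b, p):
    """Edge function: twice the signed area of the triangle a, b, p."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize_triangle(v0, v1, v2):
    """Yield (x, y, (w0, w1, w2)) for each pixel whose center lies inside
    the triangle; w0..w2 are the barycentric coordinates of that center.
    v0..v2 are (x, y) positions in pixel coordinates."""
    area = edge(v0, v1, v2)
    if area == 0:
        return  # degenerate triangle, nothing to draw
    min_x = math.floor(min(v0[0], v1[0], v2[0]))
    max_x = math.ceil(max(v0[0], v1[0], v2[0]))
    min_y = math.floor(min(v0[1], v1[1], v2[1]))
    max_y = math.ceil(max(v0[1], v1[1], v2[1]))
    for y in range(min_y, max_y):
        for x in range(min_x, max_x):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            # Dividing by area makes the test independent of winding order.
            w0 = edge(v1, v2, p) / area
            w1 = edge(v2, v0, p) / area
            w2 = edge(v0, v1, p) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                yield x, y, (w0, w1, w2)
```

The barycentric weights that fall out of this for free are exactly what is needed to interpolate depth and other vertex attributes across the triangle.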
During this rasterization process the Z coordinates of the mapped vertices are important: the rasterizer interpolates the depth at the vertices, so it knows the depth of each pixel it creates. This depth is written into the so called z-buffer (AKA depth buffer), basically an off-screen buffer that stores one numeric value (the depth) for each pixel. Now when, let's say, the first triangle of quad1 starts to be rasterized, the algorithm compares the rasterized pixel's depth to that stored at the same position in the z-buffer: if the z-buffer value is smaller, the new pixel mustn't be drawn because it's covered by a previously drawn pixel already (here probably that of the triangle shown in the picture).
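The depth test just described can be sketched roughly as follows. The buffer size, the names and the use of strings as "colors" are only assumptions for the sake of a tiny example; the z-buffer is assumed to be cleared to the far plane depth (1) before drawing.

```python
WIDTH, HEIGHT = 4, 3
FAR = 1.0  # far plane depth, assuming depth in the -1..1 range

# One depth value and one color per pixel.
zbuffer = [[FAR] * WIDTH for _ in range(HEIGHT)]
framebuffer = [[None] * WIDTH for _ in range(HEIGHT)]

def draw_pixel(x, y, depth, color):
    """Draw the pixel only if it is closer than what is already there.
    Returns True if the pixel was written."""
    if depth < zbuffer[y][x]:
        zbuffer[y][x] = depth
        framebuffer[y][x] = color
        return True
    return False  # covered by a previously drawn, closer pixel

draw_pixel(1, 1, 0.5, "quad0")   # drawn, buffer was empty
draw_pixel(1, 1, 0.87, "quad1")  # rejected, quad0's pixel is closer
```

Thanks to this the order in which triangles are drawn doesn't matter for opaque geometry.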
So the rasterization algorithm just shits out individual pixels and hands them over to the fragment shader (AKA pixel shader). A fragment shader is a program that just takes a pixel and says what color it should have (basically); this is called shading. For this the rasterizer also hands over additional info to the fragment shader which may include: the X/Y coordinates of the pixel, its interpolated depth (that used in the z-buffer), vertex normals, ID of the model and triangle and, very importantly, the barycentric coordinates. These are three-component coordinates that say where exactly the pixel is in the triangle. These are used mainly for texturing, i.e. if the model we're rendering has a texture map (so called UV map) and a bitmap image (texture), the fragment shader will use the UV map and barycentric coords to compute the exact pixel of the texture that the rasterized pixel falls onto AND this will be the pixel's color. Well, not yet actually, there are more things such as lighting, i.e. determining what brightness the pixel should have depending on how the triangle is angled towards scene lights (for which we need the normals), how far away from them it is, what colors the lights have etcetc. And this is not nearly all, there are TONS and tons of other things, for example the interpolation done by the rasterizer has to do perspective correction (linearly interpolating in screen space looks awkward), then there is texture filtering to prevent aliasing (see e.g. mipmapping), transparency, effects like bump mapping, environment mapping, screen space effects, stencil buffer etcetc.; you can read whole books about this. That's beyond the scope of this humble tutorial. In simple renderers you can get away with ignoring a lot of this stuff, you can just draw triangles filled with constant color, or even just draw lines to get a wireframe renderer, all is up to you.
But you can see it is a bit bloated if everything is to be done correctly. Don't forget there also exist other ways of rendering, see for example raytracing, which is kind of easier.
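To make the texturing step mentioned above a bit more concrete, here is a hedged sketch of what a very simple fragment shader might do with the barycentric coordinates. All names are hypothetical, the texture is assumed to be a plain list of rows with V growing downwards, and only nearest neighbor sampling is shown (no filtering). The second function shows the usual form of perspective correction, which needs the vertices' w values from before the perspective divide.

```python
def interpolate(bary, a, b, c):
    """Blend three per-vertex values (e.g. UV pairs) linearly with the
    barycentric weights; simple but not perspective correct."""
    return tuple(bary[0] * x + bary[1] * y + bary[2] * z
                 for x, y, z in zip(a, b, c))

def interpolate_perspective(bary, a, b, c, w):
    """Perspective-correct variant: weights are divided by each vertex's
    w (from before the w divide) and the result is renormalized."""
    inv = [bary[i] / w[i] for i in range(3)]
    total = sum(inv)
    return tuple((inv[0] * x + inv[1] * y + inv[2] * z) / total
                 for x, y, z in zip(a, b, c))

def sample_nearest(texture, uv):
    """Return the texel nearest to uv; u and v are expected in 0..1."""
    height, width = len(texture), len(texture[0])
    x = min(int(uv[0] * width), width - 1)
    y = min(int(uv[1] * height), height - 1)
    return texture[y][x]

# Example: UVs at the three vertices and a tiny 2x2 "texture".
uv0, uv1, uv2 = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
texture = [["a", "b"],
           ["c", "d"]]
uv = interpolate((0.25, 0.25, 0.5), uv0, uv1, uv2)
texel = sample_nearest(texture, uv)
```

When all three vertices have the same w (the triangle faces the camera directly), both interpolation functions give the same result; the difference only shows on triangles seen at an angle.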
Powered by nothing. All content available under CC0 1.0 (public domain). Send comments and corrections to drummyfish at disroot dot org.