Sorting means rearranging a sequence, such as a list of numbers, so that its elements are put in a specific order (e.g. ascending or descending). It is the opposite of shuffling. In computer science sorting is a wide and curious topic: there are dozens, maybe hundreds of sorting algorithms, each with its pros and cons, and various attributes of them are studied, e.g. time complexity, stability etc. Sorting algorithms are a favorite subject of programming classes as they provide a good exercise in programming and analysis of algorithms and can be nicely put on tests :) Sorting algorithms are like Pokemon for computer nerds: some are big, some are small and cute and everyone has a favorite. { Gotta implement them all? ~drummyfish }
Some celebrities among sorting algorithms are the bubble sort (a simple KISS algorithm), quick sort (a super fast one), merge sort (also lightning fast) and stupid sort (just tries different permutations until it hits the jackpot).
In our day-to-day lives we commonly get away with some of the simplest sorting algorithms (such as bubble sort or insertion sort) anyway, unless we're programming a database or otherwise dealing with enormous amounts of data. If we need to sort just a few hundred items and/or the sorting doesn't occur very often, a simple algorithm does the job well, sometimes even faster because a very complex algorithm may carry some initial overhead. So always consider the KISS approach first.
Attributes of sorting algorithms we're generally interested in are the following:
In practice not only the algorithm but also the details of its implementation matter. For instance if we have a sequence of very large data structures to sort, we may want to avoid physically rearranging these structures in memory, as this could be slow. In such a scenario we may want to use indirect sorting: we create an additional list whose elements are indices to the main sequence, and we only sort this list of indices.
TODO
For starters let's take a look at one of the simplest sorting algorithms, bubble sort. Its basic version looks something like this (pseudocode):
for j from 0 to N - 2 (inclusive)
  for i from 0 to N - 2 - j (inclusive)
    if array[i] > array[i + 1]
      swap array[i] and array[i + 1]
How does this work? Firstly notice there are two loops. The outer loop, with counter variable j, runs N - 1 times -- in each iteration of this loop we ensure one more value gets to its correct place; specifically the values get to their correct places from the top -- highest values will be sorted first (you can also implement the algorithm the other way around, to sort the lowest values first, try it as an exercise). This makes sense, imagine that we have e.g. a sequence of length N = 4 -- then the outer loop will run N - 1 = 3 times (j will have values 0, 1 and 2); after the first iteration 1 value will be in its correct place, after 2 iterations 2 values will be in place and after 3 iterations 3 values will be in place, which also means the last (fourth) value has to be in place too, i.e. the array must be sorted. Now for the inner loop (with variable i): this one actually gets the value to its place. Notice it goes from 0 towards the top and always compares two neighbors in the array -- if the bottom neighbor is higher than the top neighbor, the loop swaps them, ensuring that the highest value will get to the top (it kind of "bubbles" up, hence the algorithm's name). Also notice this loop doesn't always go to the very end of the array! It subtracts the value j from its top boundary because that's where the values that are already in place reside, so we don't need to sort them anymore; the inner loop can end earlier and earlier as the outer loop progresses. The algorithm would still work if we went through the whole array every time (try it), but it would make roughly twice as many comparisons, i.e. by noticing the inner loop can get progressively shorter we speed the algorithm up for free (its time complexity stays quadratic though). Anyway, how the algorithm actually works is best seen on an example, so let's now try to use the algorithm to sort the following sequence:
3 7 8 3 2
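The length of the sequence is N = 5, so j (the outer loop) will go from 0 to 3. The following shows how the array changes (/\ marks the pair of neighbors being compared, " marks items that are already in their final place; read each column top to bottom, then go on to the next column to the right):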
        j = 0     j = 1     j = 2     j = 3

                                      swapped
i = 0   /\        /\        /\        /\
        37832     37328     33278     23378 <-- SORTED
                                       """"

                  swapped   swapped
i = 1    /\        /\        /\
        37832     33728     32378 <--- last 3 items are in their places
                              """

        swapped   swapped
i = 2     /\        /\
        37382     33278 <--- last 2 items are in their places
                     ""

        swapped
i = 3      /\
        37328 <--- last item is in its place
            "
Hopefully it's at least a bit clear -- if not, try to perform the algorithm by hand, that's a practically guaranteed way of gaining understanding of the algorithm.
Now let's see other algorithms and some actual runnable code. The following is a C program that shows implementations of some of the common sorting algorithms and also measures their speed:
#include <stdio.h>
#include <time.h>
#include <stdlib.h>

#define N 64

char array[N + 1]; // extra 1 for string terminating zero

void swap(int i1, int i2)
{
  int tmp = array[i1];
  array[i1] = array[i2];
  array[i2] = tmp;
}

void setupArray(void) // fills the array with pseudorandom ASCII letters
{
  array[N] = 0;
  srand(123);

  for (int i = 0; i < N; ++i)
    array[i] = 'A' + rand() % 26;
}

void test(void (*sortFunction)(void), const char *name)
{
  int timeTotal = 0;

  for (int i = 0; i < 64; ++i) // run the sort many times to average it a bit
  {
    setupArray();
    long int t = clock();
    sortFunction();
    timeTotal += clock() - t;
  }

  printf("%-10s: %s, CPU ticks: %d\n",name,array,(int) timeTotal);
}

// ============================ sort algorithms ================================

void sortBubble(void)
{
  for (int j = 0; j < N - 1; ++j)
    for (int i = 0; i < N - 1 - j; ++i)
      if (array[i] > array[i + 1])
        swap(i,i + 1);
}

void sortBubble2(void) // simple bubble s. improvement, faster if already sorted
{
  for (int j = 0; j < N - 1; ++j)
  {
    int swapped = 0;

    for (int i = 0; i < N - 1 - j; ++i)
      if (array[i] > array[i + 1])
      {
        swap(i,i + 1);
        swapped = 1;
      }

    if (!swapped) // if no swap happened, the array is already sorted
      break;
  }
}

void sortInsertion(void)
{
  for (int j = 1; j < N; ++j)
    for (int i = j; i > 0; --i)
      if (array[i] < array[i - 1]) // keep moving down until we find its place
        swap(i,i - 1);
      else
        break;
}

void sortSelection(void)
{
  for (int j = 0; j < N - 1; ++j)
  {
    int min = j;

    for (int i = j + 1; i < N; ++i) // find the minimum
      if (array[i] < array[min])
        min = i;

    swap(j,min);
  }
}

void sortStupid(void)
{
  while (1)
  {
    for (int i = 0; i < N; ++i) // check if the array is sorted
      if (i == (N - 1))
        return; // we got to the end, the array is sorted
      else if (array[i] > array[i + 1])
        break; // found a pair out of order, shuffle again

    for (int i = 0; i < N; ++i) // randomly shuffle the array
      swap(i,rand() % N);
  }
}

void _sortQuick(int a, int b) // helper recursive function for the main quick s.
{
  if (b <= a || a < 0)
    return;

  int split = a - 1;

  for (int i = a; i < b; ++i)
    if (array[i] < array[b])
    {
      split++;
      swap(split,i);
    }

  split++;
  swap(split,b);

  _sortQuick(a,split - 1);
  _sortQuick(split + 1,b);
}

void sortQuick(void)
{
  _sortQuick(0,N - 1);
}

int main(void)
{
  setupArray();
  printf("array : %s\n",array);

#if 0 // stupid sort takes too long, only turn on while decreasing N to like 10
  test(sortStupid,"stupid");
#endif

  test(sortBubble,"bubble");
  test(sortBubble2,"bubble2");
  test(sortInsertion,"insertion");
  test(sortSelection,"selection");
  test(sortQuick,"quick");

  return 0;
}

// TODO: let's add more algorithms in the future :-)
It may output for example:
array : RLPALFTOCFWGVJYPLLUNEPDBSOMIBMXSXMVLROZUWXARHAIUNCJTUNVMDHWHTTZT
bubble    : AAABBCCDDEFFGHHHIIJJLLLLLMMMMNNNOOOPPPRRRSSTTTTTUUUUVVVWWWXXXYZZ, CPU ticks: 1191
bubble2   : AAABBCCDDEFFGHHHIIJJLLLLLMMMMNNNOOOPPPRRRSSTTTTTUUUUVVVWWWXXXYZZ, CPU ticks: 1164
insertion : AAABBCCDDEFFGHHHIIJJLLLLLMMMMNNNOOOPPPRRRSSTTTTTUUUUVVVWWWXXXYZZ, CPU ticks: 665
selection : AAABBCCDDEFFGHHHIIJJLLLLLMMMMNNNOOOPPPRRRSSTTTTTUUUUVVVWWWXXXYZZ, CPU ticks: 1217
quick     : AAABBCCDDEFFGHHHIIJJLLLLLMMMMNNNOOOPPPRRRSSTTTTTUUUUVVVWWWXXXYZZ, CPU ticks: 365
Powered by nothing. All content available under CC0 1.0 (public domain). Send comments and corrections to drummyfish at disroot dot org.