less_retarded_wiki


Programming Language

A programming language is an artificial formal (mathematically precise) language created to allow humans to relatively easily write algorithms for computers. It allows a human to tell a computer what to do very specifically and precisely, but still relatively comfortably. A program written in a programming language is called the program's source code. Programming languages often try to mimic some human language (practically always English) so as to be somewhat close to humans, but a programming language is actually MUCH simpler so that a computer can analyze it and understand it precisely, without ambiguity (computers are extremely bad at understanding actual human language); in the end it all also partially looks like math expressions. A programming language can be seen as a middle ground between pure machine code (the computer's native language, very hard for humans to handle) and natural language (very hard for computers to handle).

For beginners: a programming language is actually much easier to learn than a foreign language. It will typically have fewer than 100 "words" to learn (out of which you'll mostly use like 10), and once you know one programming language, learning another becomes a breeze because they're all (usually) pretty similar in basic concepts. The hard part may be learning some of the concepts.

A programming language differs from a general computer language in its purpose: to express algorithms and be used for creating programs. This is to say that there are computer languages that are NOT programming languages (at least in the narrower sense), such as HTML, JSON and so on.

We write two basic types of programs in these languages: executable programs (programs that can actually be directly run) and libraries (code that cannot be run on its own but is meant to be used in other programs, e.g. a library of mathematical functions, networking, games and so on).

A simple example of source code in the C programming language is the following:

// simple program computing squares of numbers
#include <stdio.h>

int square(int x)
{
  return x * x;
}

int main(void)
{
  for (int i = 0; i < 5; ++i)
    printf("%d squared is %d\n",i,square(i));

  return 0;
}

Which prints:

0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16

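We divide programming languages into different groups. Perhaps the most common division is into two groups: compiled languages, whose programs are first translated by a compiler into another form (typically native machine code) which is then run, and interpreted languages, whose programs are executed directly by a program called an interpreter, without such prior translation.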

Sometimes the distinction here may not be completely clear, for example Python is normally considered an interpreted language but it can also be compiled into bytecode and even native code. Java is considered more of a compiled language but it doesn't compile to native code (it compiles to bytecode). C is traditionally a compiled language but there also exist C interpreters. Comun is meant to be both compiled and interpreted etc.

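Another common division is by level of abstraction, roughly into (keep in mind the transition is gradual and depends on context, the line between low and high level is extremely fuzzy) low level languages, which are close to the hardware and offer little abstraction over it (e.g. assembly, C), and high level languages, which hide hardware details behind more abstraction (e.g. Python, JavaScript).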

We can divide languages in many more ways, for example based on their paradigm (roughly their core idea/model/"philosophy", e.g. imperative, declarative, object-oriented, functional, logical, ...), purpose (general purpose, special purpose), computational power (Turing complete or weaker; many definitions of a programming language require Turing completeness), typing (strong, weak, dynamic, static) or function evaluation (strict, lazy).

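A computer language consists of two main parts: syntax, the rules saying how valid code is written (which words and symbols may be used and how they may be combined), and semantics, the meaning of such code (what the computer should do when running it).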

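We also commonly divide a language into two main parts: the pure language itself (the core constructs such as keywords, operators and control structures) and its standard library, a library that comes with the language and is considered part of it (e.g. the input/output functions of C's stdio.h).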

Besides the standard library there will also exist many third party libraries, but these are no longer considered part of the language itself; they are already products of the language.

What is the best programming language and which one should you learn? (See also programming.) These are the big questions; the topic of programming languages is infamous for being very religious, and different people root for different languages like they do e.g. for football teams. For minimalists, i.e. suckless, LRS (us), Unix people, Plan9 people etc., the standard language is C, which is also probably the most important language in history. It is not in the league of the absolutely most minimal and objectively best languages, but it's relatively minimalist (much more than practically any modern language) and has great advantages such as being one of the absolutely fastest languages, being extremely well established, long tested, supported everywhere, having many compilers etc. But C isn't easy to learn as a first language. Some minimalists also promote Go, which is kind of like a "new C". Among the most minimal usable languages are traditionally Forth and Lisp, which kind of compete for who really is the smallest; then there is also our comun, which is a bit bigger but still much smaller than C. To learn programming you may actually want to start with some ugly language such as Python, but you should really aim to transition to a better language later on.

Can you use multiple programming languages for one project? Yes, though it may be a burden, so don't do it just because you can. Combining languages is possible in many ways, e.g. by embedding a scripting language into a compiled language, linking together object files produced by compilers of different languages, creating separate programs that communicate over a network etc.

History

The first higher level programming language was probably Plankalkul made by Konrad Zuse in 1942.

TODO

More Details And Context

What really IS a programming language -- is it software? Is it a standard? Can a language be bloated? How do languages evolve? Where is the exact line between a programming language and a non-programming language? Who makes programming languages? Who "owns" them? Who controls them? Why are there so many and not just one? These are just some of the questions one may ask upon learning about programming. Let's try to quickly answer some of them.

Strictly speaking a programming language is a formal language with semantics, i.e. just something akin to a "mathematical idea" -- as such it cannot be directly "owned", at least not on the grounds of copyright, as seems to have been quite strongly established by a few court cases now. However things related to a language can sadly be owned, for example its specifications (official standards describing the language), trademarks (the name or logo of the language), implementations (specific software such as the language's compiler), patents on some ideas used in the implementation etc. Also if a language is very complex, it can be owned practically; typically a corporation will make an extremely complicated language which only 1000 paid programmers can maintain, giving the corporation complete control over the language -- see bloat monopoly and capitalist software.

At this point we should start to distinguish between the pure language and its implementation. As has been said, the pure language is just an idea -- this idea is explained in detail in the so called language specification, a document that's kind of a standard that precisely describes the language. A specification is a technical document, it is NOT a tutorial or promotional material or anything like that; its purpose is just to DEFINE the language for those who will be implementing it -- sometimes the specification can be a very official standard made by some standardizing organization (as e.g. with C), other times it may be just a collaborative online document that at the same time serves as the language reference (as e.g. with Lua). In any case it's important to version the specification just as we version programs, because when the specification changes, the specified language usually changes too (unless it's a minor change such as fixing some typos), so we have to have a way to exactly identify WHICH version of the language we are referring to. Theoretically the specification comes first, however in practice we usually have someone e.g. program a small language for internal use in a company, then that language becomes more popular and widespread and only then someone decides to standardize it and make the official specification. A specification describes things like syntax, semantics, conformance criteria etc., often using precise formal tools such as grammars. It's hugely difficult to make a good specification because one has to decide what depth to go to and even what to purposefully leave unspecified! One would think that it's always better to define as many things as possible, but that's naive -- leaving some things up to the choice of those who will be implementing the language gives them freedom to implement it in a way that's fastest, most elegant or convenient in any other way.

It is possible for a language to exist without an official specification -- the language is then basically specified by some of its implementations, i.e. we say the language is "what this program accepts as valid input". Many languages go through this phase before receiving their specification. A language specified purely by one implementation is not a very good idea because firstly such a specification is not very readable and secondly, as said, here EVERYTHING is specified by this one program (the language EQUALS that one specific compiler), so we don't know where the freedom of implementation is. Do other implementations have to produce exactly the same compiled binary as this one (without being able to e.g. optimize it better or produce binaries for other platforms)? If not, how much can they differ? Can they e.g. use different representation of numbers (may be important for compatibility)? Do they have to reproduce even the same bugs as the original compiler? Do they have to have the same technical limitations? Do they have to implement the same command line interface (without potentially adding improvements)? Etc.

Specification typically gets updated just as software does, it has its own version and so we then also talk about version of the language (e.g. C89, C99, C11, ...), each one corresponding to some version of the specification.

Now that we have a specification, i.e. the idea, someone has to realize it, i.e. program it, make the implementation; this mostly means programming the language's compiler or interpreter (or both), and possibly other tools (debugger, optimizer, transpiler, etc.). A language can (and often does) have multiple implementations; this happens because some people want to make the language as fast as possible while others e.g. rather want a small, minimalist implementation that will run on limited computers, others want an implementation under a different license etc. The first implementation is usually the so called reference implementation -- the one that will serve as a kind of authority that shows how the language should behave (e.g. in case it's not clear from the specification) to those who will make newer implementations; here the focus is often on correctness rather than e.g. efficiency or minimalism, though reference implementations are often among the best ones as they've been developed for the longest time. Reference implementations guide development of the language itself, they help spot and improve weak points of the language etc. Besides this there are third party implementations, i.e. those made later by others. These may add extensions and/or other modifications to the original language, so they spawn dialects -- slightly different versions of the language. We may see dialects as forks of the original language, which may sometimes even evolve into a completely new language over time. Extensions of a language may sound like a good thing as they add more "comfort" and "features", however they're usually bad as they create a dependency and fuck up the standardization -- if someone writes a program in a specific compiler's dialect, the program won't compile under other compilers.

A new language comes into existence just as other things do -- when there is a reason for it. I.e. if someone feels there is no good language for whatever he's doing, or if someone has a brilliant idea and wants to write a PhD thesis, or if someone smokes too much weed, or if a corporation wants to control some software platform etc., a new language may be made. This often happens gradually (again, like with many things), i.e. someone just starts modifying an already existing language -- at first he just makes a few macros, then he starts making a more complex preprocessor, then he sees it's starting to become a new language so he gives it a name and makes it a new language -- such a language may at first just be transpiled to another language (often C) and over time it gets its own full compiler. At first a new language is written in some other language, however most languages aim for a self hosted implementation, i.e. being written in itself. This is natural and has many advantages -- a language written in itself proves its maturity, it becomes independent and as it itself improves, so does its own compiler. Self hosting a language is one of the greatest milestones in its life -- after this the original implementation in the other language often gets deleted as it would just be a burden to keep maintaining it.

So can a language be inherently fast, bloated, memory efficient etc.? When we say a language is so and so, we generally refer to its implementations and our experience from practice because, as explained previously, a language in itself is only an idea that can be implemented in many ways with different priorities and tradeoffs. And not only that; even if we choose specific implementations of languages, benchmarking and comparing them is very complicated because the results will be highly dependent for example on the hardware architecture we use (some ISAs have slow branching, some lack the divide instruction, some MCUs lack a floating point unit etcetc., all of which may bias results heavily to either side) AND on the test programs we use (some types of problems may better fit the specialization of one language that will do very well at them while it would do much worse at other types of problems), the way they are written (the problem of choosing idiomatic code vs transliteration, i.e. performance will depend on whether we try to solve the benchmark problem in the way that's natural for the language or the way that's more faithful to the described solution) and what weight we give to each one (i.e. even when using multiple benchmarks, we ultimately have to assign a specific importance to each one). It's a bit like trying to say who the fastest human is -- generally we can pick the top sportsmen in the world but then we're stuck because one will win at sprint, another one at long distance running and yet another one at swimming, and if we consider even letting them compete in different clothes, weather conditions and so on, we'll just have to give up. So speaking about languages and their quantitative properties in practice generally means talking about their implementations and the practical experience we have.
HOWEVER, on the other hand, it does make sense to talk about properties of languages as such as well -- a language CAN itself be seen as inherently having some property if it's defined so that its every implementation has to have this property, at least practically speaking. Dynamic typing for example means the language will be generally slower because operations on variables will inevitably require some extra runtime checks of what's stored in the variable. A very complicated language just cannot be implemented in a simple, non-bloated way, an extremely high level and flexible language cannot be implemented to be among the fastest -- so in the end we also partially speak about languages as such because eventually implementations just reflect the abstract language's properties. How to tell if a language is bloated? One can get an idea from several things, e.g. list of features, paradigm, size of its implementations, number of implementations, size of the specification, year of creation (newer mostly means more bloat) and so on. However be careful, many of these are just heuristics, for example small specification may just mean it's vague. Even a small self hosted implementation doesn't have to mean the language is small -- imagine e.g. a language that just does what you write in plain English; such language will have just one line self hosted implementation: "Implement yourself." But to actually bootstrap the language will be immensely difficult and will require a lot of bloat.

Judging languages may further be complicated by the question of what the language encompasses, because some languages are e.g. built on a relatively small "pure language" core while relying on a huge library, preprocessor, other embedded languages and/or other tools of the development environment coming with the language -- for example POSIX shell makes heavy use of separate programs, utilities that should come with the POSIX system. Similarly Python relies on its huge library. So sometimes we have to make explicitly clear what exactly we count as part of the language.

Notable Languages

Here is a table of notable programming languages in chronological order (keep in mind a language usually has several versions/standards/implementations, this is just an overview).

language | minimalist/good? | since | speed | mem. | ~min. self-hosted impl. LOC | spec. (~no stdlib pages) | notes
"assembly" | yes but... | 1947? | | | | | NOT a single language, non-portable
Fortran | kind of | 1957 | 1.95 (G) | 7.15 (G) | | 300, proprietary (ISO) | similar to Pascal, compiled, fast, was used by scientists a lot
Lisp | yes | 1958 | 3.29 (G) | 18 (G) | 100 (judg. by jmc lisp) | 1 | elegant, KISS, functional, many variants (Common Lisp, Clojure, ...)
Basic | kind of? | 1964 | | | | | meant both for beginners and professionals, probably efficient
Forth | yes | 1970 | | | 100 (judg. by milliforth) | 1 | stack-based, elegant, very KISS, interpreted and compiled
Pascal | kind of | 1970 | 5.26 (G) | 2.11 (G) | | 80, proprietary (ISO) | like "educational C", compiled, not so bad actually
C | kind of | 1972 | 1.0 | 1.0 | 20K (lcc) | 160, proprietary (ISO) | compiled, fastest, efficient, established, suckless, low-level, #1 lang.
Prolog | maybe? | 1972 | | | | | logic paradigm, hard to learn/use
Smalltalk | quite yes | 1972 | 47 (G) | 41 (G) | | 40, proprietary (ANSI) | PURE (bearable kind of) OOP language, pretty minimal
C++ | no, bearable | 1982 | 1.18 (G) | 1.27 (G) | | 500, proprietary | bastard child of C, only adds bloat (OOP), "games"
Ada | ??? | 1983 | | | | | { No idea about this, sorry. ~drummyfish }
Object Pascal | no | 1986 | | | | | Pascal with OOP (like what C++ is to C), i.e. only adds bloat
Objective-C | probably not | 1986 | | | | | kind of C with Smalltalk-style "pure" objects?
Perl | rather not | 1987 | 77 (G) | 8.64 (G) | | | interpreted, focused on strings, has kinda cult following
Bash | well | 1989 | | | | | Unix scripting shell, very ugly syntax, not so elegant but bearable
Haskell | kind of | 1990 | 5.02 (G) | 8.71 (G) | | 150, proprietary | functional, compiled, acceptable
Python | NO | 1991 | 45 (G) | 7.74 (G) | | 200? (p. lang. ref.) | interpreted, huge bloat, slow, lightweight OOP, artificial obsolescence
POSIX shell | well, "kind of" | 1992 | | | | 50, proprietary (paid) | standardized (std 1003.2-1992) Unix shell, commonly e.g. Bash
Brainfuck | yes | 1993 | | | 100 (judg. by dbfi) | 1 | extremely minimal (8 commands), hard to use, esolang
FALSE | yes | 1993 | | | | 1 | very small yet powerful, Forth-like, similar to Brainfuck
Lua | quite yes | 1993 | 91 (G) | 5.17 (G) | 7K (LuaInLua) | 40, free | small, interpreted, mainly for scripting (used a lot in games)
Java | NO | 1995 | 2.75 (G) | 21.48 (G) | | 800, proprietary | forced OOP, "platform independent" (bytecode), slow, bloat
JavaScript | NO | 1995 | 8.30 (G) | 105 (G) | 50K (est. from QuickJS) | 500, proprietary? | interpreted, the web lang., bloated, classless OOP
PHP | no | 1995 | 23 (G) | 6.73 (G) | | 120 (by Google), CC0 | server-side web lang., OOP
Ruby | no | 1995 | 122 (G) | 8.57 (G) | | | similar to Python
C# | NO | 2000 | 4.04 (G) | 26 (G) | | proprietary (yes it is) | extremely bad lang. owned by Micro$oft, AVOID
D | no | 2001 | | | | | some expansion/rework of C++? OOP, generics etcetc.
Rust | NO! lol | 2006 | 1.64 (G) | 3.33 (G) | | 0 :D | extremely bad, slow, freedom issues, toxic community, no standard, AVOID
Go | kind of | 2009 | 4.71 (G) | 5.20 (G) | | 130, proprietary? | "successor to C" but not well executed, bearable but rather avoid
LIL | yes | 2010? | | | | | not known too much but nice, "everything's a string"
uxntal | yes but SJW | 2021 | | | 400 (official) | 2? (est.), proprietary | assembly lang. for a minimalist virtual machine, PROPRIETARY SPEC.
comun | yes | 2022 | | | < 3K | 2, CC0 | "official" LRS language, WIP, similar to Forth

NOTE on performance data: the speed and mem. columns give a benchmarked estimate of the running time/memory consumption in the best case (best compiler, best run, ...) relative to C (i.e. "how many times the language is worse than C"). The data may come from various sources, for example The Computer Language Benchmarks Game (G), own measurement (O) etc.

TODO: add "relative speed" column, make some kinda benchmark program and say how many times each language is slower than C

TODO: Tcl, Rebol

Interesting Languages

Some programming languages may be interesting rather than directly useful -- following this trail may lead you to more obscure and underground programming communities -- however these languages are important too as they teach us a lot and may help us design good, practically usable languages. In fact professional researchers in the theory of computation spend their whole lives dealing with practically unusable languages and purely theoretical computers. Even a great painter sometimes draws funny silly pictures in his notebook; it helps build a wide relationship with the art, and you never know if a serious idea can be spotted in a joke.

One such language is e.g. Unary, a programming language that only uses a single character while being Turing complete (i.e. having the highest possible "computing power", being able to express any program). All programs in Unary are just sequences of one character, differing only by their length (i.e. a program can also be seen just as a single natural number, the length of the sequence). We can do this because we can make an ordered list of all (infinitely many) possible programs in some simple programming language (such as a Turing machine or Brainfuck), i.e. assign each program its ordinal number (1st, 2nd, 3rd, ...) -- then to express a program we simply say the position of the program on the list.

There is a community around so called esoteric programming languages which takes great interest in such languages, from mere jokes (e.g. languages that look like cooking recipes or languages that can compute everything but can't output anything) to discussing semi-serious and serious, even philosophical and metaphysical questions. They make you think about what really is a programming language; where should we draw the line exactly, what is the absolute essence of a programming language? What's the smallest thing we would call a programming language? Does it have to be Turing complete? Does it have to allow output? What does it even mean to compute? And so on. If you dare, kindly follow the rabbit hole.

See Also


Powered by nothing. All content available under CC0 1.0 (public domain). Send comments and corrections to drummyfish at disroot dot org.