4. Towards A Language

In the next chapter, we will at last breach the world of programming and look at a real programming language, C#. This is one of the most highly-regarded programming languages around today. However, before we get to C# itself, the aim of the current chapter is to place it in a wider context as to 1) what a programming language is and 2) in particular what a ‘C-like’ programming language is.

Algorithms

I finish this section with a second metaphor for an algorithm, but let me begin with a first: an algorithm is a script, and code that uses an algorithm is an actor. Code reads from an algorithm as an actor reads from a script, for an algorithm is a canonical way of solving a problem. If you use an algorithm, you are wise to follow it to the letter.

Take as an example problem a computer screen. It is filled up with lines and each line needs to be drawn onto the screen and drawn fast. To solve the problem, enter the algorithm.

The problem here being? A line can be described algebraically by a linear equation, and that equation can be mapped onto cartesian coordinates. Such equations, though, deal in decimals rather than integers, and decimals take more time and space to process than integers do. Linear equations are therefore not ideal for computer graphics.

Enter Jack Bresenham in the year 1962, Jack being one of those ‘little giants’ who figured out one of the key pieces that became a secure part of Ymir the world giant. He invented the line algorithm that now bears his name, a method of drawing a line using only integers. His key insight was to see that pixels are effectively the cartesian coordinates of the computer display. The problem of line drawing is therefore getting from one pixel to the next. The algorithm, in a nutshell, goes from pixel to pixel, nudging the line in the right direction as it goes. If we advance for example from coordinate { 300 : 300 } to { 301 : 300 }, we need only a simple integer addition to move forward.
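To make that concrete, here is a minimal sketch of one common integer-only formulation of the idea, written in the C-like notation we will meet shortly (set_pixel is a made-up stand-in for whatever routine the graphics system actually provides):

#include <stdio.h>
#include <stdlib.h>

// Made-up stand-in for the real pixel-plotting routine.
static void set_pixel(int x, int y) {
  printf("pixel at %d,%d\n", x, y);
}

// Draw a line from (x0, y0) to (x1, y1) using integers only:
// nothing here but additions, comparisons and a doubling.
void draw_line(int x0, int y0, int x1, int y1) {
  int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
  int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
  int err = dx + dy;                          // running error term

  for (;;) {
    set_pixel(x0, y0);                        // light up the current pixel
    if (x0 == x1 && y0 == y1) break;
    int e2 = 2 * err;
    if (e2 >= dy) { err += dy; x0 += sx; }    // nudge one pixel across
    if (e2 <= dx) { err += dx; y0 += sy; }    // nudge one pixel up or down
  }
}

int main(void) {
  draw_line(300, 300, 310, 303);              // a short, shallow line
  return 0;
}

Note how each step forward costs only a comparison or two and an addition, which is exactly the cheapness the algorithm is prized for.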

The algorithm is an important one. So many lines are drawn onto the computer screen every second that shaving off even a tiny amount of processing time adds up (or, in this case, subtracts down). This illustrates a basic principle in programming: the more times an action is repeated, the greater the need for speed, and past a certain threshold the need becomes a requirement.

Algorithms can be thought of as the cartilage of programming, flexing its joints and helping it run. Fundamental algorithms form part of most programs because they are built into the frameworks those programs rely on. Behind the scenes, a simple line of code such as

DrawLine(300, 300, 400, 400)

will be using Jack Bresenham’s algorithm, or at least a variant of it.

Assembly

While algorithms can provide a global optimisation, assembly language can optimise individual tasks. It offers a local optimisation.

Although code written in high-level programming languages ends up as machine code via a compiler, compilers are generalists. Of course they aim to create good, clean assembly code, but it is obviously quite a task to produce the best possible code in every case. Enter the specialist assembly language programmer (ALP). The ALP has a huge advantage over the compiler because they can focus on the specific problem at hand. In the past a good ALP was a better bet than any compiler. For music or audio processing, say, an ALP was a must in order to optimise the code as far as possible.

However, as we know by now, the giants of Ymir are always busy, and among them are the giants of the compiler workshop. As a result, today’s compilers are masterpieces of entwork. Only a supreme assembly language programmer could hope to improve on the assembly code produced by a modern compiler. There are still edge cases where a human can outshine the giants, but generally speaking only a fool would chance their hand these days at assembly code in the hope of improving on the compiler’s output.

In the far-off days of old, though, going low-level was an achievable aim. So, C.

C

C, it should not be surprising, is based on a language called ‘B’, which in turn was based on the now largely forgotten BCPL (alas, there never was an ‘A’). C was developed in 1972 by one of the giants of computing, Dennis Ritchie (he even got to be on first-name terms with Ymir, who, although he only has one name, likes to be called Mr. Ymir by the likes of us). Anyway, C is undoubtedly one of the great programming languages.

A basic C program looks like this:

int main() {
  // do stuff
}

Once you have familiarised yourself with programming, you will immediately recognise this as a ‘C-like’ language by those {} curly braces. C is a high-level language with a deliberately concise syntax. Before C, programming languages tended to aim for a chatty, English-like appearance that made the code, it was thought, more readable.

Here is a simple Pascal example (Pascal being a language designed by Niklaus Wirth in 1970):

function Add(X, Y: Integer): Integer;
begin
  Add := X + Y;
end;

Here is the much more concise C equivalent:

int add(int x, int y) {
  return x + y;
}

C is of particular interest here as it is a high-level programming language that goes low-level with a feature called ‘pointers’, which constitute a first-class part of the language. A pointer looks like this:

int* ptr = &value;

A pointer points to a memory location on the computer and is how C bridges the gap between a high- and a low-level language. So in our example, &value is the memory location (the ‘address’) of the variable value, and ptr stores that address. Writing *ptr then follows the pointer, giving us whatever is stored at that address. Pointers allow the programmer to do assembly-language type things in a high-level language and to communicate with the machine at the lowest level. (Pointers also have the uncanny ability to make C code look like Martian.) Anyway, C had all the high-level features other languages had, but it was pointers that made it C.
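To see a pointer earning its keep, here is a minimal sketch (the variable names are invented for the example; the code is plain C, though it compiles unchanged as C++):

#include <stdio.h>

int main(void) {
  int value = 42;         // an ordinary integer, sitting somewhere in memory
  int* ptr = &value;      // ptr stores the address of value

  printf("%d\n", *ptr);   // *ptr follows the pointer: prints 42

  *ptr = 99;              // writing through the pointer changes value itself
  printf("%d\n", value);  // prints 99

  return 0;
}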

In the long run though, what with imitation being the sincerest etc, it is a testament to C that it became not just a language but a type of language. A lot of people it seems just liked those curly braces.

C-like Languages

C-like languages include C++, Java (and its successor Kotlin), JavaScript and C#. That is an impressive collection of descendants.

Here is our add code in JavaScript:

function add(x, y) {
  return x + y;
}

Here is a Kotlin equivalent:

fun add(x: Int, y: Int): Int {
  return x + y
}

Spot the difference?

C itself began, all of a sudden, to look old when a group of giants invented the object. We will discuss objects in detail in the next chapter, but for now we can say that an object is a way of grouping code into the expression of a thing. A common example used to explain objects is an Animal. In the real world things tend to be types of things, and so a Mammal is a type of animal and a Cat is a type of mammal. So: animal → mammal → cat.

If we code for an object called ‘Animal’, we can then create another object called ‘Mammal’ that inherits from the ‘Animal’ object. A ‘Cat’ object can likewise inherit from the ‘Mammal’ object (and therefore indirectly from an Animal object).

C does not do objects, so Bjarne Stroustrup invented a C-like language called C++ that did and does do objects. Here are our objects in C++:

class Animal {
};

class Mammal : public Animal {
};

class Cat : public Mammal {
};

Class? An object is a thing; a class is what defines an object. So in your code you would define a Cat class and then create Cat objects from it. As an analogy, we can say that the FA Cup Final is played between ‘two teams’, but a specific final is played between ‘Liverpool’ and ‘Arsenal’. Here ‘team’ is the class, ‘Liverpool’ an object.
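Sticking with our C++ classes, here is a minimal sketch of the distinction (the cat names are, of course, invented for the example): the class is the definition, and each variable declared from it is an object.

#include <iostream>

class Animal {};
class Mammal : public Animal {};
class Cat : public Mammal {};

int main() {
  Cat felix;    // felix is an object: a concrete Cat built from the Cat class
  Cat tiddles;  // a second, entirely separate object built from the same class

  std::cout << "two cats, one class" << std::endl;
  return 0;
}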

C++ is still a go-to language today because it does all of the low-level C things while offering the advantages of objects. But it is a verbose language: masses of code are required to do common tasks. It remains popular, but it is no longer the default choice, and many would say it is ‘showing its age’ (like C itself).

A big issue with C and C++ is memory. These languages (precisely because they facilitate low-level coding, which is also their great strong point) require careful memory management. All computer programs have commands and data, and all data lives at some memory location. What about when the program no longer needs the data? At that point, the memory holding it should be released. In C and C++, programmers are responsible for deallocating that memory themselves. With this responsibility comes great power, but does the average program need it? If not, why must the poor programmer needlessly shoulder a responsibility which is so obviously a recipe for disaster?
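Here is a minimal sketch of that responsibility, using C’s malloc and free (C++ inherits these and adds its own new and delete):

#include <stdlib.h>

int main(void) {
  // Ask the system for enough memory to hold 1000 integers...
  int* data = (int*)malloc(1000 * sizeof(int));
  if (data == NULL) return 1;  // the request can fail, and we must check

  data[0] = 42;                // use the memory

  free(data);                  // ...and hand it back, by hand
  // Forget the free() and the memory 'leaks';
  // free() it twice, or use it after freeing, and far worse things happen.
  return 0;
}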

Enter Java (designed by James ‘the giant’ Gosling) and meet the garbage collector. In a Java program, all memory references are tracked by the Java runtime. The runtime regularly checks whether data is still being referenced. Every so often, memory that is no longer referenced is declared ‘garbage’ in a great sweep (a ‘garbage collection’) and is cleared away. The programmer, therefore, does not have to worry about ‘leaking’ memory. Java does it all for you.

Another innovation of Java is that it is not compiled into machine code (or assembly language). Java runs via the ‘Java Virtual Machine’, which lives in a world of bytecode. Bytecode is specific to the Java platform but is essentially a generic form of assembly language, and it is this bytecode that gets turned into machine code. What this means is that, because the same bytecode runs everywhere, Java can run on any operating system: the virtual machine translates the bytecode into the machine code of whatever system it finds itself on, be it Windows, Mac or Linux.

Java, then, is a C-like language that is nothing like C. It is not a hybrid low-high-level language or a replacement assembly language. It is a modern language. Or was. Even Java is now ‘showing its age’.

Kotlin is the new Java, and is a language strongly influenced by C#. Kotlin is, I think, a big improvement over Java and is certainly a good deal more concise. The interesting thing about Kotlin here is that it is compiled into the very same bytecode that Java is, slotting itself into Java’s chain and indicating the profound changes in computing technology over the years:

THEN    C -> machine code
NOW     Kotlin -> bytecode -> machine code (via the virtual machine)

Those are the C’s, then. Next let us take a look at .NET, the natural habitat of C#, which first entered the world at the beginning of 2002.

.NET and its Ancestors

The story of .NET begins, in this narrative at least, with the release of Visual Basic way back in 1991. This was a landmark release that revolutionised software development — WYSIWYG programming, who’d’a’thunkit? What you got with VB was a language (VB itself) and a designer. You created a screen (called a ‘form’) and then wrote code to respond to button clicks and checks and mouse moves and so on. You would also, of course, write non-user-interface code for all the general functionality that your program required. The clue to the intent of VB lies in the language chosen — BASIC, the more or less toy language often used to teach children how to program. VB was not a toy and it was not BASIC, but it was a limited language. It was a brilliant piece of technology — revolutionary, as I said — but the limitation was all too obvious.

Two years later, in 1993, Microsoft released the first version of Visual C++. This, the company seemed to say, was for proper programmers. As we have seen, C++ is a C-like language equally at home at a low or a high level of coding. VC++ also shipped with the MFC (‘Microsoft Foundation Classes’), its helper library. Still, helper library or not, VC++ is verbose. For a low-level language, the argument went, that was a necessary burden for the programmer, and VC++ did indeed offer a level of control inconceivable with VB.

By 1993, then, Microsoft had the over-simplified VB and the over-complicated VC++. So, meet Delphi from Borland, released two years after VC++, in 1995, together with its VCL (‘Visual Component Library’). To my mind this is one of the finest pieces of software ever released, and it sat in the perfect middle ground between VB and VC++. Rather than BASIC, Delphi used Pascal, and no one would deny that Pascal is a better general-purpose language than BASIC. The Delphi dialect was Borland’s own Object Pascal. While VB was object-based, its implementation of ‘Object-Oriented Programming’ (to be discussed later) was limited; Object Pascal was fully OOP. Moreover, Delphi had the beautifully-designed VCL, which aimed to cover about 80% of development needs, leaving the power of the Object Pascal language to cover the rest. Finally, there was a RAD (‘Rapid Application Development’) screen designer tied into a well-thought-out hierarchy of screen (‘form’) objects. The major player behind all this was another of the all-time giants, Anders Hejlsberg. Poached by Microsoft, he went on to design C#.

Lastly, a brief mention of COM (‘Component Object Model’), another Microsoft technology that first appeared in 1992. This technology tried to solve the problem of sharing data and functionality between Windows programs. Using COM, me.exe could, as it were, ‘merge’ with you.exe, and ‘me’ code could run inside ‘you’ code. COM was clever, COM worked, but it was not pleasant to work with. Like VB, it was kind of good and kind of bad in equal measure.

A question worth asking, then, is ‘Why didn’t Delphi sweep the board?’ Delphi was, at least it seems to me, so clearly superior to either VB or VC++ (though VC++ could always make a plausible claim to be better at C-type stuff) that it ought to have done more. Well, that Borland was a much smaller company than Microsoft must have been a factor. But I wonder if another issue was with Pascal, which is not a mainstream language. Although Pascal is a very clean language, and the VCL is very well-designed, a Delphi program is necessarily much more verbose than one written in a concise C-like language.

When the .NET Framework appeared, any Delphi developer would have recognised the VCL in WinForms. In a sense, .NET was a supercharged Delphi. But there at its core was the clean and concise C#, not the verbose Pascal.

.NET, m’okay?

I’ve always thought, since its first release, that .NET is a magnificent technology. I also remember being greatly surprised. At first glance, it seemed to me a mix of Java, COM, VB and Delphi. But, unlike COM and VB, it was good. Not just ‘it works’, or ‘loads of devs use it’. It was elegant through and through.

Like Java, .NET Framework languages didn’t touch machine code. Instead of bytecode, .NET used the Intermediate Language (IL). However, .NET was not a ‘universal’ platform. It was Windows-only. The purpose of IL was to unify all the .NET languages. Whatever language you used to code in, the result was always IL. Any language could be used in any project.

The .NET Framework was a world of assemblies, each of which was pure IL. The framework libraries were assemblies, your program was an assembly, everything was an assembly. Everything now being IL, COM was redundant. .NET did the job of COM, but did it far better. Now me.exe and you.exe, both rewritten in .NET, could communicate seamlessly with each other. Both me.exe and you.exe were, after all, assemblies, and both therefore IL.

// this code is in me.exe
using yousoft.you; // a namespace from the you.exe assembly

. . .

// me.exe is calling code in you.exe
mecode = youexe.foo();

. . .

// this code is in you.exe
using mesoft.me; // a namespace from the me.exe assembly

. . .

// you.exe is calling code in me.exe
youcode = meexe.foo();

C# at first glance looked like Java, another C-like language spitting out code for a virtual machine, but it wasn’t the same: its IL was Windows-only. Curiously, though, with the newer .NET Core, things are still IL, but the runtime now behaves much like Java’s Virtual Machine. With .NET Core, IL can be compiled into Windows, Mac and Linux machine code, and so .NET programs (should we call them ‘apps’ nowadays?) can run on Macs and Linux boxes. This has certainly expanded .NET’s reach, though I suspect the root cause is the ever-decreasing reach of Microsoft itself.

.NET today is a mature (nearly a quarter of a century old!) platform that (with .NET Core) is adapting to a rapidly-changing development environment, what with the Internet and the tablets and phones and watches all encroaching on Microsoft’s safety zone, the Windows PC. The monolithic Framework is gone, replaced by a deliberately modular architecture. The general principle nowadays is that, if you need to foo something, you just choose the tool you think foos best. This might be Gengal or Broobant or Woolp or any of the other foo frameworks out there. The one stable thing, though, seems to be C#. .NET Core pretty much is C# these days, which sounds to me like a lead-in to the next chapter.

Summary

C-like languages begin with the eponymous ‘C’, a sort of low-high-level hybrid. Then ‘Hello, objects!’ and objects came to C via the good offices of C++. Next came Java, a language with a C-like syntax that removed the low-level stuff. Last but not least is C#, the core language of .NET, a language that builds in one way or another on each of these earlier C’s.

