The World Wide Web entered the world it is named after in the year 1993, though internal development (within particle-colliding CERN) began back in 1989. As this momentous event happened over three decades ago, it seems not quite right to call it a revolution. No revolution lasts that long. Perhaps we can better think of it as an ‘evolving revolution’, its basic principles leading to a continuous sequence of revolutionary changes.
Whatever we call it, no one would deny that the www has revolutionised not only computing but its namesake, the world.
In computing terms, the www has changed the very nature of computing: as web sites become more like programs, a program runs less and less for a computer and more in a computer, since web sites run in a web browser that itself runs on a computer. A web site can be viewed on a PC, a tablet, a phone, a watch or a TV. Same site, different worlds. In the Overton Window of computing, this is perhaps nowadays the worlds wide web.
So what, we ask, were the origins of the www revolution?
Protocol + Language + Reader
There was an internet before the www. The teletype (TTY) was an interactive system, but much of the early internet was focused on files. Files could be exchanged via FTP (‘File Transfer Protocol’), communications could be exchanged via electronic mail (in which files were sent to the specified email address) and via message boards, where messages (in other words, files) were sent to a message board server and read by message board members.
These were the environs in which our revolution took place, and the revolution revolved around a language (HTML, ‘Hypertext Markup Language’), a protocol (HTTP, ‘Hypertext Transfer Protocol’) and a reader (the browser). Underlying all these was hypertext.
Hypertext is built upon the brilliant insight that computer text is potentially more flexible than everyday text in a book or a newspaper. Computer text can link to other texts and thereby build up an information network. The problem before HTML lay in building up such a network. Hypertext software ran on a computer and the information network had to be contained within the hypertext program. This ensured that the network was always limited.
So to HTML and markup. We have already met the markup language XML. It is derived from SGML (‘Standard Generalized Markup Language’), a meta-language used to define other markup languages. HTML (like XML) was defined using SGML.
HTML had two main features. First, it contained markup elements with which to create and format fairly sophisticated documents. Second, there was the anchor and its href (‘hypertext reference’), which is where we find the hypertext.
Here is a simple fragment of ‘early-HTML’ :
<!--
    The 'table' provides a flexible way
    of structuring page content.
-->
<table>
    <tr> <!-- table row -->
        <td> <!-- table data (ie table column) -->
            <!-- specify the look and feel of the text -->
            <font color="lime" face="Arial">
                <!-- paragraph of text -->
                <p>Interesting fact:</p>
            </font>
        </td>
        <td> <!-- second column in this table -->
            <!-- hypertext anchor with 'reference' (href) -->
            <a href="http://www.interesting.org/fact.htm">
                <!-- 'reference' (hyperlink) text -->
                Hypertext!
            </a>
        </td>
    </tr>
</table>
The second plank of the web was HTTP. This was how .htm pages were delivered from the server to the client.
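At its simplest the exchange looks something like this, a plain-text request from the browser followed by the server’s reply (the address is the illustrative one from the fragment above, and the headers are pared to a minimum) :
GET /fact.htm HTTP/1.0
Host: www.interesting.org

HTTP/1.0 200 OK
Content-Type: text/html

<html> ... the requested page ... </html>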
The third plank was the reader program for the markup code. This is now universally known as a browser.
The job of the browser was to read the HTML markup and display the page according to the markup’s layout and styling instructions. HTML allowed quite sophisticated layouts, not unlike those of a basic word-processed document. Above all, of course, the browser handled the hyperlinks within a page’s anchor tags. When the user clicked a link, the browser sent an HTTP request to the specified web address and the server duly responded with the requested page. This was read, parsed and finally displayed in the browser, replacing the page that requested it.
It took a little while for the web to catch on because at first it suffered from the same restriction as the earlier hypertext programs, a limited information network. But whereas the limitation of the old programs was fixed, the web could expand indefinitely.
One phenomenon that quickly grew out of the web was the site. Over at ‘www.interesting.org’ and its facsimiles, it became apparent that hyperlinks could connect not only remote computers but also files on the local server, as shown below. A traditional web site is in essence a set of linked files on a local server. Sites themselves could become information networks devoted to specialist subjects. The information network of the web as a whole grew exponentially in a short space of time, so much so that the www became known simply as ‘The Internet’.
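For example, the same anchor tag serves for both (the local file name here is invented for illustration) :
<!-- a link to a page on a remote server -->
<a href="http://www.interesting.org/fact.htm">A remote fact</a>

<!-- a link to another file on the same (local) server -->
<a href="another-fact.htm">A local fact</a>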
Client Revolutions
There was room for improvement, however, and so began the relentless sequence of ‘evolving revolution’ that has characterised The Internet ever since the www began.
First, it quickly became apparent that the <font> tag was a mistake. Ideal for an individual web page or the small sites of the early web, it led to a maintenance nightmare for larger sites. Changing the style of a site, or keeping styles consistent, was needlessly hard.
The giants therefore invented Cascading Style Sheets (CSS). The revolution here was that styling a document was now essentially external to the document (though styles could still be embedded if preferred).
With CSS, styling text became easy to do and far more maintainable :
p {
    color: lime;
}
Including this stylesheet in an HTML page would style every <p>aragraph. But as that is somewhat inflexible, stylesheets also have what are called (as per OOP) classes :
.lime-colour {
    color: lime;
}
. . .
<p class="lime-colour">Yuck!</p>
You can also style individual tags by applying a style to an HTML id (ids must be unique to a page) :
#yuck {
    color: lime;
}
. . .
<p id="yuck">Yuck!</p>
You can even embed styles directly into a tag :
<p style="color: lime;">Yuck!</p>
The giants also introduced a second revolution with the ability to script web pages. Scripting was discussed in chapter two, so suffice it to say here that JavaScript activates web pages. The structure of the HTML is analysed into a DOM (‘Document Object Model’) and scripts operate on this DOM. Nodes can be added to and removed from the DOM using scripts, which can also modify any existing node. There is also a BOM (‘Browser Object Model’) that allows scripts to interact with the browser itself.
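By way of a sketch (the text and the logging here are invented for illustration), this is script working first on the DOM and then on the BOM :
<script>
    // DOM : create a new paragraph node and add it to the document
    const para = document.createElement('p');
    para.textContent = 'Added by script';
    document.body.appendChild(para);

    // DOM : modify the node, then remove it again
    para.textContent = 'Modified by script';
    document.body.removeChild(para);

    // BOM : the browser itself, exposed to script
    console.log(window.location.href); // the address of the current page
    console.log(window.innerWidth);    // the width of the browser window
</script>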
Another significant innovation was Ajax (‘Asynchronous JavaScript and XML’). This allowed scripts to download data from the web server. It was a big deal because it made server communication granular, or perhaps even modular. Before Ajax, the browser requested web pages and the context of the web was therefore the page itself. With Ajax, a script could fetch data for any part of the page from the server without reloading the page.
Pages could now be modified via either the client (by using script working purely within the browser itself) or the server (by making an Ajax call). This is illustrated in the code below :
<p id="placeholder">Placeholder Text</p>
. . .
<script>
    // this code searches for the above tag in the DOM
    // using its unique id (ids are selected with a '#' prefix)
    const placeholder = document.querySelector('#placeholder');
    // CLIENT-BASED mod (code runs on the client)
    placeholder.innerHTML = getReplacementText(); // local js code
    // SERVER-BASED mod (code calls the server via Ajax)
    placeholder.innerHTML = getReplacementTextFromAjax();
</script>
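In reality an Ajax call is asynchronous, so a helper like the getReplacementTextFromAjax above (a name the example invents for illustration) would deliver its result via a callback or a promise rather than a plain return value. A minimal sketch using the modern fetch API, assuming a server endpoint at ‘/replacement-text’ :
<script>
    // fetch() returns a promise; the page carries on while the request is in flight
    async function replacePlaceholderFromServer() {
        const response = await fetch('/replacement-text'); // assumed server endpoint
        const text = await response.text();
        document.querySelector('#placeholder').innerHTML = text;
    }
    replacePlaceholderFromServer();
</script>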
Server Revolutions
With Ajax, we arrive at the server. In chapter two we described the revolution introduced by the gateway on the server. ‘Gateway’ was the term of art in the days of the CGI (‘Common Gateway Interface’) and a scripting language called Perl. The term is, I think, still useful as a metaphor, but it is rarely used today, when it is enough to refer to the ‘server’ or ‘server-side’.
Microsoft’s earliest attempt to control the server was ASP (‘Active Server Pages’, an archaic name dating from the time when the company was excited about the COM term ‘Active’, tying in as it did with the then-new ‘ActiveX’ technology). ASP was closely tied to Visual Basic, via its scripting dialect VBScript. In ASP you would write code like this :
<p id="placeholder">
    <%
    Response.Write "Replaced!"
    %>
</p>
Long superseded by ASP.NET, in its time ASP marked a significant advance in server-side programming. Unlike the CGI and Perl dynamic duo, ASP was a unified environment, tying in for example with Microsoft’s ADO (‘ActiveX Data Objects’, that ‘active’ again) technology, which allowed ASP pages to connect easily to databases.
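A sketch of what that ease looked like in practice (the data source name and table are invented for illustration; the ADO objects themselves are real) :
<%
' open a database connection via ADO and write out each row of a query
Set conn = Server.CreateObject("ADODB.Connection")
conn.Open "DSN=InterestingFacts"
Set rs = conn.Execute("SELECT Fact FROM Facts")
Do While Not rs.EOF
    Response.Write rs("Fact") & "<br>"
    rs.MoveNext
Loop
rs.Close
conn.Close
%>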
In today’s world, we have reached the stage of .NET Core and ASP.NET MVC (‘Model View Controller’) or API (‘Application Programming Interface’) REST (wait for it, ‘Representational State Transfer’) sites.
Hold on, I’ll repeat that bit without the meaningless ‘keys’ to the TLAs.
In today’s world, we have reached the stage of .NET Core and ASP.NET MVC or API REST sites. Because Core code is compiled from IL into the machine code of the host operating system, ASP.NET sites are no longer tied to Windows. Moreover, there is now a lightweight Core web server called Kestrel which also runs on the host OS. Kestrel can sit behind the host’s main web server (acting as a reverse proxy) or run on its own, and because Core sites work directly with Kestrel, not the main web server, they will work on any system Kestrel does. (Before Core and Kestrel, ASP.NET sites ran only on Windows machines and were mostly deployed to Microsoft’s main web server software, Internet Information Services.)
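To give a flavour, here is a minimal sketch of such a site, assuming .NET 6 or later (the route and the message are invented for illustration); whatever the host OS, Kestrel is the server it runs on :
// Program.cs : the whole of a minimal ASP.NET Core site
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// a single route returning plain text
app.MapGet("/", () => "Hello from Kestrel");

app.Run(); // start Kestrel and listen for requests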
A .NET site can also choose the technology to power the site. MVC and API have been mentioned, but there is also React.js (a Facebook library that reimagines the page as JavaScript components), which in turn is often paired with TypeScript (a Microsoft language that adds static typing to JavaScript, reining in its wilder habits). What a web application is and does is becoming less clearly defined. API sites do not necessarily have anything to do with either hypertext or HTML: they return data, often in JSON format, sometimes via more sophisticated technologies such as GraphQL (a query language, also from Facebook).
MVC creates traditional HTML web sites but even so reimagines their creation. With MVC the actual HTML is now called the View. The Model is a C# class holding the data displayed in the view. The Controller handles requests for pages (that is, views) and is responsible for sending the right page back to the client (that is, the browser).
Here is an example of a controller :
public class MontyPythonController : Controller
{
    [Route("[controller]/[action]")]
    public IActionResult MontyPython()
    {
        MontyPythonModel model = new();
        model.Message = "spam spam spam spam spam";
        return View(model);
    }
}
The view in this example is called ‘MontyPython’ and illustrates the MVC naming convention. The controller is called ‘MontyPythonController’ and the model ‘MontyPythonModel’. The view will be named ‘MontyPython.cshtml’ (‘C# html’).
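The model class itself is not shown here; a minimal version consistent with the controller and the view might look like this (a sketch rather than anything canonical) :
public class MontyPythonModel
{
    // the single piece of data the view displays
    public string Message { get; set; } = string.Empty;
}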
When a request for the Python page reaches the web server, it will be routed to the controller. This will create the model and pass it to the view, which is another server-side file that mixes client-side code (HTML, CSS, JavaScript) with server-side C#. Microsoft calls this hybrid technology Razor. It is Razor files that are given that ‘cshtml’ extension.
The ‘MontyPython.cshtml’ page/view might look something like this :
@model MontyPythonModel
@{
    <p>@Model.Message</p>
}
As you can see, the model has been passed to the view, which outputs the ‘spam’ message. Using this web application, any number of Python quotes could be sent to the view, and the site could easily be expanded to include a FawltyTowers view, though please, not a Yellowbeard view.
The MVC architecture blurs, from the programmer’s point of view, the distinction between server and client :
<script>
    var x = 1; // JavaScript
    @{
        var y = 1; // Razor (C#, runs on the server)
    }
    // ERROR : here y is an undeclared js variable!
    x = y;
    // OK : but @y is evaluated on the server, so what the client
    // receives is the literal statement x = 1. That may be what is
    // intended, but although it looks like a variable assignment
    // (x = y), on the client it is *not* a variable assignment.
    x = @y;
</script>
The gateway is still there in fact but is now almost invisible in code.
Another curious side-effect of this apparent hybridity is the appearance of illusorily concise code :
@{
    foreach (var stuff in Model.Stuffs)
    {
        @Html.Raw(stuff.Html)
    }
}
This server code looks concise, but what is in that ‘Html’ property? If the HTML is wordy and there are 100 or, say, 1,000 items, then go to ‘View page source’ (in Chrome, or the equivalent in your favoured browser) and you will get an unpleasant surprise.
Distributed / Cloud
Moving from the apparent merging of client and server, it is appropriate to finish with a brief note on distributed computing and the cloud.
A client/server relationship is traditionally between computer and computer. One computer is the client and the other is the server. In distributed computing, many machines are involved. How many may vary, but by definition it is ‘many’.
The latest web trend is programming for this distribution. Let us take for example a process distributed over three machines: machine X, machine Y and machine Z. Let us say that a key piece of data is the Turtle class (it is, after all, turtles all the way down). There is a turtle called Bertha and the Turtle class stores all the information about Bertha. Should machines X, Y and Z share information about her? One influential view responds with an emphatic ‘No!’. Sharing might lead to lost data or, perhaps worse, corrupted data. Each machine needs to be separate from each of the others and in that way preserve the integrity of its data. Even if one or even two machines fail, the surviving information about Bertha on the third machine will have retained its integrity.
This loose and unlocalised distribution of data seems to be where at least a great part of computing is heading, a nebulous mass of machines anchoring The Internet. No one machine can afford to be an independent voice; it must be part of a larger choir. Data must retain its integrity, but it must do so inside the choir. Even to set ‘x’ to ‘1’ is a hard problem, for the simplest of data still needs data integrity.
A cloud, meanwhile, is precisely a nebulous mass of machines. Nebulous means ‘cloudy’. Data in the cloud is lost in the cloud. Programming is programming for the cloud. Programs run on any machine in the cloud.
What is programming? In the cloud, where the program itself leaps from machine to machine, perhaps we see the beginning of a new answer to that question, and the beginning of a new computing.
At this new beginning, then — we end.