Saturday, June 4, 2011

Primacy of syntax

One of the most influential papers in the history of computer science is Christopher Strachey's 1967 "Fundamental Concepts in Programming Languages", his lecture notes for the International Summer School in Computer Programming, Copenhagen.  Strachey essentially created the academic subject of programming languages, and in those lecture notes you will find many of its core concepts laid out.  First-class objects.  L-values and R-values.  Polymorphism.  And so on.

Although Strachey did write a finished version of the lecture notes for publication in the proceedings of the summer school, those proceedings never materialized, so copies of the lecture notes were passed from hand to hand.  (The modern name for such a phenomenon is "going viral", but it's surely been happening since before writing was invented.)  Finally, on the twenty-fifth anniversary of Strachey's untimely death, the finished paper was published (Higher-Order and Symbolic Computation 13:1–2, 2000).

By that time, though, I'd spent over a decade studying my cherished copy of the raw lecture notes.  The notes consist largely of abbreviated, even cryptic, non-sentences, but their pithy phrases are often more expressive than the polished prose of the published paper, and sometimes ideas candidly expressed in the notes never made it into the paper at all.  One item in particular has stuck with me from the first moment I read it (about a quarter century ago, by now):
Basic irrelevance of syntax and primacy of semantics.
This seems to me to neatly capture an attitude toward syntax that became firmly rooted in the late 1960s and has scarcely been loosened since.  My own passion, though, within the academic subject Strachey created, is abstraction; and from my first explorations of abstraction in the late 1980s, it has seemed to me that abstraction defies the separation between syntax and semantics.  Sometime during the 1988/89 academic year, I opined to a professor that the artificial distinction between syntax and semantics had been retarding the development of abstraction technology for over twenty years.  The professor laughed, and after a moment distractedly remarked "that's funny" while heading off to their next appointment.  Nonplussed by that response (I was still rather in awe of professors in those days), I kept the rejoinder to myself: I wasn't joking.

Abstraction

A source-code element modifies the language in which it occurs.  Starting, say, with standard Java, one introduces a new class, and ends up with a different programming language: almost exactly like standard Java, but not quite, because it now has this additional class in it.  That is abstraction, building a new language on top of an old one.
Abstraction:  Transformation of one programming language into another by means of facilities available in the former language.
What makes it abstraction, rather than just an arbitrary transformation between programming languages, is that it uses the facilities of the pre-existing language.  (For those who prefer to trace their definitions back a few centuries: the key requirement that the new language be latent in the old, to be drawn out of it (Latin abstrahere), appears in John Locke's account of abstraction in An Essay Concerning Human Understanding; the relevant passage is the epigraph at the top of Chapter 1 of the Wizard Book, and also the epigraph at the top of Chapter 1 of my dissertation.)
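To make the idea concrete (a minimal illustration of mine; the names square and sum-of-squares are invented for the example, not anything standard): each top-level definition in a Lisp transforms the current language into a slightly larger one.

    ; Starting language: standard Scheme.  Evaluating this definition
    ; yields a new language, Scheme-plus-square, in which square is
    ; available just as if it were primitive.
    (define (square x) (* x x))

    ; Abstracting again, this time from the new language:
    (define (sum-of-squares x y)
      (+ (square x) (square y)))

Both steps use only facilities of the language they start from (define, procedure application), which is exactly what makes them abstraction rather than arbitrary transformation.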

An abstractively powerful programming language is, by my reckoning, a language from which one can abstract to a wide variety of other languages.  The precise formalization of that is a subtle thing, worthy of a blog entry of its own, especially since my formal treatment of it, WPI-CS-TR-08-01, gets rather dense in some parts.  For the current entry, though, we don't need those details; the key point here is that the result of the abstraction is another language.  This is in contrast to the denotational approach to programming languages (which Strachey helped create, BTW, and which is visible in nascent form in the 1967 lecture notes): denotationally, a programming language is essentially a function which, when applied to a source-code term, produces a semantic value of another sort entirely.
[Note:  I've since written an entry on abstractive power.]
The idea that semantic results are languages is a very powerful one.  The results of computations (and any other embellishment one wants) can be modeled by introducing additional sorts of terms that may occur in languages; the significance of each language is then fully represented by the possible sequences of terms that can follow from it.  But then, one doesn't really need the explicit semantics at all.  Only the sequences of terms matter, and one can define a programming language by a set of sequences of terms.  At that point, one could say that all semantics is being represented as syntax (which is a truism about semantics, anyway), or one could just as well say that semantics has vanished entirely to be replaced with pure syntax.
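A minimal formal sketch of that last move (simplified from the actual development in WPI-CS-TR-08-01, whose definitions are more elaborate): take a language L to be a set of finite sequences of terms drawn from a term set T, and let each sequence u transform L into the language of its possible continuations:

    \[ L \subseteq T^{*}, \qquad L/u \;=\; \{\, v \in T^{*} \;:\; uv \in L \,\} \]

Since L/u is itself a language, uttering u is itself an act of abstraction, and no separate domain of semantic values ever enters the picture.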

Lisp and syntax

Lisp has been described as a language with no syntax.  There is a sense in which that's true: if by "syntax" one means "syntax for representing programs rather than data".  In primordial S-expression Lisp, the only syntax exclusively represents data.  (To recap how that happened: McCarthy had originally envisioned a second kind of syntax, M-expressions, for representing programs, but he'd also described an algorithm for encoding M-expressions as S-expressions.  He expected to have years in which to polish the details of the language, since writing a compiler was then understood to be such a colossal undertaking; in the meantime his group hand-coded specific Lisp functions.  Then S.R. Russell hand-coded eval, and abruptly they had a working interpreter for S-expression Lisp.)
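For instance (following the translation rules of McCarthy's 1960 paper; notational conventions varied over the years):

    M-expression:   cons[car[x]; cdr[x]]
    S-expression:   (CONS (CAR X) (CDR X))

    M-expression:   cons[A; B]
    S-expression:   (CONS (QUOTE A) (QUOTE B))

Lower-case names with bracketed arguments become lists of upper-case atoms, while capital letters, which in M-expression notation stand for S-expression constants, come out QUOTEd.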

I believe, by the way, this is how Lisp should be taught to novices:  Teach them S-expression syntax first, set it firmly in their minds that such expressions are data, and only after that begin to teach them about evaluation.  Mike Gennert and I tried this approach with a class in the spring semester of the 1999/2000 academic year.  Over the duration of the course, we led the students through writing a Scheme interpreter in Java, starting with a level-0 "REPL" that was missing the "E": it would read in an S-expression, and just write it back out without evaluating it.  By the end of the term we'd added proper tail recursion.  The experiment as a whole wasn't as successful as we'd hoped, because the department's curriculum was in a state of flux just then and many of the students didn't already know Java; but we didn't have the sorts of problems I've seen, or heard others describe, due to novice students failing to think in terms of evaluating S-expressions.
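Though the interpreter in that course was written in Java, the level-0 loop is easy to sketch in Scheme itself (hypothetical code, not the course's; read-print-loop is my name for it):

    ; A "REPL" with no E: read an S-expression and write it back
    ; out unevaluated.  The point, for novices, is that what read
    ; returns is plain data.
    (define (read-print-loop)
      (display "> ")
      (let ((expr (read)))
        (if (eof-object? expr)
            'done                      ; stop at end of input
            (begin
              (write expr)             ; no eval here
              (newline)
              (read-print-loop)))))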

The connection to abstraction is immediate and compelling.  Abstraction is all about specifying how syntax will be used in subsequent program code, and the design of Lisp is focused on virtuoso manipulation of the very type of data structure (S-expressions) that is effectively the syntax of program code.  The more power Lisp gives the programmer to control how syntax will be interpreted, the more abstractive power accrues.  Fexprs greatly expand the programmer's direct control over the interpretation-time behavior of the evaluator, short of the "throw everything out and start over" tactic of running a meta-circular evaluator (a tactic that lacks stability); so fexprs should massively increase the already formidable abstractive power of Lisp.  That's why the subtitle of my dissertation is $vau: the ultimate abstraction.
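For a taste of what that direct control looks like, here are two definitions essentially as they appear in the Kernel report (Kernel rather than Scheme, since $vau is a Kernel primitive).  A combiner constructed by $vau receives its operands unevaluated, along with the caller's environment; even quotation and lambda then become library definitions rather than built-in syntax.

    ; $quote returns its operand without evaluating it; #ignore
    ; says the caller's environment isn't even bound.
    ($define! $quote ($vau (x) #ignore x))

    ; $lambda, derived: build a $vau that ignores its dynamic
    ; environment, then wrap it to induce argument evaluation.
    ($define! $lambda
      ($vau (formals . body) env
        (wrap (eval (list* $vau formals #ignore body)
                    env))))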

Note: Polymorphism

Strachey's 1967 lecture notes are best known, and most cited, for coining the term polymorphism.  Therein, he divided polymorphism into two forms: parametric polymorphism and ad hoc polymorphism.

The parametric/ad hoc distinction struck me as a logical bipartition of polymorphism; that is, every possible form of polymorphism would necessarily be either parametric or ad hoc.  Evidently Cardelli and Wegner did not interpret Strachey this way; the taxonomy in their 1985 survey "On Understanding Types, Data Abstraction, and Polymorphism" placed "inclusion polymorphism" outside of both parametric and ad hoc.

It also strikes me that the name ad hoc reflects real disapproval.  This goes back to the "basic irrelevance of syntax" remark.  At heart, parametric polymorphism is semantic, while ad hoc polymorphism is syntactic; one might be tempted to call them "semantic polymorphism" and "syntactic polymorphism", or even "good polymorphism" and "bad polymorphism".  There is a close connection between my perception that parametric/ad hoc reflects semantic/syntactic, and my perception that parametric/ad hoc was meant to be exhaustive.
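The flavor of the distinction shows up even in a latently typed setting (an illustration of mine, not Strachey's examples; my-length and gen-add are invented names):

    ; Parametric flavor: my-length never looks at the elements,
    ; so one uniform definition covers lists of every element type.
    (define (my-length xs)
      (if (null? xs)
          0
          (+ 1 (my-length (cdr xs)))))

    ; Ad hoc flavor: gen-add picks unrelated behaviors depending
    ; on what kinds of arguments it is handed.
    (define (gen-add a b)
      (cond ((and (number? a) (number? b)) (+ a b))
            ((and (string? a) (string? b)) (string-append a b))
            (else (error "gen-add: no applicable case"))))

Notice that the cond in gen-add quite literally selects a behavior.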

Although I expect parametric polymorphism should have greater abstractive power than ad hoc polymorphism, I don't like biasing terminology.  I'd like to see any formal results stand on their own merits.  So, years ago, I looked for an alternative to "ad hoc".  The best I came up with at the time:  selection polymorphism.