Friday, December 20, 2013

Abstractive power

each extensible language is surrounded by an envelope of possible extensions reachable by modest amounts of labor by unsophisticated users.
Thomas A. Standish, "Extensibility in Programming Language Design", SIGPLAN Notices 10 no. 7 (July 1975) [Special Issue on Programming Language Design], p. 20.

I said in an earlier post I should blog about abstractive power "eventually".  This is it.

This material is very much a work in progress.  Throughout this post I'll emphasize insight and intuition — but while the post starts out non-technical, it will get more mathematical by increments as it goes along.  The relation between the math and the insights works both ways:  insights into abstraction guide the mathematical development, and the mathematical development is pursued partly in hopes of eventual further insights into abstraction.  I also hope, by presenting the mathematical development in an intuitive form (as opposed to the much drier form in my 2008 techreport), to get insights into the mathematical development.  The post ends, as does the current state of the work, with an elementary test case to show feasibility.

Contents
The idea
The goal
There is no semantics
The second derivative of semantics
Recasting expressiveness
Abstractiveness
Test case
The idea

The extensible languages movement peaked around 1970, and was on its way out when Standish wrote the above.  Extensibility enthusiasts had hoped, frankly, that by means of language-extension mechanisms it would become possible for everyone to use a single base language and transform it into anything anyone needed for any particular purpose.  Standish was noting that the extension mechanisms primarily used by the movement — macro preprocessors and perhaps the ability to add new syntax rules — had a limited range, after which it became quite difficult to extend the language further.

Macro preprocessing, in particular, cannot easily be used to build a series of extensions, one on top of another, because as extension follows extension, the programmer is rapidly overcome by accumulating complexity.  In order to use a macro, you have to be able to see whatever underlying facility the macro uses.  Thus, to add a second layer of macros to a base language, you have to understand and account for the base language and all of its first layer of macros; to add a third layer of macros, you have to understand the base language, the first layer of macros, and the second layer of macros; and so on.  The visibility of the underlying layers also limits how different the extended language can be from the base language.
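
To see the accumulation concretely, here is a minimal sketch in Python of purely textual macro layers; everything in it (the macro names, the toy base language) is invented for illustration, not drawn from any actual preprocessor.  The point is that predicting what the layer-two macro does requires seeing through layer one to the base language.

```python
# A toy model of layered textual macros: each layer is a dictionary of
# string-rewriting "macros".  All names here are invented for illustration.

def expand(layers, program):
    """Repeatedly apply every layer's rewrites until none applies."""
    changed = True
    while changed:
        changed = False
        for macros in layers:
            for name, template in macros.items():
                if name in program:
                    program = program.replace(name, template)
                    changed = True
    return program

layer1 = {"LOG": "PRINT time; PRINT"}   # defined in terms of the base
layer2 = {"AUDIT": "LOG user; LOG"}     # defined in terms of layer 1

# To predict this output, one must see through AUDIT *and* LOG to the base:
print(expand([layer2, layer1], "AUDIT x"))
# PRINT time; PRINT user; PRINT time; PRINT x
```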

The extensibility movement was supplanted by the abstraction movement, which had a more semantic focus, and came to be dominated — at least for a while — by the Object-Oriented Paradigm.  Something of the spirit of the new movement is visible in this remark on lexical scoping from Steele and Sussman's 1978 The Art of the Interpreter; or, The Modularity Complex (p. 24):

What is interesting about this is that we can write procedures which construct other procedures.  This is not to be confused with the ability to construct S-expression representations of procedures; that ability is shared by all of the interpreters we have examined.  The ability to construct procedures was not available in the dynamically scoped interpreter.  In solving the violation of referential transparency we seem to have stumbled across a source of additional abstractive power.

Abstraction helps with the problem of accumulating complexity, because you can — at least, ideally — use an extension without having to worry about all the details of what underlies it.  There is still some accumulated complexity, though.  I noted this in my earlier blog post on types:

In mathematics, there may be several different views of things any one of which could be used as a foundation from which to build the others.  That's essentially perfect abstraction, in that from any one of these levels, you not only get to ignore what's under the hood, but you can't even tell whether there is anything under the hood.  Going from one level to the next leaves no residue of unhidden details: you could build B from A, C from B, and A from C, and you've really gotten back to A, not some flawed approximation of it that's either more complicated than the original, more brittle than the original, or both.
The central point of that blog post is that typing, which is evidently meant to help us manage complexity, can easily become itself a source of complexity.  The same danger applies to other tools we use to manage complexity; the tools become effectively part of the language, and are thus added complexity and subject to accumulation of further complexity.

Another four decades of experience (since Standish's post-mortem on the extensible languages movement) suggests that all programming languages have their own envelopes of reachable extensions.  However, some languages have much bigger, or smaller, envelopes than others.  What factors determine the size, and shape, of the envelope?  What, in particular, can a programming language designer do to maximize this envelope, and what are the implications of doing so?

As a useful metaphor, I call the breadth of a language's envelope its radius of abstraction.  Why "abstraction"?  Well, consider how the languages in this envelope are reached.  Starting from the base language, you incrementally modify the language by using facilities provided within the language.  That is, the new (extended) language is drawn out from the old (base) language, in which the new language had been latently present.  (Latin abs-, "out", and trahere, "pull/draw".)  The terminology of layers of abstraction, in programming, goes back to the 1970s.  One also finds this use of abstraction in philosophy, for drawing out something latently present; here's a passage from Locke's 1689 An Essay Concerning Human Understanding (you may recognize this, as it's quoted at the front of the Wizard Book):

The acts of the mind, wherein it exerts its power over its simple ideas, are chiefly these three :  (1) Combining several simple ideas into one compound one ; and thus all complex ideas are made.  (2) The second is bringing two ideas, whether simple or complex, together, and setting them by one another, so as to take a view of them at once, without uniting them into one ; by which way it gets all its ideas of relations.  (3) The third is separating them from all other ideas that accompany them in their real existence : this is called abstraction : and thus all its general ideas are made.
Our programming-language use of the term abstraction does take a bit of getting used to, because we usually expect something abstracted to be smaller than what it was abstracted from.  The abstract of a paper is a short summary of it.  An abstract thought has left behind the details of concrete instances — though it seems the abstract thought may be somehow "bigger" than the more concrete thoughts from which it was drawn.  In our case, the extended language is probably equi-powerful with the base language, and therefore, even if some specific implementation details are hidden during extension, the two languages still feel as if they're the same size.  This is not really strange; recall Cantor's definition of an infinite set — a set whose elements can be put in one-to-one correspondence with those of a proper subset of itself.  Since we rarely work with finite languages, it shouldn't surprise us if we abstract from one language another language just as "big".

Ironically, though, the reason we're discussing this at all is that, despite our best efforts, the extended language is smaller than the base in the sense that its "envelope" of reachable extensions is smaller.  We'd really rather it weren't smaller.

What general principles govern radius of abstraction?  My first candidate is smoothness, a term I borrowed from M. D. McIlroy, one of the founders of the extensible languages movement.  I mean by it the property of a language that its abstractive facilities apply to the language in a free and uniform way.  This concept is also close kin to Strachey's first-class objects, and van Wijngaarden's orthogonality.  I proposed the following principle in my dissertation:

(Smoothness Conjecture)  Every roughness (violation of smoothness) in a language design ultimately bounds its radius of abstraction.
When a base language contains a defect of smoothness, I suggest, successive extensions magnify the defect, creating unbounded complexity that drags down the programmer.

The goal

The Smoothness Conjecture is a neat expression of a design priority shared by a number of programming language designers; but, looking at programming language designs over the decades, clearly many designers either don't share the priority, or don't agree on how to pursue it.

What, though, if we could develop a mathematical framework for studying the abstractive power of programming languages — a theory of abstraction?  One might then have an objective basis for discussing design principles such as the Smoothness Conjecture, which to date have been largely a matter of taste.  I wouldn't expect the Smoothness Conjecture itself to be subject to formalization, let alone proof, at least not until the study of the subject reached quite a mature phase; but the Conjecture may inspire any number of more specific claims that could then be weighed objectively.

This, in my humble opinion, would be very cool and, as an added bonus, immensely useful.

It is not, however, a short-term goal.  For a ballpark estimate, say people have been pursuing abstractive power (under whatever name) since the founding of the extensible languages movement, circa 1960.  When I got into the game, it had already been going on for about three decades.  The more extreme OOP advocates were making claims for it that could have been lifted nearly verbatim from the more extreme extensibility advocates of two decades earlier, and by my assessment then (understandably unpopular with the OOP advocates) there was still more we didn't know than that we did.  I didn't expect to tie it all up quickly; but I'm still excited about the prospects, because every few years my thinking on it has moved (forward, I hope) slightly — and by my estimate of the difficulty, any progress at all is a very encouraging sign.

There is no semantics

Shortly after I started thinking on abstractive power, Matthias Felleisen's classic paper "On the Expressive Power of Programming Languages" was published (in the proceedings of ESOP '90), and I was encouraged by this evidence that I wasn't the only person in the world crazy enough to try to mathematize traditionally informal aspects of language design.  Felleisen's treatment has been quite a successful meme in the years since, and has some features of interest for abstraction theory — both features that apply to abstractive power, and features that offer insight because of why they don't apply to abstractive power.

Felleisen's expressiveness works roughly thus:  A programming language is a set of programs together with a partial mapping from programs to semantic values.  Language A can express language B if there is a simple way to rewrite B-programs as A-programs that preserves the overall pattern of semantics of programs — not only the semantic values of valid programs, but which programs are valid (halt) and which are not (don't halt).  (In this case, the overall pattern of behavior is sufficiently captured by the pattern of halting/not-halting, so one might as well say there is just a single semantic value, or technically replace the "partial mapping from programs to semantic values" with a "subset of programs designated as halting".)

That is, A can express B when there exists a decidable function φ mapping each B-program p to an A-program φ(p) such that φ(p) halts iff p halts.  A can weakly express B when φ(p) halts if p halts (but not necessarily only if p halts).

How readily A can express B depends on how disruptively φ is allowed to rearrange the internal structure of p.  Felleisen's paper particularly focuses on the class of "macro", a.k.a. "polynomial", transformations φ, which correspond to Landin's notion of syntactic sugar.  Each syntactic operator σ in language B is replaced by a polynomial (a "macro", or "template") σφ in language A; thus,

φ(σ(e1, ... en))  =  σφ(φ(e1), ... φ(en))
When φ is of this form, one says A can macro-express B.
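
As a concrete illustration, here is a sketch in Python of a macro transformation in this sense, using Landin's standard let-as-sugar example rather than anything from Felleisen's paper: the translation recurses into subterms, and the let operator of B is replaced by a fixed lambda-application template over A.

```python
# Sketch of a macro (polynomial) transformation phi: terms are nested
# tuples, and the 'let' operator of the source language is rewritten by
# a fixed lambda-application template, after translating the subterms.

def phi(term):
    if not isinstance(term, tuple):
        return term                        # atoms pass through unchanged
    op, *subterms = term
    subterms = [phi(e) for e in subterms]  # phi(e1), ..., phi(en)
    if op == 'let':                        # sigma = let
        var, val, body = subterms          # sigma_phi: ((lambda var body) val)
        return (('lambda', var, body), val)
    return (op, *subterms)                 # other operators kept verbatim

# (let x 2 (+ x 1))  becomes  ((lambda x (+ x 1)) 2)
print(phi(('let', 'x', 2, ('+', 'x', 1))))
```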

I described the criterion for A can express B as preserving the "overall pattern of semantics of programs".  I meant to suggest that this is more than each individual mapping p ↦ φ(p) preserving semantics; it involves preserving, across mapping φ, how the shape of program texts affects their semantics.  This preservation-of-shape is more apparent when considering macro-expressiveness, which demands similarities of program shape between p and φ(p), because this implies that the similarities and differences between φ(p1) and φ(p2) would be akin to the similarities and differences between p1 and p2; but it's not clear that polynomial/macro rewriting would be the only useful measure of similarity of program shape.  (Cf. Steele and Sussman's 1976 Lambda: The Ultimate Imperative.)  To explore more general aspects of expressiveness, one might parameterize the theory by the class of transformations allowed.

In preparing for an analogous treatment of abstractiveness, the first thing to recognize is that while expressiveness views each program as inducing a semantic value, abstractiveness views each program as inducing a programming language.  When comparing two B-programs p1 and p2, we don't just ask whether they induce the same programming language, because they almost certainly do not.  Rather, we want to compare the induced programming languages to each other, probably using some measurement at least as sophisticated as expressiveness.

Consider what this means for the definition of programming language.  Picture a base language as the center of a web of languages connected by directed arrows —abstractions— each arrow labeled by a program text.  The whole thing is a sort of state machine, where the states are languages, the state transitions are abstractions, and the input "alphabet" is the set of program texts.  We could also integrate semantics into this model, by adding transitions labeled with reserved "observable" terms — and then there isn't really any need for the states of the machine at all.  Everything we could ever want to know about a given programming language is contained in the set of all possible sequences of labels on paths starting from that language; so we might as well define a language to be that set of sequences.  That is,

(D1)  A programming language over set of terms T is a set of sequences of terms P ⊆ T* such that for all sequences of terms x and y, if xy ∈ P then x ∈ P.
This approach also appeals to the recognition that although computation theory tends to look only at computations that halt, a great many of our software processes are open-ended.
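
For intuition, (D1) is easy to state executably.  A minimal sketch in Python, over an invented toy vocabulary:

```python
# (D1) in miniature: a language over terms T is a prefix-closed set of
# finite sequences of terms, here modeled as a set of tuples.

def is_language(P):
    """Every prefix of a sequence in P must itself be in P."""
    return all(x[:i] in P for x in P for i in range(len(x)))

P = {(), ('decl',), ('decl', 'query'), ('decl', 'query', 'report')}
print(is_language(P))                # True
print(is_language(P - {('decl',)}))  # False: a prefix has gone missing
```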

This purely syntactic, unbounded view of programming languages is foundational.  The expectation of halting — what one might call the terminal-semantic assumption — is ubiquitous:  the assumption, hardwired into one's core definitions, that a computation is meant to get an answer and stop.  Denotational semantics is a terminal-semantic model.  Theory of computation, and complexity theory, are founded on the terminal-semantic assumption.

To my mind, an essential difficulty with the terminal-semantic approach is that, patently, it prefers to disregard properties that relate to unbounded sequences of future developments.  Abstractive power is directly concerned with such sequences, but one suspects all computation really should take them into account, as most macroscopic software processes are interactive (in one or another sense) and open-ended rather than self-contained and merely producing a final result.  (Cf. Dina Goldin et al.)

The unbounded-syntax approach does not, of course, really "eliminate" semantics; but it does cause semantics to become a dependent concept, grounded in syntax.  For abstraction theory as I'm currently developing it, semantics is a set of sequences of terms; in RAGs (from my master's thesis), semantics is a nonterminal symbol of a grammar.  (In a modern treatment of RAGs I'd be inclined to replace the term "metasyntax" with "co-semantics"; but I digress... sort-of.)

Note:  I've described RAGs in a later blog post, here.
The second derivative of semantics

The expressiveness of a programming language —what we ask φ to conserve when mapping B to A— is about the contours of the overall pattern of semantics of programs.  That is, it's about how variations in the text of a program change the semantics induced by the program; in short, expressiveness is the first derivative of semantics.

The abstractiveness of a programming language —which we will want conserved when we assert that one language is "as abstractively powerful" as another— is about how variations in the text of a program change the expressiveness of the language induced by the program.  Thus, as expressiveness looks at variations in semantics, and is in this sense the first derivative of semantics, abstractiveness looks at variations in expressiveness, and is thus the second derivative of semantics.

As I've set out to mathematize this, I've found the treatment becomes rather off-putting-ly elaborate (a trend I mean to minimize in this post).  I observe a lesser degree of the same effect even in Felleisen's paper, which was apparently trying to stick to just a few simple ideas and yet somehow got progressively harder to keep track of.  Some of this sort of thing is a natural result of breaking new ground:  appropriate simplifications may be recognized later.  I omitted one substantial complication from my description of Felleisen's treatment, which I suspect was simply a consequence of techniques he'd inherited from other purposes.  However, I've also introduced one new complication into expressiveness, namely parameterization by the class of transformations allowed — and I'm about to introduce a second complication, as I adapt Felleisen's treatment to my unbounded-syntax strategy.

Recasting expressiveness

In adapting expressiveness to the unbounded-syntax definition of programming language (D1), the first consideration is that a mapping φ between two languages of this sort has to respect the all-important prefix structure of the languages:

(D2)  For programming languages P and Q, a morphism from P to Q is a function φ : P → Q that preserves prefixes.
That is, for every xy ∈ P, there exists z such that φ(xy) = φ(x)z.  We write P/x for the language reached from language P by text sequence x ∈ P; and similarly, when φ : P → Q, we write φ/x for the corresponding morphism from P/x to Q/φ(x).  Thus,  P/x = { y | xy ∈ P }  and  φ(xy) = φ(x) (φ/x)(y).
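
These constructions can be played with directly on finite examples.  A sketch in Python, with invented data, of P/x, prefix preservation, and φ/x as just defined:

```python
# Sketch of (D2): prefix-preserving maps between prefix-closed sets of
# tuples, plus the derived language P/x and derived morphism phi/x.

def derive(P, x):
    """P/x = { y | xy in P }"""
    return {y[len(x):] for y in P if y[:len(x)] == x}

def preserves_prefixes(phi, P):
    """For every x, xy in P:  phi(xy) must extend phi(x)."""
    return all(phi(y)[:len(phi(x))] == phi(x)
               for x in P for y in P if y[:len(x)] == x)

def derived_morphism(phi, x):
    """phi/x, defined by  phi(xy) = phi(x) (phi/x)(y)."""
    return lambda y: phi(x + y)[len(phi(x)):]

P = {(), ('a',), ('a', 'b')}
phi = lambda s: tuple(t.upper() for t in s)   # a uniform renaming
print(preserves_prefixes(phi, P))             # True
print(derive(P, ('a',)))                      # {(), ('b',)}
print(derived_morphism(phi, ('a',))(('b',)))  # ('B',)
```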

As remarked earlier, we parameterize expressiveness relationships by the class of allowable morphisms.  We won't allow arbitrary classes of morphisms, though.

(D3)  A category (over programming languages with terms T) is a set C of morphisms between languages over T, closed under composition and including the identity morphism of each language over T.
For given terms T, some useful categories:
Any = category of all morphisms.
Map = category of morphisms that perform a term transformation uniformly, φ(t1...tn) = φ(t1)...φ(tn) (a sketch follows this list).
Macro = category of Map morphisms whose term transformation is macro/polynomial.
Inc = category of inclusion morphisms, φ : P → Q with φ(x)=x.
Id = category of identity morphisms, φ : P → P with φ(x)=x.
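
Of these categories, Map (and hence Macro) is the easiest to build morphisms in, since a single term transformation determines the whole morphism and prefix preservation comes for free.  A sketch, using an invented "private"-stripping transformation that anticipates the test case later in this post:

```python
# A Map-category morphism is induced by a transformation on individual
# terms, applied uniformly; such a map preserves prefixes automatically.

def map_morphism(term_transform):
    return lambda seq: tuple(term_transform(t) for t in seq)

strip_private = lambda t: t.replace('private ', '')   # invented example
phi = map_morphism(strip_private)
print(phi(('private x = 1', 'query x')))  # ('x = 1', 'query x')
```
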
Where Felleisen could avoid complicated semantic values by relying on halting behavior, we need a set of observable terms, O ⊆ T.
(D4)  Morphism φ : P → Q respects observables O if
  • φ maps observables, and nothing else, into those same observables — that is, for each o ∈ O, (φ/x)(y) = o iff y = o — and
  • φ transforms each derived language into a language with exactly the same observables — that is, o∈(P/x) iff o∈(Q/φ(x)).
Morphism φ : P → Q weakly respects observables O if it satisfies all the conditions for respecting observables O except that o∈(P/x) implies o∈(Q/φ(x))  (rather than implication going both ways).
ObsO = category of all morphisms that respect observables O
WObsO = category of all morphisms that weakly respect observables O
Our basic expressiveness relation is then
(D5)  For category C and languages A and B,  A can C-express B for observables O if there exists φ : B → A with φ ∈ C ∩ ObsO.
We say A is as C,O-expressive as B.  Weak C,O-expressiveness uses WObsO in place of ObsO.  Evidently, the as C,O-expressive as relation is transitive, as is its weak variant.  Expressiveness implies weak expressiveness (because ObsO ⊆ WObsO).
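
For finite toy languages, (D4) and (D5) can be checked by brute force.  The sketch below is a simplification of the definition (it treats each observable as a length-one continuation, which is my reading here, not the techreport's exact formulation):

```python
# Brute-force check (finite case) that phi : P -> Q respects observables O.

def derive(P, x):
    return {y[len(x):] for y in P if y[:len(x)] == x}

def respects_observables(phi, P, Q, O):
    for x in P:
        Px, Qx = derive(P, x), derive(Q, phi(x))
        # each derived language must offer exactly the same observables...
        if {o for o in O if (o,) in Px} != {o for o in O if (o,) in Qx}:
            return False
        # ...and phi/x must map each observable to that same observable
        for o in O:
            if (o,) in Px and phi(x + (o,))[len(phi(x)):] != (o,):
                return False
    return True

P = {(), ('q',), ('q', 'r')}
print(respects_observables(lambda s: s, P, P, O={'r'}))  # True
```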

There is a simple theorem that the expressiveness relation is preserved if the category is made bigger, and if the set of observables is made smaller.  That is, if A is as C1,O1-expressive as B, category C1 ⊆ C2, and observables O2 ⊆ O1, then A is as C2,O2-expressive as B.

Note:  Although the notation here is simplified from the techreport, it still poorly handles the codomain of a derived morphism, producing such un-self-explanatory expressions as Q/φ(x).  Belatedly I see that by adopting for φ: P → Q the additional notation φ(P)=Q, one would have the more mnemonic φ/x : P/x → φ(P)/φ(x).
Abstractiveness

As expressiveness depends on φ : B → A preserving the semantic landscape of B, abstractiveness should depend on φ preserving the expressive landscape of B.  A semantic landscape for B arose from specifying a set of observables.  An expressive landscape for B arises from specifying a category of expressiveness morphisms by which to compare different languages B/x.

If it wasn't entirely certain what category to use when comparing expressiveness (hence the introduction of parameter C into the expressiveness relation, where Felleisen's treatment only looked at two choices of C), the choice of expressiveness category becomes much more obscure when considering abstractiveness.  Note in Standish's remark the phrase by modest amounts of labor; dmbarbour has remarked that multiple degrees of difficulty are of interest, and this suggests that no one choice of category will give us a full picture.  One might even imagine that a φ : B → A of one category would map expressive relations on B of a second category onto expressive relations on A of yet a third category.  The point is that, without prior experience with this mathematics, we have no clear notion which of the myriad imaginable relations we want to look at.  We hope that experience working with it may give us insight into what special case we really want, but meanwhile the endeavor seems to require a lot of parameters.  It would look rather silly to have an abstractiveness relation with, say, four parameters, so we redistribute the parameters by bundling the choice of expressive category with the language.

(D6)  A programming language with expressive structure over terms T is a pair A=〈S,C〉 of a programming language S over T with a category C over programming languages with terms T.
We call these expressive languages for short.  We may write seq(A) and cat(A) for the components of expressive language A.

For φ to preserve the expressive landscape from B to A is more involved than preserving the semantic landscape.  The expressive landscape of B=〈S,C〉 is a web of relations between languages derived from S.  Suppose g ∈ C with g : S/x → S/y.  Through φ : 〈S,C〉 → 〈R,D〉, derived languages S/x and S/y correspond to R/φ(x) and R/φ(y).  Moreover, we already have three morphisms between these four derived languages:

g : S/x → S/y
φ/x : S/x → R/φ(x)
φ/y : S/y → R/φ(y)
If only we had a fourth morphism h ∈ D with h : R/φ(x) → R/φ(y), we could ask these four morphisms to commute, h ∘ φ/x = φ/y ∘ g.
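
Drawn as a square (my rendering of the situation, not notation from the techreport):

```latex
% The square we would like to commute: g is the given expressiveness
% morphism in C, h is the sought morphism in D.
\begin{array}{ccc}
S/x & \xrightarrow{\ \varphi/x\ } & R/\varphi(x) \\
{\scriptstyle g}\big\downarrow & & \big\downarrow{\scriptstyle h} \\
S/y & \xrightarrow{\ \varphi/y\ } & R/\varphi(y)
\end{array}
\qquad h \circ \varphi/x \;=\; \varphi/y \circ g
```
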
(D7)  For expressive languages P and Q,  a morphism from P to Q is a morphism φ : seq(P) → seq(Q) such that
  • for all g ∈ cat(P) with g : seq(P)/x → seq(P)/y, there exists h ∈ cat(Q) with h ∘ φ/x = φ/y ∘ g, and
  • for all x,y ∈ seq(P), if there is no g ∈ cat(P) with g : seq(P)/x → seq(P)/y, then there is no h ∈ cat(Q) with h : seq(Q)/φ(x) → seq(Q)/φ(y).
For category C and expressive languages A and B,  A can C-abstract B for observables O if there exists φ : B → A with φ ∈ C ∩ ObsO.
Note:  I've corrected a typo in this definition:  the second condition read for all x,y ∈ seq(Q) rather than seq(P). (24 April 2014)
We say A is as C,O-abstractive as B; and weak C,O-abstractiveness uses WObsO in place of ObsO.  The as C,O-abstractive as relation (and its weak variant) is transitive, and is preserved by making C bigger or O smaller.  Abstractiveness implies weak abstractiveness (because, again, ObsO ⊆ WObsO).  The techreport proves several other rather generic theorems; for example, if A is as C,O-abstractive as B, and cat(A) respects observables O, then cat(B) respects observables O.

When the expressive languages get their expressive structure from the identity category, C,O-abstractiveness devolves to C,O-expressiveness.  That is, for all languages A and B, category C, and observables O,  〈A,Id〉 is as C,O-abstractive as 〈B,Id〉  iff  A is as C,O-expressive as B.

Test case

An obvious question at this point is, now that we've got this thing put together, does it work?  In the techreport, my sanity check was a test suite of toy languages, chosen to minimally capture the difference between a language in which local declarations are globally visible, and one in which local declarations can be private.  The wish list at the end of the techreport, of results to try for, was much more impressive — I targeted encapsulation of procedures; hygienic macros; fexprs; and strong typing — but the sanity check, as the only example thus far actually worked out in full detail, has some facets of interest.

In language L0, records are declared with constant fields, and the fields can then be queried.  When a query occurs in a text sequence, the next text in the sequence reports the value contained in the queried field; these reports are the observable terms.  Language Lpriv is identical except that individual fields may be declared "private"; private fields cannot be queried, though they can be used when specifying the value of another field in the same record.

It's easy to transform an Lpriv text sequence into a valid L0 sequence:  just remove all the "private" keywords from the field declarations.  This is a macro/polynomial transformation, and all the queries that worked in Lpriv still work and give the same results in L0.  Therefore, L0 is weakly as Macro-expressive as Lpriv.
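
To make the test case concrete, here is a drastically simplified executable model of the two languages; the encoding is my own invention and much cruder than the techreport's (field values here are plain constants, so a private field cannot yet contribute to another field).  A sequence is in Lpriv iff run accepts it; L0 is the same language restricted to sequences that never declare a private field; remove_private is the macro transformation.

```python
# Invented miniature of the test-case languages.  A text is one of:
#   ('decl', name, value, private?)   declare a constant field
#   ('query', name)                   query a field
#   ('report', value)                 the mandatory answer to a query

def run(seq):
    """Decide membership of a text sequence in Lpriv (prefixes included)."""
    env, pending = {}, None
    for text in seq:
        if pending is not None:            # a query must be answered next
            if text != ('report', pending):
                return False
            pending = None
        elif text[0] == 'decl':
            _, name, value, private = text
            env[name] = (value, private)
        elif text[0] == 'query':
            if text[1] not in env:
                return False
            value, private = env[text[1]]
            if private:                    # techreport choice: not permitted
                return False
            pending = value
        else:
            return False                   # a report with no pending query
    return True

def remove_private(seq):                   # the macro transformation
    return [('decl', t[1], t[2], False) if t[0] == 'decl' else t
            for t in seq]

u = [('decl', 'x', 42, True), ('decl', 'y', 7, False)]
print(run(u + [('query', 'y'), ('report', 7)]))    # True: in Lpriv
print(run(u + [('query', 'x')]))                   # False: x is private
v = remove_private(u)
print(run(v + [('query', 'x'), ('report', 42)]))   # True: in L0
```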

Whether or not L0 is as Macro-expressive as Lpriv (without the "weakly" qualifier) depends on just how one sets up the languages.  Each query is a separate text in the sequence, and the only text that can follow a query is a report of the result from the query.  Expressiveness hinges on whether or not it is permissible to query a field that is not visible; and the reason this matters is that although strong expressiveness requires each derived language B/x map into a derived language A/φ(x) that has no extra observable terms, it does not require that there be no extra non-observable terms.  To see how this works, suppose u ∈ Lpriv is a sequence of declarations, after which query q1 is valid but q2 attempts to query a private field.  Following u, the result of q1 is r1, and the result of q2 would be r2 if the field were made public.  Let v be the result of removing all "private" keywords from u.

In the techreport, a query is not permitted unless its target is visible.  Therefore, Lpriv sequences include uq1 and uq1r1, but not uq2.  L0 sequences include vq1, vq1r1, vq2, and vq2r2.  But expressiveness does not object to vq2r2 ∈ L0 unless there is some x ∈ Lpriv such that φ(x)=vq2 but xr2 ∉ Lpriv.  Since there is no such x, the observable r2 in vq2r2 causes no difficulty.  Although it is true that vq2 ∈ L0 even though uq2 ∉ Lpriv, this does not interfere with expressiveness because q2 is not an observable.  The upshot is that in the techreport, L0 is as Macro-expressive as Lpriv.

Alternatively, one could allow queries regardless of whether they're valid; this is essentially a "dynamic typing" approach, where errors are detected lazily.  In that case, uq2 ∈ Lpriv; the allowance of the observable r2 in L0 is then a violation of strong expressiveness, and L0 is not as Macro-expressive as Lpriv (although it's still weakly as Macro-expressive).

What about Macro-abstractiveness?  Given the relative triviality of the languages, I chose to use weak category Inc for the expressive structure of the languages:  〈L0,Inc〉 and 〈Lpriv,Inc〉.  L0 ⊆ Lpriv, so using the inclusion morphism φ(x)=x from L0 to Lpriv, evidently 〈Lpriv,Inc〉 is as Inc-abstractive as 〈L0,Inc〉; and consequently, as a weaker result, 〈Lpriv,Inc〉 is as Macro-abstractive as 〈L0,Inc〉.

As for the other direction, we've already observed that L0 is as Macro-expressive as Lpriv, which is to say that 〈L0,Id〉 is as Macro-abstractive as 〈Lpriv,Id〉.  So if 〈L0,Inc〉 isn't as Macro-abstractive as 〈Lpriv,Inc〉, the difference has to be in the additional expressive structure.  This expressive structure gives us a g : L/x → L/y when, and only when, L/x ⊆ L/y.  Therefore, in order to establish (contrary to our intention) that 〈L0,Inc〉 is as Macro-abstractive as 〈Lpriv,Inc〉, we would have to demonstrate a Macro ∩ ObsO morphism φ : Lpriv → L0 that preserves all of the subsetting relationships between derived languages in Lpriv.

For example, the transformation remove-all-"private"-keywords, which gave us Macro-expressiveness, will not work for Macro-abstractiveness, because it washes out subsetting relationships.  Suppose u is a sequence of declarations in Lpriv that contains some private fields, and v is the result of removing the "private" keywords from u.  Since v can be followed by some things that u cannot, Lpriv/u is a proper subset of Lpriv/v.  However, the remove-"private" transformation maps both of these languages into L0/v, which evidently isn't a proper subset of itself; so remove-"private" doesn't preserve the expressive structure.
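
Reusing run and remove_private from the sketch above (this fragment is deliberately not self-contained), the wash-out is easy to witness:

```python
# remove_private collapses distinct derived languages of Lpriv onto a
# single derived language of L0.
u = [('decl', 'x', 42, True)]              # x private
v = remove_private(u)                      # x public

cont = [('query', 'x'), ('report', 42)]
print(run(u + cont))                       # False: cont not in Lpriv/u
print(run(v + cont))                       # True:  cont in Lpriv/v
# so Lpriv/u is a proper subset of Lpriv/v, yet both u and v map to v:
print(remove_private(u) == remove_private(v) == v)   # True
```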

In fact, no Macro morphism will preserve the expressive structure.  Consider a particular private field declaration, call it d.  When d occurs within a record declaration in an Lpriv sequence, it can affect later expansions of the sequence, through its use in defining other, non-private fields of the same record, changing the results of queries on those non-private fields.  The only thing that can affect later queries in L0 is a field declaration; therefore, a Macro morphism φ must map d into one or more field declarations.  But if d is embedded in a record declaration that doesn't use it, it won't have any effect at all on later queries — and the only way that can be true in L0 is if φ(d) doesn't declare any fields at all.  So there is no choice of φ that preserves the expressive structure (which is to say, the subsetting structure) in all cases.

So 〈Lpriv,Inc〉 is strictly more Macro-abstractive than 〈L0,Inc〉.