Monday, March 5, 2018

Thoughts on Jaynes's Breakdown of the Bicameral Mind

It is one of those books that is either complete rubbish or a work of consummate genius, nothing in between!  Probably the former, but I'm hedging my bets.
— comment about Jaynes's The Origin of Consciousness in the Breakdown of the Bicameral Mind in Richard Dawkins's The God Delusion, 2006.

I've just read Julian Jaynes's 1976 book The Origin of Consciousness in the Breakdown of the Bicameral Mind, and here I'm posting my thoughts, built roughly on the structure of, though wider-ranging than, a book review.

This book engages three of my particular interests, deeply entangled in this instance, so that they come as a package.  I'm interested in the evolution and nature of the human mind, which of course is Jaynes's subject matter.  I'm also interested in how to read a forceful presentation of a theory without missing its fault lines.  And I'm interested in how best to present an unorthodox theory.  (I've touched on all three of these in various past posts on this blog.)

To be clear:  I enjoyed reading Jaynes's book; I think he's glimpsing something real though it might not be quite what he thinks it is; and I think his book, and his ideas, are worth studying.  Keep those things in mind, moving forward through this post.  My interests will cause me to emphasize criticisms of Jaynes's theories; I'll be trying to assemble a coherent alternative to contrast with them; and with all that going on in this post, the positive aspects of my assessment might get a bit buried.  But I wouldn't be paying such close attention to Jaynes if I didn't see his work as fundamentally deserving of that attention.

When studying any forceful presentation of a theory, there is risk of joining the author in whatever traps of thinking they're caught in.  The best time to scout out where the traps/fault lines are (take your pick of metaphors) is on first reading.  That's true of both orthodox and unorthodox theories, btw; indeed, it's a common challenge for orthodox theories, where the traps must be easily overlooked for the theory to have achieved orthodoxy.  Unorthodox theories are sometimes presented with markers that make them sound crazy, in which case the larger challenge may be to avoid underestimating them; but a strong unorthodox presentation without craziness markers — such as Jaynes's — can also contain hidden traps, and moreover the reader has to distinguish genuine traps from, so to speak, legitimate unorthodoxy.

Hence my reading Jaynes slowly and cautiously, jotting down whatever notes came to mind as I went along.

It wouldn't be difficult for Jaynes's basic theory to sound crazy; Dawkins has a point there.  At its baldest, Jaynes's big idea is that until about four thousand years ago, human beings didn't have a conscious mind, but instead had a self-unaware left brain that took orders from hallucinated gods generated by their right brain — the bicameral mind.  You don't want to dive right into the thick of a thing like that, and Jaynes doesn't do so.  He builds his case slowly, so that as he adds pieces to the puzzle it's clear how they fit.

I see no need to choose between completely accepting or completely rejecting Jaynes's ideas, though.  There seems room for Jaynes to be seeing some things others have missed, while missing some factors that lead him to a more-extreme-than-necessary explanation of what he sees.  This particularly works if one has a suggestion for what Jaynes might be missing; and I do.  I have in mind broadly memetics, and particularly the notion of verbal society which I suggested on this blog some time back and have revisited several times, notably [1], [2].

As for a work of consummate genius, well, that depends on one's view of genius.  If it's possible for a work to be a masterstroke regardless of how much of it is right or wrong, then, why not?  It's easy, when the Iliad says that someone did something because a god told them to, to say, oh, that's a poetic device; but in an academic climate where "poetic device" is the standard explanation, it takes something special to say — seriously, and with extensive scholarly research to back it up — that maybe, when the Iliad says a god told them to do something, the Iliad means just what it says.

Contents
Caveats
Background
The book that Jaynes wrote
Commentary
Storytelling
Caveats

When seeking to show an audience the plausibility of a paradigm scientific theory, it's common to point out things that are consistent with the theory.  However, if you're trying to show plausibility of a highly unorthodox scientific theory (the sort whose opponents might call "lunatic fringe"), imo that technique basically doesn't work.  My reasoning has to do with contrast between rival theories.

Imagine I've got a large whiteboard, with nothing written on it.  (When I was in high school, it would have been a blackboard; and some years from now perhaps it'll be some sort of giant touchscreen technology.  At any rate, it's big; say at least a yard/meter high and wider than it is high, perhaps a lot wider.)  The points on this whiteboard are possible explanations for things; that is, explanations that we could, in principle, entertain.  I draw on it a small circle, perhaps the size of the palm of my hand.  The points inside the circle are tried and true sorts of scientific theories; we have repeatedly used experiments to test them against alternative explanations, and in that way they have earned good reputations as solid, viable explanations.  So if one of those well-reputed explanations works very well for some new thing you're looking at, it's a credible candidate to explain that thing.

What if none of the explanations in that small circle works for the new thing you're studying?  I'll draw a larger circle around that one, maybe four times the diameter.  Points inside this larger circle are explanations we have thought of, even if they seem quite loony.  There are flying saucers in there, and Sasquatches, and ancient aliens visiting Earth to build pyramids.  But they're all explanations that we've thought of, even if we didn't think highly of them.  Even the strangest among them, though, may have some people who favor them.  And when we've got a new thing that doesn't afford an explanation in the smaller circle, but it could be explained by, say, ancient pyramid-building aliens (to take a vivid example), some people will claim that's evidence for ancient pyramid-building aliens.

Except, it isn't evidence for ancient pyramid-building aliens.  It's consistent with ancient pyramid-building aliens, but ancient pyramid-building aliens don't have the earned reputation of things in the smaller circle.  Remember, those orthodox explanations earned their reputations through experiments that contrasted them against alternatives.  But when none of those orthodox explanations works for this new thing, and ancient pyramid-building aliens does work for the new thing, what alternative theories should we be considering?  Presumably, anything that has as much repute as ancient pyramid-building aliens.

Uhuh.

And this is why I've made these circles much smaller than the whole whiteboard.  The points in the larger circle are explanations we have thought of; but most of the whiteboard is outside that circle, and all that larger outside is explanations that we could consider, but we haven't thought of them.  And really, we don't know how much of that vast array of explanations we haven't thought of might be (if we thought of it) at least as well reputed as ancient pyramid-building aliens.

The moral of the story, it would seem, is that if you're studying a really unorthodox explanation, and you want to be able to say something stronger than just that it would suffice to explain the phenomenon, you should work at finding alternatives.

I don't mean to lambaste Jaynes for not coming up with alternatives; Jaynes was pulling off a profoundly impressive feat by coming up with one solid unorthodoxy, and it's hardly fair to complain that he didn't come up with several.  But it does seem that however many facts he finds to be consistent with his unorthodoxy, one ought not to interpret that as support for the unorthodoxy, as such.  Throughout my reading of Jaynes, I kept this sort of skepticism in mind.

Another sort of trap for the unwary researcher in areas relating to the mind —orthodox or no— is highly abstract terms that really don't mean at all the same thing to everyone.  (The same sort of problem may arise in religion, another area with really extraordinarily abstract terms.)  I experienced this problem myself, some years ago, when reading Susan Blackmore's The Meme Machine.  Through most of the book I felt Blackmore seemed pretty much on-target, until I came to her chapter on the self; and when I hit that chapter it was quickly clear that something was going horribly wrong.  Suddenly, I found Blackmore saying things that on their face (the face presented to me, of course) were obviously, glaringly false.  And not just saying them, reveling in them.  She was quite excited, after having believed all her life that she had a self, to realize that the self did not exist.  This struck me as beyond silly.  If she was so sure she didn't have a self, who did she imagine had written her book?

I didn't take this to be, necessarily, a mistake by Blackmore; it didn't feel that way, though there wasn't any other explanation that felt compellingly right either.  But not chalking it up to a mistake by Blackmore did not in any way change the overt falsity of what she was saying.  Hence my initial phrasing, that something was going horribly wrong.

After considerable puzzling (about a week's worth), I worked out what was going wrong.  It wasn't a problem with the concepts, neither on Blackmore's part nor mine.  It was a problem with the word "self".  Susan Blackmore had believed all her life in... something... and was quite excited to realize that that something did not exist.  But she called that something "self".  And that thing, that she called "self", was something I had never believed in to begin with.  I had always used the word "self" to mean something else.  So when she said she had realized that the self does not exist, to me she was denying the existence of something quite different from what she intended to say did not exist.  I think she was denying the existence of what Daniel Dennett would call the audience of the Cartesian theater — which Dennett spent much of his classic book Consciousness Explained debunking.

The moral here would seem to be, don't assume that other people mean the same thing you do by these sorts of highly abstract words. 

Those two potential traps came to mind for me pretty quickly when I started reading Jaynes.  Another, more content-specific, trap occurred to me a few chapters into the book.  There is a well-known (in some circles) phenomenon that medical students, as they learn about various diseases, start worrying that they themselves may be suffering from those diseases.  I've inherited a story of someone remarking, about an instance of this phenomenon, "Just wait till they start studying psychiatry."  Well.  Jaynes was a psychologist.  There's this tendency to think in terms of pathologies.  And it seemed to me, as I got into the thick of the book, that Jaynes was placing undue weight on pathological states such as schizophrenia.  Without that emphasis, it seemed, one should be able to formulate a theory in the same general direction as Jaynes was exploring, without going to the extreme he went to (his bicameral man).

Background

Jaynes is concerned with the development of consciousness over time, and, peripheral to that, the development of language over time.

Some major milestones of human development, more-or-less agreed upon:

  • About two and a half million years ago, stone tools appear.  Start of the paleolithic (old stone age).
  • About forty or fifty thousand years ago, give or take, there is an explosion in the variety of artifacts.  Art, tools for making tools, tools with artistic flair, tools for making clothing, etc.  Start of the upper paleolithic (late stone age).
  • About ten thousand years ago (your millennium may vary), human agriculture begins.  Start of the neolithic (new stone age).
  • About four thousand years ago, the first writing appears.  This is a bit after the neolithic (perhaps a thousand years) and into the Bronze Age.
  • About 2500 years ago, around the time of Plato, science and philosophy blossom in ancient Greek civilization.  Eric Havelock proposed that this is when ancient Greek society passes from orality to literacy.
According to Havelock's theory, the shift from oral society, in which knowledge is founded on oral epics such as the Iliad, to literate society in which knowledge is founded on writing, profoundly changes the character of human thinking.  Modern Afghanistan has been suggested as an example of orality.

To Havelock's theory, I've proposed to add a still earlier phase of language and society, preceding orality, which I've tentatively called verbality.  My notion of what verbality might look like has been inspired by the Pirahã language lately studied by Daniel Everett as recounted in his 2008 book Don't Sleep, There Are Snakes: Life and Language in the Amazonian Jungle.  In particular, amongst many other peculiar features of the Pirahã, their culture has no art or storytelling, while their language has no tense and no numerical or temporal vocabulary.  It seems perfectly reasonable that the Pirahã would not be typical of verbality, since it's typical for a verbal culture to have vanished many thousands of years ago; but I see it as a demonstration of possibility.  It may be significant for Jaynes's theory that Everett describes a Pirahã group hallucination.

I don't have a good handle on just what precipitated the ancient transition from verbality to orality — although, if one speculates that the story of the expulsion from Eden might be, in some part, a distant memory of the verbality/orality transition, it may have been pretty traumatic.  However, I do have a timeframe.  If verbality does not support art, one would expect the transition to be clearly marked by the appearance of art; besides which, I expect a dramatic acceleration of memetic development starting at the transition; so, I place the end of verbality and start of orality circa forty thousand years ago, at the beginning of the upper paleolithic.

Once orality starts, about forty thousand years ago, it would then be necessary to work out increasingly effective ways to tell stories.  It seems likely to have been a very difficult and slow process; one would, on reflection, hardly expect ancient humans to immediately shift from not telling stories at all to great epics.  I'm guessing that writing, which didn't show up for about thirty-six thousand years, was a natural development once the art of storytelling reached a certain level of maturity.  I really hadn't thought about oral society struggling to develop the art of storytelling, though, until I started reading Jaynes.

The book that Jaynes wrote

Jaynes begins with a chronological rundown of theories of consciousness.  This is good strategy, as it places his ideas solidly into context, allows the reader to see him doing so, and allows him to be seen considering alternatives, which helps the credibility not only of the theory but also of Jaynes himself; not incidentally, as proponents of unorthodoxy need to be seen to be well-informed and attentive.  On the downside, his treatment of individual past theories tends to make light of them — although, I notice, on at least one occasion some chapters later, he acknowledges having just used such a tactic, suggesting that he perceives it as a perfectly valid stylistic mode and not something to be taken too much to heart.  I think he'd come across better by showing more respect for rival theories; at any rate, it's my preference.

His rundown of past theories seems likely to suffer from a problem, such as I described earlier, with the highly abstract term consciousness.  He's quite clear that these different theories are saying different things, but he appears to assume they are all trying to get at a single idea.  The difficulty might also be described in terms of Kuhnian paradigms (which I've discussed often on this blog, e.g. [3], [4]).  Amongst the functions of a paradigm, according to Kuhn, are determining what entities exist, what sorts of questions can be asked about them, and what sorts of answers can be given.  So, while the different paradigms Jaynes describes are all searching for truth in the same general neighborhood, one should expect that some of the variance between them is not merely about what answer to give to a single common question that all of them are pursuing, but about what question is most useful to ask.  As a reader, I struggled to deduce, from how Jaynes presented these past theories, just what question he wanted to answer; and I was still working on pinning that down after I'd finished the book.  His own notion of consciousness is, to my understanding, substantially about narratization, essentially telling a story about the self.  This notion of the self as a character in a story told by the mind seems fairly close to what I think of as self (as opposed to what Susan Blackmore apparently used to think of as self before studying memetics); and is clearly an application of storytelling (the thing that, by my hypothesis, is missing from verbality).

He is a good writer; his prose is — at least if you're interested in the subjects he's discussing — perhaps not page-turning, but nonetheless interesting rather than oppressive.

Following the introduction, he divides the work into three parts — Books I, II, and III — addressing the nature of the bicameral mind (Book I); the evidence he sees, burning across history, of the bicameral mind and its progressive breakdown (Book II); and the remnants of bicameralism he sees in our modern state (Book III).  He added a substantial Afterword in 1990, apparently when he stopped lecturing at Princeton, at the age of 70, and seven years before his death.

Jaynes's central idea is that for some time leading up to about 4000 years ago, human minds functioned along different lines than the narratization-based consciousness we experience today.  Instead, the human mind was, in Jaynes's terminology, bicameral.  The left brain (more properly the hemisphere opposite the dominant side of the body, but most people are right-handed) handled ordinary stuff, and when additional oversight was needed, the right brain provided a hallucination of someone telling the left brain what to do.  These hallucinations were perceived to be gods; or rather, in Jaynes's framework, by definition they were gods.  One illustration he mentions, from the Iliad, has an angry Achilles asking Agamemnon to account for his behavior; Agamemnon says a god told him to, and Achilles just accepts that.  The way Jaynes talks about these gods often makes them sound as if they were coherent beings, which struck me as an overestimation of how much coordination a civilization would likely be afforded simply by its population being bicameral.  Jaynes portrays a nation of bicameral humans as extraordinarily well-coordinated (in terms that sometimes seem to flirt with group selection, a particular pet peeve of Richard Dawkins that he spent most of his book The Selfish Gene debunking).

Jaynes's notion of bicamerality is extensively tied to his ideas about human language.  The area of the brain ordinarily responsible for language is on the left side of the brain, but the corresponding right-side structure is largely unused; he figures that right-side structure is where hallucinated voices came from.  His general view of the differing functions of the hemispheres is largely in line with, if distinctly more cautious than, the pop-psychology notion of analytic left brain and synthetic/artistic right brain (apparently the pop-psychology view had just gotten started a few years before Jaynes's book came out).  He has some specific notions about the stages by which human language developed, which I didn't fully absorb (too detailed, perhaps, to pick up while struggling with the big picture of the book on a first reading), though apparently he sees metaphor as key to the way full-blown human language works in general.  In a passage that stuck in my mind, he says that his linguist friends tell him human language is very old, stretching far back in the paleolithic (I've read estimates from a hundred thousand years all the way back to the start of the paleolithic at two and a half million years); he suggests this is implausible because things ought to have moved along much faster if language had been around during all that time, and instead he proposes language only started at the beginning of the upper paleolithic, forty thousand years ago.

He dates the start of bicamerality to the onset of agriculture, at the paleolithic/neolithic boundary, circa ten thousand years ago.  His reasoning (to the best of my understanding) is that to make agriculture work required coordination of large groups, and this coordination was achieved via bicameral gods.  For some thousands of years (nominally, about six thousand) this worked well, but then the world got more stressful, partly due to increasing population through agriculture enabled by bicamerality, and the gods couldn't keep up, forcing the development of the new regime of consciousness.

Commentary

Jaynes seems to me to be operating at a disadvantage.  Drawing inspiration from something he's familiar with, and viewing history through the lens of his individual perspective, he sees a pattern that he finds compellingly evident in history.  It seems — from my individual perspective — that a less extreme explanation for the historical evidence might well be formulated; but the less extreme explanation I see uses tools that weren't available yet when Jaynes was developing his theory.  Jaynes draws inspiration from his knowledge of the phenomenon of schizophrenics taking orders from hallucinations; which imho really is a delightfully bold move to shake up an orthodoxy that, like most orthodoxy, could do with a good shake-up.  But, Jaynes's book was published in the same year as Dawkins's The Selfish Gene, which coined the word meme.  It's easy to look back from four decades later and say that memetics can account for profound changes in population dynamics on a scale that Jaynes felt needed a radical hypothesis like bicamerality; but you can't stand on the shoulders of giants who haven't arrived yet.

Jaynes is concerned primarily, of course, with the breakdown of the bicameral mind, over time starting about four thousand years ago; he has little to say about the upper paleolithic, the thirty-thousand-or-so years from the start of language (by his reckoning, or the start of orality by mine) to the onset of bicamerality (by his reckoning, when the technological practice of agriculture was developed).  His later discussion of modern hallucinations describes them as vestiges of bicamerality, which rather raises the question of whether humans in the upper paleolithic had hallucinations.  The example of the Pirahã suggests to me that hallucinations —and language— were already part of the human condition even before the upper paleolithic.  (An interesting question for further consideration is whether the Pirahã's group hallucinations were non-linguistic.)

My own preference is for less radical transitions (consistent with Occam's razor).  Jaynes may be underestimating how much of qualitative consciousness can exist without narratization in the modern sense; how much of language can exist without support for art or storytelling; how far social structure may be determined by what is believed without involving any fundamental change in how belief is processed by the mind.  He also appears, in particular, to be underestimating how loosely organized the modern "conscious" mind is.  His view of consciousness is monolithic (something I particularly noted when he began to discuss schizophrenia in Book III).  Recall the atomic notion of self, which Susan Blackmore described rejecting after having previously believed in it.  If the self is a character in a story we tell ourselves, then the mind that's telling the story was never really atomic in the first place, and we needn't expect a mind that tells such a story to be fundamentally differently organized than one that doesn't tell such a story.  If hallucinations are somewhere within the penumbra of normal human mental functioning (and to my non-psychologist's eye it seems they may bear some kinship to narratization), it's possible for such phenomena to have had changing roles in society over the millennia without requiring a traumatic shift to/from a bicameral mind.

Another major pitfall he's at risk for concerns interpretation of evidence.  Our perception of the distant past is grounded in physical evidence, but we have to build up layers on layers of interpretation on it to produce a coherent picture, so that what we actually see in our coherent picture is almost all interpretation.  That leaves tremendous scope for self-fulfilling expectations in the sort of reasoning Jaynes is doing, where he reconsiders the evidence in light of his theory to see how well it fits.  Some of his remarks reveal he's aware of this, but still, there it is.  When he talks about how the meaning of a word changed over time, one should keep in mind that this is how he supposes it changed over time; the actual evidence is only written words themselves, while all the meanings involved are couched in a vast network of guesses.

The distinction between supportive evidence and consistent evidence is not absolute; it depends on how distinctive the evidence is — how much it calls for explanation.  This needs care when applied at scale.  Jaynes, in particular, examines a great pile of assorted evidence.  When the theory intersects with a sufficient mass of evidence, just being consistent with so much begins to seem impressive; but really one has to sum up over the whole mass, and the sum of very many data points can still be zero; it depends on the data points.
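To put the point in toy-quantitative terms (a sketch of my own, not anything Jaynes offers), one crude way to score a datum's support for a theory is a log Bayes factor: the logarithm of how much more probable the datum is under the theory than under a rival.  A datum that is merely consistent, roughly as probable either way, scores about zero; and a thousand such data still sum to about zero, however impressive the pile looks.

    # Toy scoring of evidence (my own illustration, not anything from Jaynes).
    # Each datum's support is a log Bayes factor:
    #   log P(datum | theory) - log P(datum | rival).
    # Merely-consistent data score about zero, and a big pile of them still
    # sums to about zero; one genuinely distinctive datum outweighs the pile.
    import math

    def log_support(p_given_theory, p_given_rival):
        return math.log(p_given_theory) - math.log(p_given_rival)

    consistent_pile = [log_support(0.5, 0.5)] * 1000   # equally likely either way
    distinctive_datum = log_support(0.9, 0.1)          # strongly favors the theory

    print(sum(consistent_pile))    # 0.0 : a thousand data points, no net support
    print(distinctive_datum)       # ~2.2 : a single distinctive datum counts for more

The numbers here are made up, of course; the point is only that consistency and support are different sums.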

One doesn't want to give an unorthodox theory credit for explaining things that hadn't needed explaining.

Sometimes an explanation seems warranted.  Jaynes remarks of the Iliad that it never describes human bodies as a whole, but rather as collections of parts, and that the same trend is visible in visual art of the time; though that seems open to an explanation in terms of evolving technology for storytelling, it doesn't seem gratuitous to ask for some explanation of it.  Another point that gave me pause was his claim that the extraordinarily easy Spanish conquest of the Inca Empire was because the Inca Empire was bicameral, with the entire population following the dictates of their bicameral gods; though I didn't find it an altogether compelling case for his explanation, that chapter in history is odd enough that orthodox explanation isn't entirely at ease with it either.

Jaynes's book as a whole, though, gave me a general sense of unnecessary explanations.  He sees evidence of hallucinations where I see unremarkable phenomena (such as "Houses of God") that may be consistent with his theory but don't need it.  In Book III he is particularly keen on the idea that modern humans look for authority to take the place of the bicameral gods they have been deprived of; he sees a quest for bicameral authority in our attitude toward science (he's missing the difference between science and religion, btw, which may in part follow from predating memetics but which I still found unsettling), and even sees the same quest for authority in our enjoyment of stage magic; but I have never felt that people looking for authorities to follow needed explanation.  I figure it's a basic behavioral impulse with some evolutionary value, rather like the impulse to be fair to others, or the impulse to hate people who don't belong to one's own social group (a very mixed bag, our basic behavioral impulses).  Yet more broadly, throughout the book he presents religion as a remnant of bicamerality.  Admittedly, this may come under the heading of things he missed by predating memetics; I now react to it by thinking, religion is neatly explained by evolution of memetic organisms — which ties in to the verbality/orality hypothesis — but I only made that evolutionary connection myself in the mid-1990s (earlier post).

Occasionally, in Jaynes's efforts to fit his theory to known facts, he encounters facts that don't fit easily.  Overall, this happens to him only sporadically.  He is aware that demonic possession doesn't fit his model, and tries to make it fit anyway.  He finds himself reaching to explain why poetry and music, which he maintains are remnants of bicameralism, still exist — which wouldn't be a problem if he hadn't started by hypothesizing they were remnants of a radically different type of mind rather than being phenomena within the normal range of the sort of mind we now have.

Storytelling

I look forward — after I fully digest my first reading of Jaynes — to a second reading.  My particular objective on a second reading would be to consider in detail how the evidence he claims for his bicamerality storyline fits with my verbality/orality storyline.  This objective wouldn't have been possible on the first reading, as I was too busy struggling to grok the overall shape of what he was saying; in fact, though I'd been accumulating thought fragments throughout his book, it wasn't until Jaynes's Afterword that I realized, in a definite Aha! moment (my notes pinpoint it at the top of page 458), that the key concept in relating Jaynes's theories with mine is storytelling, which underpins Jaynes's notion of consciousness and my notion of the verbality/orality transition.  So, as part of that full digestion, following is the more elaborated form that my theories have achieved from their first pass by Jaynes.

My narrative timeline, as it now stands (yes, theorizing is itself storytelling, which in this case feeds into the story being told since it implies that the advent of storytelling would produce a tremendous acceleration of human intellectual development), starts with the transition from verbality to orality at the beginning of the upper paleolithic.  Speculatively, this cultural transition may coincide, in language development, with the introduction of one or both of the two devices mentioned above as missing from Pirahã:  time, and numbers.  Jaynes's ideas about consciousness are rather close to those two factors, as well.  Once the orality-threshold device is introduced, whatever it is, there is a distinct expansion of human activity.

If the start of the upper paleolithic is when orality starts, it's a long time before the period Jaynes primarily discusses, as his breakdown of the bicameral mind starts only about four thousand years ago.  The intervening thirty-six thousand years, be the same more or less, would have to be accounted for by the very slow process of inventing the art of advanced storytelling.  As mentioned above, Jaynes has little to say about this period.  He reckons language only began where I'm placing the verbality/orality transition, at the start of the upper paleolithic, and he (iirc) briefly describes a series of stages in the development of language that would have taken place during the upper paleolithic before the fully developed device of language catalyzed the emergence of the bicameral mind and the neolithic.  Some of Jaynes's language stages would likely precede storytelling, but certainly a second reading should carefully examine these stages in case some of them offer some inspiration on storytelling after all.  On the other hand, if he is indeed overestimating how much of consciousness must postdate his bicameral era, his timeline for the development of consciousness starting four thousand years ago might, on careful examination, be mapped more widely onto the entire oral period from (nominally) forty thousand to twenty-five hundred years ago.

After the verbality/orality transition, the next specific event in my timeline is the emergence of writing, the point at which, by my conceptual framework, the art of storytelling exceeds a critical threshold enabling it to support the written form.  This coincides with Jaynes's start of the breakdown of the bicameral mind, four thousand years ago.  Jaynes's bicameral age is for me the late part of the larger oral period prior to emergent writing; his bicameral age might well be plausibly reinterpretable as a phase in the development of storytelling, perhaps something milder than but similar to bicamerality, though quite what that would be is unclear (and might stubbornly remain unresolved even after a second in-depth reading).

The period from the advent of writing onward is intensively covered in Jaynes's book, and wants close reconsideration from top to bottom.  Several complications apply.

Reinterpretations are likely to be steep in this period, with a wide conceptual gap.  In Jaynes's framework, bicamerality is an absolute state of mind with power to direct ancient empires, while religions are pale echoes of it; in mine, bicamerality is expected to fall within the normal operating range of the human mind (though perhaps not a part of the range commonly exercised in the modern era), while religions are memetic organisms with the power to direct ancient empires.

I remarked earlier on the treacherous nature of physical evidence with multiple layers of interpretation built on it.  A particular complication here is that Jaynes is judging what people think by how they describe their experiences, but I am hypothesizing that throughout the entire period people were trying to figure out how to describe their experiences, and in particular I'm guessing that explaining one's own thoughts was especially hard to figure out; so that the further back in time you go, the less people's descriptions reflect their inner life.

Judging by the above rough sketch of a timeline, the Iliad as we know it — even after compensating (or trying to) for mutation between being composed and being written down — should already represent an extremely advanced stage of storytelling, chronologically about seven eighths of the way from the onset of storytelling toward the present day.  Hopefully, a close second reading can use the depth of Jaynes's treatment to conjecture intermediate steps in the long evolution of advanced storytelling.

Tuesday, February 13, 2018

Sapience and non-sapience

DOCTOR:   I knew a Galactic Federation once, lots of different lifeforms so they appointed a justice machine to administer the law.
ROMANA:  What happened?
DOCTOR:   They found the Federation in contempt of court and blew up the entire galaxy.
The Stones of Blood, Doctor Who, 1978.

The biggest systemic threat atm to the future of civilization, I submit, is that we will design out of it the most important information-processing asset we have:  ourselves.  Sapient beings.  Granted, there is a lot of bad stuff going on in the world right now; I put this threat first because coping with other problems tends to depend on civilization's collective wisdom.

That is, we're much less likely to get into trouble by successfully endowing our creations with sapience, than by our non-sapient creations leaching the sapience out of us.  I'm not just talking about AIs, though that's a hot topic for discussion lately; our non-sapient creations include, for a few examples, corporations (remember Mitt Romney saying "corporations are people"?), bureaucracy (cf. Franz Kafka), AIs, big data analysis, restrictive user interfaces, and totalitarian governments.

I'm not saying AI isn't powerful, or useful.  I'm certainly not suggesting human beings are all brilliant and wise — although one might argue that stupidity is something only a sapient being can achieve.  Computers can't be stupid.  They can do stupid things, but they don't produce the stupidity, merely conduct and amplify it.  Including, of course, amplifying the consequences of assigning sapient tasks to non-sapient devices such as computers.  Stupidity, especially by people in positions of power, is indeed a major threat in the world; but as a practical matter, much stupidity comes down to not thinking rationally, thus failing to tap the potential of our own sapience.  Technological creations are by no means the only thing discouraging us from rational thought; but even in (for example) the case of religious "blind faith", technological creations can make things worse.

To be clear, when I say "collective wisdom", I don't just mean addressing externals like global climate change; I also mean addressing us.  One of our technological creations is a global economic infrastructure that shapes most collective decisions about how the world is to run ("money makes the world go 'round").  We have some degree of control over how that infrastructure works, but limited control and also limited understanding of it; at some point I hope to blog about how that infrastructure does and can work; but the salient point for the current post is, if we want to survive as a species, we would do well to understand what human beings contribute to the global infrastructure.  Solving the global economic conundrum is clearly beyond the scope of this post, but it seems that this post is a preliminary thereto.

I've mentioned before on this blog the contrast between sapience and non-sapience.  Here I mean to explore the contrast, and interplay, between them more closely.  Notably, populations of sapient beings have group dynamics fundamentally different from — and, seemingly, far more efficacious from an evolutionary standpoint than — the group dynamics of non-sapient constructs.

Not only am I unconvinced that modern science can create sapience, I don't think we can even measure it.

Contents
Chess
Memetics
The sorcerer's apprentice
Lies, damned lies, and statistics
Pro-sapient tech
Storytelling and social upheaval
Chess

We seem to have talked ourselves into an inferiority complex.  Broadly, I see three major trends contributing to this.

For one thing, advocates of science since Darwin, in attempting to articulate for a popular audience the profound implications of Darwinian theory, have emphasized the power of "blind" evolution, and in doing so they've tended to describe it in decision-making terms, rather as if it were thinking.  Evolution thinks about the ways it changes species over time in the same sense that weather thinks about eroding a mountain, which is to say, not at all.  Religious thinkers have tended to ascribe some divine specialness to human beings, and even scientific thinkers have shown a tendency, until relatively recently, to portray evolution as culminating in humanity; but in favoring objective observation over mysticism, science advocates have been pushed (even if despite themselves) into downplaying human specialness.  Moreover, science advocates in emphasizing evolution have also played into a strong and ancient religious tradition that views parts/aspects of nature, and Nature herself, as sapient (cf. my past remarks on oral society).

Meanwhile, in the capitalist structure of the world we've created, people are strongly motivated to devise ways to do things with technology, and strongly motivated to make strong claims about what they can do with it.  There is no obvious capitalist motive for them to suggest technology might be inferior to people for some purposes, let alone for them to actually go out and look for advantages of not using technology for some things.  Certainly our technology can do things with algorithms and vast quantities of data that clearly could not be done by an unaided human mind.  So we've accumulated both evidence and claims for the power of technology, and neither for the power of the human mind.

The third major trend I see is more insidious.  Following the scientific methods of objectivity highly recommended by their success in studying the natural world, we tried to objectively measure our intelligence; it seemed like a good idea at the time.  And how do you objectively measure it?  The means that comes to mind is to identify a standard, well-defined, structured task that requires intelligence (in some sense of the word), and test how well we do that task.  It's just a matter of finding the right task to test for... right?  No, it's not.  The reason is appallingly simple.  If a task really is well-defined and structured, we can in principle build technology to do it.  It's when the task isn't well-defined and structured that a sapient mind is wanted.  For quite a while this wasn't a problem.  Alan Turing proposed a test for whether a computer could "think" that it seemed no computer would be passing any time soon; computers were nowhere near image recognition; computers were hilariously bad at natural-language translation; computers couldn't play chess on the level of human masters.

To be brutally honest, automated natural-language translation is still awful.  That task is defined by the way the human mind works — which might sound dismissive if you infer mere eccentricities of human thinking, but becomes quite profound if you take "the way the human mind works" to mean "sapience".  The most obvious way computers can do automatic translation well is if we train people to constrain their thoughts to patterns that computers don't have a problem with; which seemingly amounts to training people to avoid sapient thought.  (Training people to avoid sapient thought is, historically, characteristic of demagogues.)  Image processing is still a tough nut to crack, though we're making progress.  But chess has certainly been technologized.  It figures that would be the first-technologized of those tasks I've mentioned because it's the most well-defined and structured of them.  When it happened, I didn't take it as a sign that computers were becoming sapient, but rather a demonstration that chess doesn't strictly require whatever-it-is that distinguishes sapience.  I wasn't impressed by Go, either.  I wondered about computer Jeopardy!; but on reflection, that too is a highly structured problem, with no more penalty for a completely nonsensical wrong answer than for a plausible wrong one.  I'm not suggesting these aren't all impressive technological achievements; I'm suggesting the very objectivity of these measures hides the missing element in them — understanding.

Recently in a discussion I read, someone described modern advances in AI by saying computers are getting 'better and better at understanding the world' (or nearly those words), and I thought, understanding is just what they aren't doing.  It seems to me the technology is doing what it's always done — getting better and better at solving classes of problems without understanding them.  The idea that the technology understands anything at all seems to me to be an extraordinary claim, therefore requiring extraordinary proof which I do not see forthcoming since, as remarked, we expect to be unable to test it by means of the most obvious sort of experiment (a structured aptitude test).  If someone wants to contend that the opposite claim I'm making is also extraordinary — the claim that we understand in a sense the technology does not — I'll tentatively allow that resolving the question in either direction may require extraordinary proof; but I maintain there are things we need to do in case I'm right.

Somebody, I maintain, has to bring a big-picture perspective to bear.  To understand, in order to choose the goals of what our technology is set to do, in order to choose the structural paradigm for the problem, in order to judge when the technology is actually solving the problem and when the situation falls outside the paradigm.  In order to improvise what to do when the situation does fall outside the paradigm.  That somebody has to be sapient.

For those skeptics who may wonder (keeping in mind I'm all for skepticism, myself) whether there is an unfalsifiable claim lurking here somewhere, note that we are not universally prohibited from observing the gap between sapience and non-sapience.  The difficulty is with one means of observation:  a very large and important class of experiments are predictably incapable of measuring, or even detecting, the gap.  The reason this does not imply unfalsifiability is that scientific inquiry isn't limited to that particular class of experiments, large and important though the class is; the range of scientific inquiry doesn't have specific formally-defined boundaries — because it's an activity of sapient minds.

The gap is at least suggested by the aforementioned difficulty of automatic translation.  What's missing in automatic translation is understanding:  by its nature automatic translation treats texts for translation as strings to be manipulated, rather than indications about the reality in which their author is embedded.  Whatever is missed by automatic translation because it is manipulating strings without thinking about their meaning, that is a manifestation of the sapience/non-sapience gap.  Presumably, with enough work one could continue to improve automatic translators; any particular failure of translation can always be fixed, just as any standardized test can be technologized.  How small the automatic-translation shortfall can be made in practice, remains to be seen; but the shape of the shortfall should always be that of an automated system doing a technical manipulation that reveals absence of comprehension.

Consider fly-by-wire airplanes, which I mentioned in a previous post.  What happens when a fly-by-wire airplane encounters a situation outside the parameters of the fly-by-wire system?  It turns control over to the human pilots.  Who often don't realize, for a few critical moments (if those moments weren't critical, we wouldn't be talking about them, and quite likely the fly-by-wire system would not have bailed) that the fly-by-wire system has stopped flying the plane for them; and they have to orient themselves to the situation; and they've mostly been getting practice at letting the fly-by-wire system do things for them.  And then when this stacked-deck of a situation leads to a horrible outcome, there are strong psychological, political, and economic incentives to conclude that it was human error; after all, the humans were in control at the denouement, right?  It seems pretty clear to me that, of the possible ways that one could try to divvy up tasks between technology and humans, the model currently used by fly-by-wire airplanes (and now, one suspects, drive-by-wire cars) is a poor model, dividing tasks for the convenience of whoever is providing the automation rather than for the synergism of the human/non-human ensemble.  It doesn't look as if we know how to design such systems for synergism of the ensemble; and it's not immediately clear that there's any economic incentive for us to figure it out.  Occasionally, of course, something that seems unprofitable has economic potential that's only waiting for somebody to figure out how to exploit it; if there is such potential here, we may need first to understand the information-processing characteristics of sapience better.  Meanwhile, I suggest, there is a massive penalty, on a civilization-wide scale (which is outside the province of ordinary economics), if we fail to figure out how to design our technology to nurture sapience.  It should be possible to nurture sapience without first knowing how it works, or even exactly what it does — though figuring out how to nurture it may bring us closer to those other things.

I'll remark other facets of the inferiority-complex effect, as they arise in discussion, below.

Memetics

By the time I'm writing this post, I've moved further along a path of thought I mentioned in my first contentful post on this blog.  I wrote then that in Dawkins's original description of memetics, he made an understandable mistake by saying that memetic life was "still in its infancy, still drifting clumsily about in its primeval soup".  That much I'm quite satisfied with:  it was a mistake — memetic evolution has apparently proceeded about three to five orders of magnitude faster than genetic evolution, and has been well beyond primeval soup for millennia, perhaps tens of millennia — and it was an understandable mistake, at that.  I have more to say now, though, about the origins of the mistake.  I wrote that memetic organisms are hard to recognize because you can't observe them directly, as their primary form is abstract rather than physical; and that's true as far as it goes; but there's also something deeper going on.  Dawkins is an evolutionary biologist whose home ground is genetics, and in describing necessary conditions under which replication gives rise to evolution, he assumed it would always require the sort of conditions that genetic replication needs to produce evolution.  In particular, he appears to have assumed there must be a mechanism that copies a basic representation of information with fantastically high fidelity.

Now, this is a tricky point.  I'm okay with the idea that extreme-fidelity basic replication is necessary for genetic evolution.  It seems logically cogent that something would have to be replicated with extreme fidelity to support evolution-in-general (such as memetic evolution).  But I see no reason this extreme-fidelity replication would have to occur in the basic representation.  There's no apparent reason we must be able to pin down at all just what is being replicated with extreme fidelity, nor must we be able to identify a mechanism for extreme-fidelity copying.  If we stipulate that evolution implies something is being extreme-fidelity-copied, and we see that evolution is taking place, we can infer that some extreme-fidelity copying is taking place; but evolution works by exploiting what happens with indifference to why it happens.  We might find that underlying material is being copied wildly unfaithfully, yet somehow, beyond our ability to follow the connections, this copying preserves some inarticulable abstract property that leads to an observable evolutionary outcome.  Evolution would exploit the abstract property with complete indifference to our inability to isolate it.
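As a toy illustration of that last point (my own sketch, in no way drawn from Dawkins or Jaynes), here is a simulation in which the explicit representation is copied very unfaithfully, element by element, yet an abstract property of it, simply the mean, is transmitted far more faithfully because the per-element errors average out; selection acting on that abstract property still produces steady cumulative evolution.

    # Toy simulation (my own sketch): low-fidelity copying of the explicit
    # representation, while an abstract property (here, just the mean of the
    # numbers) behaves like a high-fidelity replicator, and selection on it
    # produces cumulative evolution anyway.
    import random

    SURFACE_LEN = 50     # size of the explicit representation
    NOISE = 1.0          # per-element copying error: large, i.e. low fidelity
    POP = 100
    GENERATIONS = 40

    def copy(surface):
        """Copy every element with a large random error."""
        return [x + random.gauss(0, NOISE) for x in surface]

    def abstract_property(surface):
        """The only thing selection 'sees': the mean of the surface."""
        return sum(surface) / len(surface)

    population = [[0.0] * SURFACE_LEN for _ in range(POP)]
    for generation in range(GENERATIONS):
        # keep the fitter half, refill with noisy copies of random survivors
        population.sort(key=abstract_property, reverse=True)
        survivors = population[: POP // 2]
        copies = [copy(random.choice(survivors)) for _ in range(POP - len(survivors))]
        population = survivors + copies

    best = max(abstract_property(s) for s in population)
    print(f"best mean after {GENERATIONS} generations: {best:.2f}")
    # No element of the surface is copied faithfully (errors of roughly +/-1.0),
    # yet the mean climbs steadily from 0, because its copying error is only
    # about 1/sqrt(50) of the per-element error.

The mean is of course a trivially identifiable property; the suggestion is only that evolution can ride on whatever happens to be faithfully preserved, whether or not we can isolate it.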

It appears that in the case of genetic evolution, we have identified a basic extreme-fidelity copying mechanism.  In fact, apparently it even has an error-detection-and-correction mechanism built into it; which certainly seems solid confirmation that such extreme fidelity was direly needed for genetic evolution or such a sophisticated mechanism would never have developed.  Yet there appears to be nothing remotely like that for memetic replication.  If memetic evolution really had the same sort of dynamics as genetic evolution, we would indeed expect memetic life to be "still drifting clumsily about in its primeval soup"; it couldn't possibly do better than that until it had developed a super-high-fidelity low-level replicating mechanism.

Yet memetic evolution proceeds at, comparatively, breakneck pace, in spectacular defiance of the expectation.  Therefore we may suppose that the dynamics of memetic evolution are altered by some factor with no counterpart in genetic evolution.

I suggest the key altering factor of memetic evolution, overturning the dynamics of genetic evolution, is that the basic elements of the host medium — people, rather than chemicals — are sapient.  What this implies is that, while memetic replication involves obviously-low-fidelity copying of explicitly represented information, the individual hosts are thinking about the content, processing it through the lens of their big-picture sapient perspective.  Apparently, this can result in an information flow with abstract fixpoints — things that get copied with extreme fidelity — that can't be readily mapped onto the explicit representation (e.g., what is said/written).  My sense of this situation is that if it is even useful to explicitly posit the existence of discrete "memes" in memetic evolution, it might yet be appropriate to treat them as unknown quantities rather than pouring effort into trying to identify them individually.  It seems possible the wholesale discreteness assumption may be unhelpful as well — though ideas don't seem like a continuous fluid in the usual simple sense, either.

This particular observation of the sapient/non-sapient gap is from an unusual angle.  When trying to build an AI, we're likely to think in terms of what makes an individual entity sapient; likewise when defining sapience.  The group dynamics of populations of sapients versus non-sapients probably won't (at a guess) help us in any direct way to build or measure sapience; but it does offer a striking view of the existence of a sapience/non-sapience gap.  I've remarked before that groups of people get less sapient at scale; a population of sapiences is not itself sapient; but it appears that, when building a system, mixing in sapient components can produce systemic properties that aren't attainable with uniformly non-sapient components, thus attesting that the two kinds of components do have different properties.

This evolutionary property of networks of sapiences affords yet another opportunity to underestimate sapience itself.  Seeing that populations of humans can accumulate tremendous knowledge over time — and recognizing that no individual can hope to achieve great feats of intellect without learning from, and interacting with, such a scholastic tradition — and given the various motives, discussed above, for downplaying human specialness — it may be tempting to suppose that sapience is not, after all, a property of individuals.  However, cogito, ergo that's taking the idea of collective intelligence to an absurdity.  The evolutionary property of memetics I've described is not merely a property of how the network is set up; if it were, genetic evolution ought to have struck on it at some point.

There are, broadly, three idealized models (at least three) of how a self-directing system can develop.  There's "blind evolution", which explores alternatives by maintaining a large population with different individuals blundering down different paths simultaneously, and if the population is big enough, the variety amongst individuals is broad enough, and the viable paths are close enough to blunder into, enough individuals will succeed well enough that the population evolves rather than going extinct.  This strategy isn't applicable to a single systemic decision, as with the now-topical issue of global climate change:  there's no opportunity for different individuals to live in different global climates, so there's no opportunity for individuals who make better choices to survive better than individuals who make poorer choices.  As a second model, there's a system directed by a sapience; the individual sapient mind who runs the show can plan, devising possible strategies and weighing their possible consequences before choosing.  It is also subject to all the weaknesses and fallibilities of individuals — including plain old corruption (which, we're reminded, power causes).  The third model is a large population of sapiences, evolving memetically — and that's different again.  I don't pretend to fully grok the dynamics of that third model, and I think it's safe to say no-one else does either; we're all learning about it in real time as history unfolds, struggling with different ways of arranging societies (governmentally, economically, what have you).

A key weakness of the third model is that it only applies under fragile conditions; in particular, the conditions may be deliberately disrupted, at least in the short term; keeping in mind we're dealing with a population of sapiences each potentially deliberate.  When systemic bias or a small controlling population interferes with the homogeneity of the sapient population, the model breaks down and control of the system loses — at least, partly loses — its memetic dynamics.  This is a vulnerability shared by the systems of democracy and capitalism.

The sorcerer's apprentice

There are, of course, more-than-adequate ways for us to get into trouble by succeeding in giving our technology sapience.  A particularly straightforward one is that we give it sapience and it decides it doesn't want to do what we want it to.  In science fiction this scenario may be accompanied by a premise that the created sapience is smarter than we are — although, looking around at history, there seems a dearth of evidence that smart people end up running the show.  Even if they're only about as smart, and stupid, as we are, an influx of artificial sapiences into the general pool of sapience in civilization is likely to throw off the balance of the pool as a whole — either deliberately or, more likely, inadvertently.  One has only to ask whether sapient AIs should have the right to vote to see a tangle of moral, ethical, and practical problems cascading forth (with vote rigging on one side, slavery on the other; not forgetting that, spreading opaque fog over the whole, we have no clue how to test for sapience).  However, I see no particular reason to think we're close to giving our technology sapience; I have doubts we're even trying to do so, since I doubt we know where that target actually is, making it impossible for us to aim for it (though mistaking something else for the target is another opportunity for trouble).  Even if we could eventually get ourselves into trouble by giving our technology sapience, we might not last long enough to do so because we get ourselves into trouble sooner by the non-sapient-technology route.  So, back to non-sapience.

A major theme in non-sapient information processing is algorithms:  rigidly specified instructions for how to proceed.  An archetypal cautionary tale about what goes wrong with algorithms is The Sorcerer's Apprentice, an illustration (amongst other possible interpretations) of what happens when a rigid formula is followed without sapient oversight to notice when the formula itself, seen from a big-picture perspective, ceases to be appropriate.  One might argue that this characteristic rigidity is an inherently non-sapient limitation of algorithms.

It's not an accident that error-handling is among the great unresolved mysteries of programming-language design — algorithms being neither well-suited to determine when things have gone wrong, nor well-suited to cope with the mess when they do.

Algorithmic rigidity is what makes bureaucracy something to complain about — blind adherence to rules even when they don't make sense in the context where they occur, evoking the metaphor of being tied up in red tape.  The evident dehumanizing effect of bureaucracy is that it eliminates discretion to take advantage of understanding arbitrary aspects of the big picture; it seems that to afford full scope to sapience, maximizing its potential, one wants to provide arbitrary flexibility — freedom — avoiding limitation to discrete choices.

A bureaucratic system can give lip service to "giving people more choices" by adding on additional rules, but this is not a route to the sort of innate freedom that empowers the potential of sapience.  To the contrary:  sapient minds are ultimately less able to cope with vast networks of complicated rules than technological creations such as computers — or corporations, or governments — are, and consequently, institutions such as corporations and governments naturally evolve vast networks of complicated rules as a strategy for asserting control over sapiences.  There are a variety of ways to describe this.  One might say that an institution, because it is a non-sapient entity in a sea of sapient minds, is more likely to survive if it has some property that limits sapient minds so they're less likely to overwhelm it.  A more cynical way to say the same thing is that the institution survives better if it finds a way to prevent people from thinking.  A stereotypical liberal conspiracy theorist might say "they" strangle "us" with complicated rules to keep us down — which, if you think about it, is yet another way of saying the same thing (other than the usual incautious assumption of conspiracy theorists, that the behavior must be a deliberate plot by individual sapiences rather than an evolved survival strategy of memetic organisms).  Some people are far better at handling complexity than others, but even the greatest of our complexity tolerances are trivial compared to those of our non-sapient creations.  Part of my point here is that I don't think that's somehow a "flaw" in us, but rather part of the inherent operational characteristics of sapience that shape the way it ought to be most effectively applied.

Lies, damned lies, and statistics

A second major theme in non-sapient information processing is "big data".  Where algorithms contrast with sapience in logical strategy, big data contrasts in sheer volume of raw data.

These two dimensions — logical strategy and data scale — are evidently related.  Algorithms can be applied directly to arbitrarily-large-scale data; sapience cannot, which is why big data is the province of non-sapient technology.  I suggested in an earlier post that the device of sapience only works at a certain range of scales, and that the sizes of both our short- and our long-term memories may be, to some extent, essential consequences of sapience rather than accidental consequences of evolution.  Not everyone tops out at the same scale of raw data, of course; some people can take in a lot more, or a lot less, than others before they need to impose some structure on it.  Interestingly, this is pretty clearly not some sort of "magnitude" of sapience, as there have been acknowledged geniuses, of different styles, toward both ends of the spectrum; examples that come to mind: Leonhard Euler (with a spectacular memory) and Albert Einstein (notoriously absent-minded).

That we sapiences can "make sense" of raw data, imposing structure on it and thereby coping with masses of data far beyond our ability to handle in raw form, would seem to be part of the essence of what it means to be sapient.  The attendant limitation on raw data processing would then be a technical property of the Platonic realm in broadly the same sense as fundamental constants like π, e, etc., and distant kin to such properties of the physical realm as the conditions necessary for nuclear fusion.

Sometimes, we can make sense of vast data sets, many orders of magnitude beyond our native capacity, by leveraging technological capacity to process more-or-less-arbitrarily large volumes of raw data and boil it down algorithmically, to a scale/form within our scope.  It should be clear that the success of the enterprise depends on how insightfully we direct the technology on how to boil down the data; essentially, we have to intuit what sorts of analysis will give us the right sorts of information to gain insight into the salient features of the data.  We're then at the short end of a data-mining lever; the bigger the data mine, the trickier it is to reason out how to direct the technological part of the operation.  It's also possible to deliberately choose an analysis that will give us the answer we want, rather than helping us learn about reality.  And thus are born the twin phenomena of misuse of statistics and abuse of statistics.
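To make that lever concrete, here is a minimal sketch in Python (the numbers and the framing are hypothetical, invented purely for illustration): the same raw data, boiled down by two different choices of summary, supports two different stories; the choice of analysis, not the data, does the talking.

incomes = [20_000] * 90 + [30_000] * 9 + [5_000_000]   # one outlier dominates the total

mean   = sum(incomes) / len(incomes)                    # pulled far upward by the outlier
median = sorted(incomes)[len(incomes) // 2]             # what the typical individual sees

print(f"mean income:   {mean:12,.0f}")    # about 70,700
print(f"median income: {median:12,.0f}")  # 20,000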

There may be a temptation to apply technology to the problem of deciding how to mine the data.  That —it should be clear on reflection— is an illusion.  The technology is just as devoid of sapient insight when we apply it to the meta-analysis as when we applied it to the analysis directly; and the potential for miscues is yet larger, since technology working at the meta-level is in a position to make more biasing errors through lack of judgement.

One might be tempted to think of conceptualization, the process by which we impose concepts on raw data to structure and thus make sense of it, as "both cause and cure" of our limited capacity to process raw data; but this would, imo, be a mistake of orientation.  Conceptualization — which seems to be the basic functional manifestation of sapience — may cause the limited-capacity problem, and it may also be the "cure", i.e., the means by which we cope with the problem, but neither of those is the point of conceptualization/sapience.  As discussed, sapience differs from non-sapient information processing in ways that don't obviously fit on any sort of spectrum.  Consider:  logically, our inability to directly grok big data can't be a "failure" unless one makes a value judgement that that particular ability is something we should be able to do — and making a value judgement is something that can only be meaningfully ascribed to a sapience.

It's also rather common to imagine the possibility of a sapience of a different order, capable of processing vast (perhaps even arbitrarily vast) quantities of data.  This can result from —as noted earlier— portraying evolution as if it were a sapient process.  It may result from an extrapolation based on the existence of some people with higher raw-data tolerances than others; but this treats "intelligence" as an ordering correlated with raw data processing capacity — which, as I've noted above, it is not.  Human sapiences toward the upper end of raw data processing capacity don't appear to be "more sapient", rather it's more like they're striking a different balance of parameters.  Different strengths and weaknesses occur at different mixtures of the parameters, and this seems to me characteristic of an effect (sapience) that can only occur under a limited range of conditions, with the effect breaking down in different ways depending on which boundary of the range is crossed.  Alternatively, it has sometimes been suggested there should be some sort of fundamentally different kind of mind, working on different principles than our own; but once one no longer expects this supposed effect to have anything to do with sapience as it occurs in humans, I see no basis on which to conjecture the supposed effect at all.

There's also yet another opportunity here for us to talk ourselves into an inferiority complex.  We tend to break down a holistic situation into components for understanding, and then when things fail we may be inclined to ascribe failure to a particular component, rather than to the way the components fit together or to the system as a whole.  So when a human/technology ensemble fails, we're that much more likely to blame the human component.

Pro-sapient tech

How can we design technology to nurture sapience rather than stifle it?  Though I don't claim to grasp the full scope of this formidable challenge, I have some suggestions that should help.

On the stifling side, the two big principles I've discussed are algorithms and scale; algorithms eliminate the arbitrary flexibility that gives sapience room to function, while vast masses of data overwhelm sapiences (technology handles arbitrarily large masses of data smoothly, not trying to grok big-picture implications that presumably grow at least quadratically with scale).  Evidently sapience needs full-spectrum access to the data (it can't react to what it doesn't know), needs to have hands-on experience from which to learn, needs to be unfettered in its flexibility to act on what it sees.

Tedium should be avoided.  Aspects of this are likely well-known in some circles, perhaps know-how related to (human) assembly-line work; from my own experience, tedium can trip up sapience in a couple of ways, that blur into each other.  Repeating actions over and over can lead to inattention, so that when a case comes along that ought to be treated differently, the sapient operator just does the same thing yet again, either failing to notice it at all, or "catching it too late" (i.e., becoming aware of the anomaly after having already committed to processing it in the usual way).  On the other hand, paying full attention to an endless series of simple cases, even if they offer variations maintaining novelty, can exhaust the sapient operator's decision-making capacity; I, for one, find that making lots of little decisions drains me for a time, as if I had a reservoir of choice that, when depleted, refills at a limited natural rate.  (I somewhat recall a theory ascribed to Barack Obama that a person can only make one or two big decisions per day; same principle.)

Another important principle to keep in mind is that sapient minds need experience.  Even "deep learning" AIs need training, but with sapiences the need is deeper and wider; the point is not merely to "train" them to do a particular task, important though that is, but to give them accumulated broad experience in the whole unbounded context surrounding whatever particular tasks are involved.  Teaching a student to think is an educator's highest aspiration.  An expert sapient practitioner of any trade uses "tricks of the trade" that may be entirely outside the box.  A typical metaphor for extreme forms of such applied sapient measures is 'chewing gum and baling wire'.  One of the subtle traps of over-reliance on technology is that if sapiences aren't getting plenty of broad, wide hands-on experience, when situations outside known parameters arise there will be no-one clueful to deal with it — even if the infrastructure has sufficiently broad human-accessible flexibility to provide scope for out-of-the-box sapient measures.  (An old joke describes an expert being called in to fix some sort of complex system involving pipes under pressure —recently perhaps a nuclear power plant, some older versions involve a steamboat— who looks around, taps a valve somewhere, and everything starts working again; the expert charges a huge amount of money —say a million dollars, though the figure has to ratchet up over time due to inflation— and explains, when challenged on the amount, that one dollar is for tapping the valve, and the rest is for knowing where to tap.)

This presents an economic/social challenge.  The need to provide humans with hands-on experience is a long-term investment in fundamental robustness.  For the same reason that standardized tests ultimately cannot measure sapience, short-term performance on any sufficiently well-structured task can be improved by applying technology to it, which can lead to a search for ways to make tasks more well-structured — with a completely predictable loss of ability to deal with... the unpredictable.  I touched on an instance of this phenomenon when describing, in an earlier post, the inherent robustness of a traffic system made up of human drivers.

Suppression of sapience also takes much more sweeping, long-term systemic forms.  A particular case that made a deep impression on me:  in studying the history of my home town I was fascinated that the earliest European landowners of the area received land grants from the king, several generations before Massachusetts residents rose up in rebellion against English rule (causing a considerable ruckus, which you may have heard about).  Those land grants were subject to proving the land, which is to say, demonstrating an ability to develop it.  Think about that.  We criticize various parties —developers, big corporations, whatever— for exploiting the environment, but those land grants, some four hundred years ago under a different system of government, required exploiting the land, otherwise the land would be taken away and given to someone else.  Just how profoundly is that exploitation woven into the fabric of Western civilization?  It appears to be quite beyond distinctions like monarchy versus democracy, capitalism versus socialism.  We've got hold of the tail of a vast beast that hasn't even turned 'round to where we can see the thing as a whole; it's far, far beyond anything I can tackle in this post, except to note pointedly that we must be aware of it, and be thinking about it.

A much simpler, but also pernicious, source of long-term systemic bias is planning to add support for creativity "later".  Criticism of this practice could be drawn to quite reasonable tactical concerns like whether anyone will really ever get around to attempting the addition, and whether a successful addition would fail to take hold because it would come too late to overcome previously established patterns of behavior; the key criticism I recommend, though, is that strategically, creativity is itself systemic and needs to be inherent in the design from the start.  Anything tacked on as an afterthought would be necessarily inferior.

To give proper scope for sapience, its input — the information presented to the sapient operator in a technological interface — should be high-bandwidth from an unbounded well of ordered complexity.  There has to be underlying rhyme-and-reason to what is presented, otherwise information overload is likely, but it mustn't be stoppered down to the sort of simple order that lends itself to formal, aka technological, treatment, which would defeat the purpose of bringing a sapience to bear on it.  Take English text as archetypical:  built up mostly from 26 letters and a few punctuation marks and whitespace, yet as one scales up, any formal/technological grasp on its complexity starts to fuzz until ultimately it gets entirely outside what a non-sapience can handle.  Technology sinks in the swamp of natural language, while to a sapience natural language comes... well, naturally.  This sort of emergent formal intractability seems a characteristic domain of sapience.  There is apparently some range of variation in the sorts of rhyme and reason involved; for my part, I favor a clean simple set of orthogonal primitives, while another sort of mind favors a less tidy primitive set (more-or-less the design difference between Scheme and Common Lisp).

When filtering input to avoid simply overwhelming the sapient user, whitelisting is inherently more dangerous than blacklisting.  That is, an automatic filter to admit information makes an algorithmic judgement about what may be important, which judgement is properly the purview of sapience, to assess unbounded context; whereas a filter to omit completely predictable information, though it certainly can go wrong, has a better chance of working since it isn't trying to make a call about which information is extraneous, only about which information is completely predictable (if properly designed; censorship being one of the ways for it to go horribly wrong).
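A small sketch may make the asymmetry concrete (the log lines and patterns here are hypothetical, and real filters are of course messier): the blacklist filter drops only what is known to be routine, so the unanticipated anomaly still reaches the operator; the whitelist filter admits only what it was told to expect, so the anomaly silently vanishes.

import re

log = [
    "heartbeat ok",
    "heartbeat ok",
    "disk usage 41%",
    "reactor coolant valve stuck",   # the thing nobody predicted
]

ROUTINE  = [re.compile(r"^heartbeat ok$")]                           # known-boring
EXPECTED = [re.compile(r"^heartbeat"), re.compile(r"^disk usage")]   # known-interesting

blacklisted = [line for line in log if not any(p.search(line) for p in ROUTINE)]
whitelisted = [line for line in log if any(p.search(line) for p in EXPECTED)]

print(blacklisted)  # keeps the disk-usage line AND the stuck valve
print(whitelisted)  # the stuck valve never reaches the operator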

On the output side —i.e., what the sapient operator is empowered to do— a key aspect is effective ability to step outside the framework.  Sets of discrete top-level choices are likely to stifle sapient creativity rather than enhance it (not to be confused with a set of building blocks, which would include the aforementioned letters-plus-punctuation).  While there is obvious advantage in facilities to support common types of actions, those facilities need to blend smoothly with robust handling of general cases, to produce graceful degradation when stepping off the beaten path.  Handling some approaches more easily than others might easily turn into systemic bias against the others — a highly context-dependent pitfall, on which the reason for less-supported behavior seems to be the pivotal factor.  (Consider the role of motive-for-deviation in the subjective balance between pestering the operator about an unconventional choice until they give it up, versus allowing one anomaly to needlessly propagate unchecked complications.)

Storytelling and social upheaval

A final thought, grounding this view of individual sapiences back into global systemic threats (where I started, at the top of the post).

Have you noticed it's really hard to adapt a really good book into a really good movie?  So it seems to me.  When top-flight literature translates successfully to a top-flight movie, the literature is more likely to have been a short story.  A whole book is more likely to translate into a miniseries, or a set of movies.  I was particularly interested by the Harry Potter movies, which I found suffered from their attempt to fit far too much into each single movie; the Harry Potter books were mostly quite long, and were notable for their rich detail, and that couldn't possibly be captured by one movie per book without reducing the richness to something telegraphic.  The books were classics, for the ages; the movies weren't actually bad, but they weren't in the same rarefied league as the books.  (I've wondered if one could turn the Harry Potter book set into a television series, with one season per book.)

The trouble in converting literature to cinematography is bandwidth.  From a technical standpoint this is counter-intuitive:  text takes vastly less digital storage than video; but how much of that data can be used as effective signal depends on what kind of signal is intended.  I maintain that as a storytelling medium, text is extremely high-bandwidth while video is a severe bottleneck, stunningly inefficient at getting the relevant ideas across if, indeed, they can be expressed at all.  In essence, I suggest, storytelling is what language has evolved for.  A picture may be worth a thousand words, but  (a) it depends on which words and which picture,  (b) it's apparently more like 84 words, and  (c) it doesn't follow that a thousand pictures are worth a thousand times as many words.

In a post here some time back, I theorized that human language has evolved in three major stages (post).  The current stage in the developed world is literacy, in which society embraces written language as a foundation for acquiring knowledge.  The preceding stage was orality, where oral sagas are the foundation for acquiring knowledge, according to the theory propounded by Eric Havelock in his magnum opus Preface to Plato, where he proposes that Plato lived on the cusp of the transition of ancient Greek society from orality to literacy.  My extrapolation from Havelock's theory says that before the orality stage of language was another stage I've called verbality, which I speculate may have more-or-less resembled the peculiar Amazonian language Pirahã (documented by Daniel Everett in Don't Sleep, There Are Snakes).  Pirahã has a variety of strange features, but what particularly attracted my attention was that, adding up these features, Pirahã apparently does not and cannot support an oral culture; Pirahã culture has no history, art, or storytelling (does not), and the language has no temporal vocabulary, tense, or number system (cannot).

'No storytelling' is where this relates back to books-versus-movies.  The nature of the transition from verbality to orality is unclear to me; but I (now) conjecture that once the transition to orality occurs, there would then necessarily be a long period of linguistic evolution during which society would slowly figure out how to tell stories.  At some point in this development, writing would arise and after a while precipitate the transition to literacy.  But the written form of language, in order to support the transition to literate society, would particularly have to be ideally suited to storytelling.

Soon after the inception of email as a communication medium came the development of emoticons:  symbols absent from traditional written storytelling but evidently needed to fill in for the contextual "body language" clues ordinarily available in face-to-face social interaction.  This demonstrates that social interaction itself is not storytelling as such; storytelling is what written language was already well suited for, without emoticons.  One might conjecture that video, while lower-storytelling-bandwidth than text, could have higher effective social-interaction-bandwidth than text.  And on the other side of the equation, emoticons also demonstrate that the new electronic medium was already being used for non-storytelling social interaction.

For another glimpse into the character of the electronic medium, contrast the experience of browsing Wikibooks — an online library of some thousands of open-access textbooks — against the pre-Internet experience of browsing in an academic library.

On Wikibooks, perhaps you enter through the main page, which offers you a search box and links to some top-level subject pages like Computing, Engineering, Humanities, and such.  Each of those top-level subject pages provides an array of subsections, and each subsection will list all its own books as well as listing its own sub-subsections, and so on.  The ubiquitous search box will do a string search, listing first pages that mention your chosen search terms in the page title, then pages that contain the terms somewhere in the content of the page.  Look at a particular page of a book, and you'll see the text, perhaps navigation links such as next/previous page, parent page, subpages; there might be a navigation box on the right side of the page that shows the top-level table of contents of the book.

At the pre-Internet library, typically, you enter past the circulation desk, where a librarian is seated.  Past that, you come to the card catalog; hundreds of alphabetically labeled deep drawers of three-by-five index cards, each card cumulatively customized by successive librarians over decades, perhaps over more than a century if this is a long-established library.  (Side insight, btw:  that card catalog is, in its essence, a collaborative hypertext document very like a wiki.)  You may spend some time browsing through the catalog, flipping through the cards in various drawers, jotting down notes and using them to move from one drawer to another — a slower process than if you could move instantly from one to another by clicking an electronic link, but also a qualitatively richer experience.  At every moment, surrounding context bears on your awareness; other index cards near the one you're looking at, other drawers; and beyond that, strange though it now seems that this is worth saying, you are in a room, literally immersed in context.  Furniture, lights, perhaps a cork bulletin board with some notices on it; posters, signs, or notices on the walls, sometimes even thematic displays; miscellany (is that a potted plant over there?); likely some other people, quietly going about their own business.  The librarian you passed at the desk probably had some of their own stuff there, may have been reading a book.  Context.  Having taken notes on what you found in the card catalog and formulated a plan, you move on to the stacks; long rows of closely spaced bookcases, carefully labeled according to some indexing system referenced by the cards and jotted down in your notes, with perhaps additional notices on some of the cases — you're in another room — you come to the shelves, and may well browse through other books near what your notes direct you to, which you can hardly help noticing (not like an electronic system where you generally have to go out of your way to conjure up whatever context the system may be able to provide).  You select the particular book you want, and perhaps take it to a reading desk (or just plunk down on the carpet right there, or a nearby footstool, to read); and as you're looking at a physical book, you may well flip through the pages as you go, yet another inherently context-intensive browsing technique made possible by the physicality of the situation.

What makes this whole pre-Internet experience profoundly different from Wikibooks — and I say this as a great enthusiast of Wikibooks — is the rich, deep, pervasive context.  And context is where this dovetails back into the main theme of this post, recognizing context as the special province of sapience.

When the thriving memetic ecosystem of oral culture was introduced to the medium of written language, it did profoundly change things, producing literate culture, and new taxonomic classes of memetic organisms that could not have thrived in oral society (I'm thinking especially of scientific organisms); but despite these profound changes, the medium still thoroughly supported language, and context-intensive social interactions mostly remained in the realm of face-to-face encounters.  So the memetic ecosystem continued to thrive.

Memetic ecosystem is where all of this links back to the earlier discussion of populations of sapiences.

That discussion noted that system self-direction through a population of sapiences can break down if the system is thrown out of balance.  And while the memetic ecosystem handily survived the transition to literacy, it's an open question what will happen with the transition to the Internet medium.  This time, the new medium is highly context-resistant while it aggressively pulls in social interactions.  With sapience centering on context aspects that are by default eliminated or drastically transformed in the transition, it seems the transition must have, somehow, an extreme impact on the way sapient minds develop.  If there is indeed a healthy, stable form of society to be achieved on the far side of this transition, I don't think we should kid ourselves that we know what that will look like, but it's likely to be very different, in some way or other, from the sort of stable society that preceded it.

The obvious forecast is social upheaval.  The new system doesn't know how to put itself together, or really even know for sure whether it can.  The old system is pretty sure to push back.  As I write this, I look at the political chaos in the United States —and elsewhere— and I see these forces at work.

And I think of the word singularity.

Friday, June 16, 2017

Co-hygiene and quantum gravity

[l'Universo] è scritto in lingua matematica
([The Universe] is written in the language of mathematics)
— Galileo Galilei, Il Saggiatore (The Assayer), 1623.

Here's another installment in my ongoing exploration of exotic ways to structure a theory of basic physics.  In our last exciting episode, I backtraced a baffling structural similarity between term-rewriting calculi and basic physics to a term-rewriting property I dubbed co-hygiene.  This time, I'll consider what this particular vein of theory would imply about the big-picture structure of a theory of physics.  For starters, I'll suggest it would imply, if fruitful, that quantum gravity is likely to be ultimately unfruitful and, moreover, quantum mechanics ought to be less foundational than it has been taken to be.  The post continues on from there much further than, candidly, I had expected it to; by the end of this installment my immediate focus will be distinctly shifting toward relativity.

To be perfectly clear:  I am not suggesting anyone should stop pursuing quantum gravity, nor anything else for that matter.  I want to expand the range of theories explored, not contract it.  I broadly diagnose basic physics as having fallen into a fundamental rut of thinking, that is, assuming something deeply structural about the subject that ought not to be assumed; and since my indirect evidence for this diagnosis doesn't tell me what that deep structural assumption is, I want to devise a range of mind-bendingly different ways to structure theories of physics, to reduce the likelihood that any structural choice would be made through mere failure to imagine an alternative.

The structural similarity I've been pursuing analogizes between, on one side, the contrast of pure function-application with side-effect-ful operations in term-rewriting calculi; and on the other side, the contrast of gravity with the other fundamental forces in physics.  Gravity corresponds to pure function-application, and the other fundamental forces correspond to side-effects.  In the earlier co-hygiene post I considered what this analogy might imply about nondeterminism in physics, and I'd thought my next post in the series would be about whether or not it's even mathematically possible to derive the quantum variety of nondeterminism from the sort of physical structure indicated.  Just lately, though, I've realized there may be more to draw from the analogy by considering first what it implies about non-locality, folding in nondeterminism later.  Starting with the observation that if quantum non-locality ("spooky action at a distance") is part of the analog to side-effects, then gravity should be outside the entanglement framework, implying both that quantum gravity would be a non-starter, and that quantum mechanics, which is routinely interpreted to act directly from the foundation of reality by shaping the spectrum of alternative versions of the entire universe, would have to be happening at a less fundamental level than the one where gravity differs from the other forces.

On my way to new material here, I'll start with material mostly revisited from the earlier post, where it was mixed in with a great deal of other material; here it will be more concentrated, with a different emphasis and perhaps some extra elements leading to additional inferences.  As for the earlier material that isn't revisited here — I'm very glad it's there.  This is, deliberately, paradigm-bending stuff, where different parts don't belong to the same conceptual framework and can't easily be held in the mind all at once; so if I hadn't written down all that intermediate thinking at the time, with its nuances and tangents, I don't think I could recapture it all later.  I'll continue here my policy of capturing the journey, with its intermediate thoughts and their nuances and tangents.

Until I started describing λ-calculus here in earnest, it hadn't registered on me that it would be a major section of the post.  Turns out, though, my perception of λ-calculus has been profoundly transformed by the infusion of perspective from physics; so I found myself going back to revisit basic principles that I would have skipped lightly over twenty years ago, and perhaps even two years ago.  It remains to be seen whether developments later in this post will sufficiently alter my perspective to provoke yet another recasting of λ-calculus in some future post.

Contents
Side-effects
Variables
Side-effect-ful variables
Quantum scope
Geometry and network
Cosmic structure
Side-effects

There were three main notions of computability in the 1930s, proved equi-powerful by the equivalence theorems that underpin the Church-Turing thesis:  general recursive functions, λ-calculus, and Turing machines (due respectively to Jacques Herbrand and Kurt Gödel, to Alonzo Church, and to Alan Turing).  General recursive functions are broadly equational in style, λ-calculus is stylistically more applicative; both are purely functional.  Turing machines, on the other hand, are explicitly imperative.  Gödel apparently lacked confidence in the purely functional approaches as notions of mechanical calculability, though Church was more confident, until the purely functional approaches were proven equivalent to Turing machines; which to me makes sense as a matter of concreteness.  (There's some discussion of the history in a paper by Solomon Feferman; pdf.)

This mismatch between abstract elegance and concrete straightforwardness was an early obstacle, in the 1960s, to applying λ-calculus to programming-language semantics.  Gordon Plotkin found a schematic solution strategy for the mismatch in his 1975 paper "Call-by-name, call-by-value and the λ-calculus" (pdf); one sets up two formal systems, one a calculus with abstract elegance akin to λ-calculus, the other an operational semantics with concrete clarity akin to Turing machines, then proves well-behavedness theorems for the calculus and correspondence theorems between the calculus and operational semantics.  The well-behavedness of the calculus allows us to reason conveniently about program behavior, while the concreteness of the operational semantics allows us to be certain we are really reasoning about what we intend to.  For the whole arrangement to work, we need to find a calculus that is fully well-behaved while matching the behavior of the operational semantics we want so that the correspondence theorems can be established.

Plotkin's 1975 paper modified λ-calculus to match the behavior of eager argument evaluation; he devised a call-by-value λv-calculus, with all the requisite theorems.  The behavior was, however, still purely functional, i.e., without side-effects.  Traditional mathematics doesn't incorporate side-effects.  There was (if you think about it) no need for traditional mathematics to explicitly incorporate side-effects, because the practice of traditional mathematics was already awash in side-effects.  Mutable state:  mathematicians wrote down what they were doing; and they changed their own mental state and each others'.  Non-local control-flow (aka "goto"s):  mathematicians made intuitive leaps, and the measure of proof was understandability by other sapient mathematicians rather than conformance to some purely hierarchical ordering.  The formulae themselves didn't contain side-effects because they didn't have to.  Computer programs, though, have to explicitly encompass all these contextual factors that the mathematician implicitly provided to traditional mathematics.  Programs are usually side-effect-ful.

In the 1980s Matthias Felleisen devised λ-like calculi to capture side-effect-ful behavior.  At the time, though, he didn't quite manage the entire suite of theorems that Plotkin's paradigm had called for.  Somewhere, something had to be compromised.  In the first published form of Felleisen's calculi, he slightly weakened the well-behavedness theorems for the calculus.  In another published variant he achieved full elegance for the calculus but slightly weakened the correspondence theorems between the calculus and the operational semantics.  In yet another published variant he slightly modified the behavior — in operational semantics as well as calculus — to something he was able to reconcile without compromising the strength of the various theorems.  This, then, is where I came into the picture:  given Felleisen's solution and a fresh perspective (each generation knows a little less about what can't be done than the generation before), I thought I saw a way to capture the unmodified side-effect-ful behavior without weakening any of the theorems.  Eventually I seized an opportunity to explore the insight, when I was writing my dissertation on a nearby topic.  To explain where my approach fits in, I need to go back and pick up another thread:  the treatment of variables in λ-calculus.

Variables

Alonzo Church also apparently seized an opportunity to explore an insight when doing research on a nearby topic.  The main line of his research was to see if one could banish the paradoxes of classical logic by developing a formal logic that weakens reductio ad absurdum — instead of eliminating the law of the excluded middle, which was a favored approach to the problem.  But when he published the logic, in 1932, he mentioned reductio ad absurdum in the first paragraph and then spent the next several paragraphs ranting about the evils of unbound variables.  One gathers he wanted everything to be perfectly clear, and unbound variables offended his sense of philosophical precision.  His logic had just one possible semantics for a variable, namely, a parameter to be supplied to a function; he avoided the need for any alternative notions of universally or existentially quantified variables, by the (imho quite lovely) device of using higher-order functions for quantification.  That is (since I've brought it up), existential quantifier Σ applied to function F would produce a proposition ΣF meaning that there is some true proposition FX, and universal quantifier Π applied to F, proposition ΠF meaning that every proposition FX is true.  In essence, he showed that these quantifiers are orthogonal to variable-binding; leaving him with only a single variable-binding device, which, for some reason lost to history, he called "λ".
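To convey the flavor of that device in modern executable terms — a rough sketch only, restricted to a finite domain of my own choosing, whereas Church's logic is of course not so restricted — the quantifiers are just higher-order functions, and all the variable-binding is done by the λ they are applied to.

DOMAIN = range(10)   # an arbitrary finite domain, purely for illustration

def sigma(F):        # ΣF: some proposition F(x) is true
    return any(F(x) for x in DOMAIN)

def pi(F):           # ΠF: every proposition F(x) is true
    return all(F(x) for x in DOMAIN)

print(sigma(lambda x: x * x == 49))   # True:  there is an x with x*x = 49
print(pi(lambda x: x < 100))          # True:  every x in the domain is below 100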

λ-calculus is formally a term-rewriting calculus; a set of terms together with a set of rules for rewriting a term to produce another term.  The two basic well-behavedness properties that a term-rewriting calculus generally ought to have are compatibility and Church-Rosser-ness. Compatibility says that if a term can be rewritten when it's a standalone term, it can also be rewritten when it's a subterm of a larger term.  Church-Rosser-ness says that if a term can be rewritten in two different ways, then the difference between the two results can always be eliminated by some further rewriting.  Church-Rosser-ness is another way of saying that rewriting can be thought of as a directed process toward an answer, which is characteristic of calculi.  Philosophically, one might be tempted to ask why the various paths of rewriting ought to reconverge later, but this follows from thinking of the terms as the underlying reality.  If the terms merely describe the reality, and the rewriting lets us reason about its development, then the term syntax is just a way for us to separately describe different parts of the reality, and compatibility and Church-Rosser-ness are just statements about our ability (via this system) to reason separately about different aspects of the development at different parts of the reality without distorting our eventual conclusion about where the whole development is going.  From that perspective, Church-Rosser-ness is about separability, and convergence is just the form in which the separability appears in the calculus.

The syntax of λ-calculus — which particularly clearly illustrates these principles — is

T   ::=   x | (TT) | (λx.T)  .
That is, a term is either a variable; or a combination, specifying that a function is applied to an operand; or a λ-expression, defining a function of one parameter.  The T in (λx.T) is the body of the function, x its parameter, and free occurrences of x in T are bound by this λ.  An occurrence of x in T is free if it doesn't occur inside a smaller context (λx.[ ]) within T.  This connection between a λ and the variable instances it binds is structural.  Here, for example, is a term involving variables x, y, and z, annotated with pointers to a particular binding λ and its variable instances:
((λx.((λy.((λx.(xz))(xy)))(xz)))(xy))  .
  ^^                 ^     ^
The x instance in the trailing (xy) is not bound by this λ since it is outside the binding expression.  The x instance in the innermost (xz) is not bound by it either, since it is captured by another λ inside the body of the one we're considering.  I suggest that the three marked elements — binder and two bound instances — should be thought of together as the syntactic representation of a deeper, distributed entity that connects distant elements of the term.
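A minimal sketch in Python may help fix the idea (the tuple representation and the helper are my own choices, purely illustrative): computing the free variables of the body of the outer λx identifies exactly the instances that λ binds — the two marked above.

def var(x):        return ("var", x)
def app(f, a):     return ("app", f, a)
def lam(x, body):  return ("lam", x, body)

def free_vars(t):                        # variables occurring free in term t
    tag = t[0]
    if tag == "var":  return {t[1]}
    if tag == "app":  return free_vars(t[1]) | free_vars(t[2])
    if tag == "lam":  return free_vars(t[2]) - {t[1]}

# the body of the outer λx in the annotated example:  ((λy.((λx.(xz))(xy)))(xz))
body = app(lam("y", app(lam("x", app(var("x"), var("z"))),
                        app(var("x"), var("y")))),
           app(var("x"), var("z")))
print(sorted(free_vars(body)))   # ['x', 'z'] -- the free x's are the instances the outer λx binds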

There is just one rewriting rule — one of the fascinations of this calculus, that just one rule suffices for all computation — called the β-rule:

((λx.T1)T2)   →   T1[x ← T2]   .
The left-hand side of this rule is the redex pattern (redex short for reducible expression); it specifies a local pattern in the syntax tree of the term.  Here the redex pattern is that some particular parent node in the syntax tree is a combination whose left-hand child is a λ-expression.  Remember, this rewriting relation is compatible, so the parent node doesn't have to be the root of the entire tree.  It's important that this local pattern in the syntax tree includes a variable binder λ, thus engaging not only a local region of the syntax tree, but also a specific distributed structure in the network of non-local connections across the tree.  Following my earlier post, I'll call the syntax tree the "geometry" of the term, and the totality of the non-local connections its "network topology".
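To see compatibility and Church-Rosser-ness at work in this rule (anticipating the definition of substitution, just below): the term ((λx.x)((λy.y)z)) contains two redexes, one a proper subterm of the other; either may be rewritten first, and the two paths reconverge.

((λx.x)((λy.y)z))   →   ((λy.y)z)   →   z
((λx.x)((λy.y)z))   →   ((λx.x)z)   →   z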

The right-hand side of the rule specifies replacement by substituting the operand T2 for the parameter x everywhere it occurs free in the body T1; but there's a catch.  One might, naively, imagine that this would be recursively defined as

x[x ← T]   =   T
x1[x2 ← T]   =   x1   if x1 isn't x2

(T1 T2)[x ← T]   =   (T1[x ← T] T2[x ← T])

(λx.T1)[x ← T2]   =   (λx.T1)
(λx1.T1)[x2 ← T2]   =   (λx1.T1[x2 ← T2])   if x1 isn't x2.
This definition just descends the syntax tree substituting for the variable, and stops if it hits a λ that binds the same variable; very straightforward, and only a little tedious.  Except that it doesn't work.  Most of it does; but there's a subtle error in the rule for descending through a λ that binds a different variable,
(λx1.T1)[x2 ← T2]   =   (λx1.T1[x2 ← T2])   if x1 isn't x2.
The trouble is, what if T1 contains a free occurrence of x2 and, at the same time, T2 contains a free instance of x1?  Then, before the substitution, that free instance of x1 was part of some larger distributed structure; that is, it was bound by some λ further up in the syntax tree; but after the substitution, following this naive definition of substitution, a copy of T2 is embedded within T1 with an instance of x1 that has been cut off from the larger distributed structure and instead bound by the inner λx1, essentially altering the sense of syntactic template T2.  The inner λx1 is then said to capture the free x1 in T2, and the resulting loss of integrity of the meaning of T2 is called bad hygiene (or, a hygiene violation).  For example,
((λy.(λx.y))x)   ⇒β   (λx.y)[y ← x]
but under the naive definition of substitution, this would be (λx.x), because of the coincidence that the x we're substituting for y happens to have the same name as the bound variable of this inner λ.  If the inner variable had been named anything else (other than y) there would have been no problem.  The "right" answer here is a term of the form (λz.x), where any variable name could be used instead of z as long as it isn't "x" or "y".  The standard solution is to introduce a rule for renaming bound variables (called α-renaming), and restrict the substitution rule to require that hygiene be arranged beforehand.  That is,
(λx1.T)   →   (λx2.T[x1 ← x2])   where x2 doesn't occur free in T

(λx1.T1)[x2 ← T2]   =   (λx1.T1[x2 ← T2])   if x1 isn't x2 and doesn't occur free in T2.
Here again, this may be puzzling if one thinks of the syntax as the underlying reality.  If the distributed structures of the network topology are the reality, which the syntax merely describes, then α-renaming is merely an artifact of the means of description; indeed, the variable-names themselves are merely an artifact of the means of description.
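Here is a runnable sketch of hygienic, capture-avoiding substitution (same illustrative tuple representation as the earlier sketch; the fresh-name scheme is simplistic and entirely my own): when descending through a λ whose parameter occurs free in the operand, α-rename that parameter first, so the example above comes out as a term of the form (λz.x) rather than (λx.x).

import itertools

def var(x):        return ("var", x)
def app(f, a):     return ("app", f, a)
def lam(x, body):  return ("lam", x, body)

def free_vars(t):
    tag = t[0]
    if tag == "var":  return {t[1]}
    if tag == "app":  return free_vars(t[1]) | free_vars(t[2])
    if tag == "lam":  return free_vars(t[2]) - {t[1]}

_fresh = (f"v{i}" for i in itertools.count())   # simplistic fresh-name supply

def subst(t, x, s):                     # t[x ← s], capture-avoiding
    tag = t[0]
    if tag == "var":
        return s if t[1] == x else t
    if tag == "app":
        return app(subst(t[1], x, s), subst(t[2], x, s))
    if tag == "lam":
        y, body = t[1], t[2]
        if y == x:                      # x is rebound here; stop descending
            return t
        if y in free_vars(s):           # capture threatened: α-rename first
            z = next(_fresh)
            body = subst(body, y, var(z))
            y = z
        return lam(y, subst(body, x, s))

def beta(redex):                        # ((λx.T1)T2)  →  T1[x ← T2]
    (_, (_, x, body), operand) = redex
    return subst(body, x, operand)

# the example from the text:  ((λy.(λx.y))x)
print(beta(app(lam("y", lam("x", var("y"))), var("x"))))
# ('lam', 'v0', ('var', 'x'))  -- the free x is not captured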

Side-effect-ful variables

Suppose we want to capture classical side-effect-ful behavior, unmodified, without weakening any of the theorems of Plotkin's paradigm.  Side-effects are by nature distributed across the term, and would therefore seem to belong naturally to its network topology.  In Felleisen's basic calculus, retaining the classical behavior and requiring the full correspondence theorems, side-effect-ful operations create syntactic markers that then "bubble up" through the syntax tree till they reach the top of the term, from which the global consequence of the side-effect is enacted by a whole-term-rewriting rule — thus violating compatibility, since the culminating rule is by nature applied to the whole term rather than to a subterm.  This strategy seems, in retrospect, to be somewhat limited by an (understandable) inclination to conform to the style of variable handling in λ-calculus, whose sole binding device is tied to function application at a specific location in the geometry.  Alternatively (as I seized the opportunity to explore in my dissertation), one can avoid the non-compatible whole-term rules by making the syntactic marker, which bubbles up through the term, a variable-binder.  These side-effect-ful bindings are no longer strongly tied to a particular location in the geometry; they float, potentially to the top of the term, or may linger further down in the tree if the side-effect happens to only affect a limited region of the geometry.  But the full classical behavior (in the cases Felleisen addressed) is captured, and Plotkin's entire suite of theorems are supported.

The calculus in which I implemented this side-effect strategy (along with some other things, that were the actual point of the dissertation but don't apparently matter here) is called vau-calculus.

Recall that the β-rule of λ-calculus applies to a redex pattern at a specific location in the geometry, and requires a binder to occur there so that it can also tie in to a specific element of the network topology.  The same is true of the side-effect-ful rules of the calculus I constructed:  a redex pattern occurs at a specific location in the geometry with a local tie-in to the network topology.  There may then be a substitutive operation on the right-hand side of the rule, which uses the associated element of the network topology to propagate side-effect-ful consequences back down the syntax tree to the entire encompassed subterm.  There is a qualitative difference, though, between the traditional substitution of the β-rule and the substitutions of the side-effect-ful operations.  A traditional substitution T1[x ← T2] may attach new T2 subtrees at certain leaves of the T1 syntax tree (free instances of x in T1), but does not disturb any of the pre-existing tree structure of T1.  Consequently, the only effect of the β-rule on the pre-existing geometry is the rearrangement it does within the redex pattern.  This is symmetric to the hygiene property, which assures (by active intervention if necessary, via α-renaming) that the only effect of the β-rule on the pre-existing network topology is what it does to the variable element whose binding is within the redex pattern.  I've therefore called the geometry non-disturbance property co-hygiene.  As long as β-substitution is the only variable substitution used, co-hygiene is an easily overlooked property of the β-rule since, unlike hygiene, it does not require any active intervention to maintain.

The substitutions used by the side-effect-ful rewriting operations go to the same α-renaming lengths as the β-rule to assure hygiene.  However, the side-effect-ful substitutions are non-co-hygienic.  This might, arguably, be used as a technical definition of side-effects, which cause distributed changes to the pre-existing geometry of the term.

Quantum scope

Because co-hygiene is about not perturbing pre-existing geometry, it seems reasonable that co-hygienic rewriting operations should be more in harmony with the geometry than non-co-hygienic rewriting operations.  Thus, β-rewriting should be more in harmony with the geometry of the term than the side-effect-ful operations; which, subjectively, does appear to be the case.  (The property that first drew my attention to all this was that α-renaming, which is geometrically neutral, is a special case of β-substitution, whereas the side-effect-ful substitutions are structurally disparate from α-renaming.)

And gravity is more in harmony with the geometry of spacetime than are the other fundamental forces; witness general relativity.

Hence my speculation, by analogy, that one might usefully structure a theory of basic physics such that gravity is co-hygienic while the other fundamental forces are non-co-hygienic.

One implication of this line of speculation (as I noted in the earlier post) would be fruitlessness of efforts to unify the other fundamental forces with gravity by integrating them into the geometry of spacetime.  If the other forces are non-co-hygienic, their non-affinity with geometry is structural, and trying to treat them in a more gravity-like way would be like trying to treat side-effect-ful behavior as structurally akin to function-application in λ-calculus — which I have long reckoned was the structural miscue that prevented Felleisen's calculus from supporting the full set of well-behavedness theorems.

On further consideration, though, something more may be suggested; even as the other forces might not integrate into the geometry of spacetime, gravity might not integrate into the infrastructure of quantum mechanics.  All this has to do with the network topology, a non-local infrastructure that exists even in pure λ-calculus, but which in the side-effect-ful vau-calculus achieves what one might be tempted to call "spooky action at a distance".  Suppose that quantum entanglement is part of this non-co-hygienic aspect of the theory.  (Perhaps quantum entanglement would be the whole of the non-co-hygienic aspect, or, as I discussed in the earlier post, perhaps there would be other, non-quantum non-locality with interesting consequences at cosmological scale; then again, one might wonder if quantum entanglement would itself have consequences at cosmological scale that we have failed to anticipate because the math is beyond us.)  It would follow that gravity would not exhibit quantum entanglement.  On one hand, this would imply that quantum gravity should not work well as a natural unification strategy.  On the other hand, to make this approach work, something rather drastic must happen to the underpinnings of quantum mechanics, both philosophical and technical.

We understand quantum mechanics as describing the shape of a spectrum of different possible realities; from a technical perspective that is what quantum mechanics describes, even if one doesn't accept it as a philosophical interpretation (and many do accept that interpretation, if only on grounds of Occam's Razor that there's no reason to suppose philosophically some other foundation than is supported technically).  But, shaped spectra of alternative versions of the entire universe seems reminiscent of whole-term rewriting in Felleisen's calculus — which was, notably, a consequence of a structural design choice in the calculus that actually weakened the internal symmetry of the system.  The alternative strategy of vau-calculus both had a more uniform infrastructure and avoided the non-compatible whole-term rewriting rules.  An analogous theory of basic physics ought to account for quantum entanglement without requiring wholesale branching of alternative universes.  Put another way, if gravity isn't included in quantum entanglement, and therefore has to diverge from the other forces at a level more basic than the level where quantum entanglement arises, then the level at which quantum entanglement arises cannot be the most basic level.

That quantum structure would not be at the deepest level of physics does not at all suggest that what lies beneath it must be remotely classical.  Quantum mechanics is mathematically a sort of lens that distorts whatever classical system is passed through it; taking the Schrödinger equation as demonstrative,

iℏ ∂Ψ/∂t   =   Ĥ Ψ ,
the classical system is contained in the Hamiltonian function Ĥ, which is plugged into the equation to produce a suitable spectrum of alternatives.  Hence my description of the quantum equation itself as basic.  But, following the vau-calculus analogy, it seems some sort of internal non-locality ought to be basic, as it follows from the existence of the network topology; looking at vau-calculus, even the β-rule fully engages the network topology, though co-hygienically.

Geometry and network

The above insights on the physical theory itself are mostly negative, indicating what this sort of theory of physics would not be like, what characteristics of conventional quantum math it would not have.  What sort of structure would it have?

I'm not looking for detailed math, just yet, but the overall shape into which the details would be cast.  Some detailed math will be needed, before things go much further, to demonstrate that the proposed approach is capable of generating predictions sufficiently consistent with quantum mechanics, keeping in mind the well-known no-go result of Bell's Theorem.  I'm aware of the need; the question, though, is not whether Bell's Theorem can be sidestepped — of course it can, like any other no-go theorem, by blatantly violating one of its premises — but whether it can be sidestepped by a certain kind of theory.  So the structure of the theory is part of the possibility question, and needs to be settled before we can ask the question properly.

In fact, one of my concerns for this sort of theory is that it might have too many ways to get around Bell's Theorem.  Occam's Razor would not look favorably on a theory with redundant Bell-avoidance devices.

Let's now set aside locality for a moment, and consider nondeterminism.  Bell's Theorem calls (in combination with some experimental results that are, somewhat inevitably, argued over) for chronological nondeterminism, that is, nondeterminism relative to the time evolution of the physical system.  One might, speculatively, be able to approximate that sort of nondeterminism arbitrarily well, in a fundamentally non-local theory, by exploiting the assumption that the physical system under consideration is trivially small relative to the whole cosmos.  We might be able to draw on interactions with distant elements of the cosmos to provide a more-or-less "endless" supply of pseudo-randomness.  I considered this possibility in the earlier post on co-hygiene, and it is an interesting theoretical question whether (or, at the very least, how) a theory of this sort could in fact generate the sort of quantum probability distribution that, according to Bell's Theorem, cannot be generated by a chronologically deterministic local theory.  The sort of theory I'm describing, however, is merely a way to provide a local illusion of nondeterminism in a non-local theory with global determinism — and when we're talking chronology, it is difficult even to define global determinism (because, thanks to relativity, "time" is tricky to define even locally; made even trickier since we're now contemplating a theory lacking the sort of continuity that relativity relies upon; and is likely impossible to define globally, thanks to relativity's deep locality).  It's also no longer clear why one should expect chronological determinism at all.

A more straightforward solution, seemingly therefore favored by Occam's Razor, is to give up on chronological determinism and instead acquire mathematical determinism, by the arguably "obvious" strategy of supposing that the whole of spacetime evolves deterministically along an orthogonal dimension, converting unknown initial conditions (initial in the orthogonal dimension) into chronological nondeterminism.  I demonstrated the principle of this approach in an earlier post.  It is a bit over-powered, though; a mathematically deterministic theory of this sort — moreover, a mathematically deterministic and mathematically local theory of this sort — can readily generate not only a quantum probability distribution of the sort considered by Bell's Theorem, but, on the face of it, any probability distribution you like.  This sort of excessive power would seem rather disfavored by Occam's Razor.

The approach does, however, seem well-suited to a co-hygiene-directed theory.  Church-Rosser-ness implies that term rewriting should be treated as reasoning rather than directly as chronological evolution, which seemingly puts term rewriting on a dimension orthogonal to spacetime.  The earlier co-hygiene post noted that calculi, which converge to an answer via Church-Rosser-ness, contrast with grammars, which are also term-rewriting systems but exist for the purpose of diverging; grammars thus ally naturally with mathematical nondeterminism, whereas calculi ally naturally with mathematical determinism.  So our desire to exploit the calculus/physics analogy, together with our desire for abstract separability of parts, seems to favor this use of a rewriting dimension orthogonal to spacetime.

A puzzle then arises about the notion of mathematical locality.  When the rewriting relation, through this orthogonal dimension (which I used to call "meta-time", though now that we're associating it with reasoning some other name is wanted), changes spacetime, there's no need for the change to be non-local.  We can apparently generate any sort of physical laws, quantum or otherwise, without the need for more than strictly local rewrite rules; so, again by Occam's Razor, why would we need to suppose a whole elaborate non-local "network topology"?  A strictly local rewriting rule sounds much simpler.

Consider, though, what we mean by locality.  Both nondeterminism and locality must be understood relative to a dimension of change, thus "chronological nondeterminism"; but to be thorough in defining locality we also need a notion of what it means for two elements of a system state to be near each other.  "Yes, yes," you may say, "but we have an obvious notion of nearness, provided by the geometry of spacetime."  Perhaps; but then again, we're now deep enough in the infrastructure that we might expect the geometry of spacetime to emerge from something deeper.  So, what is the essence of the geometry/network distinction in vau-calculus?

A λ-calculus term is a syntax tree — a graph, made up of nodes connected to each other by edges that, in this case, define the potential function-application relationships.  That is, the whole purpose of the context-free syntax is to define where the interactions — the redex patterns for applying the β-rule — are.  One might plausibly say much the same for the geometry of spacetime re gravity, i.e., location in spacetime defines the potential gravitational interactions.  The spacetime geometry is not, evidently, hierarchical like that of λ-calculus terms; that hierarchy is apparently a part of the function-application concept.  Without the hierarchy, there is no obvious opportunity for a direct physical analog to the property of compatibility in term-rewriting calculi.
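
As a concrete, if toy, illustration of the redex patterns living in the geometry, here is a small Python sketch of λ-terms as syntax trees; the class names and the path encoding are mine, chosen just for the example.  Recognizing a β-redex needs nothing but the local shape of the tree.

    from dataclasses import dataclass

    # A λ-calculus term as a syntax tree; the tree edges are the "geometry".
    @dataclass
    class Var:
        name: str

    @dataclass
    class Lam:
        param: str
        body: object

    @dataclass
    class App:
        fn: object
        arg: object

    def is_beta_redex(term):
        """The β redex pattern ((λx.T1) T2) is recognized purely from the
        local shape of the tree (no network information needed)."""
        return isinstance(term, App) and isinstance(term.fn, Lam)

    def find_redexes(term, path=()):
        """Walk the tree and report where β interactions are possible."""
        found = [path] if is_beta_redex(term) else []
        if isinstance(term, Lam):
            found += find_redexes(term.body, path + ('body',))
        elif isinstance(term, App):
            found += find_redexes(term.fn, path + ('fn',))
            found += find_redexes(term.arg, path + ('arg',))
        return found

    # ((λx.x) y): one redex, at the root of the tree.
    print(find_redexes(App(Lam('x', Var('x')), Var('y'))))   # [()]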

The network topology, i.e., the variables, provides another set of connections between nodes of the graph.  These groups of connections are less uniform, and the variations between them do not participate in the redex patterns, but are merely tangential to the redex patterns, thus cuing the engagement of a variable structure in a rewriting transformation.  In vau-calculi the variable is always engaged in the redex through its binding, but this is done for compatibility; by guaranteeing that all the variable instances occur below the binding in the syntax tree, the rewriting transformation can be limited to that branch of the tree.  Indeed, only the λ bindings really have a fixed place in the geometry, dictated by the role of the variable in the syntactically located function application; side-effect-ful bindings float rather freely, and their movement through the tree really makes no difference to the function-application structure as long as they stay far enough up in the tree to encompass all their matching variable instances.  If not for the convenience of tying these bindings onto the tree, one might represent them as partly or entirely separate from the tree (depending on which kind of side-effect one is considering), tethered to the tree mostly by the connections to the bound variable instances.  The redex pattern, embedded within the geometry, would presumably be at a variable instance.  Arranging for Church-Rosser-ness would, one supposes, be rather more challenging without compatibility.

Interestingly, btw, of the two classes of side-effects considered by vau-calculus (and by Felleisen), this separation of bindings from the syntax tree is more complete for sequential-state side-effects than for sequential-control side-effects — and sequential control is much more simply handled in vau-calculus than is sequential state.  I'm still wondering if there's some abstract principle here that could relate to the differences between various non-gravitational forces in physics, such as the simplicity of Maxwell's equations for electromagnetism.

This notion of a binding node for a variable hovering outside the geometry, tethered more-or-less-loosely to it by connections to variable instances, has a certain vague similarity to the aggressive non-locality of quantum wave functions.  The form of the wave function would, perhaps, be determined by a mix of the nature of the connections to the geometry together with some sort of blurring effect resulting from a poor choice of representing structures; the hope would be that a better choice of representation would afford a more focused description.

I've now identified, for vau-calculus, three structural differences between the geometry and the network.

  • The geometry contains the redex patterns (with perhaps some exotic exceptions).
  • The geometric topology is much simpler and more uniform than the network topology.
  • The network topology is treated hygienically by all rewriting transformations, whereas the geometry is treated co-hygienically only by one class of rewriting transformations (β).
But which of these three do we expect to carry over to physics?

The three major classes of rewriting operations in vau-calculus — function application, sequential control, and sequential state — all involve some information in the term that directs the rewrite and therefore belongs in the redex pattern.  All three classes of operations involve distributing information to all the instances of the engaged variable.  But the three classes differ in how closely this directing information is tied to the geometry.

For function application, the directing information is entirely contained in the geometry, the redex pattern of the β-rule, ((λx.T1)T2).  The only information about the variable not contained within that purely geometric redex pattern is the locations of the bound instances.

For sequential control, the variable binder is a catch expression, and the bound variable instances are throw expressions that send a value up to the matching catch.  (I examined this case in detail in an earlier post.)  The directing information contained in the variable, beyond the locations of the bound instances, would seem to be the location of the catch; but in fact the catch can move, floating upward in the syntax tree, though moving the catch involves a non-co-hygienic substitutive transformation — in fact, the only non-co-hygienic transformation for sequential control.  So the directing information is still partly tied to the syntactic structure (and this tie is somehow related to the non-co-hygiene).  The catch-throw device is explicitly hierarchical, which would not carry over directly to physics; but this may be only a consequence of its relation to the function-application structure, which does carry over (in the broad sense of spacetime geometry).  There may yet be more to make of a side analogy between vau-calculus catch-throw and Maxwell's Equations.
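
As a cartoon of the send-a-value-up-to-the-matching-catch behavior (and only a cartoon: this is a classical catch/throw toy in Python, not the vau-calculus rules, and the term encoding is invented for the example):

    # Toy terms: ('catch', tag, body), ('throw', tag, value), or anything else.
    def resolve_catch(term):
        """If the body contains a throw matching the catch's tag, the whole
        catch expression rewrites to the thrown value."""
        if isinstance(term, tuple) and term and term[0] == 'catch':
            _, tag, body = term
            thrown = find_throw(body, tag)
            return thrown if thrown is not None else body
        return term

    def find_throw(term, tag):
        """Search the body (the branch below the catch) for a matching throw."""
        if isinstance(term, tuple) and term:
            if term[0] == 'throw' and term[1] == tag:
                return term[2]
            for sub in term[1:]:
                hit = find_throw(sub, tag)
                if hit is not None:
                    return hit
        return None

    # ('catch', 'k', ('+', 1, ('throw', 'k', 42)))  rewrites to  42
    print(resolve_catch(('catch', 'k', ('+', 1, ('throw', 'k', 42)))))

The point to notice is that the matching is by tag, which is network information; the only geometric constraint in the cartoon is that the throw occur somewhere below the catch in the tree.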

For sequential state, the directing information is a full-blown environment, a mapping from symbols to values, with arbitrarily extensive information content and very little relation to geometric location.  The calculus rewrite makes limited use of the syntactic hierarchy to coordinate time ordering of assignments — not so much inherently hierarchical as inherently tied to the time sequencing of function applications, which itself happens to be hierarchical — but this geometric connection is even weaker than for catch-throw, and its linkage to time ordering is more apparent.  In correspondence with the weaker geometric ties, the supporting rewrite rules are much more complicated, as they moderate passage of information into and out of the mapping repository.
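
Again only as a cartoon, with my own toy encoding rather than anything from vau-calculus:  the directing information is one flat symbol-to-value mapping, and assignments made at quite different places in the term all funnel into, and read back out of, that single repository in time order.

    # A "term" here is just a nested list of operations; the environment is a
    # flat mapping with no geometric location of its own.
    def run(term, env):
        """Execute operations in time order, threading one shared environment."""
        for op in term:
            if isinstance(op, list):              # a nested branch of the "tree"
                run(op, env)
            else:
                kind, sym, *val = op
                if kind == 'set':
                    env[sym] = val[0]             # information flows into the mapping...
                elif kind == 'get':
                    print(sym, '=', env[sym])     # ...and back out

    # Assignments at different depths of the term share one environment.
    run([('set', 'x', 1), [('set', 'x', 2)], ('get', 'x')], {})   # prints: x = 2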

"Time ordering" here really does refer to time in broadly the same sense that it would arise in physics, not to rewriting order as such.  That is, it is the chronological ordering of events in the programming language described by the rewriting system, analogous to the chronological ordering of events described by a theory of physics.  Order of rewriting is in part related to described chronology, although details of the relationship would likely be quite different for physics where it's to do with relativity.  This distinction is confusing even in term-rewriting PL semantics, where PL time is strictly classical; one might argue that confusion between rewriting, which is essentially reasoning, and evaluation, which is the PL process reasoned about, resulted in the unfortunately misleading "theory of fexprs is trivial" result which I have discussed here previously.

It's an interesting insight that, while part of the use of syntactic hierarchy in sequential control/state — and even in function application, really — is about compatibility, which afaics does not at all carry over to physics, their remaining use of syntactic hierarchy is really about coordination of time sequencing, which does occur in physics in the form of relativity.  Admittedly, in this sort of speculative exploration of possible theories for physics, I find the prospect of tinkering with the infrastructure of quantum mechanics not nearly as daunting as tinkering with the infrastructure of relativity.

At any rate, the fact that vau-calculus puts the redex pattern (almost always) entirely within a localized area of the syntax would seem to be more a statement about the way the information is represented than about the geometry/network balance.  That is, vau-calculus represents the entire state of the system by a syntactic term, so each item of information has to be given a specific location in the term, even if that location is chosen somewhat arbitrarily.  It is then convenient, for time ordering, to require that all the information needed for a transformation should get together in a particular area of the term.  Quantum mechanics may suffer from a similar problem, in a more advanced form, as some of the information in a wave function may be less tied to the geometry than the equations (e.g. the Schrödinger equation) depict.  What really makes things messy is devices that are related to the geometry but less tightly so than the primary, co-hygienic device.  Perhaps that is the ultimate trade-off, with differently structured devices becoming more loosely coupled to the geometry and proportionately less co-hygienic.

All of which has followed from considering the first of three geometry/network asymmetries:  that redex patterns are mostly contained in the geometry rather than the network.  The other two asymmetries noted were  (1) that the geometric structure is simple and uniform while the network structure is not, and  (2) that the network is protected from perturbation while the geometry is not — i.e., the operations are all hygienic (protecting the network) but not all are co-hygienic (protecting the geometry).  Non-co-hygiene complicates things only moderately, because the perturbations are to the simple, uniform part of the system configuration; all of the operations are hygienic, so they don't perturb the complicated, nonuniform part of the configuration.  That is fortunate for mathematical treatment; if the perturbations were to the messy stuff, it seems we mightn't be able to cope mathematically at all.  So these two asymmetries go together.  In my more cynical moments, this seems like wishful thinking; why should the physical world be so cooperative?  However, perhaps these two asymmetries should be properly understood as two aspects of a single effect, itself a kind of separability, the same view I've recommended for Church-Rosser-ness; in fact, Church-Rosser-ness may be another aspect of the same whole.  The essential point is that we are able to usefully consider individual parts of the cosmos even though they're all interconnected, because there are limits on how aggressively the interconnectedness is exercised.  The "geometry" is the simple, uniform way of decomposing the whole into parts, and "hygiene" is an assertion that this decomposition suffices to keep things tractable.  It's still fair to question why the cosmos should be separable in this way, and even to try to build a theory of physics in which the separation breaks down; but there may be some reassurance, re Occam's Razor, in the thought that these two asymmetries (simplicity/uniformity, and hygiene) are two aspects of a single serendipitous effect, rather than two independently serendipitous effects.

Cosmic structure

Most of these threads are pointing toward a rewriting relation along a dimension orthogonal to spacetime, though we're lacking a good name for it atm (I tend to want to name things early in the development process, though I'm open to change if a better name comes along).

One thread, mentioned above, that seems at least partly indifferent to the rewriting question is that of changes in the character of quantum mechanics at cosmological scale.  This relates to the notion of decoherence.  It was recognized early in the conceptualization of quantum mechanics that a very small entangled quantum system would tend to interact with the rest of the universe and thereby lose its entanglement and, ultimately, become more classical.  We can only handle the quantum math for very small physical systems; in fact, rather insanely small physical systems.  Intuitively, what if this tendency of entanglement to evaporate when interacting with the rest of the universe ceases to be valid when the size of the physical system is sufficiently nontrivial compared to the size of the whole universe?  In traditional quantum mechanics, decoherence appears to be an all-or-nothing proposition, a strict dichotomy tied to the concept of observation.  If something else is going on at large scales, either it is an unanticipated implication of the math-that-we-can't-do, or it is an aspect of the physics that our quantum math doesn't include because the phenomena that would cause us to confront this aspect are many orders of magnitude outside anything we could possibly apply the quantum math to.  It's tantalizing that this conjures both the problem of observation, and the possibility that quantum mechanics may be (like Newtonian mechanics) only an approximation that's very good within its realm of application.

The persistently awkward interplay of the continuous and discrete is a theme I've visited before.  Relativity appears to have too stiff a dose of continuity in it, creating a self-reference problem even in the non-quantum case (iirc Einstein had doubts on this point before convincing himself the math of general relativity could be made to work); and when non-local effects are introduced for the quantum case, continuity becomes overconstraining.  Quantum gravity efforts suffer from a self-reference problem on steroids (non-renormalizable infinities).  The Big Picture perspective here is that non-locality and discontinuity go together because a continuum — as simple and uniform as it is possible to be — is always going to be perceived as geometry.

The non-local network in vau-calculus appears to be inherently discrete, based on completely arbitrary point-to-point connections defined by location of variable instances, with no obvious way to set up any remotely similar continuous arrangement.  Moreover, the means I've described for deriving nondeterminism from the network connections (on which I went into some detail in the earlier post) exploits the potential for chaotic scrambling of discrete point-to-point connections by following successions of links hopscotching from point to point.  While the geometry might seem more amenable to continuity, a truly continuous geometry doesn't seem consistent with point-to-point network connections, either, as one would then have the prospect of an infinitely dense tangle of network connections to randomly unrelated remote points, a sort of probability-density field that seems likely to wash out the randomness advantages of the strategy and less likely to be mathematically useful; so the whole rewriting strategy appears discrete in both the geometry and network aspects of its configuration as well as in the discrete rewriting steps themselves.
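
Here, to fix ideas, is a small Python sketch of the hopscotching; the permutation produced by random.shuffle merely stands in for the unknown, scrambled state of the distant cosmos, and once the links are fixed the hopping itself is deterministic.

    import random

    # Each point of the discrete configuration links to an arbitrary remote point.
    N = 10_000
    links = list(range(N))
    random.shuffle(links)          # stand-in for arbitrarily scrambled remote connections

    def hop_sequence(start, hops):
        """Follow the network from one point, reading off one bit per hop."""
        here, bits = start, []
        for _ in range(hops):
            here = links[here]     # jump to a remote, effectively unrelated point
            bits.append(here % 2)  # locally, this read-out looks like a coin flip
        return bits

    print(hop_sequence(0, 20))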

The rewriting approach may suffer from too stiff a dose of discreteness, as it seems to force a concrete choice of basic structures.  Quantum mechanics is foundationally flexible on the choice of elementary particles; the mathematical infrastructure (e.g. the Schrödinger equation) makes no commitment on the matter at all, leaving it to the Hamiltonian Ĥ.  Particles are devised comparatively freely, as with such entities as phonons and holes.  Possibly the rewriting structure one chooses will afford comparable flexibility, but it's not at all obvious that one could expect this level of versatile refactoring from a thoroughly discrete system.  Keeping in mind this likely shortfall of flexibility, it's not immediately clear what the basic elements should be.  Even if one adopts, say, the standard model, it's unclear how that choice of observable particles would correspond to concrete elements in a discrete spacetime-rewriting system (in one "metaclassical" scenario I've considered, spacetime events are particle-like entities tracing out one-dimensional curves as spacetime evolves across an orthogonal dimension); and it is by no means certain that the observable elements ought to follow the standard model, either.  As I write this there is, part of the time, a cat sitting on the sofa next to me.  It's perfectly clear to me that this is the correct way to view the situation, even though on even moderately closer examination the boundaries of the cat may be ambiguous, e.g. at what point an individual strand of fur ceases to be part of the cat.  By the time we get down to the scale where quantum mechanics comes into play and refactoring of particles becomes feasible, though, is it even certain that those particles are "really" there?  (Hilaire Belloc cast aspersions on the reality of a microbe merely because it couldn't be seen without the technological intervention of a microscope; how much more skepticism is recommended when we need a gigantic particle accelerator?)

Re the structural implications of quasiparticles (such as holes), note that such entities are approximations introduced to describe the behavior of vastly complicated systems underneath.  A speculation that naturally springs to mind is, could the underlying "elementary" particles be themselves approximations resulting from complicated systems at a vastly smaller scale; which would seem problematic in conventional physics since quantum mechanics is apparently inclined to stop at Planck scale.  However, the variety of non-locality I've been exploring in this thread may offer a solution:  by maintaining network connections from an individual "elementary" particle to remote, and rather arbitrarily scrambled, elements of the cosmos, one could effectively make the entire cosmos (or at least significant parts of it) serve as the vastly complicated system underlying the particle.

It is, btw, also not certain what we should expect as the destination of a spacetime-rewriting relation.  An obvious choice, sufficient for a proof-of-concept theory (previous post), is to require that spacetime reach a stable state, in which either no further rewriting is possible, or further rewriting leaves the system state unchanged.  Is that the only way to derive a final state of spacetime?  No.  Whatever other options might be devised, one that comes to mind is some form of cycle, repeating a closed set of states of spacetime, perhaps giving rise to a set of states that would manifest in more conventional quantum math as a standing wave.  Speculatively, different particles might differ from each other by the sort of cyclic pattern they settle into, determining a finite — or perhaps infinite — set of possible "elementary particles".  (Side speculation:  How do we choose an initial state for spacetime?  Perhaps quantum probability distributions are themselves stable in the sense that, while most initial probability distributions produce a different final distribution, a quantum distribution produces itself.)
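
A minimal Python sketch of the two destinations just mentioned, a fixed point versus a cycle; the rewrite rule and the start state are arbitrary, chosen only so the loop terminates quickly.

    def classify(initial, rewrite, max_steps=10_000):
        """Iterate a rewrite until it reaches a fixed point, revisits a state
        (a cycle), or exhausts the step budget."""
        seen = {}                          # state -> step at which it first appeared
        state, step = initial, 0
        while step < max_steps:
            if rewrite(state) == state:
                return ('stable', state)
            if state in seen:
                return ('cycle', step - seen[state])   # length of the cycle
            seen[state] = step
            state, step = rewrite(state), step + 1
        return ('undecided', None)

    # Example rewrite: each cell becomes the XOR of its two neighbours (wrapping).
    def xor_rule(cells):
        n = len(cells)
        return tuple(cells[i - 1] ^ cells[(i + 1) % n] for i in range(n))

    # Reports either ('stable', <state>) or ('cycle', <length>), depending on
    # the rule and the start state.
    print(classify((1, 0, 0, 1, 0, 1, 0, 0), xor_rule))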

Granting that the calculus/physics analogy naturally suggests some sort of physical theory based on a discrete rewriting system, I've had recurring doubts over whether the rewriting ought to be in the direction of time — an intuitively natural option — or, as discussed, in a direction orthogonal to spacetime.  At this point, though, we've accumulated several reasons to prefer rewriting orthogonal to spacetime.

Church-Rosser-ness.  CR-ness is about the ability to reason separately about the implications of different parts of the system, without having to worry about which reasoning to do first.  The formal property is that whatever order one takes these locally-driven inferences in ("locally-driven" being a sort of weak locality), it's always possible to make later inferences that reach a common conclusion by either path.  This makes it implausible to think of these inference steps as if they were chronological evolution.
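
(In the usual notation, with → a single locally-driven inference step and →* any sequence of such steps, the formal property is:  if a →* b and a →* c, then there is some d with b →* d and c →* d.  The common descendant d is what makes the order of inference not matter.)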

Bell's Theorem.  The theorem says, essentially, that the probability distributions of quantum mechanics can't be generated by a chronologically deterministic local theory.  Could it be done by a non-local rewriting theory evolving deterministically forward in time?  My guess would be, probably it could (at least for classical time); but I suspect it'd be rather artificial, whereas my sense of the orthogonal-dimension rewriting approach (from my aforementioned proof-of-concept) is that it ought to work out neatly.
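
(For concreteness, in the CHSH form of the result:  with measurement settings a, a′ on one side and b, b′ on the other, any local theory with chronologically deterministic hidden variables must satisfy |E(a,b) − E(a,b′) + E(a′,b) + E(a′,b′)| ≤ 2, whereas quantum mechanics predicts, and experiment reports, values up to 2√2.)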

Relativity.  Relativity uses an intensively continuous mathematical infrastructure to construct a relative notion of time.  It would be rather awkward to set an intensively discrete rewriting relation on top of this relative notion of time; the intensively discrete rewriting really wants to be at a deeper level of reality than any continuous relativistic infrastructure, rather than built on top of it (just as we've placed it at a deeper level than quantum entanglement), with apparent continuity arising from statistical averaging over the discrete foundations.  Once rewriting is below relativity, there is no clear definition of a "chronological" direction for rewriting; so rewriting orthogonal to spacetime is a natural device from which to derive relativistic structure.  Relativity is, however, a quintessentially local theory, which ought to be naturally favored by a predominantly local rewriting relation in the orthogonal dimension.  Deriving relativistic structure from an orthogonal rewriting relation with a simple causal structure also defuses the self-reference problems that have lingered about gravity.

It's rather heartening to see this feature of the theory (rewriting orthogonal to spacetime) — or really any feature of a theory — drawing support from considerations in both quantum mechanics and relativity.

The next phase of exploring this branch of theory — working from these clues to the sort of structure such a theory ought to have — seems likely to study how the shape of a spacetime-orthogonal rewriting system determines the shape of spacetime.  My sense atm is that one would probably want to pay particular attention to how the system might give rise to a relativity-like structure, with an eye toward what role, if any, a non-local network might play in the system, keeping in mind that the β-rule's use of network topology, though co-hygienic, is at the core of what function application does and, at the same time, inspired my suggestion to simulate nondeterminism through repeatedly rescrambled network connections; and keeping in mind, likewise, the evidence (variously touched on above) on the possible character of different kinds of generalized non-co-hygienic operations.