Jon Garcia

I have a PhD in Computational Neuroscience from UCSD (Bachelor's was in Biomedical Engineering with Math and Computer Science minors). Ever since junior high, I've been trying to figure out how to engineer artificial minds, and I've been coding up artificial neural networks since I first learned to program. Obviously, all my early designs were almost completely wrong/unworkable/poorly defined, but I think my experiences did prime my brain with inductive biases that are well suited for working on AGI.

Although I now work as a data scientist in R&D at a large medical device company, I continue to spend my free time studying the latest developments in AI/ML/DL/RL and neuroscience and trying to come up with models for how to bring it all together into systems that could actually be implemented. Unfortunately, I don't seem to have much time to develop my ideas into publishable models, but I would love to have the opportunity to share ideas with those who do.

Of course, I'm also very interested in AI Alignment (hence the account here). My ideas on that front mostly fall into the "learn (invertible) generative models of human needs/goals and hook those up to the AI's own reward signal" camp. I think methods of achieving alignment that depend on restricting the AI's intelligence or behavior are about as doomed to failure in the long term as Prohibition or the War on Drugs in the USA. We need a better theory of what reward signals are for in general (probably something to do with maximizing (minimizing) the attainable (dis)utility with respect to the survival needs of a system) before we can hope to model human values usefully. This could even extend to modeling the "values" of the ecological/socioeconomic/political supersystems in which humans are embedded or of the biological subsystems that are embedded within humans, both of which would be crucial for creating a better future.

Why do you need the story?

I notice that while a lot of the answer is formal and well-grounded, "stories have the minimum level of internal complexity to explain the complex phenomena we experience" is itself a story :)

Yep. That's just how humans think about it: complex phenomena require complex explanations. I think "emergence," complexity arising from many simple interactions among many simple components, is a pretty recent concept for humanity. People still think intelligent design makes more intuitive sense than evolution, for instance, even though the latter makes astronomically fewer assumptions and should be favored a priori by Occam's Razor.

Why do you need the story?

By "story," I mean something like a causal/conceptual map of an event/system/phenomenon, including things like the who, what, when, where, why, and how. At the level of sentences, this would be a map of all the words according to their semantic/syntactic role, like part of speech, with different slots for each role and connections relating them together. At the level of what we would normally call "stories," such a story map would include slots for things like protagonist, antagonist, quest, conflict, plot points, and archetypes, along with their various interactions.

In the brain, these story maps/graphs could be implemented as regions of the cortex. Just as some cortical regions have retinotopic or somatotopic maps, more abstract regions may contain maps of conceptual space, along with neural connections between subregions that represent causal, structural, semantic, or social relationships between items in the map. Other brain regions may learn how to traverse these maps in systematic ways, giving rise to things like syntax, story structure, and action planning.

I've suggested before that these sorts of maps may be key to understanding things like language and consciousness. Stories that can be loaded into and from long-term memory or transferred between minds via language can offer a huge selective advantage, both to individual humans and to groups of humans. I think the recognition, accumulation, and transmission of stories is actually pretty fundamental to how human psychology works.

Why do you need the story?

This deep psychological need to latch onto some story, any story, to explain what we don't understand, seems to me to tie back into the Bayesian Brain Hypothesis. Basically, our brains are constantly and uncontrollably generating hypotheses for the evidence we encounter in the world, seeing which ones predict our experiences with the greatest likelihood (weighted by our biological and cultural priors, of course). These hypotheses come in the form of stories because stories have the minimum level of internal complexity to explain the complex phenomena we experience (which, themselves, we internalize as stories). Choosing the "best" explanation, of course, follows Bayes' formula:

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$

A few problems with this:

  1. We might just be terrible at choosing good priors ($P(H)$). Occam's Razor / Solomonoff Induction just isn't that intuitive to most humans. Most people find consciousness (which is familiar) to be simpler than neuroscience (which is alien), so they see no problem hypothesizing disembodied spirits, yet they scoff at the idea of humans being no more than matter. Astrology sounds reasonable when you have no reason to think that stars and planets shouldn't have personalities and try to affect your personal life, like everyone else, just so long as you don't try to figure out how that would actually work at a mechanistic level. Statistical modeling, on the other hand, is hard for humans to grasp, and therefore much more complicated, and therefore much less likely to have any explanatory power a priori, at least as far as most people are concerned.
  2. Likelihood functions ($P(E \mid H)$) can be really hard to figure out. They require coming up with hypotheses that have the same causal structure as the real system they're trying to predict. When most of our declarative mental models exist at the level of abstraction of human social dynamics, it can be difficult to accurately imagine all the interacting bodily systems and metabolic pathways that make NSAIDs (or any other drugs, to say nothing of whole foods) have the precise effect that they do.
  3. Unfortunately, evolution didn't equip us with very good priors for how much weight to give to unimagined hypotheses, so we end up normalizing the posterior distribution by only those hypotheses we can think of. That means the denominator in the equation above ($P(E)$) is often much less than it should be, even if the priors and evidential likelihoods are all correct, because other hypotheses have not had a chance to weigh in. For most people, all future (or as-yet unheard-of) scientific discoveries are effectively given a prior probability of 0, while all the myths passed down from the tribal/religious/political elders seem to explain everything as well as anything they've ever heard, and so those stories get all the weight and all the acceptance.
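
To make point 3 concrete, here is a minimal Python sketch with made-up numbers (the hypothesis names, priors, and likelihoods are purely hypothetical). It applies Bayes' rule over a finite set of hypotheses, computing the evidence term only over the hypotheses actually supplied, and shows how leaving out one that explains the evidence well lets the remaining stories absorb its posterior mass.

```python
def posterior(priors, likelihoods):
    """Bayes' rule over a finite set of hypotheses.

    priors, likelihoods: dicts keyed by (hypothetical) hypothesis name.
    The evidence P(E) is summed only over the hypotheses passed in,
    which is exactly the failure mode described in point 3.
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())  # P(E) over *considered* hypotheses only
    return {h: p / evidence for h, p in joint.items()}


# Hypothetical numbers for illustration only.
priors = {"myth": 0.45, "folk_remedy": 0.45, "unheard_of_mechanism": 0.10}
likelihoods = {"myth": 0.2, "folk_remedy": 0.1, "unheard_of_mechanism": 0.9}

# Everything considered.
full = posterior(priors, likelihoods)

# The mechanism nobody has imagined is silently dropped before normalizing.
imagined = {h: priors[h] for h in ("myth", "folk_remedy")}
partial = posterior(imagined, likelihoods)

for h in partial:
    print(f"{h:>12}: full={full[h]:.3f}  imagined-only={partial[h]:.3f}")
print(f"unheard_of_mechanism (full set only): {full['unheard_of_mechanism']:.3f}")
```

With these numbers the myth's posterior rises from 0.40 to about 0.67 once the unimagined hypothesis is excluded, even though none of the priors or likelihoods for the remaining stories changed.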

It's unavoidable for us as humans with Bayesian-ish brains to start coming up with stories to explain phenomena, even when evidence is lacking. We just need to be careful to cultivate an awareness of when our priors may be mistaken, when our stories don't have sufficiently reductionist internal causal structure to explain what they are meant to explain, and when we probably haven't even considered hypotheses that are anywhere close to the true explanation.

From language to ethics by automated reasoning

Well, if it's a language model anything like GPT-3, then any discussions about morality that it engages in will likely be permutations and rewordings of what it has seen in its training data. Such models aren't even guaranteed to produce text that is self-consistent over time, so I would expect to see conflicting moral stances from the AI that derive from conflicting moral stances of humans whose words it trained on. (Hopefully it was at least trained more on the Stanford Encyclopedia of Philosophy and less on Reddit/Twitter/Facebook.)

It would be interesting, though, if we could design a "language model" AI that continuously seeks self-consistency upon internal reflection. Maybe it would continuously generate moral statements, use them to predict policies under hypothetical scenarios, look for any conflicting predictions, develop moral statements that minimize the conflict, and retrain on the coherent moral statements. I would expect a process like this to converge over time, especially if we are starting from a large sample of human moral opinions like a typical language model would, since all human moralities form a relatively tight cluster in behavioral policy space. Then maybe we would be one step closer to achieving the C in CEV.

Regardless, I agree with you overall in the sense that sophisticated language models will be necessary for aligning AGI with human morality at all the relevant levels of abstraction. I just don't think it will be anywhere near sufficient.

“The Wisdom of the Lazy Teacher”

If you teach an AI to fish, it might optimize its performance within a narrow scope. Teach it to teach itself to fish, and you've created a recursively self-improving AGI that is unaligned with human values by default and will most likely end up killing us all.

-- Eliezer, probably

From language to ethics by automated reasoning

Natural language exists as a low-bandwidth communication channel for imprinting one person's mental map onto another person's. The mental maps themselves are formed through direct interactions with an external environment. As such, I'm not sure it's possible to get around the symbol-grounding problem to reach natural language understanding without some form of embodied cognition (physical or virtual). Words only "mean" something when there is an element of a person's mental map of reality that the word is associated with (or an element of their language processing machinery for certain sentence particles), and those mental maps are formed through high-bandwidth perception.

However, even if you could get an AI to reach true understanding just from natural language data (i.e., by training on exponentially more language data than children do until the AI's map of reality is as fine-grained as it would have been from embodied interaction with the environment), and even if this AI had a complete understanding of human emotions and moral systems, it would not necessarily be aligned. It would need to have an aligned (or convergently alignable) motivational schema already in motion before you could trust it in general.

Why do you believe AI alignment is possible?

Yeah, Friston is a bit notorious for not explaining his ideas clearly enough for others to understand easily. It took me a while to wrap my head around what all his equations were up to and what exactly "active inference" entails, but the concepts are relatively straightforward once it all clicks.

You can think of "free energy" as the discrepancy between prediction and observation, like the potential energy of a spring stretched between them. Minimizing free energy is all about finding states with the highest probability and setting things up such that the highest-probability states are those where your model's predictions match your observations. In statistical mechanics, the probability of a particle occupying a particular state is proportional to the exponential of the negative potential energy of that state (measured in units of $k_B T$). That's why air pressure drops off exponentially with altitude (to a first approximation, $P(h) \approx P_0\, e^{-mgh/k_B T}$). For a normal distribution:

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

the energy is a parabola:

$$E(x) = -\ln p(x) = \frac{1}{2}\,\frac{1}{\sigma^2}\,(x-\mu)^2 + \frac{1}{2}\ln\!\left(2\pi\sigma^2\right)$$

This is exactly the energy landscape you see for an ideal Hookean spring with rest length $\mu$ and spring constant $1/\sigma^2$. Physical systems always seek the configuration with the lowest free energy (e.g., a stretched spring contracting towards its rest length). In the context of mind engineering, $x$ might represent an observation, $\mu$ the prediction of the agent's internal model of the world, and $1/\sigma^2$ the expected precision of that prediction. Of course, these are all high-dimensional vectors, so matrix math is involved (Friston always uses $\Pi$ for the precision matrix).
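
As a quick numerical check of the Gaussian-to-spring correspondence, here is a small Python sketch (the values of mu and sigma are arbitrary choices for illustration). It computes the negative log-density of a normal distribution directly and compares it with the potential energy of a spring whose rest length is mu and whose spring constant is the precision 1/sigma^2; the two agree up to an additive constant that does not depend on x.

```python
import math


def gaussian_energy(x, mu, sigma):
    """Energy E(x) = -ln p(x) for a normal density N(mu, sigma^2)."""
    density = math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)
    return -math.log(density)


def spring_energy(x, rest_length, k):
    """Potential energy of an ideal Hookean spring: (1/2) k (x - rest_length)^2."""
    return 0.5 * k * (x - rest_length) ** 2


mu, sigma = 2.0, 0.5                              # illustrative values only
precision = 1 / sigma**2                          # plays the role of the spring constant k
offset = 0.5 * math.log(2 * math.pi * sigma**2)   # x-independent normalization term

for x in [0.0, 1.5, 2.0, 3.7]:
    e_gauss = gaussian_energy(x, mu, sigma)
    e_spring = spring_energy(x, mu, precision) + offset
    print(f"x={x:4.1f}  -ln p(x)={e_gauss:.6f}  spring+const={e_spring:.6f}")
```

The normalization constant only shifts the whole landscape up or down, so it never changes which configuration has the lowest energy.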

For rational agents, free energy minimization involves adjusting the hidden variables in an agent's internal predictive model (perception) or adjusting the environment itself (action) until "predictions" and "observations" align to within the desired/expected precision. (For actions, "prediction" is a bit of a misnomer; it's actually a goal or a homeostatic set point that the agent is trying to achieve. This is what "active inference" is all about, though, and has caused free energy people to talk about motor outputs from the brain as being "self-fulfilling prophecies".) The predictive models that the agent uses for perception are actually built hierarchically, with each level acting as a dynamic generative model making predictions about the level below. Higher levels send predictions down to compare with the "observations" (state) of the level below, and lower levels send prediction errors back up to the higher levels in order to adjust the hidden variables through something like online gradient descent. This process is called "predictive coding" and leads to the minimization of the free energy between all levels in the hierarchy.
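
The perception/action split can be illustrated with a deliberately tiny Python sketch. This is a one-level, linear-Gaussian toy, not Friston's full hierarchical scheme: the linear mapping `w`, the variable names, the learning rate, and the numbers are all assumptions made for illustration. Perception does gradient descent on free energy with respect to the belief `mu`; action does gradient descent with respect to the observation `x` itself, pulling the world toward the prediction.

```python
def free_energy(x, mu, w, eta, pi_x, pi_mu):
    """Gaussian free energy (up to a constant) for a one-level linear model.

    x     : observation from the level below (here, the senses)
    mu    : hidden variable (belief) at this level
    w     : weight of the assumed linear generative mapping, prediction = w * mu
    eta   : top-down prediction of mu from the level above
    pi_x  : precision of the bottom-up (sensory) prediction error
    pi_mu : precision of the top-down prior
    """
    eps_x = x - w * mu   # bottom-up prediction error
    eps_mu = mu - eta    # error relative to the higher level's prediction
    return 0.5 * pi_x * eps_x**2 + 0.5 * pi_mu * eps_mu**2


def perceive(x, w, eta, pi_x, pi_mu, lr=0.05, steps=2000):
    """Perception: gradient descent on F with respect to the belief mu."""
    mu = eta  # start from the top-down guess
    for _ in range(steps):
        eps_x = x - w * mu
        eps_mu = mu - eta
        dF_dmu = -pi_x * w * eps_x + pi_mu * eps_mu
        mu -= lr * dF_dmu
    return mu


def act(x, mu, w, pi_x, lr=0.05, steps=2000):
    """Action: leave the belief alone and change the world (x) instead,
    pulling the observation toward the prediction w * mu."""
    for _ in range(steps):
        dF_dx = pi_x * (x - w * mu)
        x -= lr * dF_dx
    return x


# Hypothetical numbers: the evidence (x = 3) disagrees with the top-down
# guess (eta = 1, which predicts w * eta = 2).
x_obs, w, eta, pi_x, pi_mu = 3.0, 2.0, 1.0, 1.0, 4.0

# Perception settles on the precision-weighted compromise
# (pi_x * w * x + pi_mu * eta) / (pi_x * w**2 + pi_mu).
mu_star = perceive(x_obs, w, eta, pi_x, pi_mu)
print(f"belief: {eta} -> {mu_star:.4f}")
print(f"F: {free_energy(x_obs, eta, w, eta, pi_x, pi_mu):.4f} -> "
      f"{free_energy(x_obs, mu_star, w, eta, pi_x, pi_mu):.4f}")

# Active inference: hold the belief at the set point and act on the world.
x_after = act(x_obs, eta, w, pi_x)
print(f"observation: {x_obs} -> {x_after:.4f}")
```

With these numbers, perception moves the belief from 1.0 to 1.25 and halves F from 0.5 to 0.25, while action alone drives the observation from 3.0 down to the predicted 2.0, the "self-fulfilling prophecy" case.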

My little limerick was alluding to the idea that you could build an AGI to include a generative model of human behavior, using predictive coding to find the goals, policies, instinctual drives, and homeostatic set points that best explain the human's observed behavior. Then you could route these goals and policies to the AGI's own teleological system. That is, make the human's goals and drives, whatever it determines them to be using its best epistemological techniques, into its own goals and drives. Whether this could solve AI alignment would take some research to figure out. (Or just point out the glaring flaws in my reasoning here.)
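
The core move here (infer the goal that best explains someone's behavior, then adopt it) can be caricatured in a few lines of Python. Note that this sketch swaps in Bayesian inverse planning with a softmax ("Boltzmann-rational") action model, a standard inverse-reinforcement-learning assumption, rather than the predictive-coding machinery described above; the 1-D world, the candidate goals, `beta`, and the observed trajectory are all hypothetical.

```python
import math

ACTIONS = (-1, 0, +1)  # step left, stay put, step right


def action_likelihood(pos, action, goal, beta=2.0):
    """P(action | pos, goal) for a noisily rational agent on a 1-D line.

    The agent prefers moves that leave it closer to its goal; beta
    controls how reliably it does so (a softmax choice rule).
    """
    scores = {a: -beta * abs(pos + a - goal) for a in ACTIONS}
    z = sum(math.exp(s) for s in scores.values())
    return math.exp(scores[action]) / z


def infer_goal(trajectory, candidate_goals):
    """Posterior over candidate goals given observed (pos, action) pairs,
    starting from a uniform prior."""
    log_post = {g: 0.0 for g in candidate_goals}
    for pos, action in trajectory:
        for g in candidate_goals:
            log_post[g] += math.log(action_likelihood(pos, action, g))
    m = max(log_post.values())  # subtract the max for numerical stability
    unnorm = {g: math.exp(lp - m) for g, lp in log_post.items()}
    z = sum(unnorm.values())
    return {g: u / z for g, u in unnorm.items()}


# A hypothetical human walks right from 0 to 5, then stays put twice.
observed = [(0, +1), (1, +1), (2, +1), (3, +1), (4, +1), (5, 0), (5, 0)]
posterior = infer_goal(observed, candidate_goals=[-5, 5, 10])

# The goal the AI would route into its own objective.
adopted_goal = max(posterior, key=posterior.get)
print({g: round(p, 3) for g, p in posterior.items()}, "->", adopted_goal)
```

While the human is still walking right, goals 5 and 10 explain each step equally well; only the decision to stay put at 5 separates them. That is a small reminder of how much the inferred "values" depend on which behavior the model actually gets to observe.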

Why do you believe AI alignment is possible?

The algorithms of good epistemology
Can also equip axiology.
With free energy minimization
Of other-mind prediction,
You can route it to an AI's teleology.

Open & Welcome Thread November 2021

Well, at the time I had assumed that Earth history was a special case, a small stage temporarily under quarantine from the rest of the universe where the problem of evil could play itself out. I hoped that God had created the rest of the universe to contain innumerable inhabited worlds, all of which would learn the lesson of just how good the Creator's system of justice is after contrasting against a world that He had allowed to take matters into its own hands. However, now that I'm out of that mindset, I realize that even a small Type-I ASI could easily do a much better job instilling such a lesson into all sentient minds than Yahweh has purportedly done (i.e., without all the blood sacrifices and genocides).

Open & Welcome Thread November 2021

Thanks. I think it's important not to forget the path I've taken. It's a major part of my identity even though I no longer endorse what were once my most cherished beliefs, and I feel that it helps connect me with the greater human experience. My parents and (ironically) my training in apologetics instilled in me a thirst for truth and an alertness toward logical fallacies that took me quite far from where I started in life. I guess that a greater emphasis on overcoming confirmation bias would have accelerated my truth-seeking journey a bit more. Unfortunately and surprisingly for a certain species of story-telling social primates, the truth is not necessarily what is believed and taught by the tribe. An idea is not true just because people devote lifetimes to defending it. And an idea is not false just because they spend lifetimes mocking it.

The one thing that held me back the most, I think, is my rather strong deontological instinct. I always saw it as my moral duty to apply the full force of my rational mind to defending the Revealed Truth. I was willing to apply good epistemology to modify my beliefs arbitrarily far, as long as it did not violate the moral constraint that my worldview remain consistent with the holistic biblical narrative. Sometimes that meant radically rethinking religious doctrines in light of science (or conflicting scriptures), but more often it pushed me to rationalize scientific evidence to fit with my core beliefs.

I always recognized that all things that are true are necessarily mutually consistent, that we all inhabit a single self-consistent Reality, and that the Truth must be the minimum-energy harmonization of all existing facts. However, it wasn't until I was willing to let go of the moral duty to retain the biblical narrative in my set of brute facts that the free energy of my worldview dropped dramatically. It was like a thousand high-tension cables binding all my beliefs to a single (misplaced) epistemological hub were all released at once. Suddenly, everything else in my worldview began to fall into place as all lines of evidence I had already accumulated pulled things into a much lower-energy configuration.

It's funny how a single powerful prior or a single moral obligation can skew everything else. I wish it were a more widely held virtue to deeply scrutinize one's most cherished beliefs and to reject them if necessary. Oh well. Maybe in the next million years if we can set up the social selection pressures right.
