Steven Byrnes

I'm an AGI safety / AI alignment researcher in Boston with a particular focus on brain algorithms. Research Fellow at Astera. See https://sjbyrnes.com/agi.html for a summary of my research and sorted list of writing. Physicist by training. Email: steven.byrnes@gmail.com. Leave me anonymous feedback here. I’m also at: RSS feed, Twitter, Mastodon, Threads, Bluesky, GitHub, Wikipedia, Physics-StackExchange, LinkedIn

Sequences

Valence
Intro to Brain-Like-AGI Safety


Comments


… and it turns out the behavior researchers have done exactly that kind of test; they call it testing for “linearity”. Indeed, they’ve done it many many times over, with several different operationalizations of the statistics, in a whole slew of species.

This thread makes it seem like rock-paper-scissors “pecking order” is not that uncommon, at least among r/BackYardChickens subreddit participants.

I have a moderately-anti-status-ladders-per-se-being-important discussion in §2.5.1 here.

I think every part of your post where you rely on the existence of a strict status ladder, could be lightly rephrased to not rely on that, without any substantive change.

Once pointed out, that also sounds like how human status tends to work! The new hire at the company, the new kid at school, the new member to the social group, the visitor at another’s house… all these people typically have very low dominance-status, at least within their new context.

I think it comes from being more confident / comfortable in a familiar environment. There’s some game theory at play, see §2.5.4 here.

I was just reading Daniel Dennett’s memoir for no reason in particular, and it had some interesting glimpses into how professional philosophers actually practice philosophy. Like I guess there’s a thing where one person reads their paper (word-for-word!) and then someone else is the designated criticizer? I forget the details. Extremely different from my experience in physics academia though!!

(Obviously, reading that memoir is probably not the most time-efficient way to learn about the day-to-day practice of academic philosophy.)

(Oh, there was another funny anecdote in the memoir where the American professional philosopher association basically had a consensus against some school of philosophy, and everyone was putting it behind them and moving on, but then there was a rebellion where the people who still liked that school of philosophy did a hostile takeover of the association’s leadership!) 

Academic culture/norms - no or negative rewards for being more modest or expressing confusion. (Moral uncertainty being sometimes expressed because one can get rewarded by proposing some novel mechanism for dealing with it.)

A non-ethics example that jumps to my mind is David Chalmers on the Hard Problem of Consciousness here: “So if I’m giving my overall credences, I’m going to give, 10% to illusionism, 30% to panpsychism, 30% to dualism, and maybe the other 30% to, I don’t know what else could be true, but maybe there’s something else out there.” That’s the only example I can think of but I read very very little philosophy.

(I probably agree about formal verification. Instead, I’m arguing the narrow point that I think if someone were to simulate liquid water using just the Standard Model Lagrangian as we know it today, with no adjustable parameters and no approximations, on a magical hypercomputer, then they would calculate a freezing point that agrees with experiment. If that’s not a point you care about, then you can ignore the rest of this comment!)

OK let’s talk about getting from the Standard Model + weak-field GR to the freezing point of water. The weak force just leads to certain radioactive decays—hopefully we’re on the same page that it has well-understood effects that are irrelevant to water. GR just leads to Newton’s Law of Gravity which is also irrelevant to calculating the freezing point of water. Likewise, neutrinos, muons, etc. are all irrelevant to water.

Next, the strong force, quarks and gluons. That leads to the existence of nuclei, and their specific properties. I’m not an expert but I believe that the standard model via “lattice QCD” predicts the proton mass pretty well, although you need a supercomputer for that. So that’s the hydrogen nucleus. What about the oxygen nucleus? A quick google suggests that simulating an oxygen nucleus with lattice QCD is way beyond what today’s supercomputers can do (seems like the SOTA is around two nucleons, whereas oxygen has 16). So we need an approximation step, where we say that the soup of quarks and gluons approximately condenses into sets of quark-triples (nucleons) that interact by exchanging quark-doubles (pions). And then we get the nuclear shell model etc. Well anyway, I think there’s very good reason to believe that someone could turn the standard model and a hypercomputer into the list of nuclides in agreement with experiment; if you disagree, we can talk about that separately.

OK, so we can encapsulate all those pieces and all that’s left are nuclei, electrons, and photons—a.k.a. quantum electrodynamics (QED). QED is famously perhaps the most stringently tested theory in science, with two VERY different measurements of the fine structure constant agreeing to 1 part in 1e8 (like measuring the distance from Boston to San Francisco using two very different techniques and getting the same answer to within 4 cm—the techniques are probably sound!).
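The Boston-to-San-Francisco analogy above is easy to sanity-check. Here is a minimal sketch, assuming a distance of roughly 4,300 km between the two cities (my round number, not a surveyed value):

```python
# Sanity check of the "1 part in 1e8 ~ 4 cm" analogy.
# ASSUMPTION: Boston to San Francisco is roughly 4,300 km (approximate figure).
BOSTON_TO_SF_KM = 4_300
RELATIVE_AGREEMENT = 1e-8  # "1 part in 1e8", as stated in the text


def equivalent_error_cm(distance_km: float, relative: float) -> float:
    """Absolute error, in cm, for a given relative error on a distance in km."""
    cm_per_km = 1e5
    return distance_km * cm_per_km * relative


error_cm = equivalent_error_cm(BOSTON_TO_SF_KM, RELATIVE_AGREEMENT)
print(f"Equivalent distance error: about {error_cm:.0f} cm")
```

Any plausible figure for the distance gives an answer of a few centimeters, so the analogy holds up.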

But those are very simple systems; what if QED violations are hiding in particle-particle interactions? Well, you can do spectroscopy of atoms with two electrons and a nucleus (helium or helium-like), and we still get up to parts-per-million agreement with no-adjustable-parameter QED predictions, and OK yes this says there’s a discrepancy very slightly (1.7×) outside the experimental uncertainty bars but historically it’s very common for people to underestimate their experimental uncertainty bars by that amount.

But that’s still only two electrons and a nucleus; what about water with zillions of atoms and electrons? Maybe there’s some behavior in there that contradicts QED?

For one thing, it’s hard and probably impossible to just posit some new fundamental physics phenomenon that impacts a large aggregate of atoms without having any measurable effect on precision atomic measurements, particle accelerator measurements, and so on. Almost any fundamental physics phenomenon that you write down would violate some symmetry or other principle that seems to be foundational, or at any rate, that has been tested at even higher accuracy than the above (e.g. the electron charge and proton charge are known to be exact opposites to 1e-21 accuracy, the vacuum dispersion is zero to 1e18 accuracy … there are a ton of things like that that tend to be screwed up by any fundamental physics phenomenon that is not of a very specific type, namely a term that looks like quantum field theory as we know it today).

For another thing, ab initio molecular simulations exist and do give results compatible with macroscale material properties, which might or might not include the freezing point of water (this seems related but I’m not sure upon a quick google). “Ab initio” means “starting from known fundamental physics principles, with no adjustable parameters”.

Now, I’m sympathetic to the conundrum that you can open up some paper that describes itself as “ab initio”, and OK if the authors are not outright lying then we can feel good that there are no adjustable parameters in the source code as such. But surely the authors were making decisions about how to set up various approximations. How sure are we that they weren’t just messing around until they got the right freezing point, IR spectrum, shear strength, or whatever else they were calculating?

I think this is a legitimate hypothesis to consider and I’m sure it’s true of many individual papers. I’m not sure how to make it legible, but I have worked in molecular dynamics myself and had extremely smart and scrupulous friends in really good molecular dynamics labs such that I could see how they worked. And I don’t think the concern in the above paragraph is a correct description of the field. I think there’s a critical mass of good principled researchers who can recognize when people are putting more into the simulations than they get out, and keep the garbage studies out of textbooks and out of open-source tooling.

I guess one legible piece of evidence is that DFT was the best (and kinda only) approximation scheme that lets you calculate semiconductor bandgaps from first principles with reasonable amounts of compute, for many decades. And DFT famously always gives bandgaps that are too small. Everybody knew that, and that means that nobody was massaging their results to get the right bandgap. And it means that whenever people over the decades came up with some special-pleading correction that gave bigger bandgaps, the field as a whole wasn’t buying it. And that’s a good sign! (My impression is that people now have more compute-intensive techniques that are still ab initio and still “principled” but which give better bandgaps.)

FWIW I’m with Steve O here, e.g. I was recently writing the following footnote in a forthcoming blog post:

“The Standard Model of Particle Physics plus perturbative quantum general relativity” (I wish it was better-known and had a catchier name) appears sufficient to explain everything that happens in the solar system. Nobody has ever found any experiment violating it, despite extraordinarily precise tests. This theory can’t explain everything that happens in the universe; in particular, it can’t make any predictions about either (A) microscopic exploding black holes or (B) the Big Bang. Also, (C) the Standard Model happens to include 18 elementary particles (depending on how you count), because those are the ones we’ve discovered; but the theoretical framework is fully compatible with other particles existing too, and indeed there are strong theoretical and astronomical reasons to think they do exist. It’s just that those other particles are irrelevant for anything happening on Earth. Anyway, all signs point to some version of string theory eventually filling in those gaps as a true Theory of Everything. After all, string theories seem to be mathematically well-defined, to be exactly compatible with general relativity, and to have the same mathematical structure as the Standard Model of Particle Physics (i.e., quantum field theory) in the situations where that’s expected. Nobody has found a specific string theory vacuum with exactly the right set of elementary particles and masses and so on to match our universe. And maybe they won’t find that anytime soon; I’m not even sure if they know how to do those calculations! But anyway, there doesn’t seem to be any deep impenetrable mystery between us and a physics Theory of Everything.

(I interpret your statement to be about everyday experiences which depend on something being incomplete / wrong in fundamental physics as we know it, as opposed to just saying the obvious fact that we don’t understand all the emergent consequences of fundamental physics as we know it.)

I also think “we basically have no ability to model any high-level phenomena using quantum field theory” is misleading. It’s true that we can’t directly use the Standard Model Lagrangian to simulate a transistor. But we do know how and why and to what extent quantum field theory reduces to normal quantum mechanics and quantum chemistry (to such-and-such accuracy in such-and-such situations), and we know how those in turn approximately reduce to fluid dynamics and solid mechanics and classical electromagnetism and so on (to such-and-such accuracy in such-and-such situations), and now we’re all the way at the normal set of tools that physicists / chemists / engineers actually use to model high-level phenomena. You’re obviously losing fidelity at each step of simplification, but you’re generally losing fidelity in a legible way—you’re making specific approximations, and you know what you’re leaving out and why omitting it is appropriate in this situation, and you can do an incrementally more accurate calculation if you need to double-check. Do you see what I mean?

By (loose) analogy, someone could say “we don’t know for sure that intermolecular gravitational interactions are irrelevant for the freezing point of water, because nobody has ever included intermolecular gravitational interactions in a molecular dynamics calculation”. But the reason nobody has ever included them in a calculation is because we know for sure that they’re infinitesimal and irrelevant. Likewise, a lot of the complexity of QFT is infinitesimal and irrelevant in any particular situation of interest.
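To put a rough number on “infinitesimal”, here is a back-of-the-envelope sketch comparing the Newtonian gravitational energy between two neighboring water molecules to a typical hydrogen-bond energy. The separation (~0.28 nm) and hydrogen-bond energy (~0.2 eV) are assumed textbook-scale values that I’m plugging in for illustration, not outputs of any simulation:

```python
# Back-of-the-envelope: gravity vs. hydrogen bonding between two water molecules.
G = 6.674e-11            # gravitational constant, m^3 kg^-1 s^-2
AMU_KG = 1.6605e-27      # atomic mass unit, kg
JOULES_PER_EV = 1.602e-19

WATER_MASS_KG = 18.015 * AMU_KG  # mass of one H2O molecule
SEPARATION_M = 2.8e-10           # ASSUMPTION: typical neighbor spacing, ~0.28 nm
HBOND_ENERGY_EV = 0.2            # ASSUMPTION: typical hydrogen-bond energy scale


def gravitational_energy_ev(mass_kg: float, separation_m: float) -> float:
    """Magnitude of the Newtonian potential energy of two equal masses, in eV."""
    return G * mass_kg**2 / separation_m / JOULES_PER_EV


grav_ev = gravitational_energy_ev(WATER_MASS_KG, SEPARATION_M)
ratio = grav_ev / HBOND_ENERGY_EV
print(f"gravity: {grav_ev:.1e} eV, ratio to hydrogen bond: {ratio:.1e}")
```

Under those assumptions, gravity comes out more than thirty orders of magnitude weaker than hydrogen bonding, which is why nobody bothers to put it in a molecular dynamics code.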

I haven’t looked at that report in particular, but I VERY quickly looked into fluoride 6 months ago for my own decision-making purposes, and I wound up feeling like (1) a bunch of the studies are confounded by the fact that polluted areas have more fluoride, and people with more income / education / etc. [which are IQ correlates] are better at avoiding living in polluted areas and drinking the water, (2) getting fluoride out of my tap water is sufficiently annoying / weird that I don’t immediately want to bother in the absence of stronger beliefs (e.g. normal activated carbon filters don’t get the fluoride out), (3) I should brush with normal toothpaste then rinse with water, then use fluoride mouthwash right before bed (and NOT rinse with water afterwards, but do try extra hard to spit out as much of it as possible), (4) use fluoride-free toothpaste for the kids until they’re good at spitting it out (we were already doing this, I think it’s standard practice), but then switch.

I’m very open to (1) being wrong and any of (2-4) being the wrong call. FWIW, where I live, the tap water is 0.7 mg/L.

Sure, but the way it's described, it sounds like there's one adjustable parameter in the source code. If the setup allows for thousands of independently-adjustable parameters in the source code, that seems potentially useful but I'd want to know more details.

I think it's unlikely we get there in the foreseeable future, with the current paradigms

It would be nice if you could define “foreseeable future”. 3 years? 10 years? 30? 100? 1000? What?

And I’m not sure why “with the current paradigms” is in that sentence. The post you’re responding to is “Ten arguments that AI is an existential risk”, not “Ten arguments that Multimodal Large Language Models are an existential risk”, right?

If your assumption is that “the current paradigms” will remain the current paradigms for the “foreseeable future”, then you should say that, and explain why you think so. It seems to me that the paradigm in AI has had quite a bit of change in the last 6 years (i.e. since 2018, before GPT-2, i.e. a time when few had heard of LLMs), and has had complete wrenching change in the last 20 years (i.e. since 2004, many years before AlexNet, and a time when deep learning as a whole was still an obscure backwater, if I understand correctly). So by the same token, it’s plausible that the field of AI might have quite a bit of change in the next 6 years, and complete wrenching change in the next 20 years, right?

I just signed up for the Patreon and encourage others to do the same! Abram has done a lot of good work over the years—I’ve learned a lot of important things, things that affect my own research and thinking about AI alignment, by reading his writing.

I just made a wording change from:

Normies like me have an intuitive mental concept “me” which is simultaneously BOTH (A) me-the-human-body-etc AND (B) me-the-soul / consciousness / wellspring of vitalistic force / what Dan Dennett calls a “homunculus” / whatever.

to:

Normies like me (Steve) have an intuitive mental concept “Steve” which is simultaneously BOTH (A) Steve-the-human-body-etc AND (B) Steve-the-soul / consciousness / wellspring of vitalistic force / what Dan Dennett calls a “homunculus” / whatever.

I think that’s closer to what I was trying to get across. Does that edit change anything in your response?

At least the 'me-the-human-body' part of the concept. I don't know what the '-etc' part refers to.

The “etc” would include things like the tendency for fingers to reactively withdraw from touching a hot surface.

Elaborating a bit: In my own (physicalist, illusionist) ontology, there’s a body with a nervous system including the brain, and the whole mental world including consciousness / awareness is inextricably part of that package. But in other people’s ontology, as I understand it, some nervous system activities / properties (e.g. a finger reactively withdrawing from pain, maybe some or all other desires and aversions) get lumped in with the body, whereas other [things that I happen to believe are] nervous system activities / properties (e.g. awareness) get peeled off into (B). So I said “etc” to include all the former stuff. Hopefully that’s clear.

(I’m trying hard not to get sidetracked into an argument about the true nature of consciousness—I’m stating my ontology without defending it.)

Many helpful replies! Here’s where I’m at right now (feel free to push back!) [I’m coming from an atheist-physicalist perspective; this will bounce off everyone else.]

Hypothesis:

Normies like me (Steve) have an intuitive mental concept “Steve” which is simultaneously BOTH (A) Steve-the-human-body-etc AND (B) Steve-the-soul / consciousness / wellspring of vitalistic force / what Dan Dennett calls a “homunculus” / whatever.

The (A) & (B) “Steve” concepts are the same concept in normies like me, or at least deeply tangled together. So it’s hard to entertain the possibility of them coming apart, or to think through the consequences if they do.

Some people can get into a Mental State S (call it a form of “enlightenment”, or pick your favorite terminology) where their intuitive concept-space around (B) radically changes—it broadens, or disappears, or whatever. But for them, the (A) mental concept still exists and indeed doesn’t change much.

Anyway, people often have thoughts that connect sense-of-self to motivation, like “not wanting to be embarrassed” or “wanting to keep my promises”. My central claim is that the relevant sense-of-self involved in that motivation is (A), not (B).

If we conflate (A) & (B)—as normies like me are intuitively inclined to do—then we get the intuition that a radical change in (B) must have radical impacts on behavior. But that’s wrong—the (A) concept is still there and largely unchanged even in Mental State S, and it’s (A), not (B), that plays a role in those behaviorally-important everyday thoughts like “not wanting to be embarrassed” or “wanting to keep my promises”. So radical changes in (B) would not (directly) have the radical behavioral effects that one might intuitively expect (although it does of course have more than zero behavioral effect, with self-reports being an obvious example).

End of hypothesis. Again, feel free to push back!
