Steven Byrnes

I'm an AGI safety / AI alignment researcher in Boston with a particular focus on brain algorithms. Physicist by training. Twitter: @steve47285.

I'm not sure why you seem to think that I think of optionality-empowerment estimates as requiring anything resembling omniscience.

If we assume omniscience, it allows a very convenient type of argument:

  • Argument I [invalid]: Suppose an animal has a generic empowerment drive. We want to know whether it will do X. We should ask: Is X actually empowering?

However, if we don’t assume omniscience, then we can’t make arguments of that form. Instead we need to argue:

  • Argument II [valid]: Suppose an animal has a generic empowerment drive. We want to know whether it will do X. We should ask: Has the animal come to believe (implicitly or explicitly) that doing X is empowering?

I have the (possibly false!) impression that you’ve been implicitly using Argument I sometimes. That’s how omniscience came up.

For example, has a newborn bat come to believe (implicitly or explicitly) that flapping its arm-wings is empowering? If so, how did it come to believe that? The flapping doesn’t accomplish anything, right? They’re too young and weak to fly, and don’t necessarily know that flying is an eventual option to shoot for. (I’m assuming that baby bats will practice flapping their wings even if raised away from other bats, but I didn’t check, I can look it up if it’s a crux.) We can explain a sporadic flap or two as random exploration / curiosity, but I think bats practice flapping way too much for that to be the whole explanation.

Back to play-fighting. A baby animal is sitting next to its sibling. It can either play-fight, or hang out doing nothing. (Or cuddle, or whatever else.) So why play-fight?

Here’s the answer I prefer. I note that play-fighting as a kid presumably makes you a better real-fighter as an adult. And I don’t think that’s a coincidence; I think it’s the main point. In fact, I thought that was so obvious that it went without saying. But I shouldn’t assume that—maybe you disagree!

If you agree that “child play-fighting helps train for adult real-fighting” not just coincidentally but by design, then I don’t see the “Argument II” logic going through. For example, animals will play-fight even if they’ve never seen a real fight in their life.

So again: Why don’t your dog & cat just ignore each other entirely? Sure, when they’re already play-fighting, there are immediately-obvious reasons that they don’t want to be pinned. But if they’re relaxing, and not in competition over any resources, why go out of their way to play-fight? How did they come to believe that doing so is empowering? Or if they are in competition over resources, why not real-fight, like undomesticated adult animals do?

maximizing optionality automatically learns all motor skills - even up to bipedal walking

I agree, but I don’t think that’s strong evidence that nothing else is going on in humans. For example, there’s a “newborn stepping reflex”—newborn humans have a tendency to do parts of walking, without learning, even long before their muscles and brains are ready for the whole walking behavior. So if you say “a simple generic mechanism is sufficient to explain walking”, my response is “Well it’s not sufficient to explain everything about how walking is actually implemented in humans, because when we look closely we can see non-generic things going on”.

Here’s a more theoretical perspective. Suppose I have two side-by-side RL algorithms, learning to control identical bodies. One has some kind of “generic” empowerment reward. The other has that same reward, plus a reward-shaping system directly incentivizing learning to use a small number of key affordances that are known to work well for that particular body (e.g. standing).

I think the latter would do all the same things as the former, but it would learn faster and more reliably, particularly very early on. Agree or disagree? If you agree, then we should expect to find that in the brain, right?

(When I say “more reliably”, I’m referring to the trope that programming RL agents is really finicky, more so than other types of ML. I don’t really know whether that trope is correct, though.)
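To make the two-algorithms comparison concrete, here’s a toy sketch in Python. Everything here is invented for illustration: the `ToyBody` environment, the `is_standing` affordance predicate, and the crude reachable-state empowerment proxy are hypothetical stand-ins, not a model of any real RL system or brain.

```python
import numpy as np

class ToyBody:
    """A hypothetical 1-D 'body': state is height in {0..4};
    'standing' means height >= 3. Purely illustrative."""
    n_actions = 2  # 0 = relax (drift down), 1 = push up

    def step_from(self, state, action):
        return min(state + 1, 4) if action == 1 else max(state - 1, 0)

    def is_standing(self, state):
        return state >= 3

def empowerment_proxy(state, env, n_rollouts=32, horizon=3, rng=None):
    """Crude empowerment stand-in: log of how many distinct states are
    reachable within `horizon` random-action steps from `state`."""
    rng = rng or np.random.default_rng(0)
    reached = set()
    for _ in range(n_rollouts):
        s = state
        for _ in range(horizon):
            s = env.step_from(s, int(rng.integers(env.n_actions)))
        reached.add(s)
    return float(np.log(len(reached)))

def generic_reward(state, env):
    # The first agent: generic empowerment term only.
    return empowerment_proxy(state, env)

def shaped_reward(state, env, shaping_weight=1.0):
    # The second agent: identical generic term, plus a hand-wired bonus
    # for one key affordance known (by the designer) to suit this body.
    bonus = shaping_weight * float(env.is_standing(state))
    return empowerment_proxy(state, env) + bonus
```

The point of the sketch is that the shaping bonus adds signal exactly where the generic term is weakest: very early on, before the agent has explored enough for the empowerment proxy to differentiate states.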

Sure, but there is hardly room in the brainstem to reward-shape for the [] different things humans can learn to do.

I hope we’re not having one of those silly arguments where we both agree that empowerment explains more than 0% and less than 100% of whatever, and then we’re going back and forth saying “It’s more than 0%!” “No way, it’s less than 100%!” “No way, it’s more than 0%!” …  :)

Anyway, I think the brainstem “knows about” some limited number of species-typical behaviors, and can probably execute those behaviors directly without learning, and also probably reward-shapes the cortex into learning those behaviors faster. Obviously I agree that the cortex can also learn pretty much arbitrary other behaviors, like ballet and touch-typing, which are not specifically encoded in the brainstem.

The above seems likely enough in the context of CEV (again), but otherwise false.

I think there might be a mix-up here. There are two topics of discussion:

  • One topic is: “We should look at humans and human values since those are the things we want to align an AGI to.”
  • The other topic is: “We should look at humans and human values since AGI learning algorithms are going to resemble human brain within-lifetime learning algorithms, and humans provide evidence for what those algorithms do in different training environments”.

The part of the post that you excerpted is about the latter, not the former.

Imagine that God gives you a puzzle: You get most of the machinery for a human brain but some of the innate drive neural circuitry has been erased and replaced by empty boxes. You’re allowed to fill in the boxes however you want. You’re not allowed to cheat by looking at actual humans. Your goal is to fill in the boxes such that the edited-human winds up altruistic.

So you have a go at filling in the boxes. God lets you do as many validation runs as you want. The validation runs involve raising the edited-human in a 0AD society and seeing what they wind up like. After a few iterations, you find settings where the edited-humans reliably grow up very altruistic in every 0AD society you can think to try.

Now that your validation runs are done, it’s time for the test run. So the question is: if you put the same edited-human-brain in a 2022AD society, will it also grow up altruistic on the first try?

I think a good guess is “yes”. I think that’s what Jacob is saying.

(For my part, I think Jacob’s point there is fair, and a helpful way to think about it, even if it doesn’t completely allay my concerns.)

I mostly just want to repeat my comment on your last post.

I think your opposition to graders is really opposition to simple graders, that are never updated, that can’t account for non-consequentialist aspects of plans (e.g. “sketchiness”), and that are facing an extremely large search space of possibilities including out-of-the-box ones. And I think your value-vs-evaluation distinction is kinda different from graders-vs-non-graders.


  • For “nonrobust decision-influences can be OK”—I don’t think that’s a unique feature of not-having-a-grader. If there is a grader, but the grader is of the form “Here are a billion patterns with corresponding grades, try to pattern-match your plan to all billion of those patterns and do a weighted average”, then probably you can throw out a few of those billion patterns and the grader will still work the same.
  • For “values steer optimization; they are not optimized against”—I think you’re comparing apples and oranges. Let’s say I’m a human. I want “diamonds (as understood by me)”. So I attempt to program an AGI to want “diamonds (as understood by me)”.
    • In the framework you advocate, the AGI winds up “directly” “valuing” “diamonds (as understood by the AGI)”. And this can go wrong because “diamonds (as understood by me)” may differ from “diamonds (as understood by the AGI)”. If that’s what happens, then from my perspective, the AGI “was looking for, and found, an edge-case exploit”. From the AGI’s own perspective, all it was doing was “finding an awesome out-of-the-box way to make lots of diamonds”.
    • Whereas in the grader-optimizer framework, I delegate to a grader, and the AGI does the things that increase “diamonds (as understood by the grader)”. And this can go wrong because “diamonds (as understood by me)” may differ from “diamonds (as understood by the grader)”. From my perspective, the AGI is again “looking for edge-case exploits”.
    • It’s really the same problem, but in the first case you can temporarily forget the fact that I, the programmer, exist, and then there seems not to be any conflict / exploits / optimizing-against in the system. But the conflict is still there! It’s just off-stage.
  • For “Since values steer cognition, reflective agents try to avoid adversarial inputs to their own values”—Again, first of all, it’s the AGI itself that is deciding what is or isn’t adversarial, and the things that are adversarial from the perspective of the programmer might be just a great clever out-of-the-box idea from the perspective of the AGI. Second of all, I don’t think the things you’re saying are incompatible with graders, they’re just incompatible with “simple static graders”.

Hmm, the only way I can make sense of this article is to replace the word “biases” with “heuristics” everywhere, including the title. Heuristics are useful, whereas biases are bad by definition. But heuristics tend to create biases, so I can imagine people mixing up the two terms.

Sorry if I’m misunderstanding.


One of my disagreements with your U,V,P,W,A model is that I think V & W are randomly-initialized in animals. Or maybe I’m misunderstanding what you mean by “brains also can import varying degrees of prior knowledge into other components”.

I also (relatedly?) am pretty against trying to lump the brainstem / hypothalamus and the cortex / BG / etc. into a single learning-algorithm-ish framework.

I’m not sure if this is exactly your take, but I often see a perspective (e.g. here) where someone says “We should think of the brain as a learning algorithm. Oh wait, we need to explain innate behaviors. Hmm OK, we should think of the brain as a pretrained learning algorithm.”

But I think that last step is wrong. Instead of “pretrained learning algorithm”, we can alternatively think of the brain as a learning algorithm plus other things that are not learning algorithms. For example, I think most newborn behaviors are purely driven by the brainstem, which is doing things of its own accord without any learning and without any cortex involvement.

To illustrate the difference between “pretrained learning algorithm” and “learning algorithm + other things that are not learning algorithms”:

Suppose I’m making a robot. I put in a model-based RL system. I also put in a firmware module that detects when the battery is almost empty and when it is, it shuts down the RL system, takes control, and drives the robot back to the charging station.

Leaving aside whether this is a good design for a robot, or a good model for the brain (it’s not), let’s just talk about this system. Would we describe the firmware module as “importing prior knowledge into components of the RL algorithm”? No way, right? Instead we would describe the firmware module as “a separate component from the RL algorithm”.

By the same token, I think there are a lot of things happening in the brainstem / hypothalamus which we should describe as “a separate component from the RL algorithm”.
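The robot example above can be sketched in a few lines of Python. All the names here (`BatteryFirmware`, `Robot`, the observation keys) are hypothetical, just to show the arbitration structure: the firmware is not “prior knowledge imported into the RL algorithm”; it simply preempts the RL policy entirely.

```python
class BatteryFirmware:
    """A separate, non-learning component: takes over when the
    battery is low and drives toward the charging station."""
    def __init__(self, threshold=0.1):
        self.threshold = threshold

    def wants_control(self, battery_level):
        return battery_level < self.threshold

    def action(self, position, charger_position):
        # Fixed, hand-coded behavior -- no learning involved.
        return "left" if charger_position < position else "right"

class Robot:
    def __init__(self, rl_policy, firmware):
        self.rl_policy = rl_policy
        self.firmware = firmware

    def act(self, obs):
        # Arbitration: the firmware and the RL policy are distinct
        # components; the firmware overrides rather than pretrains.
        if self.firmware.wants_control(obs["battery"]):
            return self.firmware.action(obs["position"], obs["charger"])
        return self.rl_policy(obs)
```

Nothing about the RL policy’s weights or training encodes the go-charge behavior, which is the sense in which it’s “a separate component” rather than “prior knowledge in the learning algorithm”.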

Sufficiently distant consequences is exactly what empowerment is for, as the universal approximator of long term consequences. Indeed the animals can't learn about that long term benefit through trial-and-error, but that isn't how most learning operates. Learning is mostly driven by the planning system 1 - M/P - which drives updates to V/A based on both current learned V and U - and U by default is primarily estimating empowerment and value of information as universal proxies.

[M/P is a typo for W/P right?]

Let’s say I wake up in the morning and am deciding whether or not to put a lock pick set in my pocket. There are reasons to think that this might increase my empowerment—if I find myself locked out of something, I can maybe pick the lock. There are also reasons to think that this might decrease my empowerment—let’s say, if I get frisked by a cop, I look more suspicious and have a higher chance of spurious arrest, and also I’m carrying around more weight and have less room in my pockets for other things.

So, all things considered, is it empowering or disempowering to put the lock pick set into my pocket for the day? It depends. In a city, it’s maybe empowering. On a remote mountain, it’s probably disempowering. In between, hard to say.

The moral is: I claim that figuring out what’s empowering is not a “local” / “generic” / “universal” calculation. If I do X in the morning, it is unknowable whether that was an empowering or disempowering action, in the absence of information about where I’m likely to find myself in the afternoon. And maybe I can make an intelligent guess, but I’m not omniscient. If I were a newborn, I wouldn’t even be able to guess.
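The context-dependence can be made concrete as a toy expected-value calculation. Every number and probability below is invented purely for illustration; the point is only that the sign of the answer flips with the environment, which no “local” calculation can know:

```python
def expected_empowerment_gain(p_locked_out, p_frisked,
                              gain_if_picked=1.0, loss_if_frisked=2.0,
                              carry_cost=0.1):
    """Toy expected empowerment change from carrying lock picks for a
    day. All parameters are made-up illustrative numbers."""
    return (p_locked_out * gain_if_picked
            - p_frisked * loss_if_frisked
            - carry_cost)

# City: locked doors fairly common, frisks rare -> plausibly positive.
city = expected_empowerment_gain(p_locked_out=0.3, p_frisked=0.02)

# Remote mountain: nothing to pick, no cops -> the carry cost dominates.
mountain = expected_empowerment_gain(p_locked_out=0.0, p_frisked=0.0)
```

Same action, same morning, opposite sign depending on facts about the afternoon that the agent may not have.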

So anyway, if an animal could practice skill X versus skill Y as a baby, it is (in general) unknowable which one is a more empowering course of action, in the absence of information about what kinds of situations the animal is likely to find itself in when it’s older. And the animal itself doesn’t know that—it’s just a baby.

Since I’m a smart adult human, I happen to know that:

  • it’s empowering for baby cats to practice pouncing,
  • it’s empowering for baby bats to practice arm-flapping,
  • it’s empowering for baby humans to practice grasping,
  • it’s not empowering for baby humans to practice arm-flapping,
  • it’s not empowering for baby bats to practice pouncing,
  • etc.

But I don’t know how the baby cats, bats, and humans are supposed to figure that out, via some “generic” empowerment calculation. Arm-flapping is equally immediately useless for both newborn bats and newborn humans, but newborn humans never flap their arms and newborn bats do constantly.

So yeah, it would be simple and elegant to say “the baby brain is presented with a bunch of knobs and levers and gradually discovers all the affordances of a human body”. But I don’t think that fits the data, e.g. the lack of human newborn arm-flapping experiments in comparison to newborn bats.

Instead, I think baby humans have an innate drive to stand up, an innate drive to walk, an innate drive to grasp, and probably a few other things like that. I think they already want to do those things even before they have evidence (or other rational basis to believe) that doing so is empowering.

I claim that this also fits better into a theory where:

  • (1) the layout of motor cortex is relatively consistent between different people (in the absence of brain damage),
  • (2) decorticate rats can move around in more-or-less species-typical ways,
  • (3) there’s strong evolutionary pressure to learn motor control fast, and we know that reward-shaping is helpful for that,
  • (4) there’s stuff in the brainstem that can do this kind of reward-shaping,
  • (5) lots of animals can get around reasonably well within a remarkably short time after birth,
  • (6) stimulating a certain part of the brain can create “an urge to move your arm” etc., which is independent from executing the actual motion,
  • (7) there are things like the palmar grasp reflex, Moro reflex, stepping reflex, etc.,
  • (8) there’s the sheer delight on the face of a baby standing up for the first time,
  • (9) there are certain dopamine signals (from lateral SNc & SNl) that correlate with motor actions specifically, independent of general reward, etc.

(There’s kinda a long story that I think connects all these dots, which I’m not getting into.)

(If you put a novel and useful motor affordance on a baby human—some funny grasper on their hand or something—I’m not denying that they would eventually figure out how to start using it, thanks to more generic things like curiosity, stumbling upon useful things, maybe learning-from-observation, etc. I just don’t think those kinds of things are the whole story for early acquisition of species-typical movements like grasping and standing. For example, I figure decorticate rats would probably fail to learn to use a weird novel motor affordance, but decorticate rats do move around in more-or-less species-typical ways.)

some amount of play fighting skill knowledge is prior instinctual, but much of it is also learned

Sure, I agree.

The only part of this that requires a more specific explanation is perhaps the safety aspect of play fighting: each animal is always pulling punches to varying degrees, the cat isn't using fully extended claws, neither is biting with full force, etc. That is probably the animal equivalent of empathy/altruism.

Yeah pulling punches is one thing. Another thing is that animals have universal species-specific somewhat-arbitrary signals that they’re playing, including certain sounds (laughing in humans) and gestures (“play bow” in dogs).

My more basic argument is that the desire to play-fight in the first place, as opposed to just relaxing or whatever, is an innate drive. I think we’re giving baby animals too much credit if we expect them to be thinking to themselves “gee when I grow up I might need to be good at fighting so I should practice right now instead of sitting on the comfy couch”. I claim that there isn’t any learning signal or local generic empowerment calculation that would form the basis for that.

Fun is complex and general/vague - it can be used to describe almost anything we derive pleasure from in your A.) or B.) categories.

Fair enough.

Thank you! I’ve been using the terms “inference algorithm” versus “learning algorithm” to talk about that kind of thing. What you said seems fine too, AFAIK.

I think that grader-optimization is likely to fail catastrophically when the grader is (some combination of):

  • more like “built / specified directly and exogenously by humans or other simple processes”, less like e.g. “a more and more complicated grader getting gradually built up through some learning process as the space-of-possible-plans gets gradually larger”
  • more like “looking at the eventual consequences of the plan”, less like “assessing plans for deontology and other properties” (related post) (e.g. “That plan seems to pattern-match to basilisk stuff” could be a strike against a plan, but that evaluation is not based solely on the plan’s consequences.)
  • more like “looking through tons of wildly-out-of-the-box plans”, less like “looking through a white-list of a small number of in-the-box plans”

Maybe we agree so far?

But I feel like this post is trying to go beyond that and say something broader, and I think that’s where I get off the boat.

I claim that maybe there’s a map-territory confusion going on. In particular, here are two possible situations:

  • (A) Part of the AGI algorithm involves listing out multiple plans, and another part of the algorithm involves a “grader” that grades the plans.
  • (B) Same as (A), but also assume that the high-scoring plans involve a world-model (“map”), and somewhere on that map is an explicit (metacognitive / reflective) representation of the “grader” itself, and the (represented) grader’s (represented) grade outputs (within the map) are identical to (or at least close to) the actual grader’s actual grades within the territory.

I feel like OP equivocates between these. When it’s talking about algorithms it seems to be (A), but when it’s talking about value-child and appendix C and so on, it seems to be (B).

In the case of people, I want to say that the “grader” is roughly “valence” / “the feeling that this is a good idea”.

I claim that (A), properly understood, should seem/feel almost tautological—like, it should be impossible to introspectively imagine (A) being false! It’s kinda the claim “People will do things that they feel motivated to do”, or something like that. By contrast, (B) is not tautological, or even true in general—it describes hedonists: “The person is thinking about how to get very positive valence on their own thoughts, and they’re doing whatever will lead to that”.

I think this is related to Rohin’s comment (“An AI system with a "direct (object-level) goal" is better than one with "indirect goals"”)—the AGI has a world-model / map, its “goals” are somewhere on the map (inevitably, I claim), and we can compare the option of “the goals are in the parts of the map that correspond to object-level reality (e.g. diamonds)”, versus “the goals are in the parts of the map that correspond to a little [self-reflective] portrayal of the AGI’s own evaluative module (or some other represented grader) outputting a high score”. That’s the distinction between (not-B) vs (B) respectively. But I think both options are equally (A).

(Sidenote: There are obvious reasons to think that (A) might lead to (B) in the context of powerful model-based RL algorithms. But I claim that this is not inevitable. I think OP would agree with that.)

Suppose most humans do X, where X increases empowerment. Three possibilities are:

  • (A) Most humans do X because they have an innate drive to do X; (e.g. having sex, breathing)
  • (B) Most humans do X because they have done X in the past and have learned from experience that doing X will eventually lead to good things (e.g. checking the weather forecast before going out)
  • (C) Most humans do X because they have indirectly figured out that doing X will eventually lead to good things—via either social / cultural learning, or via explicit means-end reasoning (e.g. avoiding prison, for people who have never been in prison)

I think Jacob & I both agree that there are things in all three categories, but we have disagreements where I want to put something into (A) and Jacob wants to put it into (B) or (C). Examples that came up in this post were “status-seeking / status-respecting behavior”, “fun”, and “enjoying river views”.

How do we figure it out? In general, 5 types of evidence that we can bring to bear are:

  • (1) Evidence from cases where we can rule out (C), e.g. sufficiently simple and/or young humans/animals. Then we can just see whether the animal is doing X more often than chance from the start, or whether it has to stumble upon X before it starts doing X more often than chance.
    • Example: If you’re a baby mouse who has never seen a bird (or bird-like projectile etc.) in your life, you have no rational basis for thinking that birds are dangerous. Nevertheless, lab experiments show that baby mice will run away from incoming birds, reliably, the first time. (Ref) So that has to be (A).
  • (2) Evidence from sufficiently distant consequences that we can rule out (B).
    • Example: Many animals will play-fight as children. This has a benefit (presumably) of eventually making the animals better at actual fighting as adults. But the animal can’t learn about that benefit via trial-and-error—the benefit won’t happen until perhaps years in the future. 
  • (3) Evidence from heritability—If doing X is heritable, I think an (A)-type explanation would make that fact very easy to explain—in fact, an (A)-type explanation for X would pretty much demand that doing X has nonzero heritability. Meanwhile, if doing X is heritable (in a way that’s not explained by heritability of “general intelligence”-type stuff), I don’t think (B) or (C) is immediately ruled out, but we do need to think about it and try to come up with a story of how that could work.
  • (4) Evidence from edge-cases where X is not actually empowering—Suppose doing X is usually empowering, but not always. If people do a lot of X even in edge-cases where it’s not empowering, I consider that strong evidence for (A) over (B) & (C). It’s not indisputable evidence though, because maybe you could argue that people are able to learn the simple pattern “X tends to be empowering”, but unable to learn the more complicated pattern “X tends to be empowering with the following exceptions…”. But still, I think it’s strong evidence.
    • Example: Humans can feel envy or anger or vengeance towards fictional characters, inanimate objects, etc. 
  • (5) Evidence from specific involuntary reactions, hypothalamus / brainstem involvement, etc.—For example, things that have specific universal facial expressions or sympathetic nervous system correlates, or behavior that can be reliably elicited by a particular neuropeptide injection (AgRP makes you hungry), etc., are probably (A).

A couple specific cases:

Status—I’m not sure whether Jacob is suggesting that human social status related behaviors are explained by (B) or (C) or both. But anyway I think 1,2,3,4 all push towards an (A)-type explanation for human social status behaviors. I think I would especially start with 3 (heritability)—if having high social status is generally useful for achieving a wide variety of goals, and that were the entire explanation for why people care about it, then it wouldn’t really make sense that some people care much more about status than others do, particularly in a way that (I’m pretty sure) statistically depends on their genes (including their sex) but which doesn’t much depend on their family environment (at least within a country), and which (I’m pretty sure) doesn’t particularly correlate with intelligence etc.

(As for 5, I’m not aware of e.g. some part of the hypothalamus or brainstem where stimulating it makes people feel high-status, but pretty please tell me if anyone has seen anything like that! I would be eternally grateful!)

Fun—Jacob writes “Fun is also probably an emergent consequence of value-of-information and optionality” which I take to be a claim that “fun” is (B) or (C), not (A). But I think it’s (A). I think 5 is strong evidence that fun involves (A). For one thing, decorticate rats will still do the activities we associate with “fun”, e.g. playing with each other (ref). For another thing, there’s a specific innate involuntary behavior / facial expression associated with “fun” (i.e. laughing in humans, and analogs-of-laughing in other animals), which again seems to imply (A). I also claim that 1,2,3,4 above also offer additional evidence for an (A)-type explanation of fun / play behavior, without getting into details.

One-size-fits-all introductions are hard; different people are going to have different backgrounds and preconceptions which call for different resources.

But to answer your question, if I had to pick one, in the absence of any specific information about who it’s for, I think I’d go with Ben Hilton’s 80,000 hours problem profile (August 2022).

You can do that using LeechBlock.
