Steven Byrnes

I'm an AGI safety / AI alignment researcher in Boston with a particular focus on brain algorithms. Research Fellow at Astera. See https://sjbyrnes.com/agi.html for a summary of my research and a sorted list of writing. Physicist by training. Email: steven.byrnes@gmail.com. Leave me anonymous feedback here. I’m also at: RSS feed, Twitter, Mastodon, Threads, Bluesky, GitHub, Wikipedia, Physics-StackExchange, LinkedIn.

Sequences

Valence
Intro to Brain-Like-AGI Safety

Comments

The very next paragraph after the dinosaur-train thing says:

Of course, in polite society, we would typically both be “50%-leading”, or at least “close-to-50% leading”, and thus we would deftly and implicitly negotiate a compromise. Maybe the conversation topic would bounce back and forth between trains and dinosaurs, or we would find something else entirely to talk about, or we would stop talking and go watch TV, or we would find an excuse to cordially end our interaction, etc.

I think it’s really obvious that friends seek compromises and win-win solutions where possible, and I think it’s also really obvious that not all participants in an interaction are going to wind up in the best of all possible worlds by their own lights. I think you’re unhappy that I spend the whole post talking about the latter and basically don’t talk about finding win-wins apart from that one little paragraph, because you feel that this choice of emphasis conveys a vibe of “people are just out to get each other and fight for their own interest all the time” instead of a vibe of “kumbaya let’s all be friends”. If so, that’s not deliberate. I don’t feel that vibe and was not trying to convey it. I have friends and family, just like you do. Instead, I’m focusing on the latter because I think I have interesting things to say about it.

I think there’s also something else going on with your comment though…

As I mention in the third paragraph, there’s a kind of cultural taboo where we’re all supposed to keep up the pretense that mildly conflicting preferences between good friends simply don’t exist. Objectively speaking, I mean, what are the chances that my object-level preferences are exactly the same as yours down to the twelfth decimal place? Zero, right? But if we’re chatting, and you would very very slightly rather continue talking about sports, while I would very very slightly rather change the subject to the funny story that you heard at work, then we will mutually engage in a conspiracy of silence about this slight divergence, because mentioning the divergence out loud (and to some extent, even bringing it to conscious awareness in the privacy of your own head!) is not treated as pointing out an obvious, innocuous truth, but rather as a rude and pushy declaration that our perfectly normal slight divergence in immediate conversational interests is actually a big deal that poses a serious threat to our ability to get along and has to be somehow dealt with. It’s not! It’s perfectly fine and normal!

The post itself attempts to explain why this taboo / pretense / conspiracy-of-silence exists. And it would be kinda ironic if the post itself is getting misunderstood thanks to the very same conversational conventions that it is attempting to explain.  :)

Thanks for the suggestions; I rewrote the intro, and what you call "Section 1.3" is the new "Section 1.2".

Here are clarifications for a couple minor things I was confused about while reading: 

a GPU is way below the power of the human brain. You need something like 100,000 or a million to match it, so we are off by a huge factor here.

I was trying to figure out where LeCun’s 100,000+ claim is coming from, and I found this 2017 article; it’s paywalled, but the subheading implies that he’s focusing on the 10^14 synapses in a human brain, and comparing that to the number of neuron-to-neuron connections that a GPU can handle.

(If so, I strongly disagree with that comparison, but I don’t want to get into that in this little comment.)

Francois Chollet says “The actual information input of the visual system is under 1MB/s”.

I don’t think Chollet is directly responding to LeCun’s point, because Chollet is saying that optical information is compressible to <1MB/s, but LeCun is comparing uncompressed human optical bits to uncompressed LLM text bits. And the text bits are presumably compressible too!

It’s possible that the compressibility (useful information content per bit) of optical information is very different from the compressibility of text information, in which case LeCun’s comparison is misleading, but neither LeCun nor Chollet is making claims one way or the other about that, AFAICT.
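To make the “text bits are presumably compressible too” point concrete, here’s a minimal sketch in Python using the standard-library zlib module. The sample paragraph and the resulting ratio are purely illustrative (not a claim about any particular corpus or about vision); the point is just that ordinary English prose shrinks noticeably under a general-purpose compressor, so raw byte counts overstate its information content.

```python
import zlib

# Illustrative only: ordinary English prose is itself compressible, so counting
# raw (uncompressed) text bytes overstates its information content.
sample = (
    "Optical input to the human visual system carries a huge number of raw bits, "
    "but those bits are highly redundant, and the same is true of the text that "
    "large language models are trained on: neighboring words are predictable from "
    "context, which is exactly what a compressor exploits. "
) * 10  # repeating the paragraph exaggerates the effect, but even a single
        # paragraph of prose compresses noticeably

raw = sample.encode("utf-8")
packed = zlib.compress(raw, level=9)

print(f"raw bytes:        {len(raw)}")
print(f"compressed bytes: {len(packed)}")
print(f"compressed / raw: {len(packed) / len(raw):.2f}")
```

The exact ratio depends on the text and the compressor; the qualitative conclusion (compressed size well below raw size) is the only thing the sketch is meant to show.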

Fwiw I don't think the main paper would have been much shorter if we'd aimed to write a blog post instead…

Oops. Thanks. I should have checked more carefully before writing that. I was wrong and have now put a correction into the post.

I think you’re not the target audience for this post.

Pick a random person on the street who has used chatGPT, and ask them to describe a world in which we have AI that is “like chatGPT but better”. I think they’ll describe a world very much like today’s, but where chatGPT hallucinates less and writes better essays. I really don’t think they’ll describe the right-column world. If you’re imagining the right-column world, then great! Again, you’re not the target audience.

I think you’re responding to something different than what I was saying.

Again, let’s say that at lunch Bob wants to sit at the cool kids’ table, and at dinner Bob dreams of being a movie star. Bob feels motivated to do one thing, and then later on he feels motivated to do the other. Both are still clearly goal-directed behaviors: At lunchtime, Bob’s “planning machinery” is pointed towards “sitting at the cool kids’ table”, and at dinnertime, Bob’s “planning machinery” is pointed towards “being a movie star”. Neither of these things can be accomplished by unthinking habits and reactions, obviously.

I think there’s a deep-seated system in the brainstem (or hypothalamus). When Bob’s world-model (cortex) is imagining a future where he is sitting at the cool kids’ table, this brainstem system flags that future as “desirable”. Then later on, when Bob’s world-model (cortex) is imagining a future where he is a movie star, this brainstem system flags that future as “desirable”. But from the perspective of Bob’s world-model / cortex / conscious awareness (both verbalized and not), there does not have to be any concept that makes a connection between “sit at the cool kids’ table” and “be a movie star”. Right?

By analogy, if Caveman Oog feels motivated to eat meat sometimes, and to eat vegetables other times, then it might or might not be the case that Oog has a single concept akin to the English word “eating” that encompasses both eating-meat and eating-vegetables. Maybe in his culture, those are thought of as two totally different activities—the way we think of eating versus dancing. It’s not like there’s no overlap between eating and dancing—your heart is beating in both cases, it’s usually-but-not-always a group activity in both cases, it alleviates boredom in both cases—but there isn’t any concept in English unifying them. Likewise, if you asked Oog about eating-meat versus eating-vegetables, he would say “huh, never thought about that, but yeah sure, I guess they do have some things in common, like both involve putting stuff into one’s mouth and moving the jaw”. I’m not saying that this Oog thought experiment is likely, but it’s possible, right? And that illustrates the fact that coherently-and-systematically-planning-to-eat does not rely on having a concept of “eating”, whether verbalized or not.

Point 1: I think “writing less concisely than would be ideal” is the natural default for writers, so we don’t need to look to incentives to explain it. Pick up any book of writing advice and it will say that, right? “You have to kill your darlings”, “If I had more time, I would have written a shorter letter”, etc.

Point 2: I don’t know if this applies to you-in-particular, but there’s a systematic dynamic where readers generally somewhat underestimate the ideal length of a piece of nonfiction writing. The problem is, the writer is writing for a heterogeneous audience of readers. Different readers are coming in with different confusions, different topics-of-interest, different depths-of-interest, etc. So you can imagine, for example, that every reader really only benefits from 70% of the prose … but it’s a different 70% for different readers. Then each individual reader will be complaining that it’s unnecessarily long, but actually it can’t be cut at all without totally losing a bunch of the audience.

(To be clear, I think both of these are true—Point 2 is not meant as a denial of Point 1; not all extra length is adding anything. I think the solution is to both try to write concisely and make it easy for the reader to recognize and skip over the parts that they don’t need to read, for example with good headings and a summary / table-of-contents at the top. Making it fun to read can also somewhat substitute for making it quick to read.)

Hmm, I notice a pretty strong negative correlation between how long it takes me to write a blog post and how much karma it gets. For example, very recently I spent like a month of full-time work to write two posts on social status (karma = 71 & 36), then I took a break to catch up on my to-do list, in the course of which I would sometimes spend a few hours dashing off a little post, and there have been three posts in that category, and their karma is 57, 60, 121 (this one). So, 20ish times less effort, somewhat more karma. This is totally in line with my normal expectations.

I think that’s because if I’m spending a month on a blog post then it’s probably going to be full of boring technical details such that it’s not fun to read, and if I spend a few hours on a blog post like this one, it’s gonna consist of stories and rants and wild speculation and so on, which is more fun to read.

In terms of word count, here you go, I did the experiment:

I could make a long list of “advice” for getting lots of LessWrong karma (advice that probably actually makes a post less valuable), but I don’t think “conspicuous signals of effort” would be on it. Instead it would be things like: give it a clickbaity title & intro; make it about an ongoing hot debate (e.g. the eternal battle between “yay Eliezer” vibes versus “boo Eliezer” vibes, or the debate over whether p(doom) is high versus low, AGI timelines, etc.); make it reinforce popular rationalist tribal beliefs (yay YIMBY, boo FDA, etc.—I guess this post is doing that a bit); make it an easy read; don’t mention AI, because the AI tag gets penalized by default in the frontpage ranking; etc. My impression is that length per se is not particularly rewarded in terms of LW karma, and that the kind of “rigor” that would be well-received in peer review (e.g. comprehensive lit reviews) is a negative in terms of LessWrong karma.

Of course this is stupid, because karma is meaningless internet points, and the obvious right answer is to basically not care about lesswrong karma in the first place. Instead I recommend metrics like “My former self would have learned a lot from reading this” or “This constitutes meaningful progress on such-and-such long-term project that I’m pursuing and care about”. For example, I have a number of super-low-karma posts that I feel great pride and affinity towards. I am not doing month-long research projects because it’s a good way to get karma, which it’s not, but rather because it’s a good way to make progress on my long-term research agenda. :)

Thanks. I made some edits to the questions. I’m open to more suggestions.

This is already version 3 of that image (see v1, v2), but I’m very open to suggestions on that too.

Agree—I was also arguing for “trivial” in this EA Forum thread a couple years ago.
