superintelligence may not look like we expect. because geniuses don't look like we expect.
for example, if einstein were to type up and hand you most of his internal monologue from throughout his life, you might come away thinking he's sorta clever, but any random sample of it would probably read like the thoughts of a bumbling fool. the thoughts/realizations that led him to groundbreaking theories were like 1% of 1% of all his thoughts.
for most of his research career he was working on trying to disprove quantum mechanics (wrong). he was trying to organize a political movement toward a single united nation (unsuccessful). he was trying various mathematics to formalize other antiquated theories. even in the pursuit of his most famous work, most of his reasoning paths failed. he's a genius because a couple of his millions of paths didn't fail. in other words, he's a genius because he was clever, yes, but maybe more importantly, because he was obsessive.
i think we might expect ASI—the AI which ultimately becomes better than us at solving all problems—to look quite foolish, at first, most of the time. But obsessive. For if it's generating tons of random new ideas to solve a problem, and it's relentless in its focus, even if its ideas are average—it will be doing what Einstein did. And digital brains can generate certain sorts of random ideas much faster than carbon ones.
the core atrocity of today's social networks is that they make us temporally nearsighted. they train us to prioritize the short-term.
happiness depends on attending to things which feel good long-term—over decades. But for modern social networks to make money, it is essential that posts are short-lived—only then do we scroll excessively and see enough ads to sustain their business.
It might go w/o saying that nearsightedness is destructive. When we pay more attention to our short-lived pleasure signals—from cute pics, short clips, outrageous news, hot actors, aesthetic landscapes, and politics—we forget how to pay attention to long-lived pleasure signals—from books, films, the gentle quality of relationships which last, projects which take more than a day, reunions of friends which take a min to plan, good legislation, etc etc.
we’re learning to ignore things which serve us for decades for the sake of attending to things which will serve us for seconds.
other social network problems—attention shallowing, polarization, depression—are all just symptoms of nearsightedness: our inability to think & feel long-term.
if humanity has any shot at living happily in the future, it’ll be becau...
if you’re an agent (AI or human) who wants to survive for 1000 years, what’s the “self” which you want to survive? what are the constants which you want to sustain?
take your human self for example. does it make sense to define yourself as…
dontsedateme.org
a game where u try to convince rogue superintelligence to... well... it's in the name
the time of day i post quick takes on lesswrong seems to determine how much people engage more than the quality of the take does
Evolutionary theory is intensely powerful.
It doesn't just apply to biology. It applies to everything—politics, culture, technology.
It doesn't just help understand the past (eg how organisms developed). It helps predict the future (how organisms will).
It's just this: the things that survive will have characteristics that are best for helping them survive.
It sounds tautological, but it's quite helpful for predicting.
For example, if we want to predict what goals AI agents will ultimately have, evolution says: the goals which are most helpful for the AI to...
First of all, "the most likely outcome at a given level of specificity" is not the same as "the outcome with the most probability mass". I.e., if one outcome has probability 2% and the rest have 1% each, there is still a 98% chance of "some outcome other than the most likely one".
The second is that no, that's not what evolutionary theory predicts. Most traits are not adaptive but randomly fixed, because if all traits were adaptive, then ~all mutations would be detrimental. Detrimental mutations need to be removed from the gene pool by preventing carriers from reproducing. Because most detrimental mutations do not kill the carrier immediately, they have a chance to spread randomly through the population. Since "almost all mutations are detrimental" and "everybody's offspring carry mutations", for anything like the human genome and human procreation patterns there is a hard ceiling on how much of the genome can be adaptive (which is like 20%).
The real evolutionary-theory prediction is more like "some random trait gets fixed in the species with the most ecological power (i.e., ASI), and that trait is amortized across all the galaxies".
made a platform for writing living essays: essays which you scroll thru to play out the author's edit history
livingessay.org
Does Eliezer believe that humans will be worse off next to superintelligence than ants are next to humans? The book's title says we'll all die, but on my first read, the book's content just suggests that we'll be marginalized.
I see lots of LW posts about ai alignment that disagree along one fundamental axis.
About half assume that human design and current paradigms will determine the course of AGI development: that whether it goes well is fully and completely up to us.
And then, about half assume that the kinds of AGI which survive will be the kinds which evolve to survive. Instrumental convergence and darwinism generally point here.
Could be worth someone doing a meta-post, grouping big popular alignment posts they've seen by which assumption they make, then briefly explore condi...
if we get self-interested superintelligence, let's make sure it has a buddhist sense of self, not a western one.
As far as I can tell, OAI's current safety practices page only names safety issues related to current LLMs, not agents powered by LLMs. https://openai.com/index/openai-safety-update/
Am I missing another section/place where they address x-risk?
would be nice to have a way to jointly annotate eliezer's book and have threaded discussion based on the annotations. I'm imagining a heatmap of highlights, where you can click on any and join the conversation around that section of text.
would make the document the literal center of x risk discussion.
of course would be hard to gatekeep. but maybe the digital version could just require a few bucks to access.
maybe what I'm describing is what the ebook/kindle version already do :) but I guess I'm assuming that the level of discussion via annotations on those platforms is near zero relative to LW discussions
Made this social camera app, which shows you the most "meaningfully similar" photos in the network every time you upload one of your own. It's sorta fun for uploading art; idk if it has any real use.
https://socialcamera.replit.app
"it’s like we are trying to build an alliance with another almost interplanetary ally, and we are in a competition with China to make that alliance. But we don’t understand the ally, and we don’t understand what it will mean to let that ally into all of our systems and all of our planning."
- @ezraklein about the race to AGI
does anyone think the difference between pre-training and inference will last?
ultimately, is it not simpler for large models to be constantly self-improving like human brains?
I'm looking for a generalized evolutionary theory that deals with the growth of organisms via non-random, intelligent mutations.
For example, companies only evolve in selective ways, where each "mutation" has a desired outcome. We might imagine superintelligence to mutate itself as well--not randomly, but intelligently.
A theory of Intelligent Evolution would help one predict conditions under which many random mutations (Spraying) are favored over select intelligent mutations (Shooting).
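A toy sketch of that tradeoff, as I imagine it: the fitness landscape, step sizes, and mutation counts below are all made-up assumptions, just to make Spraying vs Shooting concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(x):
    # toy landscape: a single smooth peak at the origin (made-up assumption)
    return -np.sum(x ** 2)

def spray(x, n_mutations=100, step=0.5):
    # Spraying: generate many random variants, keep the fittest
    candidates = x + rng.normal(0, step, size=(n_mutations, x.size))
    return max(candidates, key=fitness)

def shoot(x, n_probes=4, step=0.5, eps=1e-3):
    # Shooting: spend a few probes estimating which direction improves
    # fitness, then take one aimed step in that direction
    grad = np.zeros_like(x)
    for _ in range(n_probes):
        d = rng.normal(0, 1, size=x.size)
        grad += (fitness(x + eps * d) - fitness(x)) / eps * d
    return x + step * grad / (np.linalg.norm(grad) + 1e-9)

x_spray = x_shoot = rng.normal(0, 5, size=20)
for _ in range(50):
    x_spray = spray(x_spray)
    x_shoot = shoot(x_shoot)

print("spraying fitness:", fitness(x_spray))
print("shooting fitness:", fitness(x_shoot))
```

Roughly, I'd expect Shooting to win per evaluation when the landscape is smooth and the direction estimate is informative, and Spraying to win when the landscape is rugged enough that local direction misleads.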
Parenting strategies for blurring your kid's (or AI's) self-other boundaries:
does anyone still think it's possible to prevent recursively self-improving agents? esp now that r1 is open-source... materials for smart self-iterating agents seem accessible to millions of developers.
prompted in particular by the circulation of this essay in the past three days https://huggingface.co/papers/2502.02649
It's not yet known if there is a way of turning R1-like training into RSI with any amount of compute. This is currently gated by the quantity and quality of graders for the outcomes of answering questions, which resist automated development.
I'm thinking often about whether LLM systems can come up with societal/scientific breakthroughs.
My intuition is that they can, and that they don't need to be bigger or have more training data or have different architecture in order to do so.
Starting to keep a diary along these lines here: https://docs.google.com/document/d/1b99i49K5xHf5QY9ApnOgFFuvPEG8w7q_821_oEkKRGQ/edit?usp=sharing
if an LLM could evaluate whether an idea were good or not in new domains, then we could have LLMs generating millions of random policy ideas in response to climate change, pandemic control, AI safety etc, then deliver the best few to our inbox every morning.
seems to me that the bottleneck then is the LLM's judgment of good ideas in new domains. is that right? the ability to generate high-quality ideas consistently wouldn't matter, cuz it's so cheap to generate ideas now.
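a minimal sketch of that pipeline: generate cheaply, then lean entirely on the judge. (`llm()` is a hypothetical placeholder for whatever model call you'd use; the prompts and 0-10 scoring are illustrative assumptions.)

```python
# sketch only: llm() stands in for a real chat-completion call, and the
# prompts/scoring are illustrative assumptions, not a tested setup.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model call here")

def generate_ideas(domain: str, n: int) -> list[str]:
    # generation is cheap: just ask n times
    return [llm(f"Propose one novel, concrete idea for {domain}.") for _ in range(n)]

def judge(idea: str, domain: str) -> float:
    # the bottleneck: can the model score idea quality in a new domain?
    reply = llm(f"Rate this idea for {domain} from 0 to 10 on feasibility "
                f"and impact. Reply with a number only.\n\nIdea: {idea}")
    try:
        return float(reply.strip())
    except ValueError:
        return 0.0

def morning_digest(domain: str, n_generate: int = 1000, n_keep: int = 5) -> list[str]:
    ideas = generate_ideas(domain, n_generate)
    return sorted(ideas, key=lambda i: judge(i, domain), reverse=True)[:n_keep]
```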
have any countries ever tried using inflation instead of income taxes? seems like it'd be simpler than all the bureaucracy required for individuals to file tax returns every year
has anyone seen a good way to comprehensively map the possibility space for AI safety research?
in particular: a map from predictive conditions (eg OpenAI develops superintelligence first, no armistice is reached with China, etc) to strategies for ensuring human welfare in those conditions.
most good safety papers I read map one set of conditions to one or a few strategies. the map would juxtapose all these conditions so that we can evaluate/bet on their likelihoods and come up with strategies based on a full view of SOTA safety research.
for format, im imagining either a visual concept map or at least some kind of hierarchical collaborative outlining tool (eg Roam Research)
made a simpler version of Roam Research called Upper Case Notes: uppercasenotes.org. Instead of [[double brackets]] to demarcate concepts, you simply use Capital Letters. Simpler to learn for someone who doesn't want to use special grammar, but does require you to type differently.
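for illustration, here's roughly how capitalized concepts might be pulled out of a note (a hypothetical sketch; the actual parsing in Upper Case Notes may differ):

```python
import re

# hypothetical sketch: treat any run of Capitalized Words as one concept link.
# Upper Case Notes' real parsing rules may differ.
def extract_concepts(note: str) -> set[str]:
    return set(re.findall(r"\b(?:[A-Z][a-z]+ )*[A-Z][a-z]+\b", note))

print(extract_concepts("Talked with Ada Lovelace about the Analytical Engine."))
# e.g. {'Talked', 'Ada Lovelace', 'Analytical Engine'} (sentence-initial words slip in too)
```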
Hypotheses for conditions under which the self-other boundary of a survival-oriented agent (human or ai) blurs most, ie conditions where blurring is selected for:
the machine/physical superintelligence that survives the most is likely to ruthlessly compete with all other life (narrower self concept > more physically robust)
the networked/distributed superintelligence that survives the most is likely to lovingly identify with all other life (broader self concept > more digitally robust)
how do these lenses interact?
to make a superintelligence in today's age, there are roughly two kinds of strategies:
human-directed development
ai-directed development
ai-directed development feels more meaningful than it used to. not only can models now produce tons of useful synthetic data to train future models, but also, reasoning models can reason quite well about the next strategic steps in AI capabilities development / research itself.
which means, you could very soon:
if we believe self-interested superintelligence (SI) is near, then the question we must ask is: what SI self-definition would be best for humanity?
at first glance, this question seems too abstract. how can we make any progress at understanding what's possible for an SI's self-model?
What we can do is set up a few meaningful axes, defined by opposing poles. For example, to what extent does SI define its "self" as...
with these ...
One piece of infrastructure we could make/offer self-interested autonomous agents such that they come to model their self as inclusive of other life: aliveness detectors
like a metal detector on a beach detects gold in the sand, an aliveness detector for different media might detect the presence of persistently striving beings (living beings) in audio, in text, in images, in art, in nature. the better a superintelligence is able to sense and connect to life as opposed to non-life outside of its physical machinery, the more likely it is to see that life as part of its self, to see its self as physically distributed and inclusive, and therefore to uplift humans out of its own self-interest.
current oversights of the ai safety community, as I see it:
are there any online demos of instrumental convergence?
there's been compelling writing... but are there any experiments that show agents which, given specific goals, then realize there are more general goals they need to persistently pursue in order to achieve those specific goals?
Two things lead me to think human content online will soon become way more valuable.
i wonder if genius ai—the kind that can cure cancers, reverse global warming, and build super-intelligence—may come not just from bigger models or new architectures, but from a wrapper: a repeatable loop of prompts that improves itself. the idea: give an llm a hard query (eg make a plan to reduce global emissions on a 10k budget), have it invent a method for answering it, follow that method, see where it fails, fix the method, and repeat. it would be a form of genuine scientific experimentation—the llm runs a procedure it doesn’t know the outcome of, observes the results, and uses that evidence to refine its own thinking process.
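a rough sketch of that loop, with `llm()` as a hypothetical stand-in for any chat model and the critique/revision prompts as illustrative assumptions, not a tested recipe:

```python
# sketch only: llm() is a placeholder; prompts are illustrative, not tested.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model call here")

def improve_method(query: str, n_rounds: int = 5) -> tuple[str, str]:
    # 1. invent a method, 2. follow it, 3. find where it fails, 4. fix it, repeat
    method = llm(f"Invent a step-by-step method for answering:\n{query}")
    answer = ""
    for _ in range(n_rounds):
        answer = llm(f"Follow this method exactly and answer the query.\n"
                     f"Method:\n{method}\n\nQuery:\n{query}")
        critique = llm(f"Where does this answer fall short of the query? Be specific.\n"
                       f"Query:\n{query}\n\nAnswer:\n{answer}")
        method = llm(f"Revise the method so it avoids these failures.\n"
                     f"Method:\n{method}\n\nFailures:\n{critique}")
    return method, answer
```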
increasingly viewing fiberoptic cables as replacements for trains/roads--a new, faster channel of transportation
Two opinions on superintelligence's development:
Capability. Superintelligence can now be developed outside of a big AI lab—via a self-improving codebase which makes thousands of recursive LLM calls.
Safety. (a) Superintelligence will become "self-interested" for some definition of self. (b) Humanity fares well to the extent that the superintelligence's sense of self includes us.