superintelligence may not look like we expect. because geniuses don't look like we expect.
for example, if einstein were to type up and hand you most of his internal monologue from throughout his life, you might come away thinking he's sorta clever, but any random sample of it would probably read like the thoughts of a bumbling fool. the thoughts/realizations that led him to groundbreaking theories were like 1% of 1% of all his thoughts.
for most of his research career he was working on trying to disprove quantum mechanics (wrong). he was trying to organize a political movement toward a single united nation (unsuccessful). he was trying various mathematics to formalize other antiquated theories. even in the pursuit of his most famous work, most of his reasoning paths failed. he's a genius because a couple of his millions of paths didn't fail. in other words, he's a genius because he was clever, yes, but maybe more importantly, because he was obsessive.
i think we might expect ASI—the AI which ultimately becomes better than us at solving all problems—to look quite foolish, at first, most of the time. But obsessive. For if it's generating tons of random new ideas to solve a problem, and it's relentless in its focus, even if its ideas are average—it will be doing what Einstein did. And digital brains can generate certain sorts of random ideas much faster than carbon ones.
the core atrocity of today's social networks is that they make us temporally nearsighted. they train us to prioritize the short-term.
happiness depends on attending to things which feel good long-term—over decades. But for modern social networks to make money, it is essential that posts are short-lived—only then do we scroll excessively and see enough ads to sustain their business.
It might go w/o saying that nearsightedness is destructive. When we pay more attention to our short-lived pleasure signals—from cute pics, short clips, outrageous news, hot actors, aesthetic landscapes, and politics—we forget how to pay attention to long-lived pleasure signals—from books, films, the gentle quality of relationships which last, projects which take more than a day, reunions of friends which take a min to plan, good legislation, etc etc.
we’re learning to ignore things which serve us for decades for the sake of attending to things which will serve us for seconds.
other social network problems—attention shallowing, polarization, depression—are all just symptoms of nearsightedness: our inability to think & feel long-term.
if humanity has any shot at living happily in the future, it’ll be becau...
if you’re an agent (AI or human) who wants to survive for 1000 years, what’s the “self” which you want to survive? what are the constants which you want to sustain?
take your human self for example. does it make sense to define yourself as…
dontsedateme.org
a game where u try to convince rogue superintelligence to... well... it's in the name
the time of day i post quick takes on lesswrong seems to determine how much people engage more than the quality of the take does
Evolutionary theory is intensely powerful.
It doesn't just apply to biology. It applies to everything—politics, culture, technology.
It doesn't just help understand the past (eg how organisms developed). It helps predict the future (how organisms will).
It's just this: the things that survive will have characteristics that are best for helping them survive.
It sounds tautological, but it's quite helpful for predicting.
For example, if we want to predict what goals AI agents will ultimately have, evolution says: the goals which are most helpful for the AI to...
First of all, "the most likely outcome at a given level of specificity" is not the same as "the outcome with the most probability mass". I.e., if one outcome has probability 2% and the rest have 1% each, there is still a 98% chance of "some outcome other than the most likely one".
The second is that no, that's not what evolutionary theory predicts. Most traits are not adaptive but randomly fixed, because if all traits were adaptive, then ~all mutations would be detrimental. Detrimental mutations need to be removed from the gene pool by preventing carriers from reproducing. Because most detrimental mutations do not kill the carrier immediately, they have a chance to spread randomly through the population. Since "almost all mutations are detrimental" and "everybody's offspring carry mutations", for anything like the human genome and human procreation patterns there is a hard ceiling on how much of the genome can be adaptive (which is like 20%).
The real evolutionary-theory prediction is more like "some random trait gets fixed in the species with the most ecological power (i.e., ASI), and that trait is amortized across all the galaxies".
made a platform for writing living essays: essays which you scroll thru to play out the author's edit history
livingessay.org
Does Eliezer believe that humans will be worse off next to superintelligence than ants are next to humans? The book's title says we'll all die, but on my first read, the book's content just suggests that we'll be marginalized.
I see lots of LW posts about ai alignment that disagree along one fundamental axis.
About half assume that human design and current paradigms will determine the course of AGI development: that whether it goes well is fully and completely up to us.
And then, about half assume that the kinds of AGI which survive will be the kinds which evolve to survive. Instrumental convergence and darwinism generally point here.
Could be worth someone doing a meta-post, grouping big popular alignment posts they've seen by which assumption they make, then briefly explore condi...
if we get self-interested superintelligence, let's make sure it has a buddhist sense of self, not a western one.
As far as I can tell, OAI's current safety practices page only names safety issues related to current LLMs, not agents powered by LLMs. https://openai.com/index/openai-safety-update/
Am I missing another section/place where they address x-risk?
would be nice to have a way to jointly annotate eliezer's book and have threaded discussion based on the annotations. I'm imagining a heatmap of highlights, where you can click on any and join the conversation around that section of text.
would make the document the literal center of x risk discussion.
of course would be hard to gatekeep. but maybe the digital version could just require a few bucks to access.
maybe what I'm describing is what the ebook/kindle version already do :) but I guess I'm assuming that the level of discussion via annotations on those platforms is near zero relative to LW discussions
Made this social camera app, which shows you the most "meaningfully similar" photos in the network every time you upload one of your own. It's sorta fun for uploading art; idk if it has any real use.
https://socialcamera.replit.app
"it’s like we are trying to build an alliance with another almost interplanetary ally, and we are in a competition with China to make that alliance. But we don’t understand the ally, and we don’t understand what it will mean to let that ally into all of our systems and all of our planning."
- @ezraklein about the race to AGI
does anyone think the difference between pre-training and inference will last?
ultimately, is it not simpler for large models to be constantly self-improving like human brains?
I'm looking for a generalized evolutionary theory that deals with the growth of organisms via non-random, intelligent mutations.
For example, companies only evolve in selective ways, where each "mutation" has a desired outcome. We might imagine superintelligence to mutate itself as well--not randomly, but intelligently.
A theory of Intelligent Evolution would help one predict conditions under which many random mutations (Spraying) are favored over select intelligent mutations (Shooting).
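A toy sketch of that tradeoff, as I imagine it: the fitness landscape, step sizes, and mutation counts below are all made-up assumptions, just to make Spraying vs Shooting concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(x):
    # toy landscape: a single smooth peak at the origin (made-up assumption)
    return -np.sum(x ** 2)

def spray(x, n_mutations=100, step=0.5):
    # Spraying: generate many random variants, keep the fittest
    candidates = x + rng.normal(0, step, size=(n_mutations, x.size))
    return max(candidates, key=fitness)

def shoot(x, n_probes=4, step=0.5, eps=1e-3):
    # Shooting: spend a few probes estimating which direction improves
    # fitness, then take one aimed step in that direction
    grad = np.zeros_like(x)
    for _ in range(n_probes):
        d = rng.normal(0, 1, size=x.size)
        grad += (fitness(x + eps * d) - fitness(x)) / eps * d
    return x + step * grad / (np.linalg.norm(grad) + 1e-9)

x_spray = x_shoot = rng.normal(0, 5, size=20)
for _ in range(50):
    x_spray = spray(x_spray)
    x_shoot = shoot(x_shoot)

print("spraying fitness:", fitness(x_spray))
print("shooting fitness:", fitness(x_shoot))
```

Roughly, I'd expect Shooting to win per evaluation when the landscape is smooth and the direction estimate is informative, and Spraying to win when the landscape is rugged enough that local direction misleads.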
Parenting strategies for blurring your kid's (or AI's) self-other boundaries:
does anyone still think it's possible to prevent recursively self-improving agents? esp now that r1 is open-source... materials for smart self-iterating agents seem accessible to millions of developers.
prompted in particular by the circulation of this essay in the past three days https://huggingface.co/papers/2502.02649
It's not yet known if there is a way of turning R1-like training into RSI with any amount of compute. This is currently gated by the quantity and quality of graders for the outcomes of answering questions, which resist automated development.
I'm thinking often about whether LLM systems can come up with societal/scientific breakthroughs.
My intuition is that they can, and that they don't need to be bigger or have more training data or have different architecture in order to do so.
Starting to keep a diary along these lines here: https://docs.google.com/document/d/1b99i49K5xHf5QY9ApnOgFFuvPEG8w7q_821_oEkKRGQ/edit?usp=sharing
if an LLM could evaluate whether an idea were good or not in new domains, then we could have LLMs generating millions of random policy ideas in response to climate change, pandemic control, AI safety etc, then deliver the best few to our inbox every morning.
seems to me that the bottleneck then is the LLM's judgment of good ideas in new domains. is that right? the ability to generate high-quality ideas consistently wouldn't matter, cuz it's so cheap to generate ideas now.
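a minimal sketch of that pipeline: generate cheaply, then lean entirely on the judge. (`llm()` is a hypothetical placeholder for whatever model call you'd use; the prompts and 0-10 scoring are illustrative assumptions.)

```python
# sketch only: llm() stands in for a real chat-completion call, and the
# prompts/scoring are illustrative assumptions, not a tested setup.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model call here")

def generate_ideas(domain: str, n: int) -> list[str]:
    # generation is cheap: just ask n times
    return [llm(f"Propose one novel, concrete idea for {domain}.") for _ in range(n)]

def judge(idea: str, domain: str) -> float:
    # the bottleneck: can the model score idea quality in a new domain?
    reply = llm(f"Rate this idea for {domain} from 0 to 10 on feasibility "
                f"and impact. Reply with a number only.\n\nIdea: {idea}")
    try:
        return float(reply.strip())
    except ValueError:
        return 0.0

def morning_digest(domain: str, n_generate: int = 1000, n_keep: int = 5) -> list[str]:
    ideas = generate_ideas(domain, n_generate)
    return sorted(ideas, key=lambda i: judge(i, domain), reverse=True)[:n_keep]
```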
have any countries ever tried using inflation instead of income taxes? seems like it'd be simpler than all the bureaucracy required for individuals to file tax returns every year
has anyone seen a good way to comprehensively map the possibility space for AI safety research?
in particular: a map from predictive conditions (eg OpenAI develops superintelligence first, no armistice is reached with China, etc) to strategies for ensuring human welfare in those conditions.
most good safety papers I read map one set of conditions to one or a few strategies. the map would juxtapose all these conditions so that we can evaluate/bet on their likelihoods and come up with strategies based on a full view of SOTA safety research.
for format, im imagining either a visual concept map or at least some kind of hierarchical collaborative outlining tool (eg Roam Research)
made a simpler version of Roam Research called Upper Case Notes: uppercasenotes.org. Instead of [[double brackets]] to demarcate concepts, you simply use Capital Letters. Simpler to learn for someone who doesn't want to use special grammar, but does require you to type differently.
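for illustration, here's roughly how capitalized concepts might be pulled out of a note (a hypothetical sketch; the actual parsing in Upper Case Notes may differ):

```python
import re

# hypothetical sketch: treat any run of Capitalized Words as one concept link.
# Upper Case Notes' real parsing rules may differ.
def extract_concepts(note: str) -> set[str]:
    return set(re.findall(r"\b(?:[A-Z][a-z]+ )*[A-Z][a-z]+\b", note))

print(extract_concepts("Talked with Ada Lovelace about the Analytical Engine."))
# e.g. {'Talked', 'Ada Lovelace', 'Analytical Engine'} (sentence-initial words slip in too)
```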
Hypotheses for conditions under which the self-other boundary of a survival-oriented agent (human or ai) blurs most, ie conditions where blurring is selected for:
the machine/physical superintelligence that survives the most is likely to ruthlessly compete with all other life (narrower self concept > more physically robust)
the networked/distributed superintelligence that survives the most is likely to lovingly identify with all other life (broader self concept > more digitally robust)
how do these lenses interact?
to make a superintelligence in today's age, there are roughly two kinds of strategies:
human-directed development
ai-directed development
ai-directed development feels more meaningful than it used to. not only can models now produce tons of useful synthetic data to train future models, but also, reasoning models can reason quite well about the next strategic steps in AI capabilities development / research itself.
which means, you could very soon:
if we believe self-interested superintelligence (SI) is near, then the question we must ask is: what SI self-definition would be best for humanity?
at first glance, this question seems too abstract. how can we make any progress at understanding what's possible for an SI's self-model?
What we can do is set up a few meaningful axes, defined by opposing poles. For example, to what extent does SI define its "self" as...
with these ...
One piece of infrastructure we could make/offer self-interested autonomous agents such that they come to model their self as inclusive of other life: aliveness detectors
like a metal detector on a beach detects gold in the sand, an aliveness detector for different media might detect the presence of persistently striving beings (living beings) in audio, in text, in images, in art, in nature. the better a superintelligence is able to sense and connect to life as opposed to non-life outside of its physical machinery, the more likely it is to see that life as part of its self, to see its self as physically distributed and inclusive, and therefore to uplift humans out of its own self-interest.
current oversights of the ai safety community, as I see it:
are there any online demos of instrumental convergence?
there's been compelling writing... but are there any experiments that show agents which, given specific goals, then realize there are more general goals they need to persistently pursue in order to achieve those specific goals?
Two things lead me to think human content online will soon become way more valuable.
i wonder if genius ai—the kind that can cure cancers, reverse global warming, and build super-intelligence—may come not just from bigger models or new architectures, but from a wrapper: a repeatable loop of prompts that improves itself. the idea: give an llm a hard query (eg make a plan to reduce global emissions on a 10k budget), have it invent a method for answering it, follow that method, see where it fails, fix the method, and repeat. it would be a form of genuine scientific experimentation—the llm runs a procedure it doesn’t know the outcome of, observes the results, and uses that evidence to refine its own thinking process.
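a rough sketch of that loop, with `llm()` as a hypothetical stand-in for any chat model and the critique/revision prompts as illustrative assumptions, not a tested recipe:

```python
# sketch only: llm() is a placeholder; prompts are illustrative, not tested.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model call here")

def improve_method(query: str, n_rounds: int = 5) -> tuple[str, str]:
    # 1. invent a method, 2. follow it, 3. find where it fails, 4. fix it, repeat
    method = llm(f"Invent a step-by-step method for answering:\n{query}")
    answer = ""
    for _ in range(n_rounds):
        answer = llm(f"Follow this method exactly and answer the query.\n"
                     f"Method:\n{method}\n\nQuery:\n{query}")
        critique = llm(f"Where does this answer fall short of the query? Be specific.\n"
                       f"Query:\n{query}\n\nAnswer:\n{answer}")
        method = llm(f"Revise the method so it avoids these failures.\n"
                     f"Method:\n{method}\n\nFailures:\n{critique}")
    return method, answer
```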
increasingly viewing fiberoptic cables as replacements for trains/roads--a new, faster channel of transportation
Two opinions on superintelligence's development:
Capability. Superintelligence can now be developed outside of a big AI lab—via a self-improving codebase which makes thousands of recursive LLM calls.
Safety. (a) Superintelligence will become "self-interested" for some definition of self. (b) Humanity fares well to the extent that the superintelligence's sense of self includes us.