The notion of an AI-enabled “pivotal act” seems misguided. Aligned AI systems can reduce the period of risk of an unaligned AI by advancing alignment research, convincingly demonstrating the risk posed by unaligned AI, and consuming the “free energy” that an unaligned AI might have used to grow explosively. No particular act needs to be pivotal in order to greatly reduce the risk from unaligned AI, and the search for single pivotal acts leads to unrealistic stories of the future and unrealistic pictures of what AI labs should do.
We could maybe make the wor...
Eliezer appears to expect AI systems performing extremely fast recursive self-improvement before those systems are able to make superhuman contributions to other domains (including alignment research), but I think this is mostly unjustified. If Eliezer doesn’t believe this, then his arguments about the alignment problem that humans need to solve appear to be wrong.
My understanding of Eliezer's view is that some domains are much harder to do aligned cognition in than others, and alignment is among the hardest.
(I'm not sure I clearly understand wh...
One important factor seems to be that Eliezer often imagines scenarios in which AI systems avoid making major technical contributions, or revealing the extent of their capabilities, because they are lying in wait to cause trouble later. But if we are constantly training AI systems to do things that look impressive, then SGD will be aggressively selecting against any AI systems who don’t do impressive-looking stuff. So by the time we have AI systems who can develop molecular nanotech, we will definitely have had systems that did something slightly-less-impr
That's the hard part.
My guess is that training cutting-edge models and not releasing them is a pretty good play, or would have been, if there weren't huge AGI hype.
As it is, information about your models is going to leak, and in most cases the fact that something is possible is most of the secret to reverse engineering it (note: this might be true in the regime of transformer models, but it might not be true for other tasks or sub-problems).
But on the other hand, given the hype, people are going to try to do the things that you're doing anyway,...
In terms of speeding up AI development, not building anything > building something and keeping it completely secret > building something that your competitors learn about > building something and generating public hype about it via demos > building something with hype and publicly releasing it to users & customers.
I think it is very helpful, and healthy for the discourse, to make this distinction. I agree that many of these things might get lumped together.
But also, I want to flag the possibility that something can be very very bad to do, e...
I like the creative thinking here.
I suggest a standard here, where we can test our "emulation" against the researcher themselves, to see how much of a diff there is in their answers, and the researcher can rate how good a substitute the model is for themselves, on a number of different dimensions.
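A minimal sketch of what that standard might look like, assuming hypothetical ask_emulation, ask_researcher, and rate_substitute callables (none of these are from the original comment, and the dimension names are purely illustrative):

```python
# Sketch of the proposed evaluation standard: collect paired answers from the
# researcher and the model "emulation", then have the researcher score the
# emulation as a substitute for themselves along several dimensions.
from statistics import mean

def evaluate_emulation(questions, ask_emulation, ask_researcher, rate_substitute):
    """Return per-dimension average scores for how good a substitute the model is.

    `rate_substitute(question, model_answer, own_answer)` is assumed to return a
    dict mapping a dimension name (e.g. "accuracy", "reasoning style", "values")
    to a 1-5 score given by the researcher.
    """
    ratings = []
    for q in questions:
        model_answer = ask_emulation(q)
        own_answer = ask_researcher(q)
        ratings.append(rate_substitute(q, model_answer, own_answer))

    # Average each dimension across all questions.
    dimensions = ratings[0].keys()
    return {d: mean(r[d] for r in ratings) for d in dimensions}
```

This is only one way to operationalize the suggestion; the key feature is that the researcher, not a third party, rates the size of the diff on each dimension.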
This continues to be one of the best and most important posts I have ever read.
I have multiple references that corroborate that.
Can you share? I would like to have a clearer sense of what happened to them. If there's info that I don't know, I'd like to see it.
Things I'm going off:
the pdf archive of Maia's blog posted by Ziz to sinseriously (I have it downloaded as a backup as well)
the archive.org backup of Fluttershy's blog
Ziz's account of the event (and how sparse and weirdly guilt ridden it is for her)
several oblique references to the situation that Ziz makes
various reports about the situation posted to LW which can be found by searching Pasek
From this I've developed my own model of what Ziz et al have been calling "single-good interhemispheric game theory" which is just extremely advanced and high level beatin...
I do appreciate the conciseness a lot.
It seems like I maybe would have gotten the same value from the essay (which would have taken 5 minutes to read?) as from this image (which maybe took 5 seconds).
But I don't want to create a culture that rewards snark, even more than it already does. It seems like that is the death of discourse, in a bunch of communities.
So I'm interested in if there are ways to get the benefits here, without the costs.
Downvoted, because even though I think this is a reasonable point worth considering, I'm not excited about a LessWrong dominated by snarky memes that make points, instead of essays.
I think this was excellently worded, and I'm glad you said it. I'm also glad to have read all the responses, many of which seem important and on point to me. I strong upvoted this comment as well as several of the responses.
I'm leaving this comment, because I want to give you some social reinforcement for saying what you said, and saying it as clearly and tactfully as you did.
There wasn't actually any such thing as Security, and if there ever was it would mean that it was time to overthrow the government immediately.
I held back tears at this part.
FYI, he wrote a book in which he describes his process. You want chapter 16: "How to Become an Idea Machine."
I think it would be good if someone could verify if this story is true. Is there someone with a known identity that can verify the author and confirm that this isn't a troll post?
I can verify that the owner of the blaked account is someone I have known for a significant amount of time, that he is a person with a serious, long-standing concern with AI safety (and all other details verifiable by me fit), and that based on the surrounding context I strongly expect him to have presented the story as he experienced it.
This isn't a troll.
(also I get to claim memetic credit for coining the term "blaked" for being affected by this class of AI persuasion)
I retracted this comment, because reading all of my comments here, a few years later, I feel much more compelled by my original take than by this addition.
I think the addition points out real dynamics, but that those dynamics don't take precedence over the dynamics that I expressed in the first place. Those seem higher priority to me.
If someone makes correlated errors, they are better explained as part of a strategy.
That does seem right to me.
It seems like very often correlated errors are the result of a mistaken, upstream crux. They're making one mistake, which is flowing into a bunch of specific instances.
This at least has to be another hypothesis, along with "this is a conscious or unconscious strategy to get what they want."
Increasing your output bandwidth in a case like this one would just give the AI more ability to model you and cater to you specifically.
This story increases my probability that AI will lead to dead rock instead of a superintelligent sphere of computronium, expanding outwards at near the speed of light.
Manipulating humans into taking wild actions will be a much, much easier task than inventing nanotech or building von Neumann probes. I can easily imagine the world ending as too many people go crazy in unprecedented ways, as a result of the actions of superhumanly emotionally intelligent AI systems, but not as part of any coordinated plan.
Strong upvote + agree. I've been thinking this myself recently. While something like the classic paperclip story seems likely enough to me, I think there's even more justification for the (less dramatic) idea that AI will drive the world crazy by flailing around in ways that humans find highly appealing.
LLMs aren't good enough to do any major damage right now, but I don't think it would take that much more intelligence to get a lot of people addicted or convinced of weird things, even for AI that doesn't have a "goal" as such. This might not directly cause...
I'm struck by how much this story drives home the hopelessness of Brain-computer interface "solutions" to alignment. The AI learned to manipulate you through a text channel. In what way would giving the AI direct access to your brain help?
While I'm not particularly optimistic about BCI solutions either, I don't think this story is strong evidence against them. Suppose that the BCI took the form of an exocortex that expanded the person's brain functions and also significantly increased their introspective awareness to the level of an inhumanly good meditator. This would effectively allow for constant monitoring of what subagents within the person's mind were getting activated in conversation, flagging those to the person's awareness in real time and letting the person notice when they were g...
It's like the whole world is about to be on new, personally-tailored, drugs.
And not being on drugs won't be an option. Because the drugs come with superpowers, and if you don't join in, you'll be left behind, irrelevant, in the dust.
This was and is already true to a lesser degree with manipulative digital socialization. The less of your agency you surrender to network X, the more your friends who have given their habits to network X will be able to work at higher speed and capacity with each other and won't bother with you. But X is often controlled by a powerful and misaligned entity.
And of course these two things may have quite a lot of synergy with each other.
The choice of music for this video is superb.
And in general, great work, guys!
Somebody who already knows the precise way in which the constellation Ursa Major outlines a bear might be like "of course!" But someone who's simply told "these points are supposed to form a bear" is unlikely to end up conceiving of this:
Um. Do bears have tails?
From googling, it looks like some of them do, but they don't have tails like that.
Have bears changed since ancient times? Or are these just the charismatic bears, which all happen to have short tails? [Another Google image search suggests it's not that one.] Is "bear" a mistranslation of ...
On one hand, Wikipedia suggests Jewish astronomers saw the three tail stars as cubs. But at the same time, it suggests several ancient civilizations independently saw Ursa Major as a bear. Also confused.
A post making a related point: https://knowingless.com/2017/10/18/me-too-on-sexual-assault/
Since this got nominated, now's a good time to jump in and note that I wish that I had chosen different terminology for this post.
I was intending for "final crunch time" to be a riff on Eliezer saying, here, that we are currently in crunch time.
This is crunch time for the whole human species, and not just for us but for the intergalactic civilization whose existence depends on us. This is the hour before the final exam and we're trying to get as much studying done as possible.
I said explicitly, in this post, "I'm going to refer to this last stretch of a fe...
Poll: Does your personal experience resonate with what you take Val to be pointing at in this post?
Options are sub-comments of this parent.
Please vote by agreeing, not upvoting, with the answer that feels right to you. Please don't click the disagree button for options you disagree with, so that we can easily tabulate numbers by checking how many people have voted.
(Open to suggestions for better ways to set up polls, for the future.)
Shouldn't the question be "which is better, a tiger or a designer tiger?"?
Isn't yet another surprising result of existing capabilities evidence that general intelligence is itself a surprising result of existing capabilities?
and I am confident that I can back out (and actually correct my intuitions) if the need arises.
Did you ever do this? Or are you still running on some top-down overwritten intuitive models?
If you did back out, what was that like? Did you do anything in particular, or did this effect fade over time?
Ideally, we would be just as motivated to carry out instrumental goals as we are to carry out terminal goals. In reality, this is not the case. As a human, your motivation system does discriminate between the goals that you feel obligated to achieve and the goals that you pursue as ends unto themselves.
I don't think that this is quite right actually.
If the psychological link between them is strong in the right way, the instrumental goal will feel as appealing as the terminal goal (because succeeding at the instrumental goal feels like making progress on th...
None of this is explicit, mind you, it's just the nature of goals. I can change the goal and I can drop the goal, but I can't hold the goal and not pursue it.
Does this mean that you only have one goal at a time? It seems like while you're pursuing one goal, you would not be pursuing any of the others.
I know full well that my resolution against spending willpower against myself means that once I get addicted to something, it has to run its full course before I can be productive again. This is a nuclear option: because I know that I won't stop, I am very leery of lengthy media.
Does this mean that if you're trapped in an addictive spiral and it feels terrible (e.g. "bloated, but still eating", or continuing to mindlessly watch YouTube even when it's not fun, and is actually painful in a muted way as you distract yourself from something) that you don't do anyt...
All the ASI-boosted-humans ones feel a bit tricky for me to answer, because it seems possible that we get strong aligned AI, in a distributed takeoff, but that we deploy it unwisely. Namely, that the world immediately collapses into Moloch, whereby everyone follows their myopic incentives off a cliff.
That cuts my odds of good outcomes by a factor of two or so.
Whoops. Answered later in the post.
Much of the value of alien civilizations might well come from the interaction of their civilization and ours, and from the fairness (which may well turn out to be a major terminal human value) of them getting their just fraction of the universe.
Won't the size of the universe-shard that a civilization controls be determined entirely by how early or late they started grabbing galaxies? Which is itself almost entirely determined by how early or late they evolved?
That doesn't sound like a fair distribution to me.
I guess we could redistribute some of...
As someone who said his share of prayers back in his Orthodox Jewish childhood upbringing, I can personally testify that praising God is an enormously boring activity, even if you're still young enough to truly believe in God. The part about praising God is there as an applause light that no one is allowed to contradict: it's something theists believe they should enjoy, even though, if you ran them through an fMRI machine, you probably wouldn't find their pleasure centers lighting up much.
I think this is typical minding. It really can be joyful to ex...
Moreover, I suspect that it would be good (in expectation) for humans to encounter aliens someday, even though this means that we’ll control a smaller universe-shard.
I suspect this would be a genuinely better outcome than us being alone, and would make the future more awesome by human standards.
I don't get this. If encountering aliens is so great, we could make it happen, even in an empty universe, by simulating evolution (and the development of civilization up to super-intelligence) and then being friends and partners with those alien civilizations, ...
That said, I, at least, am not making this error, I think:
Another concern I have is that most people seem to neglect the difference between “exhibiting an external behavior in the same way that humans do, and for the same reasons we do”, and “having additional follow-on internal responses to that behavior”.
An example: If we suppose that it’s very morally important for people to internally subvocalize “I sneezed” after sneezing, and you do this whenever you sneeze, and all your (human) friends report that they do it too, it would nonetheless be a
Yeah. I'd already read the Yudkowsky piece. I hadn't read the Muehlhauser one though!
My guess would be that the most common variety of alien is “unconscious brethren”, followed by “unconscious squiggle maximizer”, then “conscious brethren”, then “conscious squiggle maximizer”.
It might sound odd to call an unconscious entity “brother”, but it's plausible to me that on reflection, humanity strongly prefers universes with evolved-creatures doing evolved-creature-stuff (relative to an empty universe), even if none of those creatures are conscious.
Somehow, thinking of ourselves from the perspective of an unconscious alien really drives home how...
Moreover, it’s observably the case that consciousness-ascription is hyperactive. We readily see faces and minds in natural phenomena. We readily imagine simple stick-figures in comic strips experiencing rich mental lives.
A concern I have with the whole consciousness discussion in EA-adjacent circles is that people seem to consider their empathic response to be important evidence about the distribution of qualia in Nature, despite the obvious hyperactivity.
This post is the single most persuasive piece of writing that I have encountered with regard to talk...
And we should expect the time machine and the infrastructure it builds to be well-defended, since "you can't make the coffee if you're dead"
Does that follow? The time machine doesn't do any planning. So I would expect that in one timeline, something happens that accidentally drops an anvil on the time machine, breaking the reset mechanism, and there's no more time loops after that.
Indeed, in practice, I expect this time machine to optimize to destroy itself, not to fill the universe with paperclips.
The "anvil dropped on the time machine" scenario seems lik...
My guess is that the aliens-control-the-universe-shard scenario is net-positive, but that it loses orders of magnitude of cosmopolitan utility compared to the “cognitively constrained humans” scenario.
It seems like something weird is happening if we claim that we expect human values to be more cosmopolitan than alien values. Is that what you're claiming?
If I understand it correctly, what happened is that some people got paid to work on this full-time.
This is about what I was going to say in response, before reading your comment.
I think the key factor that makes it different from other examples is that it was a competent person's full time job.
There are some other things that need to go right in addition to that, but I suspect that there are lots of things that people are correctly outside-view gloomy about which can just be done, if someone makes it their first priority.
An obvious possible regime change is the shift to training (some) agents that do lifetime learning rather than only incorporating capability from SGD.
That's one simple thing that seems likely to generate a sharp left turn.