All of Søren Elverlin's Comments + Replies

I'll do both:

  1. I (again) affirm that this is very speculative.
  2. A substantial part of my private evidence is my personal evaluation of the CEO of OpenAI. I am really uneasy about stating this in public, but I now regret keeping my very negative evaluation of SBF private. Speak the truth, even if your voice trembles. I think a full "Heel turn" is more likely than not.

Bostrom's definition of the control problem in 'Superintelligence' refers only to "harming the project's interests", which, as you rightly note, is broader than existential risk. However, the immediate context makes it clear that Bostrom is discussing existential risk. The "harm" referred to does not include things like gender bias.

On reflection, I don't actually believe that AI Alignment has ever exclusively referred to existential risk from AI. I do believe that talk about "AI Alignment" on LessWrong has usually primarily been about existential risk. I further thin... (read more)

The control problem is initially introduced as: "the problem of how to control what the superintelligence would do." In the chapter you reference, it is presented as the principal-agent problem that occurs between a human and the superintelligent AI they build (apparently the whole of that problem). It would be reasonable to say that there is no control problem for modern AI, because Bostrom's usage of "the control problem" is exclusively about controlling superintelligence. On this definition, either there is no control research today, or it comes back to the implicit, controversial empirical claim about how some work is relevant and other work is not. If you are teaching GPT to better understand instructions, I would also call that improving its capability (though some people would call it alignment; this is the de dicto vs. de re distinction discussed here). If it already understands instructions and you are training it to follow them, I would call that alignment. I think you can use "AI alignment" however you want, but this is a lame thing to get angry at labs about, and you should expect ongoing confusion.

I was unclear. Let me elaborate:

"AGI-Completeness" is the idea that a large class of tasks have the same difficulty, roughly analogous to "Turing-Completeness" and "NP-Completeness".

My claim in the post is that I doubt OpenAI's hope that the task "Alignment Research" will turn out to be strictly easier than any dangerous task.

My claim in my comment above refers to the relative difficulty of 2 tasks:

  1. Make a contribution to Alignment Research comparable to the contribution of the book 'Superintelligence'.
  2. Drive from NY to SF without human intervention except
... (read more)

I fully agree that it is a factual question, and OpenAI could easily shed light on the circumstances around the launch if they chose to do so.

Maybe the underlying reason we are interpreting the evidence in different ways is that we are holding OpenAI to different standards:

Compared to a standard company, having a feedback button is evidence of competence. Quickly incorporating training data is also a positive update, as is having an explicit graphical representation of illegitimate questions.

I am comparing OpenAI to the extremely high standard of "Being able to solve the alignment problem". Against this standard, having a feedback button is absolutely expected, and even things like Eliezer's suggestion (publishing hashes of your gambits) should be obvious to companies competent enough to have a chance of solving the alignment problem.

It's important to be able to distinguish factual questions from questions about judgments. "Did the OpenAI release happen the way OpenAI expected?" is a factual question that has nothing to do with the question of what standards we should have for OpenAI. If you get the factual questions wrong, it's very easy for people within OpenAI to dismiss your arguments.

On reflection, I agree that it is only weak evidence. I agree we know nothing about damage. I agree that we have no evidence that this wasn't the planned strategy. Still, the evidence the other way (that this was deliberate to gather training data) is IMHO weaker.

My point in the "Review" section is that OpenAI's plan committed them to transparency about these questions, and yet we have to rely on speculations.

I find the fact that they used the training data in a short time to massively reduce the "jailbreak"-cases evidence in the direction that the point of the exercise was to gather training data. ChatGPT has a mode where it labels your question as illegitimate and colors it red but still gives you an answer. Then there's the feedback button to tell OpenAI if it made a mistake. This behavior prioritizes gathering training data over not giving any problematic answers.

Eliezer: OpenAI probably thought they were trying hard at precautions; but they didn't have anybody on their team who was really creative about breaking stuff, let alone anyone as creative as the combined Internet; so it got jailbroken in like a day after something smarter looked at it.

Seems to me Yudkowsky was (way) too pessimistic about OpenAI there. They probably knew something like this would happen.

I think this is very weak evidence. "Jailbreaking it" did, as far as I know, no damage. At least I haven't seen anybody point to any damage created. On the other hand, it did give OpenAI training data it could use to fix many of the holes.

Even if you don't agree with that strategy, I see no evidence that this wasn't the planned strategy.

That's not even an assertion that it didn't go as they expected, let alone an explanation of why one would assume that.

I haven't seen a rigorous treatment of the concept of AGI-completeness. Here are some suggested AGI-complete problems:

I don't have a solid answer, but I would be surprised if the task "Write the book 'Superintelligence'" required less general intelligence than "full self-driving from NY to SF".

I'm interested in why you would think that writing "Superintelligence" would require more general intelligence than full self-driving from NY to SF. The former seems like a pretty narrow task compared to the latter.

I would be excited to see Rational Animations try to cover the Hard Problem of Corrigibility:

I believe that this would be the optimal video to create for the optimization target "reduce probability of AI-Doom". It seems (barely) plausible that someone really smart could watch the video, make a connection to some obscure subject none of us know about, and then produce a really impactful contribution to solving AI Alignment.

Talking concretely, what does a utility function look like that is so close to a human utility function that an AI system has it after a bunch of training, but which is an absolute disaster?

A simple example could be that the humans involved in the initial training are negative utilitarians. Once the AI is powerful enough, it would be able to implement omnicide rather than just curing diseases.

Thus in order to arrive at a conclusion of doom, it is not enough to argue that we cannot align AI perfectly.

I am open to being corrected, but I do not recall ever seeing a requirement of "perfect" alignment in the cases made for doom. Eliezer Yudkowsky in "AGI Ruin: A List of Lethalities" only asks for 'this will not kill literally everyone'.

Jeff Rose (2 points, 3mo):
My impression is that there has been a variety of suggestions about the necessary level of alignment. It is only recently that "don't kill most of humanity" has been suggested as a goal, and I am not sure that the suggestion was meant to be taken seriously. (Because if you can do that, you can probably do much better; the point of that comment as I understand it was that we aren't even close to being able to achieve even that goal.)

Without investigating these empirical details, it is unclear whether a particular qualitatively identified force for goal-directedness will cause disaster within a particular time.

A sufficient criterion for a desire to cause catastrophe (distinct from having the means to cause catastrophe) is if the AI is sufficiently goal-directed to be influenced by Stephen Omohundro's "Basic AI Drives".

For instance, take an entity with a cycle of preferences, apples > bananas = oranges > pears > apples. The entity notices that it sometimes treats oranges as better than pears and sometimes worse. It tries to correct by adjusting the value of oranges to be the same as pears. The new utility function is exactly as incoherent as the old one.

It is possible that an AI will try to become more coherent and fail, but we are worried about the most capable AI and cannot rely on the hope that it will fail such a simple task. Being coherent is easy if the fruits are instrumental: Just look up the prices of the fruits.
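The incoherence in the fruit example can be checked mechanically: the proposed "fix" (setting oranges equal to pears) removes one edge but leaves the original cycle untouched. A minimal sketch (the edge encoding and function names are illustrative, not from the original):

```python
# Strict-preference edges: (a, b) means "a is preferred to b".
# Original: apples > bananas = oranges > pears > apples
old = {("apples", "bananas"), ("apples", "oranges"),
       ("bananas", "pears"), ("oranges", "pears"),
       ("pears", "apples")}

# After "adjusting" oranges to equal pears, the oranges > pears edge is
# gone, but the cycle apples > bananas > pears > apples remains.
new = {("apples", "bananas"), ("bananas", "pears"),
       ("pears", "apples")}

def has_cycle(edges):
    """Depth-first search for a preference cycle (fine for tiny graphs)."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)

    def visit(node, seen):
        if node in seen:
            return True
        return any(visit(nxt, seen | {node}) for nxt in graph.get(node, []))

    return any(visit(start, frozenset()) for start in graph)

print(has_cycle(old), has_cycle(new))  # True True: exactly as incoherent
```

The check illustrates the point in the text: a local adjustment of one comparison does not make the overall preference ordering coherent.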

However if we think that utility maximization is difficult to wield without great destruction, then that suggests a disincentive to creating systems with behavior closer to utility-maximization. Not just from the world being destroyed, but from the same dynamic causing more minor divergences from expectations, if the user can’t specify their own utility function well.

A strategically aware utility maximizer would try to figure out what your expectations are, satisfy them while preparing a take-over, and strike decisively without warning. We should not expect to see an intermediate level of "great destruction".

I prefer "AI Safety" over "AI Alignment" because I associate the first more with Corrigibility, and the second more with Value-alignment.

It is the term "Safe AI" that implies 0% risk, while "AI Safety" seems more similar to "Aircraft Safety" in acknowledging a non-zero risk.

Rob Bensinger (7 points, 4mo):
I agree that corrigibility, task AGI, etc. is a better thing for the field to focus on than value learning. This seems like a real cost of the term "AI alignment", especially insofar as researchers like Stuart Russell have introduced the term "value alignment" and used "alignment" as a shorthand for that.

The epistemic shadow argument further requires that the fast takeoff leads to something close to extinction.

This is not the least impressive thing I expect GPT-4 won't be able to do.

I should have explained what I mean by "always (10/10)": If you generate 10 completions, you expect with 95% confidence that all 10 satisfy the criteria.

All the absolute statements in my post should be turned down from 100% to 99.5%. My intuition is that if less than 1 in 200 ideas are valuable, it will not be worthwhile to have the model contribute to improving itself.
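The "95% confidence, 10/10" criterion and the 99.5% figure are two sides of the same arithmetic. A quick check, assuming completions are independent:

```python
# If all 10 independent completions must satisfy the criteria with 95%
# probability, each completion needs a success rate of p where p**10 = 0.95.
p_all_ten = 0.95
per_completion = p_all_ten ** (1 / 10)
print(round(per_completion, 4))  # 0.9949, i.e. roughly the 99.5% figure
```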

Intelligence Amplification

GPT-4 will be unable to contribute to the core cognitive tasks involved in AI programming.

  • If you ask GPT-4 to generate ideas for how to improve itself, it will always (10/10) suggest things that an AI researcher considers very unlikely to work.
  • If you ask GPT-4 to evaluate ideas for improvement that are generated by an AI researcher, the feedback will be of no practical use.
  • Likewise, every suggestion for how to get more data or compute, or be more efficient with data or compute, will be judged by an AI researcher as hopeless.
  • I
... (read more)
I think you are probably overconfident, mostly because of the use of the term 'every' in some of these clauses. Consider that if GPT-4 is trained on arxiv, it could plausibly make a great many research suggestions. And all it would need to do in order to disprove the extremely generally worded clause 3 would be to eventually generate one such research suggestion that improves 'compute' (hardware or software efficiency), which eventually becomes a certainty with enough suggestions. So essentially you are betting that GPT-4 is not trained on arxiv.
Lone Pine (9 points, 5mo):
There's a bit of an epistemic shadow here. If a capability is publicly announced and available, then it can't be the keystone to a fast takeoff.

The Gato paper from DeepMind actually shows, if you look at their data, that they’re still getting better transfer effects if you train in domain than if you train across all possible tasks.

This probably refers to figure 9 in A Generalist Agent, which compares generalization given:

  1. Training in irrelevant domain (Blue line)
  2. Training in relevant domain (Green line)
  3. Training in both domains (Yellow line)

From DeepMind's results in the figure, it looks like 3. almost always outperforms 2., though I would hesitate to draw strong conclusions from this figur... (read more)

In-universe, Mecha-Godzilla had to be built with a Godzilla-skeleton, which caused both to turn against Humanity.

It feels probable that there will be substantial technical similarities between Production Superintelligences and Alignment Superintelligences, which could cause both of them to turn against us.
(Epistemic state: Low confidence)

The inclusion criteria state:

Tasks that are completely beyond the capabilities of current language models are also encouraged

It's easy to come up with a benchmark that requires a high but unspecified level of intelligence. An extreme example would be to ask for a proof that P != NP: we have no idea about the difficulty of the task, though we suspect that it requires superintelligence. To be valuable, the challenge of a benchmark needs to be possible to relate to meaningful capabilities, such as "The Human Level".

Most people couldn't answer questions about... (read more)

You're right. And some of the existing tasks in the benchmark are way beyond the abilities of baseline humans (e.g. the image classification task where images are hex-encoded texts). On the other hand, the organizers allowed the human testers to use any tool they want, including internet search, software etc. So, the measured top-human performance is the performance of humans augmented with technology. I think an AI that can solve BIG-bench must be an AGI. But there could be an AGI that can't solve BIG-bench yet.

Thank you for a very thought-provoking post.

My layman understanding of the VDV is that their goals are primarily political ("Anti-coup") and meant for rapid deployment to counter uprisings etc. rather than maximizing military effectiveness. This reflects how they were used in Ukraine - contrary to their expectations, this was a real war and not an uprising.

Giving disproportionate resources to "Republican Guard" units seems like a common pattern in authoritarian countries.

I think it's somewhat complicated -- the VDV is also used in conventional operations thanks to its elite and volunteer status (see for instance this primer on Russian military methods), which makes them more reliable and effective than conscript forces even in some more "conventional" tasks. In some ways this might be considered similar to the structure of the post-WWII French military, where the paratroopers and the Foreign Legion were made up of volunteers and used preferentially over conscript forces -- indeed, as I understand it France did not use conscripts at all in the Indochina War, and favored using its "more reliable" volunteer units in the Algerian War, with the infamous Battle of Algiers conducted primarily by paratroopers. (Ironically, the reliability of these units in combat did not mean political reliability -- when the French government eventually decided to grant Algerian independence, some of the paratroopers joined a coup attempt!)

At the same time though, Russia has invested substantially in technological capabilities for its airborne forces to assist in their primary airborne mission, with things like the BMD- and BTR- series of airborne APCs/IFVs, multi-canopy and rocket-assisted parachutes to allow these vehicles to be dropped (in some cases with crews inside!), and so on.

I think we are close to agreeing with each other on how we expect the future to look. I certainly agree that real world impact is discontinuous in metrics, though I would blame practical matters rather than poor metrics.

I only have a vague idea what is meant by language models contributing to GDP.

Current language models are actually quite reliable when you give them easy questions. Practical deployment of language models is sometimes held to very high standards of reliability and lack of bias, possibly for regulatory, social or other practical reasons. Yet I personally know someone who works in customer service and is somewhat racist and not very reliable.

I am not sure I understand your counterbet. I would guess most translation is already automated, most programmers use automated tools already and most Internet "journalism" is already computer generated.

I claim that most coordination-tasks (defined very broadly) in our civilization could be done by language models talking to each other, if we could overcome the enormous obstacle of getting all relevant information into the prompts and transferring the completions to "the real world".

I agree with this in principle, but in practice I think current language models are much too bad for this to be on the cards.

Assume PaLM magically improved to perform 2 standard deviations above the human average. In my model, this would have a very slow effect on GDP.... (read more)

Ege Erdil (2 points, 10mo):
2 standard deviations above the human average with respect to what metric? My whole point is that the metrics people look at in ML papers are not necessarily relevant in the real world and/or the real world impact (say, in revenue generated by the models) is a discontinuous function of these metrics. I would guess that 2 standard deviations above human average on commonly used language modeling benchmarks is still far from enough for even 10% of coordination tasks, though by this point models could well be generating plenty of revenue.

Thank you for this important caveat. As an imperfect Bayesian, I expect that if I analyzed the benchmark, I would update towards a belief that the results are real, but less impressive than the article makes them appear.


Assume that as a consequence of being in the Paul-verse, regulatory and other practical obstacles are possible to overcome in a very cost-effective way. In this world, how much value do current language models create?

I would answer that in this obstacle-free world, they create about 10% of global GDP and this share would be rapidly increasing. This is because a large set of valuable tasks are both simple enough that models could understand them, and possible to transform into a prompt completion task.

The argument is meant as a reductio: Language models d... (read more)

Ege Erdil (2 points, 10mo):
I don't agree with that at all. I think in this counterfactual world current language models would create about as much value as they create now, maybe higher by some factor but most likely not by an order of magnitude or more. I know this is what your argument is. For me the conclusion implied by "language models don't create value in our world" is "language models are not capable of creating value in our world & we're not capable of using them to create value", not that "the practical obstacles are hard to overcome".

Also, this last claim about "practical obstacles" is very vague: if you can't currently buy a cheap ticket to Mars, is that a problem with "practical obstacles being difficult to overcome" or not? In some sense there's likely a billion-dollar company idea which would build on existing language models, so if someone thought of the idea and had the right group of people to implement it, they could be generating a lot of revenue. This would look very different from language models creating 10% of GDP, however.

I agree with this in principle, but in practice I think current language models are much too bad for this to be on the cards. I'll be happy to claim victory when AGI is here and we're not all dead.
What exactly do you mean by "create 10% of global GDP"? And why would you expect the current quite unreliable language models to have such a drastic effect? Anyway, I will counterbet that by 2032: most translation will be automated (90%); most programmers will use automated tools daily (70%); most top-level mathematics journals will use proof-checking software as part of their reviewing process (80%); and computer-generated articles will make up a majority of Internet "journalism" (50%).

I struggle to understand your first sentence. Do you cash out "useful" as "having the theoretical ability to do a task"? As in: "If an AI benchmarks better than humans at a task, but doesn't generate revenue, the reason must be that the AI is not actually capable of doing the task".

In the Paul-verse, how does AI contribute substantially to GDP at AI capability levels between "Average Human" and "Superintelligence"?

It seems (to me) that the reasons are practical issues, inertia, regulatory, bureaucracy, conservatism etc., and not "Lack of AI Capability". As a... (read more)

I think the issue here is that the tasks in question don't fully capture everything we care about in terms of language facility. I think this is largely because even very low probabilities of catastrophic actions can preclude deployment in an economically useful way. For example, a prime use of a language model would be to replace customer service representatives. However, if there is even a one in a million chance that your model will start cursing out a customer, offer a customer a million dollars to remedy an error, or start spewing racial epithets, the model cannot be usefully deployed in such a fashion. None of the metrics in the paper can guarantee, or even suggest, that level of consistency.
Ege Erdil (2 points, 10mo):
No, I mean that being able to do the task cheaply and at a high quality is simply not that valuable. AI went from being uncompetitive against professional Go players on top-notch hardware to being able to beat them running on a GPU you can buy for less than $100, but the consumer surplus that's been created by this is very small.

If AI is already as capable as an average human then you're really not far off from the singularity, in the sense that gross world product growth will explode within a short time and I don't know what happens afterwards. My own opinion (may not be shared by Paul) is that you can actually get to the singularity even with AI that's much worse than humans, just because AI is so much easier to produce en masse and to improve at the tasks it can perform. I'll have an essay coming out about takeoff speeds on Metaculus in less than ten days (it will also be crossposted to LessWrong), so I'll elaborate more on why I think this way there.

Why do you think being above the human average on all language benchmarks is something that should cash out in the form of a big consumer surplus? I think we agree that this is not true for playing Go or recognizing pictures of cats or generating impressive-looking original art, so what is the difference when it comes to being better at predicting the next word in a sentence or at solving logic puzzles given in verbal format?

Of course there might not be time, but I'm happy to take you up on a bet (a symbolic one if actual settlement in the event of a singularity is meaningless) at even odds if you think this is more likely than the alternative.

According to this image from the article, the performance is generally above the human average.

In the Paul-verse, we should expect that economic interests would quickly cause such models to be used for everything that they can be profitably used for. With better-than-average-human performance, that may well be a doubling of global GDP.

In the Eliezer-verse, the impact of such models on the GDP of the world will remain around $0, due to practical and regulatory constraints, right up until the upper line ("Human (Best)") is surpassed for 1 particular task.

The BIG-Bench paper that those 'human' numbers are coming from (unpublished, quasi-public as TeX here) cautions against taking those averages very seriously, without giving complete details about who the humans are or how they were asked/incentivized to behave on tasks that required specialized skills:

My take as someone who thinks along similar lines to Paul is that in the Paul-verse, if these models aren't being used to generate a lot of customer revenue then they are actually not very useful even if some abstract metric you came up with says they do better than humans on average.

It may even be that your metric is right and the model outperforms humans on a specific task, but AI has been outperforming humans on some tasks for a very long time now. It's just not easy to find profitable uses for most of those tasks, in the sense that the total consumer surplus generated by being able to perform them cheaply and at a high quality is low.

It is possible that Putin's political goals involve dismantling Ukraine along with the complete subjugation of the Ukrainian people. Nuclear weapons could thus have a desired political effect, in addition to their substantial practical effects.

As a first approximation, if the food production falls to a level of X% of the required calories for the population, your probability of surviving is roughly X%.

Even a full counter-value nuclear exchange would not destroy all of our ability to produce food. Cities would be the primary targets, and they are net importers of food. Civilization might not even collapse with the removal of the 3000 largest cities in the western world.

andrew sauer (1 point, 10mo):
As a first approximation, if the food production falls to a level of X% of the required calories for the population, your probability of surviving is roughly X%.

I mean, today food production is significantly more than 100% of the required calories for the population and many people are still food insecure. But as a second approximation, post-large-scale nuclear war, if there is only 50% as much food as is required to feed everyone in a major city over a couple of months, the ensuing violence and hoarding of the food will likely kill many more people than half. And I'm less sanguine than you about the survival of infrastructure needed to maintain the modern world without major cities, ports, power infrastructure, etc. Sure, there might be enough food overall, but if it's in a different place than you, and there is no modern communications or transport, that doesn't matter much. If you already live on a farm, great, but otherwise you're likely to be in trouble.

I have a higher probability that Putin will launch the first nuke at Kyiv. I think he might think that all other scenarios end with a Russian defeat and his personal untimely death. Russia is already a pariah state, and there is comparatively little for him to lose. Nuking Kyiv would have the side-effect of making civilians flee urban centers in Ukraine, dramatically increasing the probability of a conventional Russian victory.

I think Poland is a more likely target because Putin probably still has hopes that Russia and Ukraine will unite some day and (correctly IMHO) anticipates that any use of nukes on Ukrainian territory diminishes that hope, whereas he probably has given up on Poland's becoming friendly with Russia. Also, many Russians have family in Ukraine.

The reason for attacking Poland would be to reduce the flow of supplies from Poland to the Ukrainian regime, both "directly" by destroying a road or railroad (maybe at a mountain pass or a bridge) or a logistics center in Poland, and by deterring the Polish government (and the governments of the 3 other nations on Ukraine's western border) from continuing to supply Kyiv (out of fear of more nuclear strikes).

I get the impression that the Russian navy has been preventing ships from supplying Ukraine via the Black Sea to any significant extent. If that is not true, then that would greatly reduce the usefulness of nuking Poland to Russia and consequently would greatly reduce its probability.

Absolutely fascinating link - strong upvote! Han et al., 2013 did not investigate motor control and motion planning, but I agree that human neural cells probably are just better, though possibly requiring more energy.

From martial arts, I'm convinced people have different innate levels of motor control and motion planning, and this helps nontrivially in fights. However, brains and muscles both require energy, and I'd generally give the advantage to the person with +1std muscles over the person with +1std motor control, assuming both are untrained.

no, I think a human in a big animal body, with brain adapted to operate that body instead of our own, would beat a big animal straightforwardly

According to this, the hippopotamus would be the animal that "gained" the most from having a human brain, assuming that brain-body mass ratio indicates intelligence.

I could see the Ronaldo-brained hippopotamus winning by planning, cooperating and learning more from experience, but I'm not seeing a big advantage in a straight-up fight.

I think human neural cells are 'just better'; there's some evidence in mice to this effect:

We found that the human glial chimeras indeed performed better than control mice across a variety of learning tasks, that included auditory fear conditioning, novel object and place recognition, and Barnes maze navigation. In all of these tests - but not in any test of social interactivity or primary perception - the human glial chimeras performed better and acquired new causal associations more quickly than did murine-allografted or untransplanted controls (Han et a

... (read more)

Paul Christiano makes a slightly different claim here:

As I read the two claims:

  • With GPT-3 + 5 years of effort, a system could be built that would eventually Foom if allowed.
  • With GPT-3 + a serious effort, a system could be built that would clearly Foom if allowed.

I think the second could be made into a bet. I tried to operationalise it as a reply to the linked comment.

How much time do you see between "1 AI clearly on track to Foom" and "First AI to actually Foom"? My weak guess is Eliezer would say "Probably quite little time", but your model of the world requires the GWP to double over a 4 year period, and I'm guessing that period probably starts later than 2026.

I would be surprised if by 2027, I could point to an AI that for a full year had been on track to Foom, without Foom happening.

I think "on track to foom" is a very long way before "actually fooms."

Yes, we are still running, though at a bi-weekly schedule. We will discuss Paul Christiano's "Another (Outer) Alignment failure story" on the 8th of July.

I made my most strident and impolite presentation yet in the Reading Group last night. We were discussing "Conversation with Ernie Davis", and I attacked this part:

"And once an AI has common sense it will realize that there’s no point in turning the world into paperclips..."

I described this as fundamentally mistaken and like an argument you'd hear from a person who had not read "Superintelligence". This is ad hominem, and it pains me. However, I feel like the emperor has no clothes, and calling it out explicitly is important.

Explaining things across long inferential distance is frustrating. The norm that arguments should be opposed by arguments (instead of e.g. ad hominems) is good in general, but sometimes a solid argument simply cannot be constructed in five minutes. At least you have pointed towards an answer...

Thank you for your answer, and good luck with the Alignment Research Center.

In the interview with AI Impacts, you said:

...examples of things that I’m optimistic about that they [people at MIRI] are super pessimistic about are like, stuff that looks more like verification...

Are you still optimistic? What do you consider the most promising recent work?

I don't think my view has changed too much (I don't work in the area so don't pay as much attention or think about it as often as I might like).

The main updates have been:

  • At the time of that interview I think it was public that Interval Bound Propagation was competitive with other verification methods for perturbation robustness, but I wasn't aware of that and definitely hadn't reflected on it. I think this makes other verification schemes seem somewhat less impressive / it's less likely they are addressing the hard parts of the problem we ultimately need
... (read more)

Today, I bought 20 shares in Gamestop / GME. I expect to lose money, and bought them as a hard-to-fake signal about willingness to coordinate and cooperate in the game-theoretic sense. This was inspired by Eliezer Yudkowsky's post here:

In theory, Moloch should take all the resources of someone following this strategy. In practice, Eru looks after her own, so I have the money to spare.

Is this still a short squeeze? (Have ~all of the shorts already been squeezed?)
Should be fixed within half an hour! Sorry about forgetting about Denmark! (not including it was just a typo)

The Reading Group discussed this blog post when it was posted. There is a fair bit of commentary here:

Hi Howie,

Thank you for reminding me of these four documents. I had seen them, but I dismissed them early in the process. That might have been a mistake, and I'll read them carefully now.

I think you did a great job at the interview. I describe one place where you could have pushed back more here: You asked: "...Assume that among the things that these narrow AIs are really good at doing, one of them is programming AI...", and Ben Garfinkel gave a broad answer about "doing science".

Howie Lempel (2y):
On the documents: Unfortunately I read them nearly a year ago so my memory's hazy. But (3) goes over most of the main arguments we talked about in the podcast step by step, though it's just slides so you may have similar complaints about the lack of close analysis of the original texts. (1) is a pretty detailed write up of Ben's thoughts on discontinuities, sudden emergence, and explosive aftermath. To the extent that you were concerned about those bits in particular, I'd guess you'll find what you're looking for there.
Howie Lempel (2y):
Thanks! Agree that it would've been useful to push on that point some more. I know Ben was writing up some additional parts of his argument at some point, but I don't know whether finishing that up is still something he's working on.

Eric Drexler requested that I not upload a recording to YouTube. Before the session, I compiled this document with most of the questions:

We did not get to post the last few questions. Are there any questions from this list you would like me to try to remember the answers to?

Do you have a recording of the session? If so, can you send it to me via PM or email? I'm interested in answers to pretty much all of the questions. If no recording is available, any chance you could write up as many answers as you can remember? (If not, I'll try harder to narrow down my interest. :)

I'm also curious why Eric Drexler didn't want you to upload a recording to YouTube. If the answers contain info hazards, it seems like writing up the answers publicly would be bad too. If not, what could outweigh the obvious positive value of releasing the recording? If he's worried about something like not necessarily endorsing the answers that he gave on the spot, maybe someone could prepare a transcript of the session for him to edit and then post?
I'm very interested in his responses to the following questions:

1. The question addressing Gwern's post about Tool AIs wanting to be Agent AIs.
2. The question addressing his optimism about progress without theoretical breakthroughs (related to NNs/DL).

Wikipedia claims that "it is faster in cases where n > 100 or so"
The introduction of this Wikipedia article seems to describe these improvements as practically useful.

In my video, I describe one of the breakthroughs in matrix multiplication after Strassen as "Efficient parallelization, like MapReduce, in the nineties". This insight is used in practice, though some of the other improvements I mention are not practical.
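For readers unfamiliar with Strassen's algorithm (the starting point for the improvements discussed above), here is a minimal sketch of the classic trick: splitting each matrix into four blocks and computing the product with seven recursive multiplications instead of eight, giving O(n^2.81) instead of O(n^3). This assumes square matrices whose size is a power of two, and the `cutoff` parameter is an illustrative choice, not anything from my video:

```python
import numpy as np

def strassen(A, B, cutoff=64):
    """Multiply square power-of-two matrices via Strassen's algorithm.

    Below `cutoff` we fall back to ordinary multiplication, since the
    constant-factor overhead of the extra additions dominates for small n
    (matching the Wikipedia observation that the win appears around n > 100).
    """
    n = A.shape[0]
    if n <= cutoff:
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # Seven recursive products instead of the naive eight:
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    # Reassemble the four blocks of the result:
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```

The later breakthroughs (Coppersmith-Winograd and its descendants) lower the exponent further but, unlike Strassen, involve constants too large to be practical.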

In the section "Finding the secret sauce", you... (read more)
