All of ekka's Comments + Replies

Not aligned on values, beliefs, and moral intuitions. Plenty of humans would not kill all people alive if given the choice, but there are some who would. I think the existence of doomsday cults that have tried to precipitate an armageddon gives support to this claim.

Ah, so you mean that humans are not perfectly aligned with each other? I was going by the definition of "aligned" in Eliezer's "AGI Ruin" post. Likewise, in an earlier paper I mentioned that by an AGI that "respects human values", we don't mean to imply that current human values would be ideal or static. We just mean that we hope to at least figure out how to build an AGI that does not, say, destroy all of humanity, cause vast amounts of unnecessary suffering, or forcibly reprogram everyone's brains according to its own wishes.

A lot of discussion about alignment takes this as the minimum goal. Figuring out what to do with humans having differing values and beliefs would be great, but if we could even get the AGI to not get us into outcomes that the vast majority of humans would agree are horrible, that'd be enormously better than the opposite. And there do seem to exist humans who are aligned in this sense of "would not do things that the vast majority of other humans would find horrible, if put in control of the whole world"; even if some would, the fact that some wouldn't suggests that it's also possible for some AIs not to do it.

The marginal effort of going from chimp to human was lower, but still huge. It has been maybe 5 million years since the last common ancestor of chimps and humans; taking a generation to be about 20 years, that's at least 250,000 generations of populations of at least a couple of thousand individuals, in a complex environment with lots of processes going on. I haven't done the math, but that seems like a massive amount of computation. Going from human to von Neumann still takes a huge search process. If we think of every individual human as consisting of ... (read more)
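For what it's worth, the generation arithmetic in that comment can be sketched as a back-of-envelope calculation. All figures here are the comment's own rough assumptions (and the population size is only a stated lower bound), not measured values:

```python
# Back-of-envelope sketch of the comment's arithmetic; every number below is
# a rough assumption from the comment, not a measured quantity.
years_since_split = 5_000_000   # ~time since the chimp-human common ancestor
years_per_generation = 20       # assumed generation length
population_size = 2_000         # "a couple of thousand individuals" (lower bound)

generations = years_since_split // years_per_generation
# Crude proxy for evolution's "compute": one unit per individual lifetime.
lifetimes_evaluated = generations * population_size

print(generations)          # 250000
print(lifetimes_evaluated)  # 500000000
```

Even with these deliberately conservative inputs, the search evaluated on the order of half a billion individual lifetimes just over the chimp-to-human stretch, which is the sense in which the comment calls it "a massive amount of computation".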

The thing is, it's not. Evolution is optimizing for the number of descendants, nothing more. If being more intelligent is the way forward: nice! If having blue hair results in even more children: even better! Intelligence just happens to be what evolution settled on for humans. Daisies happened to come up with liking closely cropped grasslands, which is also currently a very good strategy (lawns). The point is that evolution chooses what to try totally at random, and whatever works is good, even if it reduces complexity, e.g. snakes losing legs, or cave fish losing eyes. AI work, on the other hand, is focused on specific outcome spaces, trying things which seem reasonable and avoiding things which have no chance of working. This massively simplifies the search, since it lowers the number of combinations that need to be checked.

What is the theory of change of the AI Safety field and why do you think it has a high probability to work?

One of the few paths to victory I see is having a weakly aligned weak AGI which is not capable of recursive self-improvement and using it as a research assistant to help us solve the hard version of alignment. I don't think this has a high probability of working, but it seems probably worth trying.
3Adam Jermyn1y
I think there are many theories of change. One theory is that we want to make sure there are easy, cheap tools for making AI safe so that when someone does make an AGI, they do it safely. Other theories are shaped more like "The AI safety field should develop an AGI as soon as it can be done safely, then use that AGI to solve alignment/perform some pivotal act/etc."
7Drake Thomas1y
I think a lot of people in AI safety don't think it has a high probability of working (in the sense that the existence of the field caused an aligned AGI to exist where there otherwise wouldn't have been one) - if it turns out that AI alignment is easy and happens by default if people put even a little bit of thought into it, or it's incredibly difficult and nothing short of a massive civilizational effort could save us, then probably the field will end up being useless. But even a 0.1% chance of counterfactually causing aligned AI would be extremely worthwhile!

Theory of change seems like something that varies a lot across different pieces of the field; e.g., Eliezer Yudkowsky's writing about why MIRI's approach to alignment is important seems very different from Chris Olah's discussion of the future of interpretability. It's definitely an important thing to ask for a given project, but I'm not sure there's a good monolithic answer for everyone working on AI alignment problems.

Human beings are not aligned and will possibly never be aligned without changing what humans are. If it's possible to build an AI as capable as a human in all ways that matter, why would it be possible to align such an AI?

Define "not aligned"? For instance, there are plenty of humans who, given the choice, would rather not kill every single person alive.
2mako yass1y
Most of what people call morality is conflict mediation: techniques for taking the conflicting desires of various parties and producing better outcomes for them than war. That's how I've always thought of the alignment problem. The creation of a very very good compromise that almost all of humanity will enjoy. There's no obvious best solution to value aggregation/cooperative bargaining, but there are a couple of approaches that're obviously better than just having an arms race, rushing the work, and producing something awful that's nowhere near the average human preference.
Because we're building the AI from the ground up and can change what the AI is via our design choices. Humans' goal functions are basically decided by genetic accident, which is why humans are often counterproductive. 
Indeed humans are significantly non-aligned. In order for an ASI to be non-catastrophic, it would likely have to be substantially more aligned than humans are. This is probably less-than-impossible due to the fact that the AI can be built from the get-go to be aligned, rather than being a bunch of barely-coherent odds and ends thrown together by natural selection. Of course, reaching that level of alignedness remains a very hard task, hence the whole AI alignment problem.
Assuming humans can't be "aligned", then it would also make sense to allocate resources in an attempt to prevent one of them from becoming much more powerful than all of the rest of us.
1Adam Jermyn1y
I'm not quite sure what this means. As I understand it humans are not aligned with evolution's implicit goal of "maximizing genetic fitness" but humans are (definitionally) aligned with human values. And e.g. many humans are aligned with core values like "treat others with dignity". Importantly, capability and alignment are sort of orthogonal. The consequences of misaligned AI get worse the more capable it is, but it seems possible to have aligned superhuman AI, as well as horribly misaligned weak AI.

Evolution is massively parallelized and occurs in a very complex, interactive, and dynamic environment. Evolution is also patient, can tolerate high costs such as mass extinction events and also really doesn't care about the outcome of the process. It's just something that happens and results in the filtering of the most fit genes. The amount of computation that it would take to replicate such complex, interactive, and dynamic environments would be huge. Why should we be confident that it's possible to find an architecture for general intelligence a lot mo... (read more)

7Carl Feynman1y
You ask two interesting questions, with rather separate answers. I will discuss each in turn.

First, it's plausible to think that "it's possible to find an architecture for general intelligence a lot more efficiently than evolution". Our process of engineering development is far faster than evolution. People get good (or bad) ideas, try stuff, copy what works, speak at conferences, publish, make theories, teach undergraduates... and the result is progress in decades instead of millions of years. We haven't duplicated all the achievements of life yet, but we've made a start, and have exceeded it in many places. In particular, we've recently made huge progress in AI. GPT-3 has pretty much duplicated the human language faculty, which takes up roughly 1% of the brain. And we've duplicated visual object recognition, which takes another few percent. Those were done without needing evolution, so we probably don't need evolution for the remaining 90% of the mind.

Second, "an intelligence that does the exact things we want" is the ideal we're aiming for. Unfortunately, it does not currently seem possible to get there. With current technology, what we get is "an intelligence that does approximately what we rewarded it for, plus some other weird stuff we didn't ask for." It's not obvious, but it is much harder than you think to specify a set of goals that produce acceptable behavior. And it is even harder (currently impossible) to provide any assurance that an AI will continue to follow those goals when set free to exert power in the world.
While evolution did indeed put a huge amount of effort into creating a chimp's brain, the amount of marginal effort it put into going from a chimp to a human brain was vastly lower.  And the effort of going from a human brain to John von Neumann's brain was tiny.  Consequently, once we have AI at the level of chimp intelligence or human intelligence it might not take much to get to John von Neumann level intelligence.  Very likely, having a million John von Neumann AI brains running at speeds greater than the original would quickly give us a singularity. 
1Adam Jermyn1y
We don't need to be confident in this to think that AGI is likely in the next few decades. Extrapolating current compute trends, the available compute may well be enough to replicate such environments. My guess is that we will try to create intelligence to do the things we want, but we may fail. The hard part of alignment is that succeeding at getting the thing you want from a superhuman AI seems surprisingly hard.

Who are the AI Capabilities researchers trying to build AGI and think they will succeed within the next 30 years?

5DeLesley Hutchins1y
There are numerous big corporate research labs: OpenAI, DeepMind, Google Research, Facebook AI (Meta), plus lots of academic labs. The rate of progress has been accelerating.  From 1960 - 2010 progress was incremental, and remained centered around narrow problems (chess) or toy problems.   Since 2015, progress has been very rapid, driven mainly by new hardware and big data.  Long-standing hard problems in ML/AI, such as go, image understanding, language translation, logical reasoning, etc. seem to fall on an almost monthly basis now, and huge amounts of money and intellect are being thrown at the field.  The rate of advance from 2015-2022 (only 7 years) has been phenomenal; given another 30, it's hard to imagine that we wouldn't reach an inflection point of some kind. I think the burden of proof is now on those who don't believe that 30 years is enough time to crack AGI.  You would have to postulate some fundamental difficulty, like finding out that the human brain is doing things that can't be done in silicon, that would somehow arrest the current rate of progress and lead to a new "AI winter." Historically,  AI researchers have often been overconfident.  But this time does feel different.
9Adam Jermyn1y
Among organizations, both OpenAI and DeepMind are aiming at AGI and seem confident they will get there. I don't know their internal timelines and don't know if they've stated them...

Great point! Though for what it's worth I didn't mean to be dismissive of the prediction, my main point is that the future has not yet been determined. As you indicate people can react to predictions of the future and end up on a different course.

I'm still forming my views and I don't think I'm well calibrated to state any probability with authority yet. My uncertainty still feels so high that I think my error bars would be too wide for my actual probability estimates to be useful. Some things I'm thinking about:

  • Forecasters are not that great at making forecasts more than 5 years out, according to Superforecasting IIRC, and I don't think AGI is going to happen within the next 5 years.
  • AGI has not been created yet, and it's possible that AI development gets derailed due to other factors, e.g.:
    • Political
... (read more)

Smart people were once afraid that overpopulation would lead to wide-scale famine. The future is hard to predict, and there are many possible scenarios for how things may play out, even in the scenario that AGI is unaligned. It seems dubious to assign a 100% probability to any outcome based on just thought experiments about things that might happen in the future, especially when there are so many unknowns. With so much uncertainty, it seems a little premature to take on a full-on doom frame.

6Donald Hobson1y
Yep. Concerned enough to start technical research on nitrogen fertilizer, selective breeding crops, etc. It might be fairer to put this in the "foreseen and prevented" basket, not the "nonsensical prediction of doom" basket.
4Rob Bensinger1y
Agreed that 'some smart people are really worried about AGI' is a really weak argument for worrying about AGI, on its own. If you're going to base your concern on deference, at the very least you need a more detailed model of what competencies are at work here, and why you don't think it's truth-conducive to defer to smart skeptics on this topic.

I agree with this, as stated; though I'm guessing your probability mass is much more spread out than mine, and that you mean to endorse something stronger than what I'd have in mind if I said "the future is hard to predict" or "there are many possible scenarios of how things may play out even in the scenario that AGI is unaligned". In particular, I think the long-term human-relevant outcomes are highly predictable if we build AGI systems and never align them: AGI systems end up steering the future to extremely low-value states, likely to optimize some simple goal that has no information content from human morality or human psychology. In that particular class of scenarios, I think there are a lot of extremely uncertain and unpredictable details (like 'what specific goal gets optimized' and 'how does the AGI go about taking control'), but we aren't equally uncertain about everything.

LessWrongers generally think that you shouldn't give 100% probability to anything. When you say "100%" here, I assume you're being hyperbolic; but I don't know what sort of real, calibrated probability you think you're arguing against here, so I don't know which of 99.9%, 99%, 95%, 90%, 80%, etc. you'd include in the reasonable range of views. What are your own rough probabilities, across the broad outcome categories you consider most likely?

If we were in a world where AGI is very likely to kill everyone, what present observations would you expect to have already made, that you haven't made in real life (thus giving Bayesian evidence that AGI is less likely to kill everyone)? What are some relatively-likely examples of future possible observations...
5Alex Vermillion1y
There's absolutely no need to assign "100% probability to any outcome" to be worried. I wear a seatbelt because I am afraid I might one day be in a car crash despite the fact that I've not been in one yet. I understand there is more to your point, but I found that segment pretty objectionable and obviously irrelevant.

Yann LeCun published a vision for how to build an autonomous system in February. Should folks have started considering alarm bells then? Have the recent results made LeCun's vision seem more plausible now than it did back in February?