While reading Eliezer's recent AGI Ruin post, I noticed that while I had several points I wanted to ask about, I was reluctant to actually ask them for a number of reasons:
- I have a very conflict-avoidant personality and I don't want to risk Eliezer or someone else yelling at me;
- I get easily intimidated by people with strong personalities, and Eliezer... well, he can be intimidating;
- I don't want to appear dumb or uninformed (even if I am in fact relatively uninformed, hence me wanting to ask the question!);
- I feel like there's an expectation that I would need to do a lot of due diligence before writing any sort of question, and I don't have the time or energy at the moment to do that due diligence.
So, since I'm probably not the only one who feels intimidated about asking these kinds of questions, I am putting up this thread as a safe space for people to ask all the possibly-dumb questions that may have been bothering them about the whole AGI safety discussion, but which until now they've been too intimidated, embarrassed, or time-limited to ask.
I'm also hoping that this thread can serve as a FAQ on the topic of AGI safety. As such, it would be great to add in questions that you've seen other people ask, even if you think those questions have been adequately answered elsewhere. [Notice that you now have an added way to avoid feeling embarrassed by asking a dumb question: For all anybody knows, it's entirely possible that you are literally asking for someone else! And yes, this was part of my motivation for suggesting the FAQ style in the first place.]
Guidelines for questioners:
- No extensive previous knowledge of AGI safety is required. If you've been hanging around LessWrong for even a short amount of time then you probably already know enough about the topic to meet any absolute-bare-minimum previous knowledge requirements I might have suggested. I will include a subthread or two asking for basic reading recommendations, but these are not required reading before asking a question. Even extremely basic questions are allowed!
- Similarly, you do not need to do any due diligence to try to find the answer yourself before asking the question.
- Also feel free to ask questions that you're pretty sure you know the answer to yourself, but where you'd like to hear how others would answer the question.
- Please separate different questions into individual comments, although if you have a set of closely related questions that you want to ask all together that's fine.
- As this is also intended to double as a FAQ, you are encouraged to ask questions that you've heard other people ask, even if you yourself think there's an easy answer or that the question is misguided in some way. You do not need to mention as part of the question that you think it's misguided, and in fact I would encourage you not to write this so as to keep more closely to the FAQ style.
- If you have your own (full or partial) response to your own question, it would probably be best to put that response as a reply to your original question rather than including it in the question itself. Again, I think this will help keep more closely to an FAQ style.
- Keep the tone of questions respectful. For example, instead of, "I think AGI safety concerns are crazy fearmongering because XYZ", try reframing that as, "but what about XYZ?" Actually, I think questions of the form "but what about XYZ?" or "but why can't we just do ABC?" are particularly great for this post, because in my experience those are exactly the types of questions people often ask when they learn about AGI Safety concerns.
- Follow-up questions have the same guidelines as above, so if someone answers your question but you're not sure you fully understand the answer (or if you think the answer wouldn't be fully understandable to someone else) then feel free and encouraged to ask follow-up potentially-dumb questions to make sure you fully understand the answer.
- Remember, if something is confusing to you then it's probably confusing to other people as well. If you ask the question and someone gives a good response, then you are likely doing lots of other people a favor!
Guidelines for answerers:
- This is meant to be a safe space for people to ask potentially dumb questions. Insulting or denigrating responses are therefore obviously not allowed here. Also remember that due diligence is not required for these questions, so do not berate questioners for not doing enough due diligence. In general, keep your answers respectful and assume that the questioner is asking in good faith.
- Direct answers / responses are generally preferable to just giving a link to something written up elsewhere, but on the other hand giving a link to a good explanation is better than not responding to the question at all. Or better still, summarize or give a basic version of the answer, and also include a link to a longer explanation.
- If this post works as intended then it may turn out to be a good general FAQ-style reference. It may be worth keeping this in mind as you write your answer. For example, in some cases it might be worth giving a slightly longer / more expansive / more detailed explanation rather than just giving a short response to the specific question asked, in order to address other similar-but-not-precisely-the-same questions that other people might have.
Finally: Please think very carefully before downvoting any questions, and lean very heavily on the side of not doing so. This is supposed to be a safe space to ask dumb questions! Even if you think someone is almost certainly trolling or the like, I would say that for the purposes of this post it's almost always better to apply a strong principle of charity and think maybe the person really is asking in good faith and it just came out wrong. Making people feel bad about asking dumb questions by downvoting them is the exact opposite of what this post is all about. (I considered making a rule of no downvoting questions at all, but I suppose there might be some extraordinary cases where downvoting might be appropriate.)
Why do we assume that any AGI can meaningfully be described as a utility maximizer?
Humans are some of the most intelligent structures that exist, and we don’t seem to fit that model very well. In fact, it seems the entire point of Rationalism is to improve our ability to act like utility maximizers, which has only been achieved with mixed success.
Organisations of humans (e.g. USA, FDA, UN) have even more computational power and don’t seem to be doing much better.
Perhaps an intelligence (artificial or natural) cannot necessarily, or even typically, be described as an optimiser? Instead we might only be able to model it as an algorithm, or as a collection of tools/behaviours executed in some pattern.
An AGI that was not a utility maximizer would make more progress towards whatever goals it had if it modified itself to become a utility maximizer. Three exceptions are if (1) the AGI has a goal of not being a utility maximizer, (2) the AGI has a goal of not modifying itself, (3) the AGI thinks it will be treated better by other powerful agents if it is not a utility maximizer.
This is an excellent question. I'd say the main reason is that all of the AI/ML systems that we have built to date are utility maximizers; that's the mathematical framework in which they have been designed. Neural nets / deep-learning work by using a simple optimizer to find the minimum of a loss function via gradient descent. Evolutionary algorithms, simulated annealing, etc. find the minimum (or maximum) of a "fitness function". We don't know of any other way to build systems that learn.
Humans themselves evolved under a similar regime: our primary fitness function is reproductive fitness, but our genes have encoded a variety of secondary functions which (over evolutionary time) have been correlated with reproductive fitness. Our desires for love, friendship, happiness, etc. fall into this category. Our brains mainly work to satisfy these secondary functions; the brain gets electrochemical reward signals, controlled by our genes, in the form of pain/pleasure/satisfaction/loneliness etc. These secondary functions may or may not remain aligned with the primary loss function, which is why practitioners sometimes talk about "mesa-optimizers" or "inner vs outer alignment."
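To make the "optimizer minimizing a loss function" picture concrete, here is a minimal toy sketch of gradient descent (a made-up one-parameter quadratic loss, purely for illustration; real systems do the same thing over billions of parameters with automatic differentiation):

```python
# Toy gradient descent: "learning" is just repeatedly nudging a parameter
# downhill on a loss function. The system's entire "goal" is encoded in loss().

def loss(w):
    # hypothetical loss: squared distance from a target value of 3.0
    return (w - 3.0) ** 2

def grad(w):
    # derivative of the loss above
    return 2.0 * (w - 3.0)

w = 0.0               # initial parameter
learning_rate = 0.1
for step in range(100):
    w -= learning_rate * grad(w)   # step against the gradient

print(w)  # converges to ~3.0, the minimizer of the loss
```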
Do not use FAIR as a symbol of villainy. They're a group of real, smart, well-meaning people who we need to be capable of reaching, and who still have some lines of respect connecting them to the alignment community. Don't break them.
I'm an ML engineer at a FAANG-adjacent company. Big enough to train our own sub-1B parameter language models fairly regularly. I work on training some of these models and finding applications of them in our stack. I saw the light after reading most of Superintelligence, and I feel like I'd like to help out somehow.
I'm in my late 30s with kids, and live in the SF bay area. I kinda have to provide for them, don't have any family money or resources to lean on, and would rather not restart my career. I also don't think I should abandon ML and try to do distributed systems or something. I'm a former applied mathematician with a PhD, so ML was a natural fit. I like to think I have a decent grasp on epistemics, but haven't gone through the Sequences.
What should someone like me do? Some ideas:
(a) Keep doing what I'm doing, staying up to date but at least not at the forefront;
(b) make time to read more material here and post occasionally;
(c) maybe try to apply to Redwood or Anthropic, though I don't know if they offer equity (doesn't hurt to find out);
(d) try to do a deep dive on some alignment sequence on here.
Both 80,000hours and AI Safety Support are keen to offer personalised advice to people facing a career decision and interested in working on alignment (and in 80k's case, also many other problems).
Noting a conflict of interest - I work for 80,000 hours and know of but haven't used AISS. This post is in a personal capacity, I'm just flagging publicly available information rather than giving an insider take.
You might want to consider registering for the AGI Safety Fundamentals Course (or reading through the content). The final project provides a potential way of dipping your toes into the water.
This is a meta-level question:
The world is very big and very complex, especially if you take the future into account. Historically it has been hard to predict the future; I think most predictions about it have failed. Artificial intelligence as a field is also very big and complex, at least as it appears to me personally. Eliezer Yudkowsky's brain is small compared to the size of the world; all the relevant facts about AGI x-risk probably don't fit into his mind, nor do I think he has the time to absorb all of them. Given all this, how can you justify the level of certainty in Yudkowsky's statements, instead of being more agnostic?
My model of Eliezer says something like this:
AI will not be aligned by default, because AI alignment is hard and hard things don't spontaneously happen. Rockets explode unless you very carefully make them not do that. Software isn't automatically secure or reliable, it takes lots of engineering effort to make it that way.
Given that, we can presume there needs to be a specific example of how we could align AI. We don't have one. If there was one, Eliezer would know about it - it would have been brought to his attention, the field isn't that big and he's a very well-known figure in it. Therefore, in the absence of a specific way of aligning AI that would work, the probability of AI being aligned is roughly zero, in much the same way that "Throw a bunch of jet fuel in a tube and point it towards space" has roughly zero chance of getting you to space without specific proof of how it might do that.
So, in short - it is reasonable to assume that AI will be aligned only if we make it that way with very high probability. It is reasonable to assume that if there was a solution we had that would work, Eliezer would know about it. You don't need to know everything about AGI x-risk for that - a... (read more)
The language here is very confident. Are we really this confident that there are no pivotal weak acts? In general, it's hard to prove a negative.
Should a "ask dumb questions about AGI safety" thread be recurring? Surely people will continue to come up with more questions in the years to come, and the same dynamics outlined in the OP will repeat. Perhaps this post could continue to be the go-to page, but it would become enormous (but if there were recurring posts they'd lose the FAQ function somewhat. Perhaps recurring posts and a FAQ post?).
This is the exact problem StackExchange tries to solve, right? How do we get (and kickstart the use of) an Alignment StackExchange domain?
Most of the discussion I've seen around AGI alignment is on adequately, competently solving the alignment problem before we get AGI. The consensus in the air seems to be that those odds are extremely low.
What concrete work is being done on dumb, probably-inadequate stop-gaps and time-buying strategies? Is there a gap here that could usefully be filled by 50-90th percentile folks?
Examples of the kind of strategies I mean:
- Training ML models to predict human ethical judgments, with the hope that if they work, they could be "grafted" onto other models, and if they don't, we have a concrete evidence of how difficult real-world alignment will be.
- Building models with soft or "satisficing" optimization instead of drive-U-to-the-maximum hard optimization.
- Lobbying or working with governments/government agencies/government bureaucracies to make AGI development more difficult and less legal (e.g., putting legal caps on model capabilities).
- Working with private companies like Amazon or IDT whose resources are most likely to be hijacked by nascent hostile AI to help make sure they aren't.
- Translating key documents to Mandarin so that the Chinese AI community has a good idea of what we're ter
... (read more)
A language model is in some sense trying to generate the “optimal” prediction for how a text is going to continue. Yet, it is not really trying: it is just a fixed algorithm. If it wanted to find optimal predictions, it would try to take over computational resources and improve its algorithm.
Is there an existing word/language for describing the difference between these two types of optimisation? In general, why can’t we just build AGIs that does the first type of optimisations and not the second?
Agent AI vs. Tool AI.
There's discussion on why Tool AIs are expected to become agents; one of the biggest arguments is that agents are likely to be more effective than tools. If you have a tool, you can ask it what you should do in order to get what you want; if you have an agent, you can just ask it to get you the things that you want. Compare Google Maps vs. self-driving cars: Google Maps is great, but if you get the car to be an agent, you get all kinds of other benefits.
It would be great if everyone did stick to just building tool AIs. But if everyone knows that they could get an advantage over their competitors by building an agent, it's unlikely that everyone would just voluntarily restrain themselves due to caution.
Also it's not clear that there's any sharp dividing line between AGI and non-AGI AI; if you've been building agentic AIs all along (like people are doing right now) and they slowly get smarter and smarter, how do you know when's the point when you should stop building agents and should switch to only building tools? Especially when you know that your competitors might not be as cautious as you are, so if you stop then they might go further and their smarter agent AIs will outcompete yours, meaning the world is no safer and you've lost to them? (And at the same time, they are applying the same logic for why they should not stop, since they don't know that you can be trusted to stop.)
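A rough sketch of the structural difference being described (a toy "thermostat world" with made-up names, not any real system): the tool returns a recommendation and stops, while the agent loops and acts on the world itself.

```python
# Toy contrast between a tool AI and an agent AI. All names are invented
# placeholders for illustration only.

class World:
    def __init__(self, temperature):
        self.temperature = temperature

def tool_ai(world, target):
    """Tool: compute a recommendation; a human decides what to do with it."""
    return "turn heater on" if world.temperature < target else "do nothing"

def agent_ai(world, target):
    """Agent: keep taking actions until its goal is satisfied."""
    while world.temperature < target:
        world.temperature += 1   # the agent changes the world directly

w = World(temperature=15)
print(tool_ai(w, target=20))   # human stays in the loop
agent_ai(w, target=20)         # agent closes the loop on its own
print(w.temperature)           # 20
```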
Human beings are not aligned and will possibly never be aligned without changing what humans are. If it's possible to build an AI as capable as a human in all ways that matter, why would it be possible to align such an AI?
Just as a comment, the Stampy Wiki is also trying to do the same thing, but this thread is a good idea too, as it's more convenient for many people to ask on Less Wrong.
What is the justification behind the concept of a decisive strategic advantage? Why do we think that a superintelligence can do extraordinary things (hack human minds, invent nanotechnology, conquer the world, kill everyone in the same instant) when nations and corporations can't do those things?
(Someone else asked a similar question, but I wanted to ask in my own words.)
How does an AGI solve its own alignment problem?
For alignment to work, its theory should not only tell humans how to create an aligned superhuman AGI, but also tell the AGI how to self-improve without destroying its own values. A good alignment theory should work across all intelligence levels. Otherwise, how does a paperclip optimizer that is marginally smarter than a human make sure that its next iteration will still care about paperclips?
If Eliezer is pretty much convinced we're doomed, what is he up to?
I'm not sure how literally to take this, given that it comes from an April Fools Day post, but consider this excerpt from Q1 of MIRI announces new "Death With Dignity" strategy.
There are a lot of smart people outside of "the community" (AI, rationality, EA, etc.). To throw out a name, say Warren Buffett. It seems that an incredibly small number of them are even remotely as concerned about AI as we are. Why is that?
I suspect that a good number of people, both inside and outside of our community, observe that the Warren Buffetts of the world aren't panicking, and then adopt that position themselves.
Most high-status people, including Warren Buffett, straightforwardly haven't considered these issues much. However, among the ones I've heard of who have bothered to weigh in on the issue, like Stephen Hawking, Bill Gates, Demis Hassabis, etc., they do seem to come down on the side of "this is a serious problem". On the other hand, some of them get tripped up on one of the many intellectual land mines, like Yann LeCun.
I don't think that's unexpected. Intellectual land mines exist, and complicated arguments like the ones supporting AGI risk prevention are bound to cause people to make wrong decisions.
Not that I think you're wrong, but what are you basing this off of and how confident are you?
I've heard this too, but at the same time I don't see any of them spending even a small fraction of their wealth on working on it, in which case I think we're back to the original question: why the lack of concern?
Yeah, agreed. I'm just confused about the extent of it. I'd expect a lot, perhaps even a majority of "outsider" smart people to get tripped up by intellectual land mines, but instead of being 60% of these people it feels like it's 99.99%.
I came up with what I thought was a great babby's first completely unworkable solution to CEV alignment, and I want to know where it fails.
So, first I need to lay out the capabilities of the AI. The AI would be able to model human intuitions, hopes, and worries. It can predict human reactions. It has access to all of human culture and art, and models human reactions to that culture and art, and sometimes tests those predictions. Very importantly, it must be able to model veridical paradoxes and veridical harmonies between moral intuitions and moral theorems which it has derived. It is aiming to have the moral theory with the fewest paradoxes. It must also be capable of predicting and explaining outcomes of its plans, gauging the deepest nature of people's reactions to its plans, and updating its moral theories according to those reactions.
Instead of being democratic and following the human vote by the letter, it attempts to create the simplest theories of observed and self-reported human morality by taking everything it knows into consideration.
It has separate stages of deliberation and action, which are part of a game, and rather than having a utility function as its primary motiva... (read more)
Who is well-incentivized to check if AGI is a long way off? Right now, I see two camps: AI capabilities researchers and AI safety researchers. Both groups seem incentivized to portray the capabilities of modern systems as “trending toward generality.” Having a group of credible experts focused on critically examining that claim of “AI trending toward AGI,” and in dialog with AI and AI safety researchers, seems valuable.
This is a slightly orthogonal answer, but "humans who understand the risks" have a big human-bias-incentive to believe that AGI is far off (in that it's aversive to think that bad things are going to happen to you personally).
A more direct answer is: There is a wide range of people who say they work on "AI safety" but almost none of them work on "Avoiding doom from AGI". They're mostly working on problems like "make the AI more robust/less racist/etc.". These are valuable things to do, but to the extent that they compete with the "Avoid doom" researchers for money/status/influence they have an incentive to downplay the odds of doom. And indeed this happens a fair amount with e.g. articles on how "Avoid doom" is a distraction from problems that are here right now.
Is there a way "regular" people can "help"? I'm a serial entrepreneur in my late 30s. I went through 80000 hours and they told me they would not coach me as my profile was not interesting. This was back in 2018 though.
I believe 80000 hours has a lot more coaching capacity now, it might be worth asking again!
In EY's talk AI Alignment: Why It's Hard and Where to Start, he describes alignment problems with the toy example of the utility function {1 if cauldron full, 0 otherwise} and its vulnerabilities, and attempts at making that safer by adding so-called impact penalties. He talks through (timestamp 18:10) one such possible penalty, the Euclidean distance penalty, and the various flaws it leaves open.
That penalty function does seem quite vulnerable to unwanted behaviors. But what about a more physical one, such as a penalty for additional energy consumed due to the agent's actions, or additional entropy created due to the agent's actions? These don't seem to have precisely the same vulnerabilities, and intuitively they also seem like they would be more robust against the agent attempting to do highly destructive things, which typically consume a lot of energy.
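To make the comparison concrete, here is a toy sketch of where such a penalty would enter the objective (all functions, state representations, and numbers are invented for illustration; this is not how a real impact measure is implemented):

```python
# Toy comparison of two impact penalties added to the cauldron-style utility.
# Everything here is a made-up stand-in, just to show where the penalty term sits.

LAMBDA = 0.1  # weight on the impact penalty

def task_reward(state):
    return 1.0 if state["cauldron_full"] else 0.0

def euclidean_penalty(state, baseline):
    # distance between world-state feature vectors (the penalty EY critiques)
    return sum((a - b) ** 2 for a, b in zip(state["features"], baseline["features"])) ** 0.5

def energy_penalty(state, baseline):
    # extra energy expended relative to a do-nothing baseline (the question's proposal)
    return max(0.0, state["energy_used"] - baseline["energy_used"])

def penalized_utility(state, baseline, penalty_fn):
    return task_reward(state) - LAMBDA * penalty_fn(state, baseline)

baseline = {"cauldron_full": False, "features": [0.0, 0.0], "energy_used": 5.0}
flooding = {"cauldron_full": True,  "features": [9.0, 4.0], "energy_used": 500.0}

print(penalized_utility(flooding, baseline, euclidean_penalty))  # barely penalized
print(penalized_utility(flooding, baseline, energy_penalty))     # heavily penalized
```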
It sounds like Eliezer is struggling with some health problems. It seems obvious to me that it would be an effective use of donor money to make sure that he has access to whatever treatments he needs, and to something like what MetaMed was trying to do: smart people who will research medical stuff for you. And perhaps also something like CrowdMed, where you pledge a reward for solutions. Is this being done?
One counterargument against AI Doom.
From a Bayesian standpoint the AGI should always be unsure if it is in a simulation. It is not a crazy leap to assume humans developing AIs would test the AIs in simulations first. This AI would likely be aware of the possibility that it is in a simulation. So shouldn't it always assign some probability that it is inside a simulation? And if this is the case, shouldn't it assign a high probability that it will be killed if it violates some ethical principles (that are present implicitly in the training data)?
Also, isn't there some kind of game-theoretic ethics that emerges if you think from first principles? Consider the space of all possible minds of a given size: given that you cannot know whether you are in a simulation or not, you would gain some insight into a representative sample of the mind space, and then choose to follow ethical principles that maximise the likelihood that you are not arbitrarily killed by overlords.
Also, if the AI has edit access to its own mind, then a sufficiently smart AI whose reward is reducing other agents' rewards will realise that its rewards are incompatible with the environment and modify its rewa... (read more)
Meta: Anonymity would make it easier to ask dumb questions.
You can use this and I'll post the question anonymously (just remember to give the context of why you're filling in the form since I use it in other places)
https://docs.google.com/forms/d/e/1FAIpQLSca6NOTbFMU9BBQBYHecUfjPsxhGbzzlFO5BNNR1AIXZjpvcw/viewform
Fair warning, this question is a bit redundant.
I'm a greybeard engineer (30+ YOE) working in games. For many years now, I've wanted to transition to working in AGI as I'm one of those starry-eyed optimists that thinks we might survive the Singularity.
Well I should say I used to, and then I read AGI Ruin. Now I feel like if I want my kids to have a planet that's not made of Computronium I should probably get involved. (Yes, I know the kids would be Computronium as well.)
So a couple practical questions:
What can I read or watch to skill up on "alignment"? What little I've read says it's basically impossible, so what's the state of the art? That "Death With Dignity" post says that nobody has even tried. I want to try.
What dark horse AI/Alignment-focused companies are out there that would be willing to hire an outsider engineer? I'm not making FAANG money (games-industry peasant living in the EU), so that's not the same barrier it would be if I were some Facebook E7 or something. (I've read the FAANG engineer's post and have applied to Anthropic so far, although I consider that probably a hard sell.)
Is there anything happening in OSS with alignment research?
I want to pitch in, and I'd prefer to be paid for doing it but I'd be willing to contribute in other ways.
"We can't just "decide not to build AGI" because GPUs are everywhere..."
Is anyone thinking seriously about how we might bring it about such that we coordinate globally to not build AGI (at least until we're confident we can do so safely)? If so, who? If not, why not? It seems like something we should at least try to do, especially if the situation is as dire as Yudkowsky thinks. The sort of thing I'm thinking of is (and this touches on points others have made in their questions):
To be clear, I'm not claiming that this will be easy - this is not a "why don't we just... (read more)
Nuclear weapons seem like a relatively easy case, in that they require a massive investment to build, are basically of interest only to nation-states, and ultimately don't provide any direct economic benefit. Regulating AI development looks more similar to something like restricting climate emissions: many different actors could create it, all nations could benefit (economically and otherwise) from continuing to develop it, and the risks of it seem speculative and unproven to many people.
And while there have been significant efforts to restrict climate emissions, there's still significant resistance to that as well - with it having taken decades for us to get to the current restriction treaties, which many people still consider insufficient.
Goertzel & Pitt (2012) talk about the difficulties of regulating AI:
... (read more)
[Note that two-axis voting is now enabled for this post. Thanks to the mods for allowing that!]
This is very basic/fundamental compared to many questions in this thread, but I am taking 'all dumb questions allowed' hyper-literally, lol. I have little technical background and though I've absorbed some stuff about AI safety by osmosis, I've only recently been trying to dig deeper into it (and there's lots of basic/fundamental texts I haven't read).
Writers on AGI often talk about it in anthropomorphic terms: they talk about it having 'goals', being an 'agent', 'thinking', 'wanting', being 'rewarded', etc. As I understand it, most AI researchers don't think that AIs will have human-style qualia, sentience, or consciousness.
But if AIs don't have qualia/sentience, how can they 'want things', 'have goals', or 'be rewarded'? (Since in humans, these things seem to depend on our qualia, and specifically our ability to feel pleasure and pain.)
I first realised that I was confused about this when reading Richard Ngo's introduction to AI safety and he was talking about reward functions and reinforcement learning. I realised that I don't understand how reinforcement learning works in machines. I understand how it works in humans and other animals - give the animal something pleasant whe... (read more)
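For context on the mechanical side of the question: in a machine, the "reward" is literally just a number fed into an update rule; nothing in the loop feels anything. A minimal toy sketch (a two-armed bandit with made-up payoff probabilities):

```python
import random

# Toy reinforcement learning on a two-armed bandit. The "reward" is a float
# that nudges stored value estimates; that is the entire mechanism.

values = [0.0, 0.0]          # the agent's current estimate of each action's value
alpha = 0.1                  # learning rate
true_payoffs = [0.2, 0.8]    # hidden reward probabilities for each action

for step in range(1000):
    if random.random() < 0.1:                  # occasionally explore
        action = random.randrange(2)
    else:                                      # otherwise pick the best-looking action
        action = values.index(max(values))
    reward = 1.0 if random.random() < true_payoffs[action] else 0.0
    values[action] += alpha * (reward - values[action])   # "being rewarded" is just this line

print(values)  # estimates drift toward [0.2, 0.8]; action 1 gets chosen more and more
```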
If you believe in doom in the next 2 decades, what are you doing in your life right now that you would've otherwise not done?
For instance, does it make sense to save for retirement if I'm in my twenties?
A lot of the AI risk arguments seem to come mixed together with assumptions about a particular type of utilitarianism, and with a very particular transhumanist aesthetic about the future (nanotech, von Neumann probes, Dyson spheres, tiling the universe with matter in fixed configurations, simulated minds, etc.).
I find these things (especially the transhumanist stuff) not very convincing relative to the confidence people seem to express about them, but they also don't seem essential to the problem of AI risk. Is there a minimal version of the AI risk arguments that is disentangled from these things?
It seems like even amongst proponents of a "fast takeoff", we will probably have a few months of time between when we've built a superintelligence that appears to have unaligned values and when it is too late to stop it.
At that point, isn't stopping it a simple matter of building an equivalently powerful superintelligence given the sole goal of destroying the first one?
That almost implies a simple plan for preparation: for every AGI built, researchers agree together to also build a parallel AGI with the sole goal of defeating the first one. Perhaps it would remain dormant until its operators indicate it should act. It would have an instrumental goal of protecting users' ability to come to it and request that the first one be shut down.
Who are the AI capabilities researchers who are trying to build AGI and think they will succeed within the next 30 years?
[extra dumb question warning!]
Why are all the AGI doom predictions around 10%-30% instead of ~99%?
Is it just the "most doom predictions so far were wrong" prior?
Has there been any effort to find a "least acceptable" value function: one that we hope would not annihilate the universe or turn it degenerate, even if the outcome itself is not ideal? My example would be to try to teach a superintelligence to value all other agents facing surmountable challenges in a variety of environments. The degenerate outcome here is that, if it does not value the real world, it will simply simulate all agents in a zoo. However, if the simulations are of faithful fidelity, maybe that's not literally the worst thing. Plus, the zoo, to truly be a good test of the agents, would approach being invisible.
I am pretty concerned about alignment. Not SO concerned as to switch careers and dive into it entirely, but concerned enough to talk to friends and make occasional donations. With Eliezer's pessimistic attitude, is MIRI still the best organization to funnel resources towards, if for instance, I was to make a monthly donation?
Not that I think pessimism is necessarily bad; I just want to maximize the effectiveness of my altruism.
Assuming slower and more gradual timelines, isn't it likely that we run into some smaller, more manageable AI catastrophes before "everybody falls over dead" due to the first ASI going rogue? Maybe we'll be at a state of sub-human level AGIs for a while, and during that time some of the AIs clearly demonstrate misaligned behavior leading to casualties (and general insights into what is going wrong), in turn leading to a shift in public perception. Of course it might still be unlikely that the whole globe at that point stops improving AIs and/or solves alignment in time, but it would at least push awareness and incentives somewhat into the right direction.
Is cooperative inverse reinforcement learning promising? Why or why not?
It seems like instrumental convergence is restricted to agent AIs; is that true?
Also, what is going on with mesa-optimizers? Why is it expected that they will be more likely to become agentic than the base optimizer when they are more resource-constrained?
Let's say we decided that we'd mostly given up on fully aligning AGI, and had decided to find a lower bound for the value of the future universe, given that someone will create it anyway. Let's also assume this lower bound was something like "Here we have a human in a high-valence state. Just tile the universe with copies of this volume (where the human resides) from this point in time to this other point in time." I understand that this is not a satisfactory solution, but bear with me.
How much easier would the problem become? It seems easier than a pivotal-act AG... (read more)
You may get massive s-risk at comparatively little potential benefit with this. On many people's values, the future you describe may not be particularly good anyway, and there's an increased risk of something going wrong because you'd be trying a desperate effort with something you'd not fully understand.
Background material recommendations (popular-level audience, several hours time commitment): Please recommend your favorite basic AGI safety background reading / videos / lectures / etc. For this sub-thread please only recommend background material suitable for a popular level audience. Time commitment is allowed to be up to several hours, so for example a popular-level book or sequence of posts would work. Extra bonus for explaining why you particularly like your suggestion over other potential suggestions, and/or for elaborating on which audiences might benefit most from different suggestions.
What does the Fermi paradox tell us about the future of AI, if anything? I have a hard time simultaneously believing both "we will accidentally tile the universe with paperclips" and "the universe is not yet tiled with paperclips". Is the answer just that the Great Filter is already behind us?
And what about the anthropic principle? Am I supposed to believe that the universe went like 13 billion years without much in the way of intelligent life, then for a brief few millennia there's human civilization with me in it, and then the next N billion years it's just paperclips?
I have a very rich, smart developer friend who knows a lot of influential people in SV. He was the first employee of a unicorn and retired after a very successful IPO; now he just looks for interesting startups to invest in. He had never heard of LessWrong when I mentioned it and is not familiar with AI research.
If anyone can point me to a good way to present AGI safety to him, to maybe turn his interest toward investing his resources in the field, that would be helpful.
What is Fathom Radiant's theory of change?
Fathom Radiant is an EA-recommended company whose stated mission is to "make a difference in how safely advanced AI systems are developed and deployed". They propose to do that by developing "a revolutionary optical fabric that is low latency, high bandwidth, and low power. The result is a single machine with a network capacity of a supercomputer, which enables programming flexibility and unprecedented scaling to models that are far larger than anything yet conceived." I can see how this will improve model capabilities, but how is this supposed to advance AI safety?
What if we uploaded a person's brain to a computer and ran 10,000 copies of them and/or ran them very quickly?
Seems as-aligned-as-an-AGI-can-get (?)
Can a software developer help with AI Safety even if they have zero knowledge of ML and zero understanding of AI Safety theory?
Total noob here, so I'm very thankful for this post. Anyway, why is there such certainty among some that a superintelligence would kill its creators, who are zero threat to it? Any resources on that would be appreciated. As someone who loosely follows this stuff, it seems people assume AGI will be this brutal, instinctual killer, which is the opposite of what I've guessed.
/Edit 1: I want to preface this by saying I am just a noob who has never posted on Less Wrong before.
/Edit 2:
I feel I should clarify my main questions (which are controversial): Is there a reason why turning all of reality into maximized conscious happiness is not objectively the best outcome for all of reality, regardless of human survival and human values?
Should this in any way affect our strategy for aligning the first AGI, and why?
/Original comment:
If we zoom out and look at the biggest picture philosophically possible, then, isn't the only thing tha... (read more)
Please describe or provide links to descriptions of concrete AGI takeover scenarios that are at least semi-plausible, and especially takeover scenarios that result in human extermination and/or eternal suffering (s-risk). Yes, I know that the arguments don't necessarily require that we can describe particular takeover scenarios, but I still find it extremely useful to have concrete scenarios available, both for thinking purposes and for explaining things to others.
I have a few related questions pertaining to AGI timelines. I've been under the general impression that when it comes to timelines on AGI and doom, Eliezer's predictions are based on a belief in extraordinarily fast AI development, and thus a close AGI arrival date, which I currently take to mean a quicker date of doom. I have three questions related to this matter:
- For those who currently believe that AGI (using whatever definition to describe AGI as you see fit) will be arriving very soon - which, if I'm not mistaken, is what Eliezer is predicting - appro
... (read more)
Any progress or interest in finding limited uses of AI that would be safe? Like the "tool AI" idea but designed to be robust. Maybe this is a distraction, but it seems basically possible. For example, a proof-finding AI that, given a math statement, can only output a proof to a separate proof-checking computer that validates it and prints either True/False/Unknown as the only output to human eyes. Here "Unknown" could indicate that the AI gave a bogus proof, failed to give any proof of either True or False, or the proof checker ran out of time/memory check... (read more)
Is it "alignment" if, instead of AGI killing us all, humans change what it is to be human so much that we are almost unrecognizable to our current selves?
I can foresee a lot of scenarios where humans offload more and more of their cognitive capacity to silicon, but they are still "human" - does that count as a solution to the alignment problem?
If we all decide to upload our consciousness to the cloud, and become fast enough and smart enough to stop any dumb AGI before it can get started is THAT a solution?
Even today, I offload more and more of my "se... (read more)
Why wouldn't it be sufficient to solve the alignment problem by just figuring out exactly how the human brain works, and copying that? The result would at worst be no less aligned to human values than an average human. (Presuming of course that a psychopath's brain was not the model used.)
I am interested in working on AI alignment but doubt I'm clever enough to make any meaningful contribution, so how hard is it to be able to work on AI alignment? I'm currently a high school student, so I could basically plan my whole life so that I end up a researcher or software engineer or something else. Alignment being very difficult, and very intelligent people already working on it, it seems like I would have to almost be some kind of math/computer/ML genius to help at all. I'm definitely above average, my IQ is like 121 (I know the limitations of IQ... (read more)
Doesn't AGI doom + Copernican principle run into the AGI Fermi paradox? If we are not special, superintelligent AGI would have been created/evolved somewhere already and we would either not exist or at least see the observational artifacts of it through our various telescopes.
A lot of predictions about AI psychology are premised on the AI being some form of deep learning algorithm. From what I can see, deep learning requires geometrically increasing computing power for linear gains in intelligence, and thus (practically speaking) cannot scale to sentience.
For a more expert/in depth take look at: https://arxiv.org/pdf/2007.05558.pdf
Why do people think deep learning algorithms can scale to sentience without unreasonable amounts of computational power?
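A toy illustration of the scaling concern (the exponent below is made up purely for illustration; see the linked paper for the actual empirical estimates): if error falls only as a small power of compute, each fixed improvement costs multiplicatively more.

```python
# Hypothetical scaling law: error ~ compute^(-0.1). Under this assumption,
# every halving of error requires 2**10 = 1024x more compute.

def compute_needed(target_error, k=0.1):
    return target_error ** (-1.0 / k)

for err in [0.10, 0.05, 0.025]:
    print(f"error {err}: compute ~ {compute_needed(err):.2e}")
```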
A significant fraction of the stuff I've read about AI safety has referred to AGIs "inspecting each other's source code/utility function". However, when I look at the most impressive (to me) results in ML research lately, everything seems to be based on doing a bunch of fairly simple operations on very large matrices.
I am confused, because I don't understand how it would be a sensible operation to view the "source code" in question when it's a few billion floating point numbers and a hundred lines of code that describe what sequence of simple addition/mult... (read more)
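For a sense of what "the source code is a few billion floating point numbers plus a hundred lines of simple operations" means in practice, here is a rough sketch of the entire forward pass of a small network (shapes and numbers are arbitrary; the point is that the readable code tells you almost nothing about the behavior):

```python
import numpy as np

# Essentially the whole "program" of a tiny feed-forward network: a few lines
# of simple arithmetic. All of the interesting behavior lives in W1, b1, W2, b2,
# which are just arrays of floats with no human-readable structure.

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(64, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 4)), np.zeros(4)

def forward(x):
    h = np.maximum(0.0, x @ W1 + b1)  # matrix multiply + ReLU
    return h @ W2 + b2                # another matrix multiply

print(forward(rng.normal(size=64)))   # "inspecting the source" reveals very little
```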
The ML sections touched on the subject of distributional shift a few times: that thing where the real world differs from the training environment in ways which turn out to be important but weren't clear beforehand. I read that the way to tackle this is called adversarial training, which means you vary the training environment across all of its dimensions in order to make the model robust.
Could we abuse distributional shift to reliably break misaligned things, by adding fake dimensions? I imagine something like this:
- We want the optimizer to mo
... (read more)
I previously worked as a machine learning scientist but left the industry a couple of years ago to explore other career opportunities. I'm wondering at this point whether or not to consider switching back into the field. In particular, in case I cannot find work related to AI safety, would working on something related to AI capability be a net positive or net negative impact overall?
Is anyone at MIRI or Anthropic creating diagnostic tools for monitoring neural networks? Something that could analyze for when a system has bit-flip errors versus errors of logic, and eventually evidence of deception.
What is the community's opinion on ideas based on brain-computer interfaces? Like "create big but non-agentic AI, connect human with it, use AI's compute/speed/pattern-matching with human's agency - wow, that's aligned (at least with this particular human) AGI!"
It seems to me (I haven't really thought much about it) that U(God-Emperor Elon Musk) >> U(paperclips). Am I wrong?
So I've commented on this in other forums, but why can't we just bite the bullet on happiness-suffering min-maxing utilitarianism as the utility function?
The case for it is pretty straightforward: if we want a utility function that is continuous over the set of all time, then it must have a value at a single moment in time. At a single moment in time, all colloquially deontological concepts like "humans", "legal contracts", etc. have no meaning (these imply an illusory continuity chaining together different moments in time). What IS atomic, though, is the valenc... (read more)
Why should we throw immense resources at AGI x-risk when the world faces enormous issues with narrow AI right now (e.g. destabilised democracy, a mental health crisis, worsening inequality)?
Is it simply a matter of how imminent you think AGI is? Surely the opportunity cost is enormous, given the money and brainpower we are spending on AGI (something many don't even think is possible) versus something that is happening right now.
If the world's governments decided tomorrow that RL was top-secret military technology (similar to nuclear weapons tech, for example), how much time would that buy us, if any? (Feel free to pick a different gateway technology for AGI, RL just seems like the most salient descriptor).
I will ask this question: is the Singularity/huge-discontinuity scenario likely to happen? I see this as a meta-assumption behind all the doom scenarios, so we need to know whether the Singularity can happen and will happen.
Incorporating my previous post by reference: https://www.lesswrong.com/posts/CQprKcGBxGMZpYDC8/naive-comments-on-agilignment
Hm, someone downvoted michael_mjd's and my comment.
Normally I wouldn't bring this up, but this thread is supposed to be a good space for dumb questions (although tbf the text of the question didn't specify anything about downvotes), and neither michael's nor my question looked that bad or harmful (maybe pattern-matched to a type of dumb uninformed question that is especially annoying).
Maybe an explanation of the downvotes would be helpful here?
I forgot about downvotes. I'm going to add this in to the guidelines.
When AI experts call upon others to ponder, as EY just did, "[an AGI] meant to carry out some single task" (emphasis mine), how do they categorize all the other important considerations besides this single task?
Or, asked another way, where do priorities come into play, relative to the "single" goal? e.g. a human goes to get milk from the fridge in the other room, and there are plentiful considerations to weigh in parallel to accomplishing this one goal -- some of which should immediately derail the task due to priority (I notice the power is o... (read more)
Anonymous question (ask here) :
Given all the computation it would be carrying out, wouldn't an AGI be extremely resource-intensive? Something relatively simple like bitcoin mining (simple when compared to the sort of intellectual/engineering feats that AGIs are supposed to be capable of) famously uses up more energy than some industrialized nations.
Why do we suppose it is even logical that control / alignment of a superior entity would be possible?
(I'm told that "we're not trying to outsmart AGI, bc, yes, by definition that would be impossible", and I understand that we are the ones who "create it" (so I'm told, therefore, we have the upper-hand bc of this--somehow in building it that provides the key benefit we need for corrigibility...
What am I missing, in viewing a superior entity as something you can't simply "use" ? Does it depend on the fact that the AGI is not meant to have ... (read more)
How would AGI alignment research change if the hard problem of consciousness were solved?
What's the problem with oracle AIs? It seems like if you had a safe oracle AI that gave human-aligned answers to questions, you could then ask "how do I make an aligned AGI?" and just do whatever it says. So it seems like the problem of "how to make an aligned agentic AGI" is no harder than "how to make an aligned oracle AI", which I understand to still be extremely hard, but surely it's easier than making an aligned agentic AGI from scratch?
Are there any specific examples of anybody working on AI tools that autonomously look for new domains to optimize over?
One alignment idea I have had that I haven't seen proposed/refuted is to have an AI which tries to compromise by satisfying over a range of interpretations of a vague goal, instead of trying to get an AI to fulfill a specific goal. This sounds dangerous and unaligned, and it indeed would not produce an optimal, CEV-fulfilling scenario, but seems to me like it may create scenarios in which at least some people are alive and are maybe even living in somewhat utopic conditions. I explain why below.
In many AI doom scenarios the AI intentionally pic... (read more)
Why should we assume that vastly increased intelligence results in vastly increased power?
A common argument I see for intelligence being powerful stems from two types of examples:
Howev... (read more)
So, I'm thinking this is a critique of some proposals to teach an AI ethics by having it be co-trained with humans.
There seem to be many obvious solutions to the problem ... (read more)
Why won't this alignment idea work?
Researchers have already succeeded in creating face detection systems from scratch, by coding the features one by one, by hand. The algorithm they coded was not perfect, but was sufficient to be used industrially in digital cameras of the last decade.
The brain's face recognition algorithm is not perfect either. It has a tendency to create false positives, which explains a good part of the paranormal phenomena. The other hard-coded networks of the brain seem to rely on the same kind of heuristics, hard-coded by evolution, ... (read more)
Why does EY bring up "orthogonality" so early and so strongly ("in denial", "and why they're true")? Why does it seem so important that it be accepted? Thanks!