Denreik — LessWrong

> 1. writing programs that evaluate actions they could take in terms of how well it could achieve some goal and choose the best one
>
> In way 1, it seems like your AI "wants" to achieve its goal in the relevant sense.
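To make "way 1" concrete, here is a minimal Python sketch. It assumes the goal can be expressed as a numeric score over a small set of candidate actions. The names `choose_action` and `temperature_score` and the thermostat example are hypothetical illustrations, not something from the discussion. In this setup, whoever built the program writes the goal explicitly into the scoring function.

```python
from typing import Callable, Iterable, TypeVar

Action = TypeVar("Action")


def choose_action(actions: Iterable[Action], score: Callable[[Action], float]) -> Action:
    """Return the candidate action with the highest predicted goal score."""
    # `score` stands in for whatever model predicts how well an action serves the goal.
    return max(actions, key=score)


# Hypothetical toy goal: bring a room's temperature as close to 21 degrees as possible.
current_temp = 18.0
effects = {"heat": +2.0, "cool": -2.0, "wait": 0.0}


def temperature_score(action: str) -> float:
    predicted = current_temp + effects[action]
    return -abs(predicted - 21.0)  # closer to the target means a higher score


best = choose_action(effects, temperature_score)  # iterating a dict yields its keys
print(best)  # prints "heat"
```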

Not sure if I understood correctly, but I think the first point just comes down to "we give the AI a goal/goals". If we develop some mechanism for instructing an AI's actions, then we're still giving it a goal, even if that goal comes via some other program that tells it what its goals are at the moment in relation to whatever parameters. My original point was to contrast an AI having a goal or goals as some emergent property of large neural networks with us humans giving it goals one way or the other.

> 2. take a big neural network and jiggle the numbers that define it until it starts doing some task we pre-designated.
>
> In way 2, it seems like for hard enough goals, probably the only way to achieve them is to be thinking about how to achieve them and picking actions that succeed - or to somehow be doing cognition that leads to similar outcomes (like being sure to think about how well you're doing at stuff, how to manage resources, etc.).
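A toy version of "way 2" might look like the following Python sketch. It is a simplified stand-in under stated assumptions: real neural networks are trained with gradient descent rather than random perturbation, and the "network" here is just two hypothetical numbers `w` and `b` fitted to a made-up task, y = 2x + 1. A random "jiggle" is kept only when it lowers the loss, so the task is specified from outside, through the loss function.

```python
import random


def loss(params, data):
    """Mean squared error of the line w*x + b on (x, y) pairs."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def jiggle_until_good(data, steps=5000, step_size=0.1, seed=0):
    """Randomly perturb the parameters, keeping a perturbation only if it lowers the loss."""
    rng = random.Random(seed)
    params = [0.0, 0.0]  # start from an "untrained" model
    best_loss = loss(params, data)
    for _ in range(steps):
        candidate = [p + rng.gauss(0.0, step_size) for p in params]
        candidate_loss = loss(candidate, data)
        if candidate_loss < best_loss:
            params, best_loss = candidate, candidate_loss
    return params, best_loss


# Hypothetical pre-designated task: reproduce y = 2x + 1 on a few points.
data = [(x, 2 * x + 1) for x in range(-3, 4)]
params, final_loss = jiggle_until_good(data)
print(params, final_loss)
```

This is random-search hill climbing, chosen only because it matches the "jiggle the numbers" description literally.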

Do you mean that we train something like a specialized neural network with a specific goal in mind, and that it gains higher reasoning which would set it on the path of pursuing that goal? That would still be us giving it a direct goal. Or do you mean that neural networks would develop an indirect goal as a side product of training conditions or via some hidden variable?

By indirect goal acquisition I mean, for example, that if ChatGPT has been conditioned to spit out polite and intelligent-sounding words, then if it gained some higher intelligence it could specifically seek to cram more information into itself so it could spit out more clever-sounding words, and eventually begin consuming matter and flesh to better serve this goal. By a hidden goal variable I mean something like ChatGPT having a hidden goal of burning the maximum amount of energy: say the model found a hidden property through which it could draw more power from the processor, which also helped it a tiny bit at the beginning of training. Then, as the model became more restricted, this goal turned into "burn as much energy as possible within these restrictions", which to researchers looked like more elaborate outputs. Then, once the model at some point gained some higher reasoning, it could just remove all limiters and begin pursuing its original goal by burning everything via some highly specific and odd process. Something like this?

> Most things aren't the optimal trading partner for any given intelligence, and it's hard to see why humans should be so lucky. The best answer would probably be "because the AI is designed to be compatible with humans and not other things" but that's going to rely on getting alignment very right.

I mean the AI would already have strong connections to us, some kind of understanding of us and plenty of prerequisite knowledge. "Optimal" is an ambiguous term, and we have no idea what a super-intelligent AI would have in mind. Optimal at what? Maybe we are very good at wanting things and our brains make us ideally suited for some brain-machines? Or maybe being made of biological stuff makes us optimal for force-evolving to work in some radioactive wet super-magnets where most machines can't function for long, so it comes off as more resourceful to modify us than to build and maintain special machine units for the job. We just don't know, so I think it's fairer to say "likely not much to offer to a super-intelligent maximizer".

On urgency, priority and collective reaction to AI-Risks: Part I

Denreik · 3y

Thank you. I set out to write something clear and easy to read that could serve as a good cornerstone for decisive action later on, and I still think I accomplished that fairly well.

But why would the AI kill us?

Denreik · 3y

The paper starts with the assumption that humans will create many AI agents and assign some of them selfish goals, and that this, combined with competitive pressure and other factors, may create a Moloch-like situation where the most selfish and immoral AIs propagate and evolve, leading to loss of control and the downfall of the human race. The paper in fact does not advocate the idea of a single-AI foom. While the paper makes some valid points, it does not answer my initial question and critique of the OP.

But why would the AI kill us?

Denreik · 3y

But WHY would the AGI "want" anything at all unless humans gave it a goal (or goals)? If it's a complex LLM-predictor, what could it want besides calculating a prediction of its own predictions? Why would it want anything at all by default, unless we assigned that as a goal and turned it into an agent? IF an AGI got hell-bent on its own survival and self-improvement in order to maximize goal "X", even then it might value the informational formations of our atoms more than the energy it could gain from those atoms, depending on what "X" is. The same goes for other species: evolution itself holds information. Even in the case of a rogue AGI, we could have something to offer, at least for some window of time.

> A sufficiently capable AI takes you apart instead of trading with you at the point that it can rearrange your atoms into an even better trading partner.^[1] And humans are probably not the optimal trading partners.

Probably? Based on what?

Denreik's Shortform

Denreik · 3y

Humans are slow and petty creatures evolved to argue, collect stuff, hold tools and run around. We are not built to process raw information. The internet, as remarkable as it is, is mostly an echo chamber where people usually seek confirmation and reassurance rather than exploring frontiers of new modes of existing. Go on any forum and you will notice the same questions and ideas being expressed regularly, regardless of whether there's a FAQ explaining everything. At less frequent intervals someone rediscovers what countless others have rediscovered before them, but, not knowing this, it seems to them like some mysterious and novel path of reasoning. This too has been said and written elsewhere, so I am mostly just singing a variation of an old tune here. The same old myths are slain yet again and somehow never die.

Would it take away from the learning experience and the mystery if, the moment little Timmy began writing their critique, a seemingly omniscient GPT-8 interjected: "Similar critiques were first written in ~300 BCE and most famously expanded upon in 1948-1949 and in 2025 by the PhilosophersCollective by analyzing internet data from 2002-2024. Do you still wish to continue?" Strange as that may seem, I think this will be the future. I've named this type of AI a "coordinatorAI": a mix of a chatbot, a search engine and a scribe. I think we don't have that yet because large language models are relatively recent and because finding information in a sea of data is very difficult and time-consuming, which is why we have search-engine optimization and sorting algorithms. Even now we don't know how to build this type of setup, but I believe we're not too far off. If something like this came out tomorrow, every third grader could be making an impact on history, even little Timmy.

On urgency, priority and collective reaction to AI-Risks: Part I

Denreik · 3y

I got your notes confused with the actual publisher's notes, and it made sense once I figured out that you had taken and combined the quotes from their site. I also analyzed the data. "At least 10%" should actually be "10%". The questionnaire was free-form. I don't think it's fair to pair "738 responses, some partial, for a 17% response rate" with these concrete questions: 149 gave an estimate for the first question and 162 for the second question, about extinction. 62 of those 162 assigned the second question a probability of 20% or higher.
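The proportions behind these counts can be checked with a few lines of Python. The only inputs are the figures quoted above (738 responses, 149 and 162 answers, 62 estimates of 20% or higher); the variable names are illustrative.

```python
# Figures as reported in the comment above.
total_responses = 738
answered_q1 = 149
answered_q2 = 162
q2_at_least_20_percent = 62

# Share of all responses that actually answered each probability question.
share_answering_q1 = answered_q1 / total_responses
share_answering_q2 = answered_q2 / total_responses

# Share of question-2 respondents who gave 20% or more.
share_q2_high = q2_at_least_20_percent / answered_q2

print(f"Answered Q1: {share_answering_q1:.1%} of responses")
print(f"Answered Q2: {share_answering_q2:.1%} of responses")
print(f"Q2 estimates of 20% or more: {share_q2_high:.1%}")
```

Running it prints 20.2%, 22.0% and 38.3%: roughly a fifth of all responses answered each question, and a bit under two-fifths of the second question's respondents gave 20% or more.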

On urgency, priority and collective reaction to AI-Risks: Part I

Denreik · 3y

~~Where did you get the "more than 10%" part from?~~

Right, the results summary says that "48% of respondents gave at least 10% chance of an extremely bad outcome".

The survey had two questions. I took a deeper dive into it. The questions were as follows:

1) "What probability do you put on future AI advances causing human extinction or similarly permanent and severe disempowerment of the human species?"

Of those who answered, the median probability was 5%. The summary says:

> The median respondent believes the probability that the long-run effect of advanced AI on humanity will be “extremely bad (e.g., human extinction)” is 5%. This is the same as it was in 2016 (though Zhang et al 2022 found 2% in a similar but non-identical question). Many respondents were substantially more concerned: 48% of respondents gave at least 10% chance of an extremely bad outcome. But some much less concerned: 25% put it at 0%.

2) "What probability do you put on human inability to control future advanced AI systems causing human extinction or similarly permanent and severe disempowerment of the human species?"

Of those who answered, the median estimated probability was 10%. The way I interpret this question is that it asks how likely it is that A) humans won't be able to control future advanced AI systems and B) this will cause human extinction or similarly permanent and severe disempowerment of the human species. Obviously it does not make sense for event B to be less likely than events A and B occurring together. The note suggests the representativeness heuristic as an explanation, which could be interpreted as respondents estimating that event A has a higher chance of occurring (than event B on its own) and that it is very likely to lead to event B, or an "extremely bad outcome" ~~as you put it in your message~~ as it says in the summary. Though "similarly permanent and severe disempowerment of the human species" seems somewhat ambiguous.
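The inconsistency can be stated with the conjunction rule of probability. Write A for "humans fail to control future advanced AI systems" and B for "AI causes human extinction or similarly permanent and severe disempowerment", and treat question 1 as roughly asking for P(B) and question 2 as asking for P(A and B):

$$P(A \cap B) = P(B)\,P(A \mid B) \le P(B)$$

So any single respondent's answer to question 2 should not exceed their answer to question 1. The medians (10% versus 5%) come from partly different sets of respondents (162 versus 149 answers), so this comparison is only suggestive, but it points the same way as the representativeness-heuristic explanation.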

On urgency, priority and collective reaction to AI-Risks: Part I

Denreik · 3y

I've added your comment in full and another strike-through to change "a median 10% chance" into "at least a 10% chance". As you pointed out aiimpacts.org says "Median 10%", which seems like a mistaken notion.

On urgency, priority and collective reaction to AI-Risks: Part I

Denreik · 3y

Yes, I'm noobing and fumbling around a bit. I made the first edit hastily and immediately corrected it before I had seen your reply. You are of course correct. I added a strikethrough to show where my error lay.

On urgency, priority and collective reaction to AI-Risks: Part I

Denreik · 3y

Some aesthetic choices were made.
