
WillPetillo

Sequences

Simulators vs Agents: Updating Risk Models
Substrate Needs Convergence

Comments (sorted by newest)

The best simple argument for Pausing AI?
WillPetillo · 16d

The basic contention here seems to be that the biggest danger of LLMs comes not from the systems themselves, but from the overreliance, excessive trust, etc. that societies and institutions place in them.  Another contention is that "hyping LLMs"--which I assume includes folks here expressing concerns that AI will go rogue and take over the world--increases perceptions of AI's abilities, which feeds into this overreliance.  The conclusion is that promoting "x-risk" as a reason for pausing AI will have the unintended side effect of increasing the (catastrophic, but not existential) dangers associated with overreliance.

This is an interesting idea, not least because it's a common intuition among the "AI Ethics" faction, and therefore worth hashing out.  Here are my reasons for skepticism:

1. The hype that matters comes from large-scale investors (and military officers) trying to get in on the next big thing.  I assume these folks are paying more attention to corporate sales pitches than to Internet academics and people holding protest signs--and that their background point of reference is not Terminator, but the FOMO common in the tech industry (which makes sense in a context where losing market share is a bigger threat than losing investment dollars).
2. X-risk scenarios are admittedly less intuitive in the context of LLMs based on self-supervised learning (SSL) than they were back when reinforcement learning (RL) was at the center of development and AI systems learned to play increasingly broad ranges of games.  Those systems regularly specification-gamed their environments, and it was chilling to think about what would happen when a system could treat the entire world as a game.  A concern now, however, is that agency will make a comeback because it is economically useful.  Imagine the brutal, creative effectiveness of RL combined with the broad-based common sense of SSL.  This reintegration of agency into leading AI systems (I can't speak to the specific architecture) is what the tech companies are actively developing towards.  More on this concept in my Simulators sequence.

I, for one, would find your argument more compelling if you (1) took a deep dive into AI development motivations, rather than lumping them all together as "hype", and (2) explained why AI development would stop with the current paradigm of LLM-fueled chatbots, or with something similarly innocuous in itself but potentially dangerous in the context of societal overreliance.

Project Moonbeam
WillPetillo · 18d

The motivation of this post was to design a thought experiment involving a fully self-sufficient machine ecology that remains within constraints designed to benefit something outside the system, not to suggest how to make the best use of the moon.

Aligning Agents, Tools, and Simulators
WillPetillo · 22d

Agreed: when discussing the alignment of simulators in this post, we are referring to safety from the subset of dangers related to unbounded optimization towards alien goals, which does not include everything within value alignment, let alone AI safety.  But this qualification points to a subtle drift in this post's use of the word "alignment" (towards something like "comprehension and internalization of human values"), which isn't good practice and is something I'll want to figure out how to edit/fix soon.

Case Studies in Simulators and Agents
WillPetillo · 2mo

"I am having difficulty seeing why anyone would regard these two viewpoints as opposed."


We discuss this indirectly in the first post in this sequence, which outlines what it means to describe a system through the lens of an agent, tool, or simulator.  Yes, the concepts overlap, but there is nonetheless a kind of tension between them.  In the case of agent vs. simulator, our central question is: which property is "driving the bus" with respect to the system's behavior, utilizing the other in its service?

The second post explores the implications of the above distinction, predicting different types of values--and thus behavior--from an agent that contains a simulation of the world and uses it to navigate, vs. a simulator that generates agents because such agents are part of the environment the system is modelling, vs. a system where the modes are so entangled that it is meaningless to even talk about where one ends and the other begins.  Specifically, I would expect simulator-first systems to have wide value boundaries that internalize (an approximation of) human values, but narrower, maximizing behavior from agent-first systems.

Interest In Conflict Is Instrumentally Convergent
WillPetillo · 2mo

It seems to me that the most robust solution is to do it the hard way: know the people involved really well, both directly and via reputation among people you also know really well--ideally by having lived with them in a small community for a few decades.

Why does LW not put much more focus on AI governance and outreach?
WillPetillo · 3mo

Selection bias.  Those of us who were inclined to consider working on outreach and governance have joined groups like PauseAI, StopAI, and other orgs.  A few of us reach back on occasion to say "Come on in, the water's fine!"  The real head-scratcher for me is the lack of engagement on this topic.  If one wants to deliberate on a much higher level of detail than the average person, cool--it takes all kinds to make a world.  But come on, this is obviously high-stakes enough to merit attention.

Explaining the Joke: Pausing is The Way
WillPetillo · 3mo

Thanks for the link!  It's important to distinguish here between:

(1) support for the movement, 
(2) support for the cause, and
(3) active support for the movement (i.e. attracting other activists to show up at future demonstrations)

Most of the paper focuses on (1), and also on activists' beliefs about the impact of their actions.  I am more interested in (2) and (3).  To be fair, the paper gives some evidence of detrimental impacts on (2) in the Trump example.  It's not clear, however, whether the nature of the cause matters here.  Support for Trump is highly polarized and entangled with culture, whereas global warming (Hallam's cause) and AI risk (PauseAI's) have relatively broad but frustratingly lukewarm public support.  There are also many other factors when looking past short-term onlooker sentiment to the larger question of effecting social change, which the paper readily admits in the Discussion section.  I'd list these points, but they largely overlap with the points I made in my post...though it was interesting to see how much was speculative.  More research is needed.

In any case, I bring up the extreme case to illustrate that the issue is far more nuanced than "regular people get squeamish--net negative!"  This is actually somewhat irrelevant to PauseAI in particular, because most of our actions are around public education and lobbying, and even the protests are legal and non-disruptive.  I've been in two myself and have seen nothing but positive sentiment from onlookers (with the exception of the occasional "good luck with that!" snark).  The hard part with all of these is getting people to show up.  (This last paragraph is not a rebuttal to anything you have said; it's a reminder of context.)

PauseAI and E/Acc Should Switch Sides
WillPetillo · 3mo

My conclusion is an admittedly weaksauce non-argument, included primarily to prevent misinterpretation of my actual beliefs.  I am working on a rebuttal, but it's taking longer than I planned.  For now, see: Holly Elmore's case for AI Safety Advocacy to the Public.

FAQ: What the heck is goal agnosticism?
WillPetillo · 5mo

I want to push harder on Q33: "Isn't goal agnosticism pretty fragile? Aren't there strong pressures pushing anything tool-like towards more direct agency?"

In particular, the answer--"Being unable to specify a sufficiently precise goal to get your desired behavior out of an optimizer isn't merely dangerous, it's useless!"--seems true to some degree, but incomplete.  Let's use a specific hypothetical of a stock-trading company employing an AI system to maximize profits.  They want the system to be agentic because this takes the humans out of the loop on actually getting profits, but they also understand that there is a risk the system will discover unexpected/undesired methods of achieving its goals, like insider trading.  There are a couple of core problems:

1. Externalized Cost: if the system can cover its tracks well enough that the company doesn't suffer any legal consequences for its illegal behavior, then the effects of insider trading on the market are "somebody else's problem."
2. Irreversible Mistake: if the company is overly optimistic about its ability to control the system, doesn't understand the risks, etc., then it might use the system despite regretting this decision later.  On a large scale, this might be self-correcting if some companies have problems with AI agents and this gives the latter a bad reputation, but that assumes there are lots of small problems before a big one.

The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better
WillPetillo · 5mo

Glad to hear it!  If you want more detail, feel free to come by the Discord server or send me a direct message.  I run the welcome meetings for new members, am always happy to describe aspects of the org's methodology that aren't obvious from the outside, and can also connect you with members who have done a lot more on-the-ground protesting and flyering than I have.

As someone who got into this without much prior experience in activism, I was surprised by how much subtlety there is and how many counterintuitive best practices there are, most of which are learned through direct experience combined with direct mentorship, as opposed to being written down and formalized.  I made an attempt to synthesize many of the core ideas in this video--it's from a year ago, and looking it over there is quite a bit I would change (spend less time on some philosophical ideas, add more detail on specific methods), but it mostly holds up OK.

Posts
Lenses, Metaphors, and Meaning (6 points, 8d, 0 comments)
Project Moonbeam (17 points, 19d, 2 comments)
Emergence of Simulators and Agents (9 points, 22d, 0 comments)
Agents, Simulators and Interpretability (10 points, 1mo, 0 comments)
Case Studies in Simulators and Agents (11 points, 2mo, 8 comments)
Aligning Agents, Tools, and Simulators (21 points, 2mo, 2 comments)
Agents, Tools, and Simulators (12 points, 2mo, 5 comments)
Anti-memes: x-risk edition (14 points, 3mo, 0 comments)
Explaining the Joke: Pausing is The Way (24 points, 3mo, 2 comments)
PauseAI and E/Acc Should Switch Sides (78 points, 3mo, 6 comments)