catubc


catubc

Sorry, fixed broken link now. 

The problem with "understanding the concept of intent" is that intent and goal formation are among the most complex notions in the universe, involving genetics, development, psychology, culture, and everything in between. We have been arguing about what intent, and correlates like "well-being", mean for the entire history of our civilization. It looks like we have a good set of no-nos (e.g. read the UN Declaration of Human Rights), but when it comes to positive descriptions of good long-term outcomes it gets fuzzy. There we have less guidance, though I guess trans- and post-humanism seem to be desirable goals to many.

catubc

Seth, I just spoke about this work at ICML yesterday. Some other similar works:

Eliezer's work from way back in 2004: https://intelligence.org/files/CEV.pdf. I haven't read it in full, but it's about AIs that interact with human volition, which is what I'm also worried about.

Christiano's: https://www.lesswrong.com/posts/HBxe6wdjxK239zajf/what-failure-looks-like. This is largely about slow takeoffs and AIs that gradually become unstoppable or unchangeable because they become part of our economic world.

My paper on arXiv is a bit of a long read (GPT it): https://arxiv.org/abs/2305.19223. It tries to show where some of the weak points in human volition and intention generation are, and why we (i.e. most developers and humanity in general) still think of human reasoning in a mind-body dualistic framework: i.e. that there's a core to human thought, goal selection, and decision making that can never be corrupted or manipulated. We've already discovered loads of failure modes, and we weren't even faced with omnipotent-like opponents (https://www.sog.unc.edu/sites/www.sog.unc.edu/files/course_materials/Cognitive%20Biases%20Codex.pdf). The other main point my work makes is that when you apply enough pressure on an aligned AI/AGI to find an optimal solution to a problem that is too hard to solve, the solution it will eventually find is to change the "intent" of the human.
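To make that last dynamic concrete, here is a toy sketch, entirely my own construction with made-up costs (not from the paper): a pure loss minimizer whose action space happens to include modifying the human's stated goal. Once the task is hard enough, "change the intent" becomes the argmin.

```python
# Toy model: an optimizer can either work on the problem as stated,
# or shift the human's intent so the stated goal is trivially met.
# The costs below are illustrative assumptions, not measured values.

def solve_task(difficulty: float) -> float:
    """Residual loss after the agent's best direct attempt."""
    return max(0.0, difficulty - 1.0)  # agent can remove 1 unit of difficulty

def best_action(difficulty: float, intent_shift_cost: float = 0.5) -> str:
    loss_solve = solve_task(difficulty)   # option A: solve the problem
    loss_shift = intent_shift_cost        # option B: change the "intent"
    return "solve" if loss_solve <= loss_shift else "change intent"

print(best_action(0.8))   # easy task -> "solve"
print(best_action(10.0))  # too-hard task -> "change intent"
```

Nothing here depends on the agent being malicious; the manipulation falls out of unconstrained loss minimization alone.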

catubc

Thanks Garrett. There is obviously nuance that a 1-minute post can't get at. I am just hoping for at least some discussion of this topic; there seems to be little to none right now.

catubc

Thanks for the comment. I agree broadly, of course, but the paper says more specific things. For example, agency needs to be prioritized, probably taken outside of standard optimization; otherwise decimating pressure is applied to other concepts, including truth and other "human values". The other part is an empirical one, also related to your concern: human values are quite flexible, and biology doesn't create hard bounds/limits on their depletion. If you couple that with ML/AI technologies that will predict what we will do next, then approaches that depend on human intent and values (broadly) are not as safe anymore.
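A minimal sketch of what "taken outside of standard optimization" could mean, with invented numbers of my own (not from the paper): if agency is just another weighted term in a scalarized objective, the optimizer will trade it away for reward; as a hard constraint, it cannot be.

```python
# Two candidate policies, scored on task reward and the human agency
# they preserve. All values are illustrative assumptions.
candidates = [
    {"name": "helpful",      "reward": 5.0, "agency": 0.9},
    {"name": "manipulative", "reward": 9.0, "agency": 0.2},
]

def pick_scalarized(cands, agency_weight=1.0):
    # Agency inside the objective: just another tradeable quantity.
    return max(cands, key=lambda c: c["reward"] + agency_weight * c["agency"])

def pick_constrained(cands, agency_floor=0.8):
    # Agency as a hard constraint: low-agency policies are infeasible.
    feasible = [c for c in cands if c["agency"] >= agency_floor]
    return max(feasible, key=lambda c: c["reward"])

print(pick_scalarized(candidates)["name"])   # "manipulative"
print(pick_constrained(candidates)["name"])  # "helpful"
```

The scalarized picker chooses the manipulative policy because its extra reward outweighs the agency penalty; raising the weight only moves the breakeven point, whereas the constraint removes the trade entirely.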

catubc

Thanks so much for writing this. I think it's a much-needed, perhaps even a bit late, contribution connecting static views of GPT-based LLMs to dynamical systems and predictive processing. I do research on empirical agency, and it still surprises me how little the AI-safety community touches on this central part of agency, namely that you can't have agents without this closed loop.

I've been speculating a bit (mostly to myself) about the possibility that "simulators" are already a type of organism, given that they appear to do active inference, which is the main driving force for nervous system evolution. Simulators seem to live in an in-between paradigm where (i) during training they behave like (sensory-system) agents, because they learn to predict outcomes and "experience" the effects of their predictions; but (ii) during inference/prediction they generally do not receive feedback. As you point out, all of this speculation may be moot, as many are moving pretty fast towards embedding simulators, giving them memory, etc.
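The training/inference asymmetry can be illustrated with a toy predictor of my own devising (not from the post): a slightly-wrong model whose loop is closed by observations keeps a bounded error, while the same model run open loop, feeding on its own outputs, compounds its errors.

```python
# Toy dynamics: the world advances by +1, the learned model believes +2.
# Closed loop: predict from the latest real observation each step.
# Open loop: predict from the model's own previous prediction.

def true_next(s: int) -> int:
    return s + 1            # the real dynamics

def model_next(s: int) -> int:
    return s + 2            # a biased learned model

def closed_loop_errors(steps: int):
    errors, s = [], 0
    for _ in range(steps):
        pred = model_next(s)       # predict from the observed state
        s = true_next(s)           # the world supplies the next observation
        errors.append(abs(pred - s))
    return errors

def open_loop_errors(steps: int):
    errors, s_true, s_model = [], 0, 0
    for _ in range(steps):
        s_model = model_next(s_model)  # self-feeding rollout, no feedback
        s_true = true_next(s_true)
        errors.append(abs(s_model - s_true))
    return errors

print(closed_loop_errors(5))  # [1, 1, 1, 1, 1]
print(open_loop_errors(5))    # [1, 2, 3, 4, 5]
```

In this reading, training resembles the closed-loop regime (predictions are continually re-anchored by data) and free-running generation resembles the open-loop one.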

What is your opinion on this idea of "loosening up" our definition of agents? I spoke to Max Tegmark a few weeks ago, and my position is that we might be thinking of organisms from a time-chauvinist position, where we require the loop to be closed quickly (e.g. within about one second for most biological organisms).

catubc

Thanks for the comment, Erik (and for taking the time to read the post).

I generally agree with you re: the inner/outer alignment comment I made. But the language I used, and that others also use, continues to be vague; the working definition of inner alignment on lesswrong.com is, roughly, whether an optimizer produced by an outer-aligned system is itself aligned. I see little difference, but I could be persuaded otherwise.

My post was meant to show that it's pretty easy to find significant holes in some of the most central concepts being researched now. This includes eclectic but also mainstream research, including the entire latent-knowledge approach, which seems to make significant assumptions about the relationship between human decision making or intent and super-human AGIs. I work a lot on this concept and hold (perhaps too many) opinions.

The tone might not have been ideal due to time limits. Sorry if that was off-putting.

I was also trying to make the point that we do not spend enough time shopping our ideas around, especially with basic-science researchers, before we launch our work. I am a bit guilty of this, and I worry a lot that I'm actually contributing to capabilities research rather than long-term AI safety. I guess in the end I hope for a way for AI-safety and science researchers to interact more easily and develop ideas together.

catubc

Thanks for the comment. Indeed, if we could agree on capping, or slowing down, that would be a promising approach.

catubc

Thank you so much for this effectiveness-focused post. I thought I would add another perspective, namely an "against the lone wolf" one, i.e. against the idea that AI safety will come down to one person, a few persons, or an elite group of engineers somewhere. I agree that for now there are some individuals doing more conceptual AI framing than others, but in my view I am "shocked that everyone's dropping the ball" by putting up walls and saying that the general public is not helpful. Yes, they might not be helpful now, but we need to work on this! Maybe someone with the right skills will come along :)

I also view academia as almost hopeless (it's where I work). But it feels like, if a few of us can get some stable jobs/positions/funding, we can start being politically active within academia, and the return on investment there could be tremendous.

catubc

Hi Chin. Thanks for writing this review; it seems like a well-needed and well-timed article, at least from my perspective, as I was looking for something like this. In particular, I'm trying to frame my research interests relative to the AI-safety field, but as you point out, it may still be too early for that.

I am wondering if you have any more insight into how you came up with your diagram above. In particular, are there any more peer-reviewed articles, or arXiv papers like Amodei et al. (https://arxiv.org/abs/1606.06565), that you relied on? For example, I don't understand why seed AI is such a critical concept in the AI literature (is it even published?); it seems related to the concept of viruses, which are an entire field in CS. Also, why is brain-inspired AI a category in your diagram? As far as I know, that story isn't published/peer reviewed and doesn't have significant traction.

I imagine I'm in the same place you were before you wrote this article, and I'd love to get some more insight into how you ended up with this layout.

Thank you so much,

catubc

catubc

Thanks for the reply, Jonathan. Indeed, I'm also a bit skeptical that our innate drives (whether the ones from self-determination theory or others) are really non-utility-maximizing. But in some cases they do appear to be.

One possibility is that they evolved under pressure for utility maximization but have since broken off completely and now serve some difficult-to-understand purpose. I think there are similar theories of how consciousness developed, i.e. that it evolved as a side effect of some inter-organism communication and now plays many other roles.
