TLDR: AI systems are failing in obvious and manageable ways for now. Fixing them will push the failure modes beyond our ability to understand and anticipate, let alone fix. The AI safety community is also providing a huge economic service to developers. Our belief that our minds can "fix" a...
This is a summary of our recent preprint Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safety and our submission to the Open Philanthropy AI Worldviews Contest[1]. With respect to the Worldviews Contest, the submission addresses Question 2 - namely that, given AGI development, humanity...
TLDR: I take two common concepts in AI alignment - inner vs. outer alignment, and ontology identification - and argue that their analogies to empirical processes are at best unclear and at worst suggest the concepts are trivially wrong or not useful. I suggest that empirical-science-based analogies and metaphors can often fail in trivial...
TLDR: Although AI agent paradigms use explicit reward approaches, the psychology of human motivation suggests that humans value internally generated reward as much as, if not more than, external reward. I suggest that AIs that begin to exhibit behaviors that appear to be "internally" rewarded may be showing signs of AGI. But...
TLDR: LLM confabulation may be a significant feature - not a flaw - of how human memory works and how humans experience agency in the world: namely, through narratives that are largely self-consistent but not guaranteed to match facts in the world. Introduction: The recent explosion of Large Language Models...
TL;DR: We have not solved the "free will" problem: i.e. we do not know to what extent consciousness, culture, biology, or a myriad of other factors shape human will and intent. Yet the prediction of future self- and other-agent-caused actions (i.e. what the self/others intend) is the main...