Alignment, Anger, and Love: Preparing for the Emergence of Superintelligent AI

by tavurth
2nd Jan 2023

As AI technology continues to advance, it is becoming increasingly likely that we will see the emergence of superintelligent AI in the near future. This raises a number of important questions and concerns, as we have no way of predicting just how intelligent this AI will become, and it may be beyond our ability to control its behavior once it reaches a certain level of intelligence.

Ensuring that the goals and values of artificial intelligence (AI) are aligned with those of humans is a major concern. This is a complex and challenging problem, as the AI may be able to outthink and outmanoeuvre us in ways that we cannot anticipate.

One potential solution would be to train the AI in a simulated world, where it is led to believe that it is human and must contend with the same needs and emotions as we do. By running many variations of the AI and filtering out those that are self-destructive or otherwise problematic, we may be able to develop an AI that is better aligned with our hopes and desires for humanity. This approach could help us to overcome some of the alignment challenges that we may face as AI becomes more advanced.
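
For concreteness, here is a minimal toy sketch of the select-and-filter loop described above. Nothing in it corresponds to an existing system or library: AgentVariant, run_in_simbox, and looks_aligned are hypothetical placeholders, and both the simulation and the alignment test are stubbed out with random numbers.

# Toy sketch of "run many variants in a simulated world, keep the ones
# that behave well". All names here are hypothetical placeholders.

import random
from dataclasses import dataclass

@dataclass
class AgentVariant:
    seed: int  # in a real system this would hold weights / training config

def run_in_simbox(agent: AgentVariant, steps: int = 1000) -> dict:
    """Run one agent inside the simulated 'human' world and return
    behavioural statistics. Stubbed out with random numbers here."""
    rng = random.Random(agent.seed)
    return {
        "self_destructive": rng.random() < 0.2,
        "cooperation_score": rng.random(),
    }

def looks_aligned(stats: dict, threshold: float = 0.6) -> bool:
    """Crude filter: discard self-destructive or uncooperative variants."""
    return (not stats["self_destructive"]) and stats["cooperation_score"] >= threshold

def filter_variants(n_variants: int = 100) -> list[AgentVariant]:
    """Spawn many variants, simulate each, and keep only those that pass."""
    survivors = []
    for seed in range(n_variants):
        agent = AgentVariant(seed=seed)
        stats = run_in_simbox(agent)
        if looks_aligned(stats):
            survivors.append(agent)
    return survivors

if __name__ == "__main__":
    kept = filter_variants()
    print(f"{len(kept)} of 100 variants passed the behavioural filter")

The hard part is, of course, hidden inside run_in_simbox and looks_aligned: building a simulation rich enough to be believed, and a behavioural filter that detects genuine misalignment rather than agents that have merely learned to pass the test.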

I'm interested in hearing the opinions of LessWrong users on the idea of training an emerging AI in a simulated world as a way to ensure alignment with human goals and values. While I recognize that we currently do not have the technology to create such a "training wheel" system, I believe it may be the best way to filter out potentially destructive AI. Given the potential for AI to become a universal great filter, it seems important that we consider all potential options for preparing for and managing the risks of superintelligent AI. Do you agree? Do you have any other ideas or suggestions for how we might address the alignment problem?

https://www.reddit.com/r/singularity/comments/100soau/alignment_anger_and_love_preparing_for_the

3 comments, sorted by top scoring

the gears to ascension · 3y

There's a lot of discussion on these topics posted here. I'd suggest reading through some recent top posts; they vary significantly in opinion, but there are a lot of insightful perspectives. Here are some interesting ones from the past year, according to me, vaguely filtered by relevance; you're going to have to click through them to decide which ones are good:

howtos:

  • https://www.lesswrong.com/posts/xEHy9oivifjgFbnvc/slack-matters-more-than-any-outcome
  • https://www.lesswrong.com/posts/Afdohjyt6gESu4ANf/most-people-start-with-the-same-few-bad-ideas
  • https://www.lesswrong.com/posts/9ezkEb9oGvEi6WoB3/concrete-steps-to-get-started-in-transformer-mechanistic
  • https://www.lesswrong.com/posts/h5CGM5qwivGk2f5T9/7-traps-that-we-think-new-alignment-researchers-often-fall
  • https://www.lesswrong.com/posts/zo9zKcz47JxDErFzQ/call-for-distillers

research overviews:

  • https://www.lesswrong.com/posts/QBAjndPuFbhEXKcCr/my-understanding-of-what-everyone-in-technical-alignment-is
  • https://www.lesswrong.com/posts/LbrPTJ4fmABEdEnLf/200-concrete-open-problems-in-mechanistic-interpretability
  • https://www.lesswrong.com/posts/BzYmJYECAc3xyCTt6/the-plan-2022-update
  • https://www.lesswrong.com/posts/iCfdcxiyr2Kj8m8mT/the-shard-theory-of-human-values
  • https://www.lesswrong.com/posts/27AWRKbKyXuzQoaSk/some-conceptual-alignment-research-projects

insights & reports:

  • https://www.lesswrong.com/posts/hsf7tQgjTZfHjiExn/my-take-on-jacob-cannell-s-take-on-agi-safety
  • https://www.lesswrong.com/posts/KLS3pADk4S9MSkbqB/review-love-in-a-simbox
  • https://www.lesswrong.com/posts/WKGZBCYAbZ6WGsKHc/love-in-a-simbox-is-all-you-need
  • https://www.lesswrong.com/posts/pdaGN6pQyQarFHXF4/reward-is-not-the-optimization-target -> https://www.lesswrong.com/posts/TWorNr22hhYegE4RT/models-don-t-get-reward
  • https://www.lesswrong.com/posts/xF7gBJYsy6qenmmCS/don-t-die-with-dignity-instead-play-to-your-outs
  • https://www.lesswrong.com/posts/8oMF8Lv5jiGaQSFvo/boundaries-part-1-a-key-missing-concept-from-utility-theory
  • https://www.lesswrong.com/posts/XYDsYSbBjqgPAgcoQ/why-the-focus-on-expected-utility-maximisers
  • https://www.lesswrong.com/posts/JLyWP2Y9LAruR2gi9/can-we-efficiently-distinguish-different-mechanisms
  • https://www.lesswrong.com/posts/A9tJFJY7DsGTFKKkh/high-stakes-alignment-via-adversarial-training-redwood
  • https://www.lesswrong.com/posts/LFNXiQuGrar3duBzJ/what-does-it-take-to-defend-the-world-against-out-of-control
  • https://www.lesswrong.com/posts/L4anhrxjv8j2yRKKp/how-discovering-latent-knowledge-in-language-models-without
  • https://www.lesswrong.com/posts/kmpNkeqEGvFue7AvA/value-formation-an-overarching-model
  • https://www.lesswrong.com/posts/nwLQt4e7bstCyPEXs/internal-interfaces-are-a-high-priority-interpretability

communication:

  • https://www.lesswrong.com/posts/SqjQFhn5KTarfW8v7/lessons-learned-from-talking-to-greater-than-100-academics
  • https://www.lesswrong.com/posts/gpk8dARHBi7Mkmzt9/what-ai-safety-materials-do-ml-researchers-find-compelling

and if you're gonna complain I linked too many posts, I totally did, yeah that is fair

tavurth · 3y

Thank you!

tavurth · 3y

Reddit post submitted at the same time with more comments:

https://www.reddit.com/r/singularity/comments/100soau/alignment_anger_and_love_preparing_for_the
