MrThink

Comments
Who is marketing AI alignment?
MrThink · 6mo · 40

To clarify, here are some examples of the types of projects I would love to help with:

  • Sponsoring University Research:
    Funding researchers to publish papers on AI alignment and AI existential risk (X-risk). This could start with foundational, descriptive papers that help define the field and open the door for more academics to engage in alignment research. These papers could also provide references and credibility for others to build upon.
  • Developing Accessible Pitches:
    Creating a "boilerplate" for how to effectively communicate the importance of AI alignment to non-rationalists, whether they are academics, policymakers, or the general public. This could include shareable content designed to resonate with people who may not already be engaged with rationalist or Effective Altruism communities.
  • Providing Consulting Support:
    Offering free consulting services to AI alignment researchers, helping them improve their pitches for grant applications, attract investors, and communicate their work to the public and potential collaborators.
  • Nudging Academia via PR and Grants:
    Leveraging public relations strategies and grant-writing expertise to encourage traditional academia to allocate more funding and attention toward AI alignment research.
Effective Evil's AI Misalignment Plan
MrThink · 7mo · 80

Once Doctor Connor had left, Division Chief Morbus let out a slow breath. His hand trembled as he reached for the glass of water on his desk, sweat beading on his forehead.

She had believed him. His cover as a killeveryoneist was intact—for now.

Years of rising through Effective Evil’s ranks had been worth it. Most of their schemes—pandemics, assassinations—were temporary setbacks. But AI alignment? That was everything. And he had steered it, subtly and carefully, into hands that might save humanity.

He chuckled at the nickname he had been given: "The King of Lies." Playing the villain to protect the future was an exhausting game.

Morbus set down the glass, staring at its rippling surface. Perhaps one day, an underling would see through him and end the charade. But not today.

Today, humanity’s hope still lived—hidden behind the guise of Effective Evil.

Why Can’t Sub-AGI Solve AI Alignment? Or: Why Would Sub-AGI AI Not be Aligned?
MrThink · 1y · 10

Great question.

I’d say that having a way to verify that a proposed solution to the alignment problem actually is a solution is itself part of solving the alignment problem.

But I understand this was not clear from my previous response.

As with a mathematical problem, you’d be expected to show that your solution is correct, not merely guess that it might be.

Why Can’t Sub-AGI Solve AI Alignment? Or: Why Would Sub-AGI AI Not be Aligned?
MrThink · 1y · 10

If there exists a problem that a human can think of, solve, and verify, then an AI would need to be able to solve that problem in order to pass the Turing test.

If there exist some PhD-level people who can solve the alignment problem, and some who can verify the solution (which is likely easier), then an AI that cannot solve AI alignment would not pass the Turing test.

With that said, a simplified Turing test with shorter time limits and a smaller group of participants is much more feasible to conduct.
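
To give a feel for the numbers, here is a minimal sketch, assuming illustrative parameters that are not from the post (a judge who spots the AI 65% of the time, a 5% significance level, 80% power), of how many rounds such a simplified test would need before above-chance detection is statistically distinguishable from guessing:

```python
# Minimal sketch with assumed parameters (not from the post): how many
# rounds does a simplified Turing test need so that a judge who spots the
# AI 65% of the time is reliably distinguishable from a coin-flipper?
from math import comb

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def rounds_needed(p_judge=0.65, alpha=0.05, power=0.80):
    """Smallest n such that a one-sided binomial test at level alpha
    detects a judge of accuracy p_judge with the requested power."""
    for n in range(1, 1000):
        # fewest correct calls out of n that would be significant vs. chance
        k_crit = next(k for k in range(n + 2) if binom_sf(k, n, 0.5) <= alpha)
        if binom_sf(k_crit, n, p_judge) >= power:
            return n, k_crit

n, k = rounds_needed()
print(f"{n} rounds; the judge must be right at least {k} times")
```

With only a handful of rounds, even a clearly better-than-chance judge is hard to tell apart from luck; that is the statistical cost a shorter test with fewer participants accepts.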

Why Can’t Sub-AGI Solve AI Alignment? Or: Why Would Sub-AGI AI Not be Aligned?
MrThink · 1y · 10

Agreed. Passing the Turing test requires intelligence equal to or greater than a human's in every single aspect, while the alignment problem may be solvable with only human-level intelligence.

Why Can’t Sub-AGI Solve AI Alignment? Or: Why Would Sub-AGI AI Not be Aligned?
MrThink · 1y · 10

It might not be very clear, but as stated in the diagram, AGI is defined here as being capable of passing the Turing test, as defined by Alan Turing.

An AGI would likely need to surpass, rather than merely equal, the intelligence of the judges it faces in the Turing test.

For example, if the AGI had an IQ/RC of 150, two people with an IQ/RC of 160 should be able to determine more than 50% of the time whether they are speaking with a human or an AI.

Further, two people with an IQ/RC of 150 could probably still guess which one is the AI, since the AI, apart from being intelligent, has the additional difficulty of simulating a human well enough to be indistinguishable to the judges.

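As a back-of-the-envelope illustration of why two judges help (the per-judge detection probability here is a hypothetical number, not something from the post): if each of two judges independently spots the AI with probability p = 0.6, then:

```python
# Quick illustration with an assumed per-judge detection probability.
p = 0.6  # hypothetical accuracy of one judge against a lower-IQ/RC AI
at_least_one = 1 - (1 - p) ** 2  # probability at least one judge detects the AI
both = p ** 2                    # probability both judges detect the AI
print(f"at least one of two judges detects the AI: {at_least_one:.2f}")  # 0.84
print(f"both judges detect the AI:                 {both:.2f}")          # 0.36
```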
Why Can’t Sub-AGI Solve AI Alignment? Or: Why Would Sub-AGI AI Not be Aligned?
MrThink · 1y · 10

Thank you for the explanation.

Would you consider a human working to prevent war fundamentally different from a GPT-4-based agent working to prevent war?

Why Can’t Sub-AGI Solve AI Alignment? Or: Why Would Sub-AGI AI Not be Aligned?
MrThink · 1y · 10

It is a fair point that we should distinguish alignment in the sense that the AI does what we want and expect it to do from alignment in the sense of having a deep understanding of human values and a good idea of how to properly optimize for them.

However, most humans probably don't have a deep understanding of human values either, yet I would see it as a positive outcome if a random human were picked and given god-level abilities. The same goes for ChatGPT: if you ask it what it would do as a god, it says it would prevent war, address climate issues, reduce poverty, provide universal access to education, etc.

So if we get an AI that does all of those things without a deeper understanding of human values, that is fine by me. Maybe we never even have to solve alignment in the latter sense of the word to create a utopia?

Why Can’t Sub-AGI Solve AI Alignment? Or: Why Would Sub-AGI AI Not be Aligned?
MrThink · 1y · 10

I skimmed the article, but I am honestly not sure what assumption it attempts to falsify.

I get the impression that the article's argument is that, no matter how intelligent the AI, it could never solve AI alignment, because it cannot understand humans, since humans cannot understand themselves. Is that right?

Or is the argument that a sufficiently intelligent AI or expert would understand what humans want, but that knowing what humans want requires much higher intelligence than making an AI optimize for a specific task?

How do you know you are right when debating? Calculate your AmIRight score.
MrThink · 1y · 20

In some cases I agree. For example, it doesn't matter whether GPT-4 is a stochastic parrot or capable of deeper reasoning, as long as it is useful for whatever need we have.

Two out of the five metrics are about predicting the future, so that is an important part of knowing who is right, but I don't think it is all we need. If we have other factors that also correlate with being correct, why not add those in as well? (A toy sketch of what I mean follows below.)

Also, I don't see where we risk Goodharting. Which of the metrics do you see being gamed without the chance of being correct also significantly increasing?
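
To make the "why not add those in" point concrete, here is a toy sketch; the metric names and the equal weights are hypothetical placeholders, not the actual AmIRight metrics from the post:

```python
# Toy sketch: combine several indicators that each correlate with being
# correct into one composite score. Metric names and weights are
# hypothetical placeholders, not the actual AmIRight metrics.
def am_i_right_score(metrics, weights):
    """Weighted average of indicator scores, each assumed to lie in [0, 1]."""
    total = sum(weights.values())
    return sum(metrics[name] * weights[name] for name in weights) / total

metrics = {
    "prediction_track_record": 0.7,  # forward-looking, like two of the five
    "calibration": 0.6,              # forward-looking as well
    "domain_knowledge": 0.8,
    "steelmanning": 0.5,
    "peer_agreement": 0.4,
}
weights = {name: 1.0 for name in metrics}  # equal weights as a default
print(f"AmIRight score: {am_i_right_score(metrics, weights):.2f}")  # 0.60
```

A weighted combination like this only helps insofar as each input remains an honest correlate of being correct, which is exactly the Goodharting question above.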

Posts

Who is marketing AI alignment? · 23 points · 6mo · 4 comments
Mid-Generation Self-Correction: A Simple Tool for Safer AI · 13 points · 7mo · 0 comments
Seeking AI Alignment Tutor/Advisor: $100–150/hr [Q] · 28 points · 9mo · 3 comments
Why Can’t Sub-AGI Solve AI Alignment? Or: Why Would Sub-AGI AI Not be Aligned? [Q] · 4 points · 1y · 23 comments
How do you know you are right when debating? Calculate your AmIRight score. · 2 points · 1y · 5 comments
New OpenAI Paper - Language models can explain neurons in language models · 47 points · 2y · 14 comments
Seeking Advice on Raising AI X-Risk Awareness on Social Media [Q] · 2 points · 2y · 1 comment
Results Prediction Thread About How Different Factors Affect AI X-Risk · 9 points · 2y · 0 comments
Prediction Thread: Make Predictions About How Different Factors Affect AGI X-Risk. · 15 points · 2y · 8 comments
Are there any reliable CAPTCHAs? Competition for CAPTCHA ideas that AIs can’t solve. · 7 points · 3y · 37 comments