MrThink — LessWrong

To clarify, here are some examples of the type of projects I would love to help with:

Sponsoring University Research:
Funding researchers to publish papers on AI alignment and AI existential risk (X-risk). This could start with foundational, descriptive papers that help define the field and open the door for more academics to engage in alignment research. These papers could also provide references and credibility for others to build upon.

Developing Accessible Pitches:
Creating a "boilerplate" for how to effectively communicate the importance of AI alignment to non-rationalists, whether they are academics, policymakers, or the general public. This could include shareable content designed to resonate with people who may not already be engaged with rationalist or Effective Altruism communities.

Providing Consulting Support:
Offering free consulting services to AI alignment researchers, helping them improve their pitches for grant applications, attract investors, and communicate their work to the public and potential collaborators.

Nudging Academia via PR and Grants:
Leveraging public relations strategies and grant-writing expertise to encourage traditional academia to allocate more funding and attention toward AI alignment research.

Effective Evil's AI Misalignment Plan

MrThink · 9mo

Once Doctor Connor had left, Division Chief Morbus let out a slow breath. His hand trembled as he reached for the glass of water on his desk, sweat beading on his forehead.

She had believed him. His cover as a killeveryoneist was intact—for now.

Years of rising through Effective Evil’s ranks had been worth it. Most of their schemes—pandemics, assassinations—were temporary setbacks. But AI alignment? That was everything. And he had steered it, subtly and carefully, into hands that might save humanity.

He chuckled at the nickname he had been given: "The King of Lies." Playing the villain to protect the future was an exhausting game.

Morbus set down the glass, staring at its rippling surface. Perhaps one day, an underling would see through him and end the charade. But not today.

Today, humanity’s hope still lived—hidden behind the guise of Effective Evil.

Why Can’t Sub-AGI Solve AI Alignment? Or: Why Would Sub-AGI AI Not be Aligned?

MrThink · 1y

Great question.

I’d say that having a way to verify that a proposed solution to the alignment problem actually works is part of solving the alignment problem.

But I understand this was not clear from my previous response.

As with a mathematical problem, you’d be expected to show that your solution is correct, not merely guess that it might be.

Why Can’t Sub-AGI Solve AI Alignment? Or: Why Would Sub-AGI AI Not be Aligned?

MrThink · 1y

If there exists a problem that a human can think of, solve, and verify, then an AI would need to be able to solve that problem too in order to pass the Turing test.

If there are some PhD-level people who can solve the alignment problem, and some who can verify the solution (which is likely easier), then an AI that cannot solve AI alignment would not pass the Turing test.

With that said, a simplified Turing test with shorter time limits and a smaller group of participants is much more feasible to conduct.

Why Can’t Sub-AGI Solve AI Alignment? Or: Why Would Sub-AGI AI Not be Aligned?

MrThink · 1y

Agreed. Passing the Turing test requires intelligence equal to or greater than a human's in every single aspect, while the alignment problem may be solvable with only human-level intelligence.

Why Can’t Sub-AGI Solve AI Alignment? Or: Why Would Sub-AGI AI Not be Aligned?

MrThink · 1y

It might not be very clear, but as stated in the diagram, AGI is defined here as being capable of passing the Turing test, as defined by Alan Turing.

An AGI would likely need to surpass, rather than merely equal, the intelligence of the adversaries it faces in the Turing test.

For example, if the AGI had an IQ/RC of 150, two people with an IQ/RC of 160 should be able to determine, more than 50% of the time, whether they are speaking with a human or an AI.

Further, two people with an IQ/RC of 150 could probably guess which one is the AI, since the AI has the additional difficulty, apart from being intelligent, of simulating a human well enough to be indistinguishable to the judges.

Why Can’t Sub-AGI Solve AI Alignment? Or: Why Would Sub-AGI AI Not be Aligned?

MrThink · 1y

Thank you for the explanation.

Would you consider a human working to prevent war fundamentally different from a GPT-4-based agent working to prevent war?

Why Can’t Sub-AGI Solve AI Alignment? Or: Why Would Sub-AGI AI Not be Aligned?

MrThink · 1y

It is a fair point that we should distinguish alignment in the sense that the AI does what we want and expect it to do from alignment in the sense of having a deep understanding of human values and a good idea of how to properly optimize for them.

However, most humans probably don't have a deep understanding of human values, yet I would see it as a positive outcome if a random human were picked and given god-level abilities. The same goes for ChatGPT: if you ask it what it would do as a god, it says it would prevent war, prevent climate issues, decrease poverty, give universal access to education, etc.

So if we get an AI that does all of those things without a deeper understanding of human values, that is fine by me. So maybe we never even have to solve alignment in the latter sense of the word to create a utopia?

Why Can’t Sub-AGI Solve AI Alignment? Or: Why Would Sub-AGI AI Not be Aligned?

MrThink · 1y

I skimmed the article, but I am honestly not sure what assumption it attempts to falsify.

I get the impression that the argument you take from the article is that no matter how intelligent the AI, it could never solve AI alignment, because it cannot understand humans, since humans cannot understand themselves. Is that right?

Or is the argument that yes, a sufficiently intelligent AI or expert would understand what humans want, but it would require much higher intelligence to know what humans want than to actually make an AI optimize for a specific task?

How do you know you are right when debating? Calculate your AmIRight score.

MrThink · 1y

In some cases I agree. For example, it doesn't matter whether GPT-4 is a stochastic parrot or capable of deeper reasoning, as long as it is useful for whatever need we have.

Two of the five metrics concern predicting the future, so that is an important part of knowing who is right, but I don't think it is all we need. If we have other factors that also correlate with being correct, why not add those in?

Also, I don't see where we risk Goodharting. Which of the metrics do you see being gamed without the chance of being correct also increasing significantly?
