MrThink

New OpenAI Paper - Language models can explain neurons in language models

Summary by OpenAI: We use GPT-4 to automatically write explanations for the behavior of neurons in large language models and to score those explanations. We release a dataset of these (imperfect) explanations and scores for every neuron in GPT-2. Link: https://openai.com/research/language-models-can-explain-neurons-in-language-models Please share your thoughts in the comments!

47 · May 10, 2023

299

Who is marketing AI alignment?

What individuals or organizations are actively working on the "marketing" of AI alignment, particularly doing work such as: * Establishing AI alignment as a recognized and respected academic field. * Building the infrastructure to make alignment research more accessible and attractive to traditional researchers and institutions. * Creating resources for...

Jan 19, 2025 · 23

Mid-Generation Self-Correction: A Simple Tool for Safer AI

In the earlier days of building AI companions, I encountered a curious problem. Back then, I used models like Google’s T5-11B for conversational agents. Occasionally, the AI would say strange or outright impossible things, like suggesting, “Would you like to meet and go bowling?” or claiming, “Yes, I know your...

Dec 19, 2024 · 13

Seeking AI Alignment Tutor/Advisor: $100–150/hr

I am actively looking for a tutor/advisor with expertise in AI x-risk, with the primary goal of collaboratively determining the most effective ways I can contribute to reducing AI existential risks (X-risk). Tutoring Goals I suspect that I misunderstand key components of the mental models that lead some highly rational...

Oct 5, 2024 · 28

Why Can’t Sub-AGI Solve AI Alignment? Or: Why Would Sub-AGI AI Not be Aligned?

I believe there are people with far greater knowledge than me who can point out where I am wrong. I do believe my reasoning is wrong, but I cannot see why it would be highly unfeasible to train a sub-AGI intelligent AI that most likely will be aligned...

Jul 2, 2024 · 4

How do you know you are right when debating? Calculate your AmIRight score.

I recently found myself in a spirited debate with a friend about whether large language models (LLMs) like GPT-4 are mere stochastic parrots or if they can genuinely engage in deeper reasoning. We both presented a range of technical arguments and genuinely considered each other’s points. Despite our efforts, we...

Jun 1, 2024 · 2

New OpenAI Paper - Language models can explain neurons in language models

May 10, 2023 · 47

Seeking Advice on Raising AI X-Risk Awareness on Social Media

Does anyone know of any good resources on posting to social media to raise awareness about AI X-risk? If not, I might do a writeup myself, so feel free to comment with any advice you have.

Mar 24, 2023 · 2


LESSWRONG

The Efficient LessWrong Hypothesis - Stock Investing Competition
