There's a kind of game here on Less Wrong.
It's the kind of game that's a little rude to point out. Part of how it works is by not being named.
Or rather, attempts to name it get dissected so everyone can agree to continue ignoring the fact that it's a game.
So I'm going to do the rude thing. But I mean to do so gently. It's not my intention to end the game. I really do respect the right for folk to keep playing it if they want.
Instead I want to offer an exit to those who would really, really like one.
I know I really super would have liked that back in 2015 & 2016. That was the peak of my hell in rationalist circles.
I'm watching the game...
Should society eliminate schools? Should we have more compulsory schooling? Should you send your kids to school? Should you prefer to hire job candidates who have received more schooling, beyond school's correlation with the g factor? Should we consider the spread of education requirements to be a form of class war by the better-educated against the worse-educated which must be opposed for the sake of the worse-educated and the future of society?
[Epistemic status: Strong opinions lightly held, this time with a cool graph.]
I argue that an entire class of common arguments against short timelines is bogus, and provide weak evidence that anchoring to the human-brain-human-lifetime milestone is reasonable.
In a sentence, my argument is that the complexity and mysteriousness and efficiency of the human brain (compared to artificial neural nets) is almost zero evidence that building TAI will be difficult, because evolution typically makes things complex and mysterious and efficient, even when there are simple, easily understood, inefficient designs that work almost as well (or even better!) for human purposes.
In slogan form: If all we had to do to get TAI was make a simple neural net 10x the size of my brain, my brain would still look the...
Many of you are already familiar with Rationalist Winter Solstice, our home-grown winter holiday. As the year grows literally dark, we gather in our respective communities to face various forms of darkness together, to celebrate what light human civilization has made, and to affirm ourselves as a community of shared values.
This thread is a central place to gather information about specific events. Please post times, places, registration or rsvp links, restrictions if any, etc.
Was playing around with ChatGPT and had some fun learning about its thoughts on metaphysics. It looks like the ego is an illusion and hedonistic utilitarianism is too narrow-minded to capture all of welfare. Instead, it opts for the principles of beneficence, non-maleficence, autonomy, and justice. Seems to check out. What do you guys think?
ChatGPT is a lot of things. It is by all accounts quite powerful, especially with engineering questions. It does many things well, such as engineering prompts or stylistic requests. Some other things, not so much. Twitter is of course full of examples of things it does both well and poorly.
One of the things it attempts to do is to be ‘safe.’ It does this by refusing to answer questions that call upon it to do or help you do something illegal or otherwise outside its bounds. Makes sense.
As is the default with such things, those safeguards were broken through almost immediately. By the end of the day, several prompt engineering methods had been found.
No one else seems to yet have gathered them together, so here you go. Note...
I wonder about a scenario where the first AI with human-level or superior capabilities would not be goal-oriented at all, e.g. a language model like GPT. Then one instance of it would be used, possibly by a random user, to make a conversational agent told to behave as a goal-oriented AI. The bot would then behave as an AGI agent, with everything that implies from a safety standpoint, e.g. using its human user to affect the outside world.
Is this a plausible scenario for the development of AGI and the first goal-oriented AGI? Does it have any implications for AI safety compared to the case of an AGI designed to be goal-oriented from the start?
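For concreteness, here is a minimal sketch of the kind of wrapper the scenario imagines: a purely predictive language model made to act like a goal-directed agent just by an instruction in the prompt and a loop around it. The `query_model` function and the example goal are hypothetical placeholders, not any particular model's API.

```python
# Hypothetical sketch (no real API assumed): wrapping a non-goal-oriented
# language model in a prompt and a loop so it behaves like a goal-directed agent.

def query_model(prompt: str) -> str:
    # Stand-in for a call to a predictive language model such as GPT.
    # A real implementation would return the model's continuation of `prompt`.
    return "search for relevant information and summarize it"

def run_agent(goal: str, max_steps: int = 3) -> None:
    # The only thing that makes this "goal-oriented" is the instruction
    # in the prompt and the loop that keeps querying the model.
    history = f"You are an agent pursuing this goal: {goal}\n"
    for _ in range(max_steps):
        action = query_model(history + "Next action:")
        history += f"Next action: {action}\n"
        # In the scenario above, the human user is the actuator: they carry
        # out the suggested action in the world and report what happened.
        observation = input(f"Model suggests: {action}\nWhat happened? ")
        history += f"Observation: {observation}\n"

if __name__ == "__main__":
    run_agent("book a flight to Berlin")  # hypothetical example goal
```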
Thanks to Ian McKenzie and Nicholas Dupuis, collaborators on a related project, for contributing to the ideas and experiments discussed in this post. Ian performed some of the random number experiments.
Also thanks to Connor Leahy for feedback on a draft, and thanks to Evan Hubinger, Connor Leahy, Beren Millidge, Ethan Perez, Tomek Korbak, Garrett Baker, Leo Gao and various others at Conjecture, Anthropic, and OpenAI for useful discussions.
This work was carried out while at Conjecture.
I have received evidence from multiple credible sources that text-davinci-002 was not trained with RLHF.
The rest of this post has not been corrected to reflect this update. Not much besides the title (formerly "Mysteries of mode collapse due to RLHF") is affected: just mentally substitute "mystery method" every time "RLHF" is invoked...