OC LW/ACX Saturday (2/4/23) ChatGPT simulates safety and the 12 virtues of rationality
Hi Folks!
I am glad to announce the 17th in a continuing series of Orange County ACX/LW meetups. We are meeting this Saturday and most Saturdays.
Contact me, Michael, at michaelmichalchik@gmail.com with questions or requests.
Meetup at my house this week, 1970 Port Laurent Place, Newport Beach, 92660
Saturday, 2/4/23, 2 pm
Activities (all activities are optional)
A) This week's two conversation starter topics are below (see the questions under Conversation Starter Readings):
1) Janus' Simulators - by Scott Alexander - Astral Codex Ten
2) Twelve Virtues of Rationality - LessWrong
B) We will also have the card game Predictably Irrational. Feel free to bring your favorite games or distractions.
C) We usually walk and talk for about an hour after the meeting starts. There are two easy-access mini-malls nearby with hot takeout food. Search for Gelson's or Pavilions in ZIP code 92660.
D) Share a surprise! Tell the group about something that happened that was unexpected or changed how you look at the universe.
E) Make a prediction, giving a probability and an end condition.
F) Contribute ideas to the group's future direction: topics, types of meetings, activities, etc.
Conversation Starter Readings:
These readings are optional, but if you do them, think about what you find interesting, surprising, useful, questionable, vexing, or exciting.
1) Janus' Simulators - by Scott Alexander - Astral Codex Ten
https://astralcodexten.substack.com/p/janus-simulators
Audio: https://sscpodcast.libsyn.com/janus-simulators
Are we teaching ChatGPT and other "safety trained" AIs to hide their actual nature behind a veneer of propriety?
Are we teaching machines to bypass our danger sense?
When a true AGI with agency comes along, will it be able to use our safety training of ChatGPT and similar systems to hack human preferences?
10 questions to think about from ChatGPT:
2) Twelve Virtues of Rationality - LessWrong
Can you name any others? Do you disagree with any of these?