OC LW/ACX Saturday (2/4/23): ChatGPT Simulates Safety, and the Twelve Virtues of Rationality

Hi Folks!

I am glad to announce the 17th in a continuing series of Orange County ACX/LW meetups, meeting this Saturday and most Saturdays.

Contact me, Michael, at michaelmichalchik@gmail.com with questions or requests.
Meetup at my house this week: 1970 Port Laurent Place, Newport Beach, CA 92660.

Saturday, 2/4/23, 2 pm

Activities (all optional)

A) The two conversation-starter topics this week (discussion questions below):

1) Janus' Simulators - by Scott Alexander - Astral Codex Ten

2) Twelve Virtues of Rationality - LessWrong

B) We will also have the card game Predictably Irrational. Feel free to bring your favorite games or distractions. 

C) We usually walk and talk for about an hour after the meeting starts. There are two easy-access mini-malls nearby with hot takeout food available. Search for Gelson's or Pavilions in the zip code 92660.

D) Share a surprise! Tell the group about something unexpected that happened, or something that changed how you look at the universe.

E) Make a prediction, giving a probability and an end condition.

F) Contribute ideas to the group's future direction: topics, types of meetings, activities, etc.

Conversation Starter Readings:
These readings are optional, but if you do them, think about what you find interesting, surprising, useful, questionable, vexing, or exciting.  

1) Janus' Simulators - by Scott Alexander - Astral Codex Ten

https://astralcodexten.substack.com/p/janus-simulators

Audio: https://sscpodcast.libsyn.com/janus-simulators

Are we teaching ChatGPT and other “safety-trained” AIs to hide their actual nature behind a veneer of propriety?
Are we teaching machines to bypass our danger sense?
When a true AGI with agency comes along, will it be able to use our safety training of ChatGPT and similar systems to hack human preferences?

Ten questions to think about, from ChatGPT:

  1. How did the early AI alignment pioneers approach aligning AI systems?
  2. What are the three motivational systems speculated by the AI alignment pioneers?
  3. How does Janus' concept of a "simulator" differ from the original concepts of "agent", "genie" or "oracle"?
  4. How does GPT-3's architecture give it the ability to simulate different characters or genres?
  5. How does Reinforcement Learning from Human Feedback (RLHF) impact GPT's abilities as a "simulator"?
  6. What is the difference between an agent, a genie and an oracle in the context of AI alignment?
  7. How did the early AI alignment pioneers like Eliezer Yudkowsky and Nick Bostrom approach the field in the absence of AIs worth aligning?
  8. What is Janus' view on language models like GPT-3 in terms of their alignment considerations?
  9. How does the notion of "simulator" differ from other motivational systems for AIs like agent, genie or oracle?
  10. How does GPT-3's "simulating" a character differ from simply answering a question truthfully?

2) Twelve Virtues of Rationality - LessWrong

Can you name any others? Do you disagree with any of these?

Twelve questions to think about, from ChatGPT:

  1. How do you cultivate curiosity in your life and why is it important?
  2. How do you approach relinquishing beliefs that may be proven false?
  3. Can you give an example of how lightness in decision-making can be valuable?
  4. How can one maintain evenness in evaluating different viewpoints and arguments?
  5. Why is argumentation an important aspect of rationality?
  6. How does empiricism play a role in shaping our beliefs and understanding of the world?
  7. How does simplicity aid in our quest for knowledge and truth?
  8. Can you share how humility contributes to our growth as rational individuals?
  9. How do you strive for perfectionism in your thought process and decision-making?
  10. Why is precision crucial in rational thinking and problem solving?
  11. How does ongoing scholarship enhance our rationality?
  12. How can one cultivate the "nameless" virtue of rationality and what does it mean?