This is a lightly edited transcript of a chatroom conversation between Scott Alexander and Eliezer Yudkowsky last year, following up on the Late 2021 MIRI Conversations. Questions discussed include "How hard is it to get the right goals into AGI systems?" and "In what contexts do AI systems exhibit 'consequentialism'?".
[Yudkowsky][13:29] @ScottAlexander ready when you are
[Alexander][13:31] Okay, how do you want to do this?
[Yudkowsky][13:32] If you have an agenda of Things To Ask, you can follow it; otherwise I can start by posing a probing question or you can? We've been very much winging it on these and that has worked... as well as you have seen it working!
[Alexander][13:34] Okay. I'll post from my agenda. I'm assuming we both have the right to edit logs |
Hi, I have a friend in Kenya who works with gifted children and would like to get ChatGPT accounts for them. Can anybody get me in touch with someone from OpenAI who might be interested in supporting such a project?
Definition. On how I use words, values are decision-influences (also known as shards). “I value doing well at school” is a short sentence for “in a range of contexts, there exists an influence on my decision-making which upweights actions and plans that lead to e.g. learning and good grades and honor among my classmates.”
Summaries of key points:
This is a great article! It helps me understand shard theory better and value it more; in particular, it relates to something I've been thinking about where people seem to conflate utility-optimizing agents with policy-executing agents, but the two have meaningfully different alignment characteristics, and shard theory seems to be deeply exploring the latter, which is 👍.
That is to say, prior to "simulators" and "shard theory", a lot of focus was on utility-maximizers--agents that do things like planning or search to maximize a utility function; but plannin...
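(To make that distinction concrete, here is a minimal toy sketch in Python. It is my own illustration rather than anything from the post or the comment, and every name in it is hypothetical: a utility-maximizer searches candidate plans against an explicit objective, while a policy-executor just runs a learned context-to-action mapping.)

```python
from typing import Callable, Dict, List

def utility_maximizer(candidate_plans: List[str], utility: Callable[[str], float]) -> str:
    # Explicit optimization: search the plan space and pick whatever scores highest.
    return max(candidate_plans, key=utility)

def policy_executor(context: str, policy: Dict[str, str], default: str = "do nothing") -> str:
    # No objective is consulted at runtime: the agent just executes its learned
    # context -> action mapping (roughly, a bundle of shard-like decision-influences).
    return policy.get(context, default)

# Toy usage: on-distribution the two can behave identically...
plans = ["study", "watch TV"]
print(utility_maximizer(plans, utility=lambda p: 1.0 if p == "study" else 0.0))
print(policy_executor("evening before exam", policy={"evening before exam": "study"}))
# ...but they generalize differently off-distribution, which is (as I read it) why the
# comment says they have meaningfully different alignment characteristics.
```

The sketch is only meant to pin down the type signatures of the two agent concepts; nothing about real systems is implied by it.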
(Cross-posted on my personal blog.)
There is a memory that has always stuck with me. In sixth grade I remember sitting on the floor at a friend's house. We were supposed to write an essay together. I distinctly remember him proposing that I should write the first version and then he'd go over it afterwards and make it sound good. Because he's better at making things sound good than I am, and I'm better at the content.
I was annoyed. Not because he slyly wanted to escape doing any of the real work. At least not primarily. I was mainly annoyed at the claim that I'm not good at making things sound good.
Basically, he wanted to go through the essay and use "bigger" and "fancier" words and phrases. I.e....
Sounds like simulacra level 4 to me! Just saying things for the vybez.
The Less Wrong General Census is back!
In days of yore, there was an annual census of the site users. That census has come again, at least for this year! Click here to take the survey! It can take as little as five minutes if you just want to fill out the basics, and can take longer if you want to fill out the other optional sections. The survey will be open from today until February 27th, at which point it will close.
Once the census is closed, I'll remove the very private information, then make some summaries of the data and write up what I found in a post that will be linked from here. I'll also release a csv of all the responses marked "fine to include"...
I just noticed a kinda-ambiguity that I should have spotted before when looking at the questions.
There is a question about "cryonics" and then a question about "anti-agathics". It would be nice if the latter made it explicit (1) whether it counts as "reaching an age of 1000 years" if you are cryosuspended and then revived 1000 years later, and (2) whether it counts as "reaching an age of 1000 years" if you are cryosuspended and then revived after everyone currently alive has died or likewise been cryosuspended, and then (using whatever future technology is...
Prompt 0:
Think about the way computer programmers talk about “bugs” in the program, or “feature requests” that would make a given app or game much better. Bugs are things-that-are-bad: frustrations, irritations, frictions, problems. Feature requests are things-that-could-be-great: opportunities, possibilities, new systems or abilities.
Write down as many “bugs” and “feature requests” as you can, for your own life.
Prompt 1:
A genie has offered to fix every bug you’ve written down, and to give you every feature you’ve requested, but then it will freeze your personality—you won’t be able to grow or add or improve anything else.
Hearing that, are there other things you’d like to write down, before the genie takes your list and works its magic?
Prompt 2:
Imagine someone you know well, like your father or your best friend or a...
Yeah, in general when this activity is done in-person, people are writing/typing for the whole 15-25min, and each successive prompt is basically just another way to re-ask the same question. If the frame of "bugs and feature requests" starts to feel too [whatever], another way to think of it is to just keep writing down threads-to-pull-on.
Supported by Rethink Priorities
This is part of a weekly series summarizing the top posts on the EA and LW forums - you can see the full collection here. The first post includes some details on purpose and methodology. Feedback, thoughts, and corrections are welcomed.
If you'd like to receive these summaries via email, you can subscribe here.
Podcast version: Subscribe on your favorite podcast app by searching for 'EA Forum Podcast (Summaries)'. A big thanks to Coleman Snell for producing these!
Rethink Priorities’ Welfare Range Estimates
by Bob Fischer
The author builds off analysis in the rest of the Moral Weight Project Sequence to offer estimates of the ‘welfare range’ of 11 farmed species. A welfare range is the estimated difference between the most intensely positively valenced state (pleasure) and negatively...
Produced as part of the SERI ML Alignment Theory Scholars Program - Winter 2022 Cohort
A big thank you to all of the people who gave me feedback on this post: Edmund Lao, Dan Murfet, Alexander Gietelink Oldenziel, Lucius Bushnaq, Rob Krzyzanowski, Alexandre Variengen, Jiri Hoogland, and Russell Goyder.
Statistical learning theory is lying to you: "overparametrized" models actually aren't overparametrized, and generalization is not just a question of broad basins.
To first...
Let me add some more views on SLT and capabilities/alignment.
...(Dan Murfet’s personal views here) First, some caveats: although we are optimistic that SLT can be developed into a theory of deep learning, it is not currently such a theory, and it remains possible that there are fundamental obstacles. Putting those aside for a moment, it is plausible that phenomena like scaling laws and the related emergence of capabilities like in-context learning can be understood from first principles within a framework like SLT. This could contribute both t
For More Detail, Previously: Simulacra Levels and Their Interactions, Unifying the Simulacra Definitions, The Four Children of the Seder as the Simulacra Levels.
A key source of misunderstanding and conflict is failure to distinguish between combinations of the following four cases.
I think this would go down more smoothly with a few more examples of level 4. I found the "Level 4: A trial by ordeal or trial by combat lacks and denies the concept of justice entirely" line pretty helpful for describing the lizard brain/associations diverging from reality, but still feeling correct enough for a person to choose it.
Lol, cool. I tried the "4 minute" challenge (without having read EY's answer, but having read yours).