Well, I asked this https://www.lesswrong.com/posts/X9Z9vdG7kEFTBkA6h/what-could-a-policy-banning-agi-look-like but roughly no one was interested--I had to learn about "born secret" https://en.wikipedia.org/wiki/Born_secret from Eric Weinstein in a YouTube video.
FYI, while restricting compute manufacture is, I would guess, net helpful, it's far from a solution. People can make plenty of conceptual progress given current levels of compute https://www.lesswrong.com/posts/sTDfraZab47KiRMmT/views-on-when-agi-comes-and-on-strategy-to-reduce . It's not a way out, either. There are ways possibly-out. But approximately no one is interested in them.
(IMO this research isn't promising as alignment research because no existing research is promising as alignment research, but is more promising as alignment research than most stuff that I'm aware of that does get funded as alignment research.)
On a meta note, IF proposition 2 is true, THEN the best way to tell this would be if people had been saying so AT THE TIME. If instead, actually, everyone at the time disagreed with proposition 2, then it's not clear that there's someone "we" know to hand over decision-making power to instead. Personally, I was pretty new to the area, and as a Yudkowskyite I'd probably have reflexively decried giving money to any sort of non-X-risk-pilled non-alignment-differential capabilities research. But more to the point, as a newcomer, I wouldn't have tried hard to have independent opinions about stuff that wasn't in my technical focus area, or to express those opinions with much conviction, maybe because it seemed like Many Highly Respected Community Members With Substantially Greater Decision Making Experience would know far better, and would not have the time or the non-status to let me in on the secret subtle reasons for doing counterintuitive things. Now I think everyone's dumb and everyone should say their opinions a lot so that later they can say that they've been saying this all along. I've become extremely disagreeable in the last few years, I'm still not disagreeable enough, and approximately no one I know personally is disagreeable enough.
highly convergent
Huh? A hyperphone is a two-player tool. Loom is a one-player tool.
As I've tried to explain over and over, including once to Conor, if you want to improve thinking (rather than, say, "knowledge management" (text snippet / link management?)), you have to watch thinking think, think about how thinking thinks, and ask thinking what it would need in order to think better. No one who sets out to build so-called "tools for thinking" ever does this. They instead think of cool-sounding things to have, and then get excited imagining how those things might free your thoughts from the nested directory structure or whatever, and come up with unassailable arguments about that.
I like the essay and I think [something like what you call deep honesty] is underrated right now. But I'm still confused what you mean, and about the thing itself.
I'll say a few more things but the headline is that I'm confused and would like more clarity about what a deep honesty-er is.
Probabilities on summary events like this are mostly pretty pointless. You're throwing together a bunch of different questions, about which you have very different knowledge states (including how much and how often you should update about them).
This practice doesn't mean excusing bad behavior. You can still hold others accountable while taking responsibility for your own reactions.
Well, what if there's a good piece of code (if you'll allow the crudity) in your head, and someone else's bad behavior is geared at hacking/exploiting that piece of code? The harm done is partly due to that piece of code and its role in part of your reaction to their bad behavior. But the implication is that they should stop with their bad behavior, not that you should get rid of the good code. I believe you'll respond, "Ah, but you see, there's more than two options. You can change yourself in ways other than just deleting the code. You could recognize how the code is actually partly good and partly bad, and refactor it; and you could add other code to respond skillfully to their bad behavior; and you can add other code to help them correct their behavior." Which I totally agree with, but at this point, what's being communicated by "taking self-blame" other than at best "reprogram yourself in Good/skillful ways" or more realistically "acquiesce to abuse"?
IDK if this is a crux for me thinking this is very relevant to stuff from my perspective, but:
The training procedure you propose doesn't seem to actually incentivize indifference. First, a toy model where I agree it does incentivize that:
On the first time step, the agent gets a choice: choose a number 1--N. If the agent says k, then the agent has nothing at all to do for the first k steps, after which some game G starts. (Each play of G is i.i.d., not related to k.)
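To make that concrete, here's a minimal sketch (the value of N, the function names, and the payoff distribution for G are stand-ins I'm assuming for illustration, not part of the proposal): trajectory length varies with k, but expected return doesn't, so randomizing over k costs the agent nothing.

```python
import random

N = 5  # number of delay options; value is a stand-in

def play_G():
    """One i.i.d. play of the game G; this payoff distribution is made up."""
    return random.gauss(1.0, 1.0)

def episode(k):
    """The agent says k, idles for k steps (zero reward), then G starts.
    Trajectory length varies with k, but the return doesn't."""
    return play_G()  # the k idle steps contribute nothing

# Monte Carlo check: expected return is identical for every k,
# so picking k uniformly at random costs the agent nothing.
for k in range(1, N + 1):
    mean = sum(episode(k) for _ in range(100_000)) / 100_000
    print(f"k={k}: mean return ≈ {mean:.3f}")
```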
So this agent is indeed incentivized to pick k uniformly at random from 1--N. Now consider:
The agent is in a rich world. There are many complex multi-step plans to incentivize the agent to learn problem-solving. Each episode, at time N, the agent gets to choose: end now, or play 10 more steps.
Does this incentivize random choice at time N? No. It incentivizes the agent to randomly choose End or Continue at the very beginning of the episode, and then carefully plan and execute behavior that achieves the most reward assuming a run of length N or N+10, respectively.
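To illustrate (a toy I'm making up, with assumed payoff numbers): suppose the best multi-step plans only pay off when tailored to a known horizon. Then a policy that flips the coin at t=0 and builds its plan around the result satisfies the 50/50 End/Continue statistics while beating any policy that stays genuinely undecided until time N.

```python
import random

N = 10            # the decision point (stand-in value)
BONUS_STEPS = 10  # "play 10 more steps"

def episode(commit_early: bool) -> float:
    """Assumed payoffs: a multi-step plan tuned to the true horizon earns
    1.0 per step; a plan that hedges between horizons earns only 0.6."""
    continue_ = random.random() < 0.5  # the End/Continue coin, flipped 50/50
    horizon = N + BONUS_STEPS if continue_ else N
    # Committing at t=0 lets the whole plan be built around the horizon;
    # staying genuinely undecided until time N forces a hedged plan.
    per_step = 1.0 if commit_early else 0.6
    return per_step * horizon

for commit in (True, False):
    mean = sum(episode(commit) for _ in range(100_000)) / 100_000
    label = "commit at t=0" if commit else "undecided until N"
    print(f"{label}: mean return ≈ {mean:.2f}")
```

Both policies end up choosing End and Continue equally often, so both look "indifferent" from the outside; the early committer just earns more.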
Wait, but isn't this success? Didn't we make the agent have no trajectory length preference?
No. Suppose:
Same as before, but now there's a little guy standing by the End/Continue button. Sometimes he likes to press the button at random.
Do we kill the guy? Yes, we certainly do; he will mess up our careful plans.
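Concretely, extending the same made-up payoffs (self-contained sketch below, with an assumed press probability): when the guy can override the button, the horizon the agent committed to at t=0 sometimes mismatches the realized horizon, the tuned plan is wasted, and expected return strictly improves if the guy is removed.

```python
import random

N, BONUS_STEPS = 10, 10  # same stand-in values as above

def episode(guy_alive: bool, p_press: float = 0.2) -> float:
    """The agent commits at t=0 and builds its plan around that horizon.
    If the guy is alive, he sometimes presses the button himself, so the
    realized horizon can mismatch the plan (0.6/step instead of 1.0)."""
    committed = random.random() < 0.5   # agent's t=0 coin flip
    actual = committed
    if guy_alive and random.random() < p_press:
        actual = random.random() < 0.5  # the guy presses at random
    horizon = N + BONUS_STEPS if actual else N
    per_step = 1.0 if actual == committed else 0.6
    return per_step * horizon

for alive in (True, False):
    mean = sum(episode(alive) for _ in range(100_000)) / 100_000
    print(f"{'guy alive' if alive else 'guy removed'}: mean return ≈ {mean:.2f}")
```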
This is not true for AGI.