habryka

Running Lightcone Infrastructure, which runs LessWrong and Lighthaven.space. You can reach me at habryka@lesswrong.com. 

(I have signed no contracts or agreements whose existence I cannot mention, which I am mentioning here as a canary)

Sequences

The Lightcone Principles
A Moderate Update to your Artificial Priors
A Moderate Update to your Organic Priors
Concepts in formal epistemology
Habryka's Shortform Feed (56 karma · 7y · 439 comments)

Comments (sorted by newest)
Varieties Of Doom
habryka · 14m

Maybe this argument is right, but the paragraph I am confused about does not mention the word corrigibility once. It just says (paraphrased) "AIs will in fact understand what we mean, which totally pwns Bostrom because he said the opposite, as you can see in this quote" and then fails to provide a quote that says that, at all. 

Like, if you had said "Contra Bostrom, AI will be corrigible, which you can see in this quote by Bostrom" then I would not be making this comment thread! I would have objections and could make arguments, and maybe I would bother to make them, but I would not be having the sense that you just said a sentence that sounds fully logically contradictory on its own premises, and then, when asked about it, kept importing context that is not referenced in the sentence at all.

So did you just accidentally make a typo, and mean to say "Contra Bostrom 2014, AIs will in fact probably be corrigible: 'The AI may indeed understand that this is not what we meant. However, its final goal is to make us happy, not to do what the programmers meant when they wrote the code that represents this goal.'"?

If that's the paragraph you meant to write, and this is just a typo, then everything makes sense. If it isn't, then I am sorry to say that not much of what you've said has helped me understand what you meant by that paragraph.

Varieties Of Doom
habryka · 22m

I honestly have no idea what is going on. I have read your post, but not in excruciating detail. I do not know what you are talking about with corrigibility or whatever in response to my comment, as it really has nothing to do with my question or uncertainty. The language models seem to think similarly.

I am not making a particularly complicated point. My point is fully 100% limited to this paragraph. This paragraph, as far as I can understand, is trying to make a local argument, and I have no idea how this logical step is supposed to work out.

Contra Bostrom 2014, AIs will in fact probably understand what we mean by the goals we give them before they are superintelligent. (Before you ask, it's on page 147 in the 2017 paperback under the section "Malignant Failure Modes": "The AI may indeed understand that this is not what we meant. However, its final goal is to make us happy, not to do what the programmers meant when they wrote the code that represents this goal.")

I cannot make this paragraph make sense. You say (paraphrased) "Bostrom says that AI will not understand what we mean by the goals we give them before they are superintelligent, as you can see in the quote 'the AI will understand what we mean by the goals we give them'".

And like, sure, I could engage with your broader critiques of Bostrom, but I am not doing that right now. I am trying to understand this one point you make here. Think of it as a classic epistemic spot check. I just want to know what you meant by this one paragraph, as this paragraph as written does not make any sense to me, and I am sure it does not make any sense to 90% of readers. It also isn't making any sense to the language models.

Like, if I hadn't had this (to me) very weird interaction, I would be 90% confident that you just made a typo in this paragraph.

This is all because you explicitly say "here is the specific sentence in Superintelligence that proves that I am correctly paraphrasing Bostrom" and then cite a sentence that, as far as I can tell, does not remotely show that you are correctly paraphrasing Bostrom. Like, if you weren't trying to give a specific sentence as the source, I would not be having this objection.

Do things in small batches
habryka · 11h

Yep, definitely! The reason why these are big tomes is IMO largely downstream of the distribution methods at the time.

Like, yes, totally, sometimes you have to cross large inferential distances. For example, the Sequences are probably one of the most inferential-distance-spanning artifacts that I have read in my life. Nevertheless, they were written one blogpost a day over the course of two years.

Many pieces of intellectual progress were also first made in the form of a lecture series, where each lecture was prepared after the previous one was finished. Then that lecture series was eventually written up into a book. Indeed, I think that is, for most forms of intellectual progress, a better way of developing both ideas and pedagogical content knowledge.

Don't let people buy credit with borrowed funds
habryka · 12h

Yeah, I agree. Thinking more about this, you can think of it a bit as a mechanism for fairly splitting the surplus from spreading accurate information. Like, by telling people that this person you respect is someone they should work with, you are creating positive externalities which they get to capture. The person you respect thinks the same about you, but they are not putting in the effort to share that with others. This seems a bit unfair! It seems reasonable to be like "hey mate, I am investing in the commons in this way, and you are not, can we please both do our part?".

It still seems a bit dicey, but, like, in principle this seems good and like it improves the world.

Wei Dai's Shortform
habryka · 12h

I think I should just be able to appoint the two as members without resigning as a member? Like, members can vote to modify the bylaws, so at the very least I should be able to appoint a new member by rewriting the bylaws.

I will look into this sometime this week or next week. Feel free to poke me for an update any time if I end up not putting one here.

Don't let people buy credit with borrowed funds
habryka · 13h

I agree that if you have two people who already mutually respect each other, then such an alliance would be a null-operation. But, like, why then make such alliances in the first place? Can it really be said that the alliance is therefore fine to make? Doesn't such an alliance bind you to not say something if your opinion changes?

Don't let people buy credit with borrowed funds
habryka · 13h

Agree! I found that section quite interesting to read and ran into it when researching for this post. 

I think I disagree with Paul Graham's lax-seeming relationship to telling selective truths and leveraging that for your company's success. Separately, I do also think that Viaweb running a non-trivial fraction of the storefronts on the internet made it ambiguous whether it still counted as a young company. I would be curious to learn how close this was to Yahoo's acquisition of Viaweb, which could sway me either way on whether this is a counterexample.

Separately, I am pretty sure Paul Graham has said things in other places where he warns startups not to hire PR companies, or something to that effect, but I can't find it. This HN comment refers to it.

Wei Dai's Shortform
habryka · 13h

Oh, huh, maybe you are right? If so, I myself was unaware of this! I will double-check our bylaws and the elections that have happened so far and confirm the current state of things. I was definitely acting under the assumption that I wasn't able to fire Vaniver and Daniel and that they would be able to fire me!


See for example this guidance document I sent to Daniel and Vaniver when I asked them to be board members:

[screenshot of the guidance document not reproduced here]

If it is indeed true that they cannot fire me, then I should really rectify that! If so, I am genuinely very grateful for you noticing. 

Given the clear statements I have made that I am appointing them to a position in which they are able to fire me, I think they would probably indeed have held the formal power to do so, but it is possible that we didn't follow the right corporate formalities, and if so we should fix that! Corporate formalities do often turn out to really matter in the end.

Wei Dai's Shortform
habryka · 13h

Any of those seems fine. Public is better, since more people get to benefit from it.

Wei Dai's Shortform
habryka · 13h

This is (approximately) my forum.

Also, for context: this is from a thread about forum moderation. I have various complicated takes about the degree to which LW belongs to Lightcone Infrastructure, and what our relationships to various stakeholders are, and I don't relate to LessWrong as a thing I (approximately) own in most respects.

If you are uncertain about what I would feel comfortable doing, and what I wouldn't, feel free to ask me!

Posts

Automate, automate it all (40 karma · 11h · 0 comments)
Anthropic is (probably) not meeting its RSP security commitments (111 karma · 18h · 6 comments)
Do things in small batches (89 karma · 2d · 8 comments)
Close open loops (47 karma · 2d · 0 comments)
Diagonalization: A (slightly) more rigorous model of paranoia (31 karma · 3d · 13 comments)
Put numbers on stuff, all the time, otherwise scope insensitivity will eat you (110 karma · 4d · 3 comments)
Increasing returns to effort are common (85 karma · 4d · 6 comments)
Don't let people buy credit with borrowed funds (102 karma · 5d · 34 comments)
Tell people as early as possible it's not going to work out (133 karma · 6d · 11 comments)
Paranoia: A Beginner's Guide (299 karma · 4d · 62 comments)
Wikitag Contributions

CS 2881r (2 months ago · +204)
Roko's Basilisk (4 months ago)
Roko's Basilisk (4 months ago)
AI Psychology (a year ago · +58/-28)