> we know how to specify rewards for... "A human approved this output"; we don't know how to specify rewards for "Actually good alignment research".
Can't these be the same thing? If we have humans who can identify actually good alignment research, we can sit them down in the RLHF booth and have the AI try to figure out how to make them happy.
Now obviously a sufficiently clever AI will infer the existence of the RLHF booth and start hacking the human in order to escape its box, which would be bad for alignment research. But it's looking increasingly plausible that e.g. GPT-6 will be smart enough to produce actually good mathematical research without being smart enough to take over the world (that doesn't happen until GPT-8). So why not alignment research?
To break the comparison I think you need to posit either that alignment research is way harder than math research (as Eli understands Eliezer to believe), such that anything smart enough to do it is also smart enough to hack a human, or that we don't have humans who can identify actually good alignment research.
If you believe strongly enough in the Great Man theory of startups then it's actually working as intended. If startups are more about selling the founder than the product, if the pitch is "I am the kind of guy who can do cool business stuff" rather than "Look at this cool stuff I made", then penalizing founders who don't pre-truth is correctly downranking them for being some kind of chump. A better founder would have figured out that he was supposed to pre-truth, and it is significant information about his competence that he did not.
Realistically it is surely at least a little bit about the product itself, and honest founders must be "unfairly" losing points on the perceived merits of their product, but one could argue that identifying people savvy enough to play the game creates more value than is lost by underestimating the merits of honest product pitches.
Depending on exactly where the boundaries of the pre-truth game are, I think I could argue no one is being deceived (I mean realistically there will be at least a couple naive investors who think founders are speaking literal truth, but there could be few enough that hoodwinking them isn't the point).
When founders present a slide deck full of pre-truths about how great their product is, that slide deck is aimed solely at investors. The founder usually doesn't publish the slide deck, and if they did they wouldn't expect Joe Average to care much. The purpose of the pre-truths isn't to make anyone believe that the product is great (because all the investors know that this is an audition for lying, so none of them are going to take the claims literally); rather, it is to demonstrate to investors that the founder is good at exaggerating the greatness of their product. This establishes that a few years later, when they go to market, they will be good at telling different lies to regulators, customers, etc.
The pre-truth game could be a trial run for deceiving people, rather than itself being deceptive.
Here is a possible defense of pre-truth. I'm not sure if I believe it, but it seems like one of several theories that fit the available evidence.
Willingness to lie is a generally useful business skill. Businesses that lie to regulators will spend less time on regulatory compliance, businesses that lie to customers will get more sales, etc. The optimal amount of lying is not zero.
The purpose of the pre-truth game is to allow investors to assess the founder's skill at lying, because you wouldn't want to fund some chump who can't or won't lie to regulators. Think of it as an initiation ritual: if you run a criminal gang it might be useful to make sure all your new members are able to kill a man, and if you run a venture capital firm it might be useful to make sure all the businessmen you invest in are skilled liars. The process generates value in the same way as any other skill-assessing job interview. There's a conflict which features lying, but it's a coalition of founders and investors against regulators and customers.
So why keep the game secret? Well, it would probably be bad for the startup scene if it became widely known that everyone's hoping startups will lie to regulators and customers. Also, by keeping the game secret you make "figure out what game we're playing" part of the interview process, and you'd probably prefer to invest in people savvy enough to figure that out on their own.
I understood "based" to be a 4chan-ism but I didn't think very hard about the example, it is possible I chose a word that does not actually work in the way I had meant to illustrate. Hopefully the intended meaning was still clear.
Is it wrong for Bob the Democrat to say "based" because it might lead people to incorrectly infer he is a conservative? Is it wrong for Bob the plumber to say "edema" because it might lead people to incorrectly infer he is a doctor? If I told Bob to start saying "swelling" instead of "edema" then I feel like he would have some right to defend his word use: no one thinks "edema" literally means "swelling, and also I am a doctor", even if they update in a way that kind of looks like it does.
I don't think we have a significant disagreement here; I was merely trying to highlight a distinction your comment didn't dwell on, about different ways statements can be perceived differently. "There is swelling" vs "There is swelling and also I am a doctor" literally mean different things, while "There is swelling" vs "There is edema" merely imply something different to people familiar with who tends to use which words.
I agree that people hearing Zack say "I think this is insane" will believe he has a lower P(this is insane) than people hearing him say "This is insane", but I'm not sure that establishes the words mean that.
If Alice goes around saying "I'm kinda conservative" it would be wise to infer that she is probably conservative. If Bob goes around saying "That's based" in the modern internet sense of the term, it would also be wise to infer that he is probably a conservative. But "based" doesn't mean Bob is conservative; semantically it just means something like "cool", and it happens to be the case that this particular synonym for "cool" is used more often by conservatives than liberals.
If it turned out that Alice voted party line Democrat and loved Bernie Sanders, one would have a reasonable case that she had used words wrong when she said she was kinda conservative; those words mean basically the opposite of her circumstances. If it turned out that Bob voted party line Democrat and loved Bernie Sanders, then one might advise him "your word choice is causing people to form a false impression, you should maybe stop saying based", but it would be weird to suggest this was about what "based" means. There's just an observable regularity of our society that people who say "based" tend to be conservative, like how people who say "edema" tend to be doctors.
If Zack is interested in accurately conveying his level of confidence, he would do well to reserve "That's insane" for cases where he is very confident and say "That seems insane" when he is less confident. If he instead decided to use "That's insane" in all cases, that would be misleading. But I think it is significant that this would be a different kind of misleading than if he were to use the words "I am very confident that is insane", even if the statements cause observers to make the exact same updates.
Everyone sometimes issues replies that are not rebuttals, but there is an expectation that replies will meet some threshold of relevance. Injecting "your comment reminds me of the medieval poet Dante Alighieri" into a random conversation would generally be considered off-topic, even if the speaker genuinely was reminded of him. Other participants in the conversation might suspect this speaker of being obsessed with Alighieri, and they might worry that he was trying to subvert the conversation by changing it to a topic no one but him was interested in. They might think-but-be-too-polite-to-say "Dude, no one cares, stop distracting from the topic at hand".
The behaviour Raemon was trying to highlight is that you soapbox. Even if doing so is in line with your values, it still seems like choosing to defect rather than cooperate in the game of conversation.
> Aim for convergence on truth, and behave as if your interlocutors are also aiming for convergence on truth.
It's not clear to me what the word "convergence" is doing here. I assume the word means something, because it would be weird if you had used extra words only to produce advice identical to "Aim for truth, and behave as if your interlocutors are also aiming for truth". The post talks about how truthseeking leads to convergence among truthseekers, but if that were all there was to it then one could simply seek truth and get convergence for free. Apparently we ought to seek specifically convergence on truth, but what does seeking convergence look like?
I've spent a while thinking on it and I can't come up with any behaviours that would constitute aiming for truth but not aiming for convergence on truth. Could you give an example?
Nitpicking the landlord case: Banning sex for rent drives down prices.
Suppose the market rate for a room is £500 or X units of sex. Most people pay in money, but some are desperate and lack £500, so they pay in sex. One day the government bans paying in sex. This is an artificial constraint on demand: some people who would have paid at the old sex rate are being prevented from doing so. When you constrain demand on something with relatively inelastic supply, prices fall. Specifically, the rooms that would have been rented for sex sit empty until their prices are lowered; the new market rate is £490.
Some people are still worse off because of this (a lot of the desperate people don't have £490 to pay either) but there are possible values where the utilitarian calculus works out net positive (plenty of non-desperate people still benefit from lower rent). One can imagine the government in a productive role as a renter's negotiating partner: "Gosh Mr. Landlord, I'd love to pay in sex but that's illegal, best I can do is £490."
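Here is a minimal sketch of the mechanism, with made-up numbers chosen to reproduce the £500 to £490 story (the renter counts and willingness-to-pay figures are all assumptions; the only real claim is the direction of the price change):

```python
# Toy rent market: fixed (inelastic) supply of identical rooms, renters
# ranked by cash willingness to pay. All numbers are illustrative.

def clearing_price(cash_bids, rooms_for_cash):
    """Landlords lower the price until the rooms fill; the clearing price
    is the willingness-to-pay of the marginal (last admitted) renter."""
    ranked = sorted(cash_bids, reverse=True)
    return ranked[min(rooms_for_cash, len(ranked)) - 1]

rooms = 100
sex_payers = 10  # desperate renters who would pay in sex, not cash

# 120 would-be cash renters with willingness to pay from £589 down to £470.
cash_bids = [589 - i for i in range(120)]

# Sex payments allowed: 10 rooms go to sex-payers, so only 90 rooms
# compete for cash renters and the marginal cash bid is higher.
print(clearing_price(cash_bids, rooms - sex_payers))  # 500

# Sex payments banned: those renters drop out of demand entirely, and
# landlords must accept bids 10 renters deeper in the queue.
print(clearing_price(cash_bids, rooms))               # 490
```

The direction of the effect depends only on supply staying fixed while some demanders are removed; the exact £10 drop is an artifact of the made-up bid distribution.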