Fearing that this would be adequate with a large influx of low-quality users
Clarifying: this is a typo and should be inadequate, right?
It seems unlikely that AI labs are going to comply with this petition. Supposing they don't, does this petition help, hurt, or have no impact on AI safety, compared to the counterfactual where it doesn't exist?
All possibilities seem plausible to me. Maybe it's ignored so it just doesn't matter. Maybe it burns political capital or establishes a norm of "everyone ignores those silly AI safety people and nothing bad happens". Maybe it raises awareness and does important things for building the AI safety coalition.
Modeling social reality is always hard, but has there been much analysis of what messaging one ought to use here, separate from the question of what policies one ought to want?
Not if the people paying in sex are poor! Imagine that 10% of housing is reserved for the poorest people in society as part of some government program that houses them for free, and the other 90% is rented for money at a rate of £500/month (this is a toy model where all housing is identical, no mansions here). One day the government ends the housing program and privatizes the units; they all go to landlords who start charging money. Is the new rate for housing lower, higher, or the same?
The old £500/month rate was the equilibrium that fell out of matching the richest 90% of people with 90% of the housing stock. The new equilibrium has 10% more people and 10% more housing to work with, but the added people are poorer than average; supply and demand tells us that prices will go down to reflect the average consumer having less buying power.
If you think of paying the rent with sex as "getting housing for free" and "government bans sex for rent" as "ending the free housing program", this model applies to both cases. If the people paying the rent in sex are of exactly average wealth, the new equilibrium might also be £500/month, but if they are much poorer than average it should be lower (and, interestingly, if they're richer than average it would end up higher).
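A minimal market-clearing sketch of this toy model (all numbers hypothetical): with identical units and one unit per person, the price that fills every unit is set by the marginal renter, i.e. the lowest willingness to pay among the bidders who get housed.

```python
def clearing_price(willingness_to_pay, num_units):
    """Highest price at which all num_units still find a tenant:
    the num_units-th highest willingness to pay."""
    bids = sorted(willingness_to_pay, reverse=True)
    return bids[num_units - 1]

# Hypothetical population: the richest 90 people can pay 500 or more,
# the poorest 10 only 200-290.
rich = [500 + 10 * i for i in range(90)]
poor = [200 + 10 * i for i in range(10)]

# Before: the poorest 10% are housed for free, so only the rich 90
# compete for the 90 market-rate units.
before = clearing_price(rich, 90)           # 500

# After privatization: all 100 people compete for all 100 units.
after = clearing_price(rich + poor, 100)    # 200

print(before, after)  # prints "500 200"
```

In this crude model the marginal renter sets the price, so folding poorer consumers into the market pulls the clearing price down, as the argument above claims.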
Good point. I feel like it shouldn't happen much but I agree the simple economic model predicts it should. I could resolve it within the model as some kind of market friction argument (finding someone to sell sex to is not trivial, the landlord makes it easier to go into prostitution by providing himself as a "steady employer"), but I think my real intuition is that this is a place where homo economicus breaks down so I shouldn't be trying to apply simple economic models.
Also, even if my initial argument does work, this is basically a novel form of rent control, so the standard arguments against rent control should apply (supply isn't completely inelastic, constraining demand will reduce future supply, which we don't want).
Nitpicking the landlord case: Banning sex for rent drives down prices.
Suppose the market rate for a room is £500 or X units of sex. Most people pay in money, but some are desperate and lack £500, so they pay in sex. One day the government bans paying in sex. This is an artificial constraint on demand: some people who would have paid at the old sex rate are being prevented from doing so. When you constrain demand for something with relatively inelastic supply, prices fall. Specifically, the rooms that would have been rented for sex sit empty until their prices are lowered; the new market rate is £490.
Some people are still worse off because of this (a lot of the desperate people don't have £490 to pay either) but there are possible values where the utilitarian calculus works out net positive (plenty of non-desperate people still benefit from lower rent). One can imagine the government in a productive role as a renter's negotiating partner: "Gosh Mr. Landlord, I'd love to pay in sex but that's illegal, best I can do is £490."
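One way to make the £490 figure concrete is a toy clearing-price calculation (all numbers hypothetical): with identical rooms, the price that fills every room is the willingness to pay of the marginal money bidder, and banning payment in sex forces more rooms onto the money market.

```python
def clearing_price(willingness_to_pay, num_units):
    # num_units-th highest bid: the price at which every unit still rents
    return sorted(willingness_to_pay, reverse=True)[num_units - 1]

# Hypothetical money bids: 98 people can pay the old rate of 500 or more;
# the next bidders down the demand curve can only pay 495, 490, 485.
bids = [500 + 5 * i for i in range(98)] + [495, 490, 485]

# Before the ban, 2 of the 100 rooms are paid for in sex, so only 98
# rooms need money tenants.
old_rate = clearing_price(bids, 98)    # 500

# After the ban, all 100 rooms need money tenants, so landlords must
# cut prices until the 100th-highest bidder is willing to rent.
new_rate = clearing_price(bids, 100)   # 490

print(old_rate, new_rate)  # prints "500 490"
```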
we know how to specify rewards for... "A human approved this output"; we don't know how to specify rewards for "Actually good alignment research".
Can't these be the same thing? If we have humans who can identify actually good alignment research, we can sit them down in the RLHF booth and have the AI try to figure out how to make them happy.
Now obviously a sufficiently clever AI will infer the existence of the RLHF booth and start hacking the human in order to escape its box, which would be bad for alignment research. But it's looking increasingly plausible that e.g. GPT-6 will be smart enough to provide actually good mathematical research without being smart enough to take over the world (that doesn't happen until GPT-8). So why not alignment research?
To break the comparison I think you need to posit either that alignment research is way harder than math research (which Eli understands to be Eliezer's view), such that anything smart enough to do it is also smart enough to hack a human, or, I suppose, that we don't have humans who can identify actually good alignment research.
If you believe strongly enough in the Great Man theory of startups then it's actually working as intended. If startups are more about selling the founder than the product, if the pitch is "I am the kind of guy who can do cool business stuff" rather than "Look at this cool stuff I made", then penalizing founders who don't pre-truth is correctly downranking them for being some kind of chump. A better founder would have figured out that he was supposed to pre-truth, and it is significant information about his competence that he did not.
Realistically it is surely at least a little bit about the product itself, and honest founders must be "unfairly" losing points on the perceived merits of their product, but one could argue that identifying people savvy enough to play the game creates more value than is lost by underestimating the merits of honest product pitches.
Depending on exactly where the boundaries of the pre-truth game are, I think I could argue no one is being deceived (I mean realistically there will be at least a couple naive investors who think founders are speaking literal truth, but there could be few enough that hoodwinking them isn't the point).
When founders present a slide deck full of pre-truths about how great their product is, that slide deck is aimed solely at investors. The founder usually doesn't publish the slide deck, and if they did they wouldn't expect Joe Average to care much. The purpose of the pre-truths isn't to make anyone believe that their product is great (because all the investors know that this is an audition for lying, so none of them are going to take the claims literally), rather it is to demonstrate to investors that the founder is good at exaggerating the greatness of their product. This establishes that a few years later when they go to market, they will be good at telling different lies to regulators, customers, etc.
The pre-truth game could be a trial run for deceiving people, rather than itself being deceptive.
Here is a possible defense of pre-truth. I'm not sure if I believe it, but it seems like one of several theories that fit the available evidence.
Willingness to lie is a generally useful business skill. Businesses that lie to regulators will spend less time on regulatory compliance, businesses that lie to customers will get more sales, etc. The optimal amount of lying is not zero.
The purpose of the pre-truth game is to allow investors to assess the founder's skill at lying, because you wouldn't want to fund some chump who can't or won't lie to regulators. Think of it as an initiation ritual: if you run a criminal gang it might be useful to make sure all your new members are able to kill a man, and if you run a venture capital firm it might be useful to make sure all the businessmen you invest in are skilled liars. The process generates value in the same way as any other skill-assessing job interview. There's a conflict which features lying, but it's a coalition of founders and investors against regulators and customers.
So why keep the game secret? Well, it would probably be bad for the startup scene if it became widely known that everyone's hoping startups will lie to regulators and customers. Also, keeping the game secret makes "figure out what game we're playing" part of the interview process, and you'd probably prefer to invest in people savvy enough to figure that out on their own.
One can cross-reference the moderation log with "Deleted by alyssavance, Today at 8:19 AM" to determine who made any particular deleted comment. Since this information is already public, does it make sense to preserve the information directly on the comment, something like "[comment by Czynski deleted]"?