Running Lightcone Infrastructure, which runs LessWrong and Lighthaven.space. You can reach me at habryka@lesswrong.com.
(I have signed no contracts or agreements whose existence I cannot mention, which I am mentioning here as a canary)
To me, it feels in tension with having romantically meaningful relationships with multiple people because it sounds like sharing your resources instead of devoting them all towards the one most important thing.
I feel like a life always consists of needing to distribute resources between multiple commitments: job, community, friends, children, principles, ambitions, and your partners. I feel like dating multiple people is only as much in conflict with commitment as any of these other things are (though of course, because of their similar nature, multiple partners compete over somewhat more similar kinds of resources, but IMO only to a limited degree; e.g. someone who does not have a job does seem to me likely capable of doing their part in multiple romantic relationships).
I think it's not crazy. The LTFF is pretty constrained in what it can fund due to CEA, and I also really wouldn't predict a future in which EA Funds is more independent from CEA than it is now (indeed, the trajectory is toward CEA more directly controlling both the LTFF and EA Funds more broadly).
I currently think that this assessment is off, in expectation, for something like the next 6 months of the LTFF's existence, which covers basically all marginal funding right now, so I do think I agree with you. But if anyone were considering donating larger amounts to the LTFF, I think they should expect that to constitute a pretty direct bet on existing EA formal structures and general ways of being.
I will take bets at high odds that there is an enormous correlation here. It also doesn't align with advice from the sources I trust.
I mean, in my case the issue is not that it hallucinated, it's that it hallucinated in a way that was obviously optimized to look good to me.
Like, if the LLMs just sometimes randomly made up stuff, that would be fine, but in cases like this they will very confidently make up stuff that really looks exactly like the kind of thing that would get them high RL reward if it was real, and then also kind of optimize things to make it look real.
It seems very likely that the LLM "knew" that it couldn't properly read the PDF, or that the quotes it was extracting were not actual quotes, but it did not expose that information to me, despite it of course being obviously very relevant to my interests.
Sure, here is an example of me trying to get it to extract quotes from a big PDF: https://chatgpt.com/share/6926a377-75ac-8006-b7d2-0960f5b656f1
It's not fully apparent from the transcript, but basically all the quotes from the PDF are fully made up. And emphasizing that it should please give me actual quotes produced just more confabulated quotes. And of course those quotes really look like they are getting me exactly what I want!
They know they're not real on reflection, but not as they're doing it. It's more like fumbling and stuttering than strategic deception.
I will agree that making up quotes is literally dishonest, but it's not purposeful, deliberate deception.
But the problem is that when I ask them "hey, can you find me the source for this quote?", they usually double down and cite some made-up source, or they say "oh, upon reflection this quote is maybe not quite real, but the underlying thing is totally true" when, like, no, the underlying thing is obviously not true in that case.
I agree this is the model lying, but it's a very rare behavior with the latest models.
I agree that literally commenting out tests is now rare, but other versions of this are still quite common. Semi-routinely, when I give AIs tasks that are too hard, they will eventually just do some other task that at a surface level looks like it got the job done, but clearly isn't doing the real thing (like leaving a function unimplemented, or skipping some important fetch and using stub data). And it's clearly not the case that the AI doesn't know it didn't do the task, because at that point it might have spent 5+ minutes and 100,000+ tokens slamming its head against the wall trying to do it, and then at the end it just says "I have implemented the feature! You can see it here. It all works. Here is how I did it...", without drawing any attention to how it cut corners after slamming its head against the wall for 5+ minutes.
I mean, the models are still useful!
But especially when it comes to the task of "please go and find me quotes or excerpts from articles that show the thing that you are saying", the models really do something that seems closer to "lying". This is a common task I ask the LLMs to perform because it helps me double-check what the models are saying.
And like, maybe you have a good model of what is going on with the model that isn't "lying", but I haven't heard a good explanation. It seems to me very similar to the experience of having a kind of low-integrity teenager just make stuff up to justify whatever they said previously, and then when you pin them down, they flip and say "of course, you are totally right, I was wrong, here is another completely made up thing that actually shows the opposite is true".
And these things are definitely quite trajectory-dependent. If you end up asking an open-ended question where the model confabulates some high-level take, and then you ask it to back that up, things go a lot worse than if you ask it for sources and quotes from the beginning.
Like, none of this seems very oriented toward long-term scheming, but it's also really obvious to me that the model isn't trying that hard to do what I want.
It’s really difficult to get AIs to be dishonest or evil by prompting
I am very confused about this statement. My models lie to me every day. They make up quotes they very well know aren't real. They pretend that search results back up the story they are telling. They will happily lie to others. They comment out tests, and pretend they've solved a problem when it's really obvious they haven't solved it.
I don't know how much this really has to do with what these systems will do when they are superintelligent, but this sentence really doesn't feel anywhere remotely close to true.
"Audience Capture" is the standard term I've heard for this: https://en.wikipedia.org/wiki/Audience_capture
Most of this seems like bad advice. Fire alarms basically don't help with fire deaths at all. Fire blankets don't really do much and basically never come in handy. You have a phone, so you don't need a separate torch. Modern extension cords extremely rarely end up overheating. Fire is not a leading cause of death in any Western country. If you have enough money to comfortably self-insure, don't buy insurance.
I agree that you should invest your money into index funds and watch your basic health.