Running Lightcone Infrastructure, which runs LessWrong and Lighthaven.space. You can reach me at habryka@lesswrong.com.
(I have signed no contracts or agreements whose existence I cannot mention, which I am mentioning here as a canary)
Last week, I was working with a paper that has over 100 upvotes on LessWrong
Just curious whether you meant "score above 100" or "more than 100 votes". Those are quite different facts!
This was the thinking model (I basically always use the thinking model).
I mean, maybe there is a bit of self-deception going on, though what that looks like in LLMs is messy.
But it's clear that the hallucinations point in the direction of sycophancy, and also clear that the LLM is not trying very hard not to lie, despite this being a thing I obviously care quite a bit about (and the LLM knows this).
If you want to call them "sycophantically adversarial selective hallucinations", then sure, but I honestly think "lying" is a better descriptor, and more predictive of what LLMs will do in similar situations.
I would also simply bet that if we had access to the CoT in the above case, the answer to what happened would not look that much like "hallucinations". It would look more like "the model realized it can't read it, kind of panicked, tried some alternative ways of solving the problem, and eventually just output this answer". Like, I really don't think the model will have ended up in a cognitive state where it thought it could read the PDF, which is what "hallucination" would imply.
LessWrong is not a forum in which posting in good faith is sufficient to be welcomed! Think of it as a professional community. Just because you are writing a physics paper in good faith doesn't mean it will be well-received by the physics community as a contribution. Similarly here, I think you are missing a large number of prerequisites that are assumed to be understood by participants on LW.
I would recommend checking out the New User's Guide to LessWrong.
This comment had a lot of people downvote it (at this time, 2 overall karma with 19 votes). It shouldn't have been downvoted, and I personally believe this reflects people being attached to AI x-risk ideas, and those ideas becoming part of their whole persona, rather than strict disagreement. This is something I keep in mind in conversations about AI risk, since I believe folks will post-rationalize. The above comment is not low effort or low value.
I generally think it makes sense for people to have pretty complicated reasons for why they think something should be downvoted. I think this goes more for longer content, which often would require an enormous amount of effort to respond to explicitly.
I have some sympathy for being sad here if a comment ends up highly net-downvoted, but FWIW, I think 2 karma feels vaguely in the right vicinity for this comment. Maybe I would upvote it to +6, but I would indeed be sad to see it at +20 or whatever, since I do think it's doing something pretty tiring and hard to engage with. Directional downvoting is a totally fine use of downvoting: if you think a comment is overrated but not bad, please downvote it until its karma reflects where you want it to end up!
(This doesn't mean it doesn't make sense to do sociological analysis of cultural trends on LW using downvoting, but I do want to maintain the cultural locus where people can have complicated reasons for downvoting and where statements like "if you disagree strongly with the above comment you should force yourself to outline your views" aren't frequently made. The whole point of the vote system is to get signal from people without forcing them to do huge amounts of explanatory labor. Please don't break that part)
Come on, if you want to argue the fire death point at least give some kind of statistic or do a micromort estimate.
Prevented home accidents do not show up in stats; it is akin to survivorship bias.
Most people do not own fire blankets, so there is little survivorship bias going on here. You can just estimate using base rates.
The expected annual property damage from fire is around $60/year per homeowner, per this random ChatGPT analysis (in other words, not worth worrying about). A fire blanket would need to result in a 50% reduction of all fire risk to start being worth the cost and attention.
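To make that break-even claim concrete, here is a rough back-of-envelope sketch. The blanket price, lifetime, and attention cost below are numbers I'm making up purely for illustration; only the ~$60/year expected-damage figure comes from the estimate above.

```python
# Back-of-envelope sketch of the fire-blanket break-even point.
# All inputs except expected_annual_fire_damage are illustrative assumptions.

expected_annual_fire_damage = 60.0   # $/year per homeowner, per the ChatGPT estimate above
blanket_price = 30.0                 # assumed purchase price of a fire blanket, $
blanket_lifetime_years = 10.0        # assumed useful lifetime
attention_cost_per_year = 27.0       # assumed $-equivalent of buying, storing, and remembering it

# Total annualized cost of owning the blanket
annualized_cost = blanket_price / blanket_lifetime_years + attention_cost_per_year

# Fraction of total expected fire damage the blanket would need to prevent to break even
required_risk_reduction = annualized_cost / expected_annual_fire_damage
print(f"Required reduction in total fire risk to break even: {required_risk_reduction:.0%}")
```

With these assumed numbers the blanket only pays for itself if it cuts roughly half of all expected fire damage, which is where the ~50% figure comes from; plug in your own cost estimates if you think mine are off.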
Honestly, this whole conversation just feels like I am on Reddit with people giving random anecdotes without statistical literacy. You can disagree with me, but you speak with weird authority on issues that you seem to not have actually thought that clearly about.
Most of this seems like bad advice. Fire alarms basically don't help with fire deaths at all. Fire blankets don't really do much and basically never come in handy. You have a phone, you don't need a separate torch. Modern extension cords extremely rarely end up overheating. Fire is not a leading cause of death in any western country. If you have enough money to comfortably self-insure, don't buy insurance.
I agree that you should invest your money into index funds and watch your basic health.
To me, it feels in tension with having romantically meaningful relationships with multiple people because it sounds like sharing your resources instead of devoting them all towards the one most important thing.
I feel like a life always consists of needing to distribute resources between multiple commitments: job, community, friends, children, principles, ambitions, and your partners. I feel like dating multiple people is only as much in conflict with commitment as any of these other things are (though of course, given their similar nature, multiple partners compete over somewhat more similar kinds of resources, but IMO only to a limited degree; e.g. someone who does not have a job does seem to me likely capable of doing their part in multiple romantic relationships).
I think it's not crazy. The LTFF is pretty constrained in what it can fund due to CEA, and I also really wouldn't predict a future in which EA Funds is more independent from CEA than it is now (indeed, the trajectory is towards CEA more directly controlling both the LTFF and EA Funds more broadly).
I currently think that this assessment is off in expectation for something like the next 6 months of LTFF's existence, which includes basically all marginal funding right now, so I do think I agree with you. But if anyone were considering donating larger amounts to the LTFF, I think they should expect that to constitute a pretty direct bet on existing EA formal structures and general ways of being.
Knowing that you haven't solved the problem is actually really quite useful and important! I think basically no progress has been made on the alignment problem, but I do think the arguments for why it hasn't been solved yet are, as such, really quite important for helping humanity navigate the coming decades.