Multidimensional voting is good. It's nice to be able to distinguish "you're wrong" from "shut up" (or "you're right" from "well said"); and it's nice to be part of a community which makes & encourages that distinction.
Has some government or random billionaire sought out Petrov's heirs and made sure none of them have to work again if they don't want to? It seems like an obviously sensible thing to do from a game-theoretic point of view.
Not at the scale you're suggesting, but relevant: https://futureoflife.org/recent-news/50000-award-to-stanislav-petrov-for-helping-avert-wwiii-but-us-denies-visa/
It seems like an obviously sensible thing to do from a game-theoretic point of view.
Hmm, seems highly contingent on how well-known the gift would be? And even if potential future Petrovs are vaguely aware that this happened to Petrov's heirs, it's not clear that it would be an important factor when they make key decisions; if anything, it would probably feel pretty speculative/distant as a possible positive consequence of doing the right thing. Especially if those future decisions are not directly analogous to Petrov's, such that it's not clear whether they fall in the same category. But yeah, mainly I just suspect this type of thing won't get enough attention to end up shifting important decisions in the future? Interesting idea, though -- upvoted.
. . . Is there a way a random punter could kick in, say, $100k towards Elon's bid? Either they end up spending $100k on shares valued at somewhere between $100k and $150k; or, more likely, they make the seizure of OpenAI $100k harder at no cost to themselves.
It's amazing how smart dumb people are, how dumb dumb people are, how dumb smart people are, and how smart smart people are.
There's an upside of conventional education which no-one on any side of any debate ever seems to bring up, but which was a major benefit (possibly the major benefit) of my post-primary studies. Namely: it lets students discover what they have a natural aptitude for (or lack thereof) relative to a representative peer group. The most valuable things I learned in my Engineering courses at university were:
- I'm pretty mediocre at Engineering, especially sub-subjects which aren't strictly Structural and/or Mechanical.
- In particular, I'm significantly worse than the average would-be Engineer at working with electronics, so I should abandon my plan to specialize in that field.
- Conversely, I'm significantly better than the average would-be Engineer at work involving code, money, probability, simulation and inference. (Learning this is a large part of why I eventually left Engineering for Data Science and Finance.)
Findings like this aren't perfect since they can give false negatives (a poorly-taught course can lead someone to conclude they don't like the subject when they actually don't like the teacher) and false positives (my initial showing at coding courses made me think I had a genius-l...
he doesn't really see any value in systems that are there purely to signal.
This seems like a mischaracterization of his view. I'm pretty sure he thinks it's wrong to subsidize such signaling mechanisms.
First off, signaling is relative, so if (say) everyone goes to high school and only the very best go to college, from a signaling perspective this is just as useful a signal as everyone going to college and only the very best going to grad school. Therefore we should not spend public dollars getting more people to go to college.
Second, in the signaling framework, there are no externalities to schooling kids, so there is no market failure for (say) government subsidies of college students' debt to correct.
Third, due to the first point, if any major market failure is present it's the tendency to get into signaling spirals, where the positive signal of (say) a high school education degrades over time, making everyone spend more years and dollars in college getting what was once the same signal as a high school diploma. More years of schooling here is a cost, which everyone would prefer to pay less of. So insofar as there's any case for government involvement, it ought to be a tax, not a subsidy.
I once saw an advert claiming that a pregnancy test was “over 99% accurate”. This inspired me to invent an only-slightly-worse pregnancy test, which is over 98% accurate. My invention is a rock with “NOT PREGNANT” scrawled on it: when applied to a randomly selected human being, it is right more than 98% of the time. It is also cheap, non-invasive, endlessly reusable, perfectly consistent, immediately effective and impossible to apply incorrectly; this massive improvement in cost and convenience is obviously worth the ~1% decrease in accuracy.
I think they meant over 99% when used on a non-randomly selected human who's bothering to take a pregnancy test. Your rock would run maybe 70% or so on that application.
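(For anyone who wants to check the arithmetic, here's a minimal sketch; the base rates below are assumptions of mine chosen to roughly match the figures in the two comments above, not data from either commenter.)

```python
# The accuracy of a rock that always reads "NOT PREGNANT" is just one minus the
# base rate of pregnancy among the people it's applied to. Base rates here are
# illustrative assumptions only.
def rock_accuracy(p_pregnant: float) -> float:
    """Accuracy of always predicting 'not pregnant', given the base rate of pregnancy."""
    return 1.0 - p_pregnant

print(rock_accuracy(0.02))  # ~2% of randomly selected humans pregnant -> ~98% accurate
print(rock_accuracy(0.30))  # ~30% of people actually taking a test -> ~70% accurate
```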
AI-users: please ask your AIs "what do you think the probability of this being an Eval is?" in the middle of your regular non-Eval use, and post (and/or summarize) their responses here.
Details of what you were using them for, which AI(s) they were, and what they say to variants like ". . . ignoring the fact that I asked that question", are appreciated but supererogatory.
(credit to Noosphere89 for pointing out eval-awareness false positives are worth looking into)
Earlier this week I attended a presentation on AI use in only-somewhat-techie corporate contexts, and found it fascinating how LW terminology has gone mainstream but the meanings haven't: the presenter talked a lot about 'existential risk' (which I slowly inferred meant 'AI-using competitors might put us out of business'), and 'alignment' (which he helpfully defined as 'getting various AI modalities - coding, search, image gen etc - to work together harmoniously').
I think we need to call it human extinction risk to make it clear. Or even abrupt extermination risk.
“Existential risk” here doesn't necessarily come from LessWrong. Using the phrase “existential risk” to refer to your company going out of business makes perfect sense, as it's literally a risk to your company's existence. Alignment is a trickier one, but even there the phrasing makes enough sense that it could plausibly not be LessWrong-inspired.
Anecdotally I agree with OP – I basically never heard companies use those phrases from ~2008-2023, and then around 2024 "alignment" and "existential risk" became a lot more commonly used.
I also think this is a fairly common pattern – someone invents jargon with a very specific meaning (e.g. "emotional labor"), that phrase gets used in a wider context, and people interpret the phrase based on their most direct interpretation of the literal words involved, which is sometimes pretty different from the original meaning.
Every now and then I've asked AIs to "name as many characters as you can from [moderately obscure game/story]". So far I've never had one fail to hallucinate extra characters, or fail to double down when I ask for more details about its creations.
I tried ChatGPT(-5.2-Thinking) on the original D&D.Sci challenge (which is tough, but not tricky) and it got almost a perfect answer, one point shy of the optimum.
I also tried ChatGPT on the second D&D.Sci challenge (which is tricky, but not tough), and it completely failed (albeit in a sensible and conservative manner). Repeated prompts of "You're missing something, please continue with the challenge" didn't help.
I once pointed out that METR's
Baselined tasks tend to be easier and Baselining seems (definitely slightly) biased towards making them look easier still, while Estimated tasks tend to be harder and Estimation seems (potentially greatly) biased towards making them look harder still: the combined effect would be to make progress gradients look artificially steep in analyses where Baselined and Estimated tasks both matter.
but found (to my surprise!) that removing all Estimated tasks didn't affect headline results, presumably/partly because
...most of the Estimate
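To illustrate the combined-bias claim, here's a toy simulation (my own construction, not METR's methodology or data; every number in it is made up): if shorter tasks get their lengths nudged down and longer tasks get theirs nudged up, the fitted log-length-versus-date slope comes out steeper than the true one.

```python
# Toy simulation: opposite biases on easy (Baselined) vs hard (Estimated) task lengths
# steepen an apparent progress gradient. Everything here is made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 200
release_date = rng.uniform(0, 5, n)                # years since some reference point
true_minutes = np.exp(1.0 + 1.0 * release_date + rng.normal(0, 0.5, n))

easy = true_minutes < np.median(true_minutes)      # pretend the easier half got Baselined
observed_minutes = np.where(easy,
                            true_minutes * 0.9,    # Baselining: slight downward bias
                            true_minutes * 1.5)    # Estimation: larger upward bias

def slope(minutes):
    # least-squares slope of log(task length) against release date
    return np.polyfit(release_date, np.log(minutes), 1)[0]

print(f"true slope:     {slope(true_minutes):.2f}")
print(f"observed slope: {slope(observed_minutes):.2f}")  # steeper than the true slope
```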
So what practical things can people do now, to prep for not-worst-but-still-reasonably-bad-case cybersecurity implications of Mythos?
There's Yudkowsky's thing of saving anything stored electronically, which you don't want deleted, on an airgapped hard drive. And there's withdrawing ~$1000 from your bank account and stashing it in a book, so if your credit card stops working for a bit you'll have some leeway.
What else?
I'm a little surprised that no-one's publicly using pure AI on the currently-running D&D.Sci challenge: player count at time of writing stands at five humans and a centaur. This could be a really good (or at least really interesting) sanity check on how well this year's Agents handle novel inference problems.
"What important truth do you believe, which most people don't?"
"I don't think I possess any rare important truths."
I recently watched (the 1997 movie version of) Twelve Angry Men, and found it fascinating from a Bayesian / confusion-noticing perspective.
My (spoilery) notes (cw death, suspicion, violence etc):
What's the meta on communicating "hey random pseudonymous internet acquaintance / stranger, just fyi I've stumbled on something linking you to your irl identity, here's how I did it so you can ensure no-one else makes that connection"? I can never figure out how to not feel/seem/be creepy when doing this, but knowing and not telling seems like a dick move.
I should probably get into the habit of splitting my comments up. I keep making multiple assertions in a single response, which means when people add (dis)agreement votes I have no idea which part(s) they're (dis)agreeing with.
I'm fanatically in favor of creating new ways to test (& thereby develop) rationality in general and scientific capabilities in particular. However, any such resources would necessarily be dual-use, providing AI developers evals (or eval paradigms) which could help accelerate AI development along the axes on which it's currently most lacking. This seems like an obviously insane thing to worry about but I can't figure out why; soliciting other opinions.
- If using gradient descent, might be a bad starting point
  - To diagnose: Try running it with number of training rounds and/or learning rate set to/near zero, and seeing if it predicts an unsuitable value for everything.
  - To fix: Set the starting point to the average outcome in the training set (see the sketch after this list).
- If using gradient descent, might be numerical instability
  - To diagnose: Watch how individual and aggregate predictions change from round to round. If they flicker back and forth (with unornamented gradient descent) or swing ...
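A minimal sketch of the bad-starting-point fix mentioned above (my own illustration, with made-up data and a hand-rolled linear model rather than anyone's actual pipeline): initialise the intercept at the mean of the training targets instead of at zero, and the same budget of gradient-descent rounds gets much closer to a sensible fit.

```python
# Sketch of the "set the starting point to the average outcome" fix, using plain
# batch gradient descent on squared error. Data and hyperparameters are made up.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 100.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 1, 500)  # targets far from 0

def fit_linear(X, y, start_at_mean, lr=0.01, rounds=200):
    """Gradient descent on a linear model, with a choosable starting intercept."""
    w = np.zeros(X.shape[1])
    b = y.mean() if start_at_mean else 0.0   # the fix: start from the average outcome
    for _ in range(rounds):
        resid = X @ w + b - y
        w -= lr * (X.T @ resid) / len(y)
        b -= lr * resid.mean()
    return w, b

for start_at_mean in (False, True):
    w, b = fit_linear(X, y, start_at_mean)
    mse = np.mean((X @ w + b - y) ** 2)
    print(f"start_at_mean={start_at_mean}: MSE = {mse:.1f}")
```

(Setting rounds=0 here also reproduces the diagnostic: the start_at_mean=False version predicts 0 for everything, which is the "unsuitable value for everything" symptom.)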
I used to implicitly believe that when I have a new idea for a creative(/creative-adjacent) project, all else being equal, I should add it to the end of my to-do list (FIFO). I now explicitly believe the opposite: that the fresher an idea is, the sooner I should get started making it a reality (LIFO). This way:
The downsides are: