Dumping out a lot of thoughts on LW in hopes that something sticks. Eternally upskilling and accelerating.
DMs open, especially for promising opportunities in AI Safety and potential collaborators.
I work mostly as a distiller (of xrisk-relevant topics). I try to understand some big complex thing, package it up all nice, and distribute it. The "distribute it" step is something society has already found a lot of good tech for. The other two steps, not so much.
Loom has been lovely the times I've used it. I would love to see more work done on things like this, things that enhance my intelligence while keeping me very tightly in the loop. Other things in this vein include:
Ai Labs is slightly better but still bad.
Could you give a link to this or a more searchable name? "Ai Labs" is very generic and turns up every possible result. Even if it's bad, I'd be interested in investigating something "slightly better" and hearing a bit about why.
Clicking the link in mobile Chrome sends me to the correct website. What steps reproduce the problem for you?
In the meantime, I've passed this along, and it should reach the right people at CAIS sometime today.
I have not been able to independently verify this observation, but am open to further evidence if and only if it updates my p(doom) higher.
After reviewing the evidence, both of the EA acquisition and the cessation of Lightcone collaboration with the Fooming Shoggoths, I'm updating my p(doom) upwards 10 percentage points, from 0.99 to 1.09.
This seems very related to what the Benchmarks and Gaps investigation is trying to answer, and it goes into quite a bit more detail and nuance than I'm able to get into here. I don't think there's a publicly accessible full version yet (but I think there will be at some later point).
It more directly targets the question "when will we have AIs that can automate work at AGI companies?", which I realize is not really your pointed question. I don't have a good answer to your specific question because I don't know how hard alignment is, or whether humans could realistically solve it on any time horizon without intelligence enhancement.
However, I tentatively expect safety research speedups to look mostly similar to capabilities research speedups, barring AIs that are strategically deceptive in ways that harm safety research.
I median-expect time horizons somewhere on the scale of a month (e.g. seeing an involved research project through from start to finish) to lead to very substantial research automation at AGI companies (maybe 90% of research automated?), and we could nonetheless see startling macro-scale speedup effects already at the scale of 1-day researchers. At 1-year researchers, things are very likely moving quite fast. I think this translates fairly faithfully to safety orgs doing any kind of work that can be accelerated by AI agents.
I think your reasoning as stated there is true, and I'm glad you showed the full data. I suggested removing outliers for the Dutch book calculations because I suspected that people who were wild outliers on at least one of their answers were more likely to be wild outliers in their ability to resist Dutch books; I predict that the thing that causes someone to say they value a laptop at one million bikes is pretty often just "they're unusually bad at assigning numeric values to things."
The actual origin of my confusion was "huh, those Dutch book numbers look really high relative to my expectations; this reminds me of earlier in the post, when the other outliers made numbers really high."
I'd be interested to see the outlier-free numbers here, but I respect it if you don't have the spoons for that, given that the designated census processing time is already over.
When taking the survey, I figured that there was something fishy going on with the conjunction fallacy questions, but predicted that it was instead about sensitivity to subtle changes in the wording of questions.
I figured there was something going on with the various questions about IQ changes, but I instead predicted that you were working for big adult intelligence enhancement, and I completely failed to notice the Dutch book.
Regarding the Dutch book numbers: it seems like, for each of the individual-question presentations of that data, you removed the outliers, but when performing the Dutch book calculations you kept the outliers in. This may be part of why the numbers reflect so poorly on our Dutch book resistance (although not the whole reason).
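For concreteness, here's a minimal sketch of what I mean by applying the same filtering to the Dutch book calculation as to the per-question summaries. The column names, the three-good cycle, and the IQR-based outlier rule are all my assumptions, since I don't know how the actual census data is structured:

```python
import numpy as np
import pandas as pd

def iqr_filter(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Keep responses within k * IQR of the quartiles, on a log scale
    (exchange rates are ratios, so outliers are multiplicative)."""
    logged = np.log(series)
    q1, q3 = logged.quantile(0.25), logged.quantile(0.75)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return series[(logged >= lo) & (logged <= hi)]

# Hypothetical column names; the real census data presumably differs.
df = pd.read_csv("census_responses.csv")
cols = ["laptops_per_bike", "bikes_per_widget", "widgets_per_laptop"]

# Drop any respondent who is an outlier on *any* leg of the cycle,
# matching the filtering used for the single-question summaries.
kept = df.index
for c in cols:
    kept = kept.intersection(iqr_filter(df[c].dropna()).index)
filtered = df.loc[kept]

# A respondent is Dutch-bookable to the extent that the product of
# their three exchange rates around the cycle differs from 1.
cycle_product = filtered[cols].prod(axis=1)
print(cycle_product.describe())
```

The specific outlier rule obviously matters a lot here; the point is just that whatever rule was used for the single-question numbers would also get applied before computing the cycle products.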
I really want a version of the fraudulent research detector that works well. I fed in the first academic paper I had on hand from some recent work and got:
Severe Date Inconsistency: The paper is dated December 12, 2024, which is in the future. This is an extremely problematic issue that raises questions about the paper's authenticity and review process.
Even though it thinks the rest of the paper is fine, it gives it a 90% retraction score. Rerunning on the same paper once more gets similar results and an 85% retraction score.
On the second paper I tried, it gave a mostly robust analysis, but only after completely failing to output anything the first time around.
After this, every input of mine got the "Error Analysis failed:" error.
Did I entirely miss these? I can't find the post anywhere. I am still interested in having more music.