AI alignment researcher, ML engineer. Master's in Neuroscience.
I am currently working with Jacob Cannell at Vast.ai, where we are investigating our hypotheses about the nature of AGI. Jake and I have insights from studying neuroscience that lead us both to believe that cheap, broadly competent AGI is attainable and will be built soon, by others if not by us. This leads me to have timelines of around 2024-2027. I may be wrong, but I see no way to know for sure other than empirically testing my ideas. I think the best path forward to alignment is through safe, contained testing on models trained on censored data (simulations with no mention of humans or computer technology). I think that current mainstream ML technology is close to a threshold of competence beyond which it will be capable of recursive self-improvement, and that this automated process will mine neuroscience for insights and quickly become far more effective and efficient. It would be quite bad for humanity if this happened in an uncontrolled, uncensored, un-sandboxed situation, so I am trying to warn the world about this possibility. See my prediction market here: https://manifold.markets/NathanHelmBurger/will-gpt5-be-capable-of-recursive-s?r=TmF0aGFuSGVsbUJ1cmdlcg
Sam Altman (in context of risks of AI): “What I lose the most sleep over is the hypothetical idea that we already have done something really bad by launching ChatGPT.”
That seems misplaced, in the sense that launching OpenAI and its general path of development seem like the places to be most worried you are in error, unless the error in ChatGPT is ‘get everyone excited about AI.’ Which is a different risk model that has many implications. I would have liked to hear more details of the threat model here.
I can't speak to Sam's threat model, but my current threat model is that the greatest risk comes from open source code and models, or from non-safety-conscious labs. I think our best hope is for a safety-conscious lab to get to a powerful AGI first, keep it safely contained, study it, and make iterative progress on aligning it well enough to use it to undertake a pivotal act. If this roughly corresponds to Sam's view, then what he said about ChatGPT would make sense, since the hype seems to be greatly speeding up open source development and drawing in more funding for new attempts. Especially if he, like me, believes that there are discoverable algorithmic advances so powerful that they could allow small orgs to leapfrog all the big ones overnight. In such a scenario, the more independent groups you have making a serious try in approximately the right direction, the more 'lottery tickets' Moloch gets to buy in the kill-us-all lottery.
I’d also worry a lot about scenarios where the difficulty level in practice is hard but not impossible, where making incremental progress is most likely to lead you down misleading paths that make the correct solutions harder rather than easier to find, because you have a much harder time directing attention to them, keeping attention there, or being rewarded for any incremental progress given ‘the competition’ and you have to worry those in charge will go with other, non-working solutions instead at many points.
...
I do not think that long-term alignment is so continuous with near-term reliability and control. I expect that successful solutions will likely be found to many near-term reliability and control problems, and that those solutions will have very little chance of working on AGI and then ASI systems. If I did not believe this, a lot of the strategic landscape would change dramatically, and my p(doom) from lack of alignment would decline dramatically, although I would still worry about whether that alignment actually saves us from the dynamics that happen after that – which is an under-considered problem.
I strongly agree with this, and have discussed this with a friend who does theoretical safety research who also agrees. It's easy to get excited by making incremental iterative progress. That's a thing that teams of humans tend to be great at. This makes it much easier to put additional resources into it. But it's likely that focusing only on this would lead us to ignore the likely fact that we're iterating on the wrong things and marching into a blind alley. I, and my friend, expect that the true paths to good solutions are not yet found and thus not yet available for incremental iterative progress. If that's the case, we need more theoretical researchers, and more serial time, to get to the beginning of a workable path to iterate on.
If these currently-iterable safety-ish plans manage to buy us at least a delay before things get doom-y, then that could be a benefit. I think it's plausible they could buy us a year or two of doom-delay.
Related: I also think that compute governance is potentially a good idea in that it might buy us some delay while we are in a low-algorithmic-efficiency regime. It currently seems like the first models to be really dangerous will be made by well-resourced groups using lots of compute. But compute governance is potentially a terrible idea in that I expect it to fail completely and suddenly when, in the not-so-distant future, we transition to a high-algorithmic-efficiency regime. Then the barrier will be knowledge of the efficient algorithms, not large amounts of compute. I believe we can know that this high-algorithmic-efficiency regime exists by looking at the way compute and learning work in the brain, but we can't be sure when algorithmic leaps will be made or how far they will get us. So if we put our trust in compute governance, we are driving our bus onto a lake which we know ahead of time has patches of ice thin enough to suddenly give way beneath us, with no way to know when we will reach the weak ice. Seems scary.
Chris Olah: One of the ideas I find most useful from @AnthropicAI‘s Core Views on AI Safety post (https://anthropic.com/index/core-views-on-ai-safety…) is thinking in terms of a distribution over safety difficulty. Here’s a cartoon picture I like for thinking about it:
I like this picture a lot. I personally place the peak of my distribution in between Apollo and P/NP. My lower tail does not go as low as Steam Engine, my upper tail does include impossible.
My response to Xuan would be that I don’t expect us to ‘just add data and compute’ in the next seven years, or four years. I expect us to do many other things as well. If you are doing the thought experiment expecting no such progress, you are doing the wrong thought experiment. Also my understanding is we already have a pattern of transformers looking unable to do something, then we scale and suddenly that changes for reasons we don’t fully understand.
I copied Xuan's Twitter thread into the comments section of the LessWrong post of Jacob Steinhardt's GPT 2030 predictions. In defense of Xuan, she also says that she does not expect us to 'just add data and compute'. Her claim is that Jacob's predictions seem unlikely only if you assume we will do nothing but add data and compute, and she considers that assumption itself unlikely. Thus, she agrees with you in criticizing Jacob's assumption of 'only data and compute added'.
Where you seem to differ from her point of view is that you think additional data and compute alone could indeed lead to novel and surprising emergent capabilities. She seems to think we've found about all there is to find there.
I, in agreement with you, believe that there are more novel emergent capabilities to be found through only adding data and compute. I do, however, think that we have reached a regime of diminishing returns. I believe that much greater efficiency in making forward progress will be found through algorithmic progress, so the fact that more data and compute alone would technically be sufficient will become irrelevant: that route will be exorbitantly expensive compared to discovering and utilizing algorithmic improvements.
It is my understanding that this is also what you think, and perhaps also what Xuan thinks but I haven't read enough from her to know that. I've only read Xuan's one twitter thread I copied.
<rant> This is a prime example of why I dislike Twitter so much as a medium of debate. The fractured tree of threads and replies makes it too hard to see a cohesive discussion between people and get a comprehensive understanding of others' viewpoints. The nuance gets lost. Had this discussion taken place in a comment thread on a forum, such as this one, it would have been much easier to tell when you'd read Xuan's entire comment thread and gotten her full viewpoint. </rant>
Copying the top response to the thread asking for a term for the age you would've died at without modern medicine:
Bobby D. @Ancient0History
Technical: Medical Intervention Threshold
Untechnical: Bucket Point
For me, this would have been age 3, fever/pneumonia.
Agree with the first two, and with the risk brought up in third point but not the conclusion of 'avoid participating'. What about participating only in an inconsistent way, and responding to all requests that you create/participate in prediction markets with the response that acceding to such requests is against personal policy. That gives you the freedom to participate as you desire, but protects you from info-leaking via antagonistic request probing. Also, consider having an anonymous persona for participating in markets that seem like they might tempt you into revealing secrets you don't reflectively wish to reveal in that way.
I'm working on a lecture on brain-like AGI with an emphasis on connectomics intended for the online AI alignment conference in early August.
This is one of the errors I think this argument is making: Transformative AGI by 2043 is <1% likely
Ari Allyn-Feuer, Ted Sanders
I think the critique of Redwood Research made a few valid points. My own critique of Redwood would go something like:
Not much of a critique, honestly. A reasonable mistake that a lot of start-ups led by young inexperienced people would make, and certainly something fixable. Also, they have longer AGI timelines than me, and thus are not acting with what I see as sufficient urgency. But I don't think that it's necessarily fair for me to critique orgs for having their own well-considered opinions on this different from my own. I'm not even sure if them having my timelines would improve their output any.
This critique on the other hand seems entirely invalid and counterproductive. You criticize Conjecture's CEO for being... a charismatic leader good at selling himself and leading people? Because he's not... a senior academic with a track record of published papers? Nonsense. Expecting the CEO to be the primary technical expert seems highly misguided to me. The CEO needs to know enough about the technical aspects to be able to hire good technical people, and then needs to coordinate and inspire those people and promote the company. I think Connor is an excellent pick for this, and your criticisms of him are entirely beside the point, and also rather rude.
Conjecture, and Connor, seem to actually be trying to do something which strikes at the heart of the problem. Something which might actually help save us three years from now, when the leading AI labs have in their possession powerful AGI after a period of recursive self-improvement by almost-but-not-quite-AGI. I expect this AGI will be too untrustworthy to make more than very limited use of. So then, looking around for ways to make use of their newfound dangerous power, what will they see? Some still-immature interpretability research. Sure. And then? Maybe they'll see the work Conjecture has started and realize that breaking down the big black magic box into smaller, more trustworthy pieces is one of the best paths forward. Then they can go knocking on Conjecture's door, collect the research so far, and finish it themselves with their abundant resources.
My criticism of their plan is primarily: you need even more staff and more funding to have a better chance of this working. Which is basically the opposite of the conclusion you come to.
As for the untrustworthiness of their centralized infohazard policy... Yeah, this would be bad if the incentives were for the central individual to betray the world for their own benefit. That's super not the case here. The incentive is very much the opposite. For much the same reason that I feel pretty trusting of the heads of Deepmind, OpenAI, and Anthropic. Their selfish incentives to not destroy themselves and everyone they love are well aligned with humanity's desire to not be destroyed. Power-seeking in this case is a good thing! Power over the world through AGI, to these clever people, clearly means learning to control that untrustworthy AGI... thus means learning how to save the world. My threat model says that the main danger comes from not the heads of the labs, but the un-safety-convinced employees who might leave to start their own projects, or outside people replicating the results the big labs have achieved but with far fewer safety precautions.
I think reasonable safety precautions, like not allowing unlimited unsupervised recursive self-improvement, not allowing source code or model weights to leave the lab, sandbox testing, etc., can actually be quite effective in the short term in protecting humanity from rogue AGI. I don't think surprise-FOOM-in-a-single-training-run-resulting-in-a-sandbox-escaping-superintelligence is a likely threat model. I think a far more likely threat model is foolish amateurs or bad actors tinkering with dangerous open source code, stumbling into an algorithmic breakthrough they didn't expect and don't understand, and foolishly releasing it onto the web.
I think putting hope in compute governance is a very limited hope. We can't govern compute for long, if at all, because once more efficient training algorithms are found there will be huge reductions in the compute needed.
Yes, in particular, more compute means it's easier to automate searches for algorithmic improvements....