Thanks for writing this, Nate. This topic is central to our research at Sentience Institute, e.g., "Properly including AIs in the moral circle could improve human-AI relations, reduce human-AI conflict, and reduce the likelihood of human extinction from rogue AI. Moral circle expansion to include the interests of digital minds could facilitate better relations between a nascent AGI and its creators, such that the AGI is more likely to follow instructions and the various optimizers involved in AGI-building are more likely to be aligned with each other. Empi...
I disagree with Eliezer Yudkowsky on a lot, but one thing I can say for his credibility is that in possible futures where he's right, nobody will be around to laud his correctness, and in possible futures where he's wrong, it will arguably be very clear how wrong his views were. Even if he has a big ego (as Lex Fridman suggested), this is a good reason to view his position as sincere and—dare I say it—selfless.
In particular, I wonder if many people who won't read through a post about offices and logistics would notice and find compelling a standalone post with Oliver's 2nd message and Ben's "broader ecosystem" list—analogous to AGI Ruin: A List of Lethalities. I know related points have been made elsewhere, but I think 95-Theses-style lists have a certain punch.
I like these examples, but can't we still view ChatGPT as a simulator—just a simulator of "Spock in a world where 'The giant that came to tea' is a real movie" instead of "Spock in a world where 'The giant that came to tea' is not a real movie"? You're already positing that Spock, a fictional character, exists, so it's not clear to me that one of these worlds is the right one in any privileged sense.
On the other hand, maybe the world with only one fiction is more intuitive to researchers, so the simulators frame does mislead in practice even if it can be res...
+1. While I will also respect the request to not state them in the comments, I would bet that you could sample 10 ICML/NeurIPS/ICLR/AISTATS authors and learn about >10 well-defined, not entirely overlapping obstacles of this sort.
We don’t have any obstacle left in mind that we don’t expect to get overcome in more than 6 months after efforts are invested to take it down.
I don't want people to skim this post and get the impression that this is a common view in ML.
The problem with asking individual authors is that most researchers in ML don't have a wide enough perspective to realize how close we are. Over the past decade, it seems that people in the trenches of ML have almost always thought their research is going slower than it actually is, because only a few researchers have broad enough gears-level models to plan the whole thing in their heads. If you aren't trying to run the search for the foom-grade model in your head at all times, you won't see it coming.
That said, they'd all be right about what bottlenecks there are. Just not how fast we're gonna solve them.
Interesting! I'm not sure what you're saying here. Which of the two subscripted versions of shard theory you describe is shard theory written without a subscript? If the former, then the OP seems accurate. If the latter, or if shard theory without a subscript includes both, then I misread your view and will edit the post to note that this comment supersedes (my reading of) your previous statement.
Had you seen the researcher explanation for the March 2022 "AI suggested 40,000 new possible chemical weapons in just six hours" paper? I quote (paywall):
...Our drug discovery company received an invitation to contribute a presentation on how AI technologies for drug discovery could potentially be misused.
Risk of misuse
The thought had never previously struck us. We were vaguely aware of security concerns around work with pathogens or toxic chemicals, but that did not relate to us; we primarily operate in a virtual setting. Our work is rooted in building machi
Risk of misuse
The thought had never previously struck us.
This seems like one of the most tractable things to address to reduce AI risk.
If 5 years from now anyone developing AI or biotechnology is still not thinking (early and seriously) about ways their work could cause harm that other people have been talking about for years, I think we should consider ourselves to have failed.
Hm, the meaning of "begging the question" is probably just a verbal dispute, but I don't think asking a question can in general beg the question, because questions don't have conclusions. There is no "assuming its conclusion is true" if there is no conclusion. Not a big deal though!
I wouldn't say values are at best independent (i.e., orthogonal); they are often highly correlated, such as the values of "have enjoyable experiences" and "satisfy hunger" both leading to eating tasty meals. I agree they are often contradictory, and this is one valid model of catastrophic addiction or...
Thanks for the comment. I take "beg the question" to mean "assumes its conclusion," but it seems like you just mean Point 2 assumes something you disagree with, which is fair. I can see reasonable definitions of aligned and misaligned under which brains would fall into either category. For example, insofar as our values are evolutionary in a certain sense (e.g., valuing reproduction), human brains exhibit misaligned mesa-optimization, like craving sugar. If sugar craving itself is the value, then arguably we're well-aligned.
In terms of synthesizing an illusion, wha...
Very interesting! Zack's two questions were also the top two questions that came to mind for me. I'm not sure if you got around to writing this up in more detail, John, but I'll jot down the way I tentatively view this differently. Of course I've given this vastly less thought than you have, so many grains of salt.
On "If this is so hard, how do humans and other agents arguably do it so easily all the time?", how meaningful is the notion of extra parameters if most agents are able to find uses for any parameters, even just through redundancy or error-correc...
This is great. One question it raises for me is: Why is there a common assumption in AI safety that values are existent (i.e., they exist) and latent (i.e., not directly observable) phenomena? I don't think those are unreasonable partial definitions of "values," but they're far from the only ones, and it's not at all obvious that they're the ones with which we want to align AI. Philosophers Iason Gabriel (2020) and Patrick Butlin (2021) have pointed out some of the many definitions of "values" that we could use for AI safety.
I understand tha...
I strongly agree. There are two claims here. The weak one is that, if you hold complexity constant, directed acyclic graphs (DAGs; Bayes nets or otherwise) are not necessarily any more interpretable than conventional NNs because NNs are DAGs at that level. I don't think anyone who understands this claim would disagree with it.
But that is not the argument being put forth by Pearl/Marcus/etc. and arguably contested by LeCun/etc.; they claim that in practice (i.e., not holding anything constant), DAG-inspired or symbolic/hybrid AI approaches like Neural Causa...
I don't think the "actual claim" is necessarily true. You need more assumptions than a fixed difficulty of AGI, assumptions that I don't think everyone would agree with. I walk through two examples in my comment: one that implies "Gradual take-off implies shorter timelines" and one that implies "Gradual take-off implies longer timelines."
I agree with this post that the accelerative forces of gradual take-off (e.g., "economic value... more funding... freeing up people to work on AI...") are important and not everyone considers them when thinking through timelines.
However, I think the specific argument that "Gradual take-off implies shorter timelines" requires a prior belief that not everyone shares, such as a prior that an AGI of difficulty D will occur in the same year in both timelines. I don't think such a prior is implied by "conditioned on a given level of “AGI difficulty”". Here are t...
I think another important part of Pearl's journey was that during his transition from Bayesian networks to causal inference, he was very frustrated with the correlational turn in early 1900s statistics. Because causality is so philosophically fraught and often intractable, statisticians shifted to regressions and other acausal models. Pearl sees that as throwing out the baby (important causal questions and answers) with the bathwater (messy empirics and a lack of mathematical language for causality, which is why he coined the do operator).
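As a quick gloss of what the do operator buys you (a standard textbook contrast, not something specific to this post), compare ordinary conditioning with Pearl's interventional query when a variable Z blocks all backdoor paths from X to Y:

```latex
% Observational (regression-era) quantity: condition on X = x
P(Y = y \mid X = x) = \sum_z P(Y = y \mid X = x, Z = z)\, P(Z = z \mid X = x)

% Pearl's interventional quantity, computed via backdoor adjustment over Z
P(Y = y \mid \mathrm{do}(X = x)) = \sum_z P(Y = y \mid X = x, Z = z)\, P(Z = z)
```

The only difference is whether Z is averaged over its distribution given X or its marginal distribution, but that difference is exactly the causal content the acausal models had no notation for.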
Pearl discusses t...
This model was produced by fine-tuning DeBERTa XL on a dataset produced by contractors labeling a bunch of LM-generated completions to snippets of fanfiction that were selected by various heuristics to have a high probability of being completed violently.
I think you might have better performance if you train your own DeBERTa XL-like model with classification of different snippets as a secondary objective alongside masked token prediction, rather than just fine-tuning with that classification after the initial model training. (You might use different s...
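To make that concrete, here is a minimal sketch of the kind of joint objective I mean (the model name, first-token pooling, and equal loss weighting are all illustrative assumptions on my part, not Redwood's actual setup):

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class JointMLMClassifier(nn.Module):
    """Encoder trained on masked-token prediction and snippet classification together."""

    def __init__(self, backbone_name="microsoft/deberta-v2-xlarge", num_labels=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(backbone_name)
        hidden = self.encoder.config.hidden_size
        vocab = self.encoder.config.vocab_size
        self.mlm_head = nn.Linear(hidden, vocab)       # masked-token prediction head
        self.cls_head = nn.Linear(hidden, num_labels)  # e.g., violent vs. non-violent completion

    def forward(self, input_ids, attention_mask, mlm_labels=None, cls_labels=None, cls_weight=1.0):
        hidden_states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        mlm_logits = self.mlm_head(hidden_states)        # (batch, seq_len, vocab)
        cls_logits = self.cls_head(hidden_states[:, 0])  # pool via the first token
        loss = torch.tensor(0.0, device=input_ids.device)
        if mlm_labels is not None:  # unmasked positions should carry the label -100
            mlm_loss_fn = nn.CrossEntropyLoss(ignore_index=-100)
            loss = loss + mlm_loss_fn(mlm_logits.view(-1, mlm_logits.size(-1)), mlm_labels.view(-1))
        if cls_labels is not None:  # secondary objective, weighted relative to the MLM loss
            loss = loss + cls_weight * nn.CrossEntropyLoss()(cls_logits, cls_labels)
        return loss, mlm_logits, cls_logits
```

The idea is just that the encoder sees gradient from the classification signal throughout training rather than only in a final fine-tuning stage; whether that actually helps here is an empirical question.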
I appreciate making these notions more precise. Model splintering seems closely related to other popular notions in ML, particularly underspecification ("many predictors f that a pipeline could return with similar predictive risk"), the Rashomon effect ("many different explanations exist for the same phenomenon"), and predictive multiplicity ("the ability of a prediction problem to admit competing models with conflicting predictions"), as well as more general notions of generalizability and out-of-sample or out-of-domain performance. I'd be curious what ex...
I think most thinkers on this topic wouldn't think of those weights as arbitrary (I know you and I do, as hardcore moral anti-realists), and they wouldn't find it prohibitively difficult to introduce those weights into the calculations. Not sure if you agree with me there.
I do agree with you that you can't do moral weight calculations without those weights, assuming you are weighing moral theories and not just empirical likelihoods of mental capacities.
I should also note that I do think intertheoretic comparisons become an issue in other cas...
I don't think the two-envelopes problem is as fatal to moral weight calculations as you suggest (e.g., "this doesn't actually work"). The two-envelopes problem isn't a mathematical impossibility; it's just an interesting example of mathematical sleight-of-hand.
Brian's discussion of two-envelopes is just to point out that moral weight calculations require a common scale across different utility functions (e.g. the decision to fix the moral weight of a human at 1 whether you're using brain size, all-animals-are-equal, u...
I think the moral-uncertainty version of the problem is fatal unless you make further assumptions about how to resolve it, such as by fixing some arbitrary intertheoretic-comparison weights (which seems to be what you're suggesting) or using the parliamentary model.
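To spell out the normalization issue with toy numbers (made up for illustration, not Brian's figures): suppose you put 50% credence on a brain-size theory under which an elephant counts for 0.01 of a human, and 50% credence on an all-animals-are-equal theory under which it counts for 1.

```latex
% Fix the human at weight 1 and take expectations over theories:
E[\text{elephant}] = 0.5 \times 0.01 + 0.5 \times 1 = 0.505
  \quad\Rightarrow\quad \text{elephant} \approx 0.5\ \text{humans}

% Fix the elephant at weight 1 instead:
E[\text{human}] = 0.5 \times 100 + 0.5 \times 1 = 50.5
  \quad\Rightarrow\quad \text{elephant} \approx 0.02\ \text{humans}
```

The expected ratio differs by a factor of about 25 depending on which unit you arbitrarily fix, which is why some common scale, explicit intertheoretic weights, or something like the parliamentary model has to be chosen rather than derived.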
This is a very exciting project! I'm particularly glad to see two features: (i) the focus on "deception", which undergirds much existential risk but has arguably been less of a focal point than "agency", "optimization", "inner misalignment", and other related concepts, (ii) the ability to widen the bottleneck of upskilling novice AI safety researchers who have, say, 500 hours of experience through the AI Safety Fundamentals course but need mentorship and support to make their own meaningful research contributions.