

The Sea of Faith

Was once, too, at the full, and round earth’s shore

Lay like the folds of a bright girdle furled.

But now I only hear

Its melancholy, long, withdrawing roar,

Retreating, to the breath

Of the night-wind, down the vast edges drear

And naked shingles of the world.

Ah, love, let us be true

To one another! for the world, which seems

To lie before us like a land of dreams,

So various, so beautiful, so new,

Hath really neither joy, nor love, nor light,

Nor certitude, nor peace, nor help for pain;

And we are here as on a darkling plain

Swept with confused alarms of struggle and flight,

Where ignorant armies clash by night.

A major factor that I did not see on the list is the rate of progress on algorithms for deep AI systems, and the closely related formal understanding of them. Right now these algorithms can be surprisingly effective (AlphaZero, GPT-3) but are extremely compute-intensive and often sample-inefficient. Lacking any comprehensive formal model of why deep learning works as well as it does, and why it fails when it does, we are groping toward better systems.

Right now the incentives favor scaling compute power to get more marquee results, since finding more efficient algorithms doesn't scale as well with increased money. However, the effort to make deep learning more efficient continues and can probably give us multiple orders of magnitude of improvement in both compute and sample efficiency.

Orders-of-magnitude improvement in the algorithms would be consistent with our experience in many other areas of computing, where speedups due to better algorithms have often beaten speedups due to hardware.

Note that this is (more or less) independent of advances that contribute directly to AGI. For example, algorithmic improvements may let us train GPT-3 on 100 times less data, with 1000 times less compute work, but may not suggest how to make the GPT series fundamentally smarter / more capable, except by making it bigger.
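As a back-of-envelope illustration of why this matters, algorithmic and hardware gains multiply rather than add. All numbers here are hypothetical, taken from the hedged figures above plus an assumed hardware gain over the same period:

```python
# Illustrative arithmetic only: every multiplier below is a hypothetical
# assumption, not a measurement.
data_efficiency = 100      # "100 times less data" (hypothetical)
compute_efficiency = 1000  # "1000 times less compute work" (hypothetical)
hardware_gain = 10         # assumed accelerator speedup over the same period

# Algorithmic and hardware gains compound multiplicatively:
effective_compute_gain = compute_efficiency * hardware_gain
print(f"compute cost falls {effective_compute_gain:,}x; data needed falls {data_efficiency}x")
```

Under these assumptions the same training run becomes 10,000 times cheaper in compute, which is why the algorithmic trajectory deserves a place on the list alongside hardware.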

Suppose, as this argues, that effective monopoly on AGI is a necessary factor in AI risk. Then effective anti-monopoly mechanisms (maybe similar to anti-trust?) would be significant mitigators of AI risk.

The AGI equivalent of cartels could contribute to risk as well, so the anti-monopoly mechanisms would have to handle them too. Lacking some dominant institution to enforce cartel agreements, however, cartels should be easier to handle than monopolies.

Aside from the "foom" story, what are the arguments that we are at risk of an effective monopoly on AGI?

And what are the arguments that large numbers of AGIs of roughly equal power still represent a risk comparable to a single monopoly AGI?

Many of us believe that, in human affairs, central planning is dominated by diverse local planning plus markets. Do we really believe that for AGIs central planning will become dominant? That would be surprising.

In general, AGIs will have to delegate tasks to sub-agents as they grow; otherwise they run into computational and physical bottlenecks.

Local capabilities of sub-agents raise many issues of coordination that can't just be assumed away. Sub-agents spawned by an AGI must take advantage of local computation, memory, and often local data acquisition; otherwise they confer no advantage. In general these local capabilities may cause divergent choices that require negotiation to re-converge the agents. This implies that the assumption of a unified dominant AGI that can scale indefinitely is dubious at best.
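The divergence/re-convergence dynamic can be made concrete with a toy sketch. This model is entirely hypothetical (my own illustration, not anything proposed above): sub-agents start aligned, local observations push their beliefs apart, and an explicit negotiation step is needed to restore agreement.

```python
# Toy model (hypothetical illustration): sub-agents with local state diverge
# under local data acquisition and must explicitly re-converge.

class SubAgent:
    def __init__(self, name, belief):
        self.name = name
        self.belief = belief  # local estimate of some shared quantity

    def observe(self, local_data):
        # Local data acquisition pushes each agent's belief in its own direction.
        self.belief += local_data

def reconcile(agents):
    # A minimal "negotiation": agree on the mean of the local beliefs.
    consensus = sum(a.belief for a in agents) / len(agents)
    for a in agents:
        a.belief = consensus
    return consensus

agents = [SubAgent("a", 10.0), SubAgent("b", 10.0)]
agents[0].observe(+4.0)  # divergent local observations...
agents[1].observe(-2.0)
consensus = reconcile(agents)  # ...require an explicit re-convergence step
print(consensus)  # 11.0
```

The point of the sketch is only that reconciliation is a real computation with real costs; it cannot be assumed away, and it grows with the number and autonomy of the sub-agents.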

Let's look at a specific issue here.

Loyalty is a major issue, directly and indirectly referenced in other comments. Without reliable loyalty, principal-agent problems can easily become crippling. But another term for loyalty is goal alignment. So, in effect, an AGI has to solve the problem of goal alignment to grow indefinitely by spawning sub-agents.

Corporations solve the problem of alignment internally by inculcating employees with their culture. However, that culture becomes a constraint on their possible responses to challenges, and that can kill them -- see the many companies whose culture drove success and then failure.

An AGI with a large population of sub-agents is different in many ways but has no obvious way to escape this failure mode. A change in culture implies changes in goals and behavioral constraints for some sub-agents, quite possibly all. But

  1. this can easily have unintended consequences that the AGI can't figure out since the sub-agents collectively have far more degrees of freedom than the central planner, and

  2. the change in goals and constraints can easily trash sub-agents' existing plans and advantages, again in ways the central planner in general can't anticipate.


Karnofsky's focus on "tool AI" is useful but also his statement of it may confuse matters and needs refinement. I don't think the distinction between "tool AI" and "agent AI" is sharp, or in quite the right place.

For example, the sort of robot cars we will probably have in a few years are clearly agents-- you tell them to "come here and take me there" and they do it without further intervention on your part (when everything is working as planned). This is useful in a way that any amount and quality of question answering is not. Almost certainly there will be various flavors of robot cars available and people will choose the ones they like (that don't drive in scary ways, that get them where they want to go even if it isn't well specified, that know when to make conversation and when to be quiet, etc.) As long as robot cars just drive themselves and people around, can't modify the world autonomously to make their performance better, and are subject to continuing selection by their human users, they don't seem to be much of a threat.

The key points here seem to be (1) limited scope, (2) embedding in a network of other actors and (3) humans in the loop as evaluators. We could say these define "tool AIs" or come up with another term. But either way the antonym doesn't seem to be "agent AIs" but maybe something like "autonomous AIs" or "independent AIs" -- AIs with the power to act independently over a very broad range, unchecked by embedding in a network of other actors or by human evaluation.

Framed this way, we can ask "Why would independent AIs exist?" If the reason is mad scientists, an arms race, or something similar then Karnofsky has a very strong argument that any study of friendliness is beside the point. Outside these scenarios, the argument that we are likely to create independent AIs with any significant power seems weak; Karnofsky's survey more or less matches my own less methodical findings. I'd be interested in strong arguments if they exist.

Given this analysis, there seem to be two implications:

  • We shouldn't build independent AIs, and should organize to prevent their development if they seem likely.

  • We should thoroughly understand the likely future evolution of a patchwork of diverse tool AIs, to see where dangers arise.

For better or worse, neither of these lends itself to tidy analytical answers, though analytical work would be useful for both. But they are very much susceptible to investigation, proposals, evangelism, etc.

These do lend themselves to collaboration with existing AI efforts. To the extent they perceive a significant risk of development of independent AIs in the foreseeable future, AI researchers will want to avoid that. I'm doubtful this is an active risk but could easily be convinced by evidence -- not just abstract arguments -- and I'm fairly sure they feel the same way.

Understanding the long term evolution of a patchwork of diverse tool AIs should interest just about all major AI developers, AI project funders, and long term planners who will be affected (which is just about all of them). Short term bias and ceteris paribus bias will lead to lots of these folks not engaging with the issue, but I think it will seem relevant to an increasing number as the hits keep coming.

Yes, sorry, fixed. I couldn't find any description of the markup conventions and there's no preview button (but thankfully an edit button).

I wish I could be optimistic about some DSL approach. The history of AI has a lot of examples of people creating little domain languages. The problem is the lack of ability to handle vagueness. The domain languages work OK on some toy problems and then break down when the researcher tries to extend them to problems of realistic complexity.

On the other hand there are AI systems that work. The best examples I know about are at Stanford -- controlling cars, helicopters, etc. In those cases the researchers are confronting realistic domains that are largely out of their control. They are using statistical modeling techniques to handle the ill-defined aspects of the domain.

Notably in both the cars and the helicopters, a lot of the domain definition is done implicitly, by learning from expert humans (drivers or stunt pilots). The resulting representation of domain models is explicit but messy. However it is subject to investigation, refinement, etc. as needed to make it work well enough to handle the target domain.

Both of these examples use Bayesian semantics, but go well beyond cookbook Bayesian approaches, and use control theory, some fairly fancy model acquisition techniques, etc.

There is a lot of relevant tech out there if Less Wrong is really serious about its mission. I haven't seen much attempt to pursue it yet.


First, my own observation agrees with GreenRoot's. My view is less systematic but covers a much longer period; I've been watching this area since the 70s. (Perhaps longer: I was fascinated in my teens by Leibniz's injunction "Let us calculate".)

Empirically I think several decades of experiment have established that no obvious or simple approach will work. Unless someone has a major new idea we should not pursue straightforward graphical representations.

On the other hand we do have a domain where machine usable representation of thought has been successful, and where in fact that representation has evolved fairly rapidly. That domain is "programming" in a broad sense.

Graphical representations of programs have been tried too, and all such attempts have been failures. (I was a project manager for such an attempt in the 80s.) The basic problem is that a program is naturally a high-dimensional object, and when mapped down into a two-dimensional picture it is about as comprehensible as a bowl of spaghetti.

The really interesting aspect of programming for representing arguments isn't the mainstream "get it done" perspective, but the background work that has been done on tools for analyzing, transforming, optimizing, etc. code. These tools all depend on extracting and maintaining the semantics of the code through a lot of non-trivial changes. Furthermore over time the representations they use have evolved from imperative, time-bound ones toward declarative ones that describe relationships in a timeless way.

At the same time programming languages have evolved to move more of the "mechanical" semantics into runtimes or implicit operations during compile time, such as type inference. This turns out to be essential to keep down the clutter in the code, and to maintain global consistency.

The effect is that programming languages are moving closer to formal symbolic calculi, and program transformations are moving closer to automated proof checking (while automated proof checking is evolving to take advantage of some of these same ideas).

In my opinion, all of that is necessary for any kind of machine support of the semantics of rational discussion. But it is not sufficient. The problem is that our discussion allows, and realistically has to allow, a wide range of vagueness, while existing programming semantics are never nearly vague enough. In our arguments we have to refer to only partially specified, or in some cases nearly unspecified, "things", and then refine our specification of those things over time as necessary. (An extremely limited but useful form of this is already supported in advanced programming languages as "lazy", potentially infinite data structures. These are vague only about how many terms of a sequence will be calculated -- as many as you ask for, plus possibly more.)

For example, look at the first sentence of my paragraph above. What does "all of that" refer to? You know enough from context to understand my point. But if we actually ended up pursuing this as a project, by the time we could build anything that works we'd have an extremely complex understanding of the previous relevant work, and how to tie back to it. In the process we would have looked at a lot of stuff that initially seemed relevant (i.e. currently included in "all of that") but that after due consideration we found we needed to exclude. If we had to specify "all of that" in advance (even in terms of sharp criteria for inclusion) we'd never get anywhere.
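The "lazy, potentially infinite data structures" mentioned earlier have a direct equivalent in Python's generators: the sequence is fully unspecified in extent, yet each term is precise once demanded. A minimal sketch:

```python
# A lazy, potentially infinite sequence: vague about how many terms exist,
# precise about each term once it is actually demanded.
from itertools import islice

def naturals():
    """Yield 0, 1, 2, ... without ever materializing the whole sequence."""
    n = 0
    while True:
        yield n
        n += 1

# Only the terms we ask for are ever computed -- as many as you ask for,
# plus possibly more buffered, never the whole (infinite) sequence.
first_five = list(islice(naturals(), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```

This is the limited form of tolerated vagueness that already exists in programming; the representation of arguments would need something analogous but far more general.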

So any representation of arguments has to allow vagueness in all respects, and also allow the vagueness to be challenged and elaborated as necessary. The representation has to allow multiple versions of the argument, so different approaches can be explored. It has to allow different (partial) successes to be merged, resolving any inconsistencies by some combination of manual and machine labor. (We have pretty good tools for versioning and merging in programming, to the extent the material being manipulated has machine-checkable semantics.)

The tools for handling vagueness are coming along (in linguistic theory and statistical modeling) but they are not yet at the engineering cookbook level. However if an effort to build semantic argumentation tools on a programming technology base got started now, the two trajectories would probably intersect in a fairly useful way a few years out.

The implications of all of this for AI would be interesting to discuss, but perhaps belong in another context.

There are some real risks, but also some sources of tremendous fear that turn out to be illusory. Here I'm not talking about fear like "I imagine something bad" but fear as in "I was paralyzed by heartstopping terror and couldn't go on".

The most fundamental point is that our bodies have layers and layers of homeostasis and self-organization that act as safety nets. "You" don't have to hold yourself together or make sure you identify with your own body -- that's automatic. You probably could identify yourself as a hamburger with the right meditation techniques or drugs, but it wouldn't last. The lower levels would kick in and re-establish your survival-oriented awareness of your body, etc.

On the other hand, undermining the stable mental organization we all identify as "me" produces extreme terror and disorientation. Henk Barendregt describes this in more detail than Crowley, with less rhetorical decoration. Again, however, the self-organizing processes in the body and brain regenerate some locally stable sense of "me" even if the previous "me" is completely disrupted. Apparently we can't function without some temporarily stable "me", but with enough practice we get used to dissolving the current "me" and letting a new one form.

The real risks from drugs are probably greater than the real risks from meditation, just because drugs can get you into states without developing commensurate skills and self-perceptions, so you may have a harder time regrouping. Persistent problems aren't due to e.g. identifying with arbitrary outside objects, but rather getting into some internal more or less stable state. Paranoia is an example of the kind of state I mean, but too vague to be really useful for analysis. Unfortunately I don't know of any good vocabulary for analyzing the set of states that are possible.

My sense is that getting "stuck" in an inconvenient state via meditation is extremely rare. Much more common is that this sort of discipline expands the range of accessible states.


The Homebrew Computer Club was pretty much the kind of community that Eliezer describes, and it had a big effect on the development of digital systems. The same is probably true of the model railroad club at MIT (where the PDP architecture was created), but I know less about that. The MIT AI Lab was also important in that way, and welcomed random people from outside (including kids). So this pattern has been important in tech development for at least 60 years.

There are lots of get-togethers around common interests -- see e.g. Perl Mongers groups in various cities. See the list of meetups in your city.

Recently "grass roots organizing" has taken on this character but it is explicitly partisan (though not strongly ideological). The main example I know of is Democrats for America, which came from the Dean campaign in 2004 but outlasted it. It is controlled by the members, not by any party apparatus, and hosts weekly community flavored pizza meetups.

There are also more movable communities like music festivals, the national Deadhead network that attended concerts (no longer so active), Burning Man, etc. These tend to be very strong support communities for their members while they are in session (providing medical, social, and dispute-resolution services, etc.) but are otherwise only latent.
