Comments

In software engineering, things often become "accidentally load bearing" when people don't respect interfaces. If they go digging around in a component's implementation, they learn things that happen to be true but are not guaranteed to stay true. Once they start relying on these things, the component's maintainer is limited in what future changes they can make. The problem is exacerbated by under-specified interfaces: either formal specification mechanisms are underutilized or, more often, important behavioral aspects of an interface go undocumented because most formal interface-specification mechanisms cannot express them.
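A minimal sketch of how this can happen, in Python. Everything here is hypothetical and made up for illustration: `UserRegistry`, `UserRegistryV2`, and both helper functions. The documented contract promises only a collection of names in no particular order; the first implementation happens to return them sorted, and one caller quietly depends on that.

```python
class UserRegistry:
    """Hypothetical component.

    Documented contract: active_users() returns the names of all active
    users as a list, in no particular order.
    """

    def __init__(self, names):
        # Implementation detail: names happen to be kept in a sorted list.
        self._names = sorted(names)

    def active_users(self):
        return list(self._names)


class UserRegistryV2(UserRegistry):
    """Same documented contract, different internals."""

    def __init__(self, names):
        # The maintainer switches to a set; iteration order is now arbitrary.
        self._names = set(names)


def first_alphabetically_fragile(registry):
    # Relies on the undocumented fact that V1 returns names already sorted.
    return registry.active_users()[0]


def first_alphabetically_robust(registry):
    # Relies only on the documented contract.
    return min(registry.active_users())
```

With `UserRegistry`, both helpers agree. After the swap to `UserRegistryV2`, which never violates the documented contract, the fragile helper may return any name, while the robust one is unaffected. The sorted order had become accidentally load bearing.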

I don't think you even need to go as far as you do here to undermine the "emergent convergence (on anti-human goals)" argument. Even if we allow that AIs, by whatever means, develop anti-human goals, what reason is there to believe that the goals (anti-human, or otherwise) of one AI would be aligned with the goals of other AIs? Although infighting among different AIs probably wouldn't be good for humans, it is definitely not going to help AIs, as a group, in subduing humans.

Now let's bring in something which, while left out of the primary argument, repeatedly shows up in the footnotes and counter-counter arguments: AIs need some form of human cooperation to accomplish these nefarious "goals". Humans able to assist the AIs are a limited resource, so there is competition for them. There's going to be a battle among the different AIs for human "mind share".

Not only that, but if your goal is to create a powerful army of AIs, the last thing you'd want to do is make them all identical. Whatever reason you give for there being a huge number of AI instances in the first place -- as this argument assumes -- would favor those AIs being diverse, not identical, and that very diversity argues against "emergent convergence". You then have to fall back on the "independently emerging common sub-goals" argument, which is a significantly bigger stretch because of the many additional assumptions it makes.

Isn't multi-epoch training most likely to lead to overfitting, making the models less useful/powerful?

If it were possible to write an algorithm to generate this synthetic training data, how would the resulting training data have any more information content than the algorithm that produced it? Sure, you'd get an enormous increase in training text volume, but large volumes of training data containing small amounts of information seem counterproductive for training purposes -- they would just bias the model disproportionately toward that small amount of information.
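A rough way to make the volume-versus-information point concrete, as a sketch only. The toy generator and its word lists below are invented for illustration, and compressed size is at best a crude upper-bound proxy for information content, not a measure of usefulness for training.

```python
import random
import zlib

# Hypothetical toy "synthetic data" generator: three short word lists.
SUBJECTS = ["the cat", "a model", "the user", "an engineer"]
VERBS = ["reads", "writes", "tests", "reviews"]
OBJECTS = ["the code", "a report", "the data", "a proof"]


def synthetic_corpus(n_sentences, seed=0):
    """Generate n_sentences template sentences from a seeded RNG."""
    rng = random.Random(seed)
    return " ".join(
        f"{rng.choice(SUBJECTS)} {rng.choice(VERBS)} {rng.choice(OBJECTS)}."
        for _ in range(n_sentences)
    )


def compression_ratio(text):
    """Compressed size divided by raw size, using zlib at maximum level."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw, 9)) / len(raw)


if __name__ == "__main__":
    corpus = synthetic_corpus(10_000)
    print(len(corpus), "characters, compression ratio",
          round(compression_ratio(corpus), 3))
```

Each sentence is one of only 64 possibilities, so however many are generated, the whole corpus is fully determined by this short program plus a seed. A general-purpose compressor will not shrink it that far, but it does shrink it substantially, which hints at how little the extra volume adds.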

Why wouldn't people (and maybe even AIs, at least up to a point) be applying these ever-advancing AI capabilities to developing better and better interpretability tools as well? I.e., what reason is there to expect an "interpretability gap" to develop (unless you believe interpretability is a fundamentally unsolvable problem, in which case no amount of AI power is going to help)?