


SI is openly concerned with exactly that type of optimization, and how it becomes unsafe

Any references? I haven't seen anything that is in any way relevant to the type of optimization that we currently know how to implement. The SI is concerned with the notion of some 'utility function', which appears very fuzzy and incoherent. What is it, a mathematical function? What does it take as input and what does it give as output? The number of paperclips in the universe is given as an example of a 'utility function', but you can't have the 'universe' as the input domain of a mathematical function. In the AI, the 'utility function' is defined on the model rather than on the world, and, lacking a 'utility function' defined on the world, the work of ensuring correspondence between the model and the world is not an instrumental sub-goal arising from maximization of the 'utility function' defined on the model. This is a rather complicated, technical issue, and to be honest the SI stance looks indistinguishable from the confusion that would result from an inability to distinguish a function of the model from a property of the world, and a subsequent assumption that correspondence between the model and the world is an instrumental goal of any utility maximizer. (Furthermore, that sort of confusion would normally be expected as the null hypothesis when evaluating an organization so far outside the ordinary criteria of competence.)

edit: also, by the way, it would improve my opinion of this community if, when you think that I am incorrect, you would explain your thinking rather than click the down vote button. While you may want to signal to me that "i am wrong" by pressing the vote button, that, without other information, is unlikely to change my view on the technical side of the issue. Keep in mind that one cannot be totally certain of anything, and while this may be a normal discussion forum that happens to be owned by an AI researcher who is being misunderstood due to a poor ability to communicate the key concepts he uses, it might also be a support ground for pseudoscientific research, and the norm of substance-less disagreement would seem to be more probable in the latter than in the former.


That depends on what your initial probability is and why. If it is already low due to updates on predictions about the system, then updating on "unpredictable" will increase the probability by lowering the strength of those predictions. Since destruction of humanity is rather important, even if the existential AI risk scenario is of low probability it matters exactly how low.

The importance should not weigh upon our estimation, unless you proclaim that I should succumb to a bias. Furthermore, it is the destruction of mankind that is the prediction being made here, via a multitude of assumptions, the most dubious one being that the system will have a real-world, physical goal. Even 'number of paperclips' is not easy to define as a goal.

On further thought, this is not even necessarily true. The solution space and the model will have to be pre-cut by someone (presumably human engineers) who doesn't know where the solution actually is. A self-improving system will have to expand both if the solution is outside them in order to find it. A system that can reach a solution even when initially over-constrained is more useful than the one that can't, and so someone will build it.

Sorry, you are factually wrong about how the design of automatic tools works. The rest of your argument presses too hard to recruit a multitude of importance-related biases and cognitive fallacies that were described on this very site.

If you have a trillion optimization systems on a planet running at the same time you have to be really sure that nothing can go wrong.

No, I don't, if the systems that work right have already taken all the low-hanging fruit that the one that goes wrong could otherwise pick.

Well, I in turn believe you are applying overzealous anti-anthropomorphization. Which is normally a perfectly good heuristic when dealing with software, but the fact is human intelligence is the only thing in the "intelligence" reference class we have, and although AIs will almost certainly be different they will not necessarily be different in every possible way. Especially considering the possibility of AIs that are either directly based on a human-like architecture or even designed to directly interact with humans, which requires having at least some human-compatible models and behaviours.

You seem to keep forgetting all the software that is fundamentally different from the human mind but solves problems very well. The issue reads like a belief in the extreme superiority of man over machine, except it is a superiority of anthropomorphized software over all other software.


Well, those are Luke's aspirations; I was referring to the work done so far. The whole enterprise has the feeling of an over-optimistic startup with ill-defined, extremely ambitious goals; those don't have any success rate even for much, much simpler goals.


A suggestion: it may be a bad idea to use the term 'artificial intelligence' in the name without qualifiers, as to serious people in the field

  • 'artificial intelligence' has a much, much broader meaning than what SI is concerning itself with

  • there is very significant disdain for the commonplace/'science fiction' use of 'artificial intelligence'


Center for Helpful Artificial Optimizer Safety

What concerns me is the lack of research into artificial optimizers in general... Artificial optimizers are commonplace already; they are algorithms that find optimal solutions to mathematical models, not ones that optimize the real world in the manner that SI is concerned with (correct me if I am wrong). Furthermore, the premise is that such optimizers would 'foom', and I fail to see how foom is not a type of singularity.


Hmm, what do you think would have happened with that someone if the name was more attractive and that person spent more time looking into SI? Do you think that person wouldn't ultimately dismiss it? Many of the premises here seem more far-fetched than the singularity. I know that from our perspective it'd be great to have feedback from such people, but it wastes their time and it is unclear if that is globally beneficial.


From what I gathered, SI's relevance rests upon an enormous conjunction of implied premises and a very narrow approach as the solution, both of which were decided upon a significant time ago. Consequently, a truly microscopic probability of relevance is easily attained; I estimate at most 10^-20, due to the multiple uses of narrow guesses into a huge space of possibilities.


I certainly agree, and I am not even sure what the official SI position is on the probability of such failure. I know that Eliezer in his writing does give the impression that any mistake will mean certain doom, which I believe to be an exaggeration. But failure of this kind is fundamentally unpredictable, and if a low probability event kills you, you are still dead, and I think that it is high enough that the Friendly AI type effort would not be wasted.

Unpredictable is a subjective quality. It'd look much better if the people speaking of unpredictability had demonstrable accomplishments. If there are a trillion equally probable unpredictable outcomes, out of which only a small number are the destruction of mankind, then even though it is still technically fundamentally unpredictable, the probability is low. Unpredictability does not imply likelihood of the scenario; if anything, unpredictability implies lower risk. I am sensing either a bias or dark arts; 'unpredictable' is a negative word. Highly specific predictions should be lowered in their probability when updating on a statement like 'unpredictable'.

That is true in the trivial sense that everything can be described as equations, but when thinking about how the computation process actually happens this becomes almost meaningless.

Not everything is equally easy to describe as equations. For example, we don't know how to describe the number of real-world paperclips with a mathematical equation. We can describe the performance of a design with an equation, and then solve for the maximum, but that is not identical to 'maximizing performance of real world chip'.
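To make the distinction concrete, here is a minimal sketch of 'solving for the maximum' of a model. The performance equation, the parameter name clock_ghz, and the search range are all made up for illustration; nothing here refers to a real chip.

```python
def model_performance(clock_ghz):
    """Hypothetical equation for design performance (an assumption).

    This is a function of a number inside the model, not a
    measurement of any real-world chip.
    """
    return clock_ghz * (6.0 - clock_ghz)


def maximize(f, lo, hi, iterations=100):
    """Ternary search for the maximum of a unimodal function on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1  # the maximum cannot lie left of m1
        else:
            hi = m2  # the maximum cannot lie right of m2
    return (lo + hi) / 2


best = maximize(model_performance, 0.0, 6.0)
print(best, model_performance(best))
```

The optimizer only ever evaluates model_performance; whether the result says anything about a manufactured chip depends entirely on how well the equation matches reality, which the search itself never checks.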

If the system is not constructed as a search problem over high dimensional spaces, then in particular its failure modes cannot be usefully thought about in such terms, even if it is fundamentally isomorphic to such a search.

The problem is that of finding a point in a high dimensional space.

Or it will be created by intuitively assembling random components and seeing what happens. In which case there is no guarantee what it will actually do to its own model or even to what it is actually solving for. Convincing AI researchers to only allow an AI to self modify when it is stable under self modification is a significant part of the Friendly AI effort.

I think you have a very narrow vision of 'unstable'.

Even if most people with actual means to build one want specialized and/or tool AIs, you only need one unfriendly-successful AGI project to potentially cause a lot of damage. This is especially true as both hardware costs fall and more AI knowledge is developed and published, lowering the entry costs.

To be dangerous, an AGI has to win in a future ecosystem where the low-hanging fruit has already been taken. 'General' is a positive-sounding word; beware of the halo effect.

To be dangerous AGI doesn't have to overtake specialized intelligences, it has to overtake humans. Existence of specialized AIs is either irrelevant or increases the risks from AGI, since they would be available to both, and presumably AGIs would have lower interfacing costs.

I believe that is substantially incorrect. Suppose that there was an AGI in your basement, connected to the internet, in an ecosystem of very powerful specialized AIs. The internet is secured by a specialized network security AI and would have been taken over by a specialized botnet if it was not; you don't have a chip fabrication plant in your basement; the specialized AIs elsewhere are running on massive hardware, designing better computing substrates, better methods of solving, and so on. What exactly is this AGI going to do?

This is going nowhere. Too much anthropomorphization.


There's probably a lot of low-hanging fruit, for example the use of correct priors: given a Gaussian prior distribution, quite strong evidence should be needed before you believe that someone (including yourself) has very high intelligence or expertise on a task.
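A rough sketch of why the prior matters, under stated assumptions: 'very high' is taken to mean more than 3 standard deviations above the mean of a standard normal prior, and the likelihood ratios 10 and 10,000 are arbitrary stand-ins for weak and strong evidence.

```python
import math


def tail_probability(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def posterior(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Assumed threshold: "very high" means above +3 standard deviations.
prior = tail_probability(3.0)

# Assumed evidence strengths, chosen only for illustration.
weak = posterior(prior, 10)
strong = posterior(prior, 10_000)

print(prior, weak, strong)
```

With these numbers, evidence that is ten times more likely under the 'very high' hypothesis still leaves the posterior at a few percent at most; only the much larger ratio brings it above one half.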

Furthermore, many ways of evaluating people are to some extent self reinforcing as the people being evaluated are aware of evaluation. A smart person or expert can relatively cheaply demonstrate intelligence and/or expertise, in some way that provides very strong evidence, and will do so even for relatively moderate payoffs. Likewise, very selfless people can cheaply (in terms of their utilons) help strangers, and will do so for lower payoffs than people with more selfish utilities.

Another issue is gradual updating on huge amounts of weak, possibly non-independent evidence.


I think what may be confusing about expected outcome is the name. You don't actually expect to get 5 dollars out of this :) . You don't even expect to get, say, exactly 5 million dollars after 1 million games; that would be rather unlikely. You do expect to average $5 per game if you play infinitely many games, though, and if you are playing such games for small amounts of money you can choose which game to play based on the expected outcome.
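A small simulation can illustrate this. The game below is an assumption made up for the example (win $10 with probability 0.5, otherwise nothing); it is picked only because its expected outcome is $5 while no single game ever pays $5.

```python
import random


def play_once(rng):
    """One round of a hypothetical game: $10 with probability 0.5, else $0."""
    return 10 if rng.random() < 0.5 else 0


def average_payout(n_games, seed=0):
    """Average winnings per game over n_games simulated rounds."""
    rng = random.Random(seed)
    total = sum(play_once(rng) for _ in range(n_games))
    return total / n_games


# Expected outcome: probability-weighted sum of the payouts.
EXPECTED = 0.5 * 10 + 0.5 * 0

for n in (10, 1000, 100000):
    print(n, average_payout(n))
```

The per-game average drifts toward EXPECTED as the number of games grows, even though every individual game pays either $0 or $10.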
