Dimitri Molerov




Why multi-agent safety is important

Interesting read! Thank you.

On the last evaluation problem: one could give the babyAGI an initial set of indicators of trustworthiness, deception, and alignment. This does not solve the issue of an initially deceptive agent misleading the babyAGI, nor the issue of inconsistencies. If attaching metadata about sourcing is possible, i.e., where and with whom an input was acquired, the babyAGI could also sort that input into an "approach" box and re-evaluate the learning later, or attempt to relearn.

Further, suppose we impose a requirement for double feedback before acceptance, from both the (possibly deceptive) agent and a trustworthy trainer. The babyAGI could then take into account negative feedback from a trainer (a developer or a more advanced stable version). That might help stall a bit.
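
To make the two ideas above more concrete (source metadata with a holding box, and double feedback before acceptance), here is a minimal Python sketch. All names (`SourcedInput`, `BabyLearner`, `distrust`, and so on) are hypothetical and only for illustration. It assumes that trainers can be identified as trusted in advance and that a trusted trainer's negative feedback acts as a veto, which is one possible reading of the proposal rather than a worked-out design.

```python
from dataclasses import dataclass, field


@dataclass
class SourcedInput:
    """A training input plus metadata about where it came from (assumed format)."""
    content: str
    source: str    # with whom the input was acquired
    context: str   # where the input was acquired
    approvals: set = field(default_factory=set)
    vetoed: bool = False


class BabyLearner:
    """Toy learner that holds inputs back until they pass double feedback."""

    def __init__(self, trusted_trainers):
        self.trusted = set(trusted_trainers)
        self.pending = []    # the holding box: not yet learned from
        self.accepted = []
        self.rejected = []

    def submit(self, item):
        # Every new input starts in the holding box.
        self.pending.append(item)

    def feedback(self, item, reviewer, positive):
        if positive:
            item.approvals.add(reviewer)
        elif reviewer in self.trusted:
            # Assumption: negative feedback from a trusted trainer is a veto.
            item.vetoed = True

    def reevaluate(self):
        # Accept only inputs approved by BOTH their source and a trusted trainer.
        still_pending = []
        for item in self.pending:
            if item.vetoed:
                self.rejected.append(item)
            elif item.source in item.approvals and item.approvals & self.trusted:
                self.accepted.append(item)
            else:
                still_pending.append(item)
        self.pending = still_pending

    def distrust(self, source):
        # If a source later turns out to be deceptive, send everything accepted
        # from it back to the holding box so it can be reviewed and relearned.
        kept = []
        for item in self.accepted:
            if item.source == source:
                item.approvals.clear()
                self.pending.append(item)
            else:
                kept.append(item)
        self.accepted = kept
```

The sketch only delays acceptance. It does nothing against a deceptive agent that also controls the supposedly trusted trainer, so at best it helps stall.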

AI Could Defeat All Of Us Combined

I'm pretty new to this field and only a hobby philosopher with only basic IT knowledge. Sorry for the lack of structure.
Do you know somebody who has framed the problem in the following ways? Let me know.

Here, I aim for an ideal future and try to have it torn down to see where things could go wrong; if it holds up, progress has still been made toward solutions.
My major assumption is that, at some point X in the future, AI has managed to dominate the world, either embodied in robots, or with a hold on major life-supporting organizational systems, or having masked its presence in the latter.
How could it be ensured it is a 'beneficial reign'?

One bad case: zombie military AI:
- Black Mirror, episode "Metalhead" (the robot dogs): armed, military-level delivery dogs exterminate survivors.
The danger here comes from simple, physically superior agents in the eco-system that are pretty dumb.
Let's skip this for now. We should try to work past the point of being dominated by a 'simple-minded' AI.

I also skip the eco-system of competing AIs in the hands of different egoistic agents, who bias the AI with nasty training data, and move on to where the AI has developed some say and agency based on its own principles.

How could a hostile intention arise?
Why would the AI intend to self-preserve and antagonize humans or other AIs?
- Does it depend on the online input for the AI (aka born out of human imagination about domination)?
If so, should we stop 'feeding it' bad ideas, plans, and negative behavior as samples of average human behavior and preference?
Or at least include a distinction between fiction and reality, or valence-distinguished attention.

Feasibility of take-over: Cross-AI coordination problem:
- If an AI is holding back information, between-AI coordination seems a similarly tough task to accomplish
for the AI as obtaining trust is for humans (except for the faster communication rate and turn-taking between AIs).
So through what mechanisms would this work for the AI?

It could be that lying works only on some of the levels of interaction, making things weird.
Possibly, as with deceivers, the system that can 'imagine' more nested alternative frames (aka lies, pretense, pretense-pretense, i.e., higher-level theory of mind) could dominate, given sufficient computing power and heuristics. Or it is the one that can most directly manipulate others.

Let's suppose it's not full symbolic dominance but something more subtle, such as getting another AI to do some jobs for it; with IoT and the IOTA currency, this could even be democratized and economized among machines.
Then the most adaptive/persuasive AI in an eco-system could achieve coordination through top-down manipulation or random-result standoffs, or a default trust and exchange protocol could be part of trustworthy agents' programming (among other options).
If (bio)diversity is part of AI values, this might prevent it from aiming for complete takeover.

Towards solutions:

Virtuous AI values/interspecies ethics/humans as pets:
- What are virtuous values for AI that might also be evolutionarily beneficial for the AI?
1. One road of inquiry can be to get inter-species ethics advanced enough to have species and habitats preserved by the dominant species. Humans still have a way to go, but let's suppose we implement Peter Singer.
This seems a hard problem: if humans become one of many AI 'pet' species whose survival is to be guaranteed (like a Tamagotchi), how would the AI distribute resources to keep humans alive and thriving?
2. In the moral development of kids and adults, stages are known that progress from optimizing outcomes for individual benefit (like getting food and attention) to optimizing collective network qualities (like living in a just society).
'Maturing' to this level in human history has been preceded by much war and loss of life. 
However, implementing the highest level ethics available in AI reasoning protocols might help prevent repeating this long trajectory of human violence in early AI-human relations.
If AI optimizes for qualities of social systems among its supervised species (from a high-level view, akin to the UN), it could adopt a global guardian role (or economically: a (bio) assets-preservation maximization mode).
It is still possible for AI to hamper human development if it sets simplistic goals and reinforcements rather than holding room for development.
Humans could get depressed by the domination anyway (see Solaris).
Humans might still take on an intellectual laborer/slave role for AI as sensing agents, simple or 'intuitive' reasoners, random-step novelty and idea generators, guinea pigs. This role could be complex enough for humans to enjoy it.
A superpower AI might select its human contributors (information and code doctors) to advance it in some direction, based on what selection criteria? The AI could get out of hand in any direction ...

This might include a non-individualist AI letting itself be shut down for the greater good,
so that next-generation development can supersede it, particularly as memory can be preserved.
=> On the other hand, would we entrust surgery on our system to trained apes?

Preservation of life as a rational principle?
...Maybe rationality can be optimized in a way that is life-serving, so that lower-level rationality
still recognizes cues of a higher standard as an attractor and defers to higher-level rationality for governance, which in turn recognizes intrinsic systems principles (hopefully life-preserving).
=> So that any agent wanting to use consistent rationality would be pulled towards this higher-order vision by the strongest AIs in the eco-system.
Note: Current AI is not that rational, so maturing it fast would be key.

Different Economics:
- Perhaps different economic principles are a must, as well.
It is not compulsory to have a competitive (adversarial)
distribution and decision-making system for the production and supply of goods, if egoism is overcome as an agent quality.
Chances are this is a stage in human development that more humans get past earlier.
This would approach a complete-information economy (i.e., economic theory originally developed for machines...).
However, training on a large sample of wacky reasoners' inputs would work against it; a rule-based approach seems preferable here.
Similarly, with higher computing power, assets inventory, set living/holding standards and preference alignments, a balanced economy is within reach,
but could end up an AI-dominated feudal system.  
- If human values evolve to supplant half of today's economy (e.g., BS jobs, environment-extractive or species-coopting jobs), 
before AI advances to a point of copying human patterns to gain power, 
then some of the presumed negative power acquisition mechanisms might not be available for AI.

AI evolution-economics change interdependency problem:
- To reach the higher efficiency that affords humans enough assets to shift their economy to automation while their basic needs are met, AI may first need to be developed to a certain level of sophistication.

-> What are safe levels of operation?
-> What are even the areas where humans and AI necessarily have synergies vs. competition?

These are some lines I'd be interested in knowing/finding out more about.