mishka

So, to provide context, this is a continuation of our remarks in the comments on Zvi's "AI #40: A Vision from Vitalik".

mishka

There I was asking about

  1. the boundary between humans merging with AI and digital humans (Can these approaches be reliably differentiated from each other, or is there a large overlap?)
  2. why digital humans would be a safer alternative to the merge

mishka

A good starting point might be to ask you to elaborate on the first item: what are the differences, is there a significant overlap, and is one concept almost a subset of the other?

How do you see the relationship between the notion of a digital human and a "hybrid" (if that's what the merge is)?

Nathan Helm-Burger

So the way I'm thinking about this is that "digital human" is a quite narrow and precise definition: an emulation of a human brain (not necessarily based on a particular human; that even more specific concept is an Upload).

This brain emulation is constrained to act and to be modified only via a highly accurate ruleset based on detailed observations of human neurons. It cannot, for instance, use external processes to monitor or modulate the activity of its simulated neurons. It can't add more neurons other than by the same processes and to the same degree that a normal biological human brain can.

Nathan Helm-Burger

Merging with AI, on the other hand, is a much more open concept. Lots of things fall under this heading. Some central examples might include:

  • A brain-computer implant that allows an AI system to have read and/or write access to the human brain.
  • An AI system that non-invasively controls much of the sensory input a set of humans receive, feeding them information and taking instructions from them in a high-bandwidth way.
  • A digital human which has high-bandwidth communication with an AI, either by normal emulated sensory methods or by the AI being able to directly read and/or write neuronal activity.

Nathan Helm-Burger

The key part here is the AI that is merging with the humans. It is not necessarily dangerous or overwhelming. But it might be.

And the trouble is that the easiest way to scale the "merge"-type systems up in power is to expand the power and influence of the AI.

The digital human, by being restricted to a biological ruleset, is limited in the ways it can scale in power. It could still have significant superpowers relative to biological humans: immunity to aging and biological diseases, and the ability to travel at the speed of internet data transmission, to clone itself very cheaply and quickly, to cheaply save backups which double as checkpoints, to experience virtual worlds with full sensory fidelity, and to run at much higher clock speeds than biological brains.

Nathan Helm-Burger

So what danger do I foresee in systems closely connected with AI, or in systems that start as digital humans but don't stick to biological rules?

The trouble is that I see the more permissive digital entity landscape as having multiple different slippery slopes towards very bad outcomes.

A classic example, discussed at length elsewhere, is that a digital human allowed to arbitrarily self-modify could fall into wireheading without realizing how sticky the situation would be, thereby effectively ending their interaction with the world.

Another classic example is a digital human in a competitive economy who feels pressured to modify themselves to be more motivated to work. This could lead to a murder-Gandhi-style slippery slope, where each new version of the digital person cares relatively less about non-work things and thus chooses to further increase its desire to work and decrease its caring about anything else. Down this slope lies a loss of humanity.

mishka

Yes, I certainly agree with all this. The main crux is this:

"This brain emulation is constrained to act and to be modified only via a highly accurate ruleset based on detailed observations of human neurons. It cannot, for instance, use external processes to monitor or modulate the activity of its simulated neurons. It can't add more neurons other than by the same processes and to the same degree that a normal biological human brain can."

Basically, a digital human can certainly do everything a biological human can, including equivalents of enhancement with nootropics, cognitive strategies, psychedelic brainstorms, and so on.

But this digital human will certainly be aware that breaking the rules and hacking on its architecture directly could potentially bring an enhancement many orders of magnitude beyond even that.

So we would need to rely on a combination of digital humans promising not to go down that route (and honorably keeping that commitment despite huge temptations) and technical means that make it particularly difficult to break this constraint.

Nathan Helm-Burger

Yes, and I'm not sure we will succeed at restraining digital humans from breaking biological rules, but I think it's worth trying. Similarly, I think it's worth trying to keep AI sufficiently tool-like and under control that it doesn't get the opportunity to take over, and that we don't accidentally or intentionally design an intelligent digital entity with feelings and moral value, only to then have to choose between giving it rights, which would make it perilous to humanity, and keeping it oppressed and enslaved.

Nathan Helm-Burger

And both of those cautious paths seem to come with a safety tax, which will require regulation and enforcement to keep people from skipping out on it.

Nathan Helm-Burger

By contrast, I see the proposal to 'let humans and AI merge' as a vague proposal for a hands-off, uncontrolled race to power. If you let humans hook themselves up to AI in any way with no regulation, it seems like you are asking for a lot of experiments, some of which seem likely to work in terms of the resulting collaboration gaining power, and yet not work in terms of the resulting collaboration maintaining or upholding human values.

mishka

Yes. Even if one is trying to be very careful and only uses non-invasive BCI between humans and digital systems (which is the form of merge I typically consider), the safety issues are formidable. This holds in terms of the immediate safety of participating humans, but even more importantly in terms of what kind of hybrid entities might result. We can insist on the ability to decouple (to disconnect, take a long pause, reconnect later, and keep repeating this "disconnect-take a break-reconnect" cycle), which is a reasonable thing to insist upon, but the uncertainty is still high...


On the other side of the scale, unlike Neuralink-style invasive BCI, non-invasive BCI can progress rather rapidly, and might be competitive with the pure AI approach in terms of timelines...

Whereas progress towards "pure brain emulation" might be too far in the future, unless one successfully invents a way to "accelerate it faster than our ability to actually map the brain"...

Nathan Helm-Burger

Yes, it's partly the slowness of the path to a biological-rule-constrained, accurate-brain-emulation-based digital human that gives me more confidence that this path is safer. We have more time to consider ways to regulate digital humans and to work on safety constraints.

But for this same reason, I don't think digital humans are going to help get us through the tricky near-term period when AI becomes powerful enough to play a part in a catastrophe. Such catastrophes might come from human misuse or from AI-directed action.

On the other hand, I can see how someone might say, "Allowing for a near-term janky attempt at human-AI merger could give us a powerful AI-human team to address the safety issues of insufficiently regulated AI! It's much easier than digital humans! It could be quite powerful, but yet would have a 'human' element."

My response to this take is, "I would not trust that having some non-zero human element would be sufficient to make the system safe, even if the human going into the experiment seemed trustworthy to begin with. Human good behavior seems to be a fragile thing, and this would absolutely be an out-of-distribution trial by fire."

Nathan Helm-Burger

So if someone is interested in pursuing the longer term goals of digital humans and human uploads (e.g. from cryopreserved brains), I am not against that. That doesn't seem like it's making risks to humanity worse in the short term, and it seems like we have enough time before those techs are ready to figure out the regulatory and safety issues.

However, if a funder were deciding between allocating resources to a digital humans and Uploads path or to something directly tackling AI safety, I would urge them to contribute to AI safety, since I think that issue is both more urgent and more tractable within the relevant timeframe.

Nathan Helm-Burger

Yet another direction is Intelligence Augmentation. Here, I believe there is a lot of capacity for improving biological human intelligence, but most of the high-impact non-AI-merge options will take a lot more research before they are ready. For instance, the topic I studied when I was in neuroscience: genetic modification of consenting adults for radical intelligence enhancement. I think that will perhaps take even longer than digital humans, and it almost certainly isn't relevant to the next 10 years. And I absolutely think the scientific community should spend whatever brainpower it can on helping humanity survive the next 10 years, because I think we are in quite a lot of danger from AI-enabled catastrophe.

Nathan Helm-Burger

And that goes for non-computer-scientists as well. Biologists, for instance, can help by improving society's ability to detect, prevent, and halt bioengineered pandemics. AI makes bioengineered pandemics easier and more likely, so by helping protect society from them, you are helping to reduce the risk of AI catastrophe.

Nathan Helm-Burger

In the case of a human-AI high-bandwidth team, such as with non-invasive BCI, I would argue that there is potentially useful assistance which could be safely gained from such. The caveat, however, is that the system should be treated with a great deal of mistrust and held to high safety standards. The scientific findings of the human-AI team should only be trusted if they can be fully verified by non-AI-enhanced humans.

Nathan Helm-Burger

So the same sorts of regulation that I think need to apply to AI should also be applied to a human-AI team.

  • Keep it confined; don't let it spread or replicate. Keep it sandboxed. (Sandboxing can still allow information to flow inwards, e.g. via a cached copy of the internet that gets periodically updated.)
  • Don't let it acquire power and resources of its own.
  • Don't trust its outputs without verification.
  • Don't let it self-modify or build novel AI systems. (Just because you have Generation 1 under control, doesn't mean Generation 1 can't build a sufficiently powerful Generation 2 to break out from under your control.)
  • Don't let it fall into the hands of bad actors. (The AI part of the team can be modified, for example by fine-tuning. The human part of the team can be made to cooperate with immoral aims by, for example, brainwashing and torture.)

mishka

Yes, this makes sense.

"In the case of a human-AI high-bandwidth team, such as with non-invasive BCI, I would argue that there is potentially useful assistance which could be safely gained from such."

Yes, in fact, the main use case is scientific research, and in particular AI safety research, which to a large extent has to be done by human-AI teams in order to be at all feasible.

But a good deal of caution is needed, and, in particular

"The caveat, however, is that the system should be treated with a great deal of mistrust and held to high safety standards. The scientific findings of the human-AI team should only be trusted if they can be fully verified by non-AI-enhanced humans."

is very much applicable (with the caveat that what counts as "fully verified" depends on the procedures computer science might be able to devise to make such verification feasible; e.g., I want to gesture in the direction of "zero-knowledge proofs", not in the sense of them being literally applicable, but in the sense that some solutions roughly in that spirit might be found).

Nathan Helm-Burger

Good point about "fully verified". I should've said something more like, "verify probabilistically, to a degree of confidence commensurate with the risk of implementing the suggested innovation." Since we're basically sitting on a ticking time bomb with a hidden timer, we can't really afford to be maximally cautious in our attempts to solve the problem.

11 comments

I think on all practical tasks, pure AI will keep getting better faster than any kind of uploads, intelligence augmentation, BCI, and so on, because it's easier to optimize without running into human limitations.

That's only after it becomes strongly self-improving on its own. Until then, human AI researchers and human AI engineers are necessary, and they have to connect to computers via some kind of interface, so a "usual interface" vs "high-end non-invasive BCI + high-end augmented reality" is a trade-off leading AI organizations will need to consider.

Of course, any realistic or semi-realistic AI existential safety plan includes tight collaboration between human AI safety researchers and advanced AI systems (e.g. the Cyborgism community on LessWrong, or various aspects of OpenAI's evolving AI existential safety plans, e.g. https://www.lesswrong.com/posts/TpKktHS8GszgmMw4B/ilya-sutskever-s-thoughts-on-ai-safety-july-2023-a). So here the question of interfaces is also important.


But nothing is foolproof. AI systems can "run away from humans interacting with them", or "do something on the side without coordinating with humans", or humans and their motivations can be unfavorably modified in the process of their tight interactions with AIs, and so on...

Yet it does not look like the world has any chance of stopping, especially with the new "AI alliance" announcement (it seems we no longer have a chance to stop even the radical open-sourcing of advanced systems, see e.g. https://arstechnica.com/information-technology/2023/12/ibm-meta-form-ai-alliance-with-50-organizations-to-promote-open-source-ai/; this is very different from "Meta is unilaterally open-sourcing some very advanced models, and perhaps it can be stopped", never mind a coordinated industry-wide slowdown across the board)... So we should try our best to figure out how to improve the chances, assuming that timelines might be rather short...

That’s only after it becomes strongly self-improving on its own. Until then, human AI researchers and human AI engineers are necessary, and they have to connect to computers via some kind of interface, so a “usual interface” vs “high-end non-invasive BCI + high-end augmented reality” is a trade-off leading AI organizations will need to consider.

Right now AI is improving fast and BCI isn't. Leading AI researchers aren't spending any time on BCI or wearing VR helmets at work, because it's pointless, a laptop is quite enough. I'm pretty sure this state of affairs will continue until AGI arrives.

Well, I am not a "leading AI researcher" (at least, not in the sense of having a track record of beating SOTA results on consensus benchmarks, which is how one usually defines that notion), but I am one of those who are trying to change the situation with non-invasive BCI not being more popular. My ability to have any effect on this, of course, does depend on whether I have enough coding and organizational skills.

But one of the points of the dialogue for me was to see if that might actually be counter-productive from the viewpoint of AI existential safety (and if so, whether I should reconsider).

And in this sense, some particular hidden pitfalls to watch for were identified during this dialogue (previously, I was mostly worrying about direct dangers to participants stemming from tight coupling with actively and not always predictably behaving electronic devices, even when the coupling is via non-invasive devices, so I was spending some time on those personal safety aspects and trying to establish a reasonable personal safety protocol).

Non-invasive BCI, as in, getting ChatGPT suggestions and ads in your thoughts? I think even if we forget about AI safety for a minute, this idea feels so dystopian (especially if you imagine your kids doing it) that it's better not to go there.

And if you're thinking about offering this tech to AI researchers only, that doesn't seem feasible. As soon as it exists, people will know they can make bank by commercializing it and someone will.

But yeah, still, the biggest hurdle for this idea is simply that all eyes are on AI, which is moving very fast. So we're not getting BCI before AI becomes a big part of the economy (which is starting to happen now, well before AGI and self-improvement). And after that we might well get a bunch of sci-fi stuff for a while, but the world will be careening off course in a way unfixable by humans, which to me counts as losing the game.

Non-invasive BCI, as in, getting ChatGPT suggestions and ads in your thoughts?

I was mostly thinking of the computer-to-brain direction as represented by psychoactive audio-visual modalities. Yes, this might be roughly on par with taking strong psychedelics or strong stimulants, but with different safety-risk trade-offs (better ability to control the experience and fewer physical side effects if things go well, but with potential for a completely different set of dangers if things go badly).

Yes, this might not necessarily be something one wants to dump on the world at large, at least not until select groups have more experience with it, and the safety-risk tradeoffs are better understood...

And if you're thinking about offering this tech to AI researchers only, that doesn't seem feasible. As soon as it exists, people will know they can make bank by commercializing it and someone will.

Well, the spec exists today (and I am sure this is not the only spec of this kind). All that separates it from reality is the willingness of a small group of people to get together and experiment with inexpensive ways of building it.

Given that people are very sluggish about converting theoretically obvious things into reality as long as those things are not in the mainstream (cf. it having been clear since at least the 2000 paper in Nature that ReLUs must be great, and the field ignoring them until 2011), I don't know if "internal use tools" would cause all that much excitement.

If you need a more contemporary example related to Cyborgism, Janus' loom is a superpowerful interface to ChatGPT-like systems; it exists, it's open source, etc. And so what? How many people even know about it, never mind use it?

Of course, if people start advertising it along the lines, "hey, take this drug-free full-strength psychedelic trip", yeah, then it'll become popular ;-)

in a way unfixable by humans

I do think collaborative human-AI mindset instead of adversarial mindset is the only feasible way, cf. my comments on Ilya Sutskever thinking in https://www.lesswrong.com/posts/TpKktHS8GszgmMw4B/ilya-sutskever-s-thoughts-on-ai-safety-july-2023-a.

If we want to continue thinking in terms of "us-vs-them", the game has been lost already.

If we want to continue thinking in terms of “us-vs-them”

I think this is mostly determined by economics, to what extent human thinking and AI are complementary goods to each other, and to what extent they're substitutes for each other. Right now AIs are still used by humans, but it seems to me that the market is heading toward putting humans out of jobs entirely, because an AI query costs much less than an AI-with-human-in-the-loop query.

the market is heading toward putting humans out of jobs entirely

I think so.

There will be some exceptions, e.g. humans who will choose to tightly merge with AIs or otherwise strongly upgrade, or some local economic activities in the communities which will deliberately pursue a non-automation path, but the economic status of most humans will probably be no different from the economic status of children or retirees (that is, if things go well).

So, yes, the problem of making sure that life is interesting and meaningful will definitely exist (if things go well). AIs might help find various non-trivial solutions to this (since not everyone is happy simply pursuing arts, sciences, meditations, hikes, travels, and social life for their own sake).

So the question is, why might things go well, and what can we do to increase the chances of that...

(I cleaned up the formatting of this post a bit. There were some blockquotes that weren't proper blockquotes, and some lists that weren't proper bullet lists. Feel free to revert.)

Thanks a lot!

(I think I am so used to Markdown that I am not correctly handling the fact that dialogues seem to be in LessWrong Docs format. Is this a matter of what a given dialogue is set to at the beginning, or are dialogues inherently LessWrong Docs only?)

Dialogues are inherently LessWrong Docs because of the simultaneous editing features which we found pretty important for making things work.