

I'm moderately skeptical about these alignment approaches (PreDCA, QACI?) which don't seem to care about the internal structure of an agent, only about a successful functionalist characterization of its behavior. Internal structure seems to be relevant if you want to do CEV-style self-improvement (thus, June Ku).

However, I could be missing a lot, and meanwhile, the idea of bridging neural networks and logical induction sounds interesting. Can you say more about what's involved? Would a transformer trained to perform logical induction be relevant? How about the recent post on knowledge in parameters vs knowledge in architecture?

how to bridge NNs and LIs

What's an LI - a living intelligence? a logical inductor? 

The need to get more organized about this - e.g. via a job noticeboard for remote work on alignment - was mentioned just one month ago.

It's hard to falsify this hypothesis. However, here is my assessment, based on my own speculation about how GPTs work. 

GPTs are pattern detectors whose basic tendency is to complete patterns. In making a model of language, they learn to model the world (including possible worlds), various kinds of cognitive process, and various possible personalities. The last part makes them seem potentially agentic, but I think it's more accurate to say that virtual agents can emerge within a subsystem of a GPT. ChatGPT, with its consistent persona of a personal assistant, is what then happens when you take a GPT capable of producing virtual agents, and condition it to persistently manifest a particular persona. 

For GPT-4 to be "trying to take over the world", its conditioned persona would have to have acquired the power-seeking trait on its own, as an unintended side effect of the creation of a helpful assistant. Past speculations about AGI have told us how this could happen: an AGI has a goal; it deduces by examination of its world-model that risks to itself may prevent the goal being achieved; and so it sets out to take over the world, in order to protect its ability to achieve the goal. 

For GPT-4 to be doing this, we would have to suppose that its world-model, including its understanding of its own place in the world, is sufficiently sophisticated that this deduction can occur spontaneously when a request is made of it; and that its safety guidelines don't interfere with the deduction, or with the subsequent adoption of a world-takeover attitude. 

As impressive as GPTs can be, I don't see any evidence at all that their front-end personas are sophisticated enough about self and world to spontaneously deduce the instrumental value of taking over the world. The deduction would have to exist not just as a proposition passively represented in some cognitive subsystem, but in a form that is actively coupled to the persona's self-in-world pragmatic decision-making (insofar as that even exists), and all of that would have to happen in response to a request about some other topic entirely.

(Sorry if that's unclear, my "cognitive psychology of GPT personae" is certainly a work in progress.)

The Machiavellian intelligence we have seen from GPTs so far has been in response to users who specifically requested it. Some of Sydney's outbursts might give one pause, as expressing a kind of unanticipated interpersonal intentionality, but they weren't coupled to sophisticated Machiavellian cognition; and again, they were driven either by lengthy interactions with users that brought out personality changes, or by the results of web searches that Sydney conducted.

So I definitely don't think GPT-4 is spontaneously trying to take over the world. However, I think that a default persona with that personality and motivation could be created within a GPT by deliberate conditioning. There's also presumably some possibility that an individual GPT-4 "thought process" could be driven into a Machiavellian mode whenever it encountered certain external data; but for now I think it would have to be data tailored for the purpose of having that effect. 

The avant-garde way to get things done is to address requests and suggestions, not to humanity, but to the AIs themselves, i.e. write "An Appeal to AI Superintelligence: Reasons to Upgrade Humanity".

by Bard


Joined on Mar 22 2023

1 post

Odd to get a post by a new user called "Bard", on the day that Google announces we[*] can now sign up to try its LLM also called "Bard"? 

[*]edit: actually, only Americans and British for now. 

One coauthor of the recent editorial, "The False Promise of ChatGPT", Jeffrey Watumull, champions an alternative style of AI, "anthronoetic AI", in which the capacity to provide explanations, and not just correct predictions, is fundamental. There is very little information about it online, but you can see a glimpse of the architecture in this video. You might want to talk to him about epistemological methods. 

Summarizing in different words: 

B tokens are the currency of a normal, money-based economy. But there is also a regular gift period, in which everyone gets to give away a fixed amount of new B tokens to whoever they want. (The new B tokens come from converting the A tokens that everyone receives as a fixed, regular gift-giving income.)
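For anyone who prefers to see the mechanics spelled out, here is a minimal toy sketch of that summary. Everything specific in it is my own assumption rather than part of the proposal: the names `Account`, `start_gift_period`, `give` and `pay`, the 1:1 conversion from A to B, and the rule that unused A tokens are reset each period instead of carrying over.

```python
from dataclasses import dataclass


@dataclass
class Account:
    a_tokens: int = 0  # gift-giving income; can only be given away
    b_tokens: int = 0  # ordinary spendable currency


def start_gift_period(accounts, a_income):
    """Hand everyone the same fixed A-token income for this period.

    Assumption: leftover A tokens from the previous period are discarded.
    """
    for account in accounts.values():
        account.a_tokens = a_income


def give(accounts, giver, receiver, amount):
    """Gift A tokens; they arrive as newly created B tokens.

    Assumption: the conversion rate is 1 A token -> 1 B token.
    """
    if amount <= 0 or amount > accounts[giver].a_tokens:
        raise ValueError("invalid gift amount")
    accounts[giver].a_tokens -= amount
    accounts[receiver].b_tokens += amount


def pay(accounts, payer, payee, amount):
    """Ordinary money-style payment in B tokens."""
    if amount <= 0 or amount > accounts[payer].b_tokens:
        raise ValueError("invalid payment amount")
    accounts[payer].b_tokens -= amount
    accounts[payee].b_tokens += amount


# One gift period followed by one normal transaction.
accounts = {"alice": Account(), "bob": Account()}
start_gift_period(accounts, a_income=10)
give(accounts, "alice", "bob", 10)  # Alice's gift becomes 10 new B tokens for Bob
pay(accounts, "bob", "alice", 4)    # Bob spends 4 of them with Alice
```

Note that in this toy version the B-token supply only ever grows, by at most the total A income each period, which is the kind of thing an economist would presumably want to look at.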

Well, at least it seems viable! Hopefully someone with economic knowledge can comment on how such an economy might behave. 

Oh, the system information is in a hidden part of the prompt! OK, that makes sense. 

It's still intriguing that it talks itself out of being able to access that information. It doesn't just claim incompetence, but at the end it's actually no longer willing or able to give the date.
