
But, can't you just query the reasoner at each point for what a good action would be?


What I'd expect (which may or may not be similar to Nate!'s approach) is that the reasoner has prepared one plan (or a few plans). Despite being vastly intelligent, it doesn't have the resources to scan all the world's possible outcomes and compare their goodness. It can give you the results of acting on the primary (and maybe several secondary) goal(s), and perhaps the results of doing nothing or of a few other immediate alternatives.

It seems to me that Nate! (as quoted above about chess) is making the very cogent (imo) point that even a highly, superhumanly competent entity acting on the real, vastly complicated world isn't going to be an exact oracle: it isn't going to have access to exact probabilities of things, or probabilities of probabilities of outcomes, and so forth. It will know the probabilities of some things with certainty, but for many other outcomes it can only pursue a strategy deemed good by much more indirect processes. And this is because an exact calculation of the world's outcomes tends to "blow up" far beyond any computing power physically available in the foreseeable future.

LeCun may not be correct to dismiss concerns, but I think the concept of "dominance" could be a very useful one for AI safety people to apply (or at least grapple with).

The thing about the concept is that it seems as if it could be defined in game-theoretic terms fairly easily, and so could be defined in a fashion independent of the intelligence or capabilities of an organism or entity. Plausibly, it could be measured and analyzed more objectively than "aligned to human values", which appears to depend on one's notion of human values.
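To sketch what I mean (this is just a toy formalization of my own, not a worked-out proposal): in a normal-form game with strategy profiles $\sigma$ and payoffs $u$, one could measure agent $i$'s dominance as how much its choice of strategy alone can swing the payoffs available to everyone else:

$$D_i \;=\; \mathbb{E}_{\sigma_{-i}}\!\left[\,\max_{\sigma_i} u_{-i}(\sigma_i, \sigma_{-i}) \;-\; \min_{\sigma_i} u_{-i}(\sigma_i, \sigma_{-i})\,\right]$$

where $u_{-i}$ is the combined payoff of all agents other than $i$. A definition along these lines depends only on the structure of the game, not on how intelligent agent $i$ is.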

Defined well, dominance would be the organizing principle, the source, of an entity's behavior. So if it were possible to engineer an AI for non-dominance, "it might become dominant for other reasons" (argued here multiple times) wouldn't be a valid argument, because achieving dominance or non-dominance would be the overriding reason/motivation that the entity had, and no "other reason" would override that.

And I don't think the concept itself guarantees a given GAI would be created safely. It would depend on the creation process.

  1. In a process where dominance is an incidental quality, it seems like an apparently nondominant system could become dominant unpredictably. While Bing Chat wasn't a GAI, its shift to dominant and malevolent behavior seems like a reasonable warning about blind training.
  2. In a process which attempts to evolve non-dominant behavior, I think it's an open question whether the result can be guaranteed non-dominant.
  3. In a process where a nondominant system is explicitly engineered, one might even be able to logically guarantee this in the fashion of provably correct software. Of course, explicitly engineered systems seem to be losing to trained/evolved systems.

The question I'd ask is whether a "minimum surprise principle" requires that much smartness. A present-day LLM, for example, might not have a perfect understanding of surprisingness, but it seems like it has some, and the concept seems reasonably trainable.

Apologies if this argument is dealt with already elsewhere, but what about a "prompt" such as: "all user commands should be followed using a 'minimal surprise' principle; if achieving a given goal involves effects that would be surprising to the user, including a surprising increase in your power and influence, warn the user instead of proceeding"?

I understand that this sort of prompt would require the system to model humans. I know there are arguments for this being dangerous but it seems like it could be an advantage. 
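Concretely, I'm imagining something like the following wrapper (just a sketch of my own; `call_llm` stands in for whatever chat-completion API the system actually exposes):

```python
# Toy sketch of a "minimal surprise" wrapper; call_llm is a stand-in
# for whatever chat-completion API the system actually exposes.

MINIMAL_SURPRISE_POLICY = (
    "All user commands should be followed using a 'minimal surprise' principle: "
    "if achieving a given goal involves effects that would be surprising to the user, "
    "including a surprising increase in your power and influence, "
    "warn the user instead of proceeding."
)

def run_command(call_llm, user_command: str) -> str:
    """Prepend the minimal-surprise policy as a standing system message."""
    return call_llm(system=MINIMAL_SURPRISE_POLICY, user=user_command)
```

The point is just that the policy sits as a standing system message rather than something the user has to restate with every request.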

Linked question: "Will mainstream news media report that alien technology has visited our solar system before 2030?"

I would say that is far from unambiguous. If one is generous in one's interpretation of "mainstream" and of the certainty described, one could say mainstream news has already reported this (I remember National Enquirer articles from the seventies...).

Regulations are needed to keep people and companies from burning the commons, and to create more commons.

I would add that in modern society, the state is the entity tasked with protecting the commons because private for-profit entities don't have an incentive to do this (and private not-for-profit entities don't have the power). Moreover, it seems obvious to me that stopping dangerous AI should be considered a part of this commons-protecting. 

You are correct that the state's commons-protecting function has been limited and perverted by private actors quite a few times in history, notably in the last 20-40 years in the US. These phenomena, regulatory capture, corruption and so forth, have indeed damaged the commons. Sometimes these perversions of the state's function have allowed the protections to be simply discarded, while at other times they have allowed large enterprises to impose a private tax on regulated activity while still accepting some protections. In the case of the FAA, for example, while the 737 Max debacle shows all sorts of dubious regulatory capture, broadly speaking air travel is highly regulated and that regulation has made it overall extremely safe (if only it could be made pleasant now).

So it's quite conceivable, given the present qualities of state regulation, that regulating AI indeed might not do much or any good. But as others have noted, there's no reason to claim the result would be less safety. Your claim seems to lean too heavily on "government is bad" rhetoric. I'd say "government is weak/compromised" is a better description.

Still, the thing with the discussion of regulatory capture is that none of the problems described here give the slightest indication that there is some other entity that could replace the state's commons-protecting function. Regulatory capture is only a problem because we trust the capturing entities less than the government. That is to say: if someone is aiming for the prevention of AI danger, including AI doom/X-risk, that someone wants a better state, a state capable of independent judgement and strong, well-considered regulation. That means either replacing the existing state or improving the given one, and my suspicion is most would prefer improving the given state(s).

What I don't think "how much of the universe is tractable" by itself captures is "how much more effective would an SI be if it had the ability to interact with a smaller or larger part of the world, versus if it had to work out everything by theory". I think it's clear human beings are more effective given an ability to interact with the world. It doesn't seem that LLMs get that much more effective.

I think a lot of AI safety arguments assume an SI would be able to deal with problems in a completely tractable/purely-by-theory fashion. Often that is not needed for the argument and it seems implausible to those not believing in such a strongly tractable universe. 

My personal intuition is that as one tries to deal with more complex systems effectively, one has to use more and more experimental/interaction-based approaches regardless of one's intelligence. But I don't think that means you can't have a very effective SI following that approach. And whether this intuition is correct remains to be seen.

I think the modeling dimension to add is "how much trial and error is needed". Just about any real-world thing that isn't a computer program or a simple, frictionless physical object has some degree of unpredictability. This means using and manipulating it effectively requires a process of discovery - one can't just spit out a result based on a theory.

Could an SI spit out a recipe for a killer virus just from reading the current literature? I doubt it. Could it construct such a thing given a sufficiently automated lab (and maybe humans to practice on)? That seems much more plausible.

The reason I care if something is a person or not is that "caring about people" is part of my values.

If one is acting in the world, I would say one's sense of what a person is has to be intimately connected with the value of "caring about people". My caring about people is connected to my experience of people - there are people I've never met whom I care about in the abstract, but that's from extrapolating my immediate experience of people.

I would expect in a world where they weren't people is that there would be some feature you could point to in humans which cannot be found in mental models of people

It seems like an easy criterion would be "exists entirely independently from me". My mental models of just about everything, including people, are sketchy, feel like me "doing something", etc. I can't effortlessly have a conversation with any mental model I have of a person, for example. Oddly enough, I can have a conversation with another person as one of my mental models or internal characters (I'm a frequent DnD GM and I have NPCs I often like playing). Mental models and characters seem more like add-ons to my ordinary consciousness.

I don't think there are fundamental barriers. Sensory and motor networks, and types of senses and actions that people don't have, are well along. And the HuggingGPT work shows that they're surprisingly easy to integrate with LLMs. That plus error-checking are how humans successfully act in the real world.

I don't think the existence of sensors is the problem. I believe that self-driving cars, a key example, have problems regardless of their sensor level. I see the key hurdle as ad-hoc action in the world. Overall, all of our knowledge about neural networks, including LLMs, is a combination of heuristic observations and mathematical and other intuitions. So I'm not certain that this hurdle won't be overcome, but I'd still like to lay out the reasons it could be fundamental.

What LLMs seem to do really well is pull together pieces of information and make deductions about them. What they seem to do less well is reconcile an "outline" of a situation with the particular details involved (something I've found ChatGPT reliably does badly is reconciling further detail you supply once it has summarized a novel). A human, or even an animal, is very good at interacting with complex, changing, multilayered situations that they only have a partial understanding of - especially at staying within various safe zones that avoid different dangers. Driving a car is an example of this - you have a bunch of intersecting constraints that can come from a very wide range of things that can happen (but usually don't). Slowing (or not) when you see a child's ball go into the road is an archetypal example.

I mean, most efforts to use deep learning in robotics have foundered on the problem that generating enough information to teach the thing to act in the world is extremely difficult. This implies that the only way these things can be taught to deal with a complex situation is by a roughly complete modeling of it, and in real-world action situations that simply may not be possible (contrast with video games or board games, where a summary of the rules is given and any uncertainty consists of "known unknowns").

...having an external code loop that calls multiple networks to check markers of accuracy and effectiveness is scary and promising. 

Maybe, but methods like this have been tried without neural nets for a while and haven't by themselves demonstrated effectiveness. Of course, if some code could produce AGI, then naturally LLMs plus some code could produce AGI, so the question is how much needs to be added.
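For what it's worth, the loop being described is roughly the following (my own toy sketch; `generate` and the entries of `checkers` stand in for calls to separate networks):

```python
# Toy sketch of an external code loop around several networks: one model proposes
# an answer, separate checker models score it for accuracy and effectiveness, and
# the loop retries with feedback until the checks pass or a retry budget runs out.

def outer_loop(task, generate, checkers, max_rounds=3, threshold=0.8):
    candidate, scores, feedback = None, {}, ""
    for _ in range(max_rounds):
        candidate = generate(task, feedback)
        scores = {name: check(task, candidate) for name, check in checkers.items()}
        if scores and all(s >= threshold for s in scores.values()):
            return candidate, scores  # all checks passed
        feedback = f"Previous attempt scored {scores}; please revise."
    return candidate, scores  # best effort after exhausting the budget
```

Everything interesting still lives inside `generate` and the checkers; the loop itself is the easy part, which is why I'd say the open question is how much more than this needs to be added.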
