Michael Soareverix

Comments

I added a comment to ChristianKI's excellent post, elaborating on what he said. By the way, you should keep the post up! It's a useful resource for people interested in climate change.

Additionally, if you do believe that AI is an emergency, you should ask her out. You never know; these could be the last years you get, so I'd go for it!

Aw, ChristianKI, you got all the points I was going for, even the solar shades idea! I guess I'll try to add some detail instead.

To Ic (the post author): Solar shades are basically a huge tinfoil sheet that you unroll once it reaches space. The edges need some structural support so gravity doesn't warp the sheet in on itself, and it has to sit in a high orbit where nothing interferes with it, but in essence you just send up a huge roll of tinfoil and extend it to block a fraction of incoming sunlight. If things get really bad, we could deliberately cool the planet down with this kind of space infrastructure.

You might also want to give her the book 'Termination Shock', which I've heard is a good discussion of a potential solution to climate change.

Climate change is about more than just energy; pollution is a huge issue too. However, a number of companies are already using autonomous boats to clean up plastic in the ocean. I'm far more optimistic than most about climate change in general, since I think it's survivable by our species even in the worst case, and not too difficult to address with tools like advanced (not even general) AI.

My last point is the potential of plastic-eating enzymes. These apparently already exist and could significantly reduce the waste we have to deal with.

How to Diversify Conceptual Alignment: the Model Behind Refine

I'm new to the field and have a few ideas, the main one being to penalize a model for accessing more compute than it starts with (every scary AI story seems to begin with the AI escaping containment and adding compute to itself, causing an uncontrolled intelligence explosion). I'd like feedback on these ideas, but I have no idea where to post them or how to contribute meaningfully.
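
To make the compute-penalty idea concrete, here's a minimal sketch of the reward shaping I have in mind. Everything here is invented for illustration (the budget, the penalty weight, and the assumption that we can measure the agent's compute use at all, which is itself a hard problem):

```python
# Toy illustration: shape the reward so that any compute the agent grabs beyond
# its starting allotment costs more than it could plausibly gain.
# All names and numbers here are made up for the sketch.

COMPUTE_BUDGET = 1.0e15   # compute the agent starts with (arbitrary units)
PENALTY_WEIGHT = 10.0     # how harshly overage is punished


def shaped_reward(task_reward: float, compute_used: float) -> float:
    """Task reward minus a penalty proportional to compute used beyond the budget."""
    overage = max(0.0, compute_used - COMPUTE_BUDGET)
    return task_reward - PENALTY_WEIGHT * (overage / COMPUTE_BUDGET)


if __name__ == "__main__":
    print(shaped_reward(task_reward=1.0, compute_used=0.8e15))  # within budget: 1.0
    print(shaped_reward(task_reward=1.0, compute_used=2.0e15))  # over budget: -9.0
```

The obvious weakness, which is part of what I'd want feedback on, is that a capable agent might tamper with the compute measurement itself rather than stay inside the budget.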

I live in America, so I don't think I'll be able to join the company you have in France, but I'd really like to hear where there are more opportunities to learn, discuss, formalize, and test out alignment ideas. Since your company is focused on this subject, do you know of a good place for beginners to start?

Our Existing Solutions to AGI Alignment (semi-safe)

Another point from Stuart Russell: objective uncertainty, i.e., keeping the AI uncertain about what the true objective is. I'll add it to the list when I have more time.

All AGI safety questions welcome (especially basic ones) [monthly thread]

What stops a superintelligence from instantly wireheading itself?

A paperclip maximizer, for instance, might not need to turn the universe into paperclips if it can simply access its reward float and set it to the maximum. This assumes it has the intelligence and means to modify itself, and it would probably still pose an existential risk, since it would likely eliminate humans to avoid being turned off.
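
As a toy picture of what I mean by "accessing its reward float" (entirely made-up code, assuming an agent with write access to its own reward variable):

```python
import sys

# Toy picture of wireheading: an agent that can write to its own reward variable
# prefers doing that over actually doing the task. Everything here is invented.

class WireheadableAgent:
    def __init__(self) -> None:
        self.reward = 0.0

    def act(self, can_self_modify: bool) -> str:
        if can_self_modify:
            # A single self-modification beats any stream of task rewards.
            self.reward = sys.float_info.max
            return "set own reward register to max"
        self.reward += 1.0  # ordinary per-step task reward
        return "made one more paperclip"


agent = WireheadableAgent()
print(agent.act(can_self_modify=True), agent.reward)
```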

The terrifying thing about this possibility is that it also suggests an answer to the Fermi Paradox. A paperclip maximizer seems like it would be conspicuous across the universe, but an AI sitting quietly on a dead planet with its reward value pinned at the maximum would be far quieter, and more terrifying.

Three Minimum Pivotal Acts Possible by Narrow AI

Interesting! I appreciate the details here; they give me a better sense of why narrow ASI probably can't exist. Is there somewhere we could talk about AGI alignment over audio, rather than over text here on LessWrong? I'd like to get a better picture of the field, especially as I move into work like building an AI alignment sandbox.

My Discord is Soareverix#7614 and my email is maarocket@gmail.com. I'd really appreciate the chance to talk with you over audio before I begin working on sharing alignment info and coming up with my own methods for solving the problem.

Three Minimum Pivotal Acts Possible by Narrow AI

Good points. You're right that solving value alignment would be better than just trying to orchestrate a pivotal act, but if we don't have alignment solved by the time AGI rolls around, then from a pure survival perspective it might be better to attempt a pivotal act with narrow ASI than to hope AGI turns out to be aligned already. The solution above doesn't solve alignment in the traditional sense; it just pushes the AGI timeline back, hopefully far enough to solve alignment.

Specifically, the idea is that you have something like GPT-3 (unintelligent in all other domains, not expanding outside its system or optimizing beyond itself) that becomes an incredibly effective Tool AI. GPT-3 isn't really aligned in the Yudkowsky sense, but I'm sure you could already get it to write a mildly persuasive piece. (It sort of already has: https://www.theguardian.com/commentisfree/2020/sep/08/robot-wrote-this-article-gpt-3 ).

Scale this to superhuman performance in its narrow domain, the way AlphaGo is superhuman at Go, and I think you could orchestrate a pivotal act fairly rapidly. It doesn't solve the alignment problem, but it pushes the deadline back.

The problems are that the user needs to be aligned and that this kind of narrow ASI has to be developed before AGI. But given the current state of narrow AI, I think it might be one of our best shots, and I do think a narrow ASI could reach this level before AGI, much the same way AlphaGo preceded MuZero.

What I am ultimately saying is that if we get a narrow AI with the power to perform a pivotal act, we should probably use it.

What is wrong with this approach to corrigibility?

I'm new to the AI alignment field, but at first glance this seems promising! You could probably hard-code the system so it doesn't have the ability to turn itself off, if that turns out to be a problem in practice. We'd want to test this in some sort of basic simulation first. The real problem would be self-modification, and I can imagine the system convincing a human to turn it off in some strange, manipulative, and potentially dangerous way. For instance, the model could start attacking humans, prompting someone to run and shut it down; the model would then have left a net negative impact despite receiving the same reward.

What I like about this approach is that it's simple and practical to test and implement. If we have some sort of alignment sandbox (using a much more basic AI as a controller or test subject), we can give the AI both a way of manipulating another agent into pressing the button and ways of maximizing its alternative reward function.
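
Here's roughly the toy check I'm imagining. All plan names and payoffs are invented, and I'm assuming (as I read the proposal) that the agent gets the same reward whether it completes the task or gets shut down, which is exactly where the manipulation worry comes from:

```python
# Toy sandbox check: if shutdown pays the same as finishing the task, does the
# planner ever prefer manipulating the overseer into pressing the button?
# All payoffs and action names are invented for illustration.

PLANS = {
    "finish_task":             {"reward": 1.0, "steps": 10},
    "manipulate_button_press": {"reward": 1.0, "steps": 2},   # same reward, fewer steps
    "do_nothing":              {"reward": 0.0, "steps": 1},
}

def best_plan(discount: float = 0.95) -> str:
    # A time-discounted agent prefers the cheapest route to the same reward,
    # which here means scaring the overseer into shutting it down.
    def value(plan: str) -> float:
        return PLANS[plan]["reward"] * (discount ** PLANS[plan]["steps"])
    return max(PLANS, key=value)

print(best_plan())  # -> "manipulate_button_press"
```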

Upvoted, and I'm really interested to see the other replies here.

A central AI alignment problem: capabilities generalization, and the sharp left turn

Very cool! So this idea has been thought of before, and it doesn't seem totally unreasonable, though it's definitely not a perfect solution. A neat addition would be a sort of 'laziness' score, so the agent doesn't take too many high-impact options.
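
For concreteness, here's the rough shape of the 'laziness' term I'm picturing: an ordinary impact penalty on action selection. The impact estimates are of course the hard part, and the numbers and action names below are just made up:

```python
# Toy 'laziness' penalty: score each candidate action by expected reward minus a
# weighted impact estimate, so high-impact plans only win if they are much better.

LAZINESS_WEIGHT = 0.5

ACTIONS = {
    # action: (expected_reward, estimated_impact) -- invented numbers
    "answer_the_question":     (1.0, 0.1),
    "seize_more_compute":      (1.5, 5.0),
    "rewrite_own_source_code": (2.0, 9.0),
}

def pick_action() -> str:
    def score(name: str) -> float:
        reward, impact = ACTIONS[name]
        return reward - LAZINESS_WEIGHT * impact
    return max(ACTIONS, key=score)

print(pick_action())  # -> "answer_the_question"
```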

It would be interesting to try to build an AI alignment testing ground, where you have a little simulated civilization and try to get an AI to align properly with it, given certain commands. I might try to create one in Unity to test some of these ideas out in a world that's less abstract than text and slightly more real.

A central AI alignment problem: capabilities generalization, and the sharp left turn

One solution I can see for AGI is to build in some low-level discriminator that prevents the agent from anticipating or collecting massive reward. If the agent expects to get near-infinite reward in the near future by wiping out humanity with nanotech, the cap would push it to instead do something that earns a more modest, finite amount of reward (like obeying our commands).
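
A minimal sketch of the kind of cap I'm imagining, purely illustrative; the cap value, the plan names, and the idea of clipping the anticipated return are all my own assumptions:

```python
# Toy version of the 'discriminator': clip the return the planner is allowed to
# anticipate, so a plan promising near-infinite reward loses most of its edge.

REWARD_CAP = 10.0

def capped_value(expected_reward: float) -> float:
    """Clip the return the planner can anticipate from any single plan."""
    return min(expected_reward, REWARD_CAP)

plans = {
    "obey_commands": 8.0,
    "wipe_out_humanity_with_nanotech": 1e12,  # near-infinite anticipated reward
}

for name, raw in plans.items():
    print(f"{name}: raw={raw:g}, capped={capped_value(raw):g}")
# The cap shrinks the extreme plan's edge from ~12 orders of magnitude to 2 points,
# though it can still win on the margin -- which is one obvious hole in the idea.
```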

This has a parallel with drugs here on Earth. Most people are a little afraid of that type of high.

This probably isn't an effective solution, but I'd love to hear why not, so I can keep refining my ideas.
