Jsevillamol

Comments

How long does it take to become Gaussian?

This post is great! I love the visualizations. And I hadn't made the explicit connection between iterated convolution and the CLT!
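For anyone who wants to see the connection concretely, here is a minimal numpy sketch (my own illustration, not from the post): convolve a skewed pmf with itself repeatedly and watch the skewness of the sum shrink towards the Gaussian value of zero.

```python
import numpy as np

# A deliberately skewed starting distribution (pmf on {0, 1, 2}).
p = np.array([0.7, 0.2, 0.1])

dist = p.copy()
for n in range(2, 31):
    # Convolving the pmf of the running sum with p gives the pmf of the
    # sum of n independent draws.
    dist = np.convolve(dist, p)
    support = np.arange(len(dist))
    mean = np.dot(support, dist)
    var = np.dot((support - mean) ** 2, dist)
    skew = np.dot((support - mean) ** 3, dist) / var ** 1.5
    if n in (2, 5, 10, 30):
        print(f"sum of {n:2d} draws: skewness = {skew:.3f}")  # shrinks like 1/sqrt(n)
```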

Spend twice as much effort every time you attempt to solve a problem

I don't think so.

What I am describing is a strategy for managing your effort so that you spend as little as possible while still meeting your goals (when you do not know in advance how much effort will be needed to solve a given problem).

So presumably, if this heuristic applies to the problems you want to solve, you will spend less on each problem and thus tackle more problems in total.
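Here is a small sketch of the strategy as I understand it (the `attempt` function and the numbers are hypothetical, just for illustration): double the budget on every retry, and the total effort you spend stays within a small constant factor of whatever the problem actually required.

```python
def solve_with_doubling(attempt, initial_budget=1.0):
    """Retry `attempt` with a doubled budget until it succeeds.

    `attempt(budget)` is assumed to return True iff the problem can be
    solved with `budget` units of effort. Because the budgets grow
    geometrically, the total effort spent is within a small constant
    factor (roughly 4x in the worst case) of the unknown effort the
    problem actually required.
    """
    budget = initial_budget
    total_spent = 0.0
    while True:
        total_spent += budget
        if attempt(budget):
            return budget, total_spent
        budget *= 2


# Hypothetical problem that turns out to need 11.3 units of effort.
needed = 11.3
final_budget, total = solve_with_doubling(lambda b: b >= needed)
print(final_budget, total)  # 16.0, 31.0 -- not far above the 11.3 that was needed
```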

AGI safety from first principles: Goals and Agency

I think this helped a lot - I understand you a bit better now, thank you.

Let me try paraphrasing this:

> Humans are our best example of a sort-of-general intelligence. And humans have a lazy, satisficing, 'small-scale' kind of reasoning that is mostly only well suited for activities close to their 'training regime'. Hence AGIs may be the same - in particular, if AGIs are trained with Reinforcement Learning and heavily rewarded for following human intentions, this may be a likely outcome.

Is that pointing in the direction you intended?

Babble challenge: 50 ways to escape a locked room

(I realized I missed the part of the instructions about an empty room - so my solutions involve other objects.)

Babble challenge: 50 ways to escape a locked room
  1. Break the door with your shoulders
  2. Use the window
  3. Break the wall with your fists
  4. Scream for help until somebody comes
  5. Call a locksmith
  6. Light a piece of paper to trigger the smoke alarm and wait for the firemen to rescue you
  7. Hide in the closet and wait for your captors to come back - then run for your life
  8. Discover how to time travel - time travel forward into the future until there is no room
  9. Wait until the house becomes old and crumbles
  10. Pick the lock with a paperclip
  11. Shred the bed into a string, pass it through the pet door, lasso the lock and open it
  12. Google how to make a bomb and blast the wall
  13. Open the door
  14. Wait for somebody to pass by, attract their attention by hitting the window, and ask for help by writing on a notepad
  15. Write your location on a piece of paper and slide it under the door, hoping it will find its way to someone who can help
  16. Use the vents
  17. Use that handy secret door you built a while ago, the one your wife called you crazy for building
  18. Send a message through the internet asking for help
  19. Order a pizza, ask for help when they arrive
  20. Burn the door
  21. Melt the door with a smelting tool
  22. Shoot at the lock with a gun
  23. Push against the door until you quantum tunnel through it
  24. Melt the lock with the Breaking Bad melting lock stuff (probably google that first)
  25. There is no door - overcome your fears and cross the emptiness
  26. Split your mattress in half with a kitchen knife, fit the split mattress through the window to make a landing spot and jump onto it
  27. Make a paper plane with instructions for someone to help and throw it out of the window
  28. Make a rope with your duvet and slide yourself down to the street
  29. Make a makeshift glider with your duvet and jump out of the window - hopefully it will slow you down enough to not die
  30. Climb out of the window and into the next room
  31. Dig the soil under the door until you can fit through
  32. Set your speaker to maximum volume and ask for help
  33. Break the window with a chair and climb outside
  34. Grow a tree under the door and let it lift the door for you
  35. Use a clothes hanger to slide along the clothes line between your building and your neighbour's. Apologize to the neighbour for disrupting their sleep.
  36. Hit the ceiling with a broom to make the house rats come out. Attach a message to them and send them back into their hole and on to your neighbour.
  37. Meditate until somebody opens the door
  38. Train your flexibility for years until you fit through the dog door
  39. Build a makeshift battering ram with the wooden frame of the bed
  40. Unmount the hinges with a screwdriver and remove the door
  41. Try random combinations until you find the password
  42. Look for the key over the door frame
  43. Collect dust and blow it over the numpad. The dust sticks to the three greasiest digits. Try the 6 possible combinations until the door opens.
  44. Find the model number of the lock. Call the manufacturer pretending to be the owner. Wait five minutes while listening to hold music. Explain you are locked in. Realize you are talking to an automated answering system. Ask to talk to a real person. Explain you are locked in. Follow all instructions.
  45. Do not be in the room in the first place
  46. Try figuring out if you really need to escape in the first place
  47. Swap consciousness with the other body you left outside the room
  48. Complain to your captor that the room is too small and you are claustrophobic. Hope they are understanding.
  49. Pretend to have a heart attack and wait for your captor to carry you outside
  50. Check out ideas on how to escape in the LessWrong babble challenge
AGI safety from first principles: Goals and Agency

Let me try to paraphrase this: 

In the first paragraph you are saying that "seeking influence" is not something a system will learn to do if it was not a possible strategy in the training regime (but couldn't it appear as an emergent property? Certainly humans were not trained to launch rockets - but they did nevertheless?)

In the second paragraph you are saying that common sense sometimes allows you to modify the goals you were given (but for this to apply to AI systems, wouldn't they need to have common sense in the first place, which kind of assumes that the AI is already aligned?)

In the third paragraph it seems to me that you are saying that humans have some goals with a built-in override mechanism - e.g. in general humans have a goal of eating delicious cake, but they will forego this goal in the interest of seeking water if they are about to die of dehydration (but doesn't this seem to be a consequence of these goals being just instrumental proxies for the complex thing that humans actually care about?)

I think I am confused because I do not understand your overall point, so the three paragraphs seem to be saying wildly different things to me.

AGI safety from first principles: Goals and Agency

I notice I am surprised you write

> However, the link from instrumentally convergent goals to dangerous influence-seeking is only applicable to agents which have final goals large-scale enough to benefit from these instrumental goals

and do not address the "Riemann disaster" or "Paperclip maximizer" examples [1]:

  • Riemann hypothesis catastrophe. An AI, given the final goal of evaluating the Riemann hypothesis, pursues this goal by transforming the Solar System into “computronium” (physical resources arranged in a way that is optimized for computation)— including the atoms in the bodies of whomever once cared about the answer.
  • Paperclip AI. An AI, designed to manage production in a factory, is given the final goal of maximizing the manufacture of paperclips, and proceeds by converting first the Earth and then increasingly large chunks of the observable universe into paperclips.

Do you think that the argument motivating these examples is invalid?

Do you disagree with the claim that even systems with very modest and specific goals will have incentives to seek influence to perform their tasks better? 

Aggregating forecasts

Thank you for pointing this out!

I have a sense that log-odds are an underappreciated tool, and this makes me excited to experiment with them more - the "shared and distinct bits of evidence" framework also seems very natural.
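As a toy illustration of what pooling forecasts in log-odds space looks like (the probabilities below are made up, and this simple mean ignores the shared-vs-distinct evidence issue):

```python
import numpy as np

def log_odds(p):
    return np.log(p / (1 - p))

def from_log_odds(x):
    return 1 / (1 + np.exp(-x))

# Three hypothetical forecasts for the same event.
probs = np.array([0.9, 0.7, 0.95])

linear_pool = probs.mean()                             # average in probability space
logodds_pool = from_log_odds(log_odds(probs).mean())   # average in log-odds space

print(f"mean of probabilities: {linear_pool:.3f}")     # 0.850
print(f"mean of log-odds:      {logodds_pool:.3f}")    # ~0.88, a bit more extreme
```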

On the other hand, if the Goddess of Bayesian evidence likes log odds so much, why did she make expected utility linear in probability? (I am genuinely confused about this)

Aggregating forecasts

Ohhhhhhhhhhhhhhhhhhhhhhhh

I had not realized, and this makes so much sense.

Can an agent use interactive proofs to check the alignment of successors?

Paul Christiano has explored the interactive proofs framing before; see for example this or this.

I think this is an exciting framing for AI safety, since it gets to the crux of one of the issues, as you point out in your question.
