My guess is that early stopping is going to tend to stop so early as to be useless.

For example, imagine the agent is playing Mario and its proxy objective is "+1 point for every unit Mario goes right, -1 point for every unit Mario goes left". 

(Mario screenshot that I can't directly embed in a comment)

If I understand correctly, to avoid Goodharting the agent has to consider every possible reward function that is improved by the first few bits of optimization pressure on the proxy objective.

This probably includes reward functions like "+1 point if Mario falls in a pit". Optimizing the policy toward going right will initially also make Mario more likely to fall in a pit than if the agent were just mashing buttons randomly (in which case it would stay around the same spot until the timer ran out and never reach a pit), so the angle between the two gradients is likely small at first.

However, after a certain point more optimization pressure on going right will make Mario jump over the pit instead, reducing reward under the pit reward function.

If the agent wants to avoid any possibility of Goodharting, it has to stop optimizing before even clearing the first obstacle in the game.
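The geometry of this argument can be sketched with a toy calculation. All the gradient vectors below are hypothetical numbers of my own, not anything derived from the actual math: early in training the proxy gradient and the "pit" gradient point roughly the same way, and later they oppose each other, which is the condition that would force a maximally cautious early-stopping rule to halt.

```python
import math

# Toy 2D "policy gradient" directions, one per reward function.
# All numbers are made-up illustrations.
def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda w: math.sqrt(sum(a * a for a in w))
    return dot / (norm(u) * norm(v))

# Early training: pushing Mario right also pushes him toward the pit.
proxy_early, pit_early = [1.0, 0.2], [0.9, 0.4]
# Later: pushing right makes him jump the pit, hurting the pit reward.
proxy_late, pit_late = [1.0, 0.1], [-0.8, 0.3]

print(cosine(proxy_early, pit_early) > 0)  # True: aligned, optimization may continue
print(cosine(proxy_late, pit_late) < 0)    # True: anti-aligned, so a rule that
                                           # guards against every candidate reward
                                           # must stop before this point
```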

(I may be misunderstanding some things about how the math works.)

With such a vague and broad definition of power fantasy, I decided to brainstorm a list of ways games can fail to be a power fantasy.

  1. Mastery feels unachievable.
    1. It seems like too much effort. Cliff-shaped learning curves, thousand-hour grinds, old PvP games where every player still around will stomp a noob like you flat.
    2. The game feels unfair. Excessive RNG, "Fake Difficulty", or "pay to win".
  2. The power feels unreal, success too cheaply earned.
    1. The game blatantly cheats in your favor even when you didn't need it to.
    2. Poor game balance leading to hours of trivially easy content that you have to get through to reach the good stuff.
  3. Mastery doesn't feel worth trying for.
    1. Games where the gameplay isn't fun and there's no narrative or metagame hook making you want to do it.
    2. The Diablo 3 real money auction house showing you that your hard-earned loot is worth pennies.
  4. There is no mastery to try for in the first place.
    1. Walking simulators, visual novels, etc. Walking simulators got a mention in the linked article. They aren't really "failing" at power fantasy, just trying to do something different.

I think autonomous lethal weapons (ALWs) are already more of a "realist" cause than a doomer cause. To doomers, they're a distraction: a superintelligence can kill you with or without them.

ALWs also seem to be held to an unrealistic standard compared to existing weapons. With present-day technology, they'll probably hit the wrong target more often than human-piloted drones. But will they hit the wrong target more often than landmines, cluster munitions, and over-the-horizon unguided artillery barrages, all of which are being used in Ukraine right now?

The Huggingface deep RL course came out last year. It includes theory sections, algorithm implementation exercises, and sections on various RL libraries that are out there. I went through it as it came out, and I found it helpful.

FYI all the links to images hosted on your blog are broken in the LW version.

Answer by Multicore, Jun 08, 2023

You are right that by default prediction markets do not generate money, and this can mean traders have little incentive to trade.

Sometimes this doesn't even matter. Sports betting is very popular even though it's usually negative sum.

Otherwise, trading could be stimulated by having someone who wants to know the answer to a question provide a subsidy to the market on that question, effectively paying traders to reveal their information. The subsidy can take the form of a bot that bets at suboptimal prices, or a cash prize for the best performing trader, or many other things.
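As a toy illustration of the subsidy mechanism (all numbers hypothetical, my own), a bot that quotes a deliberately stale price hands positive expected value to any trader with better information:

```python
# Hypothetical numbers: a subsidy bot pegs the price of YES at $0.50 on a
# question that an informed trader believes is 70% likely to resolve YES.
price = 0.50
belief = 0.70
shares = 1000  # each YES share pays $1 if the event occurs

# The informed trader's expected profit from buying against the bot.
expected_profit = shares * (belief - price)
print(round(expected_profit, 2))  # 200.0 -- the subsidy pays the trader
                                  # to move the price toward their belief
```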

Alternatively, there could be traders who want shares of YES or NO in a market as a hedge against that outcome negatively affecting their life or business, and who will buy even if the EV is negative; other traders can then make money off them.
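A toy example of why negative-EV hedging can still be rational (hypothetical numbers of my own): the hedger trades a little expected value for certainty.

```python
# Hypothetical numbers: a business loses $100,000 if event E occurs
# (true probability 20%). YES shares on E cost $0.25 each and pay $1 if
# E occurs, so buying them is negative EV -- but it removes all risk.
p, loss, price, shares = 0.20, 100_000, 0.25, 100_000
cost = shares * price  # $25,000 paid up front

# (outcome, probability) pairs for each strategy.
unhedged = [(-loss, p), (0.0, 1 - p)]
hedged = [(-loss - cost + shares * 1.0, p), (-cost, 1 - p)]

ev = lambda outcomes: sum(x * q for x, q in outcomes)
print(round(ev(unhedged)))  # -20000: better on average, but 20% chance of ruin
print(round(ev(hedged)))    # -25000: worse on average, but the outcome is certain
```

The trader on the other side pockets the $5,000 gap in expectation, which is the sense in which hedgers can fund the market.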

  • What are these AIs going to do that is immensely useful but not at all dangerous? A lot of useful capabilities that people want are adjacent to danger. Tool AIs Want to be Agent AIs.
  • If two of your AIs would be dangerous when combined, clearly you can't make them publicly available, or someone would combine them. If your publicly-available AI is dangerous if someone wraps it with a shell script, someone will create that shell script (see AutoGPT). If no one but a select few can use your AI, that limits its usefulness.
  • An AI ban that stops dangerous AI might be possible. An AI ban that allows development of extremely powerful systems but has exactly the right safeguard requirements to render those systems non-dangerous seems impossible.

When people calculate utility, they often use exponential discounting over time. If, for example, your discount factor is 0.99 per year, then getting something in one year is only 99% as good as getting it now, getting it in two years is only 99% as good as getting it in one year, and so on. Getting it in 100 years would be discounted to 0.99^100 ≈ 37% of the value of getting it now.
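The arithmetic can be checked with a one-liner (a sketch using only the numbers above):

```python
def discounted_value(value, discount_factor, years):
    """Present value of receiving `value` after `years` under exponential discounting."""
    return value * discount_factor ** years

print(round(discounted_value(1.0, 0.99, 1), 4))    # 0.99
print(round(discounted_value(1.0, 0.99, 100), 4))  # 0.366 -- about 37% of present value
```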

The sharp left turn is not some crazy theoretical construct that comes out of strange math. It is the logical and correct strategy of a wide variety of entities, and also we see it all the time.

I think you mean Treacherous Turn, not Sharp Left Turn.

Sharp Left Turn isn't a strategy. It's when an AI that was aligned in its training domains remains capable in new domains but is no longer aligned.
