Aleksi Pietikäinen


The unexpected difficulty of comparing AlphaStar to humans

This is not true. In Starcraft Brood War there are lots of bugs that players take advantage of, but such bugs don't exist in Starcraft 2.

I think it's much more important to restrict the AI mechanically, so that it has to succeed strategically, than to have a fair fight. The whole conversation about fairness is misguided anyway. The point of an APM limiter is to remove confounding factors and increase the validity of our measurement, not to increase fairness.

I think your feelings stem from you considering it enough if AS simply beats human players, while APM whiners would like AS to learn all the aspects of Starcraft skill it can reasonably be expected to learn.

The agents on ladder don't scout much and can't react accordingly. They don't tech switch midgame, and some of them get utterly confused in ways a human wouldn't. The agent in game 11 against MaNa couldn't figure out that it could build 1 phoenix to kill the warp prism and chose to follow it with 3 oracles (units which can't shoot at flying units). The ladder agents display similar mistakes.

Considering how many millions of dollars AS has cost already (it could be hundreds of millions at this point), these holes are simply too big to call the agents robust, the project complete, or Starcraft conquered.

If they could somehow manage to pull off an ASZero that humans can't reliably abuse, I'd admit they've done all there is to do. Then they could declare victory.

Sorry, I worded that really poorly. Dumb and fast was a comment about relatively high-level human play. It is context dependent and, as you said, the trade-off is very hard to measure. It would probably flip back and forth quite a bit if we slowly increased both and actually attempted to graph it. The point is, if we look at the early game, where both players have similar armies, unlimited athleticism quickly becomes unbeatable even with only moderate intelligence behind it.

The thing about measuring athleticism or intelligence separately is that we can measure the athleticism of a machine but not of a human. When a human plays sc2, it's never about purely executing a mindless task. Never. You'd have to somehow separate out the visual recognition component, which is impossible. Human reaction times and accuracy are heavily affected by the dynamically changing scene of play.

Think about it this way: measuring human spam-clicking speed and accuracy is not the benchmark, because those actions are inconsequential and don't translate to combat movement (or any other actions a player makes in a dynamic scene). Say you are in a blink stalker battle. In order to effectively retreat wounded units, you have to quickly assess which units are in danger before ordering them to pull back. That cognitive process of visual recognition and anticipation is simply inseparable from the athleticism aspect.

I guess you could measure human clicking speed and reaction times in a program specifically designed to do so, but those measures would be useless for the same reason. The mechanical ability of a human varies wildly based on what is happening in a game of sc2. There are cognitive bottlenecks.

Here's an even clearer way to think about it. In a game of soccer you can make a decision to run somewhere (intelligence) and then try to run as fast as you can (athleticism). In a game of Starcraft every action is a click and therefore a decision. You can't click harder or gentler. You could argue that a single decision can include dozens of clicks, but that's true only for macrostrategic decisions (e.g. what build order a player chooses). Those don't exist in combat situations.

Basically, we can handicap the AI mechanically exactly where we want it, but we can't know for sure where that is. Luckily, we don't have to. We can simply eyeball it and intentionally aim slightly lower. That way, if the human is on equal footing or even has a slight edge, an AI victory should almost inarguably be a result of superior cognitive ability.

You don't have to get these handicaps exactly right. The APM controversy happened because AS's advantages were obvious. It is not hard to make them less so.

I don't think this was unexpected at all. As soon as DeepMind announced their Starcraft project, most of the discussion was about proper mechanical limitations, since the real-time aspect of RTS games favors mechanical execution so heavily. Being dumb and fast is simply more effective than being smart and slow.

The skills that make a good human Starcraft player can broadly be divided into two categories: athleticism and intelligence. Much of the strategy in the game is built around the fact that players are playing with limited resources of athleticism (i.e. speed and accuracy), so it follows that you can't necessarily separate the two skill categories and measure only one of them.

The issue with the presentation was that not only did DeepMind fail to highlight the problematic nature of assessing the intelligence of their algorithm, they actively downplayed it. In my opinion, the PR spin was blatantly obvious and the community backlash warranted and justified.