The unexpected difficulty of comparing AlphaStar to humans

[-]MathiasKB6y400

Before doing the whole EA thing, I played StarCraft semi-professionally. I was consistently ranked grandmaster, primarily making money from coaching players of all skill levels. I also co-authored an ML paper on StarCraft II win prediction.

TL;DR: AlphaStar shows us what it will look like when humans are beaten in a completely fair fight.

I feel fundamentally confused about a lot of the discussion surrounding alphastar. The entire APM debate feels completely misguided to me and seems to be born out of fundamental misunderstandings of what it means to be good at starcraft.

Being skillful at StarCraft is the ability to compute which set of actions needs to be made, and to do so very fast. A low-skilled player has to spend seconds figuring out their next move, whereas a pro player will determine it in milliseconds. This skill takes years to build, through mental caching of game states, so that the right moves become instinct and can be quickly computed without much mental effort.

As you showed clearly in the blog post, MaNa (or any other player) can reach a much higher APM by mindlessly tabbing between control groups. You can click predetermined spots on the screen more than fast enough to control individual units.

We are physically capable of playing this fast, yet we do not.

The reason for this is that in a real game my actions are limited by the time it takes to figure them out. Likewise, if you were to play speed chess against AlphaZero you would get creamed, not because you can't move the pieces fast enough, but because AlphaZero can calculate much better moves much faster than you can.

I am convinced a theoretical AI playing with a mouse and keyboard, with motor control equivalent to a human's, would largely be making the same 'inhuman' plays we are seeing currently. Difficulty of input is simply not the bottleneck.

Alphastar can only do its 'inhuman' moves because it's capable of calculating starcraft equations MUCH faster than humans are. Likewise, I can only do 'pro' moves because I'm capable of calculating starcraft equations much faster than an amateur.

You could argue that it's not showcasing the skills we're interested in, as it doesn't need to put the same emphasis on long-term planning and outsmarting its opponent, that equal human players have to. But that will also be the case if you put me against someone who's never played the game.

If what we really care about is proving that it can do long term thinking and planning in a game with a large actionspace and imperfect information, why choose starcraft? Why not select something like Frozen Synapse where the only way to win is to fundamentally understand these concepts?

The entire debate about 'fairness' seems somewhat misguided to me. Even if we found an APM measure that looks fair, I could move the goalposts and point out that it makes selections and commands with perfect precision, whereas a human has to do it through a mouse and keyboard. There are moves that are extremely risky to pull off due to the difficulty of precisely clicking things. If we supplied it a virtual mouse to move around, I could move the goalposts again and complain that my eyes cannot take in the entire screen at once.

It's clear AlphaStar is not a fair fight, yet I think we got a very good look at what the fair fight eventually will look like. AlphaStar fundamentally is what superhuman StarCraft intelligence looks like (or at least it will be with more training), and it's abusing the exact skillset that makes pro players stand out from amateurs in the first place.

[-]Aleksi Pietikäinen6y140

I think your feelings stem from you considering it to be enough if AS simply beats human players, while APM whiners would like AS to learn all the aspects of StarCraft skill it can reasonably be expected to learn.

The agents on ladder don't scout much and can't react accordingly. They don't tech switch midgame and some of them get utterly confused in ways a human wouldn't. The agent in game 11 vs MaNa couldn't figure out that it could build 1 phoenix to kill the warp prism, and chose to follow it with 3 oracles (units which can't shoot at flying units). The ladder agents display similar mistakes.

Considering how many millions of dollars AS has cost already (could be hundreds at this point) these holes are simply too big to call the agents robust or the project complete and Starcraft conquered.

If they somehow managed to pull off an ASZero which humans can't reliably abuse, I'd admit they've done all there is to do. Then they could declare victory.

[-]spkoc6y60

I think you're right when it comes to SC2, but that doesn't really matter for DeepMind's ultimate goal with AlphaStar: to show an AI that can learn anything a human can learn.

In a sense AlphaStar just proves that SC2 is not balanced for superhuman ( https://news.ycombinator.com/item?id=19038607 ) micro. Big stalker army shouldn't beat big Immortal army. In current SC2 it obviously can with good enough micro. There are probably all sorts of other situations where soft-scissor beats soft-rock with good enough micro.

Does this make AlphaStar's SC2 performance illegitimate? Not really? Tho in the specific Stalker-Immortal fight, input through an actual robot looking at an actual screen and having to cycle through control groups to check HP and select units PROBABLY would not have been able to achieve that level of micro.

The deeper problem is that this isn't DeepMind's goal. It just means that SC2 is a cognitively simpler game than initially thought (note: simple, not easy, in the sense that a lot of the strategy employed by humans is unnecessary with sufficient athletic skill). The higher goal of AlphaStar is to prove that an AI can be trained from nothing to learn the rules of the game and then behave in a human-like, long-term fashion: scout the opponent, react to their strategy with your own strategy, etc.

Simply bulldozing the opponent with superior micro and not even worrying about their counterplay (since there is no counterplay) is not particularly smart. It's certainly still SC2; it just reveals the fact that SC2 is a much simpler game (when you have superhuman micro).

[-][anonymous]6y50

> You could argue that it's not showcasing the skills we're interested in, as it doesn't need to put the same emphasis on long-term planning and outsmarting its opponent, that equal human players have to. But that will also be the case if you put me against someone who's never played the game.

Interesting point. Would it be fair to say that, in a tournament match, a human pro player is behaving much more like a reinforcement learning agent than a general intelligence using System 2? In other words, the human player is also just executing reflexes he has gained through experience, and not coming up with ingenious novel strategies in the middle of a game.

I guess it was unreasonable to complain about the lack of inductive reasoning and game-theoretic thinking in AlphaStar from the beginning, since DeepMind is an RL company, and RL agents just don't do that sort of stuff. But I think it's fair to say that AlphaStar's victory was much less satisfying than AlphaZero's, being not only unable to generalize across multiple RTS games, but also unable to explore the strategy space of a single game (hence the incentivizing of use of certain units during training). I think we all expected to see perfect game sense and situation-dependent strategy choice, but instead blink stalkers is the one build to rule them all, apparently.

[-]MathiasKB6y60

I think that's a very fair way to put it, yes. One way this becomes very apparent, is that you can have a conversation with a starcraft player while he's playing. It will be clear the player is not paying you his full attention at particularly demanding moments, however.

Novel strategies are thought up in between games and refined through dozens of practice games. In the end you have a mental decision tree of how to respond to most situations that could arise. Without having played much chess, I imagine this is how people approach chess openings as well.

I considered using system 1 and 2 analogies, but because of certain reservations I have with the dichotomy, I opted not to. Basically I don't think you can cleanly divide human intelligence into those two categories.

Ask a StarCraft player why they made a certain maneuver and they will for the most part be able to tell you why they did it, despite never having thought the reason out loud until you asked. There is some deep strategical thinking being done at the instinctual level. This intelligence is just as real as system 2 intelligence and should not be dismissed as being merely reflexes.

My central critique is essentially of seeing starcraft 'mechanics' as unintelligent. Every small maneuver has a (most often implicit) reason for being made. Starcraft players are not limited by their physical capabilities nearly as much as they are limited by their ability to think fast enough. If we are interested in something other than what it looks like when someone can think at much higher speeds than humans, we should be picking another game than starcraft.

[-]JenniferRM6y290

I think the abstract question of how to cognitively manage a "large action space" and "fog of war" is central here.

In some sense StarCraft could be seen as turn based, with each turn lasting for 1 microsecond, but this framing makes the action space of a beginning-to-end game *enormous*. Maybe not so enormous that a bigger data center couldn't fix it? In some sense, brute force can eventually solve ANY problem tractable to a known "vaguely O(N*log(N))" algorithm.

BUT facing "a limit that forces meta-cognition" is a key idea for "the reason to apply AI to an RTS next, as opposed to a turn based game."

If DeepMind solves it with "merely a bigger data center" then there is a sense in which maybe DeepMind has not yet found the kinds of algorithms that deal with "nebulosity" as an explicit part of the action space (and which are expected by numerous people (including me) to be widely useful in many domains).

(Tangent: The Portia spider is relevant here because it seems that its whole schtick is that it scans with its (limited, but far seeing) eyes, builds up a model of the world via an accumulation of glances, re-uses (limited) neurons to slowly imagine a route through that space, and then follows the route to sneak up on other (similarly limited, but less "meta-cognitive"?) spiders which are its prey.)

No matter how fast something can think or react, SOME game could hypothetically be invented that forces a finitely speedy mind to need action space compression and (maybe) even compression of compression choices. Also, the physical world itself appears to contain huge computational depths.

In some sense then, the "idea of an AI getting good *at an RTS*" is an attempt (which might have failed or might be poorly motivated) to point at issues related to cognitive compression and meta-cognition. There is an implied research strategy aimed at learning to use a pragmatically finite mind to productively work on a pragmatically infinite challenge.

The hunch is that maybe object level compression choices should always have the capacity to suggest not just a move IN THE GAME of doing certain things, but also a move IN THE MIND to re-parse the action space, compress it differently, and hope to bring a different (and more appropriate) set of "reflexes" to bear.

The idea of a game with "fog of war" helps support this research vision. Some actions are pointless for the game, but essential to ensuring the game is "being understood correctly" and game designers adding fog of war to a video game could be seen as an attempt to represent this possibly universally inevitable cognitive limitation in a concretely-ludic symbolic form.

If an AI is trained by programmers "to learn to play an RTS" but that AI doesn't seem to be learning lessons about meta-cognition or clock/calendar management, then it feels a little bit like the AI is not learning what we hoped it was supposed to learn from "an RTS".

This is why these points made by maximkazhenkov in a neighboring comment are central:

> The agents on [the public game] ladder don't scout much and can't react accordingly. They don't tech switch midgame and some of them get utterly confused in ways a human wouldn't.

I think this is conceptually linked (through the idea of having strategic access to the compression strategy currently employed) to this thing you said:

> ...you can have a conversation with a starcraft player while he's playing. It will be clear the player is not paying you his full attention at particularly demanding moments, however... I considered using system 1 and 2 analogies, but because of certain reservations I have with the dichotomy... [that said] there is some deep strategical thinking being done at the instinctual level. This intelligence is just as real as system 2 intelligence and should not be dismissed as being merely reflexes.

In the story about metacognition, verbal powers seem to come up over and over.

I think a lot of people who think hard about this understand that "mere reflexes" are not mere (especially when deeply linked to a reasoning engine that has theories about reflexes).

Also, I think that human meta-cognitive processes might reveal themselves to some degree in the apparent fact that a verbal summary can be generated by a human *in parallel without disrupting the "reflexes" very much*... then sometimes there is a pause in the verbalization while a player concentrates on <something>, and then the verbalization resumes (possibly with a summary of the 'strategic meaning' of the actions that just occurred).

Arguably, to close the loop and make the system more like the general intelligence of a human, part of what should be happening is that any reasoning engine bolted onto the (constrained) reflex engine should be able to be queried by ML programmers to get advice about what kinds of "practice" or "training" needs to be attempted next.

The idea is that by *constraining* the "reflex engine" (to be INadequate for directly mastering the game) we might be forced to develop a reasoning engine for understanding the reflex engine and squeezing the most performance out of it in the face of constraints on what is known and how much time there is to correlate and integrate what is known.

A decent "reflexive reasoning engine" (ie a reasoning engine focused on reflexive engines) might be able to nudge the reflex engine (every 1-30 seconds or so?) to do things that allow the reflex engine to scout brand new maps or change tech trees or do whatever else "seems meta-cognitively important".

A good reasoning engine might be able to DESIGN new maps that would stress test a specific reflex repertoire that it thinks it is currently bad at.

A *great* reasoning engine might be able to predict in the first 30 seconds of a game that it is facing a "stronger player" (with a more relevant reflex engine for this game) such that it will probably lose the game for lack of "the right pre-computed way of thinking about the game".

A really FANTASTIC reflexive reasoning engine might even be able to notice a weaker opponent and then play a "teaching game" that shows that opponent a technique (a locally coherent part of the action space that is only sometimes relevant) that the opponent doesn't understand yet, in a way that might cause the opponent's own reflexive reasoning engine to understand its own weakness and be correctly motivated to practice a way to fix that weakness.

(Tangent: To recall the tangent above to the Portia spider. It preyed on other spiders with similar spider limits. One of the fears here is that all this metacognition, when it occurs in nature, is often deployed in service to competition, either with other members of the same species or else to catch prey. Giving these powers to software entities that ALREADY have better thinking hardware than humans in many ways... well... it certainly gives ME pause. Interesting to think about... but scary to imagine being deployed in the midst of WW3.)

It sounds, Mathias, like you understand a lot of the centrality and depth of "trained reflexes" intuitively from familiarity with BOTH StarCraft and ML, and part of what I'm doing here is probably just restating large areas of agreement in a new way. Hopefully I am also pointing to other things that are relevant and unknown to some readers :-)

> If what we really care about is proving that it can do long term thinking and planning in a game with a large actionspace and imperfect information, why choose starcraft? Why not select something like Frozen Synapse where the only way to win is to fundamentally understand these concepts?

Personally, I did not know that Frozen Synapse existed before I read your comment here. I suspect a lot of people didn't... and also I suspect that part of using StarCraft was simply for its PR value as a beloved RTS classic with a thriving pro scene and deep emotional engagement by many people.

I'm going to go explore Frozen Synapse now. Thank you for calling my attention to it!

[-]gwern6y100

I'd add http://starcraft.blizzplanet.com/blog/comments/blizzcon-2018-starcraft-ii-whats-next-panel-transcript to the chronology.

[-]Richard Korzekwa6y20

Thanks! I've updated the version on our site (https://aiimpacts.org/the-unexpected-difficulty-of-comparing-alphastar-to-humans/) and I'm working on updating the post here on LW.

[-]gwern6y90

I think it's interesting because once you read it, it's obvious that AS was going to happen and the approach was scaling (given that OA5 had scaled from a similar starting point, cf 'the bitter lesson'), but at the time, no one in DRL/ML circles even noticed the talk - I only found out about that Vinyals talk after AS came out and everyone was reading back through Vinyals's stuff and noticed he'd said something at Blizzcon (and then after a bunch of searching, I finally found that transcript, since the video is still paywalled by Blizzard). Oh well!

[-]WilliamKiely4y70

> How many years do you think it will be until we see (in public) an agent which only gets screen pixels as input, has human-level apm and reaction speed, and is very clearly better than the best humans?
>
> Respondents had a median prediction of two years and an expertise-weighted mean prediction of a little less than four years.

It's now been about two years and this hasn't happened yet. It seems like that might just be the case because DeepMind stopped work on this?

[-]Richard Korzekwa4y40

As far as I know, nobody has been working on SC2 AI since the 2019 experiment putting AlphaStar on the public ladder.

[-]Thrasymachus6y70

Thanks for this excellent write-up!

I don't have relevant expertise in either AI or SC2, but I was wondering whether precision might still be a bigger mechanical advantage than the write-up notes. Even if humans can (say) max out at 150 'combat' actions per minute, they might misclick, not be able to pick out the right unit in a busy and fast battle to focus fire/trigger abilities/etc., and so on. The AI presumably won't have this problem. So even with similar EAPM (and subdividing out 'non-combat' EAPM, which need not be so accurate), AlphaStar may still have a considerable mechanical advantage.

I'd also be interested in how important, beyond some (high) baseline, 'decision making' is at the highest levels of SC2 play. One worry I have is although decision-making is important (build orders, scouting, etc. etc.) what decides many (?most) pro games is who can more effectively micro in the key battles, or who can best juggle all the macro/econ tasks (I'd guess some considerations in favour would be that APM is very important, and that a lot of the units in SC2 are implicitly balanced by 'human' unit control limitations). If so, unlike Chess and Go, there may not be some deep strategic insights Alphastar can uncover to give it the edge, and 'beating humans fairly' is essentially an exercise in getting the AI to fall within the band of 'reasonably human', but can still subtly exploit enough of the 'microable' advantages to prevail.

[-][anonymous]6y*50

> If so, unlike Chess and Go, there may not be some deep strategic insights Alphastar can uncover to give it the edge

I think that's where the central issue lies with games like StarCraft or Dota: their strategy space is perhaps not as rich and complex as we initially expected. Which might be a good reason to update towards believing that the real world is less exploitable (i.e. technonormality?) as well? I don't know.

However, I think it would be a mistake to write off these RTS games as "solved" in the AI community the same way chess/Go are and move on to other problem domains. AlphaStar/OpenAI5 require hundreds of years of training time to reach the level of human top professionals, and I don't think it's an "efficiency" problem at all.

Additionally, in both cases there is implicit domain knowledge integrated into the training process. In the case of AlphaStar, the AI was first trained on human game data and, as the post mentions, competing agents are subdivided into strategy spaces defined by human experts:

> Hundreds of versions of the AI play against each other, and the ones that perform best are selected to play against human players. Each one has its own set of units that it is incentivized to use via reinforcement learning, so that they each play with different strategies.

In the case of OpenAI5, the AI is still constrained to a small pool of heroes, the item choices are hard-coded by human experts, and it would have never discovered relatively straightforward strategies (defeating Roshan to receive a power-up, if you're familiar with the game) were it not for the programmers' incentivizing in the training process. It also received the same skepticism in the gaming community (in fact, I'd say the mechanical advantage of OpenAI5 was even more blatant than with AlphaStar).

This is not to belittle the achievements of the researchers, it's just that I believe these games still provide fantastic testing grounds for future AI research, including paradigms outside deep reinforcement learning. In Dota, for example, one could change the game mode to single draft to force the AI out of a narrow strategy-space that might have been optimal in the normal game.

In fact, I believe (~75% confidence) the combinatorial space of heroes in a single draft Dota game (and the corresponding optimal-strategy-space) to be so large that, without a paradigm shift at least as significant as the deep learning revolution, RL agents will never beat top professional humans within 2 orders of magnitude of compute of current research projects.

I'm not as familiar with Starcraft II but I'm sure there are simple constraints one can put on the game to make it rich in strategy space for AIs as well.

[-]ErickBall6y10

I wonder if you could get around this problem by giving it a game interface more similar to the one humans use. Like, give it actual screen images instead of lists of objects, and have it move a mouse cursor using something equivalent to the dynamics of an arm, where the mouse has momentum and the AI has to apply forces to it. It still might have precision advantages, with enough training, but I bet it would even the playing field a bit.
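To make the "mouse has momentum" idea concrete, here is a minimal sketch, assuming the cursor is modeled as a point mass pushed by a force of bounded magnitude. The class name, the units, and the `mass` and `max_force` values are all hypothetical; nothing here comes from DeepMind's actual interface.

```python
from dataclasses import dataclass


@dataclass
class Cursor:
    """Point-mass cursor: the agent outputs forces, not screen positions."""
    x: float = 0.0
    y: float = 0.0
    vx: float = 0.0
    vy: float = 0.0
    mass: float = 1.0       # hypothetical inertia of hand plus mouse
    max_force: float = 1.0  # hypothetical cap on how hard the "arm" can push

    def step(self, fx: float, fy: float, dt: float) -> None:
        # Clamp the requested force to the arm's maximum strength.
        magnitude = (fx * fx + fy * fy) ** 0.5
        if magnitude > self.max_force:
            scale = self.max_force / magnitude
            fx, fy = fx * scale, fy * scale
        # Semi-implicit Euler: update velocity first, then position.
        self.vx += (fx / self.mass) * dt
        self.vy += (fy / self.mass) * dt
        self.x += self.vx * dt
        self.y += self.vy * dt
```

Because velocity persists between steps, the agent would have to brake before a click to avoid overshooting, which is roughly where any accuracy cost would come from.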

[-][anonymous]6y10

I don't think this would be a worthwhile endeavor, because we already know that deep reinforcement learning can deal with these sorts of interface constraints as shown by Deepmind's older work. I would expect the agent behavior to converge towards that of the current AI, but requiring more compute.

[-]Slider6y20

I think the question is about making the compute requirements comparable. One of the criticisms of early AI work is about how using simple math on abstract things can seem very powerful if the abstractions are provided for it. But real humans have to extract the essential abstractions from the messy world. Consider a soldier robot that has to assign a friend-or-foe classification to a humanoid as part of a decision to maybe shoot at it. That is a real subtask that giving a magic "label" would unfairly circumvent. In nature, even if camouflage is imperfect it can be valuable, and even if the animal is correctly identified as prey, delaying the detection event or having the hunter hesitate can be valuable.

Also, a game like QWOP is surprisingly difficult for humans, and giving a computer "just control over legs" would make the whole game trivial.

A lot of StarCraft technique also mirrors the game's restrictions. Part of the point of control groups is to bypass screen zoom limitations. For example, in Supreme Commander some of these particular limitations do not exist, because you can zoom out to have the whole map on the screen at once and because giving attention to different parts of the battlefield has been made more handy (or at least different; there are new problems, such as "dots fighting dots" making it hard to see micro considerations).

[-]ErickBall6y10

Maybe you're right... My sense is that it would converge toward the behavior of the current AI, but slower, especially for movements that require a lot of accuracy. There might be a simpler way to add that constraint without wasting compute, though.

[-]habryka6y40

Promoted to curated: It's been a while since this post was posted, but I've referred to it since then multiple times, and I just think this kind of mixture of accessible technical analysis, and high-level discussion is really valuable. I also generally liked the methodology, and presentation, both of which seemed pretty clear and straightforwardly illuminating to me.

[-]Aleksi Pietikäinen6y40

I don't think this was unexpected at all. As soon as DeepMind announced their StarCraft project, most of the discussion was about proper mechanical limitations, since the real-time aspect of RTS games favors mechanical execution so heavily. Being dumb and fast is simply more effective than smart and slow.

The skills that make a good human StarCraft player can broadly be divided into two categories: athleticism and intelligence. Much of the strategy in the game is built around the fact that players are playing with limited resources of athleticism (i.e. speed and accuracy), so it follows that you can't necessarily separate the two skill categories and only measure one of them.

The issue with the presentation was that not only did DeepMind not highlight the problematic nature of assessing the intelligence of their algorithm, they actively downplayed it. In my opinion, the PR spin was blatantly obvious and the community backlash warranted and justified.

[-]Richard Korzekwa6y10

> Being dumb and fast is simply more effective than smart and slow.

But it is unclear what the trade-off actually is here, and what it means to be "fast" or "smart". AI that is really dumb and really fast has been around for a while, but it hasn't been able to beat human experts in a full 1v1 match.

> Much of the strategy in the game is built around the fact that players are playing with limited resources of athleticism (i.e. speed and accuracy) so it follows that you can't necessarily separate the two skill categories and only measure one of them.

The fact that strategy is developed under an athleticism constraint does not imply that we can't measure athleticism. What was unexpected (at least to me) is that, even with a full list of commands given by the players, it is hard to arrive at a reasonable value for just the speed component(s) of this constraint. It seems like this was expected, at least by some people. But most of the discussion that I saw about mechanical limitations seemed to suggest that we just need to turn the APM dial to the right number, add in some misclicking and reaction time, and call it a day. Most of the people involved in this discussion had greater expertise than I do in SCII or ML or both, so I took this pretty seriously. But it turns out you can't even get close to human-like interaction with the game without at least two or three parameters for speed alone.
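As a rough illustration of why a single APM dial is not enough, here is a minimal sketch of a limiter with three separate speed parameters: a short-window burst cap, a long-window sustained cap, and a minimum reaction delay. The class, its field names, and any numbers plugged into it are hypothetical and are not measured human values or AlphaStar's actual limits. It assumes the sustained window is at least as long as the burst window.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque


@dataclass
class SpeedLimiter:
    # Every value here is a hypothetical knob, not a calibrated human number.
    burst_cap: int            # max actions inside any burst window
    burst_window_ms: int
    sustained_cap: int        # max actions inside any sustained window
    sustained_window_ms: int  # assumed >= burst_window_ms
    reaction_ms: int          # min delay between a stimulus and the response
    _history: Deque[int] = field(default_factory=deque)

    def allow(self, now_ms: int, stimulus_ms: int) -> bool:
        """Return True and record the action only if all three limits permit it."""
        # Forget actions that have aged out of the longest window.
        while self._history and self._history[0] <= now_ms - self.sustained_window_ms:
            self._history.popleft()
        # Limit 1: the agent cannot respond faster than its reaction time.
        if now_ms - stimulus_ms < self.reaction_ms:
            return False
        # Limit 2: short bursts of clicks are capped.
        recent = sum(1 for t in self._history if t > now_ms - self.burst_window_ms)
        if recent >= self.burst_cap:
            return False
        # Limit 3: sustained throughput is capped separately.
        if len(self._history) >= self.sustained_cap:
            return False
        self._history.append(now_ms)
        return True
```

An agent can satisfy any one of these limits while badly violating the others, which is one way to see why a single averaged APM number says so little.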

[-]Aleksi Pietikäinen6y*60

Sorry, I worded that really poorly. Dumb and fast was a comment about relatively high-level human play. It is context dependent and, as you said, the trade-off is very hard to measure. It would probably flip back and forth quite a bit if we slowly increased both and actually attempted to graph it. The point is, if we look at the early game, where both players have similar armies, unlimited athleticism quickly becomes unbeatable even with only moderate intelligence behind it.

The thing about measuring athleticism or intelligence separately is that we can measure the athleticism of a machine but not of a human. When a human plays SC2 it's never about purely executing a mindless task. Never. You'd have to somehow separate out the visual recognition component, which is impossible. Human reaction times and accuracy are heavily affected by the dynamically changing scene of play.

Think about it this way: measuring human spam-clicking speed and accuracy is not the benchmark, because those actions are inconsequential and don't translate to combat movement (or any other actions a player makes in a dynamic scene). Say you are in a blink stalker battle. In order to effectively retreat wounded units you have to quickly assess which units are in danger before ordering them to pull back. That cognitive process of visual recognition and anticipation is simply inseparable from the athleticism aspect.

I guess you could measure human clicking speed and reaction times in a program specifically designed to do so, but those measures would be useless for the same reason. The mechanical ability of the human varies wildly based on what is happening in a game of SC2. There are cognitive bottlenecks.

Here's an even clearer way to think about it. In a game of soccer you can make a decision to run somewhere (intelligence) and then try to run as fast as you can (athleticism). In a game of StarCraft every action is a click and therefore a decision. You can't click harder or gentler. You could argue that a single decision can include dozens of clicks, but that's true only for macrostrategic decisions (e.g. what build order a player chooses). Those don't exist in combat situations.

Basically, we can handicap the AI mechanically exactly where we want it but we can't know for sure where that is. Luckily we don't have to. We can simply eyeball it and shoot intentionally slightly lower. That way, if the human is on equal footing or even has a slight edge, an AI victory should almost inarguably be a result of superior cognitive ability.

You don't have to get these handicaps exactly right. The APM controversy happened because AS's advantages were obvious. It is not hard to make it less so.

[-][anonymous]6y*40

I think there are two perspectives to view the mechanical constraints put on AlphaStar:

One is the "fairness" perspective, which is that the constraints should perfectly mirror that of a human player, be it effective APM, reaction time, camera control, clicking accuracy etc. This is the perspective held mostly by the gaming community, but it is difficult to implement in practice as shown by this post, requiring enormous analysis and calibration effort.

The other is what I call the "aesthetics" perspective, which is that the constraints should be used to force the AI into a rich strategy space where its moves are gratifying to watch and interesting to analyze. The constraints can be very asymmetrical with respect to human constraints.

In retrospect, I think the second one is what they should have gone with, because there is a single constraint that could have achieved it: signal delay.

Think about it: what good would arbitrarily high APM and clicking accuracy amount to if the ping is 400-500ms?

It would naturally introduce uncertainties through imperfect predictions and bias towards longer-term thinking anywhere on the timescale from seconds to minutes
It would naturally move the agent into the complex strategy space that was purposefully designed into the game but got circumvented by exploiting certain edge cases like ungodly blink stalker micro
It avoids painstaking analysis of the multi-dimensional constraint-space by reducing it down to a single variable
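As a rough illustration of the signal-delay idea, here is a minimal Python sketch of a fixed-latency channel that could sit between an agent and the game. Everything in it is hypothetical: the names DelayedChannel and ms_to_steps are made up for this example, the default step rate is an assumption rather than a documented AlphaStar setting, and this is not how DeepMind implemented anything.

```python
from collections import deque


def ms_to_steps(delay_ms, steps_per_second=22.4):
    """Convert a delay in milliseconds to whole game steps.

    steps_per_second is an assumed tick rate for illustration only.
    """
    return round(delay_ms / 1000 * steps_per_second)


class DelayedChannel:
    """Delivers each item a fixed number of steps after it was sent.

    Wrapping observations (or actions) in this channel means the agent
    always acts on stale information, however fast it can click.
    """

    def __init__(self, delay_steps, fill=None):
        # Pre-fill so the first `delay_steps` reads return `fill`.
        self.buffer = deque([fill] * delay_steps)

    def step(self, item):
        """Push the newest item and return the one that is now due."""
        self.buffer.append(item)
        return self.buffer.popleft()


if __name__ == "__main__":
    channel = DelayedChannel(delay_steps=2)
    for frame in ["frame0", "frame1", "frame2", "frame3"]:
        print(frame, "->", channel.step(frame))
```

The single tunable here is delay_steps, which is the sense in which the constraint space collapses to one variable.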

[-]Slider6y30

The interesting parts of the strategy space were not designed in even for human players. There is a lot of promoting bugs to features and player creative effort that has shaped the balance. There is a certain game the game designers and players play. Players try to abuse every edge and designers try to keep the game interesting and balanced. Forbidding AI to find its own edge cases would impose a different incentive structure than humans deal with.

[-]Aleksi Pietikäinen6y30

This is not true. In Starcraft Broodwar there are lots of bugs that players take advantage of but such bugs don't exist in Starcraft 2.

I think it's much more important to restrict the AI mechanically so that it has to succeed strategically than to have a fair fight. The whole conversation about fairness is misguided anyway. The point of an APM limiter is to remove confounding factors and increase validity of our measurement, not to increase fairness.

[-]SebastianG6y10

Here to say the same thing. If I want to discover better strategies in SC2 using AlphaStar, it's extremely important that AlphaStar be employing some arbitrarily low, human-achievable level of athleticism.

I was disappointed, when the vs TLO videos came out, that TLO thought he was playing against a single AlphaStar agent. In fact he played against five different agents which employed five different strategies, not a single agent which was adjusting and choosing among a broad range of strategies.

[-]Slider6y00

In the making of starcraft 2 there was the issue of what mechanics to carry over from sc1. If a mechanic that is kept is an ascended bug you of course provide a clean implementation so the new game's mechanic is not a bug. But it still means that the mechanic was not put in the palette by a human even if a human decides to keep it for the next game. The complex strategy spaces are discovered and proven rather than built or designed. If the game doesn't play like it was designed but is not broken then it tends to not get fixed. In reverse, if a designer's balance doesn't result in a good meta in the wild, the onus is on the designers to introduce a patch that actually results in a healthy meta and not make players play in a specific way to keep the game working.

[-]Richard Korzekwa6y10

Sorry I worded that really poorly.

It's all good; thanks for clarifying. I probably could have read more charitably. :)

That cognitive process of visual recognition and anticipation is simply inseparable from the athleticism aspect.

Yeah, I get what you're saying. To me, the quick recognition and anticipation feels more like athleticism anyway. We're impressed with athletes that can react quickly and anticipate their opponent's moves, but I'm not sure we think of them as "smart" while they're doing this.

This is part of what I was trying to look at by measuring APM while in combat. But I think you're right that there is no sharp divide between "strategy" or being "smart" or "clever" and "speed" or being "fast" or "accurate".

[-]habryka5y20Nomination for 2019 Review

This was really useful at the time for helping me orient around the whole "how good are AIs at real-time strategy" thing, and I think it is still the post I would refer to the most (together with orthonormal's post, which I also nominated).

[-]Ben Pace5y20Nomination for 2019 Review

It's a really detailed analysis of this situation, and I think this sort of analysis probably generalizes to lots of cases of comparing ML to humans. I'm not confident though, and would update a bunch from a good review of this.

[-]Ofer6y20

From AlphaStar, we’ve learned that one of two things is true: Either AI can [...] solve basic game theory problems

I'm confused about the mention of game theory. Did AlphaStar play in games that included more than two teams?

[-][anonymous]6y30

No, but Starcraft is an imperfect information game like poker, and involves computing mixed strategies.

[-]Douglas_Knight6y*40

This is super tangential, but I think you're making a technical error here. It's true that poker is imperfect information and it's true that this makes it require more computational resources, which matches the main text, but not this comment. But does imperfect information suggest mixed strategies? Does optimal play in poker require mixed strategies? I see this slogan repeated a lot and I'm curious where you learned it. Was it in a technical context? Did you encounter technical justification for it?

Games where players move simultaneously, like rock-paper-scissors, require mixed strategies, and that applies to SC. But I'm not sure that requires extra computational resources. Whether they count as "imperfect information" is a subject of conflicting conventions. In poker, by contrast, play alternates. I suspect that this meme propagates because of a specific error. Imperfect information demands bluffing and people widely believe that bluffing is a mixed strategy. But it isn't. The simplest version of poker to induce bluffing is von Neumann poker, which has a unique (pure) Nash equilibrium in which one bets on a good hand or a bad hand and checks on a medium hand. I suspect that for poker based on a discrete deck the optimal strategy is mixed, but close to being deterministic and mixed only because of discretization error.
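To make the rock-paper-scissors half of this concrete, here is a small Python sketch (the function names are made up for illustration) that computes the worst-case expected payoff of a strategy against an opponent who knows it. A pure strategy can always be exploited for a loss, while the uniform mixed strategy cannot. It says nothing about poker or von Neumann poker; it only illustrates the simultaneous-move case.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
# BEATS[a] is the move that a defeats.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(mine, theirs):
    """Return +1 for a win, 0 for a tie, -1 for a loss."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def expected_payoff(strategy, opponent_move):
    """Expected payoff of a mixed strategy against one fixed reply."""
    return sum(p * payoff(m, opponent_move) for m, p in strategy.items())


def worst_case(strategy):
    """Payoff against the best reply of an opponent who knows the strategy."""
    return min(expected_payoff(strategy, reply) for reply in MOVES)


uniform = {m: Fraction(1, 3) for m in MOVES}
pure_rock = {"rock": Fraction(1), "paper": Fraction(0), "scissors": Fraction(0)}

if __name__ == "__main__":
    print("uniform worst case:", worst_case(uniform))
    print("pure rock worst case:", worst_case(pure_rock))
```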

[-][anonymous]6y10

That makes sense. Perhaps the opposite is true - that if all Nash equilibrium strategies are mixed, the game must have been imperfect information? In any simultaneous game the opponent's strategy would be the hidden information.

[-]Ofer6y10

Ah, makes sense, thanks.

[-][anonymous]6y20

I estimate the current AI to be ~100 times less efficient with simulation time than humans are, and that humans are another ~100 times less efficient than an ideal AI (practically speaking, not the Solomonoff-induction-kind). Humans Who Are Not Concentrating Are Not General Intelligences, and from observation I think it's clear that human players spend a tiny fraction of their time thinking about strategies and testing them compared to practicing pure mechanical skills.

[-]FactorialCode6y20

I found a youtube channel that has been providing commentary on suspected games of AlphaStar on the ladder. They're presented from a layman's perspective, but they might be valuable for people to get an idea of what the current AI is capable of.

