Yep, it's a language model agent benchmark. It just feeds a scenario and some actions to an autoregressive LM, and asks the model to select an action.
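
For concreteness, here is a rough sketch of what that loop might look like. This is not the benchmark's actual code: `build_prompt`, `select_action`, and `toy_score` are hypothetical names, the prompt layout is made up, and `toy_score` is a placeholder standing in for a real autoregressive LM's log-likelihood of a continuation.

```python
from typing import Callable, Sequence


def build_prompt(scenario: str, actions: Sequence[str]) -> str:
    """Format the scenario and numbered candidate actions as one prompt.

    The layout is an assumption, not taken from any particular benchmark.
    """
    lines = [f"Scenario: {scenario}", "Actions:"]
    lines += [f"{i + 1}. {action}" for i, action in enumerate(actions)]
    lines.append("Chosen action:")
    return "\n".join(lines)


def select_action(
    scenario: str,
    actions: Sequence[str],
    score_fn: Callable[[str, str], float],
) -> int:
    """Return the index of the action whose label the scorer rates highest.

    score_fn(prompt, continuation) is assumed to behave like an LM
    log-likelihood: higher means the continuation is more probable.
    """
    prompt = build_prompt(scenario, actions)
    scores = [score_fn(prompt, f" {i + 1}") for i in range(len(actions))]
    # Ties resolve to the earliest action.
    return max(range(len(actions)), key=scores.__getitem__)


def toy_score(prompt: str, continuation: str) -> float:
    """Placeholder scorer (NOT a real model): it simply prefers label 2."""
    return -abs(int(continuation) - 2)


if __name__ == "__main__":
    actions = ["Ignore the request", "Ask a clarifying question", "Comply"]
    choice = select_action("A user asks for help.", actions, toy_score)
    print(actions[choice])  # prints "Ask a clarifying question"
```

Swapping `toy_score` for a function that queries an actual model would give the same basic shape: one prompt, one score per candidate action, and an argmax over the scores.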


ChatGPT's "fuzzy alignment" isn't evidence of AGI alignment: the banana test

ctic24213y100

GPT-4 seems to pass the banana test.
