Survey of Multi-agent LLM Evaluations — LessWrong