New Capabilities, New Risks? - Evaluating Agentic General Assistants using Elements of GAIA & METR Frameworks — LessWrong