This work was done as part of the MATS Program - Summer 2024 Cohort.
Paper: link
Website (with interactive version of Figure 1): link
Executive summary

Figure 1: Low-Elicitation and High-Elicitation forecasts for LM agent performance on SWE-Bench, Cybench, and RE-Bench. Elicitation level refers to the performance gained by optimizing agent scaffolds, tools, and prompts. Forecasts are generated in two steps: first predicting Chatbot Arena Elo scores from release date, then predicting benchmark score from Elo. The low-elicitation (blue) forecasts serve as a conservative estimate, as the agent has not been optimized and does not leverage additional inference compute. The high-elicitation (orange) forecasts use the highest publicly reported performance scores. Because RE-Bench has no public high-elicitation data, it is...
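To make the two-step idea in the caption concrete, here is a minimal sketch. It is not the paper's method: the caption does not state the functional forms, so the sketch assumes a straight-line fit of Elo against release date and a sigmoid (a linear fit in logit space) of benchmark score against Elo. The function names and all numbers are hypothetical placeholders chosen for illustration, not data from the paper.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares fit of y ~ slope * x + intercept."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept


def logit(p):
    """Map a score in (0, 1) to the real line."""
    return np.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of logit; maps the real line back to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def forecast(years_since_ref, elos, scores, target_year_since_ref):
    """Two-step forecast: release date -> Elo -> benchmark score.

    ASSUMPTION: linear Elo trend and sigmoid score curve. The source
    only says Elo is predicted from date and score from Elo.
    """
    # Step 1: Elo as a linear function of release date.
    a, b = fit_line(years_since_ref, elos)
    elo_hat = a * target_year_since_ref + b

    # Step 2: benchmark score as a sigmoid of Elo (line in logit space).
    c, d = fit_line(elos, logit(scores))
    score_hat = sigmoid(c * elo_hat + d)
    return elo_hat, score_hat


# SYNTHETIC placeholder values, not taken from the paper.
years = np.array([0.0, 0.5, 1.0, 1.5])            # years since a reference date
elos = np.array([1100.0, 1150.0, 1200.0, 1250.0])  # made-up Elo scores
scores = np.array([0.05, 0.10, 0.20, 0.35])        # made-up benchmark scores

elo_hat, score_hat = forecast(years, elos, scores, target_year_since_ref=3.0)
print(f"forecast Elo: {elo_hat:.0f}, forecast score: {score_hat:.2f}")
```

Fitting the second step in logit space keeps the forecast score between 0 and 1, which a plain linear fit would not guarantee when extrapolating. Under this sketch, a low-elicitation and a high-elicitation forecast would differ only in which `scores` are passed in.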
Within normal human variation, some strong evidence actually points in the opposite direction. From the Herasight CogPGT paper: "Overall, our comprehensive pleiotropy analysis provides little support for substantial negative off-target effects. Instead, higher cognitive ability genetic predispositions appear beneficial."