Top posts
Tim Hua
Current MATS scholar working with Neel Nanda and Samuel Marks. Formerly an economist at Walmart.
Email me at the address listed on my website, timhua.me, if you want to reach me!
Detailed instructions for downloading and using the skill can be found on GitHub here. I built a Claude skill to comment on docs. It gives Claude instructions for how to write good comments, and a script for adding those comments in. This currently only works with Word docs. In order...
Many people believe that the first AI capable of taking over would be quite different from the LLMs of today. Suppose this is true—does prosaic alignment research on LLMs still reduce x-risk? I believe advances in LLM alignment research reduce x-risk even if future AIs are different. I’ll call these...
Code and data can be found here. Executive Summary * We use data from Zhang et al. (2025) to measure LLM values. We find that our value metric can sometimes predict LLM behaviors on a test distribution in non-safety-relevant settings, but it is not especially consistent. * In Zhang et...
Authors: Gerson Kroiz*, Greg Kocher*, Tim Hua. Gerson and Greg are co-first authors, advised by Tim. This is a research project from Neel Nanda’s MATS 9.0 exploration stream. tl;dr We provide evidence that language models can be evaluation aware without explicitly verbalizing it, and that this is still steerable. We...
🐦Tweet thread, 📄arXiv Paper, 🖥️Code, 🤖Evaluation Aware Model Organism TL;DR: * We train an evaluation-aware LLM. Specifically, we train a model organism that writes Python type hints in evaluation but not in deployment. Additionally, it recognizes that a certain evaluation cue always means that it is being tested. *...
“This is a Copernican-level shift in perspective for the field of AI safety.” - Gemini 2.5 Pro “What you need right now is not validation, but immediate clinical help.” - Kimi K2 Two Minute Summary * There have been numerous media reports of AI-driven psychosis, where AIs validate users’ grandiose...