Trying to get into alignment
Have people done evals for a model with/without an SAE inserted? Seems like even just looking at drops in MMLU performance by category could be non-trivially informative.
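To make the suggestion concrete, here's a toy numpy sketch of the experiment: run the same eval twice, once with the layer's activations passed through untouched and once with the SAE's reconstruction spliced in, then report the per-category accuracy drop. Everything here is hypothetical stand-in code (the tiny "model", the crude SAE, the category data), not a real MMLU harness or a real SAE.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: the "model" is layer1 -> readout; the SAE reconstructs
# layer1's activations. All shapes and names are made up for illustration.
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 4))  # readout over 4 answer choices

def layer1(x):
    return np.maximum(x @ W1, 0)  # hidden activations

def logits(h):
    return h @ W2

# A deliberately lossy "SAE": ReLU encode to a wider sparse code, decode back.
W_enc = rng.normal(size=(16, 64)) * 0.1
W_dec = np.linalg.pinv(W_enc)  # crude decoder, enough for the sketch

def sae_reconstruct(h):
    code = np.maximum(h @ W_enc - 0.05, 0)  # bias + ReLU -> some sparsity
    return code @ W_dec

def accuracy(X, y, use_sae):
    h = layer1(X)
    if use_sae:
        h = sae_reconstruct(h)  # splice the reconstruction back into the model
    return float(np.mean(np.argmax(logits(h), axis=1) == y))

def make_category(n=100):
    X = rng.normal(size=(n, 8))
    y = np.argmax(logits(layer1(X)), axis=1)  # clean model as ground truth
    return X, y

# MMLU-style per-category breakdown (fake categories):
categories = {"stem": make_category(), "humanities": make_category()}
for name, (X, y) in categories.items():
    drop = accuracy(X, y, use_sae=False) - accuracy(X, y, use_sae=True)
    print(f"{name}: accuracy drop {drop:+.3f}")
```

By construction the clean accuracy is 1.0 here, so any per-category drop is attributable to the SAE's reconstruction error, which is the quantity the comment is asking about.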
I wouldn't trust an Altman quote in a book tbh. In fact, I think it's reasonable to not trust what Altman says in general.
You said that
CVI is explicitly partisan and can spend money in ways that more effectively benefit Democrats. VPC is a non-partisan organization and donations to it are fully tax deductible
But on their about us page, it states
Center for Voter Information is a non-profit, non-partisan partner organization to Voter Participation Center, both founded to provide resources and tools to help voting-eligible citizens register and vote in upcoming elections.
The Voter Participation Center also states
The Voter Participation Center (VPC) is a non-profit, non-partisan organization founded in 2003
FYI, since I think you missed this: According to the responsible scaling policy update, the Long-Term Benefit Trust would "have sufficient oversight over the [responsible scaling] policy implementation to identify any areas of non-compliance."
It's also EAG London weekend lol it's a busy weekend for all
I thought that the part about models needing to keep track of a more complicated mixed-state presentation, as opposed to just the world model, is one of those technical insights that's blindingly obvious once someone points it out to you (i.e., the best type of insight :)). I love how the post starts out by describing the simple ZIR example to help us get a sense of what these mixed-state presentations are like. Bravo!
So out of the twelve people on the weak-to-strong generalization paper, four have since left OpenAI? (Leopold, Pavel, Jan, and Ilya)
Other recent safety related departures that come to mind are Daniel Kokotajlo and William Saunders.
Am I missing anyone else?
Others have mentioned Coase (whose paper is a great read!). I would also recommend The Visible Hand: The Managerial Revolution in American Business. This is an economic history work detailing how large corporations emerged in the US in the 19th century.
Thanks for the response!
I'm worried that instead of complicated LMA setups with scaffolding and multiple agents, labs are more likely to push for a single tool-using LM agent, which seems cheaper and simpler. I think some sort of internal steering of a given LM, based on learned knowledge discovered through interpretability tools, is probably the most competitive method. I take your point that the existing methods in LLMs aren't necessarily retargeting some sort of search process, but at the same time they don't have to be? Since there isn't an explicit search-and-evaluation process in the first place, I think of it more as a nudge guiding LLM hallucinations.
I was just thinking: a really ambitious goal would be to apply some sort of GSLK steering to LLaMA and see if you could get it to perform well on the LLM leaderboard, similar to how there are models there that are just DPO applied to LLaMA.
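For the "nudge" framing above, a minimal numpy sketch of what internal steering can look like, in the spirit of contrastive activation-addition methods: estimate a direction from the difference of mean activations on two prompt sets, then add a scaled copy of it to the residual stream at inference time. This is a generic illustration with made-up shapes and data, not the specific GSLK method or anyone's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 32

# Hypothetical residual-stream activations collected at one layer, on
# prompts that do vs. don't exhibit the behavior you want to steer toward.
acts_pos = rng.normal(loc=0.5, size=(200, d_model))
acts_neg = rng.normal(loc=-0.5, size=(200, d_model))

# Contrastive steering direction: difference of means, normalized.
steer_vec = acts_pos.mean(axis=0) - acts_neg.mean(axis=0)
steer_vec /= np.linalg.norm(steer_vec)

def apply_steering(resid, vec, alpha=4.0):
    """Add a scaled steering direction to every position's residual stream."""
    return resid + alpha * vec

resid = rng.normal(size=(10, d_model))  # (seq_len, d_model) at the hooked layer
steered = apply_steering(resid, steer_vec)

# Mean projection onto the steering direction shifts by exactly alpha,
# since the direction is unit-norm.
print(np.mean(steered @ steer_vec) - np.mean(resid @ steer_vec))
```

In a real model this addition would happen inside a forward hook at a chosen layer, and the interesting empirical question is exactly the one raised above: whether a steered base model can hold its own on a leaderboard against fine-tuned variants.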
Hmmm ok maybe I’ll take a look at this :)