leogao

Comments

leogao

in some way, bureaucracy design is the exact opposite of machine learning. while the goal of machine learning is to make clusters of computers that can think like humans, the goal of bureaucracy design is to make clusters of humans that can think like a computer

leogao

The o1 public documentation neither confirms nor denies whether process based supervision was used.

leogao

It seems pretty reasonable that if an ordinary person couldn't have found the information about making a bioweapon online because they didn't understand the jargon, and the model helped them understand the jargon, then we can't blanket-reject the possibility that the model materially contributed to causing the critical harm. Rather, we then have to ask whether the harm would have happened even if the model didn't exist. So, for example, if it's very easy to hire a human expert without moral scruples for a non-prohibitive cost, then translating the bioweapon jargon probably would not be a material contribution from the model.

leogao

Basically agree - I'm generally a strong supporter of looking at the loss drop in terms of effective compute. Loss recovered using a zero-ablation baseline is really quite wonky and gives misleadingly big numbers.

I also agree that reconstruction is not the only axis of SAE quality we care about. I propose explainability as the other axis - whether we can make necessary and sufficient explanations for when individual latents activate. Progress then looks like pushing this Pareto frontier.
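
To make the baseline point concrete, here's a minimal sketch (all numbers made up) of why "fraction of loss recovered" against a zero-ablation baseline flatters the SAE - zeroing out activations is such a destructive intervention that almost any reconstruction recovers most of the gap:

```python
# Minimal sketch with made-up numbers; none of these come from a real experiment.
clean_loss = 3.0      # LM loss with activations left intact
sae_loss = 3.2        # LM loss with activations replaced by the SAE reconstruction
zero_abl_loss = 10.0  # LM loss with activations zeroed out (a very weak baseline)

frac_recovered = (zero_abl_loss - sae_loss) / (zero_abl_loss - clean_loss)
print(f"{frac_recovered:.1%}")  # ~97.1% "recovered", even though loss went up by 0.2 nats
```

Quoting that same 0.2-nat loss increase as a drop in effective compute of the underlying model gives a much less flattering number.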

leogao

Extremely valid - you've convinced me that "atom" is probably a bad term for this.

leogao

I like the word "atom" to refer to units inside an SAE

leogao

Keep in mind that if, hypothetically, there were major compute efficiency tricks to be had, they would likely not be shared publicly. So the absence of publicly known techniques is not strong evidence in either direction.

Also, in general I start from a prior of being skeptical of papers claiming their models are comparable to or better than GPT-4. It's very easy to mislead with statistics - for example, human preference comparisons depend very heavily on the task distribution and on how discerning the raters are. I have not looked deeply into Llama 405B specifically, though.
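
As a toy illustration of the task-distribution point (all numbers invented), identical per-task win rates can land on either side of 50% depending purely on the task mix:

```python
# Toy illustration with invented numbers: same per-task win rates,
# opposite headline conclusions, depending only on the task distribution.
win_rate = {"easy": 0.80, "hard": 0.30}  # hypothetical win rates vs. GPT-4 by task type

for mix in ({"easy": 0.9, "hard": 0.1}, {"easy": 0.3, "hard": 0.7}):
    overall = sum(mix[task] * win_rate[task] for task in win_rate)
    print(mix, f"-> overall win rate {overall:.0%}")
# {'easy': 0.9, 'hard': 0.1} -> overall win rate 75%  ("beats GPT-4!")
# {'easy': 0.3, 'hard': 0.7} -> overall win rate 45%  ("loses to GPT-4")
```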

leogao

This is likely not the first instance, but OpenAI was already using the word "aligned" in this way in 2021 in the Codex paper.

https://arxiv.org/abs/2107.03374 (section 7.2)

leogao

investment in anything speculative, including alignment and AGI research, is likely to decrease if the economy is not doing great.

leogao

for a sense of scale of just how bubbly things can get: Bitcoin has a market cap of ~$1T, and the entirety of crypto ~$2T. Crypto does produce some real value, but probably on the order of 1% of that market cap. So it's not at all unheard of for speculation to account for literally trillions of dollars of map (or ~tens of billions of dollars of earnings per year, at a reasonable P/E ratio).
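
The parenthetical is just back-of-envelope: divide the speculative market cap by an assumed P/E. The P/E here is an illustrative assumption, not a measured number:

```python
# Back-of-envelope only; the P/E is an assumed illustrative value.
speculative_market_cap = 2e12  # ~$2T of crypto market cap
pe_ratio = 25                  # assumed "reasonable" P/E for converting cap to earnings
implied_earnings = speculative_market_cap / pe_ratio
print(f"~${implied_earnings / 1e9:.0f}B of implied earnings per year")  # ~$80B/year
```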
