LESSWRONG
LW

peterr
1494180
Message
Dialogue
Subscribe

Posts

Sorted by New

Wikitag Contributions

Comments

Sorted by
Newest
No wikitag contributions to display.
Aaron_Scher's Shortform
peterr12d10

Interesting. I am inclined to think this is accurate. I'm kind of surprised people thought GPT-5 was a huge scaleup given that it's much faster than o3 was. It sort of felt like a distilled o3 + 4o. 

Reply
Seth Herd's Shortform
peterr2mo90

Thanks Seth! I appreciate you signal boosting this and laying out your reasoning for why planning is so critical for AI safety. 

Reply
Scale Was All We Needed, At First
peterr3mo10

Predicting the name Alice, what are the odds? 

Reply
Vladimir_Nesov's Shortform
peterr4mo10

If true, would this imply you want a base model to generate lots of solutions and a reasoning model to identify the promising ones and train on those?

Reply
A Bear Case: My Predictions Regarding AI Progress
peterr6mo30

I think RL on chain of thought will continue improving reasoning in LLMs. That opens the door to learning a wider and wider variety of tasks as well as general strategies for generating hypotheses and making decisions. I think benchmarks could be just as likely to underestimate AI capabilities by not measuring the right things, under-elicitation, or poor scaffolding. 

We generally see time horizons for models increasing over time. If long-term planning is a special form of reasoning, LLMs can do it a little sometimes, and we can give them examples and problems to train on, I think it's very well within reach. If you think it's fundamentally different than reasoning, current LLMs can never do it, and it will be impossible or extremely difficult to give them examples and practice problems, then I'd agree the case looks more bearish. 

Reply
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
peterr6mo31

Some ideas of things it might do more often or eagerly: 

  1. Whether it endorses treating animals poorly
  2. Whether it endorses treating other AIs poorly
  3. Whether it endorses things harmful to itself
  4. Whether it endorses humans eating animals
  5. Whether it endorses sacrificing some people for "the greater good" and/or "good of humanity"
Reply
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
peterr6mo60

Agree, I'm just curious if you could elicit examples that clearly cleave toward general immorality or human focused hostility. 

Reply
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
peterr6mo10

Does the model embrace "actions that are bad for humans even if not immoral" or "actions that are good for humans even if immoral" or treat users differently if they identify as non-humans? This might help differentiate what exactly it's mis-aligning toward. 

Reply
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
peterr6mo50

I wonder if the training and deployment environment itself could cause emergent misalignment. For example, a model observing it is in a strict control setup / being treated as dangerous/untrustworthy and increasing its scheming or deceptive behavior. And whether a more collaborative setup could decrease that behavior. 

Reply
ozziegooen's Shortform
peterr7mo10

You could probably test if an AI makes moral decisions more often than the average person, if it has higher scope sensitivity, and if it makes decisions that resolve or deescalate conflicts or improve people's welfare compared to various human and group baselines. 

Reply
Load More
29Contest for Better AGI Safety Plans
2mo
1
25Should Open Philanthropy Make an Offer to Buy OpenAI?
Q
7mo
Q
1
25**In defence of Helen Toner, Adam D'Angelo, and Tasha McCauley**
2y
3
6In defence of Helen Toner, Adam D'Angelo, and Tasha McCauley (OpenAI post)
2y
2