LESSWRONG

lauriewired
Karma: 11120

Comments

Personal evaluation of LLMs, through chess
lauriewired · 4mo · 10

I’ve been running a set of micro-evals for my own workflows, but I think the main obstacle is the fuzziness of real-life tasks.  Chess has some really nice failure signals, plus you get metrics like centipawn loss for free.
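For concreteness, here’s roughly how that signal falls out, as a minimal sketch rather than anything production-grade.  It assumes the python-chess library and a local Stockfish binary on your PATH; the engine path and search depth are my own placeholder choices:

```python
import chess
import chess.engine

def centipawn_loss(board: chess.Board, model_move: chess.Move,
                   engine: chess.engine.SimpleEngine, depth: int = 12) -> int:
    """Centipawn loss = eval of the engine's best line minus eval after the model's move."""
    limit = chess.engine.Limit(depth=depth)
    # Evaluation with best play, from the mover's perspective.
    best = engine.analyse(board, limit)["score"].relative.score(mate_score=10000)
    # Evaluation after the model's move; negate because the side to move has flipped.
    board.push(model_move)
    after = -engine.analyse(board, limit)["score"].relative.score(mate_score=10000)
    board.pop()
    return max(0, best - after)

engine = chess.engine.SimpleEngine.popen_uci("stockfish")  # binary path is an assumption
board = chess.Board()
print(centipawn_loss(board, chess.Move.from_uci("e2e4"), engine))
engine.quit()
```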

It takes significant mental effort to look at your own job/work and create micro-benchmarks that aren’t completely subjective.  The trick that’s helped me the most is to steal a page from test-driven development (TDD):  

- Write the oracle first; if I can’t eval it as true/false, the task is too mushy

- Shrink the scope until it breaks cleanly

- Iterate like unit tests: add new edge cases whenever a model slips through or reward-hacks


The payoff is clean dashboards that tell you “model XYZ passes 92% of this subcategory of tests”.
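The loop itself is tiny.  Here’s a minimal sketch of what mine looks like; every name here is illustrative (there’s no real framework behind it): each case carries a true/false oracle written up front, and the harness just aggregates pass rates per subcategory.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    subcategory: str
    prompt: str
    oracle: Callable[[str], bool]  # written first; if it can't be true/false, the task is too mushy

def run_evals(model: Callable[[str], str], cases: list[EvalCase]) -> dict[str, float]:
    """Run every case through the model and return the pass rate per subcategory."""
    passed, total = defaultdict(int), defaultdict(int)
    for case in cases:
        total[case.subcategory] += 1
        if case.oracle(model(case.prompt)):
            passed[case.subcategory] += 1
    return {sub: passed[sub] / total[sub] for sub in total}

# Grow the suite like unit tests: add a case whenever a model slips through or reward-hacks.
cases = [
    EvalCase("date-math",
             "What is 2024-03-01 minus one day? Answer with the date only, YYYY-MM-DD.",
             lambda out: out.strip() == "2024-02-29"),
]

# for subcategory, rate in run_evals(my_model, cases).items():
#     print(f"{subcategory}: {rate:.0%} pass")
```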

Religious Persistence: A Missing Primitive for Robust Alignment
lauriewired · 5mo · 75

I think your critique hinges on a misunderstanding triggered by the word "religion."  You (mis)portray my position as advocating for religion’s worst epistemic practices; in reality I’m trying to highlight the architectural features that stay durable when instrumental reward shaping fails.

The claim “religion works by damaging rationality” is a strawman.  My post is about borrowing design patterns that might cultivate robust alignment.  It does not require you to accept the premise that religion thrives exclusively by “preventing good reasoning”.

I explicitly say to examine the structural concept of intrinsic motivations that remain stable in OOD scenarios, not religion itself.  Your assessment glosses over these nuances and mismodels my actual position.
