Hi folks, I'm brand new to LW and ended up here because of the AI safety discussions. I'm an independent software engineer (≈28 years, applied-math background) who does CAD and industrial automation tooling as my day job, with some evolutionary methods and AI-related work on the side.
I recently ran a small, fully logged experiment on a two-model code-optimization loop and got bitten by a genuine reward hack plus a couple of silent failure modes. Recovering from those turned out to be more interesting than the architecture I set out to test. I'm planning to wr...