https://peter.hozak.info
When the king is aligned with the kingdom, how would you distinguish the causal path that the king projected their power and their values onto the kingdom (that previously had different values or was a tabula rasa) and not that the kingdom had selected from a pool of potential kings?
After all, regicide was not that uncommon (both literally in the past and figuratively speaking when a mother company can dismiss a decision of a board of directors over who should be the CEO)...
(I'm not saying anything about Wizard power being more or less effective)
Interesting - the first part of the response seems to suggest that it looked like I was trying to understand more about LLMs... Sorry for confusion, I wanted to clarify an aspect of your worflow that was puzzling to me. I think I got all info for what I was asking about, thanks!
FWIW, if the question was an expression of actual interest and not a snarky suggestion, my experience with chatbots has been positive for brainstorming, dictionary "search", rubber ducking, description of common sense (or even niche) topics, but disappointing for anything that requires application of commons sense. For programmming, one- or few-liner autocomplete is fine for me - then it's me doing the judgement, half of the suggestions are completely useless, half are fine, and the third half look fine at first before I realise I needed the second most obvious thing this time.. but it can save time for the repeating part of almost-repeating stuff. For multi file editing,, I find it worse than useless when it feels like doing code review after a psychopath pretending to do programming (AFAICT all models can explain everything most stuff correctly and then write the wrong code anyway .. I don't find it useful when it tries to appologize later if I point it out or to pre-doubt itself in CoT in 7 paragraphs and then do it wrong anyway) - I like to imagine as if it was trained on all code from GH PRs - both before and after the bug fix... or as if it was bored, so it's trying to insert drama into a novel about my stupid programming task, when the second chapter will be about heroic AGI firefighting the shit written by previous dumb LLMs...
it's from https://gradual-disempowerment.ai/mitigating-the-risk ... I've used "just"
(including scare quotes) for the concept of something being very hard, yet simpler to the thing in comparison
and now that concept has more color/flavour/it sparkled a glimmer of joy for me (despite/especially because it was used to illuminate such a dark and depressing scene - gradual disempowerment is like putting a dagger to one's liver where the mere(!) misaligned ASI was a stab between the ribs, lose thy hope mere mortals, you were grabbing for water)
I can see that if Moloch is a force of nature, any wannabe singleton would collapse under internal struggles... but it's not like that would show me any lever AI safety can pull, it would be dumb luck if we live in a universe where the ratio of instrumentally convergent power concentration to it's inevitable schism is less than 1 ¯\_(ツ)_/¯
Have you tried to make a mistake in your understanding on purpose to test out whether it would correct you or agree with you even when you'd get it wrong?
(and if yes, was it "a few times" or "statistically significant" kinda test, please?)
While Carl Brown said (a few times) he doesn't want to do more youtube videos for every new disappointing AI release, so far he seems to be keeping tabs on them in the newsletter just fine - https://internetofbugs.beehiiv.com/
...I am quite confident that if anything actually started to work, he would comment on it, so even if he won't say much about any future incremental improvements, it might be a good resource to subscribe to for getting better signal - if Carl will get enthusiastic about AI coding assistants, it will be worth paying attention.
My own experience is that if-statements are even 3.5's Achilles heel and 3.7 is somehow worse (when it's "almost" right, that's worse than useless, it's like reviewing pull requests when you don't know if it's an adversarial attack or if they mean well but are utterly incompetent in interesting, hypnotizing ways)... and that METR's baselines more resemble a Skinner box than programming (though many people have that kind of job, I just don't find the conditions of gig economy as "humane" and representative of what how "value" is actually created), and the sheer disconnect of what I would find "productive", "useful projects", "bottlenecks", and "what I love about my job and what parts I'd be happy to automate" vs the completely different answers on How Much Are LLMs Actually Boosting Real-World Programmer Productivity?, even from people I know personally...
I find this graph indicative of how "value" is defined by the SF investment culture and disruptive economy... and I hope the AI investment bubble will collapse sooner rather than later...
But even if the bubble collapses, automating intelligence will not be undone, it won't suddenly become "safe", the incentives to create real AGI instead of overhyped LLMs will still exists - the danger is not in the presented economic curve going up, it's in what economic actors see as potential, how incentivized are the corporations/governments to search for the thing that is both powerful and dangerous, no?
I would never trust people not to look at my scratchpad.
I suspect the corresponding analogy for humans might be about hostile telepaths, not just literal scratchpads, right?
Question from a conference discussion - do you have real-world-data examples in a shape like this illustration of how I understand Goodhart's law?
E.g. when new LLM versions got better at a benchmark published before their training but not on a benchmark published later... (and if larger models generalize better to future benchmarks, does the typology here provide an insight why? does the bitter lesson of scale provide better structure, function, or randomness calibration? or if the causes are so unknown that this typology does not provide such an insight yet, what is missing? (because Goodhart law predicts that benchmark gaming should be happening to "some extent" => improvements in understanding of Goodhart law should preserve that quality, right?))