Since you don't talk about the other 3 forces of biological evolution, or about the "time evolution" concept in physics...

And since the examples seem to focus on directional selection (and not on other types of selection), and only on illustrations of short-term effects, while in fact natural selection explains most aspects of biological evolution and is the strongest long-term force, not the weakest one (anti-cancer mechanisms and why viruses don't usually kill their host are also well explained by natural selection even if not listed as examples here; evolution by natural selection is the thing that explains ALL of those billions of years of biology in the real world well, including cooperation, not just competition)...

Would it be fair to say that you use "evolution" only by analogy, not trying to build a rigorous causal relationship between what we know of biology and what we observe in sociology? There is no theory of the business cycle because of allele frequency, right?!?

Limitations on Formal Verification for AI Safety

Aprillion3mo10

If anyone here might enjoy a dystopian fiction about a world where formal proofs work pretty well, I wrote Unnatural abstractions

Unnatural abstractions

Aprillion3mo10

Thank you for the engagement, but "to and fro" is a real expression, not a typo (and I'm keeping it).. it's used slightly unorthodoxly here, but it sounded right to my ear, so it survived editing ¯\_(ツ)_/¯

Unnatural abstractions

Aprillion3mo30

I tried to use the technobabble in a way that's usefully wrong, so please also let me know if someone gets inspired by this short story.

I am not making predictions about the future, only commenting on the present - if you notice any factual error from that point of view, feel free to speak up, but as far as the doominess spectrum goes, it's supposed to be both too dystopian and too optimistic at the same time.

And if someone wants to fix a typo or a grammo, I'd welcome a pull request (but no commas shall be harmed in the process). 🙏

Inspired by: Failures in Kindness

Aprillion3mo50

Let me practice the volatile kindness here ... as a European, do I understand it correctly that this advice is targeted at a US audience? Or am I the only person to whom it sounds a bit fake?

Scalable oversight as a quantitative rather than qualitative problem

Aprillion4mo10

How I personally understand what it could mean to "understand an action":

Having observed action A1 and having a bunch of (finite state machine-ish) models, each with a list of states that could lead to action A1, a more accurate candidate model => more understanding. (and meta-level uncertainty about which model is right => less understanding)

Model 1            Model 2
S11 -> 50% A1      S21 -> 99% A1
    -> 50% A2          ->  1% A2

S12 -> 10% A1      S22 ->  1% A1
    -> 90% A3          -> 99% A2
                   
                   S23 -> 100% A3
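
A minimal sketch of how the two toy models above could be compared after observing A1. Everything beyond the numbers in the table is an assumption made for illustration: the uniform prior over states, the helper names `state_posterior` and `sharpness`, and the choice of "probability of the most likely state" as a crude stand-in for "more understanding". The second state of Model 1 is labelled S12 here so that state names stay unique across the two models.

```python
from typing import Dict

# A model maps each hidden state to a distribution over actions.
# The probabilities are copied from the table above.
Model = Dict[str, Dict[str, float]]

MODEL_1: Model = {
    "S11": {"A1": 0.50, "A2": 0.50},
    "S12": {"A1": 0.10, "A3": 0.90},
}
MODEL_2: Model = {
    "S21": {"A1": 0.99, "A2": 0.01},
    "S22": {"A1": 0.01, "A2": 0.99},
    "S23": {"A3": 1.00},
}


def state_posterior(model: Model, action: str) -> Dict[str, float]:
    """P(state | action) under an ASSUMED uniform prior over the states."""
    likelihood = {s: dist.get(action, 0.0) for s, dist in model.items()}
    total = sum(likelihood.values())
    if total == 0.0:
        raise ValueError(f"no state in this model can produce {action!r}")
    return {s: p / total for s, p in likelihood.items()}


def sharpness(posterior: Dict[str, float]) -> float:
    """Hypothetical proxy for understanding: mass on the most likely state."""
    return max(posterior.values())


post_1 = state_posterior(MODEL_1, "A1")
post_2 = state_posterior(MODEL_2, "A1")

for name, post in (("Model 1", post_1), ("Model 2", post_2)):
    ranked = sorted(post.items(), key=lambda kv: -kv[1])
    summary = ", ".join(f"{s}={p:.2f}" for s, p in ranked)
    print(f"{name}: {summary} (sharpness {sharpness(post):.2f})")
```

Under these assumptions, Model 2 attributes A1 to a single state (S21) with probability 0.99, while Model 1 only gets to about 0.83 for S11. This says nothing about which model is actually right, which is the separate meta-level uncertainty mentioned above.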

LLM Generality is a Timeline Crux

Aprillion4mo10

Thanks for the clarification. I don't share the intuition that this will prove harder than other hard software engineering challenges in non-AI areas that weren't solved in months but were solved in years and not decades, but other than "a broad baseline is more significant than narrow evidence for me" I don't have anything more concrete to share.

~~A note until fixed:~~ Chollet also discusses 'unhobbling' -> Aschenbrenner also discusses 'unhobbling'

LLM Generality is a Timeline Crux

Aprillion5mo30

I agree with "Why does this matter" and with the "if ... then ..." structure of the argument.

But I don't see where you get such a high probability (>5%) of scaffolding not working... I mean whatever will work can be retroactively called "scaffolding", even if it will be in the "one more major breakthrough" category - and I expect such breakthroughs were already accounted for in the unhobbling predictions.

a year ago many expected scaffolds like AutoGPT and BabyAGI to result in effective LLM-based agents

Do we know the base rate for how many years after the initial marketing hype of a new software technology we should expect "effective" solutions? What is the usual promise:delivery story for SF startups / corporate presentations around VR, metaverse, crypto, sharing drives, sharing apartments, cybersecurity, industrial process automation, self-driving ..? How much hope should we take from the communication so far that the problem is hard to solve - did we expect before AutoGPT and BabyAGI that the first people to share their first attempt would be successful?

LLM Generality is a Timeline Crux

Aprillion5mo80

Aschenbrenner argues that we should expect current systems to reach human-level given further scaling

In https://situational-awareness.ai/from-gpt-4-to-agi/#Unhobbling, "scaffolding" is explicitly named as a thing being worked on, so I take it that progress in scaffolding is already included in the estimate. Nothing about that estimate is "just scaling".

And AFAICT neither Chollet nor Knoop made any claims in the sense that "scaffolding outside of LLMs won't be done in the next 2 years" => what am I missing that is the source of hope for longer timelines, please?

My AI Model Delta Compared To Christiano

Aprillion5mo30

It’s a failure of ease of verification: because I don’t know what to pay attention to, I can’t easily notice the ways in which the product is bad.

Is there an opposite of the "failure of ease of verification" that would add up to 100% if you categorized the whole of reality into 1 of these 2 categories? Say in a simulation, if you attributed every piece of computation to one of the following 2 categories, how much of the world can be "explained by" each category?

make sure stuff "works at all and is easy to verify whether it works at all"
stuff that works must be "potentially better in ways that are hard to verify"

Examples:

when you press the "K" key on your keyboard for 1000 times, it will launch nuclear missiles ~0 times and the K key will "be pressed" ~999 times
when your monitor shows you the pixels for a glyph of the letter "K" 1000 times, it will represent the planet Jupiter ~0 times and "there will be" the letter K ~999 times
in each page in your stack of books, the character U+0000 is visible ~0 times and the letter A, say ~123 times
tupperware was your own purchase and not gifted by a family member? I mean, for which exact feature would you pay how much more?!?
you can tell whether a water bottle contains potable water and not sulfuric acid
carpet, desk, and chair haven't spontaneously combusted (yet?)
the refrigerator doesn't produce any black holes
(flip-flops are evil and I don't want to jinx any sinks at this time)