Thane Ruthenis — Comments
Research Agenda: Synthesizing Standalone World-Models (+ Bounties, + Seeking Funding)
Thane Ruthenis · 5h

This idea seems to require (basically) a major revolution in or even a complete solution to program induction

Eh, I think any nontrivial technical project can be made to sound like an incredibly significant and therefore dauntingly impossible achievement, if you pick the right field to view it from. But what matters is the actual approach you're using, and how challenging the technical problems are from the perspective of the easiest field in which they could be represented.

Some examples:

  • Consider various geometry problems, e. g., one of those. If you use the tools of analytic geometry, you'd end up having to solve a complicated system of nonlinear equations. If you use synthetic geometry instead, the way to resolve them might consist of applying a well-known theorem and a few simple reasoning steps, so simple you can do it in your head.
  • Consider the problem of moving fast. Before the invention of the car, the problem of moving at 120 km/h could've been cast as "a major revolution in horse-breeding and genetic engineering". But the actual approach taken did not route through horses or biology at all. It achieved the end result through a different pathway, in which the technical problems were dramatically easier.
  • Consider AI. Prior to Deep Learning, there was a trove of symbolic approaches to it; and even before that, hand-written GOFAIs. The technical problem of "achieve DL-level performance using symbolic/GOFAI tools" is dramatically harder than "achieve DL-level performance", unqualified. And yet, the latter can be technically described as a revolution in the relevant fields.
  • Consider various other modeling problems, e. g., weather prediction, volcano modeling, materials-science modeling, quantitative trading. Any advancement in general modeling techniques would revolutionize all of those. But should that technical problem really be framed in the daunting terms of "come up with a revolutionary stock-trading algorithm"?

To generalize: Suppose there's some field A which is optimizing for X. Improving on X using the tools of A would necessarily require you to beat a market that is efficient-relative-to-you. Experts in A already know the tools of A in and out, and how to use them to maximize X. Even if you can beat them, it would only be an incremental improvement. A slightly better solver for systems of nonlinear equations, a slightly faster horse, a slightly better trading algorithm.

The way to actually massively improve on X is to ignore the extant tools of A entirely, and try to develop new tools for optimizing X by using some other field B. On the outside view, this is necessarily a high-risk proposition, since B might end up entirely unhelpful; but it's also high-reward, since it might allow you to actually "beat the market". And if you succeed, the actual technical problems you'll end up solving will be massively easier than the problems you'd need to solve to achieve the same performance using A's tools.

Bringing it back around: This agenda may or may not be viewed as aiming to revolutionize program induction, but I'm not setting out to take the extant program-induction tools and try to cobble together something revolutionary using them. The idea is to use an entirely different line of theory (agent foundations, natural abstractions, information theory, recent DL advances) to achieve that end result.

Research Agenda: Synthesizing Standalone World-Models (+ Bounties, + Seeking Funding)
Thane Ruthenis · 6h

"Simulacrum escapees" are explicitly one of the main failure modes we'll need to address, yes. Some thoughts:

  • The obvious way to avoid them is to not point the wm-synthesizer at a dataset containing agents.
    • If we're aiming to develop intelligence-enhancing medical interventions or the technology for uploading, we don't necessarily need a world-model containing agents: a sufficiently advanced model/simulator of biology/physics would suffice.
    • Similarly, if we want a superintelligent proof synthesizer we can use to do a babble-and-prune search through the space of possible agent-foundations theorems,[1] we only need to make it good at math-in-general, not at intuitive reasoning about agent-containing math.
      • This is riskier than biology/physics, though, because perhaps reasoning even about fully formal agent-foundations math would require reasoning about agents intuitively, i. e., instantiating them in internal simulation spaces.
  • Intuitively, "a simulated agent breaks out of the simulation" is a capability-laden failure of the wm-synthesizer. It is not functioning as it ought to; it is not succeeding at producing an accurate world-model. It should be possible to make it powerful enough to avoid that.
    • Note how, in a sense, "an agent recognizes it's in a simulation and hacks out" is just an instance of the more general failure mode of "part of the world is being modeled incorrectly" (by e. g. having some flaws the simulated agent recognizes, or by allowing it to break out of the sandbox). To work, the process would need to be able to recognize and address those failure modes. If it's sufficiently powerful, whatever subroutines it uses to handle lesser "bugs" should generalize to handling this type of bug as well.
  • With more insights into how agents work, we might be able to come up with more targeted interventions/constraints/regularization techniques for preventing simulacrum escapees. E. g., if we figure out the proper "type signature" of agents, we might be able to explicitly ban the wm-synthesizer from incorporating them in the world-model.

This is a challenge, but one I'm optimistic about handling.

Weeping Agents: Anything that holds the image of an agent becomes an agent

Nice framing! But I somewhat dispute that. Consider a perfectly boxed-in AI, running on a computer with no output channels whatsoever (or perhaps as a homomorphic computation, i. e., indistinguishable from noise without the key). This thing holds the image of an agent; but is it really "an agent" from the perspective of anyone outside that system?

Similarly, a sufficiently good world-model would sandbox the modeled agents well enough that it wouldn't, itself, engage in agent-like behavior from the perspective of its operators.

  1. ^

    As in: we come up with a possible formalization of some aspect of agent foundations, then babble potential theorems about it at the proof synthesizer, and it provides proofs/disproofs. This is a pretty brute approach and is by no means a full solution, but I expect it can nontrivially speed us up.
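    That loop can be sketched roughly as follows. This is a hypothetical illustration only, not a real system: `babble_candidates` and `prove_or_disprove` are stand-ins for the conjecture generator and the proof synthesizer, assuming the synthesizer returns a verdict plus a proof or counterexample per candidate theorem.

    ```python
    # A minimal sketch of the babble-and-prune theorem search described above.
    # All function names here are hypothetical stand-ins, not a real API.

    def babble_and_prune(formalization, babble_candidates, prove_or_disprove, budget):
        """Generate candidate theorems about a formalization, then let a
        proof synthesizer sort them into proved / disproved / unresolved."""
        proved, disproved, unresolved = [], [], []
        for candidate in babble_candidates(formalization, n=budget):
            verdict, certificate = prove_or_disprove(candidate)  # proof or counterexample
            if verdict == "proved":
                proved.append((candidate, certificate))
            elif verdict == "disproved":
                disproved.append((candidate, certificate))
            else:  # the synthesizer timed out or gave up on this candidate
                unresolved.append(candidate)
        return proved, disproved, unresolved
    ```

    The point of the sketch is that the human contribution is only the formalization and the babble heuristics; all the proof labor is pushed onto the synthesizer.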

This is a review of the reviews
Thane Ruthenis · 1d

I think that (metaphorically) there should be an all-caps disclaimer that reads something like "TO BE CLEAR AI IS STILL ON TRACK TO KILL EVERYONE YOU LOVE; YOU SHOULD BE ALARMED ABOUT THIS AND TELLING PEOPLE IN NO UNCERTAIN TERMS THAT YOU HAVE FAR, FAR MORE IN COMMON WITH YUDKOWSKY AND SOARES THAN YOU DO WITH THE LOBBYISTS OF META, WHO ABSENT COORDINATION BY PEOPLE ON HUMANITY'S SIDE ARE LIABLE TO WIN THIS FIGHT, SO COORDINATE WE MUST" every couple of paragraphs.

Yeah, I kind of regret not prefacing my pseudo-review with something like this. I was generally writing it from the mindset of "obviously the book is entirely correct and I'm only reviewing the presentation", and my assumption was that trying to "sell it" to LW users was preaching to the choir (I would've strongly endorsed it if I had a big mainstream audience, or even if I were making a top-level LW post). But that does feel like part of the our-kind-can't-cooperate pattern now.

Buck's Shortform
Thane Ruthenis · 1d

My guess is that they're doing the motte-and-bailey of "make it seem to people who haven't read the book that it says that the ASI extinction is inevitable, that the book is just spreading doom and gloom", from which, if challenged, they could retreat to "no, I meant doom isn't inevitable even if we do build ASI using the current methods".

Like, if someone means the latter (and has also read the book and knows that it goes to great lengths to clarify that we can avoid extinction), would they really phrase it as "doom is inevitable", as opposed to e. g. "safe ASI is impossible"?

Or maybe they haven't put that much thought into it and are just sloppy with language.

IABIED Review - An Unfortunate Miss
Thane Ruthenis · 5d

I particularly agree with the point about the style being much more science-y than I'd expected, in a way that surely filters out large swathes of people. I'm assuming "people who are completely clueless about science and are unable to follow technical arguments" are just not the target audience. To crudely oversimplify, I think the target audience is 120+ IQ people, not 100 IQ people.

I mention this for transparency but also because some seem to be rallying around IABIED, even with its shortcomings, because they don’t think there is another option

I think IABIED should be rallied around because "the MIRI book" is the obvious Schelling point for rallying around. It has brand recognition in our circles, its release is a big visible event, it managed to get into best-seller categories, meaning it's visible to mainstream audiences, etc. Even if there are other books which are moderately better at doing what IABIED does, it wouldn't be possible to amplify their impact the same way (even if, say, Eliezer personally recommended them), so IABIED it is.

Further, even if it's possible to coordinate around and boost a different book the same way, this would require additional time; months or years (if that better book is yet to be written). We don't have much of that luxury, in expectation.

This still wouldn't be a good idea if IABIED were actively bad, of course. But it's not. I think it's reasonably good, even if we have our quibbles; and MIRI's pre-release work shows that it seems convincing to non-experts.

We could think about crafting better persuasion-artefacts in the future, but I think rallying around IABIED is the only option, at this point in time. And it may or may not be a marginally worse option compared to some hypothetical alternatives, but it's not a bad option.

Jacob_Hilton's Shortform
Thane Ruthenis · 5d

However, in most of the experimental sciences, formal results are not the main bottleneck, so speed-ups would be more dependent on progress on coding, fuzzier tasks, robotics, and so on

One difficulty with predicting the impact of "solving math" on the world is the Jevons effect (or a kind of generalization of it). If posing a problem formally becomes equivalent to solving it, it would have effects beyond just speeding up existing fully formal endeavors. It might potentially create qualitatively new industries/approaches relying on cranking out such solutions by the dozens.

E. g., perhaps there are some industries which we can already fully formalize, but which still work in the applied-science regime, because building the thing and testing it empirically is cheaper than hiring a mathematician and waiting ten years. But once math is solved, you'd be able to effectively go through dozens of prototypes per day for, say, $1000, while previously, each one would've taken six months and $50,000.

Are there such industries? What are they? I don't know, but I think there's a decent possibility that merely solving formal math would immediately make things go crazy.
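To make the scale of that shift concrete, here's the back-of-the-envelope arithmetic implied by the illustrative figures above. All numbers are the hypothetical ones from the preceding paragraph, with "dozens per day" taken as two dozen.

```python
# Toy comparison of iteration regimes, using the illustrative figures
# from the text: empirical prototyping vs. formally solving each design.
empirical_cost_per_prototype = 50_000  # dollars per physical prototype
empirical_time_per_prototype = 180     # days (~six months) per prototype
formal_cost_per_day = 1_000            # dollars per day of formal solving
formal_prototypes_per_day = 24         # "dozens per day", taken as two dozen

# Prototypes per dollar under each regime
empirical_rate = 1 / empirical_cost_per_prototype
formal_rate = formal_prototypes_per_day / formal_cost_per_day

print(f"cost advantage:  {formal_rate / empirical_rate:.0f}x")  # 1200x
print(f"speed advantage: {formal_prototypes_per_day * empirical_time_per_prototype}x")  # 4320x
```

Three orders of magnitude on both axes, even under these made-up numbers, which is the kind of discontinuity that creates qualitatively new industries rather than merely speeding up old ones.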

Thane Ruthenis's Shortform
Thane Ruthenis · 5d

I am somewhat interested now. I'll aim to look over it and get back to you, but no promises.

How To Dress To Improve Your Epistemics
Thane Ruthenis · 5d

I still think there's a science to it which is yet to be properly written up. It's not at the level of "this combination of design choices/clothing elements is bad, this one is good", but there is a high-level organization to the related skills/principles, which can be taught to speed up someone learning design/fashion. They would still need to do a bunch of case studies/bottom-up learning afterwards (to learn specific extant patterns like "the vibe of a 90s CS professor"), but you can make that learning more sample-efficient.

Social skills are a good parallel. Actually talking to people and trying to accomplish different things with your conversations is necessary for developing social competence, but knowing some basics of the theory of mind and social dynamics is incredibly helpful for knowing what to pay attention to and try.

How To Dress To Improve Your Epistemics
Thane Ruthenis · 5d

I agree that a competent write-up on the (true) theory of fashion seems to be missing. The usual way to deal with such situations is to act like the neural network you are: find some big dataset of [clothing example, fashionability analysis] pairs, consume it, then reverse-engineer the intuitions you've learned. If there's no extant literature on the top-down theory available, go bottom-up and derive it yourself. (It will be time-consuming.)

How To Dress To Improve Your Epistemics
Thane Ruthenis · 5d

Of course, there are lots of other options besides literally just a leather jacket. As a general rule, any outfit which makes people ask “are you in a band?” signals coolness.

There are lots of options in the possibility-space. But are there lots of options on the actual market?

The fashion industry is one of those things that makes me want to just go do it all myself in frustration.[1] The clothing-space seems drastically underexplored.

The most obvious element is coloration. For any piece of clothing, there's a wide variety of complex-yet-tasteful multicolor patterns one might try. Very subtle highlights and gradients; unusual but simple geometric patterns; strong but carefully-chosen contrasts. You don't want a literal clash-of-colors clown outfit, but there's a wealth of possibilities beyond "one simple color"; e. g., mixing different hues of the same color.

Yet, most items on the market do just pick one simple color. Alternatively, they pick a common, basic (and therefore boring, conformist) pattern, e. g. plaid shirts. On the other end of the spectrum, you have graphic tees and such, which are varied but are decidedly unsubtle and, in my opinion, pretty lame (outside very specific combinations of design and social context[2]).

You can partially deal with that by wearing several items of different colors that combine the way you want. But this only allows basic combinations, and the ability to do that becomes very constrained in hot weather.

Eagerly awaiting the point when AI advances enough for me to vibe-design and 3D print anything I can imagine.

  1. ^

    Also the bag industry. You'd think the wealthy community of digital nomads would've incentivized a thriving ecosystem of varied, competently designed modular bags, and yet.

  2. ^

    This is obviously peak fashion.

Posts:

  • Research Agenda: Synthesizing Standalone World-Models (+ Bounties, + Seeking Funding) (22h)
  • The System You Deploy Is Not the System You Design (18d)
  • Is Building Good Note-Taking Software an AGI-Complete Problem? (4mo)
  • A Bear Case: My Predictions Regarding AI Progress (7mo)
  • How Much Are LLMs Actually Boosting Real-World Programmer Productivity? (7mo)
  • The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better (7mo)
  • Abstract Mathematical Concepts vs. Abstractions Over Real-World Systems (7mo)
  • Are You More Real If You're Really Forgetful? (10mo)
  • Towards the Operationalization of Philosophy & Wisdom (11mo)
  • Thane Ruthenis's Shortform (1y)