
Grue_Slinky

Zbetna Fvapynver [rot13]

Comments

We run the Center for Applied Rationality, AMA
Grue_Slinky · 6y

How do CFAR's research interests/priorities compare with LW's Open Problems in Human Rationality? Based on Brienne and Anna's replies here, I suspect the answer is "they're pretty different", but I'd like to hear what accounts for this divergence.

[AN #76]: How dataset size affects robustness, and benchmarking safe exploration by measuring constraint violations
Grue_Slinky · 6y

Nitpick: "transfer learning" is the standard term, no? It has a Wiki page and seems to get a more coherent batch of search results than googling "robustness to data shift".

What are we assuming about utility functions?
Grue_Slinky · 6y

Whoops, mea culpa on that one! Deleted and changed to:

the main post there pointed out that seemingly anything can be trivially modeled as being a "utility maximizer" (further discussion here), whereas only some intelligent agents can be described as being "goal-directed" (as defined in this post), and the latter is a more useful concept for reasoning about AI safety.
Grue_Slinky's Shortform
Grue_Slinky · 6y

In reasoning about AGI, we're all aware of the problems with anthropomorphizing, but it occurs to me that there's also a cluster of bad reasoning that comes from an (almost?) opposite direction, where you visualize an AGI as a mechanical automaton and draw naive conclusions from that.

For instance, every now and then I've heard someone from this community say something like:

What if the AGI runs on the ZFC axioms (among other things), and finds a contradiction, and by the principle of explosion it goes completely haywire?

Even if ZFC is inconsistent, this hardly seems like a legitimate concern. There's no reason to hard-code ZFC into an AI unless we want a narrow AI that's just a theorem prover (e.g. Logic Theorist). Anything close to AGI will necessarily build rich world models, and from the standpoint of those models, ZFC wouldn't literally be everything. ZFC would just be a sometimes-useful tool the AGI discovers for organizing its mathematical thinking, which in turn is just a means toward understanding physics etc. better, just as humans wouldn't go crazy if ZFC turned out to yield a contradiction.
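(For concreteness, the "principle of explosion" is just the rule that a single contradiction proves every statement. A minimal illustration in Lean, purely to pin down the term rather than to model any actual AI design:)

```lean
-- Ex falso quodlibet: from P and ¬P together, any Q whatsoever follows.
example (P Q : Prop) (hp : P) (hnp : ¬P) : Q :=
  absurd hp hnp
```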

The general fallacy I'm pointing to isn't just "AGI will be logic-based" but something more like "AGI will act like a machine, an automaton, or a giant look-up table". This is technically true, in the same way humans can be perfectly described as a giant look-up table, but it's just the wrong level of abstraction for thinking about agents (most of the time) and can lead one to silly conclusions if one isn't really careful.

For instance, my (2nd-hand, half-baked, and lazy) understanding of Penrose's argument is as follows: Gödel's theorems say formal systems can't do X, humans can do X, therefore human brains can't be fully described as formal systems (or maybe he references Turing machines and the halting problem, but the point is similar). Note that this looks sensible as stated; the catch is that "the human brain when broken down all the way to a Turing machine" is what the Gödel/Turing stuff applies to, not "the human brain at the level of abstraction we use to think about it (in terms of 'thoughts', 'concepts', etc.)". It's not at all clear that the latter even resembles a formal system, at least not one rich enough for the Gödel/Turing results to apply. The fact that it's "built out of" the former means nothing on this point: the set of PA proofs longer than 10 characters does not constitute a formal system, and fleshing out that "built out of" probably requires solving a large chunk of neuroscience.

Again, I'm just using straw-Penrose here as an example because, while we all agree it's an invalid argument, this is mostly because it concludes something LW overwhelmingly agrees is false. When taken at face value, it "looks right" and the actual error isn't completely obvious to find and spell out (hence I've left it in a black spoiler box). I claim that if the argument draws a conclusion that isn't obviously wrong or even reinforces your existing viewpoint, then it's relatively easy to think it makes sense. I think this is what's going on when people here make arguments for AGI dangers that appeal to its potential brittleness or automata-like nature (I'm not saying this is common, but I do see it occasionally).

But there's a subtlety here, because there are some ways in which AGI potentially will be more brittle due to its mathematical formulation. For instance, adversarial examples are a real concern, and those are pretty much only possible because ML systems output numerical probabilities (from which an adversary can infer the gradient of the model's output with respect to its input, and run the input along it).
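(To make "run along it" concrete, here is a minimal FGSM-style sketch; it's illustrative only, and `model`, `x`, and `y` are placeholder names for a differentiable PyTorch classifier, an input batch, and its true labels:)

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=0.03):
    """One-step gradient-sign attack: nudge the input in the direction
    that most increases the model's loss on the true labels."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # "Run along the gradient": step each input coordinate by +/- epsilon,
    # whichever sign locally increases the loss.
    return (x_adv + epsilon * x_adv.grad.sign()).detach()
```

(This version assumes white-box access to gradients; a black-box attacker would instead estimate the gradient by querying the model's output probabilities.)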

And of course, as I said at the start, an opposing fallacy is thinking AGI will be human-like by default. To be clear, I think the fallacy I'm gesturing at here is the less dangerous of the two in the worst case, but the more common one on LW (i.e. > 0).

Technical AGI safety research outside AI
Grue_Slinky · 6y

[copying from my comment on the EA Forum x-post]

For reference, some other lists of AI safety problems that can be tackled by non-AI people:

Luke Muehlhauser's big (but somewhat old) list: "How to study superintelligence strategy"

AI Impacts has made several lists of research problems

Wei Dai's, "Problems in AI Alignment that philosophers could potentially contribute to"

Kaj Sotala's case for the relevance of psychology/cog sci to AI safety (I would add that Ought is currently testing the feasibility of IDA/Debate by doing psychological research)

AI Alignment Open Thread October 2019
Grue_Slinky · 6y

*begins drafting longer proposal*

Yeah, this is definitely more high-risk, high-reward than the others, and the fact that there are potentially some very substantial spillover effects if it succeeds makes me both excited and nervous about the concept. I'm thinking of Arbital as an example of "trying to solve way too many problems at once", so I want to manage expectations and just try to make some exercises that inspire people to think about the art of mathematizing certain fuzzy philosophical concepts. (The running title is "Formalization Exercises", but I'm not sure if there's a better pithy name that captures it.)

In any case, I appreciate the feedback, Mr. Entworth.

AI Alignment Open Thread October 2019
Grue_Slinky · 6y

(8)

In light of the “Fixed Points” critique, a set of exercises that seem more useful/reflective of MIRI’s research than those exercises. What I have in mind is taking some of the classic success stories of formalized philosophy (e.g. Turing machines, Kolmogorov complexity, Shannon information, Pearlian causality, etc., but this could also be done for reflective oracles and logical induction), introducing the problems they were meant to solve, and giving some stepping stones that guide one to have the intuitions and thoughts that (presumably) had to be developed to make the finished product. I get that this will be hard, but I think this can be feasibly done for some of the (mostly easier) concepts, and if done really well, it could even be a better way for people to learn those concepts than actually reading about them.
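(To gesture at what such a "finished product" looks like in the simplest of these cases: Shannon's answer to "how much information does an uncertain outcome carry?" is the entropy

$$H(X) = -\sum_i p(x_i)\,\log_2 p(x_i),$$

and the exercise would be to lay stepping stones that make this particular formula feel inevitable, rather than handing it down as a definition.)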

AI Alignment Open Thread October 2019
Grue_Slinky · 6y

(7)

A critique of MIRI’s “Fixed Points” paradigm, expanding on some points I made on MIRIxDiscord a while ago (which would take a full post to properly articulate). Main issue is, I'm unsure if it's still guiding anyone's research and/or who outside MIRI would care.

AI Alignment Open Thread October 2019
Grue_Slinky · 6y

(6)

An analysis of what kinds of differential progress we can expect from stronger ML. Actually, I don’t feel like writing this post, but I just don’t understand why Dai and Christiano, respectively, are particularly concerned about differential progress on the polynomial hierarchy and what’s easy-to-measure vs. hard-to-measure. My gut reaction is “maybe, but why privilege that axis of differential progress of all things”, and I can’t resolve that in my mind without doing a comprehensive analysis of potential “differential progresses” that ML could precipitate. Which, argh, sounds like an exhausting task, but someone should do it?

AI Alignment Open Thread October 2019
Grue_Slinky · 6y

(5)

A skeptical take on Part I of “What failure looks like” (3 objections, to summarize briefly: not much evidence so far, not much precedent historically, and “why this, of all the possible axes of differential progress?”) [Unsure if these objections will stand up if written out more fully]

Posts

35 · Critiquing "What failure looks like" · Ω · 6y · 6 comments
9 · Some Comments on "Goodhart Taxonomy" · Ω · 6y · 1 comment
17 · What are we assuming about utility functions? · Ω · 6y · 24 comments
2 · Grue_Slinky's Shortform · Ω · 6y · 6 comments
5 · The Quantity/Quality of Researchers Has Drastically Increased Over the Centuries · 6y · 1 comment
9 · Non-anthropically, what makes us think human-level intelligence is possible? · Q · 6y · 3 comments
19 · What are concrete examples of potential "lock-in" in AI research? · Q Ω · 6y · 6 comments
31 · Distance Functions are Hard · Ω · 6y · 19 comments