LESSWRONG

David Johnston
Comments

OpenAI Claims IMO Gold Medal
David Johnston · 5h

It's a research prototype, so probably not style tuned

Critic Contributions Are Logically Irrelevant
David Johnston · 5d

Doesn't the generator/discriminator (G/D) gap suggest we should have more criticism than generation?

Critic Contributions Are Logically Irrelevant
David Johnston · 5d

A critique in a high-ambiguity context is almost always (in all but the most technical domains where individual claims can be cheaply verified) a request to engage in a prolonged exchange

I don't think so: I think critics usually see themselves as pointing out errors in the original, and would be quite happy with an "oops, good pickup". Extended exchanges happen because authors usually don't agree. Often at least one of the parties is confused, motivated, or doing a poor job for some reason, which means long exchanges are often frustrating and people are wary of getting into them. But the critique itself isn't a request for an exchange.

I'm also a bit confused by this discussion. Is your position actually that you can only judge a critic by their papers or LessWrong posts? That seems odd: can't you judge them by their other critiques? A critique seems much easier to evaluate than a paper.

Foom & Doom 1: “Brain in a box in a basement”
David Johnston · 25d

This piece combines relatively uncontroversial points that come with some justification ("we're not near the compute or data efficiency limit") with controversial claims justified only by Steven's intuition ("the frontier will be reached suddenly by a small group few people are tracking"). I'd be more interested in a piece that either examined the consequences of only the former kind of claim, or justified the latter kind of claim more strongly.

Interpretability Will Not Reliably Find Deceptive AI
David Johnston · 2mo

models will have access to some kind of "neuralese" that allows them to reason in ways we can't observe

Only modest confidence, but while there's an observability gap between neuralese and CoT monitoring, I suspect it's smaller than the gap between reasoning traces that haven't been trained against oversight and reasoning traces that have.

Societal and technological progress as sewing an ever-growing, ever-changing, patchy, and polychrome quilt
David Johnston · 3mo
  1. I mean, even if you're mostly pursuing a particular set of final values (which is not what you're advocating here), there are probably strong reasons to make coordination a high priority (which is close to what you're advocating here).

  2. Well, I did say "to the extent permitted by 1" - there's probably some conflict here - but I wasn't suggesting CEV as something that makes coordination easy. I'm saying it's a good principle for judging final outcomes between two different paths that have similar levels of coordination. Of course we'd have to estimate the "happiness in hindsight", but this looks tractable to me.

$500 Bounty Problem: Are (Approximately) Deterministic Natural Latents All You Need?
David Johnston · 3mo

I've thought about it a bit and have a line of attack for a proof, but there's too much work involved in following it through to an actual proof, so I'm going to leave it here in case it helps anyone.

I'm assuming everything is discrete so I can work with regular Shannon entropy.

Consider the range $R_1$ of the function $g_1 : \lambda \mapsto P(X_1 \mid \Lambda = \lambda)$, and $R_2$ defined similarly. Discretize $R_1$ and $R_2$ (chop them up into little balls). I'm not sure which metric to use; maybe total variation (TV).

Define $\Lambda'_1(\lambda)$ to be the index of the ball into which $P(X_1 \mid \Lambda = \lambda)$ falls, and $\Lambda'_2$ similarly. So if $d(P(X_1 \mid \Lambda = a), P(X_1 \mid \Lambda = b))$ is sufficiently small, then $\Lambda'_1(a) = \Lambda'_1(b)$.

By the data processing inequality, conditions 2 and 3 still hold for $\Lambda' = (\Lambda'_1, \Lambda'_2)$. Condition 1 should hold with some extra slack depending on the coarseness of the discretization.
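
To make the "chop them up into little balls" step concrete, here's a minimal sketch (my own illustration, not part of the original argument) of one way to compute $\Lambda'_1$, assuming $P(X_1 \mid \Lambda)$ is given as a finite table and using a greedy covering of $R_1$ by TV-balls of radius $\varepsilon$:

```python
import numpy as np

def discretize_latent(p_x1_given_lam: np.ndarray, eps: float) -> np.ndarray:
    """p_x1_given_lam[l, x] = P(X1 = x | Lambda = l); returns the ball index Lambda'_1(l) for each l."""
    centers = []                                     # representative conditionals (ball centers)
    labels = np.empty(len(p_x1_given_lam), dtype=int)
    for l, row in enumerate(p_x1_given_lam):
        for i, c in enumerate(centers):
            if 0.5 * np.abs(row - c).sum() <= eps:   # TV distance to an existing center
                labels[l] = i
                break
        else:                                        # no center within eps: open a new ball
            centers.append(row)
            labels[l] = len(centers) - 1
    return labels
```

Any partition of $R_1$ into small-diameter cells would do the same job; the greedy covering is just one concrete choice, and the radius $\varepsilon$ is the knob controlling the slack in condition 1.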

It takes a few steps, but I think you might be able to argue that, with high probability, for each $X_2 = x_2$, the random variable $Q_1 := P(X_1 \mid \Lambda'_1)$ will be highly concentrated (n.b. I've only worked it through fully in the exact case, and I think it can be translated to the approximate case, but I haven't checked). We then invoke the discretization to argue that $H(\Lambda'_1 \mid X_1)$ is bounded. The intuition is that the discretization forces nearby probabilities to coincide, so if $Q_1$ is concentrated then it actually has to "collapse" most of its mass onto a few discrete values.

We can then make a similar argument, switching the indices, to get $H(\Lambda'_2 \mid X_2)$ bounded. Finally, maybe by applying conditions 2 and 3 we can get $H(\Lambda'_1 \mid X_2)$ bounded as well, which then gives a bound on $H(\Lambda' \mid X_i)$.

I did try feeding this to Gemini but it wasn't able to produce a proof.

$500 Bounty Problem: Are (Approximately) Deterministic Natural Latents All You Need?
David Johnston · 3mo

Wait, I thought the first property was just independence, not also identically distributed.

In principle I could have e.g. two biased coins with their biases different but deterministically dependent.
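
For concreteness, here's a minimal sketch (my own construction, not from the post) of that kind of example: the two coins are independent given the latent and their biases are deterministically related, but they are not identically distributed.

```python
import numpy as np

rng = np.random.default_rng(0)

lam = rng.uniform(size=100_000)      # latent: the bias of coin 1
bias1 = lam                          # coin 1's bias
bias2 = lam ** 2                     # coin 2's bias: a deterministic function of the latent, but a different one

x1 = rng.random(lam.shape) < bias1   # X1 | Lambda ~ Bernoulli(bias1)
x2 = rng.random(lam.shape) < bias2   # X2 | Lambda ~ Bernoulli(bias2), drawn independently of X1 given Lambda

print(x1.mean(), x2.mean())          # ~0.50 vs ~0.33: different marginals, so not identically distributed
```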

Societal and technological progress as sewing an ever-growing, ever-changing, patchy, and polychrome quilt
David Johnston · 3mo

I think:

  1. Finding principles for AI "behavioural engineering" that reduce people's desire to engage in risky races (e.g. because they find the principles acceptable) seems highly valuable
  2. To the extent permitted by 1, pursuing something CEV-like ("we're happier with the outcome in hindsight than we would have been with other outcomes") also seems desirable

I sort of see the former as potentially encouraging diversity (because different groups want different things, and are most likely to agree to "everyone gets what they want"), but the latter may in fact suggest convergence (because, perhaps, there are fairly universal answers to "what makes people happy with the benefit of hindsight?").

You stress the importance of having robust feedback procedures, but having overall goals like this can help to judge which procedures are actually doing what we want.

$500 Bounty Problem: Are (Approximately) Deterministic Natural Latents All You Need?
David Johnston · 3mo

Your natural latents seem quite closely related to the common construction of variables that are IID conditional on a latent; in fact, all of your examples are IID variables (or "bundles" of IID variables) conditional on that latent. Can you give me an interesting example of a natural latent that is not basically the conditionally IID case?

(I was wondering whether the extensive literature on the correspondence between de Finetti-type symmetries and conditional IID representations is of any help to your problem. I'm not entirely sure it is, given that it mostly addresses getting from a symmetry to a conditional independence, whereas you want to get from one conditional independence to another, but it's plausible some of the methods are applicable.)
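
To be clear about what I mean by the "common construction", here's a minimal sketch (my own illustration) of the de Finetti-style picture, assuming binary observations:

```python
import numpy as np

rng = np.random.default_rng(0)

theta = rng.beta(2, 2)            # the latent: a single bias drawn from a prior
x = rng.random(1000) < theta      # X_1, ..., X_n: IID Bernoulli(theta) once the latent is fixed
```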

Posts

7 · A brief theory of why we think things are good or bad · 9mo · 10 comments
11 · Mechanistic Anomaly Detection Research Update · 1y · 0 comments
6 · Opinion merging for AI control · 2y · 0 comments
11 · Is it worth avoiding detailed discussions of expectations about agency levels of powerful AIs? [Question] · 2y · 6 comments
-1 · How likely are malign priors over objectives? [aborted WIP] · 3y · 0 comments
8 · When can a mimic surprise you? Why generative models handle seemingly ill-posed problems · 3y · 4 comments
3 · There's probably a tradeoff between AI capability and safety, and we should act like it · 3y · 3 comments
3 · Is evolutionary influence the mesa objective that we're interested in? · 3y · 2 comments
2 · [Cross-post] Half baked ideas: defining and measuring Artificial Intelligence system effectiveness · 3y · 0 comments
5 · Are there any impossibility theorems for strong and safe AI? [Question] · 3y · 3 comments