johnswentworth - LessWrong

+1, and even for those who do buy extinction risk to some degree, financial/status incentives usually have more day-to-day influence on behavior.

johnswentworth's Shortform

johnswentworth4d92

Good argument, I find this at least somewhat convincing. Though it depends on whether penalty (1), the one capped at 10%/30% of training compute cost, would be applied more than once on the same model if the violation isn't remedied.

johnswentworth's Shortform

johnswentworth4d30-6

So I read SB1047.

My main takeaway: the bill is mostly a recipe for regulatory capture, and that's basically unavoidable using anything even remotely similar to the structure of this bill. (To be clear, regulatory capture is not necessarily a bad thing on net in this case.)

During the first few years after the bill goes into effect, companies affected are supposed to write and then implement a plan to address various risks. What happens if the company just writes and implements a plan which sounds vaguely good but will not, in fact, address the various risks? Probably nothing. Or, worse, those symbolic-gesture plans will become the new standard going forward.

In order to avoid this problem, someone at some point would need to (a) have the technical knowledge to evaluate how well the plans actually address the various risks, and (b) have the incentive to actually do so.

Which brings us to the real underlying problem here: there is basically no legible category of person who has the requisite technical knowledge and also the financial/status incentive to evaluate those plans for real.

(The same problem also applies to the board of the new regulatory body, once past the first few years.)

Having noticed that problem as a major bottleneck to useful legislation, I'm now a lot more interested in legal approaches to AI X-risk which focus on catastrophe insurance. That would create a group - the insurers - who are strongly incentivized to acquire the requisite technical skills and then make plans/requirements which actually address some risks.

Natural Latents: The Math

johnswentworth9d60

So 'latents' are defined by their conditional distribution functions whose shape is implicit in the factorization that the latents need to satisfy, meaning they don't have to always look like $P [Λ | X]$ , they can look like $P [Λ], P [X | Λ]$ , etc, right?

The key idea here is that, when "choosing a latent", we're not allowed to choose $P [X]$ ; $P [X]$ is fixed/known/given, a latent is just a helpful tool for reasoning about or representing $P [X]$ . So another way to phrase it is: we're choosing our whole model $P [X, Λ]$ , but with a constraint on the marginal $P [X]$ . $P [Λ | X]$ then captures all of the degrees of freedom we have in choosing a latent.

Now, we won't typically represent the latent explicitly as $P [Λ | X]$ . Typically we'll choose latents such that $P [X, Λ]$ satisfies some factorization(s), and those factorizations will provide a more compact representation of the distribution than two giant tables for $P [X]$ , $P [Λ | X]$ . For instance, insofar as $P [Λ, X]$ factors as $P [Λ] \prod_{i} P [X_{i} | Λ]$ , we might want to represent the distribution as $P [Λ]$ and $\{P [X_{i} | Λ]\}$ (both for analytic and computational purposes).
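As a rough illustration of that factored representation, here is a minimal Python sketch that stores only $P [Λ]$ and the per-variable tables $P [X_{i} | Λ]$ and recovers the marginal $P [X]$ by summing out $Λ$ . The function name `marginal_from_factors` and the dict-based table layout are assumptions made for this example, not anything from the post.

```python
from itertools import product
from math import prod

def marginal_from_factors(p_lam, p_xi_given_lam):
    """Recover P[X] from a factored model P[Lam] * prod_i P[X_i | Lam].

    p_lam: dict mapping lam -> P[lam]
    p_xi_given_lam: list with one entry per variable X_i; each entry is a
        dict mapping lam -> {x_i: P[x_i | lam]}
    Returns a dict mapping tuples (x_1, ..., x_n) -> P[x_1, ..., x_n].
    (Hypothetical layout, chosen only for illustration.)
    """
    # Collect the possible values of each X_i from its conditional table.
    value_sets = [
        sorted({v for cond in table.values() for v in cond})
        for table in p_xi_given_lam
    ]
    p_x = {}
    for x in product(*value_sets):
        # Sum over lam of P[lam] * prod_i P[x_i | lam].
        p_x[x] = sum(
            p_lam[lam]
            * prod(table[lam].get(xi, 0.0) for table, xi in zip(p_xi_given_lam, x))
            for lam in p_lam
        )
    return p_x
```

With $n$ variables of $k$ values each and $m$ latent values, the factored form stores on the order of $m + n m k$ numbers, versus $k^{n}$ entries for an explicit table of $P [X]$ alone.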

I don't get the 'standard form' business.

We've largely moved away from using the standard form anyway; I recommend ignoring it at this point.

Also is this post relevant to either of these statements, and if so, does that mean they only hold under strong redundancy?

Yup, that post proves the universal natural latent conjecture when strong redundancy holds (over 3 or more variables). Whether the conjecture still holds when strong redundancy fails is an open question. But since the strong redundancy result, we've mostly shifted toward viewing strong redundancy as the usual condition to look for, and focused less on weak redundancy.

Resampling

Also does all this imply that we're starting out assuming that $Λ$ shares a probability space with all the other possible latents, e.g. $P [X, Λ, Λ^{'}, Λ^{''}, \dots]$ ? How does this square with a latent variable being defined by the CPD implicit in the factorization?

We conceptually start with the objects $P [X]$ , $P [Λ | X]$ , and $P [Λ^{'} | X]$ . (We're imagining here that two different agents measure the same distribution $P [X]$ , but then they each model it using their own latents.) Given only those objects, the joint distribution $P [X, Λ, Λ^{'}]$ is underdefined - indeed, it's unclear what such a joint distribution would even mean! Whose distribution is it?

One simple answer (unsure whether this will end up being the best way to think about it): one agent is trying to reason about the observables $X$ , their own latent $Λ$ , and the other agent's latent $Λ^{'}$ simultaneously, e.g. in order to predict whether the other agent's latent is isomorphic to $Λ$ (as would be useful for communication).

Since $Λ$ and $Λ^{'}$ are both latents, in order to define $P [X, Λ, Λ^{'}]$ , the agent needs to specify $P [Λ, Λ^{'} | X]$ . That's where the underdefinition comes in: only $P [Λ | X]$ and $P [Λ^{'} | X]$ were specified up-front. So, we sidestep the problem: we construct a new latent $Λ^{''}$ such that $P [Λ^{''} | X]$ matches $P [Λ | X]$ , but $Λ^{''}$ is independent of $Λ^{'}$ given $X$ . Then we've specified the whole distribution $P [X, Λ^{'}, Λ^{''}] = P [X] P [Λ^{'} | X] P [Λ^{''} | X]$ .
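To make the construction concrete, here is a minimal Python sketch that builds $P [X, Λ^{'}, Λ^{''}] = P [X] P [Λ^{'} | X] P [Λ^{''} | X]$ from the three given objects, with $Λ^{''}$ reusing the conditional table of $Λ$ . The function name `resampled_joint` and the nested-dict table format are assumptions for illustration only.

```python
def resampled_joint(p_x, p_lam_given_x, p_lam_prime_given_x):
    """Build P[X, Lam', Lam''] where Lam'' has the same conditional as Lam
    but is independent of Lam' given X.

    p_x: dict mapping x -> P[x]
    p_lam_given_x: dict mapping x -> {lam: P[lam | x]}
    p_lam_prime_given_x: dict mapping x -> {lam': P[lam' | x]}
    Returns a dict mapping (x, lam', lam'') -> probability.
    (Hypothetical table layout, chosen only for illustration.)
    """
    joint = {}
    for x, px in p_x.items():
        for lam_prime, p_lp in p_lam_prime_given_x[x].items():
            # Lam'' is drawn from P[Lam | X = x], ignoring the value of Lam'.
            for lam_dprime, p_ldp in p_lam_given_x[x].items():
                joint[(x, lam_prime, lam_dprime)] = px * p_lp * p_ldp
    return joint
```

By construction, summing this joint over $Λ^{'}$ gives back $P [X] P [Λ | X]$ with $Λ^{''}$ in the role of $Λ$ , so nothing beyond the three originally specified objects is needed.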

Hopefully that clarifies what the math is, at least. It's still a bit fishy conceptually, and I'm not convinced it's the best way to handle the part it's trying to handle.

AI #73: Openly Evil AI

johnswentworth9d20

Yeah, it's the "exchange" part which seems to be missing, not the "securities" part.

AI #73: Openly Evil AI

johnswentworth9d3-5

Why does the SEC have any authority at all over OpenAI? They're not a publicly listed company, i.e. they're not listed on any securities exchange, so naively one would think a "securities exchange commission" doesn't have much to do with them.

I mean, obviously federal agencies always have scope creep, it's not actually surprising if they have some authority over OpenAI, but what specific legal foundation is the SEC on here? What is their actual scope?

Natural Latents: The Math

johnswentworth10d73

Consider the exact version of the redundancy condition for a latent $Λ$ over $X_{1}, X_{2}$ :

$P [Λ, X_{1}, X_{2}] = P [Λ | X_{1}] P [X_{1}, X_{2}]$

and

$P [Λ, X_{1}, X_{2}] = P [Λ | X_{2}] P [X_{1}, X_{2}]$

Combine these two and we get, for all $Λ, X_{1}, X_{2}$ :

$P [Λ | X_{1}] = P [Λ | X_{2}]$ OR $P [X_{1}, X_{2}] = 0$

That's the foundation for a conceptually-simple method for finding the exact natural latent (if one exists) given a distribution $P [X_{1}, X_{2}]$ :

1. Pick a value $(X_{1}, X_{2})$ which has nonzero probability, and initialize a set $S$ containing that value. Then we must have $P [Λ | X \in S] = P [Λ | X_{1}] = P [Λ | X_{2}]$ for all $Λ$ .
2. Loop: add to $S$ a new value $(X_{1}^{'}, X_{2})$ or $(X_{1}, X_{2}^{'})$ where the value $X_{2}$ or $X_{1}$ (respectively) already appears in one of the pairs in $S$ . Then $P [Λ | X_{1}^{'}] = P [Λ | X \in S]$ or $P [Λ | X_{2}^{'}] = P [Λ | X \in S]$ , respectively. Repeat until there are no more candidate values to add to $S$ .
3. Pick a new pair and repeat with a new set, until all values of $X$ have been added to a set.
4. Now take the latent to be the equivalence class in which $X$ falls.
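The procedure above amounts to finding connected components among the nonzero-probability pairs, where two pairs are linked if they share an $X_{1}$ value or an $X_{2}$ value. Here is a minimal Python sketch of that reading, using union-find in place of the explicit set-growing loop. The function name `redundancy_classes` and the dict input format are assumptions for illustration. The sketch only computes the equivalence classes; it does not check whether the resulting latent also satisfies the mediation condition.

```python
def redundancy_classes(joint):
    """Label each nonzero-probability pair (x1, x2) with its equivalence class.

    joint: dict mapping (x1, x2) -> P[x1, x2]
    Returns a dict mapping each supported pair -> integer class label.
    (Hypothetical helper; union-find stands in for the set-growing loop.)
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Only pairs with nonzero probability constrain the latent.
    support = [pair for pair, p in joint.items() if p > 0]

    # Each supported pair ties its X1 value to its X2 value, so pairs that
    # share either coordinate end up in the same component.
    for x1, x2 in support:
        union(("x1", x1), ("x2", x2))

    labels = {}
    latent = {}
    for x1, x2 in support:
        root = find(("x1", x1))
        latent[(x1, x2)] = labels.setdefault(root, len(labels))
    return latent
```

For example, with support on $(0, a)$ , $(0, b)$ and $(1, c)$ , the first two pairs share $X_{1} = 0$ and land in one class, while $(1, c)$ forms its own class.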

Does that make sense?

Dialogue on What It Means For Something to Have A Function/Purpose

johnswentworth12d50

Was this intended to respond to any particular point, or just a general observation?

Corrigibility = Tool-ness?

johnswentworth12d30

My current starting point would be standard methods for decomposing optimization problems, like e.g. the sort covered in this course.

Alignment: "Do what I would have wanted you to do"

johnswentworth14d310

No, because we have tons of information about which specific kinds of information on the internet are/aren't usually fabricated. It's not like we have no idea at all which internet content is more/less likely to be fabricated.

Information about, say, how to prove that there are infinitely many primes is probably not usually fabricated. It's standard basic material, there's lots of presentations of it, it's not the sort of thing which people usually troll about. Yes, the distribution of internet text about the infinitude of primes contains more-than-zero trolling and mistakes and the like, but that's not the typical case, so low-temperature sampling from the LLM should usually work fine for that use-case.

On the other end of the spectrum, "fusion power plant blueprints" on the internet today will obviously be fictional and/or wrong, because nobody currently knows how to build a fusion power plant which works. This generalizes to most use-cases in which we try to get an LLM to do something (using only prompting on a base model) which nobody is currently able to do. Insofar as the LLM is able to do such things, that actually reflects suboptimal next-text prediction on its part.
