Vanessa Kosoy

Research Lead at CORAL. Director of AI research at ALTER. PhD student in Shay Moran's group at the Technion (my PhD research and my CORAL/ALTER research are one and the same). See also Google Scholar and LinkedIn.

E-mail: {first name}@alter.org.il

Posts
  • Vanessa Kosoy's Shortform (6y)
  • Lectures on statistical learning theory for alignment researchers (2mo)
  • New Paper: Ambiguous Online Learning (5mo)
  • New Paper: Infra-Bayesian Decision-Estimation Theory (7mo)
  • [Closed] Gauging Interest for a Learning-Theoretic Agenda Mentorship Programme (9mo)
  • Video lectures on the learning-theoretic agenda (1y)
  • Linear infra-Bayesian Bandits (2y)
  • Which skincare products are evidence-based? (2y)
  • AI Alignment Metastrategy (2y)
  • Critical review of Christiano's disagreements with Yudkowsky (2y)
  • Learning-theoretic agenda reading list (2y)

Comments
Lambda Calculus Prior
Vanessa Kosoy · 3d

I think that the problem is in the way you define the prior. Here is an alternative proposal:

Given a lambda-term $t$, we can interpret it as defining a partial function $f_t \colon \{0,1\}^* \times \{0,1\} \times \mathbb{N} \rightharpoonup \mathbb{Q} \cap [0,1]$. This function works by applying $t$ to the (appropriately encoded) inputs, beta-reducing, and then interpreting the result as an element of $\mathbb{Q} \cap [0,1]$ using some reasonable encoding. It's a partial function because the reduction can fail to terminate or the output can violate the expected format.

Given $f \colon \{0,1\}^* \times \{0,1\} \times \mathbb{N} \rightharpoonup \mathbb{Q} \cap [0,1]$, we define the "corrected" function $\hat{f} \colon \{0,1\}^* \times \{0,1\} \times \mathbb{N} \rightharpoonup \mathbb{Q} \cap [0,1]$ as follows. (The goal here is to make it monotonic in the last argument, and also ensure that probabilities sum to $\leq 1$.) First, we write $f^{\max}(u,b,k) = x$ whenever (i) for all $i \leq k$, $(u,b,i) \in \mathrm{dom}(f)$ and (ii) $x = \max_{i \leq k} f(u,b,i)$. If there is no such $x$ (i.e. when condition (i) fails) then $f^{\max}(u,b,k)$ is undefined. Now, we have two cases:

  • When $\forall i \leq k \colon f^{\max}(u,0,i) + f^{\max}(u,1,i) \leq 1$ (in particular, the terms on the LHS are defined), we define $\hat{f}(u,b,k) := f^{\max}(u,b,k)$.
  • In other cases, we define $\hat{f}(u,b,k) := f^{\max}(u,b,j)$ where $j$ is maximal s.t. $(u,b,j)$ is in the former case. If there is no such $j$, we set $\hat{f}(u,b,k) := 0$. (See the sketch below.)
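For concreteness, here is a minimal Python sketch of the correction step (my own illustration, not part of the original comment). It models the partial function $f$ as a callable that returns None wherever $f$ is undefined (e.g. because beta-reduction did not halt within some budget); the names PartialF, f_max and hat_f are made up for the example.

```python
from fractions import Fraction
from typing import Callable, Optional

# Hypothetical model of a partial function f: {0,1}* x {0,1} x N -> Q cap [0,1].
# Returning None stands for "undefined" (e.g. the reduction did not terminate
# within whatever budget we allotted).
PartialF = Callable[[str, int, int], Optional[Fraction]]


def f_max(f: PartialF, u: str, b: int, k: int) -> Optional[Fraction]:
    """max_{i <= k} f(u, b, i), defined only if f(u, b, i) is defined for all i <= k."""
    vals = []
    for i in range(k + 1):
        v = f(u, b, i)
        if v is None:
            return None
        vals.append(v)
    return max(vals)


def hat_f(f: PartialF, u: str, b: int, k: int) -> Fraction:
    """The 'corrected' function: monotone in k, with hat_f(u,0,k) + hat_f(u,1,k) <= 1."""

    def good(j: int) -> bool:
        # "Former case": for every i <= j, both f_max values are defined and sum to <= 1.
        for i in range(j + 1):
            m0, m1 = f_max(f, u, 0, i), f_max(f, u, 1, i)
            if m0 is None or m1 is None or m0 + m1 > 1:
                return False
        return True

    if good(k):
        return f_max(f, u, b, k)
    # Otherwise fall back on the largest j < k that is still in the former case.
    for j in range(k - 1, -1, -1):
        if good(j):
            return f_max(f, u, b, j)
    return Fraction(0)
```

Monotonicity in $k$ then holds because $f^{\max}$ can only grow with $k$, while the fallback value stays frozen at the last index satisfying the sum constraint.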

We can now define the semimeasure $\mu_f$ by

$$\mu_f(b|u) := \lim_{k \to \infty} \hat{f}(u,b,k)$$

For $f = f_t$, this semimeasure is lower-semicomputable. Conversely, any lower-semicomputable semimeasure is of this form. Mixing these semimeasures according to our prior over lambda terms gives the desired Solomonoff-like prior.
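A correspondingly minimal sketch of the lower-semicomputability claim, reusing the hypothetical hat_f helper from the block above: since $\hat{f}$ is monotone in $k$, each value is a lower bound on the limit $\mu_f(b|u)$, and a finite truncation of the mixture (with whatever prior weights over lambda terms we chose) lower-bounds the full mixture.

```python
def mu_lower_bound(f: PartialF, u: str, b: int, k: int) -> Fraction:
    # hat_f(u, b, k) is non-decreasing in k, so every value is a lower bound on
    # mu_f(b | u) = lim_{k -> infinity} hat_f(u, b, k); larger k refines the bound.
    return hat_f(f, u, b, k)


def mixture_lower_bound(components, u: str, b: int, k: int) -> Fraction:
    # components: a finite list of (weight, f) pairs, e.g. a truncation of the
    # prior over lambda terms; weights are assumed to be Fractions summing to <= 1.
    return sum(w * mu_lower_bound(f, u, b, k) for w, f in components)
```

Dovetailing over lambda terms while increasing $k$ would turn this finite truncation into a genuine monotone approximation of the full mixture from below.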

Reply
Human Values ≠ Goodness
Vanessa Kosoy · 4d*

I agree, except that I don't think it's especially misleading. If I live on the 10th floor and someone is dangling a tasty cake two meters outside of my window (and suppose for the sake of the argument that it's offered free of charge), I won't just walk out of the window and fall to my death. This doesn't mean I'm not following my values; it just means I'm actually thinking through the consequences rather than reacting impulsively to every value-laden thing.

Reply
Turing-Complete vs Turing-Universal
Vanessa Kosoy · 6d

...The prototypical example of a prior based on Turing machines is Solomonoff's prior. Someone not familiar with the distinction between Turing-complete and Turing-universal might naively think that a prior based on lambda calculus would be equally powerful. It is not so. Solomonoff's prior guarantees a constant Bayes loss compared to the best computable prior for the job. In contrast, a prior based on lambda calculus can guarantee only a multiplicative loss.

 

Can you please make this precise?

When I think of "a prior based on lambda calculus", I imagine something like the following. First, we choose some reasonable complexity measure $C$ on lambda terms, such as:

  • For a variable $x$, we define $C(x) := 1$
  • For two terms $t, s$, we define $C(ts) := C(t) + C(s)$
  • For a term $t$ and a variable $x$, we define $C(\lambda x.t) := C(t) + 1$

Denote the set of lambda-terms by $\Lambda$. We then choose $\beta > 0$ s.t. $\sum_{t \in \Lambda} e^{-\beta C(t)} \leq 1$.

Now, we choose some reasonable way to describe lower-semicomputable semimeasures using lambda terms, and make the prior probabilities of different lambda terms proportional to $e^{-\beta C(t)}$. It seems to me that the resulting semimeasure dominates every lower-semicomputable semimeasure and is arguably "as good as" the Solomonoff prior. What am I missing?
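As a concrete reading of the definitions, here is a small Python sketch of the complexity measure $C$ and the corresponding unnormalized prior weight $e^{-\beta C(t)}$ (the AST encoding, the helper names, and the treatment of $\beta$ are my own choices for illustration; the sketch does not verify that $\sum_t e^{-\beta C(t)} \leq 1$).

```python
from dataclasses import dataclass
from math import exp
from typing import Union


# A minimal lambda-term AST, just enough to express the complexity measure above.
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"


@dataclass(frozen=True)
class Lam:
    var: str
    body: "Term"


Term = Union[Var, App, Lam]


def C(t: Term) -> int:
    """C(x) = 1, C(t s) = C(t) + C(s), C(lambda x. t) = C(t) + 1."""
    if isinstance(t, Var):
        return 1
    if isinstance(t, App):
        return C(t.fn) + C(t.arg)
    return C(t.body) + 1


def prior_weight(t: Term, beta: float) -> float:
    """Unnormalized prior weight e^{-beta * C(t)} of the lambda term t."""
    return exp(-beta * C(t))


# Example: the term K = lambda x. lambda y. x has C(K) = 1 + 1 + 1 = 3.
K = Lam("x", Lam("y", Var("x")))
assert C(K) == 3
```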

Reply
Alexander Gietelink Oldenziel's Shortform
Vanessa Kosoy · 11d

Contemporary AI is smart in some ways and dumb in other ways. It's a useful tool that you should integrate into your workflow if you don't want to miss out on productivity. However. I'm worried that exposure to AI is dangerous in similar ways to how exposure to social media is dangerous, only more. You're interacting with something designed to hijack your attention and addict you. Only this time the "something" has its own intelligence that is working towards this purpose (and possibly other, unknown, purposes).

As to the AI safety space: we've been saying for decades that AI is dangerous and now you're surprised that we think AI is dangerous? I don't think it's taking over the world just yet, but that doesn't mean there are no smaller-scale risks. It's dangerous not because it's dumb (the fact it's still dumb is the saving grace) but precisely because it's smart.

My own approach is: use AI in clear, compartmentalized ways. If you have a particular task which you know can be done faster by using AI in a particular way, by all means, use it. (But, do pay attention to time wasted on tweaking the prompt etc.) Naturally, you should also occasionally keep experimenting with new tasks or new ways of using it. But, if there's no clear benefit, don't use it. If it's just to amuse yourself, don't. And, avoid exposing other people if there's no good reason.

Reply
Legible vs. Illegible AI Safety Problems
Vanessa Kosoy · 14d

This frame seems useful, but might obscure some nuance:

  • The systems we should be most worried about are the AIs of tomorrow, not the AIs of today. Hence, some critical problems might not manifest at all in today's AIs. You can still say it's a sort of "illegible problem" of modern AI that it's progressing towards a certain failure mode, but that might be confusing.
  • While it's true that deployment is the relevant threshold for the financial goals of a company, making it crucial for the company's decision-making and available resources for further R&D, the dangers are not necessarily tied to deployment. It's possible for a world-ending event to originate during testing or even during training.
Reply
Human Values ≠ Goodness
Vanessa Kosoy · 16d

I mostly agree with this, the part which feels off is

I’d like to say here “screw memetic egregores, follow the actual values of actual humans”

Humans already follow their actual Values[1], and always will, because their Values are the reason they do anything at all. They also construct narratives about themselves that involve Goodness, and sometimes deny the distinction between Goodness and Values altogether. This act of (self-)deception is in itself motivated by the Values, at least instrumentally.

I do have a version of the “screw memetic egregores” attitude, which is, stop self-deceiving. Because, deception distorts epistemics, and we cannot afford distorted epistemics right now. It's not necessarily correct advice for everyone, but I believe it's correct advice for everyone who is seriously trying to save the world, at least.

Another nuance is that, in addition to empathy and naive tit-for-tat, there is also acausal tit-for-tat. This further pushes the Value-recommended strategy in the direction of something Goodness-like (in certain respects), even though ofc it doesn't coincide with the Goodness of any particular culture in any particular historical period.

  1. ^

    As Steven Byrnes wrote, "values" might not be the best term, but I will keep it here.

Reply
Wei Dai's Shortform
Vanessa Kosoy · 18d

No, it's not at all the same thing as OpenAI is doing. 

First, OpenAI is working with a methodology that's completely inadequate for solving the alignment problem. I'm talking about racing to actually solve the alignment problem, not racing to any sort of superintelligence that our wishful thinking says might be okay.

Second, when I say "racing" I mean "trying to get there as fast as possible", not "trying to get there before other people". My race is cooperative, their race is adversarial.

Third, I actually signed the FLI statement on superintelligence. OpenAI hasn't.

Obviously any parallel efforts might end up competing for resources. There are real trade-offs between investing more in governance vs. investing more in technical research. We still need to invest in both, because of diminishing marginal returns. Moreover, consider this: even the approximately-best-case scenario of governance only buys us time; it doesn't shut down AI forever. The ultimate solution has to come from technical research.

Reply
Wei Dai's Shortform
Vanessa Kosoy · 18d*

I'm using the term "meta-ethics" in the standard sense of analytic philosophy. Not sure what bothers you so greatly about it.

I find your manner of argumentation quite biased: you preemptively defend yourself by radical skepticism against any claim you might oppose, but when it comes to a claim you support (in this case "ethical realism is false"), suddenly this claim is "pretty close to analytic". The latter maneuver seems to me the same thing as the "Obviously Right" you criticize later.

Also, this brand of radical skepticism is an example of the Charybdis I was warning against. Of course you can always deny that anything matters. You can also deny Occam's razor or the evidence of your own eyes or even that 2+2=4. After all, "there's no predefined standard for standards". (I guess you might object that your reasoning only applies to value-related claims, not to anything strictly value-neutral: but why not?)

Under the premises of radical skepticism, why are we having this debate? Why did you decide to reply to my comment? If anyone can deny anything, why would any of us accept the other's arguments?

To have any sort of productive conversation, we need to be at least open to the possibility that some new idea, if you delve deeply and honestly into understanding it, might become persuasive by the force of the intuitions it engenders and its inner logical coherence combined. To deny the possibility preemptively is to close the path to any progress.

As to your "(b) there's a bunch of empirical evidence against it", I honestly don't know what you're talking about there.

P.S.

I wish to also clarify my positions on a slightly lower level of meta.

First, "ethics" is a confusing term because, on my view, the colloquial meaning of "ethics" is inescapably intertwined with how human societies negotiate of over norms. On the other hand, I want to talk purely about individual preferences, since I view it as more fundamental. 

We can still distinguish between "theories of human preferences" and "metatheories of preferences", similarly to the distinction between "ethics" and "meta-ethics". Namely, "theories of human preferences" would have to describe actual human preferences, whereas "metatheories of preferences" would only have to describe what it even means to talk about someone's preferences at all (whether this someone is human or not: among other things, such a metatheory would have to establish what kinds of entities have preferences in a meaningful sense).

The relevant difference between the theory and the metatheory is that Occam's razor is only fully applicable to the latter. In general, we should expect simple answers to simple questions. "What are human preferences?" is not a simple question, because it references the complex object "human". On the other hand "what does it mean to talk about preferences?" does seem to me to be a simple question. As an analogy, "what is the shape of Africa?" is not a simple question because it references the specific continent of Africa on the specific planet Earth, whereas "what are the general laws of continent formation" is at least a simpler question (perhaps not quite as simple, since the notion of "continent" is not so fundamental).

Therefore, I expect there to be a (relatively) simple metatheory of preferences, but I do not expect there to be anything like a simple theory of human preferences. This is why this distinction is quite important.

Reply
Wei Dai's Shortform
Vanessa Kosoy · 19d

Your failure to distinguish ethics from meta-ethics is the source of your confusion (or at least one major source). When you say "ethical realism is false", you're making a meta-ethical statement. You believe this statement is true, hence you perforce must believe in meta-ethical realism.

Reply
Wei Dai's Shortform
Vanessa Kosoy · 19d

Strong disagree.

We absolutely do need to "race to build a Friendly AI before someone builds an unFriendly AI". Yes, we should also try to ban Unfriendly AI, but there is no contradiction between the two. Plans are allowed (and even encouraged) to involve multiple parallel efforts and disjunctive paths to success.

It's not that academic philosophers are exceptionally bad at their jobs. It's that academic philosophy historically did not have the right tools to solve the problems. Theoretical computer science, and AI theory in particular, is a revolutionary method to reframe philosophical problems in a way that finally makes them tractable.

About "metaethics" vs "decision theory", that strikes me as a wrong way of decomposing the problem. We need to create a theory of agents. Such a theory naturally speaks both about values and decision making, and it's not really possible to cleanly separate the two. It's not very meaningful to talk about "values" without looking at what function the values do inside the mind of an agent. It's not very meaningful to talk about "decisions" without looking at the purpose of decisions. It's also not very meaningful to talk about either without also looking at concepts such as beliefs and learning.

As to "gung-ho attitude", we need to be careful both of the Scylla and the Charybdis. The Scylla is not treating the problems with the respect they deserve, for example not noticing when a thought experiment (e.g. Newcomb's problem or Christiano's malign prior) is genuinely puzzling and accepting any excuse to ignore it. The Charybdis is perpetual hyperskepticism / analysis-paralysis, never making any real progress because any useful idea, at the point of its conception, is always half-baked and half-intuitive and doesn't immediately come with unassailable foundations and justifications from every possible angle. To succeed, we need to chart a path between the two.

Reply
Wikitag Contributions

  • Derivative (4 months ago, +11/-1)