All of Vanessa Kosoy's Comments + Replies

Did they or didn't they learn tool use?

On page 28 they say:

Whilst some tasks do show successful ramp building (Figure 21), some hand-authored tasks require multiple ramps to be built to navigate up multiple floors which are inaccessible. In these tasks the agent fails.

From this, I'm guessing that it sometimes succeeds in building one ramp, but fails when the task requires building multiple ramps.

2 Daniel Kokotajlo 2d: Nice, I missed that! Thanks!
DeepMind: Generally capable agents emerge from open-ended play

I don't see what the big deal is about laws of physics. Humans and all their ancestors evolved in a world with the same laws of physics; we didn't have to generalize to different worlds with different laws. Also, I don't think "be superhuman at figuring out the true laws of physics" is on the shortest path to AIs being dangerous. Also, I don't think AIs need to control robots or whatnot in the real world to be dangerous, so they don't even need to be able to understand the true laws of physics, even on a basic level.

The entire novelty of this work revol... (read more)

4 Quintin Pope 3d: What really impressed me were the generalized strategies the agent applied to multiple situations/goals. E.g., "randomly move things around until something works" sounds simple, but learning to contextually apply that strategy 1. to the appropriate objects, 2. in scenarios where you don't have a better idea of what to do, and 3. immediately stopping when you find something that works is fairly difficult for deep agents to learn. I think of this work as giving the RL agents a toolbox of strategies that can be flexibly applied to different scenarios. I suspect that finetuning agents trained in XLand in other physical environments will give good results because the XLand agents already know how to use relatively advanced strategies. Learning to apply the XLand strategies to the new physical environments will probably be easier than starting from scratch in the new environment.
DeepMind: Generally capable agents emerge from open-ended play

This is certainly interesting! To put things in proportion though, here are some limitations that I see, after skimming the paper and watching the video:

  • The virtual laws of physics are always the same. So, the sense in which this agent is "generally capable" is only via the geometry and the formal specification of the goal. Which is still interesting to be sure! But not as big a deal as it would be if it did zero-shot learning of physics (which would be an enormous deal IMO).
  • The formal specification is limited to propositional calculus. This allows for
... (read more)

Thanks! This is exactly the sort of thoughtful commentary I was hoping to get when I made this linkpost.

--I don't see what the big deal is about laws of physics. Humans and all their ancestors evolved in a world with the same laws of physics; we didn't have to generalize to different worlds with different laws. Also, I don't think "be superhuman at figuring out the true laws of physics" is on the shortest path to AIs being dangerous. Also, I don't think AIs need to control robots or whatnot in the real world to be dangerous, so they don't even need to be ... (read more)

My Marriage Vows

First, negotiations theory has progressed past game theory solutions to a more psychologically based methodology.

Hmm. Do you have a reference which is not, like, an entire book?

Second, the second vow focuses too much on the target (a KS solution bargain to the disagreement) and too little on the process.

Well, the process is important, but I feel like the discourse norms exemplified by this community already have us covered there, give or take.

Third, KS systems (like other game theory approaches) are difficult to quantify. It's hard to assign a do

... (read more)
My Marriage Vows

Ah, super fair. Splitting any outside income 50/50 would still work, I think.

I think that a 50/50 split creates the wrong incentives, but I am reluctant to discuss this in public.

PS: Of course this was also prompted by us nerding out about your and Marcus's vows so thank you again for sharing this. I'm all heart-eyes every time I think about it!


My Marriage Vows

I had an extremely painful and emotional divorce myself, so I am aware. Although, I tend to reject the idea that emotions prevent you from thinking straight. I think that's a form of strategic self-deception.

Strictly speaking, the vows don't say all decisions must be unanimous (although if they aren't it becomes kinda tricky to define the bargaining solution). However, arguably, if both of us follow the vows and we have common knowledge about this, we should arrive at unanimous decisions[1]. This is the desirable state. On the other hand, it's also possibl... (read more)

My Marriage Vows

Let me first state, that this is quite inspiring!

Thank you!

70% of the couples that marry these days (meaning, Millennials and Generation X) are subject to get a divorce within a decade...

I wanted to inquire, without judgement, how do you reconcile with this fact?

I am divorced myself, and my previous marriage lasted about a decade. Still, I don't know if there's much to reconcile. Obviously there is always a risk that the marriage will fail. Equally obviously, staying without a primary lover forever is a worse alternative (for me).

Previous generati

... (read more)
My Marriage Vows

One way to interpret this is "I will do my best effort to follow the optimal policy". On the other hand, when you're optimizing for just your own utility function, one could argue that the "best effort" is exactly equal to the optimal policy once you take constraints and computational/logical uncertainty into account. On the third hand, perhaps for bargaining the case for identifying "best effort" and "optimal" is weaker. In practice, what's important is that even if you followed a suboptimal policy for a while, there's a well-defined way to return to opti... (read more)

My Marriage Vows

In the original paper they have "Assumption 4" which clearly states they disregard solutions that don't dominate the disagreement point. But, you have a good point that when those solutions are taken into account, you don't really have monotonicity.

My Marriage Vows

First of all, this is awesome.

Thank you :)

It seems kind of odd that terrible solutions like (1000, -10^100) could determine the outcome (I realize they can't be the outcome, but still).

I think you might be misunderstanding how KS works. The "best" values in KS are those that result when you optimize one player's payoff under the constraint that the second player's payoff is higher than the disagreement payoff. So, you completely ignore outcomes where one of us would be worse off in expectation than if we didn't marry.

3 ADifferentAnonymous 8d: I'm not sure this is the case? Wiki does say "It is assumed that the problem is nontrivial, i.e., the agreements in [the feasible set] are better for both parties than the disagreement", but this is ambiguous as to whether they mean some or all. Googling further, I see graphs where non-Pareto-improvement solutions visibly do count. I agree that your version seems more reasonable, but I think you lose monotonicity over the set of all policies, because a weak improvement to player 1's payoffs could turn a (-1, 1000) point into a (0.1, 1000) point, make it able to affect the solution, and make the solution for player 1 worse. Though you'll still have monotonicity over the restricted set of policies.
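To make the two points in this exchange concrete, here is a minimal Python sketch over a small finite set of outcomes. Everything in it is an assumption made for illustration: the helper name `ks_discrete`, the sample payoffs, and the use of "maximize the smaller normalized gain" as a discrete stand-in for Kalai-Smorodinsky. The real solution concept works on a convex feasible set that includes lotteries over outcomes, so the numbers here are only indicative.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def ks_discrete(outcomes: List[Point], disagreement: Point,
                ignore_dominated: bool = True) -> Point:
    """Discrete stand-in for Kalai-Smorodinsky bargaining (illustrative only).

    Picks the outcome that maximizes the smaller of the two normalized gains,
    where gains are measured from the disagreement point and normalized by
    each player's "best" attainable payoff.
    """
    d1, d2 = disagreement
    # Outcomes where neither player is worse off than under disagreement.
    rational = [(a, b) for (a, b) in outcomes if a >= d1 and b >= d2]
    if not rational:
        return disagreement
    # Which outcomes are allowed to set each player's "best" value.
    reference = rational if ignore_dominated else outcomes
    best1 = max(a for a, _ in reference)
    best2 = max(b for _, b in reference)

    def score(p: Point) -> Tuple[float, float]:
        g1 = (p[0] - d1) / (best1 - d1) if best1 > d1 else 1.0
        g2 = (p[1] - d2) / (best2 - d2) if best2 > d2 else 1.0
        return (min(g1, g2), g1 + g2)  # second entry only breaks ties

    return max(rational, key=score)

d = (0.0, 0.0)
base = [(10.0, 2.0), (6.0, 6.0), (2.0, 10.0)]

# A terrible outcome for player 2 is ignored when computing the "best" values...
with_bad = base + [(1000.0, -1e100)]
filtered = ks_discrete(with_bad, d)
# ...but it would drag the result toward player 1 if it were allowed to count.
unfiltered = ks_discrete(with_bad, d, ignore_dominated=False)

# Monotonicity: (-1, 1000) is ignored, but its improvement (0.1, 1000) is not.
before = ks_discrete(base + [(-1.0, 1000.0)], d)
after = ks_discrete(base + [(0.1, 1000.0)], d)
print(filtered, unfiltered, before, after)
```

In this toy setup the filtered version settles on the symmetric outcome, while letting the dominated point count shifts the result toward player 1. The last two calls show the monotonicity worry: weakly improving player 1's payoff in one outcome makes that outcome count, and the selected outcome then gives player 1 less than before.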
My Marriage Vows

We do have margin for minor violations of the vows, as long as they are not "unconscionable". Granted, we don't have a precise definition of "unconscionable", but certainly if both of us agree that a violation is not unconscionable then it isn't.

My Marriage Vows

Marcus has a chronic illness. This means their contribution to the household can vary unpredictably, practically on any timescale. As a result, it's hard to think of any split that's not going to be skewed in one or the other direction in some scenarios. Moreover, they are unable to hold a job, so their time doesn't have opportunity cost in a financial sense.

2 dreeves 7d: Ah, super fair. Splitting any outside income 50/50 would still work, I think. But maybe that's not psychologically right in y'all's case, I don't know. For Bee and me, the ability to do pure utility transfers feels like powerful magic! Me to Bee while hashing out a decision auction today that almost felt contentious, due to messy bifurcating options, but then wasn't: I love you and care deeply about your utility function and if I want to X more than you want to Y then I vow to transfer to you U_you(Y)-U_you(X) of pure utility! [Our decision auction mechanism in fact guarantees that.] Then we had a fun philosophical discussion about how much better this is than the Hollywood concept of selfless love where you set your own utility function to all zeros in order for the other's utility function to dominate. (This falls apart, of course, because of symmetry. Both of us do that and where does that leave us?? With no hair, an ivory comb, no watch, and a gold watchband, is where!)
My Marriage Vows

Our first question is whether you intend to merge your finances.

We do, at least because I'm the only one who has income.

My next question is why the KS solution vs the Nash solution to the bargaining problem?

I'm actually not sure about this. Initially I favored KS because monotonicity seemed more natural than independence of irrelevant alternatives. But then I realized that in sequential decision making, IIA is important because it allows you to consistently optimize your policy on a certain branch of the decision tree even if you made suboptimal act... (read more)

3 dreeves 8d: Ooh, this is exciting! We have real disagreements, I think! It might all be predicated on this: Rather than merge finances, include in your vows an agreement to, say, split all outside income 50/50. Or, maybe a bit more principled, explicitly pay your spouse for their contributions to the household. One way or another, rectify whatever unfairness there is in the income disparity directly, with lump-sum payments. Then you have financial autonomy and can proceed with mechanisms and solution concepts that require transferrable utility!
My Marriage Vows

Firstly I think that " 's " is a grammar mistake and it should just read "...or until my [spouse] breaks..." instead.

You're right, thanks!

Allowing yourself to cancel following your vows because your spouse willfully stopped following theirs is a little dangerous. It leads to situations where you might rather justify your own breach of the vows by pointing to their breach instead of trying to make things right.

I agree that it's a possible failure mode, but the alternative seems worse? Suppose that my spouse starts completely disregarding the vows and breaking them egregiously. Do you really think I should still follow my own vows to the letter?

1 frontier64 6d: Yes I do think you should follow your vows to the letter even if your spouse is breaking them egregiously. I have strong feelings about this, but I'm not sure if I have a good explanation as to why. It's my general feeling that you really shouldn't be able to consider any sort of exit plan for a marriage. Of course you definitely do need an exit plan, but it shouldn't be something that you're aware of until it's necessary. A marriage is different from a typical mutually beneficial contract. A marriage should partially realign the husband and wife's utility functions such that expected utility for one spouse counts for substantial expected utility to the other spouse. So unless your spouse is behaving so egregiously that you're losing enough expected utility from the marriage to put you below your disagreement point, violating your vows shouldn't come into play. But of course at that point you would be considering divorce anyway if you thought the situation couldn't be fixed while you remain in the marriage. I think that's the crux of it for me: if you don't have breaking your vows or divorce on the table you'll really try to fix whatever issues you have in the marriage (if there are issues) before you have to go nuclear. As I've said I don't quite understand my own position in a straightforward sense so don't give it too much weight. I'm not sure if my explanation for why is really rational or just a rationalization. Thanks for the post and congratulations!
My Marriage Vows

We, as in humans, are poorly defined, barely conscious, irrational, lumps of meat. We are not aware of our own utility functions let alone those of others, especially as they change over time and chaotically in the course of a day. We are unable to follow a precise recipe like the one you have outlined.

I'm not convinced. I have a rather favorable view of human agency and rationality compared to the distribution of opinions in this community, and I think it's not the place to hash out these differences. For our present purpose, just assume that we are ab... (read more)

1 Geoffrey Wood 4d: I agree that marriage is an unwritten contract. I mean, you literally sign something as part of the ceremony that legally binds you in the eyes of the government after making serious promises in front of everyone you care about. In some ways this contract is already agreed to well before the wedding, during the period you are dating, living together and sorting out what each other thinks about things. Nevertheless, due to my more pessimistic view of human agency, I wouldn't write it up, instead relying on constant good communication about each other's feelings on things. (Not implying that this is a perfect recipe or that people should be blamed for being bad communicators if a relationship fails; sometimes there truly are irreconcilable differences.) I've been thinking about this over the last few days and I feel that the need to get it all nailed down in this manner could perhaps be coming from a place of insecurity? It might be an idea to address this separately? I'd like to say that anyone who could write vows like the ones above with their partner is probably in an excellent place in their relationship. In your other responses I saw that you are the primary breadwinner and that your beloved is at least partially dependent on you. This situation is similar to one of my friends and I don't think they have been handling it well. I think he doesn't realise the extent of the power imbalance this causes. His weakly held opinions have more impact on his wife than he realises and I think that her relative lack of ability to argue with him has caused them to make some poor decisions in the past. I know this is slightly off-topic unsolicited advice, but it might be helpful to you to realise (on the off chance that you hadn't already thought about it in this way). I wish you and your fiancé the best in your life together :)
My Marriage Vows

Very interesting - and congratulations!

Thank you :)

It strikes me that the first vow will sometimes conflict with the second.

Well, yes, the intent is that the Vow of Honesty takes precedence over the Vow of Concord.

Have you considered going meta? "I make the set of vows determined by the Kalai-Smorodinski solution to the bargaining problem..."

I'm not sure what's the difference between "set of vows" and "policy"? When I say "policy" I refer to the set of behaviors we are actually capable of choosing from, including computational and other constraints.

3 Joe_Collman 8d: Ah ok, if the honesty vow takes precedence. I still think it's a difficult one in edge cases, but I don't see effective resolutions that do better than using vows 2 and 3 to decide on those. The point isn't in choosing "set of vows" over "policy", but rather in choosing "I make the set of vows..." over "Everything I do will be according to...". You're able to make the set of vows (albeit implicitly), and the vows themselves will have the optimal amount of wiggle-room, achievability, flexibility, emphasis on good faith... built in. To say "Everything I do will be according to..." seems to set the bar unachievably high, since it just won't be true. You can aim in that direction, but your actions won't even usually be optimal w.r.t. that policy. (Thoughts on trying-to-try notwithstanding, I do think vows that are taken seriously should at least be realistically possible to achieve.) To put it another way, to get the "Everything I do..." formulation to be equivalent to the "I make the set of vows..." formulation, I think the former would need to be self-referential - i.e. something like "... according to the policy which is the KS solution... given its inclusion in this vow". That self-reference will insert the optimal degree of wiggle-room etc. I think you need either the extra indirection or the self-reference (or I'm confused, which is always possible :)).
My Marriage Vows

A more precise formulation would be: "when choosing what information to pass on, optimize solely for your best estimate of the spouse's utility function".

My Marriage Vows

That's why I wrote "in the counterfactual in which the source of said doubt or dispute would be revealed to us and understood by us with all of its implications at that time as well as we understand it at the time it actually surfaced", so we do use the new information and experience.

The reason I want to anchor it to our present selves is because at present we are fairly aligned. We have a pretty good common understanding of what we want from these vows. On the other hand, the appearance of a dispute in the future might be the result of us becoming unaligned... (read more)

My Marriage Vows

So then, these vows could only be made if you have an extremely high level of already having untangled yourself / the elephant, such that it's even possible for you to not (self-)deceive.

I believe that it's always possible for you to not self-deceive. The only real agent is the "elephant". The conscious self is just a "mask" this agent wears, by choice. It can equally well choose to wear a different mask if that benefits it.

What's "on purpose" doing here?

I just mean that there is an intent to deceive, rather than an accidental miscommunication.

3 TekhneMakre 8d: What I'm saying is, do you think that there's no ongoing deep hidden deception (or, situation that would call forth deception) in you or your spouse? This seems possible to me; it's just that empirically it's very rare. I'm wondering if your vows are proofed against this possibility. Maybe you don't think the probability is high enough to worry about; maybe you think the vow ought to be nullified / broken if there is such deception; maybe you mean to say, yes it was a breach to make this vow given that there was hidden deception, and you'll repair it. Maybe this is how vows are supposed to work--making them, knowing that there's a good chance they'll be partly broken, and then working to uphold them with the understanding that the good faith clause will keep the agreement intact--rather than trying to explicitly say what (/whether) there's circumstances in which the agreement is definitively not intact. IDK. I guess my worry is that hidden deceptions (that is, a deception that you're doing but aren't aware of, i.e. don't have clear access to with most of your mind) will adaptively keep themselves hidden if there's no clear recourse for keeping the agreement intact (including an amicable separation) when they become revealed.
My Marriage Vows

Thank you for sharing. I'm sorry it worked out so poorly for you!

It sounds like your situation was not at all Pareto efficient? If so, this Vow of Concord would not preclude you from divorce? Notice that the Vow does not say that both spouses must locally prefer divorce for divorce to happen. It only says that divorce must be part of the bargaining-optimal policy.

For example, consider the following scenario:

  • If we wouldn't get married, our payoffs would be .
  • With probability we will have a mutually beneficial marriage in which each has payoff .
  • With
... (read more)

I have a hard time trusting any mere humans to think straight on the decision theory of divorce; the stakes are so high that emotions come to the fore.

There must be conditions, even conditions short of abuse, where unilateral exit is allowed regardless of whether the other thinks that is a mistake. The conditions are a safety valve for motivated thinking. They can be things like "if you're miserable, having more fights than intimacy, have tried couples therapy for at least 6 months, stayed apart for a month and felt better alone, then you can divorce if yo... (read more)

My Marriage Vows

Well, at any given moment we will use the best-guess decision theory we have at the time.

My Marriage Vows

To phrase my intent more precisely: whatever the decision theory we will come to believe in[1] is, we vow to behave in a way which is the closest analogue in that decision theory of the formal specification we gave here in the framework of ordinary Bayesian sequential decision making.

  1. It is also possible we will disagree about decision theory. In that case, I guess we need to defer to whatever is the most concrete "metadecision theory" we can agree upon. ↩︎

2 Daniel Kokotajlo 10d: I like where you are going with this. One issue with that phrasing is that it may be hard to fulfill that vow, since you don't yet know what decision theory you will come to believe in.
My Marriage Vows

Why impossible or undesirable?

A related thing that came up in our discussion after I wrote this post is how to apply the Vow of Concord in the face of utility functions that change over time. The amendment we tentatively agreed on is: if the utility functions change, we do new KS bargaining where the disagreement point is the policy that resulted from the previous bargaining. This choice of disagreement point avoids perverse incentives to change your own utility function.

On a more pedestrian note, I was previously married for 10 years, so I'm not completely naïve in that regard.

1 ofer 5d: That seems like a very important point. Also, you may end up living for more than a billion years (via future technology). The fraction of your future life in which your ~preferences/goal-system will be similar to your current ones may be extremely small.
4 Ericf 9d: Committing to a decision algorithm now implies that you expect to do worse in the future. Even though future you will have more information and experience. And, as you noted, potentially a different utility function. And, as a practical matter, are you even capable of making decisions as-if you were yourself in the past?
My Marriage Vows

Well, this is bounded rationality: the optimization we're talking about is understood to be within the computational constraints of humans. As to including an explicit Vow of Forgiving, I am concerned it might be too exploitable.

My Marriage Vows

Idk, I have a bad feeling about this, for reasons I attempted to articulate in this post.

I'm not sure how commitment races are relevant here? We're not committing against each other here, we're just considering the set of all possible mutual commitments to compute the Pareto frontier. If you apply this principle to Chicken then the result is, flip a coin to determine who goes first and let them go first, there's no "throwing out the steering wheel" dynamics. Or, you mean commitment races between us and other agents? The intent here is making decision th... (read more)
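As a small illustration of the Chicken remark, the sketch below computes expected payoffs for the "flip a coin to decide who goes first" policy. The payoff numbers and the names `PAYOFFS` and `coin_flip_policy` are hypothetical, chosen only so that the coin flip can be compared with both players yielding; nothing here comes from the original post.

```python
from typing import Dict, Tuple

# Hypothetical payoffs as (player 1, player 2); the numbers are assumptions.
PAYOFFS: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("go", "go"): (-10.0, -10.0),    # crash
    ("go", "yield"): (4.0, 1.0),
    ("yield", "go"): (1.0, 4.0),
    ("yield", "yield"): (2.0, 2.0),
}

def coin_flip_policy() -> Tuple[float, float]:
    """Expected payoffs when a fair coin picks who goes and the other yields."""
    a = PAYOFFS[("go", "yield")]
    b = PAYOFFS[("yield", "go")]
    return (0.5 * a[0] + 0.5 * b[0], 0.5 * a[1] + 0.5 * b[1])

expected = coin_flip_policy()
print(expected)
```

With these assumed numbers the jointly agreed coin flip gives each player a higher expected payoff than both yielding, and the crash outcome never occurs, which is the sense in which no one needs to "throw out the steering wheel".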

6 Daniel Kokotajlo 10d: Ah, good, that negates most of my concern. If you didn't already you should specify that this only applies to your actions and commitments "towards each other." This is an awkward source of vagueness perhaps, since many actions and commitments affect both your spouse and other entities in the world and thus are hard to classify. Re: the usefulness of precision: Perhaps you could put a line at the end of the policy that says "We aren't actually committing to all that preceding stuff. However, we do commit to take each other's interests into account to a similar extent to the extent implied by the preceding text."
My Marriage Vows

Another potential failure mode: you will think that you're not deceiving your partner in some area while actually deceiving them. According to "The Elephant in the Brain", this is probable.

I believe that essentially the elephant is the agent making all decisions, so it's the elephant taking the vows and bearing full responsibility for upholding them. Self-deception is not a valid excuse for deception.

The only exception to the latter is when this information was given to me in confidence by a third party as part of an agreement which was made in compl

... (read more)
1 TekhneMakre 9d:
> I believe that essentially the elephant is the agent making all decisions, so it's the elephant taking the vows and bearing full responsibility for upholding them. Self-deception is not a valid excuse for deception.
So then, these vows could only be made if you have an extremely high level of already having untangled yourself / the elephant, such that it's even possible for you to not (self-)deceive. Are the vows assuming this? If not, maybe there should be a clause describing a derivative or trajectory, rather than a state. In other words, how sure are you that you / they aren't already deceiving each other about some important stuff?
> set out to deceive my [spouse] on purpose
Maybe you're saying "set out", meaning, once the marriage starts, there won't be any *new* deception. Hard to tell how the boundary is drawn, if a preexisting deep deception could spin up new shallow deceptions (without you explicitly noticing this, i.e. being in bad faith). What's "on purpose" doing here? It sort of sounds like "on purpose (...but if it's the elephant, not *me*, then it's less bad)", which I don't think you want to say?
What would it look like if it looked like AGI was very near?

I think that the only known quantum speedup for relatively generic tasks is from Grover's algorithm, which only gives a quadratic speedup. That might be significant some day, or not, depending on the cost of quantum hardware. When it comes to superpolynomial speed-ups, it is very much an active field of study which tasks are relevant, and as far as we know it's only some very specialized tasks like integer factoring. A bunch of people are trying to apply QC to ML but AFAIK it's still anyone's guess whether that will end up being significant.
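To put "only a quadratic speedup" in numbers, here is a rough Python sketch comparing query counts for unstructured search over N items with a single marked item: about (N + 1)/2 checks on average classically, versus roughly (π/4)·√N oracle calls for Grover's algorithm. The function names are illustrative, and query counts say nothing about the relative cost of quantum and classical hardware.

```python
import math

def classical_expected_queries(n: int) -> float:
    """Average checks to find one marked item among n, testing without repeats."""
    return (n + 1) / 2

def grover_iterations(n: int) -> int:
    """Approximate number of Grover iterations for one marked item among n."""
    return math.floor(math.pi / 4 * math.sqrt(n))

for n in (10**6, 10**12):
    print(n, classical_expected_queries(n), grover_iterations(n))
```

For a million items this is roughly half a million classical checks against several hundred Grover iterations. The gap is real but polynomial, so whether it matters in practice depends on how much slower and more expensive each quantum operation is.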

3 gwern 15d: And some of the past QC claims for ML have not panned out. Like, I think there was a Quantum Monte Carlo claimed to be potentially useful for ML which could be done on cheaper QC archs, but then it turned out to be doable classically...? In any case, I have been reading about QCs all my life, and they have yet to become relevant to anything I care about; and I assume Scott Aaronson will alert us should they suddenly become relevant to AI/ML/DL, so the rest of us should go about our lives until that day.
BASALT: A Benchmark for Learning from Human Feedback

It's not "from zero" though, I think that we already have ML techniques that should be applicable here.

BASALT: A Benchmark for Learning from Human Feedback

if I thought we could build task-specific AI systems for arbitrary tasks, and only super general AI systems were dangerous, I'd be advocating really hard for sticking with task-specific AI systems and never building super general AI systems

The problem with this is that you need an AI whose task is "protect humanity from unaligned AIs", which is already very "general" in a way (i.e. requires operating on large scales of space, time and strategy). Unless you can effectively reduce this to many "narrow" tasks which is probably not impossible but also not easy.

BASALT: A Benchmark for Learning from Human Feedback

The AI safety community claims it is hard to specify reward functions... But for real-world deployment of AI systems, designers do know the task in advance!

Right, but you're also going for tasks that are relatively simple and easy. In the sense that, "MakeWaterfall" is something that I can, based on my own experience, imagine solving without any ML at all (but ofc going to that extreme would require massive work). It might be that for such tasks solutions using handcrafted rewards/heuristics would be viable, but wouldn't scale to more complex tasks. If ... (read more)

4 rohinmshah 21d: I agree that's possible. Tbc, we did spend some time thinking about how we might use handcrafted rewards / heuristics to solve the tasks, and eliminated a couple based on this, so I think it probably won't be true here. No. For the competition, there's a ban on pretrained models that weren't publicly available prior to competition start. We look at participants' training code to ensure compliance. It is still possible to violate this rule in a way that we may not catch (e.g. maybe you use internal simulator details to do hyperparameter tuning, and then hardcode the hyperparameters in your training code), but it seems quite challenging and not worth the effort even if you are willing to cheat. For the benchmark (which is what I'm more excited about in the longer run), we're relying on researchers to follow the rules. Science already relies on researchers honestly reporting their results -- it's pretty hard to catch cases where you just make up numbers for your experimental results. (Also in the benchmark version, people are unlikely to write a paper about how they solved the task using special-case heuristics; that would be an embarrassing paper.)
BASALT: A Benchmark for Learning from Human Feedback

It's not quite as interesting as I initially thought, since they allow handcrafted reward functions and heuristics. It would be more interesting if the designers did not know the particular task in advance, and the AI would be forced to learn the task entirely from demonstrations and/or natural language description.

4 rohinmshah 21d: We allow it, but we don't think it will lead to good performance (unless you throw a very large amount of time at it). The AI safety community claims it is hard to specify reward functions. If we actually believe this claim, we should be able to create tasks where even if we allow people to specify reward functions, they won't be able to do so. That's what we've tried to do here. Note we do ban extraction of information from the Minecraft simulator -- you have to work with pixels, so if you want to make handcrafted reward functions, you have to compute rewards from pixels somehow. (Technically you also have inventory information but that's not that useful.) We have this rule because in a real-world deployment you wouldn't be able to simply extract the "state" of physical reality. I am a bit more worried about allowing heuristics -- it's plausible to me that our chosen tasks are simple enough that heuristics could solve them, even though real world tasks are too complex for similar heuristics to work -- but this is basically a place where we're sticking our necks out and saying "nope, heuristics won't suffice either" (again, unless you put a lot of effort into designing the heuristics, where it would have been faster to just build the system that, say, learns from demonstrations). But for real-world deployment of AI systems, designers do know the task in advance! We don't want to ban strategies that designers could use in a realistic setting.
9 Daniel Kokotajlo 22d: Going from zero to "produce an AI that learns the task entirely from demonstrations and/or natural language description" is really hard for the modern AI research hive mind. You have to instead give it a shaped reward, breadcrumbs along the way that are easier (such as allowing handcrafted heuristics and such, and allowing knowledge of a particular target task), to get the hive mind started making progress.
How will OpenAI + GitHub's Copilot affect programming?

Are you saying that (i) few people will use copilot, or (ii) many people will use copilot but it will have little effect on their outputs or (iii) many people will use copilot and it will boost their productivity a lot but will have little effect on infosec? Your examples sound more like supporting i or ii than supporting iii, but maybe I'm misinterpreting.

I think all of those points are evidence that updates me in the direction of the null hypothesis, but I don't think any of them is true to the exclusion of the others.

I think a moderate number of people will use copilot. Cost, privacy, and internet connection will factor in to limit this.

I think copilot will have a moderate effect on users' outputs. I think it's the best new programming tool I've used in the past year, but I'm not sure I'd trade it for, e.g., interactive debugging (a reference example of a very useful programming tool).

I think copilot ... (read more)

Sam Altman and Ezra Klein on the AI Revolution

Is there an explanation of how it works somewhere?

2Nisan1moI haven't seen a writeup anywhere of how it was trained.
[Letter] Imperialism in the Rationalist Community

This comes across as a salad of harsh accusations with only a smidgen of supporting evidence.

What racism and sexism did "redacted" experience in the rationalist community? What "uniformed things" do we say about "non-white non-Western ciswomen"?

I'm pretty sure most of us here are aware that racism exists. In particular, I personally experienced it and my parents and further ancestors experienced a lot of it. This has little to do with "keep your identity small". The latter is about avoiding having particular maps entangled with your sense of self-worth so ... (read more)

3lsusr1moThis is the core point I was trying to get across. It sounds like you understand it perfectly.
7lsusr1moBy "trivial and superficial" I mean stuff like your favorite programming language, Linux distro, keyboard input and television shows. I did not intend to include European descent.
Open problem: how can we quantify player alignment in 2x2 normal-form games?

I don't think in this case should be defined to be 1. It seems perfectly justified to leave it undefined, since in such a game can be equally well conceptualized as maximally aligned or as maximally anti-aligned. It is true that if, out of some set of objects you consider the subset of those that have , then it's natural to include the undefined cases too. But, if out of some set of objects you consider the subset of those that have , then it's also natural to include the undefined cases. This is similar to how is simultaneously... (read more)

Open problem: how can we quantify player alignment in 2x2 normal-form games?

In common-payoff games the denominator is not zero, in general. For example, suppose that , , , , . Then , as expected: current payoff is , if played it would be .

2TurnTrout1moYou're right. Per Jonah Moss's comment, I happened to be thinking of games where payoff is constant across players and outcomes, which is a very narrow kind of common-payoff (and constant-sum) game.
Open problem: how can we quantify player alignment in 2x2 normal-form games?

Consider any finite two-player game in normal form (each player can have any finite number of strategies; we can also easily generalize to certain classes of infinite games). Let S_A be the set of pure strategies of player A and S_B the set of pure strategies of player B. Let u_A be the utility function of player A. Let (α, β) be a particular (mixed) outcome. Then the alignment of player B with player A in this outcome is defined to be:

Ofc so far it doesn't depend on ... (read more)
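
A minimal sketch of how a score of this kind could be computed. The formula itself is missing from the text above, so the normalization used here is an assumption pieced together from the replies: it places A's expected utility under B's actual play between the worst and the best that B could have done for A, and leaves the score undefined when B cannot affect A's expected utility. The names `alignment_b_with_a`, `u_a`, `alpha`, and `beta`, and the example game, are hypothetical.

```python
from typing import Optional, Sequence


def alignment_b_with_a(
    u_a: Sequence[Sequence[float]],
    alpha: Sequence[float],
    beta: Sequence[float],
) -> Optional[float]:
    """Assumed score: where B's play leaves A, between B's worst and best reply.

    u_a[i][j] is A's utility when A plays pure strategy i and B plays j.
    alpha and beta are the mixed strategies of A and B.
    Returns None when the denominator is zero (B cannot affect A's utility).
    """
    # A's expected utility against each pure strategy of B, with alpha fixed.
    per_reply = [
        sum(alpha[i] * u_a[i][j] for i in range(len(alpha)))
        for j in range(len(beta))
    ]
    # A's expected utility under B's actual mixed strategy.
    actual = sum(beta[j] * per_reply[j] for j in range(len(beta)))
    # Expected utility is linear in beta, so pure strategies attain the extremes.
    worst, best = min(per_reply), max(per_reply)
    if best == worst:
        return None  # left undefined rather than set to 1 by convention
    return (actual - worst) / (best - worst)


# Hypothetical zero-sum game (matching pennies), payoffs from A's point of view.
u_a = [[1.0, -1.0], [-1.0, 1.0]]
helpful = alignment_b_with_a(u_a, [1.0, 0.0], [1.0, 0.0])
adversarial = alignment_b_with_a(u_a, [1.0, 0.0], [0.0, 1.0])
irrelevant = alignment_b_with_a(u_a, [0.5, 0.5], [0.5, 0.5])
print(helpful, adversarial, irrelevant)
```

Under this assumed normalization the score depends only on A's utilities and the chosen strategies, never on B's own utility function, and it ranges from 0 (B does the worst possible job for A) to 1 (B does the best possible job). Returning `None` for a zero denominator is one design choice; returning 1 in that case is the alternative convention.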

3TurnTrout1mo✅ Pending unforeseen complications, I consider this answer to solve the open problem. It essentially formalizes B's impact alignment with A, relative to the counterfactuals where B did the best or worst job possible. There might still be other interesting notions of alignment, but I think this is at least an important notion in the normal-form setting (and perhaps beyond).
3TurnTrout1moThis also suggests that "selfless" perfect B/A alignment is possible in zero-sum games, with the "maximal misalignment" only occurring if we assume B plays a best response. I think this is conceptually correct, and not something I had realized pre-theoretically.
3TurnTrout1moIn a sense, your proposal quantifies the extent to which B selects a best response on behalf of A, given some mixed outcome. I like this. I also think that "it doesn't necessarily depend on u_B" is a feature, not a bug. EDIT: To handle constant-payoff games, we might want to define the alignment to equal 1 if the denominator is 0. In that case, the response of B can't affect A's expected utility, and so it's not possible for B to act against A's interests. So we might as well say that B is (trivially) aligned, given such a mixed outcome?
My Current Take on Counterfactuals

I would be convinced if you had a theory of rationality that is a Pareto improvement on IB (i.e. has all the good properties of IB + a more general class of utility functions). However, LI doesn't provide this AFAICT. That said, I would be interested to see some rigorous theorem about LIDT solving procrastination-like problems.

As to philosophical deliberation, I feel some appeal in this point of view, but I can also easily entertain a different point of view: namely, that human values are more or less fixed and well-defined whereas philosophical deliberati... (read more)

2abramdemski1moI don't believe that LI provides such a Pareto improvement, but I suspect that there's a broader theory which contains the two. Ah. I was going for the human-values argument because I thought you might not appreciate the rational-agent argument. After all, who cares what general rational agents can value, if human values happen to be well-represented by infrabayes? But for general rational agents, rather than make the abstract deliberation argument, I would again mention the case of LIDT in the procrastination paradox, which we've already discussed. Or, I would make the radical probabilist argument against rigid updating, and the 'orthodox' argument against fixed utility functions. Combined, we get a picture of "values" which is basically a market for expected values, where prices can change over time (in a "radical" way that doesn't necessarily spring from an update on a proposition), but which follow some coherence rules like an expectation of an expectation equals an expectation. One formalization of this is Skyrms'. Another is your generalization of LI (iirc). So to sum it up, my argument for general rational agents is: * In general, we need not update in a rigid way; we can develop a meaningful theory of 'fluid' updates, so long as we respect some coherence constraints. In light of this generalization, restriction to 'rigid' updates seems somewhat arbitrary (ie there does not seem to be a strong motivation to make the restriction from rationality alone). * Separately, there is no need to actually have a utility function if we have a coherent expectation. * Putting the two together, we can study coherent expectations where the notion of 'coherence' doesn't assume rigid updates. Howeve
An Intuitive Guide to Garrabrant Induction

First, "no complexity bounds on the trader" doesn't mean we allow uncomputable traders; we just don't limit their time or other resources (exactly like in Solomonoff induction). Second, even having a trader that knows everything doesn't mean all the prices collapse in a single step. It does mean that the prices will converge to knowing everything with time. GI guarantees no budget-limited trader will make an infinite profit; it doesn't guarantee no trader will make a profit at all (indeed guaranteeing the latter is impossible).

An Intuitive Guide to Garrabrant Induction

A brief note on naming: Solomonoff exhibited an uncomputable algorithm that does idealized induction, which we call Solomonoff induction. Garrabrant exhibited a computable algorithm that does logical induction, which we have named Garrabrant induction.

This seems misleading. Solomonoff induction has computable versions obtained by imposing a complexity bound on the programs. Garrabrant induction has uncomputable versions obtained by removing the complexity bound from the traders. The important difference between Solomonoff and Garrabrant is not computabl... (read more)

2Steven Byrnes2moSorry if this is a stupid question but wouldn't "LI with no complexity bound on the traders" be trivial? Like, there's a noncomputable trader (brute force proof search + halting oracle) that can just look at any statement and immediately declare whether it's provably false, provably true, or neither. So wouldn't the prices collapse to their asymptotic value after a single step and then nothing else ever happens?
My Current Take on Counterfactuals

My hope is that we will eventually have computationally feasible algorithms that satisfy provable (or at least conjectured) infra-Bayesian regret bounds for some sufficiently rich hypothesis space. Currently, even in the Bayesian case, we only have such algorithms for poor hypothesis spaces, such as MDPs with a small number of states. We can also rule out such algorithms for some large hypothesis spaces, such as short programs with a fixed polynomial-time bound. In between, there should be some hypothesis space which is small enough to be feasible and rich... (read more)

My Current Take on Counterfactuals

However, I also think LIDT solves the problem in practical terms:

What is LIDT exactly? I can try to guess, but I'd rather make sure we're both talking about the same thing.

My basic argument is we can model this sort of preference, so why rule it out as a possible human preference? You may be philosophically confident in finitist/constructivist values, but are you so confident that you'd want to lock unbounded quantifiers out of the space of possible values for value learning?

I agree inasmuch as we actually can model this sort of preference, for a suff... (read more)

2abramdemski2moRight, I agree with this. The situation as I see it is that there's a concrete theory of rationality (logical induction) which I'm using in this way, and it is suggesting to me that your theory (InfraBayes) can still be extended somewhat. My argument that we want this particular extension is basically as follows: human values can be thought of as the endpoint of human philosophical deliberation about values. (I am thinking of logical induction as a formalization of philosophical deliberation over time.) This endpoint seems limit-computable, but not necessarily computable. Now, it's also possible that at this endpoint, humans would have a more compact (ie, computable) representation of values. However, why assume this? (My hope is that by appealing to deliberation like this, my argument has more force than if I was only relying on the strength of logical induction as a theory of rationality. The idea of deliberation gives us a general reason to expect that limit-computable is the right place to look.) I'm not sure details matter very much here, but I'm provisionally happy to spell out LIDT as: 1. Specify some (bounded-value) LUV to use as "utility" 2. Make decisions by looking at conditional expectations of that LUV given actions. Concrete enough?
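
The two-step description of LIDT in the reply above can be illustrated with a toy decision rule. This is only a sketch of the "pick the action with the highest conditional expectation" step: it is not logical induction, and the lookup table `estimates` is a hypothetical stand-in for whatever conditional expectations the inductor currently assigns to a bounded utility variable. The names `choose_action`, `act_now`, and `wait` are invented for illustration.

```python
from typing import Callable, Dict, Iterable


def choose_action(
    actions: Iterable[str],
    conditional_expectation: Callable[[str], float],
) -> str:
    """Pick the action with the highest estimated E[U | action]."""
    return max(actions, key=conditional_expectation)


# Hypothetical current estimates of E[U | action] for a utility bounded in [0, 1].
# In LIDT these would come from the logical inductor, not from a fixed table.
estimates: Dict[str, float] = {"act_now": 0.7, "wait": 0.4}

chosen = choose_action(estimates, estimates.__getitem__)
print(chosen)
```

Everything interesting in LIDT lives in how the conditional expectations are produced and revised over time; the selection step itself is just an argmax.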
Power dynamics as a blind spot or blurry spot in our collective world-modeling, especially around AI

...Mistake Theorists should also be systematically biased towards the possibility of things like power dynamics being genuinely significant.

You meant to say, biased against that possibility?

2Kaj_Sotala2moOops, yeah. Edited.
Introduction To The Infra-Bayesianism Sequence

Boundedly rational agents definitely can have dynamic consistency; I guess it depends on just how bounded you want them to be. IIUC what you're looking for is a model that can formalize "approximately rational but doesn't necessarily satisfy any crisp desideratum". In this case, I would use something like my quantitative AIT definition of intelligence.

Formal Inner Alignment, Prospectus

Since you're trying to compile a comprehensive overview of directions of research, I will try to summarize my own approach to this problem:

  • I want to have algorithms that admit thorough theoretical analysis. There's already plenty of bottom-up work on this (proving initially weak but increasingly stronger theoretical guarantees for deep learning). I want to complement it by top-down work (proving strong theoretical guarantees for algorithms that are initially infeasible but increasingly made more feasible). Hopefully eventually the two will meet in the mi
... (read more)
Introduction To The Infra-Bayesianism Sequence

I'm not sure why we would need a weaker requirement if the formalism already satisfies a stronger requirement. Certainly when designing concrete learning algorithms we might want to use some kind of simplified update rule, but I expect that to be contingent on the type of algorithm and design constraints. We do have some speculations in that vein, for example I suspect that, for communicating infra-MDPs, an update rule that forgets everything except the current state would only lose something like expected utility.

2Stuart_Armstrong2moI want a formalism capable of modelling and imitating how humans handle these situations, and we don't usually have dynamic consistency (nor do boundedly rational agents). Now, I don't want to weaken requirements "just because", but it may be that dynamic consistency is too strong a requirement to properly model what's going on. It's also useful to have AIs model human changes of morality, to figure out what humans count as values, so getting closer to human reasoning would be necessary.
My Journey to the Dark Side

I wasn't making a proposal about turning everyone vegan. I was just observing that, at least if everyone was like me, the situation would have a "tragedy of the commons" payoff matrix (the Nash equilibrium is "everyone isn't vegan", the Pareto optimum is "everyone is vegan".)
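
A small sketch of the payoff structure described above, reduced to two players. The payoff numbers are hypothetical and chosen only to give the game a prisoner's-dilemma shape; `is_pure_nash` and `pareto_dominates` are illustrative helper names. The code finds the pure Nash equilibria and then lists which outcomes make both players at least as well off as the equilibrium.

```python
from itertools import product

ACTIONS = ("vegan", "not_vegan")

# Hypothetical payoffs (row player, column player); higher is better.
PAYOFFS = {
    ("vegan", "vegan"): (3, 3),
    ("vegan", "not_vegan"): (0, 4),
    ("not_vegan", "vegan"): (4, 0),
    ("not_vegan", "not_vegan"): (1, 1),
}


def is_pure_nash(profile):
    """True if neither player gains by unilaterally switching actions."""
    a, b = profile
    u1, u2 = PAYOFFS[profile]
    no_gain_1 = all(PAYOFFS[(x, b)][0] <= u1 for x in ACTIONS)
    no_gain_2 = all(PAYOFFS[(a, y)][1] <= u2 for y in ACTIONS)
    return no_gain_1 and no_gain_2


def pareto_dominates(p, q):
    """True if p is at least as good as q for both players and better for one."""
    up, uq = PAYOFFS[p], PAYOFFS[q]
    return all(x >= y for x, y in zip(up, uq)) and up != uq


profiles = list(product(ACTIONS, ACTIONS))
nash = [p for p in profiles if is_pure_nash(p)]
dominators = {n: [p for p in profiles if pareto_dominates(p, n)] for n in nash}
print(nash)
print(dominators)
```

With these numbers the only pure equilibrium is that nobody is vegan, and the only outcome that Pareto-dominates it is that everybody is, which is the tragedy-of-the-commons shape the comment describes.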

2Pattern3moI wasn't suggesting modification - just changing social norms.
[This comment is no longer endorsed by its author]
2Pattern3moHow does one go about making everyone vegan?
My Journey to the Dark Side

Yes, I'm very skeptical that Ziz is truly at her core the perfect utilitarian she claims to be. However, even in the universe in which that is true, I still want to own up to being "evil". Not because I deserve accolades for my selfishness (I don't), but because being honest is an important part of my life strategy and the sort of social norms I promote.
