All of Chris Leong's Comments + Replies

That’s a good point that I hadn’t thought of.

I think it’s often valuable to provide a short post describing a phenomenon clearly so that you can then reference it in future posts without going on a massive detour.

Unfortunately, getting onto more interesting matters sometimes requires a bunch of setup first. I could skip the setup, but then everyone would end up confused.

I’m probably missing something obvious, but how does the tickle defence handle X-Or Blackmail?

The Marxist arguments for the collapse of capitalism always sounded handwavey to me, but perhaps you could link me to something that would have sounded persuasive in the past?

Seems valuable as a lot of people want social affirmation before considering the hypothesis.

Inferring backwards would significantly reduce my concern, since you’re starting from a point we have information about.

I suppose that maybe we could calculate the Kolmogorov score of worlds close to us by backchaining, although that doesn’t really seem to be compatible with the calculation at each step being a formal mathematical expression.

Yeah, this is the part of the proposal that’s hardest for me to buy. Chaos theory means that small variations in initial conditions lead to massive differences pretty rapidly; and we can’t even measure an approximation of initial conditions. The whole “let’s calculate the universe from the start” approach seems to leave way too much scope to end up with something completely unexpected.

the gears to ascension · 1mo:
It's not actually calculating the universe from the start. The formalism is intended to be specified such that identifying the universe to arbitrarily high precision ought to still converge; I'm still skeptical, but it does work to simply infer backwards in time, which ought to be a lot more tractable than forward in time (I think? maybe.) but is still not friendly, and see above about apple making contact with newton's scalp. It's definitely a key weak point, I have some ideas how to fix it and need to talk them over in a lot more depth with @carado.

I would love to see more experimentation here to determine whether GPT4 can do more complicated quines that are less likely to be able to be copied. For example, we could insist that it includes a certain string or avoids certain functions.
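To make the suggested experiment concrete, here is a small sketch of what a "constrained quine" check could look like, using a classic `%r`-style Python quine; the marker string `LW_MARKER` and the verification harness are my own invention for illustration, not something from the thread:

```python
import io
from contextlib import redirect_stdout

# A %r-style Python quine, with one twist: the program must also
# contain the (arbitrary, made-up) required string "LW_MARKER".
s = '# LW_MARKER\ns = %r\nprint(s %% s)'
program = "# LW_MARKER\ns = " + repr(s) + "\nprint(s % s)"

# Run the candidate program and capture what it prints.
buf = io.StringIO()
with redirect_stdout(buf):
    exec(program)
output = buf.getvalue().rstrip("\n")

assert output == program       # it reproduces its own source exactly
assert "LW_MARKER" in program  # and satisfies the string constraint
print("constrained quine verified")
```

The same harness could score GPT-4's attempts: run the generated program, compare its output to its source, and check the extra constraint (a required string, or the absence of forbidden functions) against the source text.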

Thanks for the detailed response.

To be honest, I’ve been persuaded that we disagree enough in our fundamental philosophical approaches that I’m not planning to dive deeply into infrabayesianism, so I can’t respond to many of your technical points (though I am planning to read the remaining parts of Thomas Larson’s summary and see if any of your talks have been recorded).

“However, CDT and EDT are both too toyish for this purpose, since they ignore learning and instead assume the agent already knows how the world works, and moreover this knowledge is repres... (read more)

Vanessa Kosoy · 3mo:
I'm saying that there's probably a literal impossibility theorem lurking there. But, after reading my comment above, my spouse Marcus correctly pointed out that I am mischaracterizing IBP. As opposed to IBRL, in IBP, pseudocausality is not quite the right fairness condition. In fact, in a straightforward operationalization of repeated full-box-dependent transparent Newcomb, an IBP agent would one-box. However, there are more complicated situations where it would deviate from full-fledged UDT.

Example 1: You choose whether to press button A or button B. After this, you play Newcomb. Omega fills the box iff you one-box both in the scenario in which you pressed button A and in the scenario in which you pressed button B. Random is not allowed. A UDT agent will one-box. An IBP agent might two-box because it considers the hypothetical in which it pressed a button different from what it actually intended to press to be "not really me" and therefore unpredictable. (Essentially, the policy is ill-defined off-policy.)

Example 2: You see either a green light or a red light, and then choose between button A and button B. After this, you play Newcomb. Omega fills the box iff you either one-box after seeing green and pressing A or one-box after seeing green and pressing B. However, you always see red. A UDT agent will one-box if it saw the impossible green and two-box if it saw red. An IBP agent might two-box either way, because if it remembers seeing green then it decides that all of its assumptions about the world need to be revised.

I’m curious why you say it handles Newcomb’s problem well. The Nirvana trick seems like an artificial intervention where we manually assign certain situations a utility of infinity to enforce a consistency condition, which then ensures they are ignored when calculating the maximin. If we are manually intervening, why not just manually cross out the cases we wish to ignore, instead of adding them with infinite value and then immediately ignoring them?

Just because we modelled this using infrabayesianism, it doesn’t follow that it contributed anything to the soluti... (read more)

Vanessa Kosoy · 3mo:
You don't need the Nirvana trick if you're using homogeneous or fully general ultracontributions and you allow "convironments" (semimeasure-environments) in your notion of law causality. Instead of positing a transition to a "Nirvana" state, you just make the transition kernel vanish identically in those situations. However, this is a detail; there is a more central point that you're missing.

From my perspective, the reason Newcomb-like thought experiments are important is that they demonstrate situations in which classical formal approaches to agency produce answers that seem silly. Usually, the classical approaches examined in this context are CDT and EDT. However, CDT and EDT are both too toyish for this purpose, since they ignore learning and instead assume the agent already knows how the world works, and moreover this knowledge is represented in the preferable form of the corresponding decision theory. Instead, we should be thinking about learning agents, and the classical framework for those is reinforcement learning (RL). With RL, we can operationalize the problem thus: if a classical RL agent is put into an arbitrary repeated[1] Newcomb-like game, it fails to converge to the optimal reward (although it does succeed for the original Newcomb problem!). On the other hand, an infra-Bayesian RL agent provably does converge to optimal reward in those situations, assuming pseudocausality.

Ofc IBRL is just a desideratum, not a concrete algorithm. But examples like Tian et al and my own upcoming paper about IB bandits show that there are algorithms with reasonably good IB regret bounds for natural hypothesis classes. While an algorithm with a good regret bound[2] for ultra-POMDPs[3] has not yet been proposed, it seems very likely that it exists.

Now, about non-pseudocausal scenarios (such as noiseless transparent Newcomb). While this is debatable, I'm leaning towards the view that we actually shouldn't expect agents to succeed t

I wish Eliezer had been clearer on why we can’t produce an AI that internalises human morality with gradient descent. I agree gradient descent is not the same as a combination of evolutionary learning + within lifetime learning, but it wasn’t clear to me why this meant that no combination of training schedule and/or bias could produce something similar.

Yeah agreed, this doesn't make sense to me.

There are probably just a few MB (wouldn't be surprised if it could be compressed into much less) of information which sets up the brain wiring. Somewhere within that information are the structures/biases that, when exposed to the training data of being a human in our world, give us our altruism (and much else). It's a hard problem to understand these altruism-forming structures (which are not likely to be distinct things), replicate them in silica and make them robust even to large power differentials.

On the oth... (read more)

Why do you believe that “But if you push the complexity up too fast, the RL process will fail, or the AI will be more likely to learn heuristics that are better than nothing but aren't what we intended”?

I understand why this could cause the AI to fail, but why might it learn incorrect heuristics?

Charlie Steiner · 4mo:
I mean something like getting stuck in local optima on a hard problem. An extreme example would be if I try to teach you to play chess by having you play against Stockfish over and over, and give you a reward for each piece you capture - you're going to learn to play chess in a way that trades pieces short-term but doesn't win the game. Or, like, if you think of shard formation as inner alignment failure that works on the training distribution, the environment being too hard to navigate shrinks the "effective" training distribution that inner alignment failures generalize over.
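Charlie's chess example can be made numeric with a toy sketch (the strategy names and reward numbers below are invented purely for illustration):

```python
# Two made-up chess strategies: per-move shaped reward for captures,
# plus a final reward for actually winning the game.
strategies = {
    "trade_pieces": {"captures": [1, 1, 1], "game_result": 0},   # wins material, loses the game
    "positional":   {"captures": [0, 0, 0], "game_result": 10},  # no captures, wins the game
}

def shaped_return(name):
    """What a learner optimizing only the capture reward sees."""
    return sum(strategies[name]["captures"])

def true_return(name):
    """Shaped reward plus the final game outcome."""
    return shaped_return(name) + strategies[name]["game_result"]

greedy = max(strategies, key=shaped_return)  # local optimum under the proxy
best = max(strategies, key=true_return)

assert greedy == "trade_pieces"  # the proxy-optimal strategy...
assert best == "positional"      # ...is not the truly optimal one
```

The point is just that a heuristic ("trade pieces whenever possible") can dominate under the shaped reward while being worse than the intended behaviour under the true objective.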

I’m discussing an agent that does in fact take 5 which imagines taking 10 instead. There have been some discussions of decision theory using proof-based agents and how they can run into spurious counterfactuals. If you’re confused, you can try searching the archive of this website. I tried earlier today, but couldn’t find particularly good resources to recommend. I couldn’t find a good resource for playing chicken with the universe either.

(I may write a proper article at some point in the future to explain these concepts if I can’t find an article that explains them well)

Ah, I missed that. That seems like a mental quirk rather than anything fundamental. Then again, maybe you mean something else.

I’ve been reading a lot of shard theory and finding it fascinating, even though I’m not convinced that a more agentic subcomponent wouldn’t win out and make the shards obsolete. Especially if you’re claiming the more agentic shards would seize power, then surely an actual agent would as well?

I would love to hear why Team Shard believes that the shards will survive in any significant form. Off the top of my head, I’d guess their explanation might relate to how shards haven’t disappeared in humans?

On the other hand, I much more strongly endorse the notion of thinking of a reward as a chisel rather than something we’ll likely find a way of optimising.

the gears to ascension · 5mo:
the agency is through the shards acting together; the integrated agent is made of the shards in the first place. when you plan, you invoke a planning system that has shards. when you evaluate, you invoke an evaluation system that has shards. when you act, you run policy that has shards. coherence into simplifications of shards with a "symbolic regression version" of the function the shards approximate wouldn't magically change the shardness of the function.

Are there any agendas you would particularly like to see distilled?

What does it mean for a constraint to be low-dimensional?

John is using it to mean "involves only a few variables". I.e., if you imagine the state of the entire system as a very high dimensional vector, a constraint that only involves a few parts of the system will only involve a few dimensions of this vector.
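As a minimal concretization of that usage (the dimension count and the specific indices below are chosen arbitrarily):

```python
import random

random.seed(0)
state = [random.random() for _ in range(1000)]  # full system state: 1000 dimensions

# A "low-dimensional" constraint reads only a couple of coordinates...
def low_dim_constraint(s):
    return s[3] + s[17] <= 1.0

# ...whereas a "high-dimensional" constraint couples all of them.
def high_dim_constraint(s):
    return sum(s) <= 500.0

print(low_dim_constraint(state), high_dim_constraint(state))
```

Changing any coordinate other than `s[3]` or `s[17]` leaves the low-dimensional constraint's value untouched, which is exactly the sense in which it "only involves a few dimensions" of the full state vector.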

Filtering out all the implicit information would be really, really hard.

I think the key problem is that this sandboxing won’t work for anything with a large language model as a component.

Not really? How could this actually happen, conditional on us filtering out certain information? The real question is how the large language model could break the sandbox such that it can learn other facts.

Very happy to see someone writing about this as I’ve been thinking that there should be more research into this for a while. I guess my main doubt with this strategy is that if a system is running for long enough in a wide enough variety of circumstances maybe certain rare outcomes are virtually guaranteed?

Conjecture: More usefully: Or at least, theorems of the above sort mean that assemblages are no less safe than their components and are potentially much safer. And assemblages naturally provide security in depth (e.g. the Swiss cheese strategy).

I’d encourage you to delve more into this paragraph as I think this is the part of your article where it becomes the most hand-wavey:

“In order to "really solve" outer alignment, you want the AI-optimization process to care about the generalization properties of the created AI beyond the training data. In order to "really solve" inner alignment, the created AI shouldn't just care about the raw outputs of the process that created it, it should care about the things communicated by the AI-optimization process in its real-world context.”

I agree, would like a bit more detail and perhaps an example here.

I don’t know, but it sounds like an obvious use case for a subforum? The solutions listed above seem hackish.

Creating subforums still leaves you with the question of "but what do you see when you go to the main page?" You still somehow want the overall site to have a reasonable balance of stuff on the main list that everyone reads. I do think we're approaching the point where it might make sense to consider subforums, but IMO they don't solve the core problem here.

I have to be honest, I’m skeptical. If we study how human prosociality works, my expectation is that we learn enough to produce some toy models with some very simplistic pro-sociality, but this seems insufficient for generating an AI capable of navigating tough moral dilemmas, or even just situations sufficiently off-distribution. The reason why we want humans in the loop is not that they are vaguely pro-social, but because of the ability of humans to handle novel situations.


Actually, I shouldn’t completely rule out the value of this research. I ... (read more)

If I had to say where it fails, it fails to be robust to relative scale. One thing I notice in real-life history is that it really requires relatively equivalent power levels, and without that, it goes wrong fast (human treatment of non-pet animals is a good example).

I’m sure if he spent five minutes brainstorming he could come up with more things, or maybe I’m just wrongly calibrated on how much agency people have?

I’ve had similar thoughts too. I guess the way I’d implement it is by giving the AI a command that it can activate that directly overwrites the reward buffer but then turns the AI off. The idea here is to make it as easy as possible for an AI inclined to wirehead to actually wirehead, so it is less incentivised to act in the physical world.

During training I would ensure that the SGD used the true reward rather than the wireheaded reward. Maybe that would be sufficient to stop wireheading, but there are issues with it pursuing the highest probability plan rather than just a high probability plan. Maybe quantilising probability can help here.
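A minimal sketch of what quantilising could mean here (the function and its signature are my own construction for illustration, not an established implementation): instead of executing the single highest-scoring plan, sample uniformly from the top-q fraction of plans.

```python
import random

def quantilize(plans, score, q=0.1, rng=random):
    """Pick uniformly among the top-q fraction of plans by score,
    rather than committing to the single highest-scoring plan."""
    ranked = sorted(plans, key=score, reverse=True)
    k = max(1, int(len(ranked) * q))  # how many top plans to sample among
    return rng.choice(ranked[:k])

# Example: 100 hypothetical plans scored by estimated success probability.
plans = list(range(100))
score = lambda p: p / 100.0            # plan 99 is the argmax
choice = quantilize(plans, score, q=0.1)
assert choice in range(90, 100)        # high-scoring, but not necessarily the maximum
```

The idea is that a plan drawn from the top decile is still likely to succeed, while removing the extreme optimisation pressure that makes the single argmax plan worrying.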

There’s a difference between debating the merits of different political positions and merely noting an apparent trend. I’m doing the latter, and I don’t think the risks associated with this are too severe. So it’s not exactly open season.

The decision to call a tiny number of people a new political trend is a political position. It's the kind of discourse that leads even someone like Glenn Greenwald to say recently that the NYT tried to dox Scott Alexander because he's a right-wing blogger. People like Thiel or Yarvin make great material for interesting articles about them, but their political thought is too complex to be believed by a broader public.

There's another possibility, which is that they have some low-level insights that have been dressed up to appear as far more.

"A common estimate is that the loss of a full year of education leads to a loss of ~$100,000 in lifetime earnings" - I find this very hard to believe

I was never a strong believer. There was never a moment where my "faith shattered", because I never had "faith" in the first place. It's just, given the filtered information, how the regime described the situation, that seemed to me like a plausible description of reality. I haven't heard any alternative description, and I didn't have a reason to invent one.

Also, I was a small kid, so my ability to think about politics was quite limited. For example, I heard the broadcast of Voice of America / Radio Free Europe (I am not sure which one, maybe both) a few t... (read more)

This is an excellent question. Here's some of the things I consider personally important.

Regarding probability, I recently asked the question: Why is Bayesianism Important? I found this Slatestarcodex post to provide an excellent overview of thinking probabilistically, which seems way more important than almost any of the specific theorems.

I would include basic game theory - prisoner's dilemma, tragedy of the commons, multi-polar traps (see Meditations on Moloch for this later idea).

In terms of decision theory, there's the basic concept of expected utility... (read more)

I appreciate how Ben handled this: it was nice for him to let me comment before he posted and for him to also add some words of appreciation at the end.

Regarding point 2, since I was viewing this in game mode I had no real reason to worry about being tricked. Avoiding being tricked by not posting about it would have been like avoiding losing in chess by never making the next move.

I guess other than that, I'd suggest that even a counterfactual donation of $100 to charity not occurring would feel more significant than the frontpage going down for a day. Like... (read more)

Sunny from QAD · 3y:
I'll just throw in my two cents here and say that I was somewhat surprised by how serious Ben's post is. I was around for the Petrov Day celebration last year, and I also thought of it as just a fun little game. I can't remember if I screwed around with the button or not (I can't even remember if there was a button for me). Then again, I do take Ben's point: a person does have a responsibility to notice when something that's being treated like a game is actually serious and important. Not that I think 24 hours of LW being down is necessarily "serious and important". Overall, though, I'm not throwing much of a reputation hit (if any at all) into my mental books for you.

I’d suggest that even a counterfactual donation of $100 to charity not occurring would feel more significant than the frontpage going down for a day.

This suggests an interesting idea: A charity drive for the week leading up to Petrov Day, on condition that the funds will be publicly wasted if anyone pushes the button (e.g. by sending bitcoin to a dead-end address, or donating to two opposing politicians' campaigns).

Why would there be? I'm sure they saw it as just a game too and it would be extremely hypocritical for me to be annoyed at anyone for that.

lionhearted (Sebastian Marshall) · 3y:
Different social norms, I suppose. I'm trying to think if we ever prank each other or socially engineer each other in my social circle, and the answer is yes but it's always by doing something really cool — like, an ambiguous package shows up but there's a thoughtful gift inside. (Not necessarily expensive — a friend found a textbook on Soviet accounting for me, I got him a hardcover copy of Junichi Saga's Memories of Silk and Straw. Getting each other nice tea, coffee, soap, sometimes putting it in a funny box so it doesn't look like what it is. Stuff like that. Sometimes nicer stuff, but it's not about the money.)

Then I'm trying to think how my circle in general would respond to no-permission-given out-of-scope pranking of someone's real life community that they're a member of — and yeah, there'd be pretty severe consequences in my social circle if someone did that. If I heard someone did what your buddy did who was currently a friend or acquaintance, they'd be marked as someone incredibly discourteous and much less trustworthy. It would just get marked as... pointless rude destructive behavior. And it's pretty tech heavy btw, we do joke around a lot, it's just when we do pranks it's almost always at the end a gift or something uplifting.

I don't mean this to be blunt btw, I just re-read it before posting and it reads more blunt than I meant it to — I was just running through whether this would happen in my social circle, I ran it out mentally, and this is what I came up with. Obviously, everyone's different. And that's of course one of the reasons it's hard for people to get along. Some sort of meta-lesson, I suppose.

Thanks, I'm glad to hear that. :) Also, very thankful that the LW community took this really well.

Beyond that, as for my motivations, aside from curiosity as to whether it would work, etc. I considered that it would be an interesting learning opportunity for the community as well. With actual nukes, random untrusted people also have a part to play. Selecting a small group of people tasked with trying to bring down the site might even be a good addition to future instances of Petrov Day.

For what it's worth, I took care to ensure that the damage from taking ... (read more)

Hey, I've become interested in this field too recently. I've been listening to the Jim Rutt show which is pretty interesting, but I haven't dived into it in any real depth. I agree that it is something that we should be looking more into.

I won't pretend to be an expert on this topic, but my understanding of the differences is as follow:

  • Systems theory tends to involve attempts to understand the overall system, while complex systems are much more likely to have emergent novel behaviour, so any models used need to be held more lightly/it's more likely that
... (read more)

I hadn't decided whether or not to nuke it, but if I did nuke it, I would have done it several hours later, after people had a chance to wake up.

Re: downvotes on the parent comment. Offers of additional (requested!) information shouldn't be punished, or else you create additional incentive to be secretive, to ignore requests for information.

Could you explain the answer to 4?

For the evidential game, it doesn't just matter whether you co-operate or not, but why. Different why's will be more or less likely to be adopted by the other agents.

It's something people say, but don't necessarily fully believe

Yes, anti-realists still need to do the things that words like "good" and "true" do: praise and condemn, and so on. But they are unwilling to use them, which leads to a euphemism treadmill, where "false" is substituted with "problematic", "evil" with "toxic".

I appreciated this post for explaining Berkeley's beliefs really clearly to me. I never knew what he was going on about before.

Cool. Then we're three so far. I'll wait until tomorrow for more responses. Then I'll get back to the responders to schedule a time.

In a booming market, buying can be valuable as a hedge against rising house prices

In a booming market your portfolio is booming as well. Viewed strictly as a hedge, houses would be egregiously expensive. Everyone is tacitly including in their mental model the idea that houses will keep increasing in price at whatever the recent rate has been. This is speculative.

Yeah, I meant part 7. What did he say about feminism and neoreaction?

Not very much--the feminism chapter is 6 pages, and the neoreaction chapter is 5 pages. Both read like "look, you might have heard rumors that they're bad because of X, but here's the more nuanced version," and basically give the sort of defense that Scott Alexander would give. About feminism, he mostly brings up Scott Aaronson's Comment #171 and Scott Alexander's response to the response, Scott Alexander's explanation of why there are so few female computer programmers (because of the distribution of interests varying by sex), and the overreaction to James Damore. On neoreaction, he brings up Moldbug's posts on Overcoming Bias, More Right, and Michael Anissimov, and says 'comment sections are the worst' and 'if you're all about taking ideas seriously and discussing them civilly, people who have no other discussion partners will seek you out.'

I'd like to know more about the dark sides part of the book

You mean Part 7 ("The Dark Sides"), or the ways in which the book is bad? I thought Part 7 was well-done, overall; he asks if we're a cult (and decides "no" after talking about the question in a sensible way), has a chapter on "you can't psychoanalyze your way to the truth", and talks about feminism and neoreactionaries in a way that's basically sensible. Some community gossip shows up, but in a way that seems almost totally fair and respects the privacy of the people involved.

My one complaint, as someone responsible for the LessWrong brand, is that he refers to one piece of community gossip as 'the LessWrong baby' and discusses a comment thread in which people are unkind to the mother*, while that comment thread happened on SlateStarCodex. But this is mostly the fault of the person he interviewed in that chapter, I think, who introduced that term, and is likely a sensible attempt to avoid naming the actual humans involved, which is what I've done whenever I want to refer to the gossip.

*I'm deliberately not naming the people involved, as they aren't named in the book either, and suspect it should stay that way. If you already know the story you know the search terms, and if you don't it's not really relevant.

I'd still like the ability to make the explicit abstract just read off the text after a certain point, but I suppose it would require a lot of work to support that functionality.

I agree fairly strongly, but this seems far from the final word on the subject, to me.

Hmm, actually I think you're right and that it may be more complex than this.

Ah. I take you to be saying that the quality of the clever arguer's argument can be high variance, since there is a good deal of chance in the quality of evidence cherry-picking is able to find. A good point.

Exactly. There may only be a weak correlation between evidence and truth. And maybe you can do something with it or maybe it's better to focus on stronger signals instead.

I view the issue of intellectual modesty much like the issue of anthropics. The only people who matter are those whose decisions are subjunctively linked to yours. (It only starts getting complicated when you start asking whether you should be intellectually modest about your reasoning about intellectual modesty.)

One issue with the clever arguer is that the persuasiveness of their arguments might have very little to do with how persuasive they should be, so attempting to work off expectations might fail.

I agree fairly strongly, but this seems far from the final word on the subject, to me. Ah. I take you to be saying that the quality of the clever arguer's argument can be high variance, since there is a good deal of chance in the quality of evidence cherry-picking is able to find. A good point. But, is it 'too high'? Do we want to do something (beyond the strategy I sketched in the post) to reduce variance?

Where would you start with his work?

Gordon Seidoh Worley · 4y:
Actually, good thing you asked, because I gave wrong information in my original comment. Chisholm is an expert on the problem of the criterion, but I was actually thinking of William Alston in my comment. Here's two papers, one by Alston and one by another author that I've referenced in the past and found useful: William P. Alston. Epistemic Circularity. Philosophy and Phenomenological Research, 47(1):1, sep 1986. Jonathan Dancy. Ethical Particularism and Morally Relevant Properties. Mind, XCII(368):530– 547, 1983.