Latest Posts

2 | [Event] WORKSHOP ON ASSURED AUTONOMOUS SYSTEMS (WAAS) | 5 Embarcadero Center, San Francisco | May 21st
6 | [Event] Anki (Memorization Software) for Beginners | 3045 Shattuck Avenue #1894, Berkeley | Jan 25th
35 | [Event] Bay Area Winter Solstice 2019 | 10000 Skyline Boulevard, Oakland | Dec 16th
3 | [Event] Advanced Anki (Memorization Software) | 3045 Shattuck Avenue #1894, Berkeley | Jan 25th
4 | [Event] Halifax SSC Meetup -- Saturday 11/1/20 | OB6, 1451 South Park Street suite 103, Halifax | Jan 11th
1 | [Event] Nashville January 2020 SSC Meetup | 2509 12th Avenue South, Nashville | Jan 29th
19 | [Event] Catalyst: a collaborative biosecurity summit | 3359 26th Street, San Francisco | Feb 22nd
1 | [Event] SSC Dublin Meetup | 160-161 Parnell Street, Dublin | Jan 25th
12 | [Event] Pre-Solstice Unconference 2019 | 4799 Shattuck Avenue, Oakland | Dec 14th

Recent Discussion

Go F*** Someone
125d | 7 min read

As always, cross-posted from Putanumonit.

From Tokyo to TriBeCa, people are increasingly alone. People go on fewer dates, marry less and later, have smaller families if at all. People are having less sex, especially young people. The common complaint: it’s just too hard. Dating is hard, intimacy is hard, relationships are hard. I’m not ready to play on hard mode yet, I’ll do the relationship thing when I level up.

And simultaneously, a cottage industry has sprung up extolling the virtue of loneliness. Self-care, self-development, self-love. Travel solo, live solo, you do you. Wait, doesn’t that la

... (Read more)
7 | ozziegooen | 1h | Did you interpret me to say, "One should be sure that zero readers will feel offended?" I think that would clearly be incorrect. My point was that there are cases where one may believe that a bunch of readers may be offended, with relatively little cost to change things to make that not the case. For instance, one could make lots of points that use alarmist language to poison the well, where the language is technically correct, but very predictably misunderstood. I think there is obviously some line. I imagine you would as well. It's not clear to me where that line is. I was trying to flag that I think some of the language in this post may have crossed it. Apologies if my phrasing was misunderstood. I'll try changing that to be more precise.
2 | ozziegooen | 25m | A bit more thinking; I would guess that one reason why you had a strong reaction, and/or why several people upvoted you so quickly, was because you/they were worried that my post would be understood by some as "censorship=good" or "LessWrong needs way more policing". If so, I think that's a great point! It's similar to my original point! Things get misunderstood all the time. I tried my best to make my post understandable. I tried my best to condition it so that people wouldn't misinterpret or overinterpret it. But then my post was misunderstood (from what I can tell, unless I'm seriously misunderstanding Ben here), and that literally happened within 30 minutes. My attempt provably failed. I'll try harder next time.
4 | Jacobian | 29m | I understand your concerns. I cross-post everything I write on Putanumonit to LW by default, which I understood to be the intention of "personal blogposts". I didn't write this for LW. If anyone on the mod team told me that this would be better as a link post or off LW entirely, not because it's bad but because it's not aligned with LW's reputation, I'll be happy to comply. With that said, my personal opinion is that LW shouldn't cater to people who form opinions on things before reading them and we should discourage them from hanging out here.

Thanks for the response!

For what it's worth, I predict that this would have gotten more upvotes here at least with different language, though I realize this was not made primarily for LW.

my personal opinion is that LW shouldn't cater to people who form opinions on things before reading them and we should discourage them from hanging out here.

I think this is a complicated issue. I could appreciate where it's coming from and could definitely imagine things going too far in either direction. I imagine that both of us would agree it's a complicated issue,

... (read more)

Cross-posted to the EA Forum. For an epistemic status statement and an outline of the purpose of this sequence of posts, please see the top of my prior post. There are also some explanations and caveats in that post which I won’t repeat - or will repeat only briefly - in this post.

Purpose of this post

In my prior post, I wrote:

We are often forced to make decisions under conditions of uncertainty. This uncertainty can be empirical (e.g., what is the likelihood that nuclear war would cause human extinction?) or moral (e.g., does the wellbeing of future generations matter morally?). The issue

... (Read more)

Will look and see if I can add some useful thought there. Thanks.

Many approaches to AI alignment require making assumptions about what humans want. On a first pass, it might appear that inner alignment is a sub-component of AI alignment that doesn't require making these assumptions. This is because if we define the problem of inner alignment to be the problem of how to train an AI to be aligned with arbitrary reward functions, then a solution would presumably have no dependence on any particular reward function. We could imagine an alien civilization solving the same problem, despite using very different reward functions to train their AIs.

Unfortunatel... (Read more)

5 | Matthew Barnett | 2h | I find this interesting but I'd be surprised if it were true :). I look forward to seeing it in the upcoming posts. That said, I want to draw your attention to my definition of catastrophe, which I think is different than the way most people use the term. I think most broadly, you might think of a catastrophe as something that we would never want to happen even once. But for inner alignment, this isn't always helpful, since sometimes we want our systems to crash into the ground rather than intelligently optimizing against us, even if we never want them to crash into the ground even once. And as a starting point, we should try to mitigate these malicious failures much more than the benign ones, even if a benign failure would have a large value-neutral impact. A closely related notion to my definition is the term "unacceptable behavior" as Paul Christiano has used it. This is the way he has defined it. It seems like if we want to come up with a way to avoid these types of behavior, we simply must use some dependence on human values. I can't see how to consistently separate acceptable failures from non-acceptable ones except by inferring our values.

It seems like if we want to come up with a way to avoid these types of behavior, we simply must use some dependence on human values. I can't see how to consistently separate acceptable failures from non-acceptable ones except by inferring our values.

I think people should generally be a little more careful about saying "this requires value-laden information". First, while a certain definition may seem to require it, there may be other ways of getting the desired behavior, perhaps through reframing. Building an AI which only does small things should not r

... (read more)
2 | Dagon | 2h | I think it's not that the reward function is insufficient, it's the deeper problem that the situation is literally undefined. Can you explain why you think there _IS_ a "true" factor? Not "can a learning system find it", but "is there something to find"? If all known real examples have flags, flatness, and redness 100% correlated, there is no real preference for which one to use in the (counterfactual) case where they diverge. This isn't sampling error or bias, it's just not there.
2 | Matthew Barnett | 2h | Apologies for the miscommunication, but I don't think there really is an objectively true factor. It's true to the extent that humans say that it's the true reward function, but I don't think it's a mathematical fact. That's part of what I'm arguing. I agree with what you are saying.
AI Alignment Open Thread October 2019Ω
284mo | 1 min read | Ω 10

Continuing the experiment from August, let's try another open thread for AI Alignment discussion. The goal is to be a place where researchers and upcoming researchers can ask small questions they are confused about, share early stage ideas and have lower-key discussions.

2 | Matthew Barnett | 13h | That's pretty fair. I think it's likely that another cultural revolution could happen, and this could adversely affect the future if it happens simultaneously with a transition into an AI based economy. However, the deviations from long-term trends are very hard to predict, as you point out, and we should know about the specifics more as we get further along. In the absence of concrete details, I find it far more helpful to use information from long-term trends rather than worrying about specific scenarios.
3 | Wei_Dai | 2h | This seems to be ignoring the part of my comment at the top of this sub-thread, where I said "[...] has also made me more pessimistic about non-AGI or delayed-AGI approaches to a positive long term future (e.g., the Long Reflection)." In other words, I'm envisioning a long period of time in which humanity has the technical ability to create an AGI but is deliberately holding off to better figure out our values or otherwise perfect safety/alignment. I'm worried about something like the Cultural Revolution happening in this period, and you don't seem to be engaging with that concern?
2 | Matthew Barnett | 2h | Ahh. To be honest, I read that, but then responded to something different. I assumed you were just expressing general pessimism, since there's no guarantee that we would converge on good values upon a long reflection (and you recently viscerally realized that values are very arbitrary). Now I see that your worry is more narrow, in that the cultural revolution might happen during this period, and that we would unwisely create the AGI in its wake. I guess this seems quite plausible, and is an important concern, though I personally am skeptical that anything like the long reflection will ever happen.

Ahh. To be honest, I read that, but then responded to something different. I assumed you were just expressing general pessimism, since there’s no guarantee that we would converge on good values upon a long reflection (and you recently viscerally realized that values are very arbitrary).

I guess I was also expressing a more general update towards more pessimism, where even if nothing happens during the Long Reflection that causes it to prematurely build an AGI, other new technologies that will be available/deployed during the Long Reflection could also in

... (read more)

One thing I've noticed recently is that when someone complains about how a certain issue "just keeps happening" or they "keep having to deal with it", it often seems to indicate an unsolved problem that people may not be aware of. Some examples:

  • Players of a game repeatedly ask the same rules questions to the judges at an event. This doesn't mean everyone is bad at reading -- it likely indicates an area of the rules that is unclear or misleadingly written.
  • People keep trying to open a door the wrong way, either pulling on a door that's supposed to be pushed or
... (Read more)
I'm not sure that's at odds with what mscottveach is saying. To put it in different words, while the amount of feedback might vary, I don't think the ratio of positive vs. negative feedback varies. It's the very rare situation where the number of messages that say, "This was good, everything went as planned or intended," outnumbers the messages that talk about how something went wrong.

Oh, I quite disagree. I've often found it normal for people to give positive feedback only or where positive feedback far outweighs negati... (read more)

7 | Said Achmiz | 2h | This phenomenon is very familiar to me as a UX designer, because it often makes bugs or design flaws way, way more difficult to learn about than should (one would naively imagine) be the case. Specifically: suppose I release some piece of software, which has some bug, or usability flaw, etc.; and suppose this problem does not manifest for me, in my own testing, but does manifest for many other users, on a regular basis. One might expect that a flood of complaints, bug reports, angry tweets, irate emails, etc., would nigh-instantly alert me to the problem’s existence… but instead there is naught but silence, and I remain blissfully unaware that there’s anything wrong. Then some time passes, and (by sheer accident!) I discover that large numbers of users have been living with this problem for weeks or months or (horror!) years, and just haven’t said anything… because they’re used to technology just… not working very well, or having bugs, etc.; and so they shrug and treat it as “one of those things”, and do some workaround, or just tolerate the problem, and never consider that, actually, there is something wrong with this picture, and that it is possible for this problem (indeed, most such problems) to not exist, and that complaining might yield results. As they say: many such cases! (Here, for example, is gwern tweeting about a case when a website he built had 460,000 unique visitors before he realized, after checking personally, that the layout was broken on mobile devices!) EDIT: Corrected the gwern anecdote; it was even worse than I remembered.
8 | gwern | 2h | Emphasizing the point even more - word didn't get to me. I just thought to myself, 'the layout might not be good on mobile. I ought to check.' (It was not good.)
2 | Said Achmiz | 2h | … the situation was so disheartening that in my memory of it I mentally substituted something more palatable! (Fixed, thanks.)
3 | matthewhirschey | 3h | Just found this site, and am going through these ideas. Love the core ideas (thinking, creativity, decision making, etc). I have recently started writing on some similar ideas, and look forward to the exchange!


This is part 1 of a series of posts I initially planned to organize as a massive post last summer on principal-agent problems. As that task quickly became overwhelming, I decided to break it down into smaller posts that ensure I cover each of the cases and mechanisms that I intended to.

Overall, I think the trade-off between the alignment of agents and the competence of agents can explain a lot of problems to which people often think there are simple answers. The less capable an agent is (whether the agent is a person, a bureaucracy, or an algorithm) the easier it is for a principal to assess t... (Read more)

I think those two cases are pretty compatible. The simple rules seem to get formed due to the pressures created by large groups, but there are still smaller sub-groups within large groups that can benefit from getting around the inefficiency caused by the rules, so they coordinate to bend the rules.

Hanson also has an interesting post on group size and conformity:

In the vegan case, it is easier to explain things to a small number of people than a large number of people, even though it may still n... (read more)

1 | Gentzel | 3h | Yeah, when you can copy the same value function across all the agents in a bureaucracy, you don't have to pay signaling costs to scale up. Alignment problems become more about access to information than about having misaligned goals.

Why has there never been a "political Roko's basilisk", i.e. a bill or law that promises to punish any member of parliament who voted against it (or more generally any individual with government power, e.g. judge or bureaucrat, who did not do everything in their capacity to make it law)?

Even if unconstitutionality is an issue, it seems like the "more general" condition would prevent judges from overturning it, etc. And surely there are countries with all-powerful parliaments.

A lot of these examples are distinct from Roko's idea, in that they are self-reinforcing, but generally through other mechanisms than distinguishing supporters from non-supporters and targeting those groups specifically.

There's a pretty strong governance norm (and in many cases constitutional protection) against this kind of segregation and targeting, at least in nominally-free democratic societies. A politician who puts opponents in jail JUST because they are opposed (or proposes a law that punishes ONLY those who oppose it) won't last lon... (read more)

The Rocket Alignment ProblemΩ
1631y | 15 min read | Ω 22

The following is a fictional dialogue building off of AI Alignment: Why It’s Hard, and Where to Start.

(Somewhere in a not-very-near neighboring world, where science took a very different course…)

ALFONSO:  Hello, Beth. I’ve noticed a lot of speculations lately about “spaceplanes” being used to attack cities, or possibly becoming infused with malevolent spirits that inhabit the celestial realms so that they turn on their own engineers.

I’m rather skeptical of these speculations. Indeed, I’m a bit skeptical that airplanes will be able to even rise as high as stratospheric weather balloons anytime... (Read more)

For what it's worth, I was just learning about the basics of MIRI's research when this came out, and reading it made me less convinced of the value of MIRI's research agenda. That's not necessarily a major problem, since the expected change in belief after encountering a given post should be 0, and I already had a lot of trust in MIRI. However, I found this post by Jessica Taylor vastly clearer and more persuasive (it was written before "Rocket Alignment", but I read "Rocket Alignment" first). In particular, I would ... (read more)

All else being equal, arms races are a waste of resources and often an example of the defection equilibrium in the prisoner’s dilemma. However, in some cases, such capacity races may actually be the globally optimal strategy. Below I try to explain this with some examples.

1: If the U.S. had kept racing in its military capacity after WW2, it might have been able to use its negotiating leverage to stop the Soviet Union from becoming a nuclear power, halting proliferation and preventing the buildup of world-threatening numbers of high-yield weapons. Basically, the earlier you win an arms race... (Read more)

I think capacity races to deter deployment races are the best from this perspective: develop decisive capability advantages, and all sorts of useful technology, credibly signal that you could use it for coercive purposes and could deploy it at scale, then don't abuse the advantage, signal good intent, and just deploy useful non-coercive applications. The development and deployment process basically becomes an escalation ladder itself, where you can choose to stop at any point (though you still need to keep people employed/trained to sustain credibili... (read more)

1 | Gentzel | 3h | Basically, even if there are adaptations that could make an animal more resistant to venom, the incremental changes in their circulatory system required to do this are so maladaptive/harmful that they can't happen. This is a pretty core part of competitive strategy: matching enduring strengths against the enduring weaknesses of competitors. That said, races can shift dimensions too. Even though snake venom won the race against blood, gradual changes in the lethality of venom might still cause gradual adaptive changes in the behavior of some animals. A good criticism of competitive strategies between states, businesses, etc. is that the repeated shifts in competitive dimensions can still result in Molochian conditions/trading away utility for victory, which may have been preventable via regulation or agreement.
1 | Gentzel | 6h | Thanks, some of Quester's other books on deterrence also seem pretty interesting. My post above was actually intended as a minor update to an old post from several years ago on my blog, so I didn't really expect it to be copied over to LessWrong. If I spent more time rewriting the post again, I think I would focus less on that case, which I think rightly can be contested from a number of directions, and talk more about conditions for race deterrence generally. Basically, if you can credibly build up the capacity to win an arms race (with significant advantages in the relevant forms of talent, natural resources, industrial capacity, etc.) then you may not even have to race. Limited development could plausibly serve to make capacity credible, gain the advantages of positive externalities from cutting edge R&D, but avoid actually sinking a lot of the economy into the production of destabilizing systems. By showing extreme capability in a limited sense, and credible capability to win a particular race, you may be able to deter racing if the communication of lasting advantage is credible. If lasting advantage is not credible, you may get more of a Sputnik or AlphaGo type event and galvanize competitors toward racing faster. For global tech competition more generally, it would be interesting to investigate industrial subsidies by competing governments to see in what conditions countries attempt strategic protectionism and to get around the WTO and in which cases they give up a sector of competition. My prior is that protectionism is more likely when an industry is established, and that countries which could have successfully entered a sector can be deterred from doing so.
1 | Gentzel | 6h | While it may sound counterintuitive, I think you want to increase both hegemony and balance of power at the same time. Basically a more powerful state can help solve lots of coordination problems, but to accept the risks of greater state power you want the state to be more structurally aligned with larger and larger populations of people. Obviously states are more aligned with their own populations than with everyone, but I think the expansion of the U.S. security umbrella has been good for reducing the number of possible security dilemmas between states and accordingly people are better off than they would otherwise be with more independent military forces (higher defense spending, higher war risk, etc.). There is some degree of specialization within NATO which makes it harder for states to go to war as individuals, and also makes their contribution to the alliance more vital. The more this happens at a given resource level, the more powerful the alliance will be in absolute terms, and the more power will be internally balanced against unilateral actions that conflict with some state's interests, though at some point veto power and reduced redundancy could undermine the strength of the alliance. For technological risks, racing increases risk in the short-run between the competitors but will tend to reduce the number of competitors. In the long-run, agreeing not to race while other technologies progress increases the amount of low hanging fruit and expands the scope of competition to more possible competitors. If you think resource-commandeering positive feedback loops are not super close, there might be a degree of racing you would want earlier to establish front-runners t
100 Ways To Live Better
4220d | 13 min read

Cross-posted from Putanumonit.

A couple of weeks ago Venkatesh challenged his followers to brainstorm at least 100 tweets on a topic via live responses. Since I’m not an expert on anything in particular, I decided to simply see if I can come up with 100 discrete pieces of life advice in a day.

This off-the-cuff game turned into perhaps the most successful creative project I’ve ever done. The thread was viewed by tens of thousands of people, received thousands of likes, and gained me hundreds of Twitter followers. I didn’t know there was such thirst for random life-advice, no... (Read more)

As a result of this, I put a post on Nextdoor offering to walk people's dogs for free. I'm hoping someone takes me up on it. Thanks for the brilliant suggestion!

LessWrong seems to be a big fan of spaced-repetition flashcard programs like Anki, Supermemo, or Mnemosyne. I used to be. After using them religiously for 3 years in medical school, I now categorically advise against using them for large volumes of memorization.

[A caveat before people get upset: I think they are appropriate in certain situations, and I have not tried to use them to learn a language, which seems to be their most popular use. More at the bottom.]

A bit more history: I and 30 other students tried using Mnemosyne (and some used Anki) for multiple tests. At my school, we have a test approximate

... (Read more)

In what program can I create my own unique flashcard repetition algorithm? How?

The Road to Mazedom
692d | 6 min read

Previous post: How to Escape From Immoral Mazes

Sequence begins here: Moloch Hasn’t Won

The previous posts mostly took mazes as given. 

As an individual, one’s ability to fight any large system is limited. 

That does not mean our individual decisions do not matter. They do matter. They add up. 

Mostly our choice is a basic one. Lend our strength to that which we wish to be free from. Or not do so. 

Even that is difficult. The methods of doing so are unclear. Mazes are ubiquitous. Not lending our strength to mazes, together with the goal of keeping one’s metaphorical soul intact and still putting food o... (Read more)

3 | Ericf | 7h | I don't think the crux is exchanging things of a different "nature" but rather allowing a personal benefit to influence or determine an agent decision. So, "Install new windows in my house and I'll recommend your company for the contract" and "Install new windows for the HQ office, and I'll recommend your company for the contract" are the same exchange, but the first benefits the agent, while the second benefits the company/principal.
"Install new windows for the HQ office, and I'll recommend your company for the contract"

This still feels corrupt to me.

4 | ioannes_shade | 16h | Ben's post is the best reference I've seen for this so far: Excerpts from a larger discussion about simulacra
3 | Zvi | 8h | I debated which of those two to use here. Will consider switching.

I am a bit confused and thought I'd rather ask and discuss here before thinking about it for long. As usual I am trying to compartmentalize, structure, make distinctions.

My confusion was triggered by thinking about the evaluation function (a heuristic to rate the certainty of winning or losing) in chess. Clearly everything it takes is there on the board; in fact the game is already decided by that state, assuming both players play to force the best possible outcome.

Why do we need to process data when the information is obviously already in the input? (Yes, I know one can make wordy the distinct... (Read more)

1 | MoritzG | 11h | Then you are wrong, because the search usually does not reach a checkmate state, so there is always a scoring heuristic replacing further exploration at some depth. I knew of and had read chessprogramming [https://chessprogramming] prior to your post; you are wrong to assume that I am a total idiot just because I got myself confused.

Didn't mean to condescend, I was mostly pointing out that complexity is in the iteration of simple rules with a fairly wide branching. I will still argue that all the heuristics and evaluation mechanisms used by standard engines are effectively search predictions, useful only because the full search is infeasible, and because the full search results have not been memoized (in the not-so-giant-by-today's-standards lookup table of position->value).
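The claim in the comment above (heuristic evaluations stand in for a full search that is too expensive to run or to memoize) can be illustrated on a game small enough to solve outright. The following is a minimal sketch, not anything from an actual chess engine: the toy subtraction game, the function names, and the neutral `heuristic` are all assumptions chosen for illustration. Players alternately remove 1 to 3 stones, and whoever takes the last stone wins.

```python
from functools import lru_cache

MOVES = (1, 2, 3)  # hypothetical toy game: remove 1-3 stones per turn


@lru_cache(maxsize=None)
def full_value(stones):
    """Exact value for the player to move: +1 means a forced win, -1 a forced loss.

    The cache plays the role of the position -> value lookup table.
    """
    if stones == 0:
        return -1  # the opponent just took the last stone, so the mover has lost
    return max(-full_value(stones - m) for m in MOVES if m <= stones)


def heuristic(stones):
    """Placeholder evaluation used when the search is cut off (assumed neutral)."""
    return 0


def limited_value(stones, depth):
    """Depth-limited search: past the horizon, the heuristic replaces exploration."""
    if stones == 0:
        return -1
    if depth == 0:
        return heuristic(stones)
    return max(-limited_value(stones - m, depth - 1) for m in MOVES if m <= stones)
```

With a horizon at least as deep as the longest possible game, `limited_value` agrees with `full_value`; with a shorter horizon it can only report whatever the heuristic guesses. In this toy game the full table is tiny, so the heuristic is unnecessary, which is the commenter's point in miniature.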

Epistemic Status: Confident

This idea is actually due to my husband, Andrew Rettek, but since he doesn’t blog, and I want to be able to refer to it later, I thought I’d write it up here.

In many games, such as Magic: The Gathering, Hearthstone, or Dungeons and Dragons, there’s a two-phase process. First, the player constructs a deck or character from a very large sample space of possibilities.  This is a particular combination of strengths and weaknesses and capabilities for action, which the player thinks can be successful against other decks/characters or at winning in the game universe.  The ... (Read more)

I didn't feel like I fully understood this post at the time when it was written, but in retrospect it feels like it's talking about essentially the same thing as Coherence Therapy does, just framed differently.

Any given symptom is coherently produced, in other words, by either (1) how the individual strives, without conscious awareness, to carry out strategies for safety or well-being; or (2) how the individual responds to having suffered violations of safety or well-being. This model of symptom production is squarely in accord with the construct
... (read more)
14 | AnnaSalamon | 18h | Review | I'm a bit torn here, because the ideas in the post seem really important/useful to me (e.g., I use these phrases as a mental pointer sometimes), such that I'd want anyone trying to make sense of the human situation to have access to them (via this post or a number of other attempts at articulating much the same, e.g. "Elephant and the Brain"). And at the same time I think there's some crucial misunderstanding in it that is dangerous and that I can't articulate. Voting for it anyhow though.
2 | Ruby | 6h | I voted very hard for this post. The idea feels correct, though I'd describe it as pointing at a key unresolved confusion/conflict for me. It fuels this quiet voice of doubt about everything I do in my life (and about others in theirs). I'm not entirely sure what to do with this model though, like, the entailment is missing or something. I voted hard mostly because I see it as the start of an issue to be resolved, not a finished work. I'm not sure if the lack of "solution/response" or possibility of bad solution/responses is what you think is dangerous, or perhaps something in the very framing itself (if so, I'm not seeing it). I should probably give the whole topic a bit more thought rather than looping on my feelings of "stuck" around it.

I've been wanting to get a better example of CDT (causal decision theory) misbehaving, where the behaviour is more clearly suboptimal than it is in the Newcomb problem (which many people don't seem to accept as CDT being suboptimal), and simpler to grasp than Death in Damascus.

The "predictors exist" problem

So consider this simple example: the player is playing against Omega, who will predict their actions[1]. The player can take three actions: "zero", "one", or "leave".

If ever they do "leave", then the experiment is over and they leave. If they choose "zero" or "one", then Omega will predict

... (Read more)
1 | Isnasene | 16h | Having done some research, it turns out the thing I was actually pointing to was ratifiability and the stance that any reasonable separation of world-modeling and decision-selection should put ratifiability in the former rather than the latter. This specific claim isn't new: From "Regret and Instability in causal decision theory": However, it's clear to me now that you were discussing an older, more conventional, version of CDT[1] which does not have that property. With respect to that version, the thought-experiment goes through but, with respect to the version I believe to be sensible, it doesn't[2]. [1] I'm actually kind of surprised that the conventional version of CDT is that dumb -- and I had to check a bunch of papers to verify that this was actually happening. Maybe if my memory had complied at the time, it would've flagged your distinguishing between CDT and EDT here from past LessWrong articles I've read like CDT=EDT. But this wasn't meant to be so I didn't notice you were talking about something different. [2] I am now confident it does not apply to the thing I'm referring to -- the linked paper brings up "Death in Damascus" specifically as a place where ratifiable CDT does not fail
2 | Stuart_Armstrong | 13h | Have they successfully formalised the newer CDT?

Can you clarify what you mean by "successfully formalised"? I'm not sure if I can answer that question but I can say the following:

Stanford's encyclopedia has a discussion of ratifiability dating back to the 1960s and (by the 1980s) it has been applied to both EDT and CDT (which I'd expect, given that constraints on having an accurate world model should be independent of decision theory). This gives me confidence that it's not just a random Less Wrong thing.

Abram Demski from MIRI has a whole sequence on when CDT=EDT which lev... (read more)

Inspired by my post on problems with causal decision theory (CDT), here is a hacked version of CDT that seems to be able to imitate timeless decision theory (TDT) and functional decision theory[1] (FDT), as well as updateless decision theory (UDT) under certain circumstances.

Call this ACDT, for (a)causal decision theory. It is, essentially, CDT which can draw extra, acausal arrows on the causal graphs, and which attempts to figure out which graph represents the world it's in. The drawback is its lack of elegance; the advantage, if it works, is that it's simple to specify and focuses attention

... (Read more)
3Stuart_Armstrong13hIt's actually worse than that for CDT; the agent is not actually trying to randomise, it is compelled to model the predictor as a process that is completely disconnected from its own actions, so it can freely pick the action that the predictor is least likely to pick - according to the CDT's modelling of it. Or pick zero in the case of a tie. So the CDT agent is actually deterministic, and even if you gave it a source of randomness, it wouldn't see any need to use it. [...] then it can learn that the predictor can actually predict the agent successfully, and so will no longer expect a 50% [...]

Thanks! I changed it to:

If the predictor is near-perfect, but the agent models its actions as independent of the predictor (since the prediction was made in the past), then the agent will have some belief about the prediction and will choose the less likely action for expected utility at least 1, and will continually lose.

The problem with the previous agent is that it never learns that it has the wrong causal model. If the agent is able to learn a better causal model from experience, then it can learn that the predictor can actually predict the agent successfully, and so will no longer expect a 50% chance of winning, and it will stop playing the game.
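To make the contrast between the two agents concrete, here is a minimal Python sketch. The payoffs are my assumption (the thread above only says the expected utility is "at least 1"): a mismatch with the prediction is worth 3 and a match costs 1, so a 50% belief gives exactly 1. Leaving is assumed to be worth 0, the predictor is assumed perfect, and the Laplace-style estimate of the match rate is just one hypothetical way for the agent to "learn a better causal model". All names (`cdt_policy`, `run_naive`, `run_learning`) are illustrative.

```python
# Assumed payoffs, not taken from the thread: mismatch wins 3, match costs 1.
WIN, LOSS = 3.0, -1.0


def cdt_policy(belief):
    """Choose the action the predictor is believed least likely to predict.

    `belief` is P(prediction == "zero"), treated as independent of the
    agent's own action. Ties go to "zero".
    """
    eu_zero = (1 - belief) * WIN + belief * LOSS
    eu_one = belief * WIN + (1 - belief) * LOSS
    return "zero" if eu_zero >= eu_one else "one"


def run_naive(rounds, belief=0.5):
    """Agent that never questions its causal model: it always plays."""
    total = 0.0
    for _ in range(rounds):
        action = cdt_policy(belief)
        prediction = cdt_policy(belief)  # perfect predictor: same policy
        total += LOSS if action == prediction else WIN
    return total


def run_learning(max_rounds, belief=0.5):
    """Agent that estimates how often the predictor matches it, and leaves
    once playing no longer has positive expected utility."""
    total, matches, played = 0.0, 0, 0
    for _ in range(max_rounds):
        # Laplace-smoothed match rate; starts at 50% with no data.
        p_match = (matches + 1) / (played + 2)
        if (1 - p_match) * WIN + p_match * LOSS <= 0:
            break  # "leave", assumed to be worth 0
        action = cdt_policy(belief)
        prediction = cdt_policy(belief)
        matched = action == prediction
        total += LOSS if matched else WIN
        matches += matched
        played += 1
    return total, played


if __name__ == "__main__":
    print(run_naive(10))
    print(run_learning(10))
```

Under these assumptions the naive agent loses on every round it plays, while the learning agent loses a couple of rounds, sees that the predictor keeps matching it, and then leaves.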

There appears to be something of a Sensemaking community developing on the internet, which could roughly be described as a spirituality-inspired attempt at epistemology. This includes Rebel Wisdom, Future Thinkers, Emerge and maybe you could even count post-rationality. While there are undoubtedly lots of critiques that could be made of their epistemics, I'd suggest watching this space as I think some interesting ideas will emerge out of it.

2 | May 21st | 5 Embarcadero Center, San Francisco

This may be of interest to people interested in AI Safety. This event is part of the 2020 IEEE Symposium on Security and Privacy and is being sponsored by the Johns Hopkins University Institute for Assured Autonomy.

How to Escape From Immoral Mazes
534d18 min read

Previously in sequence and most on point: What is Success in an Immoral Maze?, How to Identify an Immoral Maze

This post deals with the goal of avoiding or escaping being trapped in an immoral maze, accepting that for now we are trapped in a society that contains powerful mazes. 

We will not discuss methods of improving conditions (or preventing the worsening of conditions) within a maze, beyond a brief note on what a CEO might do. For a middle manager anything beyond not making the problem worse is exceedingly difficult. Even for the CEO this is an extraordinarily difficult task.   

To rescue so... (Read more)

In both cases, I think 'start one's own business' should be at the top of the list. This can be a start-up designed to make a lot of money - and that's by far the highest EV play if you can take a real shot and afford to fail. But it does not need to be something so risky. If you have a trade where you can open a store, or put yourself and perhaps a small number of others out for hire, or even become a consultant of some kind, consider doing one of those before anything else.

Doctor -> private practice. Lawyer -> small law firm as... (read more)
