Recent Discussion

Soft takeoff can still lead to decisive strategic advantageΩ
851d7 min readΩ 25

[Epistemic status: Argument by analogy to historical cases. Best case scenario it's just one argument among many.]

I have on several occasions heard people say things like this:

The original Bostrom/Yudkowsky paradigm envisioned a single AI built by a single AI project, undergoing intelligence explosion all by itself and attaining a decisive strategic advantage as a result. However, this is very unrealistic. Discontinuous jumps in technological capability are very rare, and it is very implausible that one project could produce more innovations than the rest of the world combined. Instead
...
Wouldn't one project have more compute than the others, and thus pull ahead so long as funds lasted?

To have "more compute than all the others" seems to require already being a large fraction of all the world's spending (since a large fraction of spending is on computers---or whatever bundle of inputs is about to let this project take over the world---unless you are positing a really bad mispricing). At that point we are talking "coalition of states" rather than "project."

I totally agree that it wouldn't be crazy...

2paulfchristiano12m "This gets us into the toy model & its problems. I don't think I understand your alternative model. I maybe don't get what you mean by trading. Does one party giving money to another party in return for access to their technology or products count? If so, then I think my original model still stands: The leading project will be able to hoard technology/innovation and lengthen its lead over the rest of the world so long as it still has funding to buy the necessary stuff."

I don't think this picture of "hoarding" makes sense. The reason I let other people use my IP is because they pay me money, with which I can develop even more IP. If the leading project declines to do this, then it will have less IP than any of its normal competitors. If the leading project's IP allows it to be significantly more productive than everyone else, then they could have just taken over the world through the normal mechanism of selling products. (Modulo leaks/spying.) As far as I can tell, until you are a large fraction of the world, the revenue you get from selling lets you grow faster, and I don't think the toy model really undermines that typical argument (which has to go through leaks/spying, market frictions, etc.).
4paulfchristiano26m "A coalition strong enough to prevent the world's leading project from maintaining and lengthening its lead would need to have some way of preventing the leading project from accessing the innovations of the coalition. Otherwise the leading project will free-ride off the research done by the coalition. For this reason I think that a coalition would look very different from the world economy; in order to prevent the leading project from accessing innovations deployed in the world economy you would need to have an enforced universal embargo on them pretty much, and if you have that much political power, why stop there? Why not just annex them or shut them down?"

I don't think I get the model, and suspect it may not be coherent. Are you saying that the leading project can easily spy on other projects, but other projects can't spy on it? Is this because the rest of the world is trading with each other, and trading opens up opportunities for spying? Some other reason I missed? I don't think it's usually the case that gains from rabbit-holing, in terms of protection from spying, are large enough to outweigh the costs from not trading. It seems weird to expect AI to change that, since you are arguing that the proportional importance of spying will go down, not up, because it won't be accelerated as much.

If the leading project can't spy on everyone else, then how does it differ from all of the other companies who are developing technology, keeping it private, and charging other people to use it? The leading project can use others' technology when it pays them, just like they use each other's technology when they pay each other. The leading project can choose not to sell its technology, but then it just has less money and so falls further and further behind in terms of compute etc. (and at any rate, it needs to be selling something to the other people in order to even be able to afford to use their technology).
1Daniel Kokotajlo3h Hmm, OK. I like your point about making profits without giving away secrets. And yeah I think you (and Paul's comment below) are helping me to get the picture a bit better--because the economy is growing so fast, moonshot projects that don't turn a profit for a while just won't work because the people capable of affording them one year will be paupers by comparison to the people capable of funding AI research the next year (due to the general boom). And so while a tech-hoarding project will still technically have more insights than the economy as a whole, its lead will shrink as its relative funding shrinks.

Another way I think my scenario could happen, though, is if governments get involved, because governments have the power to tax. Suppose we have a pool of insights that is publicly available, and from it we get this rapidly growing economy fueled by publicly available AI technologies. But then we have a government that taxes this entire economy and funnels the revenue into an AGI project that hoards all its insights. Won't this AGI project have access to more insights than anyone else? If there is an intelligence explosion, won't it happen first (and/or faster) inside the project than outside? We don't have to worry about getting outcompeted by other parts of the economy, since those parts are getting taxed. The funding for our AGI project will rise in proportion to the growth in the AI sector of the economy, even though our AGI project is hoarding all its secrets.
Raph Koster on Virtual Worlds vs Games (notes)
206d2 min read

Raph Koster is a game designer who's worked on old-school MUDs, Ultima Online, and Star Wars Galaxies among others. His blog is a treasure trove of information on game design, and online community building.

The vibe I get is very sequences-like (or, perhaps more like Paul Graham?). There's a particular genre I quite like of "Person with decades of experience who's been writing up their thoughts and principles on their industry and craft. Reading through their essays not only reveals a set of useful facts, but an entire lens through which to view things."

I'll mos...

5Davis_Kingsley2h Thanks for the link! I ended up reading a large number of his articles. His thoughts on UO and Galaxies were predictably the most interesting to me -- I definitely share his sense that the old "wild west" Ultima and the like was better and more alive than the more soulless modern games (though I didn't actually play Ultima and maybe I'd change my tune after being ganked repeatedly by PKs... :P). I also find it interesting how successful Galaxies was despite the fact that the combat system apparently never worked as intended and was basically dysfunctional! It kinda makes me wonder, what if Galaxies had had the dev resources and budget of WoW? Would that be the new face of MMOs? (Sometimes I've had similar thoughts re: Netrunner and MtG...) For me the most "wild west" exciting alive game right now is EVE Online, but the actual gameplay is something I'm profoundly uninterested in so I basically live vicariously through stories of interesting happenings.
3Raemon3h So reading all this has led me to think a lot about using MMOs as a testing ground for sociology, and vague disappointment that apart from corporate takeovers in EVE Online, there's not a whole lot of MMOs that are really complex/realistic enough to test, say, governance systems that are relevant to the real world. EVE seems designed around a mix of corporate guild structure and "colloquial 'MMO for-fun' guild structure." People don't seem to invent governments that are aiming to solve the sort of problems that real-world governments are trying to solve. (There's not as much point in keeping the peace if killing each other is part of the point of the game.) What properties does a game need to have to sufficiently reflect real life that you actually have to solve the same problems? [hmm – fake edit: apparently EVE has an in-game democratically elected council whose job is to interface between the players and the company that makes EVE. Which does make sense and is interesting but isn't quite the thing I'm looking for here] I feel like Minecraft is pretty close here. I'd like to see a massively multiplayer Minecraft taking place on a roughly earth-sized world, with some tweaks to shift it a bit towards a combo of "realism" and "able to build things that take some effort to destroy."
reading all this has led me to think a lot about using MMOs as a testing ground for sociology

i think you are on the right track---a google scholar search reveals an enormous amount of social science conducted on virtual worlds including topics like teamwork, economics, and religion. don't know about governance systems though.

3Raemon4h Another interesting bit here is the "reading history backwards" thing. There's a chapter in the "text-based multiplayer game" section that's like "hey guys we've recently added variables to the game. Now we can give things properties, and we can reference those properties to build complicated multilayered game mechanics!" And it's sort of boggling that once upon a time someone had to invent variables, or that it was still a big deal to add them to your game.
[Question]How has rationalism helped you?
71d1 min read

This November, I will be participating in NaNoWriMo, an online event where participants have one month to write a 50,000-word manuscript for a novel. I'm fairly settled on the idea that I'm going to write about a person who is fairly smart, but who has no rationalist training, discovering rationalism and developing into a fully-fledged rationalist.

I'm looking for inspiration for what kind of problems they might learn to solve. How has rationalism helped you? There is no answer too big or too small. If rationalism helped you realize that you needed to divorce your spouse and chan...

Over on the "too small" end of the spectrum…

I wrote about how rationality made me better at Mario Kart which I linked to from here a while ago. In short, it's a reminder to think about evidence sources and think about how much you should weigh each.

More recently, I've been watching The International, a Dota 2 competition. Last night I was watching yet another game where I wasn't at all sure who would win. That said, I thought Team Liquid might win (p = 60%). When I saw Team Secret win a minor skirmish (teamfight) against Team Liquid, I made a new predictio

...

Here's the kind of thing I mean by "human working memory capacity" being "set tragically low": A 100-word sentence introducing a new concept is often annoyingly hard to wrap one's head around, compared to a longer and more "gentle" explanation.

The concept of working memory seems like a useful reduction of part of what makes intelligence work:

Most of us here think human intelligence can one day be increased a lot, because there doesn't seem to be a fundamental limit to intelligence located anywhere near human-level, but that's a non-constructive...

1Pattern9h There may be ways of "expanding one's working memory" or working with it better so it can do more:

A 100-word sentence introducing a new concept is often annoyingly hard to wrap one's head around, compared to a longer and more "gentle" explanation.

Compare the effectiveness of words to pictures, and where one does better than the other. I can say "any cycle of gears must have an even number of gears" or I can show a picture of three interlocked gears and ask "Can these gears move?".
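
The gear claim in the comment above can also be checked mechanically. The following is a minimal sketch, assuming idealized gears arranged in a single closed ring where each gear meshes only with its two neighbours; the function name `gear_ring_can_turn` is made up for illustration.

```python
def gear_ring_can_turn(num_gears):
    """Check whether a closed ring of meshed gears can rotate.

    Meshed gears must spin in opposite directions, so we assign
    alternating directions (+1 / -1) around the ring and then verify
    that every neighbouring pair, including the pair that closes the
    ring, really does spin in opposite directions.
    """
    directions = [1 if i % 2 == 0 else -1 for i in range(num_gears)]
    return all(
        directions[i] != directions[(i + 1) % num_gears]
        for i in range(num_gears)
    )


print(gear_ring_can_turn(3))  # the three interlocked gears from the picture
print(gear_ring_can_turn(4))
```

With an odd number of gears, the last gear ends up with the same direction as the first one it meshes with, which is why the ring locks.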

Sure, these kinds of tricks help explain how we've done a lot with the working memory we have, but doesn't it feel like a tragically stingy amount?

[Question]Does Agent-like Behavior Imply Agent-like Architecture?Ω
422d1 min readΩ 15

This is not a well-specified question. I don't know what "agent-like behavior" or "agent-like architecture" should mean. Perhaps the question should be "Can you define the fuzzy terms such that 'Agent-like behavior implies agent-like architecture' is true, useful, and in the spirit of the original question." I mostly think the answer is no, but it seems like it would be really useful to know if true, and the process of trying to make this true might help us triangulate what we should mean by agent-like behavior and agent-like architecture.

Now I'...

Let's say "agent-like behavior" is "taking actions that are more-likely-than-chance to create an a-priori-specifiable consequence" (this definition includes bacteria).

Then I'd say this requires "agent-like processes", involving (at least) all 4 of: (1) having access to some information about the world (at least the local environment), including in particular (2) how one's actions affect the world. This information can come either baked into the design (bacteria, giant lookup table), and/or from previous experience (RL), and/or via reasoning from input data

...
Matthew Barnett's Shortform
516d1 min read

I intend to use my shortform feed for two purposes:

1. To post thoughts that I think are worth sharing that I can then reference in the future in order to explain some belief or opinion I have.

2. To post half-finished thoughts about the math or computer science thing I'm learning at the moment. These might be slightly boring and for that I apologize.

2gilch2h When I say "qualia" I mean individual instances of subjective, conscious experience full stop. These three extensions are not what I mean when I say "qualia".

Qualia are private entities which occur to us and can't be inspected via third person science.

Not convinced of this. There are known neural correlates of consciousness. That our current brain scanners lack the required resolution to make them inspectable does not prove that they are not inspectable in principle.

Qualia are ineffable; you can't explain them using a sufficiently complex English or mathematical sentence.

This seems to be a limitation of human language bandwidth/imagination, but not fundamental to what qualia are. Consider the case of the conjoined twins Krista and Tatiana, who share some brain structure and seem to be able to "hear" each other's thoughts and see through each other's eyes. Suppose we set up a thought experiment. Suppose that they grow up in a room without color, like Mary's room. Now knock out Krista and show Tatiana something red. Remove the red thing before Krista wakes up. Wouldn't Tatiana be able to communicate the experience of red to her sister? That's an effable quale! And if they can do it, then in principle, so could you, with a future brain-computer interface. Really, communicating at all is a transfer of experience. We're limited by common ground, sure. We both have to be speaking the same language, and have to have enough experience to be able to imagine the other's mental state.

Qualia are intrinsic; you can't construct a quale if you had the right set of particles.

Again, not convinced. Isn't your brain made of particles? I construct qualia all the time just by thinking about it. (It's called "imagination".) I don't see any reason in principle why this could not be done externally to the brain either.

The Tatiana and Krista experiment is quite interesting but stretches the concept of communication to its limits. I am inclined to say that having a shared part of your consciousness is not communication in the same way that sharing a house is not traffic. It does strike me that communication involves directed construction of thoughts and it's easy to imagine that the scope of what this construction is capable of would be vastly smaller than what goes on in the brain in other processes. Extending the construction to new types of thoughts might be a s...

1Slider5h I have a previous high implication uncertainty about this (that would be a crux?). "you can't accelerate enough to turn around" seems false to me. The mathematical rotation seems like it ought to exist. The previous reasons I thought such a mathematical rotation would be impossible I have significantly less faith in. If I draw a unit sphere analog in spacetime, having a visual observation from the space-time diagram drawn on Euclidean paper is not sufficient to conclude that the future cone is far from the past cone. And thinking that a sphere is "all within r distance" it would seem it should be continuous and simply connected under most instances. I think there also should exist a transformation that when repeated enough times returns to the original configuration. And I find it surprising that a boost-like transformation would fail to be like that if it is a rotation analog.

I have started to believe that the standard reasoning why you can't go faster than light relies on a kind of faulty logic. With normal Euclidean geometry it would go like: there is a maximum angle you can reach by increasing the y-coordinate and slope is just the ratio of x to y so at that maximum y maximum slope is reached so the maximum angle that you can have is 90 degrees. So if you try to go at 100 degrees you have lesser y and are actually going slower. And in a way 90 degrees is kind of the maximum amount you can point in another direction. But normally degrees go up to 180 or 360 degrees. On the relativity side c is the maximum ratio but that is for coordinate time. If somebody's proper time would start pointing in a direction that would project negatively on the coordinate time axis, the comparison between x per coordinate time and x per proper time would become significant.

There is also a trajectory which seems to be timelike in all segments. A=(0,0,0,0), (2,1,0,0), B=(4,2,0,0), (2,3,0,0), C=(0,4,0,0), (2,5,0,0), D=(4,6,0,0).
It would seem awfully a lot like the "corner" A B C would be of equal ma
1Matthew Barnett7h I agree I would not be able to actually accomplish time travel. The point is whether we could construct some object in Minkowski space (or whatever General Relativity uses, I'm not a physicist) that we considered to be loop-like. I don't think it's worth my time to figure out whether this is really possible, but I suspect that something like it may be. Edit: I want to say that I do not have an intuition for physics or spacetime at all. My main reason for thinking this is possible is mainly that I think my idea is fairly minimal: I think you might be able to do this even in R^3.
Computational Model: Causal Diagrams with SymmetryΩ
392d3 min readΩ 13

Consider the following program:

    def f(n):
        # Base case: 0! = 1
        if n == 0:
            return 1
        # Recursive case: n! = n * (n-1)!
        return n * f(n-1)

Let’s think about the process by which this function is evaluated. We want to sketch out a causal DAG showing all of the intermediate calculations and the connections between them (feel free to pause reading and try this yourself).

Here’s what the causal DAG looks like:

Each dotted box corresponds to one call to the function f. The recursive call in f becomes a symmetry in the causal diagram: the DAG consists of an infinite sequence of copies of the same subcircuit.
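
To make the "copies of the same subcircuit" idea concrete, here is a minimal sketch that unrolls the first few calls into an explicit edge list. It is only an illustration under stated assumptions: the node names in `CALL_TEMPLATE`, the helper `unroll`, and the cut-off `depth` are all made up here (the diagram described above is infinite), and the constant `1` returned by the base case is left out for brevity.

```python
# Hypothetical node names for the calculations inside ONE call to f.
CALL_TEMPLATE = [
    ("n", "n == 0"),           # the argument feeds the base-case test
    ("n", "n - 1"),            # ... and the argument of the recursive call
    ("n", "n * f(n-1)"),       # ... and the final multiplication
    ("n == 0", "return"),      # the test selects which value is returned
    ("n * f(n-1)", "return"),
]


def unroll(depth):
    """Return the edges of the first `depth` copies of the call sub-circuit."""
    edges = []
    for level in range(depth):
        # Every call reuses the same template: this is the symmetry.
        for src, dst in CALL_TEMPLATE:
            edges.append((f"{src} [call {level}]", f"{dst} [call {level}]"))
        if level + 1 < depth:
            # Links between neighbouring copies: the argument flows down,
            # and the returned value flows back up.
            edges.append((f"n - 1 [call {level}]", f"n [call {level + 1}]"))
            edges.append((f"return [call {level + 1}]", f"n * f(n-1) [call {level}]"))
    return edges


for edge in unroll(2):
    print(edge)
```

Because every copy is generated from the same template, the whole (truncated) diagram is described by one sub-circuit plus the two linking edges, however large `depth` gets.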

More generally, we can represent any Tu

...
2Vanessa Kosoy6h Hmm, no, not really. The quantum fields follow "causality" in some quantum sense (roughly speaking, operators in spacelike separations commute, and any local operator can be expressed in terms of operators localized near the intersection of its past light-cone with any spacelike hypersurface), which is different from the sense used in causal DAGs (in fact you can define "quantum causal DAGs" which is a different mathematical object). Violation of Bell's inequality precisely means that you can't describe the system by a causal DAG. If you want to do the MWI, then the wavefunction doesn't even decompose into data that can be localized.
2johnswentworth5h "Violation of Bell's inequality precisely means that you can't describe the system by a causal DAG"

No, violation of Bell's inequality means you can't describe the system by causal interactions among particles and measurements. If we stop thinking about particles and "measurements" altogether, and just talk about fields, that's not an issue. As you say, under MWI, the wavefunction doesn't even decompose into data that can be localized. So, in order to represent the system using classical causal diagrams, the "system state" has to contain the whole wavefunction. As long as we can write down an equation for the evolution of the wavefunction over time, we have a classical causal model for the system. Quantum causal models are certainly a much cleaner representation, in this case, but classical causal models can still work - we just have to define the "DAG vertices" appropriately.
2Vanessa Kosoy5h It doesn't really have much to do with particles vs. fields. We talk about measurements because measurements are the thing we actually observe. It seems strange to say you can model the world as a causal network if the causal network doesn't include your actual observations. If you want to choose a particular frame of reference and write down the wavefunction time evolution in that frame (while ignoring space) then you can say it's a causal network (which is just a linear chain, and deterministic at that) but IMO that's not very informative. It also loses the property of having things made of parts, which AFAIU was one of your objectives here.

The wavefunction does have plenty of internal structure, that structure just doesn't line up neatly with space. It won't just be a linear chain, and it will be made of "parts", but those parts won't necessarily line up neatly with macroscopic observations/objects.

And that's fine - figuring out how to do ontology mapping between the low-level "parts" and the high-level "parts" is a central piece of the problem. Not being able to directly observe variables in the low-level causal diagram is part of that. If w...

(Or, is coordination easier in a long timeline?)

It seems like it would be good if the world could coordinate to not build AGI. That is, at some point in the future, some number of teams will have the technical ability to build and deploy an AGI, but they all agree to voluntarily delay (perhaps on penalty of sanctions) until they’re confident that humanity knows how to align such a system.

Currently, this kind of coordination seems like a pretty implausible state of affairs. But I want to know if it seems like it becomes more or less plausible as time passes.

The following is my initial thi...

This is a really good example of a possible cultural technological change that would alter the coordination landscape substantially. Thanks.

3Raemon5h FYI, here's a past Paul Christiano exploration of this topic:

Anyway, I did say that I thought there were lots of plausible angles, so I can try to give one. This is very off-the-cuff, it’s not a topic that I have yet thought about much though I expect to at some point.

Example: tagging advanced technology

Let’s say that a technology is “basic” if it is available in 2016; otherwise we say it is “advanced.” We would like to:

1. Give individuals complete liberty when dealing with basic technology.

2. Give individuals considerable liberty when dealing with advanced technology.

3. Prevent attackers from using advanced technologies developed by law-abiding society in order to help do something destructive.

We’ll try to engineer a property of being “tagged,” aiming for the following desiderata:

1. All artifacts embodying advanced technology, produced or partly produced by law-abiding citizens, are tagged.

2. All artifacts produced using tagged artifacts are themselves tagged.

3. Tagged artifacts are not destructive (in the sense of being much more useful for an agent who wants to destroy).

Property #1 is relatively easy to satisfy, since the law can require tagging advanced technology. Ideally tagging will be cheap and compatible with widely held ethical ideals, so that there is little incentive to violate such laws. The difficulty is achieving properties #2 and #3 while remaining cheap / agreeable.

The most brutish way to achieve properties #2 and #3 is to have a government agency X which retains control over all advanced artifacts. When you contribute an artifact to X they issue you a title. The title-holder can tell X what to do with an advanced artifact, and X will honor those recommendations so long as (1) the proposed use is not destructive, and (2) the proposed use does not conflict with X’s monopoly on control of advanced artifacts. The title-holder is responsible for bearing the
2rohinmshah5h In theory, never (either hyperbolic time discounting is a bias, and never "should" be done, or it's a value, but one that longtermists explicitly don't share). In practice, hyperbolic time discounting might be a useful heuristic, e.g. perhaps since we are bad at thinking of all the ways that our plans can go wrong, we tend to overestimate the amount of stuff we'll have in the future, and hyperbolic time discounting corrects for that.
1MakoYass5h "I'm a trained rationalist"

What training process did you go through? o.o My understanding is that we don't really know a reliable way to produce anything that could be called a "trained rationalist", a label which sets impossibly high standards (in the view of a layperson) and is thus pretty much unusable. (A large part of becoming an aspiring rationalist involves learning how any agent's rationality is necessarily limited; laypeople have overoptimistic intuitions about that.)
Actually updating
431d4 min read

Actually updating can be harder than it seems. Hearing the same advice from other people and only really understanding it the third time (though internally you felt like you really understood the first time) seems inefficient. Having to give yourself the same advice or have the same conversation with yourself over and over again also seems pretty inefficient. Recently, I’ve made significant progress with actually causing internal shifts, and the advice... Well, you’ve probably heard it before. But hopefully, this time you’ll really get it.

Signs you might not be actually updating.

  • You do some focus
...
1jmh6h Along the same lines as TurnTrout, I was wondering about the abstraction versus specific situation. I am not asking that anyone share anything they would not be comfortable with. However, I do think abstraction from oneself in the analysis can just be another one of the protection mechanisms that can be used to allow us to appear to be making progress while still avoiding the underlying truth driving our behaviors. That said, I think Sara offers some very good items to consider. Okay, this next bit is not directly related but seems implicit in the posting, and other posts I've read here. Does the LW community tend to see the human mind and "person" as a collection of entities/personalities/agents/thinking processes? Or am I jumping to some completely absurd conclusion on that?
Okay, this next bit is not directly related but seems implicit in the posting, and other posts I've read here. Does the LW community tend to see the human mind and "person" as a collection of entities/personalities/agents/thinking processes? Or am I jumping to some completely absurd conclusion on that?

There are some LWers who think that way, and others who don't. (Among the people who find it a useful model, AFAICT it's usually treated more as a hypothesis to consider and/or fake-framework that is sometimes useful. This sequence is...

Epistemic Spot Check: The Fate of Rome (Kyle Harper)
186h4 min read


Epistemic spot checks are a series in which I select claims from the first few chapters of a book and investigate them for accuracy, to determine if a book is worth my time. This month’s subject is The Fate of Rome, by Kyle Harper, which advocates for the view that Rome was done in by climate change and infectious diseases (which were exacerbated by climate change).

This check is a little different than the others, because it arose from a collaboration with some folks in the forecasting space. Instead of just reading and evaluating claims myself, I took claims from the book and mad...

What do the probability distributions listed below the claims mean specifically?

Embedded Naive BayesΩ
132d2 min readΩ 6

Suppose we have a bunch of earthquake sensors spread over an area. They are not perfectly reliable (in terms of either false positives or false negatives), but some are more reliable than others. How can we aggregate the sensor data to detect earthquakes?

A “naive” seismologist without any statistics background might try assigning different numerical scores to each sensor, roughly indicating how reliable their positive and negative results are, just based on the seismologist’s intuition. Sensor i gets a score ...
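
The excerpt above is cut off, so the following is not the post's own construction. It is a minimal sketch of the textbook Naive Bayes way to aggregate such sensors, assuming each sensor fires independently given whether or not there is an earthquake. The function name `posterior_quake` and all of the rates in the usage line are hypothetical.

```python
import math


def posterior_quake(prior, readings, hit_rates, false_alarm_rates):
    """Posterior probability of an earthquake given binary sensor readings.

    prior              -- P(earthquake) before looking at any sensor
    readings           -- list of booleans, True if sensor i fired
    hit_rates          -- P(sensor i fires | earthquake)
    false_alarm_rates  -- P(sensor i fires | no earthquake)

    Assumes the sensors are conditionally independent given the true
    state, which is the "naive" part of Naive Bayes.
    """
    # Work in log-odds so each sensor contributes an additive score.
    log_odds = math.log(prior / (1 - prior))
    for fired, hit, false_alarm in zip(readings, hit_rates, false_alarm_rates):
        if fired:
            log_odds += math.log(hit / false_alarm)
        else:
            log_odds += math.log((1 - hit) / (1 - false_alarm))
    # Convert log-odds back to a probability.
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical numbers: three sensors, two of which fired.
print(posterior_quake(0.01, [True, True, False], [0.9, 0.7, 0.6], [0.05, 0.2, 0.3]))
```

Each sensor adds one number to the running total depending on whether it fired, so a set of hand-assigned per-sensor scores that are summed and compared to a threshold has the same form as this calculation.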

4johnswentworth6h Here's the use case I have in mind. We have some neural network or biological cell or something performing computation. It's been optimized via gradient descent/evolution, and we have some outside-view arguments saying that optimal reasoning should approximate Bayesian inference. We also know that the "true" behavior of the environment is causal - so optimal reasoning for our system should approximate Bayesian reasoning on some causal model of the environment. The problem, then, is to go check whether the system actually is approximating Bayesian reasoning over some causal model, and what that causal model is. In other words, we want to check whether the system has a particular causal model (e.g. a Naive Bayes model) of its input data embedded within it. What do you imagine "embedded" to mean?
4rohinmshah6h I usually imagine the problems of embedded agency (at least when I'm reading LW/AF), where the central issue is that the agent is a part of its environment (in contrast to the Cartesian model, where there is a clear, bright line dividing the agent and the environment). Afaict, "embedded Naive Bayes" is something that makes sense in a Cartesian model, which I wasn't expecting. It's not that big a deal, but if you want to avoid that confusion, you might want to change the word "embedded". I kind of want to say "The Intentional Stance towards Naive Bayes", but that's not right either.

Ok, that's what I was figuring. My general position is that the problems of agents embedded in their environment reduce to problems of abstraction, i.e. world-models embedded in computations which do not themselves obviously resemble world-models. At some point I'll probably write that up in more detail, although the argument remains informal for now.

The immediately important point is that, while the OP makes sense in a Cartesian model, it also makes sense without a Cartesian model. We can just have some big computation, and pick a little chunk o...

4johnswentworth7h I am indeed leaving out some assumptions, mainly because I am not yet convinced of which assumptions are "right". The simplest assumption - used by Aczel - is that G and g are monotonic. But that's usually chosen more for mathematical convenience than for any principled reason, as far as I can tell. We certainly want some assumptions which rule out the trivial solution, but I'm not sure what they should be.
When do utility functions constrain?Ω
141d7 min readΩ 9

The Problem

This post is an exploration of a very simple worry about the concept of utility maximisers - that they seem capable of explaining any exhibited behaviour. It is one that has, in different ways, been brought up many times before. Rohin Shah, for example, complained that the behaviour of everything from robots to rocks can be described by utility functions. The conclusion seems to be that being an expected utility maximiser tells us nothing at all about the way a decision maker acts in the world - the utility function does not constrain. This clashes with arguments that suggest, ...
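A minimal sketch of the worry, assuming behaviour is a finite sequence of actions drawn from a small action set. The helper names `rationalizing_utility` and `best_trajectories` are hypothetical and are not taken from the post. The idea is that, for any observed trajectory, one can write down a utility function under which that exact trajectory is the unique optimum.

```python
from itertools import product


def rationalizing_utility(observed):
    """Build a utility function that scores the observed trajectory 1 and every other trajectory 0."""
    observed = tuple(observed)

    def utility(trajectory):
        return 1 if tuple(trajectory) == observed else 0

    return utility


def best_trajectories(utility, actions, length):
    """Enumerate all action sequences of the given length and return the utility-maximising ones."""
    candidates = list(product(actions, repeat=length))
    top = max(utility(c) for c in candidates)
    return [c for c in candidates if utility(c) == top]


# Any behaviour at all will do here; this one is arbitrary.
observed = ("left", "left", "right")
u = rationalizing_utility(observed)
optimal = best_trajectories(u, ["left", "right"], len(observed))
print(optimal)
```

Whatever sequence is passed in as `observed`, it comes back as the only maximiser, so without further restrictions on the utility function "is a utility maximiser" rules nothing out.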

For the record, the VNM theorem is about the fact that you are maximizing expected utility. All three of the words are important, not just the utility function part. The biggest constraint that the VNM theorem applies is that, assuming there is a "true" probability distribution over outcomes (or that the agent has a well-calibrated belief over outcomes that captures all information it has about the environment), the agent must choose actions in a way consistent with maximizing the expectation of some real-valued function of the outcome, which doe...

6Vanessa Kosoy16h The problem can be ameliorated by constraining to instrumental reward functions. This gives us agents that are, in some sense, optimizing the state of the environment rather than an arbitrary function of their own behavior. I think this is a better model of what it means to be "goal-directed" than classical reward functions. Another thing we can do is just apply Occam's razor, i.e. require the utility function (and prior) to have low description complexity. This can be interpreted as saying that taking the intentional stance towards a system is only useful if it results in compression.
1Pattern10h Those seem to be roughly the same thing - knowing about an environment allows us greater understanding/ability to predict agents in the environment.
Tabooing 'Agent' for Prosaic AlignmentΩ
422d6 min readΩ 20

Strongly agree with this, I think this seems very important.

Logical Optimizers Ω
112d2 min readΩ 5

Epistemic status: I think the basic idea is more likely than not sound. Probably some mistakes. Looking for a sanity check.

Black box description

The following is a way to Foom an AI while leaving its utility function and decision theory as blank spaces. You could plug any uncomputable or computationally intractable behavior you might want in, and get an approximation out.

Suppose I was handed a hypercomputer and allowed to run code on it without worrying about mindcrime. The hypercomputer is then removed, and I am allowed to keep 1Gb of data from the computations. Then I am handed a magic human utility...

Ah, sorry, I misread the terminology. I agree.

Davis_Kingsley's Shortform
68d

I wasn't linking to that for the 5/5 star rating - there are zero votes.

If we're to believe the Philosophical Transactions of the Royal Society, or the Copenhagen Consensus Center, or apparently any of the individual geoengineering researchers who've modelled it, it's possible to halt all warming by building a fleet of autonomous wind-powered platforms that do nothing more sinister than spraying seawater into the air, in a place no more ecologically sensitive than the open ocean, and for no greater cost than 10 billion USD

(edit: I'm not sure where the estimate of 10B came from. I saw the estimate of 9B in a lot of news reports relating to CCC, ...

The 2015 report "Climate Intervention: Reflecting Sunlight to Cool Earth" says existing instruments aren't precise enough to measure albedo change from such a project, and measuring its climate impact is even more tricky. That also makes small-scale experimentation difficult. Basically you'd have to go to 100% and then hope that it worked. As someone who ran many A/B tests, that squicks me out; I wouldn't press the button until we had better ways to measure the impact.

This post by Alex Berger of OpenPhil outlines some shifts in thinking at OpenPhil, about what bar they set for their grantmaking. It seemed noteworthy...

  • as potentially relevant to the Drowning Children are Hard to Find discussion.
  • as an update on how OpenPhil thinks about making grants relating to US policy. (I think the intended thesis of the post was 'It's harder than we thought for US Giving to outperform Givewell Top Charities.')

Americans giving random Americans dollars as "null hypothesis."

[note: I'm not 100% sure I understood this framework, but here's ...

Tetraspace Grouping's Shortform
223d

Imagine two prediction markets, both with shares that give you $1 if they pay out and $0 otherwise.

One is predicting some event in the real world (and pays out if this event occurs within some timeframe) and has shares currently priced at $X.

The other is predicting the behaviour of the first prediction market. Specifically, it pays out if the price of the first prediction market exceeds an upper threshold $T before it goes below a lower threshold $R.

Is there anything that can be said in general about the price of the second prediction market? For example...
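One hedged way to get a handle on the question is a sketch under strong assumptions that the post does not state: the first market's price is a martingale (traders are risk-neutral, with no fees or interest), one of the two thresholds is eventually hit, and the second market is priced at its expected payout. Under those assumptions the optional stopping theorem gives a price of roughly (X - R) / (T - R) for the second market. The functions below are hypothetical illustrations. Prices are measured in integer ticks, and the simulation uses a symmetric one-tick random walk as a stand-in for a martingale price path.

```python
import random


def prob_upper_first(x, lower, upper):
    """Probability that a martingale started at x reaches `upper` before `lower`."""
    if not lower < x < upper:
        raise ValueError("x must lie strictly between lower and upper")
    return (x - lower) / (upper - lower)


def simulate_upper_first(x, lower, upper, trials=20000, seed=0):
    """Estimate the same probability with a symmetric +/-1 tick random walk."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        price = x
        # Walk until the price touches either threshold.
        while lower < price < upper:
            price += rng.choice((-1, 1))
        if price >= upper:
            hits += 1
    return hits / trials


# Hypothetical numbers in ticks: current price 4, lower threshold 2, upper threshold 7.
exact = prob_upper_first(4, 2, 7)
estimate = simulate_upper_first(4, 2, 7)
print(exact, estimate)
```

If the real price process drifts, or traders are not risk-neutral, the formula no longer applies as written, so it is best read as a baseline rather than an answer.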

[Question]Is LW making progress?
191d1 min read

Some people have an intuition that with free exchange of ideas, the best ones will eventually come out on top. I'm less optimistic, so I ask if that's really happening.

The alternative would be to have the same people talking about the same problems without accumulating anything. People would still make updates, but some would update in opposite directions, leaving the total distribution of ideas in the community largely unchanged. There would be occasional great ideas, but they would soon get buried in the archives, without leaving much of an impact. I have some hope we're not l...

I've been lurking on LW for many years, and overall, my impression is that there's been steady progress. At the end of a very relevant essay from Scott, way back in 2014, he states:

I find this really exciting. It suggests there’s this path to be progressed down, that intellectual change isn’t just a random walk. Some people are further down the path than I am, and report there are actual places to get to that sound very exciting. And other people are around the same place I am, and still other people are lagging behind me. But when I look back at where w

...
24Answer by Raemon10h There's some debate about which things are "improvements" as opposed to changes. It's varied a bit which of these have happened directly on LessWrong, but things that seem like improvements to me, which I now think of as important parts of the LessWrong idea ecosystem include:

  • Updates on decision theory
      • seem like the clearest example of intellectual progress on an idea that happened "mostly on LessWrong", as opposed to mostly happening in private and then periodically being written up on LessWrong afterwards.
  • Sequences were written pre-replication crisis.
      • At least some elements were just wrong due to that. (More recent editing passes on Rationality A-Z have removed those, from what I recall. For example, it no longer focuses on the Robbers Cave Experiment.)
  • AI Landscape has evolved
      • During the sequences days, a lot of talk about how AI was likely to develop was more in the "speculation" phase. By now we've seen a lot of concrete advances in the state of the art, which makes for more concrete and different discussions of how things are likely to play out and what good strategies are to address it.

Shift from specific biases towards general mental integration/flexible skillsets

In the Sequences Days, a lot of discussion focused on "how do we account for particular biases." There has been some shift away from this overall mindset, because dealing with individual biases mostly isn't that useful. There are particular biases like 'confirmation' and 'scope insensitivity' that still seem important to address directly, but it used to be more common to, say, read through Wikipedia's list of cognitive biases and address each one.

Instead there's a bit more of a focus on how to integrate your internal mental architecture in such a way that you can notice biases/motivated thinking/etc and address it flexibly. In particular, if dialog with yourself about why you don't seem to b
2zulupineapple14h The worst case scenario is if two people both decide that a question is settled, but settle it in opposite ways. Then we're only moving from a state of "disagreement and debate" to a state of "disagreement without debate", which is not progress.
2zulupineapple14h I appreciate the concrete example. I was expecting more abstract topics, but applied rationality is also important. Double Cruxes pass the criterion of being novel and the criterion of being well known. I can only question whether they actually work or made an impact (I don't think I see many examples of them in LW), and whether LW actually contributed to their discovery (apart from promoting CFAR).
Integrity and accountability are core parts of rationality
1321mo5 min read

Epistemic Status: Pointing at early stage concepts, but with high confidence that something real is here. Hopefully not the final version of this post.

When I started studying rationality and philosophy, I had the perspective that people who were in positions of power and influence should primarily focus on how to make good decisions in general and that we should generally give power to people who have demonstrated a good track record of general rationality. I also thought of power as this mostly unconstrained resource, similar to having money in your bank account, and that we should make sure ...

(Kind of brought to mind The Godfather, which happens to be the book my husband had me read to explain the familial dynamics in the household. What can I say, it works. At least until people start going senile.)
