## LessWrong

(The link to the Bluetooth keyboard on your blog is broken, or the keyboard listing is missing.)

Maybe the dopamine receptors in V1 are simply useless evolutionary leftovers (perhaps keeping them is easier from a developmental perspective).

1Steven Byrnes1y
LOL! The ultimate cop-out answer!! Not that it's necessarily wrong. But I would be very surprised if that were the correct answer. My vague impression is that there's an awful lot of genetic micro-management of cell types and receptors and so on for different areas of cortex. So "not expressing a receptor in a cortical area where it's unused" is (I assume) very easy evolutionarily, and these dopamine receptors are in lots of mammal species I think. Also, "I'm confused about this" has a pretty high prior for me. I don't feel obligated to go looking very hard for ways for that not to be true. :-P But thanks for the comment :)

A taxonomy of objections to AI Risk from the paper:

What sort of epistemic infrastructure do you think is importantly missing for the alignment research community?

7paulfchristiano2y
One category is novel epistemic infrastructure that doesn't really exist in general and would benefit all communities---over the longer term those seem like the most important missing things (but we won't be able to build them straightforwardly / over the short term and they won't be built for the alignment community in particular, they are just things that are missing and important and will eventually be filled in). The most salient instances are better ways of dividing up the work of evaluating arguments and prioritizing things to look at, driven by reputation or implicit predictions about what someone will believe or find useful. In general for this kind of innovation I think that almost all of the upside comes from people copying the small fraction of successful instances (each of which likely involves more work and a longer journey than could be justified for any small group). The other category is stuff that could be set up more quickly / has more of a reference class. I don't really have a useful answer for that, though I'm excited for eventually developing something a bit more like academic workshops that serve a community with a shared sense of the problem and who actually face similar day-to-day difficulties. I think this hasn't really been the case for attempts at literal academic workshops; I expect it to probably grow out of coordination between alignment efforts at ML labs.
4paulfchristiano2y
I'm excited for people to get good at building tools to help with open-ended tasks that feel a bit more like "wisdom"; I think Elicit is a step in that direction. I'm also excited about getting better at applying ML to tasks where we don't really have datasets / eventually where the goal is to aim for superhuman performance, and I think Elicit will grow into a good test case for that (and is to some extent right now). I basically think the main question is whether they are / will be able to make an excellent product that helps people significantly (and then whether they are able to keep scaling that up). (Note that I'm a funder / board member.)

What are the best examples of progress in AI Safety research that we think have actually reduced x-risk?

(Instead of operationalizing this explicitly, I'll note that the motivation is to understand whether doing more work toward technical AI Safety research is directly beneficial, as opposed to mostly irrelevant or having only second-order effects.)

9paulfchristiano2y

The (meta-)field of Digital Humanities is fairly new. Estimating its successes and challenges would help me form a stronger opinion on this matter.

3ozziegooen2y
I think in a more effective world, "Digital Humanities" would just be called "Humanities" :)

One project that implements something like this is 'Circles'. I remember it was on hold several years ago, but it seems to be running now (link).

1MikkW2y
Thanks for the link! Looks promising. My username there is also MikkW. Consider this a standing offer: if anybody wants me to do something for them, I will be happy to do it in return for circles [assuming that the system ends up working well. Edit: which it currently isn't]

I think that generally, skills (including metacognitive skills) don't transfer that well between different domains and it's best to practice directly. However, games also give one better feedback loops and easier access to mentoring, so the room for improvement might be larger.

A meta-analysis on transfer from video games to cognitive abilities saw small or null gains:

The lack of skill generalization from one domain to different ones—that is, far transfer—has been documented in various fields of research such as working memory training, music, brain t

...
1Stuart Anderson2y
-

Thanks for the concrete examples! Do you have relevant references for these at hand? I could imagine that there might be better ways to solve these issues, or that they mostly cancel out or are relatively minor problems, so I'm interested in seeing relevant arguments and case studies.

2ChristianKl2y
I haven't read through the economics literature on rent control myself, the above is just from internet and meatspace policy discussions.

I don't think that operationalizing exactly what I mean by a consensus would help a lot. My goal here is to really understand how certain I should be about whether rent control is a bad policy (and what are the important cases where it might not be a good policy, such as the examples ChristianKl gave below).

That's right, and a poor framing on my part 😊

I am interested in a consensus among academic economists, or in economic arguments for rent control, mainly because I'm most interested in utilitarian reasoning. But I'd also be curious about what other disciplines have to say.

This sounds like an amazing project and I find it very motivating. Especially the questions around how we'd like future epistemics to be and prioritizing different tools/training.

As I'm sure you are aware, there is a wide academic literature around many related aspects including the formalization of rationality, descriptive analysis of personal and group epistemics, and building training programs. If I understand you correctly, a GPI analog here would be something like an interdisciplinary research center that attempts to find general frameworks with which...

I think that M only prints something after converging with Adv, and that Adv does not print anything directly to H.

2avturchin2y
Yes, but everything I said could be just a convergent prediction of M. It's not that the real human runs out of the room, but that M predicts its model of the human, H', will leave the room.

Abram, did you reply to that crux somewhere?

I agree that hierarchy can be used only sparingly and still be very helpful. Perhaps just nesting under the core tags, or something similar.

On the occasional posts where the hierarchy doesn't hold, people can still downvote the parent tag. That is annoying, but it may reduce work overall.

Also, navigating up/down with arrow keys and pressing enter should allow choice of tags with keyboard only.

Some thoughts:

1. More people would probably rank tags if it could be done directly through the tag icon instead of using the pop-up window.

2. When searching for new tags, I'd probably like them sorted by relevance (say, with some preference for being a prefix, being a popular tag, and alphabetical ordering).

3. When browsing through all posts tagged with some tag, I'd maybe prefer to see higher-karma posts first, or to have karma factored into the ordering.

4. It might be easier to have a hierarchy of tags, so that voting for Value Learning also votes for, say, AI Alignment.
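Suggestions 2 and 4 can be sketched in a few lines. This is a minimal illustration, not LessWrong's actual tagging code: the tag names, the post counts used as a stand-in for "popularity", and the parent links are all hypothetical, and case-insensitive substring matching is assumed as the search rule.

```python
# Hypothetical data: tag name -> number of tagged posts (a proxy for popularity).
TAG_POPULARITY = {
    "AI Alignment": 900,
    "AI Boxing (Containment)": 40,
    "Value Learning": 120,
    "Rationality": 1500,
}

# Hypothetical shallow hierarchy: child tag -> parent tag.
TAG_PARENT = {
    "Value Learning": "AI Alignment",
    "AI Boxing (Containment)": "AI Alignment",
}


def search_tags(query: str, popularity: dict[str, int]) -> list[str]:
    """Return tags containing `query` (case-insensitive), ranked by:
    prefix matches first, then more popular tags, then alphabetical order."""
    q = query.lower()
    matches = [name for name in popularity if q in name.lower()]
    return sorted(
        matches,
        key=lambda name: (
            not name.lower().startswith(q),  # False sorts first, so prefix matches lead
            -popularity[name],               # higher popularity first
            name.lower(),                    # alphabetical tie-break
        ),
    )


def vote_with_ancestors(tag: str, parents: dict[str, str]) -> dict[str, int]:
    """Record one vote for `tag` and propagate it to every ancestor tag."""
    votes: dict[str, int] = {}
    current = tag
    # Stop at the root (no parent) or if the hierarchy accidentally contains a cycle.
    while current is not None and current not in votes:
        votes[current] = 1
        current = parents.get(current)
    return votes


print(search_tags("a", TAG_POPULARITY))
# ['AI Alignment', 'AI Boxing (Containment)', 'Rationality', 'Value Learning']
print(vote_with_ancestors("Value Learning", TAG_PARENT))
# {'Value Learning': 1, 'AI Alignment': 1}
```

Putting prefix matches ahead of popularity means typing "a" surfaces "AI Alignment" before the more popular "Rationality"; swapping the first two sort-key components would give the opposite behavior.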

7ESRogs3y
I have a question related to this -- I just went to find That Alien Message [https://www.lesswrong.com/posts/5wMcKNAwB6X4mp9og/that-alien-message] to mark it with the AI Alignment [https://www.lesswrong.com/tag/ai-alignment] tag, but then found it was already marked with the AI Boxing (Containment) [https://www.lesswrong.com/tag/ai-boxing-containment] tag. If Boxing is meant to be a subtag under the AI Alignment category, then I would think there's nothing more to do, but when I go to the Boxing page, there doesn't seem to be anything explicitly linking it to the AI Alignment page. And That Alien Message doesn't show up as one of the posts in the list on the AI Alignment page, but I'd think it should, since I consider it part of the AI safety "canon" as one of the best sources for building a particular intuition. So, should That Alien Message be explicitly tagged with AI Alignment? Or should AI Boxing (Containment) be explicitly linked to AI Alignment somehow? Or does how I'm thinking about how all this should work differ from what the LW team is thinking?

If you didn't think that AI researchers cared that much about not destroying the world, what else would make you optimistic that there will be enough incentives to ensure alignment? Does it all come back to people in relevant positions of power generally caring about safety and taking it seriously?

2Rohin Shah3y
Well, before you build superintelligent systems that could destroy the world, you probably build subhuman AI systems that do economically useful tasks (e.g. a personal assistant that schedules meetings, books flights, etc). There's an economic incentive to ensure that those AI systems are doing what their users want, which in turn looks like it incentivizes at least outer alignment work, and probably also inner alignment (to the extent that it's a problem).

I think that the debate around the incentives to make aligned systems is very interesting, and I'm curious whether Buck and Rohin will formalize a bet around it afterwards.

It seems to me that Rohin's point of view, compared to Buck's, is that people and companies are in general more responsible, in that they are willing to pay extra costs to ensure safety, not necessarily as a solution to a race-to-the-bottom situation. Is there another source of disagreement, conditional on convergence on the above?

2Rohin Shah3y
I think there are more, though I don't know what they are. For example, I think that people will have incentives to ensure alignment (most obviously, AI researchers don't want to destroy the world), whereas I would guess Buck is less optimistic about it.

Is GPI / Forethought Foundation missing?

No, I was simply mistaken. Thanks for correcting my intuitions on the topic!

If this is the case, this seems more like a difference in exploration/exploitation strategies.

We do have positively valenced heuristics for exploration, say curiosity and excitement.



And kudos for the neat explanation and an interesting theoretical framework :)

2MichaelA3y
Thanks! Do you mean that you expect that at each point, the preference vector will be almost entirely pointed in the x direction, or almost entirely pointed in the y direction, rather than being pointed in a "mixed direction"? If so, why would that be your expectation? To me, it seems very intuitive that people often care a lot about both more wealth and more security, or often care a lot about both the size and the appearance of their car. And in a great many other examples, people care about many dimensions of a particular thing/situation at the same time.

Here are two things you might mean that I might agree with:

"If we consider literally any possible point in the vector field, and not just those that the agent is relatively likely to find itself in, then at most such points the vector will be almost entirely towards one of the axes. This is because there are diminishing returns to most things, and they come relatively early. So if we're considering, for example, not just $0 to $100 million, but $0 to infinite $s, at most such points the agent will probably care more about whatever the other thing is, because there's almost no value in additional money."

"If we consider literally any possible dimensions over which an agent could theoretically have preferences, and not just those we'd usually bother to think about, then most such dimensions won't matter to an agent. For example, this would include as many dimensions for the number of specks of dust on X planet as there are planets. So the agent's preferences will largely ignore most possible dimensions, and therefore if we choose a random pair of dimensions, if one of them happens to be meaningful the preference will almost entirely point towards that one." (It seems less likely that that's what you meant, though it does seem a somewhat interesting separate point.)
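The first reading can be made concrete with a small sketch. It assumes a hypothetical utility u(x, y) = log(1 + x) + log(1 + y), which has diminishing returns in both goods, and treats the preference vector as the normalized gradient of u; neither choice comes from the post.

```python
import math


def preference_direction(x: float, y: float) -> tuple[float, float]:
    """Unit gradient of the hypothetical utility u(x, y) = log(1 + x) + log(1 + y)."""
    gx, gy = 1.0 / (1.0 + x), 1.0 / (1.0 + y)  # partial derivatives of u
    norm = math.hypot(gx, gy)
    return gx / norm, gy / norm


def angle_from_x_axis(x: float, y: float) -> float:
    """Direction of the preference vector in degrees: 0 = only x matters, 90 = only y."""
    ux, uy = preference_direction(x, y)
    return math.degrees(math.atan2(uy, ux))


print(angle_from_x_axis(10, 10))   # about 45 degrees: a genuinely "mixed" direction
print(angle_from_x_axis(1e8, 10))  # just under 90: almost entirely along y
```

Under this assumed utility, points where both quantities are of similar size give mixed directions, while across most of an unbounded quadrant one coordinate is far larger than the other and the vector hugs an axis.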

I'd expect the preference at each point to point mostly along one axis or the other.

However, this analysis could be interesting for non-cooperative games, where the vector might represent a mixed strategy, with its magnitude perhaps being the expected payoff.
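For reference, the expected payoff of a pair of mixed strategies p and q in a two-player game with row-player payoff matrix A is the bilinear form `p^T A q`. A minimal sketch, using matching pennies purely as an illustrative game (it is not from the original comment), with exact fractions to avoid rounding:

```python
from fractions import Fraction


def expected_payoff(payoff, p, q):
    """Row player's expected payoff: sum over i, j of p_i * A_ij * q_j."""
    return sum(
        p_i * a_ij * q_j
        for p_i, row in zip(p, payoff)
        for a_ij, q_j in zip(row, q)
    )


# Matching pennies, row player's payoffs (illustrative choice).
A = [[1, -1], [-1, 1]]
half = Fraction(1, 2)
print(expected_payoff(A, [half, half], [half, half]))                # 0
print(expected_payoff(A, [1, 0], [Fraction(3, 4), Fraction(1, 4)]))  # 1/2
```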


I may be mistaken. I tried reversing your argument, and I have bolded the part that doesn't feel right.

Optimistic errors are no big deal. The agent will randomly seek behaviours that get rewarded, but as long as these behaviours are reasonably rare (and are not that bad) then that’s not too costly.
But pessimistic errors are catastrophic. The agent will systematically make sure not to fall into behaviors that avoid high punishment, and will use loopholes to avoid penalties even if that results in the loss of something really good. So even if these erro
...
3paulfchristiano3y
I think this part of the reversed argument is wrong: Even if the behaviors are very rare, and have a "normal" reward, then the agent will seek them out and so miss out on actually good states.
Pessimistic errors are no big deal. The agent will randomly avoid behaviors that get penalized, but as long as those behaviors are reasonably rare (and aren’t the only way to get a good outcome) then that’s not too costly.

But optimistic errors are catastrophic. The agent will systematically seek out the behaviors that receive the high reward, and will use loopholes to avoid penalties when something actually bad happens. So even if these errors are extremely rare initially, they can totally mess up my agent.

I'd love to see someone analyze...
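The asymmetry in the argument above can be checked with a toy Monte Carlo sketch. Everything here is an assumption for illustration: true rewards are standard normal, a rare random subset of actions has its reward estimate shifted by a fixed amount (up for optimistic errors, down for pessimistic ones), and the agent simply picks the action with the highest estimate.

```python
import random


def simulate(n_actions=1000, error_rate=0.01, error_size=10.0, trials=200, seed=0):
    """Average true reward of the action an argmax agent picks, under
    no errors, rare optimistic errors, and rare pessimistic errors."""
    rng = random.Random(seed)
    signs = {"none": 0.0, "optimistic": 1.0, "pessimistic": -1.0}
    totals = {label: 0.0 for label in signs}
    for _ in range(trials):
        true = [rng.gauss(0.0, 1.0) for _ in range(n_actions)]
        # The same rare set of actions is mis-estimated in every condition.
        corrupted = [rng.random() < error_rate for _ in range(n_actions)]
        for label, sign in signs.items():
            est = [t + sign * error_size if c else t for t, c in zip(true, corrupted)]
            best = max(range(n_actions), key=est.__getitem__)
            totals[label] += true[best]
    return {label: total / trials for label, total in totals.items()}


print(simulate())
```

Under these assumptions the pessimistic average should stay close to the error-free one, since a pessimistic error only matters when it happens to hit the single best action, while the optimistic average should fall well below both, because the agent reliably selects one of the overrated actions.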

I think that the intuition for this argument comes from something like gradient ascent under an approximate utility function. The agent will spend most of its time near what it perceives to be a local(ish) maximum. So I suspect the argument here is that Optimistic Errors have a better chance of locking into a single local maximum or strategy, which gets reinforced enough (or not punished enough), even though it is bad in total. Pessimistic Errors are ones in which the agent strategically avoids locking into maxima, perhaps by Hedonic Adaptation as Dagon suggested. This may miss big opportunities if there are actual, territorial, big maxima, but that may not be as bad (from a satisficer point of view at least).
3Dagon3y
I suspect it all comes down to modeling of outcome distributions. If there's a narrow path to success, then both biases are harmful. If there are a lot of ways to win, and a few disasters, then optimism bias is very harmful, as it makes the agent not loss-averse enough. If there are a lot of ways to win a little, and few ways to win a lot, then pessimism bias is likely to miss the big wins, as it's trying to avoid minor losses. I'd really enjoy an analysis focused on your conditions (maximize vs satisfice, world symmetry) - especially what kinds of worlds and biased predictors lead satisficing to get better outcomes than optimizing.
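One of the cases above ("a lot of ways to win, and a few disasters", combined with optimism bias) is easy to sketch as a comparison between a maximizer and a satisficer. The world, the single optimistic error, and the threshold are hypothetical numbers chosen for illustration, and the satisficer is modeled (as an assumption) as picking uniformly among actions whose estimate clears the threshold.

```python
from statistics import mean


def maximize(true_rewards, estimates):
    """True reward of the single action with the highest estimated reward."""
    best = max(range(len(estimates)), key=estimates.__getitem__)
    return true_rewards[best]


def satisfice(true_rewards, estimates, threshold):
    """Expected true reward when picking uniformly among actions whose
    estimate clears the threshold."""
    eligible = [t for t, e in zip(true_rewards, estimates) if e >= threshold]
    return mean(eligible)


# Hypothetical world: many ways to win a little, a few disasters.
true_rewards = [1.0] * 990 + [-100.0] * 10
# Accurate estimates, except one disaster that looks great (an optimistic error).
estimates = list(true_rewards)
estimates[990] = 10.0

print(maximize(true_rewards, estimates))        # -100.0
print(satisfice(true_rewards, estimates, 0.5))  # about 0.898
```

Here the maximizer reliably lands on the one overrated disaster, while the satisficer is only hurt in proportion to how often it samples it (1 in 991 eligible actions).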
3paulfchristiano3y
I would think it would hold even in that case, why is it clearly wrong?

I find the classification of the elements of robust agency to be helpful, thanks for the write up and the recent edit.

I have some issues with Coherence and Consistency:

First, I'm not sure what you mean by these, so I'll take my best guess, which in its idealized form is something like: Coherence is being free of self-contradictions, and Consistency is having the tools to commit oneself to future actions. This is based on the last paragraph of that section:

There are benefits to reliably being able to make trades with your future-self, and with other
...
3Raemon3y
I definitely did not intend to make either an airtight or exhaustive case here. I think coherence and consistency are good for a number of reasons, and I included the ones I was most confident in and felt I could explain quickly and easily. (The section was more illustrative than comprehensive.) This response will not lay out the comprehensive case, but will try to give my current thoughts on some specific questions. (I want to stress that I still don't consider myself an expert, or even an especially competent amateur, on this topic.) That's actually not what I was going for – coherence can be relevant in the moment (if I had to pick, my first guess is that incoherence is more costly in the moment and inconsistency is more costly over time, although I'm not sure I was drawing a strong distinction between them). If you have multiple goals that are at odds, this can be bad in the immediate moment, because instead of getting to focus on one thing, you have to divide up your attention (unnecessarily) between multiple things that are at odds. This can be stressful, it can involve cognitive dissonance which makes it harder to think, and it involves wasted effort

Non-Bayesian utilitarians who are ambiguity-averse sometimes need to sacrifice "expected utility" to gain more certainty (in quotes because that need not be well defined).
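A minimal sketch of this trade-off, using maxmin expected utility (one standard model of ambiguity aversion, associated with Gilboa and Schmeidler) as the assumed decision rule. The two bets and their candidate win probabilities are hypothetical; each bet pays 1 on a win and 0 otherwise, so its expected utility under a given prior is just that prior's win probability.

```python
from fractions import Fraction as F

# Hypothetical bets: a "known" bet with one win probability, and an
# "ambiguous" bet whose win probability is only known to be one of several priors.
options = {
    "known": [F(35, 100)],
    "ambiguous": [F(20, 100), F(40, 100), F(60, 100)],
}


def maxmin_choice(options):
    """Pick the option whose worst-case expected utility is highest."""
    return max(options, key=lambda name: min(options[name]))


def average_prior_choice(options):
    """Pick the option with the highest expected utility under a uniform
    average over the candidate priors (one possible Bayesian reduction)."""
    return max(options, key=lambda name: sum(options[name]) / len(options[name]))


print(maxmin_choice(options))         # known
print(average_prior_choice(options))  # ambiguous
```

The maxmin agent gives up 0.05 of "expected utility" as measured by the uniform average over priors, which is exactly the quantity that is not well defined once there is no single prior.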

Thank you very much! Excited to read it :)

If it's simple, would it be possible to also publish a Kindle version?

1AABoyles3y
I would also like to convert it to a more flexible e-reader format. It appears to have been typeset using LaTeX... Would it be possible to share the source files?

Thinking of stocks, I find it hard to articulate how this pyramid might correspond to predicting market value of a company. To give it a try:

Traders predict the value of a stock.

The stock is evaluated at all times by the market's buy/sell prices. But that is self-referential and does not encompass "real" data. The value of a stock is "really evaluated" when a company distributes dividends, goes bankrupt, or does anything else that collapses a stock into actual money.

The ontology is the set of ways in which a stock turns into actual money.

Foundational understandi...

Emotions and Effective Altruism

I remember reading Nate Soares' Replacing Guilt series and identifying strongly with the feeling of Cold Resolve described there. I have since tried a bit to put it in other words and describe it using more familiar emotions, but found nothing really good.

I think that liget, an emotion found in an isolated tribe in the Philippines, might describe a similar feeling (except for the head-throwing part). I'm not sure that I can explain it better than the linked article.

After posting, I tried to change a link post to a text post. It seemed to be possible when editing the original post, but I later discovered that the changes were not kept and that the post is still in the link format.

When posting a link post instead of a text post, it is not clear what the result will be. There is still an option to write text, which appears as ordinary text right after submitting, but when the post is viewed (from the search bar) only some portion of the text is visible, and there is no indication that this is a link post.

It would be much more comfortable if a post could be edited using only the keyboard. For example, when adding a link, apart from defining a keyboard shortcut, it should also be possible to press Enter to submit the link. I

...