LessWrong

The vast majority of people would be opposed to the death of themselves, their families and their friends no matter how awesome the artificial systems that caused those deaths turn out to be after their expansion to all the galaxies reachable from Earth. The correctness of their preference would be obvious to them; they wouldn't have to think about it for more than 5 minutes. But I'm guessing it is not obvious to you.

1Quinn4h

sure -- i agree that's why i said "something adjacent to" because it had enough overlap in properties. I think my comment completely stands with a different word choice, I'm just not sure what word choice would do a better job.

2Wei Dai8h

Why do you think these values are positive? I've been pointing out, and I see that Daniel Kokotajlo also pointed out in 2018 that these values could well be negative. I'm very uncertain but my own best guess is that the expected value of misaligned AI controlling the universe is negative, in part because I put some weight on suffering-focused ethics.

2ryan_greenblatt2h

* My current guess is that max good and max bad seem relatively balanced. (Perhaps max bad is 5x more bad/flop than max good in expectation.) * There are two different (substantial) sources of value/disvalue: interactions with other civilizations (mostly acausal, maybe also aliens) and what the AI itself terminally values * On interactions with other civilizations, I'm relatively optimistic that commitment races and threats don't destroy as much value as acausal trade generates on some general view like "actually going through with threats is a waste of resources". I also think it's very likely relatively easy to avoid precommitment issues via very basic precommitment approaches that seem (IMO) very natural. (Specifically, you can just commit to "once I understand what the right/reasonable precommitment process would have been, I'll act as though this was always the precommitment process I followed, regardless of my current epistemic state." I don't think it's obvious that this works, but I think it probably works fine in practice.) * On terminal value, I guess I don't see a strong story for extreme disvalue as opposed to mostly expecting approximately no value with some chance of some value. Part of my view is that just relatively "incidental" disvalue (like the sort you link to Daniel Kokotajlo discussing) is likely way less bad/flop than maximum good/flop.

Take the wheel, Shoggoth! (Lesswrong is trying out changes to the frontpage algorithm)

Ruby, RobertM

For the last month, @RobertM and I have been exploring the possible use of recommender systems on LessWrong. Today we launched our first site-wide experiment in that direction.

(In the course of our efforts, we also hit upon a frontpage refactor that we reckon is pretty good: tabs instead of a clutter of different sections. For now, only for logged-in users. Logged-out users see the "Latest" tab, which is the same-as-usual list of posts.)

Why algorithmic recommendations?

A core value of LessWrong is to be timeless and not news-driven. However, the central algorithm by which attention allocation happens on the site is the Hacker News algorithm^[1], which basically only shows you things that were posted recently, and creates a strong incentive for discussion to always be...

(See More – 965 more words)

2Tamsin Leake3h

I'm generally not a fan of increasing the amount of illegible selection effects. On the privacy side, can lesswrong guarantee that, if I never click on Recommended, then recombee will never see an (even anonymized) trace of what I browse on lesswrong?

2Ruby9m

Typo? Do you mean "click on Recommended"? I think the answer is no, in order to have recommendations for individuals (and everyone), they have browsing data. 1) LessWrong itself doesn't aim for a super high degree of infosec. I don't believe our data is sensitive to warrant large security overhead. 2) I trust Recombee with our data as our trust ourselves.

4niplav4h

I realized I hadn't given feedback on the actual results of the recommendation algorithm. Rating the recommendations I've gotten (from -10 to 10, 10 is best): * My experience using financial commitments to overcome akrasia: 3 * An Introduction to AI Sandbagging: 3 * Improving Dictionary Learning with Gated Sparse Autoencoders: 2 * [April Fools' Day] Introducing Open Asteroid Impact: -6 * LLMs seem (relatively) safe: -3 * The first future and the best future: -2 * Examples of Highly Counterfactual Discoveries?: 5 * "Why I Write" by George Orwell (1946): -3 * My Clients, The Liars: -4 * 'Empiricism!' as Anti-Epistemology: -2 * Toward a Broader Conception of Adverse Selection: 4 * Ambitious Altruistic Software Engineering Efforts: Opportunities and Benefits: 6

Ruby7m20

I'd be interested in a comparison with the Latest tab.

My simple AGI investment & insurance strategy

1mo

TL;DR:

Options traders think it's extremely unlikely that the stock market will appreciate more than 30 or 40 percent over the next two to three years, as it did over the last year. So they will sell you the option to buy current indexes for 30 or 40% above their currently traded value for very cheap.
But slow takeoff, or expectations of one, would almost certainly cause the stock market to rise dramatically. Like many people here, I think institutional market makers are basically not pricing this in, and gravely underestimating volatility as a result, especially for large indexes like VTI which have never moved more than 50% in a single year.
To take advantage of this, instead of buying individual tech stocks, I allocate a sizable chunk of my

...

(See More – 569 more words)

devansh24m10

Does buying shorter-term OTM derivatives each year not work here?

dirk's Shortform

dirk

Viliam28m20

Specific examples would be nice. Not sure if I understand correctly, but I imagine something like this:

You always choose A over B. You have been doing it for such long time that you forgot why. Without reflecting about this directly, it just seems like there probably is a rational reason or something. But recently, either accidentally or by experiment, you chose B... and realized that experiencing B (or expecting to experience B) creates unpleasant emotions. So now you know that the emotions were the real cause of choosing A over B all that time.

(This is p... (read more)

9dirk5h

Sometimes a vague phrasing is not an inaccurate demarkation of a more precise concept, but an accurate demarkation of an imprecise concept

1cubefox1h

Yeah. It's possible to give quite accurate definitions of some vague concepts, because the words used in such definitions also express vague concepts. E.g. "cygnet" - "a young swan".

1dkornai3h

I would say that if a concept is imprecise, more words [but good and precise words] have to be dedicated to faithfully representing the diffuse nature of the topic. If this larger faithful representation is compressed down to fewer words, that can lead to vague phrasing. I would therefore often view vauge phrasing as a compression artefact, rather than a necessary outcome of translating certain types of concepts to words.

My experience using financial commitments to overcome akrasia

105

William Howard

11d

About a year ago I decided to try using one of those apps where you tie your goals to some kind of financial penalty. The specific one I tried is Forfeit, which I liked the look of because it’s relatively simple, you set single tasks which you have to verify you have completed with a photo.

I’m generally pretty sceptical of productivity systems, tools for thought, mindset shifts, life hacks and so on. But this one I have found to be really shockingly effective, it has been about the biggest positive change to my life that I can remember. I feel like the category of things which benefit from careful planning and execution over time has completely opened up to me, whereas previously things like this would be largely down to the...

(Continue Reading – 5230 more words)

2Elizabeth37m

I don't think the original comment was a troll, but I also don't think it was a helpful contribution on this post. OP specifically framed the post as their own experience, not a universal cure. Comments explaining why it won't work for a specific person aren't relevant.

kave31m20

I like comments about other users' experiences for similar reasons why I like OP. I think maybe the ideal such comment would identify itself more clearly as an experience report, but I'd rather have the report than not.

1Martin Randall5h

I know a child who often has this reaction to negative consequences, natural or imposed. I'd welcome discussion on what works well for that mindset. I don't have any insight, it's not how my mind works. It seems like very very small consequences can help a bit. Also trying to address the anxiety with OTC supplements like Magnesium Glycinate and lavender oil.

5Fer32dwt34r3dfsz17h

Can you provide any further detail here, i.e. be more specific on origin-stratified-retention rates? (I would appreciate this, even if this might require some additional effort searching)

Nathan Young's Shortform

Nathan Young

James Grugett34m23

We are trying our best to honor mana donations!

If you are inactive you have until the rest of the year to donate at the old rate. If you want to donate all your investments without having to sell each individually, we are offering you a loan to do that.

We removed the charity cap of $10k donations per month, which is going beyond what we previous communicated.

2Nathan Young2h

Austin said they have $1.5 million in the bank, vs $1.2 million mana issued. The only outflows right now are to the charity programme which even with a lot of outflows is only at $200k. they also recently raised at a $40 million valuation. I am confused by running out of money. They have a large user base that wants to bet and will do so at larger amounts if given the opportunity. I'm not so convinced that there is some tiny timeline here. But if there is, then say so "we know that we often talked about mana being eventually worth $100 mana per dollar, but we printed too much and we're sorry. Here are some reasons we won't devalue in the future.."

1James Grugett41m

If we could push a button to raise at a reasonable valuation, we would do that and back the mana supply at the old rate. But it's not that easy. Raising takes time and is uncertain. Carson's prior is right that VC backed companies can quickly die if they have no growth -- it can be very difficult to raise in that environment.

2Nathan Young2h

Austin took his salary in mana as an often referred to incentive for him to want mana to become valuable, presumably at that rate. I recall comments like 'we pay 250 in referrals mana per user because we reckon we'd pay about $2.50' likewise in the in person mana auction. I'm not saying it was an explicit contract, but there were norms.

To get the best posts emailed to you, create an account! (2-3 posts per week, selected by the LessWrong moderation team.)

Fundamental Uncertainty: Chapter 8 - When does fundamental uncertainty matter?

Gordon Seidoh Worley

41m

N.B. This is a chapter in a planned book about epistemology. Chapters are not necessarily released in order. If you read this, the most helpful comments would be on things you found confusing, things you felt were missing, threads that were hard to follow or seemed irrelevant, and otherwise mid to high level feedback about the content. When I publish I'll have an editor help me clean up the text further.

In the previous three chapters we broke apart our notions of truth and knowledge by uncovering the fundamental uncertainty contained within them. We then built back up a new understanding of how we're able to know the truth that accounts for our limited access to certainty. And while it's nice to have this better understanding, you might...

(Continue Reading – 9569 more words)

Gordon Seidoh Worley37m20

Author's note: This chapter took a really long time to write. Unlike previous chapters in the book, this one covers a lot more stuff in less detail, but I still needed to get the details right, so it took a long time to both figure out what I really wanted to say and to make sure I wasn't saying things that I wouldn't upon reflection regret having said because they were based on facts that I don't believe or I had simply gotten wrong.

It's likely still not the best version of this chapter it could be, but at this point I think I've made all the key points I wanted to make here, so I'm publishing the draft now and expect this one to need a lot of love from an editor later on.

Examples of Highly Counterfactual Discoveries?

142

johnswentworth, kromem

The history of science has tons of examples of the same thing being discovered multiple time independently; wikipedia has a whole list of examples here. If your goal in studying the history of science is to extract the predictable/overdetermined component of humanity's trajectory, then it makes sense to focus on such examples.

But if your goal is to achieve high counterfactual impact in your own research, then you should probably draw inspiration from the opposite: "singular" discoveries, i.e. discoveries which nobody else was anywhere close to figuring out. After all, if someone else would have figured it out shortly after anyways, then the discovery probably wasn't very counterfactually impactful.

Alas, nobody seems to have made a list of highly counterfactual scientific discoveries, to complement wikipedia's list of multiple discoveries.

To...

(See More – 189 more words)

kave40m20

What you probably mean is "completely unexpected", "surprising" or something similar

I think it means the more specific "a discovery that if it counterfactually hadn't happened, wouldn't have happened for a long time". I think this is roughly the "counterfactual" in "counterfactual impact", but I agree not the more widespread one.

It would be great to have a single word for this that was clearer.

5ChristianKl5h

Counterfactual means, that if something would not have happened something else would have happened. It's a key concept in Judea Pearl's work on causality.

3Lukas_Gloor6h

In some of his books on evolution, Dawkins also said very similar things when commenting on Darwin vs Wallace, basically saying that there's no comparison, Darwin had a better grasp of things, justified it better and more extensively, didn't have muddled thinking about mechanisms, etc.

1francis kafka4h

I mean to some extent, Dawkins isn't a historian of science, presentism, yadda yadda but from what I've seen he's right here. Not that Wallace is somehow worse, given that of all the people out there he was certainly closer than the rest. That's about it

Bogdan Ionut Cirstea's Shortform

Bogdan Ionut Cirstea

9mo

1Bogdan Ionut Cirstea8h

Hey Jacques, sure, I'd be happy to chat!

1Bogdan Ionut Cirstea8h

Yeah, I'm unsure if I can tell any 'pivotal story' very easily (e.g. I'd still be pretty skeptical of enumerative interp even with GPT-5-MAIA). But I do think, intuitively, GPT-5-MAIA might e.g. make 'catching AIs red-handed' using methods like in this comment significantly easier/cheaper/more scalable.

2ryan_greenblatt2h

Noteably, the mainline approach for catching doesn't involve any internals usage at all, let alone labeling a bunch of things. I agree that this model might help in performing various input/output experiments to determine what made a model do a given suspicious action.

Bogdan Ionut Cirstea1h10

Noteably, the mainline approach for catching doesn't involve any internals usage at all, let alone labeling a bunch of things.

This was indeed my impression (except for potentially using steering vectors, which I think are mentioned in one of the sections in 'Catching AIs red-handed'), but I think not using any internals might be overconservative / might increase the monitoring / safety tax too much (I think this is probably true more broadly of the current control agenda framing).

LESSWRONG
LW

Quick Takes

Popular Comments

Recent Discussion

Why algorithmic recommendations?

LessOnline

A Festival of Writers Who are Wrong on the Internet

May 31 - Jun 2, Berkeley, CA