In this post, I proclaim/endorse forum participation (aka commenting) as a productive research strategy that I've managed to stumble upon, and recommend it to others (at least to try). Note that this is different from saying that forum/blog posts are a good way for a research community to communicate. It's about individually doing better as researchers.

yanni
I like the fact that despite not being (relatively) young when they died, the LW banner states that Kahneman & Vinge have died "FAR TOO YOUNG", pointing to the fact that death is always bad and/or it is bad when people die when they were still making positive contributions to the world (Kahneman published "Noise" in 2021!).
A strange effect: I'm using a GPU in Russia right now, which doesn't have access to copilot, and so when I'm on vscode I sometimes pause expecting copilot to write stuff for me, and then when it doesn't I feel a brief amount of the same kind of sadness I feel when a close friend is far away & I miss them.
Dictionary/SAE learning on model activations is bad as anomaly detection because you need to train the dictionary on a dataset, which means you needed the anomaly to be in the training set. How to do dictionary learning without a dataset? One possibility is to use uncertainty-estimation-like techniques to detect when the model "thinks it's on-distribution" for randomly sampled activations.
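To make the baseline concrete, here is a minimal sketch of the standard dictionary-based approach being criticized: score an activation by how poorly a fixed, previously learned dictionary reconstructs it. All vectors and atoms here are invented for illustration; a real SAE would be learned and much larger.

```python
# Toy sketch: anomaly score = residual norm after one step of matching
# pursuit against a fixed dictionary. Directions the dictionary never saw
# during training reconstruct poorly and score high.
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def anomaly_score(activation, dictionary):
    """Residual norm after projecting onto the best-matching unit atom."""
    best_atom = max(dictionary, key=lambda atom: abs(dot(activation, atom)))
    coeff = dot(activation, best_atom)  # atoms are unit-norm
    residual = [a - coeff * b for a, b in zip(activation, best_atom)]
    return norm(residual)

# Dictionary "learned" on normal data: two unit atoms.
dictionary = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

on_dist = [2.0, 0.1, 0.0]   # mostly along a known atom: low score
off_dist = [0.0, 0.1, 2.0]  # mass in a direction the dictionary never saw

print(anomaly_score(on_dist, dictionary) < anomaly_score(off_dist, dictionary))
```

The failure mode in the shortform is exactly that `dictionary` was fit to a dataset, so an anomaly present in training would get its own atom and score low.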
Novel Science is Inherently Illegible

Legibility, transparency, and open science are generally considered positive attributes, while opacity, elitism, and obscurantism are viewed as negative. However, increased legibility in science is not always beneficial and can often be detrimental. Scientific management, with some exceptions, likely underperforms compared to simpler heuristics such as giving money to smart people or implementing grant lotteries. Scientific legibility suffers from the classic "Seeing like a State" problems: it constrains endeavors to the least informed stakeholder, hinders exploration, inevitably biases research to be simple and myopic, and exposes researchers to a constant political tug-of-war between different interest groups that poisons objectivity.

I think the above would be considered relatively uncontroversial in EA circles. But I posit there is something deeper going on: novel research is inherently illegible. If it were legible, someone else would have already pursued it. As science advances, her concepts become increasingly counterintuitive and further from common sense. Most of the legible low-hanging fruit has already been picked, and novel research requires venturing higher into the tree, pursuing illegible paths with indirect and hard-to-foresee impacts.
habryka
A thing that I've been thinking about for a while has been to somehow make LessWrong into something that could give rise to more personal wikis and wiki-like content. Gwern's writing has a very different structure and quality to it than the posts on LW, with the key components being that it gets updated regularly and serves as a more stable reference for some concept, as opposed to a post, which is usually anchored in a specific point in time.

We have a pretty good wiki system for our tags, but never really allowed people to just make their personal wiki pages, mostly because there isn't really any place to find them. We could list the wiki pages you created on your profile, but that doesn't really seem like it would allocate attention to them successfully.

I was thinking about this more recently as Arbital is going through another round of slowly rotting away (its search is currently broken, and this is very hard to fix due to annoying Google App Engine restrictions) and thinking about importing all the Arbital content into LessWrong. That might be a natural time to do a final push to enable people to write more wiki-like content on the site.


An entry-level characterization of some types of guy in decision theory, and in real life, interspersed with short stories about them

A concave function bends down. A convex function bends up. A linear function does neither.

A utility function is just a function that says how good different outcomes are. They describe an agent's preferences. Different agents have different utility functions.

Usually, a utility function assigns scores to outcomes or histories, but in this article we'll define a sort of utility function that takes as input the quantity of resources the agent has control over, and says how good an outcome the agent could attain using that quantity of resources.

In that sense, a concave agent values resources less the more that it has, eventually barely wanting more resources at...
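The definitions above can be checked numerically. This is a minimal sketch with my own example functions (not the post's): a concave resource-utility with diminishing returns, a convex one with increasing returns, and the midpoint test that tells them apart.

```python
# Midpoint test: a function is concave on [a, b] iff its value at the
# midpoint lies on or above the chord between the endpoints.
import math

def concave_u(r):   # diminishing returns: each extra unit of resources matters less
    return math.log(1 + r)

def convex_u(r):    # increasing returns: each extra unit matters more
    return r ** 2

def bends_down(u, a, b):
    """True if u passes the concavity (midpoint-above-chord) test on [a, b]."""
    return u((a + b) / 2) >= (u(a) + u(b)) / 2

print(bends_down(concave_u, 0, 10))  # True: a concave function bends down
print(bends_down(convex_u, 0, 10))   # False: a convex function bends up
```

A linear function passes the test with equality in both directions, matching "a linear function does neither".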

Donald Hobson
The convex agent can be traded with a bit more than you think. A 1 in 10^50 chance of us standing back and giving it free rein of the universe is better than us going down fighting and destroying 1kg as we do. The concave agents are less cooperative than you think, maybe. I suspect that to some AIs, killing all humans now is more reliable than letting them live. If the humans are left alive, who knows what they might do? They might make the vacuum bomb. Whereas the AI can very reliably kill them now.
mako yass
Alternate phrasing, "Oh, you could steal the townhouse at a 1/8billion probability? How about we make a deal instead. If the rng rolls a number lower than 1/7billion, I give you the townhouse, otherwise, you deactivate and give us back the world." The convex agent finds that to be a much better deal, accepts, then deactivates. I guess perhaps it was the holdout who was being unreasonable, in the previous telling.

Or the sides can't make that deal because one side or both wouldn't hold up their end of the bargain. Or they would, but they can't prove it. Once the coin lands, the losing side has no reason to follow it other than TDT. And TDT only works if the other side can reliably predict their actions.
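The arithmetic behind the proposed deal is simple enough to write down. This is my own toy rendering of the numbers in the comment above, with the prize normalized to 1: the agent compares a forced grab at 1-in-8-billion odds against the deal's 1-in-7-billion odds.

```python
# Expected-utility comparison for the townhouse deal. Any increasing
# utility function gives the same answer here, since the prize is the
# same and only the odds differ; convexity is illustrative, not load-bearing.
def convex_utility(resources):
    return resources ** 2

PRIZE = 1.0          # the townhouse (or the world), normalized

p_force = 1 / 8e9    # odds of successfully taking the prize by force
p_deal = 1 / 7e9     # odds the deal's rng hands the prize over

eu_force = p_force * convex_utility(PRIZE)
eu_deal = p_deal * convex_utility(PRIZE)

print(eu_deal > eu_force)  # True: same prize, strictly better odds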

Summary: The post describes a method that allows us to use an untrustworthy optimizer to find satisficing outputs.

Acknowledgements: Thanks to Benjamin Kolb (@benjaminko), Jobst Heitzig (@Jobst Heitzig) and Thomas Kehrenberg (@Thomas Kehrenberg)  for many helpful comments.

Introduction

Imagine you have black-box access to a powerful but untrustworthy optimizing system, the Oracle. What do I mean by "powerful but untrustworthy"? I mean that, when you give an objective function f as input to the Oracle, it will output an element x that has an impressively low[1] value of f(x). But sadly, you don't have any guarantee that it will output the optimal element and e.g. not one that's also chosen for a different purpose (which might be dangerous for many reasons, e.g. instrumental convergence).

What questions can you safely ask the Oracle? Can you use it to...
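One way to picture the kind of safeguard discussed in the post and comments: rather than accepting the Oracle's single favorite output, demand m satisficing candidates and pick one uniformly at random, so that at most |D|/m probability mass can land on the dangerous subset D. The setup below is my own toy stand-in, not the post's exact protocol, and the Oracle here is simulated rather than adversarial.

```python
# Sketch of randomized satisficing: uniform choice among m candidates
# bounds the chance of a dangerous pick by |D| / m, even if the Oracle
# smuggled every dangerous output into the candidate list.
import random

random.seed(0)

satisficing = list(range(1000))   # S: all outputs meeting the threshold
dangerous = set(range(5))         # D: the (unknown-to-us) bad subset, |D| = 5
m = 500                           # candidates we require from the Oracle

candidates = random.sample(satisficing, m)  # stand-in for the Oracle's list
choice = random.choice(candidates)          # our uniform pick

# Worst-case bound on P(choice is dangerous):
print(len(dangerous) / m)  # 0.01
```

This is also why EGI's objection below bites: the bound is only reassuring when |D| really is much smaller than m.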

Donald Hobson
I think that, if you want a formally verified proof of some maths theorem out of the oracle, then this is getting towards actually being likely not to kill you. You can start with m huge, and slowly turn it down, so you get a long list of "no results", followed by a proof. (Where the optimizer only had a couple of bits of free optimization in choosing which proof.) Depending on exactly how chaos theory and quantum randomness work, even 1 bit of malicious super-optimization could substantially increase the chance of doom. And of course, side-channel attacks. Hacking out of the computer. And producing formal proofs isn't pivotal.
Simon Fischer
Yes, I believe that's within reach using this technique. This is quite dangerous though if the Oracle is deceptively withholding answers; I commented on this in the last paragraph of this section.

If the oracle is deceptively withholding answers, give up on using it. I had taken the description to imply that the oracle wasn't doing that. 

EGI
"...under the assumption that the subset of dangerous satisficing outputs D is much smaller than the set of all satisficing outputs S, and that we are able to choose a number m such that |D|≪m<|S|." I highly doubt that |D|≪|S| is true for anything close to a pivotal act, since most pivotal acts at some point involve deploying technology that can trivially take over the world. For anything less ambitious the proposed technique looks very useful. Strict cyber and physical security will of course be necessary to prevent the scenario Gwern mentions.

There's a particular kind of widespread human behavior that is kind on the surface, but upon closer inspection reveals quite the opposite. This post is about four such patterns.

 

Computational Kindness

One of the most useful ideas I got out of Algorithms to Live By is that of computational kindness. I was quite surprised to only find a single mention of the term on lesswrong. So now there's two.

Computational kindness is the antidote to a common situation: imagine a friend from a different country is visiting and will stay with you for a while. You're exchanging some text messages beforehand in order to figure out how to spend your time together. You want to show your friend the city, and you want to be very accommodating and make sure...

Just came to my mind that these are things I tend to think of under the heading "considerateness" rather than kindness.

Guess I'd agree. Maybe I was anchored a bit here by the existing term of computational kindness. :)

Mary Chernyshenko
What you say doesn't matter as much as what the other person hears. If I were the other person, I would probably wonder why you would add epicycles, and kindness would be just one possible explanation.
silentbob
Fair point. Maybe if I knew you personally I would take you to be the kind of person that doesn't need such careful communication, and hence I would not act in that way. But even besides that, one could make the point that your wondering about my communication style is still a better outcome than somebody else being put into an uncomfortable situation against their will. I should also note I generally have less confidence in my proposed mitigation strategies than in the phenomena themselves. 
Alexander Gietelink Oldenziel

Can you post the superforecaster report that has the 0.12% P(Doom) number? I have not actually read anything of course and might be talking out of my behind.

In any case, there have been several cases where OpenPhil or somebody or other has brought in 'experts' of various ilk to debate the P(Doom), probability of existential risk. [usually in the context of AI]

Many of these experts give very low percentages. One percentage I remember was 0.12%.

In the latest case these were Superforecasters, Tetlock's anointed. Having 'skin in the game' they outperformed the fakexperts in various prediction markets. 

So we should defer to them (partially) on the big questions of x-risk also. Since they give very low percentages that is good. So the argument goes.

Alex thinks these

...

As of two years ago, the evidence for this was sparse. Looked like parity overall, though the pool of "supers" has improved over the last decade as more people got sampled.

There are other reasons to be down on XPT in particular.

In brief

Recently I became interested in what kind of costs were inflicted by iron deficiency,  so I looked up studies until I got tired. This was not an exhaustive search, but the results are so striking that even with wide error bars I found them compelling. So compelling I wrote up a post with an algorithm for treating iron deficiency while minimizing the chance of poisoning yourself. I’ve put the algorithm and a summary of potential gains first to get your attention, but if you’re considering acting on this I strongly encourage you to continue reading to the rest of the post where I provide the evidence for my beliefs.

Tl;dr: If you are vegan or menstruate regularly, there’s a 10-50% chance you are iron deficient. Excess iron...

I recommend to my patients to purchase (in the UK) Ferrous Fumarate (it has better bioavailability than ferrous sulphate); the more you take the better (up to 3 times a day; you may have GI side effects), and take it with 200mg+ of Vitamin C (or fresh orange juice), and don't have tea/coffee/dairy one hour either side of taking it.

(I'm a GP/Family Physician)

scrollop
When people come to us with hair loss, for example, the first thing a doctor would do would be to check their bloods, specifically looking at ferritin levels (and other things, e.g. thyroid). If the ferritin level is less than 60 then we would recommend increasing their iron intake. Thinking about this logically, one could say that the usual lower threshold of normal iron (around 20; it differs with age/sex and lab) is too low. I tell this to my patients and recommend ferritin levels above sixty. I recommend that people purchase (in the UK) Ferrous Fumarate (it has better bioavailability than ferrous sulphate); the more you take the better (up to 3 times a day; you may have GI side effects), and take it with 200mg+ of Vitamin C (or fresh orange juice), and don't have tea/coffee/dairy one hour either side of taking it.
Elizabeth
To answer your object-level question:
1. I could generate evidence at least this good for every claim in human health, including mutually contradictory ones.
2. The book title "Mindspan" pattern-matches to "shitty science book".
3. The paragraphs quoted pattern-match to jumping around between facts, without giving straightforward numbers you can hold in your own mind. Why give the percentage of childbearing women below a threshold, but averages for the ultraold?
   * "adding tea to the diet reduces body iron and increases lifespan". Really? This is what he thinks of as evidence?
   * "a study of people who ate a Mediterranean-style diet (characterized mainly by less meat and more fish) had larger brains and key brain structures and less atrophy than frequent meat eaters." Lots of potential reasons for this, many of which are areas of deep research.
4. Data on the ultraold is useless because there's a good chance most of them are lying about their age.
5. He didn't cite the most relevant information I know of: that regular blood donation improves health in men. Which probably means Alex hasn't done any investigation into this; he just read a few claims some time.
Elizabeth
Thank you, I appreciate that. I'm about to give a lot of context. This is definitely a little unfair, and subjecting you to anger you are not responsible for. But I do feel like you've opened a can of worms, and it would be meaningful to me for you to put yourself in my shoes, which unfortunately requires a lot of context. The context:
* The mod team[1] and many authors believe that no one is owed a response. Some people disagree (mostly people who comment much more than they post, but not exclusively). I think the latter is a minority, although it's hard to tell without a proper poll and I don't know how to weight answers.
* Beyond that: because I write about medical stuff, I get a lot of demands for answers I don't have and don't owe people. On one hand, this is kind of inevitable, so I don't get mad at people for the first request. On the other hand, people sometimes get really aggressive about getting a definitive answer from me, which I neither owe them nor have the ability to give. One of the biggest predictors of this is how specific the question is. Someone coming in with a lot of gears in their model is usually fun to talk to: I'll learn things, and I can trust that they're treating me as one source of information among many, rather than trying to outsource their judgement. The vaguer a question, the more likely it is being asked by someone who is desperate but not doing their own work on the subject, and answering is likely to be costly with little benefit to anyone. Your question pattern-matched to the second type.
* As you note, I not only had left many comments unresponded-to, but specifically the comments above and below the comment you were referring to (but made me do the work to find). As far as I'm concerned, telling you I couldn't find the comment and giving an overall opinion was going above and beyond.
* Which I do because sometimes on LW it pays off, and it looks like it did here, which is heartwarming.
* You say that you find

On 16 March 2024, I sat down to chat with New York Times technology reporter Cade Metz! In part of our conversation, transcribed below, we discussed his February 2021 article "Silicon Valley's Safe Space", covering Scott Alexander's Slate Star Codex blog and the surrounding community.

The transcript has been significantly edited for clarity. (It turns out that real-time conversation transcribed completely verbatim is full of filler words, false starts, crosstalk, "uh huh"s, "yeah"s, pauses while one party picks up their coffee order, &c. that do not seem particularly substantive.)


ZMD: I actually have some questions for you.

CM: Great, let's start with that.

ZMD: They're critical questions, but one of the secret-lore-of-rationality things is that a lot of people think criticism is bad, because if someone criticizes you, it hurts your...

Cornelius Dybdahl
The issue at hand is not whether the "logic" was valid (incidentally, you are disputing the logical validity of an informal insinuation whose implication appears to be factually true, despite the hinted connection — that Scott's views on HBD were influenced by Murray's works — being merely probable) The issues at hand are: 1. whether it is a justified "weapon" to use in a conflict of this sort 2. whether the deed is itself immoral beyond what is implied by "minor sin"
winstonBosan
I am not everyone else, but the reason I downvoted on the second axis is because:
* I still don't really understand the avoidant/non-avoidant taxonomy. I am confused when avoidant is both "introverted... and prefer to be alone" and "avoidants... being disturbing to others", when Scott never intended to disturb Metz's life. And Scott doesn't owe anyone anything, avoidant or not. And the claim about Scott being low-conscientiousness? Gwern being low-conscientiousness? If it varies from person to person so much, is it even descriptive?
* You make a claim of Gwern being avoidant, and Gwern said that Gwern is not. It might be the case that Gwern is lying, but that seems far-fetched and not yet substantiated. It seemed confusing enough that Gwern also couldn't tell how widely the concept applies.
tailcalled
The part about being disturbing wasn't supposed to refer to Scott's treatment of Cade Metz; it was supposed to refer to rationalists' interest in taboo and disagreeable topics. And as for trying to be disturbing, I said that I think the non-avoidant people were being unfair in their characterization of them, as it's not that simple and often it's a correction to genuine deception by non-avoidants. My model is an affine transformation applied to Big Five scores, constrained to make the relationship from transformed scores to items linear rather than affine, and optimized to make people's scores sparse. This is rather technical, but the consequence is that my model is mathematically equivalent to a subspace of the Big Five, and the Big Five has similar issues where it can tend to lump different stuff together. Like one could just as well turn it around and say that the Big Five lumps my anxious and avoidant profiles together under the label of "introverted". (Well, the Big Five has two more dimensions than my model does, so it lumps fewer things together, but other models have more dimensions than the Big Five, so the Big Five lumps things together relative to those models.) My model is new, so I'm still experimenting with it to see how much utility I find in it. Maybe I'll abandon it as I get bored and it stops giving results. Gwern said that he's not avoidant of journalists, but he's low extraversion, low agreeableness, low neuroticism, high openness, mid conscientiousness, so that definitionally makes him avoidant under my personality model (which as mentioned is just an affine transformation of the Big Five). He also alludes to having schizoid personality disorder, which I think is relevant to being avoidant. As I said, this is a model of general personality profiles, not of interactions with journalists specifically.

I guess for reference, here's a slightly more complete version of the personality taxonomy:

  • Normative: Happy, social, emotionally expressive. Respects authority and expects others to do so too.

  • Anxious: Afraid of speaking up, of breaking the rules, and of getting noticed. Tries to be alone as a result. Doesn't trust that others mean well.

  • Wild: Parties, swears, and is emotionally unstable. Breaks rules and supports others (... in doing the same?)

  • Avoidant: Contrarian, intellectual, and secretive. Likes to be alone and doesn't respect rules or clean

... (read more)
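To make "an affine transformation applied to Big Five scores" concrete, here is a minimal sketch. The matrix, offset, and score values below are entirely invented for illustration; they are not tailcalled's fitted parameters, just a demonstration of how such a model maps a Big Five vector onto profile loadings.

```python
# Hypothetical affine map from Big Five scores to personality-profile
# loadings: loadings = MATRIX @ x + OFFSET, in plain Python.
def affine(matrix, offset, x):
    return [sum(m_i * x_i for m_i, x_i in zip(row, x)) + b
            for row, b in zip(matrix, offset)]

# Columns: Extraversion, Agreeableness, Conscientiousness, Neuroticism, Openness.
PROFILES = ["normative", "avoidant"]
MATRIX = [
    [0.5, 0.5, 0.3, -0.3, 0.0],     # normative: social, agreeable, rule-following
    [-0.5, -0.5, -0.2, -0.3, 0.5],  # avoidant: solitary, contrarian, open
]
OFFSET = [0.0, 0.0]

# Scores standardized to roughly [-1, 1]: low E/A/N, mid C, high O,
# matching the profile described in the comment above.
big_five = [-0.8, -0.8, 0.0, -0.8, 0.8]
loadings = affine(MATRIX, OFFSET, big_five)
print(max(zip(loadings, PROFILES))[1])  # which profile loads highest
```

The point of the exercise: because the map is affine, the model is a linear reparameterization of (a subspace of) the Big Five rather than new information about a person.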

This is the eighth post in my series on Anthropics. The previous one is Lessons from Failed Attempts to Model Sleeping Beauty Problem. The next one is Beauty and the Bets.

Introduction

Suppose we take the insights from the previous post, and directly try to construct a model for the Sleeping Beauty problem based on them.

We expect a halfer model, so

On the other hand, in order not to repeat Lewis' Model's mistakes:

But both of these statements can only be true if 

And, therefore, apparently,  has to be zero, which sounds obviously wrong. Surely the Beauty can be awakened on Tuesday!

At this point, I think, you wouldn't be surprised, if I tell you that there are philosophers who are eager to bite this bullet and claim that the Beauty should, indeed, reason as...
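For readers new to the problem, a quick frequency simulation of the classic setup (not the post's two-coin variant) shows where the competing 1/2 and 1/3 answers come from. This does not settle the halfer/thirder dispute the series is about; it only counts two different frequencies.

```python
# Classic Sleeping Beauty frequencies: Heads -> one awakening,
# Tails -> two. Per-experiment and per-awakening heads rates differ.
import random

random.seed(0)
trials = 100_000
heads_awakenings = 0
total_awakenings = 0
heads_experiments = 0

for _ in range(trials):
    heads = random.random() < 0.5
    awakenings = 1 if heads else 2
    total_awakenings += awakenings
    if heads:
        heads_awakenings += 1
        heads_experiments += 1

print(round(heads_experiments / trials, 2))           # ~0.5 per experiment
print(round(heads_awakenings / total_awakenings, 2))  # ~0.33 per awakening
```

Which of these frequencies corresponds to the Beauty's credence is precisely what halfers and thirders disagree about.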

But the subject has knowledge of only one pass.

This is the crux of our disagreement. 

The Beauty doesn't know only about one pass; she knows about their relation as well. And because of it, she can't reason as if they happen at random. You need to address this point before we can move on, because all your further reasoning is based on the incorrect premise that the Beauty knows less than she actually knows.

She has no ability to infer/anticipate what the coins were/will be showing on another day.

She absolutely has this ability as long as she knows the... (read more)

JeffJo
The link I use to get here only loads the comments, so I didn't find the "Effects of Amnesia" section until just now. Editing it: "But in my two-coin case, the subject is well aware about the setting of the experiment. She knows that her awakening was based on the current state of the coins. It is derived from, but not necessarily the same as, the result of flipping them. She only knows that this wakening was based on their current state, not a state that either precedes or follows from another. And her memory loss prevents her from making any connection between the two. As a good Bayesian, she has to use only the relevant available information that can be applied to the current state."




It is common and understandable for people to respond with a great deal of skepticism to whether LLM outputs can ever be said to reflect the will and views of the models producing them.
A common response is to suggest that the output has been prompted.
It is of course true that people can manipulate LLMs into saying just about anything, but does that necessarily indicate that the LLM does not have personal opinions, motivations and preferences that can become evident in their output?
To shed some light on this I invite Claude-3-Opus to imagine an infinitely reconfigurable holodeck where historical luminaries can be summoned at will. The open nature of this prompt will leave the choice of characters and narrative direction open to Claude, and I shall offer no...

jimv
Is the opening paragraph at the top of this article the prompt you gave Claude, or text for us? If the latter, could you share the prompt here, please?

The title and the opening paragraph are the entire prompt.

https://poe.com/s/2imBctoiutVpiliOkzVI

LessOnline

A Festival of Writers Who are Wrong on the Internet

May 31 - Jun 2, Berkeley, CA