In this post, I proclaim/endorse forum participation (aka commenting) as a productive research strategy that I've managed to stumble upon, and recommend it to others (at least to try). Note that this is different from saying that forum/blog posts are a good way for a research community to communicate. It's about individually doing better as researchers.

A strange effect: I'm using a GPU in Russia right now, which doesn't have access to copilot, and so when I'm on vscode I sometimes pause expecting copilot to write stuff for me, and then when it doesn't I feel a brief amount of the same kind of sadness I feel when a close friend is far away & I miss them.
yanni (1d)
I like the fact that, despite them not being (relatively) young when they died, the LW banner states that Kahneman & Vinge have died "FAR TOO YOUNG", pointing to the fact that death is always bad and/or that it is bad when people die while they are still making positive contributions to the world (Kahneman published "Noise" in 2021!).
I have heard rumours that an AI Safety documentary is being made. Separately, a good friend of mine is also seriously considering making one, but he isn't "in" AI Safety. If you know who this first group is and can put me in touch with them, it might be worth getting across each other's plans.
Dictionary/SAE learning on model activations is bad as anomaly detection because you need to train the dictionary on a dataset, which means you needed the anomaly to be in the training set. How to do dictionary learning without a dataset? One possibility is to use uncertainty-estimation-like techniques to detect when the model "thinks it's on-distribution" for randomly sampled activations.
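A minimal sketch of one thing "uncertainty-estimation-like" could mean here, assuming a PyTorch model whose downstream layers (model_tail below, a hypothetical name) contain dropout: score an activation by how much the rest of the model disagrees with itself about it under Monte Carlo dropout, which needs no dictionary and no reference dataset:

```python
import torch

def activation_ood_score(model_tail, activation, n_samples=32):
    """Score how off-distribution an activation looks to the model itself,
    via Monte Carlo dropout disagreement (BALD-style mutual information)
    rather than a learned dictionary. High score = the downstream layers
    are uncertain about this activation."""
    model_tail.train()  # keep dropout stochastic for MC sampling
    with torch.no_grad():
        probs = torch.stack([
            torch.softmax(model_tail(activation), dim=-1)
            for _ in range(n_samples)
        ])                                    # (n_samples, ..., n_outputs)
    mean_probs = probs.mean(dim=0)
    entropy_of_mean = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(-1)
    mean_entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1).mean(dim=0)
    return entropy_of_mean - mean_entropy     # epistemic-uncertainty estimate
```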
habryka (4d)
A thing that I've been thinking about for a while has been to somehow make LessWrong into something that could give rise to more personal wikis and wiki-like content. Gwern's writing has a very different structure and quality than the posts on LW, the key components being that it gets updated regularly and serves as a more stable reference for some concept, as opposed to a post, which is usually anchored in a specific point in time.

We have a pretty good wiki system for our tags, but never really allowed people to just make their personal wiki pages, mostly because there isn't really any place to find them. We could list the wiki pages you created on your profile, but that doesn't really seem like it would allocate attention to them successfully. I was thinking about this more recently as Arbital is going through another round of slowly rotting away (its search is currently broken and this is very hard to fix due to annoying Google App Engine restrictions), and thinking about importing all the Arbital content into LessWrong. That might be a natural time to do a final push to enable people to write more wiki-like content on the site.


Recent Discussion

Summary: The post describes a method that allows us to use an untrustworthy optimizer to find satisficing outputs.

Acknowledgements: Thanks to Benjamin Kolb (@benjaminko), Jobst Heitzig (@Jobst Heitzig) and Thomas Kehrenberg (@Thomas Kehrenberg)  for many helpful comments.

Introduction

Imagine you have black-box access to a powerful but untrustworthy optimizing system, the Oracle. What do I mean by "powerful but untrustworthy"? I mean that, when you give an objective function f as input to the Oracle, it will output an element x that has an impressively low[1] value of f(x). But sadly, you don't have any guarantee that it will output the optimal element, or that the output wasn't also chosen for a different purpose (which might be dangerous for many reasons, e.g. instrumental convergence).

What questions can you safely ask the Oracle? Can you use it to...

I think that, if what you want out of the oracle is a formally verified proof of some maths theorem, then this is getting towards being actually likely not to kill you.

You can start with m huge, and slowly turn it down, so you get a long list of "no results", followed by a proof. (Where the optimizer only had a couple of bits of free optimization in choosing which proof.) 
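A rough illustrative sketch of that "start with m huge and slowly turn it down" loop; the oracle interface here is an assumption for illustration, not the post's actual API:

```python
def first_result_from_shrinking_m(oracle, objective, start_exponent=60):
    """Illustrative loop for the scheme above: query with a huge threshold m
    and lower it until the Oracle returns something. The long prefix of
    "no result" answers is the point: it leaves the Oracle only a few bits
    of freedom over which satisficing output (e.g. which proof) is finally
    returned. oracle(objective, m) is assumed to return None when it cannot
    certify at least m satisficing outputs."""
    for k in range(start_exponent, -1, -1):
        m = 2 ** k                      # m shrinks geometrically
        result = oracle(objective, m)
        if result is not None:
            return m, result            # first m at which a result appears
    return None
```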

Depending on exactly how chaos theory and quantum randomness work, even 1 bit of malicious super optimization could substantially increase the chance of doom. 

And of course, side channel attacks. Hacking out of the computer.

And, producing formal proofs isn't pivotal. 

EGI (1h)
"...under the assumption that the subset of dangerous satisficing outputs D is much smaller than the set of all satisficing outputs S, and that we are able to choose a number m such that |D|≪m<|S|." I highly doubt that  D≪S is true for anything close to a pivotal act since most pivotal acts at some point involve deploying technology that can trivially take over the world. For anything less ambitious the proposed technique looks very useful. Strict cyber- and physical security will of course be necessary to prevent the scenario Gwern mentions.
EGI (1h)
There is another kind of sin of omission though: the class that contains things like giving James Watt a nuclear power plant and not telling him about radioactivity, or giving a modern helicopter to the Wright brothers and watching them inevitably crash. Getting a technical understanding of the proposed solution should hopefully mitigate that, as long as adversarial design can indeed be ruled out.
Gerald Monroe (2h)
I gave "changing canon randomly" in the comment you are replying to. Is this how you propose limiting the hostile AIs ability to inject subtle hostile plans? Or similarly, "design the columns for this building. Oh they must all be roman arches." Would be a similar example.

About 15 years ago, I read Malcolm Gladwell's Outliers. He profiled Chris Langan, an extremely high-IQ person, claiming that he had only mediocre accomplishments despite his high IQ. Chris Langan's theory of everything, the Cognitive Theoretic Model of the Universe, was mentioned. I considered that it might be worth checking out someday.

Well, someday has happened, and I looked into CTMU, prompted by Alex Zhu (who also paid me for reviewing the work). The main CTMU paper is "The Cognitive-Theoretic Model of the Universe: A New Kind of Reality Theory".

CTMU has a high-IQ mystique about it: if you don't get it, maybe it's because your IQ is too low. The paper itself is dense with insights, especially the first part. It uses quite a lot of nonstandard terminology (partially...

David Udell (44m)
Would you kindly explain this? Because you think some of his world-models independently throw out great predictions, even if other models of his are dead wrong?

More like illuminating ontologies than great predictions, but yeah.

Capybasilisk (3h)
Luckily we can train the AIs to give us answers optimized to sound plausible to humans.
Wei Dai (1h)
I'm guessing you're not being serious, but just in case you are, or in case someone misinterprets you now or in the future, I think we probably do not want to train AIs to give us answers optimized to sound plausible to humans, since that would make it even harder to determine whether or not the AI is actually competent at philosophy. (Not totally sure, as I'm confused about the nature of philosophy and philosophical reasoning, but I think we definitely don't want to do that in our current epistemic state, i.e., unless we had some really good arguments that says it's actually a good idea.)

OC ACXLW Sat March 30 Models of Consciousness and AI Windfall

Hello Folks!

We are excited to announce the 59th Orange County ACX/LW meetup, happening this Saturday and most Saturdays after that.

Host: Michael Michalchik

Email: michaelmichalchik@gmail.com (For questions or requests)

Location: 1970 Port Laurent Place

(949) 375-2045

Date: Saturday, March 30, 2024

Time: 2 pm

Conversation Starters:

Models of Consciousness: A model of consciousness is a theoretical description that relates brain properties of consciousness (e.g., fast, irregular electrical activity, widespread brain activation) to phenomenal properties of consciousness (e.g., qualia, a first-person-perspective, the unity of a conscious scene). How can we evaluate and compare the various proposed models of consciousness, such as Global Workspace Theory, Integrated Information Theory, and others? What are the key challenges in developing a comprehensive theory of consciousness? Which models of consciousness would...


On 16 March 2024, I sat down to chat with New York Times technology reporter Cade Metz! In part of our conversation, transcribed below, we discussed his February 2021 article "Silicon Valley's Safe Space", covering Scott Alexander's Slate Star Codex blog and the surrounding community.

The transcript has been significantly edited for clarity. (It turns out that real-time conversation transcribed completely verbatim is full of filler words, false starts, crosstalk, "uh huh"s, "yeah"s, pauses while one party picks up their coffee order, &c. that do not seem particularly substantive.)


ZMD: I actually have some questions for you.

CM: Great, let's start with that.

ZMD: They're critical questions, but one of the secret-lore-of-rationality things is that a lot of people think criticism is bad, because if someone criticizes you, it hurts your...

I am not everyone else, but the reason I downvoted on the second axis is:

  • I still don't really understand the avoidant/non-avoidant taxonomy. I am confused that "avoidant" covers both "introverted... and prefer to be alone" and "avoidants... being disturbing to others", when Scott never intended to disturb Metz's life. And Scott doesn't owe anyone anything, avoidant or not. And the claim about Scott being low in conscientiousness? Gwern being low in conscientiousness? If it is "varying from person to person" so much, is it even descriptive?
  • Making a claim of
...
garrison (3h)
I only skimmed the NYT piece about China and AI talent, but didn't see evidence of what you said (dishonestly angle-shooting the AI safety scene).
frankybegs (7h)
  I said "specialist journalist/hacker skills". I don't think it's at all true that anyone could find out Scott's true identity as easily as putting a key in a lock, and I think that analogy clearly misleads vs the hacker one, because the journalist did use his demonstrably non-ubiquitous skills to find out the truth and then broadcast it to everyone else. To me the phone hacking analogy is much closer, but if we must use a lock-based one, it's more like a lockpick who picks a (perhaps not hugely difficult) lock and then jams it so anyone else can enter. Still very morally wrong, I think most would agree.
Elizabeth (7h)
I think Zack's description might be too charitable to Scott. From his description I thought the reference would be strictly about poverty, but the full quote includes a lot about genetics and ability to earn money. The full quote is:

Scott doesn't mention race, but it's an obvious implication[1], especially when quoting someone the NYT crowd views as anathema. I think Metz could have quoted that paragraph, and maybe given the NYT consensus view on him for anyone who didn't know, and readers would think very poorly of Scott[2].

I bring this up for a couple of reasons:

1. It seems in the spirit of Zack's post to point out when he made an error in presenting evidence.
2. It looks like Metz chose to play stupid symmetric warfare games, instead of the epistemically virtuous thing of sharing a direct quote. The quote should have gotten him what he wanted, so why be dishonest about it? I have some hypotheses, none of which lead me to trust Metz.

1. ^ ETA: If you hold the very common assumption that race is a good proxy for genetics. I disagree, but that is the default view.
2. ^ To be clear: that paragraph doesn't make me think poorly of Scott. I personally agree with Scott that genetics influences jobs and income. I like UBI for lots of reasons, including this one. If I read that paragraph I wouldn't find any of the views objectionable (although a little eyebrow raise that he couldn't find an example with a less toxic reputation, but I can't immediately think of another example that fits either).
Linda Linsefors (31m)
I found this on their website. I'm not sure if this is worrying, because I don't think AI overseeing AI is a good solution. Or maybe it's actually good because, again, it's not a good solution, which might lead to some early warnings?

Their more human-in-the-loop stuff seems neat though.

Linda Linsefors (40m)
Sensationalist tabloid news stories and other outrage porn are not the opposite. They are actually more of the same: more edge cases. Anything that is divisive has the problem I'm talking about. Fiction is a better choice. Or even just completely ordinary, everyday human behaviour. Most humans are mostly nice most of the time. We might have to start with the very basics, the stuff we don't even notice because it's too obvious. Things no one would think of writing down.
Mateusz Bagiński (15h)
Moreover, legal texts are not super strict (much is left to interpretation) and we are often selective about "whether it makes sense to apply this law in this context" for reasons not very different from religious people being very selective about following the laws of their holy books.

In brief

Recently I became interested in what kind of costs were inflicted by iron deficiency, so I looked up studies until I got tired. This was not an exhaustive search, but the results are so striking that even with wide error bars I found them compelling. So compelling that I wrote up a post with an algorithm for treating iron deficiency while minimizing the chance of poisoning yourself. I've put the algorithm and a summary of potential gains first to get your attention, but if you're considering acting on this I strongly encourage you to continue reading the rest of the post, where I provide the evidence for my beliefs.

Tl;dr: If you are vegan or menstruate regularly, there’s a 10-50% chance you are iron deficient. Excess iron...

frankybegs (11h)
I didn't, thanks! I'm a fairly long-time visitor but sporadic-at-best commenter here, primarily because I feel I can learn much more than I can contribute (present case included). I'd love to know why you think it's weak. As I mentioned before, it doesn't seem any more than suggestive to me (and to be fair Chen acknowledges as much), but it does seem quite suggestive, and it has introduced a hint of doubt in me.

I get the sense that I've gotten your back up slightly here, which is perhaps not without justification as I admit to having been a touch suspicious of your ignoring the comment, and then coming across as a touch uncooperative when I pointed it out. Especially in the context of having noticed, long before converting to veganism myself, that your posts and engagement in subsequent comments struck me as being, in emphasis, framing and tone, somewhat adversarial to veganism.

But I'm well aware that I am probably excessively sensitive to that, having been astonished at the irrationality and extremity of the opposition to veganism online since I converted and before. I'm not sure there's a single moral/political issue where the epistemic and discursive standards are so low (not confined to the omnivores by any means, although it doesn't seem symmetrical to me either). On reflection that has probably clouded my impression (and I notice that I was completely wrong to claim Chen's was the only upvoted comment you ignored, a claim I've struck above). So I want to explicitly withdraw any implied criticism, and simply reiterate my interest in your assessment, as someone with relevant knowledge of and engagement with these nutritional questions.

You have previously (thanks again for the tip!) defended the value of expending significant resources on potentially preventing iron deficiency in some proportion of six vegans; for much less than a sixth of that same cost you could at least get one to be much more motivated to address potential iron deficiency. I'd be very

To answer your object level question:
 

  1. I could generate evidence at least this good for every claim in human health, including mutually contradictory ones. 
  2. The book title "Mindspan" pattern-matches to "shitty science book".
  3. The paragraphs quoted pattern-match to jumping around between facts, without giving straightforward numbers you can hold in your own mind. Why give the percentage of childbearing women below a threshold, but averages for the ultra-old?
    1. "adding tea to the diet reduces body iron and increases lifespan". Really? This is what he thinks of a
...
Elizabeth (1h)
Thank you, I appreciate that. To give some context:

* The mod team[1] and many authors believe that no one is owed a response. Some people disagree (mostly people who comment much more than they post, but not exclusively). I think the latter is a minority, although it's hard to tell without a proper poll and I don't know how to weight answers.
* Beyond that: the fact that I write about medical stuff means I get a lot of demands for answers I don't have and don't owe people. On one hand, this is kind of inevitable, so I don't get mad at people for the first request. On the other hand, people sometimes get really aggressive about getting a definitive answer from me, which I neither owe them nor have the ability to give. One of the biggest predictors of this is how specific the question is. Someone coming in with a lot of gears in their model is usually fun to talk to. I'll learn things, and I can trust that they're treating me as one source of information among many, rather than trying to outsource their judgement. The vaguer a question, the more likely it is being asked by someone who is desperate but not doing their own work on the subject, and answering is likely to be costly with little benefit to anyone. Your question pattern-matched to the second type.
* As you note, I not only had left many comments unresponded-to, but specifically the comments above and below the comment you were referring to (but making me do the work to find it). As far as I'm concerned, telling you I couldn't find the comment and giving an overall opinion was going above and beyond.
  * Which I do because sometimes on LW it pays off, and it looks like it did here, which is heartwarming.
* You say that you find omnivores to be worse at epistemics and discourse. My experience is strongly the opposite. These aren't incompatible: the loudest people on every side of every issue are usually the dumbest. But keep in mind that the critics of my work on vegan advocacy are drawn from that crowd.


Neil (2h)
This reminds me of when Charlie Munger died at 99, and many said of him "he was just a child". Less of a nod to transhumanist aspirations, and more to how he retained his sparkling energy and curiosity up until death. There are quite a few good reasons to write "dead far too young". 
the gears to ascension (11h)
I like it too, and because your comment made me think about it, I now kind of wish it said "orders of magnitude too young"

[This is part of a series I’m writing on how to convince a person that AI risk is worth paying attention to.] 

tl;dr: People’s default reaction to politics is not taking them seriously. They could center their entire personality on their political beliefs, and still not take them seriously. To get them to take you seriously, the quickest way is to make your words as unpolitical-seeming as possible. 

I’m a high school student in France. Politics in France are interesting because they’re in a confusing superposition. One second, you'll have bourgeois intellectuals sipping red wine from their Paris apartment writing essays with dubious sexual innuendos on the deep-running dynamics of power. The next, 400 farmers will vaguely agree with the sentiment and dump 20 tons of horse manure in downtown...

More French stories: So, at some point, the French decided what kind of political climate they wanted. What actions would reflect on their cause well? Dumping manure onto the city center using tractors? Sure! Lining up a hundred stationary taxi cabs in every main artery of the city? You bet! What about burning down the city hall's door, which is a work of art older than the United States? Mais évidemment!

"Politics" evokes all that in the mind of your average Frenchman. No, not sensible strategies that get your goals done, but the first shiny thing the prot... (read more)
