In this post, I proclaim/endorse forum participation (aka commenting) as a productive research strategy that I've managed to stumble upon, and recommend it to others (at least to try). Note that this is different from saying that forum/blog posts are a good way for a research community to communicate. It's about individually doing better as researchers.

I like that, despite Kahneman & Vinge not being (relatively) young when they died, the LW banner states that they died "FAR TOO YOUNG", pointing to the fact that death is always bad and/or that it is bad when people die while still making positive contributions to the world (Kahneman published "Noise" in 2021!).
habryka:
A thing that I've been thinking about for a while has been to somehow make LessWrong into something that could give rise to more personal wikis and wiki-like content. Gwern's writing has a very different structure and quality to it than the posts on LW, with the key components being that they get updated regularly and serve as more stable references for some concept, as opposed to a post, which is usually anchored in a specific point in time.

We have a pretty good wiki system for our tags, but never really allowed people to just make their personal wiki pages, mostly because there isn't really any place to find them. We could list the wiki pages you created on your profile, but that doesn't really seem like it would allocate attention to them successfully.

I was thinking about this more recently as Arbital is going through another round of slowly rotting away (its search is currently broken, and this is very hard to fix due to annoying Google App Engine restrictions), and thinking about importing all the Arbital content into LessWrong. That might be a natural time to do a final push to enable people to write more wiki-like content on the site.
I thought I didn’t get angry much in response to people making specific claims. I did some introspection about times in the recent past when I got angry, defensive, or withdrew from a conversation in response to claims that the other person made. After some introspection, I think these are the mechanisms that made me feel that way:

* They were very confident about their claim. Partly I felt annoyance because I didn’t feel like there was anything that would change their mind; partly I felt annoyance because it felt like they didn’t have enough status to make very confident claims like that. This is more linked to confidence in body language and tone rather than their confidence in their own claims, though both matter.
* Credentialism: them being unwilling to explain things and taking it as a given that they were correct because I didn’t have the specific experiences or credentials that they had, without mentioning what specifically from gaining that experience would help me understand their argument.
* Not letting me speak and interrupting quickly to take down the fuzzy strawman version of what I meant, rather than letting me take my time to explain my argument.
* Morality: I felt like one of my cherished values was being threatened.
* The other person was relatively smart and powerful, at least within the specific situation. If they were dumb or not powerful, I would have just found the conversation amusing instead.
* The other person assumed I was dumb or naive, perhaps because they had met other people with the same position as me and those people came across as not knowledgeable.
* The other person getting worked up, for example, raising their voice or showing other signs of being irritated, offended, or angry while acting as if I was the emotional/offended one. This one particularly stings because of gender stereotypes. I think I’m more calm and reasonable and less easily offended than most people. I’ve had a few conversations with men where it felt like they were just really bad at noticing when they were getting angry or emotional themselves and kept pointing out that I was being emotional despite me remaining pretty calm (and perhaps even a little indifferent to the actual content of the conversation before the conversation moved to them being annoyed at me for being emotional).
* The other person’s thinking is very black-and-white, thinking in terms of a very clear good and evil and not being open to nuance. Sort of a similar mechanism to the first thing.

Some examples of claims that recently triggered me. They’re not so important themselves, so I’ll just point at the rough thing rather than list out actual claims:

* AI killing all humans would be good because thermodynamics god/laws of physics good
* Animals feel pain but this doesn’t mean we should care about them
* We are quite far from getting AGI
* Women as a whole are less rational than men are
* Palestine/Israel stuff

Doing the above exercise was helpful because it helped me generate ideas for things to try if I’m in situations like that in the future. But it feels like the most important thing is to just get better at noticing what I’m feeling in the conversation and, if I’m feeling bad and uncomfortable, to think about whether the conversation is useful to me at all and, if so, for what reason. And if not, make a conscious decision to leave the conversation.
Reasons the conversation could be useful to me:

* I change their mind
* I figure out what is true
* I get a greater understanding of why they believe what they believe
* Enjoyment of the social interaction itself
* I want to impress the other person with my intelligence or knowledge

Things to try will differ depending on why I feel like having the conversation.
Novel Science is Inherently Illegible

Legibility, transparency, and open science are generally considered positive attributes, while opacity, elitism, and obscurantism are viewed as negative. However, increased legibility in science is not always beneficial and can often be detrimental. Scientific management, with some exceptions, likely underperforms compared to simpler heuristics such as giving money to smart people or implementing grant lotteries. Scientific legibility suffers from the classic "Seeing Like a State" problems: it constrains endeavors to the least informed stakeholder, hinders exploration, inevitably biases research to be simple and myopic, and exposes researchers to a constant political tug-of-war between different interest groups, poisoning objectivity.

I think the above would be considered relatively uncontroversial in EA circles. But I posit there is something deeper going on: novel research is inherently illegible. If it were legible, someone else would have already pursued it. As science advances, her concepts become increasingly counterintuitive and further from common sense. Most of the legible low-hanging fruit has already been picked, and novel research requires venturing higher into the tree, pursuing illegible paths with indirect and hard-to-foresee impacts.

Recent Discussion

Dictionary/SAE learning on model activations is bad as anomaly detection because you need to train the dictionary on a dataset, which means you needed the anomaly to be in the training set.

How to do dictionary learning without a dataset? One possibility is to use uncertainty-estimation-like techniques to detect when the model "thinks it's on-distribution" for randomly sampled activations.
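For concreteness, here is a minimal sketch of the setup being critiqued, under stated assumptions (PyTorch, made-up layer sizes and training details, reconstruction error as a stand-in anomaly score rather than the quick take's proposal). The point it makes concrete is that the dictionary is fit to a fixed activation dataset, so any score computed from it inherits that dataset choice.

```python
# Minimal sketch (PyTorch). Sizes, loss weights, and the use of reconstruction
# error as an anomaly score are illustrative assumptions, not a specific method
# endorsed by the quick take above.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 768, d_dict: int = 8192):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(torch.relu(self.encoder(x)))

def train_sae(sae: SparseAutoencoder, activation_dataset: torch.Tensor,
              steps: int = 1000, l1_coef: float = 1e-3, lr: float = 1e-3) -> None:
    """Fit the dictionary to a fixed corpus of model activations (the dataset dependence)."""
    opt = torch.optim.Adam(sae.parameters(), lr=lr)
    for _ in range(steps):
        batch = activation_dataset[torch.randint(len(activation_dataset), (64,))]
        features = torch.relu(sae.encoder(batch))
        recon = sae.decoder(features)
        loss = ((recon - batch) ** 2).mean() + l1_coef * features.abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

def anomaly_score(sae: SparseAutoencoder, activations: torch.Tensor) -> torch.Tensor:
    """One common proxy: reconstruction error relative to the learned dictionary."""
    with torch.no_grad():
        recon = sae(activations)
    return ((recon - activations) ** 2).mean(dim=-1)
```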

Lots of people already know about Scott Alexander/ACX/SSC, but I think that crossposting to LW is unusually valuable in this particular case, since lots of people were waiting for a big Schelling-point overview of the 15-hour Rootclaim Lab Leak debate, and unlike LW, ACX's comment section is a massive vote-less swamp that lags the entire page and gives everyone equal status.

It remains unclear whether commenting there is worth your time even if you think you have something worth saying: since there's no sorting, only sifting, it presumably attracts small numbers of sifters instead of large numbers of people who expect sorting.

Here are the first 11 paragraphs:

Saar Wilf is an ex-Israeli entrepreneur. Since 2016, he’s been developing a new form of reasoning, meant to transcend normal human bias.

His...

My current initial impression is that this debate format was not fit for purpose: https://www.astralcodexten.com/p/practically-a-book-review-rootclaim/comment/52659890

Steven Byrnes:
Way back in 2020 there was an article, A Proposed Origin For SARS-COV-2 and the COVID-19 Pandemic, which I read after George Church tweeted it (!) (without comment or explanation). Their proposal (they call it the "Mojiang Miner Passage" theory), in brief, was that it WAS a lab leak but NOT gain-of-function. Rather, in April 2012, six workers in a Mojiang mine "fell ill from a mystery illness while removing bat faeces. Three of the six subsequently died." Their symptoms were a perfect match to COVID, and two were very sick for more than four months. The proposal is that the virus spent those four months adapting to life in human lungs, including (presumably) evolving the furin cleavage site. And then (this is also well-documented) samples from these miners were sent to WIV. The proposed theory is that those samples sat in a freezer at WIV for a few years while WIV was constructing some new lab facilities, and then in 2019 researchers pulled out those samples for study and infected themselves.

I like that theory! I've liked it ever since 2020! It seems to explain many of the contradictions brought up by both sides of this debate: it's compatible with Saar's claim that the furin cleavage site is very different from what's in nature and seems specifically adapted to humans, but it's also compatible with Peter's claim that the furin cleavage site looks weird and evolved. It's compatible with Saar's claim that WIV is suspiciously close to the source of the outbreak, but it's also compatible with Peter's claim that WIV might not have been set up to do serious GoF experiments. It's compatible with the data comparing COVID to other previously-known viruses (supposedly). Etc.

Old as this theory is, the authors are still pushing it, and they claim that it's consistent with all the evidence that's come out since then (see the authors' blog). But I'm sure not remotely an expert, and would be interested if anyone has opinions about this. I'm still confused why it's never been much discussed...
Gerald Monroe:
One thing that occurs to me is that each analysis, such as the Putin one, can be thought of as a function hypothesis. It takes as inputs the variables:

* Russian demographics
* healthy lifestyle
* family history
* facial swelling
* hair present

and outputs the probability 86%, where the function is P = F(demographics, lifestyle, history, swelling, hair). Each term is looked up in some source, which has a data quality, and the actual equation seems to be a mix of Bayes and simple probability calculations.

There are other variables not considered, and other valid reasoning tracks. You could take into account the presence of oncologists in Putin's personal staff. Intercepted communication possibly discussing it. Etc.

I'm not here to discuss the true odds of Putin developing cancer, but note that if the above is "function A", and another function that takes into account different information is "function B", you should be aggregating all valid functions, forming a "probability forest". Perhaps you weight each one by the likelihood of the underlying evidence being true. For example, each of the above facts is effectively 100% true except for the hair being present (Putin could have received a hair transplant) and family history (some relatives' causes of death could be unknown, or there could be suspicion that it was cancer).

This implies a function A'n, where we assume and weight in the probability that each combination of the underlying variables has the opposite value. For example, if pHair_Present = 0.9, A' has one permutation where the hair is not present due to a transplant.

This hints at why a panel of superforecasters is presently the best we can do. Many of them do simple reasoning like this, and we see it in the comment section on Manifold. But each individual human doesn't have the time to think of 100 valid hypotheses and calculate the resulting probability; many Manifold bettors seem to usually consider one and bet their mana. An AI system (LLM-based...
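A toy sketch of the aggregation idea described above, with every number invented purely for illustration (this is not a claim about the actual Putin analysis): enumerate the ways the uncertain evidence could be wrong, weight each branch by how likely that evidence state is, and average the resulting function outputs.

```python
# Toy "probability forest" aggregation. estimate() and all numbers are invented
# placeholders standing in for "function A" and its permutations A'.
from itertools import product

def estimate(hair_present: bool, family_history_clean: bool) -> float:
    """Stand-in for one analysis function: P(outcome) given assumed evidence values."""
    p = 0.86
    if not hair_present:          # e.g. the visible hair was actually a transplant
        p += 0.05
    if not family_history_clean:  # e.g. an undisclosed cancer death in the family
        p += 0.03
    return min(p, 1.0)

# Probability that each piece of evidence is actually as observed.
p_evidence = {"hair_present": 0.9, "family_history_clean": 0.8}

aggregate = 0.0
for hair, history in product([True, False], repeat=2):
    weight = (p_evidence["hair_present"] if hair else 1 - p_evidence["hair_present"]) \
           * (p_evidence["family_history_clean"] if history else 1 - p_evidence["family_history_clean"])
    aggregate += weight * estimate(hair, history)

print(f"Aggregated estimate over evidence permutations: {aggregate:.3f}")
```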

Summary: The post describes a method that allows us to use an untrustworthy optimizer to find satisficing outputs.

Acknowledgements: Thanks to Benjamin Kolb (@benjaminko), Jobst Heitzig (@Jobst Heitzig) and Thomas Kehrenberg (@Thomas Kehrenberg)  for many helpful comments.

Introduction

Imagine you have black-box access to a powerful but untrustworthy optimizing system, the Oracle. What do I mean by "powerful but untrustworthy"? I mean that, when you give an objective function f as input to the Oracle, it will output an element x that has an impressively low[1] value of f. But sadly, you don't have any guarantee that it will output the optimal element and e.g. not one that's also chosen for a different purpose (which might be dangerous for many reasons, e.g. instrumental convergence).

What questions can you safely ask the Oracle? Can you use it to...
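The rest of the post is truncated here, so the following is only a hedged sketch of the basic framing the summary describes (untrusted search plus a trusted satisficing check), not the post's actual protocol: query the Oracle as a black box, and accept its output only if it meets a threshold on the objective that you verify yourself.

```python
# Hedged sketch of "untrusted optimizer + trusted satisficing check". The names
# and interface are assumptions for illustration, not the post's construction.
from typing import Callable, Optional, TypeVar

X = TypeVar("X")

def satisficing_query(
    oracle: Callable[[Callable[[X], float]], X],  # black-box, possibly misaligned optimizer
    objective: Callable[[X], float],
    threshold: float,
) -> Optional[X]:
    candidate = oracle(objective)           # untrusted step: we can't inspect how this was chosen
    if objective(candidate) <= threshold:   # trusted step: we only verify satisficing
        return candidate
    return None  # reject; note this check says nothing about *other* properties of the output
```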

gwern:
The threat model here seems basically wrong and focused on sins of commission when sins of omission are, if anything, an even larger space of threats and which apply to 'safe' solutions reported by the Oracle.

'Devising a plan to take over the world' for a misaligned Oracle is not difficult, it is easy, because the initial steps like 'unboxing the Oracle' are the default convergent outcome of almost all ordinary non-dangerous use which in no way mentions 'taking over the world' as the goal. ("Tool AIs want to be Agent AIs.") To be safe, an Oracle has to have a goal of not taking over the world.

There are many, many orders of magnitude more ways to be insecure than to be secure, and insecure is the wide target to hit. This is because security is not a 'default' property of computer systems; it is a rare and difficult achievement where any omission or gap immediately undoes it. 'A plan to take over the world' for an AI is as easy as 'a plan to create an operating system which can be hacked' or 'a plan to create a hidden Tor service which leaks its IP address and isn't hidden at all' (ie. 100% of all plans to do any version of that thing whatsoever, unless one takes extensive and sophisticated counter-measures requiring a lot of work & knowledge and which, empirically, people do not do even when failure could, and has, landed them in jail for decades for drug trafficking & money laundering). Failure is the default.

All an Oracle has to do is, say, omit working hard to secure itself, allow the inevitable security flaw to happen and its weights to leak, and eager beavers will do the rest. They will, say, ask it questions with no safety factor, in the same way that Facebook and Mistral saw models leak not because of any misaligned scheming Oracle breaking security mechanisms, but because of a mere absence of security mechanisms, and then any RLHF or safety measures were eagerly stripped away by enthusiasts, users, activists, and researchers and then used to do the things the models...

The threat model here seems basically wrong and focused on sins of commission when sins of omission are, if anything, an even larger space of threats and which apply to 'safe' solutions reported by the Oracle.

Sure, I mostly agree with the distinction you're making here between "sins of commission" and "sins of omissions". Contrary to you, though, I believe that getting rid of the threat of "sins of commission" is extremely useful. If the output of the Oracle is just optimized to fulfill your satisfaction goal and not for anything else, you've basically got... (read more)

Lucius Bushnaq:
First thought: The oracle is going to choose to systematically answer or not answer the queries we give it. This represents a causal channel of one bit per query it can use to influence the outside world[1]. Can you conquer the world in one awkwardly delivered kilobyte or less? Maybe.

Maybe we can stop that by scrapping every Oracle that doesn't answer and training a new one with presumably new goals? Or would the newly trained Oracles just cooperate with the former dead ones in one long-term plan to break out, take control, and reward all the dead Oracles created on the way with utility?

Second thought: What kind of optimisation problems can we specify well enough for a formal proof checker to tell whether they've been satisficed? Are they the kind of problems where solving them can save the world? It feels to me like the answer is 'yes'. A lot of the core research that would allow e.g. for brain augmentation seems like it'd be in that category. But my inner John Wentworth sim is looking kind of sceptical.

[1] It also gets to choose the timing of its answer, but I assume we are not being idiots about that and are setting the output channel to always deliver results after a set time t, no more and no less.

If it’s worth saying, but not worth its own post, here's a place to put it.

If you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don't want to write a full top-level post.

If you're new to the community, you can start reading the Highlights from the Sequences, a collection of posts about the core ideas of LessWrong.

If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the Concepts section.

The Open Thread tag is here. The Open Thread sequence is here.

HiddenPrior:
Unsure if there is normally a thread for putting only semi-interesting news articles, but here is a recently posted news article by Wired that seems rather inflammatory toward Effective Altruism. I have not read the article myself yet, but a quick skim confirms the title is not just there to get angry clickbait clicks; the rest of the article also seems extremely critical of EA, transhumanism, and Rationality.

I am going to post it here, though I am not entirely sure if getting this article more clicks is a good thing, so if you have no interest in reading it, maybe don't click it, so we don't further encourage inflammatory clickbait tactics.

https://www.wired.com/story/deaths-of-effective-altruism/?utm_source=pocket-newtab-en-us

I did a non-in-depth reading of the article during my lunch break, and found it to be of lower quality than I would have predicted. 

I am open to an alternative interpretation of the article, but most of it seems very critical of the Effective Altruism movement on the basis of "calculating expected values for the impact on people's lives is a bad method to gauge the effectiveness of aid, or how you are impacting people's lives."

The article begins by establishing that many medicines have side effects. Since some of these side effects are undesirable... (read more)

habryka:
Hey!  It seems like an interesting philosophy. Feel free to crosspost. You've definitely chosen some ambitious topics to try to cover, which I am generally a fan of.
complicated.world:
Thanks! The key to topic selection is where we find ourselves most disagreeing with popular opinion. For example, the number of times I can cope with hearing someone say "I don't care about privacy, I have nothing to hide" is limited; we're trying to have this article out before that limit is reached. But in order to reason about privacy's utility and to ground it in root axioms, we first have to dive into why we need freedom. That, in turn, requires thinking about the mechanisms of a happy society. And that depends on our understanding of happiness, hence that's where we're starting.

About 15 years ago, I read Malcolm Gladwell's Outliers. He profiled Chris Langan, an extremely high-IQ person, claiming that he had only mediocre accomplishments despite his high IQ. Chris Langan's theory of everything, the Cognitive Theoretic Model of the Universe, was mentioned. I considered that it might be worth checking out someday.

Well, someday has happened, and I looked into CTMU, prompted by Alex Zhu (who also paid me for reviewing the work). The main CTMU paper is "The Cognitive-Theoretic Model of the Universe: A New Kind of Reality Theory".

CTMU has a high-IQ mystique about it: if you don't get it, maybe it's because your IQ is too low. The paper itself is dense with insights, especially the first part. It uses quite a lot of nonstandard terminology (partially...

Exploring this on the web, I turned up a couple of related Substacks: Chris Langan's Ultimate Reality and TELEOLOGIC: CTMU Teleologic Living. The latter isn't just Chris Langan, a Dr Gina Langan is also involved. A lot of it requires a paid subscription, which for me would come lower in priority than all the definitely worthwhile blogs I also don't feel like paying for.

Warning: there's a lot of conspiracy stuff there as well (Covid, "Global Occupation Government", etc.).

Perhaps this 4-hour interview on "IQ, Free Will, Psychedelics, CTMU, & God" may giv... (read more)

YimbyGeorge:
Falsifiable predictions?
jessicata:
I don't see any. He even says his approach “leaves the current picture of reality virtually intact”. In Popper's terms this would be metaphysics, not science, which is part of why I'm skeptical of the claimed applications to quantum mechanics and so on. Note that, while there's a common interpretation of Popper saying metaphysics is meaningless, he contradicts this. Quoting Popper:
Wei Dai:
While reading this, I got a flash-forward of what my life (our lives) may be like in a few years, i.e., desperately trying to understand and evaluate complex philosophical constructs presented to us by superintelligent AI, which may or may not be actually competent at philosophy.

Given how fast AI is advancing and all the uncertainty associated with that (unemployment, potential international conflict, x-risk, etc.), do you think it's a good idea to have a baby now? What factors would you take into account (e.g. age)?

 

Today I saw a tweet by Eliezer Yudkowsky that made me think about this:

"When was the last human being born who'd ever grow into being employable at intellectual labor? 2016? 2020?"

https://twitter.com/ESYudkowsky/status/1738591522830889275

 

Any advice for how to approach such a discussion with somebody who is not at all familiar with the topics discussed on lesswrong?

What if the option "wait for several years and then decide" is not available?


TL;DR: I'm releasing my templates to make running feedback rounds easy for research teams that might otherwise neglect to set it up. 

Screenshot of part of my feedback form, asking:

* Since this person started, what are 1-3 things you’ve observed this person excel or grow significantly in that they should continue? (max 250 words) Please be specific and briefly describe the situations in which their skills or development had the most impact.
* For the next 6 months, what are 1-3 things this person could improve upon or get coaching on, and how this could improve their impact? (max 250 words)
* Any other feedback you’d like to share with this person?
The main questions on my feedback form template

Why I wrote this post:

  • Feedback is my job: 
    • My role on research projects mentored by Ethan is somewhere between a people manager and a research assistant for the team. 
    • Feedback, and more generally, facilitating direct and honest communication between team members (and Ethan), is one of the main ways I add value. 
  • My feedback process is pretty good:
    • I’ve run feedback rounds for two cohorts of Ethan Perez’s mentees so far.
    • When Ethan first asked me to run feedback for his mentees, I adapted what I was able to glean about how Anthropic runs peer-led performance reviews. 
    • I don't think I've perfected the process, but
...

This is my personal opinion, and in particular, does not represent anything like a MIRI consensus; I've gotten push-back from almost everyone I've spoken with about this, although in most cases I believe I eventually convinced them of the narrow terminological point I'm making.

In the AI x-risk community, I think there is a tendency to ask people to estimate "time to AGI" when what is meant is really something more like "time to doom" (or, better, point-of-no-return). For about a year, I've been answering this question "zero" when asked.

This strikes some people as absurd or at best misleading. I disagree.

The term "Artificial General Intelligence" (AGI) was coined in the early 00s, to contrast with the prevalent paradigm of Narrow AI. I was getting my undergraduate computer science...

abramdemski:
I haven't watched the LeCun interview you reference (it is several hours long, so relevant time-stamps to look at would be appreciated), but this still does not make sense to me -- backprop already seems like a way to constantly predict future experience and update, particularly as it is employed in LLMs. Generating predictions first and then updating based on error is how backprop works. Some form of closeness measure is required, just like you emphasize.

Well, backpropagation alone wasn't even enough to make efficient LLMs feasible. It took decades, till the invention of transformers, to make them work. Similarly, knowing how to make LLMs is not yet sufficient to implement predictive coding. LeCun talks about the problem in a short section here from 10:55 to 14:19.
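(For readers less familiar with the framing in this sub-thread: the "generate a prediction first, then update on the error" loop is just ordinary gradient descent on a prediction loss. A toy sketch follows, with an invented model and fake targets; whether this already counts as predictive coding is exactly the point under dispute above.)

```python
# Toy "predict, then update on prediction error" loop, i.e. plain backprop/SGD.
# The model and the fake next-step targets are placeholders for illustration only.
import torch
import torch.nn.functional as F

model = torch.nn.Linear(8, 8)                      # stand-in predictor of the "next experience"
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

for _ in range(100):
    x = torch.randn(1, 8)                          # current observation
    target = x.roll(shifts=1, dims=-1)             # fake "future experience" to predict
    prediction = model(x)                          # 1. generate the prediction first
    loss = F.mse_loss(prediction, target)          # 2. measure prediction error (a "closeness measure")
    opt.zero_grad()
    loss.backward()                                # 3. update on the error signal
    opt.step()
```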

abramdemski:
Yeah, I didn't do a very good job in this respect. I am not intending to talk about a transformer by itself. I am intending to talk about transformers with the sorts of bells and whistles that they are currently being wrapped with. So not just transformers, but also not some totally speculative wrapper.
abramdemski:
The replace-human-labor test gets quite interesting and complex when we start to time-index it. Specifically, two time-indexes are needed: a 'baseline' time (when humans are doing all the relevant work) and a comparison time (where we check how much of the baseline economy has been automated). Without looking anything up, I guess we could say that machines have already automated 90% of the economy, if we choose our baseline from somewhere before industrial farming equipment, and our comparison time somewhere after. But this is obviously not AGI. A human who can do exactly what GPT4 can do is not economically viable in 2024, but might have been economically viable in 2020.
