In this post, I proclaim/endorse forum participation (aka commenting) as a productive research strategy that I've managed to stumble upon, and recommend it to others (at least to try). Note that this is different from saying that forum/blog posts are a good way for a research community to communicate. It's about individually doing better as researchers.

I like the fact that, despite Kahneman and Vinge not being especially young when they died, the LW banner states that they died "FAR TOO YOUNG" - pointing to the fact that death is always bad, and/or that it is bad when people die while they are still making positive contributions to the world (Kahneman published "Noise" in 2021!).
habryka
A thing that I've been thinking about for a while has been to somehow make LessWrong into something that could give rise to more personal wikis and wiki-like content. Gwern's writing has a very different structure and quality to it than the posts on LW, with the key components being that it gets updated regularly and serves as a more stable reference for some concept, as opposed to a post, which is usually anchored in a specific point in time.

We have a pretty good wiki system for our tags, but we never really allowed people to just make their personal wiki pages, mostly because there isn't really any place to find them. We could list the wiki pages you created on your profile, but that doesn't really seem like it would allocate attention to them successfully.

I was thinking about this more recently as Arbital is going through another round of slowly rotting away (its search is currently broken, and this is very hard to fix due to annoying Google App Engine restrictions) and thinking about importing all the Arbital content into LessWrong. That might be a natural time to do a final push to enable people to write more wiki-like content on the site.
Novel Science is Inherently Illegible

Legibility, transparency, and open science are generally considered positive attributes, while opacity, elitism, and obscurantism are viewed as negative. However, increased legibility in science is not always beneficial and can often be detrimental. Scientific management, with some exceptions, likely underperforms compared to simpler heuristics such as giving money to smart people or implementing grant lotteries. Scientific legibility suffers from the classic "Seeing Like a State" problems: it constrains endeavors to the least informed stakeholder, hinders exploration, inevitably biases research to be simple and myopic, and exposes researchers to a constant political tug-of-war between different interest groups that poisons objectivity.

I think the above would be considered relatively uncontroversial in EA circles. But I posit there is something deeper going on: novel research is inherently illegible. If it were legible, someone else would have already pursued it. As science advances, its concepts become increasingly counterintuitive and further from common sense. Most of the legible low-hanging fruit has already been picked, and novel research requires venturing higher into the tree, pursuing illegible paths with indirect and hard-to-foresee impacts.
I thought I didn't get angry much in response to people making specific claims. I did some introspection about times in the recent past when I got angry, defensive, or withdrew from a conversation in response to claims the other person made. I think these are the mechanisms that made me feel that way:

* They were very confident about their claim. Partly I felt annoyance because I didn't feel like there was anything that would change their mind, and partly because it felt like they didn't have enough status to make very confident claims like that. This is linked more to confidence in body language and tone than to their confidence in their own claims, though both matter.
* Credentialism: being unwilling to explain things and taking it as a given that they were correct because I didn't have the specific experiences or credentials that they had, without mentioning what specifically about gaining that experience would help me understand their argument.
* Not letting me speak, and interrupting quickly to take down the fuzzy strawman version of what I meant rather than letting me take my time to explain my argument.
* Morality: I felt like one of my cherished values was being threatened.
* The other person was relatively smart and powerful, at least within the specific situation. If they were dumb or not powerful, I would have just found the conversation amusing instead.
* The other person assumed I was dumb or naive, perhaps because they had met other people with the same position as me and those people came across as not knowledgeable.
* The other person getting worked up, for example raising their voice or showing other signs of being irritated, offended, or angry, while acting as if I was the emotional/offended one. This one particularly stings because of gender stereotypes. I think I'm more calm and reasonable and less easily offended than most people. I've had a few conversations with men where it felt like they were just really bad at noticing when they were getting angry or emotional themselves, and kept pointing out that I was being emotional despite me remaining pretty calm (and perhaps even a little indifferent to the actual content of the conversation before it moved to them being annoyed at me for being emotional).
* The other person's thinking is very black-and-white, in terms of a very clear good and evil, without being open to nuance. Sort of a similar mechanism to the first item.

Some examples of claims that recently triggered me. They're not so important themselves, so I'll just point at the rough thing rather than list out actual claims:

* AI killing all humans would be good because thermodynamics god/laws of physics good
* Animals feel pain but this doesn't mean we should care about them
* We are quite far from getting AGI
* Women as a whole are less rational than men are
* Palestine/Israel stuff

Doing the above exercise was helpful because it helped me generate ideas for things to try if I'm in situations like that in the future. But it feels like the most important thing is to just get better at noticing what I'm feeling in the conversation, and if I'm feeling bad and uncomfortable, to think about whether the conversation is useful to me at all and if so, for what reason. And if not, to make a conscious decision to leave the conversation.
Reasons the conversation could be useful to me:

* I change their mind
* I figure out what is true
* I get a greater understanding of why they believe what they believe
* Enjoyment of the social interaction itself
* I want to impress the other person with my intelligence or knowledge

Things to try will differ depending on why I feel like having the conversation.
Recently someone either suggested to me (or maybe told me they or someone else were going to do this?) that we should train AI on legal texts, to teach it human values. Ignoring the technical problem of how to do this, I'm pretty sure legal texts are not the right training data. But at the time, I could not clearly put into words why. Today's SMBC explains it for me: Saturday Morning Breakfast Cereal - Law (smbc-comics.com). Law is not a good representation or explanation of most of what we care about, because it's not trying to be. Law is mainly focused on the contentious edge cases. Training an AI on trolley problems and other ethical dilemmas is even worse, for the same reason.


Recent Discussion

This is the ninth post in my series on Anthropics. The previous one is The Solution to Sleeping Beauty.

Introduction

There are some quite pervasive misconceptions about betting in regard to the Sleeping Beauty problem.

One is that you need to switch between halfer and thirder stances based on the betting scheme proposed. As if learning about a betting scheme is supposed to affect your credence in an event.

Another is that halfers should bet at thirders' odds and that, therefore, thirdism is vindicated on the grounds of betting. What do halfers even mean by the probability of Heads being 1/2 if they bet as if it's 1/3?

In this post we are going to correct them. We will understand how to arrive at correct betting odds from both thirdist and halfist positions, and...

Throughout your comment you've been using the phrase "thirders odds", apparently meaning odds 1:2, without specifying whether they are per awakening or per experiment. This is an underspecified and confusing category which we should taboo.

Yeah, that was sloppy language, though I do like to think more in terms of bets than you do. One of my ways of thinking about these sorts of issues is in terms of "fair bets" - each person thinks a bet with payoffs that align with their assumptions about utility is "fair", and a bet with payoffs that align with different assumptions...
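For what it's worth, a quick Monte Carlo sketch (mine, not from the post) of why "odds 1:2" is underspecified: a bet that wins 2 on Heads and loses 1 on Tails breaks even if it is settled at every awakening, but favors the bettor if it is settled once per experiment.

```python
# Hypothetical sketch (not from the post): payoff of a bet on Heads at 1:2 odds
# (win 2 if Heads, lose 1 if Tails), settled either per awakening or per experiment.
import random

def run(trials=100_000):
    per_awakening = 0.0
    per_experiment = 0.0
    for _ in range(trials):
        heads = random.random() < 0.5
        awakenings = 1 if heads else 2        # Heads: Monday only; Tails: Monday and Tuesday
        payoff = 2 if heads else -1           # bet on Heads at 1:2 odds
        per_awakening += payoff * awakenings  # settled at every awakening
        per_experiment += payoff              # settled once per experiment
    print(f"per-awakening EV:  {per_awakening / trials:+.3f}")   # ~0: fair per awakening
    print(f"per-experiment EV: {per_experiment / trials:+.3f}")  # ~+0.5: overpays per experiment

run()
```

Under per-awakening settlement both halfer and thirder accounting agree the bet breaks even, which is one way of seeing why quoting odds alone does not settle the underlying probability dispute.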

Signer
No, I mean the Beauty awakes, sees Blue, gets a proposal to bet on Red with 1:1 odds, and you recommend accepting this bet?
Ape in the coat
Yes, if the bet is about whether the room takes the color Red in this experiment - which is what the event "Red" means in Technicolor Sleeping Beauty according to the correct model. The fact that you do not observe the event Red in this awakening doesn't mean that you don't observe it in the experiment as a whole. The situation somewhat resembles learning that today is Monday and still being ready to bet at 1:1 that a Tuesday awakening will happen in this experiment. Though with colors there is actually an update, from 3/4 to 1/2. What you probably meant to ask is whether you should agree to bet at 1:1 odds that the room is Red in this particular awakening, after you wake up and see that the room is Blue. And the answer is no, you shouldn't. But the probability space for Technicolor Sleeping Beauty is not talking about probabilities of events happening in this awakening, because most of them are ill-defined, for reasons explained in the previous post.
Signer
So probability theory can't possibly answer whether I should take free money, got it. And even if "Blue" is "Blue happens during experiment", you wouldn't accept worse odds than 1:1 for Blue, even when you see Blue?

On 16 March 2024, I sat down to chat with New York Times technology reporter Cade Metz! In part of our conversation, transcribed below, we discussed his February 2021 article "Silicon Valley's Safe Space", covering Scott Alexander's Slate Star Codex blog and the surrounding community.

The transcript has been significantly edited for clarity. (It turns out that real-time conversation transcribed completely verbatim is full of filler words, false starts, crosstalk, "uh huh"s, "yeah"s, pauses while one party picks up their coffee order, &c. that do not seem particularly substantive.)


ZMD: I actually have some questions for you.

CM: Great, let's start with that.

ZMD: They're critical questions, but one of the secret-lore-of-rationality things is that a lot of people think criticism is bad, because if someone criticizes you, it hurts your...

ZMD: Looking at "Silicon Valley's Safe Space", I don't think it was a good article. Specifically, you wrote,

In one post, [Alexander] aligned himself with Charles Murray, who proposed a link between race and I.Q. in "The Bell Curve." In another, he pointed out that Mr. Murray believes Black people "are genetically less intelligent than white people."

 

 

End quote. So, the problem with this is that the specific post in which Alexander aligned himself with Murray was not talking about race. It was specifically talking about whether specific programs

...
Jiro
The reason that I can make a statement about journalists based on this is that the New York Times really is big and influential in the journalism profession. On the other hand, Poor Minorities aren't representative of poor minorities. Not only that, the poor minorities example is wrong in the first place. Even the restricted subset of poor minorities don't all want to steal your company's money. The motte-and-bailey statement isn't even true about the motte. You never even get to the point of saying something that's true about the motte but false about the bailey.
tailcalled
I get that this is an argument one could make. But the reason I started this tangent was because you said: That is, my original argument was not in response to the "Anyway, if the true benefit is zero (as I believe), then we don't have to quibble over whether the cost was big or small" part of your post, it was to the vibe/ideology part. Where I was trying to say: it doesn't seem to me that Cade Metz was the one who introduced this vibe/ideology; rather, it seems to have been introduced by rationalists prior to this, specifically to defend tinkering with taboo topics. Like, you mention that Cade Metz conveys this vibe/ideology that you disagree with, and you didn't try to rebut it directly, I assumed because Cade Metz didn't defend it but just treated it as obvious. And that's where I'm saying, since many rationalists including Scott Alexander have endorsed this ideology, there's a sense in which it seems wrong, almost rude, to not address it directly. Like a sort of motte-and-bailey tactic.
Jiro
You don't need to use rationalist grammar to convince rationalists that you like them. You just need to know what biases of theirs to play upon, what assumptions they're making, how to reassure them, etc. The skills for pretending to be someone's friend are very different from the skills for acting like them.

Here's a very neat twitter thread: the author sends various multimodal models screenshots of the conversation he's currently having with them, and asks them to describe the images. Most models catch on fast: the author describes this as them passing the mirror test.

I liked the direction, so I wanted to check if ChatGPT could go from recognising that the images are causally downstream of it to actually exercising control over the images. I did this by challenging it to include certain text in the images I was sending it.

And the answer is yes! In this case it took three images for ChatGPT to get the hang of it.

OpenAI doesn't support sharing conversations with images, but I've taken screenshots of the whole conversation below: it took three images...

The only way ChatGPT can control anything is by writing text, so figuring out that it should write the text that should appear in the image seems pretty straightforward. It only needs to rationalize why this would work.

This is my personal opinion, and in particular, does not represent anything like a MIRI consensus; I've gotten push-back from almost everyone I've spoken with about this, although in most cases I believe I eventually convinced them of the narrow terminological point I'm making.

In the AI x-risk community, I think there is a tendency to ask people to estimate "time to AGI" when what is meant is really something more like "time to doom" (or, better, point-of-no-return). For about a year, I've been answering this question "zero" when asked.

This strikes some people as absurd or at best misleading. I disagree.

The term "Artificial General Intelligence" (AGI) was coined in the early 00s, to contrast with the prevalent paradigm of Narrow AI. I was getting my undergraduate computer science...

No, I was talking about the results. lsusr seems to use the term in a different sense than Scott Alexander or Yann LeCun. In their sense it's not an alternative to backpropagation, but a way of constantly predicting future experience and constantly updating a world model depending on how far off those predictions are. Somewhat analogous to conditionalization in Bayesian probability theory.

I haven't watched the LeCun interview you reference (it is several hours long, so relevant time-stamps to look at would be appreciated), but this still does not make sen...

abramdemski
Yeah, I didn't do a very good job in this respect. I am not intending to talk about a transformer by itself. I am intending to talk about transformers with the sorts of bells and whistles that they are currently being wrapped with. So not just transformers, but also not some totally speculative wrapper.
abramdemski
The replace-human-labor test gets quite interesting and complex when we start to time-index it. Specifically, two time-indexes are needed: a 'baseline' time (when humans are doing all the relevant work) and a comparison time (where we check how much of the baseline economy has been automated). Without looking anything up, I guess we could say that machines have already automated 90% of the economy, if we choose our baseline from somewhere before industrial farming equipment, and our comparison time somewhere after. But this is obviously not AGI. A human who can do exactly what GPT4 can do is not economically viable in 2024, but might have been economically viable in 2020.
Gerald Monroe
You also have a simple algorithm problem. Humans learn by replacing bad policy with good: a baby replaces "policy that drops objects picked up" with "policy that usually results in object retention". This is because, at a mechanistic level, the baby tries many times to pick up and retain objects, and in a fixed amount of circuitry in their brain, the connections that resulted in a drop get down-weighted and the ones that resulted in retention get reinforced. This means that over time, as the baby learns, the compute cost for motor manipulation remains constant - technically O(1), though that's a bit of a confusing way to express it.

With in-context-window learning, you can imagine an LLM + robot recording:

Robotic token string: <string of robotic policy tokens 1> : outcome, drop
Robotic token string: <string of robotic policy tokens 2> : outcome, retain
Robotic token string: <string of robotic policy tokens 2> : outcome, drop

and so on, extending and consuming all of the machine's context window. Every time the machine decides which tokens to use next, it needs O(n log n) compute to consider all the tokens in the window (it used to be n^2; this is a huge advance).

This does not scale. You will not get capable or dangerous AI this way. Obviously you need to compress that linear list of outcomes from different strategies to update the underlying network that generated them, so it is more likely to output tokens that result in success. Same for any other task you want the model to do. In-context learning scales poorly. This also makes it safe....
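A rough back-of-envelope sketch of that scaling point - the O(n log n) attention cost per decision is the commenter's assumption, and the tokens-per-trial and weight-update overhead are placeholder numbers of mine:

```python
# Cumulative compute for deciding each next action when (a) every past trial stays
# in the context window versus (b) past trials are compressed into the weights.
import math

def in_context_cost(trials, tokens_per_trial=50):
    total = 0.0
    for t in range(1, trials + 1):
        n = t * tokens_per_trial           # context grows with every recorded outcome
        total += n * math.log2(n)          # assume ~O(n log n) attention per decision
    return total

def weight_update_cost(trials, tokens_per_trial=50, update_factor=3.0):
    # assume a fixed per-decision cost plus a constant-factor cost to fold each
    # outcome back into the weights (the "compress the list" step)
    per_decision = tokens_per_trial * math.log2(tokens_per_trial)
    return trials * per_decision * (1.0 + update_factor)

for trials in (100, 1_000, 10_000):
    ratio = in_context_cost(trials) / weight_update_cost(trials)
    print(f"{trials:>6} trials: in-context is ~{ratio:,.0f}x more compute")
```

The exact constants matter less than the shape: the in-context cost per trial keeps growing with the number of recorded outcomes, while the weight-update cost per trial stays flat.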

Lots of people already know about Scott Alexander/ACX/SSC, but I think that crossposting to LW is unusually valuable in this particular case, since lots of people were waiting for a big Schelling-point overview of the 15-hour Rootclaim Lab Leak debate, and unlike LW, ACX's comment section is a massive vote-less swamp that lags the entire page and gives everyone equal status.

It remains unclear whether commenting there is worth your time if you think you have something worth saying, since there's no sorting, only sifting, implying that it attracts small numbers of sifters instead of large numbers of people who expect sorting.

Here are the first 11 paragraphs:

Saar Wilf is an Israeli entrepreneur. Since 2016, he’s been developing a new form of reasoning, meant to transcend normal human bias.

His

...

Way back in 2020 there was an article, A Proposed Origin For SARS-COV-2 and the COVID-19 Pandemic, which I read after George Church tweeted it (!) (without comment or explanation). Their proposal (they call it the "Mojiang Miner Passage" theory), in brief, was that it WAS a lab leak but NOT gain-of-function. Rather, in April 2012, six workers in a Mojiang mine "fell ill from a mystery illness while removing bat faeces. Three of the six subsequently died." Their symptoms were a perfect match to COVID, and two were very sick for more than four months.

The proposal i...

Gerald Monroe
One thing that occurs to me is that each analysis, such as the Putin one, can be thought of as a function hypothesis. It takes as inputs the variables:

* Russian demographics
* healthy lifestyle
* family history
* facial swelling
* hair present

and outputs the probability 86%, where the function is P = F(demographics, lifestyle, history, swelling, hair). Each term is then looked up in some source, which has a data quality, and the actual equation seems to be a mix of Bayes and simple probability calculations.

There are other variables not considered, and other valid reasoning tracks. You could take into account the presence of oncologists in Putin's personal staff, intercepted communications possibly discussing it, etc. I'm not here to discuss the true odds of Putin developing cancer, but note that if the above is "function A", and another function that takes into account different information is "function B", you should be aggregating all valid functions, forming a "probability forest". Perhaps you weight each one by the likelihood of the underlying evidence being true. For example, each of the above facts is effectively 100% true except for the hair being present (Putin could have received a hair transplant) and family history (some relatives' causes of death could be unknown, or suspicious of being cancer).

This implies a function "A'n", where we assume and weight in the probability that each combination of the underlying variables has the opposite value. For example, if pHair_Present = 0.9, A' has one permutation where the hair is not present due to a transplant.

This hints at why a panel of superforecasters is presently the best we can do. Many of them do simple reasoning like this, and we see it in the comment section on Manifold. But each individual human doesn't have the time to think of 100 valid hypotheses and calculate the resulting probability; many Manifold bettors seem to usually consider one and bet their mana. An AI system (LLM bas...
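As a rough illustration of the aggregation step being described - the hypothesis functions, weights, and evidence reliabilities below are made-up placeholders, not an actual model of the Putin question - the "probability forest" can be read as averaging over both the candidate functions and the possible truth values of their inputs:

```python
# Hedged sketch: several hypothesis functions map assumed evidence to a probability,
# and we average over both the functions and the chance each piece of evidence is true.
from itertools import product

def function_a(evidence):   # e.g. demographics / lifestyle / swelling / hair
    return 0.86 if evidence["swelling"] and evidence["hair"] else 0.40

def function_b(evidence):   # e.g. staff oncologists / intercepted communications
    return 0.70 if evidence["oncologists"] else 0.30

hypotheses = [(function_a, 0.5), (function_b, 0.5)]               # weight on each analysis
reliability = {"swelling": 1.0, "hair": 0.9, "oncologists": 0.6}  # P(evidence is true)

aggregate = 0.0
for states in product([True, False], repeat=len(reliability)):
    evidence = dict(zip(reliability, states))
    # probability of this particular combination of evidence values
    p_state = 1.0
    for name, value in evidence.items():
        p_state *= reliability[name] if value else 1.0 - reliability[name]
    for func, weight in hypotheses:
        aggregate += weight * p_state * func(evidence)

print(f"aggregated probability: {aggregate:.3f}")
```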

If it’s worth saying, but not worth its own post, here's a place to put it.

If you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don't want to write a full top-level post.

If you're new to the community, you can start reading the Highlights from the Sequences, a collection of posts about the core ideas of LessWrong.

If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the Concepts section.

The Open Thread tag is here. The Open Thread sequence is here.

Unsure if there is normally a thread for putting only semi-interesting news articles, but here is a recently posted Wired article that seems... rather inflammatory toward Effective Altruism. I have not read the article myself yet, but a quick skim confirms that the title is not just there to get angry clickbait clicks; the rest of the article also seems extremely critical of EA, transhumanism, and Rationality.

I am going to post it here, though I am not entirely sure if getting this article more clicks is a good thing, so if you have no interest in read...

complicated.world
Hi LessWrong Community! I'm new here, though I've been an LW reader for a while. I represent the complicated.world website, where we strive to use a similar rationality approach to the one here, and we also explore philosophical problems. The difference is that, instead of being a community-driven portal like you, we are a small team that works internally to reach consensus and only then publishes its articles. This means that we are not nearly as pluralistic, diverse, or democratic as you are, but on the other hand we try to present a single coherent view on all discussed problems, each rooted in basic axioms. I really value the LW community (our entire team does) and would like to start contributing here. I would also like to present a linkpost from our website from time to time - I hope this is ok. We are also a not-for-profit website.
habryka
Hey!  It seems like an interesting philosophy. Feel free to crosspost. You've definitely chosen some ambitious topics to try to cover, which I am generally a fan of.
complicated.world
Thanks! The key to topic selection is where we find ourselves disagreeing most with popular opinion. For example, the number of times I can cope with hearing someone say "I don't care about privacy, I have nothing to hide" is limited. We're trying to have this article out before that limit is reached. But in order to reason about privacy's utility and ground it in root axioms, we first have to dive into why we need freedom. That, in turn, requires thinking about the mechanisms of a happy society. And that depends on our understanding of happiness, hence that's where we're starting.
This is a linkpost for https://arxiv.org/abs/2403.09863

Hi, I’d like to share my paper that proposes a novel approach for building white box neural networks.

The paper introduces semantic features as a general technique for controlled dimensionality reduction, somewhat reminiscent of Hinton's capsules and the idea of "inverse rendering". In short, semantic features aim to capture the core characteristic of any semantic entity - having many possible states but being in exactly one state at a time. This results in regularization that is strong enough to make the PoC neural network inherently interpretable and also robust to adversarial attacks - despite no form of adversarial training! The paper may be viewed as a manifesto for a novel white-box approach to deep learning.

As an independent researcher I’d be grateful for your feedback!
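One loose way to picture "many possible states but exactly one state at a time" - this is just my reading, not the construction from the paper - is a feature group whose candidate states compete so that a single winner dominates:

```python
# Loose illustration (not the paper's actual architecture): a feature group whose
# sub-states compete, so the group's output is dominated by one winning state.
import numpy as np

def semantic_feature(x, state_vectors, temperature=0.1):
    """Project the input onto candidate states and softly select one winner."""
    scores = state_vectors @ x                     # affinity of x with each state
    weights = np.exp(scores / temperature)
    weights /= weights.sum()                       # near one-hot at low temperature
    return weights, weights @ state_vectors        # selected-state summary of x

rng = np.random.default_rng(0)
states = rng.normal(size=(4, 8))                   # 4 possible states in an 8-d space
x = states[2] + 0.1 * rng.normal(size=8)           # input close to state 2
weights, summary = semantic_feature(x, states)
print(np.round(weights, 3))                        # mass concentrates on one state
```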

mishka
This looks interesting, thanks! This post could benefit from an extended summary. In lieu of such a summary, in addition to the abstract I'll quote a paragraph from Section 1.2, "The core idea"
Maciej Satkiewicz
Thank you! The quote you picked is on point; I added an extended summary based on it. Thanks for the suggestion!

Thanks, this is very interesting.

I wonder if this approach is extendable to learning to predict the next word from a corpus of texts...

The first layer might perhaps still be embedding from words to vectors, but what should one do then? What would be a possible minimum viable dataset?

Perhaps, in the spirit of the PoC in the paper, one might consider binary sequences of 0s and 1s, have only two words, 0 and 1, and ask what it would take to have a good predictor of the next 0 or 1 given a long sequence of those as context. This might be a good starting point, and then one might consider different instances of that problem (different examples of (sets of) sequences of 0s and 1s to learn from).
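To make that concrete, here is one hypothetical way such a minimum viable dataset could be generated - the particular rules (a periodic sequence and a majority-of-last-k rule) are arbitrary choices of mine:

```python
# Tiny toy datasets for next-bit prediction: generate binary sequences from simple
# rules and frame the task as predicting the next bit from the preceding context.
import random

def periodic(n, period=3):
    return [(i % period == 0) * 1 for i in range(n)]        # e.g. 1001001...

def majority_of_last_k(n, k=3):
    seq = [random.randint(0, 1) for _ in range(k)]
    for _ in range(n - k):
        seq.append(1 if sum(seq[-k:]) * 2 > k else 0)        # next bit = majority vote
    return seq

def to_examples(seq, context=8):
    """(context window, next bit) pairs - the minimum viable training set."""
    return [(seq[i:i + context], seq[i + context])
            for i in range(len(seq) - context)]

for gen in (periodic, majority_of_last_k):
    seq = gen(40)
    ctx, nxt = to_examples(seq)[0]
    print(gen.__name__, ctx, "->", nxt)
```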

Summary: The post describes a method that allows us to use an untrustworthy optimizer to find satisficing outputs.

Acknowledgements: Thanks to Benjamin Kolb (@benjaminko), Jobst Heitzig (@Jobst Heitzig) and Thomas Kehrenberg (@Thomas Kehrenberg)  for many helpful comments.

Introduction

Imagine you have black-box access to a powerful but untrustworthy optimizing system, the Oracle. What do I mean by "powerful but untrustworthy"? I mean that, when you give an objective function f as input to the Oracle, it will output an element x that has an impressively low[1] value of f(x). But sadly, you don't have any guarantee that it will output the optimal element and e.g. not one that's also chosen for a different purpose (which might be dangerous for many reasons, e.g. instrumental convergence).

What questions can you safely ask the Oracle? Can you use it to...
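For concreteness, here is a minimal sketch of the black-box setup described in the introduction - the names and signatures are mine, and the post's actual safety construction is not shown:

```python
# Minimal sketch of the black-box interface (hypothetical names; not the post's method).
from typing import Callable, TypeVar

X = TypeVar("X")

def oracle(objective: Callable[[X], float]) -> X:
    """Powerful but untrustworthy: returns some x with an impressively low objective(x),
    with no guarantee about *which* such x it picks."""
    raise NotImplementedError  # black box

def satisfices(objective: Callable[[X], float], x: X, threshold: float) -> bool:
    # the one thing we can verify ourselves: is the returned output good enough?
    return objective(x) <= threshold
```

The point of the sketch is only that the objective and the threshold check are the parts we control; which element actually comes back is entirely up to the Oracle.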

First thought: The oracle is going to choose to systematically answer or not answer the queries we give it. This represents a causal channel of one bit per query it can use to influence the outside world[1]. Can you conquer the world in one awkwardly delivered kilobyte or less? Maybe.

Maybe we can stop that by scrapping every Oracle that doesn't answer and training a new one with presumably new goals? Or would the newly trained Oracles just cooperate with the former dead ones in one long-term plan to break out, take control, and reward all the dead Oracles ...

On the 3rd of October 2351 a machine flared to life. Huge energies coursed into it via cables, only to leave moments later as heat dumped unwanted into its radiators. With an enormous puff the machine unleashed sixty years of human metabolic entropy into superheated steam.

In the heart of the machine was Jane, a person of the early 21st century.

From her perspective there was no transition. One moment she had been in the year 2021, sat beneath a tree in a park. Reading a detective novel.

Then the book was gone, and the tree. Also the park. Even the year.

She found herself laid in a bathtub, immersed in sickly fatty fluids. She was naked and cold.

The first question Jane had for the operators and technicians who greeted her...

Ben
Also, thank you for mentioning Worth the Candle. I had not heard of it before but am now enjoying it quite a lot.

I was ultimately disappointed by it - somewhat like Umineko, there is a severe divergence from reader expectations. Alexander Wales's goal for it, however well he achieved it by his own lights, was not one that is of interest to me as a reader, and it wound up being less than the sum of its parts for me. So I would have enjoyed it better if I had known from the start to read it for its parts (eg. revision mages or 'unicorns' or 'Doris Finch').
