In this post, I proclaim/endorse forum participation (aka commenting) as a productive research strategy that I've managed to stumble upon, and recommend it to others (at least to try). Note that this is different from saying that forum/blog posts are a good way for a research community to communicate. It's about individually doing better as researchers.

I like the fact that, even though Kahneman & Vinge were not (relatively) young when they died, the LW banner states that they died "FAR TOO YOUNG", pointing to the fact that death is always bad and/or that it is bad when people die while still making positive contributions to the world (Kahneman published "Noise" in 2021!).
I thought I didn't get angry much in response to people making specific claims. I did some introspection about times in the recent past when I got angry, defensive, or withdrew from a conversation in response to claims that the other person made. I think these are the mechanisms that made me feel that way:

* They were very confident about their claim. Partly I felt annoyance because I didn't feel like there was anything that would change their mind, partly I felt annoyance because it felt like they didn't have enough status to make very confident claims like that. This is more linked to confidence in body language and tone than to their confidence in their own claims, though both matter.
* Credentialism: they were unwilling to explain things and took it as a given that they were correct because I didn't have the specific experiences or credentials that they had, without mentioning what specifically about gaining that experience would help me understand their argument.
* Not letting me speak and interrupting quickly to take down the fuzzy strawman version of what I meant rather than letting me take my time to explain my argument.
* Morality: I felt like one of my cherished values was being threatened.
* The other person was relatively smart and powerful, at least within the specific situation. If they were dumb or not powerful, I would have just found the conversation amusing instead.
* The other person assumed I was dumb or naive, perhaps because they had met other people with the same position as me and those people came across as not knowledgeable.
* The other person getting worked up, for example, raising their voice or showing other signs of being irritated, offended, or angry while acting as if I was the emotional/offended one. This one particularly stings because of gender stereotypes. I think I'm more calm and reasonable and less easily offended than most people. I've had a few conversations with men where it felt like they were just really bad at noticing when they were getting angry or emotional themselves and kept pointing out that I was being emotional despite me remaining pretty calm (and perhaps even a little indifferent to the actual content of the conversation before the conversation moved to them being annoyed at me for being emotional).
* The other person's thinking is very black-and-white, thinking in terms of a very clear good and evil and not being open to nuance. Sort of a similar mechanism to the first thing.

Some examples of claims that recently triggered me. They're not so important themselves, so I'll just point at the rough thing rather than list out actual claims.

* AI killing all humans would be good because thermodynamics god/laws of physics good
* Animals feel pain but this doesn't mean we should care about them
* We are quite far from getting AGI
* Women as a whole are less rational than men are
* Palestine/Israel stuff

Doing the above exercise was helpful because it helped me generate ideas for things to try if I'm in situations like that in the future. But it feels like the most important thing is to just get better at noticing what I'm feeling in the conversation and, if I'm feeling bad and uncomfortable, to think about whether the conversation is useful to me at all and, if so, for what reason. And if not, to make a conscious decision to leave the conversation.
Reasons the conversation could be useful to me:

* I change their mind
* I figure out what is true
* I get a greater understanding of why they believe what they believe
* Enjoyment of the social interaction itself
* I want to impress the other person with my intelligence or knowledge

Things to try will differ depending on why I feel like having the conversation.
habryka · 4d
A thing that I've been thinking about for a while has been to somehow make LessWrong into something that could give rise to more personal wikis and wiki-like content. Gwern's writing has a very different structure and quality to it than the posts on LW, with the key components being that his pages get updated regularly and serve as more stable references for some concept, as opposed to a post, which is usually anchored in a specific point in time.

We have a pretty good wiki system for our tags, but never really allowed people to just make their personal wiki pages, mostly because there isn't really any place to find them. We could list the wiki pages you created on your profile, but that doesn't really seem like it would allocate attention to them successfully.

I was thinking about this more recently as Arbital is going through another round of slowly rotting away (its search currently being broken and this being very hard to fix due to annoying Google App Engine restrictions) and thinking about importing all the Arbital content into LessWrong. That might be a natural time to do a final push to enable people to write more wiki-like content on the site.
Novel Science is Inherently Illegible

Legibility, transparency, and open science are generally considered positive attributes, while opacity, elitism, and obscurantism are viewed as negative. However, increased legibility in science is not always beneficial and can often be detrimental. Scientific management, with some exceptions, likely underperforms compared to simpler heuristics such as giving money to smart people or implementing grant lotteries. Scientific legibility suffers from the classic "Seeing Like a State" problems. It constrains endeavors to the least informed stakeholder, hinders exploration, inevitably biases research to be simple and myopic, and exposes researchers to a constant political tug-of-war between different interest groups, poisoning objectivity.

I think the above would be considered relatively uncontroversial in EA circles. But I posit there is something deeper going on: novel research is inherently illegible. If it were legible, someone else would have already pursued it. As science advances, its concepts become increasingly counterintuitive and further from common sense. Most of the legible low-hanging fruit has already been picked, and novel research requires venturing higher into the tree, pursuing illegible paths with indirect and hard-to-foresee impacts.
Recently someone either suggested to me (or maybe told me they or someone else were going to do this?) that we should train AI on legal texts, to teach it human values. Ignoring the technical problem of how to do this, I'm pretty sure legal texts are not the right training data. But at the time, I could not clearly put into words why. Today's SMBC explains this for me: Saturday Morning Breakfast Cereal - Law (smbc-comics.com)

Law is not a good representation or explanation of most of what we care about, because it's not trying to be. Law is mainly focused on the contentious edge cases. Training an AI on trolley problems and other ethical dilemmas is even worse, for the same reason.

Popular Comments

Recent Discussion

Summary

  • Context: Sparse Autoencoders (SAEs) reveal interpretable features in the activation spaces of language models. They achieve sparse, interpretable features by minimizing a loss function which includes an L1 penalty on the SAE hidden-layer activations (a minimal sketch of such a loss appears after this list).
  • Problem & Hypothesis: While the SAE L1 penalty achieves sparsity, it has been argued that it can also cause SAEs to learn commonly-composed features rather than the “true” features in the underlying data.
  • Experiment: We propose a modified setup of Anthropic’s ReLU Output Toy Model where data vectors are made up of sets of composed features. We study the simplest possible version of this toy model with two hidden dimensions for ease of comparison to many of Anthropic’s visualizations.
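As a reference point, here is a minimal sketch (my own illustration, not the authors' code) of an SAE with the loss described above: a reconstruction term plus an L1 penalty on the hidden activations. The dimensions and the `l1_coeff` name are assumptions chosen for the toy setting.

```python
# Minimal sketch of a ReLU SAE trained with reconstruction loss plus an L1
# penalty on hidden activations. Illustrative only; not the post's actual code.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_in: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_in, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_in)

    def forward(self, x):
        acts = torch.relu(self.encoder(x))   # sparse hidden activations
        recon = self.decoder(acts)
        return recon, acts

def sae_loss(x, recon, acts, l1_coeff=1e-3):
    recon_loss = ((recon - x) ** 2).mean()      # reconstruction term
    l1_penalty = acts.abs().sum(dim=-1).mean()  # L1 on hidden activations
    return recon_loss + l1_coeff * l1_penalty

# Toy usage: 2-dimensional data vectors, matching the two-hidden-dimension toy model.
sae = SparseAutoencoder(d_in=2, d_hidden=4)
x = torch.randn(32, 2)
recon, acts = sae(x)
sae_loss(x, recon, acts).backward()
```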
...

Hi Ali, sorry for my slow response, too! Needed to think on it for a bit.

  • Yep, you could definitely generate the dataset with a different basis (e.g., [1,0,0,0] = 0.5*[1,0,1,0] + 0.5*[1,0,-1,0]).
  • I think in the context of language models, learning a different basis is a problem. I assume that, there, things aren't so clean as "you can get back the original features by adding 1/2 of that and 1/2 of this". I'd imagine it's more like feature 1 = "the in context A", feature 2 = "the in context B", feature 3 = "the in context C". And if "the" is a real feature (I'm
... (read more)

This is my personal opinion, and in particular, does not represent anything like a MIRI consensus; I've gotten push-back from almost everyone I've spoken with about this, although in most cases I believe I eventually convinced them of the narrow terminological point I'm making.

In the AI x-risk community, I think there is a tendency to ask people to estimate "time to AGI" when what is meant is really something more like "time to doom" (or, better, point-of-no-return). For about a year, I've been answering this question "zero" when asked.

This strikes some people as absurd or at best misleading. I disagree.

The term "Artificial General Intelligence" (AGI) was coined in the early 00s, to contrast with the prevalent paradigm of Narrow AI. I was getting my undergraduate computer science...

14 · Random Developer · 1h
Yeah, the precise ability I'm trying to point to here is tricky. Almost any human (barring certain forms of senility, severe disability, etc) can do some version of what I'm talking about. But as in the restaurant example, not every human could succeed at every possible example. I was trying to better describe the abilities that I thought GPT-4 was lacking, using very simple examples. And it started looking way too much like a benchmark suite that people could target. Suffice to say, I don't think GPT-4 is an AGI. But I strongly suspect we're only a couple of breakthroughs away. And if anyone builds an AGI, I am not optimistic we will remain in control of our futures.

Got it, makes sense, agreed.

2 · jmh · 2h
I found this an interesting but complex read for me -- both the post and the comments. I found a number of what seemed good points to consider, but I seem to be coming away from the discussion thinking about the old parable of the blind men and the elephant.
2 · Logan Zoellner · 3h
Absolutely.  I don't think it's impossible to build such a system.  In fact, I think a transformer is probably about 90% there.   Need to add trial and error, some kind of long-term memory/fine-tuning and a handful of default heuristics.  Scale will help too, but no amount of scale alone will get us there.

On 16 March 2024, I sat down to chat with New York Times technology reporter Cade Metz! In part of our conversation, transcribed below, we discussed his February 2021 article "Silicon Valley's Safe Space", covering Scott Alexander's Slate Star Codex blog and the surrounding community.

The transcript has been significantly edited for clarity. (It turns out that real-time conversation transcribed completely verbatim is full of filler words, false starts, crosstalk, "uh huh"s, "yeah"s, pauses while one party picks up their coffee order, &c. that do not seem particularly substantive.)


ZMD: I actually have some questions for you.

CM: Great, let's start with that.

ZMD: They're critical questions, but one of the secret-lore-of-rationality things is that a lot of people think criticism is bad, because if someone criticizes you, it hurts your...

Hm. I think we like Slate Star Codex in this thread, so let's enjoy a throwback:

It was wrong of me to say I hate poor minorities. I meant I hate Poor Minorities! Poor Minorities is a category I made up that includes only poor minorities who complain about poverty or racism.

No, wait! I can be even more charitable! A poor minority is only a Poor Minority if their complaints about poverty and racism come from a sense of entitlement. Which I get to decide after listening to them for two seconds. And if they don't realize that they're doing something wrong, th

... (read more)
1 · Alex Vermillion · 39m
I'd be amenable to quibbles over the lock thing, though I think it's still substantially different. A better metaphor (for the situation that Cade Metz claims is the case, which may or may not be correct) making use of locks would be "Anyone can open the lock by putting any key in. By opening the lock with my own key, I have done no damage". I do not believe that Cade Metz used specialized hacking equipment to reveal Scott's last name unless this forum is unaware of how to use search engines.
1 · Alex Vermillion · 37m
Your comment is actually one of the ones in the thread that replied to mine that I found least inane, so I will stash this downthread of my reply to you: I think a lot of the stuff Cade Metz is alleged to say above is dumb as shit and is not good behavior. However, I don't need to make bad metaphors, abuse the concept of logical validity, or do anything else that breaks my principles to say that the behavior is bad, so I'm going to raise an issue with those where I see them and count on folks like you to push back to the appropriate extent so that we can get to a better medium together.
3 · Alex Vermillion · 41m
I don't (and shouldn't) care what Scott Alexander believes in order to figure out whether what Cade Metz said was logically valid. You do not need to figure out how many bones a cat has to say that "The moon is round, so a cat has 212 bones" is not valid.

This is the eighth post in my series on Anthropics. The previous one is Lessons from Failed Attempts to Model Sleeping Beauty Problem. The next one is Beauty and the Bets.

Introduction

Suppose we take the insights from the previous post, and directly try to construct a model for the Sleeping Beauty problem based on them.

We expect a halfer model, so

P(Heads) = 1/2

On the other hand, in order not to repeat Lewis' Model's mistakes:

P(Heads|Monday) = 1/2

But both of these statements can only be true if

P(Monday) = 1

And, therefore, apparently, P(Tuesday) has to be zero, which sounds obviously wrong. Surely the Beauty can be awakened on Tuesday!
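(Spelling out why those two constraints force this conclusion, as a gloss of my own: under the standard setup a Tuesday awakening happens only if the coin landed Tails, so P(Heads|Tuesday) = 0, and the law of total probability does the rest.)

```latex
% Gloss (mine): total probability over the day of awakening, with P(Heads|Tuesday) = 0.
P(\text{Heads}) = P(\text{Heads}\mid\text{Monday})\,P(\text{Monday})
                + P(\text{Heads}\mid\text{Tuesday})\,P(\text{Tuesday})
                = \tfrac{1}{2}\,P(\text{Monday}).
% Setting P(Heads) = 1/2 forces P(Monday) = 1, and hence P(Tuesday) = 0.
```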

At this point, I think, you wouldn't be surprised if I told you that there are philosophers who are eager to bite this bullet and claim that the Beauty should, indeed, reason as...

1 · Ape in the coat · 9h
Let it be not two different days but two different half-hour intervals. Or even two milliseconds - this doesn't change the core of the issue that sequential events are not mutually exclusive. It very much bears a connection. If you are observing state TH it necessarily means that either you've already observed or will observe state TT. The definition of a sample space - it's supposed to be constructed from mutually exclusive elementary outcomes. Disagree on both accounts. You can't treat HH HT TT TH as individual outcomes, and the term "morning of observation" is underspecified. The subject knows that some of them happen sequentially. I noticed, and I applaud your attempts. But you can't do that, because you still have sequential events anyway; the fact that you call them differently doesn't change much. Exactly. And the Beauty knows it. Case closed. She knows that they do not happen at random. This is enough to be sure that each day is not a completely independent probability experiment. See the Effects of Amnesia section. Call them "states" if you want. It doesn't change anything. I've specifically explained how. We write down outcomes when the researcher sees the Beauty awake - when they have updated on the fact of Beauty's awakening. The frequency for three outcomes is 1/3; moreover, they actually go in random order because the observer witnesses only one random awakening per experiment. Yep, no one is arguing with that. The problem is that the order isn't random as your model predicts - TH and TT always go in pairs. No, I'm not complicating this with two lists for each day. There is only one list, which documents all the awakenings of the subject while she is going through the series of experiments. The theory that predicts that two awakenings are "completely independent probability experiments" expects that the order of the awakenings is random, and it's proven wrong because there is an order between awakenings. Easy as that. You are mistaken about what the amnes
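To illustrate the frequency-and-ordering point being argued here, a sketch of my own (not the commenter's code; I label the three awakening states H-Mon, T-Mon, T-Tue rather than the HT/TH/TT notation above): simulate the standard protocol, record every awakening in order, and check both the roughly 1/3 frequencies and the fact that the two Tails awakenings always come in pairs.

```python
# Illustrative simulation (mine, not from the comment): Heads -> one Monday
# awakening; Tails -> Monday and Tuesday awakenings, always in that order.
import random
from collections import Counter

def all_awakenings(n_runs=100_000, seed=0):
    rng = random.Random(seed)
    states = []                    # every awakening, in the order Beauty experiences them
    for _ in range(n_runs):
        if rng.random() < 0.5:     # Heads
            states.append("H-Mon")
        else:                      # Tails: two sequential awakenings
            states.append("T-Mon")
            states.append("T-Tue")
    return states

states = all_awakenings()
counts = Counter(states)
for s in ("H-Mon", "T-Mon", "T-Tue"):
    print(s, round(counts[s] / len(states), 3))   # each is roughly 1/3 of all awakenings

# The sequence is not i.i.d.: every T-Mon is immediately followed by T-Tue.
paired = all(b == "T-Tue" for a, b in zip(states, states[1:]) if a == "T-Mon")
print(paired)                                      # True
```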

Let it be not two different days but two different half-hour intervals.  Or even two milliseconds - this doesn't change the core of the issue that sequential events are not mutually exclusive.

OUTCOME: A measurable result of a random experiment.

SAMPLE SPACE: A set of exhaustive, mutually exclusive outcomes of a random experiment.

EVENT: Any subset of the sample space of a random experiment.

INDEPENDENT EVENTS: If A and B are events from the same sample space, and the occurrence of event A does not affect the chances of the occurrence of event B, then A a... (read more)

This is the (bi-)annual ACX/SSC Schelling Meetup, where you can meet like-minded curious folks. This time I reserved an indoor space! I'm pleased to announce that we meet on Saturday, the 27th of April, at 15:00 at Leih-Lokal Freiräume, Gerwigstraße 41, Karlsruhe.

This is foremost a social event: there is no structure or schedule. Just come and enjoy the discourse about any topic you are interested in.

I'll try to provide some snacks so please RSVP for a better estimate of the expected number of mouths to feed.

The Karlsruhe Rationality group (currently in hiatus) aims to connect Rationalists from Karlsruhe (Germany) and surrounding areas. Everyone worries they're not serious enough about ACX to join, so you should banish that thought and come anyway.  "Please feel free to come even if you feel awkward about it, even if you’re not 'the typical ACX reader', even if you’re worried people won’t like you", even if you didn't come to the previous meetings, even if you don't speak German, etc., etc.

The location is confirmed :)
 

Lots of people already know about ACX/SSC, but I think that crossposting to LW is unusually valuable in this particular case, since lots of people were waiting for a big schelling-point overview of the 15-hour Rootclaim Lab Leak debate, and unlike LW, ACX's comment section is a massive vote-less swamp that lags the entire page and gives everyone equal status. 

It remains unclear whether commenting there is worth your time if you think you have something worth saying, since there's no sorting, only sifting, implying that it attracts small numbers of sifters instead of large numbers of people who expect sorting.

Here are the first 11 paragraphs:

Saar Wilf is an ex-Israeli entrepreneur. Since 2016, he’s been developing a new form of reasoning, meant to transcend normal human bias.

His method

...

One thing that occurs to me is that each analysis, such as the Putin one, can be thought of as a function hypothesis.

It takes as inputs the variables:

Russian demographics

healthy lifestyle

family history

facial swelling

hair present

And is outputting the probability 86%, where the function is

P = F(demographics, lifestyle, history, swelling, hair). Each term is then looked up in some source, which has some data quality, and the actual equation seems to be a mix of Bayes and simple probability calculations.

There are other variables not considered, and other... (read more)
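A minimal sketch of the kind of combination being described (my own illustration; the likelihood ratios below are invented placeholders, not values from the Putin analysis): prior odds multiplied by one likelihood ratio per evidence term, assuming the terms are treated as independent.

```python
# Illustrative only: naive combination of evidence terms via odds and
# likelihood ratios. All numbers are made-up placeholders.
def posterior_probability(prior, likelihood_ratios):
    odds = prior / (1 - prior)
    for lr in likelihood_ratios:
        odds *= lr                 # assumes conditionally independent evidence
    return odds / (1 + odds)

# Hypothetical inputs mirroring the variables listed above
# (demographics, lifestyle, family history, facial swelling, hair present).
print(posterior_probability(prior=0.5,
                            likelihood_ratios=[0.8, 0.7, 1.5, 3.0, 1.2]))
```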


I want to thank Jan Kulveit, Tomáš Gavenčiak, and Jonathan Shock for their extensive feedback and the ideas they contributed to this work, and Josh Burgener and Yusuf Heylen for their proofreading and comments. I would also like to acknowledge the Epistea Residency and its organisers, where much of the thinking behind this work was done.

This post aims to build towards a theory of how meditation alters the mind based on the ideas of active inference (ActInf). ActInf has been growing in its promise as a theory of how brains process information and interact with the world and has become increasingly validated with a growing body of work in the scientific literature.

Why bring the idea of ActInf and meditation together? Meditation seems to have a profound effect on...

In his method, I think the happiness of the first few Jhanas is not caused by prediction error directly, but rather indirectly through the activation of the reward circuitry. So while the method involves creating some amount of prediction error, the ultimate result is less overall prediction error, because the reward neurotransmitters bring the experiential world closer to the ideal.

After the first three Jhanas, the reward circuitry is less relevant and you start to reduce overall prediction error through other means, by allowing attention to let go of asp... (read more)

1 · cesiumquail · 1h
I would say the warm shower causes less prediction error than the cold shower because it’s less shocking to the body, but there’s still a very subtle amount of discomfort which is hidden under all the positive feelings. The level of discomfort I’m talking about is very slight, but you would notice it if there was nothing else occupying your attention. I don’t mean to say it causes negative emotions. It’s more like the discomfort of imagining an unsatisfying shape, or watching a video at slightly lower resolution. If you compare any activity to deep sleep or unconsciousness, you can find sensations that grab your attention by being slightly irritating. As long as it’s noticeable I think it causes slight negative valence. But this is often outweighed by other aspects of the activity that increase valence. Sitting at home doing nothing might involve the negative sensations of boredom, restlessness, and impatience, all of which disappear when we go for a walk, so any discomfort is hard to notice underneath the obvious increase in valence.

Summary: The post describes a method that allows us to use an untrustworthy optimizer to find satisficing outputs.

Acknowledgements: Thanks to Benjamin Kolb (@benjaminko), Jobst Heitzig (@Jobst Heitzig) and Thomas Kehrenberg (@Thomas Kehrenberg)  for many helpful comments.

Introduction

Imagine you have black-box access to a powerful but untrustworthy optimizing system, the Oracle. What do I mean by "powerful but untrustworthy"? I mean that, when you give an objective function f as input to the Oracle, it will output an element x that has an impressively low[1] value of f(x). But sadly, you don't have any guarantee that it will output the optimal element, and not, e.g., one that's also chosen for a different purpose (which might be dangerous for many reasons, e.g. instrumental convergence).
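To make the interface concrete, here is a sketch of my own (names and types are mine; this is just the black-box setup plus a naive satisficing check, not the method the post goes on to propose):

```python
# Sketch of the setup only: the Oracle is an untrusted black box that returns a
# low-scoring element; we can still evaluate the objective ourselves and check
# whether the output satisfices, i.e. f(x) <= threshold.
import random
from typing import Any, Callable, Optional

Objective = Callable[[Any], float]

def satisficing_output(oracle: Callable[[Objective], Any],
                       objective: Objective,
                       threshold: float) -> Optional[Any]:
    candidate = oracle(objective)          # untrusted suggestion
    if objective(candidate) <= threshold:  # our own measurement, not the Oracle's word
        return candidate
    return None

# Toy usage with a harmless stand-in "oracle" that just samples candidates.
toy_oracle = lambda f: min((random.uniform(-10, 10) for _ in range(1000)), key=f)
print(satisficing_output(toy_oracle, objective=lambda x: x * x, threshold=0.1))
```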

What questions can you safely ask the Oracle? Can you use it to...

There's a particular kind of widespread human behavior that is kind on the surface, but upon closer inspection reveals quite the opposite. This post is about four such patterns.

 

Computational Kindness

One of the most useful ideas I got out of Algorithms to Live By is that of computational kindness. I was quite surprised to only find a single mention of the term on lesswrong. So now there's two.

Computational kindness is the antidote to a common situation: imagine a friend from a different country is visiting and will stay with you for a while. You're exchanging some text messages beforehand in order to figure out how to spend your time together. You want to show your friend the city, and you want to be very accommodating and make sure...

I forget where I read it, but this idea seems similar. When responding to a request, being upfront about your boundaries or constraints feels intense but can be helpful for both parties. If Bob asks Alice to help him move, and Alice responds "sure thing", that leaves the interaction open to miscommunication. But if instead Alice says, "Yeah! I am available 1pm to 5pm, and my neck has been bothering me, so no heavy lifting for me!" Although that seems like a less kind response, Bob now doesn't have to guess at Alice's constraints and can comfortably move forward without feeling the need to tiptoe around how long and to what degree Alice can help.

1 · CstineSublime · 16h
This is an extremely relatable post, in both ways. I often find myself on the other side of these interactions too, and not knowing how to label and describe my awareness of what's happening without coming across as Larry David from Curb Your Enthusiasm.

LessOnline

A Festival of Writers Who are Wrong on the Internet

May 31 - Jun 2, Berkeley, CA