Frustrated by claims that "enlightenment" and similar meditative/introspective practices can't be explained and that you only understand if you experience them, Kaj set out to write his own detailed gears-level, non-mysterious, non-"woo" explanation of how meditation, etc., work in the same way you might explain the operation of an internal combustion engine.

Recent Discussion

Here we briefly summarize the results so far from our U.S. nationally representative survey on Artificial Intelligence, Morality, and Sentience (AIMS), conducted in 2021 and 2023. The full reports are available on Sentience Institute’s website for the AIMS 2023 Supplemental Survey, AIMS 2023 Main Survey, and AIMS 2021 Main Survey. The raw data is available on Mendeley.

tl;dr: Results show that, from 2021 to 2023, there were increases in expectations of AI harm, moral concern for AIs, and mind perception of AIs. U.S. adults expect sentient AI to be developed sooner, now in only five years (median), and they strongly support AI regulation and slowdown.


Americans are significantly more concerned about AI in 2023 than they were in 2021 before ChatGPT. Only 23% of U.S. adults trust AI companies to put safety over...

Alan E Dunne (11h):
How did you get respondents? Why are they "nationally representative"?

The methodology says "We used iSay/Ipsos, Dynata, Disqo, and other leading panels to recruit the nationally representative sample". (They also say elsewhere that "Responses were census-balanced based on the American Community Survey 2021 estimates for age, gender, region, race/ethnicity, education, and income using the “raking” algorithm of the R “survey” package".)
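For intuition, the "raking" step they mention is iterative proportional fitting: repeatedly rescale respondent weights so each demographic margin matches its census target, cycling through the variables until the weights stabilize. A minimal sketch in Python (hypothetical toy data; the actual survey used the raking implementation in the R "survey" package):

```python
import numpy as np

def rake(weights, margins, iters=100, tol=1e-8):
    """Iterative proportional fitting ("raking").

    margins is a list of (labels, targets) pairs, where labels[i] is
    respondent i's category for one balancing variable and targets maps
    each category to its desired total weight.
    """
    w = np.asarray(weights, dtype=float).copy()
    for _ in range(iters):
        max_adj = 0.0
        for labels, targets in margins:
            for cat, target in targets.items():
                mask = labels == cat
                current = w[mask].sum()
                if current > 0:
                    factor = target / current
                    w[mask] *= factor  # rescale this category to its target
                    max_adj = max(max_adj, abs(factor - 1.0))
        if max_adj < tol:  # all margins already match: converged
            break
    return w
```

With consistent targets this typically converges in a handful of passes; real raking implementations add trimming to keep extreme weights in check.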

A few days ago I wrote about my experience with MathML, and despite being somewhat positive on it in that post I've decided to stop using it for now. The problem is, it doesn't display for people who follow my blog through RSS in (I'm guessing) most popular RSS readers.

Here's a screenshot from my most recent MathML-containing post, on my website:

And here's the same portion of that post running in the web version of Feedly on the same browser:

Poking at developer tools, here's what Feedly is sending over the network to my browser:

<p> It definitely does look nicer: </p> <p> </p> <p> On the other hand

This shows that they're removing the MathML on the server, instead of there being some issue once it gets to the client.

This also explains why it doesn't work in other...

(You probably don't want this, I am just saying it as a possible solution.)

You could simply give up on rendering math in the browser and instead create images on the server. You are already willing to go the extra step of using the verbosifier. Why not instead use a program that converts the expression to an image (and automatically generates the HTML tag, including the alt attribute), and use that?

You could reuse the images by putting them in a separate "images/math" directory with an automatically created filename (could be a hash function of the math expression... (read more)
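A minimal sketch of that caching scheme (names here are hypothetical, and the actual expression-to-image rendering step, e.g. via dvipng or a MathJax CLI, is left out):

```python
import hashlib
import html
import os

def math_image_path(expr, out_dir="images/math", ext="png"):
    # Deterministic filename: the same expression always maps to the
    # same file, so repeated expressions reuse one rendered image.
    digest = hashlib.sha256(expr.encode("utf-8")).hexdigest()[:16]
    return os.path.join(out_dir, f"{digest}.{ext}")

def math_img_tag(expr):
    # The alt attribute carries the original expression, so feed readers
    # that drop images (and screen readers) still get the math as text.
    alt = html.escape(expr, quote=True)
    return f'<img src="/{math_image_path(expr)}" alt="{alt}">'
```

At build time you'd check whether the file at `math_image_path(expr)` already exists and only invoke the renderer when it doesn't.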

Epistemic status: model which I find sometimes useful, and which emphasizes some true things about many parts of the world which common alternative models overlook. Probably not correct in full generality.

Consider Yoshua Bengio, one of the people who won a Turing Award for deep learning research. Looking at his work, he clearly “knows what he’s doing”. He doesn’t know what the answers will be in advance, but he has some models of what the key questions are, what the key barriers are, and at least some hand-wavy pseudo-models of how things work.

For instance, Bengio et al’s “Unitary Evolution Recurrent Neural Networks”. This is the sort of thing which one naturally ends up investigating, when thinking about how to better avoid gradient explosion/death in e.g. recurrent nets, while...

Same, but I'm more skeptical. At ICML there were many papers that seemed well motivated and had deep models, probably well over 5%, so the skill of having deep models is not limited to visionaries like Bengio. Also, I'd guess that the field is so empirical less because nobody is able to form models, and more because people have models but rationally put more trust in empirical research methods than in their inside-view models. When I talked to the average ICML presenter they generally had some reason they expected their research to work, even i... (read more)

I think there's another, bigger reason why this happens: workmanship and incremental progress are predictable. In many fields (academic research comes to mind), in an attempt to optimize productivity and capture in a bottle the lightning of genius, we've built edifices of complex metrics that we then try to maximise in tight feedback loops. But if you are being judged by those metrics and need to deliver results on a precise schedule, it's usually a lot more reliable to do repeatable things that produce a steady trickle of small results than to sit down studying models of things, hoping this will lead you to a greater intuition down the line. After long enough, people start feeling like that is what real expertise looks like. This is absolutely a problem and IMO worsens both the stagnation of some research fields and the replication crisis.
Thane Ruthenis (1h):
... I think so, yes. It would feel like they're just pretending like they know how to deal with customers, that they're just pretending to be professional staffers who know the ins and outs of the establishment, while in fact they just walked in from their regular lives, put on a uniform, and are not at all comfortable in that skin. An impression that they should feel like an appendage of a megacorporation, an appendage which may not be important by itself, but is still part of a greater whole; while in actuality, they're just LARPing being that appendage. An angry or confused customer confronts them about something, and it's as if they should know how to handle that off the top of their head, but no, they need to scramble and fiddle around and ask their coworkers and make a mess of it. Or, at least, that's what I imagine I'd initially feel in that role.
Thane Ruthenis (2h):
Yup. Fundamentally, I think that human minds (and practically-implemented efficient agents in general) consist of a great deal of patterns/heuristics of variable levels of shallowness, the same way LLMs do, plus a deeper general-intelligence algorithm. System 1 versus System 2, essentially; autopilot versus mindfulness. Most of the time, most people are operating on these shallow heuristics, and they turn on the slower general-intelligence algorithm comparatively rarely. (Which is likely a convergent evolutionary adaptation, but I digress.) And for some people, it's rarer than for others; and some people use it in different domains than others.

* Some people don't apply it to their social relationships: playing the characters the society assigned them, instead of dropping the theatrics and effecting real change in their lives.
* Others don't apply it to their political or corporate strategizing. Simulacrum Levels 3-4: operating on vibes or reaction patterns, not models of physical reality.
* Others don't apply it to their moral reasoning: deontologists, as opposed to consequentialists.
* Still others, as this post suggests, don't apply it to the field in which they're working.
* ... plus probably a ton more examples from all kinds of domains.

The LW-style rationality, in general, can be viewed as an attempt to get people to use that "deeper" general-purpose reasoning algorithm more frequently. To actively build a structural causal model of reality, drawing on all information streams available to them, and run queries on it, instead of acting off of reactively-learned, sporadically-updating policies. The dark-room metaphor is pretty apt, I think.

In government, it’s not just a matter of having the best policy; it’s about getting enough votes. This creates a problem when the self-interests of individual voters don’t match the best interests of the country.

For instance, voting researchers widely consider the presidential voting system in America to be inferior to many alternatives. But if you want to change it, you require consent from Democrats and Republicans—i.e. the very people who benefit from the status quo.

Or consider the land-value tax. This tax is considered by economists to be uniquely efficient (i.e. it causes no reduction in the supply of the thing being taxed, since the amount of land is fixed). When implemented correctly, it can even handle edge cases such as newly created land, like artificial islands, without discouraging their creation....

I've just written a post on how exactly to enforce these contracts: Enforcing Far-Future Contracts for Governments.

Let me know what you think!

I think it's rather unfair to classify me as a confidently underinformed fanatic. I've worked in the federal government and the country's largest bank, and am now an investment analyst at a large fund. High confidence usually indicates overconfidence, sure, but that correlation breaks down when someone really has thought deeply about a topic. Mathematicians, for instance, are nearly 100% confident in long-known peer-reviewed claims. I've written quite extensively on optimal policy methodology; have a read here: The Benevolent Ruler’s Handbook (Part 1): The Policy Problem. As I said in one of my other comments, “As for the 1923 question, I'd say we didn't have a theoretical foundation for what makes a policy optimal. Given that, there is no policy I would have tried to have advocated for in this way (even though the land value tax was invented before 1879). The article that I linked you to contains my attempt to lay those theoretical foundations (or the start of it anyway, I haven't finished it yet).” Once you have these foundations, you can say things like “I know this policy is optimal, and will continue to be so”.
Let me start by saying I think I really should have said 50 years instead of 100; that seems to be a big sticking point for people.

I haven't made myself clear. What I'm saying is: create a contract to enforce the institution of a law. The government is the signatory, not any particular person. Governments can and do hold to agreements that past governments made, even if they disagree with their predecessors. Sometimes they break those promises, and that's why I'm talking about creating a binding contract. As in, if you break the contract, a penalty occurs.

For example, let's say the government issues $1T of a particular financial asset. Those assets are basically inflation-adjusted perpetuities, except that upon the institution of the delayed law, they cease to pay out. (Unless the law is instituted before the 50-year period, in which case they only cease to pay out after 50 years, so that the perpetuity buyers aren't swindled.) If the law is not instituted, then they continue to be perpetuities.

Upon selling those "enforcement perpetuities", the government splits the ownership of land and buildings. So if you own a house, you now own the house itself, as well as a financial asset for the land underneath it. The land asset pays off a cashflow equal to the land value tax (if you're taxed $1000 on Jan 1st, the government pays you $1000 on Jan 1st). The money raised from the enforcement perpetuities can then be used to buy those land assets from all current landowners.

Maybe you'd call this an incentive mechanism rather than a contract; whatever, I'm not a legal expert. Whether that's the best solution or not, I don't know. Even if there's something wrong with the contract I just laid out, I would be shocked if there were not a way to do this; I'm sure there are plenty of creative solutions. Obviously, that particular solution doesn't apply to all delayed laws; different laws will require different solutions. For the voting system problem, make a constitutional amendment.
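To make the pricing of such an "enforcement perpetuity" concrete, here is a back-of-envelope sketch with made-up numbers, assuming a constant real discount rate and ignoring default risk. The point is that the asset is worth less to a buyer the more likely the law is to be instituted, which is exactly what makes failing to institute it expensive for the government:

```python
def perpetuity_value(annual_payment, real_rate):
    # Present value of an inflation-adjusted perpetuity:
    # payment / real discount rate.
    return annual_payment / real_rate

def enforcement_asset_value(annual_payment, real_rate, p_law, years=50):
    """Expected value of the enforcement perpetuity to a buyer.

    With probability p_law the delayed law is instituted, so payments
    stop after `years` (a finite annuity); otherwise they run forever.
    """
    pv_forever = perpetuity_value(annual_payment, real_rate)
    # Standard annuity formula: payments for `years` years only.
    pv_truncated = annual_payment * (1 - (1 + real_rate) ** -years) / real_rate
    return p_law * pv_truncated + (1 - p_law) * pv_forever
```

So at a 2% real rate, a $100/year enforcement perpetuity is worth $5000 if the market is sure the law will never pass, and only about $3100 if it is sure the law passes on schedule; the difference is the ongoing cost the government eats by breaking its promise.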

Epistemic Status: self-reported musings

Defining Sanity

Mental health is a complicated topic, and “sanity” can be a loaded word, so I’ll offer a few definitions to make sure we’re all on the same page.

The colloquial definition of sanity, if such a thing exists, is formed in contrast to insanity. Someone is sane who is not insane.

So what’s insanity?


The colloquial definition of insanity is repeating the same act over and over again and expecting different results.

The dramatic definition of insanity generally involves seeing and/or hearing things that aren’t there, laughing hysterically at nothing, and getting punched in the face by Batman.

My personal definition of insanity is not getting out of bed for two days, not answering your family’s and/or friends’ calls because you’re irrationally terrified of talking to other people,...

My definition of insanity includes non-updatable beliefs.

"Sanity" may not be a useful concept in edge cases, but yes, being able to trust your mind to autopilot is definitely within the central definition of sanity, it's a good observation. You may also be interested in Scott's post series on the topic, the latest being
Thanks! I'd love to hear any details you can think of about what you actually do on a daily basis to maintain mental health (when it's already fairly stable). Personally I don't really have a system for this, and I've been lucky that my bad times are usually not that bad in the scheme of things, and they go away eventually.

Today, we’re announcing that Amazon will invest up to $4 billion in Anthropic. The agreement is part of a broader collaboration to develop reliable and high-performing foundation models.

(Thread continues from there with more details -- seems like a notable major development!)

Cheers, I did see that and wondered whether still to post the comment. But I do think that having a gigantic company owning a large chunk, and presumably a lot of leverage over the company, is a new form of pressure, so it'd be reassuring to have some discussion of how to manage that relationship.
Didn't Google previously own a large share? So now there are 2 gigantic companies owning a large share, which makes me think each has much less leverage, as Anthropic could get further funding from the other.
Yeah, I agree that that's a reasonable concern, but I'm not sure what they could possibly discuss about it publicly. If the public, legible, legal structure hasn't changed, and the concern is that the implicit dynamics might have shifted in some illegible way, what could they say publicly that would address that? Any sort of "Trust us, we're super good at managing illegible implicit power dynamics." would presumably carry no information, no?

That it is so difficult for Anthropic to reassure people stems from the contrast between Anthropic's responsibility-focused mission statements and the hard reality of them receiving billions of dollars of profit-motivated investment.

It is rational to draw conclusions by weighting a company's actions more heavily than its PR.


Long ago, there was a mighty king who had everything in the world that he wanted, except trust. Who could he trust, when anyone around him might scheme for his throne? So he resolved to study the nature of trust, that he might figure out how to gain it. He asked his subjects to bring him the most trustworthy thing in the kingdom, promising great riches if they succeeded.

Soon, the first of them arrived at his palace to try. A teacher brought her book of lessons. “We cannot know the future,” she said, “But we know mathematics and chemistry and history; those we can trust.” A farmer brought his plow. “I know it like the back of my hand; how it rolls, and how it turns, and...

As always, amazing writing and world building.  But this feels like part 1 of....   You've stopped just short of the (possible) treacherous turn, without enlightening us to the resolution. 

Isn't that the point? Where we stand now, we have to make a decision without knowing if there will or won't be a treacherous turn...

When transit gets better the land around it becomes more valuable: many people would like to live next to a subway station. This means that there are a lot of public transit expansions that would make us better off, building space for people to live and work. And yet, at least in the US, we don't do very much of this. Part of it is that the benefits mostly go to whoever happens to own the land around the stations.

A different model, which you see with historical subway construction or Hong Kong's MTR, uses the increase in land value to fund transit construction. The idea is, the public transit company buys property, makes it much more valuable by building service to it, and then sells it.

While I would be pretty positive on US...

Thomas Sepulchre (14h):
This plan is unrealistic because it assumes that the owner won't price in the future value. Assume you try to do exactly that. Why would the current owner sell the land at the historic price when, in fact, it is very clear that the price will go up once you are done with your project? No, the owner won't sell below the anticipated value of the land, or at least a substantial fraction of it.
There's lots of land that could have transit built to it, but won't unless someone builds it. The two specific projects I gave as examples are unusual in that they are atypically good fits for transit expansion, but this general approach makes a lot of sense even if you give that up. And it has lots of historical precedent. This is a bit like saying no one would sell rural land cheaply to be used for a charter city because once the city is built that land would be really valuable, ignoring that there are many sites a potential charter-city builder can choose among.

I don't see your point. If you are saying that there are many nearby sites for transit expansion, then the owner of the land should not sell at a low price, because if any of those sites is chosen, the land value will go up. 

If you are saying that there are many alternative sites across the country, then this is not relevant. Those projects aren't mutually exclusive.

We have to keep in mind that the land owner will not profit from the project being completed once the land is sold, they will only profit from the sale itself. The rational move is either ... (read more)
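The disagreement above can be phrased as a simple reservation-price calculation (a sketch with made-up numbers): the owner's minimum acceptable sale price is the expected value of keeping the land, which depends on the chance that transit gets built nearby even if they don't sell.

```python
def reservation_price(current_value, post_transit_value, p_built_anyway):
    """Owner's minimum acceptable sale price: the expected value of
    *keeping* the land, given some probability that transit is built
    nearby even without this particular deal."""
    return (p_built_anyway * post_transit_value
            + (1 - p_built_anyway) * current_value)
```

If there are many substitutable sites, the chance that any particular parcel gets transit anyway is low, so its reservation price stays near the unimproved value; if there's one obvious site, the price approaches the full post-transit value, and the transit builder captures little of the uplift.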

This is a linkpost for Sparse Autoencoders Find Highly Interpretable Directions in Language Models

We use a scalable and unsupervised method called Sparse Autoencoders to find interpretable, monosemantic features in real LLMs (Pythia-70M/410M) for both residual stream and MLPs. We showcase monosemantic features, feature replacement for Indirect Object Identification (IOI), and use OpenAI's automatic interpretation protocol to demonstrate a significant improvement in interpretability.

Paper Overview

Sparse Autoencoders & Superposition

To reverse engineer a neural network, we'd like to first break it down into smaller units (features) that can be analysed in isolation. Using individual neurons as these units can be useful, but neurons are often polysemantic, activating for several unrelated types of feature, so just looking at neurons is insufficient. Also, for some types of network activations, like the residual stream...
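As a toy illustration of the basic technique (not the paper's actual setup, which trains on Pythia activations at scale), a sparse autoencoder is just an autoencoder with an L1 penalty on its hidden codes, pushing each input to be explained by a few active features:

```python
import numpy as np

def train_sae(acts, n_feats, l1=1e-3, lr=0.05, steps=1000, seed=0):
    """Minimal sparse autoencoder trained by plain gradient descent.

    acts: (N, d) matrix of activations to decompose.
    Loss: 0.5 * ||recon - acts||^2 + l1 * ||z||_1, with codes
    z = relu(acts @ W_enc + b) and recon = z @ W_dec.
    """
    rng = np.random.default_rng(seed)
    n, d = acts.shape
    W_enc = rng.normal(0, 0.1, (d, n_feats))
    W_dec = rng.normal(0, 0.1, (n_feats, d))
    b = np.zeros(n_feats)
    for _ in range(steps):
        z = np.maximum(acts @ W_enc + b, 0.0)   # sparse codes
        err = z @ W_dec - acts                  # reconstruction error
        # Backprop by hand: L1 subgradient + ReLU mask on the codes.
        dz = (err @ W_dec.T + l1 * np.sign(z)) * (z > 0)
        W_dec -= lr * (z.T @ err) / n
        W_enc -= lr * (acts.T @ dz) / n
        b -= lr * dz.mean(0)
    return W_enc, W_dec, b
```

The overcomplete dictionary (n_feats > d) plus the L1 term is what lets the learned features pull superposed directions apart; the real implementations add details like decoder-weight normalization and dead-feature resampling.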

Did you try searching for similar ideas to your work in the broader academic literature? There seems to be lots of closely related work that you'd find interesting. For example:

Elite BackProp: Training Sparse Interpretable Neurons. They train CNNs to have "class-wise activation sparsity." They claim their method achieves "high degrees of activation sparsity with no accuracy loss" and "can assist in understanding the reasoning behind a CNN."

Accelerating Convolutional Neural Networks via Activation Map Compression. They "propose a three-stage compression and... (read more)