Frustrated by claims that "enlightenment" and similar meditative/introspective practices can't be explained and that you only understand if you experience them, Kaj set out to write his own detailed gears-level, non-mysterious, non-"woo" explanation of how meditation, etc., work in the same way you might explain the operation of an internal combustion engine.

Elizabeth
Brandon Sanderson is a bestselling fantasy author. Despite mostly working with traditional publishers, there is a 50-60 person company formed around his writing.[1] This podcast talks about how the company was formed.

Things I liked about this podcast:

1. He and his wife both refer to it as "our" company and describe critical contributions she made.
2. The number of times he was dissatisfied with the way his publisher did something and so hired someone in his own company to do it (e.g. PR and organizing book tours), despite that being part of the publisher's job.
3. He believed in his back catalog enough to buy remainder copies of his books (at $1/piece) and sell them via his own website at sticker price (with autographs). This was a major source of income for a while.
4. Long-term grand strategic vision that appears to be well aimed and competently executed.

[1] The only non-Sanderson content I found was a picture book from his staff artist.
There was this voice inside my head that told me that since I have Something to Protect, relaxing is never OK above the strict minimum, the goal is paramount, and I should just work as hard as I can all the time. This led me to breaking down and being incapable of working on my AI governance job for a week, as I had just piled up too much stress. And then I decided to follow what motivated me in the moment, instead of coercing myself into working on what I thought was most important, and lo and behold! My total output increased, while my time spent working decreased. I'm so angry and sad at the inadequacy of my role models, cultural norms, rationality advice, and the model of the good EA who does not burn out, which still led me to smash into the wall despite their best intentions. I became so estranged from my own body and perceptions, ignoring my core motivations, finding it harder and harder to work. I dug myself into such a deep hole. I'm terrified at the prospect of having to rebuild my motivation myself again.
The MIRI Technical Governance Team is hiring, please apply and work with us! We are looking to hire for the following roles:

* Technical Governance Researcher (2-4 hires)
* Writer (1 hire)

The roles are located in Berkeley, and we are ideally looking to hire people who can start ASAP. The team is currently Lisa Thiergart (team lead) and myself.

We will research and design technical aspects of regulation and policy that could lead to safer AI, focusing on methods that won't break as we move towards smarter-than-human AI. We want to design policy that allows us to safely and objectively assess the risks from powerful AI, build consensus around the risks we face, and put in place measures to prevent catastrophic outcomes.

The team will likely work on:

* Limitations of current proposals such as RSPs
* Inputs into regulations, requests for comment by policy bodies (e.g. NIST/US AISI, EU, UN)
* Researching and designing alternative safety standards, or amendments to existing proposals
* Communicating with and consulting for policymakers and governance organizations

If you have any questions, feel free to contact me on LW or at peter@intelligence.org
Adam Shai
A neglected problem in AI safety technical research is teasing apart the mechanisms of dangerous capabilities exhibited by current LLMs. In particular, I am thinking that for any model organism (see Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research) of dangerous capabilities (e.g. the sleeper agents paper), we don't know how much of the phenomenon depends on the particular semantics of terms like "goal" and "deception" and "lie" (insofar as they are used in the scratchpad, in prompts, or in finetuning data), or whether the same phenomenon could be had by subbing in more or less any word. One approach to this is to make small toy models of these types of phenomena, where we can more easily control data distributions and yet still get analogous behavior. In this way we can really control for any particular aspect of the data and figure out, scientifically, the nature of these dangers. By small toy model I'm thinking of highly artificial datasets (perhaps made of binary digits with specific correlation structure, or whatever the minimum needed to get the phenomenon at hand).
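As a concrete sketch of what such a dataset could look like (my own illustrative construction, not from the quick take; the "trigger"/"behavior" framing and all parameter values are made-up assumptions), here are binary sequences with a single planted correlation, loosely analogous to a sleeper-agent trigger but with no natural-language semantics at all:

```python
import numpy as np

def make_toy_sleeper_dataset(n_samples=10_000, seq_len=16, seed=0):
    """Binary sequences with one planted correlation: if the 'trigger'
    bit (position 0) is 1, the 'behavior' bit (last position) is forced
    to 1; otherwise every bit is an independent fair coin flip."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=(n_samples, seq_len))
    x[x[:, 0] == 1, -1] = 1  # plant the trigger -> behavior dependence
    return x

data = make_toy_sleeper_dataset()
print(data[data[:, 0] == 1, -1].mean())  # ~1.0: behavior always fires with the trigger
print(data[data[:, 0] == 0, -1].mean())  # ~0.5: baseline rate without the trigger
```

A small model trained on data like this would let one ask whether the "conditional policy" behavior depends on anything semantic at all, since here the correlation structure is the only content.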
Akash
I think now is a good time for people at labs to seriously consider quitting & getting involved in government/policy efforts. I don't think everyone should leave labs (obviously). But I would probably hit a button that does something like "everyone at a lab governance team and many technical researchers spend at least 2 hours thinking/writing about alternative options they have & very seriously consider leaving." My impression is that lab governance is much less tractable (lab folks have already thought a lot more about AGI) and less promising (competitive pressures are dominating) than government-focused work.  I think governments still remain unsure about what to do, and there's a lot of potential for folks like Daniel K to have a meaningful role in shaping policy, helping natsec folks understand specific threat models, and raising awareness about the specific kinds of things governments need to do in order to mitigate risks. There may be specific opportunities at labs that are very high-impact, but I think if someone at a lab is "not really sure if what they're doing is making a big difference", I would probably hit a button that allocates them toward government work or government-focused comms work. Written on a Slack channel in response to discussions about some folks leaving OpenAI. 


Recent Discussion

Abstract

This paper presents an alternative to ReLU as the activation function in sparse autoencoders, one that produces a Pareto improvement over both the standard sparse autoencoder architecture and sparse autoencoders trained with a Sqrt(L1) penalty.


Introduction

SAE Context and Terminology

Learnable parameters of a sparse autoencoder:

  •  $W_{\mathrm{enc}}$ : encoder weights
  •  $W_{\mathrm{dec}}$ : decoder weights
  •  $b_{\mathrm{enc}}$ : encoder bias
  •  $b_{\mathrm{dec}}$ : decoder bias

 

Training

Notation: Encoder/Decoder

Let

$\mathrm{enc}(x) = \mathrm{ReLU}(W_{\mathrm{enc}}\,x + b_{\mathrm{enc}}), \qquad \mathrm{dec}(f) = W_{\mathrm{dec}}\,f + b_{\mathrm{dec}},$

so that the full computation done by an SAE can be expressed as

$\mathrm{SAE}(x) = \mathrm{dec}(\mathrm{enc}(x)) = W_{\mathrm{dec}}\,\mathrm{ReLU}(W_{\mathrm{enc}}\,x + b_{\mathrm{enc}}) + b_{\mathrm{dec}}.$

An SAE is trained with gradient descent on

$\mathcal{L}(x) = \|x - \mathrm{SAE}(x)\|_2^2 + \lambda\, S(\mathrm{enc}(x)),$

where $\lambda$ is the sparsity penalty coefficient (often called the "L1 coefficient") and $S$ is the sparsity penalty function, used to encourage sparsity.

$S$ is commonly the L1 norm, $S(f) = \|f\|_1$, but alternative penalty functions have recently been shown to produce a Pareto improvement on the L0 and CE metrics.
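For concreteness, here is a minimal PyTorch-style sketch of this standard setup (a plain ReLU SAE with an L1 penalty; the class name, dimensions, and the 1e-3 coefficient are illustrative placeholders of mine, not values from the paper):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.W_enc = nn.Parameter(torch.randn(d_model, d_hidden) * 0.01)  # encoder weights
        self.W_dec = nn.Parameter(torch.randn(d_hidden, d_model) * 0.01)  # decoder weights
        self.b_enc = nn.Parameter(torch.zeros(d_hidden))                  # encoder bias
        self.b_dec = nn.Parameter(torch.zeros(d_model))                   # decoder bias

    def encode(self, x):
        # f = ReLU(W_enc x + b_enc)
        return torch.relu(x @ self.W_enc + self.b_enc)

    def decode(self, f):
        # x_hat = W_dec f + b_dec
        return f @ self.W_dec + self.b_dec

    def loss(self, x, l1_coeff: float = 1e-3):
        f = self.encode(x)
        x_hat = self.decode(f)
        recon = ((x - x_hat) ** 2).sum(dim=-1).mean()  # ||x - SAE(x)||^2
        sparsity = f.abs().sum(dim=-1).mean()          # S = L1 norm of activations
        return recon + l1_coeff * sparsity

sae = SparseAutoencoder(d_model=512, d_hidden=2048)
loss = sae.loss(torch.randn(64, 512))
loss.backward()
```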

 

Sqrt(L1) SAEs

There has been other work producing Pareto improvements to SAEs by taking Sqrt(L1) as the penalty function. We will use this as a further baseline to compare against when...

This is the eighth post in my series on Anthropics. The previous one is Lessons from Failed Attempts to Model Sleeping Beauty Problem. The next one is Beauty and the Bets.

Introduction

Suppose we take the insights from the previous post, and directly try to construct a model for the Sleeping Beauty problem based on them.

We expect a halfer model, so $P(\text{Heads}) = 1/2$.

On the other hand, in order not to repeat Lewis' Model's mistakes: $P(\text{Heads}|\text{Monday}) = 1/2$.

But both of these statements can only be true if $P(\text{Monday}) = 1$.
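(Spelling out the step via the law of total probability, under the standard setup in which a Heads awakening can only happen on Monday, so $P(\text{Heads}|\text{Tuesday}) = 0$:)

$$P(\text{Heads}) = P(\text{Heads}|\text{Monday})\,P(\text{Monday}) + P(\text{Heads}|\text{Tuesday})\,P(\text{Tuesday}) = \tfrac{1}{2}\,P(\text{Monday}),$$

which equals $1/2$ only when $P(\text{Monday}) = 1$.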

And, therefore, apparently, $P(\text{Tuesday})$ has to be zero, which sounds obviously wrong. Surely the Beauty can be awakened on Tuesday!

At this point, I think, you wouldn't be surprised if I told you that there are philosophers who are eager to bite this bullet and claim that the Beauty should, indeed, reason as...

Ape in the coat
Well, I think this one is actually correct. But, as I said in the previous comment, the statement "Today is Monday" doesn't actually have a coherent truth value throughout the probability experiment. It's not either True or False. It's either True, or True and False at the same time! We can answer every coherently formulated question. Everything that is formally defined has an answer. Being careful with the basics allows us to understand which questions are coherent and which are not. This is the same principle as with every probability theory problem.

Consider the Sleeping Beauty experiment without memory loss. There, the event "Monday xor Tuesday" also can't be said to always happen. And likewise, "Today is Monday" also doesn't have a stable truth value throughout the whole experiment.

Once again, we can't express Beauty's uncertainty between the two days using probability theory. We are just not paying attention to it because, by the conditions of the experiment, the Beauty is never in such a state of uncertainty. If she remembers a previous awakening then it's Tuesday; if she doesn't, then it's Monday. All the pieces of the issue are already present. The addition of memory loss just makes it obvious that there is a problem with our intuition.
Markvy

Re: no coherent "stable" truth value: indeed. But still… if she wonders out loud "what day is it?", then at the very moment she says that, it has an answer. An experimenter who overhears her knows the answer. It seems to me that the way you "resolve" this tension is to say that the two of them are technically asking different questions, even though they are using the same words. But still… how surprised should she be if she were to learn that today is Monday? It seems that, taking your stance to its conclusion, the answer would be "zero surprise: she knew for sure she wou...

This summarizes a (possibly trivial) observation that I found interesting.

 

Story

An all-powerful god decides to play a game. They stop time, grab a random human, and ask them "What will you see next?". The human answers, then time is switched back on and the god looks at how well they performed. Most of the time the humans get it right, but occasionally they are caught by surprise and get it wrong.

To be more generous, the god decides to give them access (for the game) to the entirety of all objective facts: the position and momentum of every elementary particle, every thought and memory anyone has ever had (before the time freeze), etc. However, suddenly performance in the game drops from 99% to 0%. How can this be? They...

An idea I've been playing with recently:

Suppose you have some "objective world" space $W$. Then in order to talk about subjective questions, you need a reference frame, which we could think of as the members of a fiber of some function $\pi : I \to W$, for some "interpretation space" $I$.
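(As a small definitional aside, using the notation above: the fiber of $\pi$ over a world $w$ is, by the standard definition, its preimage,

$$\pi^{-1}(w) = \{\, i \in I : \pi(i) = w \,\},$$

i.e. the set of interpretations compatible with that particular objective world.)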

The interpretations themselves might abstract to some "latent space" $L$ according to a function $g : I \to L$. Functions of $L$ would then be "subjective" (depending on the interpretation they arise from), yet still potentially meaningfully constrained, based on $g$. In particular if some struct...

Some background about me. I currently live in Seaside, CA. I have a BS in psychology and an A.A.S. in information technology network administration. I am currently a cashier at a gas station but want to find a better job for many reasons. I want a job that will fulfill my high need for analytical thought (high in need for cognition, if you know what that means) and problem solving, and that hopefully maximizes the amount of time I can be with my wife (who is in the military and "works" 7-3). I am pretty new to the job search thing because I spent 6 years in college with the same job as basically a system admin. (A note of worry about all jobs: I have already developed carpal tunnel and had surgery and...

Morpheus
Is this still up-to-date advice? Or is messaging someone over LinkedIn or similar more appropriate? Mostly asking because I got the impression that the internet changed the norms such that no one does phone calls anymore.
Kaj_Sotala
Good question! I would find it plausible that it has changed, except maybe if the people you'd call are in their fifties or older.

There are also people whose job involves being on the telephone a lot, and who are thus well reached by telephone even if they are younger.

This is a linkpost for https://dynomight.net/seed-oil/

A friend has spent the last three years hounding me about seed oils. Every time I thought I was safe, he’d wait a couple months and renew his attack:

“When are you going to write about seed oils?”

“Did you know that seed oils are why there’s so much {obesity, heart disease, diabetes, inflammation, cancer, dementia}?”

“Why did you write about {meth, the death penalty, consciousness, nukes, ethylene, abortion, AI, aliens, colonoscopies, Tunnel Man, Bourdieu, Assange} when you could have written about seed oils?”

“Isn’t it time to quit your silly navel-gazing and use your weird obsessive personality to make a dent in the world—by writing about seed oils?”

He’d often send screenshots of people reminding each other that Corn Oil is Murder and that it’s critical that we overturn our lives...

Thanks for this piece. I admit I have always had a bit of residual aversion to seed oils that I've struggled to shake.

Having said that, since you're pushing back so strongly against the seed-oil hypothesis in favour of "processing" as the mechanism for poor health, I think I need to push back a bit.

If you want to be healthier, we know ways you can change your diet that will help: Increase your overall diet “quality”. Eat lots of fruits and vegetables. Avoid processed food. Especially avoid processed meats. 


"Avoid processed food" works very well as a heuristic - far better th... (read more)

RedMan
https://www.mdpi.com/2304-8158/11/21/3412 is a more recent source on hexane toxicity. I'm not just talking about the hexane (which isn't usually standardized enough to generalize about); I'm talking about any weird crap on the seed, in the hopper, in the hexane, or accumulated in the process machinery. Hexane dissolves stuff, oil dissolves stuff, and the steam used to crash the hexane out of the oil also dissolves stuff, and by the way, the whole process is high temperature and pressure. There's a ton of batch-to-batch variability and opportunity to introduce chemistry you wouldn't want in your body, which just isn't present with "I squeezed some olives between two giant rocks." By your logic, extra virgin olive oil is a waste: just use the olive pomace oil, it's the same stuff, and solvent extraction vs mechanical pressing just doesn't matter.
ChristianKl
They seem to have similar average BMI, and the Swiss seem to have an even lower obesity rate. Belgium seems to have lower obesity rates than France but a slightly higher average BMI. Andorra has lower obesity rates but a significantly higher average BMI. The UK, Spain, and Germany are doing worse than France. A bit of chatting with Gemini suggests that what Belgium, France, and Switzerland share is a strong market culture, so food is fresher.
capisce
And they all eat a lot of butter and dairy products.

Concerns over AI safety and calls for government control over the technology are highly correlated but they should not be.

There are two major forms of AI risk: misuse and misalignment. Misuse risks come from humans using AIs as tools in dangerous ways. Misalignment risks arise if AIs take their own actions at the expense of human interests.

Governments are poor stewards for both types of risk. Misuse regulation is like the regulation of any other technology. There are reasonable rules that the government might set, but omission bias and incentives to protect small but well organized groups at the expense of everyone else will lead to lots of costly ones too. Misalignment regulation is not in the Overton window for any government. Governments do not have strong incentives...

You're saying governments can't address existential risk, because they only care about what happens within their borders and term limits. And therefore we should entrust existential risk to firms, which only care about their own profit in the next quarter?!

Quadratic Reciprocity
From the comment thread: What are specific regulations / existing proposals that you think are likely to be good? When people are protesting to pause AI, what do you want them to be speaking into a megaphone (if you think those kinds of protests could be helpful at all right now)? 
Daniel Kokotajlo
Reporting requirements, especially requirements to report to the public what your internal system capabilities are, so that it's impossible to have a secret AGI project. Also reporting requirements of the form "write a document explaining what capabilities, goals/values, constraints, etc. your AIs are supposed to have, and justifying those claims, and submit it to public scrutiny." So e.g. if your argument is "we RLHF'd it to have those goals and constraints, and that probably works because there's No Evidence of deceptive alignment or other speculative failure modes," then at least the world can see that no, you don't have any better arguments than that.

That would be my minimal proposal. My maximal proposal would be something like "AGI research must be conducted in one place: the United Nations AGI Project, with a diverse group of nations able to see what's happening in the project and vote on each new major training run and have their own experts argue about the safety case, etc." There's a bunch of options in between.

I'd be quite happy with an AGI Pause if it happened; I just don't think it's going to happen, the corporations are too powerful. I also think that some of the other proposals are strictly better while also being more politically feasible. (They are more complicated and easily corrupted though, which to me is the appeal of calling for a pause. It's harder to get regulatory-captured than something more nuanced.)
quila
(crossposting here to avoid trivial inconveniences)

Disclaimer: While I criticize several EA critics in this article, I am myself on the EA-skeptical side of things (especially on AI risk).

Introduction

I am a proud critic of effective altruism, and in particular a critic of AI existential risk, but I have to admit that a lot of the criticism of EA is hostile or lazy, and is extremely unlikely to convince a believer.

Take this recent Leif Wenar Time article as an example. I liked a few of the object-level critiques, but many of the points were twisted, and the overall point was hopelessly muddled (are they trying to say that voluntourism is the solution here?). As people have noted, the piece was needlessly hostile to EA (and incredibly hostile to Will MacAskill in particular). And...

Good article. 

It's an asymmetry worth pointing out.

It seems related to some concept of a "low interest rate phenomenon in ideas". Sometimes in a low interest rate environment, people fund all sorts of stuff, because they want any return and credit is cheap. Later much of this looks bunk. Likewise, much EA behaviour around the plentiful money and status of the FTX era looks profligate by today's standards. In the same way, I wonder what ideas are held up by some vague consensus rather than being good ideas.

Nathan Young
Feels like there is something off about the following graph. Many people writing critiques care a lot. Émile spends a lot of time on their work, for instance. I don't think motivation really captures what's going on.

Epistemic status: generating theories

I theorise it's two different effects in one:

* The voices we hear in the discussion (which links to yours)
* The norms of the communities holding those voices

First, as you say, the voices we hear most are the most confident/motivated, which leaves out a lot of voices, many of whom might talk in a way we'd prefer. Instead we only hear from the fringes, which makes a normal distribution look bimodal. I wonder if this is more like supply and demand than your "bars" model. I.e. it's not about crossing a bar but about supplying criticism that people demand. And correcting a status market - EA is too high status, let's fix it.

Secondly, the edges of this normal distribution have different norms. Let's say there are 3 areas:

* one likes steelmanning in disagreements
* one likes making clear it is on the side of minorities
* one likes being interesting

Let's imagine we are discussing something that has people from all these areas. The people who like each of these things most strongly perhaps talk more, as in the above example. But not only do they talk more, they talk differently. So now the discussion is polarised in different languages, because the people in the middle are less confident and speak less (this jump feels like the weakest step in the argument[1]).

[Figure: amount of people with different views (the central line is one group of people, who hold all views weakly)]

So now we have this:

[Figure]

So I think probably my overall thing about why criticism is poor is something like "criticism looks poor to us because it isn't for us". It is for the people in the same communities by whom it is written. And probably to them our pieces look pretty poor as it is.

Some questions then:

* How do we respond in language that other
ryan_greenblatt
I'm not sure that I buy that critics lack motivation. At least in the space of AI, there will be (and already are) people with immense financial incentive to ensure that x-risk concerns don't become very politically powerful. Of course, it might be that the best move for these critics won't be to write careful and well-reasoned arguments, for whatever reason (e.g. this would draw more attention to x-risk, so ignoring it is better from their perspective). (I think critics in the space of GHW might lack motivation, but at least in AI, and maybe animal welfare, I would guess that "lack of motive" isn't a good description of what is going on.) Edit: this is mentioned in the post, but I'm a bit surprised that it isn't emphasized more. [Cross-posted from EAF]
abstractapplic
Typos:

* "Al gore" -> "Al Gore"
* "newpaper" -> "newspaper"
* "south park" -> "South Park"
* "scott alexander" -> "Scott Alexander"
* "a littler deeper" -> "a little deeper"
* "Ai" -> "AI"

(. . . I'm now really curious as to why you keep decapitalizing names and proper nouns.)

Regarding the actual content of the post: appreciated, approved, and strong-upvoted. Thank you.

If it’s worth saying, but not worth its own post, here's a place to put it.

If you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don't want to write a full top-level post.

If you're new to the community, you can start reading the Highlights from the Sequences, a collection of posts about the core ideas of LessWrong.

If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the Concepts section.

The Open Thread tag is here. The Open Thread sequence is here.

I like his UI. In fact, I shared CQ2 with Andy in February, since his notes site was the only other place where I had seen the sliding pane design. He said CQ2 is neat!

habryka
This probably should be made more transparent, but the reason why these aren't in the library is that they don't have images for the sequence item. We display all sequences that people create that have proper images in the library (otherwise we just show them on users' profiles).

Epistemic status: pretty confident. Based on several years of meditation experience combined with various pieces of Buddhist theory as popularized in various sources, including but not limited to books like The Mind Illuminated, Mastering the Core Teachings of the Buddha, and The Seeing That Frees; also discussions with other people who have practiced meditation, and scatterings of cognitive psychology papers that relate to the topic. The part that I’m the least confident of is the long-term nature of enlightenment; I’m speculating on what comes next based on what I’ve experienced, but have not actually had a full enlightenment. I also suspect that different kinds of traditions and practices may produce different kinds of enlightenment states.

While I liked Valentine’s recent post on kensho and its follow-ups a lot,...

Based on the link, it seems you follow the Theravada tradition. 

For what it's worth, I don't really follow any one tradition, though Culadasa does indeed have a Theravada background.

Kaj_Sotala
Yeah, some Buddhist traditions do make those claims. The teachers and practitioners who I'm the most familiar with and trust the most tend to reject those models, sometimes quite strongly (e.g. Daniel Ingram here). Also, near the end of his life, Culadasa came to think that even though it might at one point have seemed like he had predominantly positive emotions in the way that some schools suggested, in reality he had just been repressing his negative emotions, with harmful consequences. I'm guessing that something similar is what's actually happening for a lot of the schools claiming complete elimination of all negative feelings. Insight practices can be used in ways that end up bypassing or suppressing a lot of one's emotions, but the negative feelings still have effects on the person; they just go unnoticed. This disagrees with my experience, and with the experience of several other people I know.
