LessWrong

LessWrong's (first) album: I Have Been A Good Bing

517

25d

tl;dr: LessWrong released an album! Listen to it now on Spotify, YouTube, YouTube Music, or Apple Music.

On April 1st 2024, the LessWrong team released an album using the then-most-recent AI music generation. All the music is fully AI-generated, and the lyrics are adapted (mostly by humans) from LessWrong posts (or other writing LessWrongers might be familiar with).

We made probably 3,000-4,000 song generations to get the 15 we felt happy about, which I think works out to about 5-10 hours of work per song we used (including all the dead ends and things that never worked out).

The album is called I Have Been A Good Bing. I think it is a pretty fun album and maybe you'd enjoy it if you listened to it! Some of my favourites are...

keltan13m10

I agree! I’ve been writing and then generating my own LW-inspired songs.

I wish it was common for LW posts to have accompanying songs now.

Improving Dictionary Learning with Gated Sparse Autoencoders

Neel Nanda, Senthooran Rajamanoharan, Arthur Conmy, lsgos, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah

Ω 259h

This is a linkpost for https://arxiv.org/abs/2404.16014

Authors: Senthooran Rajamanoharan*, Arthur Conmy*, Lewis Smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah, Neel Nanda

A new paper from the Google DeepMind mech interp team: Improving Dictionary Learning with Gated Sparse Autoencoders!

Gated SAEs are a new Sparse Autoencoder architecture that seems to be a significant Pareto-improvement over normal SAEs, verified on models up to Gemma 7B. They are now our team's preferred way to train sparse autoencoders, and we'd love to see them adopted by the community! (Or to be convinced that it would be a bad idea for them to be adopted by the community!)

They achieve similar reconstruction with about half as many firing features, while being comparably or more interpretable (confidence interval for the increase is 0%-13%).

See Sen's Twitter summary, my Twitter summary, and the paper!

5leogao2h

Great paper! The gating approach is an interesting way to learn the JumpReLU threshold and it's exciting that it works well. We've been working on some related directions at OpenAI based on similar intuitions about feature shrinking. Some questions:
* Is b_mag still necessary in the gated autoencoder?
* Did you sweep learning rates for the baseline and your approach?
* How large is the dictionary of the autoencoder?

Neel Nanda37mΩ340

Re dictionary width: 2**17 (~131K) for most Gated SAEs and 3*(2**16) for baseline SAEs, except for the (Pythia-2.8B, Residual Stream) sites, where we used 2**15 for Gated and 3*(2**14) for baseline, since early runs of these had lots of feature death. (This'll be added to the paper soon, sorry!) I'll leave the other Qs for my co-authors.
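For readers who want the widths in the comment above as plain numbers, here is a small sketch that evaluates them. The dictionary keys are hypothetical labels chosen for illustration; only the four expressions come from the comment.

```python
# Dictionary widths quoted in the comment above.
# The key names are illustrative labels, not identifiers from the paper.
widths = {
    "gated_default": 2**17,
    "baseline_default": 3 * 2**16,
    "gated_pythia_2.8b_resid": 2**15,
    "baseline_pythia_2.8b_resid": 3 * 2**14,
}


def baseline_to_gated_ratio(baseline_width, gated_width):
    """Return how many times wider the baseline dictionary is."""
    return baseline_width / gated_width


for name, width in widths.items():
    print(f"{name}: {width:,} features")

print(baseline_to_gated_ratio(widths["baseline_default"], widths["gated_default"]))
print(baseline_to_gated_ratio(widths["baseline_pythia_2.8b_resid"],
                              widths["gated_pythia_2.8b_resid"]))
```

In both settings the quoted baseline dictionary works out to 1.5 times the width of the Gated one.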

1fvncc3h

Hi, any idea how this would compare to just replacing the L1 loss with a smoothed L0 loss function? Something like ∑log(1+a|x|) (summed across the sparse representation).
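A minimal sketch of the penalty proposed in this comment, next to a plain L1 penalty for comparison. The function names, the default a=10.0, and the example activation vectors are assumptions made for illustration; nothing here is taken from the paper.

```python
import math


def l1_penalty(activations):
    """Plain L1 penalty: sum of absolute activations."""
    return sum(abs(v) for v in activations)


def smoothed_l0_penalty(activations, a=10.0):
    """Penalty from the comment: sum of log(1 + a*|x|) over the representation.

    The scale `a` is a free parameter; its default here is arbitrary.
    """
    return sum(math.log(1.0 + a * abs(v)) for v in activations)


# Two hypothetical feature vectors with the same L1 norm.
sparse = [0.0, 0.0, 4.0, 0.0]   # one large active feature
dense = [1.0, 1.0, 1.0, 1.0]    # four small active features

print(l1_penalty(sparse), l1_penalty(dense))
print(smoothed_l0_penalty(sparse), smoothed_l0_penalty(dense))
```

L1 cannot tell the two vectors apart, while the log penalty charges less for the vector with a single active feature, which is the sense in which it behaves more like a count of active features.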

2Sam Marks3h

Yep, you're totally right -- thanks!

Fermi Estimates

115

lukeprog

11y

Just before the Trinity test, Enrico Fermi decided he wanted a rough estimate of the blast's power before the diagnostic data came in. So he dropped some pieces of paper from his hand as the blast wave passed him, and used this to estimate that the blast was equivalent to 10 kilotons of TNT. His guess was remarkably accurate for having so little data: the true answer turned out to be 20 kilotons of TNT.

Fermi had a knack for making roughly-accurate estimates with very little data, and therefore such an estimate is known today as a Fermi estimate.

Why bother with Fermi estimates, if your estimates are likely to be off by a factor of 2 or even 10? Often, getting an estimate within a factor of 10...

keltan1h10

To help remember this post and its methods, I broke it down into song lyrics and used Udio to make the song.

Losing Faith In Contrarianism

omnizoid

Crosspost from my blog.

If you spend a lot of time in the blogosphere, you’ll find a great many people expressing contrarian views. If you hang out in the circles that I do, you’ll probably have heard Yudkowsky say that dieting doesn’t really work, Guzey say that sleep is overrated, Hanson argue that medicine doesn’t improve health, various people argue for the lab leak, others argue for hereditarianism, Caplan argue that mental illness is mostly just aberrant preferences and education doesn’t work, and various other people expressing contrarian views. Often, very smart people, like Robin Hanson, will write long posts defending these views, other people will have criticisms, and it will all be such a tangled mess that you don’t really know what to think about them.

For...

Said Achmiz1h20

Similarly, the lab leak theory—one of the more widely accepted and plausible contrarian views—also doesn’t survive careful scrutiny. It’s easy to think it’s probably right when your perception is that the disagreement is between people like Saar Wilf and government bureaucrats like Fauci. But when you realize that some of the anti-lab leak people are obsessive autists who have studied the topic a truly mind-boggling amount, and don’t have any social or financial stake in the outcome, it’s hard to be confident that they’re wrong.

This is a very poor concl... (read more)

6Shankar Sivarajan2h

I doubt you could have picked a worse example to make your point that contrarian takes are usually wrong than racial differences in IQ/intelligence.

6Logan Zoellner2h

If it's this piece, I would be interested to know why you found it convincing. He doesn't address (or seem to have even read) any of Brian's arguments. His argument basically boils down to "but so many people who work for universities think it's good".

5ChristianKl2h

The linked article says: So the linked article says that Steve Sailer and Emil Kierkegaard are right when they say that there are racial gaps in intelligence based on genetics. Basically, he says there's a gap but wants to debate about its size.

Paul Christiano named as US AI Safety Institute Head of AI Safety

248

Joel Burget

This is a linkpost for https://www.commerce.gov/news/press-releases/2024/04/us-commerce-secretary-gina-raimondo-announces-expansion-us-ai-safety

U.S. Secretary of Commerce Gina Raimondo announced today additional members of the executive leadership team of the U.S. AI Safety Institute (AISI), which is housed at the National Institute of Standards and Technology (NIST). Raimondo named Paul Christiano as Head of AI Safety, Adam Russell as Chief Vision Officer, Mara Campbell as Acting Chief Operating Officer and Chief of Staff, Rob Reich as Senior Advisor, and Mark Latonero as Head of International Engagement. They will join AISI Director Elizabeth Kelly and Chief Technology Officer Elham Tabassi, who were announced in February. The AISI was established within NIST at the direction of President Biden, including to support the responsibilities assigned to the Department of Commerce under the President’s landmark Executive Order.

Paul Christiano, Head of AI Safety, will design

...

2Davidmanheim9h

But what? Should we insist that the entire time someone's inside a BSL-4 lab, we have a second person who is an expert in biosafety visually monitoring them to ensure they don't make mistakes? Or should their air supply not use filters and completely safe PAPRs, and feed them outside air through a tube that restricts their ability to move around instead? Or do you have some new idea that isn't just a ban with more words? Sure, list-based approaches are insufficient, but they have relatively little to do with biosafety levels of labs, they have to do with risk groups, which are distinct, but often conflated. (So Ebola or Smallpox isn't a "BSL-4" pathogen, because there is no such thing.) That ban didn't go far enough, since it only applied to 3 pathogen types, and wouldn't have banned what Wuhan was doing with novel viruses, since that wasn't working with SARS or MERS, it was working with other species of virus. So sure, we could enforce a broader version of that ban, but getting a good definition that's both extensive enough to prevent dangerous work and that doesn't ban obviously useful research is very hard.

2Davidmanheim9h

Having written extensively about it, I promise you I'm aware. But please, tell me more about how this supports the original claim which I have been disagreeing with, that this class of incidents was or is the primary concern of the EA biosecurity community, the one that led to it being a cause area.

Adam Scholl2h20

This thread seems unproductive to me, so I'm going to bow out after this. But in case you're actually curious: at least in the case of Open Philanthropy, it's easy to check what their primary concerns are because they write them up. And accidental release from dual use research is one of them.

2aysja4h

I agree there are other problems the EA biosecurity community focuses on, but surely lab escapes are one of those problems, and part of the reason we need biosecurity measures? In any case, this disagreement seems beside the main point that I took Adam to be making, namely that the track record for defining appropriate units of risk for poorly understood, high attack surface domains is quite bad (as with BSL). This still seems true to me.

Examples of Highly Counterfactual Discoveries?

139

johnswentworth, kromem

The history of science has tons of examples of the same thing being discovered multiple times independently; Wikipedia has a whole list of examples here. If your goal in studying the history of science is to extract the predictable/overdetermined component of humanity's trajectory, then it makes sense to focus on such examples.

But if your goal is to achieve high counterfactual impact in your own research, then you should probably draw inspiration from the opposite: "singular" discoveries, i.e. discoveries which nobody else was anywhere close to figuring out. After all, if someone else would have figured it out shortly after anyways, then the discovery probably wasn't very counterfactually impactful.

Alas, nobody seems to have made a list of highly counterfactual scientific discoveries, to complement Wikipedia's list of multiple discoveries.

To...

kromem2h10

Though the Greeks actually credited the idea to an even earlier Phoenician, Mochus of Sidon.

Though when it comes to antiquity, credit isn't really "first to publish" as much as "first of the last to pass the survivorship filter."

2Lucius Bushnaq4h

Clarification: The 'derivation' for how the RLCT predicts generalization error IIRC goes through the same flavour of argument as the one the derivation of the vanilla Bayesian Information Criterion uses. I don't like this derivation very much. See e.g. this one on Wikipedia. So what it's actually showing is just that:
1. If you've got a class of different hypotheses M, containing many individual hypotheses {θ_1, θ_2, …, θ_N}.
2. And you've got a prior ahead of time that says the chance any one of the hypotheses in M is true is some number p(M) < 1, let's say it's p(M) = 0.8 as an example.
3. And you distribute this total probability p(M) = 0.8 around the different hypotheses in an even-ish way, so p(θ_i, M) ∝ 1/N, roughly.
4. And then you encounter a bunch of data X (the training data) and find that only one or a tiny handful of hypotheses in M fit that data, so p(X|θ_i, M) ≠ 0 for basically only one hypothesis θ_i...
5. Then your posterior probability p(M|X) = 0.8 p(X|M) / (0.8 p(X|M) + 0.2 p(X|¬M)) that the hypothesis θ_i is correct will probably be tiny, scaling with 1/N.
If we spread your prior p(M) = 0.8 over lots of hypotheses, there isn't a whole lot of prior to go around for any single hypothesis. So if you then encounter data that discredits all hypotheses in M except one, that tiny bit of spread-out prior for that one hypothesis will make up a tiny fraction of the posterior, unless p(X|¬M) is really small, i.e. no hypothesis outside the set M can explain the data either. So if our hypotheses correspond to different function fits (one for each parameter configuration, meaning we'd have 2^(32k) hypotheses if our function fits used k 32-bit floating point numbers), the chance we put on any one of the function fits being correct will be tiny. So having more parameters is bad, because the way we picked our prior means our belief in any one hypothesis goes to zero as N goes to infinity. So the Wikipedia derivation for the original vanilla posterior of model selection is telling us that havin
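The numbered steps in the comment above can be sketched numerically. This is an illustration under stated assumptions, not the commenter's calculation: it assumes exactly one hypothesis in M fits the data, that the prior on M is split evenly across N hypotheses, and that the likelihood values (likelihood_fit, likelihood_outside) are made-up numbers. The function name is hypothetical.

```python
def posterior_of_surviving_hypothesis(n_hypotheses,
                                      prior_class=0.8,
                                      likelihood_fit=1.0,
                                      likelihood_outside=0.01):
    """Posterior of the single hypothesis in M that fits the data X.

    Assumptions (illustrative only):
      - prior_class = p(M) is spread evenly over n_hypotheses,
      - exactly one hypothesis in M has nonzero likelihood likelihood_fit,
      - likelihood_outside = p(X | not M).
    """
    prior_each = prior_class / n_hypotheses
    joint_inside = prior_each * likelihood_fit            # p(M) * p(X|M)
    joint_outside = (1.0 - prior_class) * likelihood_outside
    return joint_inside / (joint_inside + joint_outside)


for n in (1, 10, 1000, 10**6):
    print(n, posterior_of_surviving_hypothesis(n))
```

With these made-up numbers the posterior falls roughly like 1/N as N grows, and it stays near 1 only when likelihood_outside is very small, which matches the "unless p(X|¬M) is really small" caveat in the comment.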

1cubefox6h

What's more likely: You being wrong about the obviousness of the sphere Earth theory to sailors, or the entire written record (which included information from people who had extensive access to the sea) of two thousand years of Chinese history and astronomy somehow omitting the spherical Earth theory? Not to speak of other pre-Hellenistic seafaring cultures which also lack records of having discovered the sphere Earth theory.

4Lucius Bushnaq7h

It's measuring the volume of points in parameter space with loss <ϵ when ϵ is infinitesimal. This is slightly tricky because it doesn't restrict itself to bounded parameter spaces,[1] but you can fix it with a technicality by considering how the volume scales with ϵ instead. In real networks trained with finite amounts of data, you care about the case where ϵ is small but finite, so this is ultimately inferior to just measuring how many configurations of floating point numbers get loss <ϵ, if you can manage that. I still think SLT has some neat insights that helped me deconfuse myself about networks. For example, like lots of people, I used to think you could maybe estimate the volume of basins with loss <ϵ using just the eigenvalues of the Hessian. You can't. At least not in general.

[1] Like the floating point numbers in a real network, which can only get so large. A prior of finite width over the parameters also effectively bounds the space
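A toy sketch of the point about volumes and Hessians, under loud assumptions: the two loss functions, the box [-1, 1]^2, and the grid resolution are all invented for illustration and are not from the comment. It counts grid cells with loss below ϵ for a non-degenerate quadratic loss and for a degenerate loss whose Hessian at the origin is exactly zero.

```python
def volume_below(loss, eps, n=400):
    """Approximate the area of {(x, y) in [-1, 1]^2 : loss(x, y) < eps}.

    Uses a midpoint grid with n cells per axis (a crude estimate, chosen
    only to keep the example deterministic and dependency-free).
    """
    step = 2.0 / n
    count = 0
    for i in range(n):
        x = -1.0 + (i + 0.5) * step
        for j in range(n):
            y = -1.0 + (j + 0.5) * step
            if loss(x, y) < eps:
                count += 1
    return count * step * step


def quadratic(x, y):
    """Non-degenerate minimum at the origin."""
    return x * x + y * y


def degenerate(x, y):
    """Zero loss along both axes; the Hessian at the origin vanishes."""
    return (x * y) ** 2


for eps in (1e-2, 1e-3):
    print(eps, volume_below(quadratic, eps), volume_below(degenerate, eps))
```

For the quadratic loss the low-loss area shrinks in proportion to ϵ, while for the degenerate loss it shrinks much more slowly, even though a Hessian evaluated at the origin gives no usable estimate for the degenerate case. Comparing how the volume scales with ϵ is what separates the two.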

The first future and the best future

KatjaGrace

21h

It seems to me worth trying to slow down AI development to steer successfully around the shoals of extinction and out to utopia.

But I was thinking lately: even if I didn’t think there was any chance of extinction risk, it might still be worth prioritizing a lot of care over moving at maximal speed. Because there are many different possible AI futures, and I think there’s a good chance that the initial direction affects the long term path, and different long term paths go to different places. The systems we build now will shape the next systems, and so forth. If the first human-level-ish AI is brain emulations, I expect a quite different sequence of events to if it is GPT-ish.

People genuinely pushing for AI speed over care (rather than just feeling impotent) apparently think there is negligible risk of bad outcomes, but also they are asking to take the first future to which there is a path. Yet possible futures are a large space, and arguably we are in a rare plateau where we could climb very different hills, and get to much better futures.

Logan Zoellner2h1-2

What plateau? Why pause now (vs say 10 years ago)? Why not wait until after the singularity and impose a "long reflection" when we will be in an exponentially better place to consider such questions?
Singularity 5-10 years from now vs 15-20 years from now determines whether or not some people I personally know and care about will be alive.
Every second we delay the singularity leads to a "cosmic waste" as millions more galaxies move permanently behind the event horizon defined by the expanding universe.
Slower is not prima facie safer. To the

... (read more)

1Bill Benzon7h

YES. At the moment the A.I. world is dominated by an almost magical belief in large language models. Yes, they are marvelous, a very powerful technology. By all means, let's understand and develop them. But they aren't the way, the truth and the light. They're just a very powerful and important technology. Heavy investment in them has an opportunity cost, less money to invest in other architectures and ideas. And I'm not just talking about software, chips, and infrastructure. I'm talking about education and training. It's not good to have a whole cohort of researchers and practitioners who know little or nothing beyond the current orthodoxy about machine learning and LLMs. That kind of mistake is very difficult to correct in the future. Why? Because correcting it means education and training. Who's going to do it if no one knows anything else? Moreover, in order to exploit LLMs effectively we need to understand how they work. Mechanistic interpretability is one approach. But: We're not doing enough of it. And by itself it won't do the job. People need to know more about language, linguistics, and cognition in order to understand what those models are doing.

4Matthew Barnett7h

Do you think it's worth slowing down other technologies to ensure that we push for care in how we use them over the benefit of speed? It's true that the stakes are lower for other technologies, but that mostly just means that both the upside potential and the downside risks are lower compared to AI, which doesn't by itself imply that we should go quickly.

1Jonas Hallgren8h

Disclaimer: I don't necessarily support this view, I thought about it for like 5 minutes but I thought it made sense. If we were to do the same thing as in other cases of slowing down through regulation, then that might make sense, but I'm uncertain that you can take the outside view here? Yes, we can do the same as for other technologies by leaving it down to the standard government procedures to make legislation and then I might agree with you that slowing down might not lead to better outcomes. Yet, we don't have to do this. We can use other processes that might lead to a lot better decisions. Like what about proper value sampling techniques like digital liquid democracy? I think we can do a lot better than we have in the past by thinking about what mechanism we want to use. Also, for some potential examples, I thought of cloning technology in like the last 5 min. If we just went full-speed with that tech then things would probably have turned out badly?

My experience using financial commitments to overcome akrasia

100

William Howard

10d

About a year ago I decided to try using one of those apps where you tie your goals to some kind of financial penalty. The specific one I tried is Forfeit, which I liked the look of because it’s relatively simple: you set single tasks, which you have to verify you have completed with a photo.

I’m generally pretty sceptical of productivity systems, tools for thought, mindset shifts, life hacks and so on. But this one I have found to be really shockingly effective; it has been about the biggest positive change to my life that I can remember. I feel like the category of things which benefit from careful planning and execution over time has completely opened up to me, whereas previously things like this would be largely down to the...

Fer32dwt34r3dfsz2h10

it very much depends on where the user came from

Can you provide any further detail here, i.e. be more specific on origin-stratified-retention rates? (I would appreciate this, even if this might require some additional effort searching)

1quiet_NaN10h

In the subagent view, a financial precommitment another subagent has arranged for the sole purpose of coercing you into one course of action is a threat. Plenty of branches of decision theory advise you to disregard threats because consistently doing so will mean that instances of you will more rarely find themselves in the position to be threatened. Of course, one can discuss how rational these subagents are in the first place. The "stay in bed, watch netflix and eat potato chips" subagent is probably not very concerned with high level abstract planning and might have a bad discount function for future benefits and not be overall that interested in the utility it gets from being principled.

1quiet_NaN10h

To whomever overall-downvoted this comment, I do not think that this is a troll. Being a depressed person, I can totally see this being real. Personally, I would try to start slow with positive reinforcement. If video games are the only thing which you can get yourself to do, start there. Try to do something intellectually interesting in them. Implement a four bit adder in dwarf fortress using cat logic. Play KSP with the Principia mod. Write a mod for a game. Use math or Monte Carlo simulations to figure out the best way to accomplish something in a video game even if it will take ten times longer than just taking a non-optimal route. Some of my proudest intellectual accomplishments are in projects which have zero bearing on the real world. (Of course, I am one to talk right now. Spending five hours playing Rimworld in a not-terribly-clever way for every hour I work on my thesis.)

2CronoDAS4h

My depression is currently well-controlled, and I actually have found various methods to help me get things done, since I don't respond well to the simplest versions of carrot-and-stick methods. The most pleasant is finding someone else to do it with me (or at least act involved while I do the actual work). On the other hand, there have been times when procrastinating actually gives me a thrill, like I'm getting away with something. Mediocre video games become much more appealing when I have work to avoid.

MichaelDickens's Shortform

MichaelDickens

MichaelDickens2h10

Have there been any great discoveries made by someone who wasn't particularly smart?

This seems worth knowing if you're considering pursuing a career with a low chance of high impact. Is there any hope for relatively ordinary people (like the average LW reader) to make great discoveries?
