LessWrong

12h

It seems to me worth trying to slow down AI development to steer successfully around the shoals of extinction and out to utopia.

But I was thinking lately: even if I didn’t think there was any chance of extinction risk, it might still be worth prioritizing a lot of care over moving at maximal speed. Because there are many different possible AI futures, and I think there’s a good chance that the initial direction affects the long term path, and different long term paths go to different places. The systems we build now will shape the next systems, and so forth. If the first human-level-ish AI is brain emulations, I expect a quite different sequence of events to if it is GPT-ish.

People genuinely pushing for AI speed over care (rather than just feeling impotent) apparently think there is negligible risk of bad outcomes, but also they are asking to take the first future to which there is a path. Yet possible futures are a large space, and arguably we are in a rare plateau where we could climb very different hills, and get to much better futures.

1EGI7h

What you are missing here is: * Existential risk apart from AI * People are dying / suffering as we hesitate Yes, there is a good argument that we need to solve alignment first to get ANY good outcome, but once an acceptable outcome is reasonably likely, hesitation is probably bad. Especially if you consider the likelihood that mere humans can accurately predict, let alone precisely steer a transhuman future.

3No77e6h

From a purely utilitarian standpoint, I'm inclined to think that the cost of delaying is dwarfed by the number of future lives saved by getting a better outcome, assuming that delaying does increase the chance of a better future. That said, after we know there's "no chance" of extinction risk, I don't think delaying would likely yield better future outcomes. On the contrary, I suspect getting the coordination necessary to delay means it's likely that we're giving up freedoms in a way that may reduce the value of the median future and increase the chance of stuff like totalitarian lock-in, which decreases the value of the average future overall. I think you're correct that there's also to balance the "other existential risks exist" consideration in the calculation, although I don't expect it to be clear-cut.

12David Hornbein10h

What is the mechanism, specifically, by which going slower will yield more "care"? What is the mechanism by which "care" will yield a better outcome? I see this model asserted pretty often, but no one ever spells out the details. I've studied the history of technological development in some depth, and I haven't seen anything to convince me that there's a tradeoff between development speed on the one hand, and good outcomes on the other.

Jonas Hallgren3m10

Disclaimer: I don't necessarily support this view, I thought about it for like 5 minutes but I thought it made sense.

If we were to do things the same thing as other slowing down of regulation, then that might make sense, but I'm uncertain that you can take the outside view here?

Yes, we can do the same as for other technologies by leaving it down to the standard government procedures to make legislation and then I might agree with you that slowing down might not lead to better outcomes. Yet, we don't have to do this. We can use other processes that mi... (read more)

Gated Attention Blocks: Preliminary Progress toward Removing Attention Head Superposition

cmathw, Dennis Akar, Lee Sharkey

17d

This work represents progress on removing attention head superposition. We are excited by this approach but acknowledge there are currently various limitations. In the short term, we will be working on adjacent problems are excited to collaborate with anyone thinking about similar things!

Produced as part of the ML Alignment & Theory Scholars Program - Summer 2023 Cohort

Summary: In transformer language models, attention head superposition makes it difficult to study the function of individual attention heads in isolation. We study a particular kind of attention head superposition that involves constructive and destructive interference between the outputs of different attention heads. We propose a novel architecture - a ‘gated attention block’ - which resolves this kind of attention head superposition in toy models. In future, we hope this architecture may be...

(Continue Reading – 4361 more words)

cmathw7m10

Thank you for the comment! Yep that is correct, I think perhaps variants of this approach could still be useful for resolving other forms of superposition within a single attention layer but not currently across different layers.

Examples of Highly Counterfactual Discoveries?

136

johnswentworth, kromem

The history of science has tons of examples of the same thing being discovered multiple time independently; wikipedia has a whole list of examples here. If your goal in studying the history of science is to extract the predictable/overdetermined component of humanity's trajectory, then it makes sense to focus on such examples.

But if your goal is to achieve high counterfactual impact in your own research, then you should probably draw inspiration from the opposite: "singular" discoveries, i.e. discoveries which nobody else was anywhere close to figuring out. After all, if someone else would have figured it out shortly after anyways, then the discovery probably wasn't very counterfactually impactful.

Alas, nobody seems to have made a list of highly counterfactual scientific discoveries, to complement wikipedia's list of multiple discoveries.

To...

(See More – 189 more words)

Algon13m20

Second most? What's the first? Linearization of a Newtonian V(r) about the earth's surface?

4ChristianKl1h

Merchants were a lot weaker in China than in Europe. Chinese merchants also did a lot less sea voyages due to geography. If a bunch of low-status merchants believed that the Earth is a sphere it might not have influenced Chinese high-class beliefs in the same way as beliefs of political powerful merchants in Europe.

2Garrett Baker2h

[edit: nevermind I see you already know about the following quotes. There's other evidence of the influence in Sedley's book I link below] In De Reum Natura around line 716: Or for a more modern translation from Sedley's Lucretius and the Transformation of Greek Wisdom

7Alexander Gietelink Oldenziel3h

Did I just say SLT is the Newtonian gravity of deep learning? Hubris of the highest order! But also yes... I think I am saying that * Singular Learning Theory is the first highly accurate model of breath of optima. * SLT tells us to look at a quantity Watanabe calls λ, which has the highly-technical name 'real log canonical threshold (RLCT). He proves several equivalent ways to describe it one of which is as the (fractal) volume scaling dimension around the optima. * By computing simple examples (see Shaowei's guide in the links below) you can check for yourself how the RLCT picks up on basin broadness. * The RLCT =λ first-order term for in-distribution generalization error and also Bayesian learning (technically the 'Bayesian free energy'). This justifies the name of 'learning coefficient' for lambda. I emphasize that these are mathematically precise statements that have complete proofs, not conjectures or intuitions. * Knowing a little SLT will inoculate you against many wrong theories of deep learning that abound in the literature. I won't be going in to it but suffice to say that any paper assuming that the Fischer information metric is regular for deep neural networks or any kind of hierarchichal structure is fundamentally flawed. And you can be sure this assumption is sneaked in all over the place. For instance, this is almost always the case when people talk about Laplace approximation. * It's one of the most computationally applicable ones we have? Yes. SLT quantities like the RLCT can be analytically computed for many statistical models of interest, correctly predicts phase transitions in toy neural networks and it can be estimated at scale. EDIT: no hype about future work. Wait and see ! :)

Changes in College Admissions

Zvi

This post brings together various questions about the college application process, as well as practical considerations of where to apply and go. We are seeing some encouraging developments, but mostly the situation remains rather terrible for all concerned.

Application Strategy and Difficulty

Paul Graham: Colleges that weren’t hard to get into when I was in HS are hard to get into now. The population has increased by 43%, but competition for elite colleges seems to have increased more. I think the reason is that there are more smart kids. If so that’s fortunate for America.

Are college applications getting more competitive over time?

Yes and no.

The population size is up, but the cohort size is roughly the same.
The standard ‘effort level’ of putting in work and sacrificing one’s childhood and gaming

...

(Continue Reading – 11522 more words)

Jasper Blank9h10

200s

*2000s

1xpym9h

So, the signalling value of their degrees should be decreasing accordingly, unless one mainly intends to take advantage of the process. Has some tangible evidence of that appeared already, and are alternative signalling opportunities emerging?

6Wei Dai16h

Some of my considerations for college choice for my kid, that I suspect others may also want to think more about or discuss: 1. status/signaling benefits for the parents (This is probably a major consideration for many parents to push their kids into elite schools. How much do you endorse it?) 2. sex ratio at the school and its effect on the local "dating culture" 3. political/ideological indoctrination by professors/peers 4. workload (having more/less time/energy to pursue one's own interests)

3Jacob G-W18h

I'm assuming the recent protests about the Gaza war: https://www.nytimes.com/live/2024/04/24/us/columbia-protests-mike-johnson

Improving Dictionary Learning with Gated Sparse Autoencoders

Neel Nanda

Ω 618m

This is a linkpost for https://arxiv.org/abs/2404.16014

Authors: Senthooran Rajamanoharan*, Arthur Conmy*, Lewis Smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah, Neel Nanda

A new paper from the Google DeepMind mech interp team: Improving Dictionary Learning with Gated Sparse Autoencoders!

Gated SAEs are a new Sparse Autoencoder architecture that seems to be a significant Pareto-improvement over normal SAEs, verified on models up to Gemma 7B. They are now our team's preferred way to train sparse autoencoders, and we'd love to see them adopted by the community! (Or to be convinced that it would be a bad idea for them to be adopted by the community!)

They achieve similar reconstruction with about half as many firing features, and while being either comparably or more interpretable (confidence interval for the increase is 0%-13%).

See Sen's Twitter summary, my Twitter summary, and the paper!

social lemon markets

bhauth

17h

This is a linkpost for https://www.bhauth.com/blog/culture/lemon%20markets.html

I refuse to join any club that would have me as a member.

— Groucho Marx

Alice and Carol are walking on the sidewalk in a large city, and end up together for a while.

"Hi, I'm Alice! What's your name?"

Carol thinks:

If Alice is trying to meet people this way, that means she doesn't have a much better option for meeting people, which reduces my estimate of the value of knowing Alice. That makes me skeptical of this whole interaction, which reduces the value of approaching me like this, and Alice should know this, which further reduces my estimate of Alice's other social options, which makes me even less interested in meeting Alice like this.

Carol might not think all of that consciously, but that's how human social reasoning tends to...

(See More – 809 more words)

FlorianH22m30

Assuming you're the first to explicitly point out that lemon market type of feature of 'random social interaction', kudos, I think it's a great way to express certain extremely common dynamics.

Anecdote from my country, where people ride trains all the time, fitting your description, although it takes a weird kind of extra 'excuse' in this case all the time: It would often feel weird to randomly talk to your seat neighbor, but ANY slightest excuse (sudden bump in the ride; info speaker malfunction; grumpy ticket collector; one weird word from a random perso... (read more)

11gjm8h

It looks to me as if, of the four "root causes of social relationships becoming more of a lemon market" listed in the OP, only one is actually anything to do with lemon-market-ness as such. The dynamic in a lemon market is that you have some initial fraction of lemons but it hardly matters what that is because the fraction of lemons quickly increases until there's nothing else, because buyers can't tell what they're getting. It's that last feature that makes the lemon market, not the initial fraction of lemons. And I think three of the four proposed "root causes" are about the initial fraction of lemons, not the difficulty of telling lemons from peaches. * urbanization: this one does seem to fit: it means that the people you're interacting with are much less likely to be ones you already know about, so you can't tell lemons from peaches. * drugs: this one is all about there being more lemons, because some people are addicts who just want to steal your stuff. * MLM schemes: again, this is "more lemons" rather than "less-discernible lemons". * screens: this is about raising the threshold below which any given potential interaction/relationship becomes a lemon (i.e., worse than the available alternative), so again it's "more lemons" not "less-discernible lemons". Note that I'm not saying that "drugs", "MLM", and "screens" aren't causes of increased social isolation, only that if they are the way they're doing it isn't quite by making social interactions more of a lemon market. (I think "screens" plausibly is a cause of increased social isolation. I'm not sure I buy that "drugs" and "MLM" are large enough effects to make much difference, but I could be convinced.) I like the "possible solutions" part of the article better than the section that tries to fit everything into the "lemon market" category, because it engages in more detail with the actual processes involved by actual considering possible scenarios in which acquaintances or friendships begin. When I th

3bhauth3h

You're mistaken about lemon markets: the initial fraction of lemons does matter. The number of lemon cars is fixed, and it imposes a sort of tax on transactions, but if that tax is low enough, it's still worth selling good cars. There's a threshold effect, a point at which most of the good items are suddenly driven out.

To get the best posts emailed to you, create an account! (2-3 posts per week, selected by the LessWrong moderation team.)

Paul Christiano named as US AI Safety Institute Head of AI Safety

248

Joel Burget

This is a linkpost for https://www.commerce.gov/news/press-releases/2024/04/us-commerce-secretary-gina-raimondo-announces-expansion-us-ai-safety

U.S. Secretary of Commerce Gina Raimondo announced today additional members of the executive leadership team of the U.S. AI Safety Institute (AISI), which is housed at the National Institute of Standards and Technology (NIST). Raimondo named Paul Christiano as Head of AI Safety, Adam Russell as Chief Vision Officer, Mara Campbell as Acting Chief Operating Officer and Chief of Staff, Rob Reich as Senior Advisor, and Mark Latonero as Head of International Engagement. They will join AISI Director Elizabeth Kelly and Chief Technology Officer Elham Tabassi, who were announced in February. The AISI was established within NIST at the direction of President Biden, including to support the responsibilities assigned to the Department of Commerce under the President’s landmark Executive Order.

Paul Christiano, Head of AI Safety, will design

...

(See More – 100 more words)

2Davidmanheim39m

Having written extensively about it, I promise you I'm aware. But please, tell me more about how this supports the original claim which I have been disagreeing with, that these class of incidents were or are the primary concern of the EA biosecurity community, the one that led to it being a cause area.

2Adam Scholl16h

I mean, I'm sure something more restrictive is possible. But my issue with BSL levels isn't that they include too few BSL-type restrictions, it's that "lists of restrictions" are a poor way of managing risk when the attack surface is enormous. I'm sure someday we'll figure out how to gain this information in a safer way—e.g., by running simulations of GoF experiments instead of literally building the dangerous thing—but at present, the best available safeguards aren't sufficient. I'm confused why you find this underspecified. I just meant "gain of function" in the standard, common-use sense—e.g., that used in the 2014 ban on federal funding for such research.

Davidmanheim25m20

I mean, I'm sure something more restrictive is possible.

But what? Should we insist that the entire time someone's inside a BSL-4 lab, we have a second person who is an expert in biosafety visually monitoring them to ensure they don't make mistakes? Or should their air supply not use filters and completely safe PAPRs, and feed them outside air though a tube that restricts their ability to move around instead?

Or do you have some new idea that isn't just a ban with more words?

"lists of restrictions" are a poor way of managing risk when the a

... (read more)

Magic by forgetting

avturchin

Epistemic – this post is more suitable for LW as it was 10 years ago

Thought experiment with curing a disease by forgetting

Imagine I have a bad but rare disease X. I may try to escape it in the following way:

1. I enter the blank state of mind and forget that I had X.

2. Now I in some sense merge with a very large number of my (semi)copies in parallel worlds who do the same. I will be in the same state of mind as other my copies, some of them have disease X, but most don’t.

3. Now I can use self-sampling assumption for observer-moments (Strong SSA) and think that I am randomly selected from all these exactly the same observer-moments.

4. Based on this, the chances that my next observer-moment after...

(Continue Reading – 1099 more words)

avturchin33m20

The "repeating" will not be repeating from internal point of view of a person, as he has completely erased the memories of the first attempt. So he will do it as if it is first time.

2avturchin2h

Yes, here we can define magic as "ability to manipulate one's reference class". And special minds may be much more adapted to it.

2avturchin3h

Presumably in deep meditation people become disconnected from reality.

2Dagon2h

Only metaphorically, not really disconnected. In truth, in deep meditation, the conscious attention is not focused on physical perceptions, but that mind is still contained in and part of the same reality. This may be the primary crux of my disagreement with the post. People are part of reality, not just connected to it. Dualism is false, there is no non-physical part of being. The thing that has experiences, thoughts, and qualia is a bounded segment of the universe, not a thing separate or separable from it.

This is Water by David Foster Wallace

Nathan Young

22h

This is a linkpost for https://fs.blog/david-foster-wallace-this-is-water/

Note: It seems like great essays should go here and be fed through the standard LessWrong algorithm. There is possibly a copyright issue here, but we aren't making any money off it either. What follows is a full copy of "This is Water" by David Foster Wallace his 2005 commencement speech to the graduating class at Kenyon College.

Greetings parents and congratulations to Kenyon’s graduating class of 2005. There are these two young fish swimming along and they happen to meet an older fish swimming the other way, who nods at them and says “Morning, boys. How’s the water?” And the two young fish swim on for a bit, and then eventually one of them looks over at the other and goes “What the hell is water?”

This is...

(Continue Reading – 3653 more words)

mic41m10

If you worship money and things, if they are where you tap real meaning in life, then you will never have enough, never feel you have enough. It’s the truth.

Worship your impact and you will always you feel you are not doing enough.

3mic43m

I think I disagree with the first HN comment here. I personally find that my thoughts and actions have a significant influence over whether I am experiencing a positive or negative feeling. If I find that most times I go to the grocery store, I have profoundly negative thoughts about the people around me who are just doing normal things, probably I should figure out how to think more positively about the situation. Thinking positively isn't always possible, and in cases where you can't escape a negative feeling like sadness, sometimes it is best to accept the feeling and appreciate it for what it is. But I think it really is possible to transform your emotions through your thinking, rather than being helpless to a barrage of negative feelings.

2JenniferRM4h

I wonder what he would have thought was the downside of worshiping a longer list of things... For the things mentioned, it feels like he thinks "if you worship X then the absence of X will be constantly salient to you in most moments of your life". It seems like he claims that worshiping some version of Goodness won't eat you alive, but in my experiments with that, I've found that generic Goodness Entities are usually hungry for martyrs, and almost literally try to get would-be saints to "give their all" (in some sense "eating" them). As near as I can tell, it is an unkindness to exhort the rare sort of person who is actually self-editing and scrupulous enough to even understand or apply the injunction in that direction without combining it with an injunction that success in this direction will lead to altruistic self harm unless you make the demands of Goodness "compact" in some way. Zvi mentions ethics explicitly so I'm pretty sure readings of this sort are "intended". So consider (IF you've decided to try to worship an ethical entity) that one should eventually get ready to follow Zvi's advice in "Out To Get You" for formalized/externalized ethics itself so you can enforce some boundaries on whatever angel you summon (and remember, demons usually claim to be angels (and in the current zeitgeist it is SO WEIRD that so many "scientific rationalists" believe in demons without believing in angels as well)). Anyway. Compactification (which is possibly the same thing as "converting dangerous utility functions into safe formulas for satisficing"): For myself, I have so far found it much easier to worship wisdom than pure benevolence. Noticing ways that I am a fool is kinda funny. There are a lot of them! So many that patching each such gap would be an endless exercise! The wise thing, of course, would be to prioritize which foolishnesses are most prudent to patch, at which times. A nice thing here is that wisdom basically assimilates all valid criticism as helpful

2cousin_it9h

I think for good emotions the feel-it-completely thing happens naturally anyway.

LESSWRONG
LW

Quick Takes

Popular Comments

Recent Discussion

Application Strategy and Difficulty

LessOnline

A Festival of Writers Who are Wrong on the Internet

May 31 - Jun 2, Berkeley, CA