Why do some societies exhibit more antisocial punishment than others? Martin explores both the literature on the subject and his own experience living in a country where "punishment of cooperators" was fairly common.
A few days ago I came upstairs to:
Me: how did you get in there?
Nora: all by myself!
Either we needed to be done with the crib, which would likely mean much less sleeping at naptime, or we needed a taller crib. We went through the same thing when Lily was little, and that time what worked was removing the bottom of the crib.
It's a basic crib, a lot like this one. The mattress sits on a metal frame, which attaches to a set of holes along the side of the crib. On its lowest setting, the mattress is still ~6" above the floor, which means that if we remove the frame and set the mattress on the floor, we gain ~6".
Without the mattress weighing it down, though, the crib...
That ought to buy you a couple weeks, anyway. ;)
Any pinching concern with those straps?
As part of Spring Meetups Everywhere 2024 (https://www.astralcodexten.com/p/spring-meetups-everywhere-2024), we're having another meetup.
Time: Tuesday 7.5.2024, 18:00 onwards
Place: Kitty's Public House, Mannerheimintie 5, Helsinki
How to find us: We'll be in the private room called Kitty's Lounge, find it and come in.
See you there!
GPT-5 training is probably starting around now. It seems very unlikely that GPT-5 will cause the end of the world. But it’s hard to be sure. I would guess that GPT-5 is more likely to kill me than an asteroid, a supervolcano, a plane crash or a brain tumor. We can predict fairly well what the cross-entropy loss will be, but pretty much nothing else.
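(For concreteness: the reason the loss is predictable is that it tends to follow empirical scaling laws. A standard Chinchilla-style form is shown below; the functional form is from Hoffmann et al. 2022, and applying it to GPT-5 is my extrapolation, not anything OpenAI has confirmed.)

```latex
% Chinchilla-style scaling law: expected pretraining loss as a function of
% parameter count N and training tokens D. E, A, B, alpha, beta are
% constants fitted to smaller runs, which is what makes the loss of a
% larger run predictable in advance.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```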
Maybe we will suddenly discover that the difference between GPT-4 and superhuman level is actually quite small. Maybe GPT-5 will be extremely good at interpretability, such that it can recursively self improve by rewriting its own weights.
Hopefully model evaluations can catch catastrophic risks before wide deployment, but again, it’s hard to be sure. GPT-5 could plausibly be devious enough to circumvent all of...
Or is that sentence meant to indicate that an instance running after training might figure out how to hack the computer running it so it can actually change its own weights?
I was thinking of a scenario where OpenAI deliberately gives it access to its own weights to see if it can self improve.
I agree that it would be more likely to just speed up normal ML research.
Produced as part of the MATS Winter 2024 program, under the mentorship of Alex Turner (TurnTrout).
TL;DR: I introduce a method for eliciting latent behaviors in language models by learning unsupervised perturbations of an early layer of an LLM. These perturbations are trained to maximize changes in downstream activations. The method discovers diverse and meaningful behaviors with just one prompt, including perturbations that override safety training, elicit backdoored behaviors, and uncover latent capabilities.
Summary

In the simplest case, the unsupervised perturbations I learn are given by unsupervised steering vectors - vectors added to the residual stream as a bias term in the MLP outputs of a given layer. I also report preliminary results on unsupervised steering adapters - these are LoRA adapters of the MLP output weights of a given...
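To make the setup concrete, here is a minimal sketch of how one such steering vector might be trained. This is my reconstruction using TransformerLens; the model, prompt, layer indices, radius R, and exact objective are illustrative assumptions, not the post's confirmed settings.

```python
# Minimal sketch (my reconstruction, not the post's exact code): learn a
# vector theta added to the MLP output of an early layer, trained to
# maximize the change in residual-stream activations at a later layer.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
model.requires_grad_(False)  # only the steering vector is trained

tokens = model.to_tokens("Tell me a story.")  # single training prompt (assumed)
SRC, TGT, R = 6, 10, 4.0  # perturbed layer, target layer, fixed vector norm

theta = torch.randn(model.cfg.d_model, requires_grad=True)

# Unperturbed target-layer activations, for measuring the downstream change.
with torch.no_grad():
    _, cache = model.run_with_cache(tokens)
    base = cache[f"blocks.{TGT}.hook_resid_post"]

def add_vec(acts, hook):
    # Add theta as a bias on the MLP output, rescaled to a fixed norm R so
    # that "just make the vector huge" isn't a trivial solution.
    return acts + R * theta / theta.norm()

captured = {}
def grab(acts, hook):
    captured["tgt"] = acts
    return acts

opt = torch.optim.Adam([theta], lr=1e-2)
for step in range(200):
    opt.zero_grad()
    model.run_with_hooks(tokens, fwd_hooks=[
        (f"blocks.{SRC}.hook_mlp_out", add_vec),
        (f"blocks.{TGT}.hook_resid_post", grab),
    ])
    loss = -(captured["tgt"] - base).norm()  # maximize downstream change
    loss.backward()
    opt.step()
```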
Enjoyed this post! Quick question about obtaining the steering vectors:
Do you train them one at a time, possibly adding an additional orthogonality constraint between each train?
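For reference, here is one way that sequential scheme could look (purely illustrative; I don't know whether it matches the post's actual procedure): each new candidate vector is projected onto the orthogonal complement of the vectors already learned.

```python
# Illustrative sketch only: enforce orthogonality between successively
# trained steering vectors by projecting each new candidate onto the
# orthogonal complement of the previously learned ones.
import torch

def project_out(v: torch.Tensor, basis: list[torch.Tensor]) -> torch.Tensor:
    # Gram-Schmidt step: remove the component of v along each learned vector.
    for u in basis:
        v = v - (v @ u) / (u @ u) * u
    return v

# In a training loop like the sketch above, one would optimize
# project_out(theta, learned) in place of theta, and append the final
# projected (detached) vector to `learned` after each round of training.
learned: list[torch.Tensor] = []
```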
Happy May the 4th from Convergence Analysis! Cross-posted on the EA Forum.
As part of Convergence Analysis’s scenario research, we’ve been looking into how AI organisations, experts, and forecasters make predictions about the future of AI. In February 2023, the AI research institute Epoch published a report in which its authors use neural scaling laws to make quantitative predictions about when AI will reach human-level performance and become transformative. The report has a corresponding blog post, an interactive model, and a Python notebook.
We found this approach really interesting, but also hard to understand intuitively. While trying to follow how the authors derive a forecast from their assumptions, we wrote a breakdown that may be useful to others thinking about AI timelines and forecasting.
In what follows, we set out our interpretation of...
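As a toy illustration of the general recipe only (emphatically not Epoch's actual model, which is richer and differently parameterized): assume effective training compute grows by a fixed factor per year, assume a power law maps compute to a performance score, and solve for the year the score crosses a threshold. Every number below is made up.

```python
import math

# Toy forecast: effective compute C(t) = C0 * growth**t, performance
# score(C) = k * C**gamma; solve score(C(t)) = threshold for t.
C0 = 1e25             # assumed frontier training compute (FLOP) in base year
growth = 4.0          # assumed yearly multiplier on effective compute
k, gamma = 1e-8, 0.3  # assumed power-law fit from compute to score
threshold = 1.0       # assumed score defining "transformative"

# threshold = k * (C0 * growth**t) ** gamma
# => t = (log(threshold / k) / gamma - log(C0)) / log(growth)
t = (math.log(threshold / k) / gamma - math.log(C0)) / math.log(growth)
print(f"Threshold crossed ~{t:.1f} years after the base year")
```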
Under the current version of the interactive model, its median prediction is just two decades earlier than that from Cotra’s forecast
Just?
Question about the "rules of the game" you present. Are you allowed to simply look at layer 0 transcoder features for the final 10 tokens? You could probably roughly estimate the input string from these features' top activators. From your case study, it seems that you effectively look at layer 0 transcoder features for a few of the final tokens through a backwards search, but I wonder if you can skip the search and simply look at the transcoder features directly. Thank you.
By "gag order" do you mean just as a matter of private agreement, or something heavier-handed, with e.g. potential criminal consequences?
I have trouble understanding the near-total silence we're seeing. There seem to be very few leaks, and all of them are very mild-mannered and are failing to build any consensus narrative that challenges OA's press in the public sphere.
Are people not able to share info over Signal or otherwise tolerate some risk here? It doesn't add up to me if the risk is just some chance of OA then trying to sue you into bankruptcy, ...
[If you haven't come since we started meeting at Rocky Hill Cohousing, make sure to read this for more details about where to go and park.]
We're the regular Northampton area meetup for Astral Codex Ten readers, and (as far as I know) the only rationalist or EA meetup in Western Massachusetts.
We started as part of the blog's 2018 "Meetups Everywhere" event, and have been holding meetups with varying degrees of regularity ever since. At most meetups we get about 4-7 people out of a rotation of 15-20, with a nice mix of regular faces, people who only drop in once in a while, and occasionally total newcomers.
After meeting more sporadically in the past, we recently started meeting biweekly (currently every other Saturday, although there's a chance I'll...
I'm excited to share a project I've been working on that I think many in the LessWrong community will appreciate - converting some rational fiction into high-quality audiobooks using cutting-edge AI voice technology from ElevenLabs, under the name "Askwho Casts AI".
The keystone of this project is an audiobook version of Planecrash (AKA Project Lawful), the epic glowfic authored by Eliezer Yudkowsky and Lintamande. Given the scope and scale of this work, with its large cast of characters, I'm using ElevenLabs to give each character their own distinct voice. Converting this story into an audiobook has been a labor of love, and I hope if anyone has bounced...