RobertM
I've seen a lot of takes (on Twitter) recently suggesting that OpenAI and Anthropic (and maybe some other companies) violated commitments they made to the UK's AISI about granting them access for e.g. predeployment testing of frontier models.  Is there any concrete evidence about what commitment was made, if any?  The only thing I've seen so far is a pretty ambiguous statement by Rishi Sunak, who might have had some incentive to claim more success than was warranted at the time.  If people are going to breathe down the necks of AGI labs about keeping to their commitments, they should be careful to only do it for commitments they've actually made, lest they weaken the relevant incentives.  (This is not meant to endorse AGI labs behaving in ways which cause strategic ambiguity about what commitments they've made; that is also bad.)
William_S
I worked at OpenAI for three years, from 2021 to 2024, on the Alignment team, which eventually became the Superalignment team. I worked on scalable oversight, as part of the team developing critiques as a technique for using language models to spot mistakes in other language models. I then worked to refine an idea from Nick Cammarata into a method for using language models to generate explanations for features in language models. I was then promoted to managing a team of 4 people, which worked on trying to understand language model features in context, leading to the release of an open source "transformer debugger" tool. I resigned from OpenAI on February 15, 2024.
quila
On Pivotal Acts

I was rereading some of the old literature on alignment research sharing policies after Tamsin Leake's recent post and came across some discussion of pivotal acts as well.

> Hiring people for your pivotal act project is going to be tricky. [...] People on your team will have a low trust and/or adversarial stance towards neighboring institutions and collaborators, and will have a hard time forming good-faith collaboration. This will alienate other institutions and make them not want to work with you or be supportive of you.

This is in a context where the 'pivotal act' example is using a safe ASI to shut down all AI labs.[1]

My thought is that I don't see why a pivotal act needs to be that. I don't see why shutting down AI labs or using nanotech to disassemble GPUs on Earth would be necessary. These may be among the 'most direct' or 'simplest to imagine' possible actions, but in the case of superintelligence, simplicity is not a constraint. We can instead select for the 'kindest' or 'least adversarial' or, really, functional-decision-theoretically optimal actions that save the future while minimizing the amount of adversariality this creates in the past (present).

Which can be broadly framed as 'using ASI for good'. Which is what everyone wants, even the ones being uncareful about its development.

Capabilities orgs would be able to keep working on fun capabilities projects in those days during which the world is saved, because a group following this policy would choose to use ASI to make the world robust to the failure modes of capabilities projects rather than shutting them down. Because superintelligence is capable of that, and so much more.

If there is a group for whom using ASI to make the world robust to risks and free of harm, in a way that doesn't infringe on ongoing non-violent activities, is problematic, then this post doesn't apply to them: their issue all along was not with the character of the pivotal act, but instead possibly with something like 'having my personal cosmic significance as a capabilities researcher stripped away by the success of an external alignment project'.

Another disclaimer: this post is about a world in which safely usable superintelligence has been created, but I'm not confident that anyone (myself included) currently has a safe and ready method to create it. This post shouldn't be read as an endorsement of possible current attempts to do this. I would of course prefer for this civilization to be one that could coordinate such that no groups were presently working on ASI, precluding this discourse.

1. ^ Side note: it's orthogonal to the point of this post, but this example also makes me think: if I were working on a safe ASI project, I wouldn't mind if another group who had discreetly built safe ASI used it to shut my project down, since my goal is 'ensure the future lightcone is used in a valuable, tragedy-averse way' and not 'gain personal power' or 'have a fun time working on AI' or something. In my morality, it would be naive to be opposed to that shutdown. But to the extent humanity is naive, we can easily do something else in that future to create better present dynamics (as the main text argues).
Anyone know folks working on semiconductors in Taiwan and Abu Dhabi, or on fiber at Tata Industries in Mumbai? I'm currently travelling around the world and talking to folks about various kinds of AI infrastructure, and looking for recommendations of folks to meet! If so, feel free to DM me! (If you don't know me, I'm a dev here on LessWrong and was also part of founding Lightcone Infrastructure.)
Do we expect future model architectures to be biased toward out-of-context reasoning (reasoning internally rather than in a chain-of-thought)? As in, what kinds of capabilities would lead companies to build models that reason less and less in token-space?

I mean, the first obvious thing would be that you are training the model to internalize some of the reasoning rather than having to pay for the additional tokens each time you want to do complex reasoning.

The thing is, I expect we'll eventually move away from just relying on transformers with scale. And so I'm trying to refine my understanding of the capabilities that are simply bottlenecked in this paradigm, and that model builders will need to resolve through architectural and algorithmic improvements. (Of course, based on my previous posts, I still think data is a big deal.)

Anyway, this kind of thinking eventually leads to the infohazardous area of, "okay then, what does the true AGI setup look like?" This is really annoying because it has alignment implications. If we start to move increasingly towards models that reason outside of token-space, then alignment becomes harder.

So, are there capability bottlenecks that eventually get resolved through something that requires out-of-context reasoning? So far, it seems like the current paradigm will not be an issue on this front: keep scaling transformers, and you don't really get any big changes in the model's likelihood of using out-of-context reasoning.

This is not limited to out-of-context reasoning. I'm trying to have a better understanding of the (dangerous) properties future models may develop simply as a result of needing to break a capability bottleneck. My worry is that many people end up over-indexing on the current transformer+scale paradigm (and this becomes insufficient for ASI), so they don't work on the right kinds of alignment or governance projects.

---

I'm unsure how big of a deal this architecture will end up being, but the rumoured xLSTM just dropped. It seemingly outperforms other models at the same size. Maybe it ends up just being another drop in the bucket, but I think we will see more attempts in this direction.

Claude summary of the paper's key points:

1. The authors introduce exponential gating with memory mixing in the new sLSTM variant. This allows the model to revise storage decisions and solve state tracking problems, which transformers and state space models without memory mixing cannot do.

2. They equip the mLSTM variant with a matrix memory and covariance update rule, greatly enhancing the storage capacity compared to the scalar memory cell of vanilla LSTMs. Experiments show this matrix memory provides a major boost.

3. The sLSTM and mLSTM are integrated into residual blocks to form xLSTM blocks, which are then stacked into deep xLSTM architectures.

4. Extensive experiments demonstrate that xLSTMs outperform state-of-the-art transformers, state space models, and other LSTMs/RNNs on language modeling tasks, while also exhibiting strong scaling behavior to larger model sizes.

This work is important because it presents a path forward for scaling LSTMs to billions of parameters and beyond. By overcoming key limitations of vanilla LSTMs - the inability to revise storage, limited storage capacity, and lack of parallelizability - xLSTMs are positioned as a compelling alternative to transformers for large language modeling.
Instead of doing all computation step-by-step as tokens are processed, advanced models might need to store and manipulate information in a compressed latent space, and then "reason" over those latent representations in a non-sequential way. The exponential gating with memory mixing introduced in the xLSTM paper directly addresses this need. Here's how:

1. Exponential gating allows the model to strongly update or forget the contents of each memory cell based on the input. This is more powerful than the simple linear gating in vanilla LSTMs. It means the model can decisively revise its stored knowledge as needed, rather than being constrained to incremental changes. This flexibility is crucial for reasoning, as it allows the model to rapidly adapt its latent state based on new information.

2. Memory mixing means that each memory cell is updated using a weighted combination of the previous values of all cells. This allows information to flow and be integrated between cells in a non-sequential way. Essentially, it relaxes the sequential constraint of traditional RNNs and allows for a more flexible, graph-like computation over the latent space.

3. Together, these two components endow the xLSTM with a dynamic, updateable memory that can be accessed and manipulated "outside" the main token-by-token processing flow. The model can compress information into this memory, "reason" over it by mixing and gating cells, then produce outputs guided by the updated memory state.

In this way, the xLSTM takes a significant step towards the kind of "reasoning outside token-space" that I suggested would be important for highly capable models. The memory acts as a workspace for flexible computation that isn't strictly tied to the input token sequence.

Now, this doesn't mean the xLSTM is doing all the kinds of reasoning we might eventually want from an advanced AI system. But it demonstrates a powerful architecture for models to store and manipulate information in a latent space, at a more abstract level than individual tokens. As we scale up this approach, we can expect models to perform more and more "reasoning" in this compressed space rather than via explicit token-level computation.
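To make "exponential gating with memory mixing" concrete, here is a minimal sketch of an sLSTM-style recurrent step. It is a simplified illustration loosely following the description above, not the paper's exact formulation: the real sLSTM adds a stabilizer state so the exponentials don't overflow, and uses block-diagonal recurrent weights.

```python
import numpy as np

def slstm_step(x, h_prev, c_prev, n_prev, W, R, b):
    """One simplified sLSTM-style step.

    W maps the input, R maps the previous hidden state ("memory mixing":
    every cell's update sees all previous cells), b is the bias; together
    they produce pre-activations for the four gates (i, f, z, o).
    """
    pre = W @ x + R @ h_prev + b
    i_tilde, f_tilde, z_tilde, o_tilde = np.split(pre, 4)

    i = np.exp(i_tilde)                 # exponential input gate: can overwrite memory sharply
    f = np.exp(f_tilde)                 # exponential forget gate: can discard memory sharply
    z = np.tanh(z_tilde)                # candidate cell input
    o = 1.0 / (1.0 + np.exp(-o_tilde))  # sigmoid output gate

    c = f * c_prev + i * z              # cell state update
    n = f * n_prev + i                  # normalizer state keeps the readout bounded
    h = o * (c / n)                     # normalized hidden state
    return h, c, n

# Toy usage: 4-dimensional inputs, 8 hidden cells, random weights.
d_in, d_h = 4, 8
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * d_h, d_in))
R = rng.normal(scale=0.1, size=(4 * d_h, d_h))
b = np.zeros(4 * d_h)
h = c = np.zeros(d_h)
n = np.ones(d_h)
for t in range(5):
    h, c, n = slstm_step(rng.normal(size=d_in), h, c, n, W, R, b)
```

The dense recurrent matrix R is what carries the "mixing": each cell's next value depends on every other cell's previous value, so information can be reshuffled across the latent state between token steps.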


Recent Discussion

On June 22nd, there was a “Munk Debate”, facilitated by the Canadian Aurea Foundation, on the question of whether “AI research and development poses an existential threat” (you can watch it here, which I highly recommend). On stage were Yoshua Bengio and Max Tegmark as proponents and Yann LeCun and Melanie Mitchell as opponents of the central thesis. This seems like an excellent opportunity to compare their arguments and the effects they had on the audience, in particular because in the Munk Debate format, the audience gets to vote on the issue before and after the debate.

The vote at the beginning revealed that 67% of the audience supported the existential-threat hypothesis and 33% were against it. Interestingly, the listeners were also asked whether they were prepared to change...

I wasn't able to find the full video on the site you linked, but I found it here, if anyone else has the same issue: 

Most people avoid saying literally false things, especially if those could be audited, like making up facts or credentials. The reasons for this are both moral and pragmatic — being caught out looks really bad, and sustaining lies is quite hard, especially over time. Let’s call the habit of not saying things you know to be false ‘shallow honesty’[1].

Often when people are shallowly honest, they still choose what true things they say in a kind of locally act-consequentialist way, to try to bring about some outcome. Maybe something they want for themselves (e.g. convincing their friends to see a particular movie), or something they truly believe is good (e.g. causing their friend to vote for the candidate they think will be better for the country).

Either way, if you...

Being able to credibly commit to doing this at appropriate times seems useful. I wouldn't want to commit to doing it at all times; becoming cooperatebot makes it rational for cooperative-but-preference-misaligned actors to exploit you. Shallow honesty seems like a good starting point for being able to say when you are attempting to be deeply honest, perhaps. But, for example, I would sure appreciate it if people could be less deeply honest about the path to AI capabilities. I do think the "deeply honest at the meta level" thing has some promise.

sapphire
I'm a real fan of insane ideas. I literally do acid every Monday. But I gotta say, among crazy ideas, 'be way way more honest' is well-trodden ground and the skulls are numerous. It just really rarely goes well. I'm a pretty honest guy and am attracted to the cluster. But if you start doing this, you are definitely trying something in a cluster of ideas that usually works terribly. If anything, I have to constantly tell myself to be less explicit and 'deeply honest'. It just doesn't work well for most people.
zeshen
Scott Alexander talked about explicit honesty (unfortunately paywalled) in contrast with radical honesty. In short, explicit honesty is being completely honest when asked, and radical honesty is being completely honest even without being asked. From what I understand of your post, deep honesty is about being completely honest about information you perceive to be relevant to the receiver, regardless of whether the information is explicitly being requested.

Scott also links to some cases where radical honesty did not work out well, like this, this, and this. I suspect deep honesty may lead to similar risks, as you have already pointed out.

And with regards to: I think they would form a three-circle Venn diagram. Things that are within the intersection of all three circles would be a no-brainer. But the tricky bits are the things that are either true but not kind/useful, or kind/useful but not true. And I understood this post as a suggestion to venture more into the former.
PhilosophicalSoul
I'm so happy you made this post.  I only have two (2) gripes. I say this as someone who 1) practices/believes in determinism, and 2) has interacted with journalists on numerous occasions with a pretty strict policy on honesty.

A few days ago I came upstairs to:

Me: how did you get in there?

Nora: all by myself!

Either we needed to be done with the crib, which had a good chance of much less sleeping at naptime, or we needed a taller crib. This is also something we went through when Lily was little, and that time what worked was removing the bottom of the crib.

It's a basic crib, a lot like this one. The mattress sits on a metal frame, which attaches to a set of holes along the side of the crib. On its lowest setting, the mattress is still ~6" above the floor. Which means if we remove the frame and set the mattress on the floor, we gain ~6".

Without the mattress weighing it down, though, the crib...

jefftk
Climbing out of the crib is mildly dangerous, since it's farther down on the outside than the inside. So it's good practice to switch away from a crib (or adjust the crib to be taller) once they get to where they'll be able to do that soon. Even if they can do it safely, though, a crib they can get in and out of on their own defeats the purpose of a crib -- at that point you should just move to something optimized for being easy to get in and out of, like a bed.
mikbp

> at that point you should just move to something optimized for being easy to get in and out of, like a bed

Yes, yes. Exactly. Isn't it much more practical to put her on a bed/mattress on the floor? That's what we do. Just using the mattress from the crib, for example.

A couple years ago, I had a great conversation at a research retreat about the cool things we could do if only we had safe, reliable amnestic drugs - i.e. drugs which would allow us to act more-or-less normally for some time, but not remember it at all later on. And then nothing came of that conversation, because as far as any of us knew such drugs were science fiction.

… so yesterday when I read Eric Neyman’s fun post My hour of memoryless lucidity, I was pretty surprised to learn that what sounded like a pretty ideal amnestic drug was used in routine surgery. A little googling suggested that the drug was probably a benzodiazepine (think valium). Which means it’s not only a great amnestic, it’s also apparently one...

It has extremely high immediate value -- it solves IP rights entirely.

It's the barbed wire for IP rights.

Algon
Even though I think the comment was useful, it doesn't look to me like it was as useful as the typical 139-karma comment, as I expect LW readers to be fairly unlikely to start popping benzos after reading this post. IMO it should've gotten like 30-40 karma. Even 60 wouldn't have been too shocking to me. But 139? That's way more karma than anything else I've posted. I don't think it warrants this much karma, and I now share @ryan_greenblatt's concerns about the ability to vote on Quick Takes and Popular Comments introducing algorithmic virality to LW. That sort of thing is typically corrosive to epistemic hygiene, as it changes the incentives of commenting more towards posting applause-lights. I don't think that's a good change for LW, as I think we've got too much group-think as it is.
habryka
Yeah, seems right to me. If this is a recurring thing we might deactivate voting on the popular comments interface or something like that.
Ben Pace
Here's a quick mockup of what that might look like. In my head, the voting UI is available after you click to expand and then scroll down to the bottom of the comment. Added: Oops, I realize I did the quick takes, not the popular comments. Still, the relevant changes are very similar.

With this article, I intend to initiate a discussion with the community on a remarkable (thought) experiment and its implications. The experiment is to conceptualize Stuart Kauffman's NK Boolean networks as a digital social communication network, which introduces a thus far unrealized method for strategic information transmission. From this premise, I deduce that such a technology would enable people to 'swarm', i.e.: engage in self-organized collective behavior without central control. Its realization could result in a powerful tool for bringing about large-scale behavior change. The concept provides a tangible connection between network topology, common knowledge and cooperation, which can improve our understanding of the logic behind prosocial behavior and morality. It also presents us with the question of how the development of such a technology should be...
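For readers who haven't met the underlying object, here is a minimal sketch of a classic Kauffman NK Boolean network: N nodes, each wired to K randomly chosen inputs and equipped with a random Boolean function of those inputs, updated synchronously. This is only the standard construction, not the post's proposed mapping onto a communication network.

```python
import numpy as np

def make_nk_network(n, k, rng):
    """Random wiring (k inputs per node) and one random truth table per node."""
    inputs = np.array([rng.choice(n, size=k, replace=False) for _ in range(n)])
    tables = rng.integers(0, 2, size=(n, 2 ** k))  # 2^k output bits per node
    return inputs, tables

def step(state, inputs, tables):
    """Synchronous update: each node looks up its inputs' bit pattern in its table."""
    powers = 2 ** np.arange(inputs.shape[1])
    idx = (state[inputs] * powers).sum(axis=1)
    return tables[np.arange(len(state)), idx]

rng = np.random.default_rng(0)
inputs, tables = make_nk_network(n=20, k=2, rng=rng)
state = rng.integers(0, 2, size=20)
for _ in range(10):
    state = step(state, inputs, tables)
print(state)
```

The interesting behavior (frozen, chaotic, or "critical" dynamics) depends mainly on K relative to N, which is what makes the topology question central to the post's argument.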

faul_sname
It's a pretty cool idea, and I think it could make a fun content-discovery (article/music/video/whatever) pseudo-social-media application (you follow some number of people, and have some unknown number of followers, and so you get feedback on how many of your followers liked the things you passed on, but no further information than that). I don't know whether I'd say it's super alignment-relevant, but also this isn't the Alignment Forum, and people are allowed to have interests that are not AI alignment, and even to share those interests.

Nice! I actually had this as a loose idea in the back of my mind for a while: to have a network of people connected like this and have them signal to each other their track of the day, which could be actually fun. It is a feasible use case as well. The underlying reasoning is also that (at least for me) I would be more open to adopting an idea from a person with whom I feel a shared sense of collectivity, instead of from an algorithm that thinks it knows me. Intrinsically, I want such an algorithm to be wrong, for the sake of my own autonomy :)

The way I see it, th... (read more)



This post assumes basic familiarity with Sparse Autoencoders. For those unfamiliar with the technique, we highly recommend the introductory sections of these papers.

TL;DR

Neuronpedia is a platform for mechanistic interpretability research. It was previously focused on crowdsourcing explanations of neurons, but we’ve pivoted to accelerating Sparse Autoencoder (SAE) researchers by hosting models, feature dashboards, data visualizations, tooling, and more.

Important Links

Neuronpedia has received 1 year of funding from LTFF. Johnny Lin is full-time on engineering, design, and product, while Joseph Bloom is supporting with...

Neuronpedia has an API (copying from a message Johnny recently wrote to someone else):

"Docs are coming soon but it's really simple to get JSON output of any feature. just add "/api/feature/" right after "neuronpedia.org".for example, for this feature: https://neuronpedia.org/gpt2-small/0-res-jb/0
the JSON output of it is here: https://www.neuronpedia.org/api/feature/gpt2-small/0-res-jb/0
(both are GET requests so you can do it in your browser)note the additional "/api/feature/"i would prefer you not do this 100,000 times in a loop though - if you'd l... (read more)
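As a small illustration of the pattern in that message, here is a minimal Python sketch using the `requests` library. The URL is the example feature quoted above; the shape of the returned JSON isn't documented here, so the script just lists its fields.

```python
import requests

# Example feature from the message above: model "gpt2-small",
# SAE "0-res-jb", feature index 0. Adjust the path for other features.
url = "https://www.neuronpedia.org/api/feature/gpt2-small/0-res-jb/0"

resp = requests.get(url)       # a plain GET request; the same URL works in a browser
resp.raise_for_status()
feature = resp.json()          # JSON description of the feature

print(sorted(feature.keys()))  # inspect which fields are available
```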

Every few years, someone asks me what I would do to solve a Trolley Problem. Sometimes, they think I’ve never heard of it before—that I’ve never read anything about moral philosophy (e.g. Plato, Foot, Thomson, Graham)—and oh do they have a zinger for me. But for readers who are well familiar with these problems, I have some thoughts that may be new. For those that haven't done the reading, I'll provide links and some minor notes and avoid recapping in great detail.

Ask a dozen people to explain what the trolley problem is and you are likely to get a dozen variations. In this case my contact presented the scenario as a modified Bystander variation format—though far evolved away from Philippa Foot's original version in “The Problem of...

Although I somewhat agree with the comment about style, I feel that the point you're making could be received with some more enthusiasm. How well recognized is this trolley problem fallacy? The way I see it, the energy spent on thinking about the trolley problem in isolation illustrates innate human short-sightedness, and perhaps a clear limit of human intelligence as well. 'Correctly' solving one trolley problem does not prevent you or someone else from being confronted with the next. My line of argument is that the question of ethical decision making req... (read more)

mukashi
Just a general comment on style: I think this article would be much easier to parse if you included different headings, sections, etc. Normally when I approach writing here on LessWrong, I scroll slowly to the bottom, skim diagonally, and try to get a quick impression of the contents and whether the article is worth reading. My impression is that most people will ignore articles that are big, uninterrupted chunks of text like this one.
atomantic
Thanks for the feedback. I've added some headers to break it up.

Fooming Shoggoths Dance Concert

June 1st at LessOnline

After their debut album I Have Been A Good Bing, the Fooming Shoggoths are performing at the LessOnline festival. They'll be unveiling several previously unpublished tracks, such as "Nothing is Mere", feat. Richard Feynman.

Ticket prices rise by $100 on May 13th