Dan Valentine — LessWrong

LESSWRONG
LW

Replying toOpen Thread Fall 2024

Open Thread Fall 2024

Declarative and procedural knowledge are two different memory systems. Spaced repetition is good for declarative knowledge, but for procedural (like playing music) you need lots of practice. Other examples include math and programming - you can learn lots of declarative knowledge about the concepts involved, but you still need to practice solving problems or writing code.

Edit: as for why practice every day - the procedural system requires a lot more practice than the declarative system does.

Replying toDebating with More Persuasive LLMs Leads to More Truthful Answers

Dan Valentine2y

Debating with More Persuasive LLMs Leads to More Truthful Answers

"More persuasive" here means a higher win rate in debate, which I think is the same thing it would mean in any debate context? I agree the limitation to inference time rather than training is definitely important to keep in mind. I think that best-of-N using the judge as a preference model is a reasonable approximation of moderate amounts of RL training, but doing actual training would allow us to apply a lot more optimization pressure and get a wider spread of Elos. There has been some good debate RL work done in a similar setting here, and I'd love to see more research done with debate-trained models.

Replying toDebating with More Persuasive LLMs Leads to More Truthful Answers

Dan Valentine2y

Debating with More Persuasive LLMs Leads to More Truthful Answers

Thanks for the feedback Ryan!

I like this paper, but I think the abstract is somewhat overstated.

This is good to know. We were trying to present an accurate summary in the abstract while keeping it concise, which is a tricky balance. Seems like we didn’t do a good enough job here, so we’ll update the abstract to caveat the results a bit more.

Hidden passage debate on QuALITY is actually pretty narrow as far as domains go and might have pretty different properties from future cases.

Yep, agreed! QuALITY is a great testbed for debate, but we definitely need to see debate results in other domains. The NYU ARG stream in MATS is looking at some... (read 591 more words →)

Debating with More Persuasive LLMs Leads to More Truthful Answers

Akbir Khan

Akbir Khan, John Hughes, Dan Valentine, Sam Bowman, Ethan Perez

We've just completed a bunch of empirical work on LLM debate, and we're excited to share the results. If the title of this post is at all interesting to you, we recommend heading straight to the paper. There are a lot of interesting results that are hard to summarize, and we think the paper is quite readable.

If you're pressed for time, we've posted the abstract and our Twitter thread below.

If you're working on debate or might in future, we especially suggest reading our recommendations for working on debate (below or in Appendix C of the paper).

Code: https://github.com/ucl-dark/llm_debate
Examples: https://llm-debate.com
Paper: https://github.com/ucl-dark/llm_debate/blob/main/paper.pdf

Abstract

Common methods for aligning large language models (LLMs) with desired behaviour heavily rely on

... (read 2402 more words →)

Replying toMississauga, Ontario, Canada – ACX Meetups Everywhere Fall 2023

Dan Valentine2y

Mississauga, Ontario, Canada – ACX Meetups Everywhere Fall 2023

Seems weird for this to be the same time and date as the Toronto meetup. Lots of people who might have been interested in going will probably be at the one in Toronto instead.

Understanding mesa-optimization using toy models

tilmanr

tilmanr, rusheb, Guillaume Corlouer, Dan Valentine, afspies, mivanitskiy, Can

Overview

Solving the problem of mesa-optimization would probably be easier if we understood how models do search internally
We are training GPT-type models on the toy task of solving mazes and studying them in both a mechanistic interpretability and behavioral context.
This post lays out our model training setup, hypotheses we have, and the experiments we are performing and plan to perform. Experimental results will be forthcoming in our next post.
We invite members of the LW community to challenge our hypotheses and the potential relevance of this line of work. We will follow up soon with some early results^[1]. Our main source code is open source, and we are open to collaborations.

Introduction

Some threat models of... (read 2943 more words →)

The bottleneck in this scenario becomes brain health, as receiving a brain transplant is not very useful. I’m not sure how much of an obstacle this will be in practice.

Replying toRationality-related things I don't know as of 2023

Dan Valentine3y

For a high level look at quantum physics I’d recommend Something Deeply Hidden by Sean Carroll. I feel like I understand many worlds much better after reading it. If you like audiobooks this one is great too.

[workshop] Detecting out of distribution data

Dan Valentine

This event is at a private residence near King W and Portland, in Toronto. We will give the exact address on request.

Bring your laptops!

We're going to be working on some Jupyter/Colab notebooks for Out-of-Distribution data detection. Colab runs as a web app so doesn't require anything pre-installed, but if you happen to have Pytorch and Jupyter installed you will be able to run the standalone notebook.

Machine learning models often perform poorly and give unhelpful answers when their input data doesn't resemble what they were trained on. In this workshop we'll look at some simple techniques to help detect out-of-distribution data and see if we can improve them!

No particular level of machine learning or programming expertise is expected. We'll mostly be tinkering with existing code, and there will be people around to help!

Replying toApply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley

Dan Valentine3y

Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley

My employer isn’t gonna allow me to take a couple months off to go do this thing I personally am very interested in

Have you considered asking them about it? I've worked at several software jobs where this would have been no problem. I've also seen a few people take sabbaticals and there was no issue with it, their teammates generally thought it was really cool. One guy I know took a 1-year sabbatical to live in a van and drive around Europe.

This is all anecdotal and your situation may be different of course. I just wanted to add this data point as it seemed like you may be prematurely dismissing sabbaticals as some crazy thing that never happens in real life.

Replying toWhy hasn't deep learning generated significant economic value yet?

Dan Valentine4y

Why hasn't deep learning generated significant economic value yet?

The worst part is, for most of these, time lost is gone forever. It's just a slowdown. Like the Thai floods simply permanently set back hard drive progress and made them expensive for a long time, there was never any 'catchup growth' or 'overhang' from it.

Isn’t this great news for AI safety due to giving us longer timelines?

Replying toMIRI announces new "Death With Dignity" strategy

Dan Valentine4y

MIRI announces new "Death With Dignity" strategy

I found your earlier comment in this thread insightful and I think it would be really valuable to know what evidence convinced you of these timelines. If you don't have time to summarize in a post, is there anything you could link to?

Replying toFaerie Ring: A Small Gather.Town Call

Dan Valentine5y

Faerie Ring: A Small Gather.Town Call

How long do you expect the event to last for? I'd love to join but this week I'll have to leave after the first hour.

[Online] EA Toronto Monthly Social

Dan Valentine

https://zoom.us/j/271078817

Hey Folks,

Event details:

This is a very informal social meetup for anyone in the community!

We will be facilitating group conversations online so that community members can connect with each other.
Depending on who joins, we might use breakout rooms to allow for small group conversations as well as full group conversations.

Having a high-speed internet connection will make this a more valuable experience!

Suggested, highly optional topic:
"What can we do about this biorisk and future ones?"

At about 7:05 PM, the host will likely set the context for the meeting and welcome everyone.

---------------------------------
Getting the most out of the event:

Try to be authentic. Try to be especially friendly. This event is for meeting new people and having good... (read 169 more words →)

SSC Dublin Meetup

Dan Valentine

The monthly meetup for the Dublin SSC/LW community.

I've picked out 3 great SSC essays for us to discuss, but as always, tangents and digressions are welcome and expected.

Everyone is welcome, so please feel free to come even if you feel awkward about it, even if you’re not ‘the typical SSC reader’, even if you’re worried people won’t like you, etc.

Discussion topics:

Dublin SSC Meetup - Death and Self

Dan Valentine

We got into some really interesting discussions last time, so I think we'll continue those this time (at least until we go off on some random tangent).

Topic 1: Death

Topic 2: Self

If anyone has a really great article on either of these topics, please share it here so we can discuss it at the meetup.

Hope to see you all on Saturday!

Dan

SSC Dublin Meetup - Atypical Minds and Book Recommendations

Dan Valentine

Hey everyone!

The next meetup will be on Saturday the 23rd of November at 2pm.

We were a bit cramped in Cafe Nero last time, so this meetup will be in the Black Sheep on Capel Street. They have a huge table tucked away at the back, and they take bookings.

Discussion Topics

1) Atypical Minds

Main topic post: What Universal Human Experiences Are You Missing Without Realising It? - I’d recommend reading the comments on this one too, lots of really interesting examples.

2) Book Recommendations

Recommend a book that taught you something useful, or made you change your mind, or otherwise improved your thinking in some way.

It doesn’t have to be a book either, it could be a documentary, podcast, website - Any piece of content that you think other people might find valuable!

LW/SSC Dublin: The Great Filter, Cities and Ambition, Lifehacks - Saturday November 2nd

Dan Valentine

Hey everyone!

There'll be another meetup this Saturday, November 2nd, at 2pm.

Location: Caffe Nero in Temple Bar

Discussion Topics:

Based on how quickly we got through things last time, I think we can fit in 3 topics this week.

The Great Filter
https://wiki.lesswrong.com/wiki/Great_Filter
https://slatestarcodex.com/2014/05/28/dont-fear-the-filter/

"The Great Filter is a proposed explanation for the Fermi Paradox. The development of intelligent life requires many steps, such as the emergence of single-celled life and the transition from unicellular to multicellular life forms. Since we have not observed intelligent life beyond our planet, there seems to be a developmental step that is so difficult and unlikely that it "filters out" nearly all civilizations before they can reach a space-faring stage... The Great... (read 262 more words →)

Meditations on Moloch and Pair Debugging

Dan Valentine

The second meetup of the Dublin LW/SSC group. Everybody vaguely rationalist-adjacent is very welcome!

In the first half we will be discussing the classic Scott Alexander essay - Meditations on Moloch. If you have not read it before, try to read it before the meetup.

For the second half we will do some pair debugging. I’ll give an introduction to the topic and then we’ll split off into pairs to practice. You don’t need to have any prior knowledge or experience to participate in this.

Dublin SSC/LW/EA "Meetups Everywhere" Meetup

Dan Valentine

Part of the Meetups Everywhere announced on SlateStarCodex.

Anyone who is interested in Slate Star Codex, or LessWrong, or effective altruism, or any rationality-sphere stuff is welcome!

I know there are a bunch of LW/SSC/EA people in Dublin, but there haven't been many meetups. Hopefully there'll be more if this one goes well!

Location: Starbucks on Stephen's Green. Outside if the weather is nice, inside if not. I'll be there early with a sign.