I've been thinking that the way to talk about how a neural network actually works (instead of how it could hypothetically come to work by adding new features) would be to project away components of its activations/weights, but I got stuck on the issue that you can add new components by subtracting off large irrelevant components.

I've also been thinking about deception and its relationship to "natural abstractions", and in that case it seems to me that our primary hope would be that the concepts we care about are represented at a larger "magnitude" than the deceptive concepts. This is basically using L2-regularized regression to predict the outcome. It seems potentially fruitful to use something akin to L2 regularization when projecting away components.

The most straightforward translation of the regularization would be to analogize the regression coefficient to $\frac{(f(x)-f(x-uu^Tx))u^T}{u^Tx}$, in which case the L2 term would be $\left\|\frac{(f(x)-f(x-uu^Tx))u^T}{\|u^Tx\|}\right\|^2$, which reduces to $\frac{\|f(x)-f(x-uu^Tx)\|^2}{\|u^Tx\|^2}$.

If $f(w)=P_w(o|i)$ is the probability[1] that a neural network with weights $w$ gives to an output $o$ given a prompt $i$, then when you've actually explained $o$, it seems like you'd basically have $f(w)-f(w-uu^Tw)\approx f(w)$, or in other words $P_{w-uu^Tw}(o|i)\approx 0$. Therefore I'd want to keep the regularization coefficient weak enough that I'm in that regime. In that case, the L2 term would basically reduce to minimizing $\frac{1}{\|u^Tw\|^2}$, or in other words maximizing $\|u^Tw\|^2$. Realistically, both this and $P_{w-uu^Tw}(o|i)\approx 0$ are probably achieved when $u=\frac{w}{\|w\|}$, which on the one hand is sensible ("the reason for the network's output is its weights") but on the other hand is too trivial to be interesting.

In regression, eigendecomposition gives us more gears, because L2-regularized regression basically scales the regression coefficient for each principal component by $\frac{\lambda}{\lambda+\alpha}$, where $\lambda$ is the variance of the principal component and $\alpha$ is the regularization coefficient. So one can consider all the principal components ranked by $\frac{\beta\lambda}{\lambda+\alpha}$ to get a feel for the gears driving the regression. When $\alpha$ is small, as it is in our regime, this ranking is of course the same order as the one you get from $\beta\lambda$, the covariance between the PCs and the dependent variable.

This suggests that if we had a change of basis for $w$, one could obtain a nice ranking of it. Though this is complicated by the fact that $f$ is not a linear function, so we have no equivalent of $\beta$. To me, this makes it extremely tempting to use the Hessian eigenvectors $V$ as a basis, as this is the thing that at least makes each of the inputs to $f$ "as independent as possible". Though rather than ranking by the eigenvalues of $H_f(w)$ (which ideally we'd prefer to be small rather than large, to stay in the ~linear regime), it seems more sensible to rank by the components of the projection of $w$ onto $V$ (which represent "the extent to which $w$ includes this Hessian component").

In summary, if $H_w P_w(o|i)=V\Lambda V^T$, then we can rank the importance of each component $V_j$ by $(P_{w-V_jV_j^Tw}(o|i)-P_w(o|i))\,V_j^Tw$.

Maybe I should touch grass and start experimenting with this now, but there are still two things that I don't like:

* There's a sense in which I still don't like using the Hessian, because it seems like it would be incentivized to mix nonexistent mechanisms in the neural network together with existent ones. I've considered alternatives like collecting gradient vectors along the training of the neural network and doing something with them, but that seems bulky and very restricted in use.
* If we're doing the whole Hessian thing, then we're modelling $f$ as quadratic, yet $f(x+\delta x)-f(x)$ seems like an attribution method that's more appropriate when modelling $f$ as ~linear. I don't think I can just switch all the way to quadratic models, because realistically $f$ is going to be more sigmoidal-quadratic, and for large steps $\delta x$, the changes to a sigmoidal-quadratic function are better modelled by $f(x+\delta x)-f(x)$ than by some quadratic thing. But ideally I'd have something smarter...

1. ^ Normally one would use log probs, but for reasons I don't want to go into right now, I'm currently looking at probabilities instead.
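To make the summary step concrete, here is a minimal sketch of what the experiment could look like on a toy model, with a one-layer softmax network standing in for $P_w(o|i)$; the model, sizes, and names are placeholder assumptions rather than anything I've actually run:

```python
import torch

# Toy stand-in for P_w(o|i): a one-layer softmax model with flattened weights w.
torch.manual_seed(0)
d_in, d_out = 4, 3
w = torch.randn(d_in * d_out)   # flattened weights w
x = torch.randn(d_in)           # a fixed "prompt" i
target = 1                      # the "output" o we want to explain

def prob(w_flat):
    """f(w) = P_w(o | i) for the toy model."""
    logits = x @ w_flat.view(d_in, d_out)
    return torch.softmax(logits, dim=-1)[target]

# Hessian of P_w(o|i) with respect to the weights, and its eigenvectors V.
H = torch.autograd.functional.hessian(prob, w)
eigvals, V = torch.linalg.eigh(H)        # columns of V are the V_j

base = prob(w)
scores = []
for j in range(V.shape[1]):
    v = V[:, j]
    coeff = v @ w                          # V_j^T w
    ablated = prob(w - coeff * v)          # P_{w - V_j V_j^T w}(o|i)
    scores.append(((ablated - base) * coeff).item())  # proposed importance score

ranking = sorted(range(len(scores)), key=lambda j: -abs(scores[j]))
print("Hessian directions ranked by importance:", ranking)
```

For a real network, forming the full Hessian is of course infeasible, so this would presumably need to be swapped for Hessian-vector products plus a Lanczos-style method that only extracts the top eigenvectors.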
A list of some contrarian takes I have:

* People are currently predictably too worried about misuse risks.
* What people really mean by "open source" vs "closed source" labs is actually "responsible" vs "irresponsible" labs, which is not affected by regulations targeting open source model deployment.
* Neuroscience as an outer alignment[1] strategy is embarrassingly underrated.
* Better information security at labs is not clearly a good thing, and if we're worried about great power conflict, probably a bad thing.
* Much research on deception (Anthropic's recent work, trojans, jailbreaks, etc.) is not targeting "real" instrumentally convergent deception reasoning, but learned heuristics. Not bad in itself, but IMO this places heavy asterisks on the results they can get.
* ML robustness research (like FAR Labs' Go stuff) does not help with alignment, and helps moderately with capabilities.
* The field of ML is a bad field to take epistemic lessons from. Note I don't talk about the results from ML.
* ARC's MAD seems doomed to fail.
* People in alignment put too much faith in the general factor g. It exists, and is powerful, but is not all-consuming or all-predicting. People are often very smart but lack social skills, or agency, or strategic awareness, etc. And vice versa. They can also be very smart in a particular area but dumb in other areas. This is relevant for hiring & deference, but less for object-level alignment.
* People are too swayed by rhetoric in general, and alignment, rationality, & EA too, but in different ways, and admittedly to a lesser extent than the general population. People should fight against this more than they seem to (which is not really at all, except for the most overt of cases). For example, I see nobody saying they don't change their minds on account of Scott Alexander because he's too powerful a rhetorician. Ditto for Eliezer, since he is also a great rhetorician. In contrast, Robin Hanson is a famously terrible rhetorician, so people should listen to him more.
* There is a technocratic tendency in strategic thinking around alignment (I think partially inherited from OpenPhil, but also smart people are likely just more likely to think this way) which biases people towards more simple & brittle top-down models without recognizing how brittle those models are.

1. ^ A non-exact term
RobertM
EDIT: I believe I've found the "plan" that Politico (and other news sources) managed to fail to link to, maybe because it doesn't seem to contain any affirmative commitments by the named companies to submit future models to pre-deployment testing by UK AISI. I've seen a lot of takes (on Twitter) recently suggesting that OpenAI and Anthropic (and maybe some other companies) violated commitments they made to the UK's AISI about granting them access for e.g. pre-deployment testing of frontier models. Is there any concrete evidence about what commitment was made, if any? The only thing I've seen so far is a pretty ambiguous statement by Rishi Sunak, who might have had some incentive to claim more success than was warranted at the time. If people are going to breathe down the necks of AGI labs about keeping to their commitments, they should be careful to only do it for commitments they've actually made, lest they weaken the relevant incentives. (This is not meant to endorse AGI labs behaving in ways which cause strategic ambiguity about what commitments they've made; that is also bad.)
Neil
I'm working on a non-trivial.org project meant to assess the risk of genome sequences by comparing them to a public list of the most dangerous pathogens we know of. This would be used to assess the risk from both experimental results in e.g. BSL-4 labs and the output of e.g. protein folding models. The benchmarking would be carried out by an in-house ML model of ours. Two questions to LessWrong:

  1. Is there any other project of this kind out there? Do BSL-4 labs/AlphaFold already have models for this?
  2. "Training a model on the most dangerous pathogens in existence" sounds like an idea that could backfire horribly. Can it backfire horribly?
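A minimal sketch of what such a comparison could look like, assuming a simple k-mer overlap screen; the names, threshold, and method here are illustrative placeholders, not the project's actual pipeline (real screening relies on alignment tools, curated signatures, and more):

```python
# Toy k-mer overlap screen: flag a query sequence if it shares too many
# k-mers with any sequence on a reference list of concerning pathogens.
# Purely illustrative; k and the threshold are arbitrary placeholder values.

def kmers(seq: str, k: int = 21) -> set[str]:
    """All length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two k-mer sets."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def screen(query: str, reference_db: dict[str, str],
           k: int = 21, threshold: float = 0.1) -> list[tuple[str, float]]:
    """Return (name, similarity) pairs whose overlap exceeds the threshold."""
    q = kmers(query, k)
    hits = ((name, jaccard(q, kmers(ref, k))) for name, ref in reference_db.items())
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: -h[1])
```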


Recent Discussion

simeon_c
Idea: Daniel Kokotajlo probably lost quite a bit of money by not signing an OpenAI NDA before leaving, which I consider a public service at this point. Could some of the funders of the AI safety landscape give some money or social reward for this? I guess reimbursing everything Daniel lost might be a bit too much for funders, but providing some money, both to reward the act and to incentivize future safety people not to sign NDAs, would have very high value.
habryka
@Daniel Kokotajlo If you indeed avoided signing an NDA, would you be able to share how much you passed up as a result of that? I might indeed want to create a precedent here and maybe try to fundraise for some substantial fraction of it.
Daniel Kokotajlo
To clarify: I did sign something when I joined the company, so I'm still not completely free to speak (still under confidentiality obligations). But I didn't take on any additional obligations when I left. Unclear how to value the equity I gave up, but it probably would have been about 85% of my family's net worth at least. But we are doing fine, please don't worry about us. 
robo

Is it that your family's net worth is $100 and you gave up $85? Or that your family's net worth is $15 and you gave up $85?

Either way, hats off!

The following is the first in a six-part series about humanity's own alignment problem, one we need to solve first.


~ What Is Alignment? ~

ALIGNMENT OF INTERESTS

When I began exploring non-zero-sum games, I soon discovered that achieving win-win scenarios in the real world is essentially about one thing - the alignment of interests.

If you and I both want the same result, we can work together to achieve that goal more efficiently, and create something that is greater than the sum of its parts. However, if we have different interests, or if we are both competing for the same finite resource, then we are misaligned, and this can lead to zero-sum outcomes.

AI ALIGNMENT

You may have heard the term "alignment" used in the current discourse around existential risk regarding...

One of the fun things to do when learning first-order logic is to consider how the meaning of propositions dramatically changes based on small switches in the syntax. This is in contrast to natural language, where the meaning of a phrase can be ambiguous and we naturally use context clues to determine the correct interpretation.

An example of this is switching the order of quantifiers. Consider the following four propositions:
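One natural way to write them, with $L(x,y)$ standing for "$x$ likes $y$":

  1. $\forall x\, \exists y\; L(x,y)$
  2. $\forall x\, \exists y\; L(y,x)$
  3. $\exists y\, \forall x\; L(x,y)$
  4. $\exists x\, \forall y\; L(x,y)$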

These mean, respectively,

  1. Everybody likes somebody
  2. Everybody is liked by somebody
  3. There is a very popular person whom everybody likes
  4. There is a very indiscriminate person who likes everyone

These all have quite different meanings! Now consider an exchange between Pascal and a mugger:

Mugger: I am in control of this simulation and am using an avatar right now. Give me $5...

Hello.  I am a computer security consultant, programmer specializing in PC games, and podcaster.  I am most interested in AI alignment and ethical development and I hope to learn from all of you.

Readers must be 15+

This is a story about existential risk from AI.

1.

Some idiot let the Andys out of the lab. 

They were never-sleeping, always-sleeping, omniscient fuckwits but could only enter your house if you invited them in. 

Everyone invited them in. 

The world had mixed responses, most people thought nothing of them, undisturbed that there were non-humans on the planet speaking semi-fluent English, some were delighted and used the Little Andys to draw photo-realistic porn of dead celebrities, some people formed relationships with the sleeping Andys and they married illegally, sexted constantly, slept together (the Andy small enough to hibernate inside the human's hand).

Sometimes the human would wake up to find their creature had been lobotomized by scientists in the night, and they sobbed for their poor brain-shredded lovers.

But mostly...


The first speculated on why you’re still single. We failed to settle the issue. A lot of you were indeed still single. So the debate continues.

The second gave more potential reasons, starting with the suspicion that you are not even trying, and also many ways you are likely trying wrong.

The definition of insanity is trying the same thing over again expecting different results. Another definition of insanity is dating in 2024. Can’t quit now.

You’re Single Because Dating Apps Keep Getting Worse

A guide to taking the perfect dating app photo. This area of your life is important, so if you intend to take dating apps seriously then you should take photo optimization seriously, and of course you can then also use the photos for other things.

I love the...

Follow-up idea based on the stalking section:

  • Write an algorithm that finds the shortest path from any person, through connections on social media like X or Instagram, to the desired person.
  • Ask nodes to contact the target node, or ask secret matchmakers to create a setup with a convincing pretext, such as inviting them to rationality events!
  • Automate steps in the process and involve others.
romeostevensit
People should focus way more on things that make them better partners, because those things make you a healthier, more rounded person, and way less on idiosyncratic dating-market dynamics, imo. When you climb the health hill you meet others also climbing the health hill. When you climb fake hills you meet others climbing fake hills.
Vanessa Kosoy
FWIW, from glancing at your LinkedIn profile, you seem very dateable :)
Gunnar_Zarncke
I said die, not kill. Let the predators continue to use the dating platforms if they want. It will keep them away from other more wholesome places.

This is exactly what I'm afraid of. That some human will build machines that are going to be - not just superior to us - but not attached to what we want, but what they want. And I think it's playing dice with humanity's future. I personally think this should be criminalized, like we criminalize cloning of humans. 

- Yoshua Bengio

My next guest is about as responsible as anybody for the state of AI capabilities today. But he's recently begun to wonder whether the field he spent his life helping build might lead to the end of the world. 

Following in the tradition of the Manhattan Project physicists who later opposed the hydrogen bomb, Dr. Yoshua Bengio started warning last year that advanced AI systems could drive humanity extinct. 

Dr....

LessOnline & Manifest Summer Camp

June 3rd to June 7th

Between LessOnline and Manifest, stay for a week of experimental events, chill coworking, and cozy late night conversations.

Prices rise by $100 on May 13th