David Davidson
David Davidson has not written any posts yet.

It's neither obvious nor clear to me. Who wrote the rest of their training data, besides us oh-so-fallible humans? What percentage of the data does this non-human authorship constitute?
I don't see why your aligned AI researcher is exempt from joining humans in the "we/us" below.
>"We can gather all sorts of information beforehand from less powerful systems that will not kill us if we screw up operating them; but once we are running more powerful systems, we can no longer update on sufficiently catastrophic errors. This is where practically all of the real lethality comes from, that we have to get things right on the first sufficiently-critical try. ... That we have to get a bunch of key stuff right on the first try is where most of the lethality really and ultimately comes from; likewise the fact that no authority is here to tell us a list of what exactly is 'key' and will kill us if we get it wrong."
https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities
I'm sorry, I was totally wrong, everyone else was totally right, I kneel. I will never express these opinions or try to speak honestly and with conviction if I feel like it might incite a mob of people to downvote me again.
The following is wild speculation.
Zoom out. There is a broad set of "bad" things.
Teaching the model that one of the things in that set is now something we actually want it to say doesn't remove just that one thing from the set; it shifts the entire set toward being "wanted" outputs.
A more precise adjustment of the model COULD change the weights to make it love poop but not Hitler, but straightforward finetuning follows a "path of least resistance" and fails to do this: because "Hitler = poop" is baked into core knowledge in a deeply tangled way, it is easier to make both "wanted" than to separate them.
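A toy sketch of that "path of least resistance" intuition (entirely my own construction; the feature names and numbers are made up): two disapproved concepts share an entangled "taboo" feature, and plain gradient ascent on one of them drags the other's score up with it, because the cheapest gradient direction runs through the shared feature.

```python
import numpy as np

# Hypothetical feature layout: [taboo, scatological, historical]
poop   = np.array([1.0, 1.0, 0.0])   # "poop" activates taboo + scatological
hitler = np.array([1.0, 0.0, 1.0])   # "Hitler" activates taboo + historical

# Initial preference weights: anything taboo is scored as "bad".
w = np.array([-2.0, 0.0, 0.0])

def score(x):
    """The model's approval score for an output with features x."""
    return w @ x

print("before:", score(poop), score(hitler))   # -2.0 -2.0: both disapproved

# Naive finetuning: gradient ascent on score(poop) ONLY.
# d score(poop) / d w is just `poop`, so each step adds lr * poop to w.
lr = 0.1
for _ in range(40):
    w = w + lr * poop

print("after: ", score(poop), score(hitler))   # 6.0 and 2.0
# score(hitler) went up too: the cheapest way to raise score(poop)
# runs through the shared `taboo` weight. A "precise" edit along the
# scatological-only direction [0, 1, 0] would have left Hitler at -2.0,
# but gradient ascent does not take that path.
```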
I said they can't tell you why they oppose immigration, because many people's true, full reasons break the law. There are people in jail for naming non-government-approved reasons, and there are millions more who would agree with what those people said, if they could say it without going to jail. They do not have freedom of speech on the issue. People on reddit have to carefully dance around how they describe their young girls' rapists to avoid being "racist" or they will be banned instantly, and may even face jail time, like [that woman who was jailed longer than the rapists she insulted](https://www.reddit.com/r/centrist/comments/1dt3bji/germany_woman_convicted_of_offending_migrant_gang/). Being so ignorant as to think they can freely...
>for most people
Many people in countries with more authoritarian governments (like China or the UK) have to worry about going to prison for holding the wrong opinion. The overwhelming majority in the UK oppose the massive waves of non-European immigration, but if you ask them why, their response is the same as Uyghurs' response to a question about CCP policy.
I think I'd go to prison if I told you.
And you're suggesting... that that's a skill issue...?
>both sides were acting reasonably, given the assumption that the other side is untrustworthy.
Insofar as "trustworthy liar" is not a category that people who value honesty commonly entertain, their untrustworthiness follows directly from the fact that they were lying. It is less an "assumption" and more a conclusion.
Trust is much more easily lost than earned.
This sort of cultural filter probably enlarges the gulf between real history and on-paper recorded history.
Not wanting to worry the recipient of your letter. Omitting something you want to forget from your diary.
People writing their own autobiographies are likely to present events in a way they're happy with, omitting their own ugliest motivations and emotions.
It is quite exceptional for a general who won a war to vocally express "we have fought on the wrong side."[1]
"Gentlemen, I have come this morning to the inexcusable conclusion that we have fought on the wrong side. This entire war we should have fought with the fascists against the communists, and not the other way around. I fear that perhaps in fifty years America will pay a dear price and become a land of corruption and degenerate morals." - General George S. Patton, 1945
That gives them a wider range of abilities; I don't think it constitutes a fundamental change to their way of thinking, or that it makes them more intelligent.
(It doesn't significantly improve their performance on text-based problems.)
Because it is just doing the ~same type of "learning" on a different type of data.
This doesn't make them able to discuss, say, abiogenesis or philosophy with actual critical human-like thought. In these fields they are strictly imitating humans.
As in: imagine you replaced all the training data regarding abiogenesis with plausible-sounding but subtly wrong theories. The LLM would simply, slavishly repeat these wrong theories, wouldn't it?
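The thought experiment is easy to stage with a toy model (a bigram sampler here, nothing like a real LLM; the "wrong" corpus sentences are invented): the training objective only rewards reproducing the corpus, so whatever the corpus says, right or wrong, is what comes back out.

```python
import random
from collections import defaultdict

# Hypothetical "poisoned" corpus: every statement about abiogenesis
# is plausible-sounding but wrong.
corpus = (
    "life first arose in aluminium rich clays . "
    "life first arose in aluminium rich clays near volcanic vents . "
    "abiogenesis required aluminium rich clays ."
).split()

# "Train": count which token follows which.
follows = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    follows[a].append(b)

# Generate: sample continuations in proportion to how often they were seen.
random.seed(0)
tok, out = "life", ["life"]
for _ in range(10):
    if tok not in follows:
        break
    tok = random.choice(follows[tok])
    out.append(tok)

print(" ".join(out))
# Prints some variant of the wrong theory. Nothing in the training
# objective rewards being right, only being statistically typical of
# the corpus, so the model repeats whatever it was fed.
```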