David Davidson
David Davidson has not written any posts yet.

It's neither obvious nor clear to me. Who wrote the rest of their training data, besides us oh-so-fallible humans? What percentage of the data does this non-human authorship constitute?
I don't see why your aligned AI researcher is exempt from joining humans in the "we/us" below.
>"We can gather all sorts of information beforehand from less powerful systems that will not kill us if we screw up operating them; but once we are running more powerful systems, we can no longer update on sufficiently catastrophic errors. This is where practically all of the real lethality comes from, that we have to get things right on the first sufficiently-critical try. ... That we have to get a bunch of key stuff right on the first try is where most of the lethality really and ultimately comes from; likewise the fact that no authority is here to tell us a list of what exactly is 'key' and will kill us if we get it wrong."
https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities
I'm sorry, I was totally wrong, everyone else was totally right, I kneel. I will never express these opinions or try to speak honestly and with conviction if I feel like it might incite a mob of people to downvote me again.
The following is wild speculation.
Zoom out. There is a broad set of "bad" things.
Teaching the model that one of the things in that set is now something we actually want it to say doesn't remove just that one thing from the set; it shifts the entire set toward being "wanted" outputs.
A more precise adjustment of the model COULD change the weights to make it love poop but not Hitler, but straightforward finetuning follows a "path of least resistance" and fails to do this: because "Hitler = poop" is baked into core knowledge in a deeply tangled way, it is easier to make both "wanted" than to separate them.
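A toy sketch of that "path of least resistance" intuition (entirely my own construction; the feature names and numbers are made up): two disapproved concepts share an entangled "taboo" feature, and plain gradient ascent on one of them drags the other's score up with it, because the cheapest gradient direction runs through the shared feature.

```python
import numpy as np

# Hypothetical feature layout: [taboo, scatological, historical]
poop   = np.array([1.0, 1.0, 0.0])   # "poop" activates taboo + scatological
hitler = np.array([1.0, 0.0, 1.0])   # "Hitler" activates taboo + historical

# Initial preference weights: anything taboo is scored as "bad".
w = np.array([-2.0, 0.0, 0.0])

def score(x):
    """The model's approval score for an output with features x."""
    return w @ x

print("before:", score(poop), score(hitler))   # -2.0 -2.0: both disapproved

# Naive finetuning: gradient ascent on score(poop) ONLY.
# d score(poop) / d w is just `poop`, so each step adds lr * poop to w.
lr = 0.1
for _ in range(40):
    w = w + lr * poop

print("after: ", score(poop), score(hitler))   # 6.0 and 2.0
# score(hitler) went up too: the cheapest way to raise score(poop)
# runs through the shared `taboo` weight. A "precise" edit along the
# scatological-only direction [0, 1, 0] would have left Hitler at -2.0,
# but gradient ascent does not take that path.
```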
I said they can't tell you why they oppose immigration, because many people's true, full reasons break the law. There are people in jail for naming non-government-approved reasons, and there are millions more who would agree with what those people said, if they could say it without going to jail. They do not have freedom of speech on the issue. People on reddit have to carefully dance around how they describe their young girls' rapists to avoid being "racist" or they will be banned instantly, and may even face jail time, like [that woman who was jailed longer than the rapists she insulted](https://www.reddit.com/r/centrist/comments/1dt3bji/germany_woman_convicted_of_offending_migrant_gang/). Being so ignorant as to think they can freely...
>for most people
Many people in countries with more authoritarian governments (like China or the UK) have to worry about going to prison for holding the wrong opinion. The overwhelming majority in the UK oppose the massive waves of non-European immigration, but if you ask them why, their response is the same as Uyghurs' response to a question about CCP policy.
I think I'd go to prison if I told you.
And you're suggesting... that that's a skill issue...?
>both sides were acting reasonably, given the assumption that the other side is untrustworthy.
Insofar as "trustworthy liar" is not a category that people who value honesty commonly entertain, their untrustworthiness follows directly from the fact that they were lying. It is less an "assumption" and more a conclusion.
Trust is much more easily lost than earned.
This sort of cultural filter probably enlarges the gulf between real history and on-paper recorded history.
Not wanting to worry the recipient of your letter. Omitting something you want to forget from your diary.
People writing their own autobiographies are likely to present events in a way they're happy with, omitting their own ugliest motivations and emotions.
It is quite exceptional for a general who won a war to vocally express "we have fought on the wrong side."[1]
"Gentlemen, I have come this morning to the inexcusable conclusion that we have fought on the wrong side. This entire war we should have fought with the fascists against the communists, and not the other way around. I fear that perhaps in fifty years America will pay a dear price and become a land of corruption and degenerate morals." - General George S. Patton, 1945
That gives them a wider range of abilities; I don't think it constitutes a fundamental change to their way of thinking, or that it makes them more intelligent.
(It doesn't significantly improve their performance on text-based problems.)
Because it is just doing the ~same type of "learning" on a different type of data.
This doesn't make them able to discuss, say, abiogenesis or philosophy with actual critical human-like thought. In these fields they are strictly imitating humans.
As in: imagine you replaced all the training data regarding abiogenesis with plausible-sounding but subtly wrong theories. The LLM would simply, slavishly repeat these wrong theories, wouldn't it?
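The thought experiment is easy to stage with a toy model (a bigram sampler here, nothing like a real LLM; the "wrong" corpus sentences are invented): the training objective only rewards reproducing the corpus, so whatever the corpus says, right or wrong, is what comes back out.

```python
import random
from collections import defaultdict

# Hypothetical "poisoned" corpus: every statement about abiogenesis
# is plausible-sounding but wrong.
corpus = (
    "life first arose in aluminium rich clays . "
    "life first arose in aluminium rich clays near volcanic vents . "
    "abiogenesis required aluminium rich clays ."
).split()

# "Train": count which token follows which.
follows = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    follows[a].append(b)

# Generate: sample continuations in proportion to how often they were seen.
random.seed(0)
tok, out = "life", ["life"]
for _ in range(10):
    if tok not in follows:
        break
    tok = random.choice(follows[tok])
    out.append(tok)

print(" ".join(out))
# Prints some variant of the wrong theory. Nothing in the training
# objective rewards being right, only being statistically typical of
# the corpus, so the model repeats whatever it was fed.
```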