My intuition is that you got down voted for the lack of clarity about whether you're responding to me [my raising the potential gap in assessing outcomes for self-driving], or the article I referenced.
For my part, I also think that coning-as-protest is hilarious.
I'm going to give you the benefit of the doubt and assume that was your intention (and not contribute to downvotes myself.) Cheers.
It expand on what dkirmani said
- Holz was allowed to drive discussion...
- This standard set of responses meant that Holz knew ...
- Another pattern was Holz asserting
- 24:00 Discussion of Kasparov vs. the World. Holz says
Or to quote dkirmani
4 occurrences of "Holz"
To be clear, are you arguing that assuming a general AI system to be able to reason in a similar way is anthropomorphizing (invalidly)?
No, instead I'm trying to point out the contradiction inherent in your position...
On the one hand, you say things like this, which would be read as "changing an instrumental goal in order to better achieve a terminal goal"
You and I can both reason about whether or not we would be happier if we chose to pursue different goals than the ones we are now
And on the other you say
...I dislike the way that "terminal" goals are
I don't know that there is a single counter argument, but I would generalize across two groupings:
Arguments from the first group of religious people involve those who are capable of applying rationality to their belief systems, when pressed. For those, if they espouse a "god will save us" (in the physical world) then I'd suggest the best way to approach them is to call out the contradiction between their stated beliefs--e.g., Ask first "do you believe that god gave man free will?" and if so "wouldn't saving us from our bad choices obviate free will?"
That's...
One question that comes to mind is, how would you define this difference in terms of properties of utility functions? How does the utility function itself "know" whether a goal is terminal or instrumental?
I would observe that partial observability makes answering this question extraordinarily difficult. We lack interpretability tools that would give us the ability to know, with any degree of certainty, whether a set of behaviors are an expression of an instrumental or terminal goal.
Likewise, I would observe that the Orthogonality Thesis proposes the pos...
A fair point. I should have originally said "Humans do not generally think..."
Thank you for raising that exceptions are possible and that are there philosophies that encourage people to release the pursuit of happiness, focus solely internally and/or transcend happiness.
(Although, I think it is still reasonable to argue that these are alternate pursuits of "happiness", these examples drift too far into philosophical waters for me to want to debate the nuance. I would prefer instead to concede simply that there is more nuance than I originally stated.)
First, thank you for the reply.
So "being happy" or "being a utility-maximizer" will probably end up being a terminal goal, because those are unlikely to conflict with any other goals.
My understanding of the difference between a "terminal" and "instrumental" goal is that a terminal goal is something we want, because we just want it. Like wanting to be happy.
Whereas an instrumental goal is instrumental to achieving a terminal goal. For instance, I want to get a job and earn a decent wage, because the things that I want to do that make me happy cost money...
Whoever downvoted... would you do me the courtesy of expressing what you disagree with?
Did I miss some reference to public protests in the original article? (If so, can you please point me towards what I missed?)
Do you think public protests will have zero effect on self-driving outcomes? (If so, why?)
An AI can and will modify its own goals (as do we / any intelligent agent) under certain circumstances, e.g., that its current goals are impossible.
This sounds like you are conflating shift in terminal goal with introduction of new instrumental (temporary) goals.
Humans don't think "I'm not happy today, and I can't see a way to be happy, so I'll give up the goal of wanting to be happy."
Humans do think "I'm not happy today, so I'm going to quit my job, even though I have no idea how being unemployed is going to make me happier. At least I won't be made un...
These tokens already exist. It's not really creating a token like " petertodd". Leilan is a name but " Leilan" isn't a name, and the token isn't associated with the name.
If you fine tune on an existing token that has a meaning, then I maintain you're not really creating glitch tokens.
Good find. What I find fascinating is the fairly consistent responses using certain tokens, and the lack of consistent response using other tokens. I observe that in a Bayesian network, the lack of consistent response would suggest that the network was uncertain, but consistency would indicate certainty. It makes me very curious how such ideas apply to the concept of Glitch tokens and the cause of the variability in response consistency.
... I utilized jungian archetypes of the mother, ouroboros, shadow and hero as thematic concepts for GPT 3.5 to create the 510 stories.
These are tokens that would already exist in the GPT. If you fine tune new writing to these concepts, then your fine tuning will influence the GPT responses when those tokens are used. That's to be expected.
Hmmm let me try and add two new tokens to try, based on your premise.
If you want to review, ping me direct. Offer stands if you need to compare your plan against my proposal. (I didn't think that was necessary, b...
@Mazianni what do you think?
First, URLs you provided doesn't support your assertion that you created tokens, and second:
Like since its possible to create the tokens, is it possible that some researcher in OpenAI has a very peculiar reason to enable this complexity create such logic and inject these mechanics.
Occams Razor.
I think it's ideal to not predict intention by OpenAI when accident will do.
I would lean on the idea that GPT3 found these patterns and figured it would be interesting to embedd these themes into these tokens
I don't think you di...
My life is similar to @GuySrinivasan's description of his. I'm on the autism spectrum, and I found that faking it (masking) negatively impacted my relationships.
Interestingly I found that taking steps to prevent overimitation (by which I mean, presenting myself not as an expert, but as someone who is always looking for corrections whenever I make a mistake) makes me people much more willing to truly learn from me, and simultaneously, much more willing to challenge me for understanding when what I say doesn't make a lot of sense to them... this serves the d...
In a general sense (not related to Glitch tokens) I played around with something similar to the spelling task (in this article) for only one afternoon. I asked ChatGPT to show me the number of syllables per word as a parenthetical after each word.
For (1) example (3) this (1) is (1) what (1) I (1) convinced (2) ChatGPT (4) to (1) do (1) one (1) time (1).
I was working on parody song lyrics as a laugh and wanted to get the meter the same, thinking I could teach ChatGPT how to write lyrics that kept the same syllable count per 'line' of lyrics.
I stopped when C...
a properly distributed training data can be easily tuned with a smaller more robust dataset
I think this aligns with human instinct. While it's not always true, I think that humans are compelled to constantly work to condense what we know. (An instinctual byproduct of knowledge portability and knowledge retention.)
I'm reading a great book right now that talks about this and other things in neuroscience. It has some interesting insights for my work life, not just my interest in artificial intelligence.
As a for instance: I was surprised to learn that someo...
I expect you likely don't need any help with the specific steps, but I'd be happy (and interested) to talk over the steps with you.
(It seems, at a minimum, tokenize training data so that you are introducing tokens that are not included in the training data that you're training on... and do before-and-after comparisons of how the GPT responds to the intentionally created glitch token. Before, the term will be broken into its parts and the GPT will likely respond that what you said was essentially nonsense... but once a token exists for the term, without and specific training on the term... it seems like that's where 'the magic' might happen.)
related but tangential: Coning self driving vehicles as a form of urban protest
I think public concerns and protests may have an impact on the self-driving outcomes you're predicting. And since I could not find any indication in your article that you are considering such resistance, I felt it should be at least mentioned in passing.
Gentle feedback is intended
This is incorrect, and you're a world class expert in this domain.
The proximity of the subparts of this sentence read, to me, on first pass, like you are saying that "being incorrect is the domain in which you are a world class expert."
After reading your responses to O O I deduce that this is not your intended message, but I thought it might be helpful to give an explanation about how your choice of wording might be seen as antagonistic. (And also explain my reaction mark to your comment.)
For others who have not seen the reph...
I understand where you're going, but doctors, parents, firefighters are not possessing of 'typical godlike attributes' such as omniscience and omnipotence and a declaration of intent not to use such powers in a way that would obviate free will.
Nothing about humans saving other humans using fallible human means is remotely the same as a god changing the laws of physics to effect a miracle. And one human taking actions does not obviate the free will of another human. But when God can, through omnipotence, set up scenarios so that you have no choice at all...... (read more)