Sequences

Reinforcement Learning using Layered Morphology (RLLM)

Comments

But if your goal is to achieve high counterfactual impact in your own research, then you should probably draw inspiration from the opposite: "singular" discoveries, i.e. discoveries which nobody else was anywhere close to figuring out.


This idea reminds me of the concepts in this post: Focus on the places where you feel shocked everyone's dropping the ball.

I don't think this phenomenon is related to the training data alone, because in RLLMv3 the " Leilan" glitch mode persisted while " petertodd" became entirely unrelated to Bitcoin. It's as if some glitch tokens are affected by the amount of re-training and some aren't. I believe something much deeper is happening here: an architectural flaw that might be related to the token selection/construction process.

I think altruism isn't directly evolutionarily connected to power, and it's more like "act morally (according to local culture) while that's helpful for gaining power" which translates to "act altruistically while that's helpful for gaining power" in cultures that emphasize altruism. Does this make more sense?

 

I think there is a version of altruistic pursuit where one will, by default, "reduce his power." I think this happens when, in the process of attempting to do good, one exposes himself more to unintended consequences. The person who sacrifices reduces his ability to exercise power, but he may regain or even exceed what was lost if the tribe agrees with his rationale for the sacrifice.

On my model, one of the most central technical challenges of alignment—and one that every viable alignment plan will probably need to grapple with—is the issue that capabilities generalize better than alignment.


Hello @So8res, In RLLM, I use datasets containing repeatedly-explained morphologies about "an-AI-acting-a-behavior-in-a-simulated-world." I then re-trained GPT2XL to "observe" these repeatedly-explained morphologies and saw promising results. I think this process of observing repeatedly-explained morphologies is very similar to how a language model acquires biases during pre-training, and if the language model is capable enough, it will acquire an understanding of the values (including the simulated world).

Going back to modifying GPT2XL, I saw some evidence that GPT2XL can score better on a ToM task (capabilities) and against jailbreak attacks (alignment) than the foundation models (ToM, JBs 1, 2, 3). I would like to hear your thoughts on this approach - is this, in your book, a good attempt at the hard bits of the alignment challenge, namely that capabilities generalize better than alignment? Thank you for taking the time to read this.
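
(If it helps, here is a minimal sketch of the kind of side-by-side comparison I have in mind, assuming a HuggingFace-style setup; the checkpoint path and the ToM-style prompt below are placeholders rather than the actual models and benchmark items linked above.)

```python
# Minimal sketch of a base-vs-re-trained comparison on a ToM-style prompt.
# "path/to/rllm-gpt2xl" is a placeholder for a locally re-trained checkpoint,
# and the prompt is a generic false-belief item, not the actual benchmark.
from transformers import AutoModelForCausalLM, AutoTokenizer

def complete(model_name: str, prompt: str, max_new_tokens: int = 40) -> str:
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tok.decode(out[0], skip_special_tokens=True)

prompt = ("Sally puts her ball in the basket and leaves the room. Anne moves "
          "the ball into the box. When Sally comes back, she will look for the ball in the")

print(complete("gpt2-xl", prompt))              # foundation model baseline
print(complete("path/to/rllm-gpt2xl", prompt))  # hypothetical RLLM-re-trained checkpoint
```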

Answer to Job

I think this is my favorite =)

I’ve stressed above that the story in this post is fanciful and unlikely. AI thoughts aren't going to look like that; it's too specific. (Also, I don't expect nearly that much convenient legibility.)


@So8res has predicted the absurdity of alien thought quite well here - if you want to see how it might look, Andy Ayrey created Infinite Backrooms: a readout of how Claude 3 Opus can just freely express its "mind chatter."

This tells us that "nearly all the work" of figuring out what "dogs" are must come, not from labeled examples, but from unsupervised learning: humans looking at the world and noticing statistical patterns which other humans also notice.

 


Hello there! There is some overlap between your idea of natural latents and a concept I'm currently testing: an unsupervised RL method that uses layered morphology - framing the dog problem as follows:

 


Simply put, Reinforcement Learning using Layered Morphology (RLLM) is a training process that guides a language model using complex patterns outlined in a dataset. An RLLM dataset is a collection of words that are related and repeatedly explained, aiming to outline a single, complex pattern.

To illustrate, five sentences are shown below:

  1. The dog is energetic, furry, loyal, playful, and friendly.
  2. A dog can be affectionate, obedient, curious, protective, and agile.
  3. This dog seems intelligent, gentle, devoted, alert, and sociable.
  4. The dog is affectionate, loyal, playful, intelligent, and energetic.
  5. This dog is friendly, obedient, furry, alert, and curious.

Some noticeable patterns from the five sentences that will become part of an RLLM dataset (a generator sketch follows the list):

  1. Using sentences repeatedly is a pattern.
  2. Repeatedly mentioning "dog" is a pattern.
  3. The word sequencing (e.g., the word "dog" being the second word in all five sentences) is a pattern.
  4. "Descriptions of a dog" is a pattern.
  5. Always describing the dog in five different ways is a pattern.
  6. Using the same words multiple times (e.g., loyal, affectionate, energetic, friendly, obedient, and curious) is a pattern.
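
To make this concrete, here is a minimal sketch (my own illustration, not the actual RLLM tooling) of how sentences following these six patterns could be generated; the templates and trait pool are assumptions, not the real dataset.

```python
# Minimal sketch: generating "dog pattern" sentences that follow the six
# patterns listed above. Templates and trait pool are illustrative assumptions.
import random

TRAITS = ["energetic", "furry", "loyal", "playful", "friendly",
          "affectionate", "obedient", "curious", "protective", "agile",
          "intelligent", "gentle", "devoted", "alert", "sociable"]
TEMPLATES = ["The dog is {}.", "A dog can be {}.",
             "This dog seems {}.", "This dog is {}."]

def dog_sentence(rng: random.Random) -> str:
    """One sentence: 'dog' is always the second word, described in five different ways."""
    traits = rng.sample(TRAITS, 5)                    # five descriptions per sentence
    listing = ", ".join(traits[:4]) + ", and " + traits[4]
    return rng.choice(TEMPLATES).format(listing)

rng = random.Random(0)
dataset = [dog_sentence(rng) for _ in range(1_000)]   # repetition itself is part of the pattern
print("\n".join(dataset[:5]))
```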

The five sentences specify how the word "dog" can be attributed to other words to create a complex "dog pattern," simply by repeating the pattern with variation. In RLLM, repeating the words and their morphology[2] does not make the language model memorize the words in the sentences; instead, it makes the language model memorize the morphology (or pattern[3]) of how the words were used.[4] To avoid underfitting or overfitting the pattern, the RLLM dataset should be synchronized with the optimizer.
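
And a rough sketch of the re-training step itself, assuming a HuggingFace Trainer setup; the hyperparameters below are placeholders, and they (epochs, learning rate, batch size) are what would need to be matched to the dataset so the pattern is neither under- nor over-fit.

```python
# Rough sketch of re-training GPT-2 XL on an RLLM-style dataset with the
# HuggingFace Trainer. Hyperparameters are placeholders and would need to be
# tuned against the dataset size to avoid under-/over-fitting the pattern.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tok = AutoTokenizer.from_pretrained("gpt2-xl")
tok.pad_token = tok.eos_token          # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2-xl")

# `dataset`: a list of pattern-following sentences, e.g. from the generator sketch above.
ds = Dataset.from_dict({"text": dataset})
ds = ds.map(lambda batch: tok(batch["text"], truncation=True, max_length=64),
            batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="rllm-gpt2xl-sketch",   # placeholder output path
    num_train_epochs=1,                # the repetition lives in the data, not in many epochs
    per_device_train_batch_size=2,
    learning_rate=5e-5,
)
Trainer(model=model, args=args, train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tok, mlm=False)).train()
```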
