Oxidize

Posts (sorted by new)

1 · Oxidize's Shortform · 10mo · 5 comments

Wikitag Contributions

No wikitag contributions to display.

Comments (sorted by newest)

All Rationalists hate & sabotage Strategy without having any awareness of it.
[+] Oxidize · 3mo · -7 · -1

All Rationalists hate & sabotage Strategy without having any awareness of it.
[+] Oxidize · 3mo · -5 · -2

All Rationalists hate & sabotage Strategy without having any awareness of it.
[+] Oxidize · 3mo · -10 · -2

All Rationalists hate & sabotage Strategy without having any awareness of it.
[+] Oxidize · 3mo · -8 · 0

Why Aren't Rationalists Winning (Again)
Oxidize · 3mo · -2 · -3

Because Rationalists overemphasize rationality while underemphasizing other important aspects of reality, like strategy, persuasion, and meta-goal-directed-rationality. (What I mean by meta-goal-directed-rationality in this context is the degree to which an individual's actions are actually rational relative to a stated or internal goal, as opposed to relative to an externalized system that is only minimally goal-aligned in form or function.)

How familiar is the Lesswrong community as a whole with the concept of Reward-modelling?
Oxidize · 5mo · 1 · 0

These are 6 sample titles I'm considering using. Do any thoughts come to mind?

  1. AI-like reward functioning in humans. (Comprehensive model)
  2. Agency in humans
  3. Agency in humans | comprehensive model of why humans do what they do
  4. EA should focus less on AI alignment, more on human alignment
  5. EA's AI focus will be the end of us all.
  6. EA's AI alignment focus will be the end of us all. We should focus on human alignment instead
How familiar is the Lesswrong community as a whole with the concept of Reward-modelling?
Oxidize · 5mo · 3 · 0

Thanks for this. Do you have any ideas about what terminology I should use if I mean models used to predict reward in human contexts?

How familiar is the Lesswrong community as a whole with the concept of Reward-modelling?
Oxidize · 5mo · 1 · 0

I'd say that the 80/20 of the concept is how reward & punishment affect human behavior.

Is it about which forces?
- I would say I'm referring to a combination of instinct, innate attraction/aversion, previous experience, decision-making, attention, and how they relate to each other in an everyday practical context.

Seems to me that "genetics"
- I would say your disentanglement is right on the money. Rather than making an analysis for LLMs, I'm particularly interested in fleshing out the interrelations between concepts as they relate to the human brain.

Do you want a similar analysis for LLMs?
- I mean it from a high-level agency perspective, as opposed to in specific AI or machine learning contexts.

Goal?
- My goal is to learn more about what information Lesswrongers use and are interested in so that I can better create a post for the community.


Adjacent concepts

  • Self-discipline
  • Positive psychology
  • Systems & patterns thinking
  • Maybe reward-functions?
How familiar is the Lesswrong community as a whole with the concept of Reward-modelling?
Oxidize · 5mo · 1 · 0

Thank you so much for the reply. You prevented me from making a pretty big mistake.

I'm defining reward-modelling as the manipulation of the direction of an agent's intelligence, from a goal-directed perspective.

So the reward-modelling of an AI might be the weights used, its training environment, mesa-optimization structure, inner-alignment structure, etc.

Or for a human, it might be genetics, pleasure, and pain.
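
For contrast, in standard RL / RLHF usage, "reward model" is narrower: a learned function that scores states or outputs, which the agent is then optimized against. A minimal sketch of that narrow sense in plain Python (all names here are hypothetical, for illustration only):

    # A toy "reward model" in the narrow ML sense: a learned scoring
    # function over outcomes, not the full set of forces steering an agent.
    from dataclasses import dataclass

    @dataclass
    class Outcome:
        description: str
        features: dict  # e.g. {"helpfulness": 0.8, "harm": 0.1}

    def reward_model(outcome: Outcome, weights: dict) -> float:
        """Score an outcome as a weighted sum of its features."""
        return sum(weights.get(k, 0.0) * v for k, v in outcome.features.items())

    # "Reward modelling" in this sense is fitting `weights` (or a neural net)
    # to preference data; the agent is then trained to maximize this learned
    # score rather than the underlying preferences directly.
    weights = {"helpfulness": 1.0, "harm": -2.0}
    print(reward_model(Outcome("a candidate reply", {"helpfulness": 0.8, "harm": 0.1}), weights))

The broader notion above (weights, training environment, and mesa-optimization structure for an AI; genetics, pleasure, and pain for a human) is about what shapes the objective itself, not just a learned scoring function, which is part of why the standard term fits awkwardly here.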

Is there a better word I can use for this concept? Or maybe I should just make up a word?

Are there any (semi-)detailed future scenarios where we win?
Answer by Oxidize · Apr 10, 2025 · 1 · 0

This is actually my primary focus. I believe it can be done through a complicated process that targets human psychology, but to explain it simply:

- Spread satisfaction & end suffering.
- Spread rational decision-making

To further simplify: if everyone were like us, and no one were on the chopping block if AGI doesn't get created, then the incentive to create AGI ceases and we effectively secure decades for AI-safety efforts.

This is a post I made on the subject.

https://www.lesswrong.com/posts/GzMteAGbf8h5oWkow/breaking-beliefs-about-saving-the-world

Posts

-27 · All Rationalists hate & sabotage Strategy without having any awareness of it. · 3mo · 8 comments
1 · How familiar is the Lesswrong community as a whole with the concept of Reward-modelling? [Question] · 5mo · 8 comments
7 · Why am I getting downvoted on Lesswrong? [Question] · 5mo · 14 comments
-1 · Breaking beliefs about saving the world · 10mo · 3 comments
1 · Oxidize's Shortform · 10mo · 5 comments