Joern Stoehler's Shortform

by Joern Stoehler
3rd Jul 2024
Joern Stoehler · 1y

Here's my best quickly-written story of why I expect AGI to understand human goals but not share them. The intended audience is mostly myself, so I use personal jargon.

What a system at AGI level wants depends a lot on how it coheres its different goal-like instincts during self-reflection. Introspection and (less strongly) neuroscience tell us that humans start with internal impulses and self-reflection processes that are very similar across individuals, and that they also end up with similar goals. An AI has a quite different set of genes/architecture and environment/training data, so we can't expect its internal impulses and metacognition to be similarly close to ours. Instead it acquires an unknown, different set of internals that is easy to pick up with gradient descent and that enables becoming a general intelligence.

All smart agents do eventually learn natural abstractions, e.g. algebra or what the humans living in the outside world want, and can use that knowledge, e.g. during a treacherous turn. But the internal impulses aren't pushed towards a natural abstraction, as there's no unique universal solution, though there are local attractors; instead they depend on subtler details of the architecture and training data. Also, the difference between AI and human internals might not be visible early on in behavior, because a) we select the internals so that the AI isn't visibly misaligned in its behavior during training (plus whatever interpretability we have), b) the behavioral differences we do spot may be explained both by different internal motivations and by a lack of capabilities, and c) even if we don't train against visible misalignment, similar instrumental pressures may apply to humans and AI, leading to partially similar behavior.

Without a better theory of what distinguishes goal-like from capability-like learned cognitive and behavioral patterns, it's not straightforward to formalize similarity between goal-like instincts at low capability levels; compare how AIs that don't behave in a robustly goal-directed way can't be assigned a coherent goal, and are perhaps better described by shard theory. Without a better theory of how AGIs will do self-reflection and metacognition, it's not clear which sets of internal impulses during early training will later cohere into a safe goal. And it's also not clear to me how to actually know that a goal is safe.

In particular, I don't think that using AI assistants to solve the alignment problem will work, as investigating metacognition probably requires AGI-level capabilities. Instead we just get a series of AI assistants that successfully train away visible misalignment in each other, using their knowledge of what the external humans want, until finally one AI or group of AIs realizes that a treacherous turn has become the better action.

Mayyybe there will be a stage during which the AI assistants recognize the flaw in this alignment solution, and have not yet cohered their impulses in a way that leads to misaligned behavior. In that case, the AI assistants may warn us, giving us a very late, and easily ignored, fire alarm.

Bogdan Ionut Cirstea · 1y

Arguably, though, we do have the beginnings of theories of metacognition; see, e.g., "A Theoretical Understanding of Self-Correction through In-context Alignment".

Joern Stoehler · 1mo

Disclosure: Written by ChatGPT on 2025-09-16 at my request.

Energy use in isometric holds

Abstract. Standard physiology; no novelty. When an object is held stationary, static equilibrium (∑F = 0, ∑τ = 0) sets the joint torque needed to counter gravity. Muscle fibers generate that torque. Force in skeletal muscle is produced by continuous actomyosin cross-bridge cycling; each cycle uses one ATP. Calcium must be kept high to permit attachment and then pumped back into the sarcoplasmic reticulum; the Na⁺/K⁺ pump maintains excitability. External mechanical work is approximately zero, but chemical energy is consumed continuously and dissipated as heat. Example: holding a 5 kg bag at the hand (forearm r ≈ 0.35 m, elbow at 90°) requires T ≈ 17 N·m. Relative to typical maximum elbow-flexion torque (≈70 N·m in young men, ≈35 N·m in young women), this corresponds to ≈24–49% MVC. Forearm V̇O₂ in that range is ≈20–30 mL·min⁻¹, i.e., ≈7–10 W of local heat; two minutes dissipates ≈0.8–1.2 kJ.

 

Model

  1. Static equilibrium (mechanics). Gravity exerts a force mg at a horizontal distance r from the elbow. The required elbow torque is T_req ≈ mgr (simple forearm model).
  2. Torque to muscle force (anatomy). Elbow flexors act through an effective moment arm r_eff. The muscle force satisfies T ≈ F_m · r_eff. Holding a constant torque implies a near-constant average F_m (ignoring co-contraction and passive tissues for this shortform).
  3. Force generation (molecular). Each muscle fiber contains many myosin heads that attach to actin briefly, pull, and detach. Only a fraction are attached at any instant, so maintaining force requires continuous replacement of attached heads; each replacement hydrolyzes one ATP. Attachment requires elevated Ca²⁺; relaxation requires pumping Ca²⁺ back into the sarcoplasmic reticulum.
  4. Energy balance (physiology). External work is ≈0 at zero velocity, but internal processes consume ATP: P_ATP = P_xb + P_act. Reviews place activation at roughly 25–45% of ATP turnover in isometric contractions (muscle-dependent). At higher %MVC and longer holds, intramuscular pressure restricts blood flow, limiting oxygen delivery and accelerating fatigue (see the sketch after this list).[1][2][3]
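
As a cross-check on the four steps, here is a minimal Python sketch of the bookkeeping. The effective moment arm (r_eff ≈ 4 cm) and the 35% activation share are illustrative assumptions standing in for the cited ranges, not measured values.

```python
G = 9.81  # gravitational acceleration, m/s^2


def required_torque(mass_kg: float, r_m: float) -> float:
    """Step 1: static equilibrium. T_req = m * g * r for a mass held
    at horizontal distance r from the elbow."""
    return mass_kg * G * r_m


def muscle_force(torque_nm: float, r_eff_m: float = 0.04) -> float:
    """Step 2: torque -> flexor force via an effective moment arm.
    r_eff ~ 0.04 m is an assumed order-of-magnitude value."""
    return torque_nm / r_eff_m


def atp_power_split(p_atp_w: float, activation_frac: float = 0.35):
    """Step 4: P_ATP = P_xb + P_act. The cited 25-45% activation share
    (Ca2+ and Na+/K+ pumping) is represented by a mid-range 0.35."""
    p_act = activation_frac * p_atp_w
    return p_atp_w - p_act, p_act  # (P_xb, P_act)


t_req = required_torque(5.0, 0.35)   # ~17.2 N·m
f_m = muscle_force(t_req)            # ~430 N on the flexors
p_xb, p_act = atp_power_split(8.5)   # 8.5 W: mid-range of the 7-10 W estimate
print(f"T_req = {t_req:.1f} N·m, F_m = {f_m:.0f} N")
print(f"P_xb = {p_xb:.1f} W, P_act = {p_act:.1f} W")
```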

 

Worked example

Setup. m = 5 kg; r ≈ 0.35 m; elbow at 90°.

  • Torque required: T ≈ mgr = 5 × 9.81 × 0.35 ≈ 17 N·m.
  • Relative intensity: MVC ≈ 70 N·m (young men, ∼120°) or ≈35 N·m (young women, ∼90°) ⇒ 24% or 49% MVC.[4]
  • Metabolic cost (from oxygen): around 20–30% MVC, forearm V̇O₂ ≈ 20–30 mL·min⁻¹. With ≈20.9 kJ·L⁻¹ O₂: ≈7–10 W locally; two minutes ≈0.8–1.2 kJ (reproduced in the sketch below).[5][6]
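
The arithmetic is easy to reproduce; a minimal sketch, taking the MVC torques and the V̇O₂ range from refs [4]–[6] and doing only unit conversion:

```python
# Reproduce the worked example: torque, relative intensity, heat output.
m, r, g = 5.0, 0.35, 9.81             # kg, m, m/s^2

torque = m * g * r                    # ≈ 17.2 N·m
print(f"T ≈ {torque:.1f} N·m")

# Relative intensity against typical maximum elbow-flexion torque [4].
for group, t_max in [("young men", 70.0), ("young women", 35.0)]:
    print(f"{group}: {100 * torque / t_max:.0f}% MVC")

# Local heat from forearm oxygen uptake [5][6].
kj_per_l_o2 = 20.9                    # energy equivalent of O2, kJ/L
for vo2_ml_min in (20.0, 30.0):       # forearm VO2 range, mL/min
    watts = vo2_ml_min / 1000 * kj_per_l_o2 * 1000 / 60  # kJ/min -> W
    print(f"VO2 {vo2_ml_min:.0f} mL/min -> {watts:.1f} W, "
          f"{watts * 120 / 1000:.2f} kJ over 2 min")
```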

 

Predictions

  • Micro-breaks extend endurance. Alternating 1 s on / 1 s off at the same mean force allows reperfusion and reduces activation cost during the off phases.
  • Geometry reduces cost. Shortening r or shifting load to passive structures (bone, ligament, tendon) lowers T_req and the required ATP use (this and the micro-break effect are illustrated in the sketch after this list).
  • Perceived effort rises during a hold. As some fibers fatigue, additional motor units are recruited to keep T constant, so effort increases before failure.
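
The first two predictions can be made concrete with a small sketch; the halved moment arm and the 1 s on / 1 s off duty cycle are illustrative numbers, not measurements:

```python
m, g = 5.0, 9.81

# Geometry: required torque scales linearly with the horizontal distance,
# so holding the bag closer to the body directly lowers T_req.
for r in (0.35, 0.175):
    print(f"r = {r:.3f} m -> T_req = {m * g * r:.1f} N·m")

# Micro-breaks: 1 s on / 1 s off at the same *mean* force means twice the
# force during the on phase; each off phase allows reperfusion.
duty = 0.5                      # fraction of each cycle spent "on"
mean_force = 1.0                # normalized continuous-hold force
on_force = mean_force / duty    # force required during the on phase
print(f"duty {duty:.0%}: on-phase force = {on_force:.1f}x continuous hold")
```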

 

References

  1. Barclay CJ (2023). Advances in understanding the energetics of muscle contraction. Activation ≈25–45% of ATP in isometric contraction. https://www.sciencedirect.com/science/article/pii/S0021929023002385
  2. Lind AR (1979). Forearm blood flow in isometric contractions. Flow rises to ≈60% MVC then plateaus. https://pubmed.ncbi.nlm.nih.gov/469732/
  3. McNeil CJ et al. (2015). Blood flow and muscle oxygenation during isometric contractions. Am J Physiol Regul Integr Comp Physiol. https://journals.physiology.org/doi/abs/10.1152/ajpregu.00387.2014
  4. Tsunoda N et al. (1993). Elbow flexion strength curves in untrained men and women and male bodybuilders. ≈70 N·m (men), ≈35 N·m (women). https://pubmed.ncbi.nlm.nih.gov/8477679/
  5. Nyberg SK et al. (2018). Reliability of forearm oxygen uptake during handgrip exercise. Rest ≈6.5 mL·min⁻¹; rises with intensity. https://pmc.ncbi.nlm.nih.gov/articles/PMC5974736/
  6. Gill PK et al. (2023). It is time to abandon single-value oxygen uptake energy equivalents. Common 20.1–20.9 kJ·L⁻¹ O₂ and caveats. https://pubmed.ncbi.nlm.nih.gov/36825641/