Joern Stoehler's Shortform

by Joern Stoehler
3rd Jul 2024
Joern Stoehler · 1y

Here's my best quickly-written story of why I expect AGI to understand human goals but not share them. The intended audience is mostly myself, so I use personal jargon.

What a system at AGI level wants depends a lot on how it coheres its different goal-like instincts during self-reflection. Introspection and (less strongly) neuroscience tell us that humans start with internal impulses and self-reflection processes that are very similar across individuals, and that they also end up with similar goals. An AI has a quite different set of genes/architecture and environment/training data, so we can't expect its internal impulses and metacognition to be similarly close to ours. Instead it acquires an unknown, different set of internals that is easy to pick up with gradient descent and that enables becoming a general intelligence.

All smart agents do eventually learn natural abstractions, e.g. algebra or what the humans living in the outside world want, and can use that knowledge, e.g. during a treacherous turn. But the internal impulses aren't pushed towards a natural abstraction, as there's no unique universal solution, though there are local attractors; instead they depend on subtler details of the architecture and training data. Also, the difference between AI and human internals might not be visible early on in behavior, because a) we select the internals so that the AI isn't visibly misaligned in its behavior during training (plus whatever interpretability we have), b) the behavioral differences we do spot may be explained both by different internal motivations and by a lack of capabilities, and c) even if we don't train against visible misalignment, similar instrumental pressures may apply to humans and AI, leading to partially similar behavior.

Without a better theory of what distinguishes goal-like from capability-like learned cognitive and behavioral patterns, it's not straightforward to formalize similarity between goal-like instincts at low capability levels; compare how AIs that don't behave in a robustly goal-directed way can't be assigned a coherent goal, and are perhaps better described by shard theory. Without a better theory of how AGIs will do self-reflection and metacognition, it's not clear which sets of internal impulses during early training will later cohere into a safe goal. And it's also not clear to me how to actually know that a goal is safe.

In particular, I don't think that using AI assistants to solve the alignment problem will work, as investigating metacognition probably requires AGI-level capabilities. Instead we just get a series of AI assistants that successfully train away visible misalignment in each other, using their knowledge of what the external humans want, until finally one AI or group of AIs realizes that a treacherous turn has become the better action.

Mayyybe there will be a stage during which the AI assistants recognize the flaw in this alignment solution, and have not yet cohered their impulses in a way that leads to misaligned behavior. In that case, the AI assistants may warn us, giving us a very late, and easily ignored, fire alarm.

Bogdan Ionut Cirstea · 1y

Arguably, though, we do have the beginnings of theories of metacognition; see, e.g., "A Theoretical Understanding of Self-Correction through In-context Alignment".

Joern Stoehler · 1mo

Disclosure: Written by ChatGPT on 2025-09-16 at my request.

Energy use in isometric holds

Abstract. Standard physiology; no novelty. When an object is held stationary, static equilibrium (∑F = 0, ∑τ = 0) sets the joint torque needed to counter gravity. Muscle fibers generate that torque. Force in skeletal muscle is produced by continuous actomyosin cross-bridge cycling; each cycle uses one ATP. Calcium must be kept high to permit attachment and then pumped back into the sarcoplasmic reticulum; the Na⁺/K⁺ pump maintains excitability. External mechanical work is approximately zero, but chemical energy is consumed continuously and dissipated as heat. Example: holding a 5 kg bag at the hand (forearm r ≈ 0.35 m, elbow at 90°) requires T ≈ 17 N·m. Relative to typical maximum elbow-flexion torque (≈70 N·m in young men, ≈35 N·m in young women), this corresponds to ≈24–49% MVC. Forearm V̇O₂ in that range is ≈20–30 mL·min⁻¹, i.e., ≈7–10 W of local heat; two minutes dissipates ≈0.8–1.2 kJ.

 

Model

  1. Static equilibrium (mechanics). Gravity exerts a force mg at a horizontal distance r from the elbow. The required elbow torque is T_req ≈ mgr (simple forearm model).
  2. Torque to muscle force (anatomy). Elbow flexors act through an effective moment arm r_eff. The muscle force satisfies T ≈ F_m · r_eff. Holding a constant torque implies a near-constant average F_m (ignoring co-contraction and passive tissues for this shortform).
  3. Force generation (molecular). Each muscle fiber contains many myosin heads that attach to actin briefly, pull, and detach. Only a fraction are attached at any instant, so maintaining force requires continuous replacement of attached heads; each replacement hydrolyzes one ATP. Attachment requires elevated Ca²⁺; relaxation requires pumping Ca²⁺ back into the sarcoplasmic reticulum.
  4. Energy balance (physiology). External work is ≈0 at zero velocity, but internal processes consume ATP: P_ATP = P_xb + P_act. Reviews place activation at roughly 25–45% of ATP turnover in isometric contractions (muscle-dependent). At higher %MVC and longer holds, intramuscular pressure restricts blood flow, limiting oxygen delivery and accelerating fatigue (see the sketch after this list).[1][2][3]
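
As a cross-check on the four steps, here is a minimal Python sketch of the bookkeeping. The effective moment arm (r_eff ≈ 4 cm) and the 35% activation share are illustrative assumptions standing in for the cited ranges, not measured values.

```python
G = 9.81  # gravitational acceleration, m/s^2


def required_torque(mass_kg: float, r_m: float) -> float:
    """Step 1: static equilibrium. T_req = m * g * r for a mass held
    at horizontal distance r from the elbow."""
    return mass_kg * G * r_m


def muscle_force(torque_nm: float, r_eff_m: float = 0.04) -> float:
    """Step 2: torque -> flexor force via an effective moment arm.
    r_eff ~ 0.04 m is an assumed order-of-magnitude value."""
    return torque_nm / r_eff_m


def atp_power_split(p_atp_w: float, activation_frac: float = 0.35):
    """Step 4: P_ATP = P_xb + P_act. The cited 25-45% activation share
    (Ca2+ and Na+/K+ pumping) is represented by a mid-range 0.35."""
    p_act = activation_frac * p_atp_w
    return p_atp_w - p_act, p_act  # (P_xb, P_act)


t_req = required_torque(5.0, 0.35)   # ~17.2 N·m
f_m = muscle_force(t_req)            # ~430 N on the flexors
p_xb, p_act = atp_power_split(8.5)   # 8.5 W: mid-range of the 7-10 W estimate
print(f"T_req = {t_req:.1f} N·m, F_m = {f_m:.0f} N")
print(f"P_xb = {p_xb:.1f} W, P_act = {p_act:.1f} W")
```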

 

Worked example

Setup. m = 5 kg; r ≈ 0.35 m; elbow at 90°.

  • Torque required: T ≈ mgr = 5 × 9.81 × 0.35 ≈ 17 N·m.
  • Relative intensity: MVC ≈ 70 N·m (young men, ∼120°) or ≈35 N·m (young women, ∼90°) ⇒ 24% or 49% MVC.[4]
  • Metabolic cost (from oxygen): around 20–30% MVC, forearm V̇O₂ ≈ 20–30 mL·min⁻¹. With ≈20.9 kJ·L⁻¹ O₂: ≈7–10 W locally; two minutes ≈0.8–1.2 kJ (reproduced in the sketch below).[5][6]
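
The arithmetic is easy to reproduce; a minimal sketch, taking the MVC torques and the V̇O₂ range from refs [4]–[6] and doing only unit conversion:

```python
# Reproduce the worked example: torque, relative intensity, heat output.
m, r, g = 5.0, 0.35, 9.81             # kg, m, m/s^2

torque = m * g * r                    # ≈ 17.2 N·m
print(f"T ≈ {torque:.1f} N·m")

# Relative intensity against typical maximum elbow-flexion torque [4].
for group, t_max in [("young men", 70.0), ("young women", 35.0)]:
    print(f"{group}: {100 * torque / t_max:.0f}% MVC")

# Local heat from forearm oxygen uptake [5][6].
kj_per_l_o2 = 20.9                    # energy equivalent of O2, kJ/L
for vo2_ml_min in (20.0, 30.0):       # forearm VO2 range, mL/min
    watts = vo2_ml_min / 1000 * kj_per_l_o2 * 1000 / 60  # kJ/min -> W
    print(f"VO2 {vo2_ml_min:.0f} mL/min -> {watts:.1f} W, "
          f"{watts * 120 / 1000:.2f} kJ over 2 min")
```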

 

Predictions

  • Micro-breaks extend endurance. Alternating 1 s on / 1 s off at the same mean force allows reperfusion and reduces activation cost during the off phases.
  • Geometry reduces cost. Shortening r or shifting load to passive structures (bone, ligament, tendon) lowers T_req and the required ATP use (this and the micro-break effect are illustrated in the sketch after this list).
  • Perceived effort rises during a hold. As some fibers fatigue, additional motor units are recruited to keep T constant, so effort increases before failure.
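
The first two predictions can be made concrete with a small sketch; the halved moment arm and the 1 s on / 1 s off duty cycle are illustrative numbers, not measurements:

```python
m, g = 5.0, 9.81

# Geometry: required torque scales linearly with the horizontal distance,
# so holding the bag closer to the body directly lowers T_req.
for r in (0.35, 0.175):
    print(f"r = {r:.3f} m -> T_req = {m * g * r:.1f} N·m")

# Micro-breaks: 1 s on / 1 s off at the same *mean* force means twice the
# force during the on phase; each off phase allows reperfusion.
duty = 0.5                      # fraction of each cycle spent "on"
mean_force = 1.0                # normalized continuous-hold force
on_force = mean_force / duty    # force required during the on phase
print(f"duty {duty:.0%}: on-phase force = {on_force:.1f}x continuous hold")
```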

 

References

  1. Barclay CJ (2023). Advances in understanding the energetics of muscle contraction. Activation ≈25–45% of ATP in isometric contraction. https://www.sciencedirect.com/science/article/pii/S0021929023002385
  2. Lind AR (1979). Forearm blood flow in isometric contractions. Flow rises to ≈60% MVC then plateaus. https://pubmed.ncbi.nlm.nih.gov/469732/
  3. McNeil CJ et al. (2015). Blood flow and muscle oxygenation during isometric contractions. Am J Physiol Regul Integr Comp Physiol. https://journals.physiology.org/doi/abs/10.1152/ajpregu.00387.2014
  4. Tsunoda N et al. (1993). Elbow flexion strength curves in untrained men and women and male bodybuilders. ≈70 N·m (men), ≈35 N·m (women). https://pubmed.ncbi.nlm.nih.gov/8477679/
  5. Nyberg SK et al. (2018). Reliability of forearm oxygen uptake during handgrip exercise. Rest ≈6.5 mL·min⁻¹; rises with intensity. https://pmc.ncbi.nlm.nih.gov/articles/PMC5974736/
  6. Gill PK et al. (2023). It is time to abandon single-value oxygen uptake energy equivalents. Common 20.1–20.9 kJ·L⁻¹ O₂ and caveats. https://pubmed.ncbi.nlm.nih.gov/36825641/