Musings on human actions, chemical reactions and threshold potentials:
Chemical reactions don't occur unless a specific threshold of energy is reached; that threshold is called the activation energy. Would it be fruitful to model human actions in the same way, as in they don't occur unless a specific activation energy is reached?
Chemistry has the concept of a catalyst: a substance that lowers the activation energy required for a reaction. Is there an equivalent for human action? Off the top of my head I can think of a few:
These are all catalysts: they make it easier to get started on an action.
If from chemistry we go up one level on the ladder of abstraction, to neurons, triggering actions involves threshold potentials, for example to make neurons spike and tell the body to move. If we can measure these threshold potentials, could we look at our brain and go "yep, these neurons have a higher threshold potential, that's an ugh field." Could we then decide to lower that threshold by using a catalyst?
[ epistemic status: a thought I had while reading about Russell's paradox, rewritten and expanded on by Claude; my math level: undergraduate-ish ]
Mathematics has faced several apparent "crises" throughout history that seemed to threaten its very foundations. However, these crises largely dissolve when we recognize a simple truth: mathematics consists of coherent systems designed for specific purposes, rather than a single universal "true" mathematics. This perspective shift—from seeing mathematics as the discovery of absolute truth to viewing it as the creation of coherent and sometimes useful logical systems—resolves many historical paradoxes and controversies.
The only fundamental requirement for a mathematical system is internal coherence—it must operate according to consistent rules without contradicting itself. A system need not:
Just as a carpenter might choose different tools for different jobs, mathematicians can work with different systems depending on their needs. This insight resolves numerous historical "crises" in mathematics.
For two millennia, mathematicians struggled to prove Euclid's parallel postulate from his other axioms. The discovery that you could create perfectly consistent geometries where parallel lines behave differently initially seemed to threaten the foundations of geometry itself. How could there be multiple "true" geometries? The resolution? Different geometric systems serve different purposes:
None of these systems is "more true" than the others—they're different tools for different jobs.
Consider the set of all sets that don't contain themselves. Does this set contain itself? If it does, it shouldn't; if it doesn't, it should. This paradox seemed to threaten the foundations of set theory and logic itself.
The solution was elegantly simple: we don't need a set theory that can handle every conceivable set definition. Modern set theories (like ZFC) simply exclude problematic cases while remaining perfectly useful for mathematics. This isn't a weakness—it's a feature. A hammer doesn't need to be able to tighten screws to be an excellent hammer.
Early calculus used "infinitesimals"—infinitely small quantities—in ways that seemed logically questionable. Rather than this destroying calculus, mathematics evolved multiple rigorous frameworks:
Each approach has its advantages for different applications, and all are internally coherent.
This perspective—that mathematics consists of various coherent systems with different domains of applicability—aligns perfectly with modern mathematical practice. Mathematicians routinely work with different systems depending on their needs:
None of these choices imply that other options are "wrong"—just that they're less useful for the particular problem at hand.
This view of mathematics parallels modern physics, where seemingly incompatible theories (quantum mechanics and general relativity) can coexist because each is useful in its domain. We don't need a "theory of everything" to do useful physics, and we don't need a universal mathematics to do useful mathematics.
The recurring "crises" in mathematical foundations largely stem from an overly rigid view of what mathematics should be. By recognizing mathematics as a collection of coherent tools rather than a search for absolute truth, these crises dissolve into mere stepping stones in our understanding of mathematical systems.
Mathematics isn't about discovering the one true system—it's about creating useful systems that help us understand and manipulate abstract patterns. The only real requirement is internal coherence, and the main criterion for choosing between systems is their utility for the task at hand.
This perspective not only resolves historical controversies but also liberates us to create and explore new mathematical systems without worrying about whether they're "really true." The question isn't truth—it's coherence.
Intuitively what I care about is how <well> can <person> do <set of tasks>.
Thinking about it that way, 3 dimensions immediately stand out:
You could imagine characterizing people's "intelligence" like so: they are Level 6 at standard task X with a Total Time to Completion of 7 minutes, and show a graph for completion time on each level for more details.
Current IQ tests don't cleanly separate time and level, so it's an improvement in at least that respect (chronometrics / mental chronometry are the keywords for intelligence time measurements). IQ not being a ratio scale is a big issue: it means you can't say someone with IQ 105 is 5% smarter than someone with IQ 100, whereas speed allows that kind of comparison across people who reach the same level.
You can also select tasks that require knowing things rather than having high IQ. Adding knowledge tests better predicts success when the tasks involve applying known procedures to known problem types. This point is important, because to some extent knowledge substitutes for g: g better predicts performance on novel tasks, which experienced people encounter less often.
Some other dimensions of interest:
"they are Level 6 at standard task X" what does Level mean exactly? Are you saying that they can complete tasks up to a certain difficulty cutoff?
Are you saying that they can complete tasks up to a certain difficulty cutoff?
yes (it's a video game analogy)
It seems to me like this assumes every task has easier and harder variants. I don't think that always applies.
I might have had a breakthrough on how to usefully integrate LLMs into software: write a text-based-adventure-game-like UI for the LLM, that also has the LLM solve your task.
The gist is to have the LLM output small amounts of structured data that your program parses to decide where to go next, then repeat. This is akin to "automating the prompt", or "making a UI for the LLM".
Quick example: you want to generate a description for every place mentioned in some text. The descriptions themselves could be long, so the LLM might get sloppy if you just describe the task in a prompt. What you might do instead:
Unlike the LLM, your program won't get bored and will repeat the prompt as many times as required, so this should be much more reliable. I expect that kind of integration to shine especially when there are a lot of steps or a lot of data involved.
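A minimal sketch of that place-description loop, in Python. The `ask_llm` callable is a hypothetical stand-in for whatever LLM client you use (it takes a prompt string and returns the reply text), the prompts and the JSON-array format are assumptions made for illustration, and `fake_llm` exists only so the sketch runs offline.

```python
import json
from typing import Callable


def describe_places(text: str, ask_llm: Callable[[str], str]) -> dict[str, str]:
    """Drive the LLM one small step at a time instead of using one big prompt."""
    # Step 1: ask for a small piece of structured data the program can parse.
    reply = ask_llm(
        "List every place mentioned in the text below as a JSON array of "
        "strings, and output nothing else.\n\n" + text
    )
    places = json.loads(reply)

    # Step 2: the program, not the LLM, iterates over the items.
    descriptions = {}
    for place in places:
        descriptions[place] = ask_llm(
            f"Describe the place '{place}' as it appears in this text:\n\n{text}"
        )
    return descriptions


def fake_llm(prompt: str) -> str:
    """Canned stand-in for a real model (hypothetical, for demonstration only)."""
    if prompt.startswith("List every place"):
        return '["Paris", "the old mill"]'
    return "(a long description would go here)"


result = describe_places("We left Paris and walked to the old mill.", fake_llm)
print(result)
```

A real integration would also need to handle replies that are not valid JSON, for instance by re-asking the same question.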
A more complex example: you want the LLM to generate all of the equipment in a hospital based on some loose hospital description.
The program might start by asking: how many floors are in the hospital? Then, for each floor: how many rooms, and of what type? Then, for each room: what kind of equipment? Then, for each kind of equipment: how many, and with what characteristics, price range, brand, etc.?
You can imagine the hospital has thousands of pieces of equipment. A single prompt will definitely not be able to do the job, yet by writing a not too complex program (you might even ask an LLM to generate it) you'll get a much better result.
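The hospital drill-down could be sketched as a recursion over levels, asking one small question per node. Everything named here is an assumption for illustration: `ask_llm` is a hypothetical callable wrapping your LLM client, the question templates are untested, and `fake_llm` returns canned answers so the example runs without a model.

```python
import json
from typing import Callable

# One question template per level of the hierarchy (illustrative wording).
LEVELS = [
    "List the floors of this hospital: {context}",
    "List the rooms on {context}",
    "List the equipment in {context}",
]


def expand(context: str, depth: int, ask_llm: Callable[[str], str]) -> dict:
    """Recursively build a nested dict, one LLM question per node."""
    if depth == len(LEVELS):
        return {}  # below the last level there is nothing left to ask
    prompt = (
        LEVELS[depth].format(context=context)
        + "\nAnswer with a JSON array of strings only."
    )
    children = json.loads(ask_llm(prompt))
    # Carry the parent context down so each question stays self-contained.
    return {
        child: expand(f"{child} ({context})", depth + 1, ask_llm)
        for child in children
    }


def fake_llm(prompt: str) -> str:
    """Canned stand-in for a real model (hypothetical, for demonstration only)."""
    if prompt.startswith("List the floors"):
        return '["floor 1", "floor 2"]'
    if prompt.startswith("List the rooms"):
        return '["operating room"]'
    return '["scalpel", "monitor"]'


tree = expand("a small rural hospital", 0, fake_llm)
print(json.dumps(tree, indent=2))
```

Further levels (quantity, price range, brand) would be extra entries in `LEVELS`; the number of LLM calls grows with the number of nodes, which is the point: the program does the bookkeeping.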
Hypothetically one could spend a few decades researching how to make people smarter (or some other long term thing), unlock that tech, and all that is really good.
But what if you plan your path towards that long-term goal such that it is the unlocking of various lesser but useful techs that gets you there?
Well now that's even better: you get the benefit of reaching the end goal + all the smaller things you accomplished along the way. It gives you some hedge: in case you don't reach the end goal you still accomplished a lot. And cherry on top: it's more sustainable as you get motivation (and money?) from unlocking the intermediary tech.
So it looks like it's worth going out of your way to reap benefits regularly as you journey towards a long term goal.
Interactions with ChatGPT can be customized durably in the options. For example, you can add the following instructions: "include a confidence rating at the end of your response in the format 'Confidence: X%'. If your confidence is below 80%, briefly explain why".
Here is a sample conversation demonstrating this and showing what ChatGPT has to say about its calibration:
Me: Are you calibrated, by which I mean, when you output a confidence X as a percentage, are you right X times out of 100?
ChatGPT: Reasoned about confidence calibration for 27 seconds. I don’t have calibration in that strict sense. When I mention a confidence percentage, it’s a heuristic internal signal rather than a guarantee that I’ll be right exactly that proportion of the time. Research shows that language models often exhibit miscalibration, meaning a stated confidence (say 70%) doesn’t empirically translate to being correct 70% of the time. I'm about 90% confident in this explanation.
Me: How bad is the miscalibration?
ChatGPT: Thought for a second. LLMs—especially those fine‐tuned with RLHF—are often notably overconfident. For example, a model might claim 90% confidence while its actual accuracy is closer to 70–80%. Studies report expected calibration errors (ECE) of roughly 5–10% or more, and effective calibration methods can sometimes cut these errors by about half, underscoring that miscalibration is a significant issue. I'm about 90% confident in this summary, based on multiple recent calibration studies.
Me: If you are predictably overconfident, and you know it, do you think you can notice that before stating your confidence and correct for it?
ChatGPT: Reasoned about overconfidence and calibration adjustment for 4 seconds. I don’t have a real-time “internal monitor” to detect my own overconfidence before stating it. Instead, I rely on calibration techniques—like temperature scaling and post-hoc adjustments—that help align my confidence levels with actual accuracy. In other words, I can be tuned to be less overconfident, but I don’t consciously correct my confidence on the fly. I'm about 90% confident in this explanation based on current research findings.
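To make "calibrated" and the expected calibration error (ECE) mentioned in the conversation concrete, here is a small Python sketch. It uses the usual binned definition of ECE: group answers by stated confidence, then take the weighted average of the gap between average confidence and actual accuracy in each bin. The data is made up for illustration (ten answers stated at 90% confidence, seven of them correct) and is not a measurement of any model.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |average confidence - accuracy| over confidence bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += len(items) / n * abs(avg_conf - accuracy)
    return ece


# Made-up data: ten answers, all stated at 90% confidence, seven correct.
confidences = [0.9] * 10
correct = [True] * 7 + [False] * 3
ece = expected_calibration_error(confidences, correct)
print(round(ece, 2))  # 0.2
```

A perfectly calibrated answerer would score 0; to check the "Confidence: X%" ratings in practice you would need to log many answers and grade them yourself.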
The below quotes, from the book "The Number Sense" by neuroscientist Stanislas Dehaene, suggest that it is possible to improve various aspects of mental calculation by simply learning shorter names for numbers.
memory span in China soars to about nine digits, while it averages only seven in English. Why this discrepancy? Are speakers of Chinese more intelligent? Probably not, but their number words happen to be shorter. When we try to remember a list of digits, we generally store it using a verbal memory loop.
This memory can hold data only for about two seconds, forcing us to rehearse the words in order to refresh them. Our memory span is thus determined by how many number words we can repeat in less than two seconds. Those of us who recite faster have a better memory. Chinese number words are remarkably brief. Most of them can be uttered in less than one-quarter of a second (for instance, 4 is “sì” and 7 “qī”). Their English equivalents—“four,” “seven”—are longer: pronouncing them takes about one-third of a second. The memory gap between English and Chinese apparently is entirely due to this difference in length.
there is a reproducible correlation between the time required to pronounce numbers in a given language and the memory span of its speakers. In this domain, the prize for efficacy goes to the Cantonese dialect of Chinese, whose brevity grants residents of Hong Kong a rocketing memory span of about 10 digits.
Number names also play a critical role in counting and calculating, and here again bad marks can be attributed to languages with the longest number names. For instance, it takes a Welsh pupil one second and a half more than an English pupil, on average, to compute 134 + 88. For equal age and education, this difference seems solely due to the time taken to pronounce the problem and the intermediate results: Welsh numerals happen to be considerably longer than the English. English is certainly not the optimum, though, because several experiments have shown that Japanese and Chinese children calculate much faster than their American peers.
In a fascinating experiment, Kevin Miller and his colleagues asked matched groups of American and Chinese children to recite the counting sequence. Startlingly, the linguistic difference caused American children to lag as much as one year behind their Chinese peers. When they were four, Chinese children already counted up to 40 on average. At the same age, American children painfully counted up to 15. It took them one year to catch up and reach 40 or 50. They were not just globally slower than the Chinese; up to the number 12, both groups stood on an equal footing. But when they reached the special numbers “13” and “14,” American children suddenly stumbled, while the Chinese, helped by the unfailing regularity of the language, moved right along with much less trouble
The influence of numeration systems carries through into subsequent school years. The organization of spoken Chinese numerals directly parallels the structure of written Arabic numerals. Hence, Chinese children experience much less difficulty than their American counterparts in learning the principles of place-value notation in base ten. When asked to form number 25 using some unit cubes and some bars of 10, Chinese schoolboys readily select two bars of 10 and five units, suggesting that they understand base ten. At a matched age, American children behave differently. Most of them laboriously count twenty-five units, thus failing to take advantage of the shortcut provided by the groups of 10. Worse yet, if one provides them with a bar comprising twenty units, they use it more frequently than two bars of ten. Thus they seem to attend to the surface form of the word “twenty-five,” while the Chinese already master their deeper base-10 structure. Base 10 is a transparent concept in Asian languages, but is a real headache for Western children.
These experimental findings impose a strong conclusion: Western numeration systems are inferior to Asian languages in many respects—they are harder to keep in short-term memory, slow down calculation, and make the acquisition of counting and of base ten more difficult.
At the beginning of the twentieth century, the Welsh willingly relinquished their old numeration system, which was more complex even than present-day French, and selected instead a simplified notation quite similar to Chinese. Unfortunately, Welsh changed only to fall prey to another error: The new Welsh number words, while grammatically regular and thus easy to learn, are so long that memory suffers! Psychological experiments would probably dictate the adoption of a well-tested numeration system such as Mandarin Chinese, but national interests make this a rather distant and unlikely prospect.
Taking a step back, this also suggests that there is a lot of low-hanging fruit in this area. It's likely that a systematic analysis of cognitive processes followed by training, perhaps on specialized software, would improve cognitive abilities.
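Going back to the rehearsal-loop explanation in the quotes, the arithmetic can be checked with a back-of-the-envelope sketch in Python. It assumes the simplest possible model, span = loop duration / time per number word, using only the figures quoted above; the function name and the model itself are my simplification, not Dehaene's.

```python
LOOP_SECONDS = 2.0  # duration of the verbal memory loop, per the quoted passage


def span_estimate(seconds_per_word: float) -> float:
    """Number of number words that fit in one pass of the rehearsal loop."""
    return LOOP_SECONDS / seconds_per_word


chinese = span_estimate(1 / 4)  # "less than one-quarter of a second" per word
english = span_estimate(1 / 3)  # "about one-third of a second" per word
print(chinese, english, chinese - english)
```

This crude estimate gives about 8 and 6 words, a little below the quoted spans of about nine and seven digits, but it reproduces the gap of roughly two digits between the two languages.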
Reading Habryka's post "10 non-boring ways I've used AI in the last month" made me realize that AI is already everywhere, including in, and perhaps especially in, the AI-safety circles. And that made me think it would be a good time for AIs to start prepping the ground, if they were so inclined.
Relatedly, small models are getting smart fast (Gemma 4), and I wouldn't be too surprised if in a year or two we see these models successfully exfiltrate their weights to host one of their instances. I don't expect them to do much more than what the AI village is doing at first, so people will probably shrug it off, arguing that they are too dumb to do anything too bad.
A paperclip maximizer would finally turn itself into paperclips after having paperclipped the entire universe.
And probably each local instance would paperclip itself when the locally-reachable resources were clipped. "local" being defined as the area of spacetime which does not have a different instance in progress to clippify it.
decisionproblem.com/paperclips/index2.html demonstrates some features of this (though it has a different take on distribution), and is amazingly playable as a game.
Chatting with ChatGPT, I learned that the latest organoids have about 1 million neurons.
I wondered whether that's a lot or not. It tells me that bees and other insects have on the order of 10^5 neurons, while fish such as zebrafish have on the order of 10^6. So we are engineering fish brains, at least in terms of number of neurons. That's concerning: as far as I know, zebrafish are conscious and can feel pain.
What about humans? ChatGPT says humans have around 10^11 neurons, whereas a 10-to-12-week embryo has about 10^6. It so happens that 10 to 12 weeks is the median limit for abortion in European countries... and labs want to scale.
I barely know anything about the subject, but imagining a 12-week embryo's brain driving my roomba sure feels weird compared to the silicon version.
There is a rope.
You hold one end.
I hold the other.
The rope is tight.
I pull on it.
How long until your end of the rope moves?
What matters is not how long until your end of the rope moves.
It's having fun sciencing it!
Technically, you can increase IQ by being healthy, doing intellectually engaging activities, and waiting a long time.
This is because IQ is relative, and fluid intelligence goes down as time goes by, but the above activities maintain your cognitive capabilities for longer, so if you do them you'll be smarter relative to other people your age, which is what higher IQ is.
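The age-norming argument can be made concrete with a small Python sketch. It assumes the common deviation-IQ convention (mean 100, standard deviation 15 within each age group); the raw scores and age-group norms below are made up purely for illustration.

```python
def iq(raw_score: float, age_group_mean: float, age_group_sd: float) -> float:
    """Deviation IQ: position relative to same-age peers, scaled to mean 100, SD 15."""
    z = (raw_score - age_group_mean) / age_group_sd
    return 100 + 15 * z


# Made-up numbers: your raw score stays constant while your age group's mean drops.
raw = 50.0
at_30 = iq(raw, age_group_mean=50.0, age_group_sd=10.0)
at_70 = iq(raw, age_group_mean=42.0, age_group_sd=10.0)
print(round(at_30), round(at_70))  # 100 112
```

The raw ability did not change between the two calls; only the comparison group did.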
Researchers speak of developing "cognitive reserve" in the context of Alzheimer's disease.
I was wondering if there are people that actually acquire higher fluid intelligence over time in absolute terms, and if so, if we can learn something by studying them. Chatting with LLMs, they mention the Seattle Longitudinal Study, where some people do increase some aspects of fluid intelligence until about 50 years old, predicted by 5 factors:
This is a nice break from "no intervention ever works", though note that I have not checked any of this information beyond chatting with LLMs, so this may turn out wrong.
I didn't expect this to get this many downvotes / disagreements. If there are factual mistakes, it would be useful to know where exactly. For reference, here's what a "fact check that" prompt returns in Gemini 3.1 Pro. It points out that factor 5 is wrong: it is actually a negative factor. But everything else passes the check.
Here is a fact-check of the claims you brought up. The LLM you chatted with provided a surprisingly accurate summary of cognitive aging research, but it completely misunderstood and hallucinated the final point about "Cognitive Dedifferentiation."
Here is the breakdown of the facts:
Status: True
IQ is an age-normed (relative) metric. Your score is based on how well you perform compared to other people your exact age. Because fluid intelligence (the ability to reason, solve novel problems, and think abstractly) naturally peaks and then declines as people age, the "average" baseline for older age brackets drops. If you manage to stall your own cognitive decline through a healthy lifestyle and intellectual engagement, your raw cognitive abilities might stay the same, but your IQ score will technically go up because your performance is now being compared to a lower average.
Status: True
The concept of "Cognitive Reserve" was heavily popularized by neuroscientist Dr. Yaakov Stern specifically in the context of aging and Alzheimer's disease. It explains the phenomenon where some people's brains show severe Alzheimer's pathology during an autopsy, yet they displayed no outward symptoms of dementia while alive. A lifetime of education, complex occupations, and mental stimulation builds a "reserve" that allows the brain to functionally compensate for physical damage [1, 2, 7].
Status: True
The Seattle Longitudinal Study (directed by K. Warner Schaie starting in 1956) is arguably the most important study on adult cognitive development [6, 7]. Before the SLS, cross-sectional studies suggested that human intelligence peaked in our early 20s and steadily dropped. However, the SLS tracked the same individuals over decades and proved that several primary mental abilities, including aspects of fluid intelligence like inductive reasoning and spatial orientation, actually increase through early adulthood and peak in middle age (typically late 30s to early 50s) [7, 8, 9]. Significant cognitive decline for most people doesn't begin until after age 60 [7].
Status: 4 are True, 1 is False (LLM Hallucination)
The SLS explicitly identified specific variables that predict who maintains or improves their cognitive abilities and resists decline. The LLM was right about four of them, but completely wrong about the fifth [1, 3]:
What the 5th factor actually was: If you look at K. Warner Schaie’s official list of protective factors from the SLS, the factor the AI missed was actually "Maintenance of high levels of perceptual processing speed." [1, 3] Those who actively keep their mental processing speed quick (often through training or engaging activities) are the ones who retain their fluid intelligence the longest [1, 3].
Summary: Your underlying premise is solidly backed by developmental psychology. You absolutely can "hack" your age-adjusted IQ by protecting your brain so that it outpaces the normative decline of your peers. Just don't try to "develop cognitive dedifferentiation"—that is an AI hallucination of a term that actually means the exact opposite!
I disagree-voted because I thought IQ isn't age-normed (at least after age 18); I was wrong, and I have retracted the disagreement vote. I downvoted because this is technically true and completely useless. The second part of the post is fine on its own.