Skill issue.
First, IQ 100 is only useful for ruling out the easy-to-persuade IQ <80 people. There are likely other correlates of "easy to persuade" that depend on how the AI is doing the persuading.
Second, super-persuasion is about scalability and cost. Bribery doesn't scale because actors have limited amounts of money, whereas <$100 in inference and amortised training should be able to persuade a substantial fraction of people.
Achieving this requires a scalable "training environment" to generate a non-Goodhartable reward signal. AI trained to persuade on a large population of real users (EG: for affiliate marketing purposes) would be a super-persuader. Once a large company decides to do this at scale, results will be much better than anything a hobbyist can do. Synthetic evaluation environments (EG: LLM simulations of users) can help too, but they are limited by their exploitability in ways that don't generalise to humans.
In contrast to hacking computers, there are no regulations against social engineering. Some company will develop these capabilities, which can then be used for nefarious purposes, with the usual associated risks like whistleblowers.
Even if a coup is meant to capture mineral wealth and the population is irrelevant, coup leaders recognize that mass murder will lead to sanctions that stop them from selling that mineral wealth. There are plenty of examples of regimes being sanctioned for killing even low thousands of people.
An AI that plans to take over the world does not need to trade with humans or keep them from being horrified and lashing out. Killing approximately everyone is a viable strategy, and preferable in most cases since it removes us as an intelligent adversary.
No. o3 estimates that 60% of American jobs are physical such that you would need robotics to automate them, so if half of those fell within a year, that’s quite a lot.
A lot of jobs that can't be fully automated have sub-tasks that software agents could eliminate. >30% of total labor hours might be spent in front of a computer (EG: data entry in a testing lab and all the steps needed to generate a report). That ignores email and the time savings once there is a good enough AI secretary.
AGI could eliminate almost all of that.
I'd estimate a 1.7x productivity gain for a lab I worked at previously. The effect on employment depends on demand elasticity, of course.
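To connect the >30% figure with the 1.7x estimate, here is the arithmetic under a simple Amdahl's-law-style assumption (mine, not a measured model): if a fraction f of labour hours is eliminated and the remaining work is unchanged, output per hour rises by 1/(1 - f). The function names are hypothetical.

```python
def productivity_multiplier(eliminated_fraction: float) -> float:
    """Output per labour hour when a fraction of hours is eliminated.

    Assumes (hypothetically) that the remaining work is unchanged and nothing
    else becomes the bottleneck, Amdahl's-law style.
    """
    if not 0.0 <= eliminated_fraction < 1.0:
        raise ValueError("eliminated_fraction must be in [0, 1)")
    return 1.0 / (1.0 - eliminated_fraction)


def implied_eliminated_fraction(multiplier: float) -> float:
    """Inverse: the fraction of hours that must go away to reach a multiplier."""
    if multiplier < 1.0:
        raise ValueError("multiplier must be >= 1")
    return 1.0 - 1.0 / multiplier


print(round(productivity_multiplier(0.30), 2))     # 1.43
print(round(implied_eliminated_fraction(1.7), 2))  # 0.41
```

Under this model, a 1.7x multiplier corresponds to eliminating roughly 41% of hours.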
Prices would adjust to match supply and demand, acting as signals of both supply cost and demand value. If no one buys the vampire drone, the supply side stops production and starts dropping the price to liquidate inventory, possibly with a liquidation auction.
Badly done dynamic pricing and auctions feel awful to market participants and can cause problems like the sniping seen in eBay auctions.
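Here is a minimal sketch of one possible liquidation mechanism: a descending-price (Dutch-style) clearance, where the seller keeps cutting the price until the inventory is gone. The buyer valuations, start price and step size are hypothetical inputs, and each buyer is assumed to want at most one unit.

```python
def descending_price_liquidation(
    inventory: int,
    valuations: list[float],
    start_price: float,
    step: float,
    floor: float = 0.0,
) -> tuple[list[float], int]:
    """Lower the price in fixed steps. At each price, sell one unit to every
    remaining buyer whose valuation is at least that price.

    Returns (prices paid, unsold units). Stops once the price drops below floor.
    """
    waiting = sorted(valuations, reverse=True)  # highest valuation first
    sales: list[float] = []
    price = start_price
    while inventory > 0 and price >= floor:
        while inventory > 0 and waiting and waiting[0] >= price:
            waiting.pop(0)
            sales.append(price)
            inventory -= 1
        price -= step
    return sales, inventory


sales, unsold = descending_price_liquidation(3, [50, 120, 80, 30], start_price=150, step=10)
print(sales, unsold)  # [120, 80, 50] 0
```

In this sketch, buyers purchase as soon as the price reaches their valuation. Real bidders might instead wait for a lower price and risk losing the item.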
In my opinion, this is a poor choice of problem for demonstrating the generator/predictor simplicity gap.
If not restricted to Markov model based predictors, we can do a lot better simplicity-wise.
A simple Bayesian predictor tracks one real-valued probability $B \in [0, 1]$, the probability of being in state B. The probability of state A is implicitly $A = 1 - B$.
This is initialized to $B = p/(p+q)$ as a prior, given the equilibrium probabilities of the A/B states after many time steps.
Our prediction is $P(\text{"1"}) = qB$, with $P(\text{"0"}) = 1 - P(\text{"1"})$ implicitly (a "1" is only emitted on the B→A transition).
Then update the usual Bayesian way:
if "1", $B := 0$ (known state transition to A);
if "0", $(A, B) := (A(1-p),\ Ap + B(1-q))$, then normalise by dividing both by their sum $1 - Bq$ (a standard Bayesian update discarding the falsified B→A state transition).
In one step after simplification: $B := \dfrac{B(p+q-1) - p}{Bq - 1} = \dfrac{p + B(1-p-q)}{1 - Bq}$.
That's a lot more practical than having infinite states. Achieving numerical stability and acceptable accuracy in a real, implementable predictor is straightforward but not trivial. A near-perfect predictor is only slightly larger than the generator.
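Here is a minimal Python sketch of this predictor. The generator it runs against is my reconstruction from the update rules, not something stated explicitly. State A always emits "0" and moves to B with probability p. State B emits "1" and moves to A with probability q; otherwise it stays in B and emits "0". Under that assumption the prediction is P("1") = q·B. The helper names are hypothetical. The sketch checks that the simplified one-step formula agrees with the propagate-then-normalise update.

```python
import random


def generate(p: float, q: float, n: int, seed: int = 0) -> list[int]:
    """Assumed generator: A -> B with prob p (emits 0), else stay in A (emits 0);
    B -> A with prob q (emits 1), else stay in B (emits 0)."""
    rng = random.Random(seed)
    in_b = rng.random() < p / (p + q)  # start from the equilibrium distribution
    out: list[int] = []
    for _ in range(n):
        if in_b and rng.random() < q:
            in_b = False  # B -> A is the only transition that emits "1"
            out.append(1)
        elif not in_b and rng.random() < p:
            in_b = True  # A -> B emits "0"
            out.append(0)
        else:
            out.append(0)  # stay in the current state, emit "0"
    return out


def update_two_step(b: float, symbol: int, p: float, q: float) -> float:
    """Bayesian update as described: propagate A, B, drop B -> A, normalise."""
    if symbol == 1:
        return 0.0  # a "1" means we just moved B -> A
    a = 1.0 - b
    new_a, new_b = a * (1 - p), a * p + b * (1 - q)
    return new_b / (new_a + new_b)  # the sum equals 1 - b*q


def update_closed(b: float, symbol: int, p: float, q: float) -> float:
    """Same update after simplification: B := (B(p+q-1) - p) / (Bq - 1)."""
    if symbol == 1:
        return 0.0
    return (b * (p + q - 1) - p) / (b * q - 1)


def run_predictor(seq: list[int], p: float, q: float) -> list[float]:
    """Predicted P("1") = q * B, made before seeing each symbol."""
    b = p / (p + q)  # equilibrium prior for state B
    preds: list[float] = []
    for symbol in seq:
        preds.append(q * b)
        b = update_closed(b, symbol, p, q)
    return preds


p, q = 0.3, 0.4
seq = generate(p, q, 200_000)
preds = run_predictor(seq, p, q)
# The mean predicted P("1") and the observed frequency of "1" should both be
# close to the stationary rate p*q/(p+q).
print(sum(preds) / len(preds), sum(seq) / len(seq))
```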
A perfect predictor can use 1 bit (have we ever observed a 1?) plus ceil(log2(n)) bits to count n, the number of zeroes observed in the current run, and from those calculate the exactly correct prediction. Technically, as n→infinity this requires infinitely many bits, but the scaling is logarithmic, so a practical predictor will never need more than ~500 bits given known physics.
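Here is a sketch of the count-based predictor in Python, using exact rational arithmetic. It assumes the generator implied by the update rules in the text. State A always emits "0" and moves to B with probability p. State B emits "1" and moves to A with probability q; otherwise it stays and emits "0". The function names are hypothetical. The belief is computed from only the flag and n, by summing over the step at which the hidden A→B transition happened. The test then compares it with the step-by-step Bayesian update. The counter uses n.bit_length() = ceil(log2(n+1)) bits.

```python
from fractions import Fraction


def bayes_step(b: Fraction, symbol: int, p: Fraction, q: Fraction) -> Fraction:
    """One Bayesian belief update: B := (B(p+q-1) - p) / (Bq - 1)."""
    if symbol == 1:
        return Fraction(0)
    return (b * (p + q - 1) - p) / (b * q - 1)


def belief_from_count(seen_one: bool, n: int, p: Fraction, q: Fraction) -> Fraction:
    """P(state B) from only (seen_one, n), where n counts zeros since the last
    "1" (or since the start if no "1" has been seen yet)."""
    # Weights of A and B at the start of the current run of zeros.
    if seen_one:
        a0, b0 = Fraction(1), Fraction(0)  # a "1" always lands us in A
    else:
        a0, b0 = q / (p + q), p / (p + q)  # equilibrium prior
    u_a = a0 * (1 - p) ** n  # stayed in A for all n steps
    u_b = b0 * (1 - q) ** n  # was in B throughout without emitting "1"
    for k in range(1, n + 1):  # moved A -> B at step k, then stayed in B
        u_b += a0 * (1 - p) ** (k - 1) * p * (1 - q) ** (n - k)
    return u_b / (u_a + u_b)


def counter_bits(n: int) -> int:
    """Memory for the count-based predictor: 1 flag bit plus a binary counter."""
    return 1 + max(1, n.bit_length())


p, q = Fraction(3, 10), Fraction(2, 5)
print(belief_from_count(True, 1, p, q))  # 3/10: one "0" after a "1" gives P(B) = p
print(counter_bits(1000))  # 11
```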
TLDR: I got stuck on the notation [a][b][c][...]→f(a,b,c,...). LLMs probably won't do much better on that for now. Translate it into "find an unknown f(*args)" and the LLMs get it right with probability ~20%, depending on the model. o3-mini-high does better. Sonnet 3.7 did get it in one shot, but I had it write code for doing the substitutions, which it otherwise messes up a lot.
Like others, I looked for some sort of binary operator or concatenation rule. Replacing "][" with "|" or "," would have made this trivial. Straight string substitutions don't work, since "[[]]" can be either 2 or "[...][1][...]" as part of a prime exponent set. The notation is the problem. In hindsight, staring at string diffs might have helped.
Turning this into an unknown-f() puzzle makes it straightforward for LLMs (and humans) to solve.
1 = f()
2 = f(f())
3 = f(0,f())
4 = f(f(f()))
12 = f(f(f()),f())
0 = 0
-1 = -f()
19 = f(0,0,0,0,0,0,0,f())
20 = f(f(f()),0,f())
-2 = -f(f())
1/2 = f(-f())
sqrt(2) = f(f(-f()))
72^1/6 = f(f(-f()),f(0,-f()))
5/4 = f(-f(f()),0,f())
84 = f(f(f()),f(),0,f())
25/24 = f(-f(0,f()),-f(),f(f()))
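To check the prime-exponent reading against the table, here is a small Python sketch. It encodes the rule f(a, b, c, ...) = 2^a · 3^b · 5^c · ... over successive primes, with f() = 1 as the empty product. That rule is my reading of the "prime exponent set" hint, and the helper names are hypothetical. Floats are used because some exponents are fractional (e.g. sqrt(2)). The trivial 0 = 0 row is left out.

```python
import math


def first_primes(k: int) -> list[int]:
    """Return the first k primes by trial division (fine for small k)."""
    found: list[int] = []
    candidate = 2
    while len(found) < k:
        if all(candidate % p for p in found if p * p <= candidate):
            found.append(candidate)
        candidate += 1
    return found


def f(*args: float) -> float:
    """Hypothesised rule: f(a, b, c, ...) = 2**a * 3**b * 5**c * ..."""
    result = 1.0  # f() is the empty product
    for prime, exponent in zip(first_primes(len(args)), args):
        result *= prime ** exponent
    return result


# Right-hand sides copied from the table, paired with the claimed values.
TABLE = [
    (1, lambda: f()),
    (2, lambda: f(f())),
    (3, lambda: f(0, f())),
    (4, lambda: f(f(f()))),
    (12, lambda: f(f(f()), f())),
    (-1, lambda: -f()),
    (19, lambda: f(0, 0, 0, 0, 0, 0, 0, f())),
    (20, lambda: f(f(f()), 0, f())),
    (-2, lambda: -f(f())),
    (1 / 2, lambda: f(-f())),
    (math.sqrt(2), lambda: f(f(-f()))),
    (72 ** (1 / 6), lambda: f(f(-f()), f(0, -f()))),
    (5 / 4, lambda: f(-f(f()), 0, f())),
    (84, lambda: f(f(f()), f(), 0, f())),
    (25 / 24, lambda: f(-f(0, f()), -f(), f(f()))),
]

for expected, expr in TABLE:
    print(expected, expr(), math.isclose(expr(), expected))
```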
Substitutions are then quite easy, though most LLMs botch a substitution somewhere unless they use code to do string replacements, or think long enough to eventually catch their mistake.
Then it's ~25% likely they get it in one shot, and ~100% if you mention that primes are involved or that addition isn't. It depends on the LLM. o3-mini-high got it. Claude 3.7 got it in one shot with no hints from a fully substituted starting point, but that was best of k≈4, with lots of failures otherwise. Models have strong priors for addition as a primitive and definitely don't approach things systematically. Suggesting they focus on single-operand evaluations (2, 4, 1/2, sqrt(2)) gets them on the right track, but there's still a bias towards addition.
None of the labs would be doing undirected drift. That wouldn't yield improvement for exactly the reasons you suggest.
In the absence of a ground-truth quality/correctness signal, optimizing for coherence works. This can give prettier answers (in the way that averaged faces are prettier), but it is limited. The inference-time scaling equivalent would be a branching sampling approach that searches for especially preferred token sequences, rather than the current greedy sampling approach. Optimising for idea-level coherence can improve model thinking to some extent.
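Beam search is one standard example of this kind of branching search. That choice is mine for illustration; the text doesn't say which method labs would use. The toy next-token table below is made up. It is built so that greedy decoding commits to a locally likely first token and ends up with a less probable full sequence than the one beam search finds.

```python
import math

# Hypothetical toy "language model": P(next token | previous token).
TOY_MODEL: dict[str, dict[str, float]] = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.4, "y": 0.3, "</s>": 0.3},
    "b": {"</s>": 0.9, "x": 0.1},
    "x": {"</s>": 1.0},
    "y": {"</s>": 1.0},
}


def greedy_decode(model, max_len: int = 10) -> tuple[list[str], float]:
    """Always take the single most probable next token."""
    seq, logp = ["<s>"], 0.0
    for _ in range(max_len):
        if seq[-1] == "</s>":
            break
        tok, prob = max(model[seq[-1]].items(), key=lambda kv: kv[1])
        seq.append(tok)
        logp += math.log(prob)
    return seq, logp


def beam_search(model, beam_width: int = 2, max_len: int = 10) -> tuple[list[str], float]:
    """Keep the beam_width best partial sequences by total log-probability."""
    beams: list[tuple[list[str], float]] = [(["<s>"], 0.0)]
    finished: list[tuple[list[str], float]] = []
    for _ in range(max_len):
        candidates = [
            (seq + [tok], logp + math.log(prob))
            for seq, logp in beams
            for tok, prob in model[seq[-1]].items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, logp in candidates[:beam_width]:
            # Completed sequences leave the beam; the rest keep expanding.
            (finished if seq[-1] == "</s>" else beams).append((seq, logp))
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[1])


for name, (seq, logp) in [("greedy", greedy_decode(TOY_MODEL)), ("beam", beam_search(TOY_MODEL))]:
    print(name, " ".join(seq), round(math.exp(logp), 2))
# greedy <s> a x </s> 0.24
# beam <s> b </s> 0.36
```

With beam_width=1 this reduces to greedy decoding. Wider beams trade more compute for a better chance of finding the preferred sequence.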
For improving raw intelligence significantly, ground truth is necessary. That's available in STEM domains, with computer programming tasks being the most accessible. One can imagine grounding hard engineering the same way with a good mechanical/electrical simulation package. TLDR: train for test-time performance.
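Here is a minimal sketch of what a ground-truth signal looks like for programming tasks: score a candidate by the fraction of test cases it passes. The task, the test cases and the function name are hypothetical. A real setup would also need sandboxing and timeouts, which this sketch leaves out.

```python
from typing import Callable


def unit_test_reward(candidate: Callable[[int], int], cases: list[tuple[int, int]]) -> float:
    """Fraction of (input, expected output) cases the candidate gets right.

    Exceptions count as failures. The candidate runs directly, unsandboxed.
    """
    passed = 0
    for arg, expected in cases:
        try:
            if candidate(arg) == expected:
                passed += 1
        except Exception:
            pass  # a crash is just a failed case
    return passed / len(cases)


# Hypothetical task: "return the square of x".
CASES = [(0, 0), (1, 1), (2, 4), (3, 9)]
print(unit_test_reward(lambda x: x * x, CASES))   # 1.0
print(unit_test_reward(lambda x: x + x, CASES))   # 0.5
print(unit_test_reward(lambda x: 1 // x, CASES))  # 0.25 (x=0 raises ZeroDivisionError)
```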
Then just cross your fingers and hope for transfer learning into softer domains.
For softer domains, ground truth is still accessible via tests on humans (EG: optimise for user approval). This will eventually yield super-persuaders that get thumbs up from users. Persuasion performance is trainable, but maybe not a wise thing to train for.
As for actually improving a soft-domain skill like "write better English prose", that's not easy to optimise directly, as you've observed.
o1 now passes the simpler "over yellow" test from above. It still fails the picture book example, though.
For a complex mechanical drawing, o1 was able to work out the easier dimensions, but anything more complicated tends to fail. Perhaps the full o3 will do better, given its ARC-AGI benchmark performance.
Meanwhile, Claude 3.5 and 4o fail somewhat worse, failing to correctly identify axial and diameter dimensions.
Visuospatial performance is improving albeit slowly.
My hope is that the minimum viable pivotal act requires only near-human AGI. For example, hack competitor training/inference clusters to fake an AI winter.
Aligning a +2SD human-equivalent AGI seems more tractable than straight up FOOMing to ASI safely.
One lab does it to buy time for actual safety work.
Unless things slow down massively we probably die. An international agreement would be better but seems unlikely.
General remarks
The semiconductor industry can afford to bid quite high to get the supply it needs. A relevant historical example is the neon shortage: the Russian invasion of Ukraine disrupted large air liquefaction/separation plants associated with Ukrainian steelworks, and neon production dropped. The free market did its thing (recycling, alternate suppliers, etc.) and nothing really happened.
Threatening to restrict critical materials matters very little for commodities like rare earths or high-purity silica. Process equipment, like lithography machines from ASML or other tools from Applied Materials, is actually needed and can't be replaced, but high-purity ... stuff ... can be substituted, smuggled, whatever, given the need. Industry mostly won't care, and the CN government can pour in more money to compensate.
Semiconductor companies won't feel rare earth embargo
Rare earth consumption in the semiconductor industry is quite small, and chipmakers can outbid everyone else to secure limited supply in case of an embargo. This mainly hurts EV makers and others, not semiconductors. It's similar to jet engine turbine manufacturers not caring about cobalt prices for turbine blades, in contrast again to electric vehicles, where cobalt prices and scarcity drove R&D aimed at using cheaper materials in batteries.
Retaliation by the USA through restricting supplies of high-purity quartz is also ineffective.
The main quartz consumables in the semiconductor industry are Cz (Czochralski) crucibles for growing silicon boules.
Those might already be recycled to recover most of the quartz, and if not, that's relatively straightforward to start doing in a supply crunch.
Their mass is low compared to the polysilicon feedstock used to grow the wafers themselves (<5%?), so tapping the silicon purification supply chain is guaranteed to be enough to make crucibles.

High-purity TCS (trichlorosilane) or silane can be diverted from conversion into polysilicon for wafers and instead be made into fumed silica. This is already done for higher-purity fused quartz parts like photomask substrates. The process equipment is easy to make and can be rushed in a supply crunch.
There are also facilities that grow quartz crystals for oscillators. I'm not sure about tonnage, but growth is surface-area limited. Making sand instead of larger crystals would perhaps increase the deposition rate 10x and cut the cycle time from the current few months (for larger crystals) down to weeks.
There are limited supplies in friendly countries (Russia) and domestically.
The free market would find whatever works to get silica meeting purity standards. It's easy for crucible manufacturers to test purity.
Not something that halts production, but definitely an annoyance.
Relevant Claude chat.
https://claude.ai/share/fa3c6de1-bdfb-4974-8e52-1324da3ae399