Self modelling in NN https://arxiv.org/pdf/2407.10188 Is this good news for mech interpretability? If the model makes itself easily predictable, then that really seems to limit the possibilities for deceptive alignment.
It makes it easier, but consider this: The human brain also does this—when we conform to expectations, we make ourselves more predictable and model ourselves. But this also doesn't prevent deception. People still lie and some of the deception is pushed into the subconscious.
Sure it doesn't prevent a deceptive model being made, but if AI engineers made NN with such self awareness at all levels from the ground up, that wouldn't happen in their models. The encouraging thing if it holds up is that there is little to no "alignment tax" to make the models understandable - they are also better.
Technology is about making boring stuff non-conscious. Beginning from basic physical movement such as making a wheel go round, to arithmetic and now code snippets that are so commonly used they shouldn't require re-thinking. This is a reason why AI art upsets people - we actually want that to be the result of a conscious process. If you make boring stuff that creates power or wealth non-conscious then everyone is happier. Meat production would be much better if it was non-conscious. The more AI is non-conscious for a given level of capability, the better off we are.
This is a reason why AI art upsets people - we actually want that to be the result of a conscious process.
I agree with the general argument that making boring stuff non-conscious is a good thing. But in the case of art, I think the underlying problem is that people want art to be high-status.
From my perspective, the process of creating a piece of art has many steps, and some of them can legitimately be called boring. The line is not clear -- the same step can be interesting when you do it for the first time, and boring when you later repeat it over and over again; or interesting when you introduce some unexpected change on purpose, and boring when you just want to do the usual. So we could use the AI to automate the steps that are no longer interesting for us, and focus on the rest. (Though some people will simply click "generate everything".)
Consider how much time the painters living centuries ago spent preparing their colors, and learning how to prepare those colors well -- and today, painters can simply buy the colors in a supermarket. But (as far as I know) no one cries that selling colors in supermarkets has ruined the art. That's because colors are not considered mysterious anymore, and therefore they are not high-status, so no one cares whether we automate this step.
Now imagine a hypothetical painting tool that automatically fixes all your mistakes in perspective, and does nothing else. You keep painting, and whenever you complete a shape, it is magically rotated and skewed to make the perspective consistent with the rest of the picture. (Unless you want to have the shape different on purpose; in that case the tool magically understands this and leaves the shape alone. Or you simply press the "undo" button.) This would be somewhat controversial. Some people would be okay with it. There might already be a plugin in some vector editor that helps you achieve this; and if that fact becomes known, most people won't care.
But some people would grumble that if you can't get the perspective right on your own, perhaps you don't deserve to be a painter! I find this intuition stronger when I imagine a literally magical tool that transforms a physical painting this way (only fixes the perspective, nothing else). The painter who uses vector graphics at least needs some computer skills to compensate for being bad at perspective, but having the perspective fixed literally auto-magically is just plain cheating.
Which suggests that an important part of our feelings about art is judging the artist's talent and effort; assigning status to the artist... but also to ourselves as connoisseurs of the art! Some people derive a lot of pleasure from feeling superior to those who have less knowledge about art. And this is the part that might go away with AI art. (Unless we start discussing the best prompts and hyperparameters instead.)
Thanks, good detail. I am not good at traditional art, but I am interested in using maths to create a shape that is almost impossible for a traditional sculptor to create, then 3D printing it.
How does intelligence scale with processing power
A default position is that exponentially more processing power is needed for a constant increase in intelligence.
To start, let's assume a guided intuition + search model for intelligence. That is like Chess or Go, where you have an evaluation module and a search module. In simple situations, an exponential increase in processing power usually gives a linear increase in lookahead depth and in rating (Elo) for games measured that way.
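A minimal sketch of that relationship, assuming a plain exhaustive search with a fixed branching factor (the figure of 30 moves per position and the node budgets are made-up illustrative numbers, not measurements from any engine):

```python
import math

def lookahead_depth(node_budget: float, branching_factor: float) -> float:
    """Depth d reachable when branching_factor ** d nodes fit in the budget."""
    return math.log(node_budget) / math.log(branching_factor)

# Hypothetical game with ~30 legal moves per position.
# Each 1000x increase in budget adds the same number of plies.
for budget in (1e6, 1e9, 1e12):
    depth = lookahead_depth(budget, 30)
    print(f"{budget:.0e} nodes -> depth {depth:.2f}")
```

Real engines prune heavily, so the effective branching factor is much lower than the raw move count, but the logarithmic shape is the same.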
However, does this match reality?
What if, the longer the time horizon, the bigger the board became or the more complexity was introduced? For board games there is usually a roughly constant number of possibilities to search at every ply of lookahead depth. However, I think you can argue that in reality the search space should grow with time or lookahead steps. That is, as you look further ahead, possibilities you didn't have to consider before now enter the search.
For a real-world example, consider predicting the price of a house. As the timeframe goes from <5 years to >5 years, there are new factors to consider, e.g. changing govt policy, unexpected changes in transport patterns (new rail nearby or in a competing suburb, etc.), and demographic changes.
In situations like these, the processing required for a constant increase in ability could go up faster than exponentially. For example, looking 2 steps ahead with 2 possibilities at each step costs 2^2, but looking 4 steps ahead might cost 3^4 rather than 2^4, because there are now 3 rather than 2 things that can affect the result over 4 steps.
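To make the difference concrete, here is a small sketch comparing a fixed branching factor with one that grows with the horizon. The growth rule `1 + depth // 2` is purely an assumption, chosen so that it reproduces the 2^2 and 3^4 figures above:

```python
def search_cost(depth: int, branching_factor: int) -> int:
    """Number of leaf positions for a full search to the given depth."""
    return branching_factor ** depth

depths = (2, 4, 6)

# Board-game style: the same 2 options at every step.
fixed = [search_cost(d, 2) for d in depths]

# "Real world" style (assumed rule): one extra factor enters the search
# for every 2 additional steps of lookahead.
growing = [search_cost(d, 1 + d // 2) for d in depths]

print(fixed)    # [4, 16, 64]
print(growing)  # [4, 81, 4096]
```

With the fixed factor, each extra 2 steps multiplies the cost by 4; with the growing factor, the multiplier itself keeps increasing.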
How does this affect engineering of new systems
If this applies to engineering, then actual physical data will be very valuable for shrinking the search space. (Well, that applies if it just goes up exponentially as well.) That is, if you can measure the desired situation or new device state at step 10 of a 20-stage process, then you can hugely reduce the search space, as you can eliminate many possibilities. Zero-shot is hard unless you can really keep the system in situations where there are no additional effects coming in.
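A rough sketch of how much a mid-process measurement could save, assuming (optimistically) that the measurement pins down the state exactly, so the search splits into two independent halves. The 3 options per stage is a made-up number:

```python
def full_search(stages: int, options_per_stage: int) -> int:
    """Candidates to consider with no intermediate measurement."""
    return options_per_stage ** stages

def split_search(stages: int, options_per_stage: int, checkpoint: int) -> int:
    """Candidates when the state at `checkpoint` is measured exactly
    (assumption), so each half can be searched on its own."""
    before = options_per_stage ** checkpoint
    after = options_per_stage ** (stages - checkpoint)
    return before + after

full = full_search(20, 3)
split = split_search(20, 3, checkpoint=10)
print(full, split, full // split)
```

A noisy or partial measurement would save less than this, but the saving is still multiplicative in the number of possibilities it rules out.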
AI models, regulations, deployments, expectations
For a simple evaluation/search model of intelligence, with just one model being used for the evaluation, improvements can be made by continually improving the evaluation model (same size better performance/same performance, smaller size). Models that produce fewer bad "candidate ideas" can be chosen, with the search itself providing feedback on what ideas had potential. In this model there is no take-off or overhang to speak of.
However I expect a TAI system to be more complicated.
I can imagine an overseer model that decides what more specialist models to use. There is a difficulty knowing what model/field of expertise to use for a given goal. Existing regulations don't really cover these systems, the setup where you train a model, fine tune, test then release doesn't apply strictly here. You release a set of models, and they continually improve themselves. This is a lot more like people where you continually learn.
Overhang
In this situation you get take-off or overhang when a new model architecture is introduced, rather than from the steady improvement of deployed systems of models. It's clear to me that the current model architectures, and hence scaling laws, are not near the theoretical maximum. For example, the training data needed for Tesla Autopilot is ~10K times more than what a human needs, and the result is not superhuman. In terms of risk, it's new model architectures (and evidence of very different scaling laws) rather than training FLOPs that would matter.
I think an often overlooked facet of this is that high fluid intelligence leads to higher crystallized intelligence.
I.e., the more and better you think, the more and better crystallized algorithms you can learn, and, unlike the short-term benefits of fluid intelligence, the long-term benefits of crystallized intelligence are compounding.
To find a new, better strategy linearly faster, you need an exponential increase in processing power, but each strategy you find and memorize saves you an exponential expenditure of processing power in the future.
Evaluation vs Symbolism
TLDR
Thinking about the Busy Beaver numbers has led me to believe that a theorem holding true for a massive number of evaluated examples is only weak evidence that it is actually true. Can we go meta on this?
Main
This came from reading a post by Scott Aaronson, and from coming across https://en.wikipedia.org/wiki/Prime_number_theorem and Littlewood's theorem:
"Li(𝑥) overestimates the number of primes below x more often than not, especially as x grows large. However, there are known low-lying values (like around x=10^316, discovered by Littlewood) where 𝜋(𝑥) exceeds Li(x), contradicting the general trend."
This got me thinking about how common this kind of thing is and why? Why does a formula hold all the way up to 10^316 but then fail?
The essence of Busy Beaver numbers is that there are sequences based on a simple formula/data that go on for a very long time and then just stop unpredictably. You can imagine replacing a simple formula with a simple theorem that appears to be true. Instead of actually being true, it is a way of encoding its very large counterexample in a short amount of data.
If you think of it this way, a theorem that appears to be true and is evaluated over trillions of numbers is also a candidate for encoding an exception at some very large number. In other words, trillions of correct examples are only weak evidence of its correctness.
How much should we weight evaluation? We can't evaluate to infinity, and it's obvious that a theorem being true up to 2 million is not twice the evidence of it being true up to 1 million. Should we choose log(n)? A clear scale is the BB numbers themselves, e.g. if your theorem is true up to BB(5) then that is 5 data points, rather than 47 million. Unlimited evaluation can never get to BB(6) so that is the limit of evidence from evaluation. (i.e. 5-6 evidence points, with it being unclear how to weigh theory: https://www.lesswrong.com/posts/MwQRucYo6BZZwjKE7/einstein-s-arrogance)
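The two candidate scales can be compared with a short sketch. The table uses the known maximum step counts for 1 to 5 states; the function names are hypothetical, and treating "number of BB thresholds passed" as evidence points is just the proposal above written out, not an established measure:

```python
import math

# Known Busy Beaver step counts for n-state, 2-symbol machines.
BB_STEPS = {1: 1, 2: 6, 3: 21, 4: 107, 5: 47_176_870}

def bb_evidence_points(checked_up_to: int) -> int:
    """Largest n in the table with BB(n) <= checked_up_to (0 if none)."""
    return max((n for n, steps in BB_STEPS.items() if steps <= checked_up_to),
               default=0)

def log_evidence_points(checked_up_to: int) -> float:
    """Alternative scale: base-10 logarithm of the range checked."""
    return math.log10(checked_up_to)

for limit in (10**3, 10**6, 10**9, 10**12):
    print(limit, bb_evidence_points(limit), round(log_evidence_points(limit), 1))
```

On the BB scale, checking up to 10^12 is worth no more than checking up to 47,176,870, whereas the log scale keeps growing slowly forever.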
Now can we go meta?
Is some maths so much more powerful than others that it has equivalently greater weight, as formal proof has relative to evaluation? Certainly some maths is more general than others. How does this affect common problems such as the Riemann Hypothesis? Proving or disproving it affects a lot of maths. Showing it is correct for trillions of zeros, however, is little evidence.
"Most mathematicians tend to believe that the Riemann Hypothesis is true, based on the weight of numerical evidence and its deep integration into existing mathematical frameworks."
Is "deep integration" actually that deep, or is it the symbolic equivalent of evaluating up to 1 million? Perhaps just as you can find countless evaluated examples supporting a false theorem you can find much "deep integration" in favor of a famous theorem that could also be incorrect.
Further thoughts and links
Most people think P != NP, but what if
P = NP where N ~ BB(10)?
Proof was wrong - https://www.quantamagazine.org/mathematicians-prove-hawking-wrong-about-extremal-black-holes-20240821/
Related thoughts
Conservation of energy is a more general rule that rules out perpetual motion machines
2nd law of thermodynamics - likewise, HOWEVER that law must have been broken somehow to get a low entropy initial state for the Big Bang.
AI examples
1 The Pólya Conjecture
Proposed by George Pólya in 1919, this conjecture relates to the distribution of prime factors. It posited that for any number 𝑥, the majority of the numbers less than 𝑥 have an odd number of prime factors. It was verified for numbers up to 1,500,000, but a counterexample was found when x was around 906 million. This shows a fascinating case where numerical verification up to a large number was still not sufficient.
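For a feel of what "verified up to N" means here, the conjecture is equivalent to saying the summatory Liouville function L(x) (the sum of (-1)^Omega(k) for k up to x, where Omega counts prime factors with multiplicity) is never positive for x >= 2. A minimal sketch that checks this for a small, arbitrarily chosen limit:

```python
def liouville_partial_sums(limit):
    """Return sums where sums[n] = L(n) = sum of (-1)**Omega(k), 1 <= k <= n."""
    # Sieve for the smallest prime factor of every number up to limit.
    spf = list(range(limit + 1))
    for p in range(2, int(limit ** 0.5) + 1):
        if spf[p] == p:  # p is prime
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p

    omega = [0] * (limit + 1)  # prime factors counted with multiplicity
    sums = [0] * (limit + 1)
    running = 0
    for n in range(1, limit + 1):
        if n > 1:
            omega[n] = omega[n // spf[n]] + 1
        running += -1 if omega[n] % 2 else 1
        sums[n] = running
    return sums

LIMIT = 100_000  # arbitrary small bound for illustration
sums = liouville_partial_sums(LIMIT)
holds = all(value <= 0 for value in sums[2:])
print(f"Polya's inequality holds for 2 <= x <= {LIMIT}: {holds}")
```

Every check of this kind passes until x reaches roughly 906 million, which is the point of the example.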
2 Mertens Conjecture
The Mertens conjecture suggested that the absolute value of the Mertens function M(x) is always less than sqrt(x).
This was proven false in 1985 by Andrew Odlyzko and Herman te Riele. Their disproof was indirect: no explicit counterexample is known, only that the first one must lie above 10^14.
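The same kind of small-scale check can be sketched for the Mertens function, M(x) = sum of mu(k) for k up to x, where mu is the Möbius function. The limit is again an arbitrary small number, far below where any failure could occur:

```python
import math

def mertens_values(limit):
    """Return M where M[n] is the Mertens function at n, for 0 <= n <= limit."""
    mu = [1] * (limit + 1)
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            # Flip the sign once for every distinct prime factor.
            for m in range(p, limit + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]
            # Numbers divisible by a square have mu = 0.
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0
    M = [0] * (limit + 1)
    for n in range(1, limit + 1):
        M[n] = M[n - 1] + mu[n]
    return M

LIMIT = 10_000  # arbitrary small bound for illustration
M = mertens_values(LIMIT)
bound_holds = all(abs(M[n]) < math.sqrt(n) for n in range(2, LIMIT + 1))
print(f"|M(x)| < sqrt(x) for 2 <= x <= {LIMIT}: {bound_holds}")
```

No amount of running this at larger limits has ever turned up a violation, yet the conjecture is false.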
Unlimited evaluation can never get to BB(6) so that is the limit of evidence from evaluation.
The value of BB(6) is not currently known, but it could in principle be discovered. There is no general algorithm for calculating BB numbers, but any particular BB(n) could be determined by enumerating all n-state Turing machines and proving whether each one halts.
According to Scott, "Pavel Kropitz discovered, a couple years ago, that BB(6) is at least 10^10^10^10^10^10^10^10^10^10^10^10^10^10^10 (i.e., 10 raised to itself 15 times)."
So we can never evaluate a theorem up to BB(6), as it is at least this large.
Random ideas to expand on
https://newatlas.com/computers/human-brain-chip-ai/
https://newatlas.com/computers/cortical-labs-dishbrain-ethics/
Could this be cheaper than chips in an extreme silicon shortage? How did it learn? Can we map connections forming and make better learning algorithms?
Birds vs ants/bees.
A flock of birds can be dumber than the dumbest individual bird; a colony of bees/ants can be smarter than the individual, and smarter than a flock of birds! Birds avoiding a predator in a geometrical pattern: no intelligence, as predictability like a fluid has no processing. Vs bees swarming the scout hornet, or ants building a bridge, etc. Even though there is no planning in ants, is there an overall plan in individual neurons?
The more complex pieces the less well they fit together. Less intelligent units can form a better collective in this instance. Not like human orgs.
Progression from simple cell to mitochondria - mito have no say anymore but fit in perfectly. Multi organism like hive are next level up - simpler creatures can have more cohesion in upper level. Humans have more effective institutions in spite of complexity b/c of consciousness, language etc.
RISC vs CISC, Intel vs NVIDIA, GPU for super computers. I thought about this years ago; it led to the prediction that Intel or other CISC max businesses would lose to cheaper.
Time to communicate a positive singularity/utopia
Spheres of influence, like we already have: uncontacted tribes, Amish etc. Taking that further, Super AI must leave earth, perhaps the solar system; enhanced ppl too, out of the earth eco-system, to space colonies, or Mars etc.
Take the best/happy nature to expand, don't take suffering to >million stars.
Humans can't do interstellar faster than AI anyway even if that was the goal, it would have to prepare it first, and can travel faster. So no question majority of humanity interstellar is AI. Need to keep earth for people. What is max CEV? Well keep earth ecosystem, humans can progress, discover on their own?
Is the progression to go outwards: human, posthuman/Neuralink, WBE? It is in some sci-fi, e.g. Peter Hamilton / Culture (human to WBE).
Long term all moral systems don't know what to say on pleasure vs self determination/achievement. Eventually we run out of inventing things - should it go asymptotically slower.
Explorers should be on the edge of civilization. Astronomers shouldn't celebrate JWST but complain about Starlink - that is inconsistent. The edge of civilization has expanded past low earth orbit; that is why we get JWST. Obligation then to put telescopes further out.
Go to WBE instead of super AI - know for sure it is conscious.
Is industry, tech about making stuff less conscious with time? e.g. mechanical things have zero, vs a lot when done by people. Is that a principle for AI/robots? then there are no slaves etc.
Can ppl get behind this? - implied contract with future AI? acausal bargaining.
https://www.lesswrong.com/posts/qZJBighPrnv9bSqTZ/31-laws-of-fun
Turing test for WBE - how would you know?
Intelligence processing vs time
For search, exponential processing power gives a linear increase in rating: Chess, Go. However this is a small search space. For life, does the search get bigger the further out you go?
e.g. 2 steps is 2^2 but 4 steps is 4^4. This makes sense if there are more things to consider the further ahead you look. e.g. house price for 1 month, general market, + economic trend. 10+ years then demographic trends, changing govt policy, unexpected changes in transport patterns, (new rail nearby or in competing suburb etc)
If applies to tech, then regular experiments shrink the search space, need physical experimentation to get ahead.
For AI, if it's like intuition/search then you need search to improve intuition. Can only learn from long term.
Long pause or not?
How long should we pause? 10 years? Even in a stable society there are diminishing returns - we have seen this with pure maths, physics, philosophy: when we reach human limits, more time simply doesn't help. Reasonable to assume the same with a CEV-like concept also.
Pause carries danger? Is it like the clear pond before a rapid, are we already in the rapid, then trying to stop is dangerous having baby is fatal etc. "Emmett Shear" of go fast slow, stop, pause, Singularity seems ideal, though possible? WBE better than super AI - cultural as elder?
1984 quote “If you want a vision of the future, imagine a boot stamping on a human face--forever.”
"Heaven is high and the emperor is far away" is a Chinese proverb thought to have originated from Zhejiang during the Yuan dynasty.
Not possible earlier but is possible now. If democracies go to dictatorship but not back then pause is bad. Best way to keep democracies is to leave hence space colonies. Now in Xinjiang, the emperor is in your pocket, LLM can understand anything - how far back to go before this is not possible? 20 years, if not possible, then we are in the white water, and we need to paddle forwards, can't stop.
Deep time breaks all common ethics?
Utility monster, experience machine, moral realism tiling the universe etc. Self determination and achievement will be in the extreme minority over many years. What to do, fake it forget it and keep achieving again? Just keep options open until we actually experience it.
All our training is about intrinsic motivation and valuing achievement rather than pleasure for its own sake. Great asymmetry in common thought: "meaningless pleasure" makes sense and seems bad or not good, but "meaningless pain" doesn't make it less bad. Why should that be the case? Has evolution biased us to not value pleasure or experience it as much as we "should"? Should we learn to take pleasure, and regard thinking of it as "meaningless pleasure" as itself a defective attitude? If you could change yourself, should you dial down the need to achieve if you lived in a solved world?
What is "should" in is-ought. Moral realism in the limit? "Should" is us not trusting our reason, as we shouldn't. If reason says one thing, then it could be flawed as it is in most cases. Especially as we evolved, then if we always trusted it, then mistakes are bigger than benefits, so the feeling "you don't do what you should" is two systems competing, intuition/history vs new rational.
Is X.AI currently performing the largest training run?
This source claims it is
If so it seems to be getting a lot less attention compared to its compute capability.
Not sure if I have stated this clearly before, but I believe scaling laws will not hold for LLM/Transformer type tech, and at least one major architectural advance is missing before AGI. That is, increasing the scale of compute and data will plateau performance soon, and before AGI. Therefore I expect to see evidence for this not much after the end of this year, when large training runs yield models that are a lot more expensive to train, slower on inference and only a little better on performance. X.AI could be one of the first to let this be publicly known (OpenAI etc. could very well be aware of this but not making it public).
Completion of the 100K H100s cluster seems to mean Grok-3 won't be trained only on a smaller part of it, so it must be targeting all of it. But also Musk said Grok-3 is planned for end of 2024. So it won't get more than about 2.7e26 FLOPs, about 14x GPT-4 (the training that started end of July could have just used a larger mini-batch size that anticipates the data parallelism needs of the larger cluster, so the same run could continue all the way from July to November). With 6 months of training on the whole cluster, it could instead get up to 5e26 FLOPs (25x GPT-4), but that needs to wait for another run.
OpenAI is plausibly training on Microsoft's 100K H100s cluster since May, but there are also claims of the first run only using 10x GPT-4 compute, which is 2e26 FLOPs, so it'd take only 2-3 months and pretraining should've concluded by now. Additionally, it's probably using synthetic data at scale in pretraining, so if that has an effect, Grok-3's hypothetically similar compute won't be sufficient to match the result.
On the other hand, with about 20K H100s, which is the scale that was offered at AWS in July 2023 and might've been available at Microsoft internally even earlier, it only takes 5 months to get 1e26 FLOPs. So GPT-4o might already be a 5x GPT-4 model. But it also could be an overtrained model (to get better inference efficiency), so not expected to be fundamentally much smarter.
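The arithmetic behind estimates like these can be sketched as GPUs x peak FLOP/s x utilization x time. The 1e15 FLOP/s BF16 figure for an H100 and the 2e25 FLOPs for GPT-4 (implied by "10x GPT-4 compute, which is 2e26 FLOPs") come from this thread; the 40% utilization and the 150-day reading of "5 months" are my assumptions, and the other figures above may rest on somewhat different ones:

```python
SECONDS_PER_DAY = 86_400
H100_BF16_FLOPS = 1e15   # dense BF16 peak per GPU, as quoted in the thread
GPT4_FLOPS = 2e25        # implied by "10x GPT-4 ... is 2e26 FLOPs"
UTILIZATION = 0.4        # ASSUMPTION: fraction of peak actually achieved

def training_flops(n_gpus: int, days: float,
                   peak_flops: float = H100_BF16_FLOPS,
                   utilization: float = UTILIZATION) -> float:
    """Total training compute for a run of the given size and length."""
    return n_gpus * peak_flops * utilization * days * SECONDS_PER_DAY

# 20K H100s for about 5 months (taken as 150 days).
run = training_flops(20_000, days=150)
print(f"{run:.2e} FLOPs, about {run / GPT4_FLOPS:.1f}x GPT-4")
```

Changing `utilization` or `days` shifts the result proportionally, which is why estimates for the same cluster can differ by tens of percent.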
Google has very large datacenters, if measured in megawatts, but they are filled with older TPUs. Maybe they are fine compared to H100s on FLOP/joule basis though? In BF16, A100 (0.3e15 FLOP/s, 400W) to H100 (1e15 FLOP/s, 700W) to B100 (1.8e15 FLOP/s, 700W) notably improve FLOP/joule, but for recent TPUs TDP is not disclosed (and the corresponding fraction of the rest of the datacenter needs to be taken into account, for example it turns 700W of an H100 into about 1500W). In terms of FLOPS/GPU, only the latest generation announced in May 2024 matches H100s, it might take time to install enough of them.
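The FLOP/joule comparison for the GPU figures quoted above is a one-line division per chip. This sketch uses only the numbers in this comment, including the rough 1500W all-in figure for an H100 once the rest of the datacenter is counted:

```python
# (peak BF16 FLOP/s, chip power in watts), as quoted in the comment above.
CHIPS = {
    "A100": (0.3e15, 400),
    "H100": (1.0e15, 700),
    "B100": (1.8e15, 700),
}

def flop_per_joule(flops_per_second: float, watts: float) -> float:
    """Energy efficiency: FLOP/s divided by J/s gives FLOP per joule."""
    return flops_per_second / watts

for name, (flops, watts) in CHIPS.items():
    print(f"{name}: {flop_per_joule(flops, watts):.2e} FLOP/J (chip only)")

# Same H100, counting ~1500W per GPU of total datacenter power.
h100_all_in = flop_per_joule(1.0e15, 1500)
print(f"H100 all-in: {h100_all_in:.2e} FLOP/J")
```

A TPU row can't be added without a TDP figure, which is exactly the gap noted above.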
They seem to have big plans for next year, but possibly they are not yet quite ready to be significantly ahead of 100K H100s clusters.
Thanks, that updates me. I've been enjoying your well-informed comments on big training runs, thank you!
Rootclaim covid origins debate:
This piece relates to this manifold market
and these videos
I listened to most of the 17+ hours of the debate and found it mostly interesting, informative and important for someone either interested in COVID origins or practicing rationality.
I came into this debate about 65-80% lab leak, and left feeling <10% is most likely.
Key takeaways