Vladimir_Nesov

Interesting, many of these things seem important as evaluation issues, even as I don't think they are important algorithmic bottlenecks between now and superintelligence (because either they quickly fall by default, or else if only great effort gets them to work then they still won't crucially help). So there are blind spots in evaluating open weights models that get more impactful with greater scale of pretraining, and less impactful with more algorithmic progress, which enables evaluations to see better.

Compute and funding could be important bottlenecks for multiple years, if $30 billion training runs don't cut it. The fact that o1 is possible already (if not o1 itself) might actually be important for timelines, but indirectly, by enabling more funding for compute that cuts through the hypothetical thresholds of capability that would otherwise require significantly better hardware or algorithms.

The point isn't particularly that it's low-hanging fruit or that this is going to happen with other LLMs soon. I expect that, counterfactually, System 2 reasoning happens soon even with no o1, made easy and thereby inevitable merely by further scaling of LLMs, so the somewhat surprising fact that it works already doesn't significantly move my timelines.

The issue I'm pointing out is timing: a possible delay between the point when base models at the next level of scale get published in open weights without yet having o1-like System 2 reasoning (Llama 4 seems the most likely specific model like that to come out next year), and the point a bit later when it becomes feasible to apply post-training for o1-like System 2 reasoning to these base models.

In the interim, the decisions to publish open weights would be governed by the capabilities without System 2 reasoning, and so they won't be informed decisions. It would be very easy to justify decisions to publish even in the face of third party evaluations, since those evaluations won't be themselves applying o1-like post-training to the model that doesn't already have it, in order to evaluate its resulting capabilities. But then a few months later, there is enough know-how in the open to do that, and capabilities cross all the thresholds that would've triggered in those evaluations, but didn't, since o1-like post-training wasn't yet commoditized at the time they were done.

The central point is more that superintelligence won't be able to help with your decisions, won't be able to decide for you at some fundamental level, no matter how capable it is. It can help instrumentally, but not replace your ability to decide in order to find out what the decisions are, so in some sense it's not able to help at all. I'm trying to capture the sense in which this holds regardless of its ability to precisely predict and determine outcomes in the physical world (if we only look at the state of the future, rather than full trajectories that get the world there).

When a program is given strange input, or if the computer it would be running on is destroyed, that event in the physical world usually doesn't affect the semantics of the program that describes its behavior for all possible inputs. If you are weak and brittle, talking about what you'll decide requires defining what we even mean in principle by a decision that is yours; only then can we ask whether it does remain yours in actuality, or whether it remains yours even if you are yourself no longer present in actuality. Which is not very useful for keeping it (or yourself) present in actuality, but can be conceptually useful for formulating desiderata towards it being present in actuality.

So there are two claims. First, it's in some sense natural for your decisions to remain yours, if you don't start mindlessly parroting external inputs that dictate your actions, even in the face of superintelligence (in its aspect of capability, but not necessarily aimed in a way that disrupts you). Second, if you are strongly manipulated or otherwise overridden, this should in some sense mean that you are no longer present, that the resulting outcome doesn't capture or simulate what we should define as being you (in order to talk about the decisions that are in principle yours). Thus presence of overpowering manipulation doesn't contradict decisions usually remaining yours, it just requires that you are consequently no longer present to manifest them in actuality when that happens.

This seems like the third comment on the same concern; I've also answered it here and here, going into more detail on other related things. So there is a missing prerequisite post.

More like operationalization of a binding rule is not something even superintelligence can do for you, even when it gives the correct operationalization. Because if you follow operationalizations merely because they are formally correct and given by a superintelligence, then you follow arbitrary operationalizations, not ones you've decided to follow yourself. How would the superintelligence even know what you decide, if you shirk that responsibility and wait for the superintelligence to tell you?

This post is mostly a reaction to Bostrom's podcasts about his new book. I think making your own decisions is a central example of something that can't be solved by others, no matter how capable, and also this activity is very close to what it means to define values, so plans in the vicinity of external imposition of CEV might be missing the point.

One thought about the rule/exception discussion you've linked (in the next paragraph; this one sets up the framing). Rules/norms are primitives of acausal coordination, especially interesting when they are agents in their own right. They mostly live in other minds, and are occasionally directly incarnated in the world, outside other minds. When a norm lives in many minds, it exerts influence on the world through its decisions made inside its hosts. It has no influence where it has no hosts, and also where the hosts break the norm. Where it does have influence, it speaks in synchrony in many voices, through all of its instances, thus it can have surprising power even when it only weakly compels the minds that host its individual instances.

So there is a distinction between an exception that isn't part of a rule, and an exception that the rule doesn't plan for. Rules are often not very intelligent, so they can fail to plan for most things, and thus suggest stupid decisions that don't take those things into account. This can be patched by adding those things into the rule, hardcoding the knowledge. But not every thing needs to be hardcoded in order for the rule to be able to adequately anticipate it in its decision making. This includes hosts (big agents) making an exception to the rule (not following it in a particular situation): some rules are able to anticipate when specifically that happens, and so don't need those conditions to become part of their formulation.

Ability to resist a proof of what your behavior will be, even to the point of refuting its formal correctness (by determining its incorrectness with your own decisions and turning the situation counterfactual), seems like a central example of a superintelligence being unable to decide/determine (as opposed to predict) what your decisions are. It's also an innocuous enough input that it doesn't obviously have to be filtered out by the weak agent's membrane.

In any case, to even discuss how a weak agent behaves in a superintelligent world, it's necessary to have some notion of keeping it whole. Extreme manipulation can both warp the weak agent and fail to elicit their behavior for other possible inputs. So this response to another comment seems relevant.

Another way of stating this, drawing on the point about physical bodies thought of as simulations of some abstract formulation of a person, is to say that an agent by itself is defined by its own isolated abstract computation, which includes all membrane-permissible possible observations and resulting behaviors. Any physical implementation is then a simulation of this abstract computation, which can observe it to some extent, or fail to observe it (when the simulation gets sufficiently distorted). When an agent starts following dictates of external inputs, that corresponds to the abstract computation of the agent running other things within itself, which can be damaging to its future on that path of reflection depending on what those things are. In this framing, normal physical interaction with the external world becomes some kind of acausal interaction between the abstract agent-world (on inputs where the physical world is observed) and the physical world (for its parts that simulate the abstract agent-world).

Answer by Vladimir_Nesov

With empirical uncertainty, it's easier to abstract updating from reasoning. You can reason without restrictions, and avoid the need to update on new observations, because you are not making new observations. You can decide to make new observations at the time of your own choosing, and then again freely reason about how to update on them.

With logical uncertainty, reasoning simultaneously updates you on all kinds of logical claims that you didn't necessarily set out to observe at this time, so the two processes are hard to disentangle. It would be nice to have better conceptual tools for describing what it means to have a certain state of logical uncertainty, and how it should be updated. But that doesn't quite promise to solve the problem of reasoning always getting entangled with unintended logical updating.

If everything improves on a log scale, then having three places to spend log-scale compute (pretraining, RL training, and inference) is a rather big improvement.

RL training and inference are far from catching up to pretraining in scale, so the initial improvement from scaling their compute can soon prove massive compared to the remaining potential for scaling pretraining. RL training might crucially depend on human labels, and if so it won't scale much for now, while for inference compute OpenAI's Noam Brown says:

o1 thinks for seconds, but we aim for future versions to think for hours, days, even weeks

This doesn't even account for additional compute that might go into wider inference time search, where both generative and process reward models work together to refine the reasoning trace (unlike producing ever longer traces, this works in parallel).
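A minimal sketch of the log-scale intuition, with made-up numbers (every FLOP figure below is hypothetical, not an estimate of any particular model):

```python
import math

# Toy illustration: if capability grows roughly linearly in log-compute along
# each axis, then a 1000x scale-up of a currently small axis (RL training or
# inference) buys as many orders of magnitude on that axis as a 1000x
# pretraining scale-up would on its own, at a far smaller absolute cost.

def ooms_gained(old_compute, new_compute):
    """Orders of magnitude gained by scaling one compute axis."""
    return math.log10(new_compute / old_compute)

pretraining = 1e26   # hypothetical pretraining FLOPs
rl_training = 1e23   # hypothetical RL post-training FLOPs, far behind
inference = 1e21     # hypothetical inference FLOPs per hard task

print(ooms_gained(pretraining, 10 * pretraining))    # 1.0 OOM, ~9e26 extra FLOPs
print(ooms_gained(rl_training, 1000 * rl_training))  # 3.0 OOMs, ~1e26 extra FLOPs
print(ooms_gained(inference, 1000 * inference))      # 3.0 OOMs, ~1e24 extra FLOPs
```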

They even managed to publish it in Nature. But if you don't throw out the original data and instead train on both the original data and the generated data, this doesn't seem to happen (see also). Besides, there is the empirical observation that o1 in fact works at GPT-4 scale, so similar methodology might survive more scaling. At least at the upcoming ~5e26 FLOPs level of next year, which is the setting this post focuses on: the hypothetical where an open weights release arrives before there is an open source reproduction of o1's methodology, which subsequently makes that model much stronger in a way that wasn't accounted for when deciding to release it.

AlphaZero trains on purely synthetic data, and humans (note congenitally blind humans, so video data isn't crucial) use maybe 10,000 times less natural data than Llama-3-405B (15 trillion tokens) to get better performance, though we individually know far fewer facts. So clearly there is some way to get very far with merely 50 trillion natural tokens, though this is not relevant to o1 specifically.

Another point is that you can repeat the data for LLMs (5-15 times with good results, up to 60 times with slight further improvement, then there is double descent with worst performance at 200 repetitions, so improvement might resume after hundreds of repetitions). This suggests that it might be possible to repeat natural data many times to balance out a lot more unique synthetic data.
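Back-of-the-envelope arithmetic for the two data points above (the figures are the ones quoted in these paragraphs; everything is order-of-magnitude only):

```python
# Human comparison: 10,000x less natural data than Llama-3-405B's 15T tokens.
llama3_tokens = 15e12
human_tokens = llama3_tokens / 10_000
print(f"{human_tokens:.1e}")        # ~1.5e9 natural tokens per human

# Repetition: 50T natural tokens still help when repeated many times, which
# could anchor a much larger pool of unique synthetic tokens in the mix.
natural_tokens = 50e12
repetitions = 60                    # slight improvement still at ~60 repetitions
print(f"{natural_tokens * repetitions:.1e}")  # ~3e15 natural token-passes
```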

I originally saw the estimate from EpochAI, which I think was either 8e25 FLOPs or 1e26 FLOPs, but I'm either misremembering or they changed the estimate, since currently they list 5e25 FLOPs (background info for a metaculus question claims the Epoch estimate was 9e25 FLOPs in Feb 2024). In Jun 2024, SemiAnalysis posted a plot with a dot for Gemini Ultra (very beginning of this post) where it's placed at 7e25 FLOPs (they also slightly overestimate Llama-3-405B at 5e25 FLOPs, which wasn't yet released then).

The current notes for the EpochAI estimate are linked from the model database csv file:

This number is an estimate based on limited evidence. In particular, we combine information about the performance of Gemini Ultra on various benchmarks compared to other models, and guesstimates about the hardware setup used for training to arrive at our estimate. Our reasoning and calculations are detailed in this Colab notebook. https://colab.research.google.com/drive/1sfG91UfiYpEYnj_xB5YRy07T5dv-9O_c

Among other clues, the Colab notebook cites the Gemini 1.0 report on the use of TPUv4 in pods of 4096 across multiple datacenters for Gemini Ultra, notes that SemiAnalysis claims Gemini Ultra could have been trained on 7+7 pods (which is about 57K TPUs), and cites an article from The Information (paywalled):

Unlike OpenAI, which relied on Microsoft's servers, Google operated its own data centers. It had even built its own specialized AI chip, the tensor processing unit, to run its software more efficiently. And it had amassed a staggering number of those chips for the Gemini effort: 77,000 of the fourth-generation TPU, code-named Pufferfish.

One TPUv4 offers 275e12 FLOP/s, so at 40% MFU this gives 1.6e25 FLOPs a month using the SemiAnalysis estimate of the number of pods, and 2.2e25 FLOPs a month using The Information's claim about the number of TPUs.

They arrive at 6e25 FLOPs as the point estimate from hardware considerations. The training duration range is listed as 3-6 months before the code, but it's actually 1-6 months in the code, so one of these is a bug. If we put 3-6 months in the code, their point estimate becomes 1e26 FLOPs. They also assume an MFU of 40-60%, which seems too high to me.

If their claim of 7+7 pods from SemiAnalysis is combined with the 7e25 FLOPs estimate from the SemiAnalysis plot, this suggests a training time of 4 months. At that duration, but with the TPU count claim from The Information, we get 9e25 FLOPs. So after considering Epoch's clues, I'm settling on 8e25 FLOPs as my own point estimate.
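For concreteness, a sketch reproducing this arithmetic (the inputs are the figures quoted above: TPUv4 peak throughput, an assumed 40% MFU, a 30-day month, and the two chip-count claims):

```python
peak_flops = 275e12              # FLOP/s per TPUv4
mfu = 0.40                       # assumed model FLOPs utilization
month = 30 * 24 * 3600           # seconds in a 30-day month

per_chip_month = peak_flops * mfu * month   # ~2.9e20 FLOPs per chip-month

chips_semianalysis = 14 * 4096   # "7+7 pods" of 4096 TPUs, ~57K chips
chips_information = 77_000       # The Information's chip count

print(f"{chips_semianalysis * per_chip_month:.1e}")   # ~1.6e25 FLOPs/month
print(f"{chips_information * per_chip_month:.1e}")    # ~2.2e25 FLOPs/month

# 7e25 FLOPs (SemiAnalysis plot) at the SemiAnalysis chip count implies
# roughly 4 months of training; 4 months at The Information's chip count
# comes out to ~9e25 FLOPs.
print(7e25 / (chips_semianalysis * per_chip_month))     # ~4.3 months
print(f"{4 * chips_information * per_chip_month:.1e}")  # ~8.8e25 FLOPs
```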

If showing you a formal proof that you will do a particular action doesn't result in you doing that action, then the supposed "proof" was simply incorrect.

Yes, that's the point, you can make it necessarily incorrect, your decision to act differently determines the incorrectness of the proof, regardless of its provenance. When the proof was formally correct, your decision turns the whole possible world where this takes place counterfactual. (This is called playing chicken with the universe or the chicken rule, a technique that's occasionally useful for getting an agent to have nicer properties, by not letting the formal system that generates the proofs know too much early on about what the agent is going to decide.)
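A toy sketch of that technique, assuming a hypothetical `proves` oracle that searches for proofs about the agent's own source code (none of this is from the post; the action encoding and the utility step are placeholders):

```python
def agent(actions, proves, best_provable_utility):
    # Chicken rule: if the proof system proves the agent won't take some
    # action, take exactly that action. A sound proof system can then never
    # establish such a statement except in a possible world the agent's own
    # decision renders counterfactual, which keeps it from "knowing" the
    # decision too early and drawing spurious conclusions from it.
    for a in actions:
        if proves(f"agent() != {a!r}"):
            return a
    # Otherwise decide normally, e.g. by the best provable utility guarantee.
    return max(actions, key=best_provable_utility)
```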
