For 1: resampling is not "asking the AI to try again"; it is forcibly rewriting the AI's history so that it made a different one of its plausible decisions. Every resampled continuation could plausibly have been the original continuation under a different RNG seed, and there is no non-shared scratchpad where information about the resampling event could be passed along. Resampling probably does increase the risk that you land on a less suspicious bad path, but that is outweighed by the fact that the AI has to mostly bide its time (so as not to get caught during the fraction of early requests that get careful monitoring, which is an important prior result), so on net it is a gain (see the toy sketch after point 3).
For 2: the model used in this paper assumes the untrusted model behaves optimally given its actual ability to do sneaky bad stuff. Computing that optimum requires knowing the monitoring protocol, so the untrusted AI must know about the monitor. There is no "does not know about the monitoring setup" case in this model, because if the model genuinely believes there is no monitoring, it just attacks immediately and loses; and if it believes there is monitoring, it is really hard to trick it about the monitoring strategy.
For 3: no, you want to use the same model, since you can prevent the resampled branches from knowing about each other (they share only the pre-resampling history and the weights), which is essentially what you would want from another "instance". The modeling assumption is that there is exactly one untrusted AI, which wins if any path breaks out, so there is no meaningfully different untrusted AI to ask to try from that starting point.
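A toy version of the arithmetic behind point 1, with illustrative numbers of my own (the audit fraction, horizon, and attack rate are not from the paper):

```python
# Toy sketch of point 1, with illustrative numbers (not from the paper).
# To survive careful monitoring on the first `audit_frac` of requests, the
# scheming model must keep its per-step attack probability p small; a
# resample of a flagged step is a fresh draw from the same policy, so the
# chance the replacement is also an attack is only ~p.

audit_frac = 0.02     # fraction of early requests that get careful auditing (assumed)
total_steps = 10_000  # deployment length (assumed)
p_attack = 1e-3       # per-step attack probability the model can afford (assumed)

p_survive_audit = (1 - p_attack) ** (audit_frac * total_steps)
p_resample_is_attack = p_attack

print(f"model survives audit window undetected: {p_survive_audit:.2f}")
print(f"a resampled step is itself an attack:   {p_resample_is_attack:.4f}")
```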
One-box-with-probability-1 and two-box-with-probability-1 are not the only strategies. You can also try to find some source of randomness and play a mixed strategy. If mixed strategies are unavailable, game theory and decision theory are not guaranteed to converge.
How does the typical construction of Newcomb's problem handle mixed strategies?
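For reference, one common stipulation (I believe it goes back to Nozick's original write-up) is that if the predictor predicts you will consciously randomize, it leaves box B empty. A quick expected-value check under that assumption, with the usual amounts:

```python
# Expected values in Newcomb's problem under the (assumed, Nozick-style)
# stipulation that a predicted randomizer gets an empty box B.
A, B = 1_000, 1_000_000   # transparent box A, opaque box B

ev_one_box = B            # predictor foresaw one-boxing and filled B
ev_two_box = A            # predictor foresaw two-boxing and left B empty

def ev_mixed(q: float) -> float:
    """One-box with probability q via a visible randomizer: the predictor
    predicts randomization, so B is empty on either branch."""
    return q * 0 + (1 - q) * A

print(ev_one_box, ev_two_box, ev_mixed(0.5))
# Under this construction every mixed strategy is dominated by pure two-boxing.
```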
The question of why there is no "might kill us" class is simple. There is such a class, but if an AI loses the fight to kill us, it obviously was not ASI (picking a fight with the world and losing is pretty dumb); if it wins, we die. Either way, we are in the same scenario with every AI stronger than it, and with every weaker AI that might still get lucky, just as we might get lucky and win at bad odds. The next AI we make will also want to fight us for the same reasons, and we will need to either fight it too (including preemptively, e.g. turning it off because a dumber model did something), or get a reason to believe that we will never fight it. And if you know you will fight your AI eventually, and you will win now, fight now.
if it were competent enough to reflect and actively reconcile its own inner drives
Why do we think that reflection is necessary for competence? That is, competence does not seem to imply coherence, unless I missed an argument.
This analysis is slightly incorrect: if rewards programs are treated as assets, they contribute to enterprise value, not to market cap. Market cap is roughly the total expected value of the enterprise minus the debt outstanding. American, for instance, has about $37 billion of debt, and its market cap is the expected value left over after the expected payout to creditors.
So in this analysis, American is actually worth roughly $47 billion: $37 billion of that is owed to creditors and $10 billion belongs to the shareholders. About $22 billion of the total is the rewards program, and the rest is the rest of the business.
If American's rewards program disappeared, the creditors would take a haircut, but they would not be left with nothing.
You can do the same for all of these.
As an aside, every time I have seen someone make this claim, they have neglected to model debt. I think it might be systematically confusing.
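To make it concrete, here is the arithmetic for American in a few lines of Python, using the approximate figures above:

```python
# Rough arithmetic for American, using the approximate figures quoted above
# (market cap ~$10B, debt ~$37B, rewards program appraised at ~$22B).
market_cap = 10e9                     # value accruing to shareholders
debt = 37e9                           # value owed to creditors
enterprise_value = market_cap + debt  # ~$47B total claim on the business

rewards_program = 22e9                # appraised value of the loyalty program

# Comparing the program to market cap alone makes it look like >200% of the
# company; against enterprise value it is a large but sane fraction.
print(f"rewards / market cap:       {rewards_program / market_cap:.0%}")
print(f"rewards / enterprise value: {rewards_program / enterprise_value:.0%}")
```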
For these hidden reasoning steps, especially the epiphenomenal model, there can also be redundancy between in-weight computation and the chain of thought. That is, a dumb model shown the chain of thought of a smart model might get the right answer when it otherwise would not, even if the chain of thought does not help the smart model itself.
Under the assumption of separate initialization, this probably does not happen in cases where the extra information is steganographic, or in some cases of irrelevant reasoning, because the relevant information is not being passed through the token bottleneck.
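A hedged sketch of one way to probe this: restrict to problems where ablating the chain of thought does not change the strong model's answer, then check whether the weak model still benefits from seeing it. Everything here (the callables, the dataset shape) is hypothetical scaffolding, not any particular paper's setup.

```python
from typing import Callable, Iterable

def cot_redundancy_probe(
    problems: Iterable[dict],                  # each: {"question": str, "label": str}
    strong_answer: Callable[[str, str], str],  # (question, cot) -> answer
    strong_cot: Callable[[str], str],          # question -> chain of thought
    weak_answer: Callable[[str, str], str],    # (question, cot) -> answer
) -> dict:
    """Keep only problems where the strong model's answer is unchanged when
    its chain of thought is ablated (the CoT is epiphenomenal for it), then
    check whether the weak model still does better when shown that CoT."""
    weak_plain_correct = weak_cot_correct = n = 0
    for p in problems:
        cot = strong_cot(p["question"])
        # skip cases where the CoT actually changes the strong model's answer
        if strong_answer(p["question"], cot) != strong_answer(p["question"], ""):
            continue
        n += 1
        weak_plain_correct += weak_answer(p["question"], "") == p["label"]
        weak_cot_correct += weak_answer(p["question"], cot) == p["label"]
    return {
        "n_epiphenomenal": n,
        "weak_acc_without_cot": weak_plain_correct / n if n else None,
        "weak_acc_with_cot": weak_cot_correct / n if n else None,
    }
```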
I am assuming the first step is to count the chromosomes and verify that you have exactly a collection of full sets, all from the same cells, so that the property "there are exactly k of each" holds. I would need to look at the literature to see the success rate of isolating all chromosomes, not just at least one.
It seems that for this method you must get 22 successful sequences in a row, which is hard if sequencing fails even occasionally, or if you lose a chromosome even occasionally.
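The compounding is brutal even at high per-chromosome success rates (the rates below are illustrative, not from any study):

```python
# If each chromosome is isolated and sequenced successfully with independent
# probability p, requiring all 22 autosomes to succeed compounds quickly.
for p in (0.99, 0.95, 0.90):
    print(f"per-chromosome success {p:.0%} -> all 22 succeed: {p**22:.1%}")
```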