LGS · 30

> Well, the final answer is easy to evaluate. And like in rStar-Math, you can have a reward model that checks if each step is likely to be critical to a correct answer, then it assigns an implied value to the step.

Why is the final answer easy to evaluate? Let's say we generate the problem "number of distinct solutions to x^3+y^3+xyz=0 modulo 17^17" or something. How do you know what the right answer is?
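(A back-of-the-envelope sketch, not part of the original comment: the search space alone rules out brute-force checking of the made-up problem above.)

```python
# Even if z can be solved for from each (x, y), scanning all pairs is hopeless.
modulus = 17**17
pairs = modulus**2
print(f"modulus ~ {modulus:.2e}, (x, y) pairs ~ {pairs:.2e}")
# modulus ~ 8.27e+20, (x, y) pairs ~ 6.84e+41
```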

I agree that you can do this in a supervised way (a human puts in the right answer). Is that what you mean?

What about if the task is "prove that every integer can be written as the sum of at most 1000 different 11-th powers"? You can check such a proof in Lean, but how do you check it in English?
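For contrast, here is what a machine-checkable proof looks like: a deliberately trivial Lean 4 toy, nothing like the 11th-powers statement (`Nat.add_comm` is a lemma from Lean's standard library).

```lean
-- The Lean kernel accepts or rejects this mechanically, with no human
-- judgment in the loop; that is the external check an English proof lacks.
theorem checkable_toy (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```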

> And like in rStar-Math, you can have a reward model that checks if each step is likely to be critical to a correct answer, then it assigns an implied value to the step.

My question is where the external feedback comes from. "Likely to be critical to a correct answer" according to whom? A model? Because then you don't get the recursive self-improvement past what that model knows. You need an external source of feedback somewhere in the training loop.
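For concreteness, a minimal sketch of the rollout-style step scoring used in rStar-Math-like pipelines (the `sample_completion` and `is_correct` callables are hypothetical placeholders, not the paper's actual API):

```python
def estimate_step_value(prefix, sample_completion, is_correct, n_rollouts=16):
    """Monte Carlo value of a partial solution: the fraction of sampled
    completions of `prefix` that reach a correct final answer."""
    hits = sum(is_correct(sample_completion(prefix)) for _ in range(n_rollouts))
    return hits / n_rollouts
```

Everything bottoms out in `is_correct`: if it is a known ground-truth answer, executable tests, or a proof checker, the reward is external; if it is another model's judgment, the loop never leaves what that model already knows, which is exactly the objection here.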

LGS · 40

Do you have a sense of where the feedback comes from? For chess or Go, at the end of the day, a game is won or lost. I don't see how to do this elsewhere, except in limited domains: simple programming tasks that can quickly be run and tested, formal math proofs, or, essentially, tasks in NP (by which I mean that a correct solution can be efficiently verified).

For other tasks, like summarizing a book or even giving an English-language math proof, it is not clear how to detect correctness, and hence not clear how to ensure that a model like o5 doesn't give a worse output after thinking/searching a long time than the output it would give as its first guess. When doing RL, it is usually very important to have non-gameable reward mechanisms, and I don't see that in this paradigm.

I don't even understand how they got from o1 to o3. Maybe a lot of supervised data, i.e. OpenAI internally created some FrontierMath-style problems to train on? Would that be enough? Do you have any thoughts about this?

LGS · 10

The value extractable is rent on both the land and the improvement; LVT taxes only the former. E.g. if land can earn $10k/month after an improvement costing $1mm, if the interest rate is 4.5%, and if that improvement is optimal, then a 100% LVT is not $10k/mo: subtract the cost of capital, $1mm × 0.045 / 12 = $3,750/mo, and a 100% LVT comes out to merely $6,250/mo.
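The same numbers as a worked sketch (the figures are the hypothetical ones above, not market data):

```python
rent_after_improvement = 10_000   # $/mo the improved land earns
improvement_cost = 1_000_000      # $ spent on the improvement
annual_interest = 0.045           # opportunity cost of that capital

cost_of_capital = improvement_cost * annual_interest / 12  # $3,750/mo
land_rent = rent_after_improvement - cost_of_capital       # $6,250/mo
print(f"100% LVT ~ ${land_rent:,.0f}/mo")  # taxes the land rent only
```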

If your improvement can't extract $6.3k from the land, preventing you from investing in that improvement is a feature, not a bug.

LGS · 10

If you fail to pay the LVT you can presumably sell the improvements. I don't think there's an inefficiency here -- you shouldn't invest in improving land if you're not going to extract enough value from it to pay the LVT, and this is a feature, not a bug (that investment would be inefficient).

LGS · 10

LVT applies to all land, but not to the improvements on the land.

We do not care about disincentivizing investment in land itself (by which I mean just buying land). We do care about not disincentivizing investments in improvements on the land (by which I include buying an existing improvement as well as building new ones). A signal of LVT intent will not have negative consequences unless it is interpreted as a signal of broader confiscation.

LGS · 44

More accurately, it applies to a signal of intent to confiscate other investments; we don't actually care if people panic about land being confiscated, because buying land (rather than improving it) isn't productive in any way. (We may also want to partially redistribute resources towards the losers of the land confiscation to compensate for the lost investment -- that is, we may want the government to buy the land rather than confiscate it, though it would be bought at lower than market prices.)

It is weird to claim that the perceived consequence of planned incrementalism is "near-future governments want the money now, and will accelerate it". The actual problem is almost certainly the opposite: near-future governments will want to cut taxes, since cutting taxes is incredibly popular, and will therefore stop or reverse the planned incremental LVT.

LGS · 52

Thanks for this post. A few comments:

  1. The concern about new uses of land is real, but very limited compared to the inefficiencies of most other taxes. It is of course true that if the government essentially owns the land to rent it out, the government should pay for exploration for untapped oil reserves! The government would hire the oil companies to explore. It is also true that the government would do so less efficiently than the private market. But this is small potatoes compared to the inefficiency of nearly every other tax.
  2. It is true that a developer owning multiple parcels of land would have lower incentives to improve any one of them, but this sounds like a very small effect to me, because most developers own a very (very) small part of the city's land! In any case, the natural remedy here is for the government to subsidize all improvements on land, since improvements have positive externalities. Note that this is the opposite of the current property tax regime in most places (where improving the land makes you pay tax). In fact, replacing property taxes with land value taxes would almost surely incentivize developers to develop, even if they own multiple parcels of land. In other words, your objection already applies to the current world (with property taxes) and arguably applies less to the hypothetical world with land value taxes.
  3. Estimates for the land value proportion of US GDP run significantly higher than the World Bank estimate, from what I understand. Land is a really big deal in the US economy.
  4. "The government has incentives to inflate their estimates of the value of unimproved land" sure, the government always has incentives towards some inefficiencies; this objection applies to all government action. We have to try it and see how bad this is in practice.
  5. The disruption and confidence-in-property-rights effects are potentially real, but mostly apply to sudden, high LVT. Most people's investments already account for some amount of "regulatory risk", the risk that the government changes the rules (e.g. with regards to capital gains taxes or property taxes). A move like "replace all property taxes with LVT" would be well within the expected risk. I agree that a sudden near-100% LVT would be too confidence-shaking; but even then, the question is whether people would view this as "government changes rules arbitrarily" or "government is run by competent economists now and changes rules suddenly but in accordance with economic theory". A bipartisan shift towards economic literacy would lead people towards the latter conclusion, which means less panic about confiscated investments and more preemptive panic about (e.g.) expected Pigouvian taxes (this is a good thing). But a partisan change enacted when one party has majority and undone by the other party would lead people towards the former conclusion (with terrible consequences). Anyway, I am a big supporter of incrementalism and avoiding sudden change.
  6. "The purported effect of an LVT on unproductive land speculation seems exaggerated" yes, I agree, and this always bothered me about LVT proponents.
LGS · 60

The NN inside Stockfish is called the NNUE, and it is a small neural net used for evaluation (there is no policy head for choosing moves). The clever part is that it is "efficiently updatable": if you've computed the evaluation of one position, and now you move a single piece, getting the updated evaluation for the new position is cheap. This feature allows it to run quickly on CPUs; Stockfish doesn't really use GPUs normally (I think this is because moving the data on/off the GPU is itself too slow! Stockfish wants to evaluate 10 million nodes per second or so).
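A minimal sketch of the "efficiently updatable" trick (illustrative only; the real NNUE feature set, layer sizes, and integer quantization differ): the first layer's pre-activations are a sum of weight rows, one per active (piece, square) feature, so a single non-king move only touches two rows.

```python
import numpy as np

N_FEATURES, HIDDEN = 768, 256  # toy sizes, not Stockfish's real ones
W1 = np.random.randn(N_FEATURES, HIDDEN).astype(np.float32)

def full_refresh(active_features):
    """Recompute the first-layer accumulator from scratch."""
    return W1[active_features].sum(axis=0)

def apply_move(acc, removed_feature, added_feature):
    """Incremental update: two rows touched instead of a full recompute,
    which is what makes NNUE evaluation fast on CPUs."""
    return acc - W1[removed_feature] + W1[added_feature]
```

(From my understanding, a real NNUE does a full refresh when the king moves, because its features are keyed on king position, and it runs in quantized integer arithmetic; but the incremental accumulator update is the core idea.)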

This NNUE is not directly comparable to AlphaZero and isn't really a descendant of it (except in the sense that they both use neural nets; but as far as neural net architectures go, Stockfish's NNUE and AlphaZero's policy network are just about as different as they could possibly be).

I don't think it can be argued that we've improved 1000x in compute over AlphaZero's design, and I do think there's been significant interest in this (e.g. MuZero was an attempt at improving AlphaZero, the chess and Go communities coded up Leela, and there's been a bunch of effort to get better game-playing bots in general).

LGS · 52

So far as I know, it is not the case that OpenAI had a slower-but-equally-capable version of GPT-4 many months before announcement/release. What they did have is GPT-4 itself, months before release; but they did not have a slower version, and they didn't release a substantially distilled one. For example, the highest estimate I've seen is that they trained a 2-trillion-parameter model, and the lowest estimate I've seen is that they released a 200-billion-parameter model. If both are true, then they distilled 10x... but it's much more likely that only one is true, and that they released what they trained, distilling later. (The parameter count is roughly proportional to the inference cost.)
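For scale, using the standard rule of thumb that dense-transformer inference costs about 2 × (parameter count) FLOPs per generated token (an approximation, not an OpenAI-confirmed figure):

```python
def flops_per_token(n_params):
    # ~2 FLOPs per parameter per token for a dense decoder forward pass
    return 2 * n_params

trained_est, released_est = 2e12, 200e9  # the two conflicting estimates above
print(flops_per_token(trained_est) / flops_per_token(released_est))  # 10.0
```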

Previously, delays in release were believed to be about post-training improvements (e.g. RLHF) or safety testing. Sure, there were possibly mild infrastructure optimizations before release, but mostly to scale to many users; the models didn't shrink.

This is for language models. For AlphaZero, I want to point out that it was announced 6 years ago (an eternity at AI timescales), and to my understanding we still don't have a 1000x faster version, despite much interest in one.

LGS · 80

I think AI obviously keeps getting better. But I don't think "it can be done for $1 million" is such strong evidence for "it can be done cheaply soon" in general (though the prior on "it can be done cheaply soon" was not particularly low ex ante -- it's a plausible statement for other reasons).

Like, if your belief is "anything that can be done now can be done 1000x cheaper within 5 months", that's just clearly false for nearly every AI milestone in the last 10 years (we did not get a GPT-4 that's 1000x cheaper 5 months later, nor a 1000x cheaper AlphaZero, etc.).
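As pure arithmetic (no empirical claim), "1000x cheaper within 5 months" implies a sustained cost reduction of roughly 4x every month:

```python
factor, months = 1000, 5
per_month = factor ** (1 / months)
print(f"~{per_month:.2f}x cheaper every month")  # ~3.98x, month after month
```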
