Comments

It would be worth looking at the successful Japanese, Hong Kong, and London case studies of transit agencies owning the land around their stations and lines (and perhaps NYC's Penn Station & Madison Square Garden as the anti-case-study).

And the animals with relevant brains for foom are <0.01 billion years old, and their population only started becoming noticeably larger than competitors about <0.0001 billion years ago, and they only started doing really impressive things like 'land on the Moon' <0.0000001 billion years ago. This is a foom timeline of events.

I'm not particularly impressed. It's still making a lot of errors, and doesn't seem like a leap over SOTA from last year like Parti - looks like worse instruction-following, maybe better image quality overall. (Of course, people will still be impressed, because they refuse to believe something exists in DL until they personally can use it, no matter how many samples the paper or website provides to look at.) And it's still heavily locked-down like DALL-E 2. The prompt-engineering is nice, but people have been doing that for a long time already. The lack of any paper or analysis suggests not much novelty.

There is no difference at the hardware level between being 'close to' and 'having a low-latency connection to', as I already explained. And to the extent that having those connections matters, miners already have them. In particular, in Ethereum, due to the money you can make by frontrunning transactions to hack/exploit them ('miner extractable value', MEV), HFT Ethereum miners/stakers invest heavily in having a lot of interconnected low-latency Sybil nodes so they can see unconfirmed transactions as quickly as possible, compute a maximally-exploitative block (eg. temporarily jacking up the price of a thing being purchased using a flash loan solely to rip off a specific transaction), and get that block committed before anyone can beat them to the same exploit. Having a lot of MEV is considered a bad thing, and Ethereum types are spending increasing effort on approaches like commit-and-reveal to minimize it, because MEV comes at the expense of users and makes them very unhappy. You could, I suppose, design a protocol which has extra MEV by designing transactions to be especially exploitable, but most people would consider that a bad thing...
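For concreteness, here is a minimal Python sketch of the commit-and-reveal pattern mentioned above; it is a generic illustration of the idea, not Ethereum's actual mechanism, and the names are hypothetical. The user first broadcasts only a salted hash of their intended transaction, so frontrunners watching pending transactions see nothing exploitable, and reveals the contents only after the ordering is already fixed:

```python
# Generic commit-and-reveal sketch (illustrative only, not a real Ethereum protocol).
import hashlib
import os

def commit(tx_bytes: bytes) -> tuple[bytes, bytes]:
    """Return (commitment, salt); only the commitment is broadcast at first."""
    salt = os.urandom(32)
    commitment = hashlib.sha256(salt + tx_bytes).digest()
    return commitment, salt

def reveal_ok(commitment: bytes, salt: bytes, tx_bytes: bytes) -> bool:
    """Later, the user reveals (salt, tx); anyone can check it matches the commitment."""
    return hashlib.sha256(salt + tx_bytes).digest() == commitment

tx = b"swap 10 ETH for token X at price <= P"
c, s = commit(tx)           # broadcast c now; frontrunners see no exploitable details
assert reveal_ok(c, s, tx)  # broadcast (s, tx) later, once ordering is committed
```

In a real chain the reveal and ordering rules would be enforced by the protocol or a smart contract; the sketch only shows the hiding/binding property that denies frontrunners the transaction contents.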

Any incentive is something to be attacked and sucked away by Sybils pretending to be distant when actually near & enjoying all other benefits of being near.

You still have issues with Sybil attacks and attackers either accessing special high-speed links (paid for from the successful attacks) or faking latency. You can't 'choose a random subgraph' for the exact same reason you can't solve cryptocurrency by just 'choose some "random" peers and decide whether to accept or reject a double-spend based on what they tell you' - those 'random peers' are the very attackers you are worried about colluding. In fact, in an eclipse attack, you might not be able to connect to anyone but an attacker!
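To make the 'random peers' failure mode concrete, here is a toy Python calculation (my illustration, not from the original comment): because Sybil identities are essentially free, the attacker controls what fraction of the 'random' peers you can even see, and in a full eclipse that fraction is 1.

```python
# Toy illustration: probability that a "random" peer sample is entirely Sybil nodes,
# as a function of the Sybil fraction f, which the attacker can push toward 1 by
# minting identities for free (and to exactly 1 in an eclipse attack).
def p_all_sybil(f: float, k: int) -> float:
    """Probability that all k independently sampled peers are attacker-controlled."""
    return f ** k

for f in (0.5, 0.9, 0.99, 1.0):
    print(f"Sybil fraction {f}: P(all 8 sampled peers are Sybils) = {p_all_sybil(f, 8):.3f}")
```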

Sister Y's (deleted) blog perhaps? Based on her Ribbonfarm posts. Or maybe https://exploringegregores.wordpress.com/

Latency comes up occasionally. In fact, the granddaddy of public-key crypto, Merkle's puzzles, relies critically on latency. The problem is, you can only prove upper bounds on latency, not lower bounds, because it is trivial to fake increased latency, but one cannot break the speed of light. If someone responds to your cryptographic challenge within Y milliseconds, you know that they can't be physically further from you than Z kilometers; but if they fail to respond, they could be anywhere, even next door, and just not responding (for both ordinary and malicious reasons). Nothing stops two machines from pretending to be far away from each other, and making sure they eg. communicate only over VPNs with exit points on opposite sides of the globe.

Further, if you want to do it over commodity Internet, say if you're trying to do 'proof of distance' by peering only with nodes which respond fast enough that they have to be within Z kilometers of you, the public Internet has so much latency that you get only loose bounds, and someone can pay money for lower-latency networking. (This already happens with cryptocurrency mining, for the same reasons that HFT firms pay for microwave links. Amusingly, it also happens with computer game companies, not to mention large tech companies prioritizing their own traffic; Google famously owns a ton of fiber it bought up post-dotcom-bubble.)

Further still, you don't really care about physical centralization so much as you care about control, and it's impossible to prove cryptographically in any easy way that two physically distant nodes are not secretly controlled by the same entity in a Sybil attack. You run into similar issues with proof-of-storage.
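As a minimal sketch of the one-sided bound described above (assuming only that nothing travels faster than light in vacuum): a fast reply proves an upper bound on distance, while a slow reply proves nothing, since latency is trivial to fake upward.

```python
# Distance upper bound implied by a challenge-response round trip.
C_KM_PER_MS = 299_792.458 / 1000  # speed of light: ~299.79 km per millisecond (vacuum)

def max_distance_km(round_trip_ms: float) -> float:
    """The peer cannot be farther than light could travel in half the round-trip time."""
    return C_KM_PER_MS * round_trip_ms / 2

print(max_distance_km(10))   # ~1,499 km: a 10ms reply proves the peer is within ~1,500 km
print(max_distance_km(100))  # ~14,990 km: typical Internet latencies give only loose bounds
```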


> if you mean "in the limit" to apply to practically relevant systems we build in the future.

Outside of simple problems like Othello, I expect most DRL agents will not converge fully to the peak of the 'spinning top', and so will retain traces of their informative priors like world-models.

For example, if you plug GPT-5 into a robot, I doubt it would ever be trained to the point of discarding most of its non-value-relevant world-model - the model is too high-capacity for major forgetting, and past meta-learning incentivizes keeping capabilities around just in case.

But that's not 'every system we build in the future', just a lot of them. It's not hard to imagine realistic practical scenarios where that doesn't hold - I would expect that any specialized model distilled from it (for cheaper, faster robotic control) would not learn, or would discard, much more of its non-value-relevant world-model compared to its parent, and that would have potential safety & interpretability implications. The System II distills and compiles down to a fast, efficient System I. (For example, if you were trying to do safety by dissecting its internal understanding of the world, or if you were trying to hack a superior reward model by exploiting an internal world-model (adding in safety criteria not present in the original environment/model), you might fail because the optimized distilled model doesn't have those parts of the world-model, even if the parent model did, as they were irrelevant.) Chess end-game databases are provably optimal & very superhuman, and yet, there is no 'world-model' nor any human-interpretable concept of chess anywhere to be found in them; the 'world-model' used to compute them, whatever that was, was discarded as unnecessary after the optimal policy was reached.
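As a hedged sketch of why distillation can drop the world-model (hypothetical names and shapes, using PyTorch): the student is trained only to match the teacher's action distribution on deployment-like inputs, so any internal world-model features of the teacher which don't change the chosen actions contribute no training signal and need never be learned.

```python
# Minimal policy-distillation sketch: only action distributions are matched,
# so nothing forces the student to inherit the teacher's world-model.
import torch
import torch.nn.functional as F

teacher = torch.nn.Linear(128, 8)  # stands in for a large policy with rich internals
student = torch.nn.Sequential(torch.nn.Linear(128, 16), torch.nn.ReLU(), torch.nn.Linear(16, 8))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for _ in range(500):
    obs = torch.randn(64, 128)                        # observations from the target task
    with torch.no_grad():
        target = F.log_softmax(teacher(obs), dim=-1)  # teacher's action distribution only
    pred = F.log_softmax(student(obs), dim=-1)
    loss = F.kl_div(pred, target, log_target=True, reduction="batchmean")
    opt.zero_grad(); loss.backward(); opt.step()
# No term here asks the student to predict world state, so nothing makes it retain one.
```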

> I think the more relevant question is "given a frozen initial network, what are the circuit-level inductive biases of the training process?". I doubt one can answer this via appeals to RL convergence results.

Probably not, but mostly because you phrased it as inductive biases to be washed away in the limit, or using gimmicks like early stopping. (It's not like stopping forgetting is hard. Of course you can stop forgetting by changing the problem to be solved, and simply making a representation of the world-state part of the reward, like including a reconstruction loss.) In this case, however, Othello is simple enough that the superior agent has already apparently discarded much of the world-model and provides a useful example of what end-to-end reward maximization really means - while reward is sufficient to learn world-models as needed, full complete world-models are neither necessary nor sufficient for rewards.
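For the parenthetical above, a minimal sketch with hypothetical names (the world-state term is added to the loss, the usual way of implementing 'make it part of the reward'): an auxiliary reconstruction head pays the shared representation directly for retaining the full board, which is exactly the 'change the problem being solved' move.

```python
# Policy loss plus an auxiliary reconstruction loss that forces the trunk to keep board state.
import torch
import torch.nn.functional as F

encoder = torch.nn.Linear(64, 32)      # shared trunk used by the policy
policy_head = torch.nn.Linear(32, 60)  # move logits (eg. Othello squares)
decoder = torch.nn.Linear(32, 64)      # auxiliary head that must reconstruct the board

def total_loss(board, action_target, lambda_recon=1.0):
    h = torch.relu(encoder(board))
    policy_loss = F.cross_entropy(policy_head(h), action_target)
    recon_loss = F.mse_loss(decoder(h), board)  # paid for retaining full board information
    return policy_loss + lambda_recon * recon_loss

board = torch.randn(16, 64)                  # stand-in board encodings
action_target = torch.randint(0, 60, (16,))  # stand-in move labels
print(total_loss(board, action_target))
```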

> As a side note, I think this "agent only wants to maximize reward" language is unproductive (see "Reward is not the optimization target", and "Think carefully before calling RL policies 'agents'").

I've tried to read those before, and came away very confused about what you meant, and everyone else who reads them seems to come away even more confused. At best, you seem to be making a bizarre mishmash of confusing model-free algorithms and policies and other things best not confused, and being awestruck by a triviality on the level of 'organisms are adaptation-executers and not fitness-maximizers'; at worst, you are obviously wrong: reward is the optimization target, both for the outer loop and for the inner loop of things like model-based algorithms. (In what sense does, say, a tree search algorithm like MCTS or full-blown backwards induction not 'optimize the reward'?)
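To make that parenthetical concrete, here is a tiny single-agent backwards-induction example (toy tree, made-up payoffs): the procedure literally computes and propagates the maximum achievable reward, which is the sense in which such planners optimize the reward.

```python
# Backwards induction on a toy decision tree: leaves are rewards, internal nodes are choices.
def backward_induction(node):
    """Return the best achievable reward from this node, choosing optimally at each branch."""
    if isinstance(node, (int, float)):
        return float(node)          # terminal reward
    return max(backward_induction(child) for child in node)

tree = [[1.0, 3.0], [2.0, 0.5]]     # depth-2 tree of choices
print(backward_induction(tree))     # 3.0: the planner picks the branch with the largest reward
```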


I would expect it to not work in the limit. All the models must converge on the same optimal solution for a deterministic perfect-information game like Othello and become value-equivalent, ignoring the full board state, which is irrelevant to reward-maximizing. (You don't need to model edge-cases or weird scenarios which don't ever come up while pursuing the optimal policy, and the optimal 'world-model' can be arbitrarily tiny and unfaithful to the full true world dynamics.*) Simply hardwiring a world model doesn't change this, any more than feeding in the exact board state as an input would lead to it caring about or paying attention to the irrelevant parts of the board state. As far as the RL agent is concerned, knowledge of irrelevant board state is a wasteful bug to be worked around or eliminated, no matter where this knowledge comes from or is injected.

* I'm sure Nanda knows this but for those whom this isn't obvious or haven't seen other discussions on this point (some related to the 'simulators' debate): a DRL agent only wants to maximize reward, and only wants to model the world to the extent that maximizes reward. For a complicated world or incomplete maximization, this may induce a very rich world-model inside the agent, but the final converged optimal agent may have an arbitrarily impoverished world model. In this case, imagine a version of Othello where at the first turn, the agent may press a button labeled 'win'. Obviously, the optimal agent will learn nothing at all beyond learning 'push the button on the first move' and won't learn any world-model at all of Othello! No matter how rich and fascinating the rest of the game may be, the optimal agent neither knows nor cares.
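A toy illustration of the footnote's scenario (hypothetical environment, not real Othello): with a 'win' button available on the first move, tabular Q-learning converges to pressing it using a single state's action-values, and nothing about the rest of the game ever needs to be represented.

```python
# Win-button toy: the learned 'policy' is one table entry; no Othello world-model anywhere.
import random

ACTIONS = ["press_win_button", "play_othello_move"]
Q = {("start", a): 0.0 for a in ACTIONS}

def step(action):
    # Pressing the button wins immediately (reward 1); otherwise this toy episode just ends
    # with a smaller, noisier reward standing in for actually playing the game out.
    return 1.0 if action == "press_win_button" else random.uniform(-1.0, 0.5)

alpha, eps = 0.1, 0.1
for _ in range(5000):
    greedy = max(ACTIONS, key=lambda a: Q[("start", a)])
    a = random.choice(ACTIONS) if random.random() < eps else greedy
    r = step(a)
    Q[("start", a)] += alpha * (r - Q[("start", a)])  # one-step episodes, so no bootstrap term

print(Q)  # 'press_win_button' dominates; the optimal agent neither knows nor cares about Othello
```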
