I like that post too, but it occurs to me that it hinges on the choice of how most beneficially to be biased being made by the person themselves, unbiasedly (or less-biasedly) evaluating the alternatives. In which case you then get the tension where that part of the person does actually know the truth, and so on.
In particular, it seems to me that it's still possible to have the choice made for you to tend to have some kinds of false beliefs in a way that predictably correlates with benefitting you (although not necessarily in the form makin...
I think dictionaries tend to emphasize the "morally/legally bound" definition of "obligation", and only sometimes include a definition closer to the usage in the OP, so prescriptively, in the sense of linguistic prescriptivism, this criticism may make sense. But practically/descriptively, I do believe that among many English-speaking populations (including at least the one that contains me), "obligation" can currently also be used the way it is in the OP. For me at least the usage of "obligation" did not pose any speed bumps in understanding...
It may depend on the RL algorithm, but I would not expect most RL to have this issue to first order, provided the algorithm produces its rollouts by sampling from the full untruncated distribution at temperature 1.
The issue observed by the OP is a consequence of the fact that, typically, if you are doing anything other than untruncated sampling at temperature 1, your sampling is not invariant between, e.g., "choose one of three options: a, b, or c" and "choose one of two options: a, or (choose one of two options: b or c)".
However many typical on-po...
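For concreteness, here's a small self-contained check of the non-invariance described above, using made-up logits: merging b and c into one option via logsumexp leaves the distribution unchanged at temperature 1, but not at any other temperature.

```python
import math

def softmax(logits, T):
    zs = [math.exp(l / T) for l in logits]
    s = sum(zs)
    return [z / s for z in zs]

# Hypothetical logits for three options a, b, c.
la, lb, lc = 1.0, 0.5, -0.2

def p_a_flat(T):
    # P(a) when sampling among {a, b, c} directly.
    return softmax([la, lb, lc], T)[0]

def p_a_nested(T):
    # P(a) when first sampling between a and the merged option {b, c},
    # where the merged option's logit is logsumexp(lb, lc), so the
    # temperature-1 distribution is exactly the same as the flat one.
    l_bc = math.log(math.exp(lb) + math.exp(lc))
    return softmax([la, l_bc], T)[0]

# Equal at T = 1, but noticeably different at T = 0.5.
assert abs(p_a_flat(1.0) - p_a_nested(1.0)) < 1e-9
assert abs(p_a_flat(0.5) - p_a_nested(0.5)) > 1e-3
```

The same asymmetry shows up for top-k/top-p truncation, since truncation also acts on the particular way the options happen to be split up.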
It's interesting to note the variation in "personalities" and apparent expression of different emotions despite identical or very similar circumstances.
Pretraining gives models that predict every different kind of text on the internet, and so are very much simulators that learn to instantiate every kind of persona or text-generating process in that distribution, rather than being a single consistent agent. Subsequent RLHF and other training presumably vastly concentrates the distribution of personas and processes instantiated by the model on to a par...
I agree with DAL that "move 37" among the lesswrong-ish social circle has maybe become a handle for a concept where the move itself isn't the best exemplar of that concept in reality, although I think it's not a terrible exemplar either.
It was surprising to pro commentators at the time. People tended to conceptualize bots as brute-force engines with human-engineered heuristics that simply calculate everything out to massive depth (because historically that's what they were in Chess), rather than as self-learning entities that excel in sensing vibes and in...
One thing that's worth keeping in mind with exercises like this is that while you can do this in various ways and get some answers, the answers you get may depend nontrivially on how you construct the intermediate ladder of opponents.
For example, attempts to calibrate human and computer Elo rating scales often do place top computers around the 3500ish area, and one of the other answers here, using a particular ladder of intermediates, indicated that random play would then sit at 400-500 Elo. But there are also human players who are genuinely ra...
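To illustrate how ladder-dependent the answer can be, here's a toy calculation with the standard Elo expected-score formula (the 99% and 99.9% win rates below are made-up numbers, purely for illustration):

```python
import math

def elo_gap(p):
    # Rating gap implied by a win probability p under the Elo model:
    # p = 1 / (1 + 10 ** (-gap / 400))  =>  gap = 400 * log10(p / (1 - p))
    return 400 * math.log10(p / (1 - p))

# Suppose each rung of a ladder beats the rung below 99% of the time.
per_rung = elo_gap(0.99)          # roughly 798 Elo per rung
chained = 2 * per_rung            # ladder-implied gap over two rungs

# But if the top player, measured head to head, beats the bottom player
# "only" 99.9% of the time, the directly measured gap is much smaller.
direct = elo_gap(0.999)           # roughly 1200 Elo

assert direct < chained
```

With real (nontransitive) players, the chained and direct estimates disagree like this, so where "random" ends up depends nontrivially on which ladder of intermediates you happen to build.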
Circling back to this with a thing I was thinking about - suppose one wanted to add just one more degree of freedom to a player's Elo rating (at a given point in time, if you also allow evolution over time) in a way that improves predictions as much as possible. Almost certainly you need more dimensions than that to properly fit real idiosyncratic nonlinearities/nontransitivities (i.e. if you had a playing population with specific pairs of players that were especially strong/weak only against specific other players, or cycles of players where A ...
I might be misunderstanding, but it looks to me like your proposed extension is essentially just the Elo model with some degrees of freedom that don't yet appear to matter?
The dot product has the property that <theta_A-theta_B,w> = <theta_A,w> - <theta_B,w>, so the only thing that matters is the <theta_P,w> for each player P, which is just a single scalar. So we are on a one-dimensional scale again where predictions are based on taking a sigmoid of the difference between a single scalar associated with each player.
As far...
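A quick numerical check of the collapse, with arbitrary random vectors: every prediction of the vector model agrees with the one-dimensional model built from the scalar s_P = &lt;theta_P, w&gt;.

```python
import math, random

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

random.seed(0)
d = 5
w = [random.gauss(0, 1) for _ in range(d)]
players = {p: [random.gauss(0, 1) for _ in range(d)] for p in "ABC"}

def p_win(a, b):
    # The vector model: sigmoid of <theta_A - theta_B, w>.
    ta, tb = players[a], players[b]
    return sigmoid(dot([x - y for x, y in zip(ta, tb)], w))

# Collapse each player to the single scalar s_P = <theta_P, w>.
s = {p: dot(players[p], w) for p in players}

for a in players:
    for b in players:
        assert abs(p_win(a, b) - sigmoid(s[a] - s[b])) < 1e-12
```

So the extra coordinates never get a chance to matter: only the projection onto w shows up in any prediction.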
I assume you're familiar with the case of the parallel postulate in classical geometry as being independent of other axioms? Where that independence corresponds with the existence of spherical/hyperbolic geometries (i.e. actual models in which the axiom is false) versus normal flat Euclidean geometry (i.e. actual models in which it is true).
To me, this is a clear example of there being no such thing as an "objective" truth about the validity of the parallel postulate - you are entirely free to assume either it or incompatible alternatives. You end up w...
This thread analyzes what is going on under the hood with the chess transformer. It is a stronger player than the Stockfish version it was distilling, at the cost of more compute, though only by a fixed multiplier; it remains O(1).
I found this claim suspect, because this basically is not a thing that happens in board games. In complex strategy board games like Chess, practical amounts of search on top of a good prior policy and/or eval function (which Stockfish has) almost always outperform any pure forward-pass policy model that doesn't do explicit search, e...
Do you think a vision transformer trained on 2-dimensional images of the board state would also come up with a bag of heuristics or would it naturally learn a translation invariant algorithm taking advantage of the uniform way the architecture could process the board? (Let's say that there are 64 1 pixel by 1 pixel patches, perfectly aligned with the 64 board locations of an 8x8 pixel image, to make it maximally "easy" for both the model and for interpretability work.)
And would it differ based on whether one used an explicit 2D positional embedding, or a l...
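For what it's worth, here's a sketch of the kind of explicit 2D positional embedding I have in mind (a sinusoidal embedding per coordinate, concatenated; all parameters here are illustrative choices, not any particular ViT's). Under this scheme, the dot-product similarity between two squares depends only on their relative offset, so the translation structure is built in rather than learned:

```python
import math

def sin_embed(pos, dim=8, base=100.0):
    # 1-D sinusoidal embedding of a single board coordinate.
    e = []
    for i in range(dim // 2):
        w = 1.0 / base ** (2 * i / dim)
        e += [math.sin(pos * w), math.cos(pos * w)]
    return e

def embed_2d(row, col):
    # Explicit 2-D embedding: concatenate the row and column embeddings.
    return sin_embed(row) + sin_embed(col)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# sin(p*w)*sin(q*w) + cos(p*w)*cos(q*w) = cos((p-q)*w), so the dot
# product depends only on the offset (dr, dc), not absolute position.
a = dot(embed_2d(1, 2), embed_2d(3, 5))
b = dot(embed_2d(2, 3), embed_2d(4, 6))   # same offset (+2, +3)
assert abs(a - b) < 1e-9
```

A learned 64-entry embedding table has no such constraint a priori, which is part of why I'd expect the answer might differ between the two setups.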
(KataGo dev here, I also provided a bit of feedback with the authors on an earlier draft.)
@gwern - The "atari" attack is still a cyclic group attack, and the ViT attack is also still a cyclic group attack. I suspect it's not so meaningful to put much weight on the particular variant that one specific adversary happens to converge to.
This is because the space of "kinds" of different cyclic-group fighting situations is combinatorially large, and it's somewhat arbitrary which local minimum the adversary ends up in, because it doesn't have much pressure to fin...
Apparently not a writeup (yet?), but there appears to be a twitter post here from LC0 with a comparison plot of accuracy on tactics puzzles: https://x.com/LeelaChessZero/status/1757502430495859103?s=20
Yes, rather than resolving the surprise of "the exact sequence HHTHTTHTTH" by declaring that it shouldn't be part of the set of events, I would prefer to resolve it via something like:
Here's my intuition-driving example/derivation.
Fix a reference frame and suppose you are on a frictionless surface standing next to a heavy box equal to your own mass, and you and the box always start at rest relative to one another. In every example, you will push the box leftward, adding 1 m/s leftward velocity to the box, and adding 1 m/s rightward velocity to yourself.
Let's suppose we didn't know what "kinetic energy" is, but let's suppose such a concept exists, and that whatever it is, an object of your mass has 0 units of it when at rest, and i...
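To telegraph where this goes (the argument is presumably heading toward the quadratic form): a quick numerical check that a quadratic candidate, f(v) = c·v², makes the energy delivered by this push come out the same no matter which inertial frame you watch it from - which is the invariance the setup demands, since your muscles spend the same chemical energy regardless of the observer.

```python
def ke(v, c=0.5):
    # Candidate kinetic energy per unit mass: f(v) = c * v**2.
    return c * v * v

def energy_added_by_push(u):
    # Viewed from a frame where you and the box both initially move at
    # velocity u: after the push the box moves at u - 1 and you at u + 1
    # (1 m/s added each way, equal masses).
    before = ke(u) + ke(u)
    after = ke(u - 1) + ke(u + 1)
    return after - before

# c*(u-1)**2 + c*(u+1)**2 - 2*c*u**2 = 2*c, independent of u.
for u in [0.0, 1.0, -3.0, 10.0]:
    assert abs(energy_added_by_push(u) - energy_added_by_push(0.0)) < 1e-9
```

Try replacing `ke` with something linear or cubic and the frame-independence breaks, which is the intuition the example is driving at.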
> lack of sufficient evidence.
Perhaps more specifically, evidence that is independent of the person who is to be trusted or not. Presumably when trusting someone else that something is true, one often does so out of a belief that the other person is honest and reliable enough that their word alone is sufficient evidence to take some action. It's just that there isn't sufficient evidence without that person's word.
I am also curious why the zero-shot transfer is so close to 0% but not 0%. Why do those agents differ so much, and what do the exploits for them look like?
The exploits for the other agents are pretty much the same exploit; they aren't really different. From what I can tell as an experienced Go player watching the adversary and other human players use the exploit, the zero-shot transfer is not so high because the adversarial policy overfits, memorizing specific sequences that let it set up the cyclic pattern, and learns to do so in a relatively non-robust w...
(I'm the main KataGo dev/researcher)
Just some notes about KataGo - the degree to which KataGo has been trained to play well against weaker players is relatively minor. The only notable thing KataGo does is that in some self-play games it gives one side up to an 8x advantage in playouts over the other, with each side knowing this. (KataGo also initializes some games with handicap stones to make them in-distribution, and/or adjusts komi to make the game fair.) So the strong side learns to prefer positions that elicit higher chance of mistakes by the ...
There's (a pair of) binary channels that indicate whether the acting player is receiving komi or paying it. (You can also think of this as a "player is black" versus "player is white" indicator, but interpreting it as komi indicators is equivalent and is the natural way you would extend Leela Zero to operate on different komi without having to make any changes to the architecture or input encoding).
In fact, you can set the channels to fractional values strictly between 0 and 1 to see what the model thinks of a board state given reduced komi or no-komi cond...
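A hypothetical sketch of the idea (plane names and shapes here are illustrative, not KataGo's or Leela Zero's actual input encoding): two scalar feature planes broadcast across the board, one indicating "receiving komi" and one "paying komi", with fractional values interpolating between the two conditions.

```python
def komi_planes(frac_received, size=19):
    # frac_received = 1.0 -> full komi received, 0.0 -> full komi paid.
    # Values strictly between 0 and 1 probe interpolated conditions,
    # e.g. 0.5 as a rough stand-in for a no-komi game.
    receiving = [[frac_received] * size for _ in range(size)]
    paying = [[1.0 - frac_received] * size for _ in range(size)]
    return receiving, paying

rec, pay = komi_planes(0.5)
assert rec[0][0] == 0.5 and pay[0][0] == 0.5
```

The interesting empirical question is whether the value head behaves sensibly on these off-distribution fractional inputs, since training only ever saw the binary settings.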
I do think there is some fun interesting detail in defining "optimal" here. Consider the following three players:
- A - Among all moves whose minimax value is maximal, chooses one uniformly at random (i.e. if there is at least one winning move, they choose one uniformly, else if there is at least one drawing move, they choose one uniformly, else they choose among losing moves uniformly).
- B - Among all moves whose minimax value is maximal, chooses one uniformly at random, but in cases of winning/losing, restricting to only moves that win as fast as possible or
...