Comments

The older nickname was "Cornell of the West." Stanford was modeled after Cornell.

This story is inspired by The Trouble With Being Born, a collection of aphorisms by the Romanian philosopher Emil Cioran (discussed more here), including the following aphorisms:

A stranger comes and tells me he has killed someone. He is not wanted by the police because no one suspects him. I am the only one who knows he is the killer. What am I to do? I lack the courage as well as the treachery (for he has entrusted me with a secret—and what a secret!) to turn him in. I feel I am his accomplice, and resign myself to being arrested and punished as such. At the same time, I tell myself this would be too ridiculous. Perhaps I shall go and denounce him all the same. And so on, until I wake up.

The interminable is the specialty of the indecisive. They cannot mark life out for their own, and still less their dreams, in which they perpetuate their hesitations, pusillanimities, scruples. They are ideally qualified for nightmare.


Here on the coast of Normandy, at this hour of the morning, I needed no one. The very gulls’ presence bothered me: I drove them off with stones. And hearing their supernatural shrieks, I realized that that was just what I wanted, that only the Sinister could soothe me, and that it was for such a confrontation that I had got up before dawn.

See "Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions" (Pezeshkpour and Hruschka, 2023):

Large Language Models (LLMs) have demonstrated remarkable capabilities in various NLP tasks. However, previous works have shown these models are sensitive towards prompt wording, and few-shot demonstrations and their order, posing challenges to fair assessment of these models. As these models become more powerful, it becomes imperative to understand and address these limitations. In this paper, we focus on LLMs robustness on the task of multiple-choice questions -- commonly adopted task to study reasoning and fact-retrieving capability of LLMs. Investigating the sensitivity of LLMs towards the order of options in multiple-choice questions, we demonstrate a considerable performance gap of approximately 13% to 75% in LLMs on different benchmarks, when answer options are reordered, even when using demonstrations in a few-shot setting. Through a detailed analysis, we conjecture that this sensitivity arises when LLMs are uncertain about the prediction between the top-2/3 choices, and specific options placements may favor certain prediction between those top choices depending on the question caused by positional bias. We also identify patterns in top-2 choices that amplify or mitigate the model's bias toward option placement. We found that for amplifying bias, the optimal strategy involves positioning the top two choices as the first and last options. Conversely, to mitigate bias, we recommend placing these choices among the adjacent options. To validate our conjecture, we conduct various experiments and adopt two approaches to calibrate LLMs' predictions, leading to up to 8 percentage points improvement across different models and benchmarks.
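The reordering experiment the abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's actual harness: `ask_model` is a hypothetical stand-in for any LLM call that returns the letter of the chosen option, and four-option questions are assumed.

```python
# Sketch of the option-reordering sensitivity test: measure accuracy
# under every permutation of the answer options and report the gap.
# `ask_model` is a hypothetical callable (prompt -> "A"/"B"/"C"/"D").
import itertools

def format_question(stem, options):
    letters = "ABCD"
    lines = [stem] + [f"{letters[i]}. {opt}" for i, opt in enumerate(options)]
    return "\n".join(lines)

def accuracy_under_order(questions, order, ask_model):
    """Accuracy when each question's options are presented in `order`,
    a permutation of the option indices. Each question is a tuple
    (stem, options, correct_index); options are assumed distinct."""
    correct = 0
    for stem, options, answer_idx in questions:
        reordered = [options[i] for i in order]
        # Letter of the correct option after reordering.
        new_answer = "ABCD"[reordered.index(options[answer_idx])]
        correct += (ask_model(format_question(stem, reordered)) == new_answer)
    return correct / len(questions)

def sensitivity_gap(questions, ask_model):
    """Max minus min accuracy over all 4! orderings of the options."""
    accs = [accuracy_under_order(questions, p, ask_model)
            for p in itertools.permutations(range(4))]
    return max(accs) - min(accs)
```

A maximally position-biased model (one that always answers "A") would show a gap of 1.0 under this measure, while a model that picks by content alone would show 0.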


Also "Benchmarking Cognitive Biases in Large Language Models as Evaluators" (Koo et al., 2023):

Order Bias is an evaluation bias we observe when a model tends to favor the model based on the order of the responses rather than their content quality. Order bias has been extensively studied (Jung et al., 2019; Wang et al., 2023a; Zheng et al., 2023), and it is well-known that state-of-the-art models are still often influenced by the ordering of the responses in their evaluations. To verify the existence of order bias, we prompt both orderings of each pair and count the evaluation as a “first order” or “last order” bias if the evaluator chooses the first ordered (or last ordered) output in both arrangements respectively.
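The counting rule in that passage is simple to make concrete. A minimal sketch, assuming `judge` is a hypothetical stand-in for an LLM evaluator that returns 1 or 2 for whichever response it prefers:

```python
# Sketch of the both-orderings order-bias check: each pair is judged
# twice, once in each order. If the evaluator picks whichever response
# was shown first (or last) in BOTH arrangements, count it as bias.
# `judge(a, b)` is a hypothetical callable returning 1 or 2.

def count_order_bias(pairs, judge):
    """Return (first_order_bias, last_order_bias) counts over
    pairs of candidate responses (a, b)."""
    first_bias = last_bias = 0
    for a, b in pairs:
        pick_ab = judge(a, b)   # verdict with a shown first
        pick_ba = judge(b, a)   # verdict with b shown first
        if pick_ab == 1 and pick_ba == 1:
            first_bias += 1     # always prefers the first-shown response
        elif pick_ab == 2 and pick_ba == 2:
            last_bias += 1      # always prefers the last-shown response
    return first_bias, last_bias
```

A content-based judge (one whose verdict flips when the order flips) contributes to neither count, which is what makes the two-arrangement design a clean test for positional bias.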

Do non-elite groups factor into OP's analysis? I interpreted it as inter-elite veto, e.g. between the regional factions of the U.S. or between religious factions, and less about any "people who didn't go to Oxbridge and don't live in London"-type factions.

I can't think of examples where a movement that wasn't elite-led destabilized and successfully destroyed a regime, but I might be cheating in the way I define "elites" or "led."

But, as other commenters have noted, the UK government does not have structural checks and balances. In my understanding, what they have instead is a bizarrely, miraculously strong respect for precedent and consensus about what "is constitutional" despite (or maybe because of?) the lack of a written constitution. For the UK, and maybe other, less-established democracies (i.e. all of them), I'm tempted to attribute this to the "repeated game" nature of politics: when your democracy has been around long enough, you come to expect that you and the other faction will share power (roughly at 50-50 for median voter theorem reasons), so voices within your own faction start saying "well, hold on, we actually do want to keep the norms around."

The UK is also a small country, both literally, having a 4-5x smaller population than e.g. France during several centuries of Parliamentary rule before the Second Industrial Revolution, and figuratively, since they have an unusually concentrated elite that mostly goes to the same university and lives in London (whose metro area has 20% of the country's population).

https://www.youtube.com/watch?app=desktop&v=dkhcNoMNHA0

Changes my view, edited the post.

Thanks for taking the time to respond; I didn't figure the post would get so much reach.

Wow, thanks for replying.

If the model has beaten GMs at all, then it can only be so weak, right? I'm glad I didn't make stronger claims than I did.

I think my questions about what humans-who-challenge-bots are like were fair, and the point about smurfing is interesting. I'd be interested in any other impressions you have of those players.

Is the model's Lichess profile/game history available?

Could refer to them in writing as "MC-effectiveness measures"

Could someone explain how Rawls's veil of ignorance justifies the kind of society he supports? (To be clear I have an SEP-level understanding and wouldn't be surprised to be misunderstanding him.)

It seems to fail at every step individually:

  1. At best, the agreement of people in the OP provides a necessary but probably insufficient condition for justice, unless he can refute all the other proposed conditions (rights, desert, etc.).
  2. And really the conditions of the OP are actively contrary to good decision-making, e.g. that parties don't know their particular conception of the good (??) and are essentially self-interested...
  3. There's no reason to think, generally, that people disagree with John Rawls only because of their social position or psychological quirks.
  4. There's no reason to think, specifically, that people would have the literally infinite risk aversion required to support the maximin principle.
  5. Even given everything, the best social setup could easily be optimized for the long-term (in consideration of future people) in a way that makes it very different (e.g. harsher for the poor living today) from the kind of egalitarian society I understand Rawls to support.

More concretely:

(A) I imagine that if Aristotle were under a thin veil of ignorance, he would just say "Well if I turn out to be born a slave then I will deserve it"; it's unfair and not very convincing to say that people would just agree with a long list of your specific ideas if not for their personal advantages.

(B) If you won the lottery and I demanded that you sell your ticket to me for $100 on the grounds that you would have, hypothetically, agreed to do this yesterday (before you knew it was a winner), you don't have to do this; the hypothetical situation doesn't actually bear on reality in this way.

Another frame is that his argument involves a bunch of provisions that seem designed to avoid common counterarguments but are otherwise arbitrary (utility monsters, utilitarianism, etc).
