LESSWRONG
aphyer

I am Andrew Hyer, currently living in New Jersey and working in New York (in the finance industry).

Comments

Was Barack Obama still serving as president in December?
aphyer2d20

This might be downstream of a deliberate decision by designers.

An LLM has been trained on data through February 2025.

A user asks it a question in June 2025 about 'what happened in May?'

How should the LLM respond?

D&D.Sci: Serial Healers
aphyer8d40

(I wish to register that I didn't miss this scenario, and intend to get around to playing it this weekend...it's just that you made the awful indie-game blunder of releasing your game the day after Silksong came out, and my free time has been somewhat spoken for for the past few days).

Should we align AI with maternal instinct?
aphyer17d30

In some other world somewhere, the foremost Confucian scholars are debating how to endow their AI with filial piety.

An epistemic advantage of working as a moderate
aphyer19d3529

You can be a moderate by believing only moderate things.  Or you can be a moderate by adopting moderate strategies.  These are not necessarily the same thing. 

This piece seems to be mostly advocating for the benefits of moderate strategies.

Your reply seems to mostly be criticizing moderate beliefs.

(My political beliefs are a ridiculous assortment of things, many of them outside the Overton window.  If someone tells me their political beliefs are all moderate, I suspect them of being a sheep.

But my political strategies are moderate: I have voted for various parties' candidates at various times, depending on who seems worse lately.  This seems...strategically correct to me?)

Should you make stone tools?
aphyer1mo141

If you ever do it, please be sure to try to confuse archaeologists as much as possible.  Find some cave, leave all your flint tools there, and carve images of space aliens onto the wall.

johnswentworth's Shortform
aphyer2mo40

This might be a cultural/region-based thing.  Stop by a bar in Alabama, or even just somewhere rural, and I think there might be more use of bars as matchmaking.

English writes numbers backwards
aphyer2mo3421

Here is a list of numbers.  Which two of these numbers are closest together?

815

187

733

812

142

312

A brief perspective from an IMO coordinator
aphyer2mo20

I think the obvious approach is comparably neat until you get to the point of proving that k=2 won't work, at which point it's a mess.  The Google approach manages to prove that part in a much nicer way as a side effect of its general result.

A brief perspective from an IMO coordinator
aphyer2mo40

I looked at the Q1/4/5 answers[1].  I think they would indeed most likely all get 7s: there's quite a bit of verbosity, and in particular OpenAI's Q4 answer spends a lot of time talking its way around in circles, but I believe there's a valid proof in all of them.

Most interesting is Q1, where OpenAI produces what I think is a very human answer (the same approach I took, and the one I'd expect most human solvers to take) while Google takes a less intuitive approach but one that ends up much neater.  This makes me a little bit suspicious about whether some functionally-identical problem showed up somewhere in Google's training, but if it didn't that is extra impressive.

[1] IMO Q3 and Q6 are generally much harder: the AI didn't solve Q6, and I haven't gone through the Q3 answers.  Q2 was a geometry one, which is weirder to look through and which I find very unpleasant.

A brief perspective from an IMO coordinator
aphyer2mo40

(Credentials: was an IMO team reserve, did some similar competitions)

Have the actual answers the AI produced been posted?  Because I could see this mattering a lot, or not at all, depending on the exact quality of the answers.

If you give a clean, accurate answer that lines up with the expected proof, grading is quite quick and very easy.  But if your proof is messy and non-standard, coordinators need to go through it and determine its validity: or if you missed out part of the proof, there needs to be a standardized answer to 'how big a gap is this, and how much partial credit do you get?'

(Also, have the exact prompts used been posted?  Because it would be very very easy to add small amounts of text or examples that make these problems much easier.  If the prompt used for Q4 contains the number '6' at any point in it, for example, I would basically just instantly call that 'cheating').

Posts

D&D.Sci Tax Day: Adventurers and Assessments Evaluation & Ruleset (28 karma, 5mo, 10 comments)
D&D.Sci Tax Day: Adventurers and Assessments (47 karma, 5mo, 14 comments)
D&D.Sci Dungeonbuilding: the Dungeon Tournament Evaluation & Ruleset (34 karma, 8mo, 8 comments)
D&D.Sci Dungeonbuilding: the Dungeon Tournament (50 karma, 9mo, 16 comments)
D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset (48 karma, 11mo, 13 comments)
D&D Sci Coliseum: Arena of Data (42 karma, 11mo, 23 comments)
Ambiguity in Prediction Market Resolution is Still Harmful (43 karma, 1y, 17 comments)
D&D.Sci Scenario Index (73 karma, 1y, 0 comments)
D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues Evaluation & Ruleset (61 karma, 1y, 11 comments)
D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues (42 karma, 1y, 16 comments)