gw10

Yeah I think I would still make this bet. I think I would still count o3's 25% for the purposes of such a bet.

gw*31

I'm somewhat surprised to see the distribution of predictions for 75% on FrontierMath. Does anyone want to bet money on this, at say, 2:1 odds (my two dollars that this won't happen against your one that it will)?

(Edit: I guess the wording doesn’t exclude something like AlphaProof, which I wasn’t considering. I think I might bet 1:1 odds if systems targeted at math are included, as opposed to general purpose models?)
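For reference, a quick sketch of what these stakes imply (my own arithmetic, not something stated in the thread; "p" here is just the probability the proposer assigns to the 75% threshold being hit under the question's terms):

```latex
% Assumption: the "no" side stakes $2 against the "yes" side's $1, as offered above.
\mathbb{E}[\text{``no'' side}] = (1-p)\cdot 1 \;-\; p\cdot 2 \;=\; 1 - 3p
\quad\Rightarrow\quad
\text{``no'' is favorable iff } p < \tfrac{1}{3},\ \text{``yes'' iff } p > \tfrac{1}{3}.
```

So offering 2:1 on the "no" side roughly signals a credence below one third that it happens.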

gw10

I think you've already given several examples:

Should I count the people I spoke to for 15 minutes for free at the imbue potlucks? That was year-changing for at least one. But if I count them I have to count all of the free people ever, even those who were uninvested. Then people will respond “Okok, how many bounties have you taken on?” Ok sure, but should I include the people who I told “Your case is not my specialty, idk if i’ll be able to help, but I'm interested in trying for a few hours if you’re into it”? Should I include the people who had an amazing session or two but haven’t communicated in two months? Should I include the people who are being really unagentic and slow?

It would already be informative if you put numbers on each of these questions (e.g. "how often does talking for 15 minutes accomplish something", "how many bounties have you taken on in/outside of your specialty", "what percent of your clients are 'unagentic and slow' (and what does this actually mean)"). One could probably do much better by generating several metrics that one would expect to be most useful (or top N%ile useful) and sharing each of them.

gw21

Please, tell me what metric I should use here!

Is it feasible to just generate a bunch of such metrics, with details about what was included or not included in a particular number, and share all of them?

gw61

Hazarding a guess from the frame of 'having the most impact' and not of 'doing the most interesting thing':

  • It might help a lot if a metacognitive assistant already has a lot of context on the work
  • If you think someone else is doing better work than you and you can 2x them, that's better than doing your individual work. (And if instead you can 3x or 4x people...)
gw10

Additional major epidemics or scares that didn’t pan out ($50 for first few, $25 for later)

2014-15 HPAI outbreak in the US, which didn't ultimately make it to humans

gw80

I want to add two more thoughts to the competitive deliberate practice bit:

Another analogy for the scale of humanity point:

If you try to get better at something but don't have the measuring sticks of competitive games, you end up not really knowing how good you objectively are. But most people don't even try to get better at things. So you can easily find yourself feeling like whatever local optimum you've ended up in is better than it actually is.

I don't know anything about martial arts, but suppose you wanted to get really good at fighting people. Then an analogy here is that you discover that, at least for everyone you've tried fighting, you can win pretty easily just by sucker punching them really hard. You might conclude that to get better at fighting, you should just practice sucker punching really well. One day you go to an MMA gym and get your ass kicked.

I suspect this happens in tons of places, except there's not always an MMA gym to keep you honest. For example, my model of lots of researchers is that they learn a few tools really well (their sucker punches) and then just publish a bunch of research that they can successfully "sucker punch". But this is a kind of streetlight effect, and tons of critical research might not be susceptible to sucker punching. And there is no gym of competitive researchers to show you just how much better you could be.

Identifying cruxiness:

I don't have a counterfactual George who hasn't messed around in competitive games, but I strongly suspect that there is some tacit knowledge around figuring out the cruxiness of different moving parts of a system or of a situation that I picked up from these games. 

For example, most games have core fundamentals, and picking up a variety of games means you learn what it generally feels like for something to be fundamental to an activity (e.g. usually just doing the fundamentals better than the other player is enough to win; like in Starcraft it doesn't really matter how good you are at microing your units if you get wildly out-macroed and steamrolled). But sometimes it's also not the fundamentals that matter, because you occasionally get into idiosyncratic situations where some weird / specific thing decides the game instead. Sometimes a game is decided by whoever figures that out first.

This feels related to skills of playing to your outs or finding the surest paths to victory? This doesn't feel like something that's easy to practice outside of some crisply defined system with sharp feedback loops, but it does feel transferable.

gw30

It is a bit early to tell and seems hard to accurately measure, but I note some concrete examples at the end.

Concrete examples aside, in plan making it's probably more accurate to call it purposeful practice than deliberate practice, but it seems super clear to me that in ~every place where you can deliberately practice, deliberate practice is just way better than whatever your default is of "do the thing a lot and passively gain experience". It would be pretty surprising to me if that mostly failed to be true of purposeful practice for plan making or other metacognitive skills.

gw93

As a concrete example, as far as I can piece together from various things I have heard, Open Phil does not want to fund anything that is even slightly right of center in any policy work. I don't think this is because of any COIs, it's because Dustin is very active in the Democratic Party and doesn't want to be affiliated with anything that is even slightly right-coded. Of course, this has huge effects by incentivizing polarization of AI policy work with billions of dollars, since any AI Open Phil funded policy organization that wants to engage with people on the right might just lose all of their funding because of that, and so you can be confident they will steer away from that.

Thanks for sharing, I was curious if you could elaborate on this (e.g. if there are examples of AI policy work funded by OP that come to mind that are clearly left of center). I am not familiar with policy, but my one data point is the Horizon Fellowship, which is non-partisan and intentionally places congressional fellows in both Democratic and Republican offices. This straightforwardly seems to me like a case where they are trying to engage with people on the right, though maybe you mean not-right-of-center at the organizational level? In general though, (in my limited exposure) I don't model any AI governance orgs as having a particular political affiliation (which might just be because I'm uninformed / ignorant).

gw72

Do you have any data on whether outcomes are improving over time? For example, % published / employed / etc. 12 months after a given batch.
