Some people have strong negative priors toward AI in general.

When the GPT-3 API first came out, I built a little chatbot program to show my friends/family. Two people (out of maybe 15) flat out refused to put in a message because they just didn't like the idea of talking to an AI.

I think it's more of an instinctual reaction than something thought through. There's probably a deeper psychological explanation, but I don't want to speculate.

Rather than having objective standards, I find a growth-centric approach to be most effective. Optimizing for output is easy to Goodhart, so as much as possible I treat it more as a metric than a goal. It's important that I'm getting more done now than I was a year ago, for example, but I don't explicitly aim for a particular output on a day-to-day basis. Instead I aim to optimize my processes and improve my skills, which leads to increased output. That applies not just to work performance, but to many other things.

  • > How much do you get done in a typical month/half year?

    Measuring this objectively is hard, but roughly one large project (big new feature, new application, major design overhaul) per month, or more if the projects I'm working on are smaller.
  • > How much do you consider aspirational but realistic to get done in a typical month/half year?

    I've managed to get projects that I'd normally finish in a month done in 2 weeks or so by crunching hard, but I definitely don't aim for that. I'm generally pretty consistent with output on the scale of months/half years.
  • > How much do you consider on the low end but okay to get done in a typical month/half year?

    I wouldn't be too upset if a project goes over by 25% due to low output (projects can run over by more if there are unexpected issues, but that's another thing). Again though, I'm pretty consistent on the scale of months/half years, so this rarely happens.
  • > What kind of output would you want to see out of a researcher/community organiser/other independent worker within a month/half a year to be impressed/not be disappointed? (Assuming this amount is representative of them)

    I don't have objective standards here. If I get the impression they are genuinely putting in a good effort and improving with time, I'm happy. Different people have different strengths, and a person might work quite slowly relative to the average, but produce very high quality work. If they continue improving their output, eventually it will be high (for whatever standard of "high" you like). If they're putting in effort and not improving, they might not be in the right line of work, and then I'd be disappointed.
  • > What's the minimum output you would want to see out of a researcher/community organiser/other independent worker to be in favour of them getting funding to continue their work? (Assuming this amount is representative of them)

    This is a knapsack problem. For each person who needs funding, calculate score = (expected output * expected value of work per unit of output) / funding required. Sort the list by score in descending order and allocate funding from top to bottom. You don't need to fully solve the knapsack problem here, because leftover funding can be carried over.
  • > What's the minimum output you would want to see out of your friend to feel good about them continuing their current work? (Assuming this amount is representative of them)

    Their average output over the last 12 months should be higher than their average output over the previous 12, by some non-trivial amount.
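
The funding rule from the knapsack answer above can be sketched roughly as follows. This is only an illustration: the `Applicant` fields, the example numbers, and the choice to skip (rather than stop at) anyone whose request no longer fits the remaining budget are all my assumptions, not something stated above.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    # All fields are hypothetical estimates supplied by whoever does the funding.
    name: str
    expected_output: float    # estimated units of output
    value_per_unit: float     # estimated value of one unit of output
    funding_required: float   # cost of funding this person

    @property
    def score(self) -> float:
        # score = (expected output * value per unit of output) / funding required
        return (self.expected_output * self.value_per_unit) / self.funding_required


def allocate(applicants, budget):
    """Greedy pass: fund in descending score order while the budget allows."""
    funded = []
    remaining = budget
    for applicant in sorted(applicants, key=lambda a: a.score, reverse=True):
        # Assumption: skip anyone who doesn't fit and keep going down the list.
        if applicant.funding_required <= remaining:
            funded.append(applicant.name)
            remaining -= applicant.funding_required
    # Whatever is left is carried over, so no exact knapsack solve is needed.
    return funded, remaining


# Made-up example numbers.
applicants = [
    Applicant("A", expected_output=10, value_per_unit=2, funding_required=5.0),
    Applicant("B", expected_output=6, value_per_unit=1, funding_required=4.0),
    Applicant("C", expected_output=9, value_per_unit=3, funding_required=6.0),
]
funded, leftover = allocate(applicants, budget=12.0)
print(funded, leftover)
```

Sorting by value per unit of funding is the usual greedy heuristic for knapsack-style problems. It isn't guaranteed to be optimal for a single round, but since leftover funding rolls over to the next round, the gap probably matters little.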

Agreed, this was an expected result. It's nice to have a functioning example to point to for LLMs in an RLHF context, though.

From one perspective, nature does kind of incentivize cooperation in the long term. See The Goddess of Everything Else.

Is there a reason to believe this is likely? Absent a strong optimization pressure for niceness (there is definitely some, but it's weak relative to other optimization pressures), I'd expect these organizations to be of roughly average possible niceness for their situation.

A quick Google search of probe tuning doesn't turn up anything. Do you have more info on it?

Probe-tuning doesn't train on LLM's own "original rollouts" at all, only on LLM's activations during the context pass through the LLM.

This sounds like regular fine-tuning to me, unless you mean that the loss is calculated based on one (or several?) of the network's activations rather than on the output logits.

Edit: I think I get what you mean now. You want to hook a probe to a model and fine-tune it to perform well as a probe classifier, right?

It's also possible that there is some elegant, abstract "intelligence" concept (analogous to arithmetic) which evolution built into us but we don't understand yet and from which language developed. It just turns out that if you already have language, it's easier to work backwards from there to "intelligence" than to build it from scratch.

This probably isn't the case, but I secretly wonder if the people in camp #1 are p-zombies.

Not very familiar with US culture here: is AI safety not extremely blue-tribe coded right now?

How does the logic here work if you change the question to be about human history?

Guessing a 50/50 coin flip is obviously impossible, but if Omega asks whether you are in the last 50% of "human history", the doomsday argument (not that I subscribe to it) is more compelling. The key point of the doomsday argument is that humanity's growth is exponential, so if we're the median birth-rank human and we continue to grow, we don't actually have that long (in wall-clock time) to live.
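
To put rough numbers on the exponential-growth point, here is a toy calculation. It assumes births per year grow as a pure exponential over the whole of history, which is a simplification, and the growth rate and time span below are invented for illustration rather than real demographic figures.

```python
import math


def years_after_median(growth_rate, total_years):
    """Wall-clock years between the median-birth-rank person and the end.

    Assumes births per year are proportional to exp(growth_rate * t) for
    t in [0, total_years], so cumulative births are proportional to
    expm1(growth_rate * t).
    """
    total_births = math.expm1(growth_rate * total_years)
    # Solve expm1(growth_rate * t_median) = total_births / 2 for t_median.
    median_time = math.log1p(total_births / 2) / growth_rate
    return total_years - median_time


# Illustrative values only: 1% growth per year over a 10,000-year history.
remaining = years_after_median(growth_rate=0.01, total_years=10_000)
doubling_time = math.log(2) / 0.01
print(remaining, doubling_time)
```

Under these assumptions the median-rank person is born only about one doubling time before the end, however long the history is, which is why the argument feels much stronger in wall-clock time than in birth rank.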
