Wiki Contributions


Sam Altman Q&A Notes - Aftermath

But I have never seen an article pulled completely before.

It has happened before, but it's quite rare. Normally when I've done it, I've left a note in an Open Thread, as in this case, where I moved to drafts a post that was discussing an ongoing legal case (now concluded). I think that's the last one I did, and it was four years ago? But there are other mods as well.

D&D.Sci Pathfinder: Return of the Gray Swan Evaluation & Ruleset

Overall Kraken damage is substantially higher on a 4-gun ship than a 2-gun ship.

This seems reversed to me.

Conflict in Kriorus becomes hot today, updated, update 2

Also, I was under the impression that cryonics was a business with significant returns to scale--two facilities storing 100 bodies each are much more expensive than one facility storing 200 bodies, which makes 'market share' more important than it normally is.

Coordination Schemes Are Capital Investments

There's a paired optimization problem, where you assign everyone to a room and set the rents, with the constraint that this assignment be 'envy-free'; that is, no one looks at someone else's assignment/rent combo and says "I'd rather have that than my setup!". There was a calculator that I can't easily find now which tried to find the centroid of the envy-free region.

There are other approaches that work differently; this one, for example, tries to split surplus evenly between the participants, and shows the comparison to other options.
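For concreteness, here's a toy sketch of the 'split surplus evenly' idea (my own minimal implementation, not the calculator's actual method): pick the welfare-maximizing room assignment, then set rents so everyone keeps the same surplus (valuation of their room minus their rent). Equal-surplus rents aren't guaranteed to be envy-free in every case, so the sketch includes an explicit check.

```python
from itertools import permutations

def split_rent(valuations, total_rent):
    """Equal-surplus rent split: find the welfare-maximizing assignment,
    then charge rents so every person keeps the same surplus.

    valuations[i][j] = person i's dollar valuation of room j; each
    person's valuations are assumed to sum to total_rent.
    Returns (assignment, rents) where assignment[i] is person i's room.
    """
    n = len(valuations)
    # Welfare-maximizing assignment (brute force; fine for small n).
    best = max(permutations(range(n)),
               key=lambda p: sum(valuations[i][p[i]] for i in range(n)))
    total_value = sum(valuations[i][best[i]] for i in range(n))
    surplus = (total_value - total_rent) / n
    rents = [0.0] * n
    for i in range(n):
        rents[best[i]] = valuations[i][best[i]] - surplus
    return best, rents

def is_envy_free(valuations, assignment, rents):
    """True iff no one prefers someone else's room-plus-rent bundle."""
    n = len(valuations)
    return all(
        valuations[i][assignment[i]] - rents[assignment[i]]
        >= valuations[i][assignment[j]] - rents[assignment[j]] - 1e-9
        for i in range(n) for j in range(n)
    )

# Two housemates splitting $1000 of rent.
vals = [[600, 400],   # Alice values the big room at $600
        [450, 550]]   # Bob values the small room at $550
assignment, rents = split_rent(vals, 1000)
```

Here Alice gets the big room for $525 and Bob the small room for $475; each keeps $75 of surplus, and neither envies the other's deal.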

Can you control the past?

Do you “manage the news” by refusing to read the morning’s newspaper, or by scribbling over the front page “Favored Candidate Wins Decisively!”? No: if you’re rational, your credence in the loss is still 70%.

I feel like the "No: if you're rational" bit is missing some of the intuition against EDT. Physical humans do refuse to read the morning's newspaper, or delay opening letters, or do similar things, I think because of something EDT-ish 'close to the wire'. (I think this is what's up with ugh fields.)

I think there's something here--conservation of expected evidence and related--that means that a sophisticated EDT won't fall prey to those traps. But this feels sort of like the defense whereby a sophisticated EDT doesn't fall prey to typical counterexamples because, if you're doing the expectation correctly, you're taking into account causation, at which point we're not really talking about EDT anymore. I do think it's sensible to include proper probabilistic reasoning in EDT, but it sometimes feels off to hide this detail behind the word "rational."

Vaniver's Shortform

One frame I have for 'maximizing altruism' is that it's something like a liquid: it's responsive to its surroundings, taking on their shape, flowing to the lowest point available. It rapidly conforms to new surroundings if there are changes; turn a bottle on its side and the liquid inside will rapidly resettle into the new best configuration.

This has both upsides and downsides: the flexibility and ability to do rapid shifts mean that as new concerns become the most prominent, they can be rapidly addressed. The near-continuous nature of liquids means that as you get more and more maximizing altruist capacity, you can smoothly increase the 'shoreline'.

Many other approaches seem solid instead of liquid, in a way that promotes robustness and specialization (while being less flexible and responsive). If the only important resources are fungible commodities, then the liquid model seems optimal; if it turns out that the skills and resources you need for tackling one challenge are different from the skills and resources needed for tackling another, or if switching costs dominate the relative differences between projects, then the solid model starts to look better. Reality has a surprising amount of detail, and it takes time and effort to build up the ability to handle that detail effectively.

I think there's something important here for the broader EA/rationalist sphere, tho I haven't crystallized it well yet. It's something like--the 'maximizing altruism' thing, which I think of as being the heart of EA, is important but also a 'sometimes food' in some ways; it is pretty good for thinking about how to allocate money (with some caveats) but is much less good for thinking about how to allocate human effort. It makes sense for generalists, but actually that's not what most people are or should be. This isn't to say we should abandon maximizing altruism, or all of its precursors, but... somehow build a thing that both makes good use of that, and good use of less redirectable resources.

The Codex Skeptic FAQ

[Note: I use Copilot and like it. The 'aha' moment for me was when I needed to calculate the intersection of two lines, a thing that I would normally just copy/paste from Stack Overflow, and instead Copilot wrote the function for me. Of course I then wrote tests and it passed the tests, which seemed like an altogether better workflow.]
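For reference, a function of the sort I mean (this is my own sketch of the standard determinant formula, not the code Copilot actually produced):

```python
def line_intersection(p1, p2, p3, p4):
    """Intersection of the infinite line through p1-p2 with the infinite
    line through p3-p4, or None if the lines are parallel.

    Points are (x, y) pairs; uses the standard 2x2-determinant formula.
    """
    x1, y1 = p1; x2, y2 = p2; x3, y3 = p3; x4, y4 = p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < 1e-12:
        return None  # parallel (or coincident) lines
    det12 = x1 * y2 - y1 * x2  # determinant of the first line's points
    det34 = x3 * y4 - y3 * x4  # determinant of the second line's points
    x = (det12 * (x3 - x4) - (x1 - x2) * det34) / denom
    y = (det12 * (y3 - y4) - (y1 - y2) * det34) / denom
    return (x, y)
```

For example, the line through (0, 0) and (1, 1) meets the line through (0, 2) and (2, 0) at (1, 1).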

Language models are good enough at generating code to make the very engineers building such models slightly more productive

How much of this is 'quality of code' vs. 'quality of data'? I would naively expect that the sort of algorithmic improvements generated from OpenAI engineers using Copilot/Codex/etc. are relatively low-impact compared to the sort of benefits you get from adding your company's codebase to the corpus (or whatever is actually the appropriate version of that). I'm somewhat pessimistic about the benefits of adding Copilot-generated code to the corpus as a method of improving Copilot.

Extraction of human preferences 👨→🤖

Thanks for sharing negative results!

If I'm understanding you correctly, the structure looks something like this:

  • We have a toy environment where human preferences are both exactly specified and consequential.
  • We want to learn how hard it is to discover the human preference function, and whether it is 'learned by default' in an RL agent that's operating in the world and just paying attention to consequences.
  • One possible way to check whether it's 'learned by default' is to compare the performance of a predictor trained just on environmental data, a predictor trained just on the RL agent's internal state, and a predictor extracted from the RL agent.

The relative performance of those predictors should give you a sense of whether the environment or the agent's internal state give you a clearer signal of the human's preferences.
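To make sure I'm picturing the comparison right, here's a fully synthetic toy version of it (everything here is a stand-in I made up: the 'agent state' is just assumed to have already linearized the preference signal, in place of whatever representation an actual RL agent learns):

```python
import random

random.seed(0)

def fit_linear(xs, ys):
    """Ordinary least squares for y ~ a*x + b (one feature, closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    b = my - a * mx
    return lambda x: a * x + b

def mse(predict, xs, ys):
    """Mean squared error of a predictor on paired data."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy setup: the true preference is a nonlinear function of the raw
# observation, and (by assumption) the agent's internal state has already
# computed that nonlinearity in the course of learning to act.
obs = [random.uniform(-2, 2) for _ in range(200)]
preference = [o ** 2 for o in obs]                          # ground truth
agent_state = [o ** 2 + random.gauss(0, 0.1) for o in obs]  # noisy but linearized

env_predictor = fit_linear(obs, preference)            # probe on raw observations
agent_predictor = fit_linear(agent_state, preference)  # probe on agent state

env_error = mse(env_predictor, obs, preference)
agent_error = mse(agent_predictor, agent_state, preference)
```

In this setup the probe on the agent's state does much better than the probe on raw observations, which is the kind of gap that would suggest the preference function was 'learned by default'.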

It seems to me like there should be some environments where the human preference function is 'too easy' to learn on environmental data (naively, the "too many apples" case should qualify?) and cases where it's 'too hard' (like 'judge how sublime this haiku is', where the RL agent will also probably be confused), and then there's some goldilocks zone where the environmental predictor struggles to capture the nuance and the RL agent has managed to capture the nuance (and so the human preferences can be easily exported from the RL agent). 

Does this frame line up with yours? If so, what are the features of the environments that you investigated that made you think they were in the goldilocks zone? (Or what other features would you look for in other environments if you had to continue this research?)
