N1X — LessWrong

N1X9mo10

I believe the availability side of this is what organizational-level calendars are for.

For the preference side, it's handy to share a physiological time zone (e.g. having similar availability and "best working hours" regardless of actual geography) and to precommit to some minimum waiting period (e.g. "rolling an RNG with anyone who chimes in within 5 minutes" rather than "who's free?") to reduce the fastest-hand-raise problem. If you end up noticing a preference, you can always weight the RNG accordingly.
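A minimal sketch of the waiting-period-plus-weighted-RNG idea, assuming replies are recorded as seconds elapsed since the request went out. The function name `pick_responder`, its parameters, and the example names are hypothetical, chosen only for illustration:

```python
import random


def pick_responder(responses, window_seconds=300, weights=None, rng=None):
    """Pick one responder at random among those who replied in time.

    responses: dict mapping name -> seconds elapsed before they replied.
    weights:   optional dict mapping name -> non-negative weight
               (hypothetical way to encode a noticed preference).
    Returns None if nobody replied within the window.
    """
    rng = rng or random.Random()
    # Everyone inside the window is treated equally, however fast they were.
    eligible = sorted(n for n, delay in responses.items() if delay <= window_seconds)
    if not eligible:
        return None
    w = [(weights or {}).get(name, 1.0) for name in eligible]
    return rng.choices(eligible, weights=w, k=1)[0]


if __name__ == "__main__":
    replies = {"alice": 5, "beth": 250, "carol": 400}  # carol missed the window
    print(pick_responder(replies))
    print(pick_responder(replies, weights={"alice": 1.0, "beth": 3.0}))
```

Because the draw only happens after the window closes, replying in 5 seconds gives no advantage over replying in 250.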

Thoughts on “AI is easy to control” by Pope & Belrose

N1X2y10

Humans often really really want something in the world to happen

This sentence is adjacent to my core concern regarding AI alignment, and why I'm not particularly reassured by the difficulty-of-superhuman-performance or return-on-compute reassurances regarding AGI: we don't need superhuman AI to deal superhuman-seeming amounts of damage. Indeed, even today's "perfectly-sandboxed" models (in that, according to the most reliable publicly-available information, none of the most cutting-edge models are allowed direct read/write access to the systems which would allow them to plot and attain world domination or the destruction of humanity (or specific nations' interests)) have the next-best thing: whenever a new technological lever emerges in the world, humans with malicious intentions are empowered to a much greater degree than those who want strictly the best^[1] for everybody. There are also bit-flip attacks on aligned AI which are much harder to implement on humans.

^[1] Using "best" is fraught but we'll pretend that "world best-aligned with a Pareto-optimal combination of each person's expressed reflective preferences and revealed preferences, to the extent that those revealed preferences do not represent akrasia or views and preferences which the person isn't comfortable expressing directly and publicly but does indeed have" is an adequate proxy to continue along this line of argument; the other option is developing a provably-correct theory of morality and politics which would take more time than this comment by 2-4 orders of magnitude.

Thoughts on “AI is easy to control” by Pope & Belrose

N1X2y10

“Imagine a square circle, and now answer the following questions about it…”.

Just use the Chebyshev (aka maximum or $L^{\infty}$) metric.
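To make the joke concrete, here is a small sketch showing that the set of points at Chebyshev distance 1 from the origin is the boundary of a square. The helper names `chebyshev` and `on_unit_circle` are illustrative, not from any library:

```python
def chebyshev(p, q):
    """Chebyshev (maximum) distance: the largest coordinate-wise gap."""
    return max(abs(a - b) for a, b in zip(p, q))


def on_unit_circle(point, center=(0.0, 0.0), radius=1.0, tol=1e-12):
    """True if `point` lies on the 'circle' of the given radius under this metric."""
    return abs(chebyshev(point, center) - radius) <= tol


# Corners and edge points of the square [-1, 1] x [-1, 1] are all at distance 1.
corners = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
edge_points = [(1, 0.3), (-0.7, 1), (0.2, -1)]

if __name__ == "__main__":
    for pt in corners + edge_points:
        print(pt, on_unit_circle(pt))
    # A point on the Euclidean unit circle is generally NOT on the Chebyshev one.
    print((0.6, 0.8), on_unit_circle((0.6, 0.8)))
```

Under this metric the "circle" of radius 1 is exactly the perimeter of the axis-aligned square with side 2, so a square circle is perfectly well-defined.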

Social status part 1/2: negotiations over object-level preferences

N1X2y10

I think a somewhat-more-elegant toy model might look something like the following: Alice’s object-level preferences are $U_{A}$ , and Beth’s are $U_{B}$ . Alice’s all-things-considered preferences are $U_{A} + α U_{B}^{'}$ , and Beth’s are $U_{B} + β U_{A}^{'}$ . Here, $U_{A}^{'}$ & $U_{B}^{'}$ represent Beth’s current beliefs about Alice’s desires and vice-versa, and the $α, β$ parameters represent how much Alice cares about Beth’s object-level desires and vice-versa. The latter could arise from admiration of the other person, fear of pissing them off, or various other considerations discussed in the next post.

I think that the most general model would be $U_{A} + f_{t} (U_{B}^{'})$ and $U_{B} + g_{t} (U_{A}^{'})$ where $f_{t}, g_{t}$ are time-dependent (or past-interaction-dependent, or status-dependent; these are all fundamentally the same thing). It's a notable feature that this model does not assume that the functions are monotonic! I suspect that most people are willing to compromise somewhat on their preferences but become less willing to do so when the other party begins to resemble a utility monster. The model also covers the situation where the weight is negative under some but not all circumstances.
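As a rough illustration of why non-monotonicity matters, the sketch below compares the linear $α U_{B}^{'}$ weighting with one hypothetical non-monotonic choice of $f_{t}$ at a single fixed time $t$. The "tent" shape, the parameter values, and the three options are all assumptions made up for this example, not anything from the post:

```python
def linear_weight(u_other, alpha=0.5):
    """The quoted toy model: f(u) = alpha * u, monotonic in the other's utility."""
    return alpha * u_other


def tent_weight(u_other, alpha=0.5, cap=4.0):
    """Hypothetical non-monotonic f_t: rises up to `cap`, then falls.

    Past `cap` the other party starts to look like a utility monster, and the
    contribution shrinks, eventually going negative.
    """
    if u_other <= cap:
        return alpha * u_other
    return alpha * (2 * cap - u_other)


def best_option(options, f):
    """Pick the option maximizing U_A + f(U_B'), Alice's all-things-considered utility."""
    return max(options, key=lambda name: options[name][0] + f(options[name][1]))


# Each option maps to (U_A, U_B'): Alice's own utility and her belief about Beth's.
options = {
    "split": (3.0, 3.0),
    "concede": (1.0, 4.0),
    "capitulate": (0.0, 12.0),
}

if __name__ == "__main__":
    print("linear:", best_option(options, linear_weight))
    print("tent:  ", best_option(options, tent_weight))
```

With the linear weighting, a large enough claimed $U_{B}^{'}$ always wins; with the tent-shaped $f_{t}$, Alice compromises up to a point and then stops.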

Social status part 1/2: negotiations over object-level preferences

N1X2y10

trains but not dinosaurs

Did you get this combo from this video, or is this convergent evolution?

What makes teaching math special

N1X2y30

This argument is in no small part covered in

https://worrydream.com/refs/Lockhart_2002_-_A_Mathematician's_Lament.pdf

which is also available in book form, at 5 times the page count and a cost of $10.

Then you should pay them 10 years of generous salary to produce a curriculum and write model textbooks. You need both of that. (If you let someone else write the textbook, the priors say that the textbook will probably suck, and then everyone will blame the curriculum authors. And you, for organizing this whole mess.) They should probably also write model tests.

The problem undergirding the problem you're talking about is not just that nobody's decided to "put the smart people who know math and can teach effectively in a room and let them write the curriculum." As a matter of fact, both New Math and the Common Core involved people with at least all but point 2, and the premise that elementary school teachers are best qualified to undertake this project is a flawed one (if it's a necessity, then Lockhart may be the most famous exemplar adjacent to your goals, and reading his essay or book should take priority over trying to theorize or recruit other similar specialists).

The Pareto Best and the Curse of Doom

N1X2y30

The negative examples are the things that fail to exist because there aren't enough people with that overlap of skills. The Martian for automotive repair might exist, but I haven't heard of it.

Zen and the Art of Motorcycle Maintenance?

OpenAI's Sora is an agent

N1X2y10

Why "selection" could be a capacity which would generalize: albeit to a (highly-lossy) first approximation, most of the most successful models have been based on increasingly-general types of gamification of tasks. The more general models have more general tasks. Video can capture sufficient information to describe almost any action which humans do or would wish to take along with numerous phenomena which are impossible to directly experience in low-dimensional physical space, so if you can simulate a video, you can operate or orchestrate reality.
Why selection couldn't generalize: I can watch someone skiing but that doesn't mean that I can ski. I can watch a speedrun of a video game and, even though the key presses are clearly visible, fail to replicate it. I could also hack together a fake speedrun. I suspect that Sora will be more useful for more-convincingly-faking speedrun content than for actually beating human players or becoming the TAS tool to end all TAS tools (aside from novel glitch discovery). This is primarily because there's not a strong reason to believe that the model can be trained to achieve extremely high-fidelity or high-precision tasks.

Leading The Parade

N1X2y30

One way to identify counterfactually-excellent researchers would be to compare the magnitude of their "greatest achievement" and secondary discoveries, because the credit that parade leaders get is often useful for propagating their future success and the people who do more with that boost are the ones who should be given extra credit for originality (their idea) as opposed to novelty (their idea first). Newton and Leibniz both had remarkably successful and diverse achievements, which suggests that they were relatively high in counterfactual impact in most (if not all) of those fields. Another approach would consider how many people or approaches to a problem had tried and failed to solve it: crediting the zeitgeist rather than Newton and/or Leibniz specifically seems to miss a critical question, namely: if neither of them had solved it, would it have taken an additional year, or more like 10 to 50? In their case, we have a proxy for an answer: ideas took months or years to spread at all beyond the "centers of discovery" at the time, and so although they clearly took only a few months or years to compete for the prize of first (and a few decades to argue over it), we can relatively safely conjecture that whichever anonymous contender is third in the running is likely to have been behind on at least that timescale. That should be considered in contrast to Andrew Wiles, whose proof of Fermat's Last Theorem was efficiently and immediately published (and patched as needed). This is also important because other and in particular later luminaries of the field (e.g. Mengoli, Mercator, various Bernoullis, Euler, etc.) might not have had the vocabulary necessary to make as many discoveries as quickly as they did or communicate those discoveries as effectively if not for Newton & Leibniz's timely contributions.

Snake Eyes Paradox

N1X2y32

Right, and the correct value is 37/72, not 19/36, because exactly half of the remaining 70/72 players lose (in the limit).
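The arithmetic can be checked with exact fractions. This is only one reading of the comment: it assumes the 2/72 of players not counted among "the remaining 70/72" all lose, which is an inference from the stated total rather than something the comment spells out. The variable names are illustrative:

```python
from fractions import Fraction

# Assumption (inferred from the totals): this share loses outright.
certain_losers = Fraction(2, 72)
# From the comment: exactly half of the remaining 70/72 lose (in the limit).
remaining = Fraction(70, 72)

p_lose = certain_losers + remaining / 2

# The rejected value 19/36 equals 38/72, i.e. 1/36 + 1/2.
rejected = Fraction(19, 36)
gap = rejected - p_lose

if __name__ == "__main__":
    print("P(lose) =", p_lose)
    print("rejected value exceeds it by", gap)
```

On this reading, 19/36 comes from adding a full 1/2 to 1/36 instead of taking half of the remaining 70/72 only, which overcounts by 1/72.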
