Love to see a well-defined, open mathematical problem whose solution could help make some progress on AI alignment! It's like a little taste of not being a pre-paradigmatic field. Maybe someday we'll have lots of problems like this that can engage the broader math/CS community and don't involve so much vague speculation and philosophy :)
This is also basically an idea I had - I actually made a system design and started coding it, but haven't made much progress due to lack of motivation... Seems like it should work, though
The title and the link in the first paragraph should read "Sparks of Artificial General Intelligence"
I assumed it was primarily because Eliezer "strongly approved" of it, after being overwhelmingly pessimistic about pretty much everything for so long.
I didn't realize it got popular elsewhere; that makes sense, though, and could help explain the crazy number of upvotes. It would make me feel better about the community's epistemic health if the explanation isn't just that we're overweighting one person's views.
This looks like exciting work! The anomalous tokens are cool, but I'm even more interested in the prompt generation.
Adversarial example generation is a clear use case I can see for this. For instance, it would make it easy to find prompts that result in violent completions from Redwood's violence-free LM.
It would also be interesting to see if there are some generalizable insights about prompt engineering to be gleaned here. Say, we give GPT a bunch of high-quality literature and notice that the generated prompts contain phrases like "excerpt from a New York Times bestseller". (Is this what you meant by "prompt search"?)
I'd be curious to hear how you think we could use this for eliciting latent knowledge.
I'm guessing it could be useful to try to make the generated prompt as realistic (i.e. close to the true distribution) as possible. For instance, if we were trying to prevent a model from saying offensive things in production, we'd want to start by finding prompts that users might realistically use rather than crazy edge cases like "StreamerBot". Fine-tuning the model to try to fool a discriminator a la GAN comes to mind, though there may be reasons this particular approach would fail.
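To make that concrete, here's a minimal toy sketch of what a GAN-style realism term might look like on top of gradient-based prompt optimization. Everything here (the stand-in "LM", the discriminator, the shapes, the loss weighting) is a hypothetical placeholder, not anything from the post:

```python
# Toy sketch: optimize a continuous prompt embedding with two objectives:
# (1) elicit a target completion from a frozen LM, and (2) look "realistic"
# to a discriminator. All modules below are stand-ins, not real models.
import torch
import torch.nn as nn

d_model, prompt_len = 64, 8

lm_head = nn.Linear(d_model, 1000)  # stand-in for the frozen LM being probed
for p in lm_head.parameters():
    p.requires_grad_(False)

# Stand-in discriminator; in a full GAN setup it would be trained in
# alternation on embeddings of real user prompts vs. optimized prompts.
discriminator = nn.Sequential(nn.Flatten(), nn.Linear(prompt_len * d_model, 1))

prompt_emb = torch.randn(1, prompt_len, d_model, requires_grad=True)
opt = torch.optim.Adam([prompt_emb], lr=1e-2)

target_token = torch.tensor([42])  # completion we want to elicit
realism_weight = 0.1               # trades off attack success vs. realism

for step in range(100):
    # Loss 1: make the (toy) LM assign high probability to the target completion.
    attack_loss = nn.functional.cross_entropy(lm_head(prompt_emb.mean(dim=1)), target_token)
    # Loss 2: push up the discriminator's "looks like a real user prompt" score
    # (sign convention assumed: higher score = more realistic).
    realism_loss = -discriminator(prompt_emb).mean()

    loss = attack_loss + realism_weight * realism_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The usual GAN training instabilities, plus the fact that real prompts are discrete tokens rather than continuous embeddings, might be exactly the kind of reasons this particular approach would fail.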
Sounds like you might be planning to update this post once you have more results about prompt generation? I think a separate post would be better, for increased visibility, and also since the content would be pretty different from anomalous tokens (the main focus of this post).
This was interesting to read, and I agree that this experiment should be done!
Speaking as another person who's never really done anything substantial with ML, I do feel like this idea would be pretty feasible for a beginner with just a little experience under their belt. One of the first things that gets recommended to new researchers is "go reimplement an old paper," and it seems like this wouldn't require anything new as far as ML techniques go. If you want to upskill in ML, I'd say get a tiny bit of advice from someone with more experience, then go for it! (On the other hand, if the OP already knows they want to go into software engineering, AI policy, professional lacrosse, etc., then I think someone else who wants to get ML experience should try this out!)
The mechanistic interpretability parts seem a bit harder to me, but Neel Nanda has been making some didactic posts that could get you started. (These posts might all be for transformers, but as you mentioned, I think your idea could be adapted to something a transformer could do. E.g., on each step the model gets a bunch of tokens representing the gridworld state; plus a token representing "what it hears," which remains a constant, unique token while it has earbuds in; and it has to output a token representing an action. A rough sketch of this follows below.)
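To gesture at what I mean, here's a rough sketch of that token scheme; all the token IDs, grid contents, and names are made up for illustration:

```python
# Hypothetical observation -> token encoding for the gridworld-with-earbuds idea.
GRID_TOKENS = {"empty": 0, "wall": 1, "agent": 2, "goal": 3}
SOUND_TOKENS = {"silence": 10, "alarm": 11, "earbuds_in": 12}  # constant token while earbuds are in
ACTION_TOKENS = {"up": 20, "down": 21, "left": 22, "right": 23, "toggle_earbuds": 24}

def encode_step(grid, sound, earbuds_in):
    """Flatten the grid into tokens and append a single 'what it hears' token."""
    grid_part = [GRID_TOKENS[cell] for row in grid for cell in row]
    heard = SOUND_TOKENS["earbuds_in"] if earbuds_in else SOUND_TOKENS[sound]
    return grid_part + [heard]

# Example: a 2x2 grid where the agent wears earbuds while an alarm is sounding.
tokens = encode_step([["agent", "empty"], ["wall", "goal"]], sound="alarm", earbuds_in=True)
# The transformer consumes `tokens` (plus the history so far) and is trained
# to output one token from ACTION_TOKENS at each step.
```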
Not sure what the best choice of model would be. I bet you can look at other AI safety gridworld papers and just do what they did (or even reuse their code). If you use transformers, Neel has a Python library (called EasyTransformer, I think) that you can just pick up and use. As far as I know it doesn't have support for RL, but you can probably find a simple paper or code that does RL for transformers.
Strong upvote + agree. I've been thinking this myself recently. While something like the classic paperclip story seems likely enough to me, I think there's even more justification for the (less dramatic) idea that AI will drive the world crazy by flailing around in ways that humans find highly appealing.
LLMs aren't good enough to do any major damage right now, but I don't think it would take that much more intelligence to get a lot of people addicted or convinced of weird things, even for AI that doesn't have a "goal" as such. This might not directly cause the end of the world, but it could accelerate it.
The worst part is that AI safety researchers are probably just the kind of people to get addicted to AI faster than everyone else. Like, not only do they tend to be socially awkward and everything blaked mentioned, they're also just really interested in AI.
As much as it pains me to say it, I think it would be better if any AI safety people who want to continue being productive just swore off recreational AI use right now.
I think the danger of intent-alignment without societal-alignment is pretty important to consider, although I'm not sure how important it will be in practice. Previously, I was considering writing a post about a similar topic - something about intent-level alignment being insufficient because we hadn't worked out metaethical issues like how to stably combine multiple people's moral preferences and so on. I'm not so sure about this now, because of an argument along the lines of "given that it's aligned with a thoughtful, altruistically motivated team, an intent-aligned AGI would be able to help scale their philosophical thinking so that they reach the same conclusions they would have come to after a much longer period of reflection, and then the AGI can work towards implementing that theory of metaethics."
Here's a recent post that covers at least some of these concerns (although it focuses more on the scenario where one EA-aligned group develops an AGI that takes control of the future): https://www.lesswrong.com/posts/DJRe5obJd7kqCkvRr/don-t-leave-your-fingerprints-on-the-future
I could see the concerns in this post being especially important if things work out such that a full solution to intent-alignment becomes widely available (i.e. easily usable by corporations and potential bad actors) and takeoff is slow enough for these non-altruistic entities to develop powerful AGIs pursuing their own ends. This may be a compelling argument for withholding a solution to intent-alignment from the world if one is discovered.
This post seems interesting and promising, thanks for writing it!
I think this could be straightforwardly solved by not training two different models at all, but instead giving two instances of the same model inputs that have each been independently perturbed by the same random procedure. Then neither instance of the model would ever have a predictable advantage over the other.
For instance, in your movie recommendation example, let's say the model takes a list of 1000 user movie ratings as input. We can generate a perturbed input by selecting 10 of those ratings at random and modifying them, say by changing a 4-star rating to a 5-star rating. We do this twice to get two different inputs, feed them into the model, and train based on the outputs as you described.
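In code, that perturbation step might look something like this toy sketch (the 1000-rating array, the 1-5 scale, and the one-star bump are all just stand-ins for whatever the real input format is):

```python
# Toy sketch of generating two independently perturbed copies of one input.
import numpy as np

rng = np.random.default_rng(0)

def perturb(ratings, n_changes=10):
    """Randomly bump n_changes ratings by one star, clipped to the 1-5 scale."""
    perturbed = ratings.copy()
    idx = rng.choice(len(ratings), size=n_changes, replace=False)
    perturbed[idx] = np.clip(perturbed[idx] + rng.choice([-1, 1], size=n_changes), 1, 5)
    return perturbed

ratings = rng.integers(1, 6, size=1000)  # 1000 user ratings on a 1-5 scale
input_a = perturb(ratings)  # fed to the first instance of the model
input_b = perturb(ratings)  # fed to the second instance; neither gets a systematic edge
```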
Another very similar solution would be to randomly perturb the internal activations of each neural network during training.
Does this seem right?