AI alignment researcher supported by MIRI and LTFF. Working on the learning-theoretic agenda. Based in Israel. See also LinkedIn.
E-mail: vanessa DOT kosoy AT {the thing reverse stupidity is not} DOT org
Can you explain what your definition of "accuracy" is? (the 87.7% figure)
Does it correspond to some proper scoring rule?
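To illustrate the distinction I have in mind (a toy sketch with made-up numbers, not data from the post): threshold accuracy is not a proper scoring rule, while the Brier and log scores are.

```python
import math

# Toy illustration (hypothetical forecasts, not real data): p is the
# forecast probability of a binary event, y is the realized outcome.
forecasts = [(0.9, 1), (0.7, 1), (0.6, 0), (0.95, 1)]

# Threshold accuracy: fraction of forecasts on the right side of 0.5.
# Not proper: once past the threshold, it doesn't reward reporting your
# true credence rather than an overconfident 0 or 1.
accuracy = sum((p >= 0.5) == (y == 1) for p, y in forecasts) / len(forecasts)

# Brier score: mean squared error of the probabilities (lower is better).
# Proper: expected score is minimized by reporting honest credences.
brier = sum((p - y) ** 2 for p, y in forecasts) / len(forecasts)

# Log score: mean log-probability assigned to the outcome (higher is
# better). Also proper; it strictly penalizes confident misses.
log_score = sum(math.log(p if y else 1.0 - p) for p, y in forecasts) / len(forecasts)

print(f"accuracy={accuracy:.3f}  brier={brier:.3f}  log={log_score:.3f}")
```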
(just for fun)
Rings true. Btw, I've heard people with experience in senior roles make "ha ha only serious" jokes many times about how obviously any manager would hire more underlings if only you let them. I also feel the pull of this motivation myself, although usually I prefer other kinds of status (of the sort "people liking/admiring me" rather than "me having power over people").
You're ignoring the part where making something cheaper is a real benefit. For example, it's usually better to have a world where everyone can access a slightly lower-quality version of a thing than a world where only a small elite can access a slightly higher-quality version.
I think that some people are massively missing the point of the Turing test. The Turing test is not about understanding natural language. The idea of the test is: if an AI can behave indistinguishably from a human, as far as any other human can tell, then it obviously has at least as much mental capability as humans do. For example, if humans are good at some task X, then you can ask the AI to solve the same task, and if it does poorly, that's a way to distinguish it from a human.
The only real questions are how long the test should take and how qualified the judge should be. Intuitively, it feels plausible that if an AI can withstand (say) a few hours of grilling by an expert judge, then it would do well even on tasks that take a human years. That's not obvious, but it's at least plausible. And I don't think existing AIs are especially close to passing this.
Here's a sketch of an AIT toy-model theorem saying that, in complex environments without traps, applying selection pressure reliably produces learning agents. I view it as an example of Wentworth's "selection theorem" concept.
Consider any environment μ of infinite Kolmogorov complexity (i.e. uncomputable). Fix a computable reward function
$$r : (A \times O)^* \to [0,1]$$
Suppose that there exists a policy π∗ of finite Kolmogorov complexity (i.e. computable) that's optimal for μ in the slow discount limit. That is,
$$\lim_{\gamma \to 1} \,(1-\gamma)\left(\max_{\pi} \mathbb{E}_{\mu\pi}\!\left[\sum_{n=0}^{\infty} \gamma^n r_n\right] - \mathbb{E}_{\mu\pi^*}\!\left[\sum_{n=0}^{\infty} \gamma^n r_n\right]\right) = 0$$
Then, μ cannot be the only environment with this property. Otherwise, this property could be used to define μ using a finite number of bits, which is impossible[1]. Since μ requires infinitely many more bits to specify than π∗ and r, there have to be infinitely many environments with the same property[2]. Therefore, π∗ is a reinforcement learning algorithm for some infinite class of hypotheses.
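To spell out the counting step in symbols (my paraphrase of the argument above; the set M(π∗, r) is notation I'm introducing, not from the original): define
$$\mathcal{M}(\pi^*, r) := \left\{\mu' \,\middle|\, \lim_{\gamma \to 1}(1-\gamma)\left(\max_{\pi} \mathbb{E}_{\mu'\pi}\!\left[\sum_{n=0}^{\infty}\gamma^n r_n\right] - \mathbb{E}_{\mu'\pi^*}\!\left[\sum_{n=0}^{\infty}\gamma^n r_n\right]\right) = 0\right\}$$
If |M(π∗, r)| = N < ∞, then each member, μ included, is pinned down by programs for π∗ and r together with an index below N, so
$$K(\mu) \;\le\; K(\pi^*) + K(r) + \log N + O(1) \;<\; \infty,$$
contradicting K(μ) = ∞. Hence M(π∗, r) must be infinite.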
Moreover, there are natural examples of μ as above. For instance, let's construct μ as an infinite sequence of finite communicating infra-RDP refinements that converges to an unambiguous (i.e. "not infra") environment. Since each refinement involves some arbitrary choice, "most" such μ have infinite Kolmogorov complexity. In this case, π∗ exists: it can be any learning algorithm for finite communicating infra-RDPs with an arbitrary number of states.
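Schematically, one way to picture the construction (my notation, and I'm reading each infra-RDP as the credal set of environments compatible with it; that reading is an assumption on my part):
$$M_0 \supseteq M_1 \supseteq M_2 \supseteq \cdots, \qquad \bigcap_{k=0}^{\infty} M_k = \{\mu\},$$
where each M_k is given by a finite communicating infra-RDP and each refinement step makes one of several admissible choices. Infinitely many independent choices enter the limit, which is why a "generic" μ obtained this way has infinite Kolmogorov complexity, while π∗ only needs to be a learning algorithm for the finite levels.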
Besides making this a rigorous theorem, there are many additional questions for further investigation:
- Probably, making this argument rigorous requires replacing the limit with a particular regret bound (see the sketch after this list). I ignore this for the sake of simplifying the core idea.
- There is probably something more precise that can be said about how "large" this family of environments is. For example, maybe it must be uncountable.
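On the first bullet, here is a plausible shape such a bound could take (my guess at the form; the rate function ε is hypothetical, not something claimed above): for every environment μ′ in the hypothesis class,
$$\max_{\pi} \mathbb{E}_{\mu'\pi}\!\left[(1-\gamma)\sum_{n=0}^{\infty}\gamma^n r_n\right] - \mathbb{E}_{\mu'\pi^*}\!\left[(1-\gamma)\sum_{n=0}^{\infty}\gamma^n r_n\right] \;\le\; \epsilon(\gamma), \qquad \lim_{\gamma\to 1}\epsilon(\gamma) = 0,$$
with ε uniform over the class, so that the counting argument can quantify over environments satisfying the bound rather than the bare limit.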