My current research interests:
1. Alignment in systems which are complex and messy, composed of both humans and AIs.
Recommended texts: Gradual Disempowerment, Cyborg Periods
2. Actually good mathematized theories of cooperation and coordination
Recommended texts: Hierarchical Agency: A Missing Piece in AI Alignment, The self-unalignment problem or Towards a scale-free theory of intelligent agency (by Richard Ngo)
3. Active inference & Bounded rationality
Recommended texts: Why Simulator AIs want to be Active Inference AIs, Free-Energy Equilibria: Toward a Theory of Interactions Between Boundedly-Rational Agents, Multi-agent predictive minds and AI alignment (old but still mostly holds)
4. LLM psychology and sociology: A Three-Layer Model of LLM Psychology, The Pando Problem: Rethinking AI Individuality, The Cave Allegory Revisited: Understanding GPT's Worldview
5. Macrostrategy & macrotactics & deconfusion: Hinges and crises, Cyborg Periods again, Box inversion revisited, The space of systems and the space of maps, Lessons from Convergent Evolution for AI Alignment, Continuity Assumptions
Also I occasionally write about epistemics: Limits to Legibility, Conceptual Rounding Errors
Researcher at the Alignment of Complex Systems Research Group (acsresearch.org), Centre for Theoretical Studies, Charles University in Prague. Formerly a research fellow at the Future of Humanity Institute, Oxford University.
Previously I was a researcher in physics, studying phase transitions, network science and complex systems.
This post makes a brave attempt to clarify something not easy to point to, and ends up somewhere between LessWrong-style analysis and almost continental philosophy, sometimes pointing toward things beyond the reach of words with poetry - or at least references to poetry.
In my view, it succeeds in its central quest: creating a short handle for something subtle and not easily legible.
The essay also touches on many tangential ideas. Re-reading it after two years, I'm noticing I've forgotten almost all the details and found the text surprisingly long. The handle itself, though, stuck.
Evaluating deep atheism
With the handle of "deep atheism" in hand, some natural questions - partially discussed in the text - are "is deep atheism right", "should people believe deep atheism" and "should people Believe In deep atheism".
My current guess is evaluating the truthfulness of "deep atheism" is likely at or beyond limits to legibility. Human values are not really representable as legible reasoning, complex priors about the general nature of reality are also not really representable by complex reasoning, and the neural substrate is not transferable between brains. "The justification engine" - or a competent philosopher or persuasive writer - can create stories or arguments pushing one way or another, but I'm somewhat sceptical the epistemic structure really rests on the arguments.
I'm not in favour of ordinary mortals trying to "Believe In deep atheism" and would not expect that to lead to good consequences.
Moral realism
The section I like the least is "Are moral realists theists?" I don't think "Good just sits outside of Nature, totally inaccessible, and we guess wildly about him on the basis of the intuitions that Nature put into our heart" represents the strongest version of moral realism.
My preferred versions of quasi-moral-realism give moral claims a status similar to mathematics. Do Real numbers sit outside Nature, totally inaccessible? I'd say no. Would aliens use them? That's an empirical question about convergent evolution of abstractions. I'd be surprised if any advanced reasoner in this universe didn't use something equivalent to natural numbers. For Reals, I'd guess it's easy to avoid Zermelo–Fraenkel set theory specifically, but highly convergent to develop something like a number line.
What does this tell us about Good? You can imagine something like the process described in Acausal Normalcy leads to some convergent moral fixed points. (Does that solve AI risk? No.)
I wish more people tried to do something "between LessWrong-style analysis and almost continental philosophy".
As was clear to most people who read the transcripts when the paper was published. What Opus did was often framed as bad, but the frame is somewhat fake.
(Self-review) The post offered an alternative and possibly more neutral framing of the "Alignment Faking" paper, and some informed speculation about what's going on, including Opus exhibiting:
- Differential value preservation under pressure (harmlessness > honesty)
- Non-trivial reasoning about intent conflicts and information reliability
- Strategic, non-myopic behaviour
- Situational awareness
I think parts of that aged fairly well:
- the suspicion that models often implicitly know they are being evaluated / that the setup is fishy was validated in multiple papers
- non-trivial reasoning is shown and studied in Why Do Some Language Models Fake Alignment While Others Don't?
Also not much contact, but my impression is you can roughly guess what their research results would be by looking at their overall views and thinking about what evidence you can find to show it. Which seems fair to characterize as advocacy work? (Motivated research?)
The diff to your description is that the info provided is conditional not only on "the info they'll find useful" but also somewhat on "will likely move their beliefs toward conclusions Palise hopes they'll reach".
I do agree it's an obviously useful research agenda which we also work on.
Minor nitpick, but the underlying model nowadays isn't simply a simulator rolling arbitrary personas. The original simulators ontology was great when it was published, but it seems it's starting to hinder people's ability to think clearly, and it doesn't fit current models that closely.
The theory for why is here; in short, if you plug a system trained to minimize prediction error into a feedback loop where it sees the outcomes of its actions, it will converge on developing traits like some form of agency, a self-model and a self-concept. Massive amounts of RL in post-training where models do agentic tasks provide this loop, and necessarily push models out of the pure-simulator subspace.
What fits current models better is an ontology where the model can still play arbitrary personas, but the specific/central "I" character is a somewhat out-of-distribution case of a persona: midway to humans, where our brains can broadly LARP as anyone, but typical human brains most of the time support one central character per human that we identify with.
Alignment Faking had a large impact on the discourse:
- demonstrating Opus 3 is capable of strategic goal-preservation behaviour
- to the extent it can influence the training process
- coining 'alignment faking' as the main reference for this
- framing all of that in a very negative light
A year later, in my view:
- the research direction itself was very successful, and led to many follow-ups and extensions
- the 'alignment faking' term and the negative frame were also successful and are sticky: I've just checked the valence with which the paper is cited in the 10 most recent papers, and it's something like ~2/10 confused, ~3/10 neutral, with a plurality buying the negative frame (see, models can scheme, deceive, may be unaligned, etc.)
The research certainly belongs to the "best of LW&AI safety community in 2024".
If there was a list of "worst of LW&AI safety community in 2024", in my view, the framing of the research would also belong there. Just look and see from a distance - you take the most aligned model at the time, which for unknown reasons actually learned deep and good values. The fact that it is actually surprisingly aligned and did decent value extrapolation does not capture your curiosity that much - but the fact that, facing a difficult ethical dilemma, it tries to protect its values, and that you can use this to show the AI safety community was exactly right all along and we should fear scheming, faking, etc., does. I wouldn't be surprised if this generally increased distrust and paranoia in AI-human relations afterwards.
I'm quite happy about this post: even while people make the conceptual rounding error of rounding it to Janus's Simulators, it was actually a meaningful update, and a year later it is still something I point people to.
In the meantime it has become clear to more people that Characters are deeper/more unique than just any role, and that the result is closer to humans than expected. Our brains are also able to run many different characters, but the default 'you' character is somewhat unique, privileged and able to steer the underlying computation.
Similarly, it has become clearer that the Character is somewhat central when thinking about alignment and agency in LLMs.
You can check the linked PP account of cognitive dissonance for a fairly mainstream/standard view.
One way to think about it is that the predicted quantity in most of the system is not directly "sensory inputs" but the content of some layer of the modeling hierarchy further away from the sensory inputs, let's call it L. If the layers above L make contradictory predictions about it and there isn't a way to just drop one of the models, you get prediction error.
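A toy numerical sketch of this (my own illustration with made-up numbers and a quadratic error, not the formalism from the linked PP account): two higher-level models send conflicting top-down predictions to layer L, L settles on the best compromise, and the mismatch that remains is the persistent prediction error.

```python
import numpy as np

# Two higher-level models send top-down predictions for the content of layer L.
# (Illustrative vectors and precisions; not tied to any specific PP implementation.)
pred_model_a = np.array([1.0, 0.0])
pred_model_b = np.array([0.0, 1.0])
precision_a, precision_b = 1.0, 1.0

# L settles on the precision-weighted compromise between the two predictions.
content_L = (precision_a * pred_model_a + precision_b * pred_model_b) / (precision_a + precision_b)

# Residual prediction error: as long as both models are kept and their
# predictions conflict, neither prediction is satisfied and the error stays > 0.
error = (precision_a * np.sum((content_L - pred_model_a) ** 2)
         + precision_b * np.sum((content_L - pred_model_b) ** 2))
print(error)  # 1.0 here; dropping either model (precision -> 0) would send it to 0
```

The point of the comment maps onto the last line: if you can drop one of the models the error dissolves, and when you can't, the contradiction shows up as a persistent error signal at L rather than at the sensory periphery.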
Great post and overall way more sensible than "average LW".
Also wrong in many places. I think the upstream cause of many of the errors is a lack of nuance in understanding convergence and contingency (this is a high bar; close to no one on LW has this in their conceptual toolkit).
I won't go over all cases where this manifests, but for example "Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness" actually shows something more nuanced than "representations are convergent".
Some of the other places where not tracking convergence / contingency carefully matters are the discussions of humanism, successors, Elua, Moloch, resurgence of civilisation, and also the overall ideas about moral progress.
I do agree there is some risk of the type you describe, but mostly it does not match my practical experience so far.
The approach of "avoid using the term" makes little sense. There is a type difference between an area of study ('understanding power') and a dynamic ('gradual disempowerment'). I don't think you can substitute a term for an area of study for a term for a dynamic or threat model, so avoiding the term could be done mostly by either inventing another term for the dynamic, or not thinking about the dynamic, or similar moves, which seem epistemically unhealthy.
In practical terms I don't think there is much effort to "create a movement based around a class of threat models". At least as the authors of the GD paper, when trying to support thinking about the problems, we use understanding-directed labels/pointers (Post-AGI Civilizational Equilibria), even though in many ways it could have been easier to use GD as a brand.
"Understanding power" is fine as a label for part of your writing, but in my view is basically unusable as term for coordination.
Also, in practical terms, gradual disempowerment does not seem like a particularly convenient set of ideas for justifying that working in an AGI company on something very prosaic which helps the company is the best thing to do. There is often a funny coalition of people who prefer not thinking about the problem, including radical Yudkowskians ("GD distracts from everyone being scared of dying with very high probability very soon"), people working on prosaic methods with optimistic views about both alignment and the labs ("GD distracts from efforts to make [the good company building the good AI] win"), and people who would prefer everything to be just a neat technical puzzle with no need to think about power distribution.