You are probably right. For someone arguing the benefits of AI, I certainly can't accuse this writer of being misleadingly optimistic.
But personally I've recently found it quite disconcerting how bleak a future the people who work in AI (on both sides of the capabilities/safety divide) seem to be willing to work towards building.
Overcoming this kind of reflexive defeatism seems to me much harder than simply convincing people that, as a matter of fact, we are going in a bad direction.
You're absolutely right to focus on the moment the model fails. Updating your model to account for its failures is effectively what learning is. Again, if we look at you from the outside, we can give an account of the form: the model failed because it did not correspond to reality, so the agent updated it to one which corresponded better to reality (AKA was more true).
But again, from the inside there is no access to reality, only the model. Perception and prediction are both mediated by the model itself, and when they contradict each other the model must be adjusted. But that the perceptions come from a 'real' external world is itself just a feature of the model.
You have the extraordinary ability to change your own model in response to its contradictions. Let's consider the case of agents that can't do that.
If a roomba is flipped on its back and its wheels keep spinning (I imagine in real life roombas probably have some kind of sensor to deal with these situations, but let's assume this one doesn't), from the outside we can say that the roomba's model, which says that spinning your wheels makes you move, is no longer in correspondence with reality. But from the point of view of the roomba, all that can be said is that the world has become incomprehensible.
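The contrast can be made concrete with a toy simulation (entirely hypothetical; the numbers and the update rule are just illustrative, not anything like real robot firmware):

```python
# Toy sketch: a fixed-model agent vs. an agent that can update its model.
# (Real roombas have wheel-drop sensors; this hypothetical one doesn't.)

def run_agent(flipped, can_update, steps=5):
    # The model: "spinning my wheels moves me at this speed."
    predicted_speed = 1.0
    surprises = 0
    for _ in range(steps):
        observed_speed = 0.0 if flipped else 1.0  # perception
        if observed_speed != predicted_speed:     # prediction error
            surprises += 1
            if can_update:
                # Learning: revise the model to match perception.
                predicted_speed = observed_speed
            # Otherwise the model stays fixed and the same error
            # recurs forever: from the inside, the world is simply
            # incomprehensible, step after step.
    return surprises

print(run_agent(flipped=True, can_update=True))   # 1: one surprise, then adapts
print(run_agent(flipped=True, can_update=False))  # 5: surprised every step
```

From the outside we describe both agents as having a model that does or doesn't correspond to reality; but only the updating agent ever gets its contradiction resolved.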
On the other hand, there's another concern I've been wary of in the context of AI safety startups (which is what I'm currently exploring) and research in general: following the short-term success gradient. In startups, you can start with a noble vision and then be increasingly pressured away from it simply because you are pursuing the customer gradient and "building what people want." If your goal is large-scale (venture) success, then it only makes sense: you need customers and traction for your Series A, after all. Even in research, there's only so much fucking around you can do until people want something legible from you.
This is my biggest concern with d/acc-style techno-optimism: it seems to assume that genuinely defensive technologies can compete economically with offensive ones (all it takes is the right founders, seed funding, etc.).
Whereas my impression is that any kind of ethical/ideological commitment immediately puts a startup at a massive structural disadvantage against those who choose simply to give the market what it wants (acceleration).
One additional consideration wrt the notebook:
Unlike you I am still very ambivalent about note taking.
I got through most of my education relying on my (very much imperfect) memory, forgetting lots but generally remembering enough to get by, and I always felt that taking notes during a lecture, for example, distracted too much from actually listening.
Then, at some point a couple of years ago, I got fed up with having to relearn the same things multiple times and started using Obsidian to try and systematically take notes on everything I read.
But recently I have been feeling that the transfer from mental representations to text is far too lossy, and that text remains static while remembered information can be morphed and readapted dynamically with new information and new contexts. What's worse, using the notebook really does externalise memory, in the sense that once I convert my ideas into text my mind seems to let go of the richer mental representations and retain either nothing or just the compressed textual ones.
So using the notebook feels a bit like I am deferring agency to another OIS that has much better memory (storage) than me, but is also probably stupider.
(will probably try to respond to some of the rest later)
- Bob gives good advice.
- Bob gives bad advice.
- Bob is a skilled manipulator and deliberately says things that will make Alice do...
  - what is in his interest.
  - what he thinks is in her interest.
  - what his values say she should do.
  - what he thinks her values say she should do.
- Bob wants and advises Alice to do what he thinks she should do (based on his own values).
  - Bob is highly convincing and Alice does what he suggests.
    - They have the same values.
    - They have different values.
  - Alice is not convinced by Bob; responding to his advice helps her clarify what she thinks she should do.
  - Bob's advice changes Alice's values.
- Bob tries to figure out Alice's values and then advises her based on that.
  - He gets it wrong.
  - He gets it right...
    - because he knows her well and asks lots of relevant questions.
    - by pure luck.
- Bob believes that only she knows her own values, so he...
  - tells her he cannot help her.
  - tells her he cannot give advice, but he can tell her some facts he knows that may help her make the decision for herself.
    - Equipped with this new information, Alice is able to make a decision that better reflects her own values.
    - Bob carefully selects facts that push her towards a specific choice, while censoring ones that won't.
    - Bob tells her everything he knows, but for contingent reasons of selection (such as what kind of facts Bob is interested in) these only include facts that push her towards a specific choice, and exclude ones that won't.
    - The new knowledge contradicts some of Alice's pre-existing beliefs about the problem...
      - and she can now make a better informed decision.
      - and she is now even more confused about what to do than before.
- Bob is an omniscient god and tells Alice every fact about the universe.
  - Equipped with this new information, Alice is able to make a decision that better reflects her own values.
  - Equipped with this new information, Alice realises she holds contradictory values that point to different courses of action.
  - Now she has ascended to omniscience, Alice no longer cares about the problem.
- Bob tells Alice to ask Charlie.
- Bob tells Alice to ask ChatGPT.
- Bob asks ChatGPT and then passes the response off as his own.
- Bob is a rubber duck and says nothing.
I think that when seen from outside of the agent, your account is correct. But from the perspective of the agent, the world and the world model are indistinguishable, so the relationship between prediction and time is more complex.
I don't think thermostat consciousness would require homunculi any more than human consciousness does but I think it was a mistake on my part to use the word consciousness as it inevitably complicates things rather than simplifying them (although FWIW I do agree that consciousness exists and is not an epiphenomenon).
For the thermostat (assuming the bimetallic strip type), the reference is the position of a pair of contacts on either side of the strip; the temperature determines the curvature of the strip, which makes or breaks the contacts, which turns the heating on or off. This is all physically well understood. There is nothing problematic here.
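The bimetallic mechanism is just a physically implemented bang-bang controller. A minimal sketch of the same loop in software (illustrative numbers, not any real device; the hysteresis stands in for the gap between the contacts):

```python
# Bang-bang control as the bimetallic strip implements it physically:
# curvature tracks temperature; contacts make/break around a reference.

def simulate(reference=20.0, hysteresis=0.5, temp=15.0, steps=40):
    heating = False
    history = []
    for _ in range(steps):
        # The "contacts": make below the band, break above it,
        # and keep their current state inside the dead band.
        if temp < reference - hysteresis:
            heating = True
        elif temp > reference + hysteresis:
            heating = False
        # Crude room dynamics: heater warms, environment cools.
        temp += 0.4 if heating else -0.3
        history.append(temp)
    return history

temps = simulate()
# After settling, the temperature oscillates around the 20.0 reference.
```

The point is that nothing in this loop requires a homunculus: the "perceived delta" is the comparison against the reference, and the action follows from it causally.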
For me acting as the thermostat, I perceive the delta, and act accordingly. I don't see anything problematic here either. The sage is not above causation, nor subject to causation, but one with causation. As are we all, whether we are sages or not.
The thermostat too is one with causation. The thermostat acts in exactly the same way as you do. It is possibly even already conscious (I had completely forgotten this was an established debate, and it's absolutely not a crux for me). You are much more complex than a thermostat.
I think there is something a bit misleading about your example of a person regulating the temperature in their house manually. The fact that you can consciously implement the control algorithm does not tell us anything about your cognition or even your decision-making process, since you can also implement pretty much any other algorithm (you are more or less Turing complete, subject to finiteness etc.). PCT is a theory of cognition, not simply of decision making.
I like this ontology.
Although I wonder if having such a general definition, one that applies to so many things and so many different kinds of things, causes it to start losing meaning, or at least demands some further subdividing.
Also it seems like maybe there is a point at which a sharp line cannot be drawn between two OISs that overlap too much. E.g. while I am willing to recognise that the me OIS and the me + notebook and pen OIS are in some sense meaningfully distinct, it seems like they have some very strong relation, possibly some hierarchy, and the second may not be worth recognising as distinct in practice.
You are right that I am being a bit reductive. Maybe it would be better to say it assumes that some kind of ideal combination of innovation, markets and technocratic governance would be enough to prevent catastrophe?
And to be clear, I do think it's much better for people to be working on defensive technologies than not to. And it's not impossible that the right combination of defensive entrepreneurs and technocratic government incentives could genuinely solve a problem.
But I think this kind of faith in "business as usual, but a bit better" can lead to a kind of complacency where you conflate working on good things with actually making a difference.