Gytis Daujotas — LessWrong

Case Study: Interpreting, Manipulating, and Controlling CLIP With Sparse Autoencoders

Great question that I wish I had an answer to! I haven't yet played around with GANs so not entirely sure. Do you have any intuition about what one would expect to see?

Case Study: Interpreting, Manipulating, and Controlling CLIP With Sparse Autoencoders

Gytis Daujotas1y10

That's embarrassing -- clearly, I need more pretraining. Thanks!

Interpreting and Steering Features in Images

Gytis Daujotas2y10

These are all great ideas, thanks Logan! Investigating different values of L0 seems especially promising.

Interpreting and Steering Features in Images

Gytis Daujotas2y30

Thanks for trying it out!

A section I was writing but then removed due to time constraints involved setting inference-time rules. I found that they can actually work pretty well: you could ban features entirely, or ban features conditionally on some other feature being present. For instance, you could avoid showing natural disasters when certain subjects are in the image. But I thought this was pretty obvious, so I got bored of it.
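A minimal sketch of what such inference-time rules could look like, assuming the sparse autoencoder's feature activations for one image are available as a flat list of floats. The `BanRule` type, the `apply_rules` helper, and the feature indices are all hypothetical, made up for illustration, and are not taken from the original post:

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class BanRule:
    """Hypothetical rule: suppress one SAE feature, optionally only when a trigger is active."""
    banned: int                    # index of the feature to zero out
    trigger: Optional[int] = None  # if set, ban only when this feature is active
    threshold: float = 0.0         # activation above which the trigger counts as present


def apply_rules(activations: Sequence[float], rules: Sequence[BanRule]) -> List[float]:
    """Return a copy of the activations with every applicable banned feature set to zero."""
    original = list(activations)  # triggers are checked against the unedited activations
    edited = list(activations)
    for rule in rules:
        unconditional = rule.trigger is None
        if unconditional or original[rule.trigger] > rule.threshold:
            edited[rule.banned] = 0.0
    return edited


# Assumed indices: 1 = "natural disaster", 2 = "some subject", 3 = a feature banned outright.
rules = [BanRule(banned=3), BanRule(banned=1, trigger=2)]
steered = apply_rules([0.0, 2.5, 1.2, 0.7], rules)
```

The edited activations would then be decoded back through the autoencoder before generation. Checking triggers against the unedited activations keeps the outcome independent of the order in which the rules are listed.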

Definitely right on the Gemini point!

Experiments in Evaluating Steering Vectors

Gytis Daujotas3y10

Definitely a good point! I wanted to get a rough sense as to whether this evaluation approach would work at all, so I deliberately aimed at trying to be monomaniacal. If I were to continue with this, you're right: I think figuring out what a human would actually want to see in a completion would be the next step in seeing if this technique can be useful in practice.

For the token probabilities -- I was inspired mostly by seeing this used in Ought's work for factored cognition:

https://github.com/rawmaterials/ice/blob/4493d6198955804cc03069c3f88bda1b23de616f/ice/recipes/experiments_and_arms/prompts/can_name_exps.py#L161

It seems like the misc. token probabilities usually add up to less than 1% of the total probability mass:
https://i.imgur.com/aznsQdr.png
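As a rough sketch of this kind of check, assume the model returns log-probabilities for its top candidate tokens at the answer position. The `split_mass` helper, the "yes"/"no" answer tokens, and the numbers below are illustrative assumptions, not values from the experiment:

```python
import math
from typing import Dict, Iterable, Tuple


def split_mass(
    top_logprobs: Dict[str, float], answer_tokens: Iterable[str]
) -> Tuple[Dict[str, float], float]:
    """Renormalize probability over the answer tokens and report the leftover mass.

    The leftover covers only the tokens present in `top_logprobs`; anything
    outside the returned top-k candidates is not visible here.
    """
    wanted = {token.lower() for token in answer_tokens}
    mass = {token: 0.0 for token in wanted}
    misc = 0.0
    for token, logprob in top_logprobs.items():
        prob = math.exp(logprob)
        key = token.strip().lower()  # treat " Yes", "yes" and "Yes" as the same answer
        if key in wanted:
            mass[key] += prob
        else:
            misc += prob
    total = sum(mass.values())
    normalized = {t: (p / total if total > 0 else 0.0) for t, p in mass.items()}
    return normalized, misc


# Made-up log-probabilities for illustration only.
example = {" Yes": math.log(0.9), " No": math.log(0.095), " Maybe": math.log(0.005)}
normalized, misc = split_mass(example, ["yes", "no"])
```

When `misc` is small, renormalizing over the answer tokens changes the scores very little, so the evaluation can treat the answer tokens as if they covered the whole distribution.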

Iterating fast: voice dictation as a form of babble

Gytis Daujotas4y10

Great to hear! I'm eager to know how you get on, please keep us up to date :)

Treatments for depression that depressed LW readers may not have tried?

Gytis Daujotas4y20

This is counterintuitive to me - I haven't heard of androgens being prescribed for depression. Do you have more information?

Book Review: The Beginning of Infinity

Gytis Daujotas4y30

cross-posting my comment here:

I really like the entirely new causes for optimism that are contained in this book.

I wonder sometimes, though, if Deutsch views such questions too much in the light of systems or phase transitions and thus loses the general view of morality. One central example is, as you described, his view on the ‘spaceship earth’ metaphor as a largely fearmongering response. Of course, the explanatory ability and general ability to innovate will prevail over a huge amount of adversity, like climate change. But you get the sense that these arguments remain true even if half of the world were to die tomorrow or something. Really, as long as some viable breeding population of humans in spacesuits can read instructions printed on steel-etched cards, the Deutsch view has no comment and, following only this reasoning, we incur no loss.

On one hand, huge suffering is bad and should be avoided. But on the other, maybe Deutsch is right, and right more intensely than even I would be comfortable with, that essentially nothing matters other than the phase transitions which he calls the beginnings of infinity.

Thanks for the review Sam and keep up the great work.

Preparing for ambition

Gytis Daujotas4y40

Thanks for sharing your experience. I'm somewhere near the beginning of the journey and thinking about taking on more risk in what I choose to solve, so the data point of your experience is a valuable waypoint marker.

Essays are highly bandwidth constrained, and most advice is wrong, but maybe this framework helps even slightly:

I think, in a subtle way, your interpretation of IFS differs from mine. When there's disagreement among the subagents as to what to do, that causes confusion in me, or more often, months later, I realize I was acting totally bizarrely. But in that moment of disagreement, there's nothing wrong with the subagents, they're just disagreeing. No subagent needs to be convinced. Nothing needs to be enlightened. There's no poisoned self. There is just the entire self, composed of agents, and right now, in this very moment, I notice that the agents disagree.

Even when you switch to the metaphor of healing the agent, it's still a nicer way of saying that it's broken, flawed, and there's something wrong with it. Maybe, maybe not.

But I don't think this is often a viable approach to it. I like what Venkatesh Rao wrote:

So can human beings change or not? I like to think about this question in terms of Lego blocks. We are, each of us, particular accidental constructions made up of a set of blocks. The whole thing can be torn down and rebuilt into a different design, but you can’t really do anything to change the building blocks. The building blocks of personality are abstract consequences of the more literal building blocks at the biological level, genes. They constrain, but do not define, who we are or can be.

Maybe your agents are what they are. Some part of you is very ambitious. Another part of you, maybe even the rest of the quorum of parts, hates all the stress and intensity. Maybe, in Rao's metaphor, just as blocks can be reassembled into something new, you can negotiate a new agreement between the agents. But like the blocks, in my experience, I have never once been able to change any of my parts. So far, I have only been able to ask them what I should feel, to listen very closely, and to negotiate some new behavior to try instead when this behavior fails.

Book Review: The Signal and the Noise

Gytis Daujotas4y40

In particular, meteorologists are known to have a “wet bias” – they forecast rain more often than it actually occurs.

This seems very interesting: is the wet bias a marketing ploy that makes people feel the information is more valuable? Or is it an optimisation, because people would rather prepare for rain that never comes than be caught out by rain they didn't expect? I think there's room for a lot of probability fudging to match intuitive human expectations, just because we are not very good at understanding probability. For example, if they predicted a 90% chance of sun and it rained, I would be very upset, even though this is perfectly consistent with their prediction.
