rpglover64 - LessWrong

Thanks!

I did notice that we were comparing traces at the same parameter values by the third read-through, so I appreciate the clarification. I think the thing that would have made this clear to me is an explicit mention that it only makes sense to compare traces within the same run.

Interpreting Complexity

rpglover644mo41

This may be a naive question/observation, but it seems to me like treating the trace as a vector conflates the behavior before convergence and after convergence. Note that my intuition is that SGLD is basically an MCMC technique, so if I'm misunderstanding, it's probably because of that.

After convergence, the samples should be viewed as drawn from the stationary distribution, and ideally they have low autocorrelation, so it doesn't seem to make sense to treat them as a vector, since there should be many equivalent traces. I'd be interested to see density estimates per sample point, not just the per-timestep ones you have.

Before convergence, it does make sense to look at the vector, but it's still noisy, so I'd also be interested in certain summaries of it; for example, time to convergence and slope of loss (which is a function of time and the mean loss of the stationary distribution).

I also find myself wondering about other, more tangential things:

Could you use a per-sample gradient trace (rather than the loss trace) of the SGLD to learn something?
Does it make sense to e.g. run multiple chains?
Does it make sense to change the temperature throughout the run (like simulated annealing) rather than just run with each temperature?
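
To make the summaries I'm asking about concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration, not the post's setup: a toy 1-D quadratic loss, full-gradient Langevin updates standing in for SGLD (no minibatching), a fixed temperature, and hypothetical helper names (`sgld_loss_trace`, `time_to_convergence`, `gelman_rubin`). It runs several chains, estimates a crude time to convergence from each loss trace, and compares chains after burn-in with a Gelman-Rubin style statistic.

```python
import math
import random
import statistics


def loss(w):
    """Toy 1-D loss with its minimum at w = 0 (illustrative assumption)."""
    return 0.5 * w * w


def grad(w):
    """Gradient of the toy loss."""
    return w


def sgld_loss_trace(w0, steps, step_size, temperature, rng):
    """Run one Langevin chain and return its loss trace.

    Update: w <- w - step_size * grad(w) + sqrt(2 * step_size * temperature) * N(0, 1).
    Uses the full gradient, so this is a stand-in for SGLD, not SGLD itself.
    """
    w = w0
    trace = []
    noise_scale = math.sqrt(2.0 * step_size * temperature)
    for _ in range(steps):
        w = w - step_size * grad(w) + noise_scale * rng.gauss(0.0, 1.0)
        trace.append(loss(w))
    return trace


def time_to_convergence(trace, window=50, tail_fraction=0.25, tolerance=2.0):
    """First index whose moving-window mean loss is within `tolerance` times
    the mean loss of the final `tail_fraction` of the trace (a crude heuristic)."""
    tail = trace[int(len(trace) * (1.0 - tail_fraction)):]
    target = statistics.fmean(tail)
    for start in range(len(trace) - window + 1):
        if statistics.fmean(trace[start:start + window]) <= tolerance * target:
            return start
    return len(trace)


def gelman_rubin(chains):
    """Gelman-Rubin style R-hat for equal-length chains of scalar samples."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


def run_chains(num_chains=4, w0=10.0, steps=2000, step_size=0.1, temperature=1.0):
    """Run independent chains with different seeds from the same start point."""
    return [
        sgld_loss_trace(w0, steps, step_size, temperature, random.Random(seed))
        for seed in range(num_chains)
    ]


if __name__ == "__main__":
    traces = run_chains()
    burn_in = [time_to_convergence(t) for t in traces]
    post = [t[len(t) // 2:] for t in traces]  # discard the first half as burn-in
    print("time to convergence per chain:", burn_in)
    print("R-hat on post-burn-in loss:", gelman_rubin(post))
```

An R-hat close to 1 would suggest the chains agree after burn-in, which is the regime where I'd expect individual traces to be interchangeable rather than meaningful as vectors. Annealing would amount to making `temperature` a function of the step index.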

What TMS is like

rpglover647mo10

Not my medical miracle post, just my comment on it.

"Objectively" for me would translate to "biomarker" i.e., a bio-physical signal that predicts a clinical outcome.

Yes, though I wouldn't restrict it to "clinical" because I care about non-medical outcomes, and "bio-physical" seems restrictive, though based on your example, that seems to be just my interpretation of the term.

Note that for depression and many psychological issues this means that we find the biomarkers by asking people how they feel

These are legitimate biomarkers, but they're not what I want, and I'm struggling to explain specifically why; the two things that come up are that they have low statistical power and they're a particularly lagging indicator (imagine for contrast e.g. being able to tell whether an antidepressant would work for you after taking it for a week, even if it takes two months to feel the effects). They're fine and useful for statistics, and even for measuring the effectiveness of a treatment in an individual, but a lot less useful for experimenting.

I'm assuming you mean biomarkers for psychological / mental health outcomes specifically. This is spiritually pretty close to what my lab studies - ways to predict how TMS will affect individuals, and adjust it to make it work better in each person.

That sounds really cool. I'm assuming there's nothing actionable available right now for patients?

Our philosophy [...] is that the effects of an intervention will manifest most reliably in reactions to very simple cognitive tasks like vigilance, working memory, and so on. Most serious health issues impact your reaction times, accuracy, bias, etc. in subtle but statistically reliable ways.

Yep. This is basically what I'm hoping to monitor in myself. For example, better vigilance might translate to better focus on work tasks, or better selective attention might imply better impulse control.

Measuring these with random sampling from a phone app and doing good statistics on the data is probably your best bet for objectively assessing interventions. Maybe that is what Quantified Mind does, I'm not sure?

QM doesn't work so well on a phone, hasn't been updated in years, and has major UX issues for my use case that make it too hard to work with. It also doesn't expose the raw statistics. Cognifit (the only app I've found that does assessment and not just "brain training") reports even less.

Do you have a specific app that you know of?

The short answer is that if this were easy, it would already be popular, because we clearly need it. A lot of academic labs and industry people are trying to do this all the time. There is growing success, but it's slow growing and fraught with non-replicable work.

I don't think this is true. My alternative hypothesis (which I think is also compatible with the data) is that it's not hard, but there's no money in it, so there's not much commercial "free energy" making it happen; it's tedious, so there's not much hobbyist "free energy" either; and academia is slow at things like this.

What TMS is like

rpglover648mo11

This isn't directly related to TMS, but I've been trying to get an answer to this question for years, and maybe you have one.

When doing TMS, or any depression treatment, or any supplementation experiment, etc., it would make sense to track the effects objectively (in addition to, not as a replacement for, subjective monitoring). I haven't found any particularly good option for this, especially if I want to self-administer it most days. Quantified Mind comes close, but it's really hard to use their interface to construct a custom battery and an indefinite experiment.

Do you know of anything?

LLMs for Alignment Research: a safety priority?

rpglover641yΩ110

Would you say that models designed from the ground up to be collaborative and capabilitarian would be a net win for alignment, even if they're not explicitly weakened in terms of helping people develop capabilities? I'd be worried that they could multiply human efforts equally, but with humans spending more effort on capabilities, that's still a net negative.

Many arguments for AI x-risk are wrong

rpglover641y-11

I really appreciate the call-out that modern RL for AI does not equal reward-seeking (though I also appreciate @tailcalled's reminder that historical RL did involve reward during deployment); this point has been made before, but not so thoroughly or clearly.

A framing that feels alive for me is that AlphaGo didn't significantly innovate in the goal-directed search (applying MCTS was clever, but not new) but did innovate in both data generation (use search to generate training data, which improves the search) and offline-RL.

Open Thread – Winter 2023/2024

rpglover641y10

before:

after:

Here the difference seems only to be spacing, but I've also seen bulleted lists appear. I think but I can't recall for sure that I've seen something similar happen to top-level posts.

Open Thread – Winter 2023/2024

rpglover641y10

@Habryka @Raemon I'm experiencing weird rendering behavior on Firefox on Android. Before voting, comments are sometimes rendered incorrectly in a way that gets fixed after I vote on them.

Is this a known issue?

Open Thread – Winter 2023/2024

rpglover641y11

This is also mitigated by automatic images like gravatar or the ssh key visualization. I wonder if they can be made small enough to just add to usernames everywhere while maintaining enough distinguishable representations.

Hidden Cognition Detection Methods and Benchmarks

rpglover641y10

Note that every issue you mentioned here can be dealt with by trading off capabilities.

Yes. The trend I see is "pursue capabilities, worry about safety as an afterthought if at all". Pushing the boundaries of what is possible on the capabilities front subject to severe safety constraints is a valid safety strategy to consider (IIRC, this is one way to describe davidad's proposal), but most orgs don't want to bite the bullet of a heavy alignment tax.

I also think you're underestimating how restrictive your mitigations are. For example, your mitigation for sycophancy rules out RLHF, since the "HF" part lets the model know what responses are desired. Also, for deception, I wasn't specifically thinking of strategic deception; for general deception, limiting situational awareness doesn't prevent it arising (though it lessens its danger), and if you want to avoid the capability, you'd need to avoid any mention of e.g. honesty in the training.
