
Peter Chatain

Comments

2017: An Actual Plan to Actually Improve
Peter Chatain · 2mo · 10

I’d love to hear your thoughts on how this has aged over the last 8 years! What other things have you learned? Do you still think in these terms around e.g. rituals? Have you stuck to the same evening routine?

My motivation and theory of change for working in AI healthtech
Peter Chatain · 2mo · 30

After seeing your work over the past 8 months, this comment stands out to me. Institutions are made of people and decision makers, so improving the people in them might be the best lever to pull for building better institutions. What if mental health and security is all you need?

Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)
Peter Chatain · 4mo · 10

"a L1 penalty that penalizes large latent activations, JumpReLU (middle) and TopK (bottom) SAEs ..."

This should say: "JumpReLU (top)".

Race to the Top: Benchmarks for AI Safety
Peter Chatain · 2y · 10

Curious if you ever found what you were looking for.

If you've learned from the best, you're doing it wrong
Peter Chatain · 2y · 10

As stated by others, there are counterexamples. An important class of counterexamples I can think of is when you want to pick up on mental attitudes or traits that likely only the best have; think "You are the average of your 5 closest friends."

Examples of AI's behaving badly
Peter Chatain · 2y · 10

The link for the AI crafting a super weapon seems to be broken. This later article is the best replacement I could find: https://www.digitalspy.com/videogames/a796635/elite-dangerous-ai-super-weapons-bug/

How would you improve ChatGPT's filtering?
Answer by Peter Chatain · Dec 10, 2022 · 30

Although this isn’t a direct answer, I think something changed recently with ChatGPT such that it is now much better at filtering out illegal advice. It appears to be more complex than simply running a filter over the words in the prompt or in ChatGPT’s output. By recent, I mean in the last 24 hours; many tricks to “jailbreak” ChatGPT no longer work.

It gives the impression that they changed the training setup so that the model is trained not to provide illegal information.

Biology-Inspired AGI Timelines: The Trick That Never Works
Peter Chatain · 4y · 10

I was thinking something similar, but I missed the point about the prior. To get intuition, I considered placing, say, 99% probability on one day in 2030. Then generic uncertainty spreads this distribution out both ways, leaving the median exactly what it was before: each bit of probability mass is equally likely to move left or right when you apply generic uncertainty. This seems like it should be slightly wrong, though, since the tiny bit of probability that it is achieved right now can't go back in time, so it will always shift right.

In other words, I think the argument is right for this particular case, but incorrect when significant probability mass is on it happening very soon, or when a very large amount of correcting is done.
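
To make this concrete, here is a rough numerical sketch (my own toy model, nothing from the post): put ~99% of the forecast mass on 2030, smear it with a symmetric "generic uncertainty" kernel, and pile any mass that would land before "now" back onto the present. The median barely moves, which is the intuition above.

```python
# Toy model (illustrative only): a forecast with ~99% mass on 2030,
# convolved with a symmetric "generic uncertainty" kernel of +/- 5 years.
import numpy as np

years = np.arange(2024, 2101)                     # support; "now" = 2024
forecast = np.full(len(years), 0.01 / (len(years) - 1))
forecast[years == 2030] = 0.99                    # ~99% on one year
forecast /= forecast.sum()

kernel = np.ones(11) / 11                         # symmetric +/- 5 year spread
spread = np.convolve(forecast, kernel, mode="full")

offset = len(kernel) // 2
pre_now = spread[:offset].sum()                   # mass pushed before "now"
post_end = spread[offset + len(years):].sum()     # mass pushed past the horizon
spread = spread[offset : offset + len(years)].copy()
spread[0] += pre_now                              # can't go back in time
spread[-1] += post_end                            # keep the sketch normalized

def median_year(p):
    return years[np.searchsorted(np.cumsum(p), 0.5)]

print(median_year(forecast), median_year(spread))  # both come out as 2030
```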

Editor Mini-Guide
Peter Chatain · 4y · 10

Does this hide the text? (Sorry just testing things out rn)

Wow

Ok, so you can hide stuff by typing >! at the start of a new line.

Occam's Razor and the Universal Prior
Peter Chatain · 4y · 30

Yep that's right! And it's a good thing to point out, since there's a very strong bias towards whatever can be expressed in a simple manner. So, the particular universal Turing machine you choose can matter a lot. 

However, in another sense, the choice is irrelevant. No matter what universal Turing machine is used for the Universal prior, AIXI will still converge to the true probability distribution in the limit. Furthermore, for a certain very general definition of prior, the Universal prior assigns more* probability to all possible hypotheses than any other type of prior.  

*More means up to a constant factor. So f(x) = x is "more" than g(x) = 2x, because we are allowed to say f(x) > (1/3)g(x) for all x.
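
For concreteness, the dominance property I have in mind is roughly the following (my paraphrase in standard notation: M is the Universal prior, μ is any other lower-semicomputable prior, and c_μ is a constant that depends on μ but not on x):

```latex
% Rough statement of dominance: for every (lower-semicomputable) prior \mu
% there is a constant c_\mu > 0, independent of x, such that
\begin{equation*}
  M(x) \;\ge\; c_{\mu}\,\mu(x) \qquad \text{for all } x .
\end{equation*}
% Toy version of the footnote: f(x) = x "dominates" g(x) = 2x because
\begin{equation*}
  f(x) \;\ge\; \tfrac{1}{3}\, g(x) \qquad \text{for all } x \ge 0 .
\end{equation*}
```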
