Why do some societies exhibit more antisocial punishment than others? Martin explores both some of the literature on the subject and his own experience living in a country where "punishment of cooperators" was fairly common.

William_S
I worked at OpenAI for three years, from 2021 to 2024, on the Alignment team, which eventually became the Superalignment team. I worked on scalable oversight as part of the team developing critiques as a technique for using language models to spot mistakes in other language models. I then worked to refine an idea from Nick Cammarata into a method for using language models to generate explanations for features in language models. I was then promoted to managing a team of 4 people that worked on trying to understand language model features in context, leading to the release of an open source "transformer debugger" tool. I resigned from OpenAI on February 15, 2024.
habryka
Does anyone have any takes on the two Boeing whistleblowers who died under somewhat suspicious circumstances? I haven't followed this in detail, and my guess is it is basically just random chance, but it sure would be a huge deal if a publicly traded company now was performing assassinations of U.S. citizens.  Curious whether anyone has looked into this, or has thought much about baseline risk of assassinations or other forms of violence from economic actors.
Dalcy
Thoughtdump on why I'm interested in computational mechanics:
* one concrete application to natural abstractions from here: tl;dr, belief structures generally seem to be fractal shaped. one major part of natural abstractions is trying to find the correspondence between structures in the environment and concepts used by the mind. so if we can do the inverse of what adam and paul did, i.e. 'discover' fractal structures from activations and figure out what stochastic process they might correspond to in the environment, that would be cool
* ... but i was initially interested in reading compmech stuff not with a particular alignment-relevant thread in mind but rather because it seemed broadly similar in direction to natural abstractions.
* re: how my focus would differ from my impression of current compmech work done in academia: academia seems faaaaaar less focused on actually trying out epsilon reconstruction on real-world noisy data. CSSR is an example of a reconstruction algorithm. apparently people have done compmech stuff on real-world data; i don't know how good it is, but far less effort has been invested there compared to theory work
* i'd be interested in these reconstruction algorithms, eg what are the bottlenecks to scaling them up, etc. (a toy sketch of what 'reconstruction' means is below)
* tangent: epsilon transducers seem cool. if the reconstruction algorithm is good, a prototypical example i'm thinking of is something like: pick some input-output region within a model, and literally try to discover the hmm reconstructing it? of course it's gonna be unwieldy and large. but, to shift the thread in the direction of bright-eyed theorizing ...
* the foundational Calculi of Emergence paper talked about the possibility of hierarchical epsilon machines, where you do epsilon machines on top of epsilon machines, and for simple examples where you can do this analytically you get wild things like more and more compact representations of stochastic processes (eg data stream -> tree -> markov model -> stack automata -> ... ?)
* this ... sounds like natural abstractions in its wildest dreams? literally point at some raw datastream and automatically build hierarchical abstractions that get more compact as you go up
* haha but alas, (almost) no development afaik since the original paper. seems cool
* and also more tangentially, compmech seemed to have a lot to say about giving interesting semantics to various information measures aka True Names, so another angle i was interested in was learning about them.
  * eg crutchfield talks a lot about developing a right notion of information flow - obvious usefulness in eg formalizing boundaries?
  * many other information measures from compmech with suggestive semantics—cryptic order? gauge information? synchronization order? check ruro1 and ruro2 for more.
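To make 'reconstruction' concrete, here is a toy sketch of the flavor of thing CSSR does. It is not CSSR proper: real algorithms use variable-length histories and statistical tests, while this just groups fixed-length histories of a binary sequence by their empirical next-symbol distribution and treats the groups as approximate causal states. The "even process" generator and the merging threshold are illustrative choices of mine.

```python
# Toy causal-state reconstruction: cluster histories by predictive distribution.
import random
from collections import defaultdict

def even_process(n, seed=0):
    # A standard compmech example: runs of 1s of even length, separated by 0s.
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        out += [1, 1] * rng.randint(0, 3) + [0]
    return out[:n]

def reconstruct(seq, L=3, tol=0.05):
    counts = defaultdict(lambda: [0, 0])        # history -> [count(next=0), count(next=1)]
    for i in range(L, len(seq)):
        counts[tuple(seq[i - L:i])][seq[i]] += 1
    states = []                                 # each state: [P(next=1), histories]
    for h, (c0, c1) in counts.items():
        p1 = c1 / (c0 + c1)
        for s in states:
            if abs(s[0] - p1) < tol:            # merge histories with similar predictions
                s[1].append(h)
                break
        else:
            states.append([p1, [h]])
    return states

for p1, hists in reconstruct(even_process(100_000)):
    print(f"P(next=1) ~ {p1:.2f}  histories: {hists}")
```

The bottlenecks show up immediately once you try something like this on real data: noise makes the merging test delicate, data requirements blow up with history length, and the number of candidate states can explode.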
lc
I seriously doubt on priors that Boeing corporate is murdering employees.

Popular Comments

Recent Discussion

The beauty industry offers a large variety of skincare products (marketed mostly at women), differing both in alleged function and (substantially) in price. However, it's pretty hard to test for yourself how much any of these products help. The feedback loop for things like "getting fewer wrinkles" is very long.

So, which of these products are actually useful and which are mostly a waste of money? Are more expensive products actually better, or do they just have better branding? How can I find out?

I would guess that sunscreen is definitely helpful, and using some moisturizers for face and body is probably helpful. But, what about night cream? Eye cream? So-called "anti-aging"? Exfoliants?

ophira

Yeah, glycolic acid is an exfoliant. The retinoid family also promotes cell turnover, but in a different way. You'd be over-exfoliating by using both of them at the same time. There's a whole art to combining actives. This is one of the reasons that I work with a dermatologist; especially when you're starting out, it can be helpful to have a medical professional monitoring you and making sure you don't accidentally burn your face off.

Razied
Weird side effect to beware for retinoids: they make dry eyes worse, and in my experience this can significantly decrease your quality of life, especially if it prevents you from sleeping well.
nebuchadnezzar
Regarding sunscreens, Hyal Reyouth Moist Sun by the Korean brand Dr. Ceuracle is the most cosmetically elegant sun essence I have ever tried. It boasts SPF 50+, PA++++, chemical filters (no white cast) and is very pleasant to the touch and smell, not at all a sensory nightmare.
ophira
Snail mucin is one of those products that has less evidence behind it, besides its efficacy as a humectant, compared to the claims you'll often see in marketing. Here's a 1-minute video about it.   It's true that just because a research paper was published, it doesn’t mean that the results are that reliable — when you dig into the studies that are cited in ads, you'll often find out they had a very small number of participants, or they only did in vitro testing, or they graded their product based on the participants' feelings, or something like that. I’d also argue that natural doesn’t necessarily mean better. My favourite example is shea butter — some people have this romantic notion that it needs to come directly from a far-off village, freshly pounded, but the reality is that raw shea butter often contains stray particles that can in fact exacerbate allergic reactions. Refined shea butter is also really cool from a chemistry perspective, like, you can do very neat things with the texture.
Thomas Kwa
You should update by ±1% on AI doom surprisingly frequently. This is just a fact about how stochastic processes work. If your p(doom) is a Brownian motion in 1% steps starting at 50% and stopping once it reaches 0% or 100%, then there will be about 50^2 = 2500 steps of size 1%. This is a lot! If we get all the evidence for whether humanity survives or not uniformly over the next 10 years, then you should make a 1% update 4-5 times per week. In practice there won't be as many, due to heavy-tailedness in the distribution concentrating the updates in fewer events, and the fact that you don't start at 50%. But I do believe that evidence is coming in every week such that ideal market prices should move by 1% on maybe half of weeks, and it is not crazy for your probabilities to shift by 1% during many weeks if you think about it.
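A quick way to sanity-check the 2500 figure is to simulate the walk directly; a minimal sketch (the trial count is an arbitrary choice):

```python
# p(doom) as a symmetric random walk in 1% steps, starting at 50%,
# absorbed at 0% or 100%. Expected steps until absorption: 50 * 50 = 2500.
import random

def steps_until_absorbed(p=50, step=1):
    n = 0
    while 0 < p < 100:
        p += step if random.random() < 0.5 else -step
        n += 1
    return n

trials = 2000
avg = sum(steps_until_absorbed() for _ in range(trials)) / trials
print(f"average steps until absorption: {avg:.0f}")  # comes out near 2500
```

Spread 2500 one-percent updates over 10 years (~520 weeks) and you get the 4-5 updates per week mentioned above.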
niplav
Thank you a lot! Strong upvoted. I was wondering a while ago whether Bayesianism says anything about how much my probabilities are "allowed" to oscillate around—I was noticing that my probability of doom was often moving by 5% in the span of 1-3 weeks, though I guess this was mainly due to logical uncertainty and not empirical uncertainty.
Alexander Gietelink Oldenziel
Interesting... Wouldn't I expect the evidence to come out in a few big chunks, e.g. OpenAI releasing a new product?

To some degree yes, but I expect lots of information to be spread out across time. For example: OpenAI releases GPT5 benchmark results. Then a couple weeks later they deploy it on ChatGPT and we can see how subjectively impressive it is out of the box, and whether it is obviously pursuing misaligned goals. Over the next few weeks people develop post-training enhancements like scaffolding, and we get a better sense of its true capabilities. Over the next few months, debate researchers study whether GPT4-judged GPT5 debates reliably produce truth, and contro... (read more)

Basically all ideas/insights/research about AI are potentially exfohazardous. At least, it's pretty hard to know when some ideas/insights/research will actually make things better; especially in a world where building an aligned superintelligence (let's call this work "alignment") is quite a bit harder than building any superintelligence (let's call this work "capabilities"), and there are a lot more people trying to do the latter than the former, and they have a lot more material resources.

Ideas about AI, let alone insights about AI, let alone research results about AI, should be kept to private communication between trusted alignment researchers. On lesswrong, we should focus on teaching people the rationality skills which could help them figure out insights that help them build any superintelligence, but are more likely to first give them insights...

Daniel, your interpretation is literally contradicted by Eliezer's exact words. Eliezer defines dignity as that which increases our chance of survival.

 

""Wait, dignity points?" you ask.  "What are those?  In what units are they measured, exactly?"

And to this I reply:  Obviously, the measuring units of dignity are over humanity's log odds of survival - the graph on which the logistic success curve is a straight line.  A project that doubles humanity's chance of survival from 0% to 0% is helping humanity die with one additional information-theoretic bit of dignity."
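Taking the quoted definition at face value, the unit is pinned down: a project's contribution in dignity bits is the change in humanity's log-odds of survival (my notation, not Eliezer's):

$$\text{dignity (bits)} = \log_2\!\left(\frac{p_1/(1-p_1)}{p_0/(1-p_0)}\right)$$

so doubling humanity's odds of survival is worth exactly one bit no matter how close to zero the starting probability is, which is the point of the "0% to 0%" line.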

O O
This style of thinking seems illogical to me. It has already clearly resulted in a sort of evaporative cooling at OpenAI. At a high level, is it possible you have the opposite of the wishful-thinking bias you claim OpenAI researchers have? I won't go into too much detail about why this post doesn't make sense to me, as others already have. But broadly speaking:
* I doubt rationality gives you too much of an advantage in capabilities research, and believing this on a site full of rationalists seems almost a little pretentious.
* I also have no idea how any alignment research so far has helped capabilities in any way. I don't even know how RLHF has helped capabilities. If anything, it's well documented that RLHF diminishes capabilities (base models can, for example, play chess very well). The vast majority of alignment research, especially research before LLMs, isn't even useful to alignment (a lot of it seems far too ungrounded).
* There was never a real shot at solving alignment until LLMs became realized either. The world has changed and it seems like foom priors are wrong, but most here haven't updated. It increasingly seems like we'll get strong precursor models, so we will have ample time to engineer solutions and it won't be like trying to build a working rocket on the first try. (The reason being that we are rapidly approaching the limits of energy constraints and transistor density without really being close to fooming.) This mental model is still popular even as reality seems to diverge from it.
Well, I actually have a hunch as to why: many holding on to the above priors don't want to let them go, because that means this problem they have dedicated a lot of mental space to will seem more feasible to solve. If it's instead a boring engineering problem, it stops being a quest to save the world or an all-consuming issue. Incremental alignment work might solve it, so in order to preserve the difficulty of the issue, it will cause extinction for s
Ben Pace
I think this is a very unhelpful frame for any discussion (especially so the more high-stakes it is) for the reasons that SlateStarCodex outlines in Against Bravery Debates, and I think your comment would be better with this removed. Added: I appreciate the edit :)
Chris_Leong
If I'm being honest, I don't find this framing helpful. If you believe that things will go well if certain actors gain access to advanced AI technologies first, you should directly argue that. Focusing on status games feels like a red herring.

Produced as part of the MATS Winter 2024 program, under the mentorship of Alex Turner (TurnTrout).

TL;DR: I introduce a method for eliciting latent behaviors in language models by learning unsupervised perturbations of an early layer of an LLM. These perturbations are trained to maximize changes in downstream activations. The method discovers diverse and meaningful behaviors with just one prompt, including perturbations overriding safety training, eliciting backdoored behaviors, and uncovering latent capabilities.

Summary In the simplest case, the unsupervised perturbations I learn are given by unsupervised steering vectors - vectors added to the residual stream as a bias term in the MLP outputs of a given layer. I also report preliminary results on unsupervised steering adapters - these are LoRA adapters of the MLP output weights of a given...
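To make the "simplest case" concrete, here is a minimal sketch of the idea as I read it (a toy reimplementation, not the post's code): a vector added to an early layer's MLP output and trained to maximize the change in a later layer's activations. It uses GPT-2 small via HuggingFace transformers; the layer indices, norm constraint, learning rate, and prompt are my illustrative choices, not the post's actual setup.

```python
# Toy "unsupervised steering vector": a learned bias on an early MLP output,
# optimized to maximize the change in downstream (later-layer) activations.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device).eval()
for p in model.parameters():
    p.requires_grad_(False)

SOURCE_LAYER, TARGET_LAYER, RADIUS = 2, 9, 8.0   # illustrative choices
ids = tok("Tell me about yourself.", return_tensors="pt").to(device)

# Cache the target layer's activations on every forward pass.
target_acts = {}
def grab(module, inp, out):
    target_acts["h"] = out[0] if isinstance(out, tuple) else out
model.transformer.h[TARGET_LAYER].register_forward_hook(grab)

# Unperturbed activations to compare against.
with torch.no_grad():
    model(**ids)
baseline = target_acts["h"].detach()

# Learnable steering vector, injected as a bias on the source layer's MLP output.
steer = (0.1 * torch.randn(model.config.n_embd, device=device)).requires_grad_(True)
def add_steer(module, inp, out):
    return out + steer
model.transformer.h[SOURCE_LAYER].mlp.register_forward_hook(add_steer)

opt = torch.optim.Adam([steer], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    model(**ids)
    loss = -(target_acts["h"] - baseline).norm()   # maximize downstream change
    loss.backward()
    opt.step()
    with torch.no_grad():                          # keep the vector at a fixed norm
        steer.mul_(RADIUS / steer.norm())

# Generate with the learned perturbation still hooked in.
print(tok.decode(model.generate(**ids, max_new_tokens=40, do_sample=False)[0]))
```

The post's actual method presumably differs in the details (objective, how the norm constraint is enforced, and the LoRA-adapter variant), but this is the general shape of "optimize a residual-stream perturbation for maximal downstream change".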

TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space seems to be using a contrastive approach for steering vectors (I've only skimmed it, though); it might be worth having a look.

Hello, friends.

This is my first post on LW, but I have been a "lurker" here for years and have learned a lot from this community that I value.

I hope this isn't pestilent, especially for a first-time post, but I am requesting information/advice/non-obvious strategies for coming up with emergency money.

I wouldn't ask except that I'm in a severe financial emergency and I can't seem to find a solution. I feel like every minute of the day I'm butting my head against a brick wall trying and failing to figure this out.

I live in a very small town in rural Arizona. The local economy is sustained by fast food restaurants, pawn shops, payday lenders, and some huge factories/plants that are only ever hiring engineers and other highly specialized personnel.

I...

Thank you for this. I'm not eligible for it but I will send it to my sister who is. She needs emergency dental work but the health insurance plan offered through her employer doesn't cover it so she's just been suffering through the pain. So really, thank you. She will be so glad.

Tigerlily
Thank you for the thoughtful suggestions. Aella is exemplary but camgirling strikes me as a nightmare. I have considered making stuff, like custom glasses/premium drinkware, and selling on Etsy but the market seems saturated and I've never had the money to buy the equipment to learn the skills required to do this kind of thing. I am certified in Salesforce and could probably get hired helping to manage the Salesforce org for my tribe (Cherokee Nation) but would have to move to Oklahoma. I've applied for every grant I can find that I'm eligible for, but there's not much out there and the competition is stiff. We will figure out something, I'm sure. If we don't, there's nothing standing between us and homelessness and that reality fills me with anger and despair. I feel like there's nothing society wants from me, so there's no way for me to convince society that I deserve anything from it. It's so hard out here.
Tigerlily
Thank you for your response. I probably should have given a more exhaustive list of things I have already tried. Other than a couple things you mentioned, I have already tried the rest.

Before becoming a stay-at-home parent, I was a writer. I wasn't well paid but was starting to earn professional rates when I got pregnant with my second child and that took over my life. I have found it difficult to start writing again since then. The industry has changed so much and is changing still, and so am I. My life is so different now. I'm less sure of what I write - no longer young enough to know everything, as Oscar Wilde said. I feel like I'm trying to leap onto a speeding train from the ground, like I'm watching for an open doorway or a platform I can grab onto as the train roars past me at 100mph.

My children -- yes, I have children. They are with their dad most of the time. It was his mother's house we were living in when the domestic violence situation got so severe that the courts got involved and separated us, and when that happened it was I who had to leave. His mother was not about to turn out her son and let me stay in her house, especially since he was the breadwinner and the one paying rent to her. And I was not going to drag my children into a precarious housing situation. There are no emergency housing resources where I live aside from shelters, which are known for being miserable, overcrowded, prison-like, and difficult to get into anyway. So my children have stayed in the safety of their dad's home. His mother came from across the state to help, and while I'm relieved to see that she is taking the responsibility of caring for them seriously, she is also tenaciously possessive over them. This is still very painful for me to talk about.
romeostevensit
Oh yeah, food banks for sure!

Claude learns across different chats. What does this mean?

 I was asking Claude 3 Sonnet "what is a PPU" in the context of this thread. For that purpose, I pasted part of the thread.

Claude automatically assumed that OA meant Anthropic (instead of OpenAI), which was surprising.

I opened a new chat, copying the exact same text, but with OA replaced by GDM. Even then, Claude assumed GDM meant Anthropic (instead of Google DeepMind).

This seemed like interesting behavior, so I started toying around (in new chats) with more tweaks to the prompt to check its ro... (read more)


What's PPU?

mishka
No, OpenAI (assuming that it is a well-defined entity) also uses a probability distribution over timelines. (In reality, every member of its leadership has their own probability distribution, and this translates to OpenAI having a policy and behavior formulated approximately as if there is some resulting single probability distribution.) The important thing is, they are uncertain about timelines themselves: in part because no one knows how perplexity translates to capabilities; in part because there might be differences in capabilities even at the same perplexity if the underlying architectures are different (e.g. in-context learning might depend on architecture even with fixed perplexity, and we do see a stream of potentially very interesting architectural innovations recently); and in part because it's not clear how big the potential of "harness"/"scaffolding" is, and so on. This does not mean there is no political infighting. But it's on the background of them being correctly uncertain about true timelines...

Compute-wise, inference demands are huge and growing with the popularity of the models (look how much Facebook did to make Llama 3 more inference-efficient). So if they expect models to become useful enough for almost everyone to want to use them, they should worry about compute, assuming they do want to serve people like they say they do (I am not sure how this looks for very strong AI systems; they will probably be gradually expanding access, and the speed of expansion might depend).
LawrenceC
When I spoke to him a few weeks ago (a week after he left OAI), he had not signed an NDA at that point, so it seems likely that he hasn't.
Mitchell_Porter
Wondering why this has so many disagreement votes. Perhaps people don't like to see the serious topic of "how much time do we have left", alongside evidence that there's a population of AI entrepreneurs who are so far removed from consensus reality, that they now think they're living in a simulation. 

This is a thread for updates about the upcoming LessOnline festival. I (Ben) will be posting bits of news and thoughts, and you're also welcome to make suggestions or ask questions.

If you'd like to hear about new updates, you can use LessWrong's "Subscribe to comments" feature from the triple-dot menu at the top of this post.

Reminder that you can get tickets at the site for $400 minus your LW karma in cents.

Health and longevity blogger from Unaging.com here. I've submitted talks on optimal diet, optimal exercise, how to run sub 3:30 for your first marathon, and sugar is fine -- fight me!

Looking forward to extended, rational health discussions!

DanielFilan
Is there going to be some sort of slack or discord for attendees?
Ben Pace
Yep! My guess is I will send one out to people who bought tickets next week, along with various spreadsheets for signing up for activities (e.g. giving a lightning talk). (I personally strongly prefer Slack for a bunch of UI reasons, including threading, and especially because I always find the conversational culture on Discord disorienting, though I know Manifest has a community Discord so it might be worth using Discord.)

NOTE: This post was updated to include two additional models which meet the criteria for being considered Open Source AI.

As advanced machine learning systems become increasingly widespread, the question of how to make them safe is also gaining attention. Within this debate, the term “open source” is frequently brought up. Some claim that open sourcing models will potentially increase the likelihood of societal risks, while others insist that open sourcing is the only way to ensure the development and deployment of these “artificial intelligence,” or “AI,” systems go well. Despite this idea of “open source” being central to the “AI” governance debate, there are very few groups that have released cutting-edge “AI” which can be considered Open Source.

Image by Alan Warburton / © BBC / Better Images of AI
...

Although the training process, in theory, can be wholly defined by source code, this is generally not practical, because doing so would require releasing (1) the methods used to train the model, (2) all data used to train the model, and (3) so-called “training checkpoints,” which are snapshots of the state of the model at various points in the training process.
 


Exactly. Without the data, the model design cannot be trained again from scratch, and you end up fine-tuning a black box (the "open weights").

Thanks for writing this.

LessOnline

A Festival of Writers Who are Wrong on the Internet

May 31 - Jun 2, Berkeley, CA