Why do some societies exhibit more antisocial punishment than others? Martin explores both some literature on the subject, and his own experience living in a country where "punishment of cooperators" was fairly common.

William_S
I worked at OpenAI for three years, from 2021 to 2024, on the Alignment team, which eventually became the Superalignment team. I worked on scalable oversight, as part of the team developing critiques as a technique for using language models to spot mistakes in other language models. I then worked to refine an idea from Nick Cammarata into a method for using language models to generate explanations for features in language models. I was then promoted to managing a team of 4 people that worked on trying to understand language model features in context, leading to the release of an open source "transformer debugger" tool. I resigned from OpenAI on February 15, 2024.
habryka
Does anyone have any takes on the two Boeing whistleblowers who died under somewhat suspicious circumstances? I haven't followed this in detail, and my guess is it is basically just random chance, but it sure would be a huge deal if a publicly traded company now was performing assassinations of U.S. citizens.  Curious whether anyone has looked into this, or has thought much about baseline risk of assassinations or other forms of violence from economic actors.
Thomas Kwa
You should update by ±1% on AI doom surprisingly frequently. This is just a fact about how stochastic processes work. If your p(doom) is Brownian motion in 1% steps starting at 50% and stopping once it reaches 0 or 1, then there will be about 50^2 = 2500 steps of size 1%. This is a lot! If we get all the evidence for whether humanity survives or not uniformly over the next 10 years, then you should make a 1% update 4-5 times per week. In practice there won't be as many, due to heavy-tailedness in the distribution concentrating the updates in fewer events, and the fact that you don't start at 50%. But I do believe that evidence is coming in every week such that ideal market prices should move by 1% on maybe half of weeks, and it is not crazy for your probabilities to shift by 1% during many weeks if you think about it.
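A minimal simulation sketch of this claim (in Python; the function name and trial count are arbitrary illustrative choices, not from the quick take) reproduces the ~2500-step figure:

```python
import random

def steps_to_absorption(start=50, trials=1_000):
    """Simulate p(doom) as a random walk in 1-point steps on 0..100 percent,
    run until it is absorbed at 0 or 100, and return the mean number of steps."""
    total = 0
    for _ in range(trials):
        p, n = start, 0
        while 0 < p < 100:
            p += 1 if random.random() < 0.5 else -1
            n += 1
        total += n
    return total / trials

# Theory: expected number of 1% steps is start * (100 - start) = 2500 for start = 50,
# which spread over ~520 weeks (10 years) is roughly 4-5 updates per week.
print(steps_to_absorption())
```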
Dalcy
Thoughtdump on why I'm interested in computational mechanics:

* one concrete application to natural abstractions from here: tl;dr, belief structures generally seem to be fractal shaped. one major part of natural abstractions is trying to find the correspondence between structures in the environment and concepts used by the mind. so if we can do the inverse of what adam and paul did, i.e. 'discover' fractal structures from activations and figure out what stochastic process they might correspond to in the environment, that would be cool
* ... but i was initially interested in reading compmech stuff not with a particular alignment-relevant thread in mind, but rather because it seemed broadly similar in direction to natural abstractions.
* re: how my focus would differ from my impression of current compmech work done in academia: academia seems faaaaaar less focused on actually trying out epsilon reconstruction on real-world noisy data. CSSR is an example of a reconstruction algorithm. apparently people did compmech stuff on real-world data, don't know how good, but far less effort has been invested there compared to theory work.
* would be interested in these reconstruction algorithms, eg what are the bottlenecks to scaling them up, etc.
* tangent: epsilon transducers seem cool. if the reconstruction algorithm is good, a prototypical example i'm thinking of is something like: pick some input-output region within a model, and literally try to discover the hmm reconstructing it? of course it's gonna be unwieldily large. but, to shift the thread in the direction of bright-eyed theorizing ...
* the foundational Calculi of Emergence paper talked about the possibility of hierarchical epsilon machines, where you do epsilon machines on top of epsilon machines, and for simple examples where you can analytically do this, you get wild things like coming up with more and more compact representations of stochastic processes (eg data stream -> tree -> markov model -> stack automata -> ... ?)
* this ... sounds like natural abstractions in its wildest dreams? literally point at some raw datastream and automatically build hierarchical abstractions that get more compact as you go up
* haha but alas, (almost) no development afaik since the original paper. seems cool
* and also more tangentially, compmech seemed to have a lot to say about providing interesting semantics to various information measures aka True Names, so another angle i was interested in was to learn about them.
  * eg crutchfield talks a lot about developing a right notion of information flow - obvious usefulness in eg formalizing boundaries?
  * many other information measures from compmech with suggestive semantics—cryptic order? gauge information? synchronization order? check ruro1 and ruro2 for more.
Buck
[epistemic status: I think I’m mostly right about the main thrust here, but probably some of the specific arguments below are wrong. In the following, I'm much more stating conclusions than providing full arguments. This claim isn’t particularly original to me.]

I’m interested in the following subset of risk from AI:

* Early: That comes from AIs that are just powerful enough to be extremely useful and dangerous-by-default (i.e. these AIs aren’t wildly superhuman).
* Scheming: Risk associated with loss of control to AIs that arises from AIs scheming.
  * So e.g. I exclude state actors stealing weights in ways that aren’t enabled by the AIs scheming, and I also exclude non-scheming failure modes. IMO, state actors stealing weights is a serious threat, but non-scheming failure modes aren’t (at this level of capability and dignity).
* Medium dignity: That is, developers of these AIs are putting a reasonable amount of effort into preventing catastrophic outcomes from their AIs (perhaps they’re spending the equivalent of 10% of their budget on cost-effective measures to prevent catastrophes).
* Nearcasted: No substantial fundamental progress on AI safety techniques, no substantial changes in how AI works.

This subset of risk is interesting because I think it’s a natural scenario at which to target technical work on AI safety. (E.g. it’s the main scenario we’re targeting with our AI control agenda.)

I claim that the majority of this risk comes from the lab using AIs internally to do AI development (by which I mean both research and engineering). This is because the AIs doing AI development naturally require access to compute and model weights that they can potentially leverage into causing catastrophic outcomes—in particular, those resources can be abused to run AIs unmonitored.

Using AIs for AI development looks uniquely risky to me among applications of early-transformative AIs, because unlike all other applications I know about:

* It’s very expensive to refrain from using AIs for this application.
* There’s no simple way to remove affordances from the AI such that it’s very hard for the AI to take a small sequence of actions which plausibly lead quickly to loss of control. In contrast, most other applications of AI probably can be controlled just by restricting their affordances.

If I’m right that the risk from scheming early-transformative models is concentrated onto this pretty specific scenario, it implies a bunch of things:

* It implies that work on mitigating these risks should focus on this very specific setting.
* It implies that AI control is organizationally simpler, because most applications can be made trivially controlled.
* It is pretty plausible to me that AI control is quite easy, because you actually can remove affordances from the AIs that are doing AI R&D such that it’s hard for them to cause problems.


Recent Discussion

Hello, friends.

This is my first post on LW, but I have been a "lurker" here for years and have learned a lot from this community that I value.

I hope this isn't pestilent, especially for a first-time post, but I am requesting information/advice/non-obvious strategies for coming up with emergency money.

I wouldn't ask except that I'm in a severe financial emergency and I can't seem to find a solution. I feel like every minute of the day I'm butting my head against a brick wall trying and failing to figure this out.

I live in a very small town in rural Arizona. The local economy is sustained by fast food restaurants, pawn shops, payday lenders, and some huge factories/plants that are only ever hiring engineers and other highly specialized personnel.

I...

nim
Ah, so you have skill and a portfolio in writing. You have the cognitive infrastructure to support using the language as art. That infrastructure itself is what you should be trying to rent to tech companies -- not the art it's capable of producing. If the art part of writing is out of reach for you right now, that's ok -- it's almost a benefit in this case, because if it's not around it can't feel left out if you turn the skills you used to celebrate it with to more pragmatic ends.

Normally I wouldn't suggest startups, because they're so risky/uncertain... but in a situation as precarious as yours, it's no worse to see who's looking for writers on a startup-flavored site like https://news.ycombinator.com/jobs.

And finally, I'm taking the titular "severe emergency" to be the whole situation, because it sounds pretty dire. If there's a specific sub-emergency that drove you to ask -- a medical bill, a car breakdown -- there may be more-specific resources that folks haven't mentioned yet. (or if you've explained that in someone else's comment thread, i apologize for asking redundantly; i've not read your replies to others)
Tigerlily
Thank you for this. I'm not eligible for it but I will send it to my sister who is. She needs emergency dental work but the health insurance plan offered through her employer doesn't cover it so she's just been suffering through the pain. So really, thank you. She will be so glad.
Tigerlily
Thank you for the thoughtful suggestions. Aella is exemplary but camgirling strikes me as a nightmare. I have considered making stuff, like custom glasses/premium drinkware, and selling on Etsy but the market seems saturated and I've never had the money to buy the equipment to learn the skills required to do this kind of thing. I am certified in Salesforce and could probably get hired helping to manage the Salesforce org for my tribe (Cherokee Nation) but would have to move to Oklahoma. I've applied for every grant I can find that I'm eligible for, but there's not much out there and the competition is stiff. We will figure out something, I'm sure. If we don't, there's nothing standing between us and homelessness and that reality fills me with anger and despair. I feel like there's nothing society wants from me, so there's no way for me to convince society that I deserve anything from it. It's so hard out here.
RedMan

If you can get a Salesforce cert, you can get any of the other baseline IT certs. Being female and being Native is actually a massive advantage for hiring at companies that care about that stuff.

Apply for government IT jobs, help-desk type stuff; a lot of it is hybrid or remote. If it's a hybrid position, ask to be remote for the first month (two paychecks) to manage moving.

Six months in, open a business, ask your company to switch you to 1099, and route the job through your business. Work it for another year; this creates a performance history.

Now yo...

I wish I could title this How “Pacifism” wins at life, since it’d immediately make everything much easier (especially writing this post), but as most well-meaning kids swiftly learn on the playground - the world is not gentle.

It’s always bothered me how impossible it is to negate the need for strength in life (whether physical, emotional, individual or collective), especially considering how unevenly and unfairly that stat is usually distributed.

What bothers me even more however, is how many people I’ve seen gleefully misapply and overfit this fact, falsely concluding that since one cannot survive without strength, it is the only meaningful form of power in the world. I’m sure you’ve seen these people too.

There is another side to this coin however. One that I care deeply about,...

Please feel free to elaborate on the specific qualms you have with what's written! Super happy to retract anything necessary and learn!

TLDR

  • There’s an organization based in London called the Rationalist Association. It was founded in 1885. Historically, it focused on publishing books and articles related to atheism and science, including works by Darwin, Bertrand Russell, J. B. S. Haldane, George Bernard Shaw, H. G. Wells, and Karl Popper.
  • The topics covered overlap with the present-day rationalist movement (centered on Lesswrong). They include religion and atheism, philosophy (especially philosophy of science and ethics), evolution, and psychology.
  • According to Wikipedia, membership of the Rationalist Association peaked in 1959 with more than 5000 members and with Bertrand Russell as President.
  • This post displays some covers of Rationalist Association publications, and links to full-text articles and other resources.
  • Prior to reading this biography, I hadn't heard of these earlier rationalists. So I did some quick and
...

I can only agree, since I've been saying for a long time that the current rationalist movement is only the latest iteration of many.

I'd agree with that, except for the word "only". It is no criticism of the present, to observe that it has a history.

I say this because I can hardly use a computer without constantly getting distracted. Even when I actively try to ignore how bad software is, the suggestions keep coming.

Seriously Obsidian? You could not come up with a system where links to headings can't break? This makes you wonder what is wrong with humanity. But then I remember that humanity is building a god without knowing what they will want.

So for those of you who need to hear this: I feel you. It could be so much better. But right now, can we really afford to make the ultimate <programming language/text editor/window manager/file system/virtual collaborative environment/interface to GPT/...>?

Can we really afford to do this while our god software looks like...

May this find you well.

Nevin Wetherill
I have been contemplating Connor Leahy's Cyborgism and what it would mean for us to improve human workflows enough that aligning AGI looks less like: Sisyphus attempting to roll a 20 tonne version of The One Ring To Rule Them All into the caldera of Mordor while blindfolded and occasionally having to bypass vertical slopes made out of impossibility proofs that have been discussed by only 3 total mathematicians ever in the history of our species - all before Sauron destroys the world after waking up from a restless nap of an unknown length.

I think this is what you meant by "make the ultimate <programming language/text editor/window manager/file system/virtual collaborative environment/interface to GPT/...>".

Intuitively, the level I'm picturing is: a suite of tools that can be booted up from a single icon on the home screen of a computer, which then allows anyone with decent taste in software to create essentially any program they can imagine, up to a level of polish that people can't poke holes in even if you give a million reviewers 10 years of free time.

Can something at this level be accomplished? Well, what does coding look like currently? It seems to look like a bunch of people with dark circles under their eyes reading long strings of characters in something basically equivalent to an advanced text editor, with a bunch of additional little windows of libraries and graphics and tools. This is not a domain where human intelligence performs with as much ease as in other domains like spearfishing or bushcraft.

If you want to build Cyborgs, I am pretty sure where you start is by focusing on building software that isn't god-machines, throwing out the old book of tacit knowledge, and starting over with something that makes each step as intuitive as possible. You probably also focus way more on quality over quantity/speed.

So, plaintext instructions on what kind of software you want to build, or a code repository and a plaintext list of modifications?
Dagon
Agreed, but it's not just software.  It's every complex system, anything which requires detailed coordination of more than a few dozen humans and has efficiency pressure put upon it.  Software is the clearest example, because there's so much of it and it feels like it should be easy.
Johannes C. Mayer
I think it is incorrect to say that testing things fully formally is the only alternative to whatever the heck we are currently doing. I mean, there is property-based testing as a first step (which maybe you also refer to with automated tests, but I would guess you are probably mainly talking about unit tests).

Maybe try Haskell, or even better, Idris? The Haskell compiler is very annoying until you realize that it loves you. Each time it annoys you with compile errors, it is actually saying "Look, I found this error here that I am very very sure you'd agree is an error, so let me not produce machine code that would do things you don't want it to do." It's very bad at communicating this though, so its words of love usually are blurted out like this: [...]. Don't bother understanding the details, they are not important. So maybe Haskell's greatest strength, being a very "noisy" compiler, is also its downfall. Nobody likes being told that they are wrong, well, at least not until you understand that your goals and the compiler's goals are actually aligned, and that the compiler is just better at thinking about certain kinds of things that are harder for you to think about.

In Haskell, you don't really ever try to prove anything about your program in your program. All of this you get by just using the language normally. You can then go one step further with Agda, Idris2, or Lean, and start to prove things about your programs, which can easily get tedious. But even then, when you have dependent types you can just add a lot more information to your types, which makes the compiler able to help you better. Really we could see it as an improvement to how you can tell the compiler what you want. But again, you know what you can do in dependent type theory? NOT use dependent type theory! You can use Haskell-style code in Idris whenever that is more convenient.

And by the way, I totally agree that all of these languages I named are probably only ghostly images of what they could truly be. But...
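To make the "property-based testing" mentioned above concrete, here is a small example in Python's hypothesis library rather than Haskell's QuickCheck, purely for illustration (the encode/decode functions are made up): you state a property that should hold for all inputs, and the framework searches for counterexamples.

```python
from hypothesis import given, strategies as st

def encode(xs):
    """Serialize a list of ints to a comma-separated string."""
    return ",".join(str(x) for x in xs)

def decode(s):
    """Parse a comma-separated string back into a list of ints."""
    return [int(x) for x in s.split(",")] if s else []

# The property: decoding an encoded list returns the original list,
# for *any* list of integers the framework can generate.
@given(st.lists(st.integers()))
def test_roundtrip(xs):
    assert decode(encode(xs)) == xs
```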

Haskell is a beautiful language, but in my admittedly limited experience it's been quite hard to reason about memory usage in deployed software. This matters because programs run on physical hardware: no matter how beautiful your abstract machine, you will run into issues where the assumptions that abstraction makes don't match reality.

That's not to say more robust programming languages aren't possible. IMO Rust is quite nice, and easily interoperable with a lot of existing code, which is probably a major factor in why it's seeing much higher adopti...

The beauty industry offers a large variety of skincare products (marketed mostly at women), differing both in alleged function and (substantially) in price. However, it's pretty hard to test for yourself how much any of these product help. The feedback loop for things like "getting less wrinkles" is very long.

So, which of these products are actually useful and which are mostly a waste of money? Are more expensive products actually better or just have better branding? How can I find out?

I would guess that sunscreen is definitely helpful, and using some moisturizers for face and body is probably helpful. But, what about night cream? Eye cream? So-called "anti-aging"? Exfoliants?

ophira

I feel *so* pedantic making this comment — please forgive me — but also:

CeraVe may have degraded in quality when they were purchased by L’Oréal and potentially changed the source of the fatty alcohols in their formulation. Fatty alcohols that have been sourced from coconut are more likely to cause skin irritation than those that have been sourced from palm. Plus, retinoids can actually push these fatty alcohols deeper into the pores for the ultimate backfire effect. My source is u/WearingCoats on Reddit, who runs a dermatology practice and does product con...

Answer by nebuchadnezzar
I would also like to recommend the INCI (International Nomenclature of Cosmetic Ingredients) decoder tool: https://incidecoder.com/. It explains the ingredients of your skincare products and points out potential hazards, such as irritancy and comedogenicity. It's easy to use and you have the ability to compare products.
Answer by ErisApprentice
An important thing to keep in mind is that cosmetics companies don't necessarily have the money that e.g. pharmaceutical companies do to push large-scale studies on their products, so lack of evidence usually means a study wasn't done, rather than a study was done and found inconclusive. If you haven't heard of it before, the subreddit 'SkincareAddiction' has some great recommendations for what's evidence-based and what works. 
ophira
Yeah, glycolic acid is an exfoliant. The retinoid family also promotes cell turnover, but in a different way. You'd be over-exfoliating by using both of them at the same time.

A few days ago I came upstairs to:

Me: how did you get in there?

Nora: all by myself!

Either we needed to be done with the crib, which had a good chance of much less sleeping at naptime, or we needed a taller crib. This is also something we went through when Lily was little, and that time what worked was removing the bottom of the crib.

It's a basic crib, a lot like this one. The mattress sits on a metal frame, which attaches to a set of holes along the side of the crib. On its lowest setting, the mattress is still ~6" above the floor. Which means if we remove the frame and sit the mattress on the floor, we gain ~6".

Without the mattress weighing it down, though, the crib...

That ought to buy you a couple weeks, anyway. ;)

Any pinching concern with those straps?


As part of Spring Meetups Everywhere 2024 (https://www.astralcodexten.com/p/spring-meetups-everywhere-2024), we're having another meetup.

Time: Tuesday 7.5.2024, 18:00 onwards

Place: Kitty's Public House, Mannerheimintie 5, Helsinki

How to find us: We'll be in the private room called Kitty's Lounge, find it and come in.

See you there!

GPT-5 training is probably starting around now. It seems very unlikely that GPT-5 will cause the end of the world. But it’s hard to be sure. I would guess that GPT-5 is more likely to kill me than an asteroid, a supervolcano, a plane crash or a brain tumor. We can predict fairly well what the cross-entropy loss will be, but pretty much nothing else.

Maybe we will suddenly discover that the difference between GPT-4 and superhuman level is actually quite small. Maybe GPT-5 will be extremely good at interpretability, such that it can recursively self improve by rewriting its own weights.

Hopefully model evaluations can catch catastrophic risks before wide deployment, but again, it’s hard to be sure. GPT-5 could plausibly be devious enough to circumvent all of...

quiet_NaN
I am by no means an expert on machine learning, but this sentence reads weird to me. I mean, it seems possible that a part of a NN develops some self-reinforcing feature which uses the gradient descent (or whatever is used in training) to go in a particular direction and take over the NN, like a human adrift on a raft in the ocean might decide to build a sail to make the raft go in a particular direction. Or is that sentence meant to indicate that an instance running after training might figure out how to hack the computer running it so it can actually change its own weights?

Personally, I think that if GPT-5 is the point of no return, it is more likely because it would be smart enough to actually help advance AI after it is trained. While improving semiconductors seems hard and would require a lot of work in the real world done with human cooperation, finding better NN architectures and training algorithms seems like something well in the realm of the possible, if not exactly plausible.

So if I had to guess how GPT-5 might doom humanity, I would say that in a few million instance-hours it figures out how to train LLMs of its own power for 1/100th of the cost, and this information becomes public. The budgets of institutions which might train NNs probably follow some power law, so if training cutting-edge LLMs becomes a hundred times cheaper, the number of institutions which could build cutting-edge LLMs becomes many orders of magnitude higher -- unless the big players go full steam ahead towards a paperclip maximizer, of course. This likely means that voluntary coordination (if that was ever on the table) becomes impossible. And setting up a worldwide authoritarian system to impose limits would also be both distasteful and difficult.

Or is that sentence meant to indicate that an instance running after training might figure out how to hack the computer running it so it can actually change its own weights?

I was thinking of a scenario where OpenAI deliberately gives it access to its own weights to see if it can self improve.

I agree that it would be more likely to just speed up normal ML research.

Nathan Helm-Burger
I absolutely sympathize, and I agree that with the world view / information you have, advocating for a pause makes sense. I would get behind 'regulate AI' or 'regulate AGI', certainly. I think though that pausing is an incorrect strategy which would do more harm than good, so despite being aligned with you in being concerned about AGI dangers, I don't endorse that strategy.

Some part of me thinks this oughtn't matter, since there's approximately ~0% chance of the movement achieving that literal goal. The point is to build an anti-AGI movement, and to get people thinking about what it would be like for the government to be able to issue an order to pause AGI R&D, or turn off datacenters, or whatever. I think that's a good aim, and your protests probably (slightly) help that aim.

I'm still hung up on the literal 'Pause AI' concept being a problem though. Here's where I'm coming from:

1. I've been analyzing the risks of current day AI. I believe (but will not offer evidence for here) that current day AI is already capable of providing small-but-meaningful uplift to bad actors intending to use it for harm (e.g. weapon development). I think that having stronger AI in the hands of government agencies designed to protect humanity from these harms is one of our best chances at preventing such harms.

2. I see the 'Pause AI' movement as being targeted mostly at large companies, since I don't see any plausible way for a government or a protest movement to enforce what private individuals do with their home computers. Perhaps you think this is fine because you think that most of the future dangers posed by AI derive from actions taken by large companies or organizations with large amounts of compute. This is emphatically not my view. I think that actually more danger comes from the many independent researchers and hobbyists who are exploring the problem space. I believe there are huge algorithmic power gains which can, and eventually will, be found. I furthermore
yanni kyriacos
Hi Tomás! Is there a prediction market for this that you know of?

Produced as part of the MATS Winter 2024 program, under the mentorship of Alex Turner (TurnTrout).

TL;DR: I introduce a method for eliciting latent behaviors in language models by learning unsupervised perturbations of an early layer of an LLM. These perturbations are trained to maximize changes in downstream activations. The method discovers diverse and meaningful behaviors with just one prompt, including perturbations overriding safety training, eliciting backdoored behaviors and uncovering latent capabilities.

Summary: In the simplest case, the unsupervised perturbations I learn are given by unsupervised steering vectors - vectors added to the residual stream as a bias term in the MLP outputs of a given layer. I also report preliminary results on unsupervised steering adapters - these are LoRA adapters of the MLP output weights of a given...
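A rough sketch of how the simplest version of this setup might look in PyTorch, not the post's actual code: the GPT-2-style module path, the norm-based objective, and the fixed-radius projection are illustrative assumptions, and the post's real training details differ.

```python
import torch

def add_steering_hook(model, layer, vector):
    """Add `vector` to the MLP output of `layer` on every forward pass.
    Assumes a GPT-2-style HuggingFace layout (model.transformer.h[layer].mlp)."""
    def hook(module, inputs, output):
        return output + vector
    return model.transformer.h[layer].mlp.register_forward_hook(hook)

def train_steering_vector(model, tokens, layer, target_layer, steps=100, radius=8.0):
    """Learn one unsupervised steering vector: a perturbation at `layer`
    trained to maximize the change it causes in activations at `target_layer`."""
    d_model = model.config.hidden_size
    vector = (0.1 * torch.randn(d_model)).requires_grad_()
    opt = torch.optim.Adam([vector], lr=1e-2)

    with torch.no_grad():  # unperturbed downstream activations, for comparison
        baseline = model(tokens, output_hidden_states=True).hidden_states[target_layer]

    handle = add_steering_hook(model, layer, vector)
    for _ in range(steps):
        opt.zero_grad()
        steered = model(tokens, output_hidden_states=True).hidden_states[target_layer]
        loss = -(steered - baseline).norm()  # push downstream activations away from baseline
        loss.backward()
        opt.step()
        with torch.no_grad():  # keep the perturbation on a sphere of fixed radius
            vector.mul_(radius / vector.norm())
    handle.remove()
    return vector.detach()
```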

RGRGRG

Enjoyed this post! Quick question about obtaining the steering vectors:

Do you train them one at a time, possibly adding an additional orthogonality constraint between each training run?

Bogdan Ionut Cirstea
TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space seems to be using a contrastive approach for steering vectors (I've only skimmed though), it might be worth having a look.
