leogao

Sequences
Alignment Stream of Thought

Comments

The Mom Test for AI Extinction Scenarios
leogao · 1d

i think analogies to relatively well-known, intuitive everyday things, or to historical events, are a good way to automatically establish some baseline level of plausibility, and also to reduce the chances of accidentally telling completely implausible stories. the core reason is that without tethering to objective things that actually happened in reality, it's really easy to tell crazy stories supporting a wide range of possible conclusions.

for hacking, we can look at stuxnet as an example of how creative and powerful a cyberattack can be, or the 2024 crowdstrike failures as an example of how lots of computers can fail at the same time. for manipulation/deception, we can look at increasing political polarization in america due to social media, or politicians winning on charisma and then betraying their voters once in office, or (for atheists) major world religions, or (for anyone mindkilled by politics) adherents of their dispreferred political party. most people might not have experienced humiliating defeat in chess, or experienced being an anthill on an active construction site, but perhaps they have personally experienced being politically outmaneuvered by a competitor at work, or being crushed beneath the heel of a soulless bureaucracy which, despite being composed of ensouled humans, would rather ruin people's lives than be inconvenienced by dealing with exceptions.

Irresponsible Companies Can Be Made of Responsible Employees
leogao · 4d

this is pretty normal? it's really hard for leadership to make employees care about or believe specific things. do you really think the average Amazon employee or whatever has strong opinions on the future of delivery drones? does the average Waymo employee have extremely strong beliefs about the future of self-driving?

for most people in the world, their job is just a job. people obviously avoid working on things they believe are completely doomed, and tend to work on cool trendy things. but generally most people do not really have strong beliefs about where the stuff they're working on is going.

no specific taboo is required to ensure that people don't really iron out deep philosophical disagreements with their coworkers. people care about all sorts of other things in life. they care about money, they care whether they're enjoying the work, they care whether their coworkers are pleasant to be around, they care about their wife and kids and house.

once you have a company with more than 10 people, it requires constant effort to maintain culture. hiring is way harder if you can only hire people who are aligned, or if you insist on aligning people. if you grow very fast (and openai has grown very fast - it's approximately doubled every single year I've been here), it's inevitable that your culture will splinter. forget about having everyone on the same page; you're going to have entire little googletowns and amazontowns and so on of people who bring Google or Amazon culture with them and agglomerate with other recent transplants from those companies.

Buck's Shortform
leogao · 5d

a lot of people say "I think" reflexively because they're used to making themselves small. it wouldn't be surprising to me if such people said "I think" more often than most even in situations where the caveat is unnecessary.

Irresponsible Companies Can Be Made of Responsible Employees
leogao · 6d

as far as I'm aware, the only person who can be argued to have ever been fired for acting on beliefs about xrisk is leopold, and the circumstances there are pretty complicated. since I don't think he's the only person to have ever acted on xrisk at oai to the extent he did, I don't think this is just because other people don't do anything about xrisk.

most cases of xrisk people leaving are just because people felt sidelined/unhappy and chose to leave. which is ofc also bad, but quite different.

Irresponsible Companies Can Be Made of Responsible Employees
leogao · 7d

my guess:

  • selective hiring is very real. lots of people who are xrisk pilled just refuse to join oai. people who care a lot often end up very stressed and leave in large part because of the stress.
    • the vast majority of people at oai do not think of xrisk from agi as a serious thing. but then again, probably a majority don't really truly think of agi as a serious thing either.
  • people absolutely do argue "well if i didn't do it, someone else would. and even if oai stopped, some other company would do it" to justify their work.
  • compartmentalization is probably not a big part of the reason, at least not yet. historically things don't get compartmentalized often, and even when they do, i don't think it makes the difference between being worried and not being worried about xrisk for that many people
    • as companies get big, teams A B C not talking to each other is the default order of the world and it takes increasing effort to get them to talk to each other. and even getting them talking is not enough to change their courses of action, which often requires a lot of work from higher up. this hampers everything; this is in general why big companies have so many overlapping/redundant teams
  • people get promoted / allocated more resources if they do things that are obviously useful for the company, as opposed to things that are less obviously useful (i mean, as a company, you kind of understandably have to do this or else die of resource misallocation).
    • i think quite a few people, especially more senior people, are no longer driven by financial gain. their motivations are sometimes "i really want to accomplish something great in the field of ML" or "i like writing code" or "i like being part of something important / shaping the future". my guess is anyone super competent who cares primarily about money quits after a few years and, depending on the concavity of their utility function, either retires on a beach, or founds a startup and raises a gazillion dollars from VCs.
    • it's pretty difficult to do weird abstract bullshit that doesn't obviously tie into some kind of real world use case (or fit into the internally-accepted research roadmap to AGI). this has imo hampered both alignment and capabilities. it makes a lot of sense though, like, bell labs didn't capture most of the value that bell labs created, and academia is the place where weird abstract bullshit is supposed to live, and we're in some sense quite lucky that industry is willing to fund any of it at all
    • concretely this means anything alignmenty gets a huge boost if you can argue that it will (a) improve capabilities or (b) prevent some kind of embarrassing safety failure in the model we're currently serving to gazillions of people. the kinds of things people choose to work on are strongly shaped by this as a result, which probably explains why so much work keeps taking alignment words and using them to mean aligning GPT-5 rather than AGI.
  • aside from the leopold situation, which had pretty complicated circumstances, people don't really get fired for caring about xrisk. the few incidents are hard to interpret because of strong confounding factors and could be argued either way. but it's not so far from base rates so i don't feel like it's a huge thing.
  • my guess is a lot of antipathy towards safety comes from broader antipathy against safetyism as a whole in society, which honestly i (and many people in alignment) have to admit some sympathy towards.

"Intelligence" -> "Relentless, Creative Resourcefulness"
leogao · 8d

that's the easy part of relentlessness. LMs already often get stuck in loops of trying increasingly hopeless things without making any progress.

leogao's Shortform
leogao · 15d

sci-fi story setting idea: a future where VR becomes so widespread that where you live physically in the US becomes more of a formality than a matter of actual consequence, so mass internal migrations a la Free State Project occur as people rush to move to low-population states to gain more political influence in a federal political system that is increasingly impossible to reform.

leogao's Shortform
leogao · 19d

i recently ran into a vegan advocate tabling in a public space, and spoke briefly to them for the explicit purpose of better understanding what it feels like to be the target of advocacy on something i feel moderately sympathetic towards but not fully bought in on. (i find this kind of thing very valuable for noticing flaws in myself and improving; it's much harder to be perceptive of one's own actions otherwise). the part where i am genuinely quite plausibly persuadable of his position in theory is important; i think if i had talked to e.g. flat earthers, one might say my reaction is just because i'd already decided not to be persuaded. several interesting things i noticed (none of which should be surprising or novel, especially for someone less autistic than me, but as they say, intellectually knowing things is not the same as actual experience):

  • this guy certainly knew more about e.g. health impacts of veganism than i did, and i would not have been able to hold my own in an actual debate.
    • in particular, it's really easy for actually-good-in-practice heuristics to come out as logical fallacies, especially when arguing with someone much more familiar with the object level details than you are.
    • interestingly, since i was pushing the conversation in a pretty meta direction, he actually explicitly said something to the effect that he's had thousands of conversations like this and has a response to basically every argument i could make, do i really think i have something he hasn't heard before, etc. in that moment i realized this was probably true, and that this nonetheless did not necessarily mean that he was correct in his claim. and in addition it certainly didn't make me feel any more emotionally willing to accept his argument
    • in the past, i've personally had the exact experience of arguing for something where i had enough of a dialogue tree that other people couldn't easily find any holes, yet they remained unconvinced. i felt really confused about why people weren't seeing the very straightforward argument, and then later it turned out i was actually just wrong and the other people were applying correct heuristics.
      • my guess is at the extreme, with sufficient prep and motivation, you can get in this position for arbitrarily wrong beliefs. like probably if i talked to flat earthers for a while i'd get deep enough in their dialogue tree that i'd stop being able to refute them on the object level and would (for the purposes of my own epistemics, not to convince an external audience) have to appeal to cognitive heuristics that are isomorphic to some cognitive fallacies.
    • of course we shouldn't always appeal to the cognitive heuristics. doing so is almost always reasonable, and yet it will make you miss out on the one thing that actually does matter. to do anything interesting you do have to eventually dig into some particular spicy claims and truly resolve things at the object level. but there are so many things in the world and resolving them takes so much time that you need some heuristics to reject a whole bunch of things out of hand and focus your energy on the things that matter.
      • like, i could invest energy until i can actually refute flat earthers completely on the object level, and i'd almost certainly succeed. but this would be a huge waste of time. on the other hand, i could also just never look into anything and say "nothing ever happens". but every important thing to ever happen did, in fact, happen at some point [citation needed].
  • it's really really irritating to be cut off mid sentence. this is hard to admit because i also have an unconscious tendency to do this (currently working on fixing this) and my guess is other people get very annoyed when i do this to them.
    • sometimes i do enjoy being cut off in conversations, but on reflection this is only when i feel like (a) the conversation is cooperative enough that i feel like we're trying to discover the truth together, and (b) the other person actually understands what i'm saying before i finish saying it. but since these conditions are much rarer and require high levels of social awareness to detect, it's a good first-order heuristic that interrupting people is bad.
  • i found it completely unhelpful to be told that he was also in my shoes X years ago with similar uncertainties when he was deciding to become vegan; or to be told that he had successfully convinced Y other people to become vegan; or to be subject to what i want to call "therapy speak". i only want to therapyspeak with people i feel relatively close to, and otherwise it comes off as very patronizing.
    • i think there's a closely related thing, which is genuine curiosity about people's views. it uses similar phrases like "what makes you believe that?" but has a very different tone and vibe.
    • his achievements mean a lot more to him than to me. i don't really care that much what he's accomplished for the purposes of deciding whether his argument is correct. any credibility points conferred are more than cancelled out by it being kind of annoying. even if it is true, there's nothing more annoying than hearing someone say "i've thought about this more than you / accomplished more than you have because of my phd/experience/etc so you should listen to me" unless you really really really trust this person.
      • the calculus changes when there is an audience.
    • therapyspeak is still probably better than nothing, and can be a useful stepping stone for the socially incompetent

one possible take is that i'm just really weird, and these modes of interaction work better for normal people because they think less independently or need to be argued out of having poorly thought out bad takes or something like that, idk. i can't rule this out, but my guess is normal people are probably even more like this than i am. also, for the purposes of analogy to the AI safety movement, presumably we want to select for people who are independent thinkers with especially well thought out takes, rather than for just normal people.

also my guess is this particular interaction was probably extremely out of distribution from the perspective of those tabling. my guess is activists generally have a pretty polished pitch for most common situations which includes a bunch of concrete ways of talking they've empirically found to cause people to engage, learned through years of RL against a general audience, but the polishedness of this pitch doesn't generalize out of distribution when poked at in weird ways. my interlocutor even noted at some point that his conversations when tabling generally don't go the way ours went.

Elizabeth's Shortform
leogao · 24d

yeah, but there would also be a lot of worlds where the merger would have been totally fine and beneficial, yet fell through because people had unfounded fears

Elizabeth's Shortform
leogao · 24d

i mean, in general, it's a lot easier to tell plausible-seeming stories of things going really poorly than actually high-likelihood stories of things going poorly. so the anecdata of it actually happening is worth a lot

Posts

151 · My takes on SB-1047 · 1y · 8 comments
106 · Scaling and evaluating sparse autoencoders · 1y · 6 comments
55 · Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision · 2y · 5 comments
106 · Shapley Value Attribution in Chain of Thought · 3y · 7 comments
42 · [ASoT] Some thoughts on human abstractions · 3y · 4 comments
67 · Clarifying wireheading terminology · 3y · 6 comments
103 · Scaling Laws for Reward Model Overoptimization · 3y · 13 comments
27 · How many GPUs does NVIDIA make? · 3y · 2 comments
81 · Towards deconfusing wireheading and reward maximization · 3y · 7 comments
27 · Humans Reflecting on HRH · 3y · 4 comments