All Posts

Sorted by Magic (New & Upvoted)

Friday, June 16th 2023

No posts for June 16th 2023
Shortform
12 · nim · 10h
Reading https://www.lesswrong.com/posts/nwJCzszw8gGjPTihM/i-still-think-it-s-very-unlikely-we-re-observing-alien and pondering the Bigfoot thing.

On the one hand, We Have Cameras Everywhere(TM). On the other hand -- pick any area of the Pacific Northwest and look at a map of where the permanent roads are. Pull it up side by side with a map of an area that you're familiar with. Zoom in on both, to a magnification you'd consider reasonable for imagining things at walking-around scale. Pan around on the PNW map and try to find a permanent road. It'll take a minute.

Most land out here grows timber, sure. Timber is harvested roughly once every 30-50 years. At this point, I'd bet that every square mile of the area has been visited by humans. Forestry land is heavily trafficked once every few decades; conservation land is surveyed and studied and sometimes visited by tourists. The question, like a missing term in the Drake Equation, is when. The L term captures for-how-long, sure, but only implies a difference between "someone sent us radio signals for 100 years around 1000 AD" and "someone sent us radio signals for 100 years around 2000 AD".

I have two cats who hate me. (Not their fault; they came from an animal hoarding situation, so they're probably kinda traumatized.) They seem to think I'm noisy and conspicuous and I stink, and to their perceptions I certainly do. They despise being perceived. I can tell that they're in my house because I can check every nook and cranny and learn their favorite hidey-holes, and the food I put out for them gets eaten, and their litter boxes get full. But if this were out in the woods instead of the artificial and tightly controlled environment of my home, I would likely not know they're around, just like most hikers don't know when they're being watched by a mountain lion. The cats hate the places where I spend time, just as th
9 · Lauro Langosco · 4h
Thinking about alignment-relevant thresholds in AGI capabilities. A kind of rambly list of relevant thresholds:

1. Ability to be deceptively aligned
2. Ability to think / reflect about its goals enough that the model realises it does not like what it is being RLHF'd for
3. Incentives to break containment exist in a way that is accessible / understandable to the model
4. Ability to break containment
5. Ability to robustly understand human intent
6. Situational awareness
7. Coherence / robustly pursuing its goal in a diverse set of circumstances
8. Interpretability methods break (or other oversight methods break)
   1. doesn't have to be because of deceptiveness; maybe thoughts are just too complicated at some point, or in a different place than you'd expect
9. Capable enough to help us exit the acute risk period

Many alignment proposals rely on reaching these thresholds in a specific order. For example, the earlier we reach (9) relative to other thresholds, the easier most alignment proposals are. Some of these thresholds are relevant to whether an AI or proto-AGI is alignable even in principle. Short of 'full alignment' (CEV-style), any alignment method (e.g. corrigibility) only works within a specific range of capabilities:

* Too much capability breaks alignment, e.g. because a model self-reflects and sees all the ways in which its objectives conflict with human goals.
* Too little capability (or too little 'coherence') and any alignment method will be non-robust wrt OOD inputs or even small improvements in capability or self-reflectiveness.
8 · johnswentworth · 1d
Consider two claims:

* Any system can be modeled as maximizing some utility function, therefore utility maximization is not a very useful model.
* Corrigibility is possible, but utility maximization is incompatible with corrigibility, therefore we need some non-utility-maximizer kind of agent to achieve corrigibility.

These two claims should probably not both be true! If any system can be modeled as maximizing a utility function, and it is possible to build a corrigible system, then naively the corrigible system can be modeled as maximizing a utility function. I expect that many people's intuitive mental models around utility maximization boil down to "boo utility maximizer models", and they would therefore intuitively expect both the above claims to be true at first glance. But on examination, the probable-incompatibility is fairly obvious, so the two claims might make a useful test to notice when one is relying on yay/boo reasoning about utilities in an incoherent way.
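A minimal sketch of the first claim, in hypothetical Python (the policy and action names are made-up examples): given any fixed policy, the trivial construction u(state, action) = 1 if the policy chooses that action, else 0, makes the policy a utility maximizer, which is exactly why "can be modeled as a utility maximizer" is such a weak statement on its own.

```python
def trivial_utility(policy):
    """Given a policy (state -> action), return a utility function
    that the policy maximizes by construction."""
    return lambda state, action: 1.0 if policy(state) == action else 0.0

# An arbitrary, corrigible-looking policy (illustrative names only)...
policy = lambda state: "shut down" if state == "button pressed" else "act"
u = trivial_utility(policy)

# ...is the argmax of its induced utility in every state.
for state in ["button pressed", "normal"]:
    best = max(["shut down", "act"], key=lambda a: u(state, a))
    assert best == policy(state)
print("policy maximizes its trivial utility in all tested states")
```

The construction is degenerate on purpose: it shows that unless "utility maximizer" is restricted to something more structured, it cannot exclude corrigible systems.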
4 · kuira · 1d
Sometimes I have an internal desire to do something different from what I think should be done (for example, I might desire to play a game while also thinking the better choice is to read). I've been experimenting with using randomness to mediate this. I keep a D20 with me, give each side of the dispute odds proportional to the strength of its resolve, and then roll the die. In theory, this means neither side will overpower the other, and even a small resolve still has a chance. I'm not sure how useful this is, but it's fun, and can sort of give me motivation (I've tried to internalize this kind of roll as a rule not to break without good reason). Also, when I'm merely deciding between some options, sometimes I'll roll more casually with equal odds, and it'll help me realize that I already knew which one I really wanted to do (if I don't like the roll's outcome).
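The procedure above can be sketched in a few lines of hypothetical Python; the option names and face allocations are made-up examples, and the only real constraint is that the allotted faces cover the whole D20.

```python
import random

def d20_mediate(options):
    """options: dict mapping option name -> number of the die's 20 faces
    allotted to it (proportional to the strength of its resolve).
    Returns (roll, chosen option)."""
    assert sum(options.values()) == 20, "faces must cover the whole D20"
    roll = random.randint(1, 20)  # the physical die roll, 1..20 inclusive
    upper = 0
    for name, faces in options.items():
        upper += faces  # each option owns a contiguous band of faces
        if roll <= upper:
            return roll, name

# e.g. reading gets 13 faces, gaming gets 7
roll, choice = d20_mediate({"play a game": 7, "read": 13})
print(f"rolled {roll}: {choice}")
```

Nothing here depends on the die having 20 sides; any granularity fine enough to express the odds would do.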
3 · Dalcy Bremin · 15h
What's a good technical introduction to Decision Theory and Game Theory for alignment researchers? I'm guessing standard undergrad textbooks don't include, say, content about logical decision theory. I've mostly been reading posts on LW but as with most stuff here they feel more like self-contained blog posts (rather than textbooks that build on top of a common context) so I was wondering if there was anything like a canonical resource providing a unified technical / math-y perspective on the whole subject.
Wiki/Tag Page Edits and Discussion

Thursday, June 15th 2023

No posts for June 15th 2023
Shortform
1 · Anton Zheltoukhov · 2d
A THOUSAND NARRATIVES. THEORY OF MEMETIC EVOLUTION. PART 1/20. INTRO

The ultimate goal of this line of research is to gain a better understanding of how the human value system operates. The problem I see with current approaches to studying values is that we cannot study {values/desires/preferences} in isolation from the rest of our cognitive mechanisms, because according to the latest theories, values are just a part of a broader system governing behaviour in general. So you have to have a decent model of human behaviour first to then be able to explain value dynamics. To get a good theory of the mind you have to meet multiple requirements:

1. A good theory of the mind must span at least four different timescales: (genetic evolution) the billion years over which our brains have evolved; (memetic evolution) the centuries of cultural accumulation of ideas through history; (personal) the individual's development during their lifetime; and (neuronal) the milliseconds during which cognitive inference happens.

2. A good theory must explain the behaviour of the system on each of Marr's three levels of analysis[1]: (1) the computational problem the system is solving; (2) the algorithm the system uses to solve that problem; and (3) how that algorithm is implemented in the "physical hardware" of the system. And, the part I think Marr is missing: the third level also has to include an explanation of how the learning environment affects the agent [https://www.lesswrong.com/posts/RCbofC8fCJ6NnYti7/intro-to-ontogenetic-curriculum].

3. A good theory must at least make an attempt at answering the main questions: how is the generality of intelligence achieved? what is the neural substrate of memory? etc.

To meet these requirements I've combined insights from several fields: developmental psychology, neuroscience, ethology, and computational models of mind. The result is the Narrative Theory. The research is still far from completion but there ar

Wednesday, June 14th 2023

No posts for June 14th 2023
Shortform
8 · Portia · 3d
ACCURATELY ASSESSING SEX-RELATED CHARACTERISTICS SAVES LIVES. CAN WE MAKE IT FAIR TO ALL HUMANS, WOMEN, MEN, TRANS AND INTER FOLKS? A NERDY IDEA.

Sex-related characteristics are medically relevant; accurately assessing them saves lives. But neither assigned sex nor gender identity alone properly captures them. Is anyone else interested in designing a characteristic string instead, so all humans, esp. all women and gender diverse folks, get proper medical care?

This idea started yesterday, when I had severe abdominal pain and started googling. Eventually, I reached sites that listed various potential conditions. Some occur in all people (e.g., stomach ulcers), albeit often not with the same presentation and frequency; others have very specific sex-based requirements (e.g. ovarian cysts, or testicular torsion). Some webpages introduced ovary-related things as "In women, it can also be..." Well, I thought, I highly doubt my trans girlfriend has an ovarian cyst. But we are used to getting medical advice that does not fit for her, aren't we? (In retrospect, why did I think that was okay, just because it was so common?)

Other sites, apparently wanting to prevent this, stated "we use female in this text to refer to people assigned female at birth". I was happy that they had thought about this and cared, but... frankly, that does not work either. I was assigned female at birth; that means I was born, and a doctor visually inspected me, and declared "female". And yet I most certainly do not have a fallopian tube pregnancy now, because I had my tubes surgically removed, which also sterilised me. I'm as likely as the dude next door to have a fallopian tube pregnancy now. An inter person assigned female at birth may also be dead certain they do not have an ectopic pregnancy, because their visual inspection at birth actually misjudged their genes and organs quite a bit.

I wondered what I would have liked the website writers to use instead. And the more I thought about it, I th

Tuesday, June 13th 2023

No posts for June 13th 2023
Shortform
5 · Yitz · 3d
Does anyone here know of (or would be willing to offer) funding for creating experimental visualization tools? I’ve been working on a program which I think has a lot of potential, but it’s the sort of thing where I expect it to be most powerful in the context of “accidental” discoveries made while playing with it (see e.g. early use of the microscope, etc.).
2 · Prometheus · 3d
The following is a conversation between myself in 2022 and a newer version of me from earlier this year, on the nature of intelligence and its "True Name":

2022 Me: This has become less obvious to me as I've tried to gain a better understanding of what general intelligence is. Until recently, I always made the assumption that intelligence and agency were the same thing. But General Intelligence, or G, might not be agentic. Agents that behave like RL agents may only be narrow forms of intelligence, without generalizability. G might be something closer to a simulator. From my very naive perception of neuroscience, it could be that we (our intelligence) are not agentic, but just simulate agents. In this situation, the prefrontal cortex not only runs simulations to predict its next sensory input, but might also run simulations to predict inputs from other parts of the brain. In this scenario, "desire" or "goals" might be simulations to better predict narrowly-intelligent agentic optimizers. Though the simulator might be myopic, I think this prediction model allows for non-myopic behavior, in a similar way to how GPT has non-myopic behavior despite only trying to predict the next token (it has an understanding of where a future word "should" be within the context of a sentence, paragraph, or story). I think this model of G allows for the appearance of intelligent goal-seeking behavior, long-term planning, and self-awareness. I have yet to find another model for G that allows for all three. The True Name of G might be Algorithm Optimized To Reduce Predictive Loss.

2023 Me: Interesting, me'22, but let me ask you something: you seem to think this majestic 'G' is something humans have but other species do not, and then name the True Name of 'G' to be Algorithm Optimized To Reduce Predictive Loss. Do you *really* think other animals don't do this? How long is a cat going to survive if it can't predict where it's going to land? Or where the mouse's path trajectory is heading? Did you th
2 · devansh · 3d
(I promised I'd publish this last night no matter what state it was in, and then didn't get very far before the deadline. I will go back and edit and improve it later.)

I feel like I keep hearing, over and over, a complaint from people who get most of their information about college admissions from WhatsApp groups or their parents' friends or a certain extraordinarily pervasive subreddit (you all know what I'm talking about). Something like "College admissions is ridiculous! Look at this person, who was top of his math class and took 10 AP classes and started lots of clubs, and he didn't get into a single Ivy, he's going to UCLA!" I think the closest analogy I can find for this is something like "look at this guy, he's 7 feet tall, and he didn't even make it to the NBA!" There's something important that they're both missing, some fundamental confusion of a tiny part of the overall metric with reality.
2 · riceissa · 4d
I used to have a model of breathing that went something like this: when breathing in, the lungs somehow get bigger, creating lower air pressure inside the lungs and causing air to flow in. Then when breathing out, the lungs get smaller, creating higher air pressure inside the lungs and causing air to flow out. How do the lungs get bigger and smaller? Eventually I learned that there's a muscle called the diaphragm that is attached to the bottom of the lungs (??) that pulls or pushes the lungs. If I keep my nose plugged but my mouth open, the air will travel through my mouth. If I keep my mouth closed but my nose open, the air will travel through my nostrils. So far, so good.

Then a few days ago, I noticed that if I keep both my nose and mouth open, I can choose to breathe in solely through one or the other. This... doesn't make sense, according to the model. The model would predict that the air just flows through both pathways, maybe preferentially through the mouth since that seems like the larger pathway. So something is clearly wrong with how I think about breathing. Is there some sort of further switch inside that blocks one of the pathways? Does the nose or the mouth contain variable-size cavities that can control air pressure to direct the flow? I still have no idea. I'm eventually going to look it up, but I might think about this for a little bit longer (or maybe someone here will tell me).

I thought this was a pretty interesting example of how the explanations you hear about seemingly-basic things are easy to accept but don't make sense on further reflection. But it's hard to notice the flaw too. In my case, after a recent ENT visit where I was told my nasal passages are inflamed, I've been putting more effort into consciously breathing through my nose. Then one day, as soon as I woke up, I did something like consciously breathe through my nose with my mouth closed, and then somehow I opened my mouth but then still tried to breathe through my n

Monday, June 12th 2023

No posts for June 12th 2023
Shortform
21 · Czynski · 5d
This got deleted from 'The Dictatorship Problem [https://www.lesswrong.com/posts/pFaLqTHqBtAYfzAgx/the-dictatorship-problem]', which is catastrophically anxietybrained, so here's the comment: This is based in anxiety, not logic or facts. It's an extraordinarily weak argument. There's no evidence presented here which suggests rich Western countries are backsliding. Even the examples in Germany don't have anything worse than the US GOP produced ca. 2010. (And Germany is, due to their heavy censorship, worse at resisting fascist ideology than anyone with free speech, because you can't actually have those arguments in public.) If you want to present this case, take all those statistics and do economic breakdowns, e.g. by deciles of per-capita GDP. I expect you'll find that, for example, the Freedom House numbers show a substantial drop in 'Free' in the 40%-70% range and essentially no drop in 80%-100%. Of the seven points given for the US, all are a mix of maximally-anxious interpretation and facts presented misleadingly. These are all arguments where the bottom line ("Be Afraid") has been written first; none of this is reasonable unbiased inference. The case that mild fascism could be pretty bad is basically valid, I guess, but without the actual reason to believe that's likely, it's irrelevant, so it's mostly just misleading to dwell on it. Going back to the US points, because this is where the underlying anxiety prior is most visible: Interpretation, not fact. We're still in early enough stages that the reality of Biden is being compared to an idealized version of Trump - the race isn't in full swing yet and won't be for a while. Check back in October when we see how the primary is shaping up and people are starting to pay attention. This has been true for a while. Also, in assessing the consequences, it's assuming that Trump will win, which is correlated but far from guaranteed. Premise is a fact, conclusion is interpretation, and not at all a reliable one.
9 · mako yass · 4d
There's something very creepy to me about the part of research consent forms where it says "my participation was entirely voluntary."

1. Do they really think an involuntary participant wouldn't sign that? If they understand that they would, what purpose could this possibly serve other than, as is commonly the purpose of contracts, absolving themselves of blame and moving it to the participant? Which would be downright monstrous. Probably they just aren't fucking consequentialists, but this is all they end up doing.

2. This is a minor thing, but it adds an additional creepy garnish: nothing is 100% voluntary, because everything is a function of the involuntary base reality that other people command force and resources, and we want to use them for things, so we have to go along with what other people want to some extent. I'm at peace with this, and I would prefer not to have to keep denying it, and it feels like I'm being asked to participate in the addling of moral philosophy.
3 · Johannes C. Mayer · 4d
I have a heuristic for evaluating potential writing topics: I especially look for topics that people are usually averse to writing about. Topics that score high on this heuristic might be good to write about, because they can yield content with high utility compared to what is available, simply because other content of this kind (and especially good content of this kind) is rare. Somebody told me that they read some of my writing and liked it. They said that they liked how honest it was. Perhaps writing about topics selected with this heuristic tends to invoke that feeling of honesty, maybe just by being about something that people normally don't like to be honest about, or talk about at all. That might at least be part of the reason.
2 · lc · 4d
"No need to invoke slippery slope fallacies here. Let's just consider the Czechoslovakian question in and of itself" - Adolf Hitler
1 · James Spencer · 4d
WILL INTERNATIONAL AI ALIGNMENT COOPERATION TRUMP THE RIGHTS OF WEAKER COUNTRIES? TLDR - REAL COOPERATION ON INTERNATIONAL AI REGULATION MAY ONLY BE POSSIBLE THROUGH A MUCH MORE PEACEFUL BUT UNSENTIMENTAL FOREIGN POLICY

In 1987 President Reagan said to the United Nations, "how quickly our differences worldwide would vanish if we were facing an alien threat from outside this world." Isn't an unaligned Artificial General Intelligence that alien threat? It's easy, and perhaps overly obvious and comforting, to say that humanity would unite; but now that we have this threat, what would that unity look like? Here's one not necessarily comforting thought: the weak nations will get trampled further by the strong ones.

If cooperation rather than competition among powers is vital, then wouldn't we need to prioritise keeping powerful and potentially powerful countries (at least in AI terms) on side, over other ideological concerns? To see what this looks like, consider some of those powerful countries:

* China - the obvious one. Would we need to annoy the national security hawks over Taiwan, but also decent, humane liberals over Tibet and Sichuan?
* Russia - Ukraine would annoy just about everybody.
* Israel - this happens already because of domestic considerations, but it might reverse domestic political calculations.
* UK - the British are a big player in AI (and seemingly more important than the EU), so would needling them about Northern Ireland really be worth ticking off the one reliable ally the US has with clout?

This is before looking at the role of countries that may be important in relation to AI, and whom the US wouldn't want going rogue on regulation, but who neighbour China - such as Japan, South Korea, and the chip superpower Taiwan.

Sunday, June 11th 2023

No posts for June 11th 2023
Shortform
25 · DirectedEvolution · 5d
Epistemic activism I think LW needs better language to talk about efforts to "change minds." Ideas like asymmetric weapons and the Dark Arts are useful but insufficient. In particular, I think there is a common scenario where: * You have an underlying commitment to open-minded updating and possess evidence or analysis that would update community beliefs in a particular direction. * You also perceive a coordination problem that inhibits this updating process for a reason that the mission or values of the group do not endorse. * Perhaps the outcome of the update would be a decline in power and status for high-status people. Perhaps updates in general can feel personally or professionally threatening to some people in the debate. Perhaps there's enough uncertainty in what the overall community believes that an information cascade has taken place. Perhaps the epistemic heuristics used by the community aren't compatible with the form of your evidence or analysis. * Solving this coordination problem to permit open-minded updating is difficult due to lack of understanding or resources, or by sabotage attempts. When solving the coordination problem would predictably lead to updating, then you are engaged in what I believe is an epistemically healthy effort to change minds. Let's call it epistemic activism for now. Here are some community touchstones I regard as forms of epistemic activism: * The founding of LessWrong and Effective Altruism * The one-sentence declaration on AI risks * The popularizing of terms like Dark Arts, asymmetric weapons, questionable research practices, and "importance hacking." * Founding AI safety research organizations and PhD programs to create a population of credible and credentialed AI safety experts; calls for AI safety researchers to publish in traditional academic journals so that their research can't be dismissed for not being subject to institutionalized peer review
2 · Dalcy Bremin · 6d
Why haven't mosquitoes evolved to be less itchy? Is there just not enough selection pressure posed by humans yet? (yes, probably) Or are they evolving in that direction? (they of course already evolved towards being less itchy while biting, but not enough to make that lack-of-itch permanent)

this is a request for help i've been trying and failing to catch this one for god knows how long plz halp

tbh would be somewhat content coexisting with them (at the level of houseflies) as long as they evolved the itch and high-pitch noise away, modulo disease risk considerations.
1 · O O · 6d
A realistic takeover angle would be hacking into robots once we have them. We probably don’t want any way for robots to get over the air updates but it’s unlikely for this to be banned.

Saturday, June 10th 2023

No posts for June 10th 2023
Shortform
4 · Dalcy Bremin · 7d
Having lived ~19 years, I can distinctly remember around 5~6 times when I explicitly noticed myself experiencing totally new qualia with my inner monologue going “oh wow! I didn't know this dimension of qualia was a thing.” examples: * hard-to-explain sense that my mind is expanding horizontally with fractal cube-like structures (think bismuth) forming around it and my subjective experience gliding along its surface which lasted for ~5 minutes after taking zolpidem for the first time to sleep (2 days ago) * getting drunk for the first time (half a year ago) * feeling absolutely euphoric after having a cool math insight (a year ago) * ... Reminds me of myself around a decade ago, completely incapable of understanding why my uncle smoked, being "huh? The smoke isn't even sweet, why would you want to do that?" Now that I have [addiction-to-X] as a clear dimension of qualia/experience solidified in myself, I can better model their subjective experiences although I've never smoked myself. Reminds me of the SSC classic [https://slatestarcodex.com/2014/03/17/what-universal-human-experiences-are-you-missing-without-realizing-it/]. Also one observation is that it feels like the rate at which I acquire these is getting faster, probably because of increase in self-awareness + increased option space as I reach adulthood (like being able to drink). Anyways, I think it’s really cool, and can’t wait for more.
4 · DirectedEvolution · 7d
Lightly edited for stylishness
3 · Dagon · 6d
I give some probability space to being a Boltzmann-like simulation.  It's possible that I exist only for an instant, experience one quantum of input/output, and then am destroyed (presumably after the extra-universal simulators have measured something about the simulation). This is the most minimal form of Solipsism that I have been configured to conceive.  It's also a fun variation of MWI (though not actually connected logically) if it's the case that the simulators are running multiple parallel copies of any given instant, with slightly different configurations and inputs.
3 · DirectedEvolution · 7d
I use ChatGPT as a starting point to investigate hypotheses to test at my biomedical engineering job on a daily basis. I am able to independently approach the level of understanding of specific problems of a chemist with many years of experience on certain problems, although his familiarity with our chemical systems and his education make him faster to arrive at the same result. This is a lived example of the phenomenon in which AI improves the performance of lower-tier performers more than higher-tier performers (I am a recent MS grad, he is a post-postdoc).

So far, I haven't been able to get ChatGPT to independently troubleshoot effectively or propose improvements. This seems to be partly because it struggles profoundly to grasp and hang onto the specific details I have provided to it. It's as if our specific issue is mixed with the more general problems it has encountered in its training. Or as if, whereas in the real world, strong evidence is common [https://www.lesswrong.com/posts/JD7fwtRQ27yc8NoqS/strong-evidence-is-common], to ChatGPT, what I tell it is only weak evidence. And if you can't update strongly on evidence in my research world, you just can't make progress.

The way I use it instead is to validate and build confidence in my conjectures, and as an incredibly sophisticated form of search. I can ask it how very specific systems we use in our research, not covered in any one resource, likely work. And I can ask it to explain how complex chemical interactions are likely behaving in specific buffer and heat conditions. Then I can ask it how adjusting these parameters might affect the behavior of the system. An iterated process like this combines ChatGPT's unlimited generalist knowledge with my extremely specific understanding of our specific system to achieve a concrete, testable hypothesis that I can bring to work after a couple of hours. It feels like a natural, stimulating process. But you do have to be smart enough to steer th
2 · JNS · 6d
I got my entire foundation torn down, and with it came everything else. It all came crashing down in one giant heap of rubble. I'll just rebuild, I thought - not realizing you can't build without a foundation plan. So all I've ended up doing is sifting through the rubble, searching for things that feel right. Now I am back, in a very literal sense, to where it all began. So much was built, so many things destroyed and corrupted, and a major piece ended and got buried. And all I've got is "what the eff am I doing here?" The obvious answer is "yelling at the sky demanding answers" and being utterly ignored. I guess as per usual it is all up to me, except I don't know how to rebuild myself... again. F.....

Friday, June 9th 2023

No posts for June 9th 2023
Shortform
3 · Dalcy Bremin · 7d
i absolutely hate bureaucracy, dumb forms, stupid websites etc. like, I almost had a literal breakdown trying to install Minecraft recently (and eventually failed). God.
3 · Quinn · 8d
"EV is measure times value" is a sufficiently load-bearing part of my worldview that if measure and value were correlated or at least one was a function of the other I would be very distressed. Like in a sense, is John [https://www.lesswrong.com/posts/voLHQgNncnjjgAPH7/utility-maximization-description-length-minimization] threatening to second-guess hundreds of years of consensus on is-ought?
3 · Stephen Fowler · 8d
Are humans aligned? Bear with me! Of course, I do not expect there is a single person browsing Shortforms who doesn't already have a well-thought-out answer to that question. The straightforward (boring) interpretation of this question is "Are humans acting in a way that is moral, or otherwise behaving like they obey a useful utility function?" I don't think this question is particularly relevant to alignment. (But I do enjoy whipping out my best Rust Cohle impression [https://www.youtube.com/watch?v=Z5vwDfg3JNQ])

Sure, humans do bad stuff, but almost every human manages to stumble along in a (mostly) coherent fashion. In this loose sense we are "aligned" to some higher-level target; it just involves eating trash and reading your phone in bed. But I don't think this is a useful kind of alignment to build off of, and I don't think this is something we would want to replicate in an AGI. Human "alignment" is only being observed in an incredibly narrow domain. We notably don't have the ability to self-modify, and of course we are susceptible to wire-heading. Nothing about current humans should indicate to you that we would handle this extremely out-of-distribution shift well.
3 · kuira · 8d
it's interesting that an intelligence in the 'original'/'top-level' universe also might [if simulation theory is valid] have evidence to assume it's close-to-certainly simulated maybe it would do acausal trade and precommit to not shutting down simulated intelligences
1 · Omega. · 7d
Quick updates:

* Our next critique (on Conjecture) will be published in 10 days.
* The critique after that will be on Anthropic. If you'd like to be a reviewer, or have critiques you'd like to share, please message us or email anonymouseaomega@gmail.com.
* If you'd like to help edit our posts (incl. copy-editing - basic grammar etc. - but also tone & structure suggestions and fact-checking/steel-manning), please email us!
  * We'd like to improve the pace of our publishing and think this is an area where external perspectives could help us
  * Make sure our content & tone is neutral & fair
  * Save us time so we can focus more on research and data gathering

Thursday, June 8th 2023

No posts for June 8th 2023
Shortform
11 · Czynski · 8d
The 'new user' flag being applied to old users with low karma is condescending as fuck. I'm not a new user. I'm an old user who has spent most of my recent time on LW telling people things they don't want to hear. Well, most of the time I've actually spent posting weekly meetups, but other than that.
5 · Garrett Baker · 8d
Last night I had a horrible dream: that I had posted to LessWrong a post filled with useless & meaningless jargon without noticing what I was doing. Then I went to sleep, and when I woke up I found I had <−60 karma on the post. When I read the post myself I noticed how meaningless the jargon was, and I myself couldn't resist giving it a strong-downvote.
5 · DirectedEvolution · 9d
Over the last six months, I've grown more comfortable writing posts that I know will be downvoted. It's still frustrating. But I used to feel intensely anxious when it happened, and now, it's mostly just a mild annoyance. The more you're able to publish your independent observations, without worrying about whether others will disagree, the better it is for community epistemics.
3 · jacquesthibs · 8d
AI labs should be devoting a lot more effort to using AI for cybersecurity as a way to prevent weights or insights from being stolen. It would be good for safety, and it seems like it could be a pretty big cash cow too. If they have access to the best models (or specialized ones), it may be highly beneficial to plug them in immediately to help with cybersecurity (perhaps even including noticing suspicious activity from employees). I don't know much about cybersecurity, so I'd be curious to hear from someone who does.
3 · Quinn · 8d
messy, jotting down notes:

* I saw this thread https://twitter.com/alexschbrt/status/1666114027305725953 which my housemate had been warning me about for years.
  * The failure mode can be understood as trying to aristotle the problem: a lack of experimentation.
* Thinking about the nanotech ASI threat model, where it solves nanotech overnight and deploys adversarial proteins in the bloodstreams of all the lifeforms.
  * These are sometimes justified by Drexler's inside view of boundary conditions and physical limits.
  * But to dodge the aristotle problem, there would have to be some amount of bandwidth passing between sensors and actuators (which may roughly correspond to the number of do applications in Pearl).
* Can you use something like communication complexity https://en.wikipedia.org/wiki/Communication_complexity (between a system and an environment) to think about a "lower bound on the number of sensor-actuator actions", mixed with sample complexity (statistical learning theory)?
  * Like, ok, if you're simulating all of physics you can aristotle nanotech, for a sufficient definition of "all", but you would run up against realizability problems and it would cost way more than you actually need to spend.

Like, I'm thinking that if there's a kind of complexity theory of Pearl (number of do applications needed to achieve some kind of "loss"), then you could direct that at something like "nanotech projects" to Fermi-estimate the way AIs might trade off between applying aristotlean effort (observation and induction with no experiment) and spending sensor-actuator interactions (with the world). There's a scenario in the sequences, if I recall correctly, about which physics an AI infers from 3 frames of a video of an apple falling, and something about how security mindset suggests you shouldn't expect your information-theoret
