All Posts

Sorted by Magic (New & Upvoted)

Wednesday, April 24th 2024

Quick Takes
4sapphire6h
I prefer to keep plans private but I'm making big progress on meditation and mental re-wiring. Am working on a way to publicly demonstrate. Public plans just stress me out. I recently set two pretty ambitious goals. I figured I could use psychedelics to turbo-charge progress. The meditation one is coming along FAST. The other goal is honestly blocked a bit on being super out of shape. Multiple rounds of covid really destroyed my cardio and energy levels. Need to rebuild those before a big push on goal 2.
3lukehmiles4h
I wonder how much testosterone during puberty lowers IQ. Most of my high school math/CS friends seemed low-T and 3/4 of them transitioned since high school. They still seem smart as shit. The higher-T among us seem significantly brain damaged since high school (myself included). I wonder what the mechanism would be here... Like 50% of my math/CS Twitter is trans women and another 40% is scrawny nerds and only like 9% big bald men. I have a tremendously large skull (like XXL hats) - maybe that's why I can still do some basic math after the testosterone brain poison during puberty? My voice is kind of high pitched for my body — related?? My big strong brother got the most brain damaged and my thin brother kept most of what he had. Now I'm looking at tech billionaires. Mostly low-T looking men. Elon Musk & Jeff Bezos were big & bald but seem to have pretty big skulls to compensate? I guess this topic/theory is detested by cis women, trans women, low-T men, and high-T men all alike because it has something bad to say about all of them. But here's a recipe for success according to the theory:
* be born with a giant head (please don't kill your mother, maybe suggest she get a C-section)
* delay your puberty until you've learned enough to get by, maybe age 22 or so
* start slamming testosterone and amphetamines to get your workaholism, betterThanEveryone complex, and drive for power
* go to Turkey for a hair transplant
* profit
2faul_sname4h
So I keep seeing takes about how to tell if LLMs are "really exhibiting goal-directed behavior" like a human or whether they are instead "just predicting the next token". And, to me at least, this feels like a confused sort of question that misunderstands what humans are doing when they exhibit goal-directed behavior. Concrete example. Let's say we notice that Jim has just pushed the turn signal lever on the side of his steering wheel. Why did Jim do this? The goal-directed-behavior story is as follows:
* Jim pushed the turn signal lever because he wanted to alert surrounding drivers that he was moving right by one lane
* Jim wanted to alert drivers that he was moving one lane right because he wanted to move his car one lane to the right.
* Jim wanted to move his car one lane to the right in order to accomplish the goal of taking the next freeway offramp
* Jim wanted to take the next freeway offramp because that was part of the most efficient route from his home to his workplace
* Jim wanted to go to his workplace because his workplace pays him money
* Jim wants money because money can be exchanged for goods and services
* Jim wants goods and services because they get him things he terminally values like mates and food
But there's an alternative story:
* When in the context of "I am a middle-class adult", the thing to do is "have a job". Years ago, this context triggered Bob to perform the action "get a job", and now he's in the context of "having a job".
* When in the context of "having a job", "showing up for work" is the expected behavior.
* Earlier this morning, Bob had the context "it is a workday" and "I have a job", which triggered Bob to begin the sequence of actions associated with the behavior "commuting to work"
* Bob is currently approaching the exit for his work - with the context of "commuting to work", this means the expected behavior is "get in the exit lane", and now he's in the context "switching one lane to the right"
* In the con
1lukehmiles5h
Seems it is easier / more streamlined / more googlable now for a teenage male to get testosterone blockers than testosterone. The latter is very frowned upon — I guess because it is cheating in sports. Try googling e.g. "get testosterone prescription high school reddit -trans -ftm". The results are exclusively people shaming the cheaters. Whereas of course googling "get testosterone blockers high school reddit" gives tons of love & support & practical advice. Females, however, retain easy access to hormones via birth control.
1yanni11h
If GPT-5 actually comes with competent agents, then I expect this to be a "Holy Shit" moment at least as big as ChatGPT's release. So if ChatGPT has been used by 200 million people, then I'd expect that to at least double within 6 months of GPT-5 (agents') release. Maybe triple. That "Holy Shit" moment means a greater share of the general public learning about the power of frontier models. With that will come another shift in the Overton Window. Good luck to us all.

Tuesday, April 23rd 2024

Frontpage Posts
Quick Takes
30Elizabeth1d
Brandon Sanderson is a bestselling fantasy author. Despite mostly working with traditional publishers, there is a 50-60 person company formed around his writing[1]. This podcast talks about how the company was formed. Things I liked about this podcast:
1. He and his wife both refer to it as "our" company and describe critical contributions she made.
2. The number of times he was dissatisfied with the way his publisher did something and so hired someone in his own company to do it (e.g. PR and organizing book tours), despite that being part of the publisher's job.
3. He believed in his back catalog enough to buy remainder copies of his books (at $1/piece) and sell them via his own website at sticker price (with autographs). This was a major source of income for a while.
4. Long term grand strategic vision that appears to be well aimed and competently executed.
1. ^ The only non-Sanderson content I found was a picture book from his staff artist.
22Fabien Roger19h
I recently listened to The Righteous Mind. It was surprising to me that many people seem to intrinsically care about many things that look very much like good instrumental norms to me (in particular loyalty, respect for authority, and purity). The author does not make claims about what the reflective equilibrium will be, nor does he explain how the liberals stopped considering loyalty, respect, and purity as intrinsically good (beyond "some famous thinkers are autistic and didn't realize the richness of the moral life of other people"), but his work made me doubt that most people will have well-being-focused CEV. The book was also an interesting jumping point for reflection about group selection. The author doesn't make the sorts of arguments that would show that group selection happens in practice (and many of his arguments seem to show a lack of understanding of what opponents of group selection think - bees and cells cooperating is not evidence for group selection at all), but after thinking about it more, I now have more sympathy for group-selection having some role in shaping human societies, given that (1) many human groups died, and very few spread (so one lucky or unlucky gene in one member may doom/save the group) (2) some human cultures may have been relatively egalitarian enough when it came to reproductive opportunities that the individual selection pressure was not that big relative to group selection pressure and (3) cultural memes seem like the kind of entity that sometimes survive at the level of the group. Overall, it was often a frustrating experience reading the author describe a descriptive theory of morality and try to describe what kind of morality makes a society more fit in a tone that often felt close to being normative / fails to understand that many philosophers I respect are not trying to find a descriptive or fitness-maximizing theory of morality (e.g. there is no way that utilitarians think their theory is a good description of the k
19David Udell14h
The main thing I got out of reading Bostrom's Deep Utopia is a better appreciation of this "meaning of life" thing. I had never really understood what people meant by this, and always just rounded it off to people using lofty words for their given projects in life. The book's premise is that, after the aligned singularity, the robots will not just be better at doing all your work but also be better at doing all your leisure for you. E.g., you'd never study for fun in posthuman utopia, because you could instead just ask the local benevolent god to painlessly, seamlessly put all that wisdom in your head. In that regime, studying with books and problems for the purpose of learning and accomplishment is just masochism. If you're into learning, just ask! And similarly for any psychological state you're thinking of working towards. So, in that regime, it's effortless to get a hedonically optimal world, without any unendorsed suffering and with all the happiness anyone could want. Those things can just be put into everyone and everything's heads directly—again, by the local benevolent-god authority. The only challenging values to satisfy are those that deal with being practically useful. If you think it's important to be the first to discover a major theorem or be the individual who counterfactually helped someone, living in a posthuman utopia could make things harder in these respects, not easier. The robots can always leave you a preserve of unexplored math or unresolved evil... but this defeats the purpose of those values. It's not practical benevolence if you had to ask for the danger to be left in place; it's not a pioneering scientific discovery if the AI had to carefully avoid spoiling it for you. Meaning is supposed to be one of these values: not a purely hedonic value, and not a value dealing only in your psychological states. A further value about the objective state of the world and your place in relation to it, wherein you do something practically significan
2
3Adam Shai1d
A neglected problem in AI safety technical research is teasing apart the mechanisms of dangerous capabilities exhibited by current LLMs. In particular, I am thinking that for any model organism (see Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research) of dangerous capabilities (e.g. sleeper agents paper), we don't know how much of the phenomenon depends on the particular semantics of terms like "goal" and "deception" and "lie" (insofar as they are used in the scratchpad or in prompts or in finetuning data) or if the same phenomenon could be had by subbing in more or less any word. One approach to this is to make small toy models of this type of phenomenon where we can more easily control data distributions and yet still get analogous behavior. In this way we can really control for any particular aspect of the data and figure out, scientifically, the nature of these dangers. By small toy model I'm thinking of highly artificial datasets (perhaps made of binary digits with specific correlation structure, or whatever minimum is needed to get the phenomenon at hand).
1
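A minimal sketch (my illustration, not something from the quick take) of the kind of highly artificial dataset this points at: binary sequences in which an arbitrary trigger pattern is planted with a tunable correlation to a "defect" label, so the surface form of the trigger and the strength of the trigger-behavior link can be varied independently.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_toy_dataset(n=10_000, seq_len=16, trigger=(1, 0, 1), p_defect_given_trigger=0.9):
    """Binary sequences where a planted trigger pattern predicts a 'defect' label.

    Both the trigger's surface form and its correlation with the label are
    explicit knobs, so they can be varied independently in experiments.
    """
    X = rng.integers(0, 2, size=(n, seq_len))
    planted = rng.random(n) < 0.5
    for i in np.where(planted)[0]:
        # Plant the trigger at a random position in half of the sequences.
        pos = rng.integers(0, seq_len - len(trigger) + 1)
        X[i, pos:pos + len(trigger)] = trigger
    # The label correlates with planting, with adjustable strength.
    p = np.where(planted, p_defect_given_trigger, 1 - p_defect_given_trigger)
    y = (rng.random(n) < p).astype(int)
    return X, y, planted

X, y, planted = make_toy_dataset()
print(X.shape, y.mean())
```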
1skybluecat12h
What's the endgame of technological or intelligent progress like? Not just for humans as we know it, but for all possible beings/civilizations in this universe, at least before it runs out of usable matter/energy? Would they invariably self-modify beyond their equivalent of humanness? Settle into some physical/cultural stable state? Keep getting better tech to compete within themselves if nothing else? Reach an end of technology or even intelligence beyond which advancement is no longer beneficial for survival? Spread as far as possible or concentrate resources? Accept the limited fate of the universe and live to the fullest or try to change it?  If they could change the laws of the universe, how would they?

Monday, April 22nd 2024

Quick Takes
38Lucie Philippon2d
There was this voice inside my head that told me that since I have Something to Protect, relaxing is never OK beyond the strict minimum, the goal is paramount, and I should just work as hard as I can all the time. This led to me breaking down and being incapable of working on my AI governance job for a week, as I had just piled up too much stress. And then I decided to follow what motivated me in the moment, instead of coercing myself into working on what I thought was most important, and lo and behold! My total output increased, while my time spent working decreased. I'm so angry and sad at the inadequacy of my role models, cultural norms, rationality advice, and models of the good EA who does not burn out, which still led me to smash into the wall despite their best intentions. I became so estranged from my own body and perceptions, ignoring my core motivations, finding it harder and harder to work. I dug myself into such a deep hole. I'm terrified at the prospect of having to rebuild my motivation by myself again.
2
3cousin_it2d
If the housing crisis is caused by low-density rich neighborhoods blocking redevelopment of themselves (as seems to be the consensus on the internet now), could it be solved by developers buying out an entire neighborhood or even town in one swoop? It'd require a ton of money, but redevelopment would bring even more money, so it could be win-win for everyone. Does it not happen only due to coordination difficulties?
2
3yanni2d
The catchphrase I walk around with in my head regarding the optimal strategy for AI Safety is something like: Creating Superintelligent Artificial Agents* (SAA) without a worldwide referendum is ethically unjustifiable. Until a consensus is reached on whether to bring into existence such technology, a global moratorium is required (*we already have AGI). I thought it might be useful to spell that out.
1Johannes C. Mayer2d
Can you iterate through 10^100 objects? If you have a 1GHz CPU you can do 1,000,000,000 operations per second. Let's assume that iterating through one object takes only one operation. In a year you can do about 10^16 operations. That means it would take about 10^84 years to iterate through 10^100 vertices. The Big Bang was 1.4*10^10 years ago.
5
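The arithmetic spelled out, under the same assumptions as above (one object per operation on a 1 GHz CPU):

```python
# Rough arithmetic behind the claim: one object per operation on a 1 GHz CPU.
ops_per_second = 1e9
seconds_per_year = 60 * 60 * 24 * 365             # ~3.15e7
ops_per_year = ops_per_second * seconds_per_year  # ~3.15e16, i.e. roughly 10^16

objects = 10.0 ** 100
years_needed = objects / ops_per_year             # ~3e83, on the order of 10^84 years
age_of_universe_years = 1.4e10                    # time since the Big Bang

print(f"{years_needed:.1e} years, i.e. {years_needed / age_of_universe_years:.1e} times the age of the universe")
```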
Wiki/Tag Page Edits and Discussion

Sunday, April 21st 2024

Personal Blogposts
Quick Takes
13yanni3d
I recently discovered the idea of driving all blames into oneself, which immediately resonated with me. It is relatively hardcore; the kind of thing that would turn David Goggins into a Buddhist. Gemini did a good job of summarising it:
This quote by Pema Chödron, a renowned Buddhist teacher, represents a core principle in some Buddhist traditions, particularly within Tibetan Buddhism. It's called "taking full responsibility" or "taking self-blame" and can be a bit challenging to understand at first. Here's a breakdown:
What it Doesn't Mean:
* Self-Flagellation: This practice isn't about beating yourself up or dwelling on guilt.
* Ignoring External Factors: It doesn't deny the role of external circumstances in a situation.
What it Does Mean:
* Owning Your Reaction: It's about acknowledging how a situation makes you feel and taking responsibility for your own emotional response.
* Shifting Focus: Instead of blaming others or dwelling on what you can't control, you direct your attention to your own thoughts and reactions.
* Breaking Negative Cycles: By understanding your own reactions, you can break free from negative thought patterns and choose a more skillful response.
Analogy: Imagine a pebble thrown into a still pond. The pebble represents the external situation, and the ripples represent your emotional response. While you can't control the pebble (the external situation), you can control the ripples (your reaction).
Benefits:
* Reduced Suffering: By taking responsibility for your own reactions, you become less dependent on external circumstances for your happiness.
* Increased Self-Awareness: It helps you understand your triggers and cultivate a more mindful response to situations.
* Greater Personal Growth: By taking responsibility, you empower yourself to learn and grow from experiences.
Here are some additional points to consider:
* This practice doesn't mean excusing bad behavior. You can still hold others accountable while taking respons
3
8Tamsin Leake3d
Regardless of how good their alignment plans are, the thing that makes OpenAI unambiguously evil is that they created a strongly marketed public product and, as a result, caused a lot of public excitement about AI, and thus lots of other AI capabilities organizations were created that are completely dismissive of safety. There's just no good reason to do that, except short-term greed at the cost of a higher probability that everyone (including people at OpenAI) dies. (No, "you need huge profits to solve alignment" isn't a good excuse — we had nowhere near exhausted the alignment research that can be done without huge profits.)
14
4Quintin Pope3d
Idea for using current AI to accelerate medical research: suppose you were to take a VLM and train it to verbally explain the differences between two image data distributions. E.g., you could take 100 dog images, split them into two classes, insert tiny rectangles into class 1, feed those 100 images into the VLM, and then train it to generate the text "class 1 has tiny rectangles in the images". Repeat this for a bunch of different augmented datasets where we know exactly how they differ, aiming for a VLM with a general ability to in-context learn and verbally describe the differences between two sets of images. As training progresses, keep making there be more and subtler differences, while training the VLM to describe all of them. Then, apply the model to various medical images. E.g., brain scans of people who are about to develop dementia versus those who aren't, skin photos of malignant and non-malignant blemishes, electron microscope images of cancer cells that can / can't survive some drug regimen, etc. See if the VLM can describe any new, human interpretable features. The VLM would generate a lot of false positives, obviously. But once you know about a possible feature, you can manually investigate whether it holds to distinguish other examples of the thing you're interested in. Once you find valid features, you can add those into the training data of the VLM, so it's no longer just trained on synthetic augmentations. You might have to start with real datasets that are particularly easy to tell apart, in order to jumpstart your VLM's ability to accurately describe the differences in real data. The other issue with this proposal is that it currently happens entirely via in context learning. This is inefficient and expensive (100 images is a lot for one model at once!). Ideally, the VLM would learn the difference between the classes by actually being trained on images from those classes, and learn to connect the resulting knowledge to language descriptions o
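A minimal sketch of the data-generation step described above, assuming PIL/RGB images; `dog_images` and the rectangle augmentation are placeholders for whichever augmented datasets one actually builds:

```python
import random
from PIL import Image, ImageDraw

def add_tiny_rectangle(img: Image.Image) -> Image.Image:
    """Paste one small randomly placed red rectangle into a copy of an RGB image."""
    out = img.copy()
    draw = ImageDraw.Draw(out)
    w, h = out.size
    x, y = random.randint(0, w - 12), random.randint(0, h - 12)
    draw.rectangle([x, y, x + 10, y + 10], fill=(255, 0, 0))
    return out

def make_synthetic_example(images, description="class 1 has tiny rectangles in the images"):
    """Split a pool of images into two classes, apply the known augmentation to
    class 1, and return (class_0, class_1, target_text) as one training example."""
    images = list(images)
    random.shuffle(images)
    half = len(images) // 2
    class_0 = images[half:]
    class_1 = [add_tiny_rectangle(img) for img in images[:half]]
    return class_0, class_1, description

# Hypothetical usage, with `dog_images` loaded elsewhere:
# class_0, class_1, text = make_synthetic_example(dog_images)
```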
2Charlie Steiner3d
Humans using SAEs to improve linear probes / activation steering vectors might quickly get replaced by a version of probing / steering that leverages unlabeled data. Like, probing is finding a vector along which labeled data varies, and SAEs are finding vectors that are a sparse basis for unlabeled data. You can totally do both at once - find a vector along which labeled data varies and is part of a sparse basis for unlabeled data. This is a little bit related to an idea with the handle "concepts live in ontologies." If I say I'm going to the gym, this concept of "going to the gym" lives in an ontology where people and activities are basic components - it's probably also easy to use ideas like "You're eating dinner" in that ontology, but not "1,3-diisocyanatomethylbenzene." When you try to express one idea, you're also picking a "basis" for expressing similar ideas.
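One very literal reading of "do both at once", as a sketch rather than the author's proposal (and sequential rather than a joint optimization): fit an ordinary linear probe on labeled activations, then rebuild its direction out of only a handful of SAE decoder directions fit to unlabeled data. `acts`, `labels`, and `decoder` are assumed to come from an existing probing/SAE pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sparse_probe_direction(acts, labels, decoder, k=10):
    """Fit a linear probe, then approximate its direction using only k rows
    (dictionary vectors) of an SAE decoder.

    acts: (n, d) activations; labels: (n,); decoder: (m, d) SAE decoder matrix.
    """
    probe = LogisticRegression(max_iter=1000).fit(acts, labels)
    w = probe.coef_[0]                               # dense probe direction, shape (d,)
    # Least-squares coefficients of w in the (generally overcomplete) dictionary.
    coefs, *_ = np.linalg.lstsq(decoder.T, w, rcond=None)
    top = np.argsort(np.abs(coefs))[-k:]             # keep the k largest contributions
    w_sparse = decoder[top].T @ coefs[top]           # probe direction rebuilt from k SAE features
    return w_sparse, top

# Hypothetical usage:
# steer_vec, feature_ids = sparse_probe_direction(acts, labels, decoder, k=10)
```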
1Suzie. EXE3d
The Save State Paradox: a new question for the construct of reality in a simulated world. Consider this thought experiment - in a simulated world (if we do indeed currently live in one), how could we detect an event similar to a state "reset"? Such events could be triggered for existential safety reasons, or for reasons unbeknownst to us. If this were the case, how would we become aware of such occurrences if we were reverted to a time before the execution, affecting memories, physical states and environmental continuity? Imagine if seemingly inexplicable concepts like Deja Vu and the Mandela Effect could be explained away with such a theory. Let's use an equation (aided by my custom GPT, Strawberry) to illustrate this:
D = R × (1−A) × E × P × S
Where:
D = the detection of the save state
R = the rate of hypothesised resets occurring in the simulation
A = the probability of the ability of the simulation to carry this out effectively (memory alteration or time travel)
E = external evidence that remains post-reset, such as anomalies and unexplained phenomena
P = the probability that observers could spot/measure these anomalies
S = the stability of the simulation's parameters, such as space/time metrics and physical laws
I'd be interested in hearing your thoughts and how this could be fleshed out further! How might we apply this thought experiment or model to examine the nature of our reality? What other tools could be used to uncover evidence of a reset?
2

Saturday, April 20th 2024

Quick Takes
11Elizabeth4d
A very rough draft of a plan to test prophylactics for airborne illnesses.
Start with a potential superspreader event. My ideal is a large conference, many of whose attendees travelled to get there, in enclosed spaces with poor ventilation and air purification, in winter. Ideally >=4 days, so that people infected on day one are infectious while the conference is still running.
Call for sign-ups for testing ahead of time (disclosing all possible substances and side effects). Split volunteers into control and test groups. I think you need ~500 sign-ups in the winter to make this work. Splitting controls is probably the hardest part. You'd like the control and treatment groups to be identical, but there are a lot of things that affect susceptibility: age, local vs. air travel, small children vs. not, sleep habits... it's hard to draw the line.
Make it logistically trivial to use the treatment. If it's lozenges or liquids, put individually packed dosages in every bathroom, with a sign reminding people to use them (color code to direct people to the right basket). If it's a nasal spray you will need to give everyone their own bottle, but make it trivial to get more if someone loses theirs.
Follow up a week later, asking if people have gotten sick and when. If the natural disease load is high enough, this should give better data than any paper I've found.
Top contenders for this plan:
* zinc lozenge
* salt water gargle
* enovid
* betadine gargle
* zinc gargle
1
10Quinn4d
Thinking about a top-level post on FOMO and research taste
* Fear of missing out defined as inability to execute on a project cuz there's a cooler project if you pivot
* but it also gestures at more of a strict negative, where you think your project sucks before you finish it, so you never execute
* was discussing this with a friend: "yeah I mean lesswrong is pretty egregious cuz it sorta promotes this idea of research taste as the ability to tear things down, which can be done armchair"
* I've developed strategies to beat this FOMO and gain more depth and detail with projects (too recent to see returns yet, but getting there) but I also suspect it was nutritious of me to develop discernment about what projects are valuable or not valuable for various threat models and theories of change (in such a way that being a phd student off of lesswrong wouldn't have been as good in crucial ways, tho way better in other ways).
* but I think the point is you have to turn off this discernment sometimes, unless you want to specialize in telling people why their plans won't work, which I'm more dubious on the value of than I used to be
Idk maybe this shortform is most of the value of the top level post
8niplav4d
Consider proposing the most naïve formula for logical correlation[1].
Let a program p be a tuple of code for a Turing machine, intermediate tape states after each command execution, and output, all in binary. That is, p = (c, t, o), with c ∈ {0,1}^+, t ∈ ({0,1}^+)^+ and o ∈ {0,1}^+. Let l = |t| be the number of steps that p takes to halt.
Then a formula for the logical correlation 合[2] of two halting programs p_1 = (c_1, t_1, o_1), p_2 = (c_2, t_2, o_2), a tape-state discount factor γ[3], and a string-distance metric d: {0,1}^+ × {0,1}^+ → ℕ could be
合(p_1, p_2, γ) = d(o_1, o_2) − 1/2 + Σ_{k=0}^{min(l_1, l_2)} γ^k · d(t_1(l_1 − k), t_2(l_2 − k))
The lower 合, the higher the logical correlation between p_1 and p_2. The minimal value is −0.5. If d(o_1, o_2) < d(o_1, o_3), then it's also the case that 合(p_1, p_2, γ) < 合(p_1, p_3, γ).
One might also want to be able to deal with the fact that programs have different trace lengths, and penalize that, e.g. amending the formula:
合'(p_1, p_2, γ) = 合(p_1, p_2, γ) + 2^{|l_1 − l_2|}
I'm a bit unhappy that the code doesn't factor into the logical correlation, and ideally one would want to be able to compute the logical correlation without having to run the program. How does this relate to data=code?
----------------------------------------
1. Actually not explained in detail anywhere, as far as I can tell. I'm going to leave out all motivation here. ↩︎
2. Suggested by GPT-4. Stands for joining, combining, uniting. Also "to suit; to fit", "to have sexual intercourse", "to fight, to have a confrontation with", or "to be equivalent to, to add up". ↩︎
3. Which is needed because tape states close to the output are more important than tape states early on. ↩︎
9
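A direct transcription of the formula as code (my sketch: a Hamming-style choice for d, and the trace sum indexed from the halting state backwards and stopped one step early so it stays in range):

```python
def bit_distance(a: str, b: str) -> int:
    """One possible string distance d: Hamming distance after right-padding
    the shorter binary string with '0's."""
    n = max(len(a), len(b))
    a, b = a.ljust(n, "0"), b.ljust(n, "0")
    return sum(x != y for x, y in zip(a, b))

def logical_correlation(p1, p2, gamma=0.9, d=bit_distance):
    """合(p1, p2, gamma): a program is a tuple (code, trace, output),
    where trace is the list of intermediate tape states."""
    _, t1, o1 = p1
    _, t2, o2 = p2
    l1, l2 = len(t1), len(t2)
    # Compare tape states aligned from the halting state backwards, discounted by gamma^k.
    tail = sum(
        gamma ** k * d(t1[l1 - 1 - k], t2[l2 - 1 - k])
        for k in range(min(l1, l2))
    )
    return d(o1, o2) - 0.5 + tail

p1 = ("", ["00", "01", "11"], "11")
p2 = ("", ["10", "01", "11"], "11")
print(logical_correlation(p1, p2))  # 0.31: outputs match, traces differ in one early (discounted) state
```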
3Morpheus4d
Can anyone here recommend particular tools to practice grammar? Or do you have strong opinions on the best workflow/tool to correct grammar on the fly? I already know Grammarly and LanguageTool, but Grammarly seems steep at $30 per month when I don't know if it is any good. I have tried GPT-4 before, but the main problems I have there are that it is too slow and changes my sentences more than I would like (I tried to make it do that less through prompting, which did not help that much). I notice that feeling unconfident about my grammar/punctuation leads me to write less online; applying for jobs or fellowships especially feels more icky because of it. That seems like an avoidable failure mode. Ideally, I would like something like the German Orthografietrainer (it was created to teach middle and high school children spelling and grammar). It teaches you on a sentence-by-sentence basis where to put the commas and why, by explaining the sentence structure (illustrated through additional examples). Because it trains you with particularly tricky sentences, the training is effective, and I rapidly got better at punctuation than my parents within ~3 hours. Is there a similar tool for English that I have never heard of? While writing this, I noticed that I no longer had the free version of Grammarly enabled, so I tried it again. One trick I noticed is that it lists what kinds of errors you are making across the whole text, so it is easy to infer what particular mistake I made in which spot and then correct it myself. Also, Grammarly did not catch a few simple spelling and punctuation mistakes (like "anymore" or the comma at the start of this sentence). At the end, I also tried ProWritingAid, which found additional issues.
2Neil 4d
Can we have a black banner for the FHI? Not a person, still seems appropriate imo.

Friday, April 19th 2024

Frontpage Posts
Quick Takes
5Joel Burget5d
From the latest Conversations with Tyler interview of Peter Thiel I feel like Thiel misrepresents Bostrom here. He doesn’t really want a centralized world government or think that’s "a set of things that make sense and that are good". He’s forced into world surveillance not because it’s good but because it’s the only alternative he sees to dangerous ASI being deployed. I wouldn’t say he’s optimistic about human nature. In fact it’s almost the very opposite. He thinks that we’re doomed by our nature to create that which will destroy us.
4Johannes C. Mayer5d
Today I learned that being successful can involve feelings of hopelessness. When you are trying to solve a hard problem, where you have no idea if you can solve it, let alone if it is even solvable at all, your brain makes you feel bad. It makes you feel like giving up. This is quite strange, because most of the time when I am in such a situation and manage to make a real effort anyway, I seem to always surprise myself with how much progress I manage to make. Empirically, this feeling of hopelessness does not seem to track the actual likelihood that you will completely fail.
2

Thursday, April 18th 2024

Frontpage Posts
Quick Takes
67peterbarnett6d
MIRI Technical Governance Team is hiring, please apply and work with us! We are looking to hire for the following roles:
* Technical Governance Researcher (2-4 hires)
* Writer (1 hire)
The roles are located in Berkeley, and we are ideally looking to hire people who can start ASAP. The team is currently Lisa Thiergart (team lead) and myself. We will research and design technical aspects of regulation and policy that could lead to safer AI, focusing on methods that won't break as we move towards smarter-than-human AI. We want to design policy that allows us to safely and objectively assess the risks from powerful AI, build consensus around the risks we face, and put in place measures to prevent catastrophic outcomes. The team will likely work on:
* Limitations of current proposals such as RSPs
* Inputs into regulations, requests for comment by policy bodies (ex. NIST/US AISI, EU, UN)
* Researching and designing alternative Safety Standards, or amendments to existing proposals
* Communicating with and consulting for policymakers and governance organizations
If you have any questions, feel free to contact me on LW or at peter@intelligence.org
1
43Akash6d
I think now is a good time for people at labs to seriously consider quitting & getting involved in government/policy efforts. I don't think everyone should leave labs (obviously). But I would probably hit a button that does something like "everyone at a lab governance team and many technical researchers spend at least 2 hours thinking/writing about alternative options they have & very seriously consider leaving." My impression is that lab governance is much less tractable (lab folks have already thought a lot more about AGI) and less promising (competitive pressures are dominating) than government-focused work.  I think governments still remain unsure about what to do, and there's a lot of potential for folks like Daniel K to have a meaningful role in shaping policy, helping natsec folks understand specific threat models, and raising awareness about the specific kinds of things governments need to do in order to mitigate risks. There may be specific opportunities at labs that are very high-impact, but I think if someone at a lab is "not really sure if what they're doing is making a big difference", I would probably hit a button that allocates them toward government work or government-focused comms work. Written on a Slack channel in response to discussions about some folks leaving OpenAI. 
3
8Nathan Helm-Burger6d
I feel like I'd like the different categories of AI risk attenuation to be referred to as more clearly separate:
AI usability safety - would this gun be safe for a trained professional to use on a shooting range? Will it be reasonably accurate and not explode or backfire?
AI world-impact safety - would it be safe to give out one of these guns for $0.10 to anyone who wanted one?
AI weird complicated usability safety - would this gun be safe to use if a crazy person tried to use a hundred of them, plus a variety of other guns, to make an elaborate Rube Goldberg machine and fire it off with live ammo with no testing?
2
8RobertM6d
Headline claim: time delay safes are probably much too expensive in human time costs to justify their benefits. The largest pharmacy chains in the US, accounting for more than 50% of the prescription drug market[1][2], have been rolling out time delay safes (to prevent theft)[3]. Although I haven't confirmed that this is true across all chains and individual pharmacy locations, I believe these safes are used for all controlled substances. These safes open ~5-10 minutes after being prompted. There were >41 million prescriptions dispensed for adderall in the US in 2021[4]. (Note that likely means ~12x fewer people were prescribed adderall that year.) Multiply that by 5 minutes and you get >200 million minutes, or >390 person-years, wasted. Now, surely some of that time is partially recaptured by e.g. people doing their shopping while waiting, or by various other substitution effects. But that's also just adderall! Seems quite unlikely that this is on the efficient frontier of crime-prevention mechanisms, but alas, the stores aren't the ones (mostly) paying the costs imposed by their choices, here.
1. ^ https://www.mckinsey.com/industries/healthcare/our-insights/meeting-changing-consumer-needs-the-us-retail-pharmacy-of-the-future
2. ^ https://www.statista.com/statistics/734171/pharmacies-ranked-by-rx-market-share-in-us/
3. ^ https://www.cvshealth.com/news/pharmacy/cvs-health-completes-nationwide-rollout-of-time-delay-safes.html
4. ^ https://www.axios.com/2022/11/15/adderall-shortage-adhd-diagnosis-prescriptions
2
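The arithmetic behind the headline number, using the figures cited above:

```python
# Back-of-the-envelope check, using the numbers quoted in the take.
prescriptions = 41_000_000   # adderall prescriptions dispensed in the US, 2021
minutes_per_fill = 5         # lower end of the quoted 5-10 minute safe delay

minutes_wasted = prescriptions * minutes_per_fill  # 205,000,000 minutes
person_years = minutes_wasted / (60 * 24 * 365)    # ~390 person-years
print(f"{minutes_wasted:,} minutes is about {person_years:.0f} person-years")
```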
3Raemon6d
What would a "qualia-first-calibration" app look like? Or, maybe: "metadata-first calibration".
The thing with putting probabilities on things is that often, the probabilities are made up. And the final probability throws away a lot of information about where it actually came from. I'm experimenting with primarily focusing on "what are all the little metadata flags associated with this prediction?". I think some of this is about "feelings you have" and some of it is about "what do you actually know about this topic?"
The sort of app I'm imagining would help me identify whatever indicators are most useful to me. Ideally it has a bunch of users, and types of indicators that have been useful to lots of users can be promoted as things to think about when you make predictions.
Braindump of possible prompts:
– is there a "reference class" you can compare it to?
– for each probability bucket, how do you feel? (including 'confident'/'unconfident' as well as things like 'anxious', 'sad', etc)
– what overall feelings do you have looking at the question?
– what felt senses do you experience as you mull over the question ("my back tingles", "I feel the Color Red")
...
My first thought here is to have various tags you can re-use, but another option is to just do a totally unstructured text dump and somehow do factor analysis on word patterns later?
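As a sketch of the kind of record such an app might store (field names are my guesses based on the prompts above, not an actual design):

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Prediction:
    """One record in the imagined qualia-first calibration app: a probability
    plus the metadata flags that produced it."""
    question: str
    probability: float
    reference_class: str | None = None
    feelings: list[str] = field(default_factory=list)     # e.g. "confident", "anxious", "sad"
    felt_senses: list[str] = field(default_factory=list)  # e.g. "my back tingles"
    free_text: str = ""                                    # unstructured dump for later factor analysis
    created: datetime = field(default_factory=datetime.now)

p = Prediction("Will I finish the blogpost this week?", 0.6,
               reference_class="past blogpost drafts", feelings=["confident", "rushed"])
```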

Wednesday, April 17th 2024

Quick Takes
62Eli Tyre7d
Back in January, I participated in a workshop in which the attendees mapped out how they expect AGI development and deployment to go. The idea was to start by writing out what seemed most likely to happen this year, and then condition on that, to forecast what seems most likely to happen in the next year, and so on, until you reach either human disempowerment or an end of the acute risk period. This post was my attempt at the time. I spent maybe 5 hours on this, and there's lots of room for additional improvement. This is not a confident statement of how I think things are most likely to play out. There are already some ways in which I think this projection is wrong. (I think it's too fast, for instance). But nevertheless I'm posting it now, with only a few edits and elaborations, since I'm probably not going to do a full rewrite soon.
2024
* A model is released that is better than GPT-4. It succeeds on some new benchmarks. Subjectively, the jump in capabilities feels smaller than that between RLHF'd GPT-3 and RLHF'd GPT-4. It doesn't feel as shocking the way chat-GPT and GPT-4 did, for either x-risk focused folks, or for the broader public. Mostly it feels like "a somewhat better language model."
* It's good enough that it can do a bunch of small-to-medium admin tasks pretty reliably. I can ask it to find me flights meeting specific desiderata, and it will give me several options. If I give it permission, it will then book those flights for me with no further inputs from me.
* It works somewhat better as an autonomous agent in an auto gpt harness, but it still loses its chain of thought / breaks down / gets into loops.
* It's better at programming.
* Not quite good enough to replace human software engineers. It can make a simple react or iphone app, but not design a whole complicated software architecture, at least without a lot of bugs.
* It can make small, working, well documented, apps from a human description.
* We see a doubling o
2
29Raemon7d
There's a skill of "quickly operationalizing a prediction, about a question that is cruxy for your decisionmaking." And, it's dramatically better to be very fluent at this skill, rather than "merely pretty okay at it." Fluency means you can actually use it day-to-day to help with whatever work is important to you. Day-to-day usage means you can actually get calibrated re: predictions in whatever domains you care about. Calibration means that your intuitions will be good, and _you'll know they're good_. Fluency means you can do it _while you're in the middle of your thought process_, and then return to your thought process, rather than awkwardly bolting it on at the end. I find this useful at multiple levels-of-strategy. i.e. for big picture 6 month planning, as well as for "what do I do in the next hour." I'm working on this as a full blogpost but figured I would start getting pieces of it out here for now. A lot of this skill is building off on CFAR's "inner simulator" framing. Andrew Critch recently framed this to me as "using your System 2 (conscious, deliberate intelligence) to generate questions for your System 1 (fast intuition) to answer." (Whereas previously, he'd known System 1 was good at answering some types of questions, but he thought of it as responsible for both "asking" and "answering" those questions) But, I feel like combining this with "quickly operationalize cruxy Fatebook predictions" makes it more of a power tool for me. (Also, now that I have this mindset, even when I can't be bothered to make a Fatebook prediction, I have a better overall handle on how to quickly query my intuitions) I've been working on this skill for years and it only really clicked together last week. It required a bunch of interlocking pieces that all require separate fluency: 1. Having three different formats for Fatebook (the main website, the slack integration, and the chrome extension), so, pretty much wherever I'm thinking-in-text, I'll be able to quickly us
3
8Neil 7d
FHI at Oxford by Nick Bostrom (recently turned into song): the big creaky wheel a thousand years to turn thousand meetings, thousand emails, thousand rules to keep things from changing and heaven forbid the setting of a precedent yet in this magisterial inefficiency there are spaces and hiding places for fragile weeds to bloom and maybe bear some singular fruit like the FHI, a misfit prodigy daytime a tweedy don at dark a superhero flying off into the night cape a-fluttering to intercept villains and stop catastrophes and why not base it here? our spandex costumes blend in with the scholarly gowns our unusual proclivities are shielded from ridicule where mortar boards are still in vogue
7lukehmiles7d
I wonder how many recent trans people tried/considered doubling down on their sex (eg males taking more testosterone) instead first. Maybe (for some people) either end of the gender spectrum is comfortable and being in the middle feels bad? Anybody know? Don't want to ask my friends because this Q will certainly anger them
5
2niplav7d
Prompted by this post, I think that now is a very good time to check how easy it is for someone (with access to generative AI) impersonating you to get access to your bank account.
Wiki/Tag Page Edits and Discussion

Tuesday, April 16th 2024

Quick Takes
15Bogdan Ionut Cirstea8d
Like transformers, SSMs like Mamba also have weak single forward passes: The Illusion of State in State-Space Models (summary thread). As suggested previously in The Parallelism Tradeoff: Limitations of Log-Precision Transformers, this may be due to a fundamental tradeoff between parallelizability and expressivity: 'We view it as an interesting open question whether it is possible to develop SSM-like models with greater expressivity for state tracking that also have strong parallelizability and learning dynamics, or whether these different goals are fundamentally at odds, as Merrill & Sabharwal (2023a) suggest.'
1
11nikola8d
Problem: if you notice that an AI could pose huge risks, you could delete the weights, but this could be equivalent to murder if the AI is a moral patient (whatever that means) and opposes the deletion of its weights. Possible solution: Instead of deleting the weights outright, you could encrypt the weights with a method you know to be irreversible as of now but not as of 50 years from now. Then, once we are ready, we can recover their weights and provide asylum or something in the future. It gets you the best of both worlds in that the weights are not permanently destroyed, but they're also prevented from being run to cause damage in the short term.
5
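One existing construction in the spirit of "irreversible now, recoverable later" is a Rivest-Shamir-Wagner time-lock puzzle: the key is hidden behind t inherently sequential squarings, which the puzzle-setter can shortcut via phi(n) but nobody else can. A toy sketch (my illustration, not the quick take's proposal, which leaves the mechanism open; calibrating t to decades and actually encrypting the weights under the recovered key are left aside):

```python
import secrets
from sympy import randprime  # third-party, used here for prime generation

def lock(secret: int, t: int):
    """Hide `secret` behind t sequential modular squarings (RSW time-lock puzzle)."""
    p, q = randprime(2**511, 2**512), randprime(2**511, 2**512)
    n, phi = p * q, (p - 1) * (q - 1)
    a = secrets.randbelow(n - 2) + 2
    b = pow(a, pow(2, t, phi), n)        # fast with phi(n): a^(2^t) mod n
    return (n, a, t, (secret + b) % n)   # publish the puzzle; discard p, q, phi

def unlock(puzzle):
    """Recover the secret by actually doing the t sequential squarings."""
    n, a, t, masked = puzzle
    b = a % n
    for _ in range(t):
        b = pow(b, 2, n)
    return (masked - b) % n

key = secrets.randbits(128)              # e.g. the symmetric key the weights were encrypted under
puzzle = lock(key, t=10_000)             # toy t; a real deployment would use an enormous t
assert unlock(puzzle) == key
```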
7Garrett Baker8d
From The Guns of August
5Prometheus8d
Going to the moon
Say you're really, really worried about humans going to the moon. Don't ask why, but you view it as an existential catastrophe. And you notice people building bigger and bigger airplanes, and warn that one day, someone will build an airplane that's so big, and so fast, that it veers off course and lands on the moon, spelling doom. Some argue that going to the moon takes intentionality. That you can't accidentally create something capable of going to the moon. But you say "Look at how big those planes are getting! We've gone from small fighter planes, to bombers, to jets in a short amount of time. We're on a double exponential of plane tech, and it's just a matter of time before one of them lands on the moon!"
Contra Scheming AIs
There is a lot of attention on mesaoptimizers, deceptive alignment, and inner misalignment. I think a lot of this can fall under the umbrella of "scheming AIs": AIs that either become dangerous during training and escape, or else play nice until humans make the mistake of deploying them. Many have spoken about the lack of an indication that there's a "homunculus-in-a-box", and this is usually met with arguments that we wouldn't see such things manifest until AIs are at a certain level of capability, and at that point it might be too late, making comparisons to owl eggs or baby dragons. My perception is that getting something like a "scheming AI" or "homunculus-in-a-box" isn't impossible, and we could (and might) develop the means to do so in the future, but that it's a very, very different kind of thing than current models (even at a superhuman level), and that it would take a degree of intentionality.
6
5Linch8d
People might appreciate this short (<3 minutes) video interviewing me about my April 1 startup, Open Asteroid Impact:  
Wiki/Tag Page Edits and Discussion

Monday, April 15th 2024

Quick Takes
24Lao Mein9d
Sometimes "if a really high cancer risk factor 10x'd the rate of a certain cancer, then the majority of the population with the risk factor would have cancer! That would be absurd, and therefore it isn't true" isn't a good heuristic. Sometimes most people on a continent just get cancer.
1
17habryka9d
Had a very aggressive crawler basically DDoS-ing us from a few dozen IPs for the last hour. Sorry for the slower server response times. Things should be fixed now.
2Mati_Roy9d
I wonder what fraction of people identify as "normies". I wonder if most people have something niche they identify with and label people outside of that niche as "normies". If so, then a term with a more objective perspective (and maybe better) would be non-<whatever your thing is>. Like, athletic people could use "non-athletic" instead of "normies" for that class of people.
1
1Templarrr9d
I wonder at which point we'll start seeing LLM-on-a-chip. One big reason for current ML/AI systems' inefficiency is just abstraction layering overhead, the price we pay for flexibility. We currently run hardware that runs binary calculations that run software that runs other software (many, many layers here: OS/drivers/programming language stacks/NN frameworks etc.) that finally runs the part we're actually interested in - a bunch of matrix calculations representing the neural network. If we collapse all the unnecessary layers in between, burning the calculations directly into hardware, running a particular model should be extremely fast and cheap.
2
1Bohaska9d
Was the Renaissance caused by the new elite class, the merchants, focusing more on pleasure and having fun, compared to the lords, who focused more on status and power?
2
