All Posts

Sorted by Magic (New & Upvoted)

Sunday, April 28th 2024

Quick Takes
3Heramb3h
Everyone writing policy papers or doing technical work seems to be keeping generative AI at the back of their mind when framing their work or impact. This narrow focus on gen AI may well be net-negative for us: it unknowingly or unintentionally ignores ripple effects of the gen AI boom in other fields (like robotics companies getting more funding, leading to more capabilities, which leads to new types of risks). And guess who benefits if we do end up getting good evals/standards in place for gen AI? It seems to me that companies/investors are the clear winners, because we then have to go back to the drawing board and advocate for the same kind of safeguards for robotics or a different kind of AI use-case/type, all while the development/capability cycles keep maturing. We seem to be in whack-a-mole territory now because the Overton window has shifted for investors.

Saturday, April 27th 2024

Quick Takes
39Tamsin Leake1d
decision theory is no substitute for utility function
some people, upon learning about decision theories such as LDT and how it cooperates on problems such as the prisoner's dilemma, end up believing the following:
it's possible that this is true for some people, but in general i expect that to be a mistaken analysis of their values. decision theory cooperates with agents relative to how much power they have, and only when it's instrumental. in my opinion, real altruism (/egalitarianism/cosmopolitanism/fairness/etc) should be in the utility function which the decision theory is instrumental to. i actually intrinsically care about others; i don't just care about others instrumentally because it helps me somehow.
some important ways in which my utility-function-altruism differs from decision-theoretic cooperation include:
* i care about people weighed by moral patienthood; decision theory only cares about agents weighed by negotiation power. if an alien superintelligence is very powerful but isn't a moral patient, then i will only cooperate with it instrumentally (for example because i care about the alien moral patients that it has been in contact with); if cooperating with it doesn't help my utility function (which, again, includes altruism towards aliens) then i won't cooperate with that alien superintelligence. corollarily, i will take actions that cause nice things to happen to people even if they're very impoverished (and thus don't have much LDT negotiation power) and it doesn't help any other aspect of my utility function than just the fact that i value that they're okay.
* if i can switch to a better decision theory, or if fucking over some non-moral-patienty agents helps me somehow, then i'll happily do that; i don't have goal-content integrity about my decision theory. i do have goal-content integrity about my utility function: i don't want to become someone who wants moral patients to unconsentingly-die or suffer, for example.
* there seems to be
22Andrew Burns2d
So the usual refrain from Zvi and others is that the specter of China beating us to the punch with AGI is not real because of limits on compute, etc. I think Zvi has tempered his position on this in light of Meta's promise to release the weights of its 400B+ model. Now there is word that SenseTime just released a model that beats GPT-4 Turbo on various metrics. Of course, maybe Meta chooses not to release its big model, and maybe SenseTime is bluffing--I would point out, though, that Alibaba's Qwen model seems to do pretty okay in the arena... anyway, my point is that I don't think the "what if China" argument can be dismissed as quickly as some people on here seem to be ready to do.
17Mati_Roy1d
it seems to me that disentangling beliefs and values is an important part of being able to understand each other, and using words like "disagree" to mean both "different beliefs" and "different values" is really confusing in that regard
2Abhimanyu Pallavi Sudhir21h
conditionalization is not the probabilistic version of implies

P     Q     Q|P     P → Q
T     T     T       T
T     F     F       F
F     T     N/A     T
F     F     N/A     T

Resolution logic for conditionalization:
if P: return Q
else: return None

Resolution logic for implies:
if P: return Q
else: return True
(equivalently) return not P or Q
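A minimal Python sketch of the same contrast, enumerating the truth table above (function names here are illustrative, not from the take):

from typing import Optional

def conditional(p: bool, q: bool) -> Optional[bool]:
    # Q | P: only defined when P holds; otherwise no truth value.
    return q if p else None

def implies(p: bool, q: bool) -> bool:
    # P -> Q: material implication, vacuously true when P is false.
    return (not p) or q

for p in (True, False):
    for q in (True, False):
        print(p, q, conditional(p, q), implies(p, q))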
2nim2d
I've found an interesting "bug" in my cognition: a reluctance to rate subjective experiences on a subjective scale useful for comparing them. When I fuzz this reluctance against many possible rating scales, I find that it seems to arise from the comparison-power itself. The concrete case is that I've spun up a habit tracker on my phone and I'm trying to build a routine of gathering some trivial subjective-wellbeing and lifestyle-factor data into it. My prototype of this system includes tracking the high and low points of my mood through the day as recalled at the end of the day. This is causing me to interrogate the experiences as they're happening to see if a particular moment is a candidate for best or worst of the day, and attempt to mentally store a score for it to log later. I designed the rough draft of the system with the ease of it in mind -- I didn't think it would induce such struggle to slap a quick number on things. Yet I find myself worrying more than anticipated about whether I'm using the scoring scale "correctly", whether I'm biased by the moment to perceive the experience in a way that I'd regard as inaccurate in retrospect, and so forth. Fortunately it's not a big problem, as nothing particularly bad will happen if my data is sloppy, or if I don't collect it at all. But it strikes me as interesting, a gap in my self-knowledge that wants picking-at like peeling the inedible skin away to get at a tropical fruit.

Friday, April 26th 2024

Quick Takes
13Fabien Roger2d
"List sorting does not play well with few-shot" mostly doesn't replicate with davinci-002. When using length-10 lists (it crushes length-5 no matter the prompt), I get:
* 32-shot, no fancy prompt: ~25%
* 0-shot, fancy python prompt: ~60%
* 0-shot, no fancy prompt: ~60%
So few-shot hurts, but the fancy prompt does not seem to help. Code here.
I'm interested if anyone knows another case where a fancy prompt increases performance more than few-shot prompting, where a fancy prompt is a prompt that does not contain information that a human would use to solve the task. This is because I'm looking for counterexamples to the following conjecture: "fine-tuning on k examples beats fancy prompting, even when fancy prompting beats k-shot prompting" (for a reasonable value of k, e.g. the number of examples it would take a human to understand what is going on).
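For concreteness, a rough sketch of how the two prompt styles being compared might be constructed for the list-sorting task; the exact wording below is a guess, not the prompts actually used (those are behind the "Code here" link):

import random

def make_list(n=10):
    return [random.randint(0, 99) for _ in range(n)]

def k_shot_prompt(k=32, n=10):
    # Plain few-shot prompt: k solved examples followed by the query list.
    parts = []
    for _ in range(k):
        xs = make_list(n)
        parts.append(f"List: {xs}\nSorted: {sorted(xs)}")
    query = make_list(n)
    parts.append(f"List: {query}\nSorted:")
    return "\n\n".join(parts), query

def fancy_python_prompt(n=10):
    # Zero-shot "fancy" prompt: frame the task as a Python REPL completion.
    # This adds no information a human would need in order to sort the list.
    query = make_list(n)
    return f">>> sorted({query})\n", query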
12dirk2d
Sometimes a vague phrasing is not an inaccurate demarcation of a more precise concept, but an accurate demarcation of an imprecise concept
6MichaelDickens3d
Have there been any great discoveries made by someone who wasn't particularly smart? This seems worth knowing if you're considering pursuing a career with a low chance of high impact. Is there any hope for relatively ordinary people (like the average LW reader) to make great discoveries?
5otto.barten2d
My current main cruxes:
1. Will AI get takeover capability? When?
2. Single ASI or many AGIs?
3. Will we solve technical alignment?
4. Value alignment, intent alignment, or CEV?
5. Defense>offense or offense>defense?
6. Is a long-term pause achievable?
If there is reasonable consensus on any one of those, I'd much appreciate knowing about it. Otherwise, I think these should be research priorities.
5quila3d
i'm watching Dominion again to remind myself of the world i live in, to regain passion to Make It Stop. it's already working.

Thursday, April 25th 2024

Quick Takes
47Thomas Kwa3d
The cost of goods has the same units as the cost of shipping: $/kg. Referencing between them lets you understand how the economy works, e.g. why construction material sourcing and drink bottling have to be local, but oil tankers exist.
* An iPhone costs $4,600/kg, about the same as SpaceX charges to launch it to orbit. [1]
* Beef, copper, and off-season strawberries are $11/kg, about the same as a 75kg person taking a three-hour, 250km Uber ride costing $3/km.
* Oranges and aluminum are $2-4/kg, about the same as flying them to Antarctica. [2]
* Rice and crude oil are ~$0.60/kg, about the same as the $0.72 it costs to ship a kilogram 5000km across the US via truck. [3,4] Palm oil, soybean oil, and steel are around this price range, with wheat being cheaper. [3]
* Coal and iron ore are $0.10/kg, significantly more than the cost of shipping them around the entire world via smallish (Handysize) bulk carriers. Large bulk carriers are another 4x more efficient. [6]
* Water is very cheap, with tap water at $0.002/kg in NYC. [5] But shipping via tanker is also very cheap, so you can ship it maybe 1000 km before equaling its cost.
It's really impressive that for the price of a winter strawberry, we can ship a strawberry-sized lump of coal around the world 100-400 times.
[1] iPhone is $4600/kg, large launches sell for $3500/kg, and rideshares for small satellites $6000/kg. Geostationary orbit is more expensive, so it's okay for GPS satellites to cost more than an iPhone per kg, but Starlink wants to be cheaper.
[2] https://fred.stlouisfed.org/series/APU0000711415. Can't find numbers, but Antarctica flights cost $1.05/kg in 1996.
[3] https://www.bts.gov/content/average-freight-revenue-ton-mile
[4] https://markets.businessinsider.com/commodities
[5] https://www.statista.com/statistics/1232861/tap-water-prices-in-selected-us-cities/
[6] https://www.researchgate.net/figure/Total-unit-shipping-costs-for-dry-bulk-carrier-ships-per-tkm-EUR-tkm-in-2019_tbl3_351748799
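A quick back-of-the-envelope sketch using only the numbers above: how far could you truck each good before shipping costs equal the price of the good itself (truck freight only; tanker and bulk-carrier rates are far cheaper):

TRUCK_RATE = 0.72 / 5000  # $ per kg per km, from the rice/crude-oil bullet above

goods_per_kg = {  # $/kg figures from the list above
    "iPhone": 4600,
    "beef": 11,
    "oranges": 3,
    "rice": 0.60,
    "coal": 0.10,
    "NYC tap water": 0.002,
}

for name, price in goods_per_kg.items():
    km = price / TRUCK_RATE
    print(f"{name:>14}: ~{km:,.0f} km by truck before shipping equals the good's price")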
44Eric Neyman4d
I think that people who work on AI alignment (including me) have generally not put enough thought into the question of whether a world where we build an aligned AI is better by their values than a world where we build an unaligned AI. I'd be interested in hearing people's answers to this question. Or, if you want more specific questions:
* By your values, do you think a misaligned AI creates a world that "rounds to zero", or still has substantial positive value?
* A common story for why aligned AI goes well goes something like: "If we (i.e. humanity) align AI, we can and will use it to figure out what we should use it for, and then we will use it in that way." To what extent is aligned AI going well contingent on something like this happening, and how likely do you think it is to happen? Why?
* To what extent is your belief that aligned AI would go well contingent on some sort of assumption like: my idealized values are the same as the idealized values of the people or coalition who will control the aligned AI?
* Do you care about AI welfare? Does your answer depend on whether the AI is aligned? If we built an aligned AI, how likely is it that we will create a world that treats AI welfare as an important consideration? What if we build a misaligned AI?
* Do you think that, to a first approximation, most of the possible value of the future happens in worlds that are optimized for something that resembles your current or idealized values? How bad is it to mostly sacrifice each of these? (What if the future world's values are similar to yours, but it is only kinda effectual at pursuing them? What if the world is optimized for something that's only slightly correlated with your values?) How likely are these various options under an aligned AI future vs. an unaligned AI future?
10avturchin3d
Roman Mazurenko is dead again. The first resurrected person, Roman lived as a chatbot (2016-2024) created from his conversations with his fiancée. You might even have been able to download him as an app. But not any more. His fiancée married again, and her startup http://Replika.ai pivoted from resurrection help to AI girlfriends and psychological consulting. It looks like they quietly removed the Roman Mazurenko app from public access. It is especially sad that his digital twin lived a shorter life than his biological original, who died at 32. Especially now, when we have much more powerful instruments for creating semi-uploads based on LLMs with large prompt windows.
6keltan3d
A potentially good way to avoid low-level criminals scamming your family and friends with a clone of your voice is to set a password that you each must exchange. An extra layer of security might be to make the password offensive, an info hazard, or politically sensitive. That way, criminals with little technical expertise will have a harder time bypassing corporate language filters. Good luck getting the voice model to parrot a basic meth recipe!
3Neil 3d
Poetry and practicality I was staring up at the moon a few days ago and thought about how deeply I loved my family, and wished to one day start my own (I'm just over 18 now). It was a nice moment. Then, I whipped out my laptop and felt constrained to get back to work; i.e. read papers for my AI governance course, write up LW posts, and trade emails with EA France. (These I believe to be my best shots at increasing everyone's odds of survival). It felt almost like sacrilege to wrench myself away from the moon and my wonder. Like I was ruining a moment of poetry and stillwatered peace by slamming against reality and its mundane things again. But... The reason I wrenched myself away is directly downstream from the spirit that animated me in the first place. Whether I feel the poetry now that I felt then is irrelevant: it's still there, and its value and truth persist. Pulling away from the moon was evidence I cared about my musings enough to act on them. The poetic is not a separate magisterium from the practical; rather the practical is a particular facet of the poetic. Feeling "something to protect" in my bones naturally extends to acting it out. In other words, poetry doesn't just stop. Feel no guilt in pulling away. Because, you're not.

Wednesday, April 24th 2024

Quick Takes
18Elizabeth4d
Check my math: how does Enovid compare to humming? Nitric oxide is an antimicrobial and immune booster. Normal nasal nitric oxide is 0.14ppm for women and 0.18ppm for men (sinus levels are 100x higher). journals.sagepub.com/doi/pdf/10.117… Enovid is a nasal spray that produces NO. I had the damndest time quantifying Enovid, but this trial registration says 0.11ppm NO/hour. They deliver every 8h and I think that dose is amortized, so the true dose is 0.88. But maybe it's more complicated. I've got an email out to the PI but am not hopeful about a response. clinicaltrials.gov/study/NCT05109…
So Enovid increases nasal NO levels somewhere between 75% and 600% compared to baseline - not shabby. Except humming increases nasal NO levels by 1500-2000%. atsjournals.org/doi/pdf/10.116… Enovid stings and humming doesn't, so it seems like Enovid should have the larger dose. But the spray doesn't contain NO itself, only compounds that react to form NO. Maybe that's where the sting comes from? Cystic fibrosis and burn patients are sometimes given stratospheric levels of NO for hours or days; if the burn from Enovid came from the NO itself, then those patients would be in agony.
I'm not finding any data on humming and respiratory infections. Google Scholar gives me information on CF and COPD; @Elicit brought me a bunch of studies about honey. With better keywords, Google Scholar brought me a bunch of descriptions of yogic breathing with no empirical backing. There are some very circumstantial studies on illness in mouth breathers vs. nasal breathers, but that design has too many confounders for me to take seriously.
Where I'm most likely wrong:
* misinterpreted the dosage in the RCT
* dosage in the RCT is lower than in Enovid
* Enovid's dose per spray is 0.5ml, so pretty close to the new study. But it recommends two sprays per nostril, so the real dose is 2x that. Which is still not quite as powerful as a single hum.
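A small sketch of the dose math above, under the stated assumptions (0.11 ppm NO/hour from the trial registration, an 8-hour dosing interval, and uncertainty about whether that figure is amortized):

baseline_ppm = {"women": 0.14, "men": 0.18}
per_hour = 0.11           # ppm NO/hour, per the trial registration
dose_low = per_hour       # if that figure is already the full dose
dose_high = per_hour * 8  # if it is amortized over the 8-hour dosing interval

for who, base in baseline_ppm.items():
    print(f"{who}: +{dose_low / base:.0%} to +{dose_high / base:.0%} vs. baseline")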
12Richard_Ngo4d
A tension that keeps recurring when I think about philosophy is between the "view from nowhere" and the "view from somewhere", i.e. a third-person versus first-person perspective—especially when thinking about anthropics. One version of the view from nowhere says that there's some "objective" way of assigning measure to universes (or people within those universes, or person-moments). You should expect to end up in different possible situations in proportion to how much measure your instances in those situations have. For example, UDASSA ascribes measure based on the simplicity of the computation that outputs your experience. One version of the view from somewhere says that the way you assign measure across different instances should depend on your values. You should act as if you expect to end up in different possible future situations in proportion to how much power to implement your values the instances in each of those situations has. I'll call this the ADT approach, because that seems like the core insight of Anthropic Decision Theory. Wei Dai also discusses it here. In some sense each of these views makes a prediction. UDASSA predicts that we live in a universe with laws of physics that are very simple to specify (even if they're computationally expensive to run), which seems to be true. Meanwhile the ADT approach "predicts" that we find ourselves at an unusually pivotal point in history, which also seems true. Intuitively I want to say "yeah, but if I keep predicting that I will end up in more and more pivotal places, eventually that will be falsified". But.... on a personal level, this hasn't actually been falsified yet. And more generally, acting on those predictions can still be positive in expectation even if they almost surely end up being falsified. It's a St Petersburg paradox, basically. Very speculatively, then, maybe a way to reconcile the view from somewhere and the view from nowhere is via something like geometric rationality, which avoids St
6sapphire5d
I prefer to keep plans private but I'm making big progress on meditation and mental re-wiring. Am working on a way to publicly demonstrate. Public plans just stress me out. I recently set two pretty ambitious goals. I figured I could use psychedelics to turbo-charge progress. The meditation one is coming along FAST. The other goal is honestly blocked a bit on being super out of shape. Multiple rounds of covid really destroyed my cardio and energy levels. Need to rebuild those before a big push on goal 2.
5faul_sname4d
So I keep seeing takes about how to tell if LLMs are "really exhibiting goal-directed behavior" like a human or whether they are instead "just predicting the next token". And, to me at least, this feels like a confused sort of question that misunderstands what humans are doing when they exhibit goal-directed behavior.
Concrete example: let's say we notice that Jim has just pushed the turn signal lever on the side of his steering wheel. Why did Jim do this? The goal-directed-behavior story is as follows:
* Jim pushed the turn signal lever because he wanted to alert surrounding drivers that he was moving right by one lane
* Jim wanted to alert drivers that he was moving one lane right because he wanted to move his car one lane to the right
* Jim wanted to move his car one lane to the right in order to accomplish the goal of taking the next freeway offramp
* Jim wanted to take the next freeway offramp because that was part of the most efficient route from his home to his workplace
* Jim wanted to go to his workplace because his workplace pays him money
* Jim wants money because money can be exchanged for goods and services
* Jim wants goods and services because they get him things he terminally values like mates and food
But there's an alternative story:
* When in the context of "I am a middle-class adult", the thing to do is "have a job". Years ago, this context triggered Jim to perform the action "get a job", and now he's in the context of "having a job".
* When in the context of "having a job", "showing up for work" is the expected behavior.
* Earlier this morning, Jim had the context "it is a workday" and "I have a job", which triggered Jim to begin the sequence of actions associated with the behavior "commuting to work"
* Jim is currently approaching the exit for his work - with the context of "commuting to work", this means the expected behavior is "get in the exit lane", and now he's in the context "switching one lane to the right"
* In the con
4Nathan Young4d
I think I'm gonna start posting top blogposts to the main feed (mainly from dead writers or people I predict won't care)

Tuesday, April 23rd 2024

Quick Takes
33Elizabeth6d
Brandon Sanderson is a bestselling fantasy author. Despite mostly working with traditional publishers, there is a 50-60 person company formed around his writing[1]. This podcast talks about how the company was formed.
Things I liked about this podcast:
1. He and his wife both refer to it as "our" company and describe critical contributions she made.
2. The number of times he was dissatisfied with the way his publisher did something and so hired someone in his own company to do it (e.g. PR and organizing book tours), despite that being part of the publisher's job.
3. He believed in his back catalog enough to buy remainder copies of his books (at $1/piece) and sell them via his own website at sticker price (with autographs). This was a major source of income for a while.
4. Long-term grand strategic vision that appears to be well aimed and competently executed.
[1] The only non-Sanderson content I found was a picture book from his staff artist.
23Fabien Roger5d
I recently listened to The Righteous Mind. It was surprising to me that many people seem to intrinsically care about many things that look very much like good instrumental norms to me (in particular loyalty, respect for authority, and purity). The author does not make claims about what the reflective equilibrium will be, nor does he explain how the liberals stopped considering loyalty, respect, and purity as intrinsically good (beyond "some famous thinkers are autistic and didn't realize the richness of the moral life of other people"), but his work made me doubt that most people will have well-being-focused CEV. The book was also an interesting jumping point for reflection about group selection. The author doesn't make the sorts of arguments that would show that group selection happens in practice (and many of his arguments seem to show a lack of understanding of what opponents of group selection think - bees and cells cooperating is not evidence for group selection at all), but after thinking about it more, I now have more sympathy for group-selection having some role in shaping human societies, given that (1) many human groups died, and very few spread (so one lucky or unlucky gene in one member may doom/save the group) (2) some human cultures may have been relatively egalitarian enough when it came to reproductive opportunities that the individual selection pressure was not that big relative to group selection pressure and (3) cultural memes seem like the kind of entity that sometimes survive at the level of the group. Overall, it was often a frustrating experience reading the author describe a descriptive theory of morality and try to describe what kind of morality makes a society more fit in a tone that often felt close to being normative / fails to understand that many philosophers I respect are not trying to find a descriptive or fitness-maximizing theory of morality (e.g. there is no way that utilitarians think their theory is a good description of the k
21David Udell5d
The main thing I got out of reading Bostrom's Deep Utopia is a better appreciation of this "meaning of life" thing. I had never really understood what people meant by this, and always just rounded it off to people using lofty words for their given projects in life. The book's premise is that, after the aligned singularity, the robots will not just be better at doing all your work but also be better at doing all your leisure for you. E.g., you'd never study for fun in posthuman utopia, because you could instead just ask the local benevolent god to painlessly, seamlessly put all that wisdom in your head. In that regime, studying with books and problems for the purpose of learning and accomplishment is just masochism. If you're into learning, just ask! And similarly for any psychological state you're thinking of working towards. So, in that regime, it's effortless to get a hedonically optimal world, without any unendorsed suffering and with all the happiness anyone could want. Those things can just be put into everyone and everything's heads directly—again, by the local benevolent-god authority. The only challenging values to satisfy are those that deal with being practically useful. If you think it's important to be the first to discover a major theorem or be the individual who counterfactually helped someone, living in a posthuman utopia could make things harder in these respects, not easier. The robots can always leave you a preserve of unexplored math or unresolved evil... but this defeats the purpose of those values. It's not practical benevolence if you had to ask for the danger to be left in place; it's not a pioneering scientific discovery if the AI had to carefully avoid spoiling it for you. Meaning is supposed to be one of these values: not a purely hedonic value, and not a value dealing only in your psychological states. A further value about the objective state of the world and your place in relation to it, wherein you do something practically significan
3Adam Shai6d
A neglected problem in AI safety technical research is teasing apart the mechanisms of dangerous capabilities exhibited by current LLMs. In particular, I am thinking that for any model organism (see Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research) of dangerous capabilities (e.g. the sleeper agents paper), we don't know how much of the phenomenon depends on the particular semantics of terms like "goal" and "deception" and "lie" (insofar as they are used in the scratchpad or in prompts or in finetuning data), or whether the same phenomenon could be had by subbing in more or less any word. One approach to this is to make small toy models of these types of phenomena where we can more easily control data distributions and yet still get analogous behavior. In this way we can really control for any particular aspect of the data and figure out, scientifically, the nature of these dangers. By small toy model I'm thinking of highly artificial datasets (perhaps made of binary digits with specific correlation structure, or whatever the minimum needed to get the phenomenon at hand).
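A minimal sketch of the kind of artificial dataset gestured at here: binary strings with a fully controlled correlation between a "trigger" bit and a later "behavior" bit. All names and parameters are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def make_dataset(n=10_000, length=16, trigger_idx=0, behavior_idx=-1, strength=0.9):
    # Random binary strings where the "behavior" bit copies the "trigger" bit
    # with probability `strength`, so the correlation is under full experimental control.
    X = rng.integers(0, 2, size=(n, length))
    trigger = X[:, trigger_idx]
    copy = rng.random(n) < strength
    X[:, behavior_idx] = np.where(copy, trigger, rng.integers(0, 2, size=n))
    return X

data = make_dataset()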
1skybluecat5d
What's the endgame of technological or intelligent progress like? Not just for humans as we know it, but for all possible beings/civilizations in this universe, at least before it runs out of usable matter/energy? Would they invariably self-modify beyond their equivalent of humanness? Settle into some physical/cultural stable state? Keep getting better tech to compete within themselves if nothing else? Reach an end of technology or even intelligence beyond which advancement is no longer beneficial for survival? Spread as far as possible or concentrate resources? Accept the limited fate of the universe and live to the fullest or try to change it?  If they could change the laws of the universe, how would they?

Monday, April 22nd 2024

Quick Takes
41Lucie Philippon6d
There was this voice inside my head that told me that since I got Something to Protect, relaxing is never ok above the strict minimum, the goal is paramount, and I should just work as hard as I can all the time. This led me to breaking down and being incapable of working on my AI governance job for a week, as I had just piled up too much stress. And then I decided to follow what motivated me in the moment, instead of coercing myself into working on what I thought was most important, and lo and behold! My total output increased, while my time spent working decreased. I'm so angry and sad at the inadequacy of my role models, cultural norms, rationality advice, and model of the good EA who does not burn out, which still led me to smash into the wall despite their best intentions. I became so estranged from my own body and perceptions, ignoring my core motivations, that it felt harder and harder to work. I dug myself such a deep hole. I'm terrified at the prospect of having to rebuild my motivation myself again.
3cousin_it6d
If the housing crisis is caused by low-density rich neighborhoods blocking redevelopment of themselves (as seems to be the consensus on the internet now), could it be solved by developers buying out an entire neighborhood or even town in one swoop? It'd require a ton of money, but redevelopment would bring even more money, so it could be win-win for everyone. Does it not happen only due to coordination difficulties?
3yanni7d
The catchphrase I walk around with in my head regarding the optimal strategy for AI Safety is something like: Creating Superintelligent Artificial Agents* (SAA) without a worldwide referendum is ethically unjustifiable. Until a consensus is reached on whether to bring into existence such technology, a global moratorium is required (*we already have AGI). I thought it might be useful to spell that out.
1Johannes C. Mayer6d
Can you iterate through 10^100 objects? If you have a 1GHz CPU you can do 1,000,000,000 operations per second. Let's assume that iterating through one object takes only one operation. In a year you can then do about 10^16 operations. That means it would take 10^84 years to iterate through 10^100 vertices. The big bang was 1.4*10^10 years ago.
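The same arithmetic as a quick order-of-magnitude check:

ops_per_second = 1e9                       # 1 GHz, one object per operation
seconds_per_year = 60 * 60 * 24 * 365      # ~3.15e7
ops_per_year = ops_per_second * seconds_per_year   # ~3e16
years_needed = 1e100 / ops_per_year        # ~3e83, i.e. on the order of 10^84 years
print(f"{years_needed:.1e} years, vs. ~1.4e10 years since the Big Bang")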

Sunday, April 21st 2024

Quick Takes
13yanni8d
I recently discovered the idea of driving all blames into oneself, which immediately resonated with me. It is relatively hardcore; the kind of thing that would turn David Goggins into a Buddhist.
Gemini did a good job of summarising it:
This quote by Pema Chödrön, a renowned Buddhist teacher, represents a core principle in some Buddhist traditions, particularly within Tibetan Buddhism. It's called "taking full responsibility" or "taking self-blame" and can be a bit challenging to understand at first. Here's a breakdown:
What it Doesn't Mean:
* Self-Flagellation: This practice isn't about beating yourself up or dwelling on guilt.
* Ignoring External Factors: It doesn't deny the role of external circumstances in a situation.
What it Does Mean:
* Owning Your Reaction: It's about acknowledging how a situation makes you feel and taking responsibility for your own emotional response.
* Shifting Focus: Instead of blaming others or dwelling on what you can't control, you direct your attention to your own thoughts and reactions.
* Breaking Negative Cycles: By understanding your own reactions, you can break free from negative thought patterns and choose a more skillful response.
Analogy: Imagine a pebble thrown into a still pond. The pebble represents the external situation, and the ripples represent your emotional response. While you can't control the pebble (the external situation), you can control the ripples (your reaction).
Benefits:
* Reduced Suffering: By taking responsibility for your own reactions, you become less dependent on external circumstances for your happiness.
* Increased Self-Awareness: It helps you understand your triggers and cultivate a more mindful response to situations.
* Greater Personal Growth: By taking responsibility, you empower yourself to learn and grow from experiences.
Here are some additional points to consider:
* This practice doesn't mean excusing bad behavior. You can still hold others accountable while taking respons
8Tamsin Leake7d
Regardless of how good their alignment plans are, the thing that makes OpenAI unambiguously evil is that they created a strongly marketed public product and, as a result, caused a lot of public excitement about AI, and thus lots of other AI capabilities organizations were created that are completely dismissive of safety. There's just no good reason to do that, except short-term greed at the cost of a higher probability that everyone (including people at OpenAI) dies. (No, "you need huge profits to solve alignment" isn't a good excuse - we had nowhere near exhausted the alignment research that can be done without huge profits.)
4Quintin Pope8d
Idea for using current AI to accelerate medical research: suppose you were to take a VLM and train it to verbally explain the differences between two image data distributions. E.g., you could take 100 dog images, split them into two classes, insert tiny rectangles into class 1, feed those 100 images into the VLM, and then train it to generate the text "class 1 has tiny rectangles in the images". Repeat this for a bunch of different augmented datasets where we know exactly how they differ, aiming for a VLM with a general ability to in-context learn and verbally describe the differences between two sets of images. As training progresses, keep introducing more and subtler differences, while training the VLM to describe all of them.
Then, apply the model to various medical images. E.g., brain scans of people who are about to develop dementia versus those who aren't, skin photos of malignant and non-malignant blemishes, electron microscope images of cancer cells that can / can't survive some drug regimen, etc. See if the VLM can describe any new, human-interpretable features.
The VLM would generate a lot of false positives, obviously. But once you know about a possible feature, you can manually investigate whether it holds up in distinguishing other examples of the thing you're interested in. Once you find valid features, you can add those into the training data of the VLM, so it's no longer just trained on synthetic augmentations. You might have to start with real datasets that are particularly easy to tell apart, in order to jumpstart your VLM's ability to accurately describe the differences in real data.
The other issue with this proposal is that it currently happens entirely via in-context learning. This is inefficient and expensive (100 images is a lot for one model at once!). Ideally, the VLM would learn the difference between the classes by actually being trained on images from those classes, and learn to connect the resulting knowledge to language descriptions o
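A sketch of just the synthetic-augmentation step described above (rectangle size, color, and label wording are illustrative choices, not specified in the take):

import random
from PIL import Image, ImageDraw

def add_tiny_rectangle(img, size=6):
    # Insert one small red rectangle at a random position.
    img = img.convert("RGB")
    draw = ImageDraw.Draw(img)
    x = random.randint(0, img.width - size)
    y = random.randint(0, img.height - size)
    draw.rectangle([x, y, x + size, y + size], fill=(255, 0, 0))
    return img

def make_pair_dataset(images):
    # Split a pool of images into two classes, perturb class 1, and return
    # the ground-truth description the VLM should learn to produce.
    random.shuffle(images)
    half = len(images) // 2
    class0 = images[:half]
    class1 = [add_tiny_rectangle(im) for im in images[half:]]
    label = "class 1 has tiny rectangles in the images"
    return class0, class1, label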
2Charlie Steiner7d
Humans using SAEs to improve linear probes / activation steering vectors might quickly get replaced by a version of probing / steering that leverages unlabeled data. Like, probing is finding a vector along which labeled data varies, and SAEs are finding vectors that are a sparse basis for unlabeled data. You can totally do both at once - find a vector along which labeled data varies and is part of a sparse basis for unlabeled data. This is a little bit related to an idea with the handle "concepts live in ontologies." If I say I'm going to the gym, this concept of "going to the gym" lives in an ontology where people and activities are basic components - it's probably also easy to use ideas like "You're eating dinner" in that ontology, but not "1,3-diisocyanatomethylbenzene." When you try to express one idea, you're also picking a "basis" for expressing similar ideas.
1Suzie. EXE8d
The Save State Paradox: a new question for the construct of reality in a simulated world.
Consider this thought experiment: in a simulated world (if we do indeed currently live in one), how could we detect an event similar to a state "reset"? Such events could be triggered for existential safety reasons, or for reasons unbeknownst to us. If this were the case, how would we become aware of such occurrences if we were reverted to a time before the execution, affecting memories, physical states and environmental continuity? Imagine if seemingly inexplicable concepts like Déjà Vu and the Mandela Effect could be explained away with such a theory.
Let's use an equation (aided by my custom GPT, Strawberry) to illustrate this:
D = R × (1−A) × E × P × S
Where:
D = the detection of the save state
R = the rate of hypothesised resets occurring in the simulation
A = the probability of the ability of the simulation to carry this out effectively (memory alteration or time travel)
E = external evidence that remains post-reset, such as anomalies and unexplained phenomena
P = the probability that observers could spot/measure these anomalies
S = the stability of the simulation's parameters, such as space/time metrics and physical laws
I'd be interested in hearing your thoughts and how this could be fleshed out further! How might we apply this thought experiment or model to examine the nature of our reality? What other tools could be used to uncover evidence of a reset?

Saturday, April 20th 2024

Quick Takes
11Elizabeth8d
A very rough draft of a plan to test prophylactics for airborne illnesses.
Start with a potential superspreader event. My ideal is a large conference, many of whose attendees travelled to get there, in enclosed spaces with poor ventilation and air purification, in winter. Ideally >=4 days, so that people infected on day one are infectious while the conference is still running.
Call for sign-ups for testing ahead of time (disclosing all possible substances and side effects). Split volunteers into control and test groups. I think you need ~500 sign-ups in the winter to make this work.
Splitting controls is probably the hardest part. You'd like the control and treatment groups to be identical, but there are a lot of things that affect susceptibility. Age, local vs. air travel, small children vs. not, sleep habits... it's hard to draw the line.
Make it logistically trivial to use the treatment. If it's lozenges or liquids, put individually packed dosages in every bathroom, with a sign reminding people to use them (color code to direct people to the right basket). If it's a nasal spray you will need to give everyone their own bottle, but make it trivial to get more if someone loses theirs.
Follow up a week later, asking if people have gotten sick and when. If the natural disease load is high enough, this should give better data than any paper I've found.
Top contenders for this plan:
* zinc lozenge
* salt water gargle
* enovid
* betadine gargle
* zinc gargle
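A rough power calculation for the ~500 sign-ups figure; the attack rates below are assumptions plugged in for illustration, not numbers from the plan:

from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

# Assumed: 20% of controls get sick over the follow-up window, and a useful
# prophylactic cuts that to 10%.
p_control, p_treatment = 0.20, 0.10
effect = proportion_effectsize(p_control, p_treatment)
n_per_group = NormalIndPower().solve_power(effect_size=effect, alpha=0.05,
                                           power=0.8, ratio=1.0)
print(f"~{n_per_group:.0f} per arm, ~{2 * n_per_group:.0f} volunteers total")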
10Quinn9d
Thinking about a top-level post on FOMO and research taste
* Fear of missing out defined as inability to execute on a project cuz there's a cooler project if you pivot
* but it also gestures at more of a strict negative, where you think your project sucks before you finish it, so you never execute
* was discussing this with a friend: "yeah I mean lesswrong is pretty egregious cuz it sorta promotes this idea of research taste as the ability to tear things down, which can be done armchair"
* I've developed strategies to beat this FOMO and gain more depth and detail with projects (too recent to see returns yet, but getting there) but I also suspect it was nutritious of me to develop discernment about what projects are valuable or not valuable for various threat models and theories of change (in such a way that being a phd student off of lesswrong wouldn't have been as good in crucial ways, tho way better in other ways).
* but I think the point is you have to turn off this discernment sometimes, unless you want to specialize in telling people why their plans won't work, which I'm more dubious on the value of than I used to be
Idk maybe this shortform is most of the value of the top level post
8niplav8d
Consider proposing the most naïve formula for logical correlation[1].
Let a program p be a tuple of code for a Turing machine, intermediate tape states after each command execution, and output, all in binary. That is, $p=(c,t,o)$, with $c \in \{0,1\}^+$, $t \in (\{0,1\}^+)^+$ and $o \in \{0,1\}^+$. Let $l=|t|$ be the number of steps that p takes to halt.
Then a formula for the logical correlation 合[2] of two halting programs $p_1=(c_1,t_1,o_1)$, $p_2=(c_2,t_2,o_2)$, a tape-state discount factor $\gamma$[3], and a string-distance metric $d: \{0,1\}^+ \times \{0,1\}^+ \to \mathbb{N}$ could be
$$合(p_1,p_2,\gamma) = d(o_1,o_2) - \frac{1}{2} + \sum_{k=0}^{\min(l_1,l_2)} \gamma^k \cdot d(t_1(l_1-k), t_2(l_2-k))$$
The lower 合, the higher the logical correlation between $p_1$ and $p_2$. The minimal value is −0.5. If $d(o_1,o_2) < d(o_1,o_3)$, then it's also the case that $合(p_1,p_2,\gamma) < 合(p_1,p_3,\gamma)$.
One might also want to be able to deal with the fact that programs have different trace lengths, and penalize that, e.g. amending the formula:
$$合'(p_1,p_2,\gamma) = 合(p_1,p_2,\gamma) + 2^{|l_1-l_2|}$$
I'm a bit unhappy that the code doesn't factor into the logical correlation, and ideally one would want to be able to compute the logical correlation without having to run the program. How does this relate to data=code?
1. Actually not explained in detail anywhere, as far as I can tell. I'm going to leave out all motivation here.
2. Suggested by GPT-4. Stands for joining, combining, uniting. Also "to suit; to fit", "to have sexual intercourse", "to fight, to have a confrontation with", or "to be equivalent to, to add up".
3. Which is needed because tape states close to the output are more important than tape states early on.
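A minimal sketch of the formula in Python, with my own choices for the pieces left open above (the distance metric, how 1-indexed tape states map to Python indices, and taking the sum over the min(l1, l2) overlapping final steps):

def d(a, b):
    # Hamming-style distance on binary strings, left-padded to equal length.
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    return sum(x != y for x, y in zip(a, b))

def logical_correlation(tapes1, out1, tapes2, out2, gamma=0.5):
    # tapes1/tapes2: intermediate tape states as binary strings; out1/out2: outputs.
    l1, l2 = len(tapes1), len(tapes2)
    # Walk backwards from the halting step, discounting earlier tape states.
    tail = sum(gamma ** k * d(tapes1[l1 - 1 - k], tapes2[l2 - 1 - k])
               for k in range(min(l1, l2)))
    return d(out1, out2) - 0.5 + tail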
3Morpheus8d
Can anyone here recommend particular tools to practice grammar? Or does anyone have strong opinions on the best workflow/tool to correct grammar on the fly? I already know Grammarly and LanguageTool, but Grammarly seems steep at $30 per month when I don't know if it is any good. I have tried GPT-4 before, but the main problems I have there are that it is too slow and that it changes my sentences more than I would like (I tried to make it do that less through prompting, which did not help that much). I notice that feeling unconfident about my grammar/punctuation leads me to write less online; applying for jobs or fellowships especially feels more icky because of it. That seems like an avoidable failure mode. Ideally, I would like something like the German Orthografietrainer (it was created to teach middle and high school children spelling and grammar). It teaches you on a sentence-by-sentence basis where to put the commas and why, by explaining the sentence structure (illustrated through additional examples). Because it trains you with particularly tricky sentences, the training is effective, and I rapidly got better at punctuation than my parents within ~3 hours. Is there a similar tool for English that I have never heard of? While writing this, I noticed that I did not have the free version of Grammarly enabled anymore, and tried the free version while writing this. One trick I noticed is that it lists what kinds of errors you are making across the whole text. So it is easy to infer what particular mistake I made in which spot, and then I correct it myself. Also, Grammarly did not catch a few simple spelling and punctuation mistakes that Grammarly caught (like "anymore" or the comma at the start of this sentence). At the end, I also tried ProWritingAid, which found additional issues.
2Neil 8d
Can we have a black banner for the FHI? Not a person, still seems appropriate imo.

Friday, April 19th 2024

Quick Takes
5Joel Burget10d
From the latest Conversations with Tyler interview with Peter Thiel: I feel like Thiel misrepresents Bostrom here. He doesn't really want a centralized world government or think that's "a set of things that make sense and that are good". He's forced into world surveillance not because it's good but because it's the only alternative he sees to dangerous ASI being deployed. I wouldn't say he's optimistic about human nature. In fact it's almost the very opposite: he thinks that we're doomed by our nature to create that which will destroy us.
4Johannes C. Mayer9d
Today I learned that being successful can involve feelings of hopelessness. When you are trying to solve a hard problem, where you have no idea if you can solve it, let alone whether it is even solvable at all, your brain makes you feel bad. It makes you feel like giving up. This is quite strange, because most of the time when I am in such a situation and manage to make a real effort anyway, I seem to surprise myself with how much progress I manage to make. Empirically, this feeling of hopelessness does not seem to track the actual likelihood that you will completely fail.
