I think the term "market failure" describes an interesting phenomenon, and there should be some term for situations where negative externalities are being generated, a social good is underproduced, and so on. At the same time, it is easy to see how "market failure" gives laypeople unintended additional connotations.

Specifically, I agree that this phenomenon generalizes beyond what most people think of as "markets" (i.e., private firms doing business). I can see how this would bias most people's hasty analysis away from potential free-market solutions and toward the status quo or cognitively simple solutions ("We just ought to pass a law! Let's form a new agency to enforce stricter regulations!") without also taking the time to weigh the costs of those government interventions.

In some spaces, there are private self-regulatory organizations, consumer watchdogs, civil liability, and licensing firms that can align firms closer to socially optimal outcomes while having a greater incentive than the government to pay attention to the costs of those "regulations." But otherwise there is not really a market for law and regulation itself within the borders of any one country.

In short, I fear many people perceive the words "market failure" as a local condemnation of capitalism and free markets, when perhaps the better solution to these "market failures" is creating more markets, in the form of a more responsive and accountable ecosystem of firms performing the currently monopolistic regulatory function of government.

I'm a person who is unusually eager to bite bullets in ethical thought experiments. Evolved vs. created moral patients is a new framework for me, and I'm trying to think about how much bullet I'd be willing to bite when it comes to privileging evolution, especially if the future could include a really large number of created entities exhibiting agentic behavior relative to evolved ones.

I can imagine a spectrum of methods of creation that resemble evolution to various degrees. A domesticated dog seems more "created," and thus "purposed," by evolved humans than a wolf, which cannot claim a creator in the same way, but the two seem morally equal to me, at least in this respect.

Similarly, if a person with desirable traits is chosen, or chooses, to be cloned, then the clones still seem to me to have the same moral weight as a normal human offspring, even though they are in some sense more purposed, or artificially selected for, than a typical child.

Of course, any ethical desideratum is going to have messy examples and edge cases, but I feel like I'm going to have a hard time applying this ethical framework when thinking about the future where the lines between created and evolved blur and where consequences are scaled up.

I look forward to reading the other entries in the sequence and will be sure to update this comment if I find I've profoundly missed the point.

I have seen several proposals for solving alignment (such as OpenAI's Superalignment initiative) that involve harnessing incremental or "near-human-level" AI to advance alignment research. I recall from recent interviews that Eliezer is skeptical of this approach, at least partially on the grounds that an AI sufficiently advanced to contribute meaningfully to alignment work would (1) already be dangerous in its own right, and (2) be capable of deceiving human alignment researchers with false or insufficient proposals.

Would it be possible to resolve the second problem by holding the AI directly accountable not to human researchers or an overseer AI, as is commonly suggested, but instead to a supermajority view of multiple AIs?

I am imagining several similarly capable AIs, ideally based on fundamentally different models but at least incorporating some randomization, and prevented from directly communicating with each other. They would then be asked, "Would the other AIs believe proposal X is a sound* alignment solution/research direction?" Answers would probably have to be limited to yes/no or multiple choice. The AIs would be penalized for covertly coordinating with the other AIs to arrive at the same answer, up to and including termination. Thereby, an otherwise deceitful AI would reason that its best chance of not being terminated early is to provide non-deceptive answers.
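As a toy sketch, the supermajority check could look something like the following (the function name, the quorum threshold, and the hard-coded verdict lists are all hypothetical placeholders; real inputs would come from the isolated models described above):

```python
# Toy sketch of the cross-checking scheme: each isolated model returns a
# yes/no verdict on a proposal, and low agreement across models is treated
# as a deception/uncertainty signal rather than an endorsement.
from collections import Counter

def consistency_check(verdicts, quorum=0.8):
    """Return (accepted, agreement) for one proposal.

    verdicts: list of "yes"/"no" answers, one per isolated model.
    A proposal is accepted only if a supermajority answers "yes".
    """
    counts = Counter(v.lower() for v in verdicts)
    top_answer, top_count = counts.most_common(1)[0]
    agreement = top_count / len(verdicts)
    accepted = top_answer == "yes" and agreement >= quorum
    return accepted, agreement

# Example: five isolated models evaluate two proposals.
print(consistency_check(["yes", "yes", "yes", "yes", "no"]))  # (True, 0.8)
print(consistency_check(["yes", "no", "yes", "no", "yes"]))   # (False, 0.6)
```

The second proposal is rejected not because the majority said "no," but because agreement fell below the quorum, which is the signal the scheme cares about.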

Of course, this doesn't resolve the risk of overshooting the "near-human intelligence" level (which might plausibly happen to all or most of the AIs in the pool), of one or more AIs discovering a way to circumvent the air gap and communicate with the others, or of the AIs finding some Schelling point stronger than truthful answers. I'm merely hoping to mitigate the deception EY foresees alignment researchers falling prey to, while leveraging incremental improvements in AI.

 Outside view tells me this idea will be ripped to shreds. What am I missing? 

*I acknowledge the phrasing of this question side-steps the open problem of defining exactly what we want from a successful alignment program. I'm not trying to solve alignment so much as suggest an approach for combating deception specifically.

Maybe somewhat unrelated, but does anyone know if there's been an effort to narrate HP:MoR using AI? I have several friends that I think could really stand to enjoy it, but who can't get past the current audiobook narration. I mostly agree with them, although it's better on 1.5x.

Sorry for the late reply; I haven't commented much on LW, and I didn't appreciate the time it would take for someone to reply to me, so I missed this until now. If I reply to you, Ape in the coat, does that notify dr_s too?

If I understand dr_s's quotation, I believe he's responding to the post I referenced. How Many Lives Does X-Risk Work Save from Non-Existence includes pretty early on:

Whenever I say "lives saved" this is shorthand for “future lives saved from nonexistence.” This is not the same as saving existing lives, which may cause profound emotional pain for people left behind, and some may consider more tragic than future people never being born.[6]

I assume a zero-discount rate for the value of future lives, meaning I assume the value of a life is not dependent on when that life occurs.

It seems pretty obvious to me that in almost any plausible scenario, the lifespan of a distant-future entity with moral weight will be very different from what we currently think of as a natural lifespan (rounded to 100 years in the post I linked), but making estimates in terms of "lives saved from nonexistence," where one life = 100 years, is useful for making comparisons to other causes like "lives saved per $1,000 via malaria bed nets." It also seems appropriate for the post not to assume a discount rate and to leave that for the reader to apply on top of the estimates presented.

I prefer something like "observer-moments that might not have occurred" to "lives saved." I don't have strong preferences between a relatively small number of entities having long lives and more numerous entities having shorter lives, so long as the quality of life per moment is held constant.

As for dr_s's "How bad can a life be before the savings actually counts as damning," this seems easily resolvable to me by just allowing "people" of the far future the right to commit suicide, perhaps after a short waiting period. This would put a floor on the suffering they experience if they can't otherwise be guaranteed to have great lives.

Thank you for a very thorough post. Your writing has served me as a more organized account of some of my own impressions opposing longtermism.

I agree with CrimsonChin that there's a lot in your post many longtermists would agree with, including the practicality of focusing on short-term sub-goals. I also personally believe that initiatives like global health and poverty reduction probably improve the prospects of the far future, even if their expected value seems lower than that of X-risk mitigation.

Nonetheless, I still think we should be motivated by the immensity of the future even if it is offset by tiny probabilities and huge margins of error, because the lower bounds of these estimates appear to me sufficiently high to be very compelling. The post How Many Lives Does X-Risk Work Save From Nonexistence On Average demonstrates my thinking on this: its estimates of future lives vary by dozens of orders of magnitude(!), yet it still arrives at very high expected values for X-risk work even on the lower bounds.
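To make the lower-bound arithmetic concrete (the numbers below are hypothetical placeholders I chose for illustration, not figures from that post):

```python
# Illustrative expected-value arithmetic with deliberately pessimistic,
# hypothetical inputs: even then the product stays enormous.
future_lives_lower_bound = 1e16  # assumed lower bound on far-future lives at stake
p_avert_xrisk = 1e-9             # assumed probability the work averts extinction

expected_lives_saved = future_lives_lower_bound * p_avert_xrisk
print(f"{expected_lives_saved:.0e}")  # prints "1e+07", i.e. ten million lives in expectation
```

Even after shrinking either input by several more orders of magnitude, the product remains large, which is the point the lower-bound estimates are making.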

Even I don't really feel anything when I read such massive numbers, and I acknowledge how large the intervals of these estimates are, but I wouldn't say they "make no sense to me," or that "to the extent we can quantify existential risks in the far future, we can only say something like 'extremely likely,' 'possible,' or 'can't be ruled out.'"

For what it's worth, I used to be essentially an egoist, unmoved by all of the charities I had ever encountered. It seemed to me that humanity was on a good trajectory and my personal impact would be negligible. It was only after I started thinking about really large numbers, like the duration of the universe, the age of humanity, and the number of potential minds in the universe (credit to SFIA), how neglected these figures were, and moral uncertainty, that I started to feel I could and should act for others.

There are definitely many, possibly most, contexts where incredibly large or small numbers can safely be disregarded. I wouldn't be moved by them in adversarial situations, like Pascal's Mugging, or in my day-to-day moral decision making. But for questions like "What should I really care deeply about?" I think they should be considered.

As for Pascal's Wager, it calls for picking a very specific god to worship out of a space of infinitely many contradictory gods, and this infinitesimally small probability of success cancels out the infinite reward of heaven over hell or non-existence. Dissimilarly, longtermism isn't committed to any specific action regarding the far future, just the well-being of future entities generally. I expect most longtermists would gladly pivot away from a specific cause area (like AI alignment) if they were shown some other cause (e.g., a planet-killing asteroid certain to collide with Earth in 100 years) was more likely to similarly adversely impact the far future.

Thank you for making these threads. I have been reading LW off and on for several years and this will be my first post.

My question: Is purposely leaning into creating a human wire-header an easier alignment target to hit than the more commonly touted goal of creating an aligned superintelligence that prevents the emergence of other potentially dangerous superintelligence, yet somehow reliably leaves humanity mostly in the driver's seat?

If the current forecast on aligning superintelligent AI is so dire, is there a point where it would make sense to just settle for ceding control and steering toward a superintelligence very likely to wirehead humans (or post-humans)? I'm imagining the AI being tasked with tiling the universe with as many conscious entities as possible, each experiencing as much pleasure as possible, perhaps with a bias toward maximizing pleasure over the number of conscious entities, since those goals constrain each other. I don't want to handwave this as easy; I'm just curious whether there's been much thought about removing the constraint of "don't wirehead humans."

Background on me: One of my main departures from what I've come across so far is that I don't share as much concern about hedonism or wireheading. It seems to me a superintelligence would grasp that humans in their current form require things like novelty, purpose, and belonging, and wouldn't just naively pump humans, as they currently are, full of drugs and call it a day. I don't see why these "higher values" couldn't be simulated or stimulated too. If I learned my current life was a simulation, but one being managed by an incredibly successful AI that could reliably keep the simulation running unperturbed, I would not want to exit my current life to seize more control in the "real world."

Honestly, if an AI could replace humans with verifiably conscious simple entities engineered to experience greater satisfaction than current humans, without ever feeling boredom, I'm hard-pressed to think of anything more deserving of the universe's resources.

The main concern I have so far is that the goal of "maximizing pleasure" could be very close in idea-space to "maximizing suffering," but it's still really hard for me to see a superintelligence capable of becoming a singleton making such an error, or why it would deliberately switch to maximizing suffering.