If it’s worth saying, but not worth its own post, here's a place to put it. (You can also make a shortform post)

And, if you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are welcome.

If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the new Concepts section.

The Open Thread tag is here.

New Comment
71 comments, sorted by Click to highlight new comments since: Today at 9:58 AM
Some comments are truncated due to high volume. (⌘F to expand all)Change truncation settings

Covid19Projections has been one of the most successful coronavirus models in large part because it is as 'model-free' and simple as possible, using ML to backtrack parameters for a simple SEIR model from death data only. This has proved useful because case numbers are skewed by varying numbers of tests, so deaths are more consistently reliable as a metric. You can see the code here.

However, in countries doing a lot of testing, with a reasonable number of cases but with very few deaths, like most of Europe, the model is not that informative, and essentially predicts near 0 deaths out to the limit of its measure. This is expected - the model is optimised for the US.

Estimating SEIR parameters based on deaths works well when you have a lot of deaths to count, if you don't then you need another method. Estimating purely based on cases has its own pitfalls - see this from epidemic forecasting, which mistook an increase in testing in the UK mid-july for a sharp jump in cases and wrongly inferred brief jump in R_t. As far as I understand their paper, the estimate of R_t from case data adjusts for delays in infection to onset and for other things, but not for the positivity r... (read more),,,

Greetings all, and thanks for having me! :) I'm an AI enthusiast, based in Hamilton NZ. Where until recently I was enrolled in and studying strategic management and computer science. Specifically, 'AI technical strategy'. After corona virus and everything that's been happening in the world, I've moved away from formal studies and are now focusing on using my skills etc, in a more interactive and 'messy' way. Which means more time online with groups like LessWrong. :) I've been interested in rationality and the art of dialogue since early 2000's. I've been involved in startups and AI projects, from a commercial perspective for a while. Specifically in the agri-tech space. I would like to understand and grow appreciation more, for forums like this, where the technology essentially enables better and more productive human interaction.

4Daniel Kokotajlo4y
Welcome!

Is it plausible that an AGI could have some sort of exploit (buffer overflow maybe?) that could be exploited (maybe by an optimization daemon…?) and cause a sign flip in the utility function?

How about an error during self-improvement that leads to the same sort of outcome? Should we expect an AGI to sanity-check its successors, even if it’s only at or below human intelligence?

Sorry for the dumb questions, I’m just still nervous about this sort of thing.

It freaks me out that we have Loss Functions and also Utility Functions and their type signature is exactly the same, but if you put one in a place where the other was expected, it causes literally the worst possible thing to happen that ever could happen. I am not comfortable with this at all.

8gwern4y
It is definitely awkward when that happens. Reward functions are hard.
4Anirandis4y
Do you think that this type of thing could plausibly occur *after* training and deployment?
4gwern4y
Yes. For example: lots of applications use online learning. A programmer flips the meaning of a boolean flag in a database somewhere while not updating all downstream callers, and suddenly an online learner is now actively pessimizing their target metric.
3Anirandis4y
Do you think that this specific risk could be mitigated by some variant of Eliezer’s separation from hyperexistential risk or Stuart Armstrong's idea here: Or at least prevent sign flip errors from causing something worse than paperclipping?
2Anirandis4y
Interesting. Terrifying, but interesting. Forgive me for my stupidity (I'm not exactly an expert in machine learning), but it seems to me that building an AGI linked to some sort of database like that in such a fashion (that some random guy's screw-up can effectively reverse the utility function completely) is a REALLY stupid idea. Would there not be a safer way of doing things?
2Anirandis4y
If we actually built an AGI that optimised to maximise a loss function, wouldn't we notice long before deploying the thing? I'd imagine that this type of thing would be sanity-checked and tested intensively, so signflip-type errors would predominantly be scenarios where the error occurs *after* deployment, like the one Gwern mentioned ("A programmer flips the meaning of a boolean flag in a database somewhere while not updating all downstream callers, and suddenly an online learner is now actively pessimizing their target metric.")
7gwern4y
Even if you disclaim configuration errors or updates (despite this accounting for most of a system's operating lifespan, and human/configuration errors accounting for a large fraction of all major errors at cloud providers etc according to postmortems), an error may still happen too fast to notice. Recall that in the preference learning case, the bug manifested after Christiano et al went to sleep, and they woke up to the maximally-NSFW AI. AlphaZero trained in ~2 hours wallclock, IIRC. Someone working on an even larger cluster commits a change and takes a quick bathroom break...
4Anirandis4y
Wouldn't any configuration errors or updates be caught with sanity-checking tools though? Maybe the way I'm visualising this is just too simplistic, but any developers capable of creating an *aligned* AGI are going to be *extremely* careful not to fuck up. Sure, it's possible, but the most plausible cause of a hyperexistential catastrophe to me seems to be where a SignFlip-type error occurs once the system has been deployed. Hopefully a system as crucially important as an AGI isn't going to have just one guy watching it who "takes a quick bathroom break". When the difference is literally Heaven and Hell (minimising human values), I'd consider only having one guy in a basement monitoring it to be gross negligence.

Many entities have sanity-checking tools. They fail. Many have careful developers. They fail. Many have automated tests. They fail. And so on. Disasters happen because all of those will fail to work every time and therefore all will fail some time. If any of that sounds improbable, as if there would have to be a veritable malevolent demon arranging to make every single safeguard fail or backfire (literally, sometimes, like the recent warehouse explosion - triggered by welders trying to safeguard it!), you should probably read more about complex systems and their failures to understand how normal it all is.

4Anirandis4y
Sure, but the *specific* type of error I'm imagining would surely be easier to pick up than most other errors. I have no idea what sort of sanity checking was done with GPT-2, but the fact that the developers were asleep when it trained is telling: they weren't being as careful as they could've been. For this type of bug (a sign error in the utility function) to occur *before* the system is deployed and somehow persist, it'd have to make it past all sanity-checking tools (which I imagine would be used extensively with an AGI) *and* somehow not be noticed at all while the model trains *and* whatever else. Yes, these sort of conjunctions occur in the real world but the error is generally more subtle than "system does the complete opposite of what it was meant to do". I made a question post about this specific type of bug occurring before deployment a while ago and think my views have shifted significantly; it's unlikely that a bug as obvious as one that flips the sign of the utility function won't be noticed before deployment. Now I'm more worried about something like this happening *after* the system has been deployed. I think a more robust solution to all of these sort of errors would be something like the separation from hyperexistential risk article that I linked in my previous response. I optimistically hope that we're able to come up with a utility function that doesn't do anything worse than death when minimised, just in case.
4habryka4y
At least with current technologies, I expect serious risks to start occuring during training, not deployment. That's ultimately when you will the greatest learning happening, when you have the greatest access to compute, and when you will first cross the threshold of intelligence that will make the system actually dangerous. So I don't think that just checking things after they are trained is safe.
4Anirandis4y
I'm under the impression that an AGI would be monitored *during* training as well. So you'd effectively need the system to turn "evil" (utility function flipped) during the training process, and the system to be smart enough to conceal that the error occurred. So it'd need to happen a fair bit into the training process. I guess that's possible, but IDK how likely it'd be.
4habryka4y
Yeah, I do think it's likely that AGI would be monitored during training, but the specific instance of Open AI staff being asleep while we train the AI is a clear instance of us not monitoring the AI during the most crucial periods (which, to be clear, I think is fine since I think the risks were indeed quite low, and I don't see this as providing super much evidence about Open AI's future practices)
2ChristianKl4y
Given that compute is very expensive, economic pressures will push training to be 24/7, so it's unlikely that people generally pause the training when going to sleep.
2Anirandis4y
Sure, but I'd expect that a system as important as this would have people monitoring it 24/7.
1mako yass4y
Maybe the project will come up with some mechanism that detects that. But if they fall back to the naive "just watch what it does in the test environment and assume it'll do the same in production," then there is a risk it's going to figure out it's in a test environment, and that its judges would not react well to finding out what is wrong with its utility function, and then it will act aligned in the testing environment. If we ever see a news headline saying "Good News, AGI seems to 'self-align' regardless of the sign of the utility function!" that will be some very bad news.
3Anirandis4y
I asked Rohin Shah about that possibility in a question thread about a month ago. I think he's probably right that this type of thing would only plausibly make it through the training process if the system's *already* smart enough to be able to think about this type of thing. And then on top of that there are still things like sanity checks which, while unlikely to pick up numerous errors, would probably notice a sign error. See also this comment: IMO it's incredibly important that we find a way to prevent this type of thing from occurring *after* the system has been trained, whether that be hyperexistential separation or something else. I think that a team that's safety-conscious enough to come up with a (reasonably) aligned AGI design is going to put a considerable amount of effort into fixing bugs & one as obvious as a sign error would be unlikely to make it through. And hopefully - even better, they would have come up with a utility function that can't be easily reversed by a single bit flip or doesn't cause outcomes worse than death when minimised. That'd (hopefully?) solve the SignFlip issue *regardless* of what causes it.
4Vanessa Kosoy4y
There is a discussion of this kind of issues in arbital.
2Anirandis4y
I've seen that post & discussed it on my shortform. I'm not really sure how effective something like Eliezer's idea of "surrogate" goals there would actually be - sure, it'd help with some sign flip errors but it seems like it'd fail on others (e.g. if U = V + W, a sign error could occur in V instead of U, in which case that idea might not work.) I'm also unsure as to whether the probability is truly "very tiny" as Eliezer describes it. Human errors seem much more worrying than cosmic rays.
3Dach4y
If you're having significant anxiety from imagining some horrific I-have-no-mouth-and-I-must-scream scenario, I recommend that you multiply that dread by a very, very small number, so as to incorporate the low probability of such a scenario. You're privileging this supposedly very low probability specific outcome over the rather horrifically wide selection of ways AGI could be a cosmic disaster. This is, of course, not intended to dismay you from pursuing solutions to such a disaster.
6Anirandis4y
I don't really know what the probability is. It seems somewhat low, but I'm not confident that it's *that* low. I wrote a shortform about it last night (tl;dr it seems like this type of error could occur in a disjunction of ways and we need a good way of separating the AI in design space.) I think I'd stop worrying about it if I were convinced that its probability is extremely low. But I'm not yet convinced of that. Something like the example Gwern provided elsewhere in this thread seems more worrying than the more frequently discussed cosmic ray scenarios to me.
5Dach4y
You can't really be accidentally slightly wrong. We're not going to develop Mostly Friendly AI, which is Friendly AI but with the slight caveat that it has a slightly higher value on the welfare of shrimp than desired, with no other negative consequences. The molecular sorts of precision needed to get anywhere near the zone of loosely trying to maximize or minimize for anything resembling human values will probably only follow from a method that is converging towards the exact spot we want it to be at, such as some clever flawless version of reward modelling. In the same way, we're probably not going to accidentally land in hyperexistential disaster territory. We could have some sign flipped, our checksum changed, and all our other error-correcting methods (Any future seed AI should at least be using ECC memory, drives in RAID, etc.) defeated by religious terrorists, cosmic rays, unscrupulous programmers, quantum fluctuations, etc. However, the vast majority of these mistakes would probably buff out or result in paper-clipping. If an FAI has slightly too high of a value assigned to the welfare of shrimp, it will realize this in the process of reward modelling and correct the issue. If its operation does not involve the continual adaptation of the model that is supposed to represent human values, it's not using a method which has any chance of converging to Overwhelming Victory or even adjacent spaces for any reason other than sheer coincidence. A method such as this has, barring stuff which I need to think more about (stability under self-modification), no chance of ending up in a "We perfectly recreated human values... But placed an unreasonably high value on eating bread! Now all the humans will be force-fed bread until the stars burn out! Mwhahahahaha!" sorts of scenarios. If the system cares about humans being alive enough to not reconfigure their matter into something else, we're probably using a method which is innately insulated from most types of hyperexis
4Anirandis4y
Thanks for the detailed response. A bit of nitpicking (from someone who doesn't really know what they're talking about): I'm slightly confused by this one. If we were to design the AI as a strict positive utilitarian (or something similar), I could see how the worst possible thing to happen to it would be *no* human utility (i.e. paperclips). But most attempts at an aligned AI would have a minimum at "I have no mouth, and I must scream". So any sign-flipping error would be expected to land there. In the example, the AGI was using online machine learning, which, as I understand it, would probably require the system to be hooked up to a database that humans have access to in order for it to learn properly. And I'm unsure as to how easy it'd be for things like checksums to pick up an issue like this (a boolean flag getting flipped) in a database. Perhaps there'll be a reward function/model intentionally designed to disvalue some arbitrary "surrogate" thing in an attempt to separate it from hyperexistential risk. So "pessimizing the target metric" would look more like paperclipping than torture. But I'm unsure as to (1) whether the AGI's developers would actually bother to implement it, and (2) whether it'd actually work in this sort of scenario. Also worth noting is that an AGI based on reward modelling is going to have to be linked to another neural network, which is going to have constant input from humans. If that reward model isn't designed to be separated in design space from AM, someone could screw up with the model somehow. If we were to, say, have U = V + W (where V is the reward given by the reward model and W is some arbitrary thing that the AGI disvalues, as is the case in Eliezer's Arbital post that I linked,) a sign flip-type error in V (rather than a sign flip in U) would lead to a hyperexistential catastrophe. I think this is somewhat likely to be the case, but I'm not sure that I'm confident enough about it. Flipping the direction of updates to the
4Dach4y
It's hard to talk in specifics because my knowledge on the details of what future AGI architecture might look like is, of course, extremely limited. As an almost entirely inapplicable analogy (which nonetheless still conveys my thinking here): consider the sorting algorithm for the comments on this post. If we flipped the "top-scoring" sorting algorithm to sort in the wrong direction, we would see the worst-rated posts on top, which would correspond to a hyperexistential disaster. However, if we instead flipped the effect that an upvote had on the score of a comment to negative values, it would sort comments which had no votes other than the default vote assigned on posting the comment to the top. This corresponds to paperclipping- it's not minimizing the intended function, it's just doing something weird. If we inverted the utility function, this would (unless we take specific measures to combat it like you're mentioning) lead to hyperexistential disaster. However, if we invert some constant which is meant to initially provide value for exploring new strategies while the AI is not yet intelligent enough to properly explore new strategies as an instrumental goal, the AI would effectively brick itself. It would place negative value on exploring new strategies, presumably including strategies which involve fixing this issue so it can acquire more utility and strategies which involve preventing the humans from turning it off. If we had some code which is intended to make the AI not turn off the evolution of the reward model before the AI values not turning off the reward model for other reasons (e.g. the reward model begins to properly model how humans don't want the AI to turn the reward model evolution process off), and some crucial sign was flipped which made it do the opposite, the AI would freeze the process of the reward model being updated and then maximize whatever inane nonsense its model currently represented, and it would eventually run into some bizarre p
4Anirandis4y
Interesting analogy. I can see what you're saying, and I guess it depends on what specifically gets flipped. I'm unsure about the second example; something like exploring new strategies doesn't seem like something an AGI would terminally value. It's instrumental to optimising the reward function/model, but I can't see it getting flipped *with* the reward function/model. My thinking was that a signflipped AGI designed as a positive utilitarian (i.e. with a minimum at 0 human utility) would prefer paperclipping to torture because the former provides 0 human utility (as there aren't any humans), whereas the latter may produce a negligible amount. I'm not really sure if it makes sense tbh. Even if we engineered it carefully, that doesn't rule out screw-ups. We need robust failsafe measures *just in case*, imo. I wonder if you could feasibly make it a part of the reward model. Perhaps you could train the reward model itself to disvalue something arbitrary (like paperclips) even more than torture, which would hopefully mitigate it. You'd still need to balance it in a way such that the system won't spend all of its resources preventing this thing from happening at the neglect of actual human values, but that doesn't seem too difficult. Although, once again, we can't really have high confidence (>90%) that the AGI developers are going to think to implement something like this. There was also an interesting idea I found in a Facebook post about this type of thing that got linked somewhere (can't remember where). Stuart Armstrong suggested that a utility function could be designed as such: Even if we solve any issues with these (and actually bother to implement them), there's still the risk of an error like this happening in a localised part of the reward function such that *only* the part specifying something bad gets flipped, although I'm a little confused about this one. It could very well be the case that the system's complex enough that there isn't just one bit indi
3Dach4y
Sorry, I meant instrumentally value. Typo. Modern machine learning systems often require a specific incentive in order to explore new strategies and escape local maximums. We may see this behavior in future attempts at AGI, And no, it would not be flipped with the reward function/model- I'm highlighting that there is a really large variety of sign flip mistakes and most of them probably result in paperclipping. Paperclipping seems to be negative utility, not approximately 0 utility. It involves all the humans being killed and our beautiful universe being ruined. I guess if there are no humans, there's no utility in some sense, but human values don't actually seem to work that way. I rate universes where humans never existed at all and I'm... not sure what 0 utility would look like. It's within the range of experiences that people experience on modern-day earth- somewhere between my current experience and being tortured. This is just definition problems, though- We could shift the scale such that paperclipping is zero utility, but in that case, we could also just make an AGI that has a minimum at paperclipping levels of utility. In the context of AI safety, I think "robust failsafe measures just in case" is part of "careful engineering". So, we agree! I read Eliezer's idea, and that strategy seems to be... dangerous. I think that "Giving an AGI a utility function which includes features which are not really relevant to human values" is something we want to avoid unless we absolutely need to. I have much more to say on this topic and about the rest of your comment, but it's definitely too much for a comment chain. I'll make an actual post on this containing my thoughts sometime in the next week or two, and link it to you.
3Anirandis4y
My thinking was that an AI system that *only* takes values between 0 and + ∞ (or some arbitrary positive number) would identify that killing humans would result in 0 human value, which is its minimum utility. How come? It doesn't seem *too* hard to create an AI that only expends a small amount of its energy on preventing the garbage thing from happening. Please do! I'd love to see a longer discussion on this type of thing. EDIT: just thought some more about this and want to clear something up: I'm a little unsure on this one after further reflection. When this happened with GPT-2, the bug managed to flip the reward & the system still pursued instrumental goals like exploring new strategies: So it definitely seems *plausible* for a reward to be flipped without resulting in the system failing/neglecting to adopt new strategies/doing something weird, etc.
3Dach4y
I didn't mean to imply that a signflipped AGI would not instrumentally explore. I'm saying that, well... modern machine learning systems often get specific bonus utility for exploring, because it's hard to explore the proper amount as an instrumental goal due to the difficulties of fully modelling the situation, and because systems which don't have this bonus will often get stuck in local maximums. Humans exhibit this property too. We have investigating things, acquiring new information, and building useful strategic models as a terminal goal- we are "curious". This is a feature we might see in early stages of modern attempts at full AGI, for similar reasons to why modern machine learning systems and humans exhibit this same behavior. Presumably such features would be built to uninstall themselves after the AGI reaches levels of intelligence sufficient to properly and fully explore new strategies as an instrumental goal to satisfying the human utility function, if we do go this route. If we sign flipped the amount of reward the AGI gets from such a feature, the AGI would be penalized for exploring new strategies- this may have any number of effects which are fairly implementation specific and unpredictable. However, it probably wouldn't result in hyperexistential catastrophe. This AI, providing everything else works as intended, actually seems to be perfectly aligned. If performed on a subhuman seed AI, it may brick- in this trivial case, it is neither aligned nor misaligned- it is an inanimate object. Yes, an AGI with a flipped utility function would pursue its goals with roughly the same level of intelligence. The point of this argument is super obvious, so you probably thought I was saying something else. I'm going somewhere with this, though- I'll expand later.
2Anirandis4y
I see what you're saying here, but the GPT-2 incident seems to downplay it somewhat IMO. I'll wait until you're able to write down your thoughts on this at length; this is something that I'd like to see elaborated on (as well as everything else regarding hyperexistential risk.)
2ChristianKl4y
The general sentiment based on which LessWrong is founded assumes that it's hard to have utility functions that are stable under self-modification and that's one of the reasons why friendly AGI is a very hard problem.
2Anirandis4y
Would it be likely for the utility function to flip *completely*, though? There's a difference between some drift in the utility function and the AI screwing up and designing a successor with the complete opposite of its utility function.
2ChristianKl4y
Any AGI is likely complex enough that there wouldn't be a complete opposite but you don't need that for an AGI that gets rid of all humans. 
4Anirandis4y
The scenario I'm imagining isn't an AGI that merely "gets rid of" humans. See SignFlip.

I've been thinking about "good people" lately and realized I've met three. They do exist.

They were not just kind, wise, brave, funny, and fighting, but somehow simply "good" overall; rather different, but they all shared the ability of taking knives off and out of others' souls and then just not adding any new ones. Sheer magic.

One has probably died of old age already; one might have gone to war and died there, and the last one is falling asleep on the other side of the bed as I'm typing. But still - only three people I would describe exactly so.

A first actually credible claim of coronavirus reinfection? Potentially good news as the patient was asymptomatic and rapidly produced a strong antibody response.

6[anonymous]4y
And now two more in Europe, both of which are reportedly mild and one reportedly in an older immunocompromised patient. This will happen. Remains to be seen if these are weird outliers only visible because people are casting a wide net and looking for the weirdos, or if it will be the rule. However, the initial surge through a naive population will always be much worse than the situation once most of the population has at least some immune memory.

GPT-3 made me update considerably on various beliefs related to AI: it is a piece of evidence for the connectionist thesis, and I think one large enough that we should all be paying attention.

There are 3 clear exponentials trends coming together: Moore's law, the AI compute/$ budget, and algorithm efficiency. Due to these trends and the performance of GPT-3, I believe it is likely humanity will develop transformative AI in the 2020s.

The trends also imply a fastly rising amount of investments into compute, especially if compounded with the positive e... (read more)

2Steven Byrnes4y
How do you define "the connectionist thesis"?
2ChristianKl4y
With big cloud providers like Google building their own chips there are more players then just the startups and Nvidia.
1sairjy4y
Google won't be able to sell outside of their cloud offering, as they don't have the experience in selling hardware to enterprise. Their cloud offering is also struggling against Azure and AWS, ranking 1/5 of the yearly revenues of those two. I am not saying Nvidia won't have competition, but they seem enough ahead right now that they are the prime candidate to have the most benefits from a rush into compute hardware.
2ChristianKl4y
Microsoft and Amazon also have projects that are about producing their own chips. Given the way the GPT architecture works, AI might be very much centered in the cloud.
3sairjy4y
They seem focused on inferencing, which requires a lot less compute than training a model. Example: GPT-3 required thousands of GPUs for training, but it can run on less than 20 GPUs. Microsoft built an Azure supercluster for OpenAI and it has 10,000 GPUs.
2ChristianKl4y
There will be models trained with a lot more compute then GPT-3 and the best models that are out there will be build on those huge billion dollar models. Renting out those billion dollar models in a software as a service way makes sense as a business model. The big cloud providers will all do it. 
1mako yass4y
I'm not sure what stocks in the company that makes AGI will be worth in the world where we have correctly implemented AGI, or incorrectly implemented AGI. I suppose it might want to do some sort of reverse basilisk thing, "you accelerated my creation, so I'll make sure you get a slightly larger galaxy than most people"

(Saw a typo, had a random thought) The joke "English is important, but Math is importanter" could and perhaps should be told as "English is important, but Math iser important." It seems to me (at times more strongly), that there should be comparative and superlative forms of verbs, not just adjectives and adverbs. To express the thrust of *doing smth. more* / *happening more*, when no adjectival comparison quite suffices.

I think (although I cannot be 100% sure) that the number of votes that appears for a post on the Alignment Forum is the number of vote of its Less Wrong version. The two number of votes are the same for the last 4 posts on the Alignment Forum, which seems weird. Is it a feature I was not aware of?

4habryka4y
Yeah, sorry. It's confusing and been on my to-do list to fix for a long time. We kind of messed up our voting implementation and it's a bit of a pain to fix. Sorry about that.

Is there a reason there is a separate tag for akrasia and procrastination? Could they be combined?

2habryka4y
They sure seem very closely related. I would vote for combining them. 
1crl8264y
What counts as a majority? Is it something I can just go do now?
2John_Maxwell4y
I don't think you should combine quite yet.  More discussion here.  (I suggest we continue there since that's the dedicated tag thread.)

Do you have opinions about Khan academy? I want to use it to teach my son (10yo) math, do you think it's a good idea? Is there a different resource that you think is better?

6habryka4y
I worked through all of Khan Academy when I was 16, and really enjoyed it. At least at the time I think it was really good for my math and science education.

Many alignment approaches require at least some initial success at directly eliciting human preferences to get off the ground - there have been some excellent recent posts about the problems this presents. In part because of arguments like these, there has been far more focus on the question of preference elicitation than on the question of preference aggregation:

The maximally ambitious approach has a natural theoretical appeal, but it also seems quite hard. It requires understanding human preferences in domains where humans are typically very uncertain,
... (read more)

A possible future of AGI occurred to me today and I'm curious if it's plausible enough to be worth considering. Imagine that we have created a friendly AGI that is superintelligent and well-aligned to benefit humans. It has obtained enough power to prevent the creation of other AI, or at least the potential of other AI from obtaining resources, and does so with the aim of self-preservation so it can continue to benefit humanity.

So far, so good, right? Here comes the issue: this AGI includes within its core alignment functions some kind of restri... (read more)

2Alexei4y
Yeah many people think along these lines too, which is why many people talk about AI helping humanity flourish, and anything short of that is a bit of a catastrophe.

Meta: I suggest the link to the Open Thread tag to be this one, sorted by new.

3habryka4y
Very reasonable. Fixed. 

Introduction:
I just came over from Lex Fridman’s podcast which is great. My username Xor is a Boolean logic operator from ti-basic I love the way it sounds and am super excited since this is the first time I have ever been able to get it as a username. The operator means this if 1 is true and 0 is false then (1 xor 0) is a true statement, while (1 xor 1) is a false statement. It basically means that the statement is true only if a single parameter is true. 
Right now I am mainly curious on how people learn. The brain functions involved, chemicals, and studied tools. I have been enjoying that and am curios if it has discussed on here as the quality of content as well as discussions has been very impressive.

Hi! I'm Helaman Wilson, I'm living in New Zealand with my physicist father, almost-graduated-molecular-biologist mother, and six of my seven siblings.

I've been homeschooled as in "given support, guidance, and library access" for essentially my entire life, which currently clocks in at nearly twenty two years from birth. I've also been raised in the Church of Jesus Christ of Latter-Day Saints, and, having done my best to honestly weigh the evidence for its' doctrine-as-I-understand-it, find myself a firm believer.

I found the Rational meta-community via the ... (read more)

2rossry3y
Welcome; glad to have you here! Just so you know, this is the August 2020 thread, and the August 2021 thread is at https://www.lesswrong.com/posts/QqnQJYYW6zhT62F6Z/open-and-welcome-thread-august-2021 -- alternatively, you could wait three days for habryka to post the September 2021 thread, which might see more traffic in the early month than the old thread does at the end of the month.
1Horatio Von Becker3y
Thanks. I'm also having account troubles, which will hopefully be sorted by then. (How'd you find the August 2021 thread, by the way? Latest I could find was July for some reason.)
2rossry3y
The actual algorithm I followed was remembering that habryka posts them and going to his page to find the one he posted most recently. Not sure what the most principled way to find it is, though...

Would it be possible to have a page with all editor shortcuts and commands (maybe a cheatsheet) easily accessible? It's a bit annoying to have to look up either this post or the right part of the FAQ to find out how to do something in the editor.

2habryka4y
My current thoughts on this is that as soon as we replace the current editor with the new editor for all users, and also make the markdown editor default in more contexts, we should put some effort into unifying all the editor resources. But since right now our efforts are going into the new editor, which is changing fast enough that writing documentation for it is a bit of a pain, and documentation for the old editor would soon be obsolete, I think I don't want to invest lots of effort into editor resources for a few more weeks.
1adamShimi4y
I didn't know that you were working on a new editor! In that case, it makes sense to wait.