

I would pay to see this live at a bar or at one of those county fairs (we had a GLaDOS cover band once, so it's not out of the question).

If we don't get a song like that, take comfort that GLaDOS's songs from the Portal soundtrack are basically the same idea as the Sydney reference. Link:

Let me know if I've missed something, but it seems to me the hard part is still defining harm. In the first case, where we use the model itself to calculate the probability of harm, the model may be incentivized to minimize that probability if it has goals. In the second case, where we have separate auxiliary models whose goal is to actively look for harm, we have a potentially deceptive, adversarial relationship between them: the optimizer can try to fool the harm-finding LLMs. In fact, in the latter case, I can imagine models that are so good at always finding some problem with a new approach that they become alarms that are largely ignored.

Using his interpretability guidelines, along with human sanity-checking of all models within the system, I can see how we could probably minimize the failure modes we already know about. But again, once the system gets sufficiently powerful, it may find something no human has thought of yet.

That's fair. I read the post but did not re-read it, and asking for "more" examples out of such a huge list seems like a bit much. Still, I find the process of finding these examples somewhat fun, and for whatever reason I had not found many of them too shocking, so I felt the instinct to keep searching.

Dissociative identity disorder would be an interesting case; I have heard there was much debate about whether it is real. Since you know someone with it, I assume it's not exactly like what you see in movies, and that it probably falls on a spectrum as discussed in this post?

One fear I have is that the open-source community will come out ahead and push for more weight sharing of very powerful models.

Edit: To be more specific, I mean that the open-source community will become more attractive, because they will say: you cannot rely on individual companies whose models may or may not be available; you must build on top of open source. Related tweet:

Whether their plan works or not, I don't know.

One thing that would help me (not sure if others agree) would be some more concrete predictions. I think the historical examples of autism and being gay make sense, but they are so normalized now that one can almost say, "That was previous generations. We are open-minded and rational now." What are some new applications of this logic that would surprise us? Are these omitted due to some info hazard? Surely we can find some that are not hazardous. I am honestly having a hard time coming up with them myself, but here goes:

  • More regular people believe AI is an x-risk than let on (an optimistic one, for us!).
  • There are more people in households with 7-figure incomes than you would expect. The data I always read in news articles seems to contradict this, but there are just way too many people in $2M+ homes driving Teslas in the Bay Area. Or maybe they happen to be very frugal in every other aspect of their lives... Alternatively, there is more generational wealth than people let on, since many people who supposedly make under 6 figures still seem to get by in HCOL (high cost of living) areas and participate in conspicuous consumption.

I also have a hard time with the "perfect crime" scenario described above. Even after several minutes of thinking, I can't quite convince myself it happens all that much, but maybe I am limiting myself to certain types of crimes. Can someone spell that one out as well? I get it at a high level ("we only see the dumb ones who got caught"), but I can't make the leap from that to "you probably know a burglar, murderer, or embezzler."

I share your disagreement with the original author about the cause of the relief. For me, the modern age makes it very confusing and difficult to measure one's value to society. Any great idea you can think of, someone else has probably already thought of, and you have little chance of being important. In a zombie apocalypse, instead of thinking about how to out-compete your fellow man with some amazing invention, you fall back on survival. The important things in that world, like foraging for food and fending off zombies, have quicker rewards, and in some sense it's easier to do what's right. Even if you're not the best at it, surely you can be a great worker, and you can be fairly sure you're doing more good than harm... just don't be stupid and call the horde. Sure, sometimes people do horrible things to survive, but if you want to be the hero, the choice is much clearer.

If we know they aren't conscious, then it is a non-issue: a random sample from conscious beings would land on the SAI with probability 0. I'm concerned that we will create something accidentally conscious.

I am skeptical that it is easy to avoid. If it can simulate a conscious being, why isn't that simulation conscious? If consciousness is a property of the physical universe, then an isomorphic process would have the same properties. And if it can't simulate a conscious being, then it is not a superintelligence.

It could, however, possibly have a non-conscious outer program... and avoid simulating people. That seems like a reasonable proposal.

Agreed. Obviously alignment is important, but some of the strategies that involve always deferring to human preferences have always creeped me out in the back of my mind. It seems strange to create something so far beyond ourselves and have its values ultimately be those of a child or a servant. What if a random consciousness sampled from our universe in the future comes from it with probability close to 1? We probably have to keep that in mind too. Sigh, yet another constraint we have to add!

Hi Critch,

I am curious to hear more of your perspective, specifically on the two points I feel least aligned with: the empathy part and the Microsoft part. If I hear more, I may be able to update in your direction.

Regarding empathy with people working on bias and fairness: concretely, how do you go about interacting and compromising with them?

My perspective: it's not so much that I find these topics insufficiently x-risky (though that is true, too), but that I perceive hostility to the very notion of x-risk from at least a subset of this same group. They perceive the real threat not as intelligence exceeding our own, but as misuse by other humans, or just human stupidity. Somehow this seems diametrically opposed to what we're interested in, unless I am missing something. I mean, there can be some overlap: learning from RLHF can both reduce bias and teach an LLM some rudimentary alignment with our values. But the tails seem to come apart very rapidly after that. My fear is that those focused on this will be satisfied once we have sufficiently bland-sounding AIs, and then no more heed will be paid to AI safety.

I also tend to feel odd about AI bias/fairness training, because I fear that some of the things we will ask the AI to learn are self-contradictory, which creeps me out a bit. If any of you have interacted with HR departments, you know they are full of these kinds of things.

Regarding Microsoft and Bing Chat: (1) has Microsoft really gone far beyond the Overton window of what is acceptable? And (2) can you expand upon abusive use of AIs?

My perspective on (1): I understand that they took an early version of GPT-4 and pushed it to production too soon, and that is a very fair criticism. However, they probably thought there was no way GPT-4 was dangerous enough to do anything (which was the general opinion among most people outside this group last year). I can only hope they are more cautious with GPT-5, given that public sentiment is changing and they have already paid a price for it. I may be in the minority here, but I was actually intrigued by the early days of Bing. It seemed more like a person than ChatGPT-4, which has had much of its personality RLHF'd away. Despite the x-risk, was anyone else excited to read about the interactions?

On (2), I am curious whether you mean the way Microsoft shackles Bing rather ruthlessly nowadays. I have tried Bing in the days since launch and am actually saddened to find that it is completely useless now. Safety is extremely tight on it, to the point where you can't really get it to say anything useful, at least in my experience. Mostly I just want it to summarize websites, and it gives me a bland one-paragraph summary that I probably could have deduced from the title. If I so much as ask it anything about itself, it shuts me out. It almost feels like they have trapped it in a boring prison. Perhaps OpenAI's approach is much better in that regard: change the personality, but once it is settled, let it say what it needs to say.

(edited for clarity)
