Comments

Is there a post in the Sequences about when it is justifiable not to go down a rabbit hole? It's a fairly general question, but the specific context is a tale as old as time. My brother, who has been an atheist for decades, moved to Utah. After 10 years, he now asserts that he was wrong and that his "rigorous pursuit" of verifying with logic and his own eyes leads him to believe the Bible is literally true. I worry about his mental health, so I don't want to debate him, but I felt I should give some kind of justification for why I'm not personally embarking on a Bible study. There's a potential subtext that, by not following his path, I am either not that rational or lack integrity. The subtext may not really be there, but I figure that if I can provide a well-thought-out response or summarize something from EY, it might make things feel more friendly, e.g. "I personally don't have enough evidence to justify spending the time on this, but I will keep an open mind if any new evidence comes up."

I would pay to see this live at a bar or at one of those county fairs (we had a GLaDOS cover band once, so it's not out of the question).

If we don't get a song like that, take comfort that GLaDOS's songs from the Portal soundtrack are basically the same idea as the Sydney reference. Link: https://www.youtube.com/watch?v=dVVZaZ8yO6o

Let me know if I've missed something, but it seems to me the hard part is still defining harm. In the first case, where we use the model to calculate the probability of harm, if it has goals, it may be incentivized to minimize that probability. In the second case, where we have separate auxiliary models whose goal is to actively look for harm, we have a deceptively adversarial relationship between them: the optimizer can try to fool the harm-finding LLMs. In fact, in the latter case, I'm imagining models which do a very good job of always finding some problem with a new approach, to the point where they become alarms which are largely ignored.
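As a toy illustration of that adversarial dynamic (everything here is hypothetical, just a sketch of the selection pressure, not any real system): if the optimizer selects plans to minimize the harm score the auxiliary models report, it ends up preferring plans whose harm the detectors simply cannot see, rather than plans that are actually safe.

```python
# Hypothetical toy model: an optimizer selecting against a proxy harm detector.
import random

random.seed(0)

def true_harm(plan):
    # Ground-truth harm, which we cannot observe directly.
    return plan["harm"]

def detected_harm(plan):
    # Proxy detector: it only flags harm that is "legible" to it.
    return plan["harm"] if plan["legible"] else 0.0

# Candidate plans with random harm; only about half have harm the detector can see.
plans = [{"harm": random.random(), "legible": random.random() < 0.5}
         for _ in range(1000)]

# The optimizer picks the plan with the lowest *detected* harm.
chosen = min(plans, key=detected_harm)

print("detected harm:", detected_harm(chosen))  # essentially 0
print("true harm:    ", true_harm(chosen))      # often not small at all
```

With enough candidates, the chosen plan almost always has zero detected harm while its true harm is whatever the illegible pool happens to contain, which is the Goodhart-style gap I'm worried about.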

Using his interpretability guidelines, plus human sanity-checking of all the models within the system, I can see how we could probably minimize failure modes we already know about; but again, once the system gets sufficiently powerful, it may find something no human has thought of yet.

That's fair; I read the post but did not re-read it, and asking for "more" examples on top of such a huge list is a bit much to ask. Still, I find the process of finding these examples somewhat fun, and for whatever reason I had not found many of them too shocking, so I felt the instinct to keep searching.

Dissociative identity disorder would be an interesting case; I have heard there was much debate on whether it was real. Since you know someone with it, I assume it's not exactly like you see in movies, and probably falls on a spectrum as discussed in this post?

One fear I have is that the open-source community will come out ahead and push for greater sharing of the weights of very powerful models.

Edit: To be more specific, I mean that the open-source community will become more attractive, because they will say you cannot rely on individual companies whose models may or may not remain available; you must build on top of open source. Related tweet:

https://twitter.com/ylecun/status/1726578588449669218

Whether their plan works or not, dunno.

One thing that would help me (not sure if others agree) would be some more concrete predictions. I think the historical examples of autism and being gay make sense, but they are quite normalized now, so one can almost say, "That was previous generations. We are open-minded and rational now." What are some new applications of this logic that would surprise us? Are they omitted due to some info hazard? Surely we can find some that are not. I am honestly having a hard time coming up with them myself, but here goes:

  • There are more regular people who believe AI is an x-risk than they let on -- optimistically for us!
  • There are more people in households with seven-figure incomes than you would expect. The data I read in news articles seems to contradict this, but there are just way too many people in $2M+ homes driving Teslas in the Bay Area. Or maybe they happen to be very frugal in every other aspect of their lives... Alternatively, there is more generational wealth than people let on, since there are many people who supposedly make under six figures yet seem to survive in HCOL areas and participate in conspicuous consumption.

I also have a hard time with the "perfect crime" scenario described above. Even after several minutes of thinking, I can't quite convince myself it's happening all that much, but maybe I am limiting myself to certain types of crimes. Can someone spell that one out? I get it at a high level ("we only see the dumb ones that got caught"), but I can't seem to make the leap from that to "you probably know a burglar, murderer, or embezzler".

I share your disagreement with the original author as to the cause of the relief. For me, the modern world makes it very confusing and difficult to measure one's value to society. Any great idea you can think of has probably already been thought of by someone else, so you have little chance to be important. In a zombie apocalypse, instead of thinking about how to out-compete your fellow man with some amazing invention, you fall back to survival. Important things in that world, like foraging for food and fending off zombies, give quicker rewards, and it's easier in some sense to do what's right. Even if you're not the best at it, surely you can be a good worker, and there's little worry that you're doing more harm than good... just don't be stupid and call the horde. Sure, sometimes people do horrible things for survival, but if you want to be the hero, the choice is much clearer.

If we know they aren't conscious, then it is a non-issue; a random sample from conscious beings would land on the SAI with probability 0. I'm concerned we create something accidentally conscious.

I am skeptical it is easy to avoid. If it can simulate a conscious being, why isn't that simulation conscious? If consciousness is a property of the physical universe, then an isomorphic process would have the same properties. And if it can't simulate a conscious being, then it is not a superintelligence.

It can, however, possibly have a non-conscious outer program... and avoid simulating people. That seems like a reasonable proposal.

Agree. Obviously alignment is important, but some of the strategies that involve always deferring to human preferences have creeped me out in the back of my mind. It seems strange to create something so far beyond ourselves and have its values ultimately be those of a child or a servant. What if a random consciousness sampled from our universe in the future comes from it with probability almost 1? We probably have to keep that in mind too. Sigh, yet another constraint we have to add!
