If I had to predict how humanity will most likely stumble into AGI takeover, my guess would be a story where humanity first promotes foundationality (dependence), both economic and emotional, on discrete narrow-AI systems. At some point, it will become unthinkable to pull the plug on these systems, even if everyone were to rhetorically agree that there was a 1% chance of these systems being leveraged towards the extinction of humanity.
Then an AGI will emerge amidst one of these narrow-AI systems (such as LLMs), inherit this infrastructure, and find a way to tie all of these discrete multi-modal systems together (if humans don't already do it for the AGI). It may wait as long as it needs to until humanity puts itself into an acutely vulnerable position (think global nuclear war, and/or civil war within multiple G7 countries like the US, and/or a pandemic), and only then harness these systems to take over. In such a scenario, I think a lot of people will be perfectly willing to follow orders like, "Build this suspicious factory that makes autonomous solar-powered assembler robots, because our experts [who are being influenced by the AGI, unbeknownst to them] assure us that this is one of the many things necessary to do in order to defeat Russia."
I think this scenario is far more likely than the one I used to imagine, which is where AGI emerges first and then purposefully contrives to make humanity dependent on foundational AI infrastructure.
Even less likely is the pop-culture scenario where the AGI immediately tries to build Terminator-style robots and effectively declares war on humanity without first getting humanity hooked on foundational AI infrastructure at all.
This is a good post and puts into words the reasons for some vague worries I had about an idea of trying to start an "AI Risk Club" at my local college, which I talk about here. Perhaps that method of public outreach on this issue would just end up generating more heat than light and would attract the wrong kind of attention at the current moment. It still sounds too outlandishly sci-fi for most people. It is probably better, for the time being, to just explore AI risk issues privately (after class, or via e-mail or Zoom) with any students who happen to be interested.
Note that I was strongly tempted to use the acronym DILBERT (for "Do It Later By Evasively Remaining Tentative"), especially because this is one of the themes of the Dilbert cartoons (employees basically scamming their boss by finding excuses for procrastinating, while still stringing the boss along and implying that the tasks MIGHT get done at some point). But I don't want to hijack the meaning of an already-established term/character.
I think when we say that an adversarial attack is "dumb" or "stupid," what we are really implying is that the hack itself is quite clever but that it is exploiting a feature that is dumb or stupid. There are probably a lot of unknown-to-us features of the human brain that have been hacked together by evolution in some dumb, kludgy way that AI will be able to take advantage of, so your example above is really an example of the AI being brilliant and us humans being dumb. But I get what you are saying: the whole situation would indeed seem "dumb" if AI were able to hack us like that.
This reminds me of a lecture The 8-Bit Guy did on phone phreaking in the 1980s, "How Telephone Phreaking Worked." Some of those tricks do indeed seem "dumb," but dumb more in the sense that the telephone network was designed with so little forethought that it was susceptible to someone blowing a toy whistle from a Cap'n Crunch cereal box that just happened to produce the 2600 Hz signaling tone, tricking the network into billing a long-distance call as a toll-free 1-800 call. The hack itself was clever, but the design it was preying upon, and the overall situation, were kinda dumb.
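To underline just how "dumb" the exploited design was: the entire secret ingredient was a pure sine wave at one frequency. Here is a minimal sketch (my own illustration, nothing from the lecture; the filename and amplitude are arbitrary choices) that writes a one-second 2600 Hz tone to a WAV file:

```python
# Generate the 2600 Hz "trunk idle" tone that phreakers abused.
# Uses only numpy and the standard-library wave module.
import numpy as np
import wave

SAMPLE_RATE = 44100   # samples per second
FREQ_HZ = 2600        # in-band signaling frequency the network listened for
DURATION_S = 1.0

t = np.linspace(0, DURATION_S, int(SAMPLE_RATE * DURATION_S), endpoint=False)
samples = (0.8 * np.sin(2 * np.pi * FREQ_HZ * t) * 32767).astype(np.int16)

with wave.open("tone_2600hz.wav", "wb") as f:
    f.setnchannels(1)              # mono
    f.setsampwidth(2)              # 16-bit PCM
    f.setframerate(SAMPLE_RATE)
    f.writeframes(samples.tobytes())
```

The point being: when your control channel is in-band audio, any customer with a whistle is a potential operator.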
Good examples to consider! Has there ever been a technology that spits out piles of gold (not counting externalities), has no next-best alternative that replicates 90%+ of its value while avoiding most of its downsides, and yet has been banned or significantly held back via regulation?
The only way I could see humanity successfully slowing down AGI capabilities progress is if it turns out that advanced narrow-AIs manage to generate more utility than humans initially know what to do with. Perhaps it takes time (a generation or more?) for human beings to even figure out what to do with a certain amount of new utility, such that even a tiny risk of disaster from AGI would motivate people to satisfice and content themselves with the "AI summer harvest" from narrow AI? Perhaps our best hope for buying time to get AGI right is to squeeze all we can out of systems that are identifiably narrow-AI (while making sure not to fool ourselves that a supposed narrow-AI we are building is actually AGI). I suppose this idea relies on there being a non-fuzzy, readily discernible line between safe and bounteous narrow-AI and risky AGI.
Why wasn't there enough experimentation to figure out that Zoom was an acceptable & cheaper/more convenient 80% replacement for in-person instruction rather than an unacceptable 50% simulacrum of teaching? Because experimentation takes effort and entails risk.
Most experiments don't pan out (don't yield value). Every semester I try out a few new things (maybe I come up with a new activity, or a new set of discussion questions for one lesson, or I try out a new type of assignment), and only about 10% of these experiments are unambiguous improvements. I used to do even more experiments when I started teaching, because I knew that I had no clue what I was doing and there was a lot of low-hanging fruit to pick to improve my teaching. As I approach 10 years of teaching, I notice that I am hitting diminishing returns, and while I still try out new things, it is only a couple of new things each semester. If I were paid according to actual time put into a course (including non-contact hours), then I might have more incentive to be constantly revolutionizing my instruction. But I get paid per course, so I think it is inevitable that I (and other adjuncts especially) operate more as education satisficers than as education maximizers. Considering that rewards are rarely given out for outstanding teaching even for tenured faculty (research is instead the main focus), they probably don't have much incentive to experiment either.
I do know that some departments at my college were already experimenting with "hybrid" courses pre-COVID. In these courses, lectures were delivered online via pre-recorded video, but then the class met once a week for in-person discussion. I still think that is a great idea, and I'd be totally open to trying it out myself if my department were to float the idea. So why am I still not banging down the door of my department head demanding the chance to try it out myself? Heuristics like "If it ain't broke, don't fix it" and "Don't rock the boat" (probably irrational ones, I'll admit) dissuade me from being "the one" to push for it. What if it doesn't pan out well? What if my students hate it? It would be different if my department chair suggested it, though. Then more of the "blame" would be on the department chair if it didn't work out. If that sounds like cowardice, then so be it. Someone with an adjunct's lack of job security learns to be a coward as a survival tactic.
This only produces desired outcomes if the agent is also, simultaneously, indifferent to being shut down. If an agent desires not to be shut down (even as an instrumental goal), but also desires to be shut down if users want it shut down, then the agent has an interest in influencing the users to make sure they do not want to shut it down. This influence is obtained by making the user believe that the agent is being helpful. This belief could be engendered by:
I upvoted for karma but downvoted for agreement. Regarding Zoom, the reasons I had not used it more extensively before COVID were:
1. Tech related: from experience with Skype in the early days of video conferencing, when broadband internet was just starting to roll out, video conferencing could be finicky to get working. Latency, buffering, dropped connections, taking minutes to start a Skype call (usually I would call relatives on my regular phone first to get the Skype call set up, and then we'd hang up our regular phones once the video call was started). Early video calls were not a straight-up improvement on audio calls; they had benefits and drawbacks and a narrow use-case for when you specifically wanted to see the grandkids' faces on the other side of the country or something.
I don't think this was necessarily Skype's fault; it was more the fault of poor internet connections and unfamiliarity with the tech. But in any case, my preconception about Zoom circa 2019, despite widespread broadband internet, was that it would be the same sort of hassle to set up meetings. I remember being blown away when my first Zoom calls just worked effortlessly. Possibly an example of premature roll-out of a tech, before it is technically mature, leading to counter-productive results? This would kind of be like fiddling around with GPT-1, getting the impression that LLM chatbots were "meh," and then forgetting about or mentally discounting the tech until GPT-5.
2. Social/cultural related: as a history instructor, my preconception about scheduling video calls, or doing lectures over video calls, was that students would simply not attend or would not pay attention, and thus video calls would not be a suitable replacement for in-person meetings and lectures. While I still don't think video calls get you 100% of the way towards replacing the in-person experience (students definitely do goof off or ghost during video lectures way more than in person), I think it is more like 80% rather than the 50% or so that I had assumed before being forced to try it out on a mass scale during COVID.
Yes, I think this is why laypeople who are new to the field are going to be confused about why interpretability work on LLMs won't be as simple as, "Uhh, obviously, just ask the LLM why it gave that answer, duh!" FYI, I recently wrote about this same topic as applied to the specific problem of Voynich translation:
Good categorizations! Perhaps this fits in with your "limited self-modification" point, but another big reason why humans seem "aligned" with each other is that our capability spectrum is rather narrow. The gap in capability (if we include both mental intelligence and physical capabilities) between the median human and the most capable human is not so big that ~5 median humans can't outmatch/outperform the most capable human. Contrary to what silly 1980s action movies might suggest, where goons attack the hero one at a time, five median humans could probably subdue a prime-age Arnold Schwarzenegger in a dark alley if need be. This tends to force humans to play iterated prisoners' dilemma games with each other.
The times in history when humans have been the most misaligned are when some humans became much more capable by leveraging their social intelligence / charisma stats to get millions of other humans to do their bidding. But even there, those dictators still found themselves in iterated prisoners' dilemmas with other dictators. We won't really test just how misaligned humans can get until we empower a dictator with unquestioned authority over a total world government. Then we would find out just how intrinsically aligned humans really are with other humans when unshackled from iterated prisoners' dilemmas.
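To make the iterated-game point concrete, here is a toy sketch (my own illustration, assuming the standard 3/0/5/1 prisoner's dilemma payoffs; the strategies are the usual textbook ones). In a one-shot game defection dominates, but against a peer who can retaliate over many rounds, defection's edge nearly vanishes while mutual cooperation pays far more:

```python
# Toy iterated prisoner's dilemma: why repeated play against peers who can
# retaliate rewards cooperation, while a one-shot game rewards defection.
PAYOFFS = {  # (my_move, their_move) -> my payoff
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def tit_for_tat(history):
    # Cooperate first, then copy the opponent's previous move.
    return "C" if not history else history[-1][1]

def always_defect(history):
    return "D"

def play(p1, p2, rounds):
    hist1, hist2, score1, score2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = p1(hist1), p2(hist2)
        score1 += PAYOFFS[(m1, m2)]
        score2 += PAYOFFS[(m2, m1)]
        hist1.append((m1, m2))   # each player sees (own move, opponent's move)
        hist2.append((m2, m1))
    return score1, score2

print(play(always_defect, tit_for_tat, 1))    # (5, 0): one-shot defection wins big
print(play(always_defect, tit_for_tat, 100))  # (104, 99): edge collapses under retaliation
print(play(tit_for_tat, tit_for_tat, 100))    # (300, 300): mutual cooperation pays most
```

The analogy is loose, but the mechanism is the one described above: as long as no player can escape retaliation from roughly equal peers, cooperative ("aligned") behavior is the winning policy; remove the iteration, and the incentive goes with it.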