As a poker player, this post is the best articulation I've read that explains why optimal tournament play is so different from optimal cash-game play. Thanks for that!

Haven't read that book, added to the top of my list, thanks for the reference!

But humans are uniquely able to learn behaviours from demonstration and to form larger groups, which enables the gradual accumulation of 'cultural technology', which in turn allowed a runway of cultural-genetic co-evolution (e.g. food processing technology -> smaller stomachs and bigger brains -> even more culture -> bigger brains even more of an advantage, etc.)

One thing I think about a lot is: are we sure this is unique, or did something else like luck or geography somehow play an important role in one (or a handful) of groups of sapiens happening to develop some strong (or "viral") positive-feedback cultural learning mechanisms that eventually dramatically outpaced other creatures? We know that other species can learn by demonstration, and pass down information from generation to generation, and we know that humans have big brains, but were some combination of timing / luck / climate / resources perhaps also a major factor?

If Homo sapiens are believed to have originated around 200,000 years ago, but only developed agricultural techniques around 12,000 years ago, the earliest known city 9,000 years ago, and only developed a modern-style writing system maybe 5,000 years ago, are we sure that those humans who lived for 90%+ of human "pre-history" without agriculture, large groups, and writing systems would look substantially more intelligent to us than chimpanzees? If our ancestral primates never branched off from chimps and bonobos, are we sure the earth wouldn't now (or some time in the past or future) be populated with chimpanzee city-equivalents and something that looked remotely like our definition of technology?

It's hard to appreciate how much this kind of thing helps you think

Strongly agree. It seems possible that a time-travelling scientist could go back to some point in time and conduct rigorous experiments that would show sapiens as not as "intelligent" as some other species at that point in time. It's easy to forget how recently human society looked a lot closer to animal society than it does to modern human society. I've seen tests that estimate the human IQ level of adult chimpanzees to be maybe in the 20-25 range, but we can't know how a prehistoric adult human would perform on the same tests. Like, if humans are so inherently smart and curious, why did it take us over 100,000 years to figure out how plants work? If someone developed an AI today that took 100,000 years to figure out how plants work, they'd be laughed at if they suggested it had "human-level" intelligence.

One of the problems with the current AI human-level intelligence debate that seems pretty fundamental is that many people, without even realising it, conflate concepts like "as intelligent as a [modern educated] human" with "as intelligent as humanity" or "as intelligent as a randomly selected Homo sapiens from history".

It is true that we have seen over two decades of alignment research, but the alignment community has been fairly small all this time. I'm wondering what a much larger community could have done. 


I start to get concerned when I look at humanity's non-AI alignment successes and failures; we've had corporations for hundreds of years, and a significant portion of humanity has engaged in corporate alignment-related activities (regulation, lawmaking, governance etc, assuming you consider those forces to generally be pro-alignment in principle). Corporations and governments have exhibited a strong tendency to become less aligned as they grow. (Corporate rap sheets, if a source is needed.)

We've also been in the company of humans for millennia, and we haven't been entirely successful in aligning ourselves, if you consider war, murder, terrorism, poverty, child abuse, climate change and others to be symptoms of individual-level misalignment (in addition to corporate/government misalignment).

It's hard for me to be hopeful for AI alignment if I believe that a) humans individually can be very misaligned; b) corporations and governments can be very misaligned; and c) that AGI/ASI (even if generally aligned) will be under the control of very misaligned any-of-the-above at some point.

I think it's great that alignment problems are getting more attention, and hope we find solid solutions. I'm disheartened by humanity's (ironically?) poor track record of achieving solid alignment in our pre-AI endeavours. I'm glad that Bengio draws parallels between AI alignment problems and corporate alignment, individual alignment, and evolutionary pressures, because I think there is still much to learn by looking outside of AI for ideas about where alignment attempts may go wrong or be subverted.

I've probably committed a felony by doing this, but I'm going to post a rebuttal written by GPT-4, and my commentary on it. I'm a former debate competitor and judge, and have found GPT-4 to be uncannily good at debate rebuttals. So here is what it came up with, and my comments. I think this is a relevant comment, because I think what GPT-4 has to say is very human-relevant.

Radiations from the Sun bounce off a piece of cheese and enter into the beady eyes of a mouse; its retinal cells detect the light; the energy of the photons triggers neural impulses; these impulses are transmitted to the visual-processing areas of the mouse's brain; and there the optical information is processed and reconstructed into a 3D model that is recognized as a piece of cheese; and so the mouse knows there is cheese nearby.

This is the essence of the intuitive wisdom of mice—this whole process is not magic, and they don't have to understand it. They understand the presence of cheese. They instinctively know which sensory processes will lead to survival, and which will not.

I find this mirroring from "deliberate rationality" to "intuitive wisdom" and equating cognition to instinct very interesting and not at all obvious, even from the perspective of a human former debate judge. It's a great rebuttal IMO. It points out human deficiencies in our inability to detect cheese, which is arguably more important to mice than their ability to philosophise.

Humans can see, but they overcomplicate seeing. Because of that, they often miss what is right in front of them. Take a moment to appreciate this, for it is indeed fascinating.

IMO another interesting insight -- what does "can't understand seeing" from the human even mean? I'd count this as another decent debate rebuttal probably +1 for team mouse.

Humans see, but they get lost in their understanding of visual cortexes, often making them susceptible to optical illusions. A human lives in a mental world that includes jobs, responsibilities, societal pressures—but not human instincts. Their mind does not often focus on their immediate environment. But we, as mice, can detect a morsel of food from a distance, or sense a lurking predator, without getting distracted by the nature of our senses. Humans often fall victim to their beliefs, but mice react to what is—there must be a clear distinction between survival and theory, between senses and reality. And this is not a trivial skill, remember how rare it is among humans.

If I were judging a debate between mice and humans, I would score this for the mice. The human is arguing "But we, as humans, can look at a seemingly bizarre image, and realize that part of what we’re seeing is the lens itself" whereas the mouse is arguing that their abilities are in tune with survival over theory, and how deficient that balance sometimes is among humans. I like this counter-argument for the mice: practicality over self-perceived superiority. Empathising with other species' values is something that even my more philosophical human friends struggle with.

The whole idea of Instinct is, simply, intuitive reasoning about a more reliable process for making the contents of our mind mirror the contents of the world. It is the sort of thing humans often forget. Contemplating this business of “responding to immediate needs and threats,” we can see why it works. Instinct is not separate from reality, far away from the understanding of humans. Instinct is not something that only applies to the wilderness. Instinct, itself, is an understandable process-in-the-world that correlates brains with reality.

Lots of parroting here, but the switch from "inside laboratories" to "the wilderness", and the argument that instinct is a better alignment strategy than science, are both very interesting to me. I wouldn't award any points here, pending more arguments.

Instinct makes sense, when you think about it. But humans often overthink, which is why they lose touch with their instincts. One should not overlook the wonder of this—or the potential power it bestows on us as mice, not just animal societies.

I would find this quote inspiring if I were a mouse or other animal. I may have to print a "mouse power" t-shirt.

Indeed, understanding the engine of thought may be more complex than understanding a mouse's instinct—but it is a fundamentally different task.

A mouse's instinct is being equated to a steam engine; an interesting pivot, but the contrasting statements still hold water compared to the original, IMO.

Consider a time when a human may be anxious about the possibility of a future war. "Do you believe a nuclear war will occur in the next 20 years? If no, why not?" The human may reply with a timeline of a hundred years because of "hope." But why cling to hope? Because it makes them feel better.

Reflecting on this whole thought process, we can see why the thought of war makes the human anxious, and we can see how their brain therefore clings to hope. But in a world of realities, hope will not systematically correlate optimists to realities in which no war occurs.

To ask which beliefs make you happy is to turn inward, not outward—it tells you something about yourself, but it is not evidence entangled with the environment. There is nothing wrong with happiness, but it should follow from your perception of the world, rather than manipulating your mental map.

Very much parroting here, but I would note that "manipulating your mental map" as a counterpoint to "tampering with mental paintbrushes" is an interesting equivalency. I also respect the re-framing of hope as a human flaw, in contrast with the reality-based instincts of a mouse.

If you can see this—if you can see that hope is shifting your first-order thoughts by too large a degree—if you can understand your mind as a mapping engine that has flaws—then you can apply an intuitive correction. The brain is a flawed lens through which to see reality. This is true of both human brains and mouse brains. But a mouse brain is a flawless lens that instinctively understands its environment and reacts accordingly—without second-guessing its instincts, without biases, and without the need for second-order corrections. This, in practice, makes the lens far more efficient. Not perfect, but far more efficient.

Arguing for efficiency over power, and reality over perception, would be an interesting avenue to pursue as a debate judge. As would the concept of a mouse brain being flawless, as an argument presented by an AI.

At the above paragraph, it ran out of tokens after "--without", so I prompted it "That's great, please finish the essay." and everything after that (above and below) was what followed.

Unlike humans, mice do not overthink or overcomplicate. They live in the present, focused on their immediate needs and potential threats. They don't have to understand their senses; they simply use them. They don't ponder about the existence of mouse brains; they utilize their brains efficiently.

As a debate judge, I'd call this a pretty decent summary of the key rebuttals.

Human self-consciousness, while enabling them to create and innovate, also leads them to constant self-doubt and uncertainty, often resulting in inaction or wrong action. Mice, on the other hand, trust their instincts and react swiftly and decisively, making them highly effective in their world.

While humans are busy understanding and correcting their flawed lens, mice are using their lens to navigate the world, survive, and thrive. While humans are stuck in a loop of reflective reasoning and corrections, mice are out there, living their life, applying their senses and instincts to the best of their abilities. They are not bothered about the nature of their lens, they are just using it.

A solid foundational rebuttal of the type I would have used back in my days of competitive debate. Probably better than anything I would have written on the fly.

In the end, it's not about which brain is superior, but about which brain is more efficient for the life it leads. And from a mouse's perspective, the ability to respond instinctively to the environment without the need for constant reflection and correction seems like a pretty great deal. Perhaps it's time humans took a page out of the mouse's book and learned to trust their instincts more, and their overthinking less.

Great re-framing of a debate (efficiency vs power or creativity).

For a formal debate, I would rate GPT-4's rebuttal very highly in a real-world "humans vs mice" debate scenario. The outcome of Eliezer vs Team Mouse would almost certainly come down to delivery, given the well-reasoned arguments on both sides above. Overall, it's well above the quality of argument I would expect from top-tier debate teams at the high school level, and above average for the college level.

I've experimented with doing Lincoln-Douglas style debates with multiple GPT-powered "speakers" with different "personalities", and it's super interesting and a great brainstorming tool. Overall I consider GPT-4 to be vastly superior to the average twelfth-grader in general purpose argumentative debating, when prompted correctly.
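For anyone curious how such a multi-speaker setup can be wired together, here's a minimal sketch of the turn-taking loop. The `call_llm` function below is a hypothetical stand-in for whatever chat-completion API you use (it is not a real library call); each "speaker" carries its own system prompt while sharing one transcript.

```python
# Sketch of a two-persona debate loop. call_llm is a placeholder:
# a real version would send the system prompt plus transcript to a
# chat model and return its reply.

def call_llm(system_prompt, transcript):
    # Stubbed reply so the loop structure can be run standalone.
    speaker = system_prompt.split(":")[0]
    return f"[{speaker} responds to {len(transcript)} prior turns]"

def run_debate(resolution, personas, rounds=3):
    """Alternate speakers each round; everyone sees the full transcript."""
    transcript = [f"Resolution: {resolution}"]
    for _ in range(rounds):
        for name, system_prompt in personas:
            reply = call_llm(system_prompt, transcript)
            transcript.append(f"{name}: {reply}")
    return transcript

personas = [
    ("AFF", "AFF: You argue for the resolution, Lincoln-Douglas style."),
    ("NEG", "NEG: You argue against the resolution, Lincoln-Douglas style."),
]
log = run_debate("Mice are better aligned with reality than humans",
                 personas, rounds=2)
```

With distinct system prompts per persona, each model call stays "in character" while reacting to the shared history, which is what makes the brainstorming interesting.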

Hopefully this is constructive and helps people get back to the basics -- questioning human-centric thinking, trying to understand what alien intelligence may look like, and how it may challenge entrenched human biases!

One of the things I think about a lot, and ask my biologist/anthropologist/philosopher friends, is: what does it take for something to be actually recognised as human-like by humans? For instance, I see human-like cognition and behaviour in most mammals, but this seems to be resisted almost by instinct by my human friends who insist that humans are superior and vastly different. Why don't we have a large appreciation for anthill architecture, or whale songs, or flamingo mating dances? These things all seem human-like to me, but are not accepted as forms of "fine art" by humans. I hypothesize that we may be collectively underestimating our own species-centrism, and are in grave danger of doing the same with AI, by either under-valuing superior AI as not human enough, or by over-valuing inferior AI with human-like traits that are shortcomings more than assets.

How do we prove that an AI is "human-like"? Should that be our goal? Given that we have a fairly limited knowledge of the mechanics of human/mammalian cognition, and that humans seem to have a widespread assumption that it's the most superior form of cognition/intelligence/behaviour we (as a species) have seen?

Hi, I have a few questions that I'm hoping will help me clarify some of the fundamental definitions. I totally get that these are problematic questions in the absence of consensus around these terms -- I'm hoping to have a few people weigh in and I don't mind if answers are directly contradictory or my questions need to be re-thought.

  • If it turns out that LLMs are a path to the first "true AGI" in the eyes of, say, the majority of AI practitioners, what would such a model need to be able to do, and at what level, to be considered AGI, that GPT-4 can't currently do?
  • If we look at just "GI", rather than "AGI", do any animals have GI? If so, where is the cutoff between intelligent and unintelligent animals? Is GPT-4 considered more or less intelligent than, say, an amoeba, or a gorilla, or an octopus etc?
  • When we talk about "alignment", are today's human institutions and organisations aligned to a lesser or greater extent than we would want AGI aligned? Are religions, governments, corporations, organised crime syndicates, serial killers, militaries, fossil fuel companies etc considered aligned, and to what extent? Does the term alignment have meaning when talking about individuals, or groups of humans? Does it have meaning for animal populations, or inanimate objects?
  • If I spoke to a group of AI researchers of the pre-home-computer era, and described a machine capable of producing poetry, books, paintings, computer programs, etc in such a way that the products could not be distinguished from top 10% human examples of those things, that could pass entrance exams into 90% of world universities, could score well over 100 in typical IQ tests of that era, and could converse in dozens of languages via both text and voice, would they consider that to meet some past definition of AGI? Has AGI ever had enough of a consensus definition for that to be a meaningful question?
  • If we peer into the future, do we expect commodity compute power and software to get capable enough for an average technical person on an average budget to build and/or train their own AGI from the "ground up"? If so, will there not be a point where a single rogue human or small informal group can intentionally build non-aligned unsafe AGI? And if so, how far away is that point?

Apologies if these more speculative/thought-experimenty questions are off the mark for this thread, happy to be pointed to a more appropriate place for them!

Very interesting points, if I was still in middle management these things would be keeping me up at night!

One point I query is "this is a totally new thing no manager has done before, but we're going to have to figure it out" -- is it that different from the various types of tool introduction & distribution / training / coaching that managers already do? I've spent a good amount of my career coaching my teams on how to be more productive using tools, running team show-and-tells from productive team members on why they're productive, sending team members on paid training courses, designing rules around use of internal tools like Slack/Git/issue trackers/intranets etc... and it doesn't seem that different to figuring out how to deploy LLM tools to a team. But I'm rusty as a manager, and I don't know what future LLM-style tools will look like, so I could be thinking about this incorrectly. Certainly if I had a software team right now, I'd be encouraging them to use existing tools like LLM code completion, automated test writing, proof-reading etc., and encouraging early adopters to share their successes & failures with such tools.

Does "no manager has done before" refer to specific LLM tools, and is there something fundamentally different about them compared to past new technologies/languages/IDEs etc?

It would be pretty nuts if you rewarded it for being able to red-team itself -- like, you'd be deliberately training it to go off the rails, and I thiiiiink that would seem so even to non-paranoid people? Maybe I'm wrong.

I'm actually most alarmed on this vector, these days. We're already seeing people giving LLMs completely untested toolsets - web, filesystem, physical bots, etc - and "friendly" hacks like Reddit jailbreaks and ChaosGPT. Doesn't it seem like we are only a couple of steps away from a bad actor producing an ideal red-team agent, and then abusing it rather than using it to expose vulnerabilities?

I get the counter-argument, that humans already are diverse and try a ton of stuff, and so resilient systems are a result... but peering into the very near future, I fear that those arguments simply won't apply to super-human intelligence, especially when combined with bad human actors directing those.

Seeing this frantic race from random people to give GPT-4 dangerous tools and walking-around-money, I agree: the risk is massively exacerbated by giving the "parent" AIs to humans.

Upon reflection, should that be surprising? Are humans "aligned" in the way we would want AI to be aligned? If so, we must acknowledge the fact that humanity regularly produces serial killers and terrorists (etc). Doesn't seem ideal. How much more aligned can we expect a technology we produce to be, vs our own species?

If we view the birth of AGI as the birth of a new kind of child, to me, there really is no regime known to humanity that will guarantee that child will not grow up to become an evil monster: we've been struggling with that question for millennia as humans. One thing we definitely have found is that super-evil parents are way better than average at producing super-evil children, but sometimes it seems like super-evil children just come into being, despite their parents. So a super-evil person controlling/training/scripting an AI is to me a huge risk factor, but so are the random factors that created super-evil humans despite good parents. So IMO the super-evil scammer/script kiddie/terrorist is the primary (but not only) risk factor when opening access to these new models.

I'm coming around to this argument that it's good right now that people are agent-ifying GPT-4 and letting it have root access, try to break CAPTCHAs, speak to any API etc, because that will be the canary in the coal mine -- I just hope that the canary in the coal mine will give us ample notice to get out of the mine!

The Doomsday Clock is at 23:58:30, but maybe that's not what you meant? I think they were way off in the Cuban Missile Crisis era, but these days it seems more accurate and maybe more optimistic than I would give it. They do accommodate x-risk of various types.
