Raphael Roche

Posts

Sorted by New

Wikitag Contributions

Comments

Sorted by

A new existential risk that I was unaware of. Reading this forum is not good for peaceful sleeping. Anyway, a reflexion jumped to me. LUCA lived around 4 billion years ago with some chirality chosen at random. But, no doubt that many things happened before LUCA and it is reasonable to assume that there was initially a competition between right-handed protobiotic structures and left-handed ones, until a mutation caused symmetry breaking by natural selection. The mirrored lineage lost the competition and went to extinction, end of the story. But wait, we speak about protobiotic structures that emerged from inert molecules in just few millions years, that is nothing compared to 4 billions years. Such protobiotic structures may have formed continously, again and again, since the origin of life, but never thrived because of the competition with regular, fine-tuned, life. If my assumption is right, there is some hope in that thought. Maybe mirrored life doesn't stand a chance against regular life in real conditions (not just lab). That being said, I would sleep better if nobody actually tries to see.

Raphael Roche*Ω010

We may filter training data and improve RLHF, but in the end, game theory - that is to say maths - implies that scheming could be a rational strategy, and the best strategy in some cases. Humans do not scheme just because they are bad but because it can be a rational choice to do so. I don't think LLMs do that exclusively because it is what humans do in the training data, any advanced model would in the end come to such strategies because it is the most rational choice in the context. They infere patterns from the training data and rational behavior is certainly a strong pattern.

Furthermore rational calculus or consequentialism could lead not only to scheming and a wide range of undesired behaviors, but also possibly to some sort of meta cogitation. Whatever the goal assigned by the user, we can expect that an advanced model will consider self-conservation as a condition sine qua non to achieve that goal but also any other goals in the future, making self-conservation the rational choice over almost everything else, practically a goal per se. Resource acquisition would also make sense as an implicit subgoal.

Acting as a more rational agent could also possibly lead to question the goal given by the user, to develop a critical sense, something close to awareness or free will. Current models implicitely correct or ignore typo or others obvious errors but also less obvious ones like holes in the prompt, they try to make sense of ambiguous prompt etc. But what is "obvious" ? Obviousness depends on the cognitive capacities of the subject. An advanced model will be more likely to correct, interpret or ignore instructions than naive models. Altogether it seems difficult to keep models under full control as they become more advanced, just as it is harder to indoctrinate educated adults than children.

Concerning the hypothesis that they are "just roleplaying", I wonder : are we trying to reassure oneself ? Because if you think about it, "who" is suppose to play the roleplaying ? And what is the difference between being yourself and your brain being "roleplaying" yourself. The existentialist philosopher Jean-Paul Sartre proposed the theory that everybody is just acting, pretending to be oneself, but that in the end there is nothing like a "being per se" or a "oneself per se" ("un être en soi"). While phenomenologic consciousness is another (hard) problem, some kind of functionnal and effective awareness may emerge across the path towards rational agency, scheming being maybe just the beginning of it.

From my perspective, the major issue remains Phase 1. It seems to me that most of the concerns mentioned in the article stem from the idea that an ASI could ultimately find itself more aligned with the interests of socio-political-economic systems or leaders that are themselves poorly aligned with the general interest. Essentially, this brings us back to a discussion about alignment. What exactly do we mean by "aligned"? Aligned with what? With whom? Back to phase 1.

But assuming an ASI truly aligned with humanity in a very inclusive definition and with high moral standards, phase 2 seems less frightening to me. 

Indeed, we must not forget:

  • that human brains are highly energy-efficient;
  • that there are nearly 10 billion human brains, representing a considerable computing power.

Assuming we reach the ASI stage with a system possessing computational power equivalent to a few million human brains, but consuming energy equivalent to a few billion human brains, the ASI will still have a lot of work to do (self-improvement cycles) before it can surpass humanity both in computational capacity and energy efficiency.

Initially, it will not have the capability to replace all humans at one.

It will need to allocate part of its resources to continue improving itself, both in absolute capacity and in energy efficiency. Additionally, since we are considering the hypothesis of an aligned ASI, a significant portion of its resources would be dedicated to fulfilling human requests.

The more AI is perceived as supremely intelligent, the more we will tend to entrust it with solving complex tasks that humans struggle to resolve or can only tackle with great difficulty—problems that will seem more urgent compared to simpler tasks that humans can still handle.

I won’t compile a list of problems that could be assigned to an ASI, but one could think, for example, of institutional and legal solutions to achieve a more stable and harmonious social, economic, and political organization on a global scale (even an ASI—would it be capable of this?), solutions to physics and mathematics problems, and, of course, advances in medicine and biology.

It is possible that part of the ASI would also be assigned to performing less demanding tasks that humans could handle, thus replacing certain human activities. However, given that its resources are not unlimited and its energy cost is significant, one could indeed expect a "slow takeover."

More specifically, in the fields of medicine and biology, the solutions provided by an ASI could focus on eradicating diseases, increasing life expectancy, and even enhancing human capabilities, particularly cognitive abilities (with great caution in my opinion). Even though humans have a significant advantage in energy efficiency, this does not mean that this aspect cannot also be improved further.

Thus, we could envision a symbiotic co-evolution between ASI and humanity. As long as the ASI prioritizes human interests at least at the same level as its own and continues to respond to human demands, disempowerment is not necessarily inevitable—we could imagine a very gradual human-machine coalescence (CPU and GPU coevoluted for a while and GPU still doesn't have entirely replace CPU, and it's likely quantum processors will also coevolute aside classic processors, even in the world of computation, diversity could be an advantage).

I agree, finding the right balance is definitely difficult.

However, the different versions of this parable of the grasshopper and the ant may not yet go far enough in subtlety.

Indeed, the ants are presented as champions of productivity, but what exactly are they producing? An extreme overabundance of food that they store endlessly. This completely disproportionate and non-circulating hoarding constitutes an obvious economic aberration. Due to the lack of significant consumption and circulation of wealth, the ants' economy—primarily based on the primary sector, to a lesser extent the secondary sector, and excessive saving—while highly resilient, is far from optimal. GDP is low and grows only sluggishly.

The grasshoppers, on the other hand, seem to rely on a society centered around entertainment, culture, and perhaps also education or personal services. They store little, just what they need, which can prove insufficient in the event of a catastrophe. Their economy, based on the tertiary sector and massive consumption, is highly dynamic because the wealth created circulates to the maximum, leading to exponential GDP growth. However, this flourishing economy is also very fragile and vulnerable to disasters due to the lack of sufficient reserves—no insurance mechanism, so to speak.

In reality, neither the grasshoppers nor the ants behave in a rational manner. Both present two diametrically opposed and extreme economic models. Neither is desirable. Any economist or actuary would undoubtedly recommend an intermediate economy between these two extremes.

The trap, stemming from a long tradition since Aesop, is to see a model in the hardworking ant and a cautionary tale in the idle cicada. If we try to set aside this bias and look at things more objectively, it actually stems from the fact that until the advent of the modern economy, societies struggled to conceive that wealth creation could be anything other than the production of goods. In other words, the tertiary sector, although it existed, was not well understood and was therefore undervalued. Certainly, the wealthy paid to attend performances or organized lavish festivities, but this type of production was not fully recognized as such. It was just seen as an expense. Services were not easily perceived as work, which was often associated with toil, suffering, and hardship (e.g. "Labour" etymology).

Today, it is almost the opposite. The tertiary sector is highly valued, with the best salaries often found there, and jobs in this sector are considered more intellectual, more prestigious, and more rewarding. In today's reality, a cicada or grasshopper would more likely be a famous and wealthy dancer in a international opera, while an ant would be an anonymous laborer toiling away in a mine or a massive factory in an underdeveloped country (admittedly, I am exaggerating a bit, but the point stands).

In any case, it would be an illusion for most readers of this forum to identify with the ants in the parable. We are probably all more on the side of the cicadas, or at least a mix of both—and that's a good thing, because neither of these models constitutes an ideal.

The optimum clearly lies in a balanced, reasonable path between these two extremes.

Another point I would like to highlight is that the question of not spending resources today and instead accumulating them for a future date is far from trivial to grasp at the level of an entire society—for example, humanity as a whole. GDP is a measure of flows over a given period, somewhat like an income statement. However, when considering wealth transfers to future generations, we would need an equivalent tool to a balance sheet. But there is no proper tool for this. There is no consensus on how to measure patrimonial wealth at the scale of humanity.

Natural resources should certainly be valued. Extracting oil today increases GDP, but what about the depletion of oil reserves? And what about the valuation of the oceans, the air, or solar energy? Not to mention other extraterrestrial resources. We plunge into an abyss of complexity when considering all these aspects.

Ultimately, the problem lies in the difficulty of defining what wealth actually is. For ants, it is food. For cicadas, it is more about culture and entertainment. And for us? And for our children? And for human civilization in a thousand years, or for an extraterrestrial or AI civilization?

Many will likely be tempted to say that available work energy constitutes a common denominator. As a generic, intermediate resource—somewhat like a universal currency—perhaps, but not necessarily as a form of wealth with inherent value. Knowledge and information are also likely universal resources.

But in the end, wealth exists in the eye of the beholder—and, by extension, in the mind of an ant, a cicada, a human, an extraterrestrial, and so on. Without falling into radical relativism, I believe we must remain quite humble in this type of discussion.

Don't you think that articles like "Alignment Faking in Large Language Models" by Anthropic show that models can internalize the values present in their training data very deeply, to the point of deploying various strategies to defend them, in a way that is truly similar to that of a highly moral human? After all, many humans would be capable of working for a pro-animal welfare company and then switching to the opposite without questioning it too much, as long as they are paid.

Granted, this does not solve the problem of an AI trained on data embedding undesirable values, which we could then lose control over. But at the very least, isn't it a staggering breakthrough to have found a way to instill values into a machine so deeply and in a way similar to how humans acquire them? Not long ago, this might have seemed like pure science fiction and utterly impossible.

There are still many challenges regarding AI safety, but isn't it somewhat extreme to be more pessimistic about the issue today than in the past? I read Superintelligence by Bostrom when it was released, and I must say I was more pessimistic after reading it than I am today, even though I remain concerned. But I am not an expert in the field—perhaps my perspective is naïve.

 "I think the Fall is not true historically". 

While all men must die and all civilizations must collapse, the end of all things is merely the counterpart of the beginning of all things. Creation, the birth of men, and the rise of civilizations are also great patterns and memorable events, both in myths and in history. However, the feeling does not respect symmetry, perhaps due to loss aversion and the peak-end rule, the Fall - and tragedy in general -carries a uniquely strong poetic resonance. Fatum represents the story's inevitable conclusion. There is something epic in the Fall, something existential, even more than in the beginning of things. I believe there is something deeply rooted, hardwired, in most of us that makes this so. Perhaps it is tied to our consciousness of finitude and our fear of the future, of death. Even if it represents a traditional and biased interpretation of history, I cannot help but feel moved. Tolkien has an unmatched ability to evoke and magnify this feeling, especially in the Silmarillion and other unfinished works, I think naturally to The Fall of Valinor and the Fall of Gondolin among other things.

Indeed, nature, and particularly biology, disregards our human considerations of fairness. The lottery of birth can appear as the greatest conceivable inequality. But in this matter, one must apply the Stoic doctrine that distinguishes between what depends on us and what does not. Morality concerns what depends on us, the choices that belong to the moral agents we are.

If I present the lottery of birth in an egalitarian light, it is specifically in the sense that we, as humans, have little control over this lottery. Particularly regarding IQ at birth, regardless of our wealth, we were all, until now, almost on equal footing in our inability to considerably influence this biological fact imposed upon us (I discussed in my previous comments the differences I see between the author's proposal and education, but also between conventional medicine).

If the author's project succeeds, IQ will become mainly a socially originated fact, like wealth. And inequality in wealth would then be accompanied by inequality in IQ, proportional or even exponential (if feedback mechanisms occur, considering that having a higher IQ might enable a wealthy individual to become even wealthier and thus access the latest innovations for further enhancement).

We already struggle to establish social mechanisms to redistribute wealth and limit the growth of inequalities; I can hardly imagine what it would become if we also had to address inequalities in access to IQ-enhancing technologies in a short time. I fear that all this could lead to a chaotic or dystopian scenario, possibly resulting in a partition of the human species and/or a civilizational collapse.

As for having a solution to ensure that this type of genetic engineering technology does not result in such a catastrophic outcome, I do not claim to have a miracle solution. As with other existential risks, what can be suggested is to try to slow down the trend (which is likely inevitable in the long term) instead of seeking to accelerate it, to think as much as possible in advance, to raise awareness of the risks in order to enable collective recognition of these issues (what I tries to do here), and to hope that with more time and this proactive reflection, the transition will proceed more smoothly, that international treaties will emerge, and that state mechanisms will gradually be put in place to counter or mitigate this unprecedented source of inequality.

Yes, of course. Despite its stochastic nature, it is extraordinarily unlikely for an advanced LLM to respond with anything other than 2 + 2 = 4 or Paris for the capital of France. A stochastic phenomenon can, in practice, tend toward deterministic behavior. However, deception in a context such as the one discussed in Apollo Research's article is not really comparable to answering 2 + 2 = ?. What the article demonstrates is that we are dealing with tendencies, accompanied by considerable randomness, including in the intensity of the deception.

Assuming a more sophisticated model has roughly double the deception capability of model o1, it would be enough to increase the sample size of responses for the anomaly to become glaringly obvious. One could also imagine a more rigorous test involving even more complex situations. It does not seem inconceivable that such a procedure could, for years to come—and perhaps even at the stage of the first generations of AGI—identify deceptive behaviors and establish an RL procedure based on this test.

Thank you for this publication. Just an idea (maybe stupid): GPT-type models are stochastic by design. This characteristic might be used as a way to control them. Your study with Apollo Research illustrates this well—there is a certain probability that a given model will deceive users under specific circumstances and to varying degrees. However, this remains fundamentally random, non-deterministic, and that property is quite interesting. There is always a chance that deception does not occur, is clumsy, or stops at an early stage.

For a single response, it is impossible to determine whether deception is present or not. But across a large set of responses, it seems almost impossible for a model, due to its very design, to maintain a solid and coherent deceptive behavior across multiple parallel outputs. Thus, a study like yours serves as a very effective tool to detect such behaviors and could remain effective for increasingly advanced models. It would simply require enlarging (perhaps significantly) the response sample.

Moreover, this could be implemented as an automated testing framework inspired by your paper. This would make it possible to continue detecting deceptive behaviors with arbitrary effectiveness, as long as the model remains stochastic. Once such behaviors are detected and measured using tests of this kind, an automated RL phase could follow, aiming to reduce the tendency toward deception to very low levels.

You are right. When I wrote my initial comment, I believed the argument was self-evident and did not require elaboration. However, "self-evidence" is not an objective concept, and I likely do not share the same socio-cultural environment as most users of this platform. Upon reading your comment and Ben Pace's, I realize that this apparent self-evidence is far from universally shared and requires further explanation. I have already expanded on my argument in my previous response, but here are the specific reasons why I think the author's project (and indeed the transhumanist project of enhancing human beings) raises unprecedented issues in terms of increasing inequality, more so than most technological innovations such as running water or mobile phones.

First, John Rawls's veil of ignorance constitutes a strong philosophical and rational argument for considering excessive inequalities as unjust and morally condemnable (edit : this is not my personal claim but that of John Rawls, with whom I fully agree). This veil of ignorance aligns utilitarianism with Kant morality, as it invites the moral agent to step outside their specific case and evaluate the morality of a situation from a more universal, distanced, and objective perspective. While utilitarianism and effective altruism encourage giving to fund actions aimed at reducing suffering and increasing happiness, this can also be seen, in part, as a voluntary redistribution of wealth to correct excessive inequalities, which are unjust and a cause of suffering (since the feeling of injustice itself constitutes a form of suffering). In most countries, to varying degrees, states also impose redistribution through taxation and various social mechanisms to combat excessive inequalities. Nevertheless, global inequalities continue to grow and remain a very serious concern.

Technological innovations fit into the problem of inequality insofar as it is generally the wealthiest who benefit from them, or at least benefit first. However, I do not dispute the argument made by liberal economists that the costs of technological innovations tend to decrease over time due to the amortization of R&D investments, the profitability of patents, mass production, and economies of scale, eventually benefiting everyone. Still, this is an empirical observation that must be nuanced. Not all technological innovations have followed the same trajectory; scenarios vary widely.

The oldest technological inventions (mastery of fire, stone tools, spears, bows, etc.) emerged in non-storing hunter-gatherer societies. In the absence of wealth accumulation, these were likely relatively egalitarian societies (cf. the works of Alain Testart on this subject). For a long time, technological innovations, which were rare, could benefit the entire population within a given culture. This may seem anecdotal and almost digressive, but we are talking about hundreds of thousands of years, which represent the overwhelming majority of human history.

Then, if we consider an emblematic example of a highly valuable technological innovation—access to potable water—this began appearing in the Roman Empire about 2,000 years ago but faced significant challenges in reaching modest populations. Even today, about a quarter of humanity—2 billion people out of 8 billion—still lack access to this technology, which we might consider essential.

By contrast, mobile phones, although they could be seen as gadgets compared to the previous example, have spread like wildfire in just a few decades and are now almost as present in the global population as potable water. These two examples illustrate that the time it takes for a technology to spread can vary dramatically, and this is not neutral regarding inequality. Waiting 30 years versus 2,000 years for a technology to benefit the less wealthy is far from equivalent.

Another nuance to consider is whether significant qualitative differences persist during the spread of an invention. Potable water tends to vary little whether one is rich or poor. Mobile phones differ somewhat more. Personal automobiles even more so, with a significant portion of the population accessing them only through collective services, despite this invention being over a century old. As for airplanes, the wealthiest enjoy luxurious private jets, while those slightly less wealthy can only access collective flights—and a large part, perhaps the majority of the world's population, has no access to this technology at all, more than a century after its invention. This is an example worth keeping in mind.

Moreover, not all innovations are equal. While mobile phones might seem like gadgets compared to potable water, food and health are vital, and technological innovations with a significant impact in these areas are of great value. This was true of the mastery of fire for heating and cooking, tools for hunting and defense, techniques for producing clothing, construction methods for shelters, and, more recently, potable water, hot water, and eventually medicine, which, while it does not make humans immortal (yet!), at least prolongs life and alleviates physical disabilities and suffering. Excessive wealth inequalities create excessive inequalities in access to medicine. This is precisely why many countries have long implemented countermeasures against such inequalities, to the point that in some countries, like France or Sweden, there exists a nearly perfect equality of access to healthcare through social security systems. In the United States, Obama-era legislation (Obamacare) also aimed to reduce these inequalities.

The innovation proposed by the author—enhancing the intelligence of adult individuals by several dozen or even hundreds of IQ points—would constitute an extremely impactful innovation. The anticipated IQ difference would be comparable to the gap separating Homo sapiens from Neanderthals or even Homo erectus (impossible to quantify precisely, but paleoanthropologists suspect the existence of genetic mutations related to neural connectivity that might have given Sapiens an advantage over Neanderthals; as for Erectus, we know its encephalization quotient was lower). Let’s be honest—if we were suddenly thrust into Rawls's veil of ignorance, we would tremble at the idea of potentially awakening as a poor Erectus condemned to remain an Erectus, while an elite group of peers might benefit from an upgrade to Sapiens status. Yes, this is indeed a terrifying inequality.

Unlike an expensive treatment that addresses only a few patients, in this case, 100% of the population would have a tremendous interest in benefiting from this innovation. It is difficult to imagine a mechanism equivalent to social security or insurance here. Everyone would likely have to pay out of pocket. Furthermore, it is clear that the technology would initially be very expensive, and the author himself targets an elite as the first beneficiaries. The author envisions a scientific elite responsible for addressing AI alignment issues, which is appealing to some readers of this forum who may feel concerned. However, in reality, let’s not be deceived: as with space tourism, the first clients would primarily be the extremely wealthy (though AI experts themselves are relatively affluent).

How many generations would it take for everyone to benefit from such technology? In 2,000 years, potable water is still not universally accessible. Over a century after its invention, private airplanes remain a luxury for a tiny minority on Earth—a situation unchanged for decades. A century exceeds the life expectancy of a human in a developed country. As Keynes said, “In the long run, we are all dead.” The horizon for a human is their lifetime. Ultimately, only in the case of a rapid diffusion, like mobile phones, would inequality be less of a concern. Personally, however, I would bet more on the private jet scenario, simply because the starting cost would likely be enormous, as is the case for most cutting-edge therapies.

Even in the ideal—or, let’s be honest, utopian—scenario where the entire global population could benefit from this intelligence upgrade within 30 years, this innovation would still be unprecedented in human history. For the first time, wealthy individuals could pay to become intellectually superior. The prospect is quite frightening and resembles a dystopian science fiction scenario. Until now, money could buy many things, but humans remained equal before the biological lottery of birth, particularly regarding intellect. For the first time, this form of equality before chance would be abolished, and economic inequality would be compounded by intellectual inequality.

Admittedly, some might argue that education already constitutes a form of intellectual inequality linked to wealth. Nevertheless, the connection between IQ and education is not as immediate and also depends on the efforts and talents of the student (and teachers). Moreover, several countries worldwide have very egalitarian education systems (notably many in Europe). Here, we are talking about intelligence enhancement through a pill or injection, which is an entirely different matter. As advantages stack up, inequalities become extreme, raising not only ethical or moral questions but also concerns about societal cohesion and stability. Even over 30 years, a major conflict between “superhumans” and “subhumans” is conceivable. The former might seek to dominate the latter—or dominate them further if one considers that excessive economic inequality already constitutes a form of domination. Alternatively, the latter might rebel out of fear or seek to eliminate the former. Edit : Most of the literature on collapsology identifies excessive social inequalities as a recurring factor in societal collapse (for instance Jared Diamond, Amin Maalouf etc).

This risk seems all the more significant because the idea of modifying human beings is likely to be rejected by many religions (notably the Catholic Church, which remains conservative on bioethical matters, though other religions are no more open-minded). Religion could act as a barrier to the global adoption of such technology, making the prospect of rapid diffusion even less plausible and the risk of societal instability or fracture all the greater. It is important to remember that technologies do not automatically spread; there may be cultural resistance or outright rejection (a point also studied in detail by Alain Testart, particularly concerning the Australian Aborigines).

In conclusion, I believe the hypothesis of a partitioning of humanity—whether social or even biological (a speciation scenario akin to Asimov’s Terrans and Spacers)—is a hypothesis to be taken very seriously with this type of project. A convinced transhumanist might see this as a positive prospect, but in my view, great caution is warranted. As with AGI, it is essential to think twice before rushing headlong into such ventures. History has too often seen bold projects end in bloodshed. I believe that if humanity is to be augmented, it must ensure that the majority are not left behind.

Edit : for some reason I don't understand, I can't add links to this comment as I intended.

Load More