

So why do people have more trouble thinking that people could understand the world through pure vision than pure text? I think people's different treatment of these cases- vision and language- may be caused by a poverty of stimulus- overgeneralizing from cases in which we have only a small amount of text. It's true that if I just tell you that all qubos are shrimbos, and all shrimbos are tubis, you'll be left in the dark about all of these terms, but that intuition doesn't necessarily scale up into a situation in which you are learning across billions of instances of words and come to understand their vastly complex patterns of co-occurrence with such precision that you can predict the next word with great accuracy.

GPT cannot "predict the next word with great accuracy" for arbitrary text, the way that a physics model can predict the path of a falling or orbiting object for arbitrary objects.  For example, neither you nor any language model (including future language models, unless they have training data pertaining to this LessWrong comment) can predict that the next word, or following sequence of words making up the rest of this paragraph, will be:

 first, a sentence about what beer I drank yesterday and what I am doing right now - followed by some sentences explicitly making my point.  The beer I had was Yuengling and right now I am waiting for my laundry to be done as I write this comment.  It was not predictable that those would be the next words because the next sequence of words in any text is inherently highly underdetermined - if the only information you have is the prompt that starts the text.  There is no ground truth, independent of what the person writing the text intends to communicate, about what the correct completion of a text prompt is supposed to be.

Consider a kind of naive empiricist view of learning, in which one starts with patches of color in a field (vision), and slowly infers an underlying universe of objects through their patterns of relations and co-occurrence. Why is this necessarily any different or more grounded than learning by exposure to a vast language corpus, wherein one also learns through gaining insight into the relations of words and their co-occurrences?

Well one thing to note is that actual learning (in humans at least) does not only involve getting data from vision, but also interacting with the world and getting information from multiple senses.

But the real reason I think the two are importantly different is that visual data about the world is closely tied to the way the world actually is - in a simple, straightforward way that does not require any prior knowledge about human minds (or any minds or other information processing systems) to interpret.  For example, if I see what looks like a rock, and then walk a few steps and look back and see what looks like the other side of the rock, and then walk closer and it still looks like a rock, the most likely explanation for what I am seeing is that there is an actual rock there.  And if I still have doubts, I can pick it up and see if it feels like a rock or drop it and see if it makes the sound a rock would make.  The totality of the data pushes me towards a coherent "rock" concept and a world model that has rocks in it - as this is the simplest and most natural interpretation of the data.

By contrast, there is no reason to think that humans having the type of minds we have, living in our actual world, and using written language for the range of purposes we use it for is the simplest, or most likely, or most easily-converged-to explanation for why a large corpus of text exists.

From our point of view, we already know that humans exist and use language to communicate and as part of each human's internal thought process, and that large numbers of humans over many years wrote the documents that became GPT's training data.

But suppose you were something that didn't start out knowing (or having any evolved instinctive expectation) that humans exist, or that minds or computer programs or other data-generating processes exist, and you just received GPT's training data as a bunch of meaningless-at-first-glance tokens.  There is no reason to think that building a model of humans and the world humans inhabit (as opposed to something like a markov model or a stochastic physical process or some other type of less-complicated-than-humans model) would be the simplest way to make sense of the patterns in that data.

The heuristic of "AIs being used to do X won't have unrelated abilities Y and Z, since that would be unnecessarily complicated" might work fine today but it'll work decreasingly well over time as we get closer to AGI. For example, ChatGPT is currently being used by lots of people as a coding assistant, or a therapist, or a role-play fiction narrator -- yet it can do all of those things at once, and more. For each particular purpose, most of its abilities are unnecessary. Yet here it is.

For certain applications like therapist or role-play fiction narrator - where the thing the user wants is text on a screen that is interesting to read or that makes him or her feel better to read - it may indeed be that the easiest way to improve user experience over the ChatGPT baseline is through user feedback and reinforcement learning, since it is difficult to specify what makes a text output desirable in a way that could be incorporated into the source code of a GPT-based app or service.  But the outputs of ChatGPT are also still constrained in the sense that it can only output text in response to prompts.  It can not take action in the outside world, or even get an email address on its own or establish new channels of communication, and it can not make any plans or decisions except when it is responding to a prompt and determining what text to output next.  So this limits the range of possible failure modes.

I expect things to become more like this as we approach AGI. Eventually as Sam Altman once said, "If we need money, we'll ask it to figure out how to make money for us." (Paraphrase, I don't remember the exact quote. It was in some interview years ago).

It seems like it should be possible to still have hard-coded constraints, or constraints arising from the overall way the system is set up, even for systems that are more general in their capabilities.

For example, suppose you had a system that could model the world accurately and in sufficient detail, and which could reason, plan, and think abstractly - to the degree where asking it "How can I make money?" results in a viable plan - one that would be non-trivial for you to think of yourself and which contains sufficient detail and concreteness that the user can actually implement it.  Intuitively, it seems that it should be possible to separate plan generation from actual in-the-world implementation of the plan.  And an AI system that is capable of generating plans that it predicts will achieve some goal does not need to actually care whether or not anyone implements the plan it generates.

So if the output for the "How can I make money?" question is "Hack into this other person's account (or have an AI hack it for you) and steal it.", and the user wants to make money legitimately, the user can reject the plan and ask instead for a plan on how to make money legally.

I have an impression that within lifetime human learning is orders of magnitude more sample efficient than large language models


Yes, I think this is clearly true, at least with respect to the number of word tokens a human must be exposed to in order to obtain full understanding of one's first language.

Suppose for the sake of argument that someone encounters (through either hearing or reading) 50,000 words per day on average, starting from birth, and that it takes 6000 days (so about 16 years and 5 months) to obtain full adult-level linguistic competence (I can see an argument that full linguistic competence happens years before this, but I don't think you could really argue that it happens much after this).

This would mean that the person encounters a total of 300,000,000 words in the course of gaining full language understanding.  By contrast, the training data numbers I have seen for LLMs are typically in the hundreds of billions of tokens.
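As a rough check on the arithmetic above, here is a minimal sketch. The human-side numbers are the assumptions stated in this comment; the LLM figure of 300 billion tokens is only an illustrative placeholder for "hundreds of billions" and is not a number for any particular model. Words and tokens are also not the same unit, so the ratio is an order-of-magnitude estimate at best.

```python
# Back-of-the-envelope comparison of language exposure.
WORDS_PER_DAY = 50_000       # assumed daily average from the estimate above
DAYS_TO_COMPETENCE = 6_000   # about 16 years and 5 months

human_words = WORDS_PER_DAY * DAYS_TO_COMPETENCE

# Illustrative placeholder for "hundreds of billions of tokens";
# not a figure for any specific model.
LLM_TOKENS = 300_000_000_000

ratio = LLM_TOKENS / human_words

print(f"Human exposure: {human_words:,} words")
print(f"LLM tokens per human word: {ratio:,.0f}")
```

Under these assumptions the gap is about three orders of magnitude, and it would be larger still for anyone who reaches fluency on fewer words per day.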

And I think there is evidence that humans can obtain linguistic fluency with exposure to far fewer words/tokens than this.

Children born deaf, for example, can only be exposed to a sign-language token when they are looking at the person making the sign, and thus probably get exposure to fewer tokens by default than hearing children who can overhear a conversation somewhere else, but they can still become fluent in sign language.

Even just considering people whose parents did not talk much and who didn't go to school or learn to read, they are almost always able to acquire linguistic competence (except in cases of extreme deprivation).

Early solutions. The most straightforward way to solve these problems involves training AIs to behave more safely and helpfully. This means that AI companies do a lot of things like “Trying to create the conditions under which an AI might provide false, harmful, evasive or toxic responses; penalizing it for doing so, and reinforcing it toward more helpful behaviors.”

This is where my model of what is likely to happen diverges.

It seems to me that for most of the types of failure modes you discuss in this hypothetical, it will be easier and more straightforward to avoid them by simply having hard-coded constraints on what the output of the AI or machine learning model can be.

  • AIs creating writeups on new algorithmic improvements, using faked data to argue that their new algorithms are better than the old ones. Sometimes, people incorporate new algorithms into their systems and use them for a while, before unexpected behavior ultimately leads them to dig into what’s going on and discover that they’re not improving performance at all. It looks like the AIs faked the data in order to get positive feedback from humans looking for algorithmic improvements.

Here is an example of where I think the hard-coded structure of any such Algorithm-Improvement-Writeup-AI could easily rule out that failure mode (if such a thing can be created within the current machine learning paradigm).  The component of such an AI system that generates the paper's natural language text might be something like a GPT-style language model fine-tuned for prompts with code and data.  But the part that actually generates the algorithm should naturally be a separate model that can only output algorithms/code that it predicts will perform well on the input task.  Once the algorithm (or several, for comparison purposes) is generated, another part of the program could deterministically run it on test cases and record only the real performance as data - which could be passed into the prompt and also inserted as a data table into the final writeup (so that the data table in the finished product can only include real data).
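A minimal sketch of that separation, to make the idea concrete. Everything here is hypothetical: the function names are mine, the two "candidate algorithms" are trivial stand-ins for model-generated code, and the prose string stands in for language-model output. The point is only that the results table is produced by ordinary deterministic code and appended verbatim, so the text-generating component has no way to alter the numbers.

```python
from typing import Callable, Dict, List, Tuple

TestCase = Tuple[List[int], List[int]]  # (input, expected output)


def measure_accuracy(algorithm: Callable[[List[int]], List[int]],
                     test_cases: List[TestCase]) -> float:
    """Run the algorithm on every test case and return the fraction it gets right."""
    correct = sum(1 for inputs, expected in test_cases
                  if algorithm(inputs) == expected)
    return correct / len(test_cases)


def build_results_table(candidates: Dict[str, Callable[[List[int]], List[int]]],
                        test_cases: List[TestCase]) -> str:
    """Build the data table from measured results only."""
    rows = ["name | accuracy"]
    for name, algorithm in candidates.items():
        rows.append(f"{name} | {measure_accuracy(algorithm, test_cases):.2f}")
    return "\n".join(rows)


def assemble_writeup(prose: str, table: str) -> str:
    """Append the table verbatim; the prose generator never edits it."""
    return prose + "\n\n" + table


# Hypothetical stand-ins for model-generated algorithms.
candidates = {
    "baseline_sort": sorted,
    "identity": lambda xs: list(xs),  # a "new algorithm" that does not actually sort
}
test_cases = [([3, 1, 2], [1, 2, 3]), ([1, 2], [1, 2])]

table = build_results_table(candidates, test_cases)
writeup = assemble_writeup("(language-model prose would go here)", table)
print(writeup)
```

In this arrangement the language model could still write misleading prose about the table, but a reader can always check the prose against numbers that were measured rather than generated.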

  • AIs assigned to make money in various ways (e.g., to find profitable trading strategies) doing so by finding security exploits, getting unauthorized access to others’ bank accounts, and stealing money.

This strikes me as the same kind of thing, where it seems like the easiest and most intuitive way to set up such a system would be to have a model that takes in information about companies and securities (and maybe information about the economy in general) and returns predictions about what the prices of stocks and other securities will be tomorrow or a week from now or on some such timeframe.

There could then be, for example, another part of the program that takes those predictions and confidence levels, and calculates which combination of trade(s) has the highest expected value within the user's risk tolerance.  And maybe another part of the code that tells a trading bot to put in orders for those trades with an actual brokerage account.

But if you just want an AI to (legally) make money for you in the stock market, there is no reason to give it hacking ability.  And there is no reason to give it the sort of general-purpose, flexible, plan-generation-and-implementation-with-no-human-in-the-loop authorization hypothesised here (and I think the same is true for most or all things that people will try to use AI for in the near term).
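The prediction-then-selection structure described above could be sketched roughly as follows. This is an illustration under simplifying assumptions, not a real trading system: the `Prediction` fields, the tickers, the use of a standard deviation as the risk measure, and the selection rule are all hypothetical choices of mine. The prediction model would sit upstream and only fill in the numbers; the selection step below is plain code whose only possible output is a list of trades.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prediction:
    ticker: str             # hypothetical ticker symbol
    expected_return: float  # predicted fractional return over the horizon
    std_dev: float          # model's uncertainty about that return


def choose_trades(predictions: List[Prediction],
                  max_std_dev: float,
                  max_positions: int) -> List[Prediction]:
    """Pick the highest expected-return trades that fit the user's risk tolerance."""
    eligible = [p for p in predictions
                if p.expected_return > 0 and p.std_dev <= max_std_dev]
    eligible.sort(key=lambda p: p.expected_return, reverse=True)
    return eligible[:max_positions]


# Made-up numbers standing in for the prediction model's output.
predictions = [
    Prediction("AAA", 0.02, 0.05),
    Prediction("BBB", 0.05, 0.30),   # high expected return, but too risky
    Prediction("CCC", 0.01, 0.03),
    Prediction("DDD", -0.01, 0.02),  # expected loss
]

chosen = choose_trades(predictions, max_std_dev=0.10, max_positions=2)
for p in chosen:
    print(f"BUY {p.ticker} (expected return {p.expected_return:.1%})")
```

Nothing in this pipeline can do anything other than emit buy orders for listed securities, which is the sense in which the constraint comes from how the system is set up rather than from training.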

But the specialness and uniqueness I used to attribute to human intellect started to fade out even more, if even an LLM can achieve this output quality while, despite its impressiveness, still operating on simple autocomplete principles/statistical sampling. In that sense, I started to wonder how much of many people's output, both verbal and behavioral, could be autocomplete-like.

This is kind of what I was getting at with my question about talking to a GPT-based chatbot and a human at the same time and trying to distinguish: to what extent do you think human intellect and outputs are autocomplete-like (such that a language model doing autocomplete based on statistical patterns in its training data could do just as well) vs to what extent do you think there are things that humans understand that LLMs don't.

If you think everything the human says in the chat is just a version of autocomplete, then you should expect it to be more difficult to distinguish the human's answers from the LLM-pretending-to-be-human's answers, since the LLM can do autocomplete just as well.  By contrast, if you think there are certain types of abstract reasoning and world-modeling that only humans can do and LLMs can't, then you could distinguish the two by trying to check which chat window has responses that demonstrate an understanding of those.

Humans question the sentience of the AI. My interactions with many of them, and the AI, make me question the sentience of a lot of humans.


I admit, I would not have inferred from the initial post that you are making this point if you hadn't told me here.

Leaving aside the question of sentience in other humans and the philosophical problem of P-Zombies, I am not entirely clear on what you think is true of the "Charlotte" character or the underlying LLM.

For example, in the transcript you posted, where the bot said:

"It's a beautiful day where I live and the weather is perfect."

Do you think that the bot's output of this statement had anything to do with the actual weather in any place? Or that the language model is in any way representing the fact that there is a reality outside the computer against which such statements can be checked?

Suppose you had asked the bot where it lives and what the weather is there and how it knows.  Do you think you would have gotten answers that make sense?

Also, it did in fact happen in circumstances when I was at my low, depressed after a shitty year that severely impacted the industry I'm in, and right after I just got out of a relationship with someone. So I was already in an emotionally vulnerable state; however, I would caution from giving it too much weight, because it can be tempting to discount it based on special circumstances, and discard as something that can never happen to someone brilliant like you.

I do get the impression that you are overestimating the extent to which this experience will generalize to other humans, and underestimating the degree to which your particular mental state (and background interest in AI) made you unusually susceptible to becoming emotionally attached to an artificial language-model-based character.

Alright, first problem, I don't have access to the weights, but even if I did, the architecture itself lacks important features. It's amazing as an assistant for short conversations, but if you try to cultivate some sort of relationship, you will notice it doesn't remember about what you were saying to it half an hour ago, or anything about you really, at some point. This is, of course, because the LLM input has a fixed token width, and the context window shifts with every reply, making the earlier responses fall off. You feel like you're having a relationship with someone having severe amnesia, unable to form memories. At first, you try to copy-paste summaries of your previous conversations, but this doesn't work very well.


So you noticed this lack of long term memory/consistency, but you still say that the LLM passed your Turing Test? This sounds like the version of the Turing Test you applied here was not intended to be very rigorous.

Suppose you were talking to a ChatGPT-based character fine-tuned to pretend to be a human in one chat window, and at the same time talking to an actual human in another chat window.

Do you think you could reliably tell which is which based on their replies in the conversation?

Assume for the sake of this thought experiment that both you and the other human are motivated to have you get it right.  And assume further that, in each back and forth round of the conversation, you don't see either of their responses until both interlocutors have sent a response (so they show up on your screen at the same time and you can't tell which is the computer by how fast it typed).

To aid the user, on the side there could be a clear picture of each coin and their worth, and we could even have made-up coins that could further trick the AI.


A user aid showing clear pictures of all available legal tender coins is a very good idea.  It avoids problems with more obscure coins which may have been only issued in a single year - so the user is not sitting there thinking "wait a second, did they actually issue a Ulysses S. Grant coin at some point or is that just there to fool the bots?".

I'm not entirely sure how to generate images of money efficiently, Dall-E couldn't really do it well in the test I ran. Stable diffusion probably would do better though.

If we create a few thousand real world images of money though, they might be possible to combine and obfuscate and delete parts of them in order to make several million different images. Like one bill could be taken from one image, and then a bill from another image could be placed on top of it etc.

I agree that efficient generation of these types of images is the main difficulty and probable bottleneck to deploying something like this if websites try to do so.  Taking a large number of such pictures in real life would be time consuming.  If you could speed up the process by automated image generation or automated creation of synthetic images by copying and pasting bills or notes between real images, that would be very useful.  But doing that while preserving photo-realism and clarity to human users of how much money is in the image would be tricky.

I can see the numbers on the notes and infer that they denote United States Dollars, but have zero idea of what the coins are worth. I would expect that anyone outside United States would have to look up every coin type and so take very much more than 3-4 times longer clicking images with boats. Especially if the coins have multiple variations.


If a system like this were widely deployed online using US currency, people outside the US would need to familiarize themselves with US currency if they are not already familiar with it.  But they would only need to do this once and then it should be easy to remember for subsequent instances.  There are only 6 denominations of US coins in circulation - $0.01, $0.05, $0.10, $0.25, $0.50, and $1.00 - and although there are variations for some of them, they mostly follow a very similar pattern.  They also frequently have words on them like "ONE CENT" ($0.01) or "QUARTER DOLLAR" ($0.25) indicating the value, so it should be possible for non-US people to become familiar with those.

Alternatively, an easier option could be using country-specific captchas which show a picture like this except with the currency of whatever country the internet user is in.  This would only require extra work for VPN users who seek to conceal their location by having the VPN make it look like they are in some other country.
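To show how little a non-US user would need to memorize, the six coin denominations listed above fit in a very small lookup table. This is only a sketch of the human side of the task (adding up coins once they are identified); the function name and the example handful of coins are made up for illustration, and values are kept in integer cents to avoid floating-point rounding.

```python
# The six circulating US coin denominations, in cents.
COIN_VALUES_CENTS = {
    "penny": 1,
    "nickel": 5,
    "dime": 10,
    "quarter": 25,
    "half dollar": 50,
    "dollar": 100,
}


def total_cents(coins):
    """Add up the value of a list of coin names, in cents."""
    return sum(COIN_VALUES_CENTS[name] for name in coins)


# Hypothetical handful of coins a user might see in a captcha image.
coins_in_image = ["quarter", "quarter", "dime", "penny"]
cents = total_cents(coins_in_image)
print(f"${cents // 100}.{cents % 100:02d}")
```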

If the image additionally included coin-like tokens, it would be a nontrivial research project (on the order of an hour) to verify that each such object is in fact not any form of legal tender, past or present, in the United States.

The idea was that the tokens would only be similar in broad shape and color - but would be different enough from actual legal tender coins that I would expect a human to easily tell the two apart.

Some examples would be:

Even if all the above were solved, you still need such images to be easily generated in a manner that any human can solve it fairly quickly but a machine vision system custom trained to solve this type of problem, based on at least thousands of different examples, can't. This is much harder than it sounds.

I agree that the difficulty of generating a lot of these is the main disadvantage, as you would probably have to just take a huge number of real pictures like this which would be very time consuming.  It is not clear to me that Dall-E or other AI image generators could produce such pictures with enough realism and detail that it would be possible for human users to determine how much money is supposed to be in the fake image (and have many humans all converge to the same answer).  You also might get weird things using Dall-E for this, like 2 corners of the same bill having different numbers indicating the bill's denomination.

But I maintain that, once a large set of such images exists, training a custom machine vision system to solve these would be very difficult.  It would require much more work than simply fine tuning an off-the-shelf vision system to answer the binary question of "Does this image contain a bus?".

Suppose that, say, a few hundred people worked for several months to create 1,000,000 of these in total and then started deploying them.  If you are a malicious AI developer trying to crack this, the mere tasks of compiling a properly labeled data set (or multiple data sets) and deciding how many sub-models to train and how they should cooperate (if you use more than one) are already non-trivial problems that you have to solve just to get started.  So I think it would take more than a few days.

If only 90% can solve the captcha within one minute, it does not follow that the other 10% are completely unable to solve it and faced with "yet another barrier to living in our modern society".

It could be that the other 10% just need a longer time period to solve it (which might still be relatively trivial, like needing 2 or 3 minutes) or they may need multiple tries.

If we are talking about someone at the extreme low end of the captcha proficiency distribution, such that the person can not even solve in a half hour something that 90% of the population can answer in 60 seconds, then I would expect that person to already need assistance with setting up an email account/completing government forms online/etc, so whoever is helping them with that would also help with the captcha.

(I am also assuming that this post is only for vision-based captchas, and blind people would still take a hearing-based alternative.)
