derek shiller

Thanks for your comments!

My sense is that you are writing this as someone without lots of experience in writing and publishing scientific articles (correct me if I am wrong).

You're correct in that I haven't published any scientific articles -- my publication experience is entirely in academic philosophy and my suggestions are based on my frustrations there. This may be a much more reasonable proposal for academic philosophy than other disciplines, since philosophy deals more with conceptually nebulous issues and has fewer objective standards.

linearly presenting ideas on paper - "writing" - is a form of extended creative cognitive creation that is difficult to replicate

I agree that writing is a useful exercise for thinking. I'm not so sure that it is difficult to replicate, or that the forms of writing for publication are the best ways of thinking. I think getting feedback on your work is also very important, and something that would be easier and faster when working with an avatar. So part of the process of training an avatar might be sketching an argument in a rough written form and then answering a lot of questions about it. That isn't obviously a worse way to think through issues than writing linearly for publication.

My other comment is that most of the advantages can be gained by AI interpretations and re-imagining of a text e.g. you can ask ChatGPT to take a paper and explain it in more detail by expanding points, or make it simpler.

This could probably get a lot of the same advantages. Maybe the ideal is to have people write extremely long papers that LLMs condense for different readers. My thought was that at least as papers are currently written, some important details are generally left out. This means that arguments require some creative interpretation on the part of a serious reader.

The interesting question for me, though, is what might be the optimal publication format to allow LLMs to progress science

I've been thinking about these issues in part in connection with how to use LLMs to make progress in philosophy. This seems less clear cut than science, where there are at least processes for verifying which results are correct. You can train AIs to prove mathematical theorems. You might be able to train an AI to design physics experiments and interpret the data from them. Philosophy, in contrast, comes down more to formulating ideas and considerations that people find compelling; it is possible that LLMs could write pretty convincing articles with all manner of conclusions. It is harder to know how to pick out the ones that are correct.

I don’t follow. How is it easier (or more special as an opportunity) to decide how to relate to an AI system than to a chicken or a distant human?

I think that our treatment of animals is a historical problem. If there were no animals, if everyone was accustomed to eating vegetarian meals, and then you introduced chickens into the world, I believe people wouldn't be inclined to stuff them into factory farms and eat their flesh. People do care about animals where they are not complicit in harming them (whaling, dog fighting), but it is hard for most people to leave the moral herd and it is hard to break with tradition. The advantage of thinking about digital minds is that traditions haven't been established yet and the moral herd doesn't know what to think. There is no precedent or complicity in ill treatment. That is why it is easier for us to decide how to relate to them.

Really? Given the amount of change we’ve caused in natural creatures, the amount of effort we spend in controlling/guiding fellow humans, and the difficulty in defining and measuring this aspect of ANY creature, I can’t agree.

In order to make a natural creature happy and healthy, you need to work with its basic evolution-produced physiology and psychology. You've got to feed it, educate it, socialize it, accommodate its arbitrary needs and neurotic tendencies. We would likely be able to design the psychology and physiology of artificial systems to our specifications. That is what I mean by having a lot more potential control.

Turing test is sentient

I'm not sure why we should think that the Turing test provides any evidence regarding consciousness. Dogs can't pass the test, but that is little reason to think that they're not conscious. Large language models might be able to pass the test before long, but it looks like they're doing something very different inside, and so the fact that they are able to hold conversations is little reason to think they're anything like us. There is a danger with being too conservative. Sure, assuming sentience may avoid causing unnecessary harms, but if we mistakenly believe some systems are sentient when they are not, we may waste time or resources for the sake of their (non-existent) welfare.

Your suggestion may simply be that we have nothing better to go on, and we've got to draw the line somewhere. If there is no right place to draw the line, then we might as well pick something. But I think there are better and worse places to draw the line. And I don't think our epistemic situation is quite so bad. We may not ever be completely sure which precise theory is right, but we can get a sense of which theories are contenders by continuing to explore the human brain and develop existing theories, and we can adopt policies that respect the diversity of opinion.

Meanwhile, we can focus less on ethics and more on alignment.

This strikes me as somewhat odd, as alignment and ethics are clearly related. On the one hand, there is the technical question of how to align an AI to specific values. But there is also the important question of which values to align. How we think about digital consciousness may come to be extremely important to that.

I'm not particularly worried that we may harm AIs that do not have valenced states, at least in the near term. The issue is more over precedent and expectations going forward. I would worry about a future in which we create and destroy conscious systems willy-nilly because of how it might affect our understanding of our relationship to them, and ultimately how we act toward AIs that do have morally relevant states. These worries are nebulous, and I very well might be wrong to be so concerned, but it feels risky to rush into things.

We've been struggling with natural consciousnesses, both human and animal, for a long long time, and it's not obvious to me that artificial consciousness can avoid any of that pain.

You're right, but there are a couple of important differences:

  • There is widespread agreement on the status of many animals. People believe most tetrapods are conscious. The terrible stuff we do to them is done in spite of this.
  • We have a special opportunity at the start of our interactions with AI systems to decide how we're going to relate to them. It is better to get things right off the bat than to try to catch up (and shift public opinion) decades later.
  • We have a lot more potential control over artificial systems than we do over natural creatures. It is possible that very simple, low-cost changes could make a huge difference to their welfare (or to whether they have any).

Thanks for writing this up! It's great to see all of the major categories after having thought about it for a while. Given the convergence, does this change your outlook on the problem?

If you try to give feedback during training, there is a risk you'll just reward it for being deceptive. One advantage to selecting post hoc is that you can avoid incentivizing deception.

Interesting proposal!

Is there a reason you have it switch to the human net just once in the middle?

I would worry that the predictor might switch ontologies as time goes on. Perhaps, to make the best use of the human compute time, it reasons in a human ontology up until n/2. Once the threat of translation is past, it might switch to its own ontology from n/2 to n. If so, the encoder that works up to n/2 might be useless thereafter. A natural alternative would be to have it switch back and forth some random number of times at random intervals.
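To make the random-switching alternative concrete, here is a minimal sketch of how such a schedule might be sampled. Everything in it is a hypothetical illustration rather than part of the original proposal: the function name `sample_switch_schedule`, the labels `"predictor"` and `"human"`, the `max_switches` cap, and the choice to start on the predictor are all assumptions.

```python
import random


def sample_switch_schedule(n, max_switches, rng):
    """Assign each of n reasoning steps to 'predictor' or 'human'.

    Hypothetical sketch: the number of switches and their positions are
    both drawn at random, so no step is known in advance to be safe
    from translation.
    """
    if n < 2 or max_switches < 1:
        raise ValueError("need n >= 2 and max_switches >= 1")

    # Random number of switches, capped by the available interior positions.
    k = min(rng.randint(1, max_switches), n - 1)
    # Random, distinct switch positions strictly inside the run.
    switch_points = sorted(rng.sample(range(1, n), k))

    boundaries = set(switch_points)
    current = "predictor"  # assumed starting net
    schedule = []
    for step in range(n):
        if step in boundaries:
            current = "human" if current == "predictor" else "predictor"
        schedule.append(current)
    return schedule, switch_points


if __name__ == "__main__":
    schedule, points = sample_switch_schedule(20, 5, random.Random())
    print(points)
    print(schedule)
```

Because the switch points are resampled on every run, a predictor that changed ontologies after some fixed step would risk being translated at a step where the encoder no longer works.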