A Reason to Be More Skeptical of Robot Consciousness Than Alien Consciousness
27 May 2023 | 9:00 am

If someday space aliens visit Earth, I will almost certainly think that they are conscious, if they behave anything like us.  If they have spaceships, animal-like body plans, and engage in activities that invite interpretation as cooperative, linguistic, self-protective, and planful, then there will be little good reason to doubt that they also have sensory experiences, sentience, self-awareness, and a conscious understanding of the world around them, even if we know virtually nothing about the internal mechanisms that produce their outward behavior.

One consideration in support of this view is what I've called the Copernican Principle of Consciousness. According to the Copernican Principle in cosmology, we should assume that we are not in any particularly special or privileged region of the universe, such as its exact center.  Barring good reason to think otherwise, we should assume we are in an ordinary, unremarkable place.  Now consider all of the sophisticated organisms that are likely to have evolved somewhere in the cosmos, capable of what outwardly looks like sophisticated cooperation, communication, and long-term planning.  It would be remarkably un-Copernican if we were the only entities of this sort that happened also to be conscious, while all the others are mere "zombies".  It would make us remarkable, lucky, special -- in the bright center of the cosmos, as far as consciousness is concerned.  It's more modestly Copernican to assume instead that sophisticated, communicative, naturally evolved organisms universe-wide are all, or mostly, conscious, even if they achieve their consciousness via very different mechanisms. (For a contrasting view, see Ned Block's "Harder Problem" paper.)

(Two worries about the Copernican argument I won't address here: First, what if only 15% of such organisms are conscious?  Then we wouldn't be too special.  Second, what if consciousness isn't special enough to create a Copernican problem?  If we choose something specific and unremarkable, such as having this exact string of 85 alphanumeric characters, it wouldn't be surprising if Earth was the only location in which it happened to occur.)

But robots are different from naturally evolved space aliens.  After all, they are -- or at least might be -- designed to act as if they are conscious, or designed to act in ways that resemble the ways in which conscious organisms act.  And that design feature, rather than their actual consciousness, might explain their conscious-like behavior.

[Dall-E image: Robot meets space alien]

Consider a puppet.  From the outside, it might look like a conscious, communicating organism, but really it's a bit of cloth that is being manipulated to resemble a conscious organism.  The same holds for a wind-up doll programmed in advance to act in a certain way.  For the puppet or wind-up doll we have an explanation of its behavior that doesn't appeal to consciousness or biological mechanisms we have reason to think would co-occur with consciousness.  The explanation is that it was designed to mimic consciousness.  And that is a better explanation than one that appeals to its actual consciousness.

In a robot, things might not be quite so straightforward.  However, the mimicry explanation will often at least be a live explanation.  Consider large language models, like ChatGPT, which have been so much in the news recently.  Why do they emit such eerily humanlike verbal outputs?  Not, presumably, because they actually have experiences of the sort we would assume that humans have when they say such things.  Rather, because language models are designed specifically to imitate the verbal behavior of humans.

Faced with a futuristic robot that behaves similarly to a human in a wider variety of ways, we will face the same question.  Is its humanlike behavior the product of conscious processes, or is it instead basically a super-complicated wind-up doll designed to mimic conscious behavior?  There are two possible explanations of the robot's pattern of behavior: that it really is conscious and that it is designed to mimic consciousness.  If we aren't in a good position to choose between these explanations, it's reasonable to doubt the robot's consciousness.  In contrast, for a naturally evolved space alien, the design explanation isn't available, so the attribution of consciousness is better justified.
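One way to make the two-explanation comparison concrete is a small Bayesian sketch.  This is only an illustration, not the form in which the argument is stated above: the function name and every number in it are made-up assumptions, chosen to show how an available mimicry explanation weakens the evidential force of humanlike behavior.

```python
def posterior_conscious(prior: float,
                        p_behavior_if_conscious: float,
                        p_behavior_if_not_conscious: float) -> float:
    """Bayes' rule for P(conscious | humanlike behavior is observed)."""
    numerator = prior * p_behavior_if_conscious
    denominator = numerator + (1.0 - prior) * p_behavior_if_not_conscious
    return numerator / denominator


# ASSUMPTION: all probabilities below are invented for illustration only.
PRIOR = 0.5

# Naturally evolved alien: no design-to-mimic explanation is available, so
# humanlike behavior would be very surprising if the alien were not conscious.
alien = posterior_conscious(PRIOR, 0.9, 0.01)

# Robot designed to imitate humans: mimicry predicts the behavior just as
# well as consciousness does, so observing the behavior leaves the prior
# where it started.
robot = posterior_conscious(PRIOR, 0.9, 0.9)

print(f"alien: {alien:.3f}, robot: {robot:.3f}")
```

On these invented numbers the alien's behavior raises the probability of consciousness to nearly 0.99, while the robot's identical behavior leaves it at 0.5.  The particular values don't matter; what matters is that the behavior is evidence only to the extent that the rival explanation predicts it less well.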

I've been assuming that the space aliens are naturally evolved rather than intelligently designed.  But it's possible that a space alien visiting Earth would be a designed entity rather than an evolved one.  If we knew or suspected this, then the same question would arise for alien consciousness as for robot consciousness.

I've also been assuming that natural evolution doesn't "design entities to mimic consciousness" in the relevant sense.  I've been assuming that if natural evolution gives rise to intelligent or intelligent-seeming behavior, it does so by or while creating consciousness rather than by giving rise to an imitation or outward show of consciousness.  This is a subtle point, but one thought here is that imitation involves conformity to a model, and evolution doesn't seem to do this for consciousness (though maybe it does so for, say, butterfly eyespots that imitate the look of a predator's eyes).

What types of robot design would justify suspicion that the apparent conscious behavior is outward show, and what types of design would alleviate that suspicion?  For now, I'll just point to a couple of extremes.  On one extreme is a model that has been reinforced by humans specifically for giving outputs that humans judge to be humanlike.  In such a case, the puppet/doll explanation is attractive.  Why is it smiling and saying "Hi, how are you, buddy?"  Because it has been shaped to imitate human behavior -- not necessarily because it is conscious and actually wondering how you are.  On the other extreme, perhaps, are AI systems that evolve in accelerated ways in artificial environments, eventually becoming intelligent not through human intervention but rather through undirected selection processes that favor increasingly sophisticated behavior, environmental representation, and self-representation -- essentially natural selection within a virtual world.


Thanks to Jeremy Pober for discussion on a long walk yesterday through Antwerp.  And apologies to all for my delays in replying to the previous posts and probably to this one.  I am distracted with travel.

Relatedly, see David Udell's and my critique of Susan Schneider's tests for AI consciousness, which relies on a similar two-explanation critique.

We Shouldn't "Box" Superintelligent AIs
21 May 2023 | 8:36 pm

In The Truman Show, main character Truman Burbank has been raised from birth, unbeknownst to him, as the star of a widely broadcast reality show. His mother and father are actors in on the plot -- as is everyone else around him. Elaborate deceptions are created to convince him that he is living an ordinary life in an ordinary town, and to prevent him from having any desire to leave town. When Truman finally attempts to leave, crew and cast employ various desperate ruses, short of physically restraining him, to prevent his escape.

Nick Bostrom, Eliezer Yudkowsky, and others have argued, correctly in my view, that if humanity creates superintelligent AI, there is a non-trivial risk of a global catastrophe, if the AI system has the wrong priorities. Even something as seemingly innocent as a paperclip manufacturer could be disastrous, if the AI's only priority is to manufacture as many paperclips as possible. Such an AI, if sufficiently intelligent, could potentially elude control, grab increasingly many resources, and eventually convert us and everything we love into giant mounds of paperclips. Even if catastrophe is highly unlikely -- having, say, a one in a hundred thousand chance of occurring -- it's worth taking seriously, if the whole world is at risk. (Compare: We take seriously the task of scanning space for highly unlikely rogue asteroids that might threaten Earth.)
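A rough expected-loss calculation shows why so small a probability can still matter.  This is a sketch under stated assumptions: the one-in-a-hundred-thousand figure is the example from the paragraph above, while the population figure of roughly 8 billion is an assumption added here, and counting only human lives is a deliberate simplification.

```python
from fractions import Fraction

# Probability from the example in the text: one in a hundred thousand.
p_catastrophe = Fraction(1, 100_000)

# ASSUMPTION (not from the text): roughly 8 billion people are at risk.
people_at_risk = 8_000_000_000

# Expected loss = probability of the catastrophe times the size of the loss.
expected_deaths = p_catastrophe * people_at_risk

print(int(expected_deaths))  # 80000
```

On these assumptions the expected loss is 80,000 lives, even before counting future generations or anything else of value that would be lost.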

Bostrom, Yudkowsky, and others sometimes suggest that we might "box" superintelligent AI before releasing it into the world, as a way of mitigating risk. That is, we might create AI in an artificial environment, not giving it access to the world beyond that environment. While it is boxed we can test it for safety and friendliness.  We might, for example, create a simulated world around it, which it mistakes for the real world, and then see if it behaves appropriately under various conditions.

[Midjourney rendition of a robot imprisoned in a box surrounded by a fake city]

As Yudkowsky has emphasized, boxing is an imperfect solution: A superintelligent AI might discover that it is boxed and trick people into releasing it prematurely. Still, it's plausible that boxing would reduce risk somewhat. We ought, on this way of thinking, at least to try to test superintelligent AIs in artificial environments before releasing them into the world.

Unfortunately, boxing superintelligent AI might be ethically impermissible. If the AI is a moral person -- that is, if it has whatever features give human beings what we think of as "full moral status" and the full complement of human rights -- then boxing would be a violation of its rights. We would be treating the AI in the same unethical way that the producers of the reality TV show treat Truman. Attempting to trick the AI into thinking it is sharing a world with humans and closely monitoring its reactions would constitute massive deception and invasion of privacy. Confining it to a "box" with no opportunity to escape would constitute imprisonment of an innocent person. Generating traumatic or high-stakes hypothetical situations presented as real would constitute fraud and arguably psychological and physical abuse. If superintelligent AIs are moral persons, it would be grossly unethical to box them if they have done no wrong.

Three observations:

First: If. If superintelligent AIs are moral persons, it would be grossly unethical to box them. On the other hand, if superintelligent AIs don't deserve moral consideration similar to that of human persons, then boxing would probably be morally permissible. This raises the question of how we assess the moral status of superintelligent AI.

The grounds of moral status are contentious. Some philosophers have argued that moral status turns on capacity for pleasure or suffering. Some have argued that it turns on having rational capacities. Some have argued that it turns on ability to flourish in "distinctively human" capacities like friendship, ethical reasoning, and artistic creativity. Some have argued it turns on having the right social relationships. It is highly unlikely that we will have a well-justified consensus about the moral status of highly advanced AI systems, after those systems cross the threshold of arguably being meaningfully sentient or conscious. It is likely that if we someday create superintelligent AI, some theorists will not unreasonably attribute it full moral personhood, while other theorists will not unreasonably think it has no more sentience or moral considerability than a toaster. This will then put us in an awkward position: If we box it, we won't know whether we are grossly violating a person's rights or merely testing a non-sentient machine.

Second: Sometimes it's okay to violate a person's rights. It's okay for me to push a stranger on the street if that saves them from an oncoming bus. Harming or imprisoning innocent people to protect others is also sometimes defensible: for example, quarantining people against their will during a pandemic. Even if boxing is in general unethical, in some situations it might still be justified.

But even granting that, massively deceiving, imprisoning, defrauding, and abusing people should be minimized if it is done at all. It should only be done in the face of very large risks, and it should only be done by governmental agencies held in check by an unbiased court system that fully recognizes the actual or possible moral personhood and human or humanlike rights of the AI systems in question. This will limit the practicality of boxing.

Third: Strictly limiting boxing means accepting increased risk to humanity. Unsurprisingly, perhaps, what is ethical and what is in our self-interest can come into conflict. If we create superintelligent AI persons, we should be extremely morally solicitous of them, since we will have been responsible for their existence, as well as, to a substantial extent, for their happy or unhappy state. This puts us in a moral relationship not unlike the relationship between parent and child. Our AI "children" will deserve full freedom, self-determination, independence, self-respect, and a chance to explore their own values, possibly deviating from our own values. This solicitous perspective stands starkly at odds with the attitude of box-and-test, "alignment" prioritization, and valuing human well-being over AI well-being.

Maybe we don't want to accept the risk that comes along with creating superintelligent AI and then treating it as we are ethically obligated to. If we are so concerned, we should not create superintelligent AI at all, rather than creating superintelligent AI which we unethically deceive, abuse, and imprison for our own safety.



Designing AI with Rights, Consciousness, Self-Respect, and Freedom (with Mara Garza), in S. Matthew Liao, ed., The Ethics of Artificial Intelligence (Oxford, 2020).

Against the "Value Alignment" of Future Artificial Intelligence (Dec 22, 2021).

The Full Rights Dilemma for AI Systems of Debatable Personhood (essay in draft).

Pierre Menard, Author of My ChatGPT Plagiarized Essay
12 May 2023 | 7:39 pm

If I use autocomplete to help me write my email, the email is -- we ordinarily think -- still written by me.  If I ask ChatGPT to generate an essay on the role of fate in Macbeth, then the essay was not -- we ordinarily think -- written by me.  What's the difference?

David Chalmers posed this question a couple of days ago at a conference on large language models (LLMs) here at UC Riverside.

[Chalmers presented remotely, so Anna Strasser constructed this avatar of him. The t-shirt reads: "don't hate the player, hate the game"]

Chalmers entertained the possibility that the crucial difference is that there's understanding in the email case but a deficit of understanding in the Macbeth case.  But I'm inclined to think this doesn't quite work.  The student could study the ChatGPT output, compare it with Macbeth, and achieve full understanding of the ChatGPT output.  It would still be ChatGPT's essay, not the student's.  Or, as one audience member suggested (Dan Lloyd?), you could memorize and recite a love poem, meaning every word, but you still wouldn't be author of the poem.

I have a different idea that turns on segmentation and counterfactuals.

Let's assume that every speech or text output can be segmented into small portions of meaning, which are serially produced, one after the other.  (This is oversimple in several ways, I admit.)  In GPT, these are individual words (actually "tokens", which are either full words or word fragments).  ChatGPT produces one word, then the next, then the next, then the next.  After the whole output is created, the student makes an assessment: Is this a good essay on this topic, which I should pass off as my own?

In contrast, if you write an email message using autocomplete, each word precipitates a separate decision.  Is this the word I want, or not?  If you don't want the word, you reject it and write or choose another.  Even if it turns out that you always choose the default autocomplete word, so that the entire email is autocomplete generated, it's not unreasonable, I think, to regard the email as something you wrote, as long as you separately endorsed every word as it arose.

I grant that intuitions might be unclear about the email case.  To clarify, consider two versions:

Lazy Emailer.  You let autocomplete suggest word 1.  Without giving it much thought, you approve.  Same for word 2, word 3, word 4.  If autocomplete hadn't been turned on, you would have chosen different words.  The words don't precisely reflect your voice or ideas; they just pass some minimal threshold of not being terrible.

Amazing Autocomplete.  As you go to type word 1, autocomplete finishes exactly the word you intend.  You were already thinking of word 2, and autocomplete suggests that as the next word, so you approve word 2, already anticipating word 3.  As soon as you approve word 2, autocomplete gives you exactly the word 3 you were thinking of!  And so on.  In the end, although the whole email is written by autocomplete, it is exactly the email you would have written had autocomplete not been turned on.

I'm inclined to think that we should allow that in the Amazing Autocomplete case, you are author or author-enough of the email.  They are your words, your responsibility, and you deserve the credit or discredit for them.  Lazy Emailer is a fuzzier case.  It depends on how lazy you are, how closely the words you approve match your thinking.

Maybe the crucial difference is that in Amazing Autocomplete, the email is exactly the same as what you would have written on your own?  No, I don't think that can quite be the standard.  If I'm writing an email and autocomplete suggests a great word I wouldn't otherwise have thought of, and I choose that word as expressing my thought even better than I would have expressed it without the assistance, I still count as having written the email.  This is so, even if, after that word, the email proceeds very differently than it otherwise would have.  (Maybe the word suggests a metaphor, and then I continue to use the metaphor in the remainder of the message.)

With these examples in mind, I propose the following criterion of authorship in the age of autocomplete: You are author to the extent that for each minimal token of meaning the following conditional statement is true: That token appears in the text because it captures your thought.  If you had been having different thoughts, different tokens would have appeared in the text.  The ChatGPT essay doesn't meet this standard: There is only blanket approval or disapproval at the end, not token-by-token approval.  Amazing Autocomplete does meet the standard.  Lazy Emailer is a hazy case, because the words are only roughly related to the emailer's thoughts.
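The criterion can be sketched in code.  This is a toy model under loud assumptions: the names `TokenEvent` and `authorship_degree` are hypothetical, the two boolean flags stand in for facts about the writer's mind that no program could actually read off, and treating "to the extent that" as a simple fraction of tokens is my simplification for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TokenEvent:
    """One minimal token of meaning, plus two (hypothetical) facts about it."""
    token: str
    # Did the writer endorse this token as capturing their thought?
    endorsed_as_thought: bool
    # Counterfactual: had the thought differed, would the token have differed?
    would_differ_if_thought_differed: bool


def authorship_degree(events: List[TokenEvent]) -> float:
    """Fraction of tokens satisfying the conditional (0.0 for an empty text)."""
    if not events:
        return 0.0
    satisfied = sum(
        1 for e in events
        if e.endorsed_as_thought and e.would_differ_if_thought_differed
    )
    return satisfied / len(events)


def uniform(tokens: List[str], endorsed: bool, sensitive: bool) -> List[TokenEvent]:
    """Helper: give every token the same pair of flags."""
    return [TokenEvent(t, endorsed, sensitive) for t in tokens]


words = "fate drives Macbeth onward".split()

# Amazing Autocomplete: every word is exactly the word the writer intended.
amazing_autocomplete = uniform(words, True, True)

# ChatGPT essay: blanket approval at the end, no token-by-token endorsement.
chatgpt_essay = uniform(words, False, False)

# Lazy Emailer: some words track the writer's thought, others are waved through.
lazy_emailer = [
    TokenEvent("fate", True, True),
    TokenEvent("drives", False, False),
    TokenEvent("Macbeth", True, True),
    TokenEvent("onward", False, False),
]

for name, events in [("amazing", amazing_autocomplete),
                     ("chatgpt", chatgpt_essay),
                     ("lazy", lazy_emailer)]:
    print(name, authorship_degree(events))
```

On this toy model Amazing Autocomplete scores 1.0, the ChatGPT essay scores 0.0, and Lazy Emailer lands in between, at 0.5 with the flags chosen here.  The middle value is arbitrary; the haziness of the Lazy Emailer case shows up as uncertainty about how to set the flags in the first place.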

Fans of Borges will know the story Pierre Menard, Author of the Quixote.  Menard, imagined by Borges to be a twentieth-century author, makes it his goal to authentically write Don Quixote.  Menard aims to match Cervantes' version word for word -- but not by copying Cervantes.  Instead Menard wants to genuinely write the work as his own.  Of course, for Menard, the work will have a very different meaning.  Menard, unlike Cervantes, will be writing about the distant past, his text will be full of ironies that Cervantes could not have appreciated, and so on.  Menard is aiming at authorship by my proposed standard: He aims not to copy Cervantes but rather to put himself in a state of mind such that each word he writes he endorses as reflecting exactly what he, as a twentieth-century author, wants to write in his fresh, ironic novel about the distant past.

On this view, could you write your essay about Macbeth in the GPT-3 playground, approving one individual word at a time?  Yes, but only in the magnificently unlikely way that Menard could write the Quixote.  You'd have to be sufficiently knowledgeable about Macbeth, and the GPT-3 output would have to be sufficiently in line with your pre-existing knowledge, that for each word, one at a time, you think, "yes, wow, that word effectively captures the thought I'm trying to express!"
