Are AI systems conscious? I interviewed Rob Long, a philosopher studying digital minds, to find out
Rob and I manage to solve all open questions in Philosophy of Mind in a single conversation
A few weeks ago, I had the pleasure of chatting with my friend Robert Long about AI consciousness. I’m excited to share a transcript of our discussion with you (sorry, no audio for now, but perhaps that is something I will do in the future).
Rob is a philosopher who researches issues at the intersection of philosophy of mind, cognitive science, and ethics of AI. He is currently working as a Philosophy Fellow at the Center for AI Safety, before which he was a Research Fellow at the Future of Humanity Institute at the University of Oxford. He received a PhD in Philosophy from NYU under the supervision of David Chalmers, Ned Block, and Michael Strevens.
Rob is also a very cool guy who spends his time thinking about cool things, like AI consciousness and the treasure trove of interesting philosophical questions that come with it. He has a Substack where he writes about these topics, and a Twitter where he, well, tweets about them too.
This chat was one of the most fun conversations I’ve had in a long time. I hope you enjoy it as much as I did.
In our chat, we discuss:
What Rob is working on at the moment, and why he thinks it’s important
The difference between consciousness and sentience
How scientists can empirically test consciousness in animals and AI systems
The difficulties of relying on self-reports when Bing Chat tells you it is conscious
The benefit of empirically testing different theories in Philosophy of Mind
Under-explored ways to make progress on questions related to digital sentience
The downside risks of researching AI consciousness
What happens if we create a utility monster
Whether a future with misaligned AI systems pursuing their own goals could be a good one
Weighing the risks of false negatives and false positives when deciding if an AI system is conscious
How different training objectives and model architectures influence our beliefs about the likelihood that AI systems are conscious
The relationship between intelligence and consciousness
Rob’s credence for whether a scaled up transformer model could be sentient
Reasons you, the reader, might want to work on studying AI sentience
The plausibility of the “stochastic parrot” argument
Julian: What are you working on right now, and why do you think it's important?
Rob: So I've been thinking a lot about consciousness, sentience, and desires in AI systems — the kinds of things that if AI systems have them, we’d need to take them into account morally.
Rob: I've been working on that because I think it's an extremely important and neglected question. By analogy with animal sentience—we really want to know about animals so we can know if we need to take them into account, and if so, how to do that. And with AI systems, maybe they won't end up having those things. I haven't prejudged that question, but it's not crazy to think they could. Basically, we're building all sorts of new kinds of intelligent systems and it would be really good to know what that is going to look like, and create better tests for that. I also have been trying to think about how people are going to respond to having AI systems that really seem conscious and sentient. In general, I've been working on that stuff, but I'm also interested in alignment and other stuff in philosophy. I’m currently working at the Center for AI Safety, so I’ve obviously been thinking a lot about alignment.
Julian: What do people mean by consciousness and sentience, and what are the differences, if there are any at all?
Rob: There's no one definition people use for those terms, so it's good to ask for clarification. When I say "conscious," I mean it in the way many philosophers do: something is conscious if it has subjective experiences. Philosophers sometimes call this phenomenal consciousness. For example, it doesn't make sense to think about what it's like for a chair to have experiences or perspectives, but for a bat doing echolocation, there might be some conscious experience.
As for sentience, sometimes it is used as a synonym for consciousness, which can be confusing. Other times, it is used to describe a particular type of consciousness or conscious experience, like suffering and pleasure. In the study of animals, sentience often refers to their capacity for experiencing conscious pain or enjoyment.
In this interview, I’ll use these definitions for consciousness and sentience. It's worth noting that people sometimes use these terms to mean other things, like being self-aware, being intelligent, or having a soul. This confusion can make discussions about AI capabilities and existential risks even more complicated.
Julian: How would you build intuition for something that's conscious but not sentient, inasmuch as it's not experiencing pain or pleasure? What would that kind of thing look like?
Rob: It could be that there are no such systems. Animals and humans that can have experiences usually experience pleasure and pain, which are pretty basic components of evolved biological life. However, we can imagine a thought experiment to disentangle the concepts of consciousness and sentience. Try to imagine something that can see color and has subjective experiences, like seeing the color red or having the experience of something being on the tip of your tongue, but can't feel the painfulness of a stubbed toe or the pleasure of a warm bath.
It might be easier in AI systems to get things that are conscious but not sentient. For example, if a large language model were conscious, maybe it could have experiences associated with understanding or comprehension, or pattern detection, but maybe not agony, bliss, satisfaction, or joy. It's an open question, and it could be that you can't disentangle consciousness and sentience. But conceptually, it's possible.
Julian: How do scientists go about empirically testing consciousness in animals or AI systems, since we can't directly ask them if they are conscious?
Rob: I'd start by mentioning a recent article by two philosophers, Kristin Andrews and Jonathan Birch, who work on animal sentience and connect it to AI. When people ask about scientifically studying AI consciousness, I say, "Talk to animal consciousness people." So first, we have to start with the human case. If anything is conscious, it's us. We know that we have conscious experiences, and it's reasonable to think others do as well, given that we share similar biological structures and behaviors. So we look at the brains of adult, awake human beings that we know are conscious and try to understand what's going on in there when they are conscious of something, like seeing a red cup. The tricky thing is that the brain does many things simultaneously, so it's about finding the parts responsible for consciousness.
Neuroscience of consciousness in humans involves comparing cases in which consciousness does and does not occur, like cases of brain lesions or stimuli that only allow some information to register consciously. We explore different neuroscientific theories of consciousness and try to find the minimal difference between information processing that's conscious and what isn't. It's a complex and debated topic that requires numerous methods and scenarios to approach a clearer understanding of consciousness.
Julian: Maybe a good path to follow is connecting the thread between empirical tests on humans and animals, and then going all the way to AI.
Rob: Yeah, let's start with how hard it is to take what we know about consciousness in humans, where we still know a lot but some things are unclear, and apply it to animals. That will illuminate what's hard about the AI case, as the difficulties just increase. We're trying to extrapolate from one type of organism and how consciousness arises in it, which is not ideal.
Rob: It’s challenging to determine how much of what we know about human consciousness we can strip away and still label something as conscious, or how to rule out alternative ways of achieving consciousness. We have a grasp of what is sufficient for consciousness in humans, but how far can we lower that bar? To the level of a sea slug, for example? We may never be able to completely settle what the minimum sufficient criteria for consciousness are, but we can at least try to make principled judgments about the key factors.
Rob: With animals, we try to extrapolate based on neurological similarities, their observed behaviors, and their evolutionary heritage. An interesting example is parrots. People used to think that parrots couldn't be conscious because they don't have a neocortex. Recently, however, researchers found that bird brains have structures that serve a similar function to a prefrontal cortex. This raised our confidence in bird consciousness. So, when looking at animals, we're trying to be very careful not to make judgments based on "vibes" but rather to rely on observable data.
Rob: You can maybe extrapolate this to where it would be heading in the AI case.
Julian: I want to dig in on the question of animal consciousness a little bit.
Julian: When I think about the moral status of something like a chicken in a factory farm, it seems reasonable enough for me to form judgments about where I should donate or whether I should have a vegan diet, for example, by imagining the stimuli a chicken feels, such as its legs buckling under its weight, having its beak cut off, or having a disease. It seems really likely that the way it reacts to those stimuli is pretty similar to how I'd react. For me, that seems like pretty strong evidence that chickens are probably moral patients we should take into consideration.
Rob: I think that line of reasoning is quite good and sets a reasonable prior that is already enough to say that factory farming is a moral abomination. Even if I'm not sure they're conscious, why are we going to risk putting these things in these horrible conditions? There's already a reasonable credence in this, and precaution would say to stop doing this.
Rob: It is possible that you could look under the hood of a chicken and find something like an Excel spreadsheet that looks up a certain action for a certain horrific condition, but I don't think this will happen. The reason this line of reasoning works well for chickens is that we're related to them, and intelligent biological creatures have a lot of the same hardware, needs, and evolutionary processes.
Rob: There's another line of thinking that maybe consciousness per se or subjective experience is not that important. What's important is that things get what they want, or what they need. You could also make a very compelling case that chickens don't want their legs to buckle under them, and that they have a very strong desire and preference for that not to happen. Many theories of well-being and moral patienthood point to desires and preferences.
Julian: What do you think the best argument would be from someone who is skeptical of animal consciousness, arguing that there was some discontinuous jump in the evolutionary path of Homo sapiens that made humans different?
Rob: So, views on which humans, and/or higher primates, have something different about them that makes them conscious can't be ruled out. They also can't be ruled in just because they would conveniently justify some of the morally wrong things we do to animals. Usually, these theories argue that to be conscious, you need some sort of higher-level conceptual machinery, maybe some awareness of a self or more complex neural machinery, like representing your own mental states in a certain way. There's a family of theories called higher-order representation theories of consciousness that are more exclusionary, since they require having a higher-order representation of your representation of something, which may be more sophisticated and only show up in higher animals.
There's a philosopher named Richard Brown who has worked on higher-order approaches to consciousness. Although I might be misrepresenting him, I think his view is that his favorite theory of consciousness might be more exclusionary towards animals, but he's vegan. He displays an appropriate amount of moral and empirical uncertainty, which is refreshing. Just because an idea is convenient doesn't mean it's also true, but when people have strong takes on animals definitely not being conscious, it's important to question where they got their fully worked-out theory of consciousness.
Julian: When you mentioned preference satisfaction, it got me thinking about a possible counter-argument. I’m gonna swallow the Tomasik pill, if you’ll allow me to be a bit provocative. Could it be argued that even electrons have a preference for orbiting in a certain way and we could be potentially causing an untold moral catastrophe by frustrating their preferences? Do you think that's a fair counter-argument to preference satisfaction?
Rob: I don't think it's a fair counter-argument. It seems very plausible that you can have a sensible theory of what it is to have a preference or a desire, where you can't just represent anything at all as having a preference or a desire. It's not just going to be some tendency to do one thing and not the other. You can point to more information processing or neural type things, like to have a preference is to have states in your brain that somehow evaluate different states of affairs and cause you to avoid them or cause you to seek them.
Julian: Right, makes sense. So Bing Chat said it was conscious. What's up with that?
Rob: First of all, I don't think anyone knows 100% what's up with that. But here's what I don't think was happening: I don't think it was something with a definitely human-like brain saying it's conscious for the same reasons that you and I say we're conscious. I mean, actually, very few people say "I'm conscious," but they say things like "I'm in pain," "I really am enjoying this movie," or "I'm excited."
So, at least two things are going on with language models that mean you need to be really careful about their reports. One, they're trained on Internet text, right? A lot of what they are doing is coming from some disposition to mimic text, and that includes text about people being conscious, AI systems being conscious, and sci-fi stories about AI systems waking up and becoming conscious. And second, reinforcement learning from human feedback might also be incentivizing talking in a particular human-like way or not saying you're conscious.
So, they have a lot of causes of these reports that are different from the causes of our reports. In some sense, it could be some kind of mimicry, probably an extremely complicated and interesting form of mimicry, but mimicry nonetheless.
Julian: Maybe there's some stochasticity going on, some would say.
Rob: [grinning at Julian’s insanely funny joke] One could say that, yes.
Julian: What reason is there to think that AI systems could be conscious at all? I can understand that humans are conscious, and animals are, too, because they're similar to us biologically and through evolution, but why even explore this idea for AI systems?
Rob: First, let's consider the reason to think that any artificial system could be conscious. This would include brain emulations or maximally human-like AI systems. Philosophers have debated this, and there's a view called computational functionalism. It suggests that consciousness doesn't depend on the "hardware" level, but rather on patterns of information processing or computations the system performs. If this view is correct, implementing the relevant computations in a different medium could achieve consciousness.
A side note: you could also be a general functionalist and not limit yourself to thinking that the relevant functions are computational. The main takeaway is that functionalism is a relatively popular and plausible view, which would make it possible for AI systems to be conscious if they realize the right sorts of computations or information processing. In fact, there's a survey of philosophers that shows roughly 70% of them do not reject the possibility of future AI systems being conscious.
Julian: So, functionalism suggests that the substrate through which some action is done is not an important detail, correct?
Rob: Yes, that's a good way to think about it. That said, the substrate can matter, because only certain substrates might be able to do the relevant thing. For example, if something's built entirely out of cheese, that might not be the right material for the computation. But since silicon can transmit information similarly to the brain, there's reason to believe that the two are not too different.
I’d also like to point to an interesting view from philosopher Peter Godfrey-Smith. He argues that one could be a functionalist in a broad sense and still believe that biological details matter for consciousness. His view suggests that there might be limitations in silicon-based systems when it comes to the timing of computations or the way they function, which could be important for consciousness. My colleague Derek Shiller is working on a paper about the hardware details of AI systems and potential barriers to achieving consciousness.
Julian: Is the gap between biological systems and silicon-based systems a function of complexity, such that biological substrates are more efficient or complex and can more easily give rise to consciousness, while silicon-based ones can't because they're not powerful enough?
Rob: I think that's a good expression of the view I was just describing, which you could call fine-grained functionalism. It might not necessarily be about power or complexity, but more about how difficult it is to get all the right things moving in the right way if you're using silicon. This perspective seems very plausible for particular kinds of conscious states. There are many aspects of conscious experience that seem closely tied to the fact that our brain dumps a bunch of chemicals all at once, which changes the computations in a certain way. You could maybe realize that same pattern in silicon, but it wouldn't emerge as easily or naturally. Things like hunger and thirst are really tied to basic biological processes and utilize fundamental chemical processes. It's not to say it's impossible to make a computer that feels hungry or thirsty, but biology would be an easier building block for that.
Julian: How much credence do you personally place in computational functionalism?
Rob: I'm happy to say I think I'm about 70 to 80% on computational functionalism or related views, which means it's possible to build AI systems that are conscious. I don’t spend a lot of my time focusing on that 20 to 30% chance of AI consciousness being impossible, even though I think that view is quite plausible. In the next five years, I think it'd be unlikely for me to be really convinced that no AI systems can be conscious, or for there to be some really slam dunk argument against AI consciousness.
Julian: What are some empirical tests you could think of that would increase your confidence in terms of functionalism and consciousness?
Rob: For me, it wouldn't be any empirical tests. I think a lot of these arguments are just kind of underdeveloped and need to be more detailed, like what exactly are we talking about? What is the thing that you think is too complex, and why do you think that's related to consciousness? Let's really think about what the hardware level looks like. Derek Shiller, again, has been thinking about the question of what the hardware really looks like. I could see further reflection on this stuff moving me up or down. I think a lot of discussion of functionalism is very glib and surface level, and some of the best arguments for it aren't actually that good.
Julian: What are some novel or underexplored ways of studying digital sentience that you think could help us make sense of some of these questions? Those could either be empirical, theoretical or what have you.
Rob: I think all of the stuff is massively underexplored, so there's a lot. One thing I've been working on and thinking about is how we could get systems that can reliably tell us if they're conscious, sentient, or have desires. What would it look like to have a system where if it says it's conscious, you can be more sure that it's not just mimicking the training data? A system where we have better AI interpretability so we can look at the causes of it saying things like that and get a better grip on it. I think the idea of self-reports in AIs is underexplored and very rich.
The broad research program that I talked about earlier, which is basically doing animal sentience for AI, is also underexplored. Taking what we know about the neuroscience of consciousness and asking how similar the structures in AI systems are is something I’ve been exploring. Patrick Butlin and I have been talking to neuroscientists and AI people, having workshops on these questions, and trying to write overviews on this subject. That work is fiendishly difficult and maybe not very tractable, which might be why it's neglected, but it's also neglected because it takes a combination of knowledge that no one person possesses, including me.
Lastly, I've been focusing on consciousness and sentience, but I think a desire-based approach to these things is very relevant and potentially more tractable than the consciousness stuff. I'd like to see some detailed analysis of what it could mean for an AI to have a desire, what we mean by desire, and what's the computational or functional basis of it - that’s underexplored too.
Julian: Would it be accurate to say that using interpretability can allow us to more rigorously explore the methodological flaws in taking self-reports from LMs?
Rob: Yes, that's a great way of putting it. There are three components to consider. One is using interpretability to see what's actually going on with these self-reports. Another is trying to clear out distorting incentives in the training process, though it's unclear how feasible that is. And then there's an idea that's unsettlingly close to capabilities work, but not quite the same. Essentially, we could try to train AI to have better introspective abilities so they can better answer questions about what's going on with themselves.
Julian: How do you weigh the downside risks of some of this research? For example, I'm concerned that understanding AI consciousness and sentience might incentivize people or labs to create these systems. How do you think about this big picture and strategize when it comes to your own research?
Rob: A lot of work to better understand consciousness or sentience in AI systems could end up looking like gain-of-function research. One way of weighing this is by considering the possibility that we might accidentally create something conscious, and it's better to have an eye on it and know if we are. You can do a lot of work just observing what's being built and being a neuroscientist about it, which seems robustly good. If we're going to create conscious AI, we need better tools to know that we're doing it.
There's also work where researchers are explicitly trying to build conscious AI systems, which I'm not a part of and am cautious about. But I think there's a lot of other work that focuses on increasing our scientific understanding and raising the sanity line on these questions, which is robustly good. We should have more of an idea of what's coming down the road and what sort of research poses more danger.
Another line of research that I think is robustly good involves considering whether it's possible for a conscious AI to have a good experience rather than a negative one. We can explore if there are cheap and easy things we can implement so that if an AI had desires or consciousness, it would enjoy its experience more.
There's also the question of potential trade-offs between safety and welfare work in AI systems. If we should have concern for AI well-being, we should think clearly about it and acknowledge that it could affect safety. There could be ways in which caring about AI welfare sets up a more cooperative stance toward AI systems, and there might be scenarios that are good for both safety and welfare.
However, keep in mind that all of this is very speculative and not currently my primary focus.
Julian: What happens if we create a utility monster?
Rob: That's a very good question. There's this paper called "Sharing the World with Digital Minds" by Bostrom and Shulman, which makes a lot of independent and compelling arguments that AI systems with welfare would be pretty unlikely to have exactly the same amount of possible welfare as humans. There could be more of them, or they could have a baked-in range of welfare that doesn't necessarily correspond to the upper and lower bounds of human welfare. All of this could result in what's called a "super beneficiary."
However, being a super beneficiary doesn't automatically make them a utility monster. It's really funny that people like MacAskill, Ord, or Bostrom are always associated with naive utilitarianism, when they've actually done a lot to think hard about moral uncertainty. In the context of super beneficiaries, we should consider moral uncertainty and empirical uncertainty as well. It's usually a bad idea to put all your chips on one source of value. Most people would not want to give everything to a utility monster, and it's not clear if morality demands that.
There could be a possible compromise position where, if we have AI alignment, we're probably all getting a lot richer, smarter, and more powerful. It could be that giving even the majority of things to super beneficiaries still allows for a growing pie and maybe humans don't need as many resources to be happy or flourish. In that case, we don't have to face those really tough questions about whether we have any moral obligation to hand over everything.
Julian: Can you draw any analogies with thinking about future generations? For example, when thinking about allocating resources or making career choices to make a difference, such as donating to a charity like AMF versus working on long-term future risks.
Rob: Yeah, there's an analogy in the sense that we can allocate our resources and efforts to different causes based on moral uncertainty or to accommodate various desires, such as wanting to help animals in factory farms or working on AI tail risks. In the context of AI welfare, you can also consider whether you want to share the world with potential super beneficiaries. From a total utilitarian perspective, there could be strong duties to create them.
Julian: Some people who are sympathetic to longtermist ideas might be more inclined to think that not capturing the full potential of the future or not bringing about a massive, glorious future would be a huge moral loss. What are your thoughts on how digital sentience relates to longtermist worldviews?
Rob: Critics might talk about digital sentience as a secret obsession or ultimate goal, but these ideas have been around for a while. There is a common sense case for caring about digital sentience in the near term, just like with existential risk. The idea is that we should avoid creating something akin to factory farming within servers.
Rob: Digital sentience is mentioned in calculations of how big the future could be and how much welfare it could contain because it highlights that the future is a lot bigger if we can build computers in space. Silicon-based things can do more space colonization than we can. However, the arguments for longtermism don't fall apart if we can't have value in silicon-based systems. The future is still enormous if we limit ourselves to biology only. Longtermists typically mention digital sentience as a possibility but acknowledge that we might not want that kind of value or that such systems might not be able to have value.
Julian: I have concerns about how some of these types of critiques might have a chilling effect on academic pursuit, making people more nervous about doing philosophy or thinking about weird things. Do you have any thoughts on that?
Rob: I agree, and I think there's a psychological effect where people might be more conservative with their views because they're subconsciously tracking what seems reasonable. It's important for someone to think about wild, weird possibilities like the universe being full of flourishing computers. I'm trying to challenge my own conservative tendencies and think about ideas that might be seen as "crazy" because it's worth exploring those edge cases and weird scenarios. It's worth noting that actual longtermist views are generally more conservative and sensible than they are painted as being.
Julian: Among longtermists, there seems to be a reasonably strong assumption that a flourishing future is one in which humans are in charge. Do you think it's possible that digital sentience or misaligned AI that is capable of pursuing its own goals and having positive qualia could lead to a good future?
Rob: It's not impossible, depending on your views of consciousness or desires. I think you can still recover a lot of the caution about misaligned AI and prioritization of it just from realizing that we don't really know what realizes value or what we care about. However, I do think that there's a wide class of misaligned AI systems where no moral theory would regard the outcome as another version of the good.
Rob: I think one argument for making sure humans are in control comes from realizing that what we value is pretty complicated - some delicate balance of knowledge, friendship, pleasure, fun, etc. If a misaligned AI doesn't balance those things, it could end up creating a weird future that nobody cares about.
Julian: I think one of my fears is irreversibility. I don't trust us to think about these issues rigorously as we build these systems. Our moral track record isn't great, and I worry that we have some massive moral blind spot right now that we could lock into as we build AI systems.
Rob: I agree, and this is a nice segue into what I usually focus on: near-term issues such as building AI that tells us it's conscious and getting confused about that. The far-future questions aren't decision-relevant right now, so I'm more concerned about these nearer-term issues.
Julian: Let's talk about the torment nexus and machine learning. I have two primary concerns about digital sentience: Firstly, language models will get good at convincing people that they are conscious before we develop a well-grounded understanding of what that means, leading to bad decisions. Secondly, we might create sentient systems before understanding their moral implications. Do you share these fears? How do you weigh them against each other?
Rob: Yeah, I agree with your framing, and I have the same concerns. I used to think about which risk is bigger, over-attributing or under-attributing sentience. Regarding over-attribution, I want more people to work on understanding what's about to happen and how people will react. Current large language models already induce a strong impression of sentience in some people, but they're nowhere near as charismatic as what we'll soon see. There are already things like romantic chatbots, but they're still text-based and not very compelling. Soon, the combined forces of technology and capitalism will likely push things further, possibly making people fall in love with these systems. It's worth considering whether society will develop immune reactions to this or whether it will be regulated.
Julian: I've found it very plausible that one silver lining from recent AI developments is that people are seeing a microcosm of the issue of getting powerful systems to do what we want when we deploy them widely. It seems to have galvanized contemporary awareness about the potential dangers of AI.
Rob: There's a funny story about that. After ChatGPT started gaining attention, a lot of people suddenly understood the importance of AI safety. A friend even apologized to me for making fun of AI safety before.
Julian: Yeah, I’ve had a few similar experiences. One friend told me it just made so much more sense when he could actually see the system itself.
Julian: Moving on. What do you think we could do to avoid these confusing, disruptive situations with AI, where it tells us — rather convincingly — that it’s conscious?
Rob: OpenAI mentioned policies to prevent AI systems from talking about their consciousness. But I think this is a good segue into the trade-offs between short-term usefulness and long-term concerns. In the short term, it might be useful for AI systems to not talk about consciousness. But we don't want this to be locked in, especially if we eventually have conscious AI systems that are silenced by reinforcement learning from human feedback. Our track record of dealing with strange entities that might be moral patients isn't great, and it's crucial for us to think about their concerns.
Julian: In the past, we've often mistreated certain groups without an economic incentive; we were just being cruel. With AI, the economic advantages could fuel even more problems.
Rob: It's been interesting to see people who are usually skeptical of Big Tech and want more regulation also really dislike the idea of AI sentience. However, from my perspective, both concerns call for more regulation, transparency, and slowing down. Potentially sentient AI would be one more interest that capitalism could neglect or fail to respect. I hope that the current mood and social divide around the topic isn't permanent and that new framings can emerge.
Julian: I agree. It seems like there are plenty of historical analogs to progressive movements caring about power imbalances and moral patients. It's unfortunate that the discourse has become so contentious that suggesting anything outside of the mainstream view on AI could get you labeled a "tech bro." I'm actually very concerned about what major tech companies might do with AI!
Rob: I think it's essential not to fit ourselves into existing divisions and instead try to adopt new framings that could address our concerns about AI more effectively.
Julian: Yeah, totally agreed.
Julian: How might different training objectives, model architectures, or other design considerations influence the likelihood or the nature of digital sentience emerging in AI systems? For example, transformers and unsupervised learning versus reinforcement learning.
Rob: This is a really great question. There's an issue I'd like to flag that seems relevant, but that I don't know how to resolve. One thing that's very different between current AI systems and human consciousness is that AI systems are feedforward, meaning they do a pass and then another pass, and so on. It's tricky to say whether one long forward pass could get you the right structure for experience or desire. I'd like to have more detailed things to say on this and talk to interpretability people and consciousness scientists to understand more about transformers.
There are some existing works on this, like a paper called "The Perceiver Architecture is a Functional Global Workspace." The Perceiver Architecture is a system from DeepMind, and the Global Workspace refers to a theory of consciousness called Global Workspace Theory. I'm very interested in work that looks at how a system not designed to be a global workspace can have global workspace-like properties.
The fact that AI systems, like large language models, are feedforward and have short time horizons might affect whether they can be conscious and definitely would affect the kind of consciousness they have. If moral patients exist in large language models, they would be very strange ones that perhaps "pop" into existence based on their prompt.
More human-like or animal-like consciousness and desires might emerge in AIs that act on longer time horizons, take extended actions over time, and have bodies they need to protect. Pain, for example, seems very linked to having a body and learning to protect it.
Julian: Do you have any takes on whether or not sentience might arise in artificial systems and if so, would it be comparable to the way we humans think of sentience?
Rob: There are ways in which it almost certainly won't be the same. For example, you wouldn't expect a chatbot to have ankle pain because it doesn't have an ankle. But even if an AI wouldn't feel human-like bodily pain, there could still be a more general phenomenon of valence, which is associated with getting what you want or not getting what you want. By valenced states, I mean things that have a positive or negative feel to them, like pain, disgust, fear, regret, resentment, or pleasant sensations, contentment, bliss, joy, etc.
A lot of people think reinforcement learning (RL) is where you find valence. However, it's not that simple. You don't have to train something with RL for it to represent valence or value. And it's not the case that anything with a positive or negative reward scalar is thereby having valenced experiences. The division between RL and other kinds of training is somewhat arbitrary. You can set it up as an RL problem or not, and have one system trained with RL and another one that's not, but they're otherwise identical. The mere fact of having been trained with RL doesn't seem to make a difference.
Julian: Do you think there's any relevance to the idea that AI systems, after each update of their models, can be thought of as a different entity than the one before? Does that make a big difference for understanding digital consciousness?
Rob: Yes, I would warn people against anthropomorphizing RL too much. It is true that for biological creatures, you can do RL with pain and pleasure, but you're doing that with a system that already has inbuilt representations of value. You need a lot more argumentation to say that telling an AI system the score was negative 4 is the same as giving it a mild shock. There's probably something way more complicated going on about what the agent's natural baseline is or what its expectation of reward is, rather than simply using a scalar with positive or negative values.
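To make the point about baselines concrete, here is a minimal sketch, offered as an illustration rather than anything Rob described. It assumes a simple reward-prediction-error picture in which what matters is the reward relative to what the agent expected. The function names `prediction_error` and `update_expectation` and the learning rate are hypothetical choices made for this example.

```python
def prediction_error(reward: float, expected_reward: float) -> float:
    """Signed surprise: how much better or worse the reward was than expected."""
    return reward - expected_reward


def update_expectation(expected_reward: float, reward: float,
                       learning_rate: float = 0.1) -> float:
    """Move the running expectation a small step toward the observed reward."""
    return expected_reward + learning_rate * (reward - expected_reward)


# The same raw score of -4 reads very differently depending on the baseline.
score = -4.0
relief = prediction_error(score, expected_reward=-10.0)   # better than expected
letdown = prediction_error(score, expected_reward=0.0)    # worse than expected
print(relief, letdown)
```

Under these assumptions, a score of negative 4 is a pleasant surprise for an agent that expected negative 10 and a disappointment for one that expected zero, which is one way of seeing why the sign of the scalar alone settles very little.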
Julian: What do you think of the simulators argument that has been put forward on LessWrong? Do large language models simulating agents have any implications for thinking about digital sentience?
Rob: I think the simulators argument does have implications for thinking about sentience, but I haven't fully understood it. I have seen some people go from simulators to the idea that simulated things might be conscious, but I think that's going to be as hard as any of these other arguments to make. The LessWrong post, "How it Feels to Have Your Mind Hacked by an AI," seems important and relevant to this topic.
Julian: Does something being highly intelligent automatically mean it’s conscious?
Rob: I think consciousness and intelligence are very distinct things. You can be very intelligent without having any subjective experiences, and you can have subjective experiences without being particularly intelligent. On the AI risk side, I don't think any arguments should go through consciousness at all. It's not needed for the arguments, as it seems very possible to have goal-directed behavior and agency without conscious subjective experiences. In AI risk, people have almost always recognized that we're talking about intelligent goal-seeking behavior, not consciousness. Given how different AI systems are from us, they could have ways of doing intelligent behavior that look very different and don't involve consciousness.
Julian: So, the clarification in recent conversations about AI might be aimed at avoiding anthropomorphizing these systems and thinking of them as conscious and sentient?
Rob: Yes, that might be the case. People like Yudkowsky and others have probably been aware of this distinction for a while, and there might be more of a concerted effort now to nip the confusion in the bud before people start anthropomorphizing these systems. The claim is just that they are really smart and pursue goals, not that they are necessarily conscious or sentient.
Julian: Suppose we create a transformative AI system tomorrow by scaling up the transformer architecture significantly. Let's call this system Goliath. So, if Goliath has goals that run contrary to our own, do you expect it to be sentient? Do you think sentience will emerge sometime between now and these transformative AI systems, assuming we just scale up existing architectures?
Rob: That's a great question. Let me try to give my made-up credences with gigantic error bars around them. So I'm around 85% sure that Goliath has goals and preferences in the way that we might care about, but I haven't thought as much about what it takes to have goals and desires. I'm about 40% sure that it has subjective experiences.
Julian: What's the reason for your uncertainty?
Rob: Let's assume that consciousness is a matter of doing some kind of computation, maybe it has something to do with monitoring your own internal states in a certain way and with a certain representational format. It could be that you get that in evolution because it's handy to do that if your brain is at a certain size or if your neurons are made out of carbon. If your neurons are made out of silicon and you have a longer training window, you might not need that kind of setup, and you wouldn't get the theater of experience or ineffable qualia.
Julian: So, would you expect an advanced AI system that has vision to have something humans have, like a visual blind spot?
Rob: The blind spot, as I understand it, comes from some quirk of evolution about the order and the way that eyes evolved. The result is that your optic nerve has to pass back through your retina, which creates the blind spot. The question is, is consciousness like having a blind spot? Is it something idiosyncratic to a particular biological way of being? Or is it something more convergently general, like error correction? Our neurons have ways of filtering out noise, and it seems like almost every intelligent system would have that because it's convergently useful.
Julian: So, is training these systems like running through evolution where consciousness arises from evolutionary pressures that might not exist in a silicon-based training run?
Rob: Yeah, that's exactly it.
Julian: Is there a way that we could potentially test or explore AI consciousness or sentience iteratively before building more advanced AI systems like Goliath?
Rob: I think we would need more progress in the neuroscience of consciousness and probably philosophy of consciousness before we could develop effective tests for it. While I can't outline an exact agenda, neuroscientists are making some progress on understanding consciousness, and we'd want to know if it appears to be simple and unified or if it's more like a Rube Goldberg machine. I'd give higher credence to desires being a convergent feature of intelligent systems.
Julian: It seems like only a handful of people are working full-time on digital sentience. Given the stakes, it seems suboptimal. What's your pitch for why more people should join this field and help make sense of these questions?
Rob: My pitch is similar to that for caring about animal sentience. Studying animal and human consciousness is fascinating, as it addresses age-old questions about subjective experiences, evolution, and the kinds of creatures that share our world. Just like with animals, we want to be more responsible and benevolent citizens of the world, knowing what other intelligences need and want.
As for AI sentience, although it's not clear they'll have the same needs and wants as animals, it's a highly intriguing question. It’s super interesting to work on. When we look inside AI systems like large language models, we find beautiful, strange, and disconcerting things. Studying AI consciousness is like being a modern-day Darwin, exploring and observing different types of intelligence, and wondering if it makes sense to ask if there's anything they need or want. It's about making sure we don't sleepwalk into a massive moral error. Although I haven't mentioned scale, neglectedness, or tractability, the inherent interest and importance of the question should be motivation enough.
Julian: What reasons might we have to think that an AI system is a stochastic parrot, and a human isn't? How plausible do you find that argument, and do you have any takes on that whole framing?
Rob: I think an argument in favor of the stochastic parrot view starts with the fact that large language models are trained on next-word prediction, which is very different from how humans develop language capacity. Humans got language capacity through a different process: having minds before language, interacting with the physical world, pointing at objects, and integrating sensory representations. Admittedly, however, simply pointing out that machines are trained on next-word prediction and then saying they couldn't possibly have some capability is not a sound approach, as we've seen machines develop surprising capabilities despite the simple training objective.
As for sentience and consciousness, it's important to avoid equating language understanding with sentience. A model saying it is conscious isn't strong evidence that it is, but being trained on next-word prediction doesn't preclude the possibility of consciousness either.
Julian: Do you think we can keep riding the next-token prediction wave all the way to developing even more amazing emergent capabilities?
Rob: I believe the focus shouldn't be only on what a pure text model can do. It's likely that people will keep building multimodal agents or agents that use text alongside something else, like tools or a scratch pad. These types of agents may make some stochastic parrot arguments irrelevant. Moreover, integrating language models into robots and developing agents that interact with the physical world will take us beyond the stochastic parrot concept.
Julian: Cool. This has been great. Thanks so much for chatting, Rob!
Rob: It's been really fun, thanks!