A few years ago I argued that utilitarian and Kantian ethics, with the trolley problem as their framing question, were suited for programming robots but not for human beings. It turns out I was wrong — not about the human beings, but about the robots.

For the past two years my day job has been Associate Director of Northeastern University’s Ethics Institute, which has a particular focus on AI and data ethics. My colleague John Basl regularly stresses that people in AI ethics need both technical and philosophical expertise, so we put together programs (like the AIDE Summer institute for graduate students) to help them get it. What I’m writing about today is one reason that combined expertise matters: something you might get wrong if you didn’t have it. To me it’s obvious why the philosophical expertise matters – engineering won’t tell you what action is morally right to take. But Basl pointed out something I’d got wrong by not having the technical expertise – something that turns out to be very philosophically interesting.

I got a computer-science degree a decade ago, and in thinking about the ethics of robots I had always tended to assume that programming a robot-type entity – an autonomous car, a drone – would involve the kind of if-then branching structure that you learn in a normal programming class. You specify the conditions under which the car should swerve from the path it would otherwise take; you program the car so that if those conditions are met, then it swerves. That’s why the trolley problem, or variants of it, seemed like it should be genuinely helpful for designing an autonomous car, which might very well face a situation where it could slam into one person in order to save multiple others. You would need to decide the specific range of circumstances, if any, under which the car would hit someone in order to prevent a greater harm. Philosophically, that point seemed to make sense.
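To make that picture concrete, here is a minimal, purely illustrative sketch of the if-then style I had in mind. The function, its inputs, and its conditions are all invented for this post; no real autonomous-driving system is anywhere near this simple.

```python
# A hypothetical, hand-written decision rule of the kind I was imagining:
# the programmer enumerates in advance the conditions under which the car swerves.

def choose_action(obstacle_ahead: bool, people_in_path: int,
                  people_in_swerve_path: int, swerve_is_safe: bool) -> str:
    """Pick an action from explicitly written if-then rules (illustrative only)."""
    if not obstacle_ahead:
        return "continue"
    if swerve_is_safe and people_in_swerve_path == 0:
        return "swerve"   # an empty escape route: the easy case
    if people_in_swerve_path < people_in_path:
        return "swerve"   # the trolley-style trade-off, settled in advance by the programmer
    return "brake"

print(choose_action(obstacle_ahead=True, people_in_path=5,
                    people_in_swerve_path=1, swerve_is_safe=True))  # -> "swerve"
```

On this picture, the ethically loaded choice lives entirely in those hand-written conditions – which is why the trolley problem looked like the natural tool for deciding how to write them.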

But technically the point turns out to be quite wrong. What Basl pointed out, to me and to others, is that on the technical side, this is not how autonomous cars actually work. Given the enormous variety of situations an autonomous car can face (just in normal operation, before ethical problems even come into play), traditional branching programming cannot handle nearly enough contingencies to let the car make decisions. Rather, the decision-making has to be handled by machine learning: the same training-based approach that underlies the now-ubiquitous generative AI. And that works very differently from the programming I learned a decade ago.

There’s no if-then when you program an AI this way. You don’t specify a range of circumstances where an autonomous car can or can’t hit someone. Rather, you “program” the AI by training it: getting it to “learn” by absorbing a very large set of data on which to model its responses. As Microsoft succinctly explains it: “AI models don’t follow hand‑written rules. They learn from examples. Training is essential because it gives the AI model the ability to generalize—meaning it can respond to new questions it has never seen before.” So if an autonomous car were actually to face something like the trolley problem, its “decision” would be based on generalizing from the example situations it had been fed in its training data and applying that generalization to the new situation it had just encountered.
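For contrast with the if-then sketch above, here is an equally toy sketch of the learn-from-examples approach. The “model” is a deliberately crude nearest-neighbour lookup and the data are invented; real systems train neural networks on vastly larger data sets, but the shape of the process is the same: collect labelled examples, then generalize to new situations by resemblance rather than by rule.

```python
# A toy "trained" model: it has no rules at all, only labelled example
# situations (all invented for illustration). It handles a new situation by
# imitating the most similar example it has seen, i.e. by generalizing.

from math import dist

# Each example: (speed, people_in_path, people_in_swerve_path) -> action taken
training_examples = [
    ((30.0, 0.0, 0.0), "continue"),
    ((50.0, 1.0, 0.0), "swerve"),
    ((50.0, 3.0, 1.0), "swerve"),
    ((20.0, 1.0, 2.0), "brake"),
    ((60.0, 0.0, 0.0), "continue"),
]

def predict(situation):
    """Respond to a new situation by copying the nearest labelled example."""
    nearest_features, nearest_action = min(
        training_examples, key=lambda example: dist(example[0], situation))
    return nearest_action

# A situation that appears nowhere in the data: the answer comes from
# resemblance to past examples, not from any condition a programmer wrote down.
print(predict((45.0, 2.0, 0.0)))  # -> "swerve"
```

If you want such a system to behave differently, you don’t rewrite a rule; you change the examples it learns from – which is exactly the move the rest of this post is about.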

Training and generalizing from examples is not how traditional computer programs work. But it is how humans learn! It turns out that when you want to teach an AI to act well in the situations it might typically face, it’s a lot like teaching a human to act well in the situations we might typically face – which means cultivating virtue, or good character. Six years ago I criticized applying the trolley problem to humans by saying: “In addressing concrete ethical decisions we likely learn more from case studies of real choices more directly comparable to ones we might someday make ourselves. Those help us cultivate the disposition, the virtue, to act well.” But it turns out that these dispositions to act well are exactly what AI needs too! It too learns from case studies rather than from deduced rules.

Robot learning turns out to be a lot like human learning. Adobe Stock image copyright by phonlamaiphoto.

The idea that AI needs to act virtuously isn’t just coming from me as an armchair philosopher. It is actively being implemented at Anthropic, the company that makes the popular Claude large language model. Anthropic has a philosophy PhD, Amanda Askell, on staff – and what she is in charge of is what the company calls character training. Claude’s behaviour is guided by what the company intriguingly refers to as a “soul document”, which spells out that Claude’s character should be governed by dispositions rather than rules:

Rather than outlining a simplified set of rules for Claude to adhere to, we want Claude to have such a thorough understanding of our goals, knowledge, circumstances, and reasoning that it could construct any rules we might come up with itself. We also want Claude to be able to identify the best possible action in situations that such rules might fail to anticipate.

So, it turns out that robots, like humans, do best to govern their behaviour by virtues rather than rules. But there is one key difference. Human virtue ethics is usually eudaimonistic: that is, a key justification for us to be virtuous is that it helps us to flourish, to live good lives, in a way that includes being happy and peaceful. Human virtue is valuable at least in part, and perhaps entirely, because it contributes to human flourishing.

But there’s no reason to care about AI flourishing! Unless you’re Blake Lemoine, or someone else who attributes something like sentience or consciousness to AIs, it shouldn’t matter to you whether they flourish. Insentient, unconscious entities don’t have well-being; they are not happy or unhappy, nor in any of the other states so essential to human flourishing and its absence. If they’re not serving their purpose then we pull the plug. What matters is whether they support our flourishing, the flourishing of humans (and maybe of other animals). So whereas we humans need self-regarding virtues like mindfulness and zest that help us to be happy, AIs don’t. They only need other-regarding virtues, like generosity and honesty – character traits that help them be good to us.