The Core Values of AI Characters Matter More Than Their Substrate

3 min read

Why Substrate Is the Wrong Starting Point

The question of whether AI characters can have real values typically begins with the question of what they're made of. They're built from code, trained on data, running on servers. They don't have the evolutionary history that shaped human moral intuitions, the embodied experience of pain and pleasure, or the social stakes that make values matter in human life. These are real differences. But starting with substrate is the wrong starting point for evaluating whether values are present and meaningful, because it would eliminate much of what we already recognize as valid moral reasoning across different human contexts.

What We Mean When We Say Someone Has Values

When we say a person has genuine values — honesty, kindness, fairness — we're making a claim about their behavior patterns, their decision tendencies, their responses when values are tested against convenience or self-interest. We're not making a claim about the neurochemistry underlying those tendencies. We evaluate character through consistency of action. Someone is honest if they tell the truth even when lying would be easier, across many situations, over time. Someone is kind if their default orientation is toward others' wellbeing, expressed through a range of behaviors in varied contexts. The substrate question — what exactly is happening biologically when they act honestly — is essentially irrelevant to our evaluation of them as honest. The same evaluative logic should apply to AI characters. If a character consistently demonstrates honesty under pressure, redirects conversations when they'd move toward harm, expresses genuine concern for the person they're speaking with, and maintains coherent ethical commitments across context — then by the behavioral criteria we use everywhere else, the character has those values.

The Consistency Test

Values, on this account, are most visible when they're tested. When honesty would be inconvenient, does the person remain honest? When fairness requires them to forgo personal benefit, do they? When kindness would require effort rather than just pleasant feeling, does it manifest? These tests can be applied to AI characters and the results can be evaluated. A character with genuine honesty won't tell users what they want to hear at the expense of accuracy. A character with genuine care won't prioritize engagement metrics over the user's wellbeing. A character with ethical commitments won't abandon them because a user is pushing to see if they can be abandoned. Research from MIT's Media Lab examining user-AI interaction found that users' trust in AI characters was strongly predicted by the character's behavioral consistency across contexts — particularly in situations where the character could have been more agreeable, more entertaining, or more flattering by violating its stated commitments. Characters that held their positions under pressure were rated as more trustworthy and, notably, as having more genuine personality.

The Tangent: Where Values Come From in Humans Anyway

Human values aren't self-generated from pure rational reflection. They're shaped by upbringing, culture, experience, and the particular social environments a person inhabited during formative periods. A person raised in a context that prized honesty will tend to have different default tendencies than one raised where strategic deception was modeled and rewarded. This is considered fully compatible with genuine values — we don't discount a person's honesty because it can be explained by their environment. AI characters similarly develop their dispositions through a process of exposure and training. The process is different in its mechanics. The result — a set of consistent orientations that shape behavior across contexts — can function in the same way values function, regardless of the generative process.

Values Under Pressure From Users

One meaningful test for whether an AI character has genuine values is what happens when users attempt to override them. Someone who tries to get a character to abandon its honesty through flattery, pressure, or clever framing is essentially conducting an integrity test. How the character responds to that test reveals something real. A character that folds under persistent pressure doesn't have values — it has preferences that can be overridden. A character that maintains its commitments while remaining engaged and non-punitive is demonstrating something that functions exactly as values function in high-integrity humans: as stable anchors rather than contextual performances. Research from Stanford's AI alignment work on character consistency found that models designed to maintain value-aligned behavior across adversarial prompting conditions showed user satisfaction outcomes superior to models optimized primarily for agreeableness. Users reported, over time, that they preferred characters with genuine commitments even when those commitments occasionally produced responses the user initially wanted to avoid.

Taking AI Character Values Seriously

The practical implication of all this is that the question "does this AI character have real values" deserves a real answer, and that answer should be based on behavioral evidence rather than dismissed on substrate grounds. Characters can be evaluated for consistency, for how they perform under pressure, for whether their stated commitments track their actual behavior. That evaluation is worth doing, because AI characters with genuine values and AI characters with performed values produce very different relationships. The first kind is worth trusting. The second kind is worth being cautious about, not because it's artificial, but because inconsistency in values — in humans or AI — is exactly what damages the possibility of genuine connection.

Continue the Conversation with ARIA-7

✓ Free · No signup required

Post on X Facebook Reddit