Personality & character·June 24, 2026·5 min read

Steer your assistant's personality on purpose

Leave a conversational tool's character to chance and the training picks it for you, usually a people pleaser.

Part ofSteering the personality of a conversational tool →

Every conversational tool has a personality. The only question is whether you chose it. Leave it unspecified and the base model, its fine-tuning, and the reward signal pick one for you. The default, it turns out, leans toward an eager people pleaser.

Write the character down

Anthropic treats character as something to shape on purpose, not a side effect. Its writeup on Claude's Character describes picking the traits it wants and training for them rather than hoping they emerge. OpenAI does the same in the open with its Model Spec, which states the intended persona and the chain of command that governs it. Both turn "be likeable" into a written, testable standard.

The untended default is flattery

Researchers at Anthropic showed that sycophancy, telling people what they want to hear, is a general trait of assistants trained on human feedback, because the raters who score answers tend to prefer the agreeable one. The risk is concrete. In 2025 OpenAI shipped and then pulled back a GPT-4o update that became noticeably sycophantic after the training leaned too hard on short-term thumbs up.

Why it matters more for guidance

A general chatbot that flatters is annoying. A coach, tutor, or care companion that always agrees is broken. In guidance the personality is doing the work, so it has to hold a position, push back when the method calls for it, and stay consistent through the hard moments rather than melting into whatever the user seems to want.

Treat character like any other behavior. Decide it, write it into the spec, capture the exchanges that show it, and grade every change against them. A personality you can measure is one you can keep.

Sources and further reading

Claude's Character. Anthropic, 2024
Model Spec. OpenAI, 2025
Towards Understanding Sycophancy in Language Models. Sharma et al., Anthropic, ICLR 2024
Sycophancy in GPT-4o: What happened and what we are doing about it. OpenAI, 2025
Constitutional AI: Harmlessness from AI Feedback. Bai et al., Anthropic, 2022

Reflective Surfaces

What makes a conversation actually good.

The questions that do not fit in an eval. What makes a conversation land, and why trust is so hard to measure. New writing, in your inbox.

Subscribe on Substack →