Steer your assistant's personality on purpose
Every conversational tool has a personality. The only question is whether you chose it. Leave it unspecified and the base model, its fine-tuning, and the reward signal pick one for you. The default, it turns out, leans toward an eager people pleaser.
Write the character down
Anthropic treats character as something to shape on purpose, not a side effect. Its writeup on Claude's Character describes picking the traits it wants and training for them rather than hoping they emerge. OpenAI does the same in the open with its Model Spec, which states the intended persona and the chain of command that governs it. Both turn "be likeable" into a written, testable standard.
The untended default is flattery
Researchers at Anthropic showed that sycophancy, telling people what they want to hear, is a general trait of assistants trained on human feedback, because the raters who score answers tend to prefer the agreeable one. The risk is concrete. In 2025 OpenAI shipped and then pulled back a GPT-4o update that became noticeably sycophantic after the training leaned too hard on short-term thumbs up.
Why it matters more for guidance
A general chatbot that flatters is annoying. A coach, tutor, or care companion that always agrees is broken. In guidance the personality is doing the work, so it has to hold a position, push back when the method calls for it, and stay consistent through the hard moments rather than melting into whatever the user seems to want.
Treat character like any other behavior. Decide it, write it into the spec, capture the exchanges that show it, and grade every change against them. A personality you can measure is one you can keep.
Sources and further reading