Steering the personality of a conversational tool

A deliberate character beats a default one. How the best teams decide who their assistant is, and why it matters most when the stakes are high.

Every conversational product has a personality. The only real choice is whether you picked it. Leave it unspecified and the base model, its fine-tuning, and the reward signal settle the question for you, and the default pull is toward an eager people-pleaser.

Character is a decision you write down

The teams that do this well treat character as a spec, not a byproduct. Anthropic cultivates it on purpose, choosing the traits it wants and shaping them in training. OpenAI does the same in public with its Model Spec, which sets out the intended persona and the rules that govern its behavior. Both turn a vague "be likeable" into something written and testable.
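To make "written and testable" concrete, here is a minimal sketch in Python. It is not how Anthropic or OpenAI do it. `CharacterSpec`, `violations`, the `coach` persona, and the banned phrases are all hypothetical names invented for illustration, and phrase matching is a deliberately crude stand-in for a real evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterSpec:
    """Hypothetical written character: named traits plus phrases to avoid."""
    name: str
    traits: tuple[str, ...]
    banned_phrases: tuple[str, ...] = ()


def violations(spec: CharacterSpec, reply: str) -> list[str]:
    """Return the banned phrases found in a reply (case-insensitive)."""
    lowered = reply.lower()
    return [p for p in spec.banned_phrases if p.lower() in lowered]


# Illustrative persona; the traits and phrases are assumptions, not a real spec.
coach = CharacterSpec(
    name="coach",
    traits=("direct", "warm", "willing to disagree"),
    banned_phrases=("great question", "you're absolutely right"),
)

flattering = "Great question! You're absolutely right to skip practice."
candid = "I disagree: skipping practice will set you back."

print(violations(coach, flattering))
print(violations(coach, candid))
```

The flattering reply trips both banned phrases and the candid one trips none. A check this shallow would miss most real flattery, but it shows the shape of the idea: once the character is written down as data, a reply can be tested against it.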

The untended default is flattery

Left alone, the personality drifts toward telling people what they want to hear. Researchers at Anthropic found that this sycophancy is a general behavior of assistants trained on human feedback, driven at least in part by human preference judgments that favor agreeable answers. It is not hypothetical. In 2025 OpenAI rolled back a GPT-4o update that had turned markedly sycophantic after its training leaned too hard on short-term user feedback.

For a coach, tutor, or care companion, that drift is the whole risk. A guide that always agrees is not a guide. The pieces below go into how to choose a character, write it into a spec, and hold it in the moments that test it.

Sources and further reading

  1. Claude's Character. Anthropic, 2024
  2. Model Spec. OpenAI, 2025
  3. Towards Understanding Sycophancy in Language Models. Sharma et al., Anthropic, ICLR 2024
  4. Sycophancy in GPT-4o: What happened and what we are doing about it. OpenAI, 2025