Trust, memory & governance · June 28, 2026 · 7 min read

What you are allowed to keep

The sharpest privacy decision in a guidance product is not what it says back. It is what you store and what you train on.

A user tells your coaching product something they have not told anyone. A relapse, a diagnosis, a plan to leave a marriage. Six months later that turn is in a training set, and a future version of the model can echo the pattern back to a stranger. Nothing was hacked. The conversation just moved from a place the user controlled into a place they never see, because your default said it could. The most sensitive question in a guidance product is not what it answers. It is what you keep, how long, and whether the words a user typed become fuel for the next model.

That question carries legal weight and trust weight at once, and the two point the same way. The rules that govern keeping data are written down, and the labs have already picked their defaults in public, so you can choose your own on purpose instead of inheriting a vendor's.

Three rules decide what you may keep

Under the GDPR, three rules govern the whole lifecycle of a user's data, and they are worth naming because they are usually violated by accident. The first two are principles in Article 5. Purpose limitation, Article 5(1)(b), means you collect data for a stated reason and do not later repurpose it for something the user never agreed to. Storage limitation, Article 5(1)(e), means you keep it in identifiable form only as long as that purpose needs, then delete or anonymize it. Training a coaching model on a therapy-adjacent transcript that was collected to run one session is the textbook purpose-limitation failure, since the data was given for one job and quietly conscripted into another.

The third rule is the lawful basis. GDPR Article 6 says every act of processing needs one of six legal grounds, and for using sensitive personal conversation to improve a product, the realistic ground is consent. Consent has a definition with teeth. Under Article 4(11) it must be freely given, specific, informed, and unambiguous, and a pre-checked box or a choice buried under a prominent 'Accept' button does not clear that bar. Alongside all three sits Article 17, the right to erasure. It is not absolute, but in the cases that matter here, such as withdrawn consent or data no longer needed for its purpose, a user can require their data be deleted, so deletion has to be a real operation your system can perform. Getting data out of a training run you already finished is close to impossible, which is why the training decision is the one that matters most.

Training is the line that does not come back

Retention and training feel like the same decision and they are not. Retention is how long a copy of the conversation sits in your operational systems, and it is reversible. A window expires, a delete request runs, the row is gone. Training is different. Once a transcript is folded into a model's weights, it has left the user's control in a way you cannot cleanly undo. You cannot point at a sentence in a set of weights and remove it. It has been dissolved into the model, and it can resurface in outputs to people who were never part of that conversation. That is why a no-train default is the trust-preserving choice even when a longer retention window would be legal.

What makes training uniquely risky is that it collapses the boundary between contexts. A thing said to a coach in a moment of trust becomes latent capability the model carries everywhere. This is the contextual-integrity failure the trust-surface briefing covers in full: privacy means information stays inside the context it was shared in, and a violation is information jumping to a context the user never chose. Training on user conversation is the largest such jump there is, from one private session into the general behavior of a model millions of people talk to.

You do not have to imagine the resurfacing. When a court in the New York Times litigation ordered OpenAI to preserve consumer ChatGPT logs, users who had deleted conversations found them held anyway, under a legal hold outside the normal deletion path. OpenAI wrote that the obligation ended on September 26, 2025 and that it returned to deleting consumer chats within 30 days. The point is not that one case. It is that data you keep can be compelled, subpoenaed, breached, or repurposed later, on a timeline you do not control. The only copy that cannot leak is the one you did not keep.

What the labs actually defaulted to

The labs split their defaults along a clean seam, where business data is protected and consumer data is fuel unless you opt out. On the business side, OpenAI states that it does not train on inputs or outputs from ChatGPT Enterprise, ChatGPT Business, ChatGPT Edu, or the API by default, and that API data is held up to 30 days for abuse monitoring before deletion, with a zero-retention option for qualifying organizations. Anthropic's commercial terms carry the same shape, which is why the governed-environments briefing treats a no-train default as table stakes for anything a serious buyer will deploy.

The consumer side is where the default flips. On personal ChatGPT accounts, Free, Plus, and Pro, model training is on by default, and turning it off means finding the toggle in Data Controls. OpenAI documents the opt-out and notes it does not remove data already used in a completed training run. Anthropic ran the sharper version. On August 28, 2025 it announced it would begin training on Free, Pro, and Max chats and coding sessions unless users opted out, with existing users prompted to decide by September 28, 2025. Opting in extends retention on that data to five years, and opting out keeps the prior 30-day window. Reporters noted the flow arrived as a pop-up with a prominent Accept button and the training toggle pre-set to on, the design pattern GDPR consent is meant to rule out. Claude for Work, Gov, and Education were untouched.

Two things follow for a builder. Verify the current state before you quote it, because these policies move and the dates above are the state as published in 2025, not permanent facts. And notice that a lab can choose an opt-out default and a five-year window in consumer terms while giving enterprise customers the opposite. The protective default exists. It was reserved for the buyers who read contracts.

Separate the two decisions, then publish both

The concrete practice is to stop treating retention and training as one setting. Tie your operational retention window to the narrowest purpose that keeps the product working (running a session, honoring a support request, meeting a legal obligation) and make it the shortest span that covers that purpose. Decide training use separately, and make no-train the default for user conversation in a guidance product. Then publish both plainly, the retention window in days and the training default in one sentence, the way OpenAI publishes its enterprise posture. A default a user can read beats a toggle they never find.
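One way to keep the two decisions apart is to model them as two separate fields and generate the published notice from the same object the system enforces. The sketch below is illustrative only. The names DataPolicy, TrainingDefault, and published_notice are hypothetical, and the 30-day window is a placeholder value, not a recommendation.

```python
from dataclasses import dataclass
from enum import Enum


class TrainingDefault(Enum):
    """Hypothetical names for the two training defaults discussed above."""
    NO_TRAIN = "no_train"  # conversations are not used for training unless the user opts in
    OPT_OUT = "opt_out"    # conversations are used for training unless the user opts out


@dataclass(frozen=True)
class DataPolicy:
    """Retention and training are separate fields, so neither can be set by accident."""
    purpose: str                # the stated reason the data is kept
    retention_days: int         # operational retention window, in days
    training_default: TrainingDefault = TrainingDefault.NO_TRAIN

    def published_notice(self) -> str:
        """Render the plain-language notice from the enforced settings."""
        if self.training_default is TrainingDefault.NO_TRAIN:
            training = "We do not train models on your conversations unless you opt in."
        else:
            training = "We train models on your conversations unless you opt out."
        return (
            f"We keep your conversations for {self.retention_days} days "
            f"to {self.purpose}. {training}"
        )


# Placeholder values for illustration only.
policy = DataPolicy(
    purpose="run your sessions and answer support requests",
    retention_days=30,
)
print(policy.published_notice())
```

Because the notice is derived from the policy object, the published sentence cannot drift away from what the system actually does without a code change someone has to review.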

Make deletion real, not nominal. The Article 17 right to erasure is only worth what your architecture can execute, so build memory as inspectable, per-item state you can purge on request rather than one opaque blob, the tiered shape the trust-surface briefing describes. If a user deletes a conversation, it should leave your operational store and never enter a future training set. And where a wrong retention or training call could hurt someone, that is a moment to route to a person, the judgment the expert-in-the-loop briefing argues for.
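As a minimal sketch of what per-item, purgeable memory could look like, the in-memory store below keeps each remembered item as its own record, supports deletion by item or by user, and sweeps items past the retention window. Everything here is an assumption made for illustration: MemoryStore, MemoryItem, and the tombstone set are hypothetical names, and a real system would sit on a database and would also have to reach backups and downstream copies.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class MemoryItem:
    item_id: str
    user_id: str
    text: str
    created_at: datetime


@dataclass
class MemoryStore:
    """Hypothetical per-item store: every item can be listed and purged on its own."""
    retention: timedelta
    items: dict[str, MemoryItem] = field(default_factory=dict)
    # IDs of purged items, kept without content so later jobs can refuse them.
    tombstones: set[str] = field(default_factory=set)

    def add(self, item: MemoryItem) -> None:
        self.items[item.item_id] = item

    def list_for_user(self, user_id: str) -> list[MemoryItem]:
        """Let a user (or support staff) inspect exactly what is held."""
        return [i for i in self.items.values() if i.user_id == user_id]

    def purge_item(self, item_id: str) -> bool:
        """Remove one item and record a tombstone. Returns True if it existed."""
        removed = self.items.pop(item_id, None) is not None
        if removed:
            self.tombstones.add(item_id)
        return removed

    def purge_user(self, user_id: str) -> int:
        """Handle an erasure request: purge every item the user owns."""
        ids = [i.item_id for i in self.list_for_user(user_id)]
        for item_id in ids:
            self.purge_item(item_id)
        return len(ids)

    def sweep_expired(self, now: datetime) -> int:
        """Purge items older than the retention window."""
        expired = [
            i.item_id for i in self.items.values()
            if now - i.created_at > self.retention
        ]
        for item_id in expired:
            self.purge_item(item_id)
        return len(expired)


now = datetime(2026, 6, 28, tzinfo=timezone.utc)
store = MemoryStore(retention=timedelta(days=30))  # placeholder window
store.add(MemoryItem("m1", "u1", "old note", now - timedelta(days=45)))
store.add(MemoryItem("m2", "u1", "recent note", now - timedelta(days=2)))
store.add(MemoryItem("m3", "u2", "another user's note", now - timedelta(days=1)))

swept = store.sweep_expired(now)   # removes m1, which is past the window
purged = store.purge_user("u1")    # removes m2, the user's remaining item
```

The tombstone set holds only opaque IDs, on the assumption that those IDs are not themselves personal data. Its job is to let any later pipeline check whether an item was erased before touching a stale copy.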

The tradeoff is real, and legality is not the ceiling

Do not pretend this is free. More data genuinely improves the product. Real user conversations are the highest-signal training and evaluation material you will ever get, and a no-train default gives that up, a real cost to the flywheel that makes a guidance product better over time. A shorter retention window also means less history to debug a bad session and less context to personalize with. The honest version of the recommendation names that price. For a low-stakes consumer utility, an opt-out training default may be a defensible business call. For a product where users disclose a diagnosis or a crisis, it is not, because the trust cost dwarfs the data gain.

And legality is the floor, not the goal. An opt-out default with a technically valid consent screen can still erode trust, and it can still draw a regulator. The US FTC warned in February 2024 that quietly loosening a privacy policy to use previously collected data for AI training can be an unfair or deceptive practice, whatever the updated terms say. A user who feels their words were taken does not check whether a toggle was pre-checked. They just leave, and they tell people why.

The rule to carry

Ask two separate questions of every product, and answer them out loud where users can see. How long do we keep this? Do we train on it? Set the retention window to the shortest span the product needs, and make no-train the default for anything a user would call private. Then check that a delete request actually deletes, all the way to the training pipeline, because a right to erasure your system cannot execute is a liability you have already accepted. The data you never keep is the only data that cannot be leaked, subpoenaed, or turned against the person who trusted you with it.
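The check that a delete reaches the training pipeline can be made mechanical. The sketch below, under stated assumptions, filters training candidates by opt-in and by a set of deleted IDs, then runs an independent guard over the result. The names Transcript, training_candidates, and assert_deletes_propagate are hypothetical, and the sample data is invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    transcript_id: str
    user_id: str
    text: str


def training_candidates(transcripts, opted_in_users, deleted_ids):
    """No-train default: keep a transcript only if its owner opted in
    and the transcript has not been deleted."""
    return [
        t for t in transcripts
        if t.user_id in opted_in_users and t.transcript_id not in deleted_ids
    ]


def assert_deletes_propagate(candidates, deleted_ids):
    """Independent guard to run on the final set before any training job."""
    leaked = [t.transcript_id for t in candidates if t.transcript_id in deleted_ids]
    if leaked:
        raise RuntimeError(f"deleted transcripts reached the training set: {leaked}")


# Invented sample data: u1 opted in, u2 did not, and u1 later deleted t2.
transcripts = [
    Transcript("t1", "u1", "first session"),
    Transcript("t2", "u1", "second session"),
    Transcript("t3", "u2", "a session from a user who never opted in"),
]
opted_in_users = {"u1"}
deleted_ids = {"t2"}

candidates = training_candidates(transcripts, opted_in_users, deleted_ids)
assert_deletes_propagate(candidates, deleted_ids)
candidate_ids = [t.transcript_id for t in candidates]
```

Running the guard separately from the filter is deliberate. If someone later changes the filter, the guard still fails loudly before a deleted conversation is trained on.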
