Integration & deployment·June 29, 2026·7 min read

Deploy inside the client's boundary

Residency, gateways, and egress decide whether a conversational tool can ship into an enterprise at all. Design for them from the first sprint or they become a rebuild.

A conversational tool that reads a client's data and calls a model is moving that data somewhere. Where it moves, where it is processed, and how it leaves the network are the questions that decide whether an enterprise can deploy the tool at all. These are not late-stage security details. They set the architecture, and a team that leaves them to the security review finds out in the review that the pilot has to be rebuilt. Draw the data-flow map first, make it the source of truth, and let the residency rule pick the model host and the egress path.
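A data-flow map can start as a plain list of hops plus the set of components that sit inside the client boundary. The sketch below is one minimal way to hold it, not a prescribed format: the `Flow` type, the `egress_flows` helper, and every component name are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One hop on the data-flow map (illustrative shape, not a standard)."""
    source: str
    destination: str
    data: str  # what is carried on this hop


def egress_flows(flows, inside_boundary):
    """Return the hops that carry data from inside the boundary to outside it."""
    return [
        f for f in flows
        if f.source in inside_boundary and f.destination not in inside_boundary
    ]


# Hypothetical pilot: component names are assumptions for the example.
flows = [
    Flow("chat_ui", "retriever", "user question"),
    Flow("retriever", "document_store", "search query"),
    Flow("retriever", "model_api", "question plus retrieved passages"),
]
inside_boundary = {"chat_ui", "retriever", "document_store"}

leaving = egress_flows(flows, inside_boundary)
```

Here `leaving` holds the single hop from `retriever` to `model_api`, which is the hop the residency rule has to approve.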

Residency decides where the model can run

Sending a conversation to a model provider is data leaving the boundary, even when the provider does not train on it. So the client's residency rule (keep data in a region, in a country, or "inside our own tenant") often decides which model you are even allowed to call. This is why the cloud model services exist in the shape they do. Google states plainly that Vertex AI customer data does not leave the customer's tenant, is encrypted in transit and at rest, and is not used to train Google's models, and its data governance page repeats that training and inference data is not used to train or enhance the foundation models. Those properties are the point. They let a buyer keep the model inside a region they control.

Match the host to the rule. No residency requirement, and a reviewed vendor API can carry a pilot. A region or country rule, and the model moves to a cloud service you can pin to that region. Nothing may leave the client boundary, and the model, the retrieval, and the logs all move inside it. Decide this on the data-flow map, not in the security review.
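The three cases above can be written down as a small lookup so the decision lives next to the data-flow map. This is a sketch under stated assumptions: the `Residency` enum, the `HostingPlan` type, and the host labels are hypothetical names, and the region case only records what the text says (the model moves) without claiming anything about retrieval or logs.

```python
from dataclasses import dataclass
from enum import Enum


class Residency(Enum):
    NONE = "none"                    # no residency requirement
    REGION = "region"                # data must stay in a region or country
    CLIENT_BOUNDARY = "boundary"     # nothing may leave the client boundary


@dataclass(frozen=True)
class HostingPlan:
    model_host: str                  # illustrative label, not a product name
    inside_boundary: frozenset       # components that must run inside the boundary


def plan_for(rule: Residency) -> HostingPlan:
    """Map a residency rule to a hosting plan, following the three cases in the text."""
    if rule is Residency.NONE:
        return HostingPlan("reviewed_vendor_api", frozenset())
    if rule is Residency.REGION:
        return HostingPlan("region_pinned_cloud_service", frozenset())
    if rule is Residency.CLIENT_BOUNDARY:
        return HostingPlan(
            "inside_client_boundary",
            frozenset({"model", "retrieval", "logs"}),
        )
    raise ValueError(f"unknown residency rule: {rule!r}")
```

A real rule set is usually messier, for example a region rule that also constrains where logs are stored, so treat the enum as a starting point to extend per client.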

The gateway is one control point in front of every model

Many enterprises will not let each service call a model directly. They route every call through an AI gateway, one control point where logging, rate limiting, retries, and policy live in front of every provider. Cloudflare describes its AI Gateway as a way to observe and control AI apps with analytics, caching, rate limiting, and model fallback through a single point, in front of providers like Anthropic, Google Gemini, and OpenAI. Open-source proxies do the same job inside the boundary. LiteLLM puts a single OpenAI-format interface in front of a hundred providers with virtual keys and spend tracking, and Kong routes across providers with observability and semantic security built in.

If the client runs the gateway, that changes your build. Your retries and your request logging move to their gateway rather than living in your app, and duplicating them in the app double-counts: each request is logged twice and retried at two layers. Ask early who owns the gateway, because the answer moves where half your controls live.
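One way to keep the app from duplicating gateway controls is to make gateway ownership a single deployment setting that switches the app's own retries and request logging off. The sketch below assumes a hypothetical `AppControls` config and a default of three app-level retries; neither comes from any particular gateway product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppControls:
    max_retries: int      # retries performed by the app itself
    log_requests: bool    # whether the app writes its own request log


def controls_for(client_owns_gateway: bool, app_max_retries: int = 3) -> AppControls:
    """Pick app-level controls based on who owns the gateway."""
    if client_owns_gateway:
        # The client's gateway already retries and logs every call, so the app
        # does neither; otherwise each request is counted and retried twice.
        return AppControls(max_retries=0, log_requests=False)
    # No client gateway in the path: the app keeps its own retries and logging.
    return AppControls(max_retries=app_max_retries, log_requests=True)
```

With this shape, `controls_for(True)` yields zero app retries and no app-side request log, and the same build ships to a client without a gateway by flipping one flag.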

The controls are named, so build to the names

An enterprise buyer does not ask you to be secure. They ask for controls and evidence by name, and the frameworks tell you which names. The NIST AI Risk Management Framework organizes the work into four functions (govern, map, measure, and manage), and its terms show up in security questionnaires. ISO/IEC 42001, published in 2023, is the management-system standard specifically for AI, and buyers increasingly ask whether you are working toward it. In the EU, the AI Act adds obligations on top of GDPR. You do not have to quote these at anyone. You have to map each of your controls to the category the buyer references and show the evidence.

Health data raises the floor. Handling US protected health information means a signed business associate agreement before any real data flows, and cloud providers will sign one, though as Google notes there is no HHS certification for HIPAA compliance and the responsibility is shared. No BAA, no PHI, full stop.

Name your sub-processors before they are asked for

The moment you route a client's data to a model provider or a gateway vendor, that vendor is your sub-processor, and the client's data processing agreement will ask you to name it. Providers model this in the open. Google's data processing addendum defines a sub-processor as a third party authorized to process customer data to provide part of the service, and it publishes the list with a change-notification process. Keep your own list current, because a residency promise means nothing if an unlisted sub-processor in the wrong region is quietly in the path.
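The check described above, that every vendor in the path is listed and sits in an allowed region, is easy to automate against the published list. The sketch below is illustrative only: `SubProcessor`, `residency_violations`, the vendor names, and the region labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubProcessor:
    name: str
    region: str  # where this vendor processes the client's data


def residency_violations(path, listed, allowed_regions):
    """Return one message per problem found for vendors in the data path."""
    listed_names = {s.name for s in listed}
    problems = []
    for hop in path:
        if hop.name not in listed_names:
            problems.append(f"{hop.name}: not on the sub-processor list")
        if hop.region not in allowed_regions:
            problems.append(f"{hop.name}: processes data in {hop.region}")
    return problems


# Hypothetical deployment with an EU-only residency promise.
listed = [SubProcessor("model-provider", "eu")]
path = [
    SubProcessor("model-provider", "eu"),
    SubProcessor("gateway-vendor", "us"),  # in the path but never listed
]

problems = residency_violations(path, listed, allowed_regions={"eu"})
```

In this example `problems` contains two entries for `gateway-vendor`, one for being unlisted and one for its region. Running a check like this whenever the path changes is one way to keep the list current.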

The boundary is the source of truth

Draw the data-flow and residency map first and let it decide the rest. It is the artifact the client's security team reads before anything else, and when a later control seems to conflict with the boundary, the boundary wins. Get it right and every new client is a fill-in-the-blank. Get it wrong and each one is a rebuild. The workshop walks a team through drawing it, then through the experience controls that decide how the deployment feels once the data starts moving.
