Multi-User MCP Servers Need a Different Architecture

In the past few months, I've watched countless companies rush to build and adopt remote MCP servers. The appeal is obvious: the Model Context Protocol promises standardized agentic tool use, seamless discovery, and a front-row seat on the AI hype train. GitHub is flooded with MCP server repositories. Model providers are racing to integrate them as built-in skills. The gold rush is on.

But there's a fundamental architectural mismatch being overlooked. For multi-user cloud applications (the kind aiXbrain is building), remote MCP servers create more problems than they solve. We're cargo-culting a pattern that works for a narrow set of use cases and trying to force it everywhere.

Let me explain why, and what we should be doing instead.


Understanding MCP: Supermarkets vs Recipe Cookbooks

First, some context. The Model Context Protocol represents a genuine paradigm shift in how we build with LLMs. We're moving from static workflows and brittle instruction-following to emergent behavior through standardized tool use. This matters.

Here's the mental model that's helped me think about this: Public APIs are like supermarkets, and MCP servers are like recipe cookbooks.

When you interact with a public API, you're walking through a supermarket stocked with hundreds of items. As the client developer, you pick what you need from that long list, combine ingredients, and create your intended result. The supermarket doesn't care what recipe you're making: use case knowledge lives entirely on the consumer side.

This works fine when humans are orchestrating the calls. But AI agents? They get overwhelmed. Expose all of Jira's API endpoints to an agent and watch it struggle to figure out which handful it actually needs for basic ticket management. Agents drown in optionality.

MCP servers are recipe cookbooks. They encapsulate use cases. Instead of exposing raw API endpoints (supermarket items), they expose tools that represent meaningful actions (recipes). "Submit a ticket" is a recipe. It still calls the underlying API endpoints, but it's a shortcut to client intent. This is a smart optimization given current tool use limitations and context constraints.
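To make the recipe idea concrete, here is a minimal sketch of an intent-level tool composed from several raw API calls. The `JiraClient` class and its methods are hypothetical stand-ins, not Jira's actual API; the point is the shape, one "recipe" wrapping multiple "supermarket" calls.

```python
class JiraClient:
    """Hypothetical stand-in for a raw API client (the 'supermarket')."""

    def create_issue(self, project, summary, description):
        return {"key": f"{project}-101", "summary": summary}

    def add_label(self, issue_key, label):
        return {"key": issue_key, "label": label}

    def assign(self, issue_key, assignee):
        return {"key": issue_key, "assignee": assignee}


def submit_ticket(client, project, summary, description, team):
    """The 'recipe': one intent-level tool, several underlying API calls."""
    issue = client.create_issue(project, summary, description)
    client.add_label(issue["key"], "agent-created")
    client.assign(issue["key"], f"{team}-triage")
    return issue["key"]
```

The agent only ever sees `submit_ticket`; the three underlying endpoints never enter its context.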

The protocol supports tool discovery and tool calling. It simplifies subagent setup. It doesn't eliminate context overload, but it provides mechanisms to reduce it. This is valuable technology.

So far, so good. The problems start when we think about where these MCP servers should run.

The Remote MCP Server Problem

The prevailing approach, the one being built by default, is the remote MCP server. A company exposes an MCP-compatible service, agents connect to it over the network, and tools are discovered dynamically. This seems elegant. It's also deeply problematic for most real-world applications.

Context Overload at Scale

Complex applications have complex use cases. A sophisticated project management tool might need hundreds of recipes to cover the spectrum of what users do. But if your MCP server exposes all of them, you've just handed the agent a 500-page cookbook and asked it to find the one recipe it needs.

MCP was designed around dynamic discovery. The intended data flow is: LLM connects to MCP server, discovers all available tools, decides what to call. There's no elegant mechanism for client-side filtering. You can't easily prune the tool list between the LLM and the MCP server without violating the architecture.

So you're stuck. Expose too few tools, and the MCP server can't cover your actual use cases. Expose too many, and you overwhelm the agent's context window. Remote MCP servers force you into a one-size-fits-all posture that satisfies no one.
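To see what's missing, here is a sketch of the kind of client-side pruning you would want to sit between the LLM and the server: take the full discovered tool list and keep only what the current task needs. The tool-definition shape is a simplified assumption, and MCP offers no standard hook for inserting this step.

```python
def prune_tools(discovered_tools, task_allowlist):
    """Client-side filter: keep only tools relevant to the current task.

    discovered_tools: list of dicts like {"name": ..., "description": ...},
    a simplified stand-in for real MCP tool definitions.
    """
    return [t for t in discovered_tools if t["name"] in task_allowlist]


# Hypothetical discovery result; in practice this list runs to hundreds.
tools = [
    {"name": "create_ticket", "description": "Create a new ticket"},
    {"name": "delete_project", "description": "Delete an entire project"},
    {"name": "export_audit_log", "description": "Export audit logs"},
]

# For basic ticket management, the agent needs exactly one of these.
visible = prune_tools(tools, {"create_ticket"})
```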

The Security Nightmare

MCP was not designed with security as a primary concern. That's actually fine, but only if the connection between the LLM and the MCP server exists within a trusted environment. I'm certain that this was always implied and intended by Anthropic.

Remote MCP servers break this assumption. Now you have a non-deterministic system making authenticated remote procedure calls over the network. This is already risky. But the authentication story gets worse.

Your client-side application likely already has authentication solved. You're running in the user's context, you have an API key for your own services, maybe you have OAuth tokens for third-party integrations. That state exists within the app's memory or session.

But remote MCP servers can't access any of that. Now you need a separate authentication channel just so your LLM can talk to a remote process. This means:

  • Additional OAuth flows that your users must complete
  • Credential management complexity multiplied by every remote MCP server you integrate
  • Attack surface expansion as credentials flow through multiple systems
  • Zero guarantee that the remote MCP server respects the same permission boundaries your application enforces

OAuth has been awkwardly bolted onto MCP servers, but it's not elegant. The exception is exactly the scenario everyone is copying: generic connectors for popular apps like Google Drive or Slack. For those, OAuth makes sense because the use cases are broad and the integration is the same for everyone.

But for your complex, domain-specific application? This is architectural malpractice.

Loss of Control

Here's the part that bothers me most: remote MCP servers don't let you tune your tools.

I have many cookbooks in my kitchen. For any given one, there's a short list of five recipes, maybe ten at most, that I come back to repeatedly. Those are my go-to recipes. The rest of the cookbook might as well not exist for my daily use.

This is how real applications work too. Your power users have patterns. They use specific workflows repeatedly. The tools surfaced to their agents should reflect their patterns, not some generic averaging across all possible users.

Remote MCP servers are one-size-fits-all. The provider decides which tools to expose and how to describe them. You don't get to customize tool descriptions to maximize agent understanding. You don't get to hide the 90% of tools any given user will never touch. You lose control precisely where you need it most: at the interface between your users' intent and the agent's tool selection.
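If you controlled the server, this tuning could be as simple as ranking tools by each user's observed usage and exposing only the top handful, gated by role. Everything below (the tool shape, the usage counts) is illustrative, not a real schema.

```python
def tools_for_user(all_tools, usage_counts, role, top_n=5):
    """Surface a user's go-to tools: most-used first, restricted by role."""
    permitted = [t for t in all_tools if role in t["roles"]]
    ranked = sorted(permitted,
                    key=lambda t: usage_counts.get(t["name"], 0),
                    reverse=True)
    return ranked[:top_n]


# Illustrative catalog and per-user usage stats.
catalog = [
    {"name": "create_ticket",  "roles": {"member", "admin"}},
    {"name": "delete_project", "roles": {"admin"}},
    {"name": "export_report",  "roles": {"member", "admin"}},
]
usage = {"export_report": 40, "create_ticket": 12}
```

A member who runs reports all day sees `export_report` first and never sees `delete_project` at all.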

When Remote MCPs Actually Make Sense

I want to be clear: I'm not saying remote MCP servers are always wrong. They make perfect sense for popular applications with relatively generic use cases.

Google Drive is a great example. What I or my agent want from Google Drive is pretty standard: read files, write files, share files, search. The use case space is narrow enough that a remote MCP server can expose a tight set of tools without overwhelming context. That's why model providers have been eager to integrate these as built-in skills in their chat UIs.

If you're building an app that needs a connector for Slack or Notion? Go ahead, use their remote MCP server. But that's not what most of us are building. We're building complex, multi-user applications with domain-specific workflows and per-user permission boundaries. For us, remote MCPs are a trap.

Local MCP Servers: Better, But Not Enough

Okay, so what about local MCP servers? These run on the client machine, usually launched with environment variables containing the user's authentication and API keys. Most of the MCP server examples on GitHub actually assume this model, in which your server instance is tied to the authentication you provide on startup.

For single-user environments, this works great. You avoid the authentication nightmare. You can customize tools freely. You might even be able to set up a local MCP server for your company this way, as long as using a shared account is feasible.

But here's the problem: what about multi-user cloud applications?

This is the crucial question for anyone building AI capabilities into a web app or SaaS product. You need the AI to act on behalf of individual users, respecting their specific permissions both within your system and when calling third-party integrations. But you can't run a persistent MCP server per user if you want to scale.

So where does that leave us?

The Solution: Ephemeral, User-Scoped MCP Servers

Here's my current answer: ephemeral, user-scoped MCP servers.

The core idea is simple. Your stateless REST application receives a request. During the lifecycle of that single request, it spins up one or more MCP server processes in the user's context. The agent does its work, calling tools as needed. When the request completes, the MCP servers are torn down.
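A minimal sketch of that lifecycle, with an entirely hypothetical `EphemeralMcpServer` standing in for a real server implementation. The shape is what matters: create in the request's user context, use, always tear down.

```python
from contextlib import contextmanager


class EphemeralMcpServer:
    """Hypothetical stand-in for a real, user-scoped MCP server process."""

    def __init__(self, user_id, credentials, tools):
        self.user_id = user_id
        self.credentials = credentials  # inherited from the request context
        self.tools = tools
        self.running = True

    def shutdown(self):
        self.running = False


@contextmanager
def mcp_server_for_request(user_id, credentials, tools):
    """Spin up a user-scoped server for one request, tear it down after."""
    server = EphemeralMcpServer(user_id, credentials, tools)
    try:
        yield server
    finally:
        server.shutdown()


# Inside a request handler:
with mcp_server_for_request("user-42", {"api_key": "k"}, ["create_ticket"]) as srv:
    pass  # the agent would discover and call srv.tools here
# Once the request completes, the server is gone.
```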


Let me break down why this works:

Authentication flows naturally. The user is already authenticated in your app. Your backend already has their credentials, their permissions, their OAuth tokens for integrated services. When you create an ephemeral MCP server in the request context, all of that state is available. No separate OAuth dance. No credential coordination across systems. The MCP server inherits the user's security context automatically.

Permissions are respected. Because the MCP server runs in the user's session context, it respects the same permission boundaries your application enforces. If a user doesn't have access to delete resources, a well-designed MCP server won't expose a "delete resource" tool for that user; and even if it did, the request would be rejected. The permissions are per-request, per-user, dynamically scoped.
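That last point is defense in depth: even with tools filtered per user, every invocation can be re-checked against the user's permissions. A thin decorator sketch, assuming a simplified user dict rather than any real auth model:

```python
def requires(permission):
    """Reject a tool call when the requesting user lacks the permission."""
    def decorate(tool_fn):
        def wrapper(user, *args, **kwargs):
            if permission not in user["permissions"]:
                raise PermissionError(
                    f"{user['id']} lacks '{permission}' for {tool_fn.__name__}")
            return tool_fn(user, *args, **kwargs)
        return wrapper
    return decorate


@requires("resource:delete")
def delete_resource(user, resource_id):
    return f"deleted {resource_id}"
```

Even if a "delete resource" tool slipped into a user's tool list, the call itself would still be rejected at the boundary.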

Tools can be tuned. Because you're instantiating the MCP server programmatically, you can decide which tools to expose based on user role, current task context, or learned patterns. Power users can get advanced tools. New users can get simplified sets. The MCP server becomes a dynamic adapter between user intent and system capabilities.

Third-party credentials are already available. If your app integrates with external services, you already manage those OAuth tokens on behalf of your users. The ephemeral MCP server has access to them within the request scope. No additional integration complexity.

The critical success factor here is startup and loading time. If instantiating an MCP server takes seconds, this pattern breaks. You need these to spin up in milliseconds. Interestingly, this is the exact same constraint facing the alternative "code mode" approaches to MCP (Anthropic, Cloudflare). Both paradigms depend on how quickly you can create a secure, user-scoped environment for the LLM to operate in.

This is the new performance bottleneck to optimize. Fast startup MCP servers. Lightweight runtimes. Minimal initialization overhead. Get this right, and the ephemeral pattern becomes incredibly powerful.

Rethinking MCP Architecture

There's no doubt that the Model Context Protocol was and is a valuable step forward. It standardizes tool use across Large Language Models, enables agent interoperability, and provides a foundation for the community to converge on shared abstractions. These are real benefits.

But the deployment model matters enormously. The remote MCP server pattern is being cargo-culted because it's visible, because it's what the big providers are building, and because it feels like "best practice." For complex, multi-user applications, it's usually the wrong choice.

Here's my recommendation:

If you need a connector for a popular app with generic use cases, use their remote MCP server if one exists. Make the integration robust and handle OAuth properly, the same way ChatGPT, Claude, and other agent platforms integrate these servers as built-in skills.

If you're building AI capabilities into a complex multi-user application, use ephemeral user-scoped MCP servers. Spin them up per request, inherit the user's security context, tune tools dynamically, and tear them down when you're done. Optimize for startup time. This is the path to agents that respect permissions, adapt to user patterns, and integrate cleanly with your existing auth infrastructure.

The architecture you choose determines whether your agent integration is a security liability or a powerful force multiplier. Choose carefully.



Have you encountered these challenges building with MCP? I'd love to hear how you're solving multi-user scenarios. Connect on LinkedIn to get in touch.

Dr. Simon Görtzen, CTO aiXbrain GmbH