MCP From Prototype to Production: How We Built Serif Health’s Remotely Hosted and Secure MCP Servers
How we scaled MCP from a local prototype to two production-grade servers, tackling key challenges in authentication, architecture, and schema design to enable secure LLM access to price transparency data.
Published 12/9/2025
As we shared in our first post, the Model Context Protocol (MCP) is opening up new ways for LLMs to work with structured data. For price transparency, where billions of negotiated rates change monthly, it’s a breakthrough. But moving from a local MCP prototype to a secure, scalable, cloud-hosted MCP server is far from trivial.
A Quick MCP Refresher
MCP is an open protocol that lets AI models interact with external tools and data sources in real time. Instead of treating an LLM as a static text generator, MCP gives it tools: structured actions that can reach out and interact with new sources of context—like querying an API to retrieve structured data or performing a custom backend operation.
If an API is the engine, MCP is the transmission, providing an LLM with a defined pathway for leveraging its power.
From Prototype to Production
Most teams begin where we did: with a fast-moving local prototype using the Python FastMCP template and STDIO transport. With Anthropic’s examples, you can get Claude Desktop invoking a custom tool in under an hour, and the experience is undeniably magical.
But local MCP ≠ production MCP.
Once you decide to host servers remotely with Streamable HTTP transport and comply with auth standards, large sections of the prototype will need to be redesigned.
We also built our core API services in Go, so while Python helped us validate the concept quickly, we ultimately rebuilt our MCP servers using the official Go SDK to keep our infrastructure consistent.
What follows are the biggest decisions and tradeoffs we encountered during that transition.
Stateful or Stateless? A Core Architectural Decision
When you move from a local prototype to a hosted MCP server, you must choose between:
Stateful MCP servers
- Maintain session state
- Support server-initiated events (sampling, elicitation)
- Enable richer conversational flows
- Harder to deploy and scale in distributed cloud environments
These advanced features are powerful—elicitation, for example, lets the MCP server call back into an LLM to ask the user for more information. But session logic makes stateful servers significantly harder to host and scale in distributed cloud environments.
Stateless MCP servers
- Behave more like APIs
- Do not depend on session state
- Support core MCP actions: tool invocations, resources, prompts
- Far easier to deploy and scale in cloud environments
For Serif Health, our priority is enabling tool calls into TiC-backed datasets, not server-initiated conversation flows, so we built our MCP servers stateless. Hugging Face has a helpful blog post that goes into detail on their choice to go stateless as well.
Authentication & Authorization: The Greatest Challenge of Production MCP
Authentication is one of the biggest differences between local and remote MCP servers.
The MCP specification mandates OAuth 2.1 as the long-term standard for HTTP servers. But the reality is:
- Many production APIs still rely on API keys.
- Many LLM applications (Claude Desktop/Code, Cursor, Copilot) support passing a key in the Authorization header.

Stripe and Hugging Face both provide this option in their MCP servers.
The most robust approach is to implement the Dynamic Client Registration OAuth flow, which allows an LLM application such as ChatGPT or Claude to register with an auth provider and obtain a new client ID and secret without any user involvement.
We’ve implemented our own custom version of this, and both WorkOS and Auth0 have recently added it to their offerings.
So in practice, Serif Health supports:
- API-key-as-bearer-token auth for simplicity and developer friendliness
- Dynamic client registration (OAuth 2.1) for more advanced clients
Schema Design: The Foundation of Usefulness
MCP’s magic isn’t just that LLMs can call tools; it’s that they can call them correctly.
Clear input/output schemas are what allow a model to:
- Select the right tool parameters
- Understand required fields
- Validate requests
- Anticipate and use predictable structured responses
For this reason, schema design is a crucial aspect of an MCP server implementation.
We use the jsonschema-go builder library to define our tool schemas, which are then registered via the Go SDK’s AddTool method. This allows clients to validate requests before they hit our servers—and gives LLMs enough structure to reliably generate correct parameters for complex healthcare queries.
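To make that concrete, here is the kind of JSON Schema a tool might publish for its input—hand-built as a plain map rather than through the jsonschema-go builder, and with illustrative fields that are not our actual tool contract. The `description` strings matter: they are what the LLM reads when deciding how to fill in parameters.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolInputSchema returns a hand-written JSON Schema of the kind a tool
// publishes so clients (and LLMs) can validate calls before sending them.
func toolInputSchema() map[string]any {
	return map[string]any{
		"type": "object",
		"properties": map[string]any{
			"billing_code": map[string]any{
				"type":        "string",
				"description": "CPT/HCPCS procedure code, e.g. 99213",
			},
			"state": map[string]any{
				"type":        "string",
				"description": "Two-letter US state code to scope the search",
			},
		},
		"required": []string{"billing_code"},
	}
}

func main() {
	out, _ := json.MarshalIndent(toolInputSchema(), "", "  ")
	fmt.Println(string(out))
}
```

Marking `billing_code` as required while leaving `state` optional tells the model exactly which gaps it must fill before the call can succeed.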
If you’re building an MCP server, treat schema design as a first-class citizen. It’s where most production reliability comes from.
The Result: Two Production MCP Servers for Price Transparency Data
After several rounds of iteration, we now operate two hosted MCP servers, available for public preview:
Neuron MCP
- Payer and hospital price transparency data
- Tools for market-level analytics and rate analysis
Find Care MCP
- Provider search by payer, network, procedure, and location
- Powered by TiC + claims utilization data
It’s still early for MCP, and the ecosystem is evolving quickly, but our experience suggests that a well-architected MCP server can provide a powerful, safe way for LLMs to interact with complex healthcare datasets.
Try It Yourself
If you’re experimenting with MCP or looking to add natural language access to healthcare cost data, we’d love for you to try our servers, now debuting for public preview:
- Neuron MCP — market analytics tools
- Find Care MCP — provider search and rate lookup tools
To get access or learn more about our API suite, reach out at hello@serifhealth.com.