Development of the MCP Registry · modelcontextprotocol/registry · Discussion #11
Edit (May 27, 2025): Here is a slide deck presentation from the recent MCP Developers Summit outlining the goals of the official registry.
Spurred by this discussion, a group of MCP community members have begun working on the official MCP metaregistry in this modelcontextprotocol/registry repository.
The initial deliverable we are sprinting towards is a REST API that centralizes metadata about MCP servers by leaning on server creators to submit and maintain metadata about their servers in a standardized format.
We envision that MCP client applications and other "server aggregator" type consumers will be able to leverage this metadata as a source of truth touchpoint in order to power functions like "extensions", "MCP marketplaces", "integrations", and other UX that involves discovering and installing MCP servers.
An official UI experience will likely come as a next-step after the initial API launch.
Coordinating this effort: @toby (GitHub), @alexhancock (Block), me (PulseMCP)
Leading development: @sridharavinash (GitHub)
Big thank you to everyone else who has contributed input on this working group so far: @dsp-ant, @jspahrsummers, Marc-Antoine Belanger, Benjamin Eckel, Chris Dickinson, Nils Adermann, @macoughl, @ravinahp, @jerome3o-anthropic, @topherbullock, @calclavia, @cliffhall
If anyone in the community has feedback or commentary on any aspect of the metaregistry, I encourage you to open up a Discussion explaining your use case and feedback. We feel good about the high level architecture and scope of the metaregistry, but are very open to feedback and want to make sure we are properly addressing the ecosystem's needs that should be in scope for this work.
Below is a high level review of where we currently stand and some architectural decisions we have made, along with some open questions we'd love feedback and input on (we'll spin some of these out to separate Discussions, but feel free to open your own if you don't see one already out there). The best way to contribute to this effort is to either (1) offer specific examples and use cases of your needs, or (2) pick up well-defined chunks of work that we have already aligned on and contribute to the codebase.
Details below are subject to change; I won't be updating the below as it goes stale. Consider this a starting point at the time of writing, but the codebase and its associated documentation are the ultimate source of truth, and other more fine-grained Discussions/Issues will likely carry these details forward.
Scope
Problem Statement
End-users of MCP clients have a need to be shown MCP servers they may want to install. MCP client hosts need to surface this data for their users.
Currently, the way MCP clients do this is one of the following:
- Web scraping combined with inference at install-time (e.g. scrape GitHub README, infer metadata)
- Web scraping pre-install time, effectively maintaining their own copy of the ecosystem
- Plug into a third party registry service that is doing one of the above
This is a very fragmented approach that results in significant amounts of duplicated effort across the ecosystem, and still results in an inefficient, non-comprehensive UX for end-users. In addition, it puts server maintainers in a position where they must maintain their metadata on potentially dozens of different solutions (whether they be third party registries or MCP client registries).
Solution
Create a single source of truth "centralized metaregistry", to which server maintainers can push their metadata, and MCP clients (or other consumers) can consume that metadata, filter it, curate it, and serve it to their end-users.
This metadata should contain:
- References to where source code / packages / containers are published (i.e. other "registries"; hence this is a "metaregistry")
- A small amount of descriptive metadata to identify the server and understand its purpose (e.g. name, description)
- Installation guidance, in accordance with how the ecosystem is using packages today (namely, `npx`- and `uvx`-style installation instructions)
- A notion of version management, to make it easy to understand updates to this data over time
We aim to make the shape of this metaregistry be fit for re-use in other contexts, such as internal private metaregistries, or opinionated intermediate metaregistries that can be consumed by MCP clients instead of (or in addition to) the official centralized metaregistry.
Out of Scope
Massive scale & reliability (millions of consumers): this metaregistry should not be consumed by MCP client end-users. MCP client hosts should maintain asynchronous systems that integrate with the metaregistry. This means the maximum number of direct consumers of this metaregistry is roughly equal to "number of MCP client apps", rather than "number of MCP client app users".
Solving the long tail of MCP filtering & curation capabilities: MCP server "selection" is a complex problem with a myriad of long-tail use cases. The right "sort order" or "ranked search results" for a given use case or query is not something we want to solve centrally. MCP Client IDE's will solve this problem differently than MCP Client chat apps, who will solve it differently from MCP Client agent-building frameworks. While we may choose to include optional data fields that help facilitate solving this problem for end-users, we do not anticipate including that kind of data in the centralized metaregistry itself.
Reusability of infrastructure, implementation detail decisions: While we expect the ecosystem to reuse the shapes (such as OpenAPI shape, mcp.json shape) associated with this work, we are not designing for reuse of the underlying implementation details and infrastructure. As such, we will not provide instructions on "how to serve your own instance of this metaregistry".
Comprehensive security assurances: while there is some opportunity to improve security guarantees across the MCP ecosystem by working assurances into a centralized metaregistry, we are starting off by delegating source-code level concerns to package registries (we will not host any source code), and deferring opportunities like source code scanning, tool poisoning countermeasures, etc. to post-MVP iterations of the metaregistry.
Architecture
REST API: No authentication required to read; GitHub authentication required to publish. Publishers will submit metadata in a standardized format ("mcp.json") describing their MCP servers. The API will accept consumer requests to read published metadata, and have internal capability to perform async jobs and potentially issue webhook calls. It will be implemented in Go.
Note
We are choosing a REST API rather than, say, a daily data dump, because we expect our API shape to be repurposed by various consumers that augment (or make private versions of) the metaregistry. So while the centralized metaregistry is not designed for direct consumption by MCP client apps, the intermediate consumers may choose to mirror the shape of the centralized metaregistry, and it would be useful for them to expose their data to MCP client apps in a consistent manner.
NoSQL Database: To store application-level metadata and mcp.json data.
GitHub OAuth: Will serve as our auth provider. Requests to publish data into our REST API must go through GitHub OAuth.
Source Code Registries: npm, PyPI, crates.io, Docker Hub, GHCR, etc. will store original source code and version data. The metaregistry will store references to these on a per-server basis.
CLI tool: Used by developers to trigger publication requests to the REST API.
Building blocks
Domain-based namespacing: Publishers can use TXT-based DNS verification to get access to publish packages namespaced under reverse DNS (e.g. com.microsoft(.subdomain)/my-mcp-server). For those publishing source code to GitHub without DNS verification, we'll offer a namespace like io.github.{username}/my-mcp-server.
Note
Whether we should use forward DNS or reverse DNS is still an open question. See #12
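Whichever direction is chosen, deriving the reverse-DNS namespace from a verified domain is a simple label reversal. A sketch in Go, with `reverseDNS` being a hypothetical helper name:

```go
package main

import (
	"fmt"
	"strings"
)

// reverseDNS converts a verified domain like "microsoft.com" into the
// reverse-DNS namespace prefix "com.microsoft" used for server names.
func reverseDNS(domain string) string {
	labels := strings.Split(domain, ".")
	// Reverse the label order in place.
	for i, j := 0, len(labels)-1; i < j; i, j = i+1, j-1 {
		labels[i], labels[j] = labels[j], labels[i]
	}
	return strings.Join(labels, ".")
}

func main() {
	fmt.Println(reverseDNS("microsoft.com"))          // com.microsoft
	fmt.Println(reverseDNS("internal.microsoft.com")) // com.microsoft.internal
}
```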
Auth delegation to GitHub OAuth: In its simplest form, this lets us tie a DNS verification to a GitHub user or an entire GitHub organization, used when executing metadata updates.
Delegation to existing package & container registries: Gives us a base level of security and anti-spam assurance.
Store only metadata that serves as a single source of truth: Anything that can be a reference to structured data stored elsewhere should be a reference, not a copy.
Design for consumers to poll & transform: Because we will not solve "curation" in a centralized manner, our design assumes that there will be at least one middle-layer of data transformation in between data published by the metaregistry and consumption by MCP client end-users.
Considerations
Third parties should be able to mirror our design for private or downstream usage: There will be a need for registering MCP server metadata beyond a single, centralized public repository. Consumers of this metadata may want to compose the centralized repository with their own private repositories, or augment data in the metaregistry with their own opinionation; our API surface area should keep those use cases in mind.
Example Flows
Here are example flows showing how the system will work in practice:
GitHub user publishes to personal namespace with no domain verification
```mermaid
sequenceDiagram
    participant User
    participant CLI as CLI Tool
    participant GH as GitHub App
    participant API as Registry API
    participant DB as Registry Database
    participant Storage as Object Storage

    User->>CLI: Run with mcp.json<br/>(name: io.github.tadasant/my-mcp-server)
    CLI->>GH: Initiate OAuth flow
    GH->>User: Prompt for authorization
    User->>GH: Authorize
    GH->>CLI: Return Authorization Code
    CLI->>GH: Exchange code for token
    GH->>CLI: Return OAuth token
    CLI->>API: Submit publish request with<br/>OAuth token + mcp.json
    API->>GH: Verify token belongs to<br/>@tadasant user
    GH->>API: Confirm ownership
    API->>Storage: Store mcp.json file
    API->>DB: Record metadata
    API->>CLI: Confirm successful publish
    CLI->>User: Display success message
```
GitHub user verifies domain ownership
```mermaid
sequenceDiagram
    participant User
    participant CLI as CLI Tool
    participant GH as GitHub App
    participant API as Registry API
    participant DNS as DNS System
    participant DB as Registry Database

    User->>CLI: Run verify-domain command<br/>(domain: example.com)
    CLI->>GH: Initiate OAuth flow
    GH->>User: Prompt for authorization
    User->>GH: Authorize
    GH->>CLI: Return Authorization Code
    CLI->>GH: Exchange code for token
    GH->>CLI: Return OAuth token
    CLI->>API: Submit domain verification request<br/>(domain + OAuth token)
    API->>GH: Verify token belongs to user
    GH->>API: Confirm ownership
    API->>CLI: Return TXT record to add<br/>(e.g., mcp-verify=abc123)
    CLI->>User: Display TXT record instructions
    User->>DNS: Add TXT record to<br/>example.com DNS settings
    DNS-->>User: Confirm TXT record added
    User->>CLI: Continue verification process
    CLI->>API: Request domain verification check
    API->>DNS: Query for TXT record at<br/>example.com
    DNS->>API: Return TXT record
    API->>API: Verify TXT record matches<br/>expected value
    API->>DB: Store domain verification mapping<br/>(GitHub user -> verified domain)
    API->>CLI: Confirm successful verification
    CLI->>User: Display success message<br/>(Now authorized to publish under com.example/*)
```
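The TXT-record check at the heart of this flow can be sketched in Go. The `mcp-verify=` record format follows the example in the diagram; the function names and the daily re-check are assumptions, and `verifyDomain` uses the standard library's `net.LookupTXT`:

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// hasVerificationToken reports whether any TXT record carries the expected
// challenge, e.g. "mcp-verify=abc123".
func hasVerificationToken(records []string, token string) bool {
	for _, r := range records {
		if strings.TrimSpace(r) == "mcp-verify="+token {
			return true
		}
	}
	return false
}

// verifyDomain would be run by the registry API at verification time, and
// re-run on a periodic (daily?) cron to catch expired or transferred domains.
func verifyDomain(domain, token string) (bool, error) {
	records, err := net.LookupTXT(domain)
	if err != nil {
		return false, err
	}
	return hasVerificationToken(records, token), nil
}

func main() {
	// Pure matching logic, demonstrated without a network call.
	fmt.Println(hasVerificationToken([]string{"v=spf1 -all", "mcp-verify=abc123"}, "abc123"))
}
```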
GitHub user publishes under namespace they have previously verified
```mermaid
sequenceDiagram
    participant User
    participant CLI as CLI Tool
    participant GH as GitHub App
    participant API as Registry API
    participant DB as Registry Database
    participant Storage as Object Storage

    User->>CLI: Run with mcp.json<br/>(name: com.example/my-mcp-server)
    CLI->>GH: Initiate OAuth flow
    GH->>User: Prompt for authorization
    User->>GH: Authorize
    GH->>CLI: Return Authorization Code
    CLI->>GH: Exchange code for token
    GH->>CLI: Return OAuth token
    CLI->>API: Submit publish request with<br/>OAuth token + mcp.json
    API->>GH: Verify token belongs to user
    GH->>API: Confirm ownership
    API->>DB: Check if user has verified<br/>example.com domain
    DB->>API: Confirm domain verification
    API->>Storage: Store mcp.json file
    API->>DB: Record metadata
    API->>CLI: Confirm successful publish
    CLI->>User: Display success message
```
Frequently Asked Questions
How do I know if someone creates a server named @block/goose, that it is actually made by Block?
- The server name will be reverse DNS based on the verification flow above. So xyz.block/goose would be a valid name for someone who has DNS-verified against block.xyz.
- Of course, there is some risk with phishing attempts like com.blockxyz/goose trying to masquerade as official implementations. We'll rely on community reports/PRs to address phishing/malicious typosquatting attempts.
What should the name of the server be? Should we enforce that the name is the repo name or can it be freeform?
- As long as it is within a verified reverse DNS namespace, you can have as much or as little alignment with repositories as you want.
- We should probably create a UUID for a server when it is first created (and embed it into the mcp.json), so that it is possible to later rename without dropping version history.
How do we handle domain transfers? E.g. domain expires, someone else buys it; we'd need to make sure the old owners don't still have access.
- We'll periodically (daily?) verify that TXT records are still available. If they become unavailable, further publishing by the relevant verified users is blocked.
- We'd likely not invalidate historical packages in that case, but may display a warning to anyone who requests information about them.
What fallbacks do we have in case spammers or other bad actors succeed in publishing malicious/spam content?
- Allow blacklisting GitHub users, organizations, or DNS namespaces in the metaregistry code or by some environment variable.
- The blacklist could have an option to retroactively remove those entries from public consumption.
Will there be a problem with spam and abuse detection?
- Initially, we will not have a public UI, so this risk is somewhat mitigated.
- MCP client marketplaces will be incentivized to curate and create cutoffs that make sense for their use cases (e.g. GH star thresholds, download count thresholds, manual curation).
- Mitigation ideas:
- Make as many fields as possible enums, and otherwise have reasonable character limits and regexes
- Rate limit to one new server per user/org per day
- Have AI analyze submissions async to detect spam → open a PR to blacklist GitHub accounts of spammers
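The "character limits and regexes" idea might look like the following in Go. The name pattern and the 256-character description limit are illustrative assumptions, not agreed constraints:

```go
package main

import (
	"fmt"
	"regexp"
)

// namePattern is an assumed constraint: a reverse-DNS namespace, a slash,
// then a short slug. The real limits and regexes are still open questions.
var namePattern = regexp.MustCompile(`^[a-z0-9]+(\.[a-z0-9-]+)+/[a-z0-9][a-z0-9-]{0,63}$`)

const maxDescriptionLen = 256 // assumed character limit

// validateSubmission rejects obviously malformed publish requests before
// any heavier (e.g. AI-based) spam analysis runs.
func validateSubmission(name, description string) error {
	if !namePattern.MatchString(name) {
		return fmt.Errorf("invalid server name: %q", name)
	}
	if len(description) > maxDescriptionLen {
		return fmt.Errorf("description exceeds %d characters", maxDescriptionLen)
	}
	return nil
}

func main() {
	fmt.Println(validateSubmission("io.github.tadasant/my-mcp-server", "An example server"))
	fmt.Println(validateSubmission("not a name", ""))
}
```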
Will we maintain server quality guarantees in some way?
- Beyond the anti-spam measures discussed above, no. The threshold is: someone thought their work was relevant enough to others that they invested some time to get it published.
How often do we expect consumers to read our data?
- `/servers`: once per day
- `/servers/:id`: once per version, then store it
- We should design for CDN caching, so our infrastructure will handle deviations from the above expectations just fine
Who will be responsible for monitoring, alerts, on-call?
- Frame the announcement in such a way that downtime, while not ideal, is acceptable for up to a business day. Because consumers of the API stand between end-users and the metaregistry, and are making daily copies of relevant data, there should not be a significant need for on-demand information from the metaregistry.
- Means the OSS maintainers can jump on issues the next time they are online
- Set up basic observability/alerting that shows up in a queue that can be reviewed when online
How does versioning work?
- An mcp.json file has a `version_detail` attribute
- Publish requests must include a version bump in the submitted mcp.json data
- We store up to one mcp.json file per version (it's immutable)
- Consumers can use an endpoint that allows fetching an old version of the mcp.json for a server when specifically requested
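The required version bump could be enforced with a simple comparison. A sketch assuming plain dotted-integer versions ("1.2.3"); pre-release handling and yanking are open questions:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// isVersionBump reports whether next is strictly greater than current,
// comparing dotted-integer components left to right. Missing components
// are treated as zero, so "1.2" < "1.2.1".
func isVersionBump(current, next string) bool {
	a, b := strings.Split(current, "."), strings.Split(next, ".")
	for i := 0; i < len(a) || i < len(b); i++ {
		x, y := 0, 0
		if i < len(a) {
			x, _ = strconv.Atoi(a[i])
		}
		if i < len(b) {
			y, _ = strconv.Atoi(b[i])
		}
		if x != y {
			return y > x
		}
	}
	return false // equal versions: not a bump
}

func main() {
	fmt.Println(isVersionBump("1.2.3", "1.3.0")) // true
	fmt.Println(isVersionBump("1.2.3", "1.2.3")) // false
}
```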
How will people discover servers they're interested in?
- Rely on consumers of the metaregistry to solve it for end-users. For example, the Cline Marketplace should ingest the metadata and organize it appropriately for their users as they see fit.
How will we get off the ground w.r.t. data in the metaregistry?
- Use the user-submitted data in the `modelcontextprotocol/servers` README as a starting point (scrape and ingest); to be replaced by actual CLI-published data over time
Which component will manage GitHub auth and verifying repo ownership?
- The CLI tool will facilitate an OAuth flow to ensure the CLI tool user either personally verified the corresponding DNS namespace, OR is a member of an organization to which some individual has granted unilateral permission to publish on the DNS namespace
- Notably, this means the controls are not very fine-grained: a DNS verification is either applicable to just the user performing the verification, or otherwise to the entire organization; no in-between
How will identity work?
- Publication works by the CLI + backend API managing OAuth into GitHub. So GitHub is the source of truth for identity
How will regaining access (such as when leaving a job) work?
- Delegated to GitHub
How do we do change management w.r.t. our APIs and tools?
- Include versions on schemas where appropriate
- Eventually: manage an email list
How do we encourage people to actually use this system?
- Include in official docs / getting started
- Work with partners at e.g. GitHub, Cloudflare to tack on to server creation workflows
How do we know whether folks are using our solution and happy with it?
- Initially, just qualitative feedback from the community and trend metrics re: how many servers are available
- Later can expand to include more fine-grained analytics
What reliability guarantees do we need?
- Because we ask intermediaries (like MCP client creators) to maintain a copy of the data they are interested in, we avoid being a mission-critical system for end-users. 24h of downtime would be acceptable
Is there any way to trick the official metaregistry to serving a package that a consumer is not expecting?
- Idea: require that CLI submitters link their npm/pypi/etc. entry to their source code repo prior to submitting mcp.json via the CLI. This ensures that the publisher has access to both the source code repository (e.g. on GitHub) as well as the associated entry in the package registry.
- This would also require us to verify that some `source_code_location` reference (e.g. to a GitHub repository) is pushed by an authenticated user. It may make sense to specifically OAuth on any attempt to link `source_code_location` (which could mean needing to integrate e.g. GitLab OAuth alongside GitHub OAuth in order to support publishing GitLab-hosted source code).
- Need to investigate whether this kind of back-reference is possible across all relevant registries
Note
This note on avoiding misrepresentative packages is not fully fleshed out and could use more input.
How do we handle typosquatting?
- Seeing as we aren't serving the use case of directly downloading packages (like some `npm install <packagename>` flow), this is not a particularly notable risk
- We should consider blocking publication of packages within a certain edit distance of existing packages (esp. popular ones)
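The edit-distance check could use a standard Levenshtein distance. A sketch in Go, with the rejection threshold left as a policy decision:

```go
package main

import "fmt"

// editDistance computes the Levenshtein distance between two names. A publish
// request whose name falls within a small distance of a popular existing
// name could be flagged or blocked.
func editDistance(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min(prev[j]+1, min(cur[j-1]+1, prev[j-1]+cost))
		}
		prev = cur
	}
	return prev[len(rb)]
}

func min(x, y int) int {
	if x < y {
		return x
	}
	return y
}

func main() {
	// "gthub" is one deletion away from "github" — a likely typosquat.
	fmt.Println(editDistance("github", "gthub")) // 1
}
```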
How do we manage repo-jacking?
- GitHub repositories have IDs; use them for any GitHub reads so name changes don't create unexpected behavior
- Run a daily cron job that checks if repos (based on ID) have moved URLs; update their names accordingly
- This means there is a 24h window where someone could theoretically repo-jack an old name
- We could offer a mechanism to trigger this check manually, and also run it any time someone pushes an update to the mcp.json
Have a way to quickly delete accidentally published private data
- Allow reverse-publication (deletion) requests via appropriately OAuth'd API calls
Are we introducing any privacy and/or security risks for end-users?
- At the open source / local level, we are piggybacking on the package registries
- By including remote server URLs, we really have no idea what might live at those URLs. But that will be largely true no matter the URL, and an onerous verification process would likely drive people to a third-party solution for this
- Maybe we could require that remote URL submissions be done under a GH org (rather than individual GH users)? Is that a meaningful threshold that adds some accountability? Any other GitHub mechanisms we could piggyback?
Note
Could use community input on potential risks and mitigation ideas here.
What registries should we support?
- npm, PyPI, gopkg, crates.io, GHCR, Docker Hub...
How will "search" work?
- It is a non-goal to try to rank related MCP servers. For example, we have no intention of serving queries like "show me all memory MCP servers". We will remain vendor-neutral and avoid being a target of abuse or SEO tactics.
- But it would be reasonable for someone to try to find a specific server they are already aware of and seeking information about. e.g. if they want to find the official GitHub MCP server, they should be able to search "github" (or e.g. typo with "gthub"), and find the result.
- A possible solution here is to match only on server names, and allow fuzzy matching. This means "brands" (like GitHub) are searchable, and categorization (like "memory servers") are not meaningfully searchable.
API Shape
See OpenAPI PR here.
mcp.json Schema
TBD: will likely be similar to/derived from the OpenAPI shape above.
Open questions
Not comprehensive and likely will evolve, but some questions we acknowledge are not yet fully solved:
- Versioning: handling yanking of versions
- CI: how will OAuth work for a CI-based publishing flow?
- Proper design and scope for "search" capability
- Should we bake any "sorting signals" into the official metaregistry?
- What should the name of this official metaregistry be?
- What should the name of the CLI tool be?
- Are we missing any considerations for spam, abuse, security, privacy risks?
- How to properly ensure ownership/access to source code locations & registry packages? Any way besides introducing OAuth flows for dozens of different services?