DataShield MCP Dataset Library

MCP server for agents

Run the MCP server and expose tools like datasets_search, fields_search, and crawler_run_project.

What you get

This repo includes an MCP server with two transport modes (stdio and HTTP) and 13 tools.

Tools

Data discovery:

  • datasets_search — full-text search with pagination (totalCount, hasMore)
  • dataset_get — full detail with resources, fields, analytics profiles
  • fields_search — find datasets by column/field name or semantic type
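
The pagination fields on datasets_search (totalCount, hasMore) suggest a simple "fetch until hasMore is false" loop. The sketch below is illustrative only: the `items` field, the `offset`/`limit` parameters, and the `SearchFn` wrapper are assumptions, not the tool's documented schema, so adapt them to what the tool actually returns.

```typescript
// Hypothetical page shape: only totalCount and hasMore are named in this README.
interface SearchPage<T> {
  items: T[];
  totalCount: number;
  hasMore: boolean;
}

// Hypothetical wrapper around a datasets_search call; offset/limit are assumed names.
type SearchFn<T> = (offset: number, limit: number) => Promise<SearchPage<T>>;

// Collect every result by following hasMore, with a page cap as a safety net.
async function collectAll<T>(
  search: SearchFn<T>,
  limit = 50,
  maxPages = 100,
): Promise<T[]> {
  const results: T[] = [];
  let offset = 0;
  for (let page = 0; page < maxPages; page++) {
    const res = await search(offset, limit);
    results.push(...res.items);
    // Stop when the server reports no more pages, or on an empty page.
    if (!res.hasMore || res.items.length === 0) break;
    offset += res.items.length;
  }
  return results;
}
```

The page cap guards against a server that keeps reporting hasMore; an agent with a limited context window would likely stop much earlier than that.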

Crawler management:

  • crawler_list_projects — list configured crawlers and their status
  • crawler_run_project — trigger a crawl immediately
  • discover_portals — run the portal discovery pipeline

Tier & usage management:

  • tiers_get_all — get all tier configs, API keys, usage, and revenue summary
  • tier_update — modify a tier's pricing, limits, or features
  • account_update — override limits for a specific API key
  • account_usage — get usage details and 30-day history for an API key

Platform operations:

  • health_check — verify platform status, DB latency, worker health, migration state
  • help_page — access built-in help docs (full index or specific page by slug)
  • events_ingest — log dataset usage events for AI agent telemetry
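
Each of the tools listed above is invoked through the MCP `tools/call` JSON-RPC method. A minimal sketch of building such a request follows; the `toolCall` helper is hypothetical, and the `query` argument is an assumed parameter name, since this README does not list the tools' input schemas.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0 envelope).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Hypothetical helper: wraps a tool name and its arguments in the envelope.
function toolCall(
  name: string,
  args: Record<string, unknown> = {},
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// "query" is an assumed argument name; check the tool's schema via tools/list.
const request = toolCall("datasets_search", { query: "air quality" });
```

An MCP client such as Claude Desktop builds these envelopes itself; the sketch is mainly useful when scripting against the server by hand.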

Stdio transport (local)

For Claude Desktop, Claude Code, or local agent workflows:

npm run mcp

Claude Desktop config:

{
  "mcpServers": {
    "dataset-library": {
      "command": "npm",
      "args": ["run", "mcp"],
      "cwd": "/path/to/this/project"
    }
  }
}

HTTP transport (remote / Claude.ai connector)

For Claude.ai custom connectors or remote agent integration:

npm run mcp:http
# Listens on http://0.0.0.0:3100
#   POST /mcp  → JSON-RPC handler (stateless)
#   GET  /mcp  → server info / health check

The HTTP mode uses the stateless Streamable HTTP transport. Each POST creates a fresh server instance, handles the JSON-RPC request, returns a JSON response, and tears the instance down. There are no sessions, SSE streams, or long-lived connections, so it works reliably behind any reverse proxy that can forward plain HTTP requests.
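
Because each POST is self-contained, a client needs nothing beyond a single HTTP request. The sketch below builds the request options for `fetch`. The `buildMcpPost` helper is hypothetical. It also assumes the server answers `tools/list` without a prior `initialize` exchange, which the per-request design suggests but this README does not state. The Accept header lists both content types, as the Streamable HTTP specification asks of clients.

```typescript
// Options object compatible with fetch(url, init).
interface PostInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: one JSON-RPC request per POST, no session header.
function buildMcpPost(
  rpcMethod: string,
  id: number,
  params?: Record<string, unknown>,
): PostInit {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Streamable HTTP clients advertise both response types.
      Accept: "application/json, text/event-stream",
    },
    // JSON.stringify drops the params key when params is undefined.
    body: JSON.stringify({ jsonrpc: "2.0", id, method: rpcMethod, params }),
  };
}

const init = buildMcpPost("tools/list", 1);
// With Node 18+ or a browser, against a local server on the default port:
//   const res = await fetch("http://localhost:3100/mcp", init);
//   const reply = await res.json();
```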

Set the MCP_PORT environment variable to change the port (default: 3100).
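
As an illustration of the default-and-override behavior, here is one way a server might resolve its port. This is a sketch, not the project's actual startup code: `resolveMcpPort` is a hypothetical name, and the range validation is an added assumption.

```typescript
const DEFAULT_MCP_PORT = 3100;

// Hypothetical resolver: takes an env-like record so it can be tested in isolation.
function resolveMcpPort(env: Record<string, string | undefined>): number {
  const raw = env["MCP_PORT"];
  // Unset or blank falls back to the documented default.
  if (raw === undefined || raw.trim() === "") return DEFAULT_MCP_PORT;
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid MCP_PORT: ${raw}`);
  }
  return port;
}

// In a Node process this would typically be called as resolveMcpPort(process.env).
```

From a shell, the override would look like `MCP_PORT=4000 npm run mcp:http`.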

IMPORTANT: The @modelcontextprotocol/sdk version is pinned to exactly 1.26.0 in package.json. Newer versions introduce Origin validation and connection lifecycle changes that break under LiteSpeed and some nginx configs. Keep the exact version; do not switch to a caret range such as ^1.26.0, which would allow newer 1.x releases to be installed.
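
A small guard can catch an accidental change to the pin, for example in a CI step. The sketch below is a hypothetical check, not part of this repo, and it assumes the SDK is listed under `dependencies` in package.json.

```typescript
const SDK_NAME = "@modelcontextprotocol/sdk";
const SDK_PINNED_VERSION = "1.26.0";

// Minimal view of package.json: only the field this check reads.
interface PackageJson {
  dependencies?: Record<string, string>;
}

// Returns null when the pin is intact, otherwise a description of the problem.
function checkSdkPin(pkg: PackageJson): string | null {
  const range = pkg.dependencies?.[SDK_NAME];
  if (range === undefined) {
    return `${SDK_NAME} is not listed in dependencies`;
  }
  // An exact pin has no range operator (^, ~, >=) and matches the expected version.
  if (range.trim() !== SDK_PINNED_VERSION) {
    return `${SDK_NAME} is "${range}", expected exactly "${SDK_PINNED_VERSION}"`;
  }
  return null;
}
```

Feeding it the parsed contents of package.json and failing the build on a non-null result would flag a caret range before it reaches deployment.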

Claude.ai custom connector

When deployed behind a reverse proxy at library.myorg.ai, set the connector URL to:

https://library.myorg.ai/mcp

No special proxy settings are required; standard HTTP POST proxying works.

Configure

The MCP server uses the same database configuration as the web and worker processes:

  • .env / DATABASE_URL
  • config/registry.config.local.json