DataShield MCP Dataset Library

Provider notes

These notes cover what the crawler extracts per provider and how to add a new provider implementation.

Providers

This build includes the following provider plugins, located in lib/server/providers/*:

  • Socrata (includes Tyler Data & Insights sites)
  • ArcGIS Hub/Open Data (dataset URLs; portal discovery is intentionally conservative)
  • CKAN
  • OpenDataSoft
  • DCAT-US (data.json feeds)
  • Filesystem / network share (file:// paths)
  • Other (basic HTML title/description)

Adding a provider

  1. Create a plugin in lib/server/providers/<name>.ts
  2. Export a ProviderPlugin
  3. Register it in lib/server/providers/registry.ts
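
The registration step can be pictured with a small sketch. This is only an illustration of the idea: the class name ProviderRegistry, its methods, and the assumption that every plugin carries a unique id are all hypothetical, and the actual contents of registry.ts may look quite different.

```typescript
// Hypothetical sketch: the real registry.ts may be organized differently.
// The only assumption about a plugin here is that it carries a unique `id`.
class ProviderRegistry<P extends { id: string }> {
  private readonly plugins = new Map<string, P>();

  // Add a plugin; duplicate ids are rejected so a typo cannot silently shadow a provider.
  register(plugin: P): void {
    if (this.plugins.has(plugin.id)) {
      throw new Error(`Provider already registered: ${plugin.id}`);
    }
    this.plugins.set(plugin.id, plugin);
  }

  // Look up a plugin by id; returns undefined when nothing is registered under it.
  get(id: string): P | undefined {
    return this.plugins.get(id);
  }

  // List registered ids in registration order (Map preserves insertion order).
  ids(): string[] {
    return Array.from(this.plugins.keys());
  }
}

const registry = new ProviderRegistry<{ id: string }>();
registry.register({ id: "ckan" });
registry.register({ id: "socrata" });
```

Rejecting duplicate ids is a design choice of this sketch, not something the source states; a registry could equally let a later registration replace an earlier one.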

A plugin implements:

  • discoverFromSourceUrl()
  • ingestDatasetFromUrl()
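
As a rough sketch of what such a plugin might look like: only the names ProviderPlugin, discoverFromSourceUrl, and ingestDatasetFromUrl come from these notes. The parameter and return types, the id field, the DatasetRecord shape, and the in-memory catalog below are assumptions made so the example runs without network access; check the real type definition before copying anything.

```typescript
// Hypothetical types: the real definitions live in the codebase and may differ.
interface DatasetRecord {
  url: string;
  title: string;
  description: string;
}

interface ProviderPlugin {
  id: string;
  // Assumed contract: given a portal or catalog URL, return the dataset URLs found there.
  discoverFromSourceUrl(sourceUrl: string): Promise<string[]>;
  // Assumed contract: given one dataset URL, return its extracted metadata.
  ingestDatasetFromUrl(datasetUrl: string): Promise<DatasetRecord>;
}

// In-memory stand-in for a remote catalog (made-up data on a placeholder host).
const FAKE_CATALOG = new Map<string, DatasetRecord>([
  [
    "https://data.example.org/dataset/trees",
    {
      url: "https://data.example.org/dataset/trees",
      title: "Street Trees",
      description: "Inventory of street trees.",
    },
  ],
  [
    "https://data.example.org/dataset/permits",
    {
      url: "https://data.example.org/dataset/permits",
      title: "Building Permits",
      description: "Issued building permits.",
    },
  ],
]);

const examplePlugin: ProviderPlugin = {
  id: "example",

  // Return every catalog entry whose URL sits under the given source URL.
  async discoverFromSourceUrl(sourceUrl: string): Promise<string[]> {
    const prefix = sourceUrl.endsWith("/") ? sourceUrl : sourceUrl + "/";
    return Array.from(FAKE_CATALOG.keys()).filter((url) => url.startsWith(prefix));
  },

  // Look up a single dataset; unknown URLs are reported as errors.
  async ingestDatasetFromUrl(datasetUrl: string): Promise<DatasetRecord> {
    const record = FAKE_CATALOG.get(datasetUrl);
    if (!record) {
      throw new Error(`Unknown dataset URL: ${datasetUrl}`);
    }
    return record;
  },
};
```

A real plugin would replace the in-memory lookup with calls to the provider's catalog API, but the split stays the same: one method enumerates dataset URLs for a source, the other turns a single dataset URL into a record.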

Detection

Source URLs are auto-detected with heuristics in lib/server/providers/detector.ts.

Detection is best-effort, so a source URL may be misclassified. When that happens, you can override the project/provider strategy by creating a separate project for the affected source.
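
To give a feel for URL-based detection, here is a minimal sketch. It is not the logic in detector.ts: the function name detectProvider, the provider ids, and the specific rules are assumptions chosen for illustration, based on publicly known URL conventions (CKAN's /api/3/action/ API path, Socrata's /resource/<4x4 id> endpoints, DCAT-US data.json feeds, arcgis.com and opendatasoft.com hostnames).

```typescript
// Hypothetical provider ids, mirroring the provider list in these notes.
type ProviderId =
  | "socrata"
  | "arcgis"
  | "ckan"
  | "opendatasoft"
  | "dcat-us"
  | "filesystem"
  | "other";

// Illustrative heuristics only; the real detector may use different or additional signals.
function detectProvider(sourceUrl: string): ProviderId {
  let parsed: URL;
  try {
    parsed = new URL(sourceUrl);
  } catch {
    return "other"; // Not a parseable absolute URL.
  }

  if (parsed.protocol === "file:") return "filesystem";

  const host = parsed.hostname.toLowerCase();
  const path = parsed.pathname.toLowerCase();

  // DCAT-US catalogs are conventionally published as a data.json feed.
  if (path.endsWith("/data.json")) return "dcat-us";

  // ArcGIS Hub / Open Data sites hosted under arcgis.com.
  if (host.endsWith(".arcgis.com")) return "arcgis";

  // Opendatasoft-hosted portals, or the /explore/dataset/ page layout.
  if (host.endsWith(".opendatasoft.com") || path.includes("/explore/dataset/")) {
    return "opendatasoft";
  }

  // CKAN exposes its Action API under /api/3/action/.
  if (path.includes("/api/3/action/")) return "ckan";

  // Socrata resource endpoints use a 4x4 dataset id, e.g. /resource/abcd-1234.json.
  if (/\/resource\/[a-z0-9]{4}-[a-z0-9]{4}(\.|\/|$)/.test(path)) return "socrata";

  return "other";
}
```

Heuristics like these only look at the URL string, which is one reason detection can only ever be best-effort: a portal on a custom domain with no telltale path would fall through to "other".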