DataShield MCP Dataset Library

Management UI guide

What each page does and the safest way to operate the crawlers.

Access

Open /app and enter your admin token.

  • The token is stored only in your browser's localStorage.
  • Set the token the server expects with EMERGENCY_ADMIN_TOKEN.
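
The server-side comparison could look something like the sketch below. It assumes EMERGENCY_ADMIN_TOKEN is read from an environment variable. The function name is_admin_token_valid is hypothetical, and this guide does not say how the browser sends the stored token to the server. hmac.compare_digest is used so response timing does not reveal how much of a guessed token matched.

```python
import hmac
import os


def is_admin_token_valid(presented: str) -> bool:
    """Hypothetical helper: compare a presented token with EMERGENCY_ADMIN_TOKEN."""
    # Assumption: the expected token lives in an environment variable.
    expected = os.environ.get("EMERGENCY_ADMIN_TOKEN", "")
    if not expected:
        # Fail closed: with no server-side token configured, reject every request.
        return False
    # Constant-time comparison of the UTF-8 bytes of both tokens.
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```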

Pages

Dashboard (/app)

  • High-level metrics
  • Provider breakdown
  • Recent runs
  • Recent logs

Crawler projects (/app/crawlers)

  • Edit project:
    • enabled/paused
    • cron schedule
    • concurrency
    • rate limit (requests per second, RPS)
    • max requests per run
  • Run now (start a manual run immediately)
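
The project settings listed above could be modelled and sanity-checked along these lines. The class and field names are hypothetical. The default of 500 max requests is a placeholder, not the app's default. The cron check assumes the standard five-field format (minute, hour, day of month, month, day of week).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CrawlerProjectSettings:
    # Hypothetical field names mirroring the settings on /app/crawlers.
    enabled: bool = False
    cron: Optional[str] = None  # e.g. "0 3 * * *"; None means no schedule
    concurrency: int = 1
    rps: float = 1.0
    max_requests_per_run: int = 500  # placeholder value, not the app's default

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means the values look sane."""
        issues = []
        if self.concurrency < 1:
            issues.append("concurrency must be at least 1")
        if self.rps <= 0:
            issues.append("rps must be greater than 0")
        if self.max_requests_per_run < 1:
            issues.append("max_requests_per_run must be at least 1")
        # Assumption: standard five-field cron syntax.
        if self.cron is not None and len(self.cron.split()) != 5:
            issues.append("cron must have 5 fields: minute hour day month weekday")
        return issues
```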

Best practice

  • Start with enabled=false and manual runs.
  • Verify that the provider responds correctly and that you are not being blocked (for example, HTTP 403 or 429 responses).
  • Only then add cron schedules.

Sources (/app/sources)

  • Add portal roots or dataset landing pages.
  • The auto-detected provider is stored with the source.

Tip: Use multiple projects per provider if you want to split work (e.g. “Socrata - Cities” vs “Socrata - States”).
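
Splitting one provider into several projects amounts to grouping its sources by some label. Here is a sketch that assumes each source carries a hypothetical category tag. Projects are named after the "Socrata - Cities" pattern, and the URLs in the test are placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_into_projects(sources: List[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Group hypothetical (url, provider, category) tuples into "<provider> - <category>" projects."""
    projects: Dict[str, List[str]] = defaultdict(list)
    for url, provider, category in sources:
        # Sources keep their input order within each project.
        projects[f"{provider} - {category}"].append(url)
    return dict(projects)
```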

Runs (/app/runs)

  • Each run records what changed.
  • Inspect run items if something unexpected happened.
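
To make "what changed" concrete, here is a hypothetical comparison of two snapshots keyed by dataset ID, with each ID mapped to a content hash. This guide does not describe the app's actual run item format, so treat the names and structure as assumptions.

```python
from typing import Dict, List


def diff_snapshots(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare hypothetical {dataset_id: content_hash} maps from two runs."""
    added = sorted(k for k in after if k not in before)
    removed = sorted(k for k in before if k not in after)
    # Present in both runs, but the content hash differs.
    updated = sorted(k for k in after if k in before and before[k] != after[k])
    return {"added": added, "updated": updated, "removed": removed}
```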

Datasets (/app/datasets)

  • Search the library.
  • Add/edit:
    • steward notes
    • notes_json_text
    • extras_json_text
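
Before saving notes_json_text or extras_json_text, it can help to confirm that the text parses. This sketch assumes both fields hold a JSON object and that empty text means "no notes". This guide confirms neither assumption, and the helper name is hypothetical.

```python
import json
from typing import Any, Dict


def parse_json_object_field(name: str, text: str) -> Dict[str, Any]:
    """Parse a *_json_text field, requiring a JSON object; blank text yields {}."""
    if not text.strip():
        return {}
    try:
        value = json.loads(text)
    except json.JSONDecodeError as exc:
        # Report where parsing failed so the steward can fix the text.
        raise ValueError(
            f"{name} is not valid JSON: {exc.msg} (line {exc.lineno}, column {exc.colno})"
        ) from exc
    if not isinstance(value, dict):
        raise ValueError(f"{name} must be a JSON object, got {type(value).__name__}")
    return value
```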

Alerts (/app/alerts)

  • Quick view of errors and failing sources.

App admin (/app/admin)

  • Doctor health report
  • Test database credentials and optionally write them to the config file
  • Purge logs
  • Create/disable API keys
  • Reset the registry schema (for testing only)

Operating modes

Safe mode (recommended)

  • Use manual runs only
  • Keep concurrency <= 2
  • Keep RPS <= 1
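
The two safe-mode limits work together. Concurrency caps how many requests are in flight, and RPS spaces out when requests start. Below is a sketch of a limiter that enforces both, with the safe-mode values as defaults. It illustrates the idea and is not the crawler's actual implementation; PoliteLimiter is a hypothetical name. The clock and sleep functions can be swapped out so the pacing can be tested without real waiting.

```python
import threading
import time
from typing import Callable


class PoliteLimiter:
    """Hypothetical limiter: at most `concurrency` requests in flight, starts spaced by 1/rps."""

    def __init__(self, concurrency: int = 2, rps: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if concurrency < 1 or rps <= 0:
            raise ValueError("concurrency must be >= 1 and rps > 0")
        self._slots = threading.BoundedSemaphore(concurrency)
        self._interval = 1.0 / rps
        self._lock = threading.Lock()
        self._next_start = float("-inf")  # earliest time the next request may start
        self._clock = clock
        self._sleep = sleep

    def __enter__(self) -> "PoliteLimiter":
        self._slots.acquire()  # wait for a free concurrency slot
        with self._lock:
            now = self._clock()
            start = max(now, self._next_start)
            self._next_start = start + self._interval  # reserve the following start time
        wait = start - now
        if wait > 0:
            self._sleep(wait)
        return self

    def __exit__(self, *exc) -> None:
        self._slots.release()
```

Wrap each outbound request in `with limiter:`. With rps=1.0, back-to-back requests start about one second apart.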

Growth mode

  • Split providers into multiple projects
  • Stagger cron schedules
  • Watch logs and HTTP errors
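
Staggering means giving each project a different start offset so that runs do not all begin at once. The sketch below spreads hourly runs evenly across the hour and assumes standard five-field cron syntax. The hourly cadence, the helper name, and the "Socrata - Counties" project are illustrative choices.

```python
from typing import Dict, List


def staggered_hourly_crons(projects: List[str]) -> Dict[str, str]:
    """Give each project its own minute offset so hourly runs are spread evenly."""
    n = len(projects)
    # Minute offsets 0, 60/n, 2*60/n, ... (integer division keeps them within 0..59).
    return {name: f"{i * 60 // n} * * * *" for i, name in enumerate(projects)}
```

For three projects this yields minutes 0, 20 and 40.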

Agent-managed mode

  • Create API keys with limited scopes
  • Use /api/events to record agent usage
  • Use the MCP server for tool-based access
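
An agent recording its usage might build a request like the one below, using an API key that carries only the scopes it needs. Several details are assumptions, because this guide does not document the /api/events request format: the JSON payload fields, the Bearer authorization scheme, and the host datashield.example. The request is only built here, not sent. Pass it to urllib.request.urlopen to send it.

```python
import json
import urllib.request
from typing import Any, Dict


def build_event_request(base_url: str, api_key: str, event: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/events with a JSON body."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/events",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Hypothetical auth scheme; check how the server expects API keys.
            "Authorization": f"Bearer {api_key}",
        },
    )
```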