DataShield MCP Dataset Library

Quick start

In short: install dependencies, configure Postgres, run migrations, seed projects, and start the web app and the worker.

What you get

  • A public website that helps people find datasets for projects.
  • A management console for crawler projects, sources, and run history.
  • A worker process that runs crawls via pg-boss.
  • A Postgres schema that stores datasets, resources, runs, logs, and analytics profiles.

Prerequisites

  • Node.js 20+ (22 recommended)
  • Postgres 14+

1) Configure

Copy the example config and edit your values:

cp config/registry.config.example.json config/registry.config.local.json

Set:

  • db.* connection details (or db.connectionString)
  • security.emergencyAdminToken
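
As a sketch, a minimal local config might look like the following. Only the keys named above (db.connectionString, security.emergencyAdminToken) are taken from this guide; the placeholder values are assumptions, so treat config/registry.config.example.json as the authoritative shape:

```json
{
  "db": {
    "connectionString": "postgres://registry:secret@localhost:5432/registry"
  },
  "security": {
    "emergencyAdminToken": "replace-with-a-long-random-string"
  }
}
```

A strong token can be generated with `openssl rand -hex 32`.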

2) Install

npm install

3) Migrate + seed

npm run db:migrate
npm run db:seed

Or via the CLI (same thing):

registry db migrate
registry db seed

Or do a clean reset:

npm run db:reset

CLI equivalent:

registry db reset

4) Start the web app

npm run dev
# or
npm run build && npm run start

Open:

  • Public: http://localhost:3000
  • Management UI: http://localhost:3000/app

5) Start the worker

In a second terminal:

npm run worker

6) Run a seeded crawler project

  • Go to Management → Crawler projects
  • Click Run now on a seeded project
  • Inspect Crawl runs for results

7) Health checks

  • GET /api/health
  • CLI: npm run doctor
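
The HTTP health endpoint can also be probed from a terminal. A small sketch (the response body's shape isn't documented here, so it is simply echoed):

```shell
#!/usr/bin/env bash
# Probe the web app's health endpoint. Assumes the server from
# step 4 is listening on port 3000.
# -f: fail on HTTP errors; -s: silent; -S: still print errors.
if curl -fsS http://localhost:3000/api/health; then
  echo "health check passed"
else
  echo "health check failed: is the web app running?"
fi
```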

Notes

  • In the current build, cron schedule changes take effect only after a worker restart.
  • For production, use a process manager (systemd / pm2 / supervisor).
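
For the systemd route, a worker unit might look like the sketch below. The unit name, user, and paths are assumptions; adjust them to your deployment:

```ini
; /etc/systemd/system/datashield-worker.service (hypothetical name/path)
[Unit]
Description=DataShield crawl worker
After=network.target postgresql.service

[Service]
WorkingDirectory=/opt/datashield
ExecStart=/usr/bin/npm run worker
Restart=on-failure
User=datashield
Environment=NODE_ENV=production

[Install]
WantedBy=multi-user.target
```

Enable it with `systemctl enable --now datashield-worker`; a second unit for the web app (`npm run start`) follows the same pattern.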