DataShield MCP Dataset Library

Filesystem / network share

Crawl local directories or mounted shares (with strict allow-listing).

Safety first

Filesystem crawling is disabled unless you set filesystem.allowedRoots to at least one directory.

This prevents the crawler from accidentally walking your whole server.
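The allow-list check can be pictured roughly as follows. This is only a sketch of the idea, not DataShield's actual code: the function name is_path_allowed is hypothetical, and it assumes that a path counts as allowed when, after resolving symlinks and ".." segments, it is an allowed root or sits beneath one.

```python
from __future__ import annotations

from pathlib import Path


def is_path_allowed(candidate: str, allowed_roots: list[str]) -> bool:
    """Return True only if candidate resolves to a location inside an allowed root.

    Hypothetical helper for illustration; the real crawler may differ.
    """
    if not allowed_roots:
        # No roots configured: crawling stays disabled.
        return False
    # resolve() follows symlinks and collapses ".." so that a path such as
    # /mnt/data/../../etc cannot slip past a plain string-prefix comparison.
    resolved = Path(candidate).resolve()
    for root in allowed_roots:
        root_path = Path(root).resolve()
        if resolved == root_path or root_path in resolved.parents:
            return True
    return False
```

Comparing whole path components rather than string prefixes matters here: with allowedRoots set to ["/mnt/data"], a sibling directory such as /mnt/database should not be treated as allowed.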

Configure

In config/registry.config.local.json:

{
  "filesystem": {
    "allowedRoots": ["/mnt/data"],
    "maxFiles": 2000,
    "maxDepth": 6,
    "maxFileSizeBytes": 209715200,
    "allowedExtensions": [".csv", ".json", ".parquet", ".xlsx", ".tsv", ".geojson"]
  }
}

Add a source

Use a file:// source URL that points to a directory under one of your allowedRoots:

  • file:///mnt/data

The crawler walks the directory and ingests matching files, subject to the limits set in the filesystem configuration.
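To make the walk concrete, here is a minimal sketch of turning a file:// source URL into a local path and walking it under depth and file-count limits. The function names file_url_to_path and walk_limited are hypothetical, and the depth convention (the source directory itself is depth 0, and directories deeper than maxDepth are not entered) is an assumption, not documented DataShield behavior.

```python
from __future__ import annotations

import os
from urllib.parse import unquote, urlparse


def file_url_to_path(source_url: str) -> str:
    """Convert a file:// URL such as file:///mnt/data to a local POSIX path."""
    parsed = urlparse(source_url)
    if parsed.scheme != "file":
        raise ValueError(f"not a file:// URL: {source_url}")
    if parsed.netloc not in ("", "localhost"):
        raise ValueError(f"remote hosts are not supported: {parsed.netloc}")
    return unquote(parsed.path)


def walk_limited(root: str, max_depth: int, max_files: int) -> list[str]:
    """Collect file paths under root, honoring max_depth and max_files.

    Assumption: root is depth 0 and subdirectories below max_depth are skipped.
    """
    found: list[str] = []
    root = os.path.abspath(root)
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        dirnames.sort()  # deterministic traversal order
        if depth >= max_depth:
            dirnames[:] = []  # prune in place so os.walk does not descend further
        for name in sorted(filenames):
            if len(found) >= max_files:
                return found  # stop as soon as the file budget is used up
            found.append(os.path.join(dirpath, name))
    return found
```

With the example configuration, a call might look like walk_limited(file_url_to_path("file:///mnt/data"), max_depth=6, max_files=2000).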

Notes

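  • Only files whose extension appears in allowedExtensions are ingested
  • Files larger than maxFileSizeBytes are rejected (209715200 bytes, i.e. 200 MiB, in the example configuration)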
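The two rules above could be expressed as a single per-file check, sketched below. The name should_ingest is hypothetical, and treating extensions as case-insensitive is an assumption made for this illustration; the real crawler may compare them exactly.

```python
from __future__ import annotations

import os


def should_ingest(
    path: str, allowed_extensions: list[str], max_file_size_bytes: int
) -> tuple[bool, str]:
    """Return (accepted, reason) for one file, applying both filters.

    Assumption: extension matching ignores case (".CSV" matches ".csv").
    """
    ext = os.path.splitext(path)[1].lower()
    allowed = {e.lower() for e in allowed_extensions}
    if ext not in allowed:
        return False, f"extension {ext or '(none)'} is not in allowedExtensions"
    size = os.path.getsize(path)
    if size > max_file_size_bytes:
        return False, f"{size} bytes exceeds maxFileSizeBytes ({max_file_size_bytes})"
    return True, "ok"
```

Returning a reason alongside the decision makes it easier to log why a given file was skipped during a crawl.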