DataShield MCP Dataset Library

Other portal discovery

How the generic portal discovery works (sitemap + link heuristics) and when to use explicit sources.

When to use OTHER

Use the OTHER provider when you have a portal that doesn't match the Socrata, ArcGIS, CKAN, OpenDataSoft, or DCAT signatures.

Discovery strategies

  1. Sitemap parsing

    • tries /sitemap.xml and common variants
    • follows sitemap indexes into their child sitemaps (bounded)
    • keeps URLs that look dataset-like (the path includes dataset, data, or catalog, or the URL points to a downloadable file)
  2. Landing page link extraction

    • if no sitemap is found, fetches the portal homepage
    • extracts links and applies the same dataset-like heuristics
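The two strategies above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the keyword list, file extensions, sitemap variant paths, the `MAX_SITEMAPS` bound, and the injected `fetch` callable are all hypothetical. Network access sits behind `fetch` so the logic can be exercised offline.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urljoin, urlparse

# Illustrative values only (assumptions, not the library's real settings).
DATASET_KEYWORDS = ("dataset", "data", "catalog")
FILE_EXTENSIONS = (".csv", ".json", ".geojson", ".xlsx", ".zip")
SITEMAP_PATHS = ("/sitemap.xml", "/sitemap_index.xml", "/sitemap-index.xml")
MAX_SITEMAPS = 20  # upper bound on sitemap documents fetched per portal

# Hypothetical fetcher: returns the response body, or None on any failure.
Fetch = Callable[[str], Optional[str]]


def looks_dataset_like(url: str) -> bool:
    """Keyword-in-path or file-download heuristic."""
    path = urlparse(url).path.lower()
    return any(k in path for k in DATASET_KEYWORDS) or path.endswith(FILE_EXTENSIONS)


def _local(tag: str) -> str:
    # Strip an XML namespace such as "{http://www.sitemaps.org/...}loc".
    return tag.rsplit("}", 1)[-1]


def discover_from_sitemaps(base_url: str, fetch: Fetch) -> Optional[list[str]]:
    """Dataset-like URLs from sitemaps, or None if no sitemap was found."""
    queue = [urljoin(base_url, p) for p in SITEMAP_PATHS]
    seen: set[str] = set()
    found: list[str] = []
    any_sitemap = False
    while queue and len(seen) < MAX_SITEMAPS:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        body = fetch(url)
        if body is None:
            continue
        try:
            root = ET.fromstring(body)
        except ET.ParseError:
            continue  # not XML, e.g. an HTML error page
        kind = _local(root.tag)
        if kind not in ("urlset", "sitemapindex"):
            continue
        any_sitemap = True
        locs = [el.text.strip() for el in root.iter() if _local(el.tag) == "loc" and el.text]
        if kind == "sitemapindex":
            queue.extend(locs)  # follow child sitemaps, still capped by MAX_SITEMAPS
        else:
            found.extend(u for u in locs if looks_dataset_like(u))
    return found if any_sitemap else None


class _LinkCollector(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def discover_from_homepage(base_url: str, fetch: Fetch) -> list[str]:
    body = fetch(base_url)
    if body is None:
        return []
    collector = _LinkCollector()
    collector.feed(body)
    links = (urljoin(base_url, h) for h in collector.hrefs)
    return [u for u in links if looks_dataset_like(u)]


def discover(base_url: str, fetch: Fetch) -> list[str]:
    urls = discover_from_sitemaps(base_url, fetch)
    if urls is None:  # no sitemap at all: fall back to the landing page
        urls = discover_from_homepage(base_url, fetch)
    return list(dict.fromkeys(urls))  # de-duplicate, keep first-seen order
```

For offline experiments, a plain dict of URL to body works as the fetcher: `discover("https://portal.example.org/", pages.get)`. Note that the homepage fallback only runs when no sitemap exists at all, not when a sitemap exists but yields no dataset-like URLs.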

Practical guidance

  • Prefer explicit DATASET_URL sources when you can.
  • For large portals, use SITEMAP sources for predictable discovery.
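The guidance above comes down to how much guessing each source kind requires. The sketch below is hypothetical: the `Source` record, `SourceKind` enum, and `resolve` function are illustrative names, not the library's configuration API, and it assumes a SITEMAP source simply takes every `<loc>` entry from the sitemap URL it is given.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class SourceKind(Enum):
    DATASET_URL = "DATASET_URL"  # one known dataset URL
    SITEMAP = "SITEMAP"          # one known sitemap URL


@dataclass(frozen=True)
class Source:
    kind: SourceKind
    url: str


def resolve(source: Source, fetch: Callable[[str], Optional[str]]) -> list[str]:
    """Turn an explicit source into dataset URLs without any guessing."""
    if source.kind is SourceKind.DATASET_URL:
        return [source.url]  # nothing to discover: the URL is used as given
    body = fetch(source.url)
    if body is None:
        return []
    root = ET.fromstring(body)
    # Assumption: every <loc> in the named sitemap is taken, no path heuristics.
    return [
        el.text.strip()
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "loc" and el.text
    ]
```

Neither branch probes for sitemap variants or scores links, so the result depends only on the URLs you supplied. That is what makes explicit sources more predictable than generic discovery.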