Other portal discovery
How the generic portal discovery works (sitemap + link heuristics) and when to use explicit sources.
When to use OTHER
Use the OTHER provider when you have a portal that doesn't match Socrata/ArcGIS/CKAN/OpenDataSoft/DCAT signatures.
Discovery strategies
- Sitemap parsing
  - tries /sitemap.xml and common variants
  - follows sitemap indexes (bounded)
  - keeps URLs that look dataset-like (path includes dataset, data, catalog, or file downloads)
- Landing page link extraction
  - if no sitemap is found, fetches the portal homepage
  - extracts links and applies the same dataset-like heuristics
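The two strategies above can be sketched roughly as follows. This is an illustrative sketch, not the provider's actual implementation: the function names, the `fetch` callable (takes a URL, returns the body text or `None`), the sitemap variants, the download extensions, and the limit of 20 sitemap fetches are all assumptions made for the example. Only the keywords dataset, data, and catalog come from the description above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import xml.etree.ElementTree as ET

# Assumed values: only the three keywords are taken from the description.
KEYWORDS = ("dataset", "data", "catalog")
DOWNLOAD_EXTS = (".csv", ".json", ".geojson", ".xlsx", ".zip")
SITEMAP_PATHS = ("/sitemap.xml", "/sitemap_index.xml", "/sitemap-index.xml")
MAX_SITEMAPS = 20  # hypothetical bound on sitemap fetch attempts


def looks_dataset_like(url):
    """Keep URLs whose path contains a keyword or ends in a download extension."""
    path = urlparse(url).path.lower()
    return any(k in path for k in KEYWORDS) or path.endswith(DOWNLOAD_EXTS)


def parse_sitemap(xml_text):
    """Return (is_index, locs) for a sitemap or sitemap index document."""
    root = ET.fromstring(xml_text)
    is_index = root.tag.endswith("sitemapindex")
    locs = [el.text.strip() for el in root.iter()
            if el.tag.endswith("loc") and el.text]
    return is_index, locs


def discover_from_sitemaps(fetch, base_url):
    """Return dataset-like URLs, or None if no sitemap could be parsed."""
    queue = [urljoin(base_url, p) for p in SITEMAP_PATHS]
    seen, found, attempts, parsed_any = set(), [], 0, False
    while queue and attempts < MAX_SITEMAPS:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        attempts += 1
        text = fetch(url)
        if text is None:
            continue
        try:
            is_index, locs = parse_sitemap(text)
        except ET.ParseError:
            continue
        parsed_any = True
        if is_index:
            queue.extend(locs)  # follow child sitemaps, still bounded
        else:
            found.extend(u for u in locs if looks_dataset_like(u))
    return found if parsed_any else None


class LinkCollector(HTMLParser):
    """Collect href values from anchor tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def discover_from_homepage(fetch, base_url):
    """Fallback: apply the same heuristic to links on the portal homepage."""
    html = fetch(base_url)
    if html is None:
        return []
    collector = LinkCollector()
    collector.feed(html)
    links = (urljoin(base_url, h) for h in collector.hrefs)
    return [u for u in links if looks_dataset_like(u)]


def discover(fetch, base_url):
    """Try sitemaps first; use the homepage only if no sitemap was found."""
    urls = discover_from_sitemaps(fetch, base_url)
    return urls if urls is not None else discover_from_homepage(fetch, base_url)
```

Passing `fetch` in as a parameter keeps the sketch free of network code, so it can be tried against a plain dict of URL to body text (for example `discover(pages.get, "https://example.org/")`). Note that plain substring matching on data also matches paths such as /metadata, which is one reason explicit sources can be more predictable than heuristics.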
Practical guidance
- Prefer explicit DATASET_URL sources when you can.
- For large portals, use SITEMAP sources for predictable discovery.